This article is part of our series covering the business of artificial intelligence.
Since the release of GPT-2, there has been a lot of excitement about the applications of large language models. In recent years, we’ve seen LLMs used for many exciting tasks, such as writing articles, designing websites, creating images, and even writing code.
But as I have argued before, there is a big gap between showing off a new technology doing something cool and using the same technology to create a successful product with a workable business model.
I think Microsoft just launched the first true LLM product with the public release of GitHub Copilot last week. This is an application that has strong product/market fit, delivers tremendous added value, is hard to beat, is cost-efficient, has very strong distribution channels, and can become a source of big profit.
The release of GitHub Copilot is a reminder of two things. First, LLMs are fascinating, but they are useful when applied to specific tasks rather than as artificial general intelligence. And second, the nature of LLMs gives big tech companies like Microsoft and Google an unfair advantage in commercializing them: LLMs are not democratic.
Copilot is an AI programming tool that is installed as an extension in popular IDEs such as Visual Studio and VS Code. It offers suggestions as you write code, kind of like autocomplete, but for programming. Its suggestions range from completing a line of code to generating entire blocks of code, such as functions and classes.
Copilot is powered by Codex, a version of OpenAI’s famous GPT-3 model, the large language model that made headlines for its ability to perform a wide variety of tasks. However, unlike GPT-3, Codex has been fine-tuned only for programming tasks. And it delivers impressive results.
The success of GitHub Copilot and Codex underscores one important fact: when it comes to actually using LLMs, specialization beats generalization. When Copilot was first introduced in 2021, CNBC reported: “…when OpenAI first trained [GPT-3], the start-up had no intention of teaching it how to code, [OpenAI CTO Greg] Brockman said. It was meant more as a general purpose language model [emphasis mine] that could, for example, generate articles, fix incorrect grammar and translate from one language into another.”
But while GPT-3 has had modest success in several applications, Copilot and Codex have proven to be big hits in one specific area. Codex cannot write poetry or articles like GPT-3 can, but it has proven to be very useful for developers of different levels of expertise. Codex is also much smaller than GPT-3, which makes it more memory- and compute-efficient. And since it is trained for a specific task, as opposed to the open and ambiguous world of human language, it is less susceptible to the pitfalls that models like GPT-3 often fall into.
However, it is worth noting that just as GPT-3 knows nothing about human language, Copilot knows nothing about computer code. It is a transformer model trained on millions of code repositories. Given a prompt (for example, a piece of code or a textual description), it tries to predict the next set of instructions that makes the most sense.
With its huge training corpus and huge neural network, Copilot usually makes good predictions. But sometimes it makes silly mistakes that even the most novice programmer would avoid. It doesn’t think about programs the way a programmer does. It cannot design software or reason step by step about user requirements and experience and all the other things that go into building successful apps. It is no substitute for human programmers.
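To make “predicting the next set of instructions” concrete, here is a deliberately tiny sketch. It swaps the transformer for a bigram counter (an assumption made purely for brevity; Copilot does not work this way internally), but it shares the key property described above: it completes code from token statistics alone, with no notion of what the code does. The functions `train_bigram` and `complete` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(token_streams):
    """Count how often each token follows each other token in the corpus."""
    counts = defaultdict(Counter)
    for tokens in token_streams:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(counts, prompt, max_new_tokens=10, stop="<eol>"):
    """Greedily append the most frequent follower of the last token, one at a time."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # token never seen in training: the model has nothing to predict
        nxt = followers.most_common(1)[0][0]
        out.append(nxt)
        if nxt == stop:
            break
    return out


# Hypothetical "training repositories", already split into tokens.
corpus = [
    "for i in range ( n ) : <eol>".split(),
    "for j in range ( n ) : <eol>".split(),
    "for x in items : <eol>".split(),
]
model = train_bigram(corpus)
print(" ".join(complete(model, ["for", "i"])))  # for i in range ( n ) : <eol>
```

The completion is plausible only because `range ( n )` was the most common pattern in the toy corpus; a real LLM conditions on far more context, but it is still statistical prediction rather than understanding.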
Copilot’s product/market fit
One of the milestones for any product is achieving a product/market fit, or proving that it can solve a problem better than alternative solutions in the market. In this regard, Copilot has been a stunning success.
GitHub released Copilot as a preview product last June, and since then it has been used by more than a million developers.
According to GitHub, in files in which Copilot is activated, it accounts for about 40 percent of the code written. Developers and engineers I spoke to last week say that while there are limits to Copilot’s capabilities, there’s no denying that it significantly improves their productivity.
For some use cases, Copilot competes with StackOverflow and other code forums, where users must search for the solution to a specific problem they are facing. In this case, the added value of Copilot is very clear and tangible: less frustration and distraction, more focus. Instead of leaving their IDE and searching the web for a solution, developers simply type the description or docstring of the functionality they want, and Copilot does most of the work for them.
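As a hypothetical illustration of this workflow, the developer writes only the signature and docstring below, and the body is the kind of completion such a tool might propose. The function name and body are written by hand for illustration; they are not captured Copilot output, and the developer would still need to review them.

```python
import re


# What the developer types: a signature and a docstring describing the intent.
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards,
    ignoring case and any non-alphanumeric characters."""
    # The kind of body a Copilot-style assistant might suggest
    # (hand-written here for illustration, not real Copilot output).
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return cleaned == cleaned[::-1]


print(is_palindrome("A man, a plan, a canal: Panama"))  # True
```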
In other cases, Copilot competes with manually writing tedious code, like configuring matplotlib charts in Python (a super frustrating task). While Copilot’s output may require some tweaking, it takes most of the burden off developers.
In many other use cases, Copilot has been able to cement itself as a superior solution to problems many developers face every day. Developers told me about things like creating test cases, setting up web servers, documenting code, and many other tasks that previously required tedious manual effort. Copilot has helped them save a lot of time in their daily work.
Distribution and cost efficiency
Product/market fit is just one of several components of creating a successful product. If you have a good product, but can’t find the right distribution channels to deliver its value cost-effectively and profitably, you’re doomed. At the same time, you need a plan to maintain your competitive edge, prevent other companies from following your success, and ensure you can continue to deliver value over the long term.
To make Copilot a successful product, Microsoft needed to bring together some very important components, including technology, infrastructure and market.
First, it needed the right technology, which it obtained thanks to its exclusive license to OpenAI’s technology. Since 2019, OpenAI has stopped open-sourcing its technology and instead licenses it to its funders, including Microsoft. Codex and Copilot were created based on GPT-3 with the help of the scientists at OpenAI.
Other major tech companies have been able to create large language models similar to GPT-3. But there’s no denying that LLMs are very expensive to train and run.
“For a model that is 10 times smaller than Codex (the model behind Copilot, which has 12B parameters according to the paper), it takes hundreds of dollars to run the evaluation on the benchmark they used in their paper,” Loubna Ben Allal, machine learning engineer at Hugging Face, told TechTalks. Ben Allal also mentioned another benchmark used to evaluate Codex, which cost thousands of dollars to run on their own smaller model.
“There are also security issues, because evaluating the model requires running untrusted programs that may be malicious, so sandboxes are mostly used for security,” said Ben Allal.
Leandro von Werra, another ML engineer at Hugging Face, estimated the training cost to be between tens and hundreds of thousands of dollars, depending on the model size and the number of experiments needed to get it right.
“Inference is one of the biggest challenges,” von Werra told TechTalks. “While almost anyone with resources can train a 10B model these days, getting the inference latency low enough to be responsive to the user is a technical challenge.”
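Why is evaluation expensive and risky? Code benchmarks are typically scored by executing each generated program against unit tests, and the Codex paper reports results with a pass@k metric computed from many samples per problem. The sketch below assumes the benchmark in the quotes is scored this way. It runs each sample in a separate Python process with a timeout, then applies the unbiased pass@k estimator. The helpers `run_candidate` and `pass_at_k` are hypothetical names, and a subprocess with a timeout is only process isolation, not the security sandbox Ben Allal refers to.

```python
import os
import subprocess
import sys
import tempfile
from math import comb


def run_candidate(candidate_src: str, test_src: str, timeout_s: float = 5.0) -> bool:
    """Run one generated program plus its unit tests in a separate Python process.

    Returns True only if the process exits with status 0 within the time limit.
    NOT a security sandbox: the code can still reach the file system and network.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "candidate.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(candidate_src + "\n\n" + test_src + "\n")
        try:
            result = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
                [sys.executable, "-I", path],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # infinite loops and very slow code count as failures
        return result.returncode == 0


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples generated, c of them passed, budget of k tries.

    Probability that at least one of k samples drawn without replacement
    from the n generated samples passes the tests.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


tests = "assert add(2, 3) == 5"
samples = [  # hypothetical model outputs for a single problem
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    while True:\n        pass",
]
c = sum(run_candidate(s, tests, timeout_s=2.0) for s in samples)
print(c, round(pass_at_k(len(samples), c, 1), 3))  # 1 0.333
```

The number of program executions grows with problems times samples per problem, and every one of them runs model-written code, which is why both cost and isolation come up in the quotes.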
This is where Microsoft’s second advantage comes in. The company has been able to build a large cloud infrastructure specialized for machine learning models such as Codex. It runs inference and returns suggestions in milliseconds. And more importantly, Microsoft can run and deliver Copilot at a very affordable price. Currently, Copilot is offered for $10 per month or $100 per year, and is provided free to students and maintainers of popular open-source repositories.
Most developers I spoke to were very happy with the pricing model because the time it saved them was worth much more than its price.
Abhishek Thakur, another ML engineer at Hugging Face with whom I spoke earlier this week, said: “As a machine learning engineer, I know there is a lot that goes into building these types of products, especially Copilot, which offers suggestions with sub-millisecond latency. Building an infrastructure that serves these kinds of models for free is not feasible in the real world for an extended period of time.”
However, it is not impossible to run code-generating LLMs at affordable rates.
“As for the computational power to build these models and the data needed, that’s quite feasible, and there have been a few replications of Codex, such as InCoder from Meta and CodeGen (now accessible for free on the Hugging Face Hub) from Salesforce, that match Codex’s performance,” said von Werra. “It certainly takes some engineering to build the models into a fast and polished product, but it seems a lot of companies could do this if they wanted to.”
However, this is where the third piece of the puzzle comes in. Microsoft’s acquisition of GitHub gave it access to the largest developer market, making it easy for the company to put Copilot in the hands of millions of users. Microsoft also owns Visual Studio and VS Code, two of the most popular IDEs, with hundreds of millions of users. This reduces the friction for developers to adopt Copilot rather than a competing product.
With its pricing, efficiency and market reach, Microsoft appears to have cemented its position as a leader in the emerging market for AI-assisted software development. The market could still take other turns. What is certain (and I’ve pointed this out before) is that large language models offer many opportunities to create new applications and markets. But they don’t change the fundamentals of good product management.
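The developers’ reasoning comes down to simple arithmetic. A minimal sketch using the article’s prices; the $60 hourly rate is a hypothetical figure chosen for illustration, not something reported by the developers, and the function names are likewise hypothetical.

```python
MONTHLY_PRICE = 10.0  # USD per month (price quoted in the article)
ANNUAL_PRICE = 100.0  # USD per year (price quoted in the article)


def annual_savings() -> float:
    """How much the yearly plan saves compared with paying monthly for 12 months."""
    return 12 * MONTHLY_PRICE - ANNUAL_PRICE


def break_even_minutes_per_month(hourly_rate: float, monthly_cost: float = MONTHLY_PRICE) -> float:
    """Minutes of developer time the tool must save each month to pay for itself."""
    return monthly_cost * 60 / hourly_rate


print(annual_savings())  # 20.0
print(break_even_minutes_per_month(60.0))  # 10.0 (hypothetical $60/hour rate)
```

Under that assumption, saving about ten minutes a month already covers the subscription, which is consistent with developers seeing the price as a bargain.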