https://a16z.com/2022/11/16/creativity-as-an-app/
Perhaps the most mind-bending implication we’re seeing from generative AI is that, contrary to the common view that creativity will be the last bastion of human ingenuity in the face of automation, it appears to be far easier to automate even quite difficult creative tasks than to automate relatively simple programming tasks. To get a sense of this, we compare two of the more popular use cases for generative AI: code generation and image generation. But we believe the claim holds up more generally, even as generative models expand into more complex applications.
The short version of the argument (which we tackle in more detail below) is that although a product like GitHub Copilot, in its current form, can make coding somewhat more efficient, it doesn’t obviate the need for capable software developers with programming knowledge. One big reason is that, when it comes to building a program, correctness really matters. If an AI generates a program, a human still has to verify that it is correct, an effort that can approach the cost of writing the program in the first place.
On the other hand, anyone who can type can use a model like Stable Diffusion to produce high-quality, one-of-a-kind images in minutes, at a cost that is orders of magnitude lower. Creative work products often have no strict correctness constraints, and the outputs of the models are stunningly complete. It’s hard not to see a full phase shift coming in industries that rely on creative visuals: for many uses, the visuals AI can produce today are already sufficient, and we’re still in the very early innings of the technology.
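To make "anyone who can type" concrete, here is a minimal sketch of text-to-image generation with the open-source diffusers library. The model checkpoint, prompt, and hardware settings are illustrative choices on our part, not a prescribed setup:

```python
# A minimal sketch of text-to-image generation with Stable Diffusion via the
# Hugging Face diffusers library. Model id and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a public Stable Diffusion checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU; use float32 on CPU instead

# One natural-language prompt in, one unique image out.
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```

No programming knowledge beyond running a script is required, and no one needs to "verify" the output the way they would a program; if the image looks right, it is right.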
We fully acknowledge that it’s hard to be confident in any predictions at the pace the field is moving. Right now, though, it seems we’re much more likely to see applications full of creative images created strictly by programmers than applications with human-designed art built strictly by creators.
Before we get into the specifics of code generation versus image generation, it’s useful to get a sense of just how popular AI in general, and generative AI in particular, are at the moment.
Generative AI is being adopted by developers faster than anything we’ve ever seen. As we write this, Stable Diffusion tops the trending charts of GitHub repositories by a wide margin, and its growth is far ahead of any recent technology in infrastructure or crypto (see the figure above). There are almost daily launch and funding announcements from startups using the technology, and online social networks are flooded with content created by generative models.
The overall level of investment in AI over the last decade is also hard to overstate. Publications alone have grown exponentially since the mid-2010s (see the figure below); today, about 20% of all articles posted on arXiv are about AI, ML, and NLP. Importantly, the theoretical results have crossed a critical threshold: they have become easily consumable, triggering a Cambrian explosion of new techniques, software, and startups.
The most recent spike in the figure above is largely due to generative AI. In a single decade, we’ve gone from experts-only AI models that could classify images and create word embeddings to publicly usable models that can write effective code and create remarkably accurate images using natural language prompts. It’s no surprise that the pace of innovation has only picked up, and it should be no surprise when generative models begin making inroads into other areas once dominated by humans.
One of the earliest uses of generative AI has been as a programmer’s aid: a model is trained on a large corpus of code (e.g., all the public repos on GitHub) and then suggests code to a programmer as they type. The results are outstanding, so much so that it’s reasonable to expect this approach to become synonymous with programming going forward.
Generated code: secure against attacks that don’t use semicolons.
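To see the mechanics concretely, here is a minimal sketch of that pattern using a small open code model via the transformers library. The "suggestion" is simply a continuation of whatever the programmer has typed so far; the model, prompt, and settings are illustrative and are not how GitHub Copilot is actually served:

```python
# A minimal sketch of Copilot-style code completion using a small open
# code-generation model via Hugging Face transformers. Model choice and
# generation settings are illustrative, not Copilot's actual stack.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-350M-mono"  # a small public code model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The prompt is the code written so far; the completion is the "suggestion".
prompt = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards."""\n'
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The completion often looks plausible, which is exactly why the correctness question below matters: plausible and correct are not the same thing.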
However, the productivity gains have been modest relative to image generation, which we cover below. Part of the reason, as mentioned above, is that correctness is critical in programming (and indeed in engineering problems more broadly, but we focus on programming in this post). For example, a recent study found that for scenarios matching high-risk CWEs (common weakness enumerations), 40% of AI-generated code contained vulnerabilities.
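As an illustration of the kind of weakness such studies flag, consider SQL injection (CWE-89), a pattern a model can easily reproduce from the mountains of vulnerable code in its training data. This example is ours, not from the study; the fix is a one-line change, but spotting the need for it still takes a knowledgeable reviewer:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # The kind of code a model happily completes: SQL built by string
    # interpolation (CWE-89). A username like "x' OR '1'='1" returns every row.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The correct, parameterized version a human reviewer has to insist on:
    # the driver escapes the value, so user input can't alter the query.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
```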
Thus, the user has to strike a balance: generate enough code to deliver a meaningful productivity boost, but not so much that checking it for correctness becomes impractical. As a result, Copilot has helped improve developer productivity (recent studies, here and here, put gains on the order of 2x or less), but only to a level on par with what we’ve seen from previous advances in programming languages and tooling. The jump from assembly to C, for example, improved productivity 2-5x by some estimates.
For more experienced programmers, concerns might extend beyond code correctness and into overall code quality. As fast.ai’s Jeremy Howard has explained with regard to recent versions of the OpenAI Codex model, “[I]t writes verbose code because it’s generating average code. For me, taking average code and making it into code that I like and I know to be correct is much slower than just writing it from scratch — at least in languages I know well.”