Graphic design plays a fundamental role in shaping how customers and consumers view your business, but not every founder has the budget or time to hire a professional designer for every project. Those limiting factors could soon be a thing of the past thanks to text-to-image generation, a new type of machine learning that can create original images from simple text prompts.
OpenAI, which describes itself as a research and deployment company, is pioneering the technology with its program Dall-E 2, which was released to a closed beta audience in April. The program ingests massive numbers of images with corresponding descriptions to learn how to identify objects visually (think “cat”) and the relationships between objects (think “cat driving a car”). When you enter a prompt, the model draws on that training to produce its best approximation of your request. It can even identify and replicate the styles of different artists (think “cat driving a car in the style of Jack Kirby”).
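To make that workflow concrete, here is a minimal sketch of what requesting an image from a text prompt could look like in code. It assumes access to OpenAI’s Python client and its Images API, which was not publicly available during the closed beta described here, so the exact calls and parameters should be read as illustrative rather than official.

```python
# Hypothetical sketch: turning a text prompt into an image with OpenAI's
# Python client. Assumes the `openai` package is installed and an API key
# is available; model name and parameters are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-2",                                      # the model discussed above
    prompt="a cat driving a car in the style of Jack Kirby",
    n=1,                                                   # number of candidate images
    size="1024x1024",
)

print(response.data[0].url)  # temporary URL pointing to the generated image
```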
Interest in text-to-image technology went viral in June, after Craiyon, a less sophisticated third-party version of OpenAI’s model (formerly called Dall-E Mini), exploded on social media, with thousands of people posting their creations online. Images such as a chicken nugget smoking a cigarette in the rain, or Darth Vader competing on the cooking show Chopped, were widely shared as people fed the model their most ridiculous prompts to test the limits of the technology.
The value of text-to-image as a fun toy is immediately apparent, but what about its potential business applications? An OpenAI spokesperson told Inc. that the researchers behind Dall-E are still discovering how people want to use it, but that they see the program as “a useful creative tool for artists, architects, product designers, and magazine cover designers.”
Another potential use OpenAI sees for the technology is in video games and interactive experiences, such as the metaverse. According to the company’s spokesperson, game designers and developers could use text-to-image technology as a tool to “inspire designs for AR avatars or experiences.”
The purpose of text-to-image technology, according to OpenAI, is not to replace artists and graphic designers, but to assist them in their work while giving anyone with an imagination the ability to create original images. In a blog post published in June 2022, Google software engineer Yonghui Wu and research scientist David Fleet wrote that the goal for Google’s text-to-image models, known as Imagen and Parti, is to bring “user experiences based on these models” to the world in a safe, responsible manner that will inspire creativity.
To help artists, Dall-E 2 has a feature called Inpainting, which allows users to mark a part of an image that they want to change. An interior designer could use the tool to remove a throw pillow from a living room image by simply highlighting the pillow and typing ‘regular sofa’.
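As a rough illustration of that inpainting workflow, the sketch below assumes OpenAI’s image-editing endpoint in its Python client: you pass the original picture, a mask whose transparent region marks the highlighted pillow, and the replacement prompt. The file names and exact interface are assumptions for illustration, not a documented part of the beta described in this article.

```python
# Hypothetical sketch of the inpainting workflow: the mask's transparent area
# marks the region to change, and the prompt describes what should appear there.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

response = client.images.edit(
    model="dall-e-2",
    image=open("living_room.png", "rb"),   # original interior photo (illustrative file name)
    mask=open("pillow_mask.png", "rb"),    # transparent where the pillow was highlighted
    prompt="a regular sofa",               # the replacement described in the text
    n=1,
    size="1024x1024",
)

print(response.data[0].url)  # URL of the edited image
```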
Another opportunity to monetize the technology is creating NFTs, although OpenAI says it will take time to understand the capabilities and limits of its models when creating digital tokens before taking official steps in that direction. An important question: who owns an NFT created by a text prompt? OpenAI currently owns all the images produced with the program, but the company says it will review the decision after the program’s official launch.
One of the main risks of artificially generated images is that they can easily be used to fuel disinformation or create deepfakes, so providing ways to easily verify whether an image is genuine or artificial will be critical to the technology’s success. For now, every image generated by Dall-E 2 displays a small series of colored boxes in the lower-right corner, a kind of signature, according to OpenAI.
The company is quick to point out that text-to-image technology isn’t perfect just yet, and some of its limits are there by design. Dall-E 2 has guardrails to prevent photorealistic depictions of real people’s faces, and the program has very little ability to generate violent or hateful images because researchers removed such explicit content from the training data.
However, for budding entrepreneurs with big imaginations and few artistic skills, the technology can be both an inspiration and a practical solution in an image-obsessed world.