In 2020, OpenAI’s machine learning model GPT-3 blew people away when, after training on billions of words scraped from the internet, it started spitting out well-crafted sentences. This year, DALL-E 2, a cousin of GPT-3 trained on text and images, caused a similar stir online when it started whipping up surreal images of astronauts riding horses and, more recently, photorealistic faces of people who don’t exist.
Now, the company says its latest AI learned to play Minecraft after watching some 70,000 hours of video showing people playing the game on YouTube.
Old School
Unlike countless previous Minecraft algorithms, which work in much simpler versions of the game, the new AI plays in the same environment as humans, using standard keyboard and mouse commands.
In a blog post and preprint, the OpenAI team describes the work, saying the algorithm learned basic skills out of the box, such as chopping down trees, making planks, and building crafting tables. They also saw it swimming, hunting, cooking, and “pillar jumping.”
“To our knowledge, there is no published work that operates in the full, unmodified human action space, including drag-and-drop inventory management and item creation,” the authors wrote in their paper.
After fine-tuning (that is, training the model on a more focused dataset), the team found that the algorithm not only performed all of these tasks more reliably but also began to extend its technological prowess, crafting wooden and stone tools and building basic shelters, villages, and chests.
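The pretrain-then-fine-tune recipe can be sketched in miniature. Below is a hypothetical toy in which a one-parameter linear model stands in for the policy network; the data and names are invented for illustration (OpenAI’s actual model is a large neural network trained on video), but the pattern is the same: train broadly first, then continue training from those weights on a small, focused dataset.

```python
# Toy sketch of pretraining followed by fine-tuning (hypothetical example).
# A 1D linear model y = w * x stands in for a large policy network.

def train(w, data, lr, steps):
    """Plain gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Broad "pretraining" data: a noisy mix of behaviors (overall slope ~1.0).
pretrain_data = [(x, 1.0 * x + (0.5 if x % 2 else -0.5)) for x in range(1, 11)]
# Focused "fine-tuning" data: the specific skill we care about (slope 2.0).
finetune_data = [(x, 2.0 * x) for x in range(1, 6)]

w = train(0.0, pretrain_data, lr=0.001, steps=500)  # pretrain from scratch
pretrained_loss = loss(w, finetune_data)
w = train(w, finetune_data, lr=0.001, steps=500)    # fine-tune from pretrained weights
finetuned_loss = loss(w, finetune_data)
```

The key point the toy captures: fine-tuning starts from the pretrained weights rather than from scratch, so a much smaller, task-specific dataset is enough to sharpen the behavior.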
After further refining it with reinforcement learning, the model learned to build a diamond pickaxe, a skill that takes human players about 20 minutes and 24,000 actions to accomplish.
This is a remarkable result. AI has long struggled with Minecraft’s wide-open gameplay. Games like chess and Go, which AI has already mastered, have clear objectives, and progress toward those objectives can be measured. To conquer Go, researchers used reinforcement learning, in which an algorithm is given a goal and rewarded for progress toward it. Minecraft, by contrast, has any number of possible goals, progress is less linear, and deep reinforcement learning algorithms tend to spin their wheels.
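The reward-driven loop described above can be shown in a minimal, hypothetical form (this is a standard Q-learning toy, not OpenAI’s setup): an agent in a five-state corridor is rewarded only for reaching the rightmost state, and repeated trial and error propagates that reward backward until the agent reliably walks toward the goal.

```python
import random

# Toy reinforcement learning example (hypothetical, for illustration only):
# Q-learning on a 5-state corridor with a single reward at the goal state.
random.seed(0)
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]; 0 = left, 1 = right

for episode in range(200):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < 0.1:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if next_state == GOAL else 0.0  # reward only at the goal
        # Standard Q-learning update (learning rate 0.5, discount 0.9).
        Q[state][action] += 0.5 * (reward + 0.9 * max(Q[next_state]) - Q[state][action])
        state = next_state

# After training, the greedy policy steps right toward the goal from every state.
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)]
```

This works precisely because the corridor has one clear, measurable goal; Minecraft’s open-ended objectives are what make the same recipe so much harder there.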
For example, in the 2019 MineRL Minecraft competition for AI developers, none of the 660 entries achieved the competition’s relatively simple goal: mining diamonds.
It’s worth noting that in order to reward creativity and show that throwing computing power at a problem isn’t always the solution, the MineRL organizers imposed strict restrictions on the participants: they were given one NVIDIA GPU and 1,000 hours of recorded gameplay. Although the participants performed admirably, the OpenAI result, achieved with more data and 720 NVIDIA GPUs, seems to show that computing power still has its advantages.
AI Is Getting Crafty
With its video pre-training (VPT) algorithm for Minecraft, OpenAI returned to the approach it used with GPT-3 and DALL-E: pretraining an algorithm on a towering dataset of human-made content. But the algorithm’s success wasn’t made possible by computing power or data alone. Training a Minecraft AI on that much video wasn’t practical before.
Raw video is not as useful for behavioral AIs as it is for content generators such as GPT-3 and DALL-E. It shows what people do, but it doesn’t explain how they do it. For the algorithm to associate video with actions, it needs labels. For example, a video frame showing a player’s inventory screen needs to be paired with the “E” key press used to open it.
Labeling every frame in 70,000 hours of video would be… insane. So the team paid Upwork contractors to record and label basic Minecraft skills. They used 2,000 hours of this video to teach a second algorithm, an inverse dynamics model (IDM), how to label Minecraft videos, and that IDM annotated all 70,000 hours of YouTube footage. (The team says the IDM was more than 90 percent accurate in labeling keyboard and mouse commands.)
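The pipeline above is a form of pseudo-labeling: a small hand-labeled set trains a labeler, which then annotates a much larger unlabeled set. Here is a deliberately tiny, hypothetical sketch using a nearest-centroid classifier in place of the IDM (the feature vectors and action names are invented; the real IDM is a neural network that infers actions from surrounding video frames):

```python
# Hypothetical toy of the pseudo-labeling idea behind the IDM.
# A tiny nearest-centroid classifier stands in for the real labeler network.

def nearest_centroid_fit(labeled):
    """Average the feature vectors for each action label."""
    sums, counts = {}, {}
    for features, action in labeled:
        prev = sums.get(action, [0.0] * len(features))
        sums[action] = [s + f for s, f in zip(prev, features)]
        counts[action] = counts.get(action, 0) + 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}

def predict(centroids, features):
    """Label a frame with the action whose centroid is closest."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda a: dist(centroids[a]))

# Small contractor-labeled set: (frame features, keyboard/mouse action).
labeled = [
    ([0.9, 0.1], "open_inventory"),  # e.g., inventory screen visible -> key "E"
    ([0.8, 0.2], "open_inventory"),
    ([0.1, 0.9], "attack"),          # e.g., swing animation -> left click
    ([0.2, 0.8], "attack"),
]
labeler = nearest_centroid_fit(labeled)

# The large unlabeled "YouTube" set then gets pseudo-labels from the labeler.
unlabeled = [[0.85, 0.15], [0.15, 0.85], [0.7, 0.3]]
pseudo_labels = [predict(labeler, frame) for frame in unlabeled]
```

Once the big dataset carries these machine-made action labels, it can be used for ordinary supervised training of the behavioral model, which is what made 70,000 hours of raw footage usable.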
This approach of people training a data-labeling algorithm to unlock online behavioral datasets could also help AI learn other skills. “VPT paves the way to allowing agents to learn to act by watching the vast numbers of videos on the internet,” the researchers wrote. Beyond Minecraft, OpenAI thinks VPT could bring new applications to the real world, such as algorithms that operate computers from a prompt (imagine, for example, asking your laptop to find a document and email it to your boss).
Diamonds are not forever
Much to the chagrin of the MineRL competition organizers, the results seem to show that computing power and resources are still moving the needle on the most advanced AI.
Setting aside the cost of computing, OpenAI said the Upwork contractors alone cost $160,000. But to be fair, manually labeling the entire dataset would have run into the millions and taken far longer to complete. And while the computing power was not negligible, the model was actually quite small: VPT’s hundreds of millions of parameters are orders of magnitude fewer than GPT-3’s hundreds of billions.
Still, the drive to find clever new approaches that use less data and computing is valid. A child can learn the basics of Minecraft by watching one or two videos. Today’s AI requires far more to learn even simple skills. Making AI more efficient is a grand, worthy challenge.
In any case, OpenAI is opting for more openness this time around. The researchers say VPT is not without risk — they have strictly controlled access to algorithms like GPT-3 and DALL-E, in part to mitigate misuse — but the risk is minimal for now. They have made the data, environment, and algorithm open source and are partnering with MineRL. This year’s participants will be able to use, modify, and fine-tune the latest Minecraft AI.
Chances are they’ll get well past mining diamonds this time around.