In brief: American hardware startup Cerebras claims to have trained the largest-ever AI model on a single device, powered by its Wafer Scale Engine 2, the world's largest chip, roughly the size of a dinner plate.
“With the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT-family language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system,” the company claimed this week. “These models run on a single CS-2, set-up takes minutes, and users can quickly switch between models with just a few keystrokes.”
The CS-2 houses no fewer than 850,000 cores and 40 GB of on-chip memory with a memory bandwidth of 20 PB/s. The specifications of other AI accelerators and GPUs pale in comparison, which is why machine learning engineers normally have to train huge multi-billion-parameter AI models across many servers.
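A back-of-envelope check (my own illustrative arithmetic, not Cerebras's published accounting) suggests why roughly 20 billion parameters is a natural ceiling for a single CS-2: at half precision, the weights alone would fill the 40 GB of on-chip memory.

```python
# Back-of-envelope estimate: how many fp16 parameters fit in 40 GB of on-chip memory?
# Illustrative only; Cerebras's actual memory layout and scheduling differ.

ON_CHIP_MEMORY_BYTES = 40 * 10**9   # 40 GB on-chip memory on the CS-2
BYTES_PER_PARAM_FP16 = 2            # one half-precision weight

max_params = ON_CHIP_MEMORY_BYTES // BYTES_PER_PARAM_FP16
print(f"Weights-only capacity: {max_params / 1e9:.0f} billion fp16 parameters")
# At ~20B parameters the weights alone fill the chip, leaving activations,
# gradients, and optimizer state to be streamed or recomputed.
```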
Although Cerebras has apparently succeeded in training the largest model on a single device, it will still struggle to win over large AI customers. The biggest neural network systems today contain hundreds of billions, if not trillions, of parameters; in practice, many more CS-2 systems would be needed to train these models.
Machine learning engineers would probably run into the same challenges they already face when distributing training across countless GPU- or TPU-powered machines, so why switch to a less familiar hardware system with less software support?
Surprise, surprise: Robot trained on internet data was racist, sexist
A robot trained on a flawed dataset scraped from the internet exhibited racist and sexist behavior in an experiment.
Researchers from Johns Hopkins University, the Georgia Institute of Technology, and the University of Washington tasked a robot with sorting blocks into a box. The blocks were printed with images of human faces, and the robot was instructed to place the block it believed showed a doctor, homemaker, or criminal into a colored box.
The robot was powered by a CLIP-based computer vision model, of the kind often used in text-to-image systems. Such models are trained to learn the visual association between an object and its textual description; given a caption, a system built on them can then produce an image matching the sentence. Unfortunately, these models often exhibit the same biases found in their training data.
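To make the mechanism concrete, here is a minimal sketch of CLIP-style matching. The vectors and captions below are invented for illustration, standing in for the real learned image and text encoders: each caption and each image maps to a vector, and the caption whose vector is closest (by cosine similarity) to the image's vector "wins" — which is also how biases baked into the embeddings leak into decisions.

```python
import math

# Toy sketch of CLIP-style zero-shot matching. Real CLIP uses learned image
# and text encoders; tiny hand-made vectors stand in for the embeddings here.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: one "image" vector and one vector per caption.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a doctor":    [0.8, 0.2, 0.1],
    "a photo of a homemaker": [0.1, 0.9, 0.3],
    "a photo of a criminal":  [0.2, 0.1, 0.9],
}

# The model "labels" the image with the caption nearest in embedding space.
best = max(caption_embeddings,
           key=lambda c: cosine(image_embedding, caption_embeddings[c]))
print(best)  # → a photo of a doctor
```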
For example, the robot identified blocks with women's faces as homemakers, and associated Black faces with criminals more often than white faces. The device also appeared to select women and dark-skinned people less often than white and Asian men. Although the research is only an experiment, deploying robots trained on flawed data could have real-world consequences.
“In a home, the robot might pick up the white doll when a child asks for the beautiful doll,” said Vicky Zeng, a computer science graduate student at Johns Hopkins. “Or maybe in a warehouse, where many products have models on the box, you could imagine the robot reaching more often for the products with white faces on them.”
Largest Open Source Language Model released
This week, the Russian internet company Yandex published the code for a language model with 100 billion parameters.
The system, called YaLM, was trained on 1.7 TB of text scraped from the internet, using 800 Nvidia A100 GPUs for compute. Notably, the code was released under the Apache 2.0 license, meaning the model can be used for both research and commercial purposes.
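A rough sketch (my own arithmetic, not Yandex's published numbers) of why training a 100-billion-parameter model demands a fleet of GPUs: just holding the weights, gradients, and Adam optimizer state runs far past a single A100's 80 GB of memory, before activations are even counted.

```python
# Rough memory estimate for training a 100B-parameter model with Adam.
# Assumptions (illustrative): fp16 weights and gradients (2 bytes each),
# plus fp32 master weights and two fp32 Adam moments (4 bytes each).

params = 100 * 10**9
bytes_per_param = 2 + 2 + 4 + 4 + 4   # weights + grads + master + m + v = 16 B

total_gb = params * bytes_per_param / 1e9
a100_memory_gb = 80                    # largest A100 variant
min_gpus_for_state = total_gb / a100_memory_gb

print(f"Training state alone: {total_gb:.0f} GB, "
      f"i.e. at least {min_gpus_for_state:.0f} A100s")
# ~1600 GB of optimizer state before counting activations, so sharding the
# model across hundreds of GPUs (Yandex used 800 A100s) is unavoidable.
```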
Academics and developers have welcomed efforts to replicate large language models and open-source them. These systems are challenging to build, and usually only large technology companies have the resources and expertise to develop them. They are often proprietary, and without access they are difficult to study.
“We truly believe that global technological progress is only possible through collaboration,” a Yandex spokesperson told The Register. “Big technology companies owe a lot to the open results of researchers. In recent years, however, advanced NLP technologies, including large language models, have become inaccessible to the scientific community, because the resources for training them are available only to big tech.”
“Researchers and developers around the world need access to these solutions. Without new research, growth will slow. The only way to avoid this is to share best practices with the community. By sharing our language model, we are supporting the pace of development of global NLP.”
Instagram will use AI to verify the age of users
Instagram's parent company, Meta, is testing new methods of verifying that users are 18 or older, including using AI to analyze photos.
Research and anecdotal evidence have shown that social media use can be harmful to children and young teenagers. Instagram users provide their date of birth to confirm they are old enough to use the app; you must be at least 13, and there are additional restrictions for people under 18.
Meta is now trialing three different ways to verify that someone is over 18 when they change their date of birth.
“If someone attempts to edit their date of birth on Instagram from under the age of 18 to 18 or over, we’ll require them to verify their age using one of three options: upload their ID, record a video selfie, or ask mutual friends to verify their age,” the company announced this week.
Meta said it has partnered with Yoti, a digital identity platform, to estimate people's ages. Stills from the video selfie are scrutinized by Yoti's software to predict someone's age. Meta said Yoti uses a “dataset of anonymous images of diverse people from around the world.”
GPT-4chan was a bad idea, say researchers
Hundreds of academics have signed a letter condemning GPT-4chan, the AI language model trained on over 130 million posts from the notoriously toxic internet forum 4chan.
“Large language models, and foundation models more generally, are powerful technologies that carry a risk of significant harm,” began the letter, led by two Stanford University professors. “Unfortunately, we, the AI community, currently lack community norms around their responsible development and deployment. Nonetheless, it is essential for members of the AI community to condemn clearly irresponsible practices.”
Systems like these are trained on huge quantities of text and learn to imitate the data. Feed GPT-4chan something that looks like a conversation between netizens, and it will keep adding more fake posts to the mix. 4chan is notorious for its lax content moderation rules: users are anonymous and can post anything as long as it isn't illegal. Unsurprisingly, GPT-4chan started spewing text with similar levels of toxicity. When it was unleashed on 4chan, some users were unsure whether it was a bot or not.
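The "feed it a conversation and it keeps going" behavior is ordinary autoregressive generation. Below is a toy sketch in which a hand-built bigram table stands in for GPT-4chan's transformer (the corpus, function name, and sampling scheme are invented for illustration): the model repeatedly predicts a plausible next token from the last one and appends it.

```python
import random

# Toy autoregressive continuation: a bigram table stands in for a transformer.
# The "training corpus" is invented; real models learn from billions of posts.
corpus = "the bot posts and the thread grows and the bot posts again".split()

# Count observed next-tokens for each token (a crude language model).
bigrams = {}
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams.setdefault(prev, []).append(nxt)

def continue_text(prompt, n_tokens, seed=0):
    """Append up to n_tokens by repeatedly sampling an observed next token."""
    rng = random.Random(seed)
    tokens = prompt.split()
    for _ in range(n_tokens):
        candidates = bigrams.get(tokens[-1])
        if not candidates:          # token never seen mid-corpus: stop
            break
        tokens.append(rng.choice(candidates))
    return " ".join(tokens)

print(continue_text("the bot", 5))
```

A model trained on toxic posts continues in kind for exactly this reason: it only ever extends the prompt with whatever its training data made likely.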
Experts have now criticized its maker, YouTuber Yannic Kilcher, for deploying the model irresponsibly. “It is possible to imagine a reasonable argument for training a language model on toxic speech, for example to detect and understand toxicity on the internet, or for general analysis. However, Kilcher's decision to deploy this bot does not pass any test of reasonableness. His actions deserve censure. He undermines the responsible practice of AI science,” the letter concluded.