All machine learning libraries and projects rely on data to learn, train, and work.
In an effort to help developers more easily take advantage of labeled datasets and machine learning models for computer vision, Roboflow today announced an expansion of the datasets and AI models in its Roboflow Universe initiative, which could well be one of the largest such open-source repositories. Roboflow claims that Roboflow Universe, which launched in August 2021, now holds over 90,000 datasets with over 66 million images.
Roboflow was founded in 2019 and raised $20 million in a Series A funding round in September 2021. The company provides the open-source Universe repository of datasets and models for computer vision, along with data labeling, model development and hosting capabilities. Its business model is to offer entry-level users free service tiers; as usage grows, or for organizations working with proprietary datasets, the company offers paid support and service options.
Roboflow Universe isn’t just about providing images for a developer to use; it’s about delivering images that are curated in a way that allows the datasets to be used in AI-powered applications.
“A project is basically something that contains both a dataset that someone could use and a trained model on top of that dataset,” Joseph Nelson, co-founder and CEO of Roboflow, told VentureBeat. “The dataset is both the images and the annotations.”
Data is fun, labeled data is more fun
Nelson said organizations typically spend a significant amount of time preparing machine learning data.
The data preparation process involves labeling and classifying data so that a model can be trained effectively. Nelson said the labeling in Roboflow Universe isn’t just a description of an image either.
Labels that Roboflow Universe can include for a given dataset are things like a bounding box, which draws a rectangular frame around an object and can be useful for detecting objects in a busy scene. Another type of labeling in Roboflow is instance segmentation, which produces a polygon that closely traces the outline of the object of interest.
Data labeling formats used in machine learning are also often complex and varied. That’s why, Nelson said, Roboflow supports exporting datasets to 36 annotation formats. Among the supported formats are COCO JSON, VOC XML and YOLO Darknet TXT.
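To make the difference between the two label types concrete, here is a minimal Python sketch. It is an illustration only, not Roboflow's code: the function names are hypothetical, and it assumes a polygon is given as a list of (x, y) pixel coordinates. It derives the bounding box that encloses a segmentation polygon and compares the two areas.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_area(bbox: Tuple[float, float, float, float]) -> float:
    """Area of an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = bbox
    return (x_max - x_min) * (y_max - y_min)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A triangular object: the polygon covers half of its bounding box.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
box = polygon_to_bbox(triangle)
print(box, bbox_area(box), polygon_area(triangle))
```

For irregular objects, the gap between the two areas is background that a bounding box label includes and a polygon label leaves out.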
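As a rough illustration of why format conversion matters, the three formats named above store the same bounding box differently. COCO JSON uses a pixel-space [x, y, width, height] with a top-left origin, VOC XML uses pixel corner coordinates, and YOLO Darknet TXT uses a center point and size normalized by the image dimensions. The sketch below uses hypothetical helper names, is not Roboflow's exporter, and ignores the one-pixel offset that some VOC tools apply.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def coco_to_voc(bbox: Box) -> Box:
    """COCO [x, y, w, h] in pixels -> VOC (xmin, ymin, xmax, ymax) in pixels."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_yolo(bbox: Box, img_w: int, img_h: int) -> Box:
    """COCO [x, y, w, h] in pixels -> YOLO (cx, cy, w, h), each normalized to 0..1."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_txt_line(class_id: int, bbox: Box, img_w: int, img_h: int) -> str:
    """Format one object as a line of a YOLO Darknet .txt label file."""
    cx, cy, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
coco_box = (100.0, 50.0, 200.0, 100.0)
print(coco_to_voc(coco_box))
print(yolo_txt_line(0, coco_box, 400, 200))
```

Getting one of these conventions wrong silently shifts every label, which is part of why ready-made exports save time.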
“By making the image data widely available and usable, someone can immediately find a dataset, include it in their training pipeline, and get to work,” Nelson said.
How developers integrate Roboflow Universe datasets into applications
Bringing computer vision datasets and models into AI-powered applications can often be a complex integration task.
Nelson’s goal with Roboflow is to help minimize complexity. He said that Roboflow Universe datasets can be accessed through open APIs. For example, he noted that Roboflow has a Python package hosted on the Python Package Index (PyPI) that allows developers to programmatically download images, annotations, and models and then embed these components directly into an application.
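A programmatic download with the Python package might look roughly like the sketch below. The workspace and project names, the version number and the "coco" format identifier are placeholders. The helper function is hypothetical. The method chain follows the usage pattern published for the `roboflow` package, but treat it as an assumption and check it against the current documentation, since it was not verified against a specific release.

```python
from typing import Any


def download_universe_dataset(
    client: Any,
    workspace: str,
    project: str,
    version: int,
    export_format: str,
) -> Any:
    """Walk workspace -> project -> version and request an export.

    `client` is expected to behave like a `roboflow.Roboflow` instance
    (assumed interface: workspace(), project(), version(), download()).
    Returns whatever object the client's download() call returns.
    """
    return (
        client.workspace(workspace)
        .project(project)
        .version(version)
        .download(export_format)
    )


# Usage sketch (needs `pip install roboflow`, network access and an API key;
# all identifiers below are placeholders, not real projects):
#
#   from roboflow import Roboflow
#   client = Roboflow(api_key="YOUR_API_KEY")
#   dataset = download_universe_dataset(
#       client, "some-workspace", "some-project", 1, "coco"
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object, without network access.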
Deploying a Roboflow Universe model to popular cloud machine learning services, including AWS SageMaker or Google’s Vertex AI, is also a simple matter of an API call, according to Nelson. In addition, Roboflow makes datasets and models available as Docker containers, enabling deployment to edge devices. There is also a software development kit (SDK) to support Apple iOS devices.
“If we make it really easy to use a model wherever you want to use it, then ideally an engineer would spend their time doing what their business logic is actually doing,” Nelson said.
The intersection of open source models and AI bias
Making it easier to access computer vision datasets and models to build applications is a key goal for Roboflow. Another effect of having such a large corpus of open-source data is that it may help allay concerns about AI bias.
“Bias in AI is never a solved problem,” Nelson said. “But offering explanation, accessibility and findability can help.”
Nelson explained that AI bias is often about trying to understand why a model made a particular decision. Essentially, the way models make decisions is based on data on which the models have been trained. By having a larger dataset with more diversity, a model can potentially become more representative, with less risk of bias.
“Ultimately, a lot of AI bias problems stem from underrepresentation,” Nelson said. “The way to solve underrepresentation is to enable active collection of datasets of the underrepresented class and make that data accessible, searchable and actionable.”
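One simple way to spot underrepresentation in a labeled dataset is to count how often each class appears. The Python sketch below is an illustration under stated assumptions, not a method described by Roboflow: the function names, the sample labels and the 10% threshold are all hypothetical, and a low share is only a rough signal that more data may be needed.

```python
from collections import Counter
from typing import Dict, Iterable, List


def class_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each class's share of all annotations (shares sum to 1.0)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def underrepresented(labels: Iterable[str], min_share: float = 0.1) -> List[str]:
    """List classes whose share of annotations falls below `min_share`."""
    shares = class_shares(labels)
    return sorted(name for name, share in shares.items() if share < min_share)


# Made-up annotation labels for a street-scene dataset.
labels = ["car"] * 90 + ["bicycle"] * 8 + ["wheelchair"] * 2
print(class_shares(labels))
print(underrepresented(labels))
```

Classes flagged this way are candidates for the kind of active data collection Nelson describes.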