LinkedIn Engineering recently open sourced Feathr, a feature store that helps engineers develop machine learning products by simplifying feature management and use in production.
Feathr is the data management layer for machine learning applications. It defines features, computes them for training and inference, and makes them discoverable by other machine learning developers. It helps scale and manage machine learning products by reducing the steps needed to generate, maintain, and monitor common features.
As shown in the following image, pipelines that generate machine learning features must include and join several time-sensitive data sources. The resulting features are stored in databases or caches for training and inference (real-time or batch). Consistency is critical in this process: features must be prepared in the same way for training and inference to avoid inconsistency between the two and data leakage in the machine learning models.
Generic machine learning feature and inference pipelines
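The consistency requirement above can be illustrated with a minimal sketch in which one feature definition serves both the training path and the inference path. This is not Feathr code; the event data, the function name spend_last_7d, and the seven-day window are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical raw events: (user_id, event_time, amount).
EVENTS = [
    ("u1", datetime(2022, 4, 1), 10.0),
    ("u1", datetime(2022, 4, 5), 30.0),
    ("u1", datetime(2022, 4, 9), 50.0),
]


def spend_last_7d(user_id, as_of, events=EVENTS):
    """Total spend in the 7 days up to and including `as_of`.

    Only events at or before `as_of` are counted, so a training row
    never sees data from after its own timestamp (no leakage).
    """
    start = as_of - timedelta(days=7)
    return sum(
        amount
        for uid, ts, amount in events
        if uid == user_id and start < ts <= as_of
    )


# Training path: evaluate the feature as of a historical label timestamp.
training_value = spend_last_7d("u1", datetime(2022, 4, 6))

# Inference path: the very same definition, evaluated at request time.
serving_value = spend_last_7d("u1", datetime(2022, 4, 10))

print(training_value, serving_value)
```

Because both paths call the same definition, the training and serving values cannot drift apart through divergent reimplementations, which is the kind of inconsistency a feature store is meant to prevent.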
Feathr is an abstraction layer that provides a namespace for defining, computing, serving, and discovering common machine learning features. The high-level architecture resembles a producer-consumer architecture: producers define, generate, and register machine learning features, and consumers use those features in training and inference. Feathr has a simple programming model. Developers only specify the names of the features they want to import and use in their machine learning models. Everything else, such as how the features are fetched and computed, is handled by Feathr. As stated in the LinkedIn blog post:
Under the hood, Feathr figures out how to provide the requested feature data in the required manner for model training and production inference. For model training, features are computed and joined to input labels in a point-in-time correct manner, and for model inference, features are pre-materialized and deployed to online data stores for low-latency online serving. Features defined by different teams and projects can easily be used together, enabling collaboration and reuse.
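The producer-consumer model and the two delivery modes described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Feathr's API: the registry, the function names (register_feature, build_training_set, materialize_online), the feature name clicks_7d, and the integer timestamps are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical in-memory registry: feature name -> {entity: [(ts, value), ...]}.
FEATURE_REGISTRY = {}


def register_feature(name, history):
    """Producer side: publish a feature's timestamped values under a name."""
    FEATURE_REGISTRY[name] = {
        entity: sorted(rows) for entity, rows in history.items()
    }


def value_as_of(name, entity, ts):
    """Latest value observed at or before `ts`, or None if there is none."""
    rows = FEATURE_REGISTRY[name].get(entity, [])
    idx = bisect_right([t for t, _ in rows], ts)
    return rows[idx - 1][1] if idx else None


def build_training_set(labels, feature_names):
    """Consumer side, training: join features to labels as of each label's time."""
    return [
        {"entity": e, "ts": ts, "label": y,
         **{n: value_as_of(n, e, ts) for n in feature_names}}
        for e, ts, y in labels
    ]


def materialize_online(feature_names):
    """Consumer side, inference: keep only the latest value per entity."""
    online = {}
    for name in feature_names:
        for entity, rows in FEATURE_REGISTRY[name].items():
            if rows:
                online.setdefault(entity, {})[name] = rows[-1][1]
    return online


# A producer registers a feature; consumers then request it by name only.
register_feature("clicks_7d", {"u1": [(1, 3), (5, 8)], "u2": [(2, 1)]})
labels = [("u1", 4, 1), ("u1", 6, 0), ("u2", 1, 1)]  # (entity, ts, label)
training = build_training_set(labels, ["clicks_7d"])
online = materialize_online(["clicks_7d"])
```

In this sketch the label for u1 at time 4 is joined to the value recorded at time 1, not the later value from time 5, which is what keeps the join point-in-time correct. The online view holds only the most recent value per entity, mirroring the idea of pre-materializing features for low-latency lookups.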
As part of this announcement, LinkedIn Engineering has open sourced Feathr on GitHub and made the service available to developers on Azure (Microsoft's cloud service).
A feature store is one of the essential services in machine learning operations (MLOps). It accelerates adoption and democratizes machine learning products in any enterprise. There is a dedicated community around this topic, which also holds its own summit.
Amazon SageMaker Feature Store (part of AWS's machine learning service) and Google Cloud Vertex AI are examples of feature store solutions on public clouds. There are also other feature stores available to the public, several of them open source, such as Feast, Databricks Feature Store, and Hopsworks.