Achieving QbD with machine learning

Regulators prefer processes designed with drug quality in mind. The challenge is that as production methods become more complex, established modeling techniques struggle to cope.

So says Ian Walsh, PhD, a staff scientist at the Bioprocessing Technology Institute (BTI), A*STAR, in Singapore, who looked at evolving quality-by-design (QbD) challenges in a study published earlier this year.

“There is tremendous complexity in the omics data that we can derive from the cell culture media, the physicochemical properties of the bioprocess and other bioreactor readouts that can be derived from improved characterization,” he explains. “What we see now is the number of CPPs [critical process parameters] is growing beyond the small number of variables used in the industry even a few years ago. Who knows how many CPPs there will be in five years.”

Machine learning

To cope with the rise in CPPs, the industry needs an alternative to multivariate data analysis (MVDA) techniques, Walsh says, citing machine learning (ML) as a potential solution.

“MVDA techniques can mathematically model the relationships between the input CPPs and output variables such as titer, cell growth, and critical quality characteristics. MVDA methods are popular for their simplicity and ease of use,” he continues. “However, with the increasing number of sensors, increasing quality of sensors and ‘deeper-faster’ testing of the cell culture media, the relationships that exist between CPPs and the bioprocess output variables are likely to be non-linear and require more advanced modeling algorithms such as machine learning.”
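Walsh’s point can be sketched with a toy example. The numbers below are invented for illustration: a synthetic, peaked temperature–titer response that a straight-line (MVDA-style) fit cannot track, but that even a very simple non-linear learner, here k-nearest-neighbour regression, can.

```python
# Sketch: a linear fit vs. a simple non-linear model on a synthetic,
# invented CPP -> titer relationship (not real bioprocess data).

def linear_fit(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def knn_fit(xs, ys, k=3):
    """k-nearest-neighbour regression: average the k closest samples."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

# Synthetic non-linear response: titer peaks at an optimal temperature.
temps = [30, 31, 32, 33, 34, 35, 36, 37, 38]
titer = [1.0, 2.1, 3.4, 4.2, 4.1, 3.3, 2.2, 1.1, 0.5]

lin = linear_fit(temps, titer)
knn = knn_fit(temps, titer)

def rmse(model):
    """Root-mean-square error of a model over the training points."""
    return (sum((model(x) - y) ** 2
                for x, y in zip(temps, titer)) / len(temps)) ** 0.5
```

On this data the k-nearest-neighbour model follows the peak around 33–34 °C, while the linear fit averages it away, which is the kind of relationship Walsh argues is increasingly common as CPP counts grow.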

An ML algorithm can automatically build a model of a real-world problem without being explicitly programmed. It achieves this by examining sample data and optimizing itself in such a way that it can predict results when confronted with new data.

And the ability to predict outcomes is where biopharma can benefit, according to Walsh, who adds that “ML can often do better than humans at that one task, for example predicting whether a bioprocess produces a drug that is of substandard quality.”
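The pass/fail prediction task Walsh mentions can be illustrated with a minimal classifier. This is a hedged sketch only: the impurity values and failure labels are invented, and a real model would be trained on historical batch records with many CPPs, not one.

```python
# Sketch: a minimal quality classifier (logistic regression trained by
# stochastic gradient descent). All data below is invented.
import math

def train_logistic(samples, labels, lr=0.1, epochs=2000):
    """Fit w, b for P(fail) = sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

def predict_fail(w, b, x):
    """True if the model predicts a substandard batch."""
    return 1.0 / (1.0 + math.exp(-(w * x + b))) > 0.5

# Toy training data: normalized impurity level of the feed vs. whether
# the batch failed its quality test (1 = substandard).
impurity = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
failed   = [0,   0,   0,   0,   1,   1,   1,   1]

w, b = train_logistic(impurity, failed)
```

After training, `predict_fail(w, b, 0.2)` flags a clean batch as passing and `predict_fail(w, b, 0.8)` flags a high-impurity batch as substandard, which is the “one task” style of prediction Walsh describes.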

Easy coding, fewer data requirements

And there is good news for companies interested in using ML to model processes. Much of the code needed to build the models has already been written, Walsh points out.

“Creating and training the ML algorithm is the easy part – there are many open source libraries available to do this,” he says. “The hardest part is developing a large, diverse dataset with high-quality process data. However, with new sensors such as Raman, high-throughput LC-MS workflows, the development of real-time assays and the ability to characterize omics in depth, we can derive this data for modeling and/or training processes.”
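As one hypothetical example of the open-source libraries Walsh refers to, scikit-learn can train a model in a few lines. The CPP columns, values, and quality labels here are placeholders invented for illustration, not real process data.

```python
# Sketch: training a classifier with scikit-learn, one of the
# open-source ML libraries available. Data is invented.
from sklearn.ensemble import RandomForestClassifier

# Each row is one batch: [temperature, pH, feed rate];
# label 1 = substandard batch, 0 = acceptable.
X = [
    [36.5, 7.0, 1.0], [36.8, 7.1, 1.1], [37.0, 7.0, 0.9],
    [39.5, 6.2, 2.0], [39.8, 6.1, 2.2], [40.0, 6.3, 2.1],
]
y = [0, 0, 0, 1, 1, 1]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Predict quality for a new, unseen batch condition.
prediction = model.predict([[37.1, 7.05, 1.0]])
```

As Walsh notes, the code is the easy part; the value of such a model depends entirely on the size and quality of the process dataset behind it.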

Currently, ML algorithms are process specific. However, if the industry is willing to invest in expertise and collaborate, it may be able to create models that are useful for multiple parties, notes Walsh.

“The holy grail would be ML algorithms that have general predictive power, meaning the algorithm can be used in different factories without having to retrain it. This would be challenging, but possible,” he says.

Whether drug makers would collaborate on developing ML is unclear, Walsh says, because “the data is extremely valuable for any company.” However, the benefits of such cooperation have already been demonstrated elsewhere.

“In other biological domains, some interesting algorithms have been developed because all the data was shared,” he explains. “For example, the protein data available in the Protein Data Bank and UniProt have led to interesting ML algorithms such as AlphaFold, developed by Google’s DeepMind.”
