Big data has become a major challenge for space scientists analyzing huge data sets from increasingly powerful space instrumentation. To address this, a team at the Southwest Research Institute has developed a machine learning tool to efficiently label large, complex data sets so that deep learning models can search and identify potentially dangerous solar events. The new labeling tool can be applied or modified to address other challenges with large datasets.
As space instrument packages collect increasingly complex data in ever-increasing volumes, it is becoming more challenging for scientists to process the data and analyze relevant trends. Machine learning (ML) is becoming a critical tool for processing large, complex data sets: algorithms learn from existing data to make decisions or predictions that can factor in more information simultaneously than humans can. However, to take advantage of ML techniques, people must first label all of the data, which is often a monumental undertaking.
“Tagging data with meaningful annotations is a critical step of supervised ML. However, labeling datasets is tedious and time-consuming,” said Dr. Subhamoy Chatterjee, a postdoctoral researcher at SwRI specializing in solar astronomy and instrumentation and lead author of a paper on these findings published in the journal Nature Astronomy. “New research shows how convolutional neural networks (CNNs), trained on crudely labeled astronomical videos, can be used to improve the quality and breadth of data labels and reduce the need for human intervention.”
Deep learning techniques can automate the processing and interpretation of large amounts of complex data by extracting and learning complex patterns. The SwRI team used videos of the solar magnetic field to identify areas where strong, complex magnetic fields form on the solar surface, the main precursors to space weather events.
“We trained CNNs using crude labels, manually verifying only our disagreements with the machine,” said co-author Dr. Andrés Muñoz-Jaramillo, a SwRI solar physicist with expertise in machine learning. “We then retrained the algorithm with the corrected data and repeated this process until we all agreed. While flux emergence labeling is usually done manually, this iterative interaction between the human and ML algorithm reduces manual verification by 50%.”
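The train, verify-disagreements, retrain loop described in the quote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-feature threshold classifier stands in for the CNN, and a hypothetical `oracle` callback stands in for the human reviewer. None of the function names come from the SwRI work.

```python
def train_threshold(features, labels):
    """Toy stand-in for CNN training: pick the cut that best matches the labels."""
    best_t, best_correct = 0.0, -1
    for t in sorted(features):
        correct = sum((x >= t) == bool(y) for x, y in zip(features, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def predict(threshold, features):
    """Classify each sample as 1 if its feature reaches the threshold."""
    return [int(x >= threshold) for x in features]


def iterative_relabel(features, crude_labels, oracle, max_rounds=10):
    """Retrain until the model and the labels agree.

    Only samples where the model disagrees with the current label are sent
    to `oracle` (hypothetical: a human reviewer returning the true label).
    Returns the cleaned labels and the number of manual checks made.
    """
    labels = list(crude_labels)
    verified = set()
    for _ in range(max_rounds):
        threshold = train_threshold(features, labels)
        preds = predict(threshold, features)
        disputed = [i for i, (p, y) in enumerate(zip(preds, labels))
                    if p != y and i not in verified]
        if not disputed:
            break  # human and machine agree on every unverified sample
        for i in disputed:
            labels[i] = oracle(i)
            verified.add(i)
    return labels, len(verified)


# Toy data: one feature per clip, with two crude labels deliberately wrong.
features = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
crude = [0, 1, 0, 0, 1, 1, 0, 1]

clean, n_checked = iterative_relabel(features, crude, oracle=lambda i: truth[i])
print(clean, n_checked)  # only 2 of the 8 clips needed manual review here
```

In this toy run the reviewer inspects only the two clips the model disputes instead of all eight. The savings on real data depend on how noisy the crude labels are and on how well the model generalizes.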
Iterative labeling approaches, such as active learning, can save significant time, reducing the cost of making big data ML ready. By gradually masking the videos and searching for the moment when the ML algorithm changes its classification, SwRI scientists have further leveraged the trained ML algorithm to provide an even richer and more usable database.
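The masking idea can be illustrated with a minimal sketch. It assumes each video is reduced to a list of per-frame numbers, that frames are masked from the end of the clip backward, and that `toy_classifier` is a hypothetical stand-in for the trained network. The article does not describe the team's actual masking scheme, so treat every detail below as an assumption.

```python
def find_flip_frame(frames, classify, mask_value=0.0):
    """Mask trailing frames one at a time and watch the predicted class.

    Returns the index of the first frame (counting back from the end)
    whose masking changes the prediction, or None if it never changes.
    """
    baseline = classify(frames)
    for n_visible in range(len(frames) - 1, -1, -1):
        masked = frames[:n_visible] + [mask_value] * (len(frames) - n_visible)
        if classify(masked) != baseline:
            return n_visible
    return None


def toy_classifier(frames, flux_threshold=5.0):
    """Hypothetical stand-in for the CNN: 1 if any frame reaches the threshold."""
    return int(max(frames) >= flux_threshold)


# Toy "video": one magnetic flux value per frame, rising over time.
video = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
print(find_flip_frame(video, toy_classifier))  # 4
```

Here the prediction flips once frame 4 is hidden, which marks that frame as the moment the toy classifier first sees the signal. A search of this kind turns a clip-level label into an approximate time stamp.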
“We have developed an end-to-end, deep learning approach to classifying videos of magnetic patch evolution without providing explicitly segmented images, tracking algorithms, or other handcrafted features,” said Dr. Derek Lamb of SwRI, a co-author who specializes in the evolution of magnetic fields on the sun’s surface. “This database will be critical in developing new methodologies for predicting the emergence of the complex regions conducive to space weather events, potentially extending the lead time we have to prepare for space weather.”
Subhamoy Chatterjee et al., Efficient labelling of solar flux evolution videos by a deep learning model, Nature Astronomy (2022). DOI: 10.1038/s41550-022-01701-3
Provided by Southwest Research Institute
Citation: Scientists demonstrate machine learning tool to efficiently process complex solar data (2022, July 6), retrieved July 6, 2022 from https://phys.org/news/2022-07-scientists-machine-tool-efficiently-complex.html