Large language models are widely applied to a range of natural language tasks, such as question answering, common-sense reasoning, and summarization. However, these models have struggled with tasks that require quantitative reasoning, such as solving problems in math, physics, and engineering.

Researchers find quantitative reasoning an intriguing application for language models because it tests them in several ways at once. Solving math and science problems requires a model to accurately parse a question written in natural language and mathematical notation, recall relevant formulas and constants, and produce step-by-step solutions that involve numerical calculations and symbolic manipulation. For this reason, many scientists have believed that machine learning models would need significant improvements in architecture and training methods to solve such reasoning problems.

A new Google study introduces Minerva, a language model that uses step-by-step reasoning to answer math and science problems. Minerva solves such problems by generating solutions that include numerical calculations and symbolic manipulation.

Their findings show that performance on a range of challenging quantitative reasoning tasks improves significantly when three things are combined: collecting training data relevant to quantitative reasoning problems, training models at scale, and using best-in-class inference techniques.

The researchers trained Minerva on a 118 GB dataset of scientific papers from the arXiv preprint server and web pages containing mathematical expressions in LaTeX, MathJax, or other formats. The training data retains the symbols and formatting information, which are critical to the semantic meaning of mathematical equations. This allows the model to communicate using conventional mathematical notation.
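To see why keeping symbols matters, consider the following minimal sketch. It is not the authors' data pipeline; both cleaning functions and the sample sentence are hypothetical, chosen only to contrast an aggressive text cleaner with one that leaves mathematical markup intact.

```python
import re


def naive_clean(text: str) -> str:
    """Hypothetical aggressive cleaner: drops every character that is not
    a letter, digit, or common punctuation mark, then collapses whitespace."""
    stripped = re.sub(r"[^A-Za-z0-9.,;:!?'\s]", " ", text)
    return re.sub(r"\s+", " ", stripped).strip()


def math_preserving_clean(text: str) -> str:
    """Hypothetical gentle cleaner: only collapses runs of whitespace,
    so LaTeX delimiters and operators survive."""
    return re.sub(r"\s+", " ", text).strip()


sample = r"Einstein showed that $E = mc^2$   holds."

print(naive_clean(sample))            # the equation loses '=', '^' and '$'
print(math_preserving_clean(sample))  # the equation is kept as written
```

After the aggressive cleaner runs, the equation is reduced to a string of loose tokens whose meaning can no longer be recovered, while the gentle cleaner keeps the LaTeX expression exactly as it was written.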

To answer math problems more effectively, Minerva also uses recent prompting and evaluation techniques. These include chain-of-thought (or scratchpad) prompting and majority voting. Like most language models, Minerva assigns probabilities to different possible outputs. When answering a question, it generates multiple candidate solutions by sampling stochastically from these outputs. Although the intermediate steps in these solutions differ, they often arrive at the same final answer. Minerva then uses majority voting to select the most common result as its final answer.
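The voting step can be sketched in a few lines of Python. This is a minimal illustration, not Minerva's actual implementation: the `Final answer:` marker, the helper names, and the hard-coded sample solutions are all assumptions made for the example, and in practice the solutions would come from sampling a model several times.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, or None.
    The marker is a hypothetical convention used only in this sketch."""
    matches = re.findall(r"Final answer:\s*(.+)", solution)
    return matches[-1].strip() if matches else None


def majority_vote(solutions: List[str]) -> Optional[str]:
    """Pick the most frequent final answer among the sampled solutions.
    Solutions without a recognizable final answer are ignored."""
    answers = [a for a in map(extract_final_answer, solutions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled solutions to "What is 2 + 3 * 4?"
samples = [
    "3 * 4 = 12, then 12 + 2 = 14. Final answer: 14",
    "2 + 3 * 4 = 2 + 12 = 14. Final answer: 14",
    "(2 + 3) * 4 = 20. Final answer: 20",
]

print(majority_vote(samples))
```

Two of the three sampled solutions reach 14 by different routes, so 14 wins the vote even though one sample applied the wrong order of operations.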

To test its quantitative reasoning skills, the researchers evaluated Minerva on STEM benchmarks ranging in difficulty from elementary school-level problems to graduate-level coursework. These benchmarks include:

- MATH, a set of high school math competition problems.
- MMLU-STEM, a subset of the Massive Multitask Language Understanding benchmark that focuses on high school and college-level STEM subjects, including engineering, chemistry, math, and physics.
- GSM8k, a set of elementary school-level math problems involving basic arithmetic operations.
- OCWCourses, a collection of college and graduate-level problems from MIT OpenCourseWare that span a range of STEM topics, such as solid state chemistry, astrophysics, differential equations, and special relativity.

Their findings show that Minerva consistently achieves state-of-the-art results, sometimes by a significant margin.

As noted in their recent paper, the team emphasizes that their approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers using a mix of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure. According to the team, a major drawback of this approach is that the model's answers cannot be verified automatically. Even when the final answer is known and can be checked, the model may arrive at it through flawed reasoning steps that cannot be detected automatically.

Machine learning models are excellent tools in many scientific fields, but they are often used only to solve narrow problems. The team hopes that a general model capable of quantitative reasoning will open up new opportunities for researchers and students.

This article is a summary written by Marktechpost Staff based on the paper 'Solving Quantitative Reasoning Problems with Language Models'. All credit for this research goes to the researchers on this project.