
AI, Math, and Language: Bridging the Divide

[Image: robots reading math]

New York Times journalist Steve Lohr recently wrote a piece exploring the problem Large Language Models (LLMs) have with math. He noted that in the early days of computing, the power of the machine lay in its ability to perform math faster and far more accurately than any human, even as it struggled with both language understanding and generation. With LLMs, this equation is flipped: the new generative models have extraordinary language abilities but only rudimentary math skills.

This flip points to a thesis: we are looking at two different kinds of skill, and two different kinds of processing, underlying intelligence. As we move forward with generative models, it is important to examine these different modes of reasoning and consider how they should be integrated.

Remember, LLMs are designed as completion engines. They analyze the preceding text and predict what comes next. Their predictions are not unlike our own ability to begin a sentence and let it guide what follows, or to respond to a prompt.

Consider this in the context of an exercise in completing sentences like the following:

“For dinner, we ordered a pepperoni ____.”

“At the bar, we shared a plate of buffalo ____.”

“Mount Rushmore is in South ____.”

“Pizza,” “wings,” and “Dakota” flow easily as the completions to these sentences. Each is the most likely word, given the text that precedes it. Your frictionless fluency is the product of having seen variants of these sentences hundreds, if not thousands, of times.

This mirrors how language models function—they excel at recognizing patterns and guessing the most likely next word. 
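To make this concrete, here is a minimal sketch of next-word prediction, assuming nothing more than a tiny corpus and bigram counts. Real LLMs use neural networks trained on vast text collections, but the "most likely continuation" logic is the same in spirit; the corpus and function names here are purely illustrative.

```python
from collections import Counter, defaultdict

# A toy bigram model: count which word follows which in a small corpus,
# then "complete" a prompt by choosing the most frequent successor.
corpus = (
    "we ordered a pepperoni pizza . "
    "we ordered a pepperoni pizza . "
    "we shared a plate of buffalo wings . "
    "mount rushmore is in south dakota ."
).split()

successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    candidates = successors.get(word)
    if not candidates:
        return "<no prediction>"  # never seen this context: nothing to guess
    return candidates.most_common(1)[0][0]

print(predict_next("pepperoni"))  # -> 'pizza'
print(predict_next("buffalo"))    # -> 'wings'
print(predict_next("south"))      # -> 'dakota'
```

Notice that the model succeeds only where it has seen the pattern before; ask it to continue a context it has never encountered, and it has nothing to offer.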

However, math presents a different challenge.  

Consider a statement like this one:

3,457 ÷ 23 = ____

For most of us, dealing with this statement simply feels different. There is no “most likely” number that pops out as an answer. We have not seen this specific problem before in any form, so there is nothing to guide a prediction.

The problem might bring to mind the steps of long division, but knowing the process does not equate to knowing the answer. This is where language models, like us, hit a barrier: they can predict the steps but cannot compute the result without external assistance.

Interestingly, we have seen this issue before. Prolog, an early logic programming language, was once seen as the future of AI. Unfortunately, Prolog had problems with math. Using logic to build a multi-step proof that two numbers added together equal a third one (essentially what Prolog does) is not the most efficient way to add 5 and 7. 

To deal with this issue, Prolog systems often relied on Lisp, a more traditional language, to handle the math. It was a hack referred to as the “escape to Lisp.”
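To see why proof-style arithmetic is so costly, here is a small sketch, written in Python purely for readability (Prolog would express this with logical clauses). It mimics how a logic system derives a sum step by step from successor rules rather than computing it directly; all names are illustrative.

```python
# A number is either ZERO or the successor of another number, and addition
# is a step-by-step derivation rather than a single machine instruction.
ZERO = ()

def succ(n):
    return (n,)  # succ(n) represents n + 1

def to_peano(k):
    return ZERO if k == 0 else succ(to_peano(k - 1))

def to_int(n):
    return 0 if n == ZERO else 1 + to_int(n[0])

def peano_add(a, b):
    # Rules: add(0, b) = b ; add(succ(a), b) = succ(add(a, b))
    if a == ZERO:
        return b
    return succ(peano_add(a[0], b))

# Five recursive derivation steps to establish that 5 + 7 = 12 ...
print(to_int(peano_add(to_peano(5), to_peano(7))))  # -> 12
# ... versus a single native operation:
print(5 + 7)  # -> 12
```

Deriving a sum of n takes n steps where the hardware needs one instruction, which is exactly why Prolog implementations handed the arithmetic off to Lisp.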

Today, LLMs perform similarly, using external programs to handle mathematical tasks as well as search and complex analytics. This reliance underscores a critical point: calling upon external systems does not mean the core AI engine understands the answer or the process used to determine it; it just knows which other system is needed to find the solution.
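In practice this usually takes the form of "tool calling": the model emits a structured request, and a host program performs the actual computation and splices the result back into the text. The sketch below is a simplified, hypothetical protocol; the <<calc: ...>> tag, the function names, and the stand-in model are inventions for illustration, not any particular vendor's API.

```python
import re

def model_output(prompt: str) -> str:
    # Stand-in for an LLM: it "knows" that division is a calculator's job,
    # but not what the quotient actually is.
    return "The answer to your division problem is <<calc: 3457 / 23>>."

def run_calculator(expression: str) -> str:
    # A deliberately tiny evaluator: handles only `a / b` on integers.
    a, b = (int(x) for x in expression.split("/"))
    return f"{a / b:.2f}"

def dispatch_tools(text: str) -> str:
    # Find each tool request, execute it, and splice the result back in.
    return re.sub(r"<<calc:\s*(.+?)\s*>>",
                  lambda m: run_calculator(m.group(1)),
                  text)

print(dispatch_tools(model_output("What is 3,457 divided by 23?")))
# -> The answer to your division problem is 150.30.
```

The language model's contribution is recognizing that a calculator is needed and formatting the request; the arithmetic itself happens entirely outside the model.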

This distinction is essential as we consider the future of AI. Language models are not evolving into comprehensive standalone problem-solvers; they are becoming part of a broader, interconnected system of systems. That system includes specialized tools for various tasks (e.g., math, search, data analysis), each contributing to the overall capabilities of AI. Achieving artificial general intelligence (AGI) will require integrating numerous specialized functions, not merely enhancing a single model.

The idea that a single, all-encompassing AI model can solve every problem is a strange and unwarranted assumption. Even today, AI is and will continue to be a composite, an ensemble of technologies working in concert. To move towards AGI, we must recognize and harness the strengths of these diverse components. 

By acknowledging the strengths and limitations of individual models and fostering their integration, we can harness the full potential of AI. This approach not only paves the way for AGI but also ensures that we build systems that are adaptable, reliable, and truly intelligent. 

Kristian Hammond
Bill and Cathy Osborn Professor of Computer Science
Director of the Center for Advancing Safety of Machine Intelligence (CASMI)
Director of the Master of Science in Artificial Intelligence (MSAI) Program
