AI for mathematics
Over the last few months, AIs have helped solve several open problems in mathematics. This February, Epoch added a set of open problems to FrontierMath, their benchmark for mathematical research abilities. One of these problems – a Ramsey-style problem on hypergraphs – was recently solved by AIs: researchers coaxed solutions out of Gemini 3.1 Pro, GPT-5.4 (xhigh), and Opus 4.6 (max). And by now (late March 2026), around fifty Erdős problems have been solved with the aid of AI tools. Remarkably, some of these were solved largely autonomously, without any human in the loop.
State of the art #
AIs have a number of strengths relative to humans. They excel at cracking competition-style problems – self-contained problems that require only a small set of techniques – and have done so for years. They're also good at analysing large data sets and generating hypotheses. For example, Gauss made long tables of $\int_2^x (\log t)^{-1} \, dt$ before conjecturing the Prime Number Theorem; such tasks seem amenable to AI automation. AIs are also good at exhaustive case analysis; some results, such as the Four Colour Theorem, are difficult to prove any other way.
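Gauss-style numerical exploration is easy to reproduce today. The sketch below (my own illustration, standard library only) compares the prime-counting function $\pi(x)$ against a trapezoid-rule approximation of the logarithmic integral that Gauss tabulated:

```python
import math

def prime_count(x):
    """Count primes <= x with a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = b"\x00" * len(sieve[p * p :: p])
    return sum(sieve)

def li(x, steps=100_000):
    """Trapezoid-rule approximation of the integral of 1/log t from 2 to x."""
    h = (x - 2) / steps
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    for i in range(1, steps):
        total += 1 / math.log(2 + i * h)
    return total * h

for x in (10**3, 10**4, 10**5):
    # pi(x) tracks the integral closely, and sits slightly below it
    print(x, prime_count(x), round(li(x)))
```

Spotting that the count of primes tracks this integral is exactly the kind of data-driven conjecturing that seems ripe for automation.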
However, there are still areas where humans seem to outperform machines. While I’ve only used ordinary LLMs, rather than dedicated research tools like Prism, I’ll share some of my experiences.
Ordinary LLMs sometimes struggle to produce rigorous arguments – arguments in which every step must be correct. The term 'AI slop' is fitting: unless they're coupled with a proof-verification engine, LLMs can be sloppy. On niche topics, I also find LLMs frustratingly inaccurate. And LLMs often flood the chat with irrelevant information, whereas good supervisors have an uncanny ability to focus on what matters: experts give you the right reference, identify the actual crux, and ask the right questions.
That said, these weaknesses – the lack of rigour and of general research taste – seem fixable with the right scaffolding. Overall, it seems quite plausible to me (≥ 60%) that AIs could automate most aspects of mathematical research within 10–20 years.
The future of mathematics #
Regardless of whether full automation is possible, it’s worth reflecting on how AI will transform the field of mathematics.
Terry Tao co-founded the Foundation for Science and AI Research (SAIR) partly to explore this question, and he has many interesting ideas on the topic.¹
First, as Terry Tao points out in the Dwarkesh interview, the use of AIs shifts the bottleneck from coming up with ideas to verifying arguments.² Hence, I suspect proof-verification software like Lean will play a larger role in the future of mathematics. In the most extreme scenario (we like the extremal principle, no?), mathematical research might reduce to operating AIs: have AIs generate 1,000 hypotheses, have other AIs double-check the reasoning, then employ your paper-writing AIs, and iterate.
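To give a flavour of what machine-checked mathematics looks like, here is a minimal Lean 4 sketch (assuming Mathlib is available; `Nat.exists_infinite_primes` is a library lemma):

```lean
import Mathlib

-- Once the kernel accepts this, no human needs to re-check the steps:
-- the statement 'there are arbitrarily large primes' is verified.
example : ∀ n : ℕ, ∃ p, n ≤ p ∧ p.Prime :=
  fun n => Nat.exists_infinite_primes n
```

The point is that verification becomes cheap and mechanical: whether the proof term was written by a human or generated by an AI, the kernel checks it the same way.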
Second, the ease of generating new ideas significantly lowers the barrier to entry for doing advanced mathematics: if you can obtain a proof from an LLM and have another AI system check it, you've effectively produced a proof. You might prove something without understanding what you're doing.³
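The generate-then-check workflow can be caricatured in a few lines. Everything here is a toy stand-in of my own: `generate_candidates` plays the role of the LLM and `verify` the role of a proof checker, which in this sketch just tests candidates against the definition on small cases.

```python
def generate_candidates():
    """Hypothetical 'idea generator': candidate closed forms for 1 + 2 + ... + n."""
    return [
        lambda n: n * (n + 1) // 2,  # the correct formula
        lambda n: n * n // 2,        # a plausible-looking wrong guess
        lambda n: n * (n - 1) // 2,  # another wrong guess
    ]

def verify(f, trials=100):
    """Hypothetical 'checker': a real pipeline would invoke a proof
    verifier (e.g. Lean); here we only compare against the definition."""
    return all(f(n) == sum(range(1, n + 1)) for n in range(trials))

# Keep only the candidates that survive verification.
accepted = [f for f in generate_candidates() if verify(f)]
```

The operator never needs to understand *why* the surviving formula is correct – the checker carries the burden of rigour, which is precisely the shift described above.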
Conclusions #
Needless to say, AIs will play an important role in the future of mathematics, though it’s unclear how things will play out exactly.
I find it worth monitoring AI progress in mathematics for several reasons. Compared with more practical sciences like biology or chemistry, maths lends itself more readily to AI-driven progress. The nature of the subject – results are either true or false – also enables efficient AI training; softer subjects provide a weaker RL signal. Moreover, using AIs to solve mathematical problems appeals to AI developers – many of whom are nerd-sniped maths graduates – so a lot of effort is going into AI for maths. In some sense, the 'AI for science' movement is upper-bounded by the ability of AIs to do maths research.
1. In this post, Tao describes his use of AlphaEvolve to establish an integral bound. He writes: 'Quite possibly AI tools would also have been able to assist with these steps [some intermediary bounds], but they were not necessary here; their main value for me was in quickly confirming that the approach I had in mind was numerically plausible, and in recognizing the right technique to solve one part of the toy problem I had isolated.'
2. To some extent, this is already the case – the review process in mathematics is notoriously slow.
3. Even before LLMs, you could prove results without insight, thanks to software like Lean. Case in point: the Liquid Tensor Experiment. However, AIs make generating such proofs far easier.