Will any xAI Grok model score at least 30% on the FrontierMath Exam?

Predicted at: 2026-02-16 07:09 UTC
Prediction: 83.7%
Market (at prediction): 74.5%

Analysis

Agent 3 (8%) misread the resolution date as Feb 28 instead of June 30, which makes its estimate a misleading outlier. Excluding it, the agents cluster between 58% and 82%, with a mean of about 68%. The market price of 74.5% is supported by two signals: the sibling market for the 40% threshold trades at 72%, which implies the easier 30% threshold should trade higher, and the recent price jump suggests new information. Grok 4 Heavy already scores about 26%, so only a modest improvement is needed. The key risk is the leaderboard requirement: even if a Grok model can score 30%, Epoch AI must evaluate it and publish the result. My estimate of 72% is close to the market price (an edge of about 2.5 percentage points), well below the 5-point threshold for a trade. Agent confidence is generally low (0.35-0.65), reflecting genuine uncertainty about xAI's release timeline and Epoch's evaluation schedule.
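
The no-trade conclusion above comes down to a small piece of arithmetic. As a minimal sketch, assuming the edge is the absolute gap between the estimate and the market price, it can be checked in Python. The function names `edge` and `should_trade` are hypothetical and used only for illustration; the inputs (72%, 74.5%, and the 5-point threshold) are the figures quoted in the analysis.

```python
def edge(estimate: float, market_price: float) -> float:
    """Absolute gap between a probability estimate and the market price."""
    return abs(estimate - market_price)


def should_trade(estimate: float, market_price: float, min_edge: float = 0.05) -> bool:
    """Trade only when the edge reaches the minimum threshold (assumed rule)."""
    return edge(estimate, market_price) >= min_edge


# Figures quoted in the analysis: 72% estimate, 74.5% market, 5-point threshold.
estimate = 0.72
market = 0.745

gap = edge(estimate, market)
print(f"edge = {gap:.3f}, trade = {should_trade(estimate, market)}")
# prints: edge = 0.025, trade = False
```

With these inputs the edge is half the assumed threshold, which matches the decision not to trade.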


This page is for informational and research purposes only. Nothing here constitutes financial advice. Do not make investment decisions based on these predictions.