April 2026

Muse Spark Benchmark Results

Full evaluation results across reasoning, science, medical, chart understanding, coding, and retrieval benchmarks

Last updated: May 27, 2026

Interpretation

Strong on health, charts, and science; weaker on coding and abstract reasoning.

| Category | Benchmark | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Notes | Source |
|---|---|---|---|---|---|---|
| Overall | AA v4.0 | 52 | 57 | 57 | Comprehensive index | Artificial Analysis |
| Chart Understanding | CharXiv | 86.4 | 82.8 | 80.2 | Chart comprehension | Meta / AA |
| Medical | HealthBench Hard | 42.8 | 40.1 | 20.6 | Medical QA | Meta / HealthBench |
| Deep Search | DeepSearchQA | 74.8 | 69.7 | n/a | Research retrieval | Meta |
| Reasoning | HLE (Fast) | 36.5 | 43.9 | 48.4 | | Meta |
| Reasoning | HLE (Contemplating) | 50.2 | 43.9 | 48.4 | Extended reasoning | Meta |
| Science | FrontierScience | 38.3 | 36.7 | 23.3 | Research frontier | Meta / FrontierScience |
| Abstract | ARC-AGI-2 | 42.5 | 76.1 | 76.5 | Pattern reasoning | Meta / AA |
| Coding | Terminal-Bench 2.0 | 59.0 | 75.1 | 68.5 | Agentic coding | Meta / Terminal-Bench |

Sources combine Meta's launch materials and public benchmark summaries. Treat all launch-week benchmark data as subject to revision, since third-party evaluators may publish updated results.

86.4 on CharXiv: #1 in chart understanding
42.8 on HealthBench Hard: #1 in medical QA
50.2 on HLE (Contemplating): #1 in extended reasoning

Among the three models compared, Muse Spark leads on chart understanding, medical reasoning, deep search, science, and extended reasoning.
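The "leads on" claim can be checked directly against the results table with a small script. This is a minimal sketch, assuming the table's scores are directly comparable within each row. The source leaves one DeepSearchQA comparison cell empty, and the sketch assumes from column order that the reported 69.7 belongs to GPT-5.4. The names `SCORES` and `margin_vs_best_rival` are hypothetical, introduced only for illustration.

```python
# Scores transcribed from the results table: (Muse Spark, GPT-5.4, Gemini 3.1 Pro).
# None marks a cell the source leaves empty. Assumption: the single DeepSearchQA
# comparison score (69.7) belongs to GPT-5.4, inferred from column order.
SCORES = {
    "AA v4.0": (52, 57, 57),
    "CharXiv": (86.4, 82.8, 80.2),
    "HealthBench Hard": (42.8, 40.1, 20.6),
    "DeepSearchQA": (74.8, 69.7, None),
    "HLE (Fast)": (36.5, 43.9, 48.4),
    "HLE (Contemplating)": (50.2, 43.9, 48.4),
    "FrontierScience": (38.3, 36.7, 23.3),
    "ARC-AGI-2": (42.5, 76.1, 76.5),
    "Terminal-Bench 2.0": (59.0, 75.1, 68.5),
}


def margin_vs_best_rival(muse, *rivals):
    """Return Muse Spark's score minus the best reported rival score.

    Positive means Muse Spark leads; negative means it trails.
    Rival cells that are None (not reported) are skipped.
    """
    reported = [r for r in rivals if r is not None]
    return round(muse - max(reported), 1)


# Margin per benchmark, and the benchmarks where Muse Spark is ahead.
margins = {name: margin_vs_best_rival(*row) for name, row in SCORES.items()}
leads = [name for name, m in margins.items() if m > 0]

# Print benchmarks from largest lead to largest deficit.
for name, m in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22} {m:+.1f}")
```

On these numbers the five leads range from 1.6 to 5.1 points, while Muse Spark trails the best rival by 16.1 points on Terminal-Bench 2.0 and 34.0 points on ARC-AGI-2, which matches the "weaker on coding and abstract reasoning" reading.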

Try Muse Spark at meta.ai.