Research

Our research sits at the intersection of AI, cognitive science, and STEM education. We are building models that don't just know the answer; they know how to teach it.

01

Visual Reasoning in STEM

How can AI models learn to reason through diagrams, graphs, and spatial representations the way expert educators do?

STEM fields rely on visual thinking: free-body diagrams in physics, molecular structures in chemistry, phase plots in engineering. Current LLMs struggle with this because they lack training data that connects visual representations to structured reasoning. Our research uses Numerade's 5M+ educator-created videos to study how experts construct and narrate visual aids, with the goal of distilling those patterns into training signals for multimodal models.

Diagram-grounded chain-of-thought · Spatial reasoning from video frames · Graph and chart interpretation · Visual-textual alignment in STEM

02

Multimodal Generation for Learning

Can AI create the visual aids, diagrams, and step-by-step illustrations that human teachers use to make concepts click?

Generative AI has made remarkable progress in image and video quality, but educational visual generation is an open problem. A physics diagram is not a photograph. It requires precise notation, spatial accuracy, and pedagogical intent. Our research explores how to fine-tune generative models on educator-created visual content so they can produce diagrams, annotated graphs, and visual walkthroughs that meet the standards of classroom instruction.

Diagram and figure synthesis · Step-by-step visual walkthrough generation · Notation-aware image generation · Evaluation metrics for educational visuals

03

Pedagogical AI & Cognitive Science

What makes an AI explanation effective, not just correct, and how do we train models that genuinely teach?

There is a measurable difference between a correct answer and an effective explanation. Drawing on decades of research in learning science (worked example theory, cognitive load theory, desirable difficulties), we study how to encode pedagogical quality into training data and reward models. The goal is AI that scaffolds understanding, manages cognitive load, and promotes active learning rather than passive consumption.

Cognitive load modeling · Scaffolded explanation generation · Worked example and fading effects · Student knowledge state inference

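To make "encoding pedagogical quality into a reward model" concrete, here is a minimal sketch in which hand-written heuristics stand in for learned reward signals. Everything here is illustrative: the `Explanation` structure, the word-count proxy for cognitive load, and the blend weights are assumptions for exposition, not Numerade's actual reward model.

```python
from dataclasses import dataclass

@dataclass
class Explanation:
    steps: list[str]   # ordered solution steps
    prompts: int       # guiding questions posed to the student

def load_score(expl: Explanation, max_words: int = 40) -> float:
    """Reward short steps: a crude proxy for managed cognitive load."""
    if not expl.steps:
        return 0.0
    within = sum(1 for s in expl.steps if len(s.split()) <= max_words)
    return within / len(expl.steps)

def scaffold_score(expl: Explanation) -> float:
    """Reward active prompts relative to step count (capped at 1.0)."""
    return min(1.0, expl.prompts / max(1, len(expl.steps)))

def pedagogy_reward(expl: Explanation) -> float:
    """Blend both signals into one scalar usable as a training reward."""
    return 0.6 * load_score(expl) + 0.4 * scaffold_score(expl)
```

In practice each heuristic would be replaced by a model trained on expert annotations, but the shape is the same: several pedagogical dimensions scored independently, then combined into a scalar reward.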
04

STEM Benchmark Development

How do we rigorously measure whether an AI model can teach, not just answer?

Existing benchmarks test whether a model can produce a correct answer, but correctness alone does not capture teaching quality. We are developing evaluation frameworks that assess explanation clarity, pedagogical structure, visual reasoning accuracy, and student comprehension outcomes. These benchmarks are built on expert-annotated data and designed to drive progress toward AI that is genuinely useful in educational settings.

Explanation quality metrics · Visual reasoning benchmarks · Pedagogical rubric development · Cross-discipline evaluation suites

Research Principles

Four commitments that guide every research effort at Numerade.

Expert-sourced

All research data originates from verified subject-matter experts, ensuring that the patterns we study reflect genuine domain expertise.

Pedagogically grounded

We draw on established learning science, not intuition, to define what makes an explanation, visual aid, or interaction effective.

Multimodal by default

STEM reasoning is inherently visual and textual. Our research treats multimodality as a first-class concern, not an afterthought.

Evaluation-driven

We build the benchmarks alongside the capabilities, ensuring that progress is measurable and that metrics reflect real educational value.

Collaborate with us

We partner with leading AI labs and research institutions to advance multimodal STEM reasoning. If you're working on related problems, we'd love to hear from you.