Teaching AI to See: Unlocking the Visual Frontier of Learning

Of the many potential applications of today’s AI, education may be the most impactful for humanity. With today’s most advanced AI models, we are close to creating a digital tutor capable of understanding the idiosyncratic needs of every student and flexibly teaching to each of them. Bringing real, personalized teaching to all students is no longer theoretical, but practical. The opportunities this unlocks and the barriers it removes will have generational implications.

Yet while the math and science skills of LLMs have rapidly improved, academic research and anecdotal experience show that LLMs remain ineffective educators. Dozens of articles have expressed deep concern about the cognitive debt and atrophy brought about by students’ current AI usage patterns. Instead of interacting with an LLM to deeply understand a concept, students resort to passive copy-and-pasting, reducing LLMs to little more than word problem calculators.

To blame this wholly on students overlooks the fundamental constraints of AI technology. Students resort to passive copy-and-pasting because LLMs are unable to teach in the most important modality of learning: vision.

Vision accounts for an estimated 80% of learning. It’s embedded in our biology, with 30 to 50% of the cerebral cortex dedicated to visual processing. Studies on information retention indicate that visual information (such as graphics and illustrations) is retained 3 to 5 times more effectively than text alone. MRI scans further show that when a person conceptualizes or visualizes a concept or problem, the same neural pathways activate as when they actually view a visual aid. This finding is particularly applicable to STEM fields, where visual thinking is paramount to understanding.

Despite the importance of vision for learning, LLMs are still unable to generate visual output of human-teacher quality. With only text material available, students cannot make use of the efficient visual learning systems their brains already have, and end up struggling to understand complex ideas. This culminates in the cognitive decline we’ve seen spreading across school campuses today.

Fortunately, ongoing research in AI will solve these shortcomings. Many of the core discoveries have already taken place, as the marked progress of AI image generation shows. To bring human-teacher-quality visual output to LLMs, the only remaining ingredient is rich training data. Numerade’s STEM videos, laboriously crafted by real human educators, are filled with visual aids like diagrams, illustrations, graphs, and models, all tied to pedagogical, spoken descriptions of the visual aid (which are then transcribed). The high dimensionality of these videos provides the training examples AI models need to understand the relationships between questions, solutions, and explanations, and what is needed to visually package everything together into a complete lesson.

In generative AI, the last one to two years have taken us from amorphous, indeterminate objects to astounding, premium-quality images and videos. Despite these advancements, generative AI for education, in the form of visual aids, is still an open research area. Without a robust solution, students today have no option but to rely on the text-only outputs of LLMs; as a result, they are unable to truly understand the material and in turn suffer cognitive decline. With Numerade’s datasets, we are now able to fill the gaps in training data and teach today’s generative AI models how the best human educators teach with visual aids. For students, this will bring genuine, human-quality instruction and unlock the true potential of personalized education for all.