Learning Data Insights market research reveals critical infrastructure gaps holding back responsible AI adoption in EdTech
A new market research report from Learning Data Insights finds that while demand for AI-powered education technology is surging, most tools available today still fall short of the quality standards required for responsible classroom use. The report, Not Ready Yet: AI Infrastructure for EdTech Market Research, is based on 15 interviews with 22 key stakeholders across Digital Learning Platform (DLP) providers and R&D teams in the Learning Engineering Virtual Institute, along with a review of 26 reports and documents on AI and education.
“It’s easy to get something from GenAI,” said one EdTech leader with direct experience working with state-of-the-art AI tools. “But when you’re responsible for what ends up in front of a student, the gap between ‘interesting’ and ‘professionally defensible’ is still huge.”
The research surfaces a recurring pattern across EdTech teams: product leaders feel pressure to launch AI features to stay competitive, even as their quality assurance teams repeatedly flag outputs as falling below the quality bar of their existing curriculum and assessment tools. Several teams described pausing or rolling back AI pilots after discovering that reviewing and correcting generated content took more time than producing it manually.
"We are seeing a rush into content generation because it’s easy, obvious, fits into spreadsheet calculations for existing business models, and can appear to be low-risk," noted one DLP executive. Others warned that evaluating AI-generated content can take longer than simply creating that content by hand.
The report identifies five areas where targeted infrastructure investment could unlock higher-quality, more equitable AI deployment in education:
Evaluation & quality assurance tools: Current benchmarks and automated assessment methods are underdeveloped, leaving teams reliant on slow, expensive manual review. Several respondents described QA as the primary bottleneck preventing student-facing deployment.
Privacy & security solutions: Data protection concerns are among the biggest blockers for AI deployment, especially when student data, including audio and video, is involved.
Contextualization & implementation frameworks: LLMs frequently lack knowledge of a student’s background, learning history, and curriculum context, dramatically reducing the relevance of their outputs.
Classroom-optimized Automated Speech Recognition (ASR): Current ASR systems are not designed for the acoustic and linguistic realities of K–12 classrooms, blocking a wide range of promising audio-based applications.
Training & AI literacy programs: The rapid pace of AI change is outrunning educators’ ability to develop the expertise needed to use these tools effectively and critically.
Interviewees consistently described a phased approach to AI adoption: teams start with internal uses such as quality assurance and editorial workflows, expand cautiously to teacher-facing tools, and reserve student-facing applications for last. This sequencing reflects the higher stakes of direct student interaction, where reliability, safety, and trust failures are harder to detect and far more costly. As a result, many providers view student-facing AI not as a near-term feature launch, but as a longer-term goal contingent on significant infrastructure improvements.
The research points to a clear division of labor. Philanthropic funders are well positioned to invest in public goods such as evaluation standards, equity-focused datasets, implementation guidance, and data annotation infrastructure that are essential for responsible AI use but unlikely to emerge through market forces alone. Frontier model providers, meanwhile, can contribute technical expertise, usage guidance, and insight into emerging capabilities. Interviewees stressed that neither sector can address the full range of infrastructure gaps independently.
The report underscores that AI implementation in education cannot be separated from equity. Without deliberate infrastructure choices, AI systems risk reinforcing existing disparities rather than improving learning outcomes, particularly for students from low-resourced families who are least represented in current data and development pipelines.
About the Report
Not Ready Yet was authored by Alexis Andres and John Whitmer of Learning Data Insights (LDI) and draws on interviews with EdTech leaders, product teams, and researchers across the sector. The research was conducted with the support of the Walton Family Foundation. The full report, executive summary, and presentation slides are available at https://osf.io/preprints/edarxiv/ngbkv_v1.
Media Contacts
Alexis Andres, Learning Data Insights
alexis@ld-insights.com
John Whitmer, Learning Data Insights
john@ld-insights.com
###
When members of our research team independently coded the same paper, we expected quick consensus on the basics. Instead, the coders disagreed on a simple question: how many AI models were tested. One said two, another said three, and two said six. All of those answers were defensible given how the paper was written.
This moment highlighted something we had not fully anticipated: even seemingly simple study features can be difficult to classify consistently.
The GenAI Evidence Hub is a structured review of more than 250 studies on generative AI for educational assessment. We are not just reading papers. We are building and stress-testing a shared framework for interpreting what those papers report and how their claims should be evaluated.
As we continuously refine our coding process, several patterns have emerged. Terminology is inconsistent, with “model” used to mean a base architecture, a fine-tuned variant, or a prompting configuration depending on the author. Key results are often buried, as when one paper reported “moderate agreement” for one of six features while glossing over the rest. And results vary widely, with the same model on the same task producing very different outcomes depending on the domain, dataset, or evaluation method.
If experienced researchers struggle to agree on basic study features, it becomes much harder for practitioners to assess vendor claims with confidence.
We are thirteen rounds into calibrating our coding framework and have revised our protocol three times. We are sharing all of it, including what did not work, because that process is part of the evidence.
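For readers wondering how calibration progress like this is typically measured: agreement between coders is commonly summarized with a chance-corrected statistic such as Cohen’s kappa, where conventional interpretation labels roughly 0.41–0.60 as “moderate” agreement and 0.61–0.80 as “substantial.” The sketch below uses entirely hypothetical codes to show the basic computation; it is an illustration, not our actual analysis code.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders over the same items."""
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for "how many models were tested?" across eight papers.
coder_1 = [2, 3, 6, 2, 1, 4, 2, 3]
coder_2 = [2, 3, 2, 2, 1, 4, 6, 3]
print(f"kappa = {cohen_kappa(coder_1, coder_2):.2f}")  # kappa = 0.67
```

A calibration round, in this framing, is an attempt to push that number up by tightening definitions, not by coders simply deferring to one another.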
For researchers, our preregistration, coding instruments, and full methodology are available on OSF.
For everyone else, you can read the welcome blog and see what we are building.
Contact us with questions or paper suggestions.
The GenAI Evidence Hub for Educational Assessment is a project to systematically analyze more than 250 studies examining generative AI use for educational assessment. We will examine research in automated scoring, formative feedback, and item generation to understand what actually works, what doesn’t, and what the evidence shows. While there’s a lot of AI hype in the market, we’re also finding a treasure trove of research that can help technology developers evaluate the robustness of their solutions and help practitioners make informed decisions about what is working and what questions they should ask. We also hope to provide suggestions and resources that help researchers understand the new methods being used and build shared understanding of how to do research in this rapidly changing field.
The hub will go beyond summarizing findings to systematically coding the methodological details that matter for validity: which models were tested, under what conditions, against which baselines, and using which metrics. We will also document disagreements, false starts, and unanswered questions so the limits of the research can be seen and built upon.
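As one way to picture what that coding looks like in practice, here is a minimal sketch of a single study record. The field names and values are illustrative assumptions on our part, not the hub’s actual instrument (the real coding instruments are on OSF):

```python
from dataclasses import dataclass

@dataclass
class StudyRecord:
    """One coded study in a hypothetical evidence-hub schema (illustrative only)."""
    citation: str
    task: str                  # e.g. "automated scoring", "item generation"
    models_tested: list[str]   # distinct base models, not prompt variants
    conditions: str            # zero-shot, few-shot, fine-tuned, etc.
    baselines: list[str]       # e.g. human raters, classical ML systems
    metrics: dict[str, float]  # e.g. {"QWK": 0.72}

# A made-up example record; every value here is invented for illustration.
record = StudyRecord(
    citation="Doe et al. (2024)",
    task="automated scoring",
    models_tested=["GPT-4", "Llama-3-70B"],
    conditions="few-shot prompting",
    baselines=["human double-scoring"],
    metrics={"QWK": 0.72},
)
```

Pinning down even this small a schema forces the disagreements described above into the open: is a fine-tuned variant a new entry under models tested, or just a different condition?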
We are doing this work using open science practices because we believe that progress accelerates when researchers show their work in progress. Behind every clean conclusion are many ideas that did not pan out, and sharing those paths is part of how fields actually move forward.
This project is philanthropically supported, and all outputs will be open access.
📖 Read our welcome blog.
📬 Subscribe to The Hubdate for monthly progress updates.