DARPA's Digital Tutor: training people to expert-level in 16 weeks

How an AI tutor out-competed its human counterpart.

Matthew Phillips

When new science is discovered or technologies are developed, our ability to learn the knowledge and skills to deploy them often limits their adoption. What if we could increase the rate at which this happens?

Let's be stupidly ambitious. What if we could design a training program that lasts a few months but gets people to the level of 5 years of on-the-job experience?

This belongs in the realm of science fiction. Readers of ‘Ender’s Game’ will recognise it as similar to the ‘Fantasy Game’. In the novel, the Fantasy Game adapts to the user's interest and ability to help them learn. In Neal Stephenson’s ‘Diamond Age’, it’s the ‘Young Lady’s Illustrated Primer’. The Primer reacts to its owner's environment and ability to teach them. In both cases, a computer program adapts to the individual to help them learn. It’s a digital version of a personal tutor.

The real-world benchmark is 1-to-1 tutoring. In a classic paper, Bloom found that 1-to-1 tutoring is two standard deviations more effective than standard classroom teaching [1]: the average tutored student outperformed roughly 98% of the conventionally taught class. It is among the most effective interventions ever identified in education research. The main obstacle to widespread adoption is that 1-to-1 tutoring doesn't scale.
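To make the size of that effect concrete: assuming approximately normal score distributions, a two-standard-deviation shift moves the average student to the percentile given by the normal CDF at z = 2. A quick check with Python's standard library:

```python
from statistics import NormalDist

# Under a normal model, a student shifted up by two standard
# deviations lands at the percentile of the control distribution
# given by the CDF evaluated at z = 2.
percentile = NormalDist().cdf(2.0)
print(f"{percentile:.1%}")  # ~97.7%, i.e. above roughly 98% of the class
```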

Figure 1. The distributions of summative assessment scores compared between conventional teaching and 1-to-1 tutoring, as discovered by Bloom, 1984 [1].


DARPA attempted to solve this problem by building a Digital Tutor for Naval IT training [2][4]. After 16 weeks of using the system, new recruits outscored traditionally trained recruits and experienced professionals. And by a lot. They assessed performance by comparing Digital Tutor teams against two controls [3]. One was ‘Fleet Teams’ who had 5 years of experience. The others were teams who underwent the longer, standard classroom training (known as ‘ITTC’ teams). The results were striking:

“Digital Tutor teams attempted a total of 140 problems and successfully solved 104 of them (74%), with an average score of 3.78 (1.91). Fleet teams attempted 100 problems and successfully solved 52 (52%) of them, with an average score of 2.00 (2.26). ITTC teams attempted 87 problems and successfully solved 33 (38%) of them, with an average score of 1.41 (2.09).” [2]

Performance on troubleshooting tasks across the Digital Tutor, experienced and normally trained teams.


Similar results occurred across a suite of assessments. So it turns out the science fiction version exists already.

Imagine if these results could be replicated across professions. By reducing the training time to learn a new skill, we could rapidly retrain those in need. And by automating more of the teaching process, more students could be served, and the role of the teacher could shift toward providing inspiration and overall context. All-in-all, this is a massive breakthrough.

So, how did they achieve this?

How did it work?

The stated goal was to ‘capture in computer technology the capabilities of individuals who were recognized experts in a specific area and proficient in one-on-one tutoring’. After a review of 42 technical domains, they picked IT as the most promising. They based their decision on subject matter suitability and the need for training acceleration.

The Digital Tutor had an interesting pedagogical underpinning. Other classroom technologies assist human teachers; for the Digital Tutor, the reverse was true: human mentors assisted the technology. The core teaching was delivered by the Digital Tutor, with human mentors serving a supervisory role.

Another key feature of the Digital Tutor was matching difficulty to skill. The Digital Tutor was ‘problem-based’: it presented explanatory material, followed by problems to solve based on it. Through this process, it built a model of the learner that adapted based on their answers to the problems. It then used this model to check that the learner had understood the issues and concepts shown.

By modelling learner knowledge, the Digital Tutor tuned the material to each individual. Fast-paced learners were given extension exercises, so the time spent learning was constant for all learners; what varied was which resources each learner received at a given moment.
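The published reports do not describe the learner model's algorithms in detail, but the adaptive loop can be sketched as a toy mastery model. All names and update rules below are hypothetical illustrations, not the actual Digital Tutor implementation:

```python
# Toy sketch of an adaptive learner model (hypothetical: the actual
# Digital Tutor's algorithms are not published at this level of detail).
# Each concept carries an estimated mastery level that is nudged up or
# down by the learner's answers; the next item served targets the
# least-mastered concept, keeping difficulty matched to skill.

class LearnerModel:
    def __init__(self, concepts, prior=0.5, step=0.15):
        self.mastery = {c: prior for c in concepts}
        self.step = step

    def update(self, concept, correct):
        """Nudge the mastery estimate toward 1 (correct) or 0 (incorrect)."""
        target = 1.0 if correct else 0.0
        m = self.mastery[concept]
        self.mastery[concept] = m + self.step * (target - m)

    def next_concept(self):
        """Serve the concept the learner has mastered least."""
        return min(self.mastery, key=self.mastery.get)

model = LearnerModel(["routing", "subnetting", "dns"])
model.update("dns", correct=True)
model.update("routing", correct=False)
print(model.next_concept())  # "routing", the lowest estimated mastery
```

A real system would use a richer statistical model, but the principle is the same: answers update an estimate of what the learner knows, and that estimate drives what is shown next.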

An interesting aside is noted in the Digital Tutor write-up [2]:

‘Observers noted that the Digital Tutor established the same kind of concentration, involvement, and flow that is characteristic of interactive computer game playing’.

The ‘flow state’ mentioned here was best described and theorised by Csikszentmihalyi [9]. It may be the key mechanism by which the program works: flow states occur in interactive activities where difficulty and skill align, leaving people intensely attentive and productive. It’s when you’re “in the zone”. This is rarely the explicit aim of teaching, but it should be.

How did they build it?

So far, this all sounds remarkably positive. Why not build this for every subject and roll it out in schools across the country?

This is where the bad news begins. Building the Digital Tutor was an arduous process relying on extensive access to expert knowledge.

The required knowledge and skills were documented in extensive detail, drawing on reference materials and existing courses and supplemented by interviews with expert IT technicians. Around half of the project's funding went to identifying and hiring these experts. The experts then designed a 16-week human-led course, and all conversations between human tutors and students were recorded. These recordings guided the curation of the Digital Tutor: extracting features to teach, developing ontologies, and selecting inference algorithms.

There are three core components in the backend of the system. Not much detail is given about how each worked; a high-level overview is the best we have. First, an inference engine aimed to ‘capture the problem-solving process’, that is, to work out how each learner is trying to solve the problem they are presented with. The findings of the inference engine feed into the instruction engine, which decides which content or problem to display next. This is then presented via the conversation module, a text-based interface. The conversation module does not allow free-form written responses, but does support fixed questions and responses in natural language, and it calls in a human monitor if the learner inputs something the system doesn’t understand.
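Given only that high-level overview, the three-stage pipeline can be sketched as follows. Every function name and rule here is an illustrative assumption, not the actual DARPA system:

```python
# Hypothetical sketch of the three-stage backend described above:
# inference engine -> instruction engine -> conversation module.
# The matching logic is a stand-in for whatever the real system did.

def inference_engine(learner_response, expected_steps):
    """Infer how the learner is approaching the problem by comparing
    their response against known solution steps."""
    matched = [s for s in expected_steps if s in learner_response]
    return {"on_track": len(matched) >= len(expected_steps) // 2,
            "matched_steps": matched}

def instruction_engine(diagnosis, content):
    """Pick the next item: advance if on track, remediate otherwise."""
    if diagnosis["on_track"]:
        return content["next_problem"]
    return content["review_material"]

def conversation_module(item, understood=True):
    """Present the item; escalate to a human monitor when the system
    cannot interpret the learner's input."""
    if not understood:
        return "Escalating to human monitor"
    return f"Presenting: {item}"

content = {"next_problem": "Diagnose a failed DNS lookup",
           "review_material": "Review: how DNS resolution works"}
diagnosis = inference_engine("checked dns config, restarted service",
                             ["checked dns config", "restarted service"])
print(conversation_module(instruction_engine(diagnosis, content)))
```

The key design point survives even in this sketch: diagnosis, instruction, and presentation are separate modules, with a human escalation path as the fallback for inputs the system cannot parse.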

During teaching, there were strict requirements placed on the students. There was always a supervisor present. They were embedded in an in-person, intensive program. But this was also true for the control group, who underwent standard teaching.

The end result was the Digital Tutor providing ‘guided, authentic, and practical problem-solving experience with Navy IT systems, workstations, networks, and administrative policies’. The basic operation iterates between showing conceptual material and testing via problems. By tuning the material shown to each student, massive improvements in performance after training are possible.

A digital tutor for everything?

Even if the Digital Tutor was expensive, if it’s so effective, why not build it for everything? The answer is still somewhat mysterious, but there are a few possible reasons.

The Digital Tutor required countless hours of expert time. The builders constructed an enormous flow chart of concepts, each requiring questions and prompts for assessing competency. From these, misconceptions could be identified and appropriate explanations shown. But the process is slow, tedious and difficult.

Some startups have tried to do this for school tutoring. The biggest come from East Asia, where companies like RIIID and SquirrelAI are developing digital tutoring systems. Their success likely owes something to the competitive educational cultures in those countries.

In contrast, when attempted in Western countries, there is usually a negative reaction. This happened when Facebook developed such a system, for example. In collaboration with several charter schools in the US, Facebook aimed to deliver personalised learning in high schools on algebra, world history, physics and biology [5]. A backlash from parents and students swiftly followed across the schools, from Brooklyn [6] to Wisconsin [7]. Eventually, the project was ditched due to privacy concerns [8]. There is an emotionally aversive reaction when people see their children being taught by machines, which may prohibit rolling out a system like this. For this reason, such systems may simply not be culturally compatible with the West.

Other challenges exist for a product that is simply a website or app rather than being embedded in a school or training setting. Student motivation and teacher supervision make a large difference. We see this with Duolingo and a suite of other language-learning and learn-to-code apps, where retention is the biggest problem. In the Naval training setting this is not an issue: the learners are motivated and embedded in a strict, regimented social environment.

To make a Digital Tutor work, learners must be highly motivated with a clear, direct use case. The results presented are limited to highly intensive training programs, and may not translate to other teaching settings. Nevertheless, the results of the DARPA project give us hope that Digital Tutors have the potential to bring sci-fi into reality.


[1] B.S. Bloom. 1984. The 2 sigma problem: The search for group methods of instruction as effective as one-to-one tutoring. Educational Researcher. https://web.mit.edu/5.95/www/readings/bloom-two-sigma.pdf

[2] J.D. Fletcher and J.E. Morrison. 2014. Accelerating Development of Expertise. Institute for Defense Analyses. https://apps.dtic.mil/sti/pdfs/AD1002362.pdf

[3] J.D. Fletcher. 2010. DARPA Education Dominance Program: April 2010 and November 2010 Digital Tutor Assessments. https://apps.dtic.mil/sti/pdfs/ADA542215.pdf

[4] J. Buridan. 2020. DARPA Digital Tutor: Four Months to Total Technical Expertise? Less Wrong. https://www.lesswrong.com/posts/vbWBJGWyWyKyoxLBe/darpa-digital-tutor-four-months-to-total-technical-expertise

[5] V. Goel and M. Rich, 2015. Facebook Takes a Step Into Education Software. New York Times. https://www.nytimes.com/2015/09/04/technology/facebook-education-initiative-aims-to-help-children-learn-at-their-own-pace.html

[6] S. Edelman. 2018. Brooklyn students hold walkout in protest of Facebook-designed online program. New York Post. https://nypost.com/2018/11/10/brooklyn-students-hold-walkout-in-protest-of-facebook-designed-online-program/

[7] A. Johnson. 2018. Parents dissatisfied with Kettle Moraine Middle School planning to start a school of their own. Milwaukee Journal Sentinel. https://eu.jsonline.com/story/communities/lake-country/news/2018/11/21/parents-dissatisfied-education-km-middle-school-looking-start-school-their-own/2055401002/

[8] V. Strauss. 2018. Why parents and students are protesting an online learning program backed by Mark Zuckerberg and Facebook. Washington Post. https://www.washingtonpost.com/education/2018/12/20/why-parents-students-are-protesting-an-online-learning-program-backed-by-mark-zuckerberg-facebook/

[9] M. Csikszentmihalyi. 1990. Flow: The Psychology of Optimal Experience. Journal of Leisure Research. https://www.tandfonline.com/doi/abs/10.1080/00222216.1992.11969876