Comprehensive meta-analysis challenges Benjamin Bloom's famous "two sigma" claim…

title: On Bloom's two sigma problem: A systematic review of the effectiveness of mastery learning, tutoring, and direct instruction

author: Jose Luis Ricon

content_type: article

publication: Nintil

published: 2019-07-28T00:00:00

source_url: https://nintil.com/bloom-sigma

word_count: 13204

On Bloom's two sigma problem: A systematic review of the effectiveness of mastery learning, tutoring, and direct instruction

One of the Collison questions is

Is Bloom's "Two Sigma" phenomenon real? If so, what do we do about it?

Educational psychologist Benjamin Bloom found that one-on-one tutoring using mastery learning led to a two sigma(!) improvement in student performance. The results were replicated. He asks in his paper that identified the "2 Sigma Problem": how do we achieve these results in conditions more practical (i.e., more scalable) than one-to-one tutoring?

In a related vein, this large-scale meta-analysis shows large (>0.5 Cohen's d) effects from direct instruction using mastery learning. "Yet, despite the very large body of research supporting its effectiveness, DI has not been widely embraced or implemented."

Answering the question first requires explaining what *Direct Instruction* and *Mastery Learning* mean.

Scope of the present article

This article is concerned with a general study of Bloom's two sigma problem, which in turn involves an examination of an educational method, mastery learning, and tutoring. I have also included a review of software-based tutoring. Later on I look at educational research in general, spaced repetition, and deliberate practice, as these seem closely related to the core topics of this review for reasons that will be obvious after reading through it.

I am only concerned here with student performance in tests, not with other putative benefits from education; I don't look in detail at what keeps students motivated, what makes them feel well, what makes them more creative, or better citizens. I could have looked at longer term measures of success (e.g. income later on in life) but I couldn't find such studies.

As a general note, when discussing effect sizes here, unless otherwise noted, the effect sizes are of the intervention being discussed vs business as usual, using whatever educational method the school was using.

Definitions

The Two Sigma problem

Decades ago, Benjamin Bloom found that individual tutoring raised students' performance relative to a baseline class by two standard deviations, which is a MASSIVE effect. As 1:1 tutoring is very expensive, he wondered whether there were approaches applicable to larger classrooms that could approximate such an effect. Finding such a method was the "two sigma problem", and Mastery Learning seemed to be the promising way to solve it.
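To get a feel for what two standard deviations means, an effect size can be converted into a percentile. The sketch below assumes test scores in both groups are roughly normally distributed with the same spread; that is an assumption made here for illustration, not something this review establishes. The function name is hypothetical.

```python
from statistics import NormalDist


def percentile_of_average_treated_student(d: float) -> float:
    """Percentile rank, within the baseline class, of the average treated student.

    Assumes normally distributed scores with equal standard deviation in
    both groups (an illustrative assumption).
    """
    return 100 * NormalDist().cdf(d)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0):
        pct = percentile_of_average_treated_student(d)
        print(f"d = {d:.1f}: average treated student near percentile {pct:.1f}")
```

Under these assumptions, d = 2 places the average tutored student at roughly the 98th percentile of the baseline class.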

Direct Instruction

From the meta-analysis cited above, Direct Instruction is a teaching program originally developed by Siegfried Engelmann in the 60s that assumes that any student can learn any given piece of material, and this will happen when

(a) they have mastered prerequisite knowledge and skills and (b) the instruction is unambiguous.

This doesn't sound that helpful; fortunately the National Institute for Direct Instruction has a bit more information.

There are four main features of DI that ensure students learn faster and more efficiently than any other program or technique available:

Students are placed in instruction at their skill level. When students begin the program, each student is tested to find out which skills they have already mastered and which ones they need to work on. From this, students are grouped together with other students needing to work on the same skills. These groups are organized by the level of the program that is appropriate for students, rather than the grade level the students are in.

The program’s structure is designed to ensure mastery of the content. The program is organized so that skills are introduced gradually, giving children a chance to learn those skills and apply them before being required to learn another new set of skills. Only 10% of each lesson is new material. The remaining 90% of each lesson’s content is review and application of skills students have already learned but need practice with in order to master. Skills and concepts are taught in isolation and then integrated with other skills into more sophisticated, higher-level applications. All details of instruction are controlled to minimize the chance of students' misinterpreting the information being taught and to maximize the reinforcing effect of instruction.

Instruction is modified to accommodate each student’s rate of learning. A particularly wonderful part about DI is that students are retaught or accelerated at the rate at which they learn. If they need more practice with a specific skill, teachers can provide the additional instruction within the program to ensure students master the skill. Conversely, if a student is easily acquiring the new skills and needs to advance to the next level, students can be moved to a new placement so that they may continue adding to the skills they already possess.

Programs are field tested and revised before publication. DI programs are very unique in the way they are written and revised before publication. All DI programs are field tested with real students and revised based on those tests before they are ever published. This means that the program your student is receiving has already been proven to work.

Direct Instruction is highly scripted, down to the words teachers should speak while teaching.

Note that Direct Instruction (titlecase) is not the same as direct instruction (lowercase); there are various programmes around that have "direct instruction" in the name, like Explicit Direct Instruction. Unless otherwise noted, we'll be talking about Direct Instruction in this review. Both are teacher-centered methods in that the teacher is seen as the one who is transmitting knowledge to the student rather than, say, the student being aided by the teacher in a quest to discover knowledge. Direct Instruction is regulated by the National Institute for Direct Instruction, as mentioned above, while direct instruction is not.

Mastery learning

Mastery learning (ML) is not the same as Direct Instruction, but ML *is a component of Direct Instruction*. It is also one of the methods Bloom originally looked at, so we also examine ML in this review. One key difference is that ML does not call for scripted lessons, while DI requires them.

The key principle of ML is simply to force students to master a lesson before moving on to the next one. At the end of each lesson, on a monthly or weekly basis, students' knowledge is tested. Those students who do not pass are given remediation classes, and they have to re-sit the test until they master it. This can be done in a group setting, as with Bloom's original Learning for Mastery (LFM) programme, or individually, as in Keller's Personalized System of Instruction (PSI), where each student advances at their own pace.
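The test, remediate, and re-test cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the 80-point mastery threshold, the cap on attempts, and the `ToyStudent` class (whose score rises by a fixed amount per remediation) are all hypothetical and are not taken from Bloom's or Keller's programmes.

```python
def run_mastery_unit(take_test, remediate, threshold=80, max_attempts=10):
    """Test a student on one unit, remediating and re-testing until mastery.

    take_test() returns a score from 0 to 100; remediate() is called after
    each failed attempt. Returns the number of attempts needed, or None if
    the threshold was not reached within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if take_test() >= threshold:
            return attempt
        remediate()
    return None


class ToyStudent:
    """Hypothetical student whose score rises by a fixed step per remediation."""

    def __init__(self, skill=50, gain=10):
        self.skill = skill
        self.gain = gain

    def take_test(self):
        return self.skill  # deterministic score, for illustration only

    def remediate(self):
        self.skill = min(100, self.skill + self.gain)


if __name__ == "__main__":
    student = ToyStudent()
    attempts = run_mastery_unit(student.take_test, student.remediate)
    print(f"Unit mastered after {attempts} attempt(s)")
```

In a PSI-style setting each student would run this loop independently; in a group-based setting like LFM the class would move on together once the remediation round is done.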

Summary

The literatures examined here are full of small sample, non-randomized trials, and highly heterogeneous results.

Tutoring in general, most likely, does not reach the 2-sigma level that Bloom suggested. Likewise, it's unlikely that mastery learning provides a 1-sigma improvement.

But high quality tutors, and high quality software are likely able to reach a 2-sigma improvement and beyond.

All the methods (mastery learning, direct instruction, tutoring, software tutoring, deliberate practice, and spaced repetition) studied in this essay are found to work to various degrees, outlined below.

This essay covers many kinds of subjects being taught, and likewise many groups (special education vs regular schools, college vs K-12). The effect sizes reported here are averages that serve as general guidance.

The methods studied tend to be more effective for lower skilled students relative to the rest.

The methods studied work at all levels of education, with the exception of direct instruction: There is no evidence to judge its effectiveness at the college level.

The methods work substantially better when clear objectives and facts to be learned are set. There is little evidence of learning transfer: Practicing or studying X subject does not improve much performance outside of X.

There is some suggestive evidence that the underlying reasons these methods work are increased and repeated exposure to the material, the testing effect, and fine-grained feedback on performance in the case of tutoring.

Long term studies tend to find evidence of a fade-out effect: effect sizes decrease over time. This is likely due to the skills being learned not being practiced.

Effect sizes

Assessing if an effect size is meaningful may be hard. A common way of doing so is as follows:

Effect size | d
--- | ---
Very small | 0.01
Small | 0.20
Medium | 0.50
Large | 0.80
Very large | 1.20
Huge | 2.0
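As a minimal sketch of how such a d is obtained and labelled, the code below computes Cohen's d as the difference in group means divided by the pooled standard deviation, then maps it to the rule-of-thumb labels in the table above. Treating each table value as a lower bound, and the extra "Negligible" label for anything below 0.01, are assumptions made here for illustration. The scores in the example are made up.

```python
from statistics import mean, variance


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


def label(d: float) -> str:
    """Map |d| to a rule-of-thumb label, treating each cutoff as a lower bound."""
    cutoffs = [
        (2.0, "Huge"),
        (1.20, "Very large"),
        (0.80, "Large"),
        (0.50, "Medium"),
        (0.20, "Small"),
        (0.01, "Very small"),
    ]
    for cutoff, name in cutoffs:
        if abs(d) >= cutoff:
            return name
    return "Negligible"  # hypothetical label, not part of the table


if __name__ == "__main__":
    # Made-up test scores, purely to exercise the functions.
    d = cohens_d([2, 4, 6], [1, 3, 5])
    print(f"d = {d:.2f} ({label(d)})")
```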

However, one should be able to fine-tune the descriptive language by using a domain-specific reference. In this case, the average effect on performance from one year of schooling (going from 5th to 6th grade) is d=0.26 for reading performance, and the average effect across 141 large-scale RCTs of educational interventions is d=0.06, according to Hugues & Matthew (2019). Because of this, I will be using a scale adapted from Kraft (2018):

Effect size (E.S.) | d

Small | 1.5

With that in mind, here is the summary of the main results, along with the best studies I could find to back up the claims. For comparison, I include Bloom's findings:

Method | E.S. (general) | E.S. (disadvantaged) | E.S. (Bloom) | Key references

Tutoring* | Very large | - | Huge | VanLehn (2011), Kulik & Fletcher (2016)

… 1990), Slavin (1987)

… 2003), Stockard et al. (2018)

* With really good tutors and really good software, the effect size can indeed be Huge.

When considering narrow knowledge of a series of facts, or basic skills taught at the elementary level, the effects of ML and DI can be Large for the general population and Extremely Large for disadvantaged students.

The evidence behind direct instruction

The meta-analysis I start the article with has a literature review, noting that all the previous literature, systematic reviews, and meta-analyses do show strong, positive effects of DI. This is no "mixed results" literature, which by itself is quite surprising, *even suspicious*; it is rarely the case that I find something so good and apparently uncontroversial.

DI/ML is what, as the meme goes, peak education looks like; you might not like it but it's the truth (as far as test scores are concerned).

In the late 1960s, DI was accepted as one of the programs to be part of Project Follow Through, a very large government-funded study that compared the outcomes of over 20 different educational interventions in high-poverty communities over a multiyear period. Communities throughout the nation selected programs to be implemented in their schools, and DI was chosen by 19 different sites, encompassing a broad range of demographic and geographic characteristics. External evaluators gathered and analyzed outcome data using a variety of comparison groups and analysis techniques. The final results indicated that DI was the only intervention that had significantly positive impacts on all of the outcome measures (Adams, 1996; Barbash, 2012; Bereiter & Kurland, 1996; Engelmann 2007; Engelmann, Becker, Carnine, & Gersten, 1988; Kennedy, 1978). The developers of DI had hoped that the conclusions of the Project Follow Through evaluators would lead to widespread adoption of the programs, but a variety of political machinations seem to have resulted in the findings being known to only a few scholars and policy makers (Grossen, 1996; Watkins, 1996).

Some of those interventions (and I hadn't heard of them before) are: Direct Instruction, Parent Education, Behaviour analysis, Southwest Labs, Bank Street, Responsive Education, TEEM, Cognitive Curriculum, and Open Education.

Most of these actually caused substantially lower performance compared to traditional school instruction. This supports the idea that, at least on the metrics measured by Follow Through, the choice of educational methodology matters. In particular, the worst performer, Open Education, sounds like something hip teachers would find cool:

focused on building the child’s responsibility for his own learning. Reading and writing are not taught directly, but through stimulating the desire to communicate. Flexible schedules, child-directed choices, and a focus on intense personal involvement characterize this model.

The effect sizes found in the meta-analysis are around 0.5, which in the social sciences is considered fairly high.

The paper also looks at variability of the results between studies. Because methodology differs, it could be that the estimate is overstated because of the overabundance of shoddy studies. But even when controlling for everything you might think you can control for, the effects still remain, or so they claim; and the control variables did little to reduce it: It seems a very robust effect that shows up no matter how you slice the data.

As far as meta-analyses go, this is good, perhaps too good. I am reminded of Daryl Bem's now infamous meta-analysis on the possibility of some people being able to see the future. If the underlying literature is not good then the meta-analysis will yield biased estimates.

Critiques of DI

Since all of this sounds suspiciously good, I explicitly searched for criticism of DI.

One broad critique, more of a warning, is that a substantial part of the literature on DI has been produced by people associated with the National Institute for Direct Instruction (NIDI), including the meta-analysis with which this article begins; that said, the meta-analysis itself found no difference between the NIDI-sponsored studies and other studies.

Here's one professor of education, saying that sure, DI works at what it's designed to do, but, he argues, the price is an environment devoid of creativity, joy, and spontaneity. He does not give evidence for this, nor were these tested in the meta-analysis from before.

Alfie Kohn, an education researcher, has an essay up with a critique of DI, starting with the Follow Through study. He also mentions that DI techniques have led, in some cases he cites, to students knowing the material well but being incapable of a deeper understanding or generalization.

Eppley & Dudley-Marling (2018) find the DI literature wanting. They look at work published between 2002 and 2013 and find that the literature is low quality, claiming that DI doesn't work at all, except in a very limited number of cases, and then with a small effect. But they don't seem to quantify this (it's not a meta-analysis), nor do they comment on previous systematic reviews and meta-analyses that do find positive effects.

The What Works Clearinghouse reviewed 7 studies of Direct Instruction, out of which only one was considered good enough to include in their summary of evidence. They concluded that it has no effect. The study in question was an RCT with 164 students with learning disabilities and very low IQ (mean 76) at a pre-school level (mean age ~5).

As one might imagine, the NIDI people have a page dedicated to answering the above. Now, given that the WWC didn't review that many studies, and the study they did look at was a very atypical sample, I'm going to ignore this and look at more studies.
