Problems of Test-based School Ratings

by Craig Bolon
(February 21, 2001)


It has been known for many years that scores on tests of mental skills and knowledge strongly reflect social advantages and disadvantages. In the 1920s labor unions protested use of the newly developed IQ tests to place students in school programs or "tracks." Studies in the 1950s and again in the 1970s showed that scores on Scholastic Aptitude Tests, as they were then called, correlated closely with household income. These scores help give students from upper-income families preferential access to selective colleges.

Unless intentional discrimination can be proved, US courts do not regard such characteristics as actionable. In a widely cited 1976 decision, Washington v. Davis, the Supreme Court upheld a vocabulary test used by the District of Columbia Police Department as a qualifying exam. The failure rate of black applicants on this test was more than four times that of white applicants, and the District of Columbia presented little evidence of the test's relevance to police duties.

Mental skills testing is applied with impunity to sort the supposedly qualified or competent from the supposedly unqualified or incompetent. It probably tends to promote class stratification wherever it is used. Public approval of mental skills testing remains strong because hardly anyone calls attention to its deficiencies. Although class bias in mental skills testing has been demonstrated for decades, most US writers and reporters do not seem to know or care about this.

School-based testing of mental skills has a long history of manipulation. For example, from 1844 through 1846 Horace Mann and his allies on the Boston school committee tried to embarrass Boston schools by giving complicated written tests to eighth-grade students. Students in the then separate and prosperous town of Roxbury did much better than poorer Boston students. Samuel Gridley Howe, Mann’s most vocal supporter, claimed these results justified "radical reform." What Boston eventually got was centralized control. Some latter-day enthusiasts for Mann ignore his background as an agile politician who received much of his backing from mill owners.

Since the 1983 "Nation at Risk" philippic from the Reagan administration, school-based mental skills testing has become a US growth industry. Executives of large business corporations, who have dominated public school boards for decades, see opportunities to enhance their control and increase their profits, using tests as social weapons. They have several related interests:

In 1998 Massachusetts began state-sponsored, annual achievement testing of all students in three public school grades. It has created a school and district rating system for which scores on these tests are the sole factor. Michigan had begun the first such state-sponsored testing program in 1969, and by the time Massachusetts began a program, 48 other states also conducted some form of state-sponsored student testing, with test scores disclosed to the public by school or by district. Many state education agencies, newspapers and private organizations have devised schemes of using state-sponsored test scores to rate and rank schools like the teams in a sports league.

Massachusetts is distinguished mainly by the public disclosure it provides. Massachusetts and Texas have consistently published all questions used in test scoring several months after tests are administered. In addition Massachusetts publishes, via the Internet, databases of test results, technical reports and audited financial and student information submitted by schools and school districts. While these fail to provide a complete picture, such sources make it possible to study some aspects of the Massachusetts program without private access to data. Although similar studies for other states might be more difficult, there is no reason to suppose that patterns found in the Massachusetts program would be remarkably different from those elsewhere.

As in other states, Massachusetts and its newspapers and organizations review and publish test scores as though they were nearly exact, highly significant measures of school performance. A detailed analysis of results from tenth grade mathematics tests for academic high schools in metropolitan Boston showed that statistically they are not. Two factors describing characteristics of student populations explained about 80 percent of the variance among schools:

Once contributions from these factors had been subtracted from test score averages for schools, the residual scores appeared to be statistical noise. With few exceptions, the differences among schools and the changes from year to year looked nearly random—educationally useless.

In the study mentioned, the correlation between test scores and incomes was so strong that it completely displaced other potential factors, including:

The relationship between incomes and test scores found in this study was linear rather than a threshold effect of poverty: at all income levels, test scores tended to increase in direct proportion to incomes. Because of the limitations in data published for Massachusetts, this study examined only school-averaged scores and factors for schools and districts, not those for individual students.
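The kind of analysis described here can be illustrated with a short calculation. The sketch below fits an ordinary least-squares line to school-averaged scores against school-averaged household incomes and reports R-squared, the share of score variance the income factor explains; residuals from the fitted line are what remains after the income contribution is subtracted. The function and the figures in the usage note are illustrative assumptions, not the study's actual data or method.

```python
def ols(incomes, scores):
    """Simple OLS fit of scores on incomes.

    Returns (slope, intercept, r_squared), where r_squared is the
    fraction of score variance explained by the income factor.
    """
    n = len(incomes)
    mean_x = sum(incomes) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in incomes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(incomes, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_tot = sum((y - mean_y) ** 2 for y in scores)
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(incomes, scores))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


def residuals(incomes, scores):
    """Score residuals after subtracting the fitted income contribution."""
    slope, intercept, _ = ols(incomes, scores)
    return [y - (intercept + slope * x) for x, y in zip(incomes, scores)]
```

For example, with made-up school averages such as incomes of 30, 40, 50, 60 and 70 (in thousands of dollars) and scores that rise in lockstep with them, `ols` reports an R-squared near 1.0; with real data the residuals, not the raw score differences, would be the appropriate basis for comparing schools.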

For the Massachusetts program, comparisons of average tenth grade mathematics test scores among schools are mainly comparisons of income levels, a contest that rich neighborhoods and communities will win at the expense of poor neighborhoods and communities. When one looks for large changes in test scores, a similar pattern emerges. In 1999 Swampscott, a high-income suburb, increased its average tenth grade mathematics score by 16 points, within a total score range of only 80 points. The next year Boston Latin Academy, an elite school with entrance exams, increased its average score by 18 points. Once again the advantaged schools prosper under such a system, while the disadvantaged ones suffer by comparison.

Some liberal critics of state-sponsored testing in Massachusetts have proposed to replace the current system with different tests, such as commercial "achievement" tests or tests that use "authentic," "performance" or "portfolio" methods. Those liberals have rarely if ever conducted critical evaluations, in the context of state-sponsored test programs, for the measures they are now promoting. Other observers have found such measures, as used in state-sponsored testing, to show class bias similar to the current system. There is no credible evidence that changing test techniques can somehow cure the problems of state-sponsored student testing. Testing has become a cult, sanctimoniously promoted by large business corporations to advance their own interests.


Craig Bolon is President of Planwright Systems of Brookline, MA, a software development firm.