How do U.S. students compare internationally?

The increase for each of the other social class groups was two to three times the national average increase. This, too, is a composition effect, attributable to the fact that the TIMSS sample for the United States had a considerably worse social class distribution than the earlier sample. For example, in , 22 percent of U.S.

This deterioration in the social class composition of the TIMSS sample from to is greater than the deterioration in the PISA social class distribution during a roughly similar period. The large TIMSS shifts could be attributable to a change in reporting behavior on the part of students asked to record books in the home, or to TIMSS sampling flaws—sampling an unrepresentatively large number of disadvantaged students and a correspondingly unrepresentatively small number of advantaged students in both England and the United States in , compared to . As we discuss in greater detail below, one reason for caution about these data is that the increase in the share of U.S. TIMSS test takers with few books in the home does not seem consistent with the stability, over the same period, of the share of NAEP test takers whose mothers had a high school education or less.

Of the 84 percent of U. In , the proportion of the TIMSS sample reporting mothers with less than a high school degree increased to 15 percent, and in to 16 percent.

In contrast, in the NAEP sample, the proportion of test takers reporting that their mothers had less than a high school degree remained stable at about 10 percent over the same period. We were also able to estimate this proportion in the PISA samples: the proportion of test takers with mothers having less than a high school diploma was 9 percent in , 11 percent in , and 11 percent in . Whatever the explanation, these implausible shifts in social class in the TIMSS sample over such a short period of time provide further reason to treat international test scores with considerable caution and to avoid making policy pronouncements based on superficial score comparisons.

Table 16 explores what we can learn from the participation of U. Because the other components of the United Kingdom (Wales and Northern Ireland) are not large enough to explain this difference and, in any event, are unlikely to have fewer disadvantaged students than England and Scotland, this discrepancy is unexplained. It is another reason to be cautious about taking the results of these assessments too literally. For the United States and Canada, TIMSS sampled enough students to generate statistically reliable national results, but the samples were not large enough to generate results for individual Canadian provinces or U.S. states.

But in each social class group, Quebec students perform better than students in British Columbia, Ontario, and Canada overall. Upper-middle social class Group 4 students in the United States perform better than comparable social class students in Ontario. Without TIMSS data from other Canadian provinces, it is not possible to say with certainty where in Canada we should look to find the cause of this overall superior performance.

However, based on the data we have, it is at least a possibility that for mathematics, the key can be found in Quebec. Within the United States, only Massachusetts and Minnesota participated separately and were also included in the overall U.S. sample. Table 17 shows that students in each social class group in Minnesota outperformed comparable students in British Columbia and Ontario.

Minnesota performance, compared to that in British Columbia, was substantially better in all social class groups except for lower social class Group 2 students. Minnesota performance, compared to that in Ontario, was better in all social class groups, and substantially better for lower social class Group 2 and upper-middle social class Group 4 students.

Massachusetts students in almost every social class group perform substantially better than comparable social class students in the three Canadian provinces. The exceptions are the lowest social class Group 1 students in Quebec, who perform substantially better than comparable social class students in Massachusetts; upper-middle social class Group 4 students in Quebec, who perform better, but not substantially better, than comparable social class students in Massachusetts; and the lowest social class Group 1 students in Ontario and the lower social class Group 2 students in Quebec, both of whom perform about the same as comparable social class students in Massachusetts.

Minnesota and Massachusetts are relatively high per-capita-income states, with relatively low percentages of low-income minority students, so it might seem that the higher socioeconomic background of students in these states, compared to that of the average U.S. student, explains their superior performance. But Table 17 shows that, except for the lowest social class Group 1 students in Quebec, students in Massachusetts and Minnesota perform about the same as or better than comparable social class students in the three Canadian provinces.

Row 7 of Table 18 reweights the average scores, assuming that each province or state had a social class distribution similar to that of the United States nationwide. It shows that adjusting for social class composition makes almost no difference in the overall average scores of these provinces and states. The greatest difference is in the case of Massachusetts, where about one-quarter of its apparent advantage over the United States overall can be attributed to its more advantageous social class composition.
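The mechanics of that reweighting are simple. In the sketch below, the group averages and shares are hypothetical (not the report's data): each jurisdiction's overall average is recomputed holding its group averages fixed while substituting the U.S. national distribution of test takers across social class groups.

```python
# Hypothetical average score for each books-in-the-home social class group (1-6).
group_scores = {1: 480, 2: 505, 3: 530, 4: 555, 5: 575, 6: 590}

# Hypothetical shares of test takers per group: the jurisdiction's own sample
# versus the U.S. national distribution (each set of shares sums to 1.0).
own_shares = {1: 0.08, 2: 0.12, 3: 0.20, 4: 0.25, 5: 0.20, 6: 0.15}
us_shares = {1: 0.14, 2: 0.16, 3: 0.22, 4: 0.22, 5: 0.15, 6: 0.11}

def weighted_average(scores, shares):
    """Overall average as the share-weighted mean of group averages."""
    return sum(scores[g] * shares[g] for g in scores)

reported = weighted_average(group_scores, own_shares)      # analogous to Row 6
standardized = weighted_average(group_scores, us_shares)   # analogous to Row 7
print(f"reported: {reported:.1f}, standardized: {standardized:.1f}")
```

With these made-up numbers, the "reported" average exceeds the "standardized" one by about nine points, illustrating how a favorable social class composition alone can lift an overall average even when no group's performance differs.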

Thus, the superior overall performance of students in Massachusetts and Minnesota could be attributable in part to social class differences not identified by the books-in-the-home measure (for example, disadvantaged students in Massachusetts and Minnesota may be less geographically concentrated than comparable students in the United States generally), or to better curriculum or instruction, or to other factors.

Table 19 displays these results, showing state or country overall average mathematics scores in two ways. Row 6 shows the average score that would be reported based on the actual sample distribution of that state or country. Row 7 shows the state or country overall average score with a standardized social class distribution—in this case, as if each state or country had the same social class distribution as the United States.

With either calculation, Massachusetts and Minnesota outperform the three similar post-industrial countries, in some comparisons substantially. Table 19 also shows that Massachusetts students perform substantially better in mathematics than comparable social class students in almost every social class comparison with the three similar post-industrial countries. Upper-middle social class Group 4 students in Massachusetts perform better than comparable social class students in France, but not substantially better.

Disadvantaged social class Groups 1 and 2 students in Minnesota perform substantially better than comparable social class students in each of the similar post-industrial countries. Performance of middle social class students in Minnesota is more similar to the performance of these groups in the similar post-industrial countries; Minnesota middle social class Groups 3 and 4 students perform better than comparable social class students in the United Kingdom; upper-middle social class Group 4 students perform worse than comparable social class students in Germany; and lower-middle social class Group 3 students in France and Germany, and upper-middle social class students in France perform about the same as comparable social class students in Minnesota.

In this section, we explore what light NAEP data can shed on U.S. student performance. Since , individual states have had the option to request a large enough sample to generate state-level results, and since , state-level sampling has been mandatory for all 50 states. As noted above, the LTT purports to assess a constant set of mathematical skills, while the Main NAEP purports to assess skills that reflect contemporary curriculum and expectations.

What this means in practice is that the LTT stresses only basic computational skills, while the Main NAEP places more emphasis on mathematical reasoning, including some constructed-response items.

In reading, the LTT also purports to assess an unchanging set of more basic skills, while the Main NAEP purports to assess more inferential and interpretive reading skills. Table 20 displays the average reading and math scores for U.S. students. Although the standard deviation on each NAEP test and administration varies, in general it is about 32 scale points. Thus, it is apparent from Table 20 that the overall reading achievement of 8th-graders or 13-year-olds nationwide is about the same as it was when these tests were first given.

In math, however, the story is different. While reading performance was similarly flat on both tests over this period, both show substantial gains in math, with the average annual rate of improvement on the Main NAEP about twice as great as that on the LTT.

NAEP does, however, report data on student characteristics in several categories that generally indicate social class status. One such indicator is eligibility for the free and reduced-price lunch (FRPL) program; students are eligible for this program if their family incomes are below percent of the federal poverty line. Although this income eligibility level varies by family size, for a family of four it is about 35 percent of the national median income.

Another indicator is parent educational level; NAEP collects data on the education levels of both mothers and fathers. Another indicator is race and ethnicity. We do not claim that these indicators describe the same students who fall into Groups 1 and 2 on the BH measure in PISA and TIMSS, but only that students who have these characteristics are, on average, more disadvantaged than the average U.S. student.

Table 22 shows improvement in reading performance by more disadvantaged students, especially by African American students and especially on the Main NAEP assessment.

The relatively greater improvement in reading for more disadvantaged U.S. students is evident in the figures that follow. Figure E compares the U.S. trends. Keep in mind in interpreting this and subsequent figures that increases or decreases of about 0. Figure F compares the U.S. trends.

Although the LTT performance from to is flat, the direction of overall U.S. performance is consistent with this compositional change: it was after that the share of disadvantaged students in the total test-taking pool began to increase.

It compares the U.S. trends in math. As in reading, the collapse of U.S. PISA scores in does not seem to be replicated in any of the other tests we are considering. PISA math scores then collapsed further in . We know of no plausible explanation for these apparent trends; the most likely assumption is that the math curriculum assessed in PISA and PISA was not aligned with that assessed by the Main NAEP, but that in the alignment was improved.

We noted above that some U.S. states participated separately in the TIMSS assessment. North Carolina is worth further attention. When Secretary Duncan announced the TIMSS overall average results in December , he highlighted North Carolina as proving that demographically diverse states can outperform others. But further study is needed before concluding that the even more impressive TIMSS improvement rate should be believed.

Table 24 summarizes what we have learned about U.S. student performance. For each test, and for each year of available data since , the table shows the average U.S. score, as well as scores for more disadvantaged groups of students.

In the case of the NAEP tests, these are African American students and students whose mothers did not graduate from high school. The next-to-last column displays the average annual rate of change in scores for the full period shown. The right-hand column displays the average annual rate of change in scores from the year closest to for which data are available to the year closest to for which data are available.
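These rate-of-change columns lend themselves to a simple sketch. Assuming the rates are computed endpoint to endpoint (an assumption on our part; a fitted trend would give slightly different values), and using hypothetical scores and years:

```python
# Average annual rate of change: score change divided by years elapsed,
# optionally expressed in standard deviation units (NAEP's SD is roughly
# 32 scale points, per the discussion above).

def annual_rate(score_start, score_end, year_start, year_end, sd=32.0):
    """Return (scale points per year, standard deviations per year)."""
    points_per_year = (score_end - score_start) / (year_end - year_start)
    return points_per_year, points_per_year / sd

# Hypothetical scores and years, for illustration only.
pts, sds = annual_rate(score_start=270, score_end=284, year_start=1990, year_end=2011)
print(f"{pts:.2f} points/year = {sds:.3f} SD/year")
```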

Overall, it seems that these tests provide consistent confirmation that U.S. Such conflicting results suggest caution about drawing policy inferences without delving more deeply into what these tests measure. But beyond conflicting results among various evaluations of student learning, each test has its sampling peculiarities that can affect results. (For years when NAEP introduced testing with accommodations, scores shown are averages of results with and without accommodations.)

Some of these sampling peculiarities, such as the oversampling of U.S. students in schools with concentrated poverty, can be identified and, to some extent, adjusted for. Other aspects of the tests, such as the greater tendency of students in some countries to mark answers at random rather than leave them blank, can also bias results in ways that we cannot estimate.

In most cases, it is not possible to re-estimate U.S. scores to correct for such flaws. But we can adjust for the effect on scores of the unusually disadvantaged sample of U.S. students. We conclude that correcting for these two problems would improve the U.S. standing in international comparisons. We have limited ability to make precise adjustments of international or interstate comparisons for these decisions, but we can show that they affect common judgments about relative national performance.

In this section, we review these various conflicts, flaws, and other possible biases in test results, which suggest the need for caution in interpreting average national test score differences as valid measures of the comparative quality of U.S. schooling.

These assessments are not administered to every student in a country. Rather, the test is given to a small sample, but one that statisticians deem large enough to be representative of all students. The larger the sample, the more representative it can be.

PISA, for example, constructed samples that were large enough for analysts to be confident, with 95 percent probability, that reported U.S. results in reading are within about 7 scale points of the score that would be obtained if all U.S. 15-year-olds were tested.

For each PISA test administration, each nation must determine the necessary sample size and then make a random selection of its 15-year-olds. If the sampling process is flawed, the reported results can be quite inaccurate. The sampling methodology is complex, and the possibility of sampling flaws is another reason why results should be treated with caution. Sampling requires selecting schools that are large enough to have a sufficient number of 15-year-olds and that seem to be representative of geographic regions; public and private schools; rural, suburban, and urban schools; schools with minority populations; and a few other characteristics.
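To see why sample size drives the width of such confidence intervals, consider the textbook margin-of-error formula for a simple random sample. This is only a rough sketch: PISA actually uses a complex two-stage design (schools, then students) with replicate weights, so its real standard errors are larger than this naive formula implies. The sample sizes below are hypothetical.

```python
import math

def margin_of_error(sd, n, z=1.96):
    """Half-width of a 95 percent confidence interval for a sample mean,
    under simple random sampling (a simplification of PISA's real design)."""
    return z * sd / math.sqrt(n)

# PISA scales have a standard deviation of roughly 100 points.
for n in (500, 2000, 5000):
    print(f"n={n}: +/- {margin_of_error(100, n):.1f} points")
```

Quadrupling the sample roughly halves the interval, which is why national samples must run to thousands of students before score differences of a few points become meaningful.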

Unfortunately, in , a sampling flaw in the United States seems to have produced a PISA sample whose average score was lower than the average score that an accurately representative U.S. sample would have yielded. The overall proportion of FRPL-eligible students in the sample was close to the national proportion; in this respect, the sample seems representative. However, it is not sufficient to have a representative proportion of FRPL-eligible students in the overall sample, because we know that disadvantaged students perform more poorly if they attend schools where they are not integrated with more advantaged students and are instead heavily concentrated with other FRPL-eligible students.

These characteristics of high-poverty schools frequently result in lower achievement for students who attend such schools.

Students who attend schools where disadvantage is concentrated are likely to perform, on average, at considerably lower levels than students whose family income is similarly low but who attend schools where more students are middle class.

A sampled population that includes students eligible for FRPL who are dispersed across many schools will typically have higher average achievement than a similar sampled population with the same proportion of FRPL students but where these students are concentrated in fewer schools. Therefore, for an accurate sample, PISA should not only have a proportion of FRPL-eligible students that is similar to that proportion nationwide, but should have FRPL-eligible students whose distribution among schools with concentrated disadvantage is also similar to the distribution nationwide.

Table 25 compares the distribution of all U.S. students with the distribution of the PISA sample, by the concentration of FRPL-eligible students in their schools. Forty percent of the PISA sample was drawn from schools where half or more of the students were eligible for free or reduced-price lunches; only 32 percent of all U.S. students attend such schools. Likewise, students who attend schools where few students are FRPL-eligible, and whose scores tend, on average, to be higher, were undersampled. This oversampling of students who attend schools with high levels of poverty and undersampling of students from schools with less poverty results in artificially low PISA reports of national average scores.

We have queried officials at the National Center for Education Statistics in an attempt to determine why the PISA sample was skewed in this way, but while these officials acknowledge that there may be a sampling error, they have been unable to provide an explanation. One possibility is that the PISA sampling methodology excluded very small schools, where poverty is less likely to be concentrated. Another possibility is that because participation in PISA is voluntary on the part of schools and districts that are randomly selected for the sample, schools serving more affluent students may be more likely to decline to participate after being selected.

Perhaps this is because such schools are generally less supervised by the federal government than schools serving disadvantaged students and feel freer to decline government requests. Whatever the reason, an initial PISA sample that was representative would lose some validity if schools serving higher proportions of more affluent children were more likely to decline to cooperate, and were then replaced in the sample by schools serving lower proportions of affluent students.

An underestimation of national average scores is then bound to result. To get a sense of how much of an underestimate resulted, we recalculated the overall U.S. average.

For this recalculation, we assume that the average score of students attending schools in each category of FRPL participation is unchanged, but that the proportion of such students is that of the nation as a whole, not that of the PISA sample. We find that with these assumptions, the U.S. average would have been higher than the score actually reported. Indeed, the effect of the sampling error is probably even greater, because 3 percent of schools nationwide do not report their FRPL percentages to the National Center for Education Statistics.
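A minimal sketch of that recalculation, with hypothetical numbers standing in for the actual Table 25 values: hold the average score observed in each FRPL-concentration category fixed, and reweight the categories from the PISA sample's shares to the national shares.

```python
# Hypothetical average scores by the FRPL concentration of the school attended.
category_scores = {"<25%": 540, "25-49%": 505, "50-74%": 470, ">=75%": 440}

# Hypothetical shares of students in each category: the PISA sample's shares
# versus the national shares (each set sums to 1.0).
sample_shares = {"<25%": 0.25, "25-49%": 0.35, "50-74%": 0.25, ">=75%": 0.15}
national_shares = {"<25%": 0.33, "25-49%": 0.35, "50-74%": 0.20, ">=75%": 0.12}

as_sampled = sum(category_scores[c] * sample_shares[c] for c in category_scores)
reweighted = sum(category_scores[c] * national_shares[c] for c in category_scores)
print(f"as sampled: {as_sampled:.1f}, reweighted to national shares: {reweighted:.1f}")
```

Because the sample over-weights the lower-scoring high-poverty categories, the reweighted average comes out several points higher, which is the direction of the correction described above.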

It is more likely that these schools are those without any FRPL-eligible students, because schools that do not participate in government programs are more likely to fail to comply with reporting requirements. If so, the missing data probably come from schools whose average scores are somewhat higher than those of schools that did report but had few FRPL-eligible students ( in reading and in math, from row d of Table 25). Then, the calculations in row g of Table 25 would yield averages that are higher than and . We showed earlier (see, for example, Tables 3B and 3D) that if the social class distribution of the U.S.

If we add the two social class adjustments together (one for the over-representation of disadvantaged students in the U.S. sample, and one for the oversampling of students in high-poverty schools), the adjusted U.S. averages rise further. As noted above, these adjusted average scores may still be too low, because if disadvantaged students had been sampled accurately in schools with less concentrated disadvantage, the average scores of U.S. disadvantaged students would likely have been higher.

But this consideration is offset by another: when we adjust the U.S. average for social class composition, the weight of disadvantaged students in that average falls. When the proportion of disadvantaged students decreases, the potential for bias in the average test score from oversampling in high-poverty schools also decreases, simply because the weight of disadvantaged students in the average national score is lower.

Thus, the adjustment we make for sampling error and the adjustment we make for the proportion of disadvantaged students in the total sample will overlap, but we cannot say to what extent. On balance, taking these two considerations together, we consider the adjusted reading and math scores of and to be plausible. As a not unreasonable speculative exercise, we can ask how the United States would rank internationally if we accept these adjusted average U.S. scores.

In this report, we have focused only on the United States and six comparison countries. However, in discussions of PISA scores, the media and policymakers have frequently cited the fact that of all 34 OECD test-taking countries in , the United States ranked 14th in reading and 25th in math.

If we use our U.S. averages as adjusted for social class composition and sampling error, the U.S. rankings improve. Yet unless our claim that BH is the most reasonable proxy available for the social class characteristics relevant to student academic performance is seriously flawed, it is apparent that either TIMSS or PISA, or both, have failed to administer tests to accurate samples of national populations and that, therefore, the national average score results reported in TIMSS or PISA, or both, should not be taken as accurate.

We have already noted differences in the social class composition of PISA test takers in reading vs. math. Table 2A showed the distribution of test takers in the United States and other countries by social class group. Table 8A showed how the distribution by social class in the United States and other countries changed from to . Table 26 then calculates the change in social class composition for PISA mathematics test takers from to , and adds similar data for TIMSS from to , almost an identical period.

If books in the home is a reasonable proxy for social class characteristics relevant to student academic performance, then there are apparently flaws in the student samples to which either the TIMSS or PISA, or both, were administered.

According to PISA sampling, the share of students who were disadvantaged Groups 1 and 2 increased from to by 27 percent. According to TIMSS sampling, the share of the same students over almost the same time period increased by 70 percent. It is important to remember that these sampling inconsistencies do not call into question the accuracy of the average scores for each of the social class groups in either TIMSS or PISA.
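Note that these figures are relative (percent) increases in the disadvantaged share, not percentage-point changes. A small sketch with hypothetical baseline shares (not the actual PISA or TIMSS values) makes the distinction concrete:

```python
# A rise from a 20% disadvantaged share to 25.4% is a 5.4 percentage-point
# change, but a 27 percent relative increase -- the kind of figure quoted above.

def relative_increase(share_before, share_after):
    """Percent change in the disadvantaged share between two administrations."""
    return (share_after - share_before) / share_before * 100

print(f"{relative_increase(0.20, 0.254):.0f}%")  # ~27%, PISA-like shift
print(f"{relative_increase(0.20, 0.340):.0f}%")  # ~70%, TIMSS-like shift
```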

They do, however, call into serious question the accuracy of the reported national average scores, and it is these scores to which policymakers and pundits direct such anguished attention. As noted, NAEP does not report books-in-the-home data for test takers. From to , the share of 8th-grade Main NAEP math test takers whose mothers had only a high school education or less declined from 34 to 30 percent (i.e., a relative decline of about 12 percent).

From to , the share of 8th-grade Main NAEP math test takers who participated in the free and reduced-price lunch program increased from 29 to 37 percent. We tend to think that the educational attainment of mothers is a more relevant factor for comparison with BH data than receipt of free or reduced-price lunches, so we suspect that the NAEP data are more consistent with PISA data showing the share of disadvantaged students increasing by 27 percent in this period than with TIMSS data showing this share increasing by 70 percent.

Or, perhaps, the score was erroneously high in , because it sampled too few lower-scoring disadvantaged students. Either of these errors would have dampened the report of a real increase in TIMSS scores from to . From what we have seen so far, it is apparent that no single assessment accurately reflects student performance.

A look at anomalies in the scores of many other countries, not examined in detail in this report, provides further evidence. Students in Australia, Slovenia, and Norway performed better or substantially better than U.S. students.

Students in New Zealand performed substantially better than U.S. students. Other inconsistencies appear when we compare trends in U.S. performance across tests. As Figure G illustrates, U.S. trends on the various math assessments diverge. But U.S. PISA mathematics scores made rapid gains from to .

But on the Main NAEP, 8th-grade math scores increased consistently from to , with the rate of increase more rapid in the first half of the decade, the very years when U.S. PISA math scores were falling. In that stretch, PISA math scores were falling substantially while U.S. Main NAEP math scores were rising substantially. As discussed above, these inconsistencies could result from flaws in population sampling. If a test samples a larger proportion of disadvantaged students than is present in the national student population, it could erroneously report national average performance that is lower than another test with a more accurate sample.

Yet even if population samples were accurate in both tests, the tests could report inconsistent average performance because of a different kind of sampling problem—inconsistencies in curriculum sampling, i.e., differences in which topics and skills each assessment emphasizes. No test of an hour or so can assess every topic or skill in the curriculum; test designers must make judgments about which topics or skills to include, and what emphasis each should be given. A possible explanation for the inconsistencies between the tests discussed in this report could be that each assessment samples different aspects of the mathematics or reading curriculum.

Yet the actual data from these tests, illustrated in Figure G, challenge this conventional description of curricular test differences. But they are not, or at least not consistently. Commonplace explanations of why tests can differ so much in their reports of student performance are not persuasive. For example, some U.S. observers attribute the differences to changes in the topics each test emphasizes. But this does not seem to be the case. As Table 27 shows, the share of the TIMSS 8th-grade test devoted to geometry increased by two-thirds from to , while the share devoted to algebra increased only by one-third.

Teachers continually prepare students for high-stakes tests and depend heavily on worksheets and drills. South Korean families depend heavily on private tutoring to help their children perform well on high-stakes tests. But if Finland, Singapore and South Korea are all doing better than we are, that suggests there may be a factor at play other than how we teach. And indeed there is something that all three of these nations, and every other country that outranks the United States on the PISA test, have in common: lower rates of child poverty.

And poverty is a major factor in how well students perform on the tests. Though the United States is by most measures a wealthy country, it is one with many poor people. A Unicef report looked at the relative child poverty rates of 41 well-off nations; the United States ranked seventh from the bottom. A study by Stanford University researchers found that the U.S. An analysis by the nonprofit organization Turnaround for Children, which studies the effects of various traumas on children, found that in U.S.

When poverty equates to lower academic performance, we perpetuate that poverty from one generation to the next. Canada, for example, has a substantial minority group (East Asians), but no data on such Asians as compared with Caucasians. Therefore, it is possible that the overall scores for Canada are enhanced by its East Asian minority population. Another area where Finland is homogeneous is in school funding.

Finland has not. One of the most interesting new trends in international comparisons is the effort by some policy groups to compare individual states — rather than the United States as a whole — with other countries. This is seen as a way to pressure state governments to improve education.

It also highlights the discrepancy in education that exists within the U. The National Governors Association, the Council of Chief State School Officers, and Achieve, an education advocacy organization, are researching ways to compare states with other countries to tease out information on best practices and global competitiveness.

The first such study linked the U.S. The study analyzed the scores of eighth-grade American students on standardized tests given by the U.S. Department of Education in and , and compared them with their peers in 45 countries. The top rank in the world is held by a group of four provinces within China (Beijing, Shanghai, Jiangsu and Zhejiang).

It ranks 13th out of the 79 countries and regions, according to the PISA scores in reading. As with math, U.S. Those results, released in October , also found that U.S. The international PISA test is taken by older students, 15-year-olds, every three years. The majority of U.S. Amid the long-term stagnation, there is an important change to note: inequality is growing. Peggy Carr, associate commissioner of the National Center for Education Statistics (NCES), points out that both exams are showing a widening achievement gap between high- and low-performing students.

One in five American 15-year-olds (19 percent) scored so low on the PISA test that they had difficulty with basic aspects of reading, such as identifying the main ideas in a text of moderate length.

But the inequality story is a nuanced one. Part of the inequality is between schools, with students at wealthier schools posting much higher test scores than students at schools with large numbers of disadvantaged students. But the vast majority of educational inequality in America is inside each school, according to the PISA test score report.

Statisticians mathematically teased out inequality between schools versus within each school and found that, in the U.S., only about 20 percent of score inequality lies between schools. The remaining 80 percent is inside each school.

Imagine five schools, each with 10 students. Students in the first school come from the poorest families and students in the fifth school come from the wealthiest. The other three schools lie between the two extremes. If you calculate the average test score for the 10 students in each school, you would see that the average test score for each school rises with wealth.
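This five-school illustration maps onto a standard variance decomposition: with equal-sized schools, total score variance is exactly the variance of the school means (between-school) plus the average variance of students around their own school's mean (within-school). A minimal sketch with hypothetical scores:

```python
from statistics import pvariance, mean

schools = [  # five hypothetical schools, ten hypothetical student scores each
    [410, 430, 455, 470, 480, 495, 510, 525, 540, 560],
    [420, 445, 460, 480, 495, 505, 520, 540, 555, 575],
    [430, 450, 470, 490, 505, 520, 535, 550, 570, 590],
    [445, 465, 485, 500, 515, 530, 545, 565, 585, 600],
    [455, 480, 495, 515, 530, 545, 560, 580, 595, 615],
]

all_scores = [s for school in schools for s in school]
school_means = [mean(school) for school in schools]

# With equal-sized schools and population variance, total == between + within.
total = pvariance(all_scores)
between = pvariance(school_means)                        # variance of school means
within = mean(pvariance(school) for school in schools)   # avg within-school variance

print(f"between-school share: {between / total:.0%}, "
      f"within-school share: {within / total:.0%}")
```

Even though these made-up school averages rise steadily with wealth, most of the total variance sits within each school, which is the pattern the PISA report describes for the United States.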
