> Recently there has been much speculation in the press over the difference
> (about 10 percentiles) between 8th and 9th grade mean scores on the
> Stanford 9 reading tests in California, Arizona, and some other states.
> I'd like to offer my own views on why there would be a dropoff between
> 8th and 9th grade reading scores on the Stanford 9 in selected states,
> including California and Arizona.
>
> I've been saying what others are beginning to say: this has to represent
> flaws in the test, since the 9th grade cohort has already lost some of the
> low-achieving 8th graders through dropout, so, given equivalent tests,
> scores should be higher in 9th grade.
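
A quick way to see the selection effect described here: if the lowest-scoring
8th graders leave the cohort before 9th grade, the mean of the remaining group
has to rise on an equivalent test. A rough simulation in Python (the score
distribution and the 5% dropout figure are invented purely for illustration):

    import random
    random.seed(1)

    # Hypothetical 8th-grade scale scores; parameters are made up.
    eighth = [random.gauss(650, 40) for _ in range(10000)]

    # Crude stand-in for dropout: suppose the lowest-scoring 5% leave
    # before 9th grade.
    survivors = sorted(eighth)[500:]

    mean = lambda xs: sum(xs) / len(xs)
    print(f"8th-grade mean (all students):  {mean(eighth):.1f}")
    print(f"9th-grade mean (after dropout): {mean(survivors):.1f}")
    # The surviving cohort's mean is higher, so on equivalent forms a
    # *drop* from 8th to 9th grade points at the test, not the students.
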
>
> Let's look at two major flaws in the test which likely explain why the
> norms are not equally applicable to all populations in all areas.
>
> 1. The test theory is flawed: Having normed the test on a sample of the
> population chosen to represent socioeconomic strata across 48 states
> (which 3 were left out, since DC is included?) does not mean that the
> test will yield the same distribution in every state; in fact it should
> not, since the point of using 48 states is that there are in fact
> population differences across states. If all states could be expected to
> show the same pattern of scores as every other, then the test could be
> normed on a single state. So the result of one group of states showing a
> mean substantially lower than the national means shows that the national
> means are not as applicable to those states. Logically, if the norming were
> done on a truly representative sample, there ought to be an equal number
> of states that show a mean percentile above the national mean. If there
> isn't such a group, then that's proof that the norming process itself was
> flawed and the norming group was not representative of the test population.
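
To make the norming argument concrete, here is a toy simulation (all
distributions and state names invented) of how percentile norms built from a
pooled national sample center near 50 for the nation as a whole, while
individual states land above or below 50 simply because their populations
differ from the national mix:

    import bisect
    import random
    random.seed(2)

    # Three hypothetical state populations with different underlying
    # score distributions (all numbers invented).
    states = {
        "State A": [random.gauss(640, 40) for _ in range(3000)],
        "State B": [random.gauss(660, 40) for _ in range(3000)],
        "State C": [random.gauss(680, 40) for _ in range(3000)],
    }
    national = sorted(s for scores in states.values() for s in scores)

    def percentile(score):
        """Percent of the pooled norming sample at or below this score."""
        return 100.0 * bisect.bisect_right(national, score) / len(national)

    for name, scores in states.items():
        mean_pct = sum(percentile(s) for s in scores) / len(scores)
        print(f"{name}: mean national percentile = {mean_pct:.1f}")
    # The norms describe the pooled norming sample, not any one state:
    # State A sits well below 50 and State C well above, even though the
    # norming was done on the combined population.
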
>
> Publishers of norm-referenced tests warn in fine print that the test cannot
> be used diagnostically to compare individual test-takers. By the
> same logic it cannot be used to compare groups of different test
> takers. It is quite meaningless to look at mean scores in California and
> Arizona (themselves means across highly diverse populations) and compare
> them to more homogeneous states like Maine and North Dakota.
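
The problem with comparing a diverse state's mean to a homogeneous state's
mean is a composition effect. In this invented example, State X outscores
State Y within every student subgroup, yet its overall mean comes out lower
because the two states enroll very different mixes of students:

    # Hypothetical (mean percentile, enrollment share) per subgroup;
    # every number here is invented for illustration.
    state_x = {"group 1": (52, 0.3), "group 2": (42, 0.7)}
    state_y = {"group 1": (50, 0.8), "group 2": (40, 0.2)}

    overall = lambda state: sum(m * share for m, share in state.values())

    print(f"State X overall mean percentile: {overall(state_x):.1f}")  # 45.0
    print(f"State Y overall mean percentile: {overall(state_y):.1f}")  # 48.0
    # State X is higher within both subgroups, yet State Y's overall mean
    # wins purely because of who is enrolled. The comparison of overall
    # means says nothing about schools or instruction in either state.
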
>
> 2. The content flaw: In this case the seeming anomaly is in the
> so-called drop-off in mean scores in these states between 8th and 9th
> grades. That could only mean that the form of the test taken by the 8th
> graders is not equivalent to that taken by the 9th graders.
>
> From what teachers have shared with me, I have learned that there is a heavy emphasis in the
> high school forms of the Stanford 9 on vocabulary. Any test of vocabulary
> is a small sample of words from the vast lexicon of the English
> language, in all its many varieties and uses. I once had a graduate student do a
> study of the vocabulary section of the New York State Regents exam which
> all graduates of New York high schools are required to take. One could
> use the group of words in the sample to reconstruct the lifestyle of
> the young people scoring well on the test. To know those words they had
> to have had experiences characteristic of the suburban middle-class.
> That's not surprising: who decides, after all, which words are important
> enough to be on the test? Test theory can show which items were hard or
> easy for the norming population as a whole. But it can't show which items were
> appropriate and which not for each state or each population. Nor can it
> be used to support the weight given on a reading test to vocabulary. The
> latter is an arbitrary decision based on the traditions and/or
> prejudices of the decision makers who developed the test.
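
The vocabulary point can also be illustrated with a small sketch. The two
hypothetical groups of students below each know exactly as many words, but the
portions of their vocabularies beyond the common core come from different
experience domains; which words the item writers happen to draw decides which
group scores well (all word pools and counts are invented):

    import random
    random.seed(3)

    shared        = [f"common_{i}"   for i in range(5000)]
    suburban_only = [f"suburban_{i}" for i in range(1000)]
    urban_only    = [f"urban_{i}"    for i in range(1000)]
    suburban_kids = set(shared) | set(suburban_only)
    urban_kids    = set(shared) | set(urban_only)

    def score(known, items):
        return 100.0 * sum(w in known for w in items) / len(items)

    # Test A: 40 items drawn evenly from the whole word pool.
    test_a = random.sample(shared + suburban_only + urban_only, 40)
    # Test B: same length, but a third of the items come from words tied
    # to suburban middle-class experience.
    test_b = random.sample(shared, 27) + random.sample(suburban_only, 13)

    for name, test in [("even draw", test_a), ("suburban-weighted draw", test_b)]:
        print(f"{name}: suburban group {score(suburban_kids, test):.0f}%, "
              f"urban group {score(urban_kids, test):.0f}%")
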
>
> What's the lesson of all this? TOO MUCH IS BEING MADE DEPENDENT ON TEST
> RESULTS AND TOO MUCH CONFIDENCE IS BEING PLACED IN TESTS. It is wrong to
> base policy and curriculum decisions on any test, particularly
> norm-referenced tests such as the Stanford 9.
>
> Brian Street would say that the test is built on the view of reading as
> autonomous, independent of who is using it for what and in which social
> context. What these score anomalies are demonstrating is that reading is
> not autonomous.
> Ken Goodman
>
> Years ago I called for a moratorium on reading tests. Here is one more
> example of the failure of a major test publisher to produce a test that can
> do fairly what it claims to do.
> --
> Kenneth S. Goodman, Professor, Language, Reading & Culture
> 504 College of Education, University of Arizona, Tucson, AZ
> fax 520 7456895 phone 520 6217868
>
> These are mean times- and in the mean time
> We need to Learn to Live Under Water