Tuesday, June 26, 2007

TftF 77: High Stakes Testing


Part 1: Introduction

In a world where viable careers in manufacturing and service are being exported out of the United States, American students increasingly have to compete with students from other countries. Not since the Cold War has the education of our children (big or small) been so important for the social and economic security of our nation. But now, in an effort to track our students' progress, are we teaching our students to the test? With education initiatives like NCLB or the MCAS here in Massachusetts, we are spending more class time preparing and examining our students.

While some argue that high stakes testing is a way for politicians to tout their tough stance on education, others argue that it is an important way to assess our students' progress. In today's episode we will explore the issues surrounding high stakes testing and its effects on education.

Part 2: Assessment in the Media

  • Elementary and middle school MCAS scores flat for second year from the Boston Globe: From the article: "People tend to concentrate on high school results but it's clear to me that we need to also focus on our middle and elementary schools," Driscoll said this morning in a written statement. "Any sign of a decline in the lower grades is concerning, because it gets more difficult to catch up as students get older." -Dave LaMorte
  • MCAS proponents demand reform from BostonNow: Former proponents of the MCAS such as Martin Kaplan, former chairman of the Massachusetts BoE, and Frank Hadu, former Massachusetts Ed Commissioner, are going on record as saying that the "MCAS test is off track from its original purpose" -Dave LaMorte

Part 3: Interview

I spoke with Dr. Louis Volante, a professor in the Education Department at Brock University in Hamilton, Ontario. Dr. Volante is a noted critic of high stakes testing and analyzes test-based accountability models in various industrialized countries. He spoke to me of some of the misconceptions and some of the dangers of high stakes testing.


Transcript:
Host: Welcome to Teaching for the Future, where education and technology collide.

While some argue that this is just a way for politicians to tout the success of the educational departments under them, others argue that High-Stakes Testing is an important part of assessing the progress of our students.

On this episode of Teaching for the Future we’ll talk a little bit about High-Stakes Testing; what are the benefits, what are the goals and what are some of the problems that come up.

I don’t want to spend too much time on No Child Left Behind, only because I feel like that issue itself really deserves its own podcast or its own series of podcasts to explain all of the nuances of the precedent and the goals of the actual Act itself, but I would be doing you a disservice by saying that High-Stakes Testing has not been influenced by the enactment of the No Child Left Behind Act in 2001.

For clarity’s sake we should probably say that Standardized Testing is a way of judging students’ understanding and learning based on a single standard that all of the students would be held against, a single metric that all administrators can sort of measure by. What makes the testing attached to the No Child Left Behind Act high-stakes is that if students don’t progress or students don’t reach a certain metric, districts and teachers and administrators can actually be fired or suffer for that.

The high stakes also affect the students, because many students aren’t able to graduate even though they may be proficient in the actual course work. They need to pass these tests to graduate and to move on to higher education or to move on really with their diploma. The goal of High-Stakes Testing is to foster student achievement, for the most part. At least here in Boston, there have been mixed results.

In a Boston Globe article from October 2006 entitled ‘Elementary School and Middle School MCAS Scores fall flat for a second year’, there has been a lot of evidence that a lot of these changes haven’t actually created the results that educators were actually looking for.

Even more recently, a lot of proponents of the MCAS, the No Child Left Behind High-Stakes Testing here in Boston, have claimed that the test has pretty much gotten off track. I point you to a Boston NOW article entitled ‘MCAS proponent demands reform’. Former Chairman of the Massachusetts Board of Education, Martin Kaplan, is one of the proponents saying that the test has really lost focus on its original goal.

Though the High-Stakes Testing may not all be doom and gloom, I will point you to a blog post on Moving at the Speed of Creativity. EdTech blogger Wesley Fryer discusses some of the positive outcomes of the No Child Left Behind Act, and a lot of this is High-Stakes Testing, with reports that say that math and reading scores are actually up, although Wesley Fryer believes, or at least states in this article, that he feels that maybe it’s a result of teaching to the test. Furthermore, Fryer actually goes on to talk about how the High-Stakes Testing may not be the best metric to test student achievement.

But to better explain some of the nuances of High-Stakes Testing and Standardized Testing, I spoke with Dr. Louis Volante, an Assistant Professor in the Faculty of Education at Brock University in Hamilton, Ontario.

Louis Volante: See, I don’t necessarily have a problem with using Standardized Tests to look at things like student achievement, what I have a problem with is not acknowledging the limitations in those tests and putting them forth as sort of like the defining measure of student achievement. So, even when you look at commonly tested areas; reading, writing, mathematics, science, those are commonly tested subject areas.

So, let’s take a look at something just like literacy, for instance. Literacy, if you ask someone, well, what makes a literate person? A literate person knows how to basically read, write, speak and listen, there’s four domains.

Host: Right.

Louis Volante: But Standardized Tests can only really look at two of those domains; reading and writing, and even within those domains they’re somewhat constricted in what they can look at, so it’s not surprising then that most State Assessment Systems have an overemphasis on things like multiple choice, because money is an issue.

If you’re going to look at open-ended responses where students actually write more extended responses to a probe, a lead question, that costs money to mark, it costs a lot more money to mark that than it does to look at just feeding a bubble sheet through a feeder and getting a set of responses. But when you’re looking at multiple choice responses, for instance, that’s only looking at recognition, it’s not recall, so you’re only tapping into one type of student learning.

So, when you think of all these different issues; what kind of skill you’re looking at, how reliable and valid the test is, which is another really important issue that often doesn’t get enough attention. You have to be able to acknowledge the fact that there are limitations in holding schools, teachers, and even students accountable based on a limited measure, in my mind is not the appropriate approach to take.

Host: You’re referring more to High-Stakes Testing?

Louis Volante: I’m referring to High-Stakes, yes. So, Standardized Achievement Tests, for instance, that are used as graduation requirements in a state or province -- for instance, my own province has the graduation requirement that you complete a literacy test in order to graduate from secondary school, but we’ve seen since that graduation test has been brought into effect that the high school completion rate has dropped about six percentage points from about 77% to 71%. Six percentage points with the student population of around 200,000 is about 12,000 more students that are not completing high school in which they were supposed to complete high school, but that’s Ontario.
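The completion-rate estimate above is a percentage-point calculation, and it can be checked with a short sketch. The figures are the ones quoted in the interview (a student population of roughly 200,000 and a drop from about 77% to 71%); the function name is hypothetical and the population size is an approximation, so this is an order-of-magnitude check rather than an official count.

```python
# Figures quoted in the interview; the population size is a rough estimate.
STUDENT_POPULATION = 200_000   # approximate number of students
RATE_BEFORE_PCT = 77           # completion rate before the literacy test (%)
RATE_AFTER_PCT = 71            # completion rate after the literacy test (%)


def additional_non_completers(population: int, before_pct: int, after_pct: int) -> int:
    """Students who no longer complete high school once the rate drops."""
    # Difference in percentage points, not a relative percent change.
    drop_points = before_pct - after_pct
    return population * drop_points // 100


print(additional_non_completers(STUDENT_POPULATION, RATE_BEFORE_PCT, RATE_AFTER_PCT))  # 12000
```

Integer arithmetic is used so the result is exact for whole-number percentages.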

The US doesn’t show anything significantly different as well, they’re reporting the exact same kind of concerns. So, for instance, you have students being held back a grade, 30-40-50% more the year prior to a High-Stakes Test, so what’s that saying? It’s basically telling us that students are consciously, deliberately, being held back because the prospect of them actually passing that test the following year presumably goes up if they have that one extra year to prepare.

So, those are some of the more insidious and some of the more difficult problems. So, in Ontario in Canada we have a somewhat different system where the stakes are higher for students and lower for teachers. In the US the stakes are high for all primary stakeholders; students, teachers, administrators, so it’s not surprising that now we see a litany of problems as a result of High-Stakes Standardized Achievement Testing. But if you look at it from an international point of view there are jurisdictions, there are countries that use Standardized Achievement Tests effectively, but they don’t use them for high-stakes decisions, which is the ironical part of all of this.

Host: I know you talked a little bit before about how we’re not really sure exactly how to measure the quality of these tests. Are you saying using them as a metric of student achievement or the actual level of assessment that the test can achieve?

Louis Volante: Well, okay, let’s be clear. First of all, there are two basic types of Standardized Achievement Test. There is Norm-Referenced, which is basically based on a norm sample. So, if you -- you know what an IQ test looks like essentially and you get your IQ, it’s based on a norm sample. So, an average IQ is 100. A Standardized Achievement Test works the same way. They give it to a representative sample and then they can say, for instance, a child who is eight years old is really at the age level of a child who is 7.4 years of age, so they’re below where they should be. Conversely, the child’s eight years old but they’re reading at the level of a child that’s 9.2 years old. So, that’s a Norm-Referenced sample.

What we see more of is the Criterion-Referenced Tests, which basically suggests that all students can be successful because it’s against a set standard.

So, within a Norm-Referenced sample, by its very nature when you look at a normal distribution, half of the students will be below average and half will be above, think of the bell curve. Whereas with a Criterion-Referenced test it’s still theoretically possible for all of those students to reach that level. So, if, say, the state has five levels of proficiency, levels one to five, and the state standard is level three, it’s possible for 100% of the student population to get to level three.
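The contrast between the two test types can be illustrated with a minimal sketch. The six students, their scores, and both function names are hypothetical, and the median stands in for "average" here. Under the norm-referenced view, half of this group sits below the middle however well everyone does, while under the criterion-referenced view every student can meet a fixed level-three standard.

```python
from statistics import median


def below_median_share(scores):
    """Norm-referenced view: share of students scoring strictly below the group median."""
    mid = median(scores)
    return sum(s < mid for s in scores) / len(scores)


def meets_standard_share(levels, standard=3):
    """Criterion-referenced view: share of students at or above a fixed proficiency level."""
    return sum(level >= standard for level in levels) / len(levels)


# Hypothetical class of six students (illustrative values only).
raw_scores = [52, 61, 68, 74, 83, 90]
levels = [3, 3, 4, 4, 5, 5]

print(below_median_share(raw_scores))   # 0.5
print(meets_standard_share(levels))     # 1.0
```

Raising every raw score by the same amount leaves the first figure unchanged, which is the sense in which a norm-referenced ranking always places part of the group below the middle.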

So, those are the two main differences, but the question is, is well, when you use a Criterion-Referenced test you still have to have very good reliability and validity to those tests if you’re going to use them for important decision making purposes.

Reliability, I mean there’s different types of reliability, but if that test is taken -- essentially what it means is if that student takes that test on a different day, at a different point in time, will they get a somewhat similar result? If they do then it’s a reliable measure of student achievement, but that doesn’t necessarily mean it’s a valid one, and here’s the crux of the issue. I mean the test could be, two, three, four students can get similar results over time, but that doesn’t necessarily mean that it’s going to be a good predictor of how well they are in that particular subject area.

So, for instance, in language arts, how well do students’ marks -- students’ achievement levels on a Standardized Achievement Test -- line up with their classroom-based grades? So, they might get very low or very high marks, so you can overemphasize or underemphasize how well a student does based on that one measure at one point in time. The analogy that I often use is if you go to a doctor and he takes your blood pressure, well it’s likely that he won’t give you a pharmaceutical drug in order to correct what might be high blood pressure based on one reading, right?

Host: Right.

Louis Volante: He’s going to take your -- he’s going to use a variety of different factors to determine whether you have high or low blood pressure.

In the same way in schools we need to look at a variety of different types of information before we can make sweeping statements about individual students, schools and even districts. Unfortunately, No Child Left Behind has put a really heavy emphasis on doing well on a particular test.

I guess the other issue that we need to consider in terms of reliability and validity is if you’re teaching to the test then you’re really eroding the predictive validity of it. You know what I mean by teaching to the test, I’m talking about having mock examinations with sample tests from previous years. There’s a number of individuals that would argue that’s an unethical and educationally indefensible approach to Standardized Achievement Testing.

So, that’s a separate issue as well, and we know this when we look at research across the US, Canada, Australia, the UK, New Zealand to a certain extent, the higher the stakes of the test go the more you are going to see teaching to the test, and other types of abuses. What would be another example of abuse is sending a student home the day of a test or cueing them to the correct response, I mean there is a lot of research in the US that actually talks about this sort of thing happening.

So, my argument is that the higher the stakes of the test the greater the teaching to the test and an inverse relationship for the test reliability and validity, that the validity of the test goes down precisely because you are teaching to the test.

Host: Right.

Louis Volante: Those are a few of the issues. I mean I haven’t hit all of the issues, there’s certainly a lot more that go into that; things like test-wiseness, which tends to favor particular groups of students and particular groups of students from certain ethnic and cultural backgrounds.

Host: Right, where certain students are just more familiar with certain words, word groups, or even certain topics in other words.

Louis Volante: Absolutely, absolutely, certain students are always going to do better on tests than others. I mean I teach, I coordinate assessment and evaluation at my university and I do a number of exercises with the students, and I’ve shown them, for instance, how it’s possible to presumably pass a multiple choice test without knowing anything related to that subject area just by test design, for instance.

So, a very simple thing to do is if you’re not sure go with the longest answer response. The irony is I had a student bring in a Standardized Achievement Test and present some of those items in class. As she was presenting them the students were going with the longest response, and it was funny because the longest response was correct four out of the five times, and it was a test on Physics which none of the students had any background in.

So, I mean those types of skills need to be also considered and those types of design principles need to be considered. I would not say that Standardized Tests haven’t been designed appropriately, but what I would say is certain students have much more capacity with test writing than others.

When you teach to the test it obviously makes more sense that certain students are going to do better than others, but unfortunately, we have to consider well, what’s the kickback or the negative implications for doing that sort of thing? Well, when you teach to the test that means presumably other parts of the curriculum get squeezed out, and we know that as well, the research is fairly clear.

I think the RAND Corporation, which is a nonprofit organization in the US and they do a lot of good work looking at test-based accountability systems, just came out with a report fairly recently and looked at the impact of No Child Left Behind in three states. What they found is that those non-tested subject areas were indeed squeezed out of the curriculum. So, prior to a High-Stakes Test teachers would spend less time on non-tested subject areas; music, physical education, visual arts, drama, social science, etcetera, you get the picture.

Host: Is there any serious discussion about possibly going to like a portfolio sort of approach?

Louis Volante: Well, I mean there has been this discussion, a lot of people have advocated for it. You have the School Redesign Network out of Stanford and Linda Darling-Hammond has done some work and she looked at every state, looked at all 50 states and the 38 states that showed improvements in test scores were ones that used multiple measures. Well, multiple measures meaning not just Standardized Achievement Tests, but more authentic based types of assessments.

So, the research is clear on this that when you use different types of assessment approaches within a classroom students benefit, not only in terms of student learning and achievement, but teachers as well because it broadens their assessment repertoire. A lot of teachers already use those approaches, but if you’re sending conflicting messages to teachers by saying okay, we want you to use all these different types of assessment approaches, but at the end of the year, at the end of the school year, the only one that really matters in terms of where you fit within, where your school fits within a district, where that district fits within a state, and maybe even merit pay tied to test scores which has been introduced in parts of the US, it’s sending a conflicting message because it’s saying yes, this is valuable, but in the end it really doesn’t count for much.

Host: Right.

Louis Volante: Because their job depends on it, the school depends on it. Schools can be reconstituted based on poor test performance. Where you start the race determines where you would finish it. So, some students are starting a 100-Meter Dash race with their blocks 10 meters into the race and others are starting from behind the start line. So, some kids come to school already knowing how to read and write, because of socioeconomic and other cultural issues that need to be considered.

I know there has been some work looking at controlling for those and what we would consider extraneous variables, so the single best predictor of performance on Standardized Achievement Tests to this day is still SES, it’s above 40%. A significant part is teachers as well and how they teach to those students, but we still have to be able to control for those -- for such a huge amount of variance when we look at interpreting those results and tying implications to those results, implications such as school funding, such as bonus pay, etcetera.

Host: It also seems that -- I feel like Standardized Testing is also a very political issue. It seems often that it’s coming not from the school systems themselves but the government, the wider government, so is this really…

Louis Volante: Absolutely, when you look at the impetus for a lot of these educational reforms they’re not bottom-up, it’s top-down and it’s driven by those with positions of power. I wouldn’t necessarily dismiss an educational reform just because it’s one that’s top-down driven, what I would say is that it will never be successful if it’s not embraced by those that are directly affected in practice. What I mean by that are teachers.

There was just a poll that came out that was published by Educational Testing Service, ETS, which is your largest testing body in the US. There were a number of important findings from their survey, but one of the key findings was something like 77% of educators still had serious concerns with No Child Left Behind. No significant large scale reform can be successful if you don’t have teachers and administrators on board. That doesn’t necessarily mean that they have to wholeheartedly agree with every single aspect of a significant reform proposal, but they have to have some input in the design of it and they have to be on board for the major aspects of it. The major aspects of No Child Left Behind are the testing that takes place from grade three to eight and AYP, Adequate Yearly Progress, and all those sorts of things.

I guess what I would like to see is a little bit more attention to broader notions of student learning and achievement, because I think what we’re focusing right now on is student performance, not necessarily student learning.

Host: Right.

Louis Volante: What I mean by that is that a student’s test score doesn’t necessarily represent how much they’ve learned. Smith & Fay in the US have shown some research that if you teach a certain way, teaching to the test that is, for instance, the school can look half a year better than a comparable school that didn’t adopt those types of test practices. Now, what are we looking at here? Are we really looking at student learning or are we looking at student performance? Sometimes those two are one and the same, but sometimes they are not.

So, I don’t see a lot of methodological sophistication around looking at all these sorts of issues when it comes to large scale assessment, but I would definitely want to say one thing, that I do support the use of large scale assessment as a way to promote school improvement. My main concern is with the way those test scores are being utilized right now and reported to the public, and that’s the thing that I fundamentally disagree with. For instance, rank ordering of schools based on raw test scores is a no-no in my mind. It’s a no-no not only from an ethical point of view but even from a measurement point of view. When you factor in measurement error those raw score differences essentially disappear for a lot of schools.
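The measurement-error point can be made concrete with a rough sketch. Everything here is an assumption for illustration, not a method named in the interview: the three schools and their numbers are hypothetical, each school has a mean score with a known standard error, the two estimates are treated as independent, and a gap only counts as real if it exceeds about 1.96 standard errors of the difference (a conventional 95% threshold).

```python
import math


def distinguishable(mean_a, se_a, mean_b, se_b, z=1.96):
    """True if the gap between two school means exceeds z standard errors of the difference."""
    # Standard error of a difference between two independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) > z * se_diff


# Hypothetical schools: (mean raw score, standard error of that mean).
schools = {"A": (71.0, 2.0), "B": (70.0, 2.0), "C": (60.0, 2.0)}

print(distinguishable(*schools["A"], *schools["B"]))  # False
print(distinguishable(*schools["A"], *schools["C"]))  # True
```

In this example a raw-score ranking would place school A above school B, even though the one-point gap between them is well inside the measurement error.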

Host: Right, exactly.

Louis Volante: The best way I can explain this to you is if you put all of the basketball players on the San Antonio Spurs up against the wall, and you said, I want you to organize yourself from shortest to tallest. Then you basically take the shortest person and you say, your rank ordering is one and the tallest person is at a rank ordering of 99th percentile. 99th percentile, why? Because 99% of the people on his team are shorter than he is. Do they reflect the average height of the average person in North America? Absolutely not, but when you rank order people from lowest to highest in that manner you’re accentuating what could potentially be small differences between them. You see my point?

Host: Yeah, exactly, some of it’s statistically not significant at all.

Louis Volante: It always produces predictable winners and losers. I mean whenever you rank order you have someone from lowest to highest, there is always going to be predictable winners and losers. So, for instance, if there’s 20 schools in a district and your rank ordering is nine, you’re below average, and if your rank ordering is 19th you’re above average, but it’s possible if you take that same school and throw that school into another district we can reverse the trend. The school that was 19th is 10th and the one that’s 10th becomes 19th. So, always looking at ordinal rankings has a way of pitting schools against schools. If you do that then how are you supposed to promote collegiality, not only amongst schools, but within schools.

If I am teaching in a grade six classroom and another teacher is teaching in a grade six classroom, is there any benefit to me sharing resources with that teacher if we’re competing for bonus pay based on test scores? I mean how is that supposed to promote sharing? Now, I know there has been some experimenting around instead of individual teachers we’re going to give bonus pay based on teams of teachers, but I still think it fundamentally implies that in order to improve schools we have to use the same type of model that we use in the business world in terms of improving productivity. Schools and products are not one and the same and they don’t operate according to the same principles.

Host: Now, I know I’ve said a lot today but I would love to hear from you. If you have any comments, questions or concerns, please feel free to leave me a comment or email me at teachingforthefuture@gmail.com. If you want to help produce the next show you can go over to teachingforthefuture.pbwiki.com and there’s also a link at the home page at teachingforthefuture.com. I’d like to thank my guest Dr. Louis Volante as well as Aaron Smith and Whitney Hoffman for their help with the research for this podcast. Thanks a lot and please stay subscribed.

Total Duration: 26 Minutes


Thanks to Aaron Smith and Whitney Hoffman for their help with the research for this episode.

If you want to help out or participate with Teaching for the Future you can leave a comment on the homepage or link to us on your blog or podcast. If you want to get in touch, feel free to email at teachingforthefuture@gmail.com.


Tuesday, June 19, 2007

TftF 76: Teaching for the Future 1.9

Mp3

News:

São Paulo: The City That Said No To Advertising
from BusinessWeek.com: It has been a year since the No Advertising initiative in São Paulo was signed into law and it seems to be working out well. The lack of ads has allowed residents to really appreciate the city in a way they were never able to before. I wish Boston or New York would try something like this; imagine what we might be missing.

Superintendent’s Speech Stirs Talk of Plagiarism from NYTimes.com: The now former Superintendent of Schools in Fort Lee, NJ was caught stealing her speech to the National Honor Society. Not only did she swipe her speech from the web, but she swiped it from About.com. To many of you this may be evidence of how easy it is for students to steal and plagiarize from the web, but all I can think about is how any high school student worth his or her salt would have at least changed some of the speech.

If nothing else, I hope the students in the National Honor Society learned the importance of being honest and original in their work.

PFT #92 from Podcast for Teachers: I heard about the above New York Times story from the Podcast for Teachers, hosted and produced by Mark Gura and Kathy King. They were both flabbergasted by the fact that someone in education could make such a mistake. However, there was one thing that I found irritating. Dr. King was amazed that the administrator took the content and presented it as her own, and she was also upset that the administrator ignored About.com's User Agreement. According to Dr. King, the user agreement clearly stated that the user needs to ask permission to use content.

The user agreement does in fact state that "You agree not to modify, reproduce, retransmit, distribute, disseminate, sell, publish, broadcast or circulate any such material without the written permission of About.com or the appropriate affiliate."

And that's when my ears really perked up. I can't believe that any website could realistically expect others to ask permission to quote from their publicly available material. I think it is wrong to reproduce whole articles, but content creators need to be open to other content creators borrowing and sharing ideas.

I do not plan on writing About.com for permission to quote their user agreement, but I did let Mark Gura and Kathy King know that I was using a clip from their show as a courtesy to them.

Upcoming:
Teaching for the Future 2.0 is on the way. Take a look at the wiki and try to get involved.



Saturday, June 02, 2007

TftF 75: Wiki for the Future!

NEWS:
'Honk for peace' case tests limits on free speech from the San Francisco Chronicle: Deborah Mayer is a former elementary school teacher who was fired for expressing her views about the war in Iraq in the classroom. It turns out that teachers are not protected by the First Amendment in the classroom, and this article also gives other examples of teachers who have been fired.

Trashed: Much left behind at colleges from Boston.com: College students are leaving more stuff behind at school than ever before as they graduate or move back home. I'm going to have to go to my local college and see what I can find.

iTunes U is a Glimpse into the Future of Higher Ed from the Financial Aid Podcast: Chris Penn pointed out the new iTunes U. A lot of great higher education content is available free on iTunes.

BRAND NEW!
Check out the new wiki where you can participate and help create future episodes of Teaching for the Future. You can leave links to stories, leave your comments, and help produce the show.

If you want to help out or participate with Teaching for the Future you can leave a comment on the homepage or go over to our wiki at teachingforthefuture.pbwiki.com. If you want to get in touch, feel free to email at teachingforthefuture@gmail.com.


Contact Me


Email: teachingforthefuture@gmail.com

AIM: davelamorte




Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 2.5 License.