
Thursday, April 25, 2013

Shuffle Quiz

We're in the midst of CST prep. I am bored. CST review is this weird game of picking out specific skills/topics, not because they're important so much as because they can be quickly recalled and practiced back to fluency.

Right now I'm leaning heavily on shuffle quizzes for my skills practice.

The basic idea is that students get a set of problems and work on them together. At spaced intervals a group member raises a hand and I come over, take the group's papers, and shuffle them. Whoever's paper gets pulled answers the checkpoint questions at the bottom on behalf of the group, and I sign off when they can move on.

The only thing I did differently in these examples is that instead of everyone working the same problem, each member of a group of four was assigned a specific problem (seat 1 did problem 1, seat 2 did problem 2, etc.). They split the big whiteboard into quarters and worked the problems. They took turns explaining and checking and then called me over. If the answers aren't correct I let them know and come back later. When they're working different problems I usually ask for just one problem to be explained, but it's never the explainer's own.

Some notes on use:

  1. These questions are pretty plain vanilla to emulate the glory of the CST, but I've used this strategy for more interesting questions. The more difficult the problem, the more students I assign to it. When I went to see Complex Instruction at Mission HS, the teachers used this strategy for nearly all of the group work.
  2. It's a bit of an art to balance how many consecutive problems students should try before I need to be called over. Too many problems and students go too long without checking in. Too few and I can't sit with any one group long enough and other groups are just waiting around. For reference, my typical class is 8 groups of 4. For the density/Archimedes one the pacing ended up a bit quick but just barely. Next year I'd probably eliminate the first checkpoint because the first two sets are straight plug and chug but keep the last check point. For the graphing practice it was about right. 
  3. Since this is review, I tried to cluster them into similar problem types.
  4. This year I used A/B/C/Redo but in previous years I've done a sign off with no score or plus/check/minus. I don't have a preference. Generally I just tell them I'll come back later if a student clearly isn't prepared. 
  5. I've signed each paper and also just signed the one and had students staple them all together. Again, no preference. 


Here are some sample pics:
[Images: four sample photos of student whiteboards and worksheets]

Notes: Graph A is the top right from this angle, Graph B is the top left, and Graph C is the bottom right. For the FBD scenario, we watched the clip on YouTube earlier.


I don't know why speed-time graphs are so much more difficult for my students. I assume it's because I present position-time graphs first and they get locked in. Or maybe it's just easier to visualize changing position than changing speed. If you have any insights I'd love to hear them.

Thursday, September 8, 2011

Information about the California Standards Test Part 2b

Part 1
Part 2

I haven't said anything about the students. This belongs in bold:

Do not make any placement decisions solely based on test scores. 

I know you do it. We do too. In fact, denying students an elective based on their state test scores is one of the joys of being in Program Improvement. But the state of California, perhaps in word but not deed, agrees. From the Post-Test Guide, and I quote:
Any comparison of groups between years should not be used for diagnostic, placement, or promotion or retention purposes. Decisions about promotion, retention, placement, or eligibility for special programs may use or include STAR Program results only in conjunction with multiple other measures including, but not limited to, locally administered tests, teacher recommendations, and grades. (page 13)
and then on page 14 they give almost the exact same quote directed towards individual students in a bolded and boxed callout:
Decisions about promotion, retention, placement, or eligibility for special programs may use or include CST or CMA results only in conjunction with multiple other measures including, but not limited to, locally administered tests, teacher recommendations, and grades.
WAIT!!! One more because this bears repeating (also on page 13)
While there may be a valid comparison to be made between students within a grade and content area, it is not valid to subtract a student’s or class’s scale score received one year in a given content area from the scale score received the previous year in the same content area in order to show growth. While the scale scores may look the same, they are independently scaled so that differences for the same students across years cannot be calculated using basic subtraction. 
Summary: You can't use test scores as the sole method for placement AND you can't use them to determine growth year to year. Wait... but what if a student scores Proficient one year and Basic the next? Shouldn't we put him in a second math class and catch him up? Noooooooooo. Here's the easiest example why.

2nd grade - 62%
3rd grade - 65%
4th grade - 68%
5th grade - 60%
6th grade - 52%

What's this? This is the percent of students in California proficient in math in 2010.

Either: California sticks all of its best teachers in 4th grade (unlikely); approaching puberty makes students universally worse at standardized tests (plausible); or the test gets harder as the grades go up (that's what this study argues).

Here's the percentage of kids scoring Advanced:

2nd grade - 36%
3rd grade - 38%
4th grade - 42%
5th grade - 29%
6th grade - 23%

(In 7th and 8th grade we start tracking the kids more. At many schools, advanced 7th graders end up in Algebra and then Geometry in 8th. Some kids even take Algebra 2. A direct comparison is harder.)

So a student drops from Proficient to Basic. Why? Hard to say. The test looks like it certainly got harder. With standard error and such he could have really been Basic last year. That's not to mention the inanity of deciding an entire year of coursework based on one 60ish question multiple choice test taken 80% of the way into the year. The test wasn't designed to make those kinds of inferences and you're left with too much doubt about the cause of the drop in score (or whether there even truly was a drop).

I didn't include any useful information about students because there isn't any. Well, there almost isn't any. You could probably tease out some information in conjunction with local assessments but really, what's the point? Right now I can tell you that Alondra got 9/13 correct in the "Functions and Rational Expressions" strand of her 7th grade math test. In order to figure out exactly what gaps she has, I need to give her a local diagnostic anyway. And that's really the whole point. If Alondra got an F in the class, I'm more concerned about that than her scoring Proficient. But for some reason, we think that because she scored Proficient on a single day of the year, that's more important than the other 181 days she was floundering. We let a single test override an entire year.

The hours that your school is going to spend dealing with this stuff could be spent doing something useful. You know, like standards-based grading.

Yeah I know. I came back to it. But really, isn't the complete and utter nonsense that is our current grading system a major reason for needing a standardized test in the first place? Right now we look and say, "Oh noes! Jason is basic this year! We've got to get him in extra math!" We have to do that because a B in Mr. X's class is not the same thing as a B in Mrs. Y's class. We don't trust that Jason getting a B in math means anything at all. But really, if we trusted the grades we could say, wait, Jason learned X, Y, and Z. He's fine. Or he still needs to learn Z but that's not going to require us to take him out of art or music the entire year.

If there's a moral here it's that we try to make the CST more than it is. The state of California does an awful job of communicating what the state test really does. I don't really think it's in the state's best interest to do a good job though. If more parents/students really knew what decisions were improperly being made off these scores, the state would have a lot of 'splainin' to do. I also think that, from our end, because we have to devote so much time and energy towards it, we feel the need to justify that.

I've gotten some valuable information out of the test. I've revamped a couple of units. I've gone and visited a few really good science teachers and programs. That's fine. But you're not going to get much more than that. Invest your time and energy into something more meaningful. Oh and never place kids solely based on test scores. Just don't.



Full disclosure: I'm not against standardized testing. I find it useful to have a third party to calibrate my class against. I live in absolute terror that I'm teaching down to my kids. It helps me set a baseline. What I am against is the high stakes part. Even disregarding the punitive aspects of the test, the high stakes nature (and subsequent fear of cheating) has caused California to hide any useful information we might get out of these things that take up so much of our time.

Wednesday, September 7, 2011

Information about the California Standards Test Part 2

Part 1

(You have my permission to skip this post. It's not all that interesting unless you're curious about the lengths we have to go to in order to extract any useful information out of the CSTs)

In the previous post I wrote about the most common ways I see our state test scores interpreted. You can't make a straight comparison of API year to year and you can't use it to track student growth from year to year. The quote from page 6 of the technical guide:
When comparing scale score results from the CSTs, the reviewer is limited to comparing results only within the same content area and grade.
The hard part is they're not *really* designed as anything more than a school report card. (Poorly designed, that is)

In this post we're going to see the two pieces of useful information I've been able to extract from the CSTs. Prepare to be underwhelmed!

You can't compare across grade or content, but you can compare districts/schools/teachers to each other as long as it's the same grade/content/year. Basically, you're down to comparing teachers who teach the same content at your own school and comparing departments in different schools.

We don't get item analysis or even standard-by-standard analysis. The smallest grain we get is by strand. For science, there are six. If I'm reading the Algebra test right, you get a whopping four strands. You'd think you could compare how your students did year to year by strand, but alas, California doesn't norm the strands either, so you have no way of knowing if your students improved or if the strand just got easier. You can compare the percentage of students who were proficient in that year's strand to the mean California percentage. I've found it moderately useful to look at the difference between the two and look for big gaps. The problem of course is that California won't release enough detailed strand information for you to really tell what's going on. So my kids perform poorly in "Number Properties" or "World History and Geography." Lots of help there. You might as well tell me to "teach better" and expect results. Oh wait. That's exactly what happens.
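If you want to run that gap check quickly, here's a minimal Python sketch. The strand names and every percentage below are invented for illustration; swap in your school's numbers and the state means.

    # Percent proficient by strand: my school vs. the California mean.
    # Strand names and all numbers below are invented for illustration.
    school = {"Strand 1": 62, "Strand 2": 55, "Strand 3": 48,
              "Strand 4": 71, "Strand 5": 44, "Strand 6": 51}
    state = {"Strand 1": 58, "Strand 2": 57, "Strand 3": 52,
             "Strand 4": 60, "Strand 5": 55, "Strand 6": 50}

    # Sort by gap so the big positives (brag) and the big negatives
    # (start a conversation) float to the ends.
    gaps = sorted(((school[s] - state[s], s) for s in school), reverse=True)
    for gap, strand in gaps:
        print(f"{strand}: {gap:+d}")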

I have, however, gotten good results from comparing strand information between teachers at the same school (or district if you're lucky). The key to this one is acknowledging that every teacher's class makeup is different. At my school, we send certain types of kids to certain teachers. Even if your school does scheduling randomly, chances are in any one year those classes aren't going to be balanced between teachers. If you acknowledge that upfront, you're more likely to get honest conversation rather than people being defensive about their test scores.
[Chart from cst2010 (version 1).xls: percent proficient by strand, by teacher]

Here's what ours looked like. We mostly tracked each other. However, the blue teacher seemed to teach the Solar System section better. I'd probably also say the blue teacher performed a bit worse on the Chemical Reactions section. So that's a good start to a conversation. "Hey Blue Teacher, take us through your Solar System unit." This is pretty fast and worth it.

The other useful information I've gotten is from comparing schools. Why compare schools? I visit a couple of schools each year to observe. An easy place to start is with the schools with excellent science scores. This is a tricky one because demographics are so huge here. I've got three years of test scores for the other middle schools in San Jose, and the percentage of students on free and reduced lunch and the percentage of students who are English learners both correlate at about -0.9 with every 8th grade test score. A correlation of ±1.0 would be a perfect line and 0 is a random scattering; -0.9 is really, really close to a perfect line.
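If you want to run the same check for your own area, the correlation itself is a few lines of Python (statistics.correlation needs Python 3.10+). Every number below is invented; the point is just the shape of the check.

    import statistics

    # Invented data: % free/reduced lunch and % proficient in 8th grade
    # science for a handful of schools.
    frl = [85, 70, 40, 25, 60, 90, 15]
    sci = [28, 35, 55, 68, 42, 25, 75]

    r = statistics.correlation(frl, sci)  # Pearson's r
    print(round(r, 2))  # strongly negative: higher poverty, lower scores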

I've gotten some good information out of graphing that relationship (science score vs. free and reduced and vs. % ELL) and looking at who stuck out. I visited 2 of those schools and exchanged a few emails with another. Unfortunately, it takes a while to gather that information and it's not all in the same place. You can cheat a bit by going to the CDE website and looking at which schools are listed as the 100 schools similar to yours. If nothing else, it's pretty interesting what the CDE considers "similar." Spoiler alert: They don't seem very similar to me at all.

A quicker way is to just gather a few of the other subject area tests and compare them. Again, you're just looking for a break in a trend. In 8th grade, you can take the History, Science, and ELA scores. Math is trickier because some kids take Alg, some take Gen Math, and some take Geometry (and a very few take Alg II).1


[Chart from cst2010 (version 1).xls: percent proficient by subject, by school]

The Red Line shows the state averages. The Yellow Line shows the county averages. You can see they track each other almost perfectly. Light blue is my school. Nearly every school I've looked at is reasonably close to that nice straight line. Yeah, we're a little better in science than expected, but before I start bragging, check out the purple line. They kick butt in science (also shown when I graph vs. % free and reduced). The light green school? If I'm a social studies teacher I'm going to go visit that school. As a side note, that light green school actually has a population that's 100% on free and reduced lunch and close to that for % ELL. Every other score across the entire school lines up along expected values. If I'm the principal of that school I'm going into the history classes, figuring out what the heck they're doing, and doing whatever I can to spread that. If I were a history teacher at a different school, I'd spend some serious time observing those classes.


(wait! This got too long. I'm going to split this up and talk about student placement in post 2b)


1: If you wanted to, you could do what California assumes and bump the Gen Math kids down a level (only count Advanced as Proficient) and then maybe take the Geo kids and bump them up a level. You might get a good approximation. 

Saturday, August 13, 2011

Information about the California Standards Test Part 1

I was going to do a post on questioning routines but I got distracted by a Twitter convo with David Cox and Jennifer Borgioli. It was about how the results of the California Standards Test (CST) can be used. This information is specifically for my California peeps.

Most of the information comes from the technical report. It's scary to look at but it's mostly skippable data tables so it's not a terrible read. There's also the API Information Guide. I'll try to remember to cite when I can but if something seems wonky, call me on it and I'll verify.

In Part 1 I'll explain the basics of test construction and API results.
In Part 2 I'll discuss the few useful pieces of information I've been able to extract from test results.
I don't think there will be a part 3, but if I get enough questions I'll see what I can do.

If you don't feel like reading, skip to "What are valid score comparisons?" That's the part you'll want to know. The more appropriate heading would be, "What aren't valid score comparisons?"

How is API calculated? 


There are adjustments for certain populations (need to look into this more; the adjustments may just be in order to norm base/growth years. Edit again: The population adjustments are just for finding Similar Schools.), but it basically comes down to a straight mean. Your kids score either Advanced, Proficient, Basic, Below Basic, or Far Below Basic. Advanced and Proficient are good. The rest are bad. Advanced earns 1000 points, Proficient 875, Basic 700, Below Basic 500, and Far Below Basic 200. As far as I know, the only weird thing is that a student not taking Algebra in 8th grade (i.e., taking the General Math test) gets bumped down a level in API points. So if she scored Advanced on the General Math test, she would earn 875 points for the school. (Edit: A ninth grader taking the General Math test gets bumped down two levels.) Additionally, 7th graders do not get a bump up for taking Algebra in 7th, and the same applies to 8th graders taking Geometry. After that, each test is weighted and the mean is calculated. The CAPA and CMA follow the same weighting rules as the CST. (edit: added)

From the API info guide (page 6):
[Image: Content Area Weights table]


In high school, the CAHSEE (our exit exam) is also factored in. The arithmetically minded may notice the large drop-off in points from Below Basic to Far Below Basic.

There is an Excel spreadsheet to help you estimate your API.
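If you'd rather script a rough estimate than use the spreadsheet, here's a minimal Python sketch of the calculation described above. The weights in the example call are placeholders; the real content-area weights come from the Info Guide.

    # Point values per performance band (from the API Information Guide).
    BANDS = ["Advanced", "Proficient", "Basic", "Below Basic", "Far Below Basic"]
    BAND_POINTS = {"Advanced": 1000, "Proficient": 875, "Basic": 700,
                   "Below Basic": 500, "Far Below Basic": 200}

    def bump_down(band, levels=1):
        """E.g. an 8th grader on the General Math test counts one band lower."""
        return BANDS[min(BANDS.index(band) + levels, len(BANDS) - 1)]

    def api_estimate(results):
        """results: a list of (band, weight) pairs, one per test result.
        The API is just the weighted mean of the point values."""
        total = sum(BAND_POINTS[band] * weight for band, weight in results)
        return total / sum(weight for _, weight in results)

    # Toy example with made-up weights: an Algebra Proficient, an 8th grade
    # General Math Advanced (bumped down to Proficient), and a science Basic.
    print(api_estimate([("Proficient", 1.0),
                        (bump_down("Advanced"), 1.0),
                        ("Basic", 1.0)]))  # (875 + 875 + 700) / 3 = 816.67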

How is the test constructed and scored?


It's a lot. I'll give you the highlights. Tara Richerson has an excellent series on test construction and she's got actual experience at it. Pay attention to how they're anchored to the previous year's test.

There are two things that really interested me. The first was how cut scores for the different levels of proficiency were created. I'm just going to snip and let you read. From the technical report (257):

[Snipped image from www.cde.ca.gov/ta/tg/sr/documents/csttechrpt2010.pdf]


The Modified Angoff is used for the ELA tests and the Bookmark Method for the rest. Nutshell: a panel reads the questions and estimates what a barely proficient/basic/etc. student would get right, and the median of the panelists' estimates becomes the cut score. Science uses the Bookmark Method, so the questions are ordered from easiest to hardest. Someone then says, "I think a barely proficient person would get questions right up to this one about 2/3 of the time and miss the ones after it about 2/3 of the time." A bunch of those people are asked and the median becomes the cut score. ELA works basically the same way except panelists rate each question individually and the cut score is computed from those ratings. The cut scores and all raw scores are then matched to a table to align the scale scores from year to year (actually they only really align in two-year pairs). This isn't useful to know at all, but I just find it really interesting.
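To make the Bookmark Method concrete, here's a toy sketch in Python. The panelist placements are invented; the real process involves a lot more procedure, but the median step is exactly this.

    import statistics

    # Questions ordered easiest -> hardest. Each panelist marks the number
    # of the last question a barely-Proficient student would get right
    # about 2/3 of the time. (Placements invented for illustration.)
    bookmarks = [34, 31, 36, 33, 35, 32, 38]

    # The median placement becomes the raw cut score for Proficient,
    # which is then mapped onto the scale-score table.
    print(statistics.median(bookmarks))  # 34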

The second thing I'm pointing out is actually useful.  Based on the test results, CA has generated proficiency level descriptors. If I recall correctly, these were generated based on a few years of test results and so are supposed to be things that, for example, a Proficient science student actually knows. These are useful, especially for those of us who need to decide on the level of depth for our standards. It's located here and the good stuff starts in appendix A. 8th grade science starts at A-102. Here's an example:
[Image: sample proficiency level descriptor from www.cde.ca.gov/ta/tg/sr/documents/pldreport.pdf]

What are valid score comparisons?


There are two main ways people (teachers, parents, admin, everyone) mess this up. People think you can compare scale scores from year to year and that you can compare API scores year to year. You can't do either. This is crucial to understand.

In the example that got this started, Student A got a perfect 600 in 7th grade and a 550 in 8th grade. It's a natural question to ask why the student dropped from 7th to 8th. You can't, though. California does not vertically align its scores. A 550 one year has no relation to a 550 another year. Additionally, a 550 has no relation to a 550 in a different content area in the same year. You CANNOT make this comparison.

Horse's mouth (Technical Report, 6):
[Snipped image from www.cde.ca.gov/ta/tg/sr/documents/csttechrpt2010.pdf]

You are fine comparing the same year/content to other classes, schools, districts, the state. Anything else, and I mean anything else, isn't valid. Jennifer tweeted this link out earlier. If you take a look at the graph you'll see certain test cut scores are harder than others. MS math scores will be lower than elementary scores because our tests are harder to score proficient on.

If you go back up to how the test cut scores are created, you'll notice they're defined for a "Proficient Algebra student" or a "Proficient Science student." They are not scored based on growth from the previous year. Some states do that.1

The API results are similarly misleading. You'd think you could just look at your school's API each year and see if it goes up. Turns out, you can't. That's because how the API is calculated varies each year: for example, the weights of the different tests and which tests are included. So a 2006 API score can't be compared to a 2011 score. It makes sense when you think about it, but it's completely unintuitive, and everybody in the entire world thinks you can create a line graph and see how your school is doing.

You CAN compare between Base and Growth APIs. These will be matched (page 14 of the Info Guide):

[Snipped image from www.cde.ca.gov/ta/ac/ap/documents/infoguide10.pdf]

and you CAN compare the growth from year to year. Take the Growth API and subtract the Base API. Also on page 14.

[Snipped image from www.cde.ca.gov/ta/ac/ap/documents/infoguide10.pdf]
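For example (numbers invented): if your school's Base API for a cycle is 720 and the matching Growth API is 745, your growth for that cycle is 745 - 720 = 25 points.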

Repeating myself in case you missed it: Base and Growth scores in the same cycle compare different years' test scores with the same calculation method. If they are not in the same cycle, they could (and likely do) use a different method of calculation. It is not valid to compare API scores in different cycles.

Does anyone know this? No. Everyone, understandably, compares scores year to year. This is important to know though because if your scores take a dip, it might be because the calculation methods have changed. For example, until the 2010-2011 cycle, high school APIs didn't include the CMA, which is the modified test usually taken by students in SDC.


Summary: You can compare your student scores only within the same grade level and content area. You can compare API scores within cycles (Base to Growth) and you can compare growth between cycles. THAT. IS. IT.

In part 2, I'll write about what useful (for me) information you can get out of it.

1: Vertically aligned scores usually come in two flavors. Either the same score indicates the same equivalent level (if you score a 550 one year and a 550 the next, you made the equivalent of one year of growth), or the score works like the "Reading Level" reports we get and all students are scored on the same scale: one year you get a 400 and the next a 520, so you've made 120 points of growth that year. Smarter states will make one year equal 100 points so you can easily see if you made a year of growth.

Thursday, February 10, 2011

Make it Right

We've got about 6 weeks before our state tests so I just sent out an email to the other two teachers in my department detailing a list of standards they need to teach and how those standards will be assessed. I had to say things like, "Skip inertia because it is not tested." I'm feeling a little dirty right now. On the plus side, it reminded me of one of my favorite test prep techniques.

I call it Make it Right but I'm sure it goes by other names. The instructions are simple. After selecting the correct answer the students need to rewrite the question so that the other choices now become correct. Example:
After selecting A as the right answer, they then redraw the problem three times to make the answer correct for B, C, and D. 

This year, with my spanking new whiteboards, I'll just assign each kid a letter and have him/her take that letter for every question (Student 1 is responsible for all the A responses, Student 2 for all the B responses, etc). We can quarter the whiteboards and have the kids share around their four-person group.
Sometimes it gets a little weird like in this one:

There really isn't a situation (at least that would ever come up in our level of science) where anything but C would be correct. In this case I just ask them to justify why A, B, and D can't ever be the correct answer.

My only other guide is that they should keep the spirit of the question intact. Thus for:
I want them to try to adjust the numbers in the table, rather than, for example, write an F=ma problem where the answer is 13 N.

Even for the plug and chug questions, this exercise takes a good amount of understanding and number sense. I don't feel bad about doing it and when asked I can say that I've been test prepping so it's a win-win.

Any other test prep suggestions are welcome.



Disclaimer - I'm not a big believer in test prep, but my school is. We're a "persistently low-achieving school" according to various governing bodies, so I don't fault them for feeling the squeeze. One of the things the higher-ups in my district think might be helpful (editor's note: no comment) is to print out a copy of the released test questions [pdf] for every CST a student will be taking. So every 8th grader has 4 packets (math, ELA, science, social studies) crammed into the bottom of his or her backpack. Since my school spends 4 digits on these copies (editor's note: again, no comment) I feel I should use them somehow. They could have at least thrown me a bone by removing the multiple choice answers but alas...