The Future of Testing

Given how much the rest of education has changed since the middle of the 20th century, it’s remarkable that the model of large-scale student assessment we have today still looks much the way it did in the 1950s: a group of kids under careful watch, lined up in rows of seats in a rigidly controlled space, all asked the same questions, each silently bubbling in answer sheets under the same strict time limits.

To be sure, new technologies have been incorporated into standardized testing over the decades: machine scoring, computerized authoring and delivery, computer adaptive testing, technology-enhanced items, automated essay scoring, automated item generation. But these innovations—not all of them widespread; it’s still very much a paper-and-pencil world in most US schools—haven’t really changed the underlying testing paradigm. Whether computer- or paper-based, the tests still consist mostly of multiple-choice questions. They still require highly contrived and regimented conditions for administration. They still make use of the same measurement principles and techniques. They still are governed by the values of 20th-century industrialization: speed, uniformity, cost efficiency, quantifiability, and mass production.

This model of testing persists precisely because it so effectively delivers on machine-age promises of reliability and efficiency at a large scale. But these benefits come at a cost: the fixed content and rigid testing conditions severely constrain the skills and knowledge that can be assessed. A battery of multiple-choice and short-answer questions on a timed test may do a pretty good job of evaluating basic skills, but it falls short of measuring many of the most essential academic competencies: sustained engagement, invention, planning, collaboration, experimenting and revising, spit-shining a finished project—the kind of skills developed through authentic, substantive educational experiences.

Standardized testing has not kept up with advances in learning science. It ignores, for example, the non-cognitive skills that research today tells us are integral to learning—personal resilience, for example, or a willingness to cooperate. What’s more, we acknowledge today that students develop their academic competencies, cognitive and non-cognitive, in particular educational contexts in which their own varied interests, backgrounds, identities, and languages are brought to bear as valued resources. Conventional standardized tests work to neutralize the impact of these variables, rather than incorporate them.

We do need, and will continue to need, large-scale assessments, despite the many dissatisfactions we may have with them at present. Classroom assessment by itself doesn’t tell us what we need to know about student performance at the state or national level. Without large-scale assessment, we’re blind to differences among subgroups and regions, and thus cannot make fully informed decisions about who needs more help, where best to put resources, which efforts are working and which aren’t.

The central problem to address, then, is how to get an accurate assessment of a fuller range of authentic academic competencies in a way that is educative, timely, affordable, and scalable—a tall order indeed. Recognizing the limitations of the existing testing paradigm, the Every Student Succeeds Act (ESSA) of 2015 opened the door for a limited number of states to try out alternative models that might eventually replace existing accountability tests. Thanks in part to this opportunity, plus ever-advancing technologies, new ideas are in the works.

Here are some directions in which the future of testing may be headed:

Classroom-Based Evidence. The assessment of authentic classroom work can provide a fuller and more genuine portrait of student abilities than we get from the snapshot view afforded by timed multiple-choice-based tests. Indeed, portfolio assessments are widely used in a variety of contexts, from individual courses to district-level graduation requirements. Historically, however, they haven’t worked well at scale. Experiments with large-scale portfolio assessment in the 1990s were abandoned as they proved cumbersome and expensive, and as states found it difficult to establish comparability across schools and districts.

Hopes for using collections of authentic student evidence in large-scale assessments are being revived, however, as ESSA creates new opportunities for state-level change. The anti-standardized-testing group FairTest has developed a model to help guide state system innovations toward local assessment of classroom-based evidence. The model folds teacher-evaluated, student-focused extended projects into a statewide accountability system with built-in checks for quality and comparability. FairTest cites programs already underway in New Hampshire and elsewhere as evidence of where this approach might lead.

The FairTest model doesn’t rely on new technologies, but large-scale portfolio assessment potentially becomes more feasible today, compared with the low-tech version in the nineties, thanks to easier digitization, cheaper storage, and ubiquitous connectivity. More than mere repositories for uploaded student work, platforms today can combine creation and social interaction spaces with advanced data analytics. This creates opportunities for assessing new constructs (research or collaborative problem-solving, for example), gaining new insights into student competencies (e.g., social skills), and even automating some dimensions of portfolio assessment to make it faster and more affordable. Scholar, a social knowledge platform currently in use in higher education, provides a glimpse into the kind of environment in which large-scale e-portfolio assessment might someday take root.

Real-World Fidelity. Another shortcoming of multiple-choice-based standardized tests is that they do not present students with authentic contexts in which to demonstrate their knowledge and skills. More authentic tasks, critics argue, better elicit the actual skills associated with the constructs measured, and thus lead to more valid test score interpretations.

Computer-based tests create opportunities for item types that more closely resemble real-world activities, compared with traditional multiple-choice questions. Technology-enhanced items (TEIs) can, for example, allow students to manipulate digital objects, highlight text, show their calculations, or respond to multimedia sources. While such items fall short of replicating real-world activities, they do represent a step beyond selecting an answer from a list and filling in a bubble sheet.

Many computer-based versions of standardized tests now add TEIs to the mix of conventional items in hopes of measuring a broader range of skills and improving test validity. In truth, however, TEIs bring their own set of test development challenges. Though eager to use them, test-makers at this point do not know very much about what a given TEI might measure beyond a conventional multiple-choice question, if anything. Additionally, in their quest for greater real-world fidelity, TEIs can at the same time introduce a new layer of measurement interference, requiring examinees not only to demonstrate their academic ability, but also to master novel test item formats and response actions.

Despite their current limitations, however, technology-enhanced items will likely continue pushing standardized testing toward greater real-world fidelity, particularly as they grow more adept at simulating authentic problems and interactions, and better at providing test takers with opportunities to devise and exercise their own problem-solving strategies. The latest iteration of the PISA test, a large-scale international assessment, simulates student-to-student interaction to gauge test takers’ collaborative problem-solving skills. Future versions will connect real students with one another in real time.

Continuous Assessment. As tests evolve toward truer representations of real-world tasks, they will likely pick up a trick or two from computer-based games, such as Mars Generation One: Argubot Academy or Physics Playground. These games, like many others, immerse students in complex problem-solving activities. To the extent that conventional test-makers learn likewise to engage students in absorbing tasks, they will better succeed at eliciting the kinds of performances that accurately reflect students’ capabilities. When tasks lack relevance and authenticity they work against students’ ability to demonstrate their best work.

In addition to engaging their interest, computer-based educational games can continuously assess students’ performances without interrupting their learning. The games register a student’s success at accomplishing a task; but more than that, they can capture behind-the-scenes data that reveal, for example, how persistent or creative the student was in finding a solution.

As they develop, platforms delivering academic instruction might also automatically assess some dimensions of authentic student performance as it happens, without interrupting learning activities. The Assessment and Teaching of 21st Century Skills project, from the University of Melbourne, provides an example of how an academic platform can capture log stream and chat stream data to model and evaluate student activity. This kind of “stealth assessment” creates opportunities for including non-cognitive competencies—e.g., level of effort, willingness to contribute—in the overall picture of a student’s abilities.

Inclusion. To achieve statistical reliability, conventional standardized tests demand rigorously uniform test-taker experiences. Accordingly, the tests have always had a hard time accommodating examinees with special needs. Education today, however, is moving steadily away from uniformity, toward greater inclusion and accommodation of the whole community of learners, including those with various physical, learning, and language differences.

Computer-based testing presents both opportunities and challenges for accessibility. On one hand, special tools, such as magnifiers and glosses, can be built into standard items. On the other, TEI formats using color, interactivity, response actions requiring fine motor skills, and other features can be difficult or impossible for some test takers. Nevertheless, research suggests that, overall, the digital testing environment can improve access to testing for students with disabilities.

Among the challenges to inclusivity in US testing is the problem of evaluating students who are learning English against standards that assume they already have English language skills. According to Professor Alida Anderson of American University, this problem highlights the need for future assessment systems to be more flexible, not only in the design and delivery of test content, but also in the interpretation and use of standards. Towards that end, programs such as the New York State Bilingual Common Core Initiative are developing bilingual standards and learning progressions that align with English language-based standards frameworks. These efforts promise a fairer and more accurate interpretation of test results for more students.

My own company, BetterRhetor, is combining some of the innovations discussed above in an effort to overcome the limitations of conventional testing. Our web-based platform, for use in classrooms, will deliver five-week instructional modules in Writing and STEM. Assessment of student performance is facilitated by the platform and integrated into instruction. The modules will teach, elicit, capture, and assess not only cognitive skills, but also social and personal competencies. Because students engage over an extended period, we’ll be able to supply actionable feedback, as well as indications of progress. Our overall goal is to provide teachers and schools with a highly effective instructional resource that generates a rich portrait of their students’ authentic abilities.

These kinds of innovation will likely require parallel innovations in measurement science if they are to take hold in large-scale assessment. Test reliability, for instance, might be reframed in terms of negotiated interpretations by panels of local stakeholders, instead of statistical correlations among test scores. Determinations of validity may need to consider how well a test elicits fair and authentic performances from the full complement of learners in an educational community. Comparability across schools and districts may need to take into account the degree to which an assessment supports not just institutional needs but also student learning.

Ideally, future forms of large-scale assessment will function as integral dimensions of learning itself, rather than interruptions or intrusions. They’ll both evaluate and reinforce the full array of knowledge and skills required for the successful completion of real academic work in real educational contexts.

© 2018 BetterRhetor Resources LLC

(This post was originally written for and published on Getting Smart.)


Why Are Standardized Tests So Boring?: A Sensitive Subject

It is a guiding principle in test development that stimulus materials and test questions should not upset test-takers. Much like dinner conversation with in-laws, tests should refrain from referencing religion, or sex, or race, or politics—anything that might provoke a heightened emotional response that could interfere with test-takers’ ability to give their best effort.

Attention to “sensitivity” concerns, as they’re known, makes sense conceptually. But in practice, as they shape actual test development, sensitivity concerns are responsible for much of why conventional standardized tests are so ridiculously bland and unengaging. The drive to avoid potentially sensitive content constrains test developers to such a degree that one might legitimately question whether the cure is at least as bad as the disease.

So determined are test-makers to avoid triggering unwanted test-taker emotions, they end up compromising the validity of their tests by excluding essential educational content and restricting students’ opportunities to demonstrate the creative and critical thinking skills they’re actually capable of. In other words, ironically, conventional standardized tests may be so radically boring that they’re no better at measuring actual ability and achievement than if they regularly froze test-takers solid with depictions of graphic horror.

Actually, no one knows for certain if the tests are better or worse for being so cautious. There is no research defining sensitivity, no evidence-based catalog of topics to avoid, no study measuring the test-taking effects of “sensitive” content. For all anyone knows, inflaming emotions might actually improve test results—though few test-makers would risk experimenting to find out.

No test-maker wants to hear from a teacher or parent that a student was stunned, enraged, offended, or even mildly disconcerted by content they encountered on a test. And in fairness, no test-maker wants to subject a test-taking kid to a hurtful or upsetting experience. They’re captives, after all; if something on the test makes them feel crappy, they have little choice but to sit there and absorb it. Their scores may or may not reflect the fact that their emotions were triggered: there’s really no way to tell.

On the other hand, high-stakes standardized tests, in and of themselves, trigger lots of negative emotions in plenty of kids, regardless of question content. So a cynic might wonder how much sensitivity concerns are driven by concern for kids’ experience, and how much by fear of the PR nightmares that would ensue from a question or passage that someone could claim was racially or religiously offensive. Whatever the case, the result is the same: keep it safe by keeping it bland.

Since there is no research to guide decisions on sensitivity, the rules test-makers set for themselves are based strictly on their own judgment, and on some sense of industry practice. Inevitably they default to the most conservative positions possible: if a topic might conceivably be construed as sensitive, that’s enough reason to keep it off the test.

Typically, sensitivity guidelines steer test developers away from content focused on age, disability, gender, race, ethnicity, or sexual orientation. Test-makers also avoid subjects they deem inherently combustible, such as drugs and drinking, death and disease, religion and the occult, sexuality, current politics, race relations, and violence.

A “bias review” process gets applied in the course of developing passages and questions for testing, to weed out anything that might be offensive or unfair to certain subgroups—typically African Americans, Asian Americans, Latinos, women, and sometimes Native Americans. The test-maker will send prospective test materials out for review by qualified educators who belong to these subgroups. If a reviewer thinks a test item is problematic, it gets tossed. Though this process is better than nothing, it reflects more butt-covering than enlightenment, putting test-maker and reviewer alike in the awkward position of saying, for instance, “These test items are not unfair to Black people. How do we know? We had a Black person look at them!”

Judgments on topics not pertaining to identity and cultural difference rest purely with the test-makers, who, as mentioned, are as risk-averse as can be. In one example I’m familiar with, a passage about the mythological Greek figure Eurydice was rejected because the story deals with death and the underworld. Think of all the literature and art excluded from testing by that kind of criterion. Think of the impoverished portrait of human achievement and lived experience conveyed to students by such an exclusion.

In another case, a passage on ants was rejected because it reported that males get booted out of the colony and die shortly after mating. I’m still not clear on whether the basis for that judgment centered on the reference to insects mating, insects dying, or the prospect of a student projecting insect gender relations onto human relations and being thereby too disturbed to think clearly. Whatever the case, rejecting such a passage on the basis of sensitivity concerns seems downright anti-science.

As does the elimination of references to hurricanes and floods because some kids might have experienced them. I remember a wonderful literary passage that depicted a kid watching his family’s possessions float around the basement when their neighborhood flooded. It was intended for high schoolers. It got the noose.

I’ve seen a pair of passages from Booker T. Washington and W. E. B. DuBois nixed out of concern for racial sensitivity: you can’t have African Americans arguing with each other on questions of race. Test-makers strive to include people of color in their test content to satisfy requirements for cultural inclusivity. But those people of color cannot be engaged in the experience of being people of color—which renders the whole impulse toward inclusivity hollow and cynical. Such an over-abundance of caution does more to protect the test-maker than the student.

The content validity of educational assessments that cannot reference slavery, evolution, extreme weather events, natural life cycles, economic inequality, illness, and other such potentially sensitive topics should come under serious interrogation. More concerning still is the prospect of such tests driving curriculum. With school funding and teacher accountability riding on standardized test scores, teaching to the test makes irresistibly practical sense in many educational contexts. Thus, if the tests avoid great swaths of history, science, and literature, then so will curriculum.

The makers of the standardized tests schoolkids encounter argue that they are not interested in censoring educational content, only in recognizing that when students encounter potentially sensitive topics they need the presence of an adult to guide them through. The classroom and the dinner table are places for negotiating challenging subjects, not the testing environment, where kids are under pressure and on their own.

This rationale should rouse everyone to question why we continue to tolerate such artificial conditions for evaluating student learning. It essentially concedes that testing doesn’t align with curriculum, that kids will not be assessed on the things they’re taught—only on the things test-makers decide are safe enough to put in front of them. Further, it admits that test-makers compromise the content validity of their tests in deference to the highly contrived testing conditions they require. Surely we can recognize in this the severe design flaws that lie at the heart of the testing problem.

Obviously, insulting or traumatizing students with test content is something to be avoided. But at the same time, studies show that test-taker engagement is essential for eliciting the kinds of performances that accurately reflect students’ capabilities. When tasks lack relevance and authenticity they work against students’ ability to demonstrate their best work, especially students from underserved populations. Consider this statement:

Engagement is strongly related to student performance on assessment tasks, especially for students who have been typically less advantaged in school settings (e.g. English Language Learners, students of historically marginalized backgrounds) (Arbuthnot, 2011; Darling-Hammond et al., 2008; Walkington, 2013). In the traditional assessment paradigm, however, engagement has not been a goal of testing, and concerns about equity have focused on issues of bias and accessibility. A common tactic to avoid bias has been to create highly decontextualized items. Unfortunately, this has come at the cost of decreasing students’ opportunities to create meaning in the task as well as their motivation to cognitively invest in the task, thereby undermining students’ opportunities to adequately demonstrate their knowledge and skills.

In my own experience interviewing high schoolers about writing prompts, they want to write about Mexican rappers, violence in videogames, representations of gender and race in popular culture, football concussions, gun ownership, the double-standard dress codes schools impose on girls compared with boys, and other topics that are both authentic and relevant to them. Conventional standardized tests would not come near topics like these.

Any solution to this problem has to entail breaking away from the dominant, procrustean model of standardized test-taking, which isolates individual students from all resources and people, asks them to think and write on topics they may never have encountered before and care nothing about, and confines them to a timeframe that reflects the practical considerations of the test-maker, not the nature of authentic intellectual work.

Once free of the absurdly contrived conditions of conventional test-taking, sensitivity concerns can be removed from the domain of test-makers worried about their own liability. Instead, along with their teachers and guardians, students can decide what topics are appropriate to grapple with in their academic work. In fact, learning to choose, scope, and frame a topic in ways appropriate for an academic project is itself an essential skill, worthy of teaching and assessing.

Arbuthnot, K. (2011). Filling in the blanks: Understanding standardized testing and the Black-White achievement gap. Charlotte, NC: Information Age Publishing.

Darling-Hammond, L., Barron, B., Pearson, P. D., Schoenfeld, A. H., Stage, E. K., Zimmerman, T. D., … & Tilson, J. L. (2015). Powerful learning: What we know about teaching for understanding. John Wiley & Sons.

Walkington, C. A. (2013). Using adaptive learning technologies to personalize instruction to student interests: The impact of relevant contexts on performance and learning outcomes. Journal of Educational Psychology, 105(4), 932.

© 2016 BetterRhetor Resources LLC

(This post was featured as a guest post on CURMUDGUCATION.)


Is the ACT a Valid Test? (Spoiler Alert: No.)

ACT, Inc. released the results of its 2016 National Curriculum Survey earlier this year. The Survey goes out every three or four years to elementary, middle school, high school, and college teachers, as well as to workforce professionals. It collects information about what respondents are teaching, how they teach it, what they care about, and so forth. It serves as the basis upon which ACT builds its tests.

Because the Survey provides a look into both what pre-college students are being taught, and what they need to know to be prepared for college, it is a useful tool for examining the serious and persistent problem of college readiness—why it is that the majority of high school graduates are underprepared for college-level academic work. ACT itself reports that 72% of its test-takers fall short of at least one of its college-readiness benchmarks, which confirms the widespread underpreparedness reported by other sources. And indeed, ACT’s National Curriculum Survey reveals wide disjunctures between high school teaching and college expectations, which may have something to do with why students aren’t better prepared.

But ironically, by pointing out these disjunctures, the Survey raises questions about the validity of the ACT exam itself. The ACT is a test that straddles the space between high school and college, claiming to be both reflective of high school curricula and a measure of college readiness. But if ACT’s own Survey reveals that high school curricula do not align with college expectations, how can the ACT validly claim to measure both?

Tests are all about validity. Their value and utility depend upon them actually measuring what they purport to measure. If a test does not actually assess what it purports to, then it’s not a valid test, and any inferences made based on its results are faulty—inferences such as “this kid has been taught the skills needed for college success but hasn’t learned them very well.”

The two claims ACT, Inc. makes about the ACT test are at odds with each other, which calls into question the test’s validity. The claim that the test is “curriculum based” rests on Survey results, which ACT says serve as empirical evidence upon which it decides how to build the test. In this way, according to ACT, the test reflects what is being taught in high schools—an important claim, since testing kids on things they haven’t been taught doesn’t tell anyone much about their abilities.

ACT also, of course, claims that the test is a measure of college readiness. Through the Survey, it gathers an understanding of what college instructors expect from entering students. This understanding is reflected in ACT’s College and Career Readiness Standards, a set of “descriptions of the essential skills and knowledge students need to become ready for college and career.”

According to ACT, the Standards are validated by the Survey in a process that “ensures that our assessments always measure not only what is being taught in schools around the country, but also what demonstrably matters most for college and career readiness.”

But can the ACT test both what is taught in high school and what is expected in college if those two things don’t square up, as is suggested by their National Curriculum Survey and other research?

Perhaps there’s a significant degree of overlap. Perhaps ACT can identify and test students on those things that fall into both the learned-it-in-high-school category and the better-know-it-for-college category. ACT says indeed there is overlap, and that they have a way of figuring out what’s in it.

How do they do it? According to a 2015 white paper, “ACT first identifies what postsecondary faculty, including instructors of entry-level college and workforce training courses, expect of their entering students—that is, the knowledge and skills students need to be ready for entry-level postsecondary courses and jobs. ACT then compares these expectations to what is really happening in elementary, middle, and high school classrooms. ACT uses the results of these comparisons to determine the skills and knowledge that should be measured on ACT assessments and to guide its test blueprints.”

The company does not explain how this process of comparison works, but it implies that they identify a subset of knowledge and skills that fall into both camps, then simply test kids on that.

To feel confident in this process, we would need to be certain that the subset is sufficient in size and scope to support the dual claims. That is, we would need to know what lies outside the overlap slice, as well as what lies within. What is being taught in high school that does not appear on the test because it is not a college-ready expectation? Likewise, what college-ready expectations do not appear on the test because they are not being taught in high school?

Once we knew those things, then we could validate the ACT by answering this question: Is the overlap slice sufficient to support both the claim that the test measures what is being taught in high school and the claim that it measures college readiness?

In other words, is there enough of the high school curriculum on the test to justify calling it a valid measure of high school achievement? And are there, at the same time, enough college expectations on the test to justify calling it a valid measure of college readiness?

ACT doesn’t attempt to answer these questions. As far as the ACT is concerned, if you demonstrate proficiency on the test, then ipso facto you’ve both mastered your high school curriculum and are ready for college, because the claims they make for the test require that the two constructs be identical.

What if you don’t do so well on the test? Is it because you haven’t learned well enough what you’ve been taught? Or because you haven’t been taught what you’re being tested on?

The ACT simply doesn’t allow for the second possibility.

In point of fact, if high schools were teaching certain essential college-ready skills—how to revise your work in response to feedback, for example—a conventional standardized test like the ACT would never be able to detect it, because it cannot provide test-takers with opportunities to do the kind of authentic, extended, or collaborative intellectual work that will be required of them in college.

But alas, as mentioned already, plenty of research demonstrates that there is a significant difference between high school learning and college expectations, suggesting that any overlap might not be very robust. According to a six-year national study on college readiness from Stanford, “coursework between high school and college is not connected; students graduate from high school under one set of standards and, three months later, are required to meet a whole new set of standards in college.”

ACT’s own research confirms this. Two things jump out from the National Curriculum Survey results. First, as we can see from the table below, in many cases the Survey does not ask high school teachers and college instructors the same questions, so there is not much opportunity for determining where high school teaching lines up, or not, with college expectations. The Survey doesn’t look like a very good tool for comparing high school teaching to college expectations in Writing, for example.

The second thing is, where the Survey does provide opportunities for comparing high school with college, it finds that high school teaching does not align with college expectations. The Survey report points out, for example, that high school Writing teachers and college instructors are not emphasizing the same skills. Further, high school math teachers do not agree with college math instructors about what skills are important for success in STEM courses. Less than half of high school teachers believe that the Common Core math standards (which ACT stresses are in line with its own College Readiness Standards) match college instructors’ expectations for college readiness.

In other words, ACT’s own Survey shows that, to a significant extent, the knowledge and skills high school teachers are teaching are not the knowledge and skills college instructors are expecting of entering students.

Hence the college-readiness gap.

But if those two bodies of knowledge and skills aren’t the same, how can ACT support the claim that its test measures both what students actually learn and what ACT says they should learn for college readiness? The test doesn’t distinguish a “high-school-learning” part from a “college-requirements” part. As far as the test is concerned, it’s all the same.

In fact, ACT can’t really support both claims at the same time. But they make them anyway because they want to sell the test to two distinct markets. They want to sell it to students who are trying to get into college, so they call it a college-readiness test. And they want to sell it to states and districts for accountability purposes. These entities want to know whether their students are learning what they’re being taught; thus ACT calls the test curriculum-based.

But, we might wonder, don’t standards take care of all this? Standards, after all, both reflect the skills needed for college readiness and guide high school curriculum, right? Therefore, if the test aligns with the standards, then it’s both curriculum-based and a college-readiness indicator, because those are the same thing.

Most states have adopted the Common Core State Standards. Those that haven’t have concocted their own state standards, which are pretty much in line with the CCSS. In addition, ACT has its own College and Career Readiness Standards, which, it says, line up with both the CCSS and any non-CCSS state standards you care to throw at it. (As ACT says, “If a state’s standards represent the knowledge and skills that prepare students for college and career, then ACT Aspire and the ACT measure that knowledge and those skills”—a statement that manages to be both a non sequitur and a tautology.)

Again, however, ACT’s own research shows that neither high school teachers nor college instructors are much convinced that the CCSS reflect college-level expectations anyway. Asked by the Survey, “To what extent do you feel that the Common Core State Standards are aligned with college instructors’ expectations regarding college readiness?”, the majority of both high school and college teachers responded “little” or “slightly,” rather than “a great deal” or “completely.”

In other words, according to its own data, ACT shouldn’t really get away with equating standards-based “curriculum achievement” with “college readiness.”

So what’s the cost of the ACT’s tricky claim-game? The cost is that we get farther away from understanding and addressing the college-readiness gap, so long as everyone believes that the ACT is really measuring what it says it does.

Wherever high school curricula lack significant overlap with the skills and knowledge ACT identifies as necessary for college-readiness, the test measures not what students have learned but what they haven’t been taught. This, then, contrary to ACT’s claims, is not an indicator of student readiness or achievement, but a measure of the distance between high school teaching and college expectations (or at least those ACT identifies and can test for).

But this is not how the interpretation of test results plays out for either student or state customers. Rather, the inescapable inference for both is that the majority of students have been taught what they need to know but simply haven’t learned it well enough. It becomes the student’s fault, or the teacher’s fault, but never the test’s fault for leading everyone to a lousy inference.

The faulty inference that issues from the ACT doesn’t help matters where students’ future opportunities are at stake; prospective colleges have no way of knowing that a kid was tested on things she was never actually taught. And it doesn’t help where states are trying to figure out how to improve their education systems. Rather, it makes matters worse by misdirecting both states and students away from the problem of how better to connect high school learning to necessary college skills, and toward the problem of how to get kids to score better on the test.

We do indeed want an education system in which high school curricula are focused securely on the skills and knowledge we confidently know are needed for success upon entry into college.  Demonstrably, that’s not what we have now, so we don’t need a test that falsely suggests otherwise.

© 2016 BetterRhetor Resources LLC


Welcome to the Sausage Factory

I used to work for ACT, Inc., designing and developing student assessments. In my final years there, I was Director of the Writing and Communications Literacies group. In one of my last major projects, I headed the team responsible for the revamped ACT Writing Test, which rolled out in 2015.

That roll-out was famously botched. ACT Test Development neglected to tend to the basics of scoring the new test; its partner, Pearson, screwed up the reporting. Students and their families had paid and trusted ACT to send scores in time for college application deadlines. There were serious life consequences for missing those deadlines. As a remedy, ACT ham-handedly told students to take a picture of their paper score reports and send that to their prospective colleges. Needless to say, it was all a big mess.

Let it be known that I had nothing to do with that debacle: I was gone months before. (Changes in leadership. Peter principle. ‘Nuff said.)

My team and I improved the form and content of the writing test significantly, moving it away as best we could from the old binary prompt: “Some people think X about this bland made-up issue that you care nothing about; others think Y. Now pick a side and write something intelligible about it in 30 minutes.”

As hard as I worked to improve things, however, I came to realize that the job was hopeless. ACT would never present test-takers with an authentic writing task, one that would engage them, teach them, prepare them for college-level work, or give them a chance to show what they can really do. An exercise in writing that asks students to respond to a topic they have no interest in; that they’ve never even thought about before; that constrains them under a strict, arbitrary time limit; that resembles in no way the work they’ll be asked to do in any other context—well, it’s a pointless exercise at best. Worse than pointless, it’s damaging to a student’s understanding of what competent writing is, and why they should care about it.

The whole (supposed) reason for taking the ACT test is to predict college success. But in fact, largely because of the contrived and constrained form of the test itself, ACT is incapable of presenting to colleges an accurate portrait of a student’s abilities and potential.

The longer I worked at ACT, the more disillusioned I became with the organization—and with conventional standardized testing as a force in education. The tests my group was charged with creating were clearly driving how writing was being taught in schools, and it was a very limited model of writing indeed—certainly not one that encouraged the kind of thinking and communication skills students really need. The thousands upon thousands of student responses we saw each year made this apparent with depressing consistency.

My experience at ACT prompted me to think critically about test development, with two key realizations shaking out:

1. For most kids, there is a huge gap between what they learn in high school and what they are expected to do when they get to college.

This is no stunning revelation. There are plenty of studies and statistics that bear out the fact that a huge percentage of high school graduates are underprepared for college. The ramifications are no secret either: not enough kids go to college; too many need remediation once they get there; too few graduate; too few graduate within four years.

Despite all the emphasis on “college readiness” in education circles, something is obviously not working properly. Actually, many things, but the crucial one that jumped out at me was this:

2. Standardized tests, including—perhaps especially including—the ACT and SAT, are part of the problem.

These tests are highly contrived instruments that do not elicit or assess the kind of student performances required in authentic educational environments. Yet teaching and learning must conform to these tests because of their outsized role in key educational decisions. Little wonder then that so many students are unprepared for the demands of real academic work.

My intention for this blog is not merely to complain about testing, but to give it some serious critique from the point of view of someone with hands-on experience. Maybe this kind of discussion can be valuable to educators or policymakers, or parents, or even test makers, who likewise are thinking hard about how standardized tests are affecting education.

My larger goal in writing this blog, as for BetterRhetor, is to help address the readiness gap between high school and college, and contribute to solutions that lead to more success for more kids.

We want to see college prep instruction and readiness assessment move to a higher level of efficacy; we want to see every student move up a level in education and life opportunities.

At the same time, we want to see the college-admissions playing field leveled up as well, so that students aren’t disadvantaged in their access to education and their readiness for college academics because of their income or background.

The current system is not working. Not enough high schoolers are developing the skills they need for success upon entry into college, despite the rise of standards-based education. The 60-year ACT/SAT admissions testing duopoly, which serves as a gatekeeper for so many students wanting into college, disadvantages and distracts instead of helping kids transition from high school to college academics. We need an alternative. (Click here for a discussion of the duopoly and the college readiness gap.)

We need to make available to colleges not faceless collections of scores and data, but rich, textured portraits of students that show their social, personal, and cognitive abilities, and their promise for academic success. Ultimately, we need a better way to connect students with colleges that believe in them. That’s BetterRhetor’s goal.

© 2016 BetterRhetor Resources LLC