Universities began putting their courses online in the form of “Massive Open Online Courses” (MOOCs) about a decade ago, with the idea of making a wide range of high-quality academic instruction accessible and affordable to people around the world. One famous, early success story was that of Battushig Myanganbayar, a teenager in Mongolia who aced an online MIT engineering course and earned himself a scholarship to the university. Today MOOCs are available by the thousands on marketplace platforms with global reach, such as edX, Udacity, and Coursera.
But there is also another tier of online courses feeding global education, offered not by universities but by individuals. Learning platforms such as Teachable, Thinkific, and Ruzuku have flipped the script on MOOCs: now anyone anywhere can not only take an online course but build and teach one, too. And many thousands do, offering instruction on just about anything you can think of, from business management to blacksmithing, cardio training to calligraphy, leadership to lepidoptery. Online learning is a global bazaar, where anyone can offer up their expertise, enthusiasm, or experience for the whole world’s edification—and earn a rupee (or try to) in the process.
Open online learning platforms are not just for hobbyists and dilettantes; experts of all types are well represented, including plenty of moonlighting school teachers plying lessons in grammar, geometry, chemistry, art, computer science, and every other traditional school subject. In a sense, online learning platforms have given rise to a global gig economy for educators—a way for them to independently leverage their expertise and supplement their income. If the success of these courses demonstrates a worldwide demand for learning not satisfied by conventional schooling, perhaps it also demonstrates how difficult it is for teachers around the planet to make ends meet with just their day job.
Global online learning outside the bounds of institutional education takes a variety of forms. Here are some of the other types of platforms connecting otherwise disconnected students and teachers around the world:
Online Course Marketplaces
Most open platforms require that course developers find and recruit their own students, which generally entails building email lists and working social media channels. If you’re a teacher looking to market your DIY algebra course to ninth-graders, this is a hard pull. Another breed of platform, however, functions as a searchable marketplace, where students come looking for what’s on offer. The largest, Udemy, hosts some 80,000 courses. Your algebra course pops up (along with all your competitors) when an interested student searches the site.
YouTube makes a vast array of instructional videos accessible for free, including videos on every academic subject under the sun. Plenty of schools maintain a presence on YouTube as part of their recruitment and visibility strategy. Likewise, many online teaching entrepreneurs offer free content as lead magnets for the courses they charge for on other platforms.
It’s not all about business, though; plenty of content developers just like to share. One of the most prolific academic presences is Khan Academy, a non-profit that posts hundreds of videos for K-12 students on subjects from literature and civics to calculus and finance. As an instrument of unregulated global education, YouTube is an unsung powerhouse: 37% of users surveyed say they’re looking to improve school or job skills—and YouTube has 1.9 billion active monthly users.
Learning Management Systems
Many K-12 and most higher ed schools now use learning management systems (LMSs) to deliver online courses to their own students. Increasingly, even traditional classroom-based courses incorporate an LMS site that instructors can use to manage their class and post their syllabus, readings, videos, quizzes, and so forth.
But LMS platforms can also be used by individuals and companies to offer courses to the general public—typically courses that are more text-based and extensive than the video-centric offerings commonly found on other types of platform. For example, my company, BetterRhetor, recently launched College-Ready Writing Essentials via Canvas, an LMS widely used by colleges and universities. In contrast to the one-student-at-a-time model, it is a teacher-facilitated resource for high school and college classroom use. Since it is hosted on a Web-accessed LMS, it’s available to any classroom anywhere.
Khan Academy is all over YouTube, as noted above, but they also make all of their videos available through their own website. Any student in the world can, for example, survey 20th Century History through a series of more than 50 video lessons for free on the Khan Academy site. Math, science, humanities, economics—it’s all there.
For-profit learning companies likewise offer education globally on their own dedicated platforms. At VIPKID, for example, home-based teachers located anywhere (again, many of them moonlighting) connect with individual students in China, who learn a traditional curriculum, but in English.
Many of the courses found on these teaching platforms are developed by individuals or entities with no formal accountability for their content—so, it’s buyer beware. Even so, independently developed online courses constitute a busy market connecting eager learners to enterprising teachers worldwide. Online learning platforms spread global education beyond the purview and confines of conventional models and institutions.
Given how much the rest of education has changed since the middle of the 20th
century, it’s remarkable that the model of large-scale student assessment we
have today still looks pretty much the way it did back in the nineteen-fifties:
a group of kids under careful watch, lined up in rows of seats in a rigidly
controlled space, all asked the same questions, each silently bubbling in
answer sheets under the same strict time limits.
To be sure, new technologies have been incorporated into standardized testing over the decades: machine scoring, computerized authoring and delivery, computer adaptive testing, technology-enhanced items, automated essay scoring, automated item generation. But these innovations (not all of them widespread; it’s still very much a paper-and-pencil world in most US schools) haven’t really changed the underlying testing paradigm. Whether computer- or paper-based, the tests still consist mostly of multiple-choice questions. They still require highly contrived and regimented conditions for administration. They still make use of the same measurement principles and techniques. They still are governed by the values of 20th-century industrialization: speed, uniformity, cost efficiency, quantifiability, and mass production.
This model of testing persists precisely because it so effectively delivers on machine-age promises of reliability and efficiency at a large scale. But these benefits come at a cost: the fixed content and rigid testing conditions severely constrain the skills and knowledge that can be assessed. A battery of multiple-choice and short-answer questions on a timed test may do a pretty good job of evaluating basic skills, but it falls short of measuring many of the most essential academic competencies: sustained engagement, invention, planning, collaboration, experimenting and revising, spit-shining a finished project—the kind of skills developed through authentic, substantive educational experiences.
Standardized testing has not kept up with advances in learning science. It ignores, for example, the non-cognitive skills that research today tells us are integral to learning—personal resilience, for example, or a willingness to cooperate. What’s more, we acknowledge today that students develop their academic competencies, cognitive and non-cognitive, in particular educational contexts in which their own varied interests, backgrounds, identities, and languages are brought to bear as valued resources. Conventional standardized tests work to neutralize the impact of these variables, rather than incorporate them.
We do need, and will continue to need, large-scale assessments, despite the many
dissatisfactions we may have with them at present. Classroom assessment by
itself doesn’t tell us what we need to know about student performance at the
state or national level. Without large-scale assessment, we’re blind to
differences among subgroups and regions, and thus cannot make fully informed
decisions about who needs more help, where best to put resources, which efforts
are working and which aren’t.
The central problem to address, then, is how to get an accurate assessment of a fuller range of authentic academic competencies in a way that is educative, timely, affordable, and scalable—a tall order indeed. Recognizing the limitations of the existing testing paradigm, the Every Student Succeeds Act (ESSA) of 2015 opened the door for a limited number of states to try out alternative models that might eventually replace existing accountability tests. Thanks in part to this opportunity, plus ever-advancing technologies, new ideas are in the works.
Here are some directions in which the future of testing may be headed:
Classroom-Based Evidence. The assessment of authentic classroom work can provide a fuller and more genuine portrait of student abilities than we get from the snapshot view afforded by timed multiple-choice-based tests. Indeed, portfolio assessments are widely used in a variety of contexts, from individual courses to district-level graduation requirements. Historically, however, they haven’t worked well at scale. Experiments with large-scale portfolio assessment in the 1990s were abandoned as they proved cumbersome and expensive, and as states found it difficult to establish comparability across schools and districts.
Hopes for using collections of authentic student evidence in large-scale assessments are being revived, however, as ESSA creates new opportunities for state-level change. The anti-standardized testing group, FairTest, has developed a model to help guide state system innovations toward local assessment of classroom-based evidence. The model folds teacher-evaluated, student-focused extended projects into a statewide accountability system with built-in checks for quality and comparability. FairTest cites programs already underway in New Hampshire and elsewhere as evidence of where this approach might lead.
The FairTest model doesn’t rely on new technologies, but large-scale portfolio assessment potentially becomes more feasible today, compared with the low-tech version in the nineties, thanks to easier digitization, cheaper storage, and ubiquitous connectivity. More than mere repositories for uploaded student work, platforms today can combine creation and social interaction spaces with advanced data analytics. This creates opportunities for assessing new constructs (research, or collaborative problem-solving, for example), gaining new insights into student competencies (e.g. social skills), and even automating some dimensions of portfolio assessment to make it faster and more affordable. Scholar, a social knowledge platform currently in use in higher education, provides a glimpse into the kind of environment in which large-scale e-portfolio assessment might someday take root.
Real-World Fidelity. Another shortcoming of multiple-choice-based standardized tests is that they do not present students with authentic contexts in which to demonstrate their knowledge and skills. More authentic tasks, critics argue, better elicit the actual skills associated with the constructs measured, and thus lead to more valid test score interpretations.
Computer-based tests create opportunities for item types that more closely resemble real-world
activities, compared with traditional multiple-choice questions. Technology-enhanced
items (TEIs) can, for example, allow students to manipulate digital objects,
highlight text, show their calculations, or respond to multimedia sources.
While such items fall short of replicating real-world activities, they do
represent a step beyond selecting an answer from a list and filling in a bubble.
Many computer-based versions of standardized tests now add TEIs to the mix of conventional items in hopes of measuring a broader range of skills and improving test validity. In truth, however, TEIs bring their own set of test development challenges. Though eager to use them, test makers at this point do not know very much about what a given TEI might measure beyond a conventional multiple-choice question, if anything. Additionally, in their quest for greater real-world fidelity, TEIs can at the same time introduce a new layer of measurement interference, requiring examinees not only to demonstrate their academic ability, but also to master novel test item formats and response actions.
Despite their current limitations, however, technology-enhanced items will likely continue pushing standardized testing toward greater real-world fidelity, particularly as they grow more adept at simulating authentic problems and interactions, and better at providing test takers with opportunities to devise and exercise their own problem-solving strategies. The latest iteration of the PISA test, a large-scale international assessment, simulates student-to-student interaction to gauge test takers’ collaborative problem-solving skills. Future versions will connect real students with one another in real time.
Continuous Assessment. As tests evolve toward truer representations of real-world tasks, they will likely pick up a trick or two from computer-based games, such as Mars Generation One: Argubot Academy or Physics Playground. These games, like many others, immerse students in complex problem-solving activities. To the extent that conventional test-makers learn likewise to engage students in absorbing tasks, they will better succeed at eliciting the kinds of performances that accurately reflect students’ capabilities. When tasks lack relevance and authenticity, they work against students’ ability to demonstrate their best work.
In addition to engaging their interest, computer-based educational games can
continuously assess students’ performances without interrupting their
learning. The games register a student’s success at accomplishing a task; but
more than that, they can capture behind-the-scenes data that reveal, for
example, how persistent or creative the student was in finding a solution.
As they develop, platforms delivering academic instruction might also automatically assess some dimensions of authentic student performance as it happens, without interrupting learning activities. The Assessment and Teaching of 21st Century Skills project, from the University of Melbourne, provides an example of how an academic platform can capture log stream and chat stream data to model and evaluate student activity. This kind of “stealth assessment” creates opportunities for including non-cognitive competencies—e.g., level of effort, willingness to contribute—in the overall picture of a student’s abilities.
Inclusion. To achieve statistical reliability, conventional standardized tests demand rigorously uniform test-taker experiences. Accordingly, the tests have always had a hard time accommodating examinees with special needs. Education today, however, persistently leads away from uniformity, towards greater inclusion and accommodation of the whole community of learners, including those with various physical, learning, and language differences.
Digital testing presents both opportunities and challenges for accessibility. On one
hand, special tools, such as magnifiers and glosses, can be built into standard
items. On the other, TEI formats using color, interactivity, response actions
requiring fine motor skills, and other features can be difficult or impossible
for some test takers. Nevertheless, research suggests that, overall, the
digital testing environment can improve access to testing for students with
Among the challenges to inclusivity in US testing is the problem of evaluating students who are learning English against standards that assume they already have English language skills. According to Professor Alida Anderson of American University, this problem highlights the need for future assessment systems to be more flexible, not only in the design and delivery of test content, but also in the interpretation and use of standards. Towards that end, programs such as the New York State Bilingual Common Core Initiative are developing bilingual standards and learning progressions that align with English language-based standards frameworks. These efforts promise a fairer and more accurate interpretation of test results for more students.
My own company, BetterRhetor, is combining some of the innovations discussed above in an effort to overcome the limitations of conventional testing (see our long-term vision here). Our web-based platform, for use in classrooms, will deliver five-week instructional modules in Writing and STEM. Assessment of student performance is facilitated by the platform and integrated into instruction. The modules will teach, elicit, capture, and assess not only cognitive skills, but also social and personal competencies. Because students engage over an extended period, we’ll be able to supply actionable feedback, as well as indications of progress. Our overall goal is to provide teachers and schools with a highly effective instructional resource that generates a rich portrait of their students’ authentic abilities.
These kinds of innovation will likely require parallel innovations in measurement
science if they are to take hold in large-scale assessment. Test reliability,
for instance, might be reframed in terms of negotiated interpretations by
panels of local stakeholders, instead of statistical correlations among test
scores. Determinations of validity may need to consider how well a test elicits
fair and authentic performances from the full complement of learners in an
educational community. Comparability across schools and districts may need to
take into account the degree to which an assessment supports not just
institutional needs but also student learning.
Ideally, future forms of large-scale assessment will function as integral dimensions of learning itself, rather than interruptions or intrusions. They’ll both evaluate and reinforce the full array of knowledge and skills required for the successful completion of real academic work in real educational contexts.
In the 1950s, C. P. Snow famously argued that academia had separated into two cultures—the sciences and the humanities—with no commerce between them. As both a novelist and a scientist himself, Snow shuttled between the two worlds, and lamented that they did not combine forces to solve problems neither was equipped to address on its own.
In our time, a separation between the sciences and the humanities is asserted on practical grounds: economic life is dominated by technology, which requires science, engineering, and math, not literature, history, philosophy and the like. College is expensive and the global marketplace competitive. Any individual looking for a serious career—and any country hoping to compete in the world economy—had best forget about the humanities and focus instead on things more practical.
STEM-promoting programs have proliferated throughout education, while the humanities have, in places, become expendable. States across the country offer incentives for students getting degrees in fields such as electrical engineering, while in Kentucky, for example, the governor has gone so far as to propose withholding state funds from schools that produce too many graduates in French literature.
All of this bias in favor of STEM has begun to generate some pushback from people who feel that there is a valuable, even necessary, place for the humanities in today’s world. Some caution against reducing education to career training alone. We should be unwilling, the novelist Marilynne Robinson writes, to “cede… humane freedom to a very uncertain promise of employability.” Rather, she says, we need the humanities for “preparing capable citizens, imaginative and innovative contributors to a full and generous, and largely unmonetizable, national life.”
In contrast to the 1950s, any rift between technical and humanistic fields today seems to be closing on its own anyway. As new technologies integrate themselves ever more thoroughly into all corners of human life, they increasingly require for their success a deeper attunement to the nature of human beings. In education, the evolution of STEM to STEAM (with the A standing for arts) reflects this integration, as does the current interest in design thinking—a recognition that technical things and systems must be responsive to aesthetics, personal preferences, cultural differences, and human behaviors of all sorts.
The digital humanities likewise blend
the two cultures into one, applying methods of quantification and data analysis
to the study of literature, geography, history, and other fields. As such, the
digital humanities provide an excellent avenue for teaching students technical
skills and humanistic modes of inquiry in complementary fashion, perhaps just
the way they’ll be asked to use them in their professional and civic lives down the road.
K-12 teachers and students will find
online a host of digital humanities resources in three primary areas: human
geography, historical archives, and text analysis. Here are some examples:
The visualization of digitized geographical information has created a wealth of opportunities for exploring both historical and contemporary relationships between people and place. ORBIS: The Stanford Geospatial Network Model of the Roman World, for example, provides an interactive map that calculates the time and cost of traveling throughout the Roman Empire by land or sea, even taking into account the seasons.
A Vision of Britain Through Time overlays the geography of Britain with election data, census information, historical maps, and travel writing. Select any district and explore changes through time in its population, social structure, housing, industry, economic conditions, and more.
Google Earth’s Voyager section contains ready-made explorations in travel, culture, and history. Tour famous writers’ homes, explore medieval Europe, or discover tribal government success stories. The Lewis and Clark unit, created with PBS Education, combines videos and text with an interactive Google Earth map of the explorers’ journey to the Pacific and back.
Disciplinary boundaries tend to break down in the digital humanities, so geographically centered resources may also be rich troves of archival information. Civil War Washington, for example, combines historical documents, images, data, and maps, with interpretive essays to provide a thick description of the nation’s capital during the Civil War.
The Roy Rosenzweig Center for History and New Media is a repository for a wide mix of projects for studying history, as well as tools for managing citations and organizing and publishing archives. An abundance of teaching resources provide lesson plans organized around rich collections of historical materials. For example, Making the History of 1989 is a project that explores the fall of the communist states in Europe. It includes hundreds of primary sources, along with multi-media interviews with historians.
The Lincoln Telegrams Project is part of “Decoding the Civil War: Engaging the Public with 19th Century Technology and Cryptology through Crowdsourcing,” an effort to transcribe and decode Civil War military telegrams through crowdsourcing. The site includes online access to Lincoln’s wartime telegrams, along with lesson plans for high school students.
American Memory, from the Library of Congress, contains extensive collections of historical materials centered on American life, literature, history, and more. A section for teachers includes classroom materials, professional development resources, and guides for using primary sources.
Sophisticated approaches to text mining
are yielding new scholarly insights in fields from literature to linguistics to
cultural criticism. For the pre-college classroom, some handy text analysis
tools can give students an idea of how digitization opens up modes of inquiry
into language and literature.
Google Books Ngram Viewer finds the number of times user-entered words and phrases occur within the vast number of books Google has digitized up to 2008. The Viewer returns a record of the rise or decline of concepts, names, terms, and events appearing in print over years, decades, or centuries.
Wordle and similar programs generate word clouds from user-entered text. Students can analyze famous speeches, for example, by discovering the words used most or least often. Student essays entered into the program can shed light on word usage perhaps not otherwise obvious to writer or instructor. Here’s a list of classroom lessons using word cloud generators.
Voyant moves beyond word clouds to provide context for the words and phrases found in a text. When a word such as “future” appears in a transcribed conversation, for example, does it carry a positive or negative connotation? This kind of sentiment analysis is more technically challenging to accomplish than simple word clouds, but for the right teacher or student, it can be a useful tool for examining a wide variety of texts.
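For teachers who want to show students what sits underneath a word cloud, the counting step can be sketched in a few lines of Python. This is only an illustration, not how Wordle or Voyant are actually built: the function name `word_frequencies`, the tiny stop-word list, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools use longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word appears, ignoring case and stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


# Invented sample sentence, used only to exercise the function.
sample = (
    "The future of testing is the future of teaching, "
    "and teaching shapes the future."
)

counts = word_frequencies(sample)
print(counts.most_common(2))  # [('future', 3), ('teaching', 2)]
```

A word cloud generator then draws the most frequent words in the largest type. Students could paste in a speech or one of their own essays as `sample` and compare the top words with what they expected to find.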
In our time, technical and humanistic
domains tend to meld in ways that C. P. Snow could not have anticipated, but
likely would have welcomed. For tomorrow’s students, the very idea that the
sciences and humanities could be separated might seem perplexing, as they’ll
see all around them technical tools in the service of humanistic inquiry, and
human insights shaping the form and application of new technologies.
It is a guiding principle in test development that stimulus materials and test questions should not upset test-takers. Much like dinner conversation with in-laws, tests should refrain from referencing religion, or sex, or race, or politics—anything that might provoke a heightened emotional response that could interfere with test-takers’ ability to give their best effort.
Attention to “sensitivity” concerns, as they’re known, makes sense conceptually. But in
practice, as they shape actual test development, sensitivity concerns are
responsible for much of why conventional standardized tests are so ridiculously
bland and unengaging. The drive to avoid potentially sensitive content
constrains test developers to such a degree that one might legitimately
question whether the cure is at least as bad as the disease.
So determined are test-makers to avoid triggering unwanted test-taker emotions,
they end up compromising the validity of their tests by excluding essential
educational content and restricting students’ opportunities to demonstrate the
creative and critical thinking skills they’re actually capable of. In other
words, ironically, conventional standardized tests may be so radically boring
that they’re no better at measuring actual ability and achievement than if they
regularly froze test-takers solid with depictions of graphic horror.
Actually, no one knows for certain if the tests are better or worse for being so cautious. There is no research defining sensitivity, no evidence-based catalog of topics to avoid, no study measuring the test-taking effects of “sensitive” content. For all anyone knows, inflaming emotions might actually improve test results—though few test-makers would risk experimenting to find out.
No test-maker wants to hear from a teacher or parent that a student was stunned,
enraged, offended, or even mildly disconcerted by content they encountered on a
test. And in fairness, no test-maker wants to subject a test-taking kid to a
hurtful or upsetting experience. They’re captives, after all; if something on
the test makes them feel crappy, they have little choice but to sit there and
absorb it. Their scores may or may not reflect the fact that their emotions
were triggered: there’s really no way to tell.
On the other hand, high-stakes standardized tests, in and of themselves, trigger
lots of negative emotions in plenty of kids, regardless of question content. So
a cynic might wonder how much sensitivity concerns are driven by concern for
kids’ experience, and how much by fear of the PR nightmares that would ensue
from a question or passage that someone could claim was racially or religiously
offensive. Whatever the case, the result is the same: keep it safe by keeping it bland.
Since there is no research to guide decisions on sensitivity, the rules test-makers set for themselves are based strictly on their own judgment, and on some sense of industry practice. Inevitably they default to the most conservative positions possible: if a topic might conceivably be construed as sensitive, that’s enough reason to keep it off the test.
Typical sensitivity guidelines steer test developers away from content focused on age,
disability, gender, race, ethnicity, or sexual orientation. Test-makers also
avoid subjects they deem inherently combustible, such as drugs and drinking,
death and disease, religion and the occult, sexuality, current politics, race
relations, and violence.
A “bias review” process gets applied in the course of developing passages and questions for testing, to weed out anything that might be offensive or unfair to certain subgroups: typically African Americans, Asian Americans, Latinos, women, and sometimes Native Americans. The test-maker will send prospective test materials out for review by qualified educators who belong to these subgroups. If a reviewer thinks a test item is problematic, it gets tossed. Though this process is better than nothing, it reflects more butt-covering than enlightenment, putting test-maker and reviewer alike in the awkward position of saying, for instance, “These test items are not unfair to Black people. How do we know? We had a Black person look at them!”
Decisions on topics not pertaining to identity and cultural difference rest purely on the
test makers, who, as mentioned, are as risk-averse as can be. In one example
I’m familiar with, a passage about the mythological Greek figure Eurydice was
rejected because the story deals with death and the underworld. Think of all
the literature and art excluded from testing by criteria of that kind. Think of
the impoverished portrait of human achievement and lived experience conveyed to
students by such an exclusion.
In another case, a passage on ants was rejected because it reported that males get
booted out of the colony and die shortly after mating. I’m still not clear on
whether the basis for that judgment centered on the reference to insects
mating, insects dying, or the prospect of a student projecting insect gender
relations onto human relations and being thereby too disturbed to think
clearly. Whatever the case, rejecting such a passage on the basis of
sensitivity concerns seems downright anti-science.
As does the elimination of references to hurricanes and floods because some kids might have experienced them. I remember a wonderful literary passage that depicted a kid watching his family’s possessions float around the basement when their neighborhood flooded. It was intended for high schoolers. It got the noose.
I’ve seen a pair of passages from Booker T. Washington and W. E. B. DuBois nixed out of concern for racial sensitivity: you can’t have African Americans arguing with each other on questions of race. Test-makers strive to include people of color in their test content to satisfy requirements for cultural inclusivity. But those people of color cannot be engaged in the experience of being people of color, which renders the whole impulse toward inclusivity hollow and cynical. Such an overabundance of caution does more to protect the test-maker than the student.
The content validity of educational assessments that cannot reference slavery,
evolution, extreme weather events, natural life cycles, economic inequality,
illness, and other such potentially sensitive topics should come under serious
interrogation. More concerning still is the prospect of such tests driving
curriculum. With school funding and teacher accountability riding on
standardized test scores, teaching to the test makes irresistibly practical
sense in many educational contexts. Thus, if the tests avoid great swaths of
history, science, and literature, then so will curriculum.
The makers of the standardized tests schoolkids encounter argue that they are not
interested in censoring educational content, only in recognizing that when
students encounter potentially sensitive topics they need the presence of an
adult to guide them through. The classroom and the dinner table are places for
negotiating challenging subjects, not the testing environment, where kids are
under pressure and on their own.
This rationale should rouse everyone to question why we continue to tolerate such artificial conditions for evaluating student learning. It essentially concedes that testing doesn’t align with curriculum, that kids will not be assessed on the things they’re taught—only on the things test-makers decide are safe enough to put in front of them. Further, it admits that test-makers compromise the content validity of their tests in deference to the highly contrived testing conditions they require. Surely we can recognize in this the severe design flaws that lie at the heart of the testing problem.
Obviously, insulting or traumatizing students with test content is something to be avoided. But at the same time, studies show that test-taker engagement is essential for eliciting the kinds of performances that accurately reflect students’ capabilities. When tasks lack relevance and authenticity they work against students’ ability to demonstrate their best work, especially students from underserved populations. Consider this statement:
Engagement is strongly related to student performance on assessment tasks, especially for students who have been typically less advantaged in school settings (e.g. English Language Learners, students of historically marginalized backgrounds) (Arbuthnot, 2011; Darling-Hammond et al., 2008; Walkington, 2013). In the traditional assessment paradigm, however, engagement has not been a goal of testing, and concerns about equity have focused on issues of bias and accessibility. A common tactic to avoid bias has been to create highly decontextualized items. Unfortunately, this has come at the cost of decreasing students’ opportunities to create meaning in the task as well as their motivation to cognitively invest in the task, thereby undermining students’ opportunities to adequately demonstrate their knowledge and skills.
In my own experience interviewing high schoolers about writing prompts, they want
to write about Mexican rappers, violence in videogames, representations of
gender and race in popular culture, football concussions, gun ownership, the
double-standard dress codes schools impose on girls compared with boys, and
other topics that are both authentic and relevant to them. Conventional
standardized tests would not come near topics like these.
The solution to this problem has to entail breaking away from the dominant,
procrustean model of standardized test-taking, which isolates individual
students from all resources and people, asks them to think and write on topics
they may never have encountered before and care nothing about, and confines
them to a timeframe that reflects the practical considerations of the
test-maker, not the nature of authentic intellectual work.
Once free of the absurdly contrived conditions of conventional test-taking,
sensitivity concerns can be removed from the domain of test-makers worried
about their own liability. Instead, along with their teachers and guardians,
students can decide what topics are appropriate to grapple with in their
academic work. In fact, learning to choose, scope, and frame a topic in ways
appropriate for an academic project is itself an essential skill, worthy of
teaching and assessing.
WORKS CITED
Arbuthnot, K. (2011). Filling in the blanks: Understanding standardized testing and the Black-White achievement gap. Charlotte, NC: Information Age Publishing.
Darling-Hammond, L., Barron, B., Pearson, P. D., Schoenfeld, A. H., Stage, E. K., Zimmerman, T. D., … & Tilson, J. L. (2015). Powerful learning: What we know about teaching for understanding. John Wiley & Sons.
Walkington, C. A. (2013). Using adaptive learning technologies to personalize instruction to student interests: The impact of relevant contexts on performance and learning outcomes. Journal of Educational Psychology, 105(4), 932.
If project-based learning were to form the core of curricula in American schools,
our problems with large-scale standardized testing would become even more
pronounced than they are now. This is not a reason to forgo project-based
learning, of course; rather, it’s a reason to find a better way to test.
We do need, and will continue to need, large-scale assessments, despite the many
dissatisfactions we may have with them at present. Local assessment by itself
doesn’t tell us what we need to know about student performance at the state or
national level. Without large-scale assessment, we’re blind to differences
among subgroups and regions, and thus cannot make fully informed decisions
about who needs more help, where best to put resources, which efforts are working
and which aren’t.
Many of the underlying aims of large-scale assessment are laudable: equity,
improvement, good stewardship. Rather it is the limitations of their form that
make the tests so problematic. They are severely restricted in the kinds of academic
work they can elicit and measure. Thus, to the degree that they demand
classroom focus or drive instruction, they actively discourage the authentic
academic work that is the aim of project-based learning.
This is so because efficiency and scalability, rather than authenticity, govern their form. The tests are composed of artificial tasks—mostly multiple-choice questions—so that student performances can be recorded and evaluated fast and cheap. They are administered under highly contrived conditions because the artificiality of the tasks creates particular kinds of problems with security and reliability, problems that can only be addressed by further compounding the artificiality with rigid time strictures, centralized testing locations, and snapshot performances by students isolated from all real-world aids and resources, including one another.
Because of their format restrictions, conventional standardized tests simply cannot
provide opportunities for students to demonstrate the array of skills that
comprise authentic intellectual work: e.g. generating ideas, planning,
collaborating, experimenting and revising, spit-shining the finished product.
The tests can’t elicit these skills because, in their existing incarnation,
they can’t accommodate the time it takes to do the work, and because the cost
of evaluating this kind of student work at scale would doom the whole
enterprise from the start.
In other words, the real academic work that is the aim of project-based learning
is uncapturable by conventional
large-scale standardized tests. If PBL, then, formed the core of curricula, the
existing testing paradigm would utterly fail at generating the student
performance information that justifies testing in the first place.
Of course, one might argue, standardized tests never claim to be more than
indirect measures. They’re proxies designed to indicate larger sets of skills, not exhaustively evaluate
everything a student knows and can do. The partialness and indirectness of the
measurement is precisely the concession we make to time and cost constraints.
And anyway, some information is better than none; the assessments just need to
be good enough to sample content domains and show whether kids are mastering basic skills.
If that’s the case, then justification for the tests comes down to whether basic skills are a good enough proxy for the higher order skills PBL would place at the center of education. Would we be OK with making funding and accountability decisions based on such a limited slice of what we’re actually teaching? Or does there come a point at which the disparity between what the tests can measure and what we believe students need to know becomes so great as to render the proxy argument altogether implausible?
The problem with standardized tests is their form, not necessarily their function.
Fundamentally, it’s a technological problem. The multiple-choice question is a
20th-century technology that made possible the economies of scale that have
driven the format of standardized testing for nearly a century. Today’s tests
still rely predominantly on multiple-choice questions, even as they migrate
from paper to computer. New “technology-enhanced” item types appearing on
computer-based tests are still mostly elaborated forms of the multiple-choice
format. They still fall short of eliciting from students the skills and
abilities that lie at the heart of authentic academic work.
The tests have remained rooted in mid-twentieth-century technologies, even as
immensely powerful networked digital technologies have arisen around them.
Today we could use existing technologies to capture all of the skills and
abilities students display as they engage in extended academic projects, from
planning to finish. We could facilitate and record collaborative interactions
within work groups, whether localized in a single classroom, or assembled from
across the nation or world. We could even generate assessable information about
personal qualities such as persistence and resilience, capturing the effort
students put into generating solutions, for example, or revising their work in
response to feedback, or contributing ideas and assistance to others.
Existing technologies can elicit and capture the critical skills and abilities, both
cognitive and non-cognitive, cultivated by project-based learning. With some
creativity in modes of assessment, and a willingness to re-think hidebound
approaches to test reliability and validity, we can replace a model of testing
that runs counter to our most ambitious education goals.
For project-based learning to someday take a primary place in education, we will
need a form of large-scale assessment that can validate its efficacy and equity
across groups and regions. That assessment will need to be embedded within
instruction and learning activities, an integral dimension of learning itself,
rather than an interruption of or intrusion into the authentic, challenging and
satisfying experience of practicing and trying, making and doing, building and
creating. It will need to be an authentic assessment capable of capturing
the full array of knowledge and skills required for the successful completion
of real academic work.
The problems with standardized tests lie less with the content they cover than with their very form—which drives their content and everything else about them.
The tests have looked pretty much the way they do ever since the fifties—a bunch of kids all in the same place, bubbling in answers to the same questions, under the same strict time limits, under the watchful gaze of a roving proctor. Replicate across district, state, nation, world.
The tests took this form not because it
is good at measuring what kids know, but because it is efficient. The form is
the product of 20th-century industrialization, with speed, uniformity, quantifiability,
and mass production being the governing virtues.
Large-scale testing of this type was
made possible by two industrial-era inventions: 1) the multiple-choice
question, which allowed for the super-quick recording and evaluation of
responses; and 2) Scantron technology, which turned the job of checking answers
over to an electronic scanning machine that could read bubble sheets at
blinding speed, achieving a quantum leap in scalability and cost efficiency.
A third thing created the large-scale standardized
test, as well: the science of psychological measurement, itself born of the
quintessentially 20th-century project aimed at bringing the rigor and precision
of the hard sciences to the messy business of human thought and behavior.
“If only we could control the variables
well enough,” supposed the mid-century psychometricians, presumably adjusting
their spectacles and checking their clipboards, “why, we could reliably measure
even intelligence itself, quantifying each subject’s relative value within the
collective!” Backs were slapped. Huzzahs exchanged.
It sounds kind of scary now, in the way that Taylorism and Skinner Boxes sound scary. And some of this science was indeed put to nefarious ends, such as justifying the exclusion of whole groups of Americans from college admission, as The College Board did in its early years.
But mostly the impulse to quantify was
sincerely pointed toward improving and democratizing education. After all, if
there were a basis for comparing students under standardized conditions, we
might be able to glean some reliable insights into how our education system
treats different subgroups, how geographic regions differ and why, whether our
efforts to educate are improving over time, etc. Maybe we could figure out who
needs more help, and how to teach better, and where best to put our resources.
Or we might even be able simply to look
at a number to determine who is ready
for college and who is perhaps, ahem, not quite Hahvahd material, so sorry.
Notice, however, how many suspect
assumptions underlie the whole project. Are all of these students equally
prepared for the experience of taking
this test? It is, after all, pretty weird and stressful and artificial. Should
all kids be expected to have the same knowledge and abilities? Is there really
only one type of academic success? Are we confident that the tasks we’re
putting in front of students are yielding the information we want? Given how
contrived the test format and testing experience are, do we really even know
what we’re measuring?
The big question, of course—the one that forever dogs standardized testing—is this: are the tests measuring what they need to or only what they can? And if only what they can, is that good enough to support the kind of test-results-based inferences we’re making about kids and teachers and schools?
The constraints that format standardization places on content, time, and space drive what gets tested. That is, the validity and reliability of the test require the standardization of conditions, which means corralling groups of kids into the same kinds of spaces, for the same amount of time, and asking them the same questions. The multiple-choice format historically makes all this possible, but is only good for eliciting certain kinds of knowledge and skills. The main skill it elicits, of course, is the dubious skill of test-taking itself. It can also do pretty well at testing a kid on basic skills, such as the rules of grammar, but cannot prompt from a student her capacity for “higher order” academic abilities, such as generating original ideas for an essay.
In fact, even including some number of
constructed response questions and technology-enhanced items, conventional
standardized tests cannot elicit the kinds of things essential for authentic academic
work in college:
Analyzing conflicting source documents
Supporting arguments with evidence
Solving complex problems that have no obvious answer
Thinking deeply about what you’re being taught.
They can’t do it given test-taking time
constraints, and they can’t do it because the cost of evaluating this kind of
student work on a large scale would blow up the whole enterprise.
And this short list from the National Research Council leaves a lot out, including the many social, dispositional, and behavioral skills students need for success.
When so much in our education system is determined by scores on large-scale standardized tests, especially school funding and teacher evaluations, it is not surprising that many schools, however frustrated in their own efforts, resort to training kids to perform on the tests. Otherwise, they’re out of business. But that means the kids are not learning what they need most for actual academic success, only what they need for test-taking.
To extend the tragedy, our students
aren’t even doing well on the tests they’re being trained to take. How do we
know this? The tests!
In other words, we’re operating within a strange, Escher-like world, in which standardized tests serve as the instruments used to monitor how much they themselves are screwing up education.
When we consider all the essential
knowledge, skills, and abilities that these tests, because of their very form, cannot elicit and measure, it’s clear
that we really need to start reevaluating their usefulness.
One further thing to consider: even
now, as more and more tests migrate to computer, as computer adaptive testing
and tech-enhanced items and automated scoring et cetera become more common,
large-scale standardized tests are still overwhelmingly reliant on the multiple-choice question.
That is, even as a world of immensely
powerful and networked digital technologies has grown up around them, the
tests, in their basic form, are still rooted in the mid-20th-century paradigm
of the bubble-in Scantron sheet.
This actually gives rise to hope,
however. It raises the possibility that we might already have the tools to
create different kinds of instruments for education measurement, but that we’re
just not using them. These would be instruments that share the original goals
of providing insights for improving and democratizing education, but which also
overcome the limitations on content, time, and space that have always made
old-school standardization such a poor governing principle for assessing