Standards-based assessment

In an educational setting, standards-based assessment [1] is assessment that relies on the evaluation of student understanding with respect to agreed-upon standards, also known as "outcomes". The standards set the criteria for the successful demonstration of the understanding of a concept or skill. [2]

Overview

In the standards-based paradigm, [3] students have the freedom to demonstrate understanding in diverse ways, including (but not limited to) selected response (e.g. multiple choice tests), physical constructions, written responses, and performances. These are not new types of assessment, nor is the concept of differentiated assessment new. The teacher uses all available observations and quantitative information to summarize learning with reference to a specific standard. With these data, a teacher can formulate the steps or actions a student can take to gain mastery of a particular concept; in this way, standards-based assessment supports assessment for learning.

One of the key aspects of standards-based assessment is post-assessment feedback. The feedback a student receives from this type of assessment does not emphasize a score, percentage, or statistical average, but rather information about the student's performance as compared to the standard. A standards-based approach does not necessarily dismiss a summative grade, percentage, or measure of central tendency (such as a mean or median). However, an assessment that does not reference or give feedback with respect to a standard would not be standards-based. There is a large body of evidence pointing to the effectiveness of appropriate feedback. [4]

Purpose

The purpose of standards-based assessment [5] is to connect evidence of learning to learning outcomes (the standards). When standards are explicit and clear, the learner becomes aware of their achievement with reference to the standards, and the teacher may use assessment data to give meaningful feedback to students about this progress. This awareness of one's own learning allows students to point to a specific standard of achievement and so strengthens self-regulation and metacognition, two skills generally understood to be effective learning strategies. [6]

Framework of the standards-based approach in assessment

A common approach to standards-based assessment (SBA) is to start from a clearly stated standard (the learning outcome) and then identify the evidence that demonstrates achievement of that standard.

An example of such a standard, from the British Columbia Grade 3 Curriculum Package (September 2010):

"It is expected that students will view and demonstrate comprehension of visual texts (e.g., cartoons, illustrations, diagrams, posters)" [7]

Another example of a standard, from the New Zealand curriculum document Mathematics Standards for Years 1-8 (by the end of year 5):

"In contexts that require them to solve problems or model situations, students will be able to create, continue, and predict further members of sequential patterns with two variables" [8]

A corresponding indicator of achievement for the British Columbia standard above is that students will:

"describe key messages and images and relevant details in response to questions or activities" [7]

Hallmarks

Geographical distinctions

United States

A standards-based test is an assessment based on the outcome-based education or performance-based education philosophy. [11] Assessment is a key part of the standards reform movement. The first step is to set new, higher standards to be expected of every student. The curriculum must then be aligned to the new standards. Finally, students must be assessed to determine whether they meet these standards of what every student "must know and be able to do". In the United States, a high school diploma awarded on passing a high school graduation examination, [12] or a Certificate of Initial Mastery, is granted only when these standards are achieved. It is fully expected that every child will become proficient in all areas of academic skill by the end of a set period, typically 10 years in the United States but sometimes longer, after the passage of an education reform bill by a state legislature. Under No Child Left Behind, the United States federal government can further require that all schools demonstrate improvement among all students, even if all students are already proficient.

Holistic grading

Rather than using computers to log responses to multiple-choice tests, rubrics for state assessments such as North Carolina's [13] ask scorers to look at the entire paper and make judgments. Scorers are not allowed to count errors, and rubrics do not contain numeric measures of how many spelling or grammar errors constitute a "1" or "2". The Analytical Writing section of the GRE test is scored on a six-point holistic scale in half-point increments. [14] Holistic grading is one of the main sources of disagreement between scorers, and for this reason some tests are scored more than once to check for agreement.
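
In practice, holistic scoring programs often combine two independent ratings and bring in a third reader when the two disagree by more than a set amount. The Python sketch below is a generic illustration of such an adjudication rule; the one-point threshold, the resolution by averaging the third reading with the closer original rating, and the half-point reporting are all assumptions, not the procedure of the GRE or of any particular state program.

```python
def holistic_score(rater_a, rater_b, third_reading=None, max_gap=1.0):
    """Combine two independent holistic ratings given on the same scale.

    If the ratings differ by more than `max_gap`, a third reading is
    required and is averaged with whichever original rating it is closer
    to.  The threshold, the resolution rule, and the half-point rounding
    are illustrative assumptions, not any testing program's policy.
    """
    if abs(rater_a - rater_b) <= max_gap:
        combined = (rater_a + rater_b) / 2
    else:
        if third_reading is None:
            raise ValueError("discrepant ratings: a third reading is required")
        closer = min(rater_a, rater_b, key=lambda r: abs(r - third_reading))
        combined = (closer + third_reading) / 2
    return round(combined * 2) / 2   # report in half-point increments


print(holistic_score(4.0, 5.0))                      # ratings agree: 4.5
print(holistic_score(2.0, 5.0, third_reading=4.0))   # adjudicated:   4.5
```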

Advantages

  • Students are compared to a standard that all can reach, rather than artificially ranked along a bell curve on which some students must be called failures and only a few are allowed to succeed.
  • Humans, not computers, can evaluate the full merit of a response, rather than imposing a strict right-or-wrong judgment that is not open to interpretation.
  • Free-response items use and test higher-order thinking, which is important in most new education standards.
  • Computer-scored multiple-choice tests have been shown to have deleterious effects for minorities, unfairly denying them opportunities.
  • Only a standards-based test is aligned with standards-based education reform, which is based on a belief that all students can succeed, not only a few.
  • Passing a 10th grade test and awarding a Certificate of Initial Mastery ensures that all students will graduate with the skills they need to succeed in the world-class economy of the twenty-first century.
  • Students will no longer be cheated by being passed on to the next grade without having mastered what every child at that grade level must know and be able to do.
  • When all students pass all standards, as is the central belief of standards-based education reform, all students from all demographics will achieve the same test score, eliminating the achievement gap that has previously been observed between groups on all tests. However, as of 2006, no standards-based assessment had yet achieved this optimistic (critics might call it impossible) goal, though many show rising scores.

Disadvantages

Compared with a multiple-choice, norm-referenced test, a standards-based test can be recognized by the following characteristics:

  • A cut score is determined for different levels of performance. Norm-referenced tests have no cut scores; there is no failing score on the SAT, for example, and each college or institution sets its own score standards for admission or awards.
  • Different levels of performance are set, typically Above Standard, Meets Standard, and Below Standard. These levels are typically set in a benchmarking process, even though such a process does not take into account whether the test items are even appropriate for the grade level. A minimal sketch of a cut-score mapping of this kind appears after this list.
  • Tests consist of free-written responses, often including pictures, which are graded holistically rather than marked correct or incorrect among multiple choices.
  • Tests are more expensive to grade as a result, typically $25–30 per test compared to $2–5, not including the cost of developing the test, which is typically different every year for every state.
  • Tests are more difficult to grade because scorers typically work from only a handful of example papers, with no more than one or two at each scoring level, and the tests cannot be graded by computer.
  • Tests are less reliable: inter-rater agreement of only 60 to 80 percent on a 4-point scale may still be considered accurate. A simple exact-agreement calculation is included in the sketch after this list.
  • Graders do not need teaching credentials, only a bachelor's degree in any field, and are typically paid $8 to $11 per hour for part-time work.
  • Failure rates as high as 80 to 95 percent are not unusual; they are fully expected and are announced to the local press as test programs are introduced. Under traditional graduation criteria, African Americans had achieved national graduation rates within a few points of whites. In 2006, the three-quarters of African American students who failed the WASL were told by Superintendent Terry Bergeson that they would not get a diploma if they did not pass retakes of the test within two years, even though she had earlier pledged that "all students" would get a world-class diploma.
  • Failure rates for minorities and special education students are typically two to four times higher than for majority groups, as extended-response questions are more difficult to answer than multiple-choice items.
  • Content is often difficult even for adults to answer quickly, even at grade levels as low as fourth grade, especially in mathematics. Professor Don Orlich called the WASL a "disaster", with math and science tests falling well above the normal developmental level of students at many grade levels.
  • Mathematics tests have a high proportion of statistics and geometry and relatively little simple arithmetic.
  • Schools are scored as zero for students who do not take the test.
  • Passing such a test at the 10th-grade level is typically planned to be required for high school graduation.
  • Passing such a test, rather than scoring at the 50th percentile, is defined as grade-level performance.
  • A response with a correct answer may be graded as incorrect if it does not show how the answer was arrived at, while a response with an incorrect numerical conclusion may not necessarily be graded as wrong. [15]
  • In its first year, California's CLAS test permitted no high '4' math grades, not even in the highest-scoring schools, in order to leave room for improvement. [16]
  • The North Carolina Writing project gave exemplary '4' scores to less than 1 percent of papers. Such papers employed vocabulary and knowledge at a level sometimes exceeding that of the college-graduate graders, and well above what is expected of a high school graduate. [17] Reaching this level would be even more difficult than achieving an SAT score sufficient for entry into an Ivy League college.
  • Scores typically rise much faster than standardized tests such as NAEP or SAT given over the same time period. [18]
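
Two of the mechanics mentioned in the list above, cut scores that map raw scores to performance levels and exact agreement between two raters, can be made concrete with a short sketch. The Python below uses invented cut scores and level names; actual values are set in each state's benchmarking or standard-setting process, and operational programs often use adjacent-agreement or chance-corrected statistics rather than the simple exact-agreement rate shown here.

```python
# Hypothetical cut scores on a 100-point raw scale; real values come from a
# state's standard-setting (benchmarking) process, not from this sketch.
CUT_SCORES = [(85, "Above Standard"), (70, "Meets Standard"), (0, "Below Standard")]


def performance_level(raw_score):
    """Map a raw score to a performance level using descending cut scores."""
    for cut, level in CUT_SCORES:
        if raw_score >= cut:
            return level
    return CUT_SCORES[-1][1]          # below every cut: lowest level


def exact_agreement_rate(scores_a, scores_b):
    """Fraction of papers on which two raters gave identical scores."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


print(performance_level(72))          # Meets Standard

rater_1 = [3, 2, 4, 1, 3, 2, 4, 3, 2, 3]
rater_2 = [3, 2, 3, 1, 3, 2, 4, 2, 2, 3]
print(exact_agreement_rate(rater_1, rater_2))   # 0.8 on a 4-point scale
```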

See also

Related Research Articles

Alternative assessment is also known under various other terms, including authentic assessment and performance assessment.

<span class="mw-page-title-main">Standardized test</span> Test administered and scored in a predetermined, standard manner

A standardized test is a test that is administered and scored in a consistent, or "standard", manner. Standardized tests are designed in such a way that the questions and interpretations are consistent and are administered and scored in a predetermined, standard manner.

Educational assessment or educational evaluation is the systematic process of documenting and using empirical data on the knowledge, skill, attitudes, aptitude and beliefs to refine programs and improve student learning. Assessment data can be obtained from directly examining student work to assess the achievement of learning outcomes or can be based on data from which one can make inferences about learning. Assessment is often used interchangeably with test, but not limited to tests. Assessment can focus on the individual learner, the learning community, a course, an academic program, the institution, or the educational system as a whole. The word "assessment" came into use in an educational context after the Second World War.

The National Science Education Standards (NSES) represent guidelines for the science education in primary and secondary schools in the United States, as established by the National Research Council in 1996. These provide a set of goals for teachers to set for their students and for administrators to provide professional development. The NSES influence various states' own science learning standards, and statewide standardized testing.

A concept inventory is a criterion-referenced test designed to help determine whether a student has an accurate working knowledge of a specific set of concepts. Historically, concept inventories have been in the form of multiple-choice tests in order to aid interpretability and facilitate administration in large classes. Unlike a typical, teacher-authored multiple-choice test, questions and response choices on concept inventories are the subject of extensive research. The aims of the research include ascertaining (a) the range of what individuals think a particular question is asking and (b) the most common responses to the questions. Concept inventories are evaluated to ensure test reliability and validity. In its final form, each question includes one correct answer and several distractors.

The Washington Assessment of Student Learning (WASL) was a standardized educational assessment system given as the primary assessment in the state of Washington from spring 1997 to summer 2009. The WASL was also used as a high school graduation examination beginning in the spring of 2006 and ending in 2009. It has been replaced by the High School Proficiency Exam (HSPE), the Measurements of Student Progress (MSP) for grades 3–8, and later the Smarter Balanced Assessment (SBAC). The WASL assessment consisted of examinations over four subjects with four different types of questions. It was given to students from third through eighth grades and tenth grade. Third and sixth graders were tested in reading and math; fourth and seventh graders in math, reading and writing. Fifth and eighth graders were tested in reading, math and science. The high school assessment, given during a student's tenth grade year, contained all four subjects.

In the realm of US education, a rubric is a "scoring guide used to evaluate the quality of students' constructed responses" according to James Popham. In simpler terms, it serves as a set of criteria for grading assignments. Typically presented in table format, rubrics contain evaluative criteria, quality definitions for various levels of achievement, and a scoring strategy. They play a dual role for teachers in marking assignments and for students in planning their work.

Mastery learning is an instructional strategy and educational philosophy, first formally proposed by Benjamin Bloom in 1968. Mastery learning maintains that students must achieve a level of mastery in prerequisite knowledge before moving forward to learn subsequent information. If a student does not achieve mastery on the test, they are given additional support in learning and reviewing the information and then tested again. This cycle continues until the learner accomplishes mastery, and they may then move on to the next stage. In a self-paced online learning environment, students study the material and take assessments. If they make mistakes, the system provides insightful explanations and directs them to revisit the relevant sections. They then answer different questions on the same material, and this cycle repeats until they reach the established mastery threshold. Only then can they move on to subsequent learning modules, assessments, or certifications.

CLAS was a standards-based assessment, based on outcome-based education principles, given in California in the early 1990s. It was based on concepts of new standards such as whole language and reform mathematics. Instead of multiple choice tests with one correct answer, it used open written responses that were graded according to rubrics. Test takers would have to write about passages of literature that they were asked to read and relate the passage to their own experiences, or to explain how they found solutions to math problems that they were asked to solve. Such tests were thought to be fairer to students of all abilities.

Holistic grading or holistic scoring, in standards-based education, is an approach to scoring essays using a simple grading structure that bases a grade on a paper's overall quality. This type of grading, which is also described as nonreductionist grading, contrasts with analytic grading, which takes more factors into account when assigning a grade. Holistic grading can also be used to assess classroom-based work. Rather than counting errors, a paper is judged holistically and often compared to an anchor paper to evaluate if it meets a writing standard. It differs from other methods of scoring written discourse in two basic ways. It treats the composition as a whole, not assigning separate values to different parts of the writing. And it uses two or more raters, with the final score derived from their independent scores. Holistic scoring has gone by other names: "non-analytic," "overall quality," "general merit," "general impression," "rapid impression." Although the value and validation of the system are a matter of debate, holistic scoring of writing is still in wide application.

<span class="mw-page-title-main">Higher-order thinking</span> Concept in education and education reform

Higher-order thinking, also known as higher order thinking skills (HOTS), is a concept applied in relation to education reform and based on learning taxonomies. The idea is that some types of learning require more cognitive processing than others, but also have more generalized benefits. In Bloom's taxonomy, for example, skills involving analysis, evaluation and synthesis are thought to be of a higher order than the learning of facts and concepts using lower-order thinking skills, which require different learning and teaching methods. Higher-order thinking involves the learning of complex judgmental skills such as critical thinking and problem solving.

Formative assessment, formative evaluation, formative feedback, or assessment for learning, including diagnostic testing, is a range of formal and informal assessment procedures conducted by teachers during the learning process in order to modify teaching and learning activities to improve student attainment. The goal of a formative assessment is to monitor student learning to provide ongoing feedback that can help students identify their strengths and weaknesses and target areas that need work. It also helps faculty recognize where students are struggling and address problems immediately. It typically involves qualitative feedback for both student and teacher that focuses on the details of content and performance. It is commonly contrasted with summative assessment, which seeks to monitor educational outcomes, often for purposes of external accountability.

The Connecticut Mastery Test, or CMT, is a test administered to students in grades 3 through 8. The CMT tests students in mathematics, reading comprehension, writing, and science. The other major standardized test administered to schoolchildren in Connecticut is the Connecticut Academic Performance Test, or CAPT, which is given in grade 10. Until the 2005–2006 school year, the CMT was administered in the fall; now it is given in the spring.

Education reform in the United States since the 1980s has been largely driven by the setting of academic standards for what students should know and be able to do. These standards can then be used to guide all other system components. The standards-based education (SBE) reform movement calls for clear, measurable standards for all school students. Rather than norm-referenced rankings, a standards-based system measures each student against the concrete standard. Curriculum, assessments, and professional development are aligned to the standards.

An anchor paper is a sample essay response to an assignment or test question requiring an essay, primarily in an educational effort. Unlike more traditional educational assessments such as multiple choice, essays cannot be graded with an answer key, as no strictly correct or incorrect solution exists. The anchor paper provides an example to the person reviewing or grading the assignment of a well-written response to the essay prompt. Sometimes examiners prepare a range of anchor papers, to provide examples of responses at different levels of merit.

Corrective feedback is a frequent practice in the field of learning and achievement. It typically involves a learner receiving either formal or informal feedback on their understanding or performance on various tasks by an agent such as teacher, employer or peer(s). To successfully deliver corrective feedback, it needs to be nonevaluative, supportive, timely, and specific.

<span class="mw-page-title-main">Differentiated instruction</span> Framework or philosophy for effective teaching

Differentiated instruction and assessment, also known as differentiated learning or, in education, simply, differentiation, is a framework or philosophy for effective teaching that involves providing all students within their diverse classroom community of learners a range of different avenues for understanding new information in terms of: acquiring content; processing, constructing, or making sense of ideas; and developing teaching materials and assessment measures so that all students within a classroom can learn effectively, regardless of differences in their ability. Differentiated instruction means using different tools, content, and due process in order to successfully reach all individuals. Differentiated instruction, according to Carol Ann Tomlinson, is the process of "ensuring that what a student learns, how he or she learns it, and how the student demonstrates what he or she has learned is a match for that student's readiness level, interests, and preferred mode of learning." According to Boelens et al. (2018), differentiation can be on two different levels: the administration level and the classroom level. The administration level takes the socioeconomic status and gender of students into consideration. At the classroom level, differentiation revolves around content, processing, product, and effects. On the content level, teachers adapt what they are teaching to meet the needs of students. This can mean making content more challenging or simplified for students based on their levels. The process of learning can be differentiated as well. Teachers may choose to teach individually at a time, assign problems to small groups, partners or the whole group depending on the needs of the students. By differentiating product, teachers decide how students will present what they have learned. This may take the form of videos, graphic organizers, photo presentations, writing, and oral presentations. All these take place in a safe classroom environment where students feel respected and valued—effects.

Teacher quality assessment commonly includes reviews of qualifications, tests of teacher knowledge, observations of practice, and measurements of student learning gains. Assessments of teacher quality are currently used for policymaking, employment and tenure decisions, teacher evaluations, merit pay awards, and as data to inform the professional growth of teachers.

Writing assessment refers to an area of study that contains theories and practices that guide the evaluation of a writer's performance or potential through a writing task. Writing assessment can be considered a combination of scholarship from composition studies and measurement theory within educational assessment. Writing assessment can also refer to the technologies and practices used to evaluate student writing and learning. An important consequence of writing assessment is that the type and manner of assessment may impact writing instruction, with consequences for the character and quality of that instruction.

The Framework for Authentic Intellectual Work (AIW) is an evaluative tool used by educators of all subjects at the elementary and secondary levels to assess the quality of classroom instruction, assignments, and student work. The framework was founded by Dr. Dana L. Carmichael, Dr. M. Bruce King, and Dr. Fred M. Newmann. The purpose of the framework is to promote student production of genuine and rigorous work that resembles the complex work of adults, which identifies three main criteria for student learning, and provides standards accompanied by scaled rubrics for classroom instruction, assignments, and student work. The standards and rubrics are meant to support teachers in the promotion of genuine and rigorous work, as well as guide professional development and collaboration.

References

  1. Great Schools Partnership (2014-01-30). "Standards-Based Definition". The Glossary of Education Reform. Retrieved 2018-05-22.
  2. Standards-Based Assessment, retrieved February 20, 2016
  3. "Leaders of Their Own Learning: Chapter 8: Standards-Based Grading | EL Education". eleducation.org. Retrieved 2018-05-22.
  4. John Hattie, Power of Feedback (PDF), retrieved February 20, 2016
  5. Glavin, Chris (2014-02-06). "Standards-based Assessment | K12 Academics". www.k12academics.com. Retrieved 2018-05-22.
  6. Emily R. Lai, Metacognition: A Literature Review (PDF), retrieved February 20, 2016
  7. British Columbia Ministry of Education, Grade 3 Curriculum Package (PDF), Ministry of Education, British Columbia, retrieved February 21, 2016
  8. New Zealand Ministry of Education, "Mathematics Standards for Years 1-8" (PDF), New Zealand Curriculum Online, Ministry of Education, retrieved February 21, 2016
  9. Carol Dwyer, "Using Praise to Enhance Student Resilience and Learning Outcomes", American Psychological Association, retrieved March 21, 2016
  10. Susan M. Brookhart, How to Create and Use Rubrics for Formative Assessment and Grading, retrieved February 20, 2016
  11. Jackie Schlotfeldt, "Standards-Based Test draws plan for awareness", Valencia County News-Bulletin, January 18, 2006: "With the New Mexico Standards Based Assessments just a little over a month away"
  12. Glavin, Chris (2014-02-06). "High School Graduation Examination | K12 Academics". www.k12academics.com. Retrieved 2018-05-22.
  13. AFT - Hot Topics - Standards-Based Reform: North Carolina Focused Holistic Scoring Guide, The Expository Composition, Grade 7
  14. GRE Update, March 2006: "The Analytical Writing section of the General Test will continue to be scored using the six-point holistic scale in half-point increments"
  15. 1997 WASL math released problems
  16. testimony of Maureen DiMarco to Washington State legislators
  17. At a grading session in Auburn in 2004, no graders could identify some of the words used in some papers
  18. RAND study of Kentucky's KIRIS