Grades and Grading - The Intuitive Language Teacher

If Tasks should form the backbone of the communicative curriculum, then what kinds of Tasks are there?

This is the next step in Bill VanPatten’s discussion of Tasks in his book While We’re on the Topic (ACTFL, 2017).

VanPatten distinguishes between input-oriented Tasks and output-oriented Tasks. Input-oriented Tasks are Tasks in which the learners interpret communication and do not create meaning. Output-oriented Tasks, on the other hand, require learners to express (or create) meaning.

VanPatten envisions using primarily input-oriented Tasks with beginning students and reserving output-oriented Tasks for students who already have a significant amount of language. As he puts it,

Input-oriented Tasks allow for communication when learners have limited expressive ability. Output-oriented Tasks allow for communication when learners have more expressive ability.

The challenge for teachers is to envision and create input-oriented Tasks because we have been trained to think of Tasks as output.

Another characteristic of Tasks is that they are structured. “They have steps – a procedure – that guides students and lets them know when they have finished.” (2017, p. 88)

Tasks also have an immediate concrete informational goal.

Tasks can be project-based, but they don’t have to be. It should be noted that all of VanPatten’s examples of project-based Tasks are output-oriented. This raises the question of whether project-based Tasks could be input-oriented.

Tasks can also progress from simple to complex.

VanPatten now comes to the question of Working With Tasks.

The first decision that the teacher must make is how to incorporate Tasks into the curriculum. There are, according to VanPatten, two options:
1. Drop them in at points that make sense thematically or
2. Let them drive the curriculum.

As the full title of the chapter (“Tasks Should Form the Backbone of the Communicative Curriculum”) makes clear, VanPatten prefers the second option, although he concedes the usefulness of the first.

Another way to Work With Tasks is to use them as measures of proficiency development. Here, VanPatten suggests using Tasks instead of the ACTFL can-do statements. Since, according to ACTFL, the can-do statements were never intended to be used for assessment, this means that Tasks are much more useful to the classroom teacher.

And so, Tasks can be used as alternatives to traditional testing.

Finally, VanPatten comes to the point of the chapter:

“Tasks can also form the backbone of the curriculum by driving the content of the course. This means abandoning textbooks and traditional classroom approaches and forming units around Tasks.”

In practice, this involves planning backward from the Task to the language and other information that students will need to complete it. In VanPatten’s simplest terms, the steps are:

1. Select the Task you want to be the “goal”.
2. Determine what students need to know and know how to do in order to complete the Task.
3. Develop activities [not Activities] and mini-tasks that work on what they [students] need to know and know how to do so that they work toward the goal.

Preparing students to complete a Task means providing them with what they need to complete it and avoiding too much extraneous material. As VanPatten notes, students may need only certain verb forms for a Task, and that is all right because acquisition is slow and piecemeal anyway.

To end the chapter, VanPatten suggests the following Implications for Language Teaching:

Exercises and Activities are not the foundation of communicative or proficiency-oriented language teaching.

This is because Exercises and Activities are not communicative – or are partially communicative at best – in nature and therefore are not particularly useful.

Textbooks and commercial materials need to move away from Exercises and Activities as the staples of learning and make Tasks central to classroom activities.

This is obviously a Call To Action. Textbook publishers will adapt and change textbooks when the demand changes. They respond to the market. Therefore, VanPatten’s call is to teachers, departments, and districts to begin demanding textbooks that are based on second language acquisition research and not just a reworking of the traditional grammar syllabus.

Instructors need alternative means to assess students and perhaps even move away from “assigning grades” to students at the end of the semester.

VanPatten states that “we need alternatives to traditional testing and grading”. While this sounds like a radical call, once one knows the history of grades and grading in education, it is really a call to return to practices that used to be common. Did you know, for example, that “grades” used to be narrative? That is, the instructor described what the student could do at the end of the course. A return to this sort of “grade” would require a wholesale overhaul of the education system, but that might not be such a bad thing. Teachers I know are not opposed to reforming the education system; they are opposed to the schemes and machinations of the current “education reformers” because of the nature of those schemes and machinations. But we won’t go further into that political quagmire.

Next week we will begin taking a look at what I consider VanPatten’s most problematic chapter in his book.

Some time ago, I did research into grades and grading in the United States. It was in conjunction with inquiry into teaching practices in foreign language instruction. The research was interesting and gave me insight into the arbitrary and subjective nature of grades and grading in schools.

I am providing a summary of my findings for anyone who is interested and licensing this document under a Creative Commons Attribution 4.0 International License. You are free to copy it, adapt it, and distribute it as long as you give proper credit to me for it.

Traditional Grading

Prior to the 1800s, “grading” was narrative and descriptive, i.e. the teacher summarized what the student was able to do (and not able to do).
“Prior to that time [1785], U.S. colleges employed the Oxford and Cambridge model, in which students attended regular lectures and engaged in a weekly colloquy with their proctor, in writing and in person. The students were determined to have completed the course when the proctor, and sometimes a panel of other professors, decided they had demonstrated an adequate mastery of the subject. There was no grade. The only way for a potential employer to compare students’ credentials was on the basis of letters of recommendation.” (Palmer)
Cambridge and Oxford Universities have continued this practice into contemporary times, although some accommodation to “points” has been made.

Grading Innovation

Yale University in 1785 was the first institution to assign grades: Optimi, Second Optimi, Inferiores (Boni), Pejores. These were descriptive adjectives in Latin (Best, Second Best, Worse [Good], Worst).
Mount Holyoke College in Massachusetts is the first institution to have records of a letter-grade system (Palmer):
1. A = 95-100; B = 85-94; C = 76-84; D = 75; E = 0-74 (1897) [E = Failure]
2. A = 95-100; B = 90-94; C = 85-89; D = 80-84; E = 75-79; F = 0-74 (1898)
In the 1800s, there was a great deal of variation in the grading “scales”.
Yale experimented with four- and nine-point numerical scales (Palmer) but did not correlate these to letter grades. (Durm)
Harvard, which first used numerical grades in 1830 (Durm), tinkered with 20- and 100-point scales before settling on five “classes”, with the lowest class failing the course. (Palmer)
1. 1877: Division 1 = 90+; Division 2 = 75-89; Division 3 = 60-74; Division 4 = 50-59; Division 5 = 40-49; Division 6 = below 40
2. 1884: five classes, the lowest of which includes those who failed the course
3. 1895: three classifications – Failed, Passed, Passed with Distinction
William and Mary College had four groupings with descriptors (e.g. “orderly, correct, and attentive” and “they have learnt little or nothing”) to guide faculty in classifying students. (Palmer)
The University of Michigan tried several systems. (Durm)
  1. A numerical system
  2. Pass-no pass system (1851)
  3. Pass-conditional-no pass system (1860)
  4. 100-point system (shortly after 1860)
  5. P = Pass; C = Conditioned; A = Absent (1867)
  6. Passed, Incomplete, Conditioned, Not Passed, Absent
In the early 1900s the wide variation in grading practices fueled a movement away from the 100-point scale to scales that had fewer and larger categories.
1. Excellent, Average, Poor
2. Excellent, Good, Average, Poor, Failing
3. A,B,C,D,F
The 100-point scale did not become widespread again until the early 1990s, and its use has increased with the prevalence of computers that make this grading scale easier to manage.
When the 100-point scale was first instituted, the average grade was 50, and any score outside the 25-75 range was very rare. [N.B.: This seems to parallel the French model in which it is virtually impossible to score 20/20. As one professor put it: “20 is reserved for God because only He is perfect. 19 is reserved for me because I am the professor. The absolute best any student can score is 18. Otherwise, you shouldn’t be in this course.” (Richard Harrell, personal conversation)]
Cut-off points are arbitrary and show wide discrepancies.
1. A = 100-96; B = 95-86; C = 85-76; F = 75 and below
2. A = 100-90; B = 89-80; C = 79-70; D = 69-60; F = 59 and below
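To make the discrepancy concrete, here is a minimal sketch that runs the same scores through the two cut-off scales listed above. The function name and the sample scores are my own illustrative choices, not taken from any of the sources.

```python
# The two cut-off scales quoted above, as (minimum score, letter) pairs
# listed from highest to lowest.
SCALE_1 = [(96, "A"), (86, "B"), (76, "C"), (0, "F")]
SCALE_2 = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def letter_grade(score, scale):
    """Return the letter that a 0-100 score earns under the given scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, letter in scale:
        if score >= minimum:
            return letter


# Sample scores chosen only for illustration.
for score in (95, 85, 75, 65):
    print(score, letter_grade(score, SCALE_1), letter_grade(score, SCALE_2))
```

With these sample scores, a 95 is a B on the first scale and an A on the second, and a 75 is an F on the first and a C on the second. The student’s performance is identical; only the arbitrary cut-off changes.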
Computer grading programs are designed and created by computer programmers, so they incorporate scales that appeal to technicians, specifically percentages.
The 100-point scale seems to come mainly “from the increased use of technology and the partialities of computer technicians, not from the desire of educators for alternative grading scales or from research about better grading practice.” (Guskey) There is no pedagogical or educational reason for the 100-point scale.

Contemporary Grading

There is great variation in grading scales, indicating the subjective nature of grades.
1. Oxford University has a “final exam” that students must pass in order to receive a degree. There is no cumulative grade point average. To receive a degree from one of the world’s foremost institutions of higher learning, a student must receive 30/100. Yet, 30/100 is abject failure in most classrooms in the US. Would anyone maintain that public schools in the US are more rigorous than Oxford University because their cut-off score for passing is higher?
2. 50/100 can mean different things. On the GRE (Graduate Record Exam) physics exam, 50/100 places the student ahead of 70% of all individuals who take the exam. Only about half of all individuals who take the GRE literature exam score 50 percent correct, so this score for literature is “average”. In most classrooms in the United States, this is a failing grade. So, are the majority of prospective graduate students failures?
3. Point, percentage or letter grades do not give students feedback (or “washback”) on specific aspects of the assessment. (N.B.: Some institutions are returning to the narrative for grading in order to address this issue; cf. Johnston College at University of Redlands.)

Grading Problems

Even experts grade the same work quite differently, despite attempts at “calibration”, cf. studies like Hunter Brimi (2011) in which teachers who had received 20 hours of training in writing assessment gave widely divergent scores to writing samples. (Guskey)
If using a 100-point scale (i.e. percentages), can the teacher genuinely discern the difference between a 79-point performance and an 80-point performance on an essay, speaking assignment, playing task, etc.? Yet there is a significant difference in perspective between a C+ grade and a B- grade [and even more between A- and B+] in the most common grading systems.
Attempts to “improve” the ability to make these fine distinctions often result in onerous tests that have the illusion of objectivity and test things that do not and cannot demonstrate genuine acquisition, proficiency, or performance.
Many people question the current system by pointing out that it identifies 60 or more distinct levels of failure but only 40 levels of success.
1. Nearly two thirds of the grading scale describes failure.
2. Distinguishing 60 levels of failure is not helpful to anyone; besides, is anyone genuinely concerned about whether they “barely failed” or “grossly failed” under this system?
3. If no one uses these 60 different levels of failure, why have them? 100 is an arbitrary number.
4. This scale implies that degrees of failure can be more finely distinguished than degrees of success. Is this where our focus ought to be?
Using a 100-point scale implies a level of grading accuracy that doesn’t exist. On a 20-item assessment of student learning, the standard error of measurement is plus or minus two items. Each item is worth 5 percentage points, so that is plus or minus 10 points, a band 20 percentage points wide, i.e. an error of measurement of at least two letter grades. (Guskey)
Statistical error makes it far more likely that a student will be “misclassified as performing at the 85-percent level when his true achievement is at the 90-percent level” than of being mistaken for Average when his performance is Excellent. (Guskey)
“Overall, the large number of grade categories in the percentage grading scale and the fine discrimination required in determining the differences among categories allow for the greater influence of subjectivity, more error, and diminished reliability. The increased precision of percentage grades is truly far more imaginary than real.” (Guskey)
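The arithmetic behind the 20-item example can be sketched as follows. The raw score of 16 correct is a hypothetical value I chose for illustration; the 20 items and the error of plus or minus two items come from the Guskey example above.

```python
def score_band(correct, total_items, error_items):
    """Return (low, observed, high) percentage scores for a raw score
    whose measurement error is plus or minus error_items items."""

    def to_percent(items):
        return 100 * items / total_items

    # Clamp the band so it cannot fall below 0 items or exceed the total.
    low = max(correct - error_items, 0)
    high = min(correct + error_items, total_items)
    return to_percent(low), to_percent(correct), to_percent(high)


# Hypothetical student: 16 of 20 items correct, error of +/- 2 items.
low, observed, high = score_band(correct=16, total_items=20, error_items=2)
print(low, observed, high)  # 70.0 80.0 90.0
print(high - low)           # 20.0
```

Under this assumption, a reported 80 percent is consistent with anything from 70 to 90 percent. On a ten-point letter scale, that band runs from a C to an A.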
Many teachers give points for non-academic work, items that rightly belong on a description of work habits or citizenship, thus giving a false grade.
1. If students receive a grade or score for homework, this puts the lie to the primary justification of homework, which is practice. People are supposed to make mistakes when they practice, so it is unjust to penalize students for their mistakes in practice.
2. If students receive a grade for simply completing their homework, this reflects work or study habits, not academic performance, and it means that the overall grade is not reflective of the student’s proficiency or knowledge. The inequity works in both directions: students who do not do homework but demonstrate knowledge and proficiency on assessments are penalized for failure to do something they don’t need to do, whereas students who cannot demonstrate the requisite knowledge or skill receive a “grade boost” for doing something that has not otherwise benefitted them. Often this is done under pressure to reduce the D/F rate so that the school looks good.
3. The problems described above and the Common Core State Standards (as well as the ACTFL Standards and others) have given rise to “Standards-Based Assessment”. This is an attempt to grade in a way that circumvents the most egregious problems, but does it accomplish the task? Does it even address certain key criticisms of grading?
Students learn how to game the system and often put far more effort into finding ways to get around the system than in actually learning the content. “To put it another way, the importance of grades as a currency for moving through the educational system [has] partly superseded the pedagogical purpose they … serve. If learning sometimes [has] to occur, so be it; but otherwise, students [will] do the least amount of work possible in order to attain the token of highest value.” (Schneider & Hutt)
Grades do not necessarily promote learning. According to one study, “an emphasis on grades encourages cheating, restricts study to material likely to be on the test, and encourages students to conform on tests and in the classroom to the instructor’s views and opinions”. (Schneider & Hutt) Far from promoting learning and “higher-level thinking skills”, grading – especially in the age of standardized tests – often reduces classroom engagement to the lowest level necessary to “get by” and obtain “the token of highest value”.
Grades are often arrived at arbitrarily or unfairly. Students are highly sensitive to unfairness, especially when it touches them directly, and this leads to resentment, loss of interest, confusion, and conflict.
As external validation, grades often become ends in themselves. Many have addressed the problem of external motivation. See, for example, Punished by Rewards by Alfie Kohn.

Quotes about grading

“Percentage grading systems that attempt to identify 100 distinct levels of performance distort the precision, objectivity, and reliability of grades. They also create unsolvable methodological and logistical problems for teachers. Limiting the number of grade categories to four or five through an integer grading system allows educators to offer more honest, sensible, and reliable evaluations of students’ performance. Combining the grade with supplemental narrative descriptions or standards checklists describing the learning criteria used to determine the grade further enhances its communicative value.” (Guskey)
“As reformers worked to develop a national school system in the late nineteenth century, they saw grades as useful tools in an organizational rather than pedagogical enterprise …” (Schneider & Hutt) In other words, the grading system was developed for the sake of the educational organization, not because it has any pedagogical value.
After World War II, “Businesses were interested in the grades of graduates they were hiring. And as a result of this, schools not only had to offer grades as a means of extrinsic motivation for students compelled to attend, but also had to ensure their grades provided a readily interpretable message to future teachers, schools, and employers about the quality of the student. The implication was that grades were an accurate measure of both aptitude and achievement.” (Schneider & Hutt) In other words, education served the interests of business rather than the common good.
“In order for grades to be useful as tools for systemic communication – allowing for national movement, seamless coordination, and seemingly standard communication to parents and outsiders – they had to be simple and easy to digest. Yet that set of characteristics often conflicts with learning because the outcomes of learning are inherently complicated and messy.” (Schneider & Hutt)
Teachers “must find a way to work within a system that is universally accepted … And, at the same time, they must find a way to keep students focused on learning and not merely on a set of measurable outcomes loosely connected to the process of education.” (Schneider & Hutt) This last statement reflects the education system as it exists and imposes its strictures and conditions on the individual classroom teacher, not as education ought to be. It places additional burdens on the classroom teacher rather than supporting the teacher as pedagogue or instructor.

Resources

Durm, Mark W. “An A Is Not an A Is Not an A: A History of Grading” in The Educational Forum, vol. 57, Spring 1993. http://www.indiana.edu/~educy520/sec6342/week_07/durm93.pdf

Guskey, Thomas R. “The Case Against Percentage Grades” in Educational Leadership, vol. 71, num. 1, September 2013, pp. 68-72. www.be.wednet.edu/cms/lib2/WA01001601/Centricity/Domain/18/SBG/EL13%20Percentage%20Grades.pdf

Kohn, Alfie. Punished by Rewards. Boston: Houghton Mifflin, 1993/1999.

Palmer, Brian. “E Is for Fail” in Slate, August 9, 2010. http://www.slate.com/articles/news_and_politics/explainer/2010/08/e_is_for_fail.html

Schneider, Jack and Ethan Hutt. “Making the grade: a history of the A–F marking scheme” in Journal of Curriculum Studies, 2013. DOI:10.1080/00220272.2013.790480 terpconnect.umd.edu/~ehutt/Making_the_Grade.pdf

Creative Commons License

