One Size Fits None

Russ Hunt

One size fits none: rethinking student course evaluation surveys

Hosted Dialogue Session
St. Thomas University Learning and Teaching Development Office, 4 April 2014

[NOTE: If you'd like more information about any of this, or want to comment on it, please email me. This is part of what I'm hoping will be an article about alternatives in "Course Evaluations."]

Abstract:
Most people have problems with the current form of course evaluations at St. Thomas, and at most universities. Many of the questions seem irrelevant to how you actually teach, the numbers seem a crude measure at best, student written comments are short and cursory if they exist at all, and the way they are usually used -- as fundamental components of promotion and tenure decisions -- gives them disproportionate weight.
Russ Hunt has been thinking differently about them for some years, and he'll introduce some unconventional ways of imagining, creating, and conducting these surveys -- and invite discussion.
Among other things: why not do them online? Why not ask the questions you really need answers to? Why not do them in some way other than taking up time during the last meetings of class? Why not use them as the basis for discussion, and as a way of engaging students in the process? Why not persuade the university to entertain more meaningful alternatives?

Some issues

Conventional surveys (in-class "bubble sheets") weren't helpful to me:

Questions and ratings mostly irrelevant or obvious (many didn't apply to what I do; many could tell me only whether students liked something, not whether, or how, it had helped them to learn)
Discursive comments extremely short and unhelpful (writing in pencil in a too-short class session, with no clear idea of the audience or purpose for the writing, led to perfunctory comments)
Too late to do any good for the students filling it out (or for me, usually)

Solutions? Some strategies: change the questions, change the timing, move the process online. Here are some links to examples of what I've tried. The items marked with asterisks are the ones I tried to discuss during the presentation portion of the dialogue session:

*2006-7 Midterm evaluation form for 3336 (including the online version of the "bubble sheet"): http://people.stu.ca/~hunt/33360607/3336eval.htm

This shows how I tried to re-create the 22-item multiple-choice Scantron sheet in an online format, using a PHP script. It wasn't very successful, partly because the results didn't translate easily into the university's way of compiling them.

2011-12 Final evaluation form for 3336; discursive questions only: http://people.stu.ca/~hunt/33361112/3336eval.htm
2013-14 October (midterm) evaluation form for 2783; discursive questions only: http://people.stu.ca/~hunt/27831314/2783eval.htm
*December prompt setting up midyear feedback process for English 3236 in 2010-11: http://people.stu.ca/~hunt/32361011/3236pt32.htm

What it says is, “I've created a form for providing feedback to me on the way the course is going for you, and how we might adjust things to make it more useful. It will be linked from the course Moodle page later this afternoon. It's anonymous and voluntary: there are seven questions on it that I hope will help you help me think about the structure of the course. I'll leave it up for a week; the way it works is that responses on it come anonymously to me as emails. I'll post all the text I receive on the Web site (along with my responses and comments, where necessary). Please take the time to respond to it; it'll help all of us.”

*Responses to the midyear survey for 3236 in 2010-11, as posted on the course Web site, with my responses: http://people.stu.ca/~hunt/32361011/3236eval.htm
Responses to the final survey for 3236 in 2010-11 as posted on the Web site for the course as offered in 2012-13: http://people.stu.ca/~hunt/32361011/feedback.htm

This shows how the results can be presented, and incidentally also demonstrates one way I've used the results of the feedback survey: to help introduce students to a subsequent offering of the course.

Responses to the end of the first term survey in English 3236, 2012-13, as posted on the course Web site, with my responses: http://people.stu.ca/~hunt/32361213/3236cmts.htm

*The whole process for first term of English 1006, 2013-14

*The prompt in which I set up the mid-first term feedback survey: http://people.stu.ca/~hunt/10061314/1006pt33.htm

At the bottom of the prompt is the section "Where you can talk about how the course is working."

*Responses to the midterm survey, as posted, with comments, on the course Web site: http://people.stu.ca/~hunt/10061314/feedback.htm

Although it says "October," this didn't actually happen till early November. Only five students responded, but as you can see their responses were useful -- especially in that others in the class could see them as well (along with my responses).

*Midyear feedback form for first term of English 1006G, 2013-14: https://docs.google.com/forms/d/1qVq3bQpAH0-8hJQuQrNaBmDnUkXZkpAXEMsctammxWw/viewform

By this point I had learned to use Google Forms rather than a PHP script. I think the format is clearer, and the results were more easily formatted into a Web page. Equally important, I had learned that anonymity could be achieved by putting a link in Moodle: I could check to see that everyone had clicked the link, but could not tell whether they'd filled out the form, or who had said what.

*Midyear responses, reformatted and prepared for Web site (but not commented): http://people.stu.ca/~hunt/10061314/1trmfdbk.htm

I never got round to completing the comments because I resigned my position at STU and another teacher took over the course. Had I continued I'd probably have devoted part of a class meeting to discussing the comments and my responses.

Some miscellaneous notes on course evaluations (notes toward an article on this issue)

In doing research on them, it's useful to remember the range of things they're called, all of which need to be used as search terms: SETs (Student Evaluations of Teaching); SRIs (Student Ratings of Instruction); SRTs (Student Ratings of Teaching); Course Evaluations; Student Course Evaluations; Course Evaluation Surveys . . . etc. There are disputes about terminology -- for example, "evaluations" suggests more authority than "ratings."

Nothing about teaching is more frequently discussed, in print or online. A discussion of one topic -- "SETs under attack again" -- on the POD list generated well over a hundred postings in a matter of days last summer. A discussion of a Wall Street Journal attack on the practice preoccupied the STLHE-L list last November, and the attack was dismissed on the POD list by one of the major figures in the literature about the practice, Mike Theall (see the quote from him below).

Questions that come up regularly:

Are they "reliable"?
Are they meaningful?
Are they affected by extraneous factors like gender, age of instructor, subject matter, sampling errors?
Are there any questions that are more reliable than others?
How can the multiple-choice questions be most fairly (or usefully) weighted?
What role should they play in personnel decisions like hiring, promotion, tenure?
Are there practical alternatives to multiple-choice quantitative forms?
Why not use tests of learning instead of ratings?
Who and what do students think they're for? How can we persuade students they matter?
Who are they really for?
What use are discursive comments (as opposed to black spots in circles)?

And as well:

What can we do to make them more efficient?
How can we prevent them from being misused?
Why, when they're done online, do students not fill them out? (Response rates decline.)

Some not-quite-random quotations:

From the POD network:

Mike Theall, 9 November 2013: "I agree with the last sentence in the abstract . . . but with the change of the wording to '... SETs should not be the only data used to evaluate faculty.' Of course, 'SETs' is an incorrect and misleading acronym that itself adds to the resistance to ratings. Maybe that's why Asher resents having the opinions of "teenagers" registered, and why it's the same old, unsubstantiated complaint we have heard for decades. Many on this list (e.g., Arreola, Berk, Hativa, me) have recommended the use of "student ratings" since ratings are only one source of data used by the real evaluators of faculty performance (peers & administrators).
One other comment about Raoul's book & his eight-part "comprehensive" evaluation process. Among other things, that process requires dialogue and agreement on: 1) defining the roles and requirements for faculty work ... e.g. teaching, scholarship, service, administrative responsibilities; 2) definition of the specific and measurable components for each domain of work; 3) assigning weight to each component; identifying useful sources of data; and 4) finding appropriate measurement devices. I would add that providing faculty and administrators (even student raters) with help/training in providing/understanding/using the data can also improve the overall evaluation system. The mass of the eval/ratings literature far outweighs the few studies that attempt to show bias and/or lack of validity & reliability. If properly handled, reviewing the evidence for these constituencies can also make the evaluation process more fair and accurate. When Raoul & I began developing the "meta-profession" model, his 8-part process was expanded to include evidence-based characteristics of performance related to the 4 roles noted above. Having such a framework can be used for both formative and summative purposes (as described in the Theall-Arreola-Mullinix citations provided in my recent post) and it helps to avoid committee-generated lists of individuals "favorite" items/issues."

Ed Nuhfer, 9 June 2013, responding to “Why not use tests of learning instead of ratings?”: “Student ratings of professors also have high reliability, and attempts to correlate them with tests and grades of unknown reliability will not produce any better results. When one correlates a knowledge survey with a separate measure of learning performance, expect to obtain a correlation that is numerically at about the reliability of the LEAST reliable instrument less .2 to 0.3. Many grades and tests have reliabilities less than .5 -- which means one will often get a reliability of about zip by mindlessly correlating any kind of student ratings with tests and grades of unknown reliability.
So, if your interest is doing student ratings forms that address content, skills and knowledge, use a knowledge survey that focuses on content, skills and knowledge. If you are interested in rating professors, use the conventional student ratings form. But we do need to quit pretending that student ratings have merit as some kind of assessment measure of student learning. They do not.”
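
Nuhfer's rule of thumb above is his own estimate. A related, standard result from classical test theory (Spearman's attenuation formula) says that the correlation observed between two imperfect measures cannot exceed the square root of the product of their reliabilities. The sketch below is only an illustration of that arithmetic: the function names are made up for the example, and the reliability figures and the "error-free" correlation are assumptions, not numbers taken from the posts quoted here.

```python
import math


def max_observable_correlation(reliability_x: float, reliability_y: float) -> float:
    """Ceiling on the observed correlation between two measures,
    given their reliabilities (classical test theory)."""
    for r in (reliability_x, reliability_y):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_x * reliability_y)


def attenuated_correlation(true_r: float, reliability_x: float, reliability_y: float) -> float:
    """Expected observed correlation when the error-free correlation is true_r."""
    return true_r * max_observable_correlation(reliability_x, reliability_y)


# Hypothetical values: ratings with reliability 0.9, a classroom test with
# reliability 0.4, and an assumed error-free correlation of 0.5.
ceiling = max_observable_correlation(0.9, 0.4)
observed = attenuated_correlation(0.5, 0.9, 0.4)
print(f"ceiling: {ceiling:.2f}, expected observed: {observed:.2f}")
```

With these assumed figures, even a perfect underlying relationship could show up as a correlation of no more than 0.6, which is in the spirit of Nuhfer's warning about correlating ratings with tests of unknown reliability.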

Nuhfer, 10 June 2013: “What is really important to remember is that SRIs began in the 1920s and were created by students for student use. They were co-opted by various administrations from 1930 (the 'Improvement of Teaching' movement) through the 1970s (when POD began). Frankly, I think it is absolutely pointless to argue that we should do away with SRIs. We have them because we want to control how they are constructed, and especially because we wanted them to be valid and reliable. A lot of work has gone into that. Throwing them out because a lot of people have forgotten how to use them just means the students will be in charge again. Hands up everyone who likes RatemyProf.com! Because that’s where we’ll be going if we throw out the baby with the bathwater. So maybe the conversation should go back to 'Ok, we are stuck with these. How can we use them well?' and also, 'Ok, how CAN we measure student learning?'”

Angela Linse, 10 June 2013: “We need to focus on helping faculty take ownership of their ratings rather than leaving it entirely to someone else to interpret the numbers, not faceless review committees or local administrators. We also need to be talking to administrators about taking some control of rampant misuse of the data, e.g. that tendency to want everyone to be "above average" over-interpreting differences of a couple of 10ths or 100ths of a point when looking at a faculty member's average ratings, making summative decisions based on rankings of or comparisons between faculty (who no doubt differ along many other dimensions). ARGH! Most of the problems with student ratings seem to me to lie with how the data are used.”
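
Linse's point about over-interpreting differences of a few tenths of a point can be made concrete with a little arithmetic. What follows is a minimal sketch, assuming a five-point scale and two invented sets of ratings (the numbers are hypothetical, not data from any course mentioned here). It computes each mean with a rough 95% interval and checks whether the two intervals overlap.

```python
import math
import statistics


def mean_with_interval(ratings, z=1.96):
    """Return (mean, low, high) using a normal-approximation 95% interval.

    With classes this small the approximation is rough; a t-based interval
    would be wider still.
    """
    mean = statistics.mean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical ratings on a 1-5 scale for two instructors (invented data).
instructor_a = [5, 4, 4, 3, 5, 4, 4, 5, 3, 4]
instructor_b = [4, 4, 5, 3, 4, 4, 3, 5, 4, 3]

a_mean, a_low, a_high = mean_with_interval(instructor_a)
b_mean, b_low, b_high = mean_with_interval(instructor_b)

# Two intervals overlap when neither lies entirely above the other.
intervals_overlap = a_low <= b_high and b_low <= a_high

print(f"A: {a_mean:.2f} ({a_low:.2f} to {a_high:.2f})")
print(f"B: {b_mean:.2f} ({b_low:.2f} to {b_high:.2f})")
print("intervals overlap:", intervals_overlap)
```

In this invented case the means differ by two tenths of a point, yet the intervals overlap, so a ranking of the two instructors based on that gap would rest on very little.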

Some references (I hope there will be more soon. Recommendations welcome. The Nira Hativa book listed below is available from the LTD library):

Stephen L. Benton and William E. Cashin. (2012). “Student Ratings of Teaching: A Summary of Research and Literature.” IDEA Paper #50: The IDEA Center. Online: theideacenter.org

“There are probably more studies of student ratings than of all of the other data used to evaluate college teaching combined. Although one can find individual studies that support almost any conclusion, for many variables there are enough studies to discern trends. In general, student ratings tend to be statistically reliable, valid, and relatively free from bias or the need for control, perhaps more so than any other data used for faculty evaluation.”

Nira Hativa. (2013). Student Ratings of Instruction: A Practical Approach to Designing, Operating and Reporting. Oron Publications.

[ten pages of useful bibliography]

Pia Marks. (2012). Silent Partners: Student Course Evaluations and the Construction of Pedagogical Worlds. Canadian Journal for Studies in Discourse and Writing, 24(1).

“Results indicate that the genre projects an institutionally dominant ideology about teaching and learning in the Faculty of Arts which is at odds with emerging practices. Qualitative analysis suggests that the instrument acts [as] a silent partner for students, mediating pedagogical meaning for them, as well as for instructors, seeking to impose institutionally dominant pedagogies and to influence their pedagogical decisions.”

The April 2 Chronicle of Higher Education has a piece suggesting a way to do an interactive course feedback session. I'll try it, I think, next time I teach.

Peter Filene has a suggestion for dealing with a course that does not seem to be going well. If discussions falter, responses to the readings are cursory at best, and you do not seem to be getting through to students, Filene suggests involving the students in diagnosing the problems. Distribute index cards to the students and ask them to evaluate the course by responding to questions. These can be very general, as in 'What is going well?' and 'What do you think could be improved or changed?'. The questions, alternatively, could be more focused on specific issues: 'How does the difficulty of the readings compare to your other courses?', 'What holds you back from participating in discussions?', etc.

At the end of the exercise, you can collect the index cards, take them home and think about your students’ answers. Or you can shuffle the cards, return them to the students, and have each student read a card aloud, beginning a class-wide discussion of what’s wrong with the class and how the problems can be fixed.

David Gooblar. (2014). “It's Time for a Course Correction.” chroniclevitae.com: https://chroniclevitae.com/news/420-it-s-time-for-a-course-correction?cid=at&utm_source=at&utm_medium=en. Drawn from Peter Filene. (2005). The Joy of Teaching: A Practical Guide for New College Instructors. Chapel Hill: U of North Carolina P, 71-73.
