Oral Exams and First-Pass Grading with ChatGPT

I double down on the value of orals and show how to use ChatGPT to reduce the cognitive load of grading.

[image created with Dall-E 3 via ChatGPT Plus]

Welcome to AutomatED: the newsletter on how to teach better with tech.

Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.

In this week’s piece, I express my love for oral exams (half kidding), and I explain how you can fast-track your grading by using ChatGPT as a first-pass evaluator. But first, I note some changes here at AutomatED.

🔜 Updates from the Team

We at AutomatED have three updates for you all as we approach the end of the fall semester:

  1. We have gotten rid of the learning community. Simply put, there wasn’t enough interest, and we lacked the time to sustain the various conversations/resources. We would like to thank those who joined and tried to help us get it going. Onward and upward.

  2. Starting today, we are returning to weekly newsletters/pieces. We have not been writing as regularly as we would like to. A significant reason for this is that our pieces had slowly evolved into somewhat comprehensive “takes” on big issues, like our guide to discouraging and preventing AI misuse or my recent discussion of how to deal with privacy issues in connection with AI tools. These big pieces are great and we plan to continue writing them going forward (see below), but they suffer from two issues. First, they are costly for us to produce. Second, they are often too verbose for many of our readers to digest given their busy schedules (thanks for this feedback — you know who you are). So, we are going to start releasing short and punchy weekly pieces. They will always include an “idea of the week” and an “AI use-case for your toolbox.” Other components will make occasional appearances, like discussions of noteworthy forthcoming tech developments in the higher ed space.

  3. In two weeks, we will begin releasing Premium pieces. We want to continue writing the aforementioned comprehensive “takes” on big issues in tech and higher ed. In fact, we are developing “guides” like our guide on AI misuse for all major aspects of professorial use of AI, and we want to release them more regularly. We also want to share some of the insights, tools, integrations, and GPTs that we have been developing for clients in our consultations. However, to justify the time investment that they require, we are adding a Premium subscription tier to fund their development. It starts at $5 per month ($50 per year), with referrals offsetting the cost for referrers (see below). You can upgrade at any time from now forward if you’d like. The only older piece classified as Premium is our guide on AI misuse; to access it, you need Premium.

With that said, let’s dive into this week’s idea and AI use-case…

💡 Idea of the Week:
Oral Exams are Underrated

After dealing with a surprisingly high number of AI plagiarizers last year, I centered oral exams in my arsenal of assessments. I am currently in the thick of administering 80 of these 30-minute exams, so you would think I might be having second thoughts… But I am not. I am more convinced than ever that more professors should leverage oral exams. (See here and here for some of our earlier discussions of this topic.)

The core insight guiding me down this road is simple: if suspected AI plagiarism leads a professor to meet one-on-one with a student to assess their mastery of course content — and the degree to which they simply offloaded the thinking behind their submission to an LLM — why not simply assess them this way from the beginning while incentivizing constructive AI use?

Perhaps this would make less sense in a context where AI misuse is expensive because AI use itself is expensive. But that is no longer the case — ChatGPT, Claude, Bard, etc. are easy to access and able to complete a wide range of assignments, per our AI immunity challenge.

Better yet, there is strong evidence that students benefit a great deal from personalized feedback given in a tutorial setting, where they have social and other incentives to absorb it, synthesize it, and deploy it anew. Professors can fold this feedback into their oral exams. And you can have a written part of the assessment sequence leading up to the oral exam, too, if you want students to develop their writing abilities.

This semester, I am teaching two different Philosophy courses, but the structure of the oral exams is similar. First, students write and submit a paper, using AI if they want. Next, they sign up for an oral exam slot via a Calendly link (note: be sure to set Calendly to remind students multiple times about their exams or you will deal with rescheduling headaches when they invariably forget). When the oral exam period starts, I give them qualitative feedback on their papers that is indicative of the questions I will ask them in the exam itself. They listen very carefully, knowing what is coming next. I then ask them customized questions about the main components of their papers, both to check the degree to which they understand the reasoning behind what they wrote and to push them to take their reasoning several steps further in light of noted consequences and objections. Finally, I give them feedback on their performance, along with grades on both their paper and their oral exam.

Here is how I would summarize what I am finding this semester:

  • My students are both very nervous and very appreciative. They report that the oral exam is intimidating, but they think it helps them learn much more than they would have if they had only written their paper and received written feedback on it. The apparent learning differential is large, from their perspective and mine.

  • My students do very well in the oral exam. The few who do poorly are the ones I had already suspected of relying too heavily on AI, and they come to recognize that overreliance themselves as the oral exam proceeds. Lesson learned.

  • My students and I develop a much better relationship from doing the oral exam together. We both get a better sense of each other’s personalities, and they see that I care about them, their specific work, and their progress as learners.

  • Per student, the time I spend preparing for the oral exam (reading their paper and preparing my questions for them), combined with the oral exam itself, is approximately the same as the time I would have spent writing comparably extensive comments on their paper and on a second written assignment comparable to the oral exam.

  • Per minute, I enjoy myself so much more. Writing extensive feedback and sending it into the abyss (even if its absorption is incentivized by being linked to subsequent assignments) is painful in a special way that grinds on me, especially when I have to do a lot of it.

So, I am doubling down: oral exams are underrated, especially if you are worried about AI misuse on take-home written assignments. Pair oral exams with those assignments!

🧰 An AI Use-Case for Your Toolbox:
Using ChatGPT4 as a First-Pass Evaluator

Today, my AI use-case involves grading because I know most of our readers are either falling behind on grading right now or…fell behind long ago, like me.

What follows is a four-step strategy to ease the cognitive load of grading by using ChatGPT4 to give first-pass evaluations of student submissions. The goal is to help you see the virtues and vices of your students’ submissions more quickly by creating a process by which ChatGPT4 delivers its own judgments about the submissions in light of your exemplar and your rubric.

NOTE: this use-case relies on ChatGPT4 (i.e., GPT-4 via a ChatGPT Plus subscription), which costs $20/month. If you don’t have it already, take it from me: it’s worth it. But if you don’t want to splurge yet, use Poe in the meantime.

Step One: Create an Exemplar

It is extremely helpful to show ChatGPT4 an exemplar — namely, an example of what you are asking it to produce. This goes for other LLM prompting contexts, too. Sometimes, LLMs can produce the outputs that you want without any exemplars, but there are very few situations where exemplars don’t lead them to produce better results with less prompting.

So, you should start by creating an exemplar of the sort of grading output that you seek to produce for a given student submission — either (i) an exemplar of what you would store in your own records justifying the grade you assign or (ii) an exemplar of what you would convey to the student.

The assignment I will use for demonstration purposes is the oral exam I discussed in the prior section. For this oral exam, students need to be able to discuss and defend various aspects of a paper that they turn in beforehand. In this paper, they give a premise-conclusion argument (a sequence of claims that jointly justify their thesis) for a substantive philosophical thesis that relates to their major, expected major, expected career, or life. I tell students that they are expected to:

(1) be able to explain the context of your argument and its relevance to your major, expected major, expected career, or life;
(2) be able to explain the meaning and nature of your premises and conclusion;
(3) be able to justify your premises over alternatives (either by elaborating on those justifications found in the paper or by adding justifications not found in the paper);
(4) be able to justify your choice of objection — that is, the objection you chose to respond to in your paper — over the alternatives;
(5) be able to justify your response to your chosen objection in the face of a further rejoinder from a philosopher like your objector; and
(6) be able to diagnose a way in which your paper could be expanded upon or improved.

The abilities I expect students to display in the oral exam.

Thus, in creating my exemplar, I took a specific student’s paper and created six questions sensitive to the content of their paper, each of which corresponded to these six abilities. I then saved these questions in a document named “Exemplar - Oral Exam Initial Questions.”

Step Two: Anonymize or Pseudonymize

The next step is to put your students’ work in a format that complies with ethical and legal rules and guidelines governing student privacy before you convey it to ChatGPT4. What you should do in your educational context depends on its unique features, your own views, etc., but I discuss the broad considerations at play in my earlier piece about privacy.

In my own case, I anonymized student submissions so that they contained nothing personally identifiable. I then labeled them in a safe and secure way that enabled me to determine authorship later. So, for instance, I removed my students’ names from the body of their papers, labeled the papers as AA, AB, AC, etc., and stored a key in a secure location on my hard drive that mapped ‘AA’ to the first student’s name, ‘AB’ to the second student’s name, and so on.
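If you have a big stack of submissions, the file-level part of this pseudonymization is easy to script. Here is a minimal sketch in Python (the folder and file names are hypothetical placeholders, and it assumes each paper lives in its own file in a single folder):

```python
import csv
import itertools
import string
from pathlib import Path

# Hypothetical folder and file names; adjust to your own setup.
papers_dir = Path("papers_original")        # one file per student, named after the student
output_dir = Path("papers_pseudonymized")   # pseudonymized copies go here
key_path = Path("pseudonym_key.csv")        # store this key securely, apart from the papers

output_dir.mkdir(exist_ok=True)

# Generate pseudonyms AA, AB, ..., AZ, BA, BB, ... in order.
pseudonyms = ("".join(pair) for pair in itertools.product(string.ascii_uppercase, repeat=2))

papers = sorted(p for p in papers_dir.iterdir() if p.is_file())

with key_path.open("w", newline="") as key_file:
    writer = csv.writer(key_file)
    writer.writerow(["pseudonym", "original_filename"])
    for paper, pseudonym in zip(papers, pseudonyms):
        # Copy each paper under its pseudonym; the student's name survives only in the key.
        copy_path = output_dir / f"Student Paper - {pseudonym}{paper.suffix}"
        copy_path.write_bytes(paper.read_bytes())
        writer.writerow([pseudonym, paper.name])

# Note: this pseudonymizes file names only. You still need to remove names from
# inside each paper (title pages, headers, metadata) before uploading anything.
```

A sketch of the file-level pseudonymization step.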

Step Three: Prompt and Iterate to Perfection

The third step is to prompt ChatGPT4 to generate what you want, using Advanced Data Analysis to upload your documents.

In my case, I uploaded my file “Exemplar - Oral Exam Initial Questions” alongside an anonymized document “Exemplar - Student Paper,” along with the assignment rubric (in the form of the .pdf I had given students in advance) and the papers I wanted help with, labeled “Student Paper - AA,” “Student Paper - AB,” etc. I paired these files with the following initial prompt:

I am an instructor at a university and I would like your help preparing to give my students oral exams about the papers that they just turned in. Three students' papers are uploaded here and are titled "Student Paper - X", where 'X' stands for their pseudonyms. Their pseudonyms are 'AA', 'AB', 'AC', etc. I need to prepare to give an oral exam to each of AA, AB, AC, etc. based on the content of their papers. Please review the contents of their papers. Next, consider the uploaded files titled "Exemplar - Student Paper" and "Exemplar - Oral Exam Initial Questions", which are a student's paper and the questions I asked them in their oral exam. I want you to help me administer distinct oral exams to AA, AB, AC, etc. that are modeled on the questions in the "Exemplar - Oral Exam Initial Questions" document, which I asked the student who wrote the "Exemplar - Student Paper" document. I also uploaded the rubric, titled "Instructions - Oral Exam Rubric," so that you can understand the structure, format, and expectations of the oral exam. Before I ask you for six initial questions for each of AA, AB, AC, etc., I want to be sure we are on the same page: can you tell me the six kinds of initial questions that students should come prepared to answer (they are listed in the Rubric document)?

My initial prompt.

It did a good job at this point, repeating back to me the substance of the block quote from Step One above, with the grammar adjusted for the context.

If it hadn’t, I would have needed to correct it at this point. Always correct each step in the process before proceeding to the next one.

Next, I prompted ChatGPT4 as follows:

Perfect. Now, in alignment with my prior prompt, please supply me with six initial questions for AA that (i) fit these six descriptions, (ii) are modeled on my exemplar questions, and (iii) are specific to AA's paper.

My second prompt.

ChatGPT4’s answer was not sufficiently specific or detailed, so I corrected it in various ways in my next prompt. I urged it to:

  • be more specific, offering questions that turned on details unique to AA’s paper;

  • challenge AA to defend themselves by attempting to show that AA was committed to various positions (rather than leaving it open to AA to answer in any way they pleased); and

  • critically analyze AA’s paper by developing questions and counterexamples that delve deeper into the consistency of AA's logic and the independent plausibility of their claims.

After a few iterations, ChatGPT4 was creating excellent questions, but it had lost the thread on the need for six initial questions matching my rubric. So, I gave it this correcting prompt:

These are excellent. However, they don't fit the six initial question kinds outlined in my Instructions document. Create six initial questions for AA that fit the kinds outlined in the Instructions document. If some of these questions are alternative ways to question AA about how they justify their premises, for instance, then list them as alternatives under that kind (i.e., have two questions under #3). The most important thing is to have at least one excellent question for each of the six initial question kinds.

My fifth-ish prompt.

The resulting questions were excellent.

Step Four: Repeat for Each Student

After I had ChatGPT4 firing on all trillion+ parameters, I got it to produce sample questions for AB, AC, and AD, in sequence. Then I uploaded batches of 4 student papers, along with the following prompt:

Now I want to upload four new students’ papers, and I want you to produce six initial questions for each one just like you have above. The students' names are AE, AF, AG, and AH. I will upload them here. If you could produce all 24 questions in a sequence of outputs, that would be great.

My first repetition prompt.

And I was off to the races. Rinse, repeat.
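As an aside, if you would eventually rather script this repetition than upload batches by hand, a minimal sketch using the OpenAI Python API might look like the following. The model name, file names, and prompt wording here are illustrative assumptions rather than the exact workflow above, and it presumes the pseudonymized papers have been converted to plain text:

```python
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Hypothetical plain-text versions of the rubric and exemplar documents.
rubric = Path("Instructions - Oral Exam Rubric.txt").read_text()
exemplar_paper = Path("Exemplar - Student Paper.txt").read_text()
exemplar_questions = Path("Exemplar - Oral Exam Initial Questions.txt").read_text()

system_prompt = (
    "You help a university instructor prepare oral exams about student papers. "
    "Model your questions on the exemplar questions and follow the six initial "
    "question kinds described in the rubric.\n\n"
    f"RUBRIC:\n{rubric}\n\n"
    f"EXEMPLAR PAPER:\n{exemplar_paper}\n\n"
    f"EXEMPLAR QUESTIONS:\n{exemplar_questions}"
)

# Loop over the pseudonymized papers and request six questions per student.
for paper_path in sorted(Path("papers_pseudonymized").glob("Student Paper - *.txt")):
    pseudonym = paper_path.stem.split(" - ")[-1]
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any GPT-4-class model should work here
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": (
                    f"Here is {pseudonym}'s paper:\n\n{paper_path.read_text()}\n\n"
                    "Produce six initial oral exam questions for this student, one for "
                    "each of the six question kinds in the rubric, specific to this paper."
                ),
            },
        ],
    )
    print(f"=== {pseudonym} ===\n{response.choices[0].message.content}\n")
```

A hypothetical scripted version of the repetition step; it trades the web interface’s back-and-forth iteration for speed, so spot-check its output against your rubric.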

Crucially, ChatGPT4 was not perfect in its responses. However, I find that it significantly reduces the cognitive load for me as I develop my questions for my students’ oral exams. Generally, it is on target, notes important considerations and avenues of questioning, and eases the “gear shifting” I must undertake to move from one paper to another on a radically different topic (e.g., from a paper about voting systems in North Carolina to a paper about the permissibility of “Enhanced Interrogation Techniques” to a paper about the benefits of being grateful).

When I am giving 6-9 oral exams a day, with 15-minute breaks between them, this is incredibly valuable.

In next week’s piece, I will discuss evolving this strategy into the development of custom GPTs…

⬆️ How to Access Premium

Late in the fall of 2023, we started posting Premium pieces every two weeks, consisting of comprehensive guides, releases of exclusive AI tools like AutomatED-built GPTs, Q&As with the AutomatED team, in-depth explanations of AI use-cases, and other deep dives.

So far, we have three Premium pieces:

To get access to Premium, you can either upgrade for $5/month (or $50/year) or get one free month for every two (non-Premium) subscribers that you refer to AutomatED.

To get credit for referring subscribers to AutomatED, you need to click on the button below or copy/paste the included link in an email to them.

(They need to subscribe after clicking your link; otherwise, their subscription won’t count for you. If you cannot see the referral section immediately below, you need to subscribe first and/or log in.)