Gurus: "Prompting is easy!" Reality: "No, it ain't."
Plus, the schedule for our April 20 webinar on training students to use AI.
[image created with Dall-E 3 via ChatGPT Plus]
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
In this week’s piece, I push back on the narrative that prompting LLMs is easy, intuitive, and obvious by providing evidence from a range of experts that it isn’t, at least for use cases of any complexity.
👀 On Saturday, April 20th:
Our “Train Your Students to Use AI” Webinar
In our recent polls, our subscribers — maybe even you! — made their preferences clear for our next webinar.
We listened!
Our next Zoom webinar will focus on training students to use AI, and it will occur on Saturday, April 20th from 12pm to 1pm Eastern Daylight Time. I will be the host, as was the case for our last webinar on custom GPTs.
The price is $69.
✨Premium subscribers get a 10% discount code, included below.
Webinar Schedule
Pre-Webinar Activity: Learning Objectives Reflection
Objective: Position participants to assess and align their courses’ learning objectives with the potential integration of AI technologies.
Activity Description: Prior to the webinar, attendees will be tasked with reviewing their course syllabi, with a focus on their learning objectives. They will consider the relevance of AI in achieving these objectives, taking into account their students' future careers and the role AI might play in those fields.
Resources: A checklist adapted from our comprehensive Guide to Designing Assignments and Assessments in the Age of AI to help identify which learning objectives might benefit from AI integration.
Webinar Part 1: Learning Objectives Alignment
12:00-12:15
Objective: Equip educators with strategies to integrate AI into their teaching practices to enhance learning objectives.
Content:
Frameworks for integrating AI in ways that support deep learning and understanding.
Case studies exemplifying alignment between learning objectives and AI use.
Strategies, grounded in the forthcoming edition of our comprehensive Guide, for discouraging and preventing students from using AI to circumvent learning objectives.
Webinar Part 2: Evaluating AI Outputs
12:15-12:30
Objective: Develop educators' ability to teach students to critically evaluate AI outputs.
Content:
Heuristics for determining the degree to which students need AI-independent expertise, judgment, knowledge, and skills to evaluate the outputs of AI.
Discussion of core components of AI literacy: operation, biases, hallucinations, sourcing, externalities.
Webinar Part 3: Determining AI Tools
12:30-12:45
Objective: Assist educators in selecting appropriate AI tools for their learning objectives.
Content:
Overview of various AI tools and their educational applications, highlighting considerations for selection.
Discussion of practical and ethical obstacles to AI tool use, as well as solutions sensitive to institutional setting and professorial/student preferences.
Webinar Part 4: Technical Training
12:45-1:00
Objective: Prepare educators to effectively train students in the hands-on use of AI tools.
Content:
Demonstration of effective training techniques for AI tools, with a focus on the technical features common to many tools that students struggle to navigate.
Personal experiences from training students to use AI for philosophy.
Post-Webinar Recording & Resources
💡 Idea of the Week:
Promptin’ Ain’t Easy
If you use social media, and if you are interested in AI and education, you can’t miss various higher education AI gurus — influencers? — constantly offering a range of hot takes on AI.
Many of these hot takes are versions of this schema:
“Everyone wants you to think that [AI-related issue] requires knowledge or expertise. But it’s simple! With a bit of common sense and your current skillset, you can intuit it with little to no effort. Here’s how!”
One recently popular hot take is that the prompting of LLMs is intuitive.
“You don’t need any fancy methods, any secret phrases, or any coding skills to prompt effectively.”
Another is that prompting requires exactly those skills inculcated by humanities degrees.
“The timeless but contextually sensitive skills that we teach — including cross-cultural understanding and domain-insensitive analytical abilities — are precisely what our students need to communicate effectively with a chatbot.”
While I am a professor in the humanities (philosophy), I am sorry to report: these are misleading. They are false for the most important use cases of LLMs. But there is a kernel of truth in both. Let me explain why.
Let’s start with the positive or charitable. On the one hand, basic and uninformed prompts can get amazing results in some use cases. Part of the reason LLMs have taken the world by storm is that almost anyone can use minimal inputs to get useful outputs from them for a range of tasks, whether brainstorming, basic email drafting, or text summarization.
Regarding the second hot take, users of AI tools that have conversational interfaces do benefit from having some of the “soft” skills inculcated by the study of the humanities. (And obviously I’m a big believer in the humanities in general.)
On the other hand, many use cases — and, in particular, those that are significantly complex or involve getting AI tools to produce outputs sensitive to domain-specific nuances — are not as straightforward.
But don’t take my word for it! One thing is for sure: I’m not trying to out-guru the gurus.
Neither the empirical research nor folks building custom AI tools relying on LLMs agree that prompting LLMs is intuitive (or correlates in any meaningful way with having a humanities background).
What Do the Prompting Researchers Say?
First, a bit of background.
“Zero-shot” prompting is when you prompt an LLM to complete a task without providing any examples or illustrations (“exemplars”) of what success amounts to.
My prompt: I want you to classify the quoted text into neutral, positive, or negative: “Earth is a planet.”
LLM: …
The LLM will respond based on how effectively your instructions, which contain no exemplars, describe what you are looking for. In this case, I ask for a quoted sentence to be classified into one of three categories, each described with a single word, but I don’t illustrate what those categories apply to. That is, I don’t define the words with exemplars.
Zero-shot prompting contrasts with “few-shot” prompting, which is when you provide, in your prompt, a few examples of success.
My prompt: “Earth is a planet.” Classification: Neutral. “I love pandas.” Classification: Positive. “I hate bamboo.” Classification: Negative. “Mountains are taller than hills.” Classification:
LLM: Neutral.
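The contrast between the two prompt styles above can be sketched in code. This is a minimal illustration, not any particular library's API: the function names and the exemplar list are hypothetical, and in practice the resulting strings would be sent to an LLM.

```python
# A minimal sketch of zero-shot vs. few-shot prompt construction.
# Helper names and exemplars are hypothetical, for illustration only.

def zero_shot_prompt(text: str) -> str:
    """Instructions only: no exemplars showing what success looks like."""
    return (
        "I want you to classify the quoted text into neutral, positive, "
        f'or negative: "{text}"'
    )

def few_shot_prompt(text: str, exemplars: list[tuple[str, str]]) -> str:
    """Prepend labeled examples so the model can infer the categories."""
    shots = "\n".join(
        f'"{sentence}" Classification: {label}.' for sentence, label in exemplars
    )
    return f'{shots}\n"{text}" Classification:'

exemplars = [
    ("Earth is a planet.", "Neutral"),
    ("I love pandas.", "Positive"),
    ("I hate bamboo.", "Negative"),
]

print(zero_shot_prompt("Mountains are taller than hills."))
print(few_shot_prompt("Mountains are taller than hills.", exemplars))
```

The few-shot version ends mid-pattern, on "Classification:", so the model's natural continuation is a label in the style the exemplars established.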
Zero-shot prompting is the relevant type for a lot of tasks, like lesson planning for an entirely new course context, because you often won’t have the right sort of output available to you in advance — that’s why you are asking the LLM for help.
Researchers have found that
[…] prompt variations, sometimes counterintuitive to humans, can result in significant changes in model output and are addressed through prompt engineering, which involves designing natural language queries to guide LLMs responses effectively.
So, what are some of the prompt variations that are effective?
“Chain of thought” (CoT) prompting, which is when you encourage the LLM to reason in a sequence of steps, “improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking”, as the authors of a seminal paper on the technique report.
And what does CoT prompting amount to? Here’s a one-shot example from the above-quoted (and above-linked) paper by Wei et al.:
Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, 2022
The CoT method that improves the output of the LLM is manifested in a prompt that contains an exemplar solution that expresses how the authors would expect the LLM to reason through the Roger problem.
But one striking thing about CoT is that researchers have also found that zero-shot prompting is radically improved by simply inserting the phrase “Let’s think step by step” (and nothing else!) before each answer.
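In code, the zero-shot CoT trick is almost embarrassingly small, which is part of what makes it so counterintuitive. The sketch below shows the shape of it; the function name is hypothetical, and the wrapped string would be sent to an LLM in practice.

```python
# Zero-shot chain-of-thought: append one trigger phrase before the answer,
# and nothing else. The wrapper function is hypothetical.

COT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Frame a question so the model is nudged to reason before answering."""
    return f"Q: {question}\nA: {COT_TRIGGER}"

print(zero_shot_cot("If I have 3 apples and buy 2 more, how many do I have?"))
```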
The effectiveness of this method isn’t something you would have intuited!
Here’s another one. Recently, researchers have found that zero-shot prompting using role playing is more effective at getting LLMs to reason successfully than many other methods, including CoT. They give this example:
There is also evidence that LLMs that are encouraged to play a range of context-relevant roles are better at summarizing text, at least when evaluated by human judges.
We at AutomatED have advocated for role-based prompting for nearly a year, but only because I stumbled upon the method in my own experimentation. It was only through trial and error with the LLMs that I realized that my encouragements to the LLMs to play a role made a difference, even when my prompts contained no real insights (just like the Kong et al. case involving Xavier and Cole above). And I had to read the literature to find out about CoT nuances.
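Role-based prompting is simple to express in the chat-message format most LLM APIs use: assign the persona up front, then give the task. This is a hedged sketch, assuming a generic system/user message structure; the helper name and the example role text are mine, not from any specific tool.

```python
# Role-based prompting sketched as a chat-style message list:
# a "system" message assigns a context-relevant persona, then the
# "user" message carries the actual task. Role text is hypothetical.

def role_messages(role: str, task: str) -> list[dict]:
    """Build a message list that sets a role before stating the task."""
    return [
        {"role": "system", "content": f"You are {role}."},
        {"role": "user", "content": task},
    ]

msgs = role_messages(
    "an experienced philosophy professor designing an intro ethics course",
    "Draft three discussion questions on the trolley problem.",
)
```

The same task string, with and without the system message, can produce noticeably different outputs, which is exactly the trial-and-error discovery described above.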
What Do the Developers Say?
Speaking of my own experimentation, in developing our Course Design Wizard custom GPT, I read a lot of information about other developers’ efforts.
Note: if you are a new subscriber, custom GPTs are “custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills” that you can create. Their instructions are like a meta-prompt, appended to the start of any conversation a user has with them. Their knowledge is the same, but in the form of files. OpenAI released the functionality in November. We have advocated for professors to consider using them to evaluate major assessments or for in-class activities. They were the topic for our last webinar.
Unfortunately, I have to hunt down much of the relevant information on creating custom GPTs on Twitter (X) and various online forums. OpenAI’s documentation is largely useless.
I learned very quickly that developers have found a range of counterintuitive methods effective for prompting custom GPTs.
For instance, here’s Nick Dobos, the developer of the most popular coding custom GPT (Grimoire), arguing against using custom GPTs’ knowledge files and instead using files accessible only by Code Interpreter (which lets the GPT run Python code in a controlled environment):
Is this intuitive? Unless you are quite familiar with custom GPTs and other aspects of LLMs (i.e., methods like retrieval-augmented generation or “RAG” that they use to insert text from files you upload into their prompts), you aren’t going to arrive at Dobos’ position. In fact, you won’t even understand what he’s talking about!
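To make the RAG reference above concrete, here is a deliberately oversimplified sketch of the idea: pick the knowledge-file chunk most relevant to the user's question and splice it into the prompt. Real systems score relevance with vector embeddings, not word overlap, and all the names and strings here are hypothetical; the point is only the shape of the technique.

```python
# A toy sketch of retrieval-augmented generation (RAG). Production
# systems use embedding similarity; plain word overlap stands in for
# it here just to show the retrieve-then-prompt structure.

def retrieve(question: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))

def augmented_prompt(question: str, chunks: list[str]) -> str:
    """Splice the best-matching chunk into the prompt as context."""
    context = retrieve(question, chunks)
    return f"Use this context to answer.\nContext: {context}\nQuestion: {question}"

chunks = [
    "Office hours are Tuesdays at 2pm in Room 101.",
    "The final essay is due on May 10 and must be 2000 words.",
]
print(augmented_prompt("When is the final essay due?", chunks))
```

Whether this kind of retrieval, a Code Interpreter workaround, or something else works best for a given custom GPT is precisely the sort of question that experimentation, not intuition, settles.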
As for me, my experiences have differed from Dobos’, partly because of my different use cases and background. And I don’t agree with him entirely.
More generally, I am consistently surprised by which instructions work better than others, especially when I try to get our Course Design Wizard custom GPT to improve at completing complex tasks, to interact effectively with a wide range of users, or to stay within well-defined guardrails. (Three things that I am constantly working on — updates coming soon!)
This is the takeaway of this week: promptin’ ain’t easy!
📢 Quick Hits:
News Tidbits for Higher Educators
What do Harvard, Washington University, the University of Michigan, and the University of California, Irvine have in common? They have built their own institutional large language models (LLMs).
Why it matters: As I have discussed, the privacy and data management issues unique to educational institutions require thoughtful solutions, from anonymization to sandboxing. Matters are made worse when AI tool developers are not sufficiently transparent about their goals, LLM training techniques, etc. So, it is a good idea for institutions to pursue their own LLMs. Given the relative ease with which an institution can get a sandboxed LLM up and running — Albert Lai of Wash U reports to Inside Higher Ed that creating “WashU GPT” was “pretty straightforward” — I expect this option to become more common in the coming year. The only wrinkle is that Microsoft and Google are aiming to provide more powerful LLMs that are better integrated with their office suites (Workspace and 365), and these integrated tools, properly deployed, will make institutional LLMs obsolete in most cases. Let the race begin!
Hume AI announced that EVI, a new LLM that “understands and emulates tones of voice, word emphasis and more to optimize human-AI interaction” will be “generally available in April 2024.” You can try it out in trial form here. It is trained on “millions of human interactions” and unites “language modeling and text-to-speech with better EQ, prosody, end-of-turn detection, interruptibility, and alignment.”
Why it matters: Before EVI, there was a range of competitors in the “empathetic” LLM space, with Pi foremost among them. As we reported last week, Inflection — Pi’s developer — has been “effectively gutted by Microsoft,” so there is a clear expectation that this sort of LLM will be incredibly valuable. EVI is different and potentially more powerful in that it is focused on oral interaction and takes in more inputs, like your tone or word choice, to tailor its responses to you. For educators, there are many intriguing possibilities, including more empathetic and effective communication from AI chatbots deployed for tutoring purposes. For everyone, worries about propaganda and other forms of manipulation will loom larger than ever before.
Zoom released Zoom Workplace, a competitor to Google Workspace and Microsoft 365.
Why it matters: I haven’t made it a secret that I find it more useful to get the outputs of Zoom’s AI Companion — which produces Meeting Summaries and other analyses of your meetings automatically — into the office suites that I use. Hence our Premium Tutorial on integrating Zoom AI Companion with Google Docs (so that your Meeting Summaries can get dumped in Docs specific to the participants for further collaboration and Gemini use). Clearly, Zoom is aware of the concern; namely, office suites have a lot of inertia and Zoom becomes less appealing as a product if it is hard to integrate with useful office suite apps. I doubt they will have much success displacing Google or Microsoft, but we will see.
Graham
Expand your pedagogy and teaching toolkit further with ✨Premium, or reach out for a consultation if you have unique needs. Let's transform learning together. Feel free to connect on LinkedIn, too!