Claude 3 for Lesson Planning, and Positioning AI Well
Can Sonnet outperform ChatGPT?
[image created with DALL-E 3 via ChatGPT Plus]
This issue is brought to you by Packback
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
In this week’s piece, I urge you to assume that generative AI is capable of completing many tasks that competent humans can complete, if only it is positioned with the appropriate context. I also test the new Claude 3 Sonnet for lesson planning, comparing it to ChatGPT3.5.
📢 Quick Hits:
News Tidbits for Higher Educators
This past week, Microsoft hosted an online event, Reimagine Education, that I attended. They had several big announcements:
(i) on April 1, Copilot for Microsoft 365 will be available to purchase as an add-on for students at all higher education institutions on A3 or A5 licenses (click here if you are confused by all the different Copilots — there are four main ones total);
(ii) Microsoft Copilot, the browser-based chatbot, with commercial data protection is now built into all Microsoft 365 Education offers, including the zero-cost license;
(iii) the Microsoft Education AI Toolkit, a 92-page PDF to help educational institutions “plan their AI journey,” is now available; and
(iv) there are several big updates to Learning Accelerators, which include reading and math coaches for K12 settings.
Why it matters: While the K12 updates are very promising and the expansion of commercial data protection is long overdue, I am most intrigued by the opportunities afforded by the imminent expansion of Copilot for Microsoft 365 to students. As I discuss in our Idea of the Week below, the core constraint on generative AI’s functionality is getting it into contact with the relevant context and information. As Copilot is integrated more deeply into the Microsoft 365 suite, it will enable a range of powerful applications of generative AI that leverage all of the data and information in your OneDrive that you access via Word, PowerPoint, Excel, Outlook, etc. Even in its relatively limited current form, this is already huge for professors, as we have discussed in our Premium Tutorial on using Copilot for Microsoft 365 to answer students and create slides, and the use cases for students will be similarly powerful, especially as capabilities improve.
Anthropic released the Claude 3 model family, a collection of multimodal models. The family has three members, in order of ascending “intelligence”: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. The last of these, Opus, outperforms Google’s Gemini 1.0 Ultra and OpenAI’s ChatGPT4 on some metrics, including “undergraduate-level knowledge.”
Why it matters: Although, all things considered, the differences between Opus and ChatGPT4 or Gemini 1.0 Ultra are not radical, there are some education-relevant domains in which Opus shines: graduate-level reasoning (click here to read more about the benchmark), multilingual use cases (especially in Spanish, Japanese, and French), coding, and output speed. Its vision capabilities are comparable to GPT-4V and Gemini 1.0 Ultra, which may make it useful for analyzing handwritten student submissions, and it allows for a massive context window of 200K tokens. Anecdotally, many have reported that all three Claude 3 models are the “most concise, human-feeling, and creative AI yet.” We discuss the freely-available Claude 3 Sonnet’s utility for lesson planning below.
👀 Sign Ups Now Open:
Our “Train Your Students to Use AI” Webinar
A month ago, we hosted our first AutomatED webinar. It was 2 hours long, covered custom GPTs in depth, and enabled our participants to begin to build their own.
Over the past few weeks, we have been gathering your preferences on topic options for our next webinar. Here are the results:
With these results in hand, we are pleased to announce our next webinar. It will focus on training students to use AI, and it will occur on Zoom on Saturday, April 20th from 12pm to 1pm Eastern Daylight Time.
To sign up, click here:
The price is $69, with a 10% discount for Premium subscribers, included as a discount code immediately below (visible only to those Premium subscribers receiving this via email or logged in on our website).
If you end up being unable to participate at the scheduled time, you will still receive the recording via email.
Given our poll results, we also plan to run webinars in June and August that will focus on:
(i) using AI for pedagogical purposes while avoiding student AI misuse, and
(ii) using AI to save time and increase productivity (including how to integrate project management software).
PS - We are going to be raising prices for Premium in ~2.5 weeks, so now is the time to lock in our current yearly rate of $50/year!
💡 Idea of the Week:
Position AI Well
Sometime soon, we will reach the point in the history of generative AI at which it is powerful enough: powerful enough to do many of the tasks that are central to our lives, at least at the level of a competent human.
Perhaps we have already reached this point in the past year — perhaps when ChatGPT4 was released last spring, or perhaps with Gemini 1.5 or Claude 3 Opus.
I am tempted by this view.
This isn’t to claim that generative AI won’t continue to improve or that we shouldn’t continue to improve its capabilities. All I am claiming is that the core functionality is close at hand or here already.
Don’t believe me? Just look at the cases where generative AI struggles at a task.
In most of these cases, the reason for its struggles is that it has not been given enough information about what the task demands in order to succeed at it. Prompting isn’t magic or a dark art, but one thing is for sure: ChatGPT, Gemini, Claude, and all the others cannot read your mind; they rely on you to set them up for success with your prompts. Once you provide them with enough of the information a competent human would have access to in the relevant context, they generally do a darn good job.
In the other cases — the cases where generative AI cannot perform well despite having the relevant information or context — the reason is typically that the task is very challenging in its own right, and competent humans tend to fail at it, too.
The takeaway here is that each of us needs to focus on getting generative AI into the right contexts, with the right information and directions, for it to be successfully integrated into our lives. We don’t need more horsepower — we need that horsepower directed and channeled in the right way.
For example, generative AI enables our students to complete many of our traditional assignments and assessments much more easily than before, so we need to reconfigure our approach to these assignments and assessments to incentivize students to complete them in ways that achieve the learning objectives (even if those objectives themselves need to shift given the power of generative AI). This is the lesson of our popular AI immunity challenge (edition 1 here; edition 2 here; edition 3 here; initial announcement here), which we plan to continue this summer. I cover exactly how you can reconfigure your approach in a Premium Guide from December on designing assignments and assessments in the age of AI.
Or, to give another example, Zoom AI Companion uses generative AI to create Meeting Summaries from recordings’ transcripts. This makes the raw transcripts — the production of which relies mainly on non-AI software — much more useful by summarizing them, noting actionable next steps that are mentioned in them, and so on.
But these virtues cannot be realized without getting generative AI into contact with what happened in the recorded meetings. And we cannot analyze, synthesize, and create as effectively unless we bring Zoom Meeting Summaries into contact with more powerful generative AI, like Gemini, and with other information, like that in our Google Docs, by integrating them. I cover how to do so in my Premium Tutorial from this past Wednesday.
While companies like Microsoft and Google are working hard to put already-powerful generative AI into a position to be advantageous — and so are we at AutomatED — you, the user, need to be constantly vigilant for ways in which you can put it in such a position.
You have special insight into your workflow, your field, and your life, so you are uniquely capable of judging how this newfound horsepower can accelerate you to your goals.
🧰 An AI Use Case for Your Toolbox:
Claude 3 for Lesson Planning
With Claude 3 released to rave reviews — including many compliments on its reasoning and writing abilities — I decided I should give it a whirl for lesson planning. I chose to give Claude 3 Sonnet the exact same prompts that I gave ChatGPT3.5 back in September, when I wrote our last piece on using LLMs to quickly plan better lessons. Sonnet is the second-best of the three Claude 3 models: better than the also-free Haiku but worse than the paywalled Opus.
Step Zero: Be Lazy
Now, I will admit: I was lazy. I wanted to use the same prompts from my last piece, but I didn’t feel like digging through hundreds of my old chats to find and copy-paste them into Sonnet.
So, I told Sonnet: take a gander at these images of my old prompts and tell me what the prompts are, word-for-word. It did a great job.
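(I did this in the claude.ai chat interface, but the same move can be scripted. Here is a minimal sketch using Anthropic’s Python SDK, assuming a screenshot saved as old_prompt.png; the filename and instruction wording are illustrative stand-ins, not my exact inputs.)

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

# Load a screenshot of an old chat (hypothetical filename).
with open("old_prompt.png", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text",
             "text": "Transcribe the prompt shown in this screenshot, word-for-word."},
        ],
    }],
)
print(response.content[0].text)  # the recovered prompt
```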
I will be doing a comparison between Claude 3, Gemini, and ChatGPT4 in the coming weeks, in order to assess their abilities to evaluate handwritten student submissions more comprehensively than my first-pass evaluation of Gemini 1.0 Ultra’s capabilities. (I also plan to evaluate Gemini’s ability to analyze video, which some have raved about.)
Now onto the present use case…
Step One: Provide an Initial Prompt
As I explained back in September, when prompting LLMs for lesson planning, there are, at minimum, six criteria that your initial prompt should meet.
Your initial prompt should:
1. Encourage the LLM to take on a role that leverages pedagogically sound practices that fit with your own pedagogical outlook, such as formative assessments or active learning techniques.
2. Make clear the pedagogical context of the class session, such as the nature of the course in which it is found or the surrounding sessions’ content.
3. Enumerate the learning objectives that you intend to achieve with this session (or the standards that you aim to meet), such as understanding of a concept or mastery of a skill.
4. Clarify the practical constraints that the class session will be bound by, such as the length of the class session, the number of students you expect to attend, or the capabilities of the room in which it is held.
5. Present as much detail on your specific vision for the session as you want, such as that you want a given objective fulfilled in a specific manner (although, perhaps, you are open to the LLM filling in all the other details).
6. Tell the LLM what format you want its output to take, especially if it is non-standard for the genre.
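To make the checklist concrete, here is a minimal Python sketch of how you might assemble a prompt that hits all six criteria, whether you paste the result into a chat window or send it via an API. Every course detail below is a hypothetical stand-in, not my actual prompt:

```python
# A minimal prompt-assembly sketch; all course details are hypothetical.
def build_lesson_prompt(role, context, objectives, constraints, vision, output_format):
    return "\n\n".join([
        f"Act as {role}.",                                                      # criterion 1
        f"Pedagogical context: {context}",                                      # criterion 2
        "Learning objectives:\n" + "\n".join(f"- {o}" for o in objectives),     # criterion 3
        "Practical constraints:\n" + "\n".join(f"- {c}" for c in constraints),  # criterion 4
        f"My vision for the session: {vision}",                                 # criterion 5
        f"Format your output as {output_format}.",                              # criterion 6
    ])

prompt = build_lesson_prompt(
    role="an expert instructor who favors active learning and formative assessment",
    context="one 50-minute session of an undergraduate critical thinking course; "
            "for homework, each student analyzed an argument from the assigned reading",
    objectives=["evaluate arguments for validity and soundness",
                "revise an argument analysis in light of peer feedback"],
    constraints=["50 minutes total", "roughly 25 students", "movable seating"],
    vision="small-group work in which each student builds on their own homework submission",
    output_format="a timed agenda with minutes allotted to each component",
)
```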
The initial prompt from Step Zero above meets these six criteria. In a nutshell, I am using it to ask Sonnet a simplified version of what I would ask an expert teacher colleague about the specific constraints surrounding a lesson in my (philosophy) critical thinking class at UNC Chapel Hill.
Just like humans, LLMs need enough context to give useful advice.
When I gave ChatGPT3.5 this prompt in September, it would tend to give me an ambitious lesson plan that could not work well given my time constraints. As I wrote back then, “ChatGPT3.5 often proposed a lesson plan with a mini-lecture, two whole-class activities, and two separate small group exercises. In 50 minutes, this is extremely unlikely to work well, given student interjections and tangents in the whole-class activities, issues with collaboration and alignment in the small group exercises, and delays arising from the transitions between the various components.”
What about Sonnet? Here is a representative example of what Sonnet gave me with the same prompt:
Not bad! But there are several issues…
First, like ChatGPT3.5, Sonnet is too ambitious. There is no way the described formative assessments or group presentations could each occur in 10 minutes.
Second, the whole point of linking the in-class work with the preceding homework assignment was to let each student build on their own homework submission, which analyzed a specific argument in the assigned reading that they chose — not build on an argument that one of their group members analyzed.
Yet, the fact that Sonnet’s response suffers from these issues does not come as a surprise. After all, most of us are overly ambitious in our lesson plans — I have learned over the years to pare my plans back repeatedly from what I initially expect to be able to complete. I consistently run out of time. Indeed, Sonnet may be reflecting this aspect of human teaching optimism, given its training data.
And Sonnet’s misunderstanding of the role of the preceding homework is also unsurprising, given that I didn’t provide it with enough information to know better. I didn’t make clear that different students would have analyzed different arguments for homework!
No matter — we can continue to prompt Sonnet to revise its outputs in ways that are more responsive to our needs. Consider these subsequent prompts to be an essential part of the process.
Step Two: Ask for Revisions
As I wrote in September, the revision process is analogous to that which you would undertake with a human mentor.
If your teaching mentor gave you this sort of lesson plan, with these flaws, you would respond so that they can get a better handle on your educational context and request — you wouldn’t give up on the chance to benefit from their advice because they misunderstood a few details that you underexplained. Do the same with your generative AI pal.
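(For those scripting this via the API rather than chatting: a revision request is just another user turn appended to the same message list. Here is a sketch that continues the prompt-assembly sketch above, so `prompt` is as assembled there; the revision wording is illustrative.)

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-sonnet-20240229"

# `prompt` is the initial six-criteria prompt assembled in the Step One sketch.
messages = [{"role": "user", "content": prompt}]
draft = client.messages.create(model=MODEL, max_tokens=1500, messages=messages)

# Respond as you would to a mentor who misread a detail: name the flaws,
# then ask for a revision while keeping the whole exchange in context.
messages += [
    {"role": "assistant", "content": draft.content[0].text},
    {"role": "user", "content": (
        "Two issues: (1) this plan is too ambitious for 50 minutes, so pare it "
        "back until each activity realistically fits; (2) each student analyzed "
        "a different argument for homework, so the group work should let each "
        "student build on their own submission."
    )},
]
revised = client.messages.create(model=MODEL, max_tokens=1500, messages=messages)
print(revised.content[0].text)
```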
In my case, I made several requests of Sonnet, including noting the two issues from above. When I wasn’t satisfied with the outputs, I made further requests. Here is the result of 5 minutes of additional work:
This is very similar to the suggested lesson plan I received from ChatGPT3.5 back in September, although Sonnet did not make math errors (ChatGPT3.5 couldn’t keep track of the 50-minute time limit), Sonnet did not try to assign homework (ChatGPT3.5 repeatedly gave me a homework assignment for the final 5 minutes that I didn’t ask for), and Sonnet provided more commentary on the educational value of the lesson plan.
Let’s see if it will shine even more if we ask it about student perspectives on this lesson…
Step Three: Ask for Other Perspectives
As I wrote in September, “even though the LLM might have just presented this plan as preferable, it can often identify notable (and strikingly apt) worries or concerns about it from the student’s perspective. This is an advantage over a human interlocutor who could have produced a similar plan, as humans tend to be proud of what they have already asserted or created.”
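(Scripted, this step is one more turn appended to the Step Two sketch, reusing `client`, `MODEL`, `messages`, and `revised` from there; the request wording is illustrative.)

```python
# Continuing the Step Two sketch: press Sonnet to critique its own plan.
messages += [
    {"role": "assistant", "content": revised.content[0].text},
    {"role": "user", "content": (
        "Adopt the perspective of a student in this class. Which parts of this "
        "lesson plan would feel rushed, confusing, or low-value to you, and why?"
    )},
]
student_view = client.messages.create(model=MODEL, max_tokens=1000, messages=messages)
print(student_view.content[0].text)
```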
When pressed to consider the flaws of its own lesson plan from the perspective of a student, Sonnet produced the following:
This is much more detail than ChatGPT3.5 tends to provide. Very interesting!
Concluding Remarks
So, what do I think of Claude 3 Sonnet for lesson planning?
At the very least, it is definitely a lot better than Claude 2! Here is how I described Claude 2 back in September:
In terms of comparing the various LLMs, I have found that Claude 2 is generally less ambitious than ChatGPT, which results in more simplistic or basic lesson plans that require significant supplementation. This is good for straightforward color-within-the-lines brainstorming.
More testing is needed, but I think Sonnet is better than ChatGPT3.5 and comparable to ChatGPT4 for this use case. And Sonnet has a further advantage: you can upload files (exemplar lesson plans, your syllabus, class readings) to further contextualize your needs, which you cannot do with ChatGPT3.5.
Google’s Gemini is nice because it is better integrated with other media, like YouTube and Google Search, but Sonnet is better at composition and comes across as more thoughtful.
While we are getting to the point where the top models are converging on high performance, it is still worth the time to compare and contrast them for each of our use cases.
And, given the iterative revision process I recommend here (and always recommend, even with custom GPTs), I suggest, as before, that you open a new chat for each class session that you are planning. Label each chat with the date of the class session and the course name so that you can return and continue iterating if need be.
Have you tried out our college/university course design custom GPT? Give it a try, if you have ChatGPT Plus! It can produce assignments, assignment sequences, rubrics, and AI course policies. We have designed it to be especially effective when it comes to pedagogical issues related to AI.
Remember, you can even get it involved in any other GPT conversation you are having, if you @ it!
Our first ratings are coming in now that OpenAI has rolled out the new rating system for the GPT Store. We are constantly working to improve our GPT, so please give it a rating or submit feedback to tell us how it performs. In fact, we are working on a significant update that we plan to release later this week!
📬 From Our Partners:
An AI Writing and Grading Assistant
Packback is the leading Instructional AI platform. Our platform acts as a “Digital TA,” providing every student with an AI writing tutor and every instructor with an AI grading assistant. Our Digital TA powers our award-winning discussion platform, Packback Questions, and our AI-supported writing assignment platform, Deep Dives.
Through Packback’s AI grading assistance, educators are able to spend more time teaching and connecting, and less time correcting and doing administrative work. Through Packback’s AI-powered feedback, students are able to receive feedback in real time, allowing them to strengthen their writing and build confidence.
We teach students how to write, never writing for them.
To learn more about Packback or to schedule a demo, click here.
To get access to Premium, you can upgrade for $5/month or $50/year, or get one free month for every two (non-Premium) subscribers that you refer to AutomatED.
The price will be going up in ~2.5 weeks, so don’t miss the chance to lock in at the current annual rate!
To get credit for referring subscribers to AutomatED, you need to click on the button below or copy/paste the included link in an email to them.
(They need to subscribe after clicking your link, or otherwise their subscription won’t count for you. If you cannot see the referral section immediately below, you need to subscribe first and/or log in.)