Train Students on AI with Claude 3.5

I show how and compare it to GPT-4o.

Graham Clay
June 24, 2024 • Estimated Reading Time: 16 minutes

Welcome to AutomatED: the newsletter on how to teach better with tech.

Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.

In this week’s piece, I explain how the newly released Claude 3.5 Sonnet multimodal LLM is useful for training students to use AI because of its improved user interface. I also announce one-off shareable Premium Tutorials and Guides and share some feedback from readers like you.

🧰 AI for Your Toolbox: Claude 3.5 for AI Training
- Artifacts and AI Training
🛠️ Changes at AutomatED
🚀 Further Improvements with Claude 3.5 Sonnet
- Vision
- Comparisons with GPT-4o on Reasoning
✉️ What You, Our Subscribers, Are Saying
✨Recent and Upcoming Premium Posts

🧰 An AI Use Case for Your Toolbox: Claude 3.5 for AI Training

This week, I release a ✨Premium Guide on how to train students to use AI. This is probably our most in-demand topic other than custom GPTs, so I am pumped to see what our Premium subscribers — now 50 in total! — think.

Last week, fortuitously, Anthropic released Claude 3.5 Sonnet, which is supposed to be comparable to GPT-4o (more on that below) and which also comes with “Artifacts.”

Artifacts uses a side-by-side display to show the outputs of the large language model (LLM) in real time as it produces them. As you continue to prompt Claude on the left side of the screen, it produces new versions of its output on the right side, with the option to scroll through past versions with the click of a button.

What good timing!

One of the key ways that I recommend training students on using AI involves showing them — or, more accurately, letting them see for themselves — how AI works. This is crucial to avoiding all sorts of problems with student AI use. If they don’t know how it works, they can’t make it work well and they don’t know how it can go astray.

Artifacts is a nice way to address this issue for some use cases, as it gets the student closer to seeing what exactly the LLM is doing when they prompt it. This also helps them learn the subject matter, because they can see the ways it addresses problems that they identify or requests they make.

Let’s dive in!

Artifacts and AI Training

Fundamentally, Claude’s new Artifacts just is a change in the user interface of the LLM. As Anthropic (the developers of Claude) put it, this feature “marks Claude’s evolution from a conversational AI to a collaborative work environment.” Here’s a short video demo so you can see what they mean:

There are countless use cases of Artifacts — here is a list compiled by Min Choi, complete with videos — but my focus is on using it for training your students to use AI. For this purpose, the goal is to help students see how the AI produces its output, generally via intermediate code, thereby learning how the AI works and how they might approach similar problems (whether with the help of AI or independently). Here are some examples:

If you teach computer science, user interface design, or anything involving web development, you can have students prompt Claude to produce web pages’ source code, see this code produced on the right side, preview it once it renders, and iterate through code+preview combinations.

To use a toy example, if you prompt Claude to show how it would code a basic webpage, it will produce an HTML output with CSS styling, explain the function of each part, and show you a preview of how its specific code works when implemented. You can interact with this preview in real time as if it were a live webpage.

In my case, it produced a button that uses JavaScript to change the background color of the webpage:

Here is the interactive “Preview” that appeared when I toggled from “Code” via the button in the upper right corner:

I could then tell it: “Give me more color options!”, watch as it implements this change in the code, fiddle with the resultant preview, and then cycle between code+preview combinations via the version button to see how it went about making this adjustment. I could also ask it to show me how to produce a similar output in a different programming language (a “code translation”) and then compare the versions to see which bits of syntax it treats as functionally equivalent (click here for more advanced implementations of Claude’s “tool use” or function calling capability).

If you teach economics, financial analysis, or accounting, you can have students prompt Claude to create analyses of markets or businesses, including interactive infographics, charts, or reports via React. Since it shows its work with Artifacts, your students can see how different prompts result in different statistical analyses, different representations of this information, and more.

If you teach subjects that produce purely textual outputs without a code intermediary, like philosophy, creative writing, or journalism, your students can compare prompting techniques, easily review their work, note common issues, and iterate drafts by comparing versions.
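
To make the toy webpage example above more concrete, here is a minimal TypeScript sketch of the kind of logic that sits behind a color-changing button, along with what a “more color options” revision might change. It is an illustration, not the code Claude actually generated: the palette values, the names `PALETTE_V1`, `PALETTE_V2`, and `makeColorCycler`, and the `color-button` element id mentioned in the comment are all assumptions.

```typescript
// Version 1: a hypothetical starting palette (assumed values).
const PALETTE_V1: readonly string[] = ["#ffffff", "#ffd166", "#06d6a0"];

// Version 2: what a "Give me more color options!" revision might add (also assumed).
const PALETTE_V2: readonly string[] = [...PALETTE_V1, "#118ab2", "#ef476f"];

// Build a function that returns the next palette color on every call,
// wrapping back to the start after the last entry.
// The position is kept in a closure rather than read back from the page,
// because browsers may report colors in a different format than they were set in.
// Assumes the palette is non-empty and that palette[0] is the starting color.
function makeColorCycler(palette: readonly string[]): () => string {
  let index = 0;
  return () => {
    index = (index + 1) % palette.length;
    return palette[index];
  };
}

// In a browser, the wiring could look like this (element id is hypothetical):
//
//   const next = makeColorCycler(PALETTE_V2);
//   document.getElementById("color-button")?.addEventListener("click", () => {
//     document.body.style.backgroundColor = next();
//   });
```

In this sketch, the only difference between the two versions is the palette passed to `makeColorCycler`. A small, visible diff like that is what students can look for when they step between versions in Artifacts.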

As you know, the alternative user interface, like that of ChatGPT or Gemini, is a sequence of prompt-output pairs where you have to repeatedly scroll back up to see what has already happened, and where code is, at best, inserted as a block prior to the resultant preview (as is the case with GPT-4 and GPT-4o in ChatGPT, if you have the option to show code selected).

I see this as the first serious step towards improving the otherwise terrible user interfaces of LLMs for broad use. It may turn out to be a small change in the grand scheme of things, but it sure feels like a big improvement — especially in the pedagogical context.

By the way, you can turn on Artifacts via the pop-up prompts that Claude gives you, which look like this:

Or you can turn it on via the “Feature Preview” button in the menu in the upper right corner of your browser:

🛠️ Changes at AutomatED

Before I discuss the improved vision and reasoning capabilities of Claude 3.5 Sonnet, I first want to announce 3 changes at the newsletter…

Premium Archive Now Transparent

Pop quiz:

How many ✨Premium pieces are there in our Archive?

Bonus:

And how many words and screenshots are in them?

Since AutomatED launched, we have delivered 93,000 emails, we have gained more than 3,000 subscribers, and … we have sent 13 ✨Premium pieces weighing in at a total of 52,000 words and 200 screenshots!

I will admit that I badly failed this quiz — and I wrote them all. And when I talk to subscribers, they are often unaware of Premium. In fact, some Premium subscribers tell me they have missed emails and thus lost track of what’s in the Archive. (Sorry!)

I’m not sure why I didn’t do this sooner, but now there is a way to view the entire Archive all in one place on our website, and it is linked on the Upgrade page (see above for screenshot). Check it out here:

Shareable Premium Pieces for $5

Some of you have told me that…

- you want to access individual Premium pieces from our archive without subscribing to Premium
- you want to be able to share these pieces with colleagues easily (because you are part of an AI pedagogy group, you are a department leader, you want to spark discussion, …)

Now you can!

For $5, you can get an emailed version of one of our top 5 Premium pieces that you can forward to anyone you'd like to share it with! No subscription is needed, and you have a right to distribute the piece via email as much as you’d like.

Currently, the options are:

Tip Jar

Like my work at AutomatED and want to donate a few bucks? Now you can, with the tip jar option on our Upgrade page. You pick the amount. I appreciate your support!

🚀 Further Improvements with Claude 3.5 Sonnet

Vision

Anthropic claims that Claude 3.5 Sonnet outperforms GPT-4o on visual analysis and that the improvements are “most noticeable for tasks that require visual reasoning, like interpreting charts and graphs.”

This sounds good for us educators!

Their demos show them using it for a range of interesting cases, including a genomics professor teaching their class about the human genome and wanting to transcribe data about genome sequencing into JSON:

To conduct my own initial evaluation, I gave it the same test that I had given to Gemini, Gemini 1.0 Ultra, and GPT-4o when each was first released. To recap: Gemini had trouble originally, Gemini 1.0 Ultra did better but had errors, and GPT-4o almost always succeeded.

Here’s the problem, a handwritten student answer to a physics question, complete with diagram (devised by Google themselves in their Gemini documentation):

Long story short, when I gave Claude 3.5 Sonnet this image, it would always falsely state that the student got the correct answer. When I corrected it by pointing out that the student’s formula ‘E = mgL’ should be ‘E = mgH’, it would agree and make all the necessary corrections to its calculations.

So, not bad, but not as reliable for this sort of use case as GPT-4o. (For discussion of its handling of another physics problem relative to GPT-4o, click here.)
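
To see why the student’s slip matters, here is a minimal TypeScript sketch of the arithmetic. It assumes that L denotes a length along an incline and H the vertical height, which I have not spelled out above. The mass, length, and height values are made up for illustration and are not taken from the actual problem.

```typescript
const G = 9.81; // gravitational acceleration in m/s^2

// Gravitational potential energy E = m * g * h, in joules.
function potentialEnergy(massKg: number, heightM: number): number {
  return massKg * G * heightM;
}

// Hypothetical values for illustration only (not from the original problem).
const massKg = 70;
const slopeLengthL = 80; // distance measured along the incline, in meters
const verticalHeightH = 40; // vertical drop, in meters

// The student's version plugs in L where H belongs.
const studentEnergy = potentialEnergy(massKg, slopeLengthL);
// The corrected version uses the vertical height H.
const correctEnergy = potentialEnergy(massKg, verticalHeightH);

// The student's answer is off by a factor of L / H.
const overestimateFactor = studentEnergy / correctEnergy;
console.log({ studentEnergy, correctEnergy, overestimateFactor });
```

With these made-up numbers, using L instead of H doubles the energy. An AI grader that marks the student’s work as correct is therefore missing a substantial error, not a rounding difference.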

On the handwriting transcription test I gave GPT-4o, Claude 3.5 Sonnet did admirably, converting this scribbled mess:

Into this:

It made a few minor mistakes but did slightly better than GPT-4o, effectively capturing the especially tough smaller scribbles in the upper right corner.

(Of course, the industry benchmarking is a more useful way of evaluating its general ability across many use cases, even if some of the benchmarks don’t consist of “real world” tests; my point is just that one must be careful about whether it is reliable enough for the use cases one is interested in.)

Comparisons with GPT-4o on Reasoning

In short, Claude 3.5 Sonnet is comparable to or slightly better than GPT-4o on all the major reasoning benchmarks, per this chart from Anthropic:

Some experts have conducted more specific real-world comparisons and found GPT-4o superior (e.g., Anita Kirkovska at Vellum), but the benchmark results are very promising, especially given that Claude is generally seen as a much better writer than ChatGPT.

Try it out for yourself and see what you think!

✉️ What You, Our Subscribers, Are Saying

Would you like to see more interviews like my interview with Walter Sinnott-Armstrong on Moral AI?

“Yes! This was great. Especially the section where the interviewee explains how he uses AI in his teaching.”

Anonymous Subscriber

I hear you! Next week, robots willing, you will hear from an accounting professor about how he uses AI in his teaching!

I’m on it!

✨Recent and Upcoming Premium Posts

May 31 - Tutorial: Easy Student Consent Management in Google Workspace

June 19 - Tutorial: Easy Student Consent Management in Microsoft 365

June 26 - Guide: How to Train Students to Use AI

July - Tutorial: How Professors Should Use Gemini 1.5 Pro

July - Tutorial: How Professors Should Use GPT-4o

Graham

Expand your pedagogy and teaching toolkit further with ✨Premium, or reach out for a consultation if you have unique needs.

Let's transform learning together.

Feel free to connect on LinkedIn, too!