GPT-4o is Here and GPT-4 + Custom GPTs are Now Free
Plus I release a custom GPT, built on GPT-4o, for digitizing anything handwritten.
[image created with DALL-E 3 via ChatGPT Plus]
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
In this week’s edition, I discuss the new multimodal LLM from OpenAI, show what it can do, explain the (arguably bigger) news about GPT-4 access, and consider why all this matters to professors and other educators. This is one you won’t want to miss…
Table of Contents
📢 The Big News
👀 GPT-4o is Here (and Coming Soon)
Imagine if you could feed an LLM images or videos and it could describe them in text, reason about their content, and speak about it in 20+ languages in ways that would often pass the Turing test.
It is already here.
This past week, OpenAI announced the release of GPT-4o, a fast new LLM that improves on GPT-4 in all dimensions — especially with non-English languages — and is multimodal from the ground up (the ‘o’ is for ‘omni’). Its knowledge cutoff is October 2023.
Here’s how they put it:
GPT-4o is a step towards much more natural human-computer interaction — it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.
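For readers who use the API rather than the ChatGPT interface, switching to GPT-4o is just a matter of changing the model name. Here is a minimal sketch using OpenAI’s official Python SDK (the prompt is only an illustration):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Any prompt works here; this one is only an illustration.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize the Doppler effect in two sentences."}
    ],
)
print(response.choices[0].message.content)
```

Note that the “50% cheaper” figure applies to this sort of API usage, not to ChatGPT subscriptions.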
Communicating with GPT-4o via voice is like talking with a person, as OpenAI’s video demonstrations show. Here it is helping Imran Khan (the son of Sal Khan of Khan Academy) with a math problem:
OpenAI’s video demos and announcement page show GPT-4o doing the following (click here and scroll down to see these):
create visual narratives
create posters
design characters
create 3D object illustrations
convert photos to caricatures
create fonts
convert a recording of a meeting with multiple speakers into a speaker-labeled transcript
summarize lectures or other video/audio recordings
Given GPT-4o’s power, its new modalities, and the naturalness of its conversational abilities, OpenAI says the model has undergone extensive “external red teaming” to identify risks. Nonetheless, at the moment, they are releasing only “text and image inputs and text outputs.” In the “upcoming weeks and months,” the other modalities will be released once more risk analysis and mitigation is complete.
Below, I provide more detail and some illustrative use cases involving the analysis of handwriting (including a custom GPT that digitizes handwritten documents and places them in Google Docs), but the short story is that GPT-4o is already the best current LLM, in terms of reasoning capabilities, multimodal analysis abilities, and speed.
Here is a graphic showing that it outperforms GPT-4, Gemini 1.0 Ultra, Gemini 1.5 Pro, and Claude Opus on all benchmarks for vision understanding:
And this one shows that it outperforms them all — plus Llama 3 400B — on all but one of the major reasoning benchmarks:
🤑 Now Free: GPT-4o, GPT-4, and Tools (including Custom GPTs)
OpenAI also announced that GPT-4o, GPT-4, and associated tools — including custom GPTs, Advanced Data Analysis, Memory, and file uploads — are now free to use, although there are usage limits for non-paying users.
While GPT-4o is big news — especially once all the modalities are released — this is bigger news right now.
No longer will the paywall of ChatGPT Plus (which runs $20 per month) block users from leveraging the much more powerful models on offer from OpenAI.
The free GPT-3.5 model has been significantly worse than the alternatives since early 2023, but many users have declined to upgrade due to the price.
Relatedly, no longer will professors need to decide between requiring their students to buy ChatGPT Plus and being unable to leverage student-facing custom GPTs for their pedagogy.
Indeed, prior to custom GPTs, I never required my students to purchase course materials for my Philosophy courses. I always worked hard to provide materials on Blackboard (RIP), Sakai (also RIP), or Canvas, free of charge.
While I was torn this past semester about requiring my students to purchase ChatGPT Plus to use the custom GPTs that I built to act as course tutors, they reported time and again that they found the GPTs more than worth it. My end-of-semester evals were full of mentions of them and their value.
But the cost was still a significant obstacle and burden for my students.
Now, other professors and I needn’t make this tough decision. Anyone with a ChatGPT account can access custom GPTs, although creating them still requires ChatGPT Plus.
This means that professors can create custom GPTs, share the links with their students (and their students only), and their students can interact with them without paying a dime.
And there are so many creative ways professors can leverage custom GPTs, as we have discussed before…
Historical Figure Simulations: History or political science professors can create GPTs to simulate interactions with historical figures during critical moments. For instance, students could converse with Abraham Lincoln during the Civil War or participate in a simulated UN Security Council meeting during the Cuban Missile Crisis.
Virtual Debate and Argumentation Practice: Professors in philosophy, law, and communication studies can develop GPTs embodying various philosophical positions, legal theories, or rhetorical stances. Students can engage in debates and dialogues with these virtual interlocutors, honing their argumentation skills and receiving feedback on their positions. (I do this!)
Language and Cultural Immersion: Language professors can create GPTs simulating native speakers with specific cultural contexts. For example, a Spanish professor could develop a GPT that embodies a native speaker from Mexico City, facilitating culturally rich dialogues and improving conversational skills. GPT-4o is much better at this sort of use case than GPT-4 was.
Customized Research Assistants: Professors can develop GPTs tailored to assist with specific research tasks. A biology professor, for example, could create a GPT that helps design experiments, analyze data, and interpret results based on the latest scientific literature, providing guidance and critiques. (Here’s where Advanced Data Analysis comes into play.)
Interactive Case Studies for Problem-Based Learning: In business, medicine, and engineering, professors can use GPTs to create dynamic case studies and problem-solving scenarios. For instance, a business professor could design a GPT simulating a corporate negotiation, while a medical professor could develop one presenting patient cases for diagnosis and treatment planning.
Are you considering using custom GPTs now that students won't have to pay to use them?
This update also means that our Course Design Wizard is free to use. Give it a whirl if you haven’t tried it yet!
In the next section below, I discuss some ways professors can use GPT-4o now via an illustration of some of its multimodal capabilities (and I release a new AutomatED GPT that digitizes handwriting).
After that, I conclude with some prognostications about how we might want to use GPT-4o once all its modalities are released.
🧰 How Can Professors Leverage GPT-4o Now?
In next week’s piece, I will cover a load of updates from Google. As I will explain there, one way in which GPT-4o is inferior to the alternatives is that it has a 128,000-token context window.
Note: An LLM’s context window is the amount of information that you can prompt it with at one time, with 100 tokens being roughly equivalent to 75 words.
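If you want a sense of how much of a context window a given document will consume, OpenAI’s tiktoken library can count tokens locally. A minimal sketch (the filename is a hypothetical placeholder; recent tiktoken releases include GPT-4o’s o200k_base encoding, and the fallback covers older installs):

```python
import tiktoken

# GPT-4o uses the o200k_base encoding in recent tiktoken releases;
# older installs can fall back to the GPT-4-era encoding.
try:
    enc = tiktoken.get_encoding("o200k_base")
except ValueError:
    enc = tiktoken.get_encoding("cl100k_base")

with open("lecture_notes.txt") as f:  # hypothetical file
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens (~{int(n_tokens * 0.75)} words)")
```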
While 128,000 tokens is a lot — ChatGPT had 4,000- or 8,000-token context windows as recently as early 2023 — Google’s Gemini 1.5 Pro, available via API and soon to be available via Gemini Advanced, has a context window of 1,000,000 tokens, with 2,000,000 available to beta testers.
This makes Gemini better at large-scale tasks, like processing massive chunks of text (e.g. 700-page books) or lengthy high-quality videos (e.g. 1.5-hour lectures).
So, what can GPT-4o do well? And what should professors look to use it for?
Although GPT-4o is superior to the alternatives with respect to reasoning, it shines — and will shine — the most when tasked with multimodal challenges.
Given that OpenAI has publicly released only text and image inputs so far, this means that, for now, it shines at image analysis.
For instance, it is much better than the alternatives at analyzing and evaluating handwritten student homework submissions and at digitizing handwritten notes.
Let me show you…
🔎 Evaluating Handwritten Student Submissions
Back in December, when Gemini was first released as a natively multimodal LLM, I mercilessly tested whether it could reliably complete a task displayed in Google’s own documentation of its capabilities. The task is to evaluate a handwritten student answer to a physics question, complete with diagram. Here are the question and instructions:
As I reported, Gemini initially made two mistakes in completing this task, leading me to the following conclusion:
To be sure, there are powerful use-cases in the future of these multimodal LLMs. We will continue to experiment to find them — it could be that our prompting needs optimizing, an integration with another software tool would address their shortcomings, or something else on our end is going wrong — but it seems that they need to be more reliable if professors are to gain a lot from them.
Then, in February, Google released the new Ultra 1.0 model via Gemini Advanced. I reported that it still made errors, though fewer than before, leading me to this conclusion:
[W]hile it is an improvement on the prior answer in several dimensions, there is still a mistake that indicates that Gemini Advanced has not ironed out all of the wrinkles with reasoning — especially if professors are to rely on it to help them evaluate student work.
So, what about GPT-4o?
With the same prompt I gave to both Geminis, GPT-4o knocks the problem out of the park almost every time (I tested it 25+ times). Although it sometimes takes a different path, it generally starts by analyzing the student’s work:
Next, it tends to offer its own path to the solution:
It properly analyzes the image, identifies and explains the student’s work, explains how to correctly reason through the problem, and relates this latter explanation to the student’s work, noting where the student goes wrong.
In this specific instance, the only wrinkle is that it represents the velocity as exactly 28.00 m/s. In other tests, it recognized that this is only approximate (the true value, √(2gh) with g = 9.81 m/s² and h = 40 m, is 28.01428207… m/s). Indeed, this was the only error I could get it to make in all my tests.
Not bad!
I leave you to determine whether it is reliable enough for the problems your students complete by hand.
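If you do decide to evaluate handwritten work at scale, the same image analysis is available programmatically. Here is a minimal sketch using the OpenAI Python SDK’s image-input format (the filename and the rubric prompt are hypothetical placeholders):

```python
import base64

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Hypothetical scan of a handwritten submission.
with open("student_answer.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Evaluate this handwritten physics answer. Identify the "
                     "student's reasoning, flag any errors, and explain the "
                     "correct solution."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```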
Note: If you plan to use student data with GPT-4o or a custom GPT built for this sort of purpose, be sure to consider my guidance in our ✨Premium Guide to Ethically Using AI with Student Data. If you want to evaluate student work at scale with GPT-4o, you should probably consider pseudonymization as a solution.
📝 Digitizing Your Handwriting (e.g. Notes)
In my experimentation with GPT-4o, I decided a good test of its multimodality would be some gnarly handwritten notes of mine, like these that I wrote during a Q&A of one of my talks as I parsed audience questions:
Sure, the handwriting in the physics problem from the preceding section is bad… but not this bad!
GPT-4o natively did a very good job of parsing my chicken scratch, producing the following:
There are a few errors, like the missing #3 from the first list, the missing scribbles from the upper right corner, and the misreading of the passage number at the bottom (it’s T 1.4.2.20). But it is impressive nonetheless, and lots of further testing showed that GPT-4o is generally excellent at parsing handwriting.
It got me thinking: why not build a custom GPT to try to improve on GPT-4o’s native performance (even if slightly), accommodate batch image/photo uploads, and dump the outputs into Google Docs (one note per Doc, multiple notes per Doc, or some other user-specified combo)?
Several hours later, I have a new custom GPT:
Try it out if you have some digitization needs!
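(The GPT handles the Google Docs step behind the scenes. If you would rather script that step yourself, here is a rough sketch using Google’s official Python client; it assumes you have already completed Google’s standard OAuth setup, and transcribed_text stands in for GPT-4o’s output.)

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Assumes you've completed Google's standard OAuth flow and saved token.json
# with the Docs scope.
creds = Credentials.from_authorized_user_file(
    "token.json", ["https://www.googleapis.com/auth/documents"]
)
docs = build("docs", "v1", credentials=creds)

transcribed_text = "..."  # stand-in for GPT-4o's transcription

# Create a new Doc, then insert the transcription at the start of its body.
doc = docs.documents().create(body={"title": "Digitized Notes"}).execute()
docs.documents().batchUpdate(
    documentId=doc["documentId"],
    body={"requests": [
        {"insertText": {"location": {"index": 1}, "text": transcribed_text}}
    ]},
).execute()
```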
After this experience, I am tempted to build several other custom GPTs that leverage the same GPT-4o functionality.
For instance, in some of my classes, I quiz my students with 5-minute multiple choice or short answer quizzes at the start of each session to ensure that they did the reading, and I could create a custom GPT to help grade them.
What if I could scan their submissions (with pseudonyms rather than their names), give the images in bulk to a custom GPT along with the answer key, have the custom GPT grade them, output the grades to a .csv formatted to upload to Canvas, locally convert the pseudonyms to names, and then upload the .csv to Canvas?
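The local de-pseudonymization step, at least, would be easy to script. A sketch, where every filename and column name is a hypothetical placeholder:

```python
import csv

# Hypothetical files: pseudonyms.csv maps pseudonym -> real name (kept local,
# never uploaded), and grades.csv is the GPT's output, keyed by pseudonym.
with open("pseudonyms.csv", newline="") as f:
    mapping = {row["pseudonym"]: row["name"] for row in csv.DictReader(f)}

with open("grades.csv", newline="") as fin, \
     open("grades_for_canvas.csv", "w", newline="") as fout:
    writer = csv.DictWriter(fout, fieldnames=["Student", "Score"])
    writer.writeheader()
    for row in csv.DictReader(fin):
        writer.writerow({"Student": mapping[row["pseudonym"]],
                         "Score": row["score"]})
```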
Intriguing…
Would you use a quiz grader GPT like this, if Graham made it available?
🗣️ What About the Forthcoming Modalities?
As the other modalities of GPT-4o become available — audio and video inputs, and audio and image outputs — professors and other educators in higher ed should look forward to leveraging them.
Here are three ways in which I think the yet-to-be-released modalities of GPT-4o might be utilized effectively in the university context:
Accessing Information in Another Language
GPT-4o’s advanced proficiency in non-English languages will enable professors to access and share information from diverse linguistic sources. Whether it’s translating academic papers, communicating with international colleagues, or helping students understand foreign-language texts, this capability will significantly broaden the scope of accessible information. Further, the model’s quick and natural conversational abilities will ensure that translations and communications are smooth and coherent, facilitating seamless interaction across language barriers.
Multimodal Tutoring
The integration of screen sharing and live discussion capabilities will transform GPT-4o into a more effective tutoring assistant. Professors will be able to use this functionality to provide personalized guidance to students, walking them through complex problems or concepts in real time, with immediate oral feedback and annotated documents, via more immersive custom GPTs.
Multi-GPT Conversation
GPT-4o's ability to engage in and analyze conversations opens new avenues for teaching critical thinking and debate. Professors can set up two or more iterations of GPT-4o to simulate discussions or arguments on various topics. These AI-driven discussions can be analyzed in real-time, providing students with insights into argument structures, logical fallacies, and effective rhetoric.
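Even before the audio modalities arrive, you can prototype this kind of exchange with the text API today. A minimal sketch of two GPT-4o “speakers” trading turns (the personas and topic are placeholders to swap for your own):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Placeholder personas; swap in whatever stances suit your course.
personas = [
    "You are a utilitarian philosopher. Reply in at most two sentences.",
    "You are a Kantian philosopher. Reply in at most two sentences.",
]
histories = [[{"role": "system", "content": p}] for p in personas]

message = "Is it ever permissible to lie to protect someone?"
for turn in range(4):  # two turns per speaker
    speaker = turn % 2
    histories[speaker].append({"role": "user", "content": message})
    reply = client.chat.completions.create(
        model="gpt-4o", messages=histories[speaker]
    ).choices[0].message.content
    histories[speaker].append({"role": "assistant", "content": reply})
    print(f"Speaker {speaker + 1}: {reply}\n")
    message = reply  # each reply becomes the other speaker's prompt
```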
What are some ways that you think this technology will create opportunities and challenges in your educational context?
Let me know your thoughts by responding to this email (as always, all responses go straight to my inbox) or by clicking this link:
📬 From Our Partners: MIT Short Course
Learn How AI Impacts Strategy with MIT
As AI technology continues to advance, businesses are facing new challenges and opportunities across the board. Stay ahead of the curve by understanding how AI can impact your business strategy.
In the MIT Artificial Intelligence: Implications for Business Strategy online short course you’ll gain:
Practical knowledge and a foundational understanding of AI's current state
The ability to identify and leverage AI opportunities for organizational growth
A focus on the managerial rather than technical aspects of AI to prepare you for strategic decision making
Graham

Expand your pedagogy and teaching toolkit further with ✨Premium, or reach out for a consultation if you have unique needs. Let's transform learning together. Feel free to connect on LinkedIn, too!
What'd you think of today's newsletter?