GPT-4.5 is Sad but Deep Research is Cheap
Plus, a comparison of voice-based AI tools for orals, reflection, conversation, and more.

[image created with Dall-E 3 via ChatGPT Plus]
Welcome to AutomatED: the newsletter on how to teach better with tech.
In each edition, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
Today, I share all the big news from last week, including the announcement of the new GPT-4.5, Claude 3.7 Sonnet, and the wider availability of OpenAI Deep Research. But first, we have an awesome guest post on several AI-powered voice tools.
📣 By the way, I just spun up a little website to explain the AI tools I’ve developed to help department course schedulers:
PS - There are only 2 more department slots available for this semester.
📬 From Our Partners
Your daily AI dose
Mindstream is your one-stop shop for all things AI.
How good are we? Well, we became only the second newsletter ever (after The Hustle) to be acquired by HubSpot. Our small team of writers works hard to put out the most enjoyable and informative newsletter on AI around.
It’s completely free, and you’ll get a bunch of free AI resources when you subscribe.
Remember: Advertisements like this one are not visible to ✨Premium subscribers. Sign up today to support my work (and enjoy $500+ worth of features and benefits)!
🎤 What is the Future of Voice in Education?
A Guest Post by Elliot Roe and Duncan Johnson
Voice-based AI tools are becoming more common in education, offering a potential solution to AI misuse. But do they represent a genuine opportunity to improve student outcomes, or are they just another layer in the cat-and-mouse game of academic integrity?
Duncan and I are student researchers at Tufts University and Georgia Tech, respectively, and we've spent the last 4 months researching and designing a voice-based tool for submission to the Learning @ Scale conference. We started exploring voice-based AI learning tools to see if they could help foster student-to-student connections in online classes but have shifted our approach significantly since then based on our exploration. We spoke with online education professionals and tested various tools before building our own voice-based tool and piloting it in classes. Along the way, we've learned a lot — both about how students are actually interacting with the tech and about the realities of integrating voice AI into education.
We'll discuss three existing voice-based AI tools that we found, each using voice AI in a unique way. Then, we’ll provide a quick breakdown of how we approached our own design and share our thoughts on where this technology is heading.
Socratic Mind
Voice-based assessment methods — like oral exams — have become an accepted way of "AI-proofing" assignments. But oral exams tend not to scale well, and that is exactly the problem Socratic Mind aims to solve.
Built by researchers at Georgia Tech, Socratic Mind replicates oral exams asynchronously with AI. Here's how it works: Instructors set up assignments by adding questions and writing an "ideal" answer for each. Then, when students take the exam, the AI uses Socratic questioning to dig into their knowledge and guide their thinking.
While it's great to see how effectively Socratic Mind can help students develop their understanding through this back-and-forth, we've heard from students and educators who’ve used the tool that it can balloon the length of assignments.
Sherpa Labs
Where Socratic Mind tackles comprehensive oral exams, Sherpa Labs takes a different approach with quick, 10-minute quizzes on course content. Their main innovation is in content transformation — instructors simply drag and drop a PDF reading and specify a few key areas they want the questions to cover. What makes Sherpa interesting is that it's not trying to replace traditional assessments with oral exams. Instead, they're bringing an extra layer of security to lower-stakes activities.
This seems like it could work well alongside essays or other assignments. But there's a catch: the tool generates a fixed set of questions for each reading, and some educators have noted that students can still attempt to game the system by preparing scripts in advance. It's a trade-off between simplicity and security — you get a streamlined process, but you might not get the full benefits of spontaneous oral assessment.
MirrorTalk by Swivl
MirrorTalk, developed by Swivl, is a voice-AI tool designed to support self-reflection in education. It allows students and teachers to record spoken reflections and receive AI-generated insights.
We include MirrorTalk here because scaffolding reflection with a voice-AI tool is a novel and potentially powerful idea. However, where MirrorTalk falls short is its focus. It’s designed to integrate with Swivl’s co-teacher devices and assist in self-reflection for teachers. While activities can be designed and sent to students as well, it’s clear — from template activities on their platform to their product’s marketing — that student usage is not their first priority. To our knowledge, no independent educators have shared their experiences with the tool beyond Swivl’s promotional materials. So, while it could offer an interesting way for your students to reflect, using it that way doesn’t yet feel realistic.
Verse
Learning from these platforms, we saw an opportunity to create something more conversational, so we built Verse. While other tools mix text and audio or follow structured question-answer formats, their interactions often feel mechanical and chatbot-like. We wanted to build a purely voice-based experience that feels like a natural discussion, where students can work through course concepts in real time.
Assignment Creation View
Student View
This informal approach is intended to let us capture students genuinely grappling with ideas. However, it presents a unique challenge: evaluating free-flowing discussions is more complex than grading structured essays.
To address this, we've incorporated an automated specifications grading system. Instructors define both core discussion questions and specific learning criteria. The system then analyzes each conversation transcript, identifying relevant quotes that demonstrate how students met these criteria. Students can review their evaluation and choose to either refine their ideas through another conversation or submit their transcript directly to their LMS.
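To make the specifications-grading idea concrete, here is a minimal sketch of matching transcript excerpts against instructor-defined criteria. All names here are hypothetical, and the keyword matching is a stand-in: a production system like the one described above would presumably use an LLM to judge whether each criterion is met, not simple string search.

```python
# Toy sketch of specifications-style grading of a conversation transcript.
# Hypothetical example: a real system would use an LLM judge, not keywords.

def grade_transcript(transcript: str, criteria: dict[str, list[str]]) -> dict:
    """For each criterion, collect transcript sentences mentioning its keywords."""
    # Naive sentence split; a real pipeline would use proper segmentation.
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    report = {}
    for name, keywords in criteria.items():
        quotes = [s for s in sentences
                  if any(k.lower() in s.lower() for k in keywords)]
        report[name] = {"met": bool(quotes), "evidence": quotes}
    return report

# Instructor-defined specifications for a hypothetical CS discussion.
criteria = {
    "defines recursion": ["base case", "recursive"],
    "gives an example": ["for example", "fibonacci"],
}

transcript = ("Recursion means a function calls itself. "
              "Every recursive function needs a base case. "
              "For example, Fibonacci can be written recursively.")

report = grade_transcript(transcript, criteria)
```

The output pairs each criterion with a met/unmet flag and the supporting quotes, which mirrors the review step described above: students can inspect the evidence and decide whether to try another conversation or submit.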
Evaluation Output
Product Comparison
| Product | Key Takeaway | How Assignments Are Submitted | Input Modality | Output Modality |
| --- | --- | --- | --- | --- |
| Socratic Mind | AI oral assessment using the Socratic questioning technique | Platform & LTI integration | Text or Audio | Text |
| Sherpa Labs | Transforms current class content into low-stakes, voice-based quizzes | Platform | Audio with optional Video | Audio or Text |
| MirrorTalk by Swivl | Voice AI-guided reflections for teachers and students | Platform | Audio and Video | Audio |
| Verse | Voice AI discussions with automated specifications grading for structured feedback | Platform & transcript export | Audio | Audio |
What’s Our Take?
Voice-AI in education is still in its early stages. Currently, Socratic Mind shows the most promise with its well-defined use case for oral exams and strong research backing. While Sherpa Labs achieved the highest adoption among these tools, their focus seems to have shifted elsewhere since the tool's creation.
But looking beyond these specific implementations, we see voice-AI's real potential in formative feedback — if we can make it practical for everyday classroom use. This has led us to focus on two key questions:
How can we create voice-AI-based activities that build on the best parts of synchronous, teacher-led discussions?
How do we provide efficient and equitable feedback to both instructors and students on these voice-based activities?
The answers to these questions will likely determine whether voice-AI becomes a transformative educational tool or just another fancy CAPTCHA in the ever-growing collection of edtech solutions. What do you think?
Can you see yourself using voice-based AI tools? Why or why not? Click an option and then share your thoughts.
This section was written by Elliot Roe, who is an Undergraduate Research Assistant at the Georgia Tech Play and Learn Lab, and Duncan Johnson, who is an Undergraduate Researcher with the Tufts University Center for Engineering Education and Outreach.
📢 Quick Hits:
AI News and Links
1. OpenAI announced GPT-4.5, which they describe as their "largest and most knowledgeable model yet." In short, its “innate” strengths are representing empirical facts, avoiding hallucinations, and conversing more naturally, but it is worse at complex reasoning tasks like coding and STEM than their o-series models (o1, o1-pro, o3-mini, o3-mini-high) and those models’ competitors. Crucially, 4.5 is also worse at finding and representing empirical facts than Deep Research, OpenAI’s agentic fusion of their reasoning models with advanced search capabilities (more news on it below). So GPT-4.5 is strong at representing empirical facts only in the sense that, relying on its training alone without additional functions or tools, it does best. This leaves it in a weird position in our AI toolkit, and it reduces OpenAI to suggesting that its socioemotional intelligence may be the source of its lasting value (and that the benchmarks they normally tout “don’t always reflect real-world usefulness”):
Early testing shows that interacting with GPT‑4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater “EQ” make it useful for tasks like improving writing, programming, and solving practical problems. We also expect it to hallucinate less.
Many commentators are arguing that the limitations of 4.5 prove the limitations of “unsupervised learning” training methods, although perhaps combining 4.5 with the reasoning models and functions/tools will result in “something far more powerful.” You can access 4.5 via ChatGPT Pro right now or via Plus later this week.
2. In more exciting news, Anthropic released Claude 3.7 Sonnet, their "most intelligent model to date" and the first "hybrid reasoning model" on the market. It can produce either near-instant responses or engage in extended step-by-step thinking that's visible to users. The model shows particular improvements in coding capabilities, with partners like Cursor, Cognition, Vercel, Replit, and Canva confirming its exceptional performance. Many commentators think it is significantly ahead when it comes to real-world development tasks. Anthropic also introduced Claude Code, a "command line tool for agentic coding" in limited research preview, which enables delegating substantial engineering tasks to Claude directly from your terminal.
3. OpenAI expanded access to Deep Research, its web browsing agent that creates comprehensive research reports, to all paying ChatGPT users. Previously only available to $200/month Pro subscribers, the feature is now included with ChatGPT Plus, Team, Enterprise, and Edu plans (limited to 10 queries monthly, while Pro users get 120). This follows Google's similar move last week extending their Deep Research agent to all Gemini Advanced users, highlighting the competition between OpenAI, Google, and Perplexity to deliver value through advanced research capabilities.
4. ICYMI: Google launched “Career Dreamer,” an AI-powered career exploration tool that helps people identify transferable skills and discover new career paths. It uses Gemini to help craft career identity statements, explore job possibilities, and draft application materials. This could be particularly valuable for career services offices and academic advisors working with students on their career pathways. (More info here.)
5. ICYMI: Google also unveiled an "AI co-scientist," a multi-agent system built on Gemini 2.0 that assists researchers in generating novel hypotheses and experimental designs. In real lab experiments, it successfully predicted new drug treatments for leukemia and proposed liver fibrosis targets that showed "significant anti-fibrotic activity." Perhaps most impressively, it independently rediscovered a novel bacterial gene transfer mechanism that had been recently found (but not yet published) by researchers at Imperial College London. Google is opening access to research organizations through a Trusted Tester Program, and you can sign up here.
What'd you think of today's newsletter?

Graham

Let's transform learning together. If you would like to consult with me or have me present to your team, discussing options is the first step. Feel free to connect on LinkedIn, too!