ChatGPT's Search Engine is Here
Plus, a Dec. 6 webinar on using AI for feedback.
[image created with DALL-E 3 via ChatGPT Plus]
Welcome to AutomatED: the newsletter on how to teach better with tech.
In each edition, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
In this week’s piece, I discuss the relevance of ChatGPT search for research and for educators training students to conduct research. I also open sign-ups for a December 6th webinar on using AI for feedback on student work, share some recent studies on AI uptake, and explain a high-impact tip for LLM prompting.
📝✅ December 6th Webinar
on Feedback & Assessment
Given the positive response to our recent webinars, I will be hosting one final webinar for the year.
The topic: accelerating and improving feedback with AI.
On October 4th, I hosted a webinar on how to use LLMs like ChatGPT as a professor. Just like my September 6th webinar on how to train your students to use AI, the feedback was very positive, with 100% of responding participants giving it an A (“Excellent!”) afterwards.
Here’s some of the feedback I received:
“Graham opened my eyes to how [to] create better prompts utilizing his approach.”
“A well organized approach to an intimidating topic. I especially appreciated the depths of the prompting explanations. They brought a new level of understanding to the challenges of 'talking' to A.I.”
Let’s keep the party rolling with one final webinar for the year!
You can check out the dedicated webinar webpage for more detail or sign up directly below, but here are the highlights.
Dates and Numbers
Date and Time: Friday, December 6th from 12pm to 1:30pm EST
Standard Price: $150
Early Registration Price: $75
✨Premium Subscribers’ Price: $60 (discount visible on webinar page for logged in users)
Early Registration Deadline: Monday, December 2nd at 11:59pm
Total Available Seats: 50
Minimum Participation: 20 registrations by Monday, December 2nd; if we do not reach 20 registrations by this date, all early registrations will be fully refunded and the webinar will be canceled/rescheduled
Money-Back Guarantee: You can get a full refund up to 30 days after the webinar’s date, for any reason whatsoever
What To Expect
Live 90-Minute Interactive Webinar on Zoom:
Framework for evaluating when and how to use AI in feedback
Live demonstrations of feedback generation with ChatGPT, custom GPTs, and Claude
Practical strategies for maintaining student data privacy
Concrete examples of using LLMs as mentors, student simulations, and teaching assistants
Extended ethics discussion and Q&A session (with me, Dr. Graham Clay)
High-Value Post-Webinar Resources:
Complete video recording, AI-generated summary, and presentation slides
Three complimentary ✨Premium pieces:
Curated collection of feedback-specific prompts for common assignment types
🔎 ChatGPT Search: First Reflections
Last week, OpenAI announced that ChatGPT now has much improved web search capabilities. (This is what had been called the “SearchGPT” prototype, which I reported on in July.)
On the one hand, “ChatGPT search” is a way to supplement ChatGPT’s static knowledge base — its abilities to create text, analyze and produce images, listen and speak with Advanced Voice Mode, and conduct data analyses in Python are all grounded in training data from the past. Now ChatGPT is significantly better at drawing from real-time information and providing clickable links to original sources.
On the other hand, what OpenAI seems to be aiming for is something beyond merely a new conduit for information. By combining the ability to search the web with an improved information-display interface and the contextualization that its LLM capabilities provide, ChatGPT search is intended to move ChatGPT closer to the role of research assistant, like Perplexity.
How does it work? OpenAI explains here:
The search model is a fine-tuned version of GPT-4o, post-trained using novel synthetic data generation techniques, including distilling outputs from OpenAI o1-preview. ChatGPT search leverages third-party search providers, as well as content provided directly by our partners, to provide the information users are looking for. Learn more here.
The partners they list include the Associated Press, Reuters, Le Monde, and more, and they say that “any website or publisher can choose to appear in ChatGPT search.”
Available at chatgpt.com through both web and mobile interfaces, ChatGPT search offers specialized displays for different types of information including current events, data, and maps. While currently limited to Plus and Team users, the feature will roll out to Enterprise and Education users in the coming weeks, with free users gaining access over the coming months.
In some initial testing, I found that it worked quite well, surfacing recent academic research on nuanced topics I know well, like work related to some of my publications. My only complaint is that it sometimes misdescribed the results, though never in ways significant enough to mislead. After more testing, I will provide a Tutorial on using it and Perplexity for educational use cases.
ChatGPT search rightly identifying my co-authored article on AI ethics as relevant to my query.
Why Care?
Here are two reasons to take note of this development, one general and one specific to educators:
1. Hallucination Issues Continue to Shrink
Over the past several years, one of the main focuses of AI developers has been to reduce the rate at which their tools produce text outputs that misrepresent the truth.
“Hallucinations” come in many forms. They can arise from AI tools’ training data alone — when a context-less query triggers the AI to produce an output that misrepresents a fact, say — and from AI tools failing to properly analyze information provided to them. Hallucinations of the latter kind are a serious obstacle to many high-utility use cases, like summarization.
AI developers have found significant success in addressing this problem.
For instance, Google’s current Gemini 1.5 Pro is significantly better than the February version, as well as 1.0 Pro and 1.0 Ultra, at “needle in a haystack” recall. This test tasks the LLM with finding and accurately representing a single piece of information in its context — one sentence on a topic buried in a sea of novel data submitted to it (in-depth explanation from DeepMind here). Gemini 1.5 Pro shows impressive performance even as the context grows to 1 million tokens and beyond, outperforming GPT-4 at every context length, including beyond the latter’s 128,000-token context window.
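If you are curious what such a test looks like mechanically, here is a minimal sketch in Python. It is illustrative only: the needle sentence, filler, and query are invented, `ask_llm` is a hypothetical stub for whatever chat-capable LLM API you use, and real evaluations use varied, novel filler text rather than one repeated sentence.

```python
# Minimal sketch of a "needle in a haystack" recall test.
# Illustrative only: real evaluations use varied, novel filler text,
# and ask_llm is a hypothetical stub for an actual LLM API call.

import random

NEEDLE = "The rare Polish stamp sold at auction in 1987 for $42,000."
FILLER = "The committee met again to review the quarterly figures."

def build_haystack(needle: str, n_sentences: int = 5000) -> str:
    """Bury the needle at a random position among filler sentences."""
    sentences = [FILLER] * n_sentences
    sentences.insert(random.randrange(n_sentences), needle)
    return " ".join(sentences)

def ask_llm(prompt: str) -> str:
    """Hypothetical stub: swap in a real API call (Gemini, GPT-4o, etc.)."""
    raise NotImplementedError

haystack = build_haystack(NEEDLE)
prompt = (
    "Here is a document:\n\n"
    f"{haystack}\n\n"
    "According to the document, what happened to the rare Polish stamp?"
)
# A model with strong long-context recall should reproduce the needle
# accurately no matter where it was buried:
# answer = ask_llm(prompt)
```

The test is scored by checking whether the model’s answer faithfully reproduces the buried sentence, repeated across many needle positions and context lengths.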
To use another example, one that requires reasoning plus recall: the May version of GPT-4o achieves 88.7% accuracy on the MMLU benchmark, which consists of 10,000+ multiple-choice questions across 57 subjects. The creators of MMLU estimate that human domain experts average 89.8% accuracy. At its best, GPT-3 scored 43.9% in 2020.
As hallucination in its many forms declines for AI tools like Gemini and ChatGPT, their ability to accurately represent completely new information — information that isn’t found in their training data — improves.
ChatGPT search wouldn’t be a viable tool if the information it retrieves and (seemingly) displays were typically inaccurate, or if we had reason to think that its developers were not making rapid progress on the remaining accuracy issues.
2. Training of Research Skills Requires Reevaluation
With the libraries of old, research required navigating indices of printed works, working in person with human librarians, transcribing passages by hand, etc.
With computer-enabled libraries, research evolved, requiring us to know how to navigate search engines, access electronic-only journal articles behind paywalls, etc.
With AI-enabled libraries where the AI suffers from serious inaccuracies, research evolves still further, as the AI can help brainstorm ideas and provide critiques or sparks of insight. Think early ChatGPT.
With AI-enabled libraries where the AI is as accurate as domain experts but lacks access to new information, it is a crystallized assistant, like a domain expert whom you can still ask questions after they’ve retired and moved to Bermuda to do something new.
With ChatGPT search, Perplexity, and other AI tools that can access and represent new information to users, we see the combination of (ever-increasing) domain expertise and the ability to draw upon it in real time.
This means that teaching research skills must evolve yet again.
The traditional emphasis on finding reliable sources remains crucial, but we must now teach students to critically evaluate AI-mediated research processes.
How do we ensure students understand the difference between an AI summarizing multiple sources versus reading those sources directly? When should they do one and when should they do the other? What skills do they need to verify AI-provided information effectively? What does research look like when verification isn’t needed?
These questions become especially pertinent as tools like ChatGPT search become more sophisticated at presenting complex information. When an AI can coherently synthesize dozens of sources and present them with citations, the core research skills shift from finding disparate information to prompting and evaluating syntheses.
For educators, this means updating our information literacy curricula to account for these new tools. The $1,000,000 question is the degree to which these tools and their future versions will — and should — simply supplant traditional research skills.
What will the library of 2100 look like?
📢 Quick Hits:
AI News and Links
1. ICYMI: The Department of Education’s Office for Educational Technology has released a new report: “Empowering Education Leaders: A Toolkit for Safe, Ethical, and Equitable AI Integration.”
2. Over 25% of all new code at Google is now generated by AI and then reviewed by human engineers, CEO Sundar Pichai revealed during the company's Q3 earnings call, demonstrating AI's growing role in Google's internal operations. The tech giant reported strong financial results with Google Services revenue up 13% to $76.5B and Cloud revenue up 35% to $11.4B, driven partly by AI product adoption.
3. A new Ithaka S+R survey reveals that while 63% of biomedical researchers have experimented with AI, only 7% use it regularly, with most citing concerns about accuracy as the main barrier to adoption. The study found AI is primarily used for writing/editing and literature review tasks rather than actual experiments, with 31% using it for grammar review but just 4-7% using it for hypothesis testing or experimental design.
4. “How NotebookLM Was Made,” a discussion with its creators.
5. Adobe has unveiled its Firefly Video Model, enabling AI-generated video content through text prompts and extending its successful Firefly platform (which has already generated over 12 billion images and vectors). Coming to Premiere Pro in beta later this year, the technology will help editors create B-roll footage, extend clips, and generate new video elements — and it is trained only on permissioned content.
6. Google has launched a new "Prompting Essentials" course to teach effective AI prompting in five steps, building on its AI Essentials course (supposedly Coursera's most popular AI course globally).
7. Weekly usage of generative AI among senior business leaders has nearly doubled from 37% in 2023 to 72% in 2024, according to a new Wharton-GBK Collective report surveying 800+ executives at large organizations. The study finds that while AI excels at specific tasks like data analysis and contract drafting, business sentiment has shifted from initial "curiosity" to being "pleased" and "excited," though companies are still working to determine AI's full ROI impact. (Executive summary here; full report here.)
📬 From Our Partners:
An AI Agent Customized to Help You
Your own AI clone, with memory
Imagine if you had a digital clone to do your tasks for you. Well, meet Proxy…
Last week, Convergence, a London-based AI start-up, revealed Proxy, the first general AI Agent, to the world.
You can sign up to meet yours!
🧪 Call For Interest:
AI Course Scheduler for Departments
As I reported a month ago, I am putting the finishing touches on the beta version of an AI department/school/unit course scheduler.
It accepts as inputs the spreadsheets and narrative faculty preferences departments currently use for faculty teaching assignments. It then interprets these inputs approximately like a human scheduler would.
A standardized input form will be available but optional; I am assuming that most (human) department course schedulers don’t want to try to wrangle their faculty to use it and that they would rather dump in whatever data they currently gather to conduct scheduling.
The system outputs faculty-to-course assignments and can suggest course adjustments within given constraints.
If you’re interested in this AI tool, email me either to be notified when it's available (at a competitive price point) or to participate in beta testing at a discount.
Email me by responding to this email or clicking the below button:
And if you don’t handle scheduling for your department/school/unit but know who does, tell them about this to help them out!
🧰 A High-Impact Tip for Your AI Toolbox
Here’s one crucial tip for working with all large language models (LLMs) except for OpenAI’s o1:
Always break down complex tasks into steps,
even when you don't need to see the intermediate work.
To use a toy example, rather than asking ChatGPT "What are the key themes in 'The Great Gatsby'?", you might prompt as follows: "First, list the major characters and their primary motivations. Then, identify recurring symbols and their contexts. Finally, synthesize these elements into major themes."
(And then you could break these steps into further steps, and those into further steps, and … . For a deep dive on how to prompt for very complex use cases like grant applications, see my ✨Premium Tutorial.)
Directing LLMs to use a “step-by-step” approach (called “chain-of-thought” prompting in the academic literature dedicated to empirically testing prompts) works better for two distinct reasons.
First, the step-by-step guidance helps the LLM produce better responses, even if the LLM only produces the final output. That is, it gets better results even if the outputs produced in response to the intermediate steps are not sequentially represented in the LLM’s message to you. Strange, but true!
Second, and more powerfully, you can ask the LLM to show its work at each step, which then is entered into its context for subsequent generation. When the LLM's final conclusions are grounded in its explicit intermediate analysis, they tend to be more accurate and better supported.
This is key with accuracy-sensitive use cases, like grading multiple-choice quizzes or complex data analysis.
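If you work with LLMs through an API rather than the chat interface, the second approach can be scripted directly. Here is a minimal sketch using the OpenAI Python SDK, applied to the Gatsby example above; the model name, system message, and step prompts are illustrative rather than a canonical recipe.

```python
# Minimal sketch of step-by-step ("chain of thought") prompting in which
# each intermediate answer is fed back into the model's context.
# Assumes the OpenAI Python SDK; model and prompts are illustrative.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

steps = [
    "List the major characters in 'The Great Gatsby' and their primary motivations.",
    "Identify recurring symbols in the novel and the contexts in which they appear.",
    "Synthesize the characters and symbols above into the novel's major themes.",
]

messages = [{"role": "system", "content": "You are a careful literary analyst."}]

for step in steps:
    messages.append({"role": "user", "content": step})
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model; swap in your preferred one
        messages=messages,
    )
    answer = response.choices[0].message.content
    # The intermediate answer becomes context for the next step.
    messages.append({"role": "assistant", "content": answer})

print(answer)  # the final synthesis, grounded in the earlier steps
```

Because each intermediate answer is appended to `messages`, the final synthesis is generated with the full character and symbol analysis in its context, which is exactly the mechanism behind the second reason above.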
Note: This advice is less straightforwardly relevant to OpenAI’s o1, which does better with simple directions because it manifests a step-by-step process in its hidden “reasoning tokens” as a result of its novel training method. (More on it here.) All other models benefit from being prompted in accordance with the above advice.
What'd you think of today's newsletter?

Graham

Let's transform learning together. If you would like to consult with me or have me present to your team, discussing options is the first step. Feel free to connect on LinkedIn, too!