Google's Custom GPTs (Gems) are Here
Are they better? Yes and no. Plus, how I saved a prof 100s of hours.
[image created with Dall-E 3 via ChatGPT Plus]
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
In this week’s piece, I discuss the big news from Google: Gems, their response to OpenAI’s custom GPTs.
Table of Contents
📢 Quick Hits:
AI News and Links
1. Google has rolled out "Gems," which are customizable AI chatbots based on Gemini that are analogous to OpenAI’s custom GPTs — they are large language models (LLMs) that are enhanced with specific “instructions” to tailor their responses to particular tasks or contexts. Below, I explain how they work, compare/contrast them with custom GPTs and Claude’s Projects, and discuss how they are better (and worse) than the competition. (To use Gems with a personal Google Account, you must be a Gemini Advanced user, which is $20/month with the first month free. To use Gems with a work or school Google Account, you must have a Gemini Enterprise, Gemini Business, or Gemini Education add-on.)
2. Google has launched new Gemini features to “help students study smarter,” including access via Gemini to OpenStax textbook content, interactive step-by-step quizzes, and the ability to upload and analyze course materials (“you can upload up to 10 documents at a time — like a class syllabus or your notes — and ask Gemini to explain the content or create a custom study guide. Gemini will dive deep into your materials to break down key concepts, provide practice questions based on your course materials and more”). They've also introduced a premade Learning Coach "Gem" for personalized study guidance. They note that “school administrators who manage Google Workspace for Education in their schools can turn access to Gemini on or off for educators and students 18 years and older.”
3. Anthropic has made Artifacts generally available for all Claude.ai users across Free, Pro, and Team plans. Artifacts provide a dedicated window to view and iterate on work created with Claude, enabling users to create everything from code snippets to interactive dashboards. Team plan users can share Artifacts in Projects for secure collaboration, while Free and Pro users can publish and remix Artifacts with the broader community. As I have discussed previously, Artifacts are great for training students to use AI (the subject of this week’s AutomatED webinar).
4. Researchers tested how well humans and AI models could distinguish between human and AI-generated responses in Turing test transcripts. They found that both displaced human judges (who read transcripts rather than interacting live) and AI judges (GPT-3.5 and GPT-4) performed worse than chance at identifying AI responses, with accuracy below 50%. Even more strikingly, all three types of judges rated the best-performing GPT-4 "witness" as human more often than actual human witnesses.
5. Google is also introducing Imagen 3, an upgraded image generation model, across Gemini products. Imagen 3 offers improved image quality and will soon include the ability to generate images of people for some users.
[W]hen people say things like “LLMs are just hype” and that LLMs provide no tangible value to anyone, it's obvious to me they are just wrong, because they provide me value. Now maybe I'm the exception. Maybe I'm the only one who's found a way to make these models useful. I can only speak for myself. But given that LLMs can significantly improve my productivity — someone who has been programming for 20 years before ever using an LLM — I suspect that there are other people out there who could benefit from them as well.
⏳ How I am Saving Profs 100s of Hours
With the semester gaining steam, my client docket is filling up. Here are three things I’m working on right now:
1. Helping Professors with Productivity and Pedagogy
If you’ve taught for a while and if you teach undergraduates — especially non-majors or early major courses — you know that your students’ work is often not terribly surprising. That is, the strengths and weaknesses they display in their work tend to be fairly similar to others you’ve seen in the past.
The net result is that most of your current feedback is a version of feedback you’ve given in the past.
A professor who teaches persuasive writing recently approached me to help them use AI to take advantage of this repetitive aspect of giving feedback. Specifically, they wanted me to help them create a way to generate feedback on their students’ papers that checks three boxes, namely that it is:
Grounded in the content and form of their (anonymized) feedback to past students who had displayed similar strengths and weaknesses
Novel, in that each current student gets unique feedback
Easy to generate, with the push of a button
After meeting with them over Zoom, I produced an AI-powered workflow that checks all three boxes. It has a user interface that enables the professor to
type in a student name (perhaps a pseudonym)
select strengths (up to 3) via buttons, and
select weaknesses (up to 4) via buttons.
After the professor inputs these, the AI goes to work — synthesizing and adjusting their old feedback in the ways directed — and produces unique feedback to that student in a Google Doc, all in 15 seconds. The professor then gives this feedback to their student after some minor edits and the addition of some custom comments.
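For the curious, here is a minimal sketch of the core logic behind a workflow like this one. It is not the professor's actual implementation: the prompt wording, the `call_llm` helper, and the feedback-bank format are all hypothetical stand-ins, and the real version also writes the result to a Google Doc.

```python
# Hypothetical sketch of the workflow's core logic (not the client's actual code).
# `call_llm` is a placeholder for whatever provider you use (Gemini, GPT-4o, etc.).

PAST_FEEDBACK = {
    "thesis clarity": ["Your thesis is stated crisply up front, which anchors the reader."],
    "weak evidence": ["Several body paragraphs assert claims without supporting sources."],
    # ...anonymized past-feedback snippets, keyed by strength/weakness...
}

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call here.
    return f"[model output for a {len(prompt)}-character prompt]"

def generate_feedback(student: str, strengths: list[str], weaknesses: list[str]) -> str:
    # Gather the professor's past feedback for the selected strengths/weaknesses.
    examples = [ex for trait in strengths + weaknesses for ex in PAST_FEEDBACK.get(trait, [])]
    prompt = (
        f"Draft feedback for {student}'s persuasive-writing paper.\n"
        f"Strengths: {', '.join(strengths)}. Weaknesses: {', '.join(weaknesses)}.\n"
        "Synthesize the past feedback below into fresh, unique feedback; "
        "do not copy any excerpt verbatim.\n\n" + "\n---\n".join(examples)
    )
    return call_llm(prompt)

feedback = generate_feedback("Student A", ["thesis clarity"], ["weak evidence"])
```

The button-driven interface then just collects those three inputs, calls the generation step, and drops the result into a Google Doc for the professor's final edits.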
The professor estimates it will save them 100s of hours each year — and it only cost them $500 (maybe I should charge more? 🤔).
I have several other clients for whom I am working on similar projects, including one that uses AI to incorporate the professor’s notes on the student’s work to make the feedback even more unique (like a better version of our Feedback Accelerator; see other related content here).
I am also meeting with professors about how to design their courses to discourage and prevent AI misuse, as well as how to incorporate AI training/teaching.
If you’re interested in how I might help you save time or improve your AI pedagogy, sign up for a 30-minute exploratory Zoom here:
2. Presentations and External Webinars
I’m giving more and more presentations to a range of academic units, from departments to whole colleges, on topics like building custom GPTs, prompting long context LLMs for research use cases (including grant proposals), and training your students to use AI. A few are freely available online, like my November 6th “Show & Share” talk on “Using AI to Analyze Student Grades and Course Evaluations” at OneHE, while most are specific to institutions, from the USA to New Zealand.
I will say that I am seeing the most interest in sessions on more complex use cases of AI and on how to leverage institutional LLMs that come packaged with office suites to complete more tasks (e.g., using Gemini for Workspace rather than ChatGPT). These institutional LLMs have caught up with or surpassed the rest for most tasks.
My current schedule has room for ~5 more webinars this semester, so feel free to reach out if you have a need…
3. Running Our Friday AutomatED Webinar
This Friday, September 6th from 12pm to 1:30pm Eastern Daylight Time, I will be hosting the next internal-to-AutomatED webinar, on training students to use AI.
We reached our 20-person pre-registration quota — if we hadn’t gotten 20 registrations by this past Saturday, the webinar would have been postponed — and it is “all systems go” for Friday.
A detailed webinar schedule is included here. You can still register at the full price until Thursday here:
Note: Registrants will receive a handout for the “Pre-Webinar Activity” Tuesday morning (or upon registration, if they register afterwards).
👀 A First Look:
Google Gems
As noted above, Google has released their response to OpenAI’s custom GPTs — "Gems" — which are LLMs that are enhanced with specific “instructions” to tailor their responses to particular tasks or contexts. (See above for how to access them.)
Below, I briefly explain how they work, compare/contrast them with custom GPTs and Claude’s Projects, and discuss whether they are better (and worse) than the competition.
How Do They Work?
To start, you give them two things:
A name
Some instructions
The name doesn’t matter — except to keep track of which Gem is which — but the instructions are crucial. They act as a meta-prompt for the Gem, prepended to every interaction with it, serving as guardrails on all of your future exchanges.
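Gems themselves are point-and-click, not scriptable, but if you want to see the same pattern programmatically, the Gemini API exposes an analogous `system_instruction` parameter. A minimal sketch, assuming the `google-generativeai` Python package and a valid API key:

```python
# Minimal sketch: the API-level analogue of a Gem's instructions.
# Assumes `pip install google-generativeai` and a valid API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# The system instruction plays the same role as a Gem's instructions:
# it is prepended to the conversation and constrains every response.
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=(
        "You are a course-design assistant for university professors. "
        "Always ask about learning objectives before suggesting assessments."
    ),
)

chat = model.start_chat()
response = chat.send_message("Help me plan a 14-week intro statistics course.")
print(response.text)
```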
When you first create a new Gem, Google describes the instructions as an opportunity to provide the Gem with its “main objectives and capabilities” as well as guidance on the “style of response” that you want from it:
When typing or pasting in your instructions, you can @ other Google apps, enabling the Gem to access the relevant tool or information (apps vary in whether they are enabled or disabled by default):
Note: From my experimentation, it seems like the token limit for the instructions is around 15,000 tokens, which is equivalent to 10,000-11,000 words or 50,000-65,000 characters.
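Those word and character figures follow from the common rule of thumb of roughly 0.7 words and 3-4 characters per token of English text; a quick, purely illustrative back-of-the-envelope check:

```python
# Rough rule-of-thumb conversion (actual tokenization varies by model).
tokens = 15_000
print(int(tokens * 0.67), "to", int(tokens * 0.73), "words")       # ~10,000-11,000 words
print(int(tokens * 3.3), "to", int(tokens * 4.3), "characters")    # ~50,000-65,000 characters
```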
Before you’re done, you can preview the Gem’s responses in the right panel. After you’re done, you can take it for a whirl in a dedicated chat, just like chatting with Gemini Advanced but with the constraints provided by your instructions.
Here’s a conversation I had with a Gem that I designed to be similar to our Course Design Wizard custom GPT:
Comparisons with Custom GPTs and Claude Projects
There are several dimensions along which Gems differ from the competition, but there are also similarities. Here’s a table comparing the three options:
| | Gems | Custom GPTs | Claude Projects |
| --- | --- | --- | --- |
| Instructions | ~15,000 tokens, accessible as context | 8,000 characters (~1,800 tokens), accessible as context | 200,000 tokens, together with knowledge files, accessible as (long) context |
| Knowledge files | none | 20 files of up to 512MB (2,000,000 tokens) each, accessible via RAG when long (more here) | 200,000 tokens, together with instructions, accessible as long context (more here) |
| Connections with other apps | can @ any other Google Workspace app that you’ve granted access to | can be connected via API with basically any app | none |
| Internet access | yes, via Google | yes, if Web Browsing is enabled | no |
| Can run code, analyze data, do math | yes | yes, if Advanced Data Analysis is enabled | yes |
| Can produce charts and other graphics | yes, but not well | yes | yes |
| Can produce images | not yet, but soon (via Imagen 3) | yes, if Dall-E is enabled | no |
| Can be shared | no (only Google-made Gems, of which there are five, are available) | yes, with those with the link or in the GPT Store (more on access here) | yes, if on a Team plan, with other Team members |
| Cost | $20/month to create and use | $20/month to create, free to use | $20+/month to create and use |
Are They Better?
While I need to test Gems’ capabilities more, I am impressed so far. More specifically, I have been impressed by the Gem that I designed to be similar to our Course Design Wizard custom GPT.
Because I have worked on this GPT for 100+ hours, I have a very good sense of when it performs well and when it makes mistakes, like when it fails to follow its instructions.
I have saved several test prompt sequences — emulating different types of interlocutors — that I use to benchmark the GPT when I make significant changes to it.
When I sent these test prompt sequences to the Gem, it performed very well. In fact, I would say that overall the Gem did a better job following my instructions carefully than the GPT. This could be because it is not using retrieval augmented generation (RAG), which is how GPTs inject chunks of their knowledge files into their context windows (effectively extending the window; more here). I’ve found RAG leads to many of GPTs’ problems.
Yet, on the other hand, the Gem’s weakness is that it could not incorporate as much in its instructions as the GPT can incorporate from its knowledge files. It lacked responses to some questions and pressure that the GPT generally handles well (the GPT must reference its knowledge files to answer them). This is the benefit of RAG.
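To make that tradeoff concrete, here is a toy sketch of the RAG pattern. Naive word-overlap scoring stands in for the embedding-based retrieval a real system would use; everything here is illustrative, not how OpenAI actually implements GPT knowledge files.

```python
# Toy sketch of retrieval-augmented generation (RAG).
# Real systems score chunks with vector embeddings; word overlap
# is a deliberately simple stand-in to show the pattern.

def chunk(text: str, size: int = 200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

knowledge_file = "..."  # a long document that won't fit in the context window
query = "What does the rubric say about thesis statements?"

# Only the top-k relevant chunks get injected into the prompt, which is
# why a model can "know" more than its context window holds -- and why
# it sometimes misses material that retrieval failed to surface.
top_chunks = retrieve(query, chunk(knowledge_file))
prompt = "Answer using these excerpts:\n" + "\n---\n".join(top_chunks) + f"\n\nQuestion: {query}"
```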
It’s not clear to me why Gems don’t have even longer instructions. Why not leverage more of Gemini’s 1,000,000-2,000,000-token context window?
Another difference from GPTs is that the Gem more aggressively incorporated internet-sourced content (whether I wanted it to or not), which enabled it to fact-check some of its outputs with Google searches, like the following:
Whether this is a good thing depends on its ability to leverage Google search, the reliability of the sources it finds (an issue Gemini has repeatedly faced), its understanding of the context of the claim being made, and so on. In general, I wish I had more control over this aspect of the Gem, just like I can manage Web Browsing with custom GPTs. The aggressiveness on this front gives me the sense that Google is pushing hard to find new ways to get people back to Google search (where the ads are).
But more experimentation is needed, and I expect a lot to change in the coming months as Google rolls out this new feature.
Do you plan on using Google's Gems? Click an answer and tell me your thoughts.
📬 From Our Partners:
A Newsletter That’s Just News
Seeking impartial news? Meet 1440.
Every day, 3.5 million readers turn to 1440 for their factual news. We sift through 100+ sources to bring you a complete summary of politics, global events, business, and culture, all in a brief 5-minute email. Enjoy an impartial news experience.
✉️ What You, Our Subscribers, Are Saying
In my August 12 newsletter, “99.9% Reliable AI Detection,” I described the following scenario:
Suppose you had a genie who you were certain is (almost) always correct when he judges that a bit of writing is AI-generated.
Would you rely on him as an educator to judge your students’ work?
What if his reasons for his judgments are completely opaque to you?
That is, suppose you have to tell any student who you accuse “I don’t really know how he came to his judgment, but I know he is reliable. He’s been proven!”
The responses continue to flow in…
Would you rely on the genie if his evidence was opaque to you and your students?
“If a student challenged the result and the genie provided no evidence or support, there would be nothing I could give back to the student and it would be my word (or the genie’s) against the student’s. That’s a recipe for a lawsuit with no evidence on my side. If I’m going to accuse someone of academic dishonesty I want evidence to present. Otherwise it can’t actually be relied on. Without evidence the most I could accept is a ‘maybe, check knowledge manually’ such as a pop quiz.”
The responses are also flowing in from a poll we have in one of the early editions of the Insights Series (the 7-email welcome sequence you are enrolled in when you first subscribe).
Long story short:
very few subscribers don’t want to use AI for professorial tasks
large chunks want to use it for lesson planning, creating course content, and field-specific tasks
some want to use it for meeting analysis, grading and assessment, research tasks, admin tasks, etc.
and a plurality want to use it for all of the above!
September - Tutorial on Using AI with Canvas
September - Tutorial on All Major Functionalities of Microsoft 365 Copilot
August 21, 28 - Updated Guides on AI Assignment Design, on Discouraging AI Misuse, and on Syllabus AI Policies
What'd you think of today's newsletter?
Graham

Let's transform learning together. If you would like to consult with me or have me present to your team, discussing options is the first step. Feel free to connect on LinkedIn, too!