✨Tutorial: How to Grade and Analyze Your Teaching with ChatGPT
Ways professors can best use its Vision and Advanced Data Analysis capabilities.
[image created with Dall-E 3 via ChatGPT Plus]
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
In this fortnight’s Premium edition, I present a tutorial for prompting ChatGPT, with a focus on the two capabilities that set it apart: visual analysis and data analysis.
My focus is on how to leverage these capabilities across countless educational use cases, which I demonstrate by using them to complete two tasks that are common — or should be common — in higher ed:
automating the grading of quizzes and tests
quantitatively evaluating one’s teaching effectiveness
Before that, I provide some advice on prompting ChatGPT in general, as well as specifically for visual analysis and data analysis.
👉 ChatGPT Primer
As I discuss in my ✨Guide on How to Train Students to Use AI, an effective prompt will typically do some combination of the following:
Encourage the LLM to take on a role, preferably of a relevant expert, with information about what this amounts to.
Encourage the LLM to work step-by-step, whether “prior” to producing an output or through a sequence of outputs that show its work.
Make clear the context of the user’s needs.
Enumerate the user’s objectives.
Clarify the practical constraints, like the categories or length of output one needs.
Outline the desired format, tone, or style for the LLM’s outputs.
Give exemplars of prompt-response pairs (illustrating success).
Present sufficient detail about the above.
And, since many cases require a sequence of prompts to achieve desired results, effective prompting also requires an understanding of how to break down complex queries into manageable components or sequences of directives. This skill involves learning how to refine and adjust prompts based on initial AI responses, ask follow-up questions, or request clarifications.
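To make these principles concrete, here is a minimal sketch of a prompt that combines several of them. The course, class size, and topic are hypothetical placeholders; adapt them to your own context.

```python
# A minimal prompt sketch combining several elements from the list above.
# All course details are hypothetical placeholders.
prompt = """
Role: You are an expert instructional designer for undergraduate statistics.

Context: I teach a 40-student introductory course, and many students
struggle to interpret confidence intervals.

Objectives: Draft three short in-class activities targeting this struggle.

Constraints: Each activity must fit in 15 minutes and require no software.

Format: For each activity, give a title, a materials list, and numbered steps.

Work step by step: first list common misconceptions about confidence
intervals, then design each activity to address one of them.
"""
print(prompt)  # paste the result into ChatGPT
```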
There are countless guides to effective LLM prompting in general that explain these rules of thumb and their application to ChatGPT in greater detail (see here, here, here, and here). I recommend reading them if you don’t feel confident deploying them, because they are all relevant to using ChatGPT’s Vision and Advanced Data Analysis capabilities.
Vision and Advanced Data Analysis are two of ChatGPT’s notable unique or uncommon features, which include:
A context window of 128,000 tokens, smaller than Claude’s 200,000 tokens and Gemini 1.5 Pro’s 1,500,000 tokens, and an output size limit of 4,096 tokens (for GPT-4o, ChatGPT’s top model right now). This means that lengthy file uploads, prompts, and conversations will be truncated sooner than with these other models.
Image generation capabilities with Dall-E 3.
Internet access, unlike Claude but like Gemini Advanced.
Custom GPTs and third-party GPTs on the GPT store, as I cover at length elsewhere.
Image analysis capabilities — “Vision” — which I cover below.
Data analysis capabilities built on a sandboxed Python code environment — “Advanced Data Analysis” — which I cover below.
These two capabilities are ChatGPT’s primary differentiators at the present moment. Let’s turn now to what they are, how to prompt them effectively, some general ways (higher) educators can use them, and details on how to use them to automate grading and analyze your teaching.
👀 ChatGPT’s Vision Capability
General Description & Example Use Cases
ChatGPT’s Vision capability allows users to input images alongside text, enabling the model to interpret and reason about visual content in conjunction with textual information. In essence, the AI is able to perceive, analyze, and describe visual elements with human-like comprehension, including recognition of objects, scenes, text within images, and instances of abstract concepts.
Vision enables a range of tasks, such as detailed image description, optical character recognition (OCR), visual question answering, and complex visual reasoning. Notably, in its current GPT-4o form, it outperforms previous models and alternatives in benchmarks for vision understanding.
As I explained when it was released in its GPT-4o form, Vision is now accessible to all ChatGPT users, including those without a ChatGPT Plus subscription (or API access), although free users are subject to usage limits. GPT-4o Mini is freely available and has Vision.
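The examples in this tutorial use the ChatGPT interface, where you simply attach an image to your message. For readers who prefer scripting, here is a minimal sketch of the same idea via the OpenAI Python SDK; the file name and prompt are hypothetical, and you would need your own API key.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Encode a (hypothetical) photo of a quiz page for upload.
with open("quiz_page.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Transcribe the handwritten answers on this quiz page."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```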
Here are some ways professors could use Vision to improve their teaching:
Streamline grading of handwritten assignments: Professors can upload images of student submissions for quick analysis and feedback. ChatGPT can interpret diagrams, equations, and text, providing initial assessments on accuracy, completeness, and even suggesting areas for improvement. I focus on quizzes in my extended illustration below.
Facilitate multilingual learning: For language courses or international programs, ChatGPT can translate text within images from various languages. This allows professors to easily incorporate authentic materials like foreign newspapers or historical documents into their curriculum, breaking down language barriers and enriching cultural understanding.
Enhance accessibility in course materials: By inputting images of textbook pages, slides, or handouts, professors can use ChatGPT to generate detailed alt-text descriptions. This aids visually impaired students, creating more inclusive learning environments, but also enables a range of other LMS functionality dependent on completed alt-text descriptions.
Develop UI designs or website code from sketches, photos, or screenshots: Professors can leverage Vision to transform rough sketches or wireframes into more polished UI designs or even generate initial HTML/CSS code. This is particularly useful for courses in web design or digital media, allowing quick iteration on ideas and providing students with immediate visual feedback on their concepts.
After I explain prompting Vision and Advanced Data Analysis, my focus will be on using ChatGPT for automating the grading of quiz or test submissions, whether handwritten or not. This is intended to be an illustration of the process of using Vision, so I will note throughout places where other use cases would require variation.
Prompting Vision
More than six months after its publication, a 166-page paper from Microsoft researchers Yang et al. remains one of the best sources of information on how to prompt ChatGPT’s Vision capability. It contains thousands of examples and illustrations, and its authors recommend a range of prompting techniques.
Interestingly, most of these techniques are found in the bulleted list I provided at the outset and are applicable to LLMs in general, like showing ChatGPT 2-3 examples of what you want as an output — paired with corresponding inputs — or encouraging it to break down complex visual tasks into steps.
But one of their other recommendations is unique to the Vision context: namely, they advise us to use so-called “visual pointing.”
Visual pointing is the practice of referring to specific parts of an image via visual markers or edits made directly on the image. These can range from simple arrows or circles drawn on the image to more complex annotations like numerical spatial coordinates or image crops. ChatGPT excels at understanding these visual cues, much as we quickly grasp what someone means when they point to something during a real-world conversation.
This grounded description capability allows the AI to focus on specific elements while still maintaining an understanding of the overall context. You can also associate pointed objects with written labels or indexes, enabling more complex queries about multiple parts of an image. For instance, in a geography lesson, you could number different landforms in a landscape image and ask ChatGPT to compare and contrast them.
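You can produce such annotations with any image editor, but if you want to batch the process, here is a hedged sketch using the Pillow library; the file name and coordinates are hypothetical placeholders.

```python
from PIL import Image, ImageDraw

# Hypothetical landscape photo; coordinates are placeholders.
img = Image.open("landscape.jpg").convert("RGB")
draw = ImageDraw.Draw(img)

# Circle two landforms and index them so ChatGPT can be asked about each.
landforms = {"1": (220, 140, 320, 240), "2": (480, 300, 600, 420)}
for label, box in landforms.items():  # box = (left, top, right, bottom)
    draw.ellipse(box, outline="red", width=5)
    draw.text((box[0], box[1] - 25), label, fill="red")

img.save("landscape_annotated.jpg")
# Then prompt: "Compare and contrast the landforms circled as 1 and 2."
```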
Yang et al. also note some limitations to ChatGPT’s Vision capability, which largely mirror those OpenAI has since noted in its publicly available documentation. Here are the main limitations to keep in mind:
Text clarity: To the degree you can, ensure text in images is large and clear. The model struggles with small or rotated text, which could affect analysis of student work or presentation slides.
Image quality: Clearer images yield better results. If a human struggles to interpret an image, the AI likely will too.
Alphabet constraints: Performance may be suboptimal with non-Latin alphabets. Consider this when working with international students or multilingual materials. (This should improve soon, as languages are a big focus of OpenAI and their competitors.)
Spatial reasoning: Avoid tasks requiring precise spatial analysis (e.g., detailed diagram interpretations that depend on the location of the diagram’s parts relative to one another).
Counting: Expect approximate counts rather than precise numbers when analyzing image content (although Yang et al. note that expert role specification and provided examples improve counting).
File types: Vision supports PNG, JPEG, WEBP, and non-animated GIF files up to 20MB.
Privacy: The model doesn't process file names or metadata, but you cannot delete images yourself after upload (OpenAI claims to delete them after some time). Exercise caution with sensitive materials.
Now let’s turn to ChatGPT’s Advanced Data Analysis capability before seeing each in action.
📈 ChatGPT’s Advanced Data Analysis (ADA) Capability
General Description & Example Use Cases
ChatGPT's Advanced Data Analysis (ADA) capability is an evolution of the former "Code Interpreter." It enables users to upload diverse data files directly into the chat interface, including from cloud storage services like Google Drive and Microsoft OneDrive.
Leveraging GPT-4o, ADA operates in a secure, sandboxed environment where it can write and execute Python code to perform a wide array of tasks. These include initial coding and thematic analysis for dataset curation, data cleaning and manipulation, statistical analysis, and the creation of customizable visualizations using various Python libraries such as pandas and Matplotlib. As I explained two months ago, recent updates have introduced real-time table interactions and enhanced chart customization options.
But is it reliable?
Prior to GPT-4o, ADA with GPT-4 was already considered generally reliable and useful across various fields. Although comprehensive evaluations of GPT-4o's ADA are still limited, preliminary research in fields like dermatology and financial data analysis suggests its performance is comparable to traditional statistical software, with some discrepancies due to implementation differences. However, experts caution that while ADA is a powerful tool, it should complement rather than replace expert judgment, especially in complex cases requiring nuanced understanding grounded in clinical or domain-specific experience.
Below, I use it for much simpler cases and show you how to prompt it in a way that minimizes errors.
ADA holds significant importance for all professors, regardless of their field. In data-intensive disciplines like economics, biology, and engineering, ADA streamlines complex data tasks without requiring extensive software expertise, though foundational statistical knowledge remains crucial.
However, ADA's impact extends beyond quantitative fields. It addresses a critical gap in higher education where professors often lack access to standardized, comprehensive data on student performance, unlike their K-12 counterparts. ADA empowers professors across disciplines to gather, analyze, and interpret valuable data on student progress, enabling more personalized and effective teaching strategies.
Here's a list of ways professors could use ADA to improve their teaching:
Assignment correlation: Find significant correlations between student performances on different assignments or assignment averages.
Predictive analysis: Determine which formative assignments are predictive of summative assessment outcomes.
Content mastery mapping: Discover content areas or groups where students consistently struggle or excel.
Grade weight optimization: Calculate optimal assignment category weight adjustments to align grades with intended outcomes.
Attendance impact: Investigate the relationship between attendance, participation, and academic performance to inform engagement policies.
Assignment difficulty: Evaluate the effectiveness of exam questions to improve assessment quality and fairness.
Performance patterns: Uncover trends across different assessment types to tailor teaching methods to student needs.
Below, I explain how to do the first four of these tasks.
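To preview the kind of Python that ADA writes and runs behind the scenes, here is a hedged sketch covering the first two tasks; the gradebook file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical gradebook export, one row per student.
df = pd.read_csv("gradebook.csv")

# Assignment correlation: pairwise correlations between assignments.
assignments = ["Quiz 1", "Quiz 2", "Homework Average", "Final Exam"]
corr = df[assignments].corr(method="pearson")
print(corr.round(2))

# Predictive analysis: which formative assignments track the final exam?
predictors = corr["Final Exam"].drop("Final Exam").sort_values(ascending=False)
print(predictors)
```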
Prompting Advanced Data Analysis
Effective prompting for ADA builds on the general principles I referenced earlier, but with some specific considerations for data-related tasks. The key is to be clear about your data structure, analytical goals, and desired outputs.
As OpenAI explains in their documentation, ADA allows you to work with multiple files in a single conversation (.xls, .xlsx, .csv, .pdf, and .json), up to a limit of 10 files at 512 MB each. When preparing your data for upload, include descriptive column headers in the first row of your spreadsheet or CSV file. Use plain language for these headers, avoiding acronyms and jargon. This helps ChatGPT understand your data structure more accurately, leading to more precise analyses. For example, instead of "Std_Perf_Q1", use "Student Performance Quarter 1".
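If your gradebook export already has cryptic headers, you can rename them before uploading; here is a small pandas sketch using the example header above (the file names and second header are hypothetical).

```python
import pandas as pd

# Rename jargon-laden headers to plain language before uploading to ADA.
df = pd.read_csv("raw_gradebook.csv")  # hypothetical LMS export
df = df.rename(columns={
    "Std_Perf_Q1": "Student Performance Quarter 1",
    "Att_Pct": "Attendance Percentage",  # hypothetical second header
})
df.to_csv("gradebook_for_chatgpt.csv", index=False)
```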
Once your data is uploaded, start by clearly stating what you want to learn from it. Be specific about your analytical goals. For instance, instead of asking "What can you tell me about this data?", try "Analyze this student performance data to find correlations between attendance rates and final grades across different subject areas” (a case I discuss below).
To gain deeper insights into the analysis process, which can be particularly useful when using ADA as a teaching tool, encourage ChatGPT to explain its steps. You might prompt, "Walk me through your analysis process step-by-step, explaining the statistical methods you're using and why." This also improves its accuracy, because it effectively provides its own context as it proceeds in real time.
Likewise, if the initial results don't fully address your needs, don't hesitate to ask follow-up questions or request modifications to the charts or tables. You might say, "That's interesting, but can we focus more on the top-performing students? Let's filter the data to show only those in the top 25% and rerun the analysis."
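Under the hood, a modification like that amounts to a one-line filter before rerunning the same analysis. A hedged sketch, reusing the hypothetical gradebook columns from earlier:

```python
import pandas as pd

df = pd.read_csv("gradebook.csv")  # hypothetical export from earlier sketch

# Keep only students in the top 25% by final exam score, then rerun.
cutoff = df["Final Exam"].quantile(0.75)
top_students = df[df["Final Exam"] >= cutoff]
print(top_students[["Quiz 1", "Quiz 2", "Final Exam"]].corr().round(2))
```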
OpenAI’s documentation points to some important limitations to keep in mind:
File size limits: Each file is limited to 512 MB, but for CSV files or spreadsheets, the practical limit is lower, at approximately 50 MB, depending on the size of each row.
Data structure requirements:
Data should be organized with one row per record.
Empty rows or columns should be avoided.
Multiple sections or tables in a single spreadsheet are not recommended.
Image limitations: Critical information should not be included in images within spreadsheets, as ADA cannot interpret image content. Upload them separately for Vision to kick in.
Chart interactivity: Only bar, pie, scatter, and line charts are interactive in most cases. Other chart types (like histograms, box plots, heat maps, etc.) are typically non-interactive.
Potential for code errors: While ChatGPT can interpret and resolve many code issues automatically, there's always a possibility of errors in generated code, especially for complex analyses.
Network isolation: The code execution environment cannot generate outbound network requests directly, limiting access to external data sources.
Temporary data storage: The code execution environment instance is destroyed within 13 hours of the conversation becoming inactive, so long-term data storage is not possible, or so OpenAI claims.
Now let’s turn to putting Vision and ADA into action to automate the grading of quizzes/tests and analyze your teaching effectiveness.
🚀 How to Use Vision to Auto Grade Quizzes
In this section, I explain how to use ChatGPT’s Vision capability to automatically grade quizzes/tests. This is intended simply as an illustration of how one might use Vision, so I note where this specific use case differs from others.
I proceed as follows…
I begin by clearly defining the primary functions of my grading system. Next, I specify the input requirements and output expectations, ensuring student data privacy is maintained.
Then I develop a detailed step-by-step process for ChatGPT to follow, from image analysis to result presentation. I address potential ambiguities and implement error handling procedures to manage unclear handwriting or other issues. I create a comprehensive example output to guide ChatGPT's responses and clarify my expectations.
Finally, I explain how to test the system with diverse quiz samples, refining my instructions based on performance.
The resulting set of instructions can be injected as a prompt at the start of a conversation with ChatGPT or used as the custom instructions for a custom GPT.
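To give a feel for the end product, here is a hedged sketch of what such a set of instructions might look like. The rubric, answer key, and point values are placeholders, not the quiz grader GPT mentioned below.

```python
# A sketch of grading instructions to paste at the start of a conversation
# (or to use as custom GPT instructions). All rubric details are placeholders.
GRADER_INSTRUCTIONS = """
Role: You are an experienced teaching assistant grading short-answer quizzes.

Input: one image per student submission. Do not ask for or infer student
names; refer to submissions by upload order only.

Process, step by step:
1. Transcribe each answer exactly as written, flagging illegible words.
2. Compare each answer to the key below and assign the listed points.
3. If an answer is ambiguous or partially correct, award half credit and
   add the note "REVIEW" so I can check it myself.

Answer key (placeholder): Q1 = "mitochondria" (2 pts); Q2 = "1848" (1 pt).

Output format: a table with columns Question, Transcription, Points, Notes,
followed by a total score out of 3.
"""
print(GRADER_INSTRUCTIONS)  # paste into ChatGPT or a custom GPT's instructions
```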
I’ll release a quiz grader GPT — and a teaching effectiveness analyst GPT — to our Premium subscribers in two weeks, in my next free newsletter.
Let’s dive in!
Subscribe to Premium to read the rest.