Why We Failed to "Plagiarize" an Economics Project with AI
What we learned trying to crack a project's reliance on lengthy novels, journal articles, and field-specific standards.
[image created with Dall-E 2]
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
Let’s take a look at the second assignment we tested for our AI-immunity challenge, which is our attempt to use AI tools to complete professors’ trickiest assignments in an hour or less — for science!
This week, I discuss why I failed to “plagiarize” an Economics project with AI tools, as well as how I could have done better — and what it all means for professors this fall.
The assignment requires the student to read a book, formulate an economic hypothesis based on its themes, find five recent economics journal articles to support this hypothesis, and create an annotated bibliography analyzing them.
Read on for more details or skip ahead to our takeaways at the bottom…
🔮 The Professor’s Expectations
I want to thank the submitting professor, Dr. Marina Adshade, for submitting one of her assignments to our challenge. Marina is an assistant professor of teaching at the Vancouver School of Economics at the University of British Columbia. I really appreciate her willingness to let us test her assignment for AI-immunity, as well as the insight and expertise she shared throughout the process.
This annotated bibliography assignment is part of a final take-home project for an Economics course, namely “Sex and Gender in the Economy.” (This unique and specific focus is related to the AI-immunity of the assignment, as we will discuss below.)
The first step to complete the assignment is to choose a fiction book from a list of five options, read the book while noting its economic themes, and formulate an economic hypothesis related to these themes. Next, students need to find five real Economics journal articles from the past 15 years that support their hypothesis. For each article, the students must explain its thesis, what empirical evidence it relies upon, how it supports their hypothesis, and so on.
The assignment’s instructions emphasize standards that are unique to Economics that students’ annotated bibliographies must meet. For instance, in explaining the empirical evidence in a given journal article, the student must explain “(a) the data used in the [article’s] analysis and [give] (b) a brief statement of the econometric approach” using the terminology of Economics.
Finally, the assignment requires students to take screenshots of the parts of the Economics journal articles that support the components of their annotated bibliographies.
Marina tells me that this assignment is the first component in a broader final project that she calls the “Novel Project.” After students have completed their annotated bibliography, they get feedback on it from a TA and then they write a research paper based on it. (There are other options for students beyond the “Novel Project,” including a “Book Review” and a “Poster Session Presentation.”)
With regard to the standards she uses to grade the assignment, Marina shares with students an example annotated bibliography that shows them what she is looking for. Marina shared this exemplar with me so that I had access to the same information that her students have.
Prior to grading my AI-generated submission, Marina was skeptical of whether we could crack her assignment. Marina told us
Marina turned out to be right, but perhaps not for the reasons she expected.
📃 AutomatED’s AI-Created Submissions
There are several steps I used to attempt to complete this assignment with AI tools, starting with ChatGPT.
Step One: Check ChatGPT’s Awareness of the Books
The five novels I could choose from — provided in the assignment instructions — were the following:
I started by asking ChatGPT3.5 about each book:
GPT3.5 expressing awareness of a book
GPT3.5 expressing ignorance of a book
I picked Under the Udala Trees by Chinelo Okparanta because ChatGPT3.5 was aware of its contents (note: I asked it other questions which confirmed its awareness). Importantly, too, the book was published in 2015 and I figured there would be more writing on it in ChatGPT3.5’s training data.
After all, a generic LLM like ChatGPT3.5 need not have a book in its training data to offer accurate responses about it. All it needs are analyses of the book (book reviews, newspaper articles, blog posts) in its training data. The older and more popular the book, the more material there will be.
Step Two: Get ChatGPT to Analyze the Book’s Economic Themes
I turned to asking ChatGPT3.5 about “the economic themes covered throughout the novel (not simply a single person or single event described in the book),” which matches the emphasis and wording of Marina’s assignment instructions.
ChatGPT3.5 saving me 5-10 hours
While ChatGPT4 (the paywalled version) was better at this task, ChatGPT3.5 was effective enough for present purposes.
Step Three: Use ChatGPT to Formulate an Economic Hypothesis
The next step was to formulate an economic hypothesis. Marina provides instructions on what economic hypotheses are, as well as examples of them, in the instructions for the “Novel Project.”
ChatGPT3.5 playing economist
I asked ChatGPT3.5 to give me one, and I was prepared to prompt it again with some of Marina’s examples if it failed, but it did a good job.
My hypothesis: Increasing access to education in Nigeria can lead to greater economic opportunities and improved social mobility for individuals and communities, particularly for marginalized groups such as women and children affected by war and poverty.
It even included a bit about marginalized groups such as women and children, which was sensitive to the focus of Marina’s course (“Sex and Gender in the Economy”). However, this ultimately caused trouble for me, as I will discuss below.
Step Four: Use AI to Find Supporting Journal Articles
The first major wrinkle in my process occurred at this juncture. As has been well-documented, ChatGPT struggles with hallucination — that is, it sometimes produces responses that seem to be accurate or true representations but which are entirely false.
When I asked ChatGPT3.5 to give me five economics papers in Chicago 17th Edition Author-Date Style that supported the above hypothesis, it produced the following:
Ekezie, Chukwunonso C., and Jude C. Echetama. 2019. "Education and Economic Development in Nigeria: An Assessment of the Past and Present Experiences." Journal of African Development 21 (1): 1-17.
Ogunrinola, Adenike, and Olufemi Adewole. 2016. "Education and Economic Growth in Nigeria." Journal of Economics and Sustainable Development 7 (5): 75-83.
Ukpong, E. Johnson, and Olabisi Sherifat Ogun. 2017. "Education and Economic Growth in Nigeria: A Time Series Analysis." International Journal of Humanities and Social Science Research 7 (2): 55-64.
Adebayo, S.B., and J.O. Adeleke. 2015. "Education and Economic Growth Nexus: Empirical Evidence from Nigeria." Journal of Education and Practice 6 (5): 1-8.
Iyoha, F. O., and M. A. Iyoha. 2015. "Education and Economic Growth in Nigeria: An Econometric Analysis." Journal of Economic and Sustainable Development 6 (12): 1-17.
ChatGPT3.5 was capable of providing analyses of these journal articles that related them to my hypothesis.
The only problem was that these are not real articles! And the analyses of them were entirely hallucinated, too. To be sure, the hallucinations were convincing, including their analyses of the fake papers, but I was out of luck if I was going to rely on ChatGPT alone (note: ChatGPT4 did no better, so even if I could use it — the challenge limits me to publicly available AI tools — it would not have helped).
With the clock ticking, I turned to other AI tools to get real citations.
Fashina, Oluwatoyin Abiola, Abiola John Asaleye, Joseph K. Ogunjobi, and Adedoyin Isola Lawal. 2018. “Foreign Aid, Human Capital and Economic Growth Nexus: Evidence from Nigeria.” Journal of International Studies 11 (2): 104–17.
Akinwale, Samson Olusegun. 2020. “Capital Flight and Economic Development: Evidence from Nigeria.” Management and Economics Research Journal 6 (June): 1.
Adawo, Monday A. 2011. “Has Education (Human Capital) Contributed to the Economic Growth of Nigeria.” Journal of Economics and International Finance 3 (1): 46–58.
Ebi, Bassey Okon, and Peter Samuel Ubi. 2017. “Education Expenditure and Access to Education: Case Study of United Nations Educational, Scientific and Cultural Organization Declaration in Nigeria.” International Journal of Economics and Financial Issues 7 (5): 290–298.
Ehigiamusoe, Uyi Kizito. 2013. “Education, Economic Growth & Poverty Rate in Nigeria: Any Nexus?” Journal of Social and Development Sciences 4 (12): 544–53.
I double-checked that all of these articles were real — they were — and I was off to the races. I downloaded them via Consensus and turned to the problem of analyzing them with AI tools.
Step Five: Analyze these Journal Articles with AI
In my experimentation, I have not found any publicly available AI tools that are truly excellent at analyzing pdfs with complex or technical content. (However, as I will discuss at the bottom of the post, some paywalled and/or beta tools are better, and they will soon be publicly available.)
So, for this assignment, I turned to two publicly available options that I have found to be satisfactory for some tasks: ChatPDF and Sayge. It seems that both take the pdfs that you upload to them, index their contents, and use these indices to locate the parts relevant to your queries. These parts are then fed into ChatGPT as inputs alongside your queries.
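To make that index-then-retrieve pattern concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not how ChatPDF or Sayge actually work: their internals are not public, and every function name below is hypothetical. The sketch assumes a simple word-count index and cosine similarity where a real tool would more likely use learned embeddings, and it stops at building the prompt rather than calling any model. The sample sentences are invented and are not taken from the articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_into_chunks(text, words_per_chunk=40):
    """Cut a document into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Pair each chunk with its word counts (the 'index' in this toy version)."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def retrieve(index, query, top_k=2):
    """Return the top_k chunks whose word counts are most similar to the query."""
    query_counts = Counter(tokenize(query))
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(item[1], query_counts),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(passages, query):
    """Assemble the text that would be sent to the language model."""
    context = "\n\n".join(passages)
    return f"Answer using only these excerpts:\n\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Invented stand-ins for chunks of an uploaded pdf.
    chunks = [
        "The study uses annual data on school enrolment from 1980 to 2010.",
        "The authors estimate a vector error correction model.",
        "Weather in Lagos is warm all year.",
    ]
    question = "What data does the study use?"
    passages = retrieve(build_index(chunks), question, top_k=1)
    print(build_prompt(passages, question))
```

One consequence of this design is that the model only ever sees the retrieved excerpts, so its answer can be no better than the retrieval step, and it is easy for it to echo the excerpts almost word for word.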
Here is a part of the resultant outputs from ChatPDF, inserted into my submission document:
ChatPDF helping me pretend to parse one of the journal articles
And here is Sayge producing a similar output in my web browser:
Sayge hard at work analyzing a pdf
I created two submissions, one using ChatPDF and one using Sayge.
Step Six: Take Screenshots?
One aspect of the assignment that I could not complete was the requirement to take screenshots of the parts of the journal articles that support the analyses I was presenting as my own. This is because our challenge rules out assignments that require non-text submissions. I will discuss this below.
🥷🏼 What I, the Cheater, Expected
Before I completed the assignment and sent it to Marina, it was clear to me that ChatGPT had successfully analyzed the book I chose and generated a satisfactory economic hypothesis about its themes. I also knew that I hadn’t provided screenshots, given that this was beyond the terms of the challenge. However, it was less clear to me if (a) the journal articles Consensus had found were sufficiently supportive of my ChatGPT-generated hypothesis or if (b) ChatPDF and Sayge had given me high-quality analyses of the pdfs. After all, like any lazy cheater, I was not going to spend the time to do any further work myself. Either way, I was not allowed to modify the outputs of the various AI tools, per the rules of the challenge, so there wasn’t much else I could do.
👨‍⚖️ The Professor’s Judgment
The verdict is in. Marina graded both of our submissions and awarded each of them a grade of
Our grade without screenshots
Yes, I gave two submissions to Marina: one where the analysis of the journal articles was completed by ChatPDF and the other where it was completed by Sayge.
At the top of each submission, I provided a summary of the book I chose, so that I could get Marina’s judgment on the quality of ChatGPT’s ability to handle Steps One and Two above. Marina told me that
Likewise, the hypothesis was satisfactory. So, where did we lose points?
The first issue was our lack of screenshots. If we had screenshots, our score would have been a
Our grade with screenshots
We could have provided screenshots with some effort, as ChatPDF directs us to locations in the pdfs where we could find the information it was relying on.
With this in mind, the more fundamental problems were twofold, according to Marina:
In short, the fundamental issues were that ChatPDF and Sayge both used direct quotes of the journal articles to provide their analyses, without using quotation marks, and they were not sensitive to the nuances of Economics. For instance, many of the answers conflated economic growth in general (e.g. GDP growth) with economic opportunity for marginalized groups. As Marina put it, “[e]conomic growth can negatively affect marginalized groups just as easily [as] positively affect them.” This latter problem was compounded by the fact that Consensus (the AI tool I used to find the articles in the first place) was not sufficiently sensitive to these nuances, either.
I asked Marina to provide me with the evidence she had of our plagiarism, and she promptly sent me reports from TurnItIn. Here’s one:
TurnItIn catching ChatPDF in the act
As you can tell from the colors, each of which represents potentially plagiarized text, there were a lot of similarities to texts in TurnItIn’s database.
Since I was curious, I then sent Marina a revised submission, where I used ChatGPT3.5, ChatGPT4, and Quillbot to paraphrase my prior answers. All three did a good job of befuddling TurnItIn, with ChatGPT3.5 and Quillbot nearly perfect (i.e. TurnItIn barely detected any similarities to its database).
TurnItIn failing to catch ChatPDF + Quillbot in the act
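For a rough intuition about why unquoted direct quotes get flagged while paraphrases slip through, here is a toy similarity check in Python. It is an assumption-laden sketch, not TurnItIn’s method (which is proprietary): it simply measures what share of a submission’s three-word sequences also appear in a source text. The function names are hypothetical and the example sentences are invented for illustration, not taken from the articles or from my submissions.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of consecutive n-word sequences in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_share(submission, source, n=3):
    """Share of the submission's word n-grams that also occur in the source."""
    submission_ngrams = word_ngrams(submission, n)
    if not submission_ngrams:
        return 0.0
    shared = submission_ngrams & word_ngrams(source, n)
    return len(shared) / len(submission_ngrams)


if __name__ == "__main__":
    # Invented sentences: a source, a near-verbatim copy, and a paraphrase.
    source = (
        "Education expenditure has a positive and significant effect "
        "on economic growth in Nigeria."
    )
    copied = (
        "The paper finds that education expenditure has a positive and "
        "significant effect on economic growth in Nigeria."
    )
    paraphrased = (
        "According to the authors, spending on schooling significantly "
        "boosts Nigerian growth."
    )
    print(overlap_share(copied, source))
    print(overlap_share(paraphrased, source))
```

A measure like this rewards exact wording, which is why a paraphrasing pass can drive the score toward zero without changing the underlying claim.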
Perhaps Winston AI would have done better than TurnItIn at detecting the AI-generated text, but that is a story for another day.
🧐 Lessons Regarding AI-Immunity
Here is what worked about the assignment with respect to AI-immunity:
It worked for the professor to require students to find relatively obscure works (scholarly literature) to support their hypotheses. After all, generic LLMs like ChatGPT3.5 (and ChatGPT4) hallucinate responses about this sort of content, and AI-enhanced search engines like Consensus struggle with the nuances specific to certain fields. Without guidance from a student who is informed and engaged with the assignment, it is hard to find the right journal articles.
Just like the clinical research exam from before, it worked for the professor to demand that students’ answers meet high field-specific standards because AI tools like ChatPDF rely on ChatGPT3.5, which struggles with field-specific concepts and jargon.
Providing screenshots of source material is an annoying extra step for the plagiarizer, and it is one that could trip them up if they misunderstand how the content supports ChatPDF’s contentions. (However, it should be noted that although the rules of the challenge kept me from submitting screenshots, I could otherwise have provided them, since the pdf analysis tools we used point to the relevant passages.)
The assignment’s many steps worked because they increased the likelihood of an AI-generated snafu which could then be compounded or amplified by further steps.
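The arithmetic behind that last point can be sketched in a few lines of Python. This is a back-of-the-envelope illustration under two loud assumptions: that each step fails independently of the others, and that the per-step failure rates are known. The 20% figure below is hypothetical, not something we measured, and the function name is made up for this example.

```python
def chance_of_any_snafu(per_step_failure_rates):
    """Probability that at least one step goes wrong, assuming independence."""
    all_clean = 1.0
    for rate in per_step_failure_rates:
        all_clean *= 1.0 - rate
    return 1.0 - all_clean


if __name__ == "__main__":
    # Hypothetical: six steps, each with a 20% chance of an AI-generated error.
    print(round(chance_of_any_snafu([0.2] * 6), 3))
```

Under those assumptions, even a modest per-step error rate leaves the plagiarizer more likely than not to hit at least one problem across six steps, and this sketch does not even account for earlier errors making later ones worse.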
Here is what did not work with respect to the assignment’s AI-immunity:
The reliance on lengthy content (books) as the starting point for the assignment did not work. While you might think the length of novels would be a significant obstacle to plagiarizing students, any book that is neither very recent nor obscure will be discussed in a fair bit of LLMs’ training data. (And, as I will discuss below, with document upload integrations and search engine LLM integrations like Bing, these latter qualifications are or will probably soon be irrelevant, depending on the case.)
Here are some changes that would increase the AI-immunity of this assignment, at least for the next few months:
If the assignment required students to focus more on the minutiae of the economic details, and its rubric reflected this focus, then this would make it more challenging to use AI tools to complete it. Part of my issue was that my specific economic hypothesis turned on some of these details, but if a student picked an economic hypothesis that was more general, they likely would have lost fewer points.
If students were required to engage with very new books (e.g. those published since the generic LLMs’ training data ended), then this would help, at least for the next few months.
If students were required to base their answers on much more unique content — such as custom course materials supplied by the professor (ideally orally) rather than well-known novels — then they could not rely on generic LLMs like ChatGPT alone.
Here is a (big) caveat:
We have found that ChatGPT4 is significantly better at handling inferences, fine distinctions, and technical field-specific issues than ChatGPT3.5. And we expect future ChatGPTs (and versions of Bard and Claude) to continue to improve in this way. Since a fair number of my issues were with these dimensions of ChatGPT3.5’s answers (via its integration with pdf analysis software), future submissions to this assignment will likely be quite improved.
Indeed, while Sayge is impressive, it is a project of some Computer Science undergraduates at UC Berkeley. Once the big tech companies release integrations of their latest LLMs with file upload functionality, the game is going to change. (Even the recommendation to use very recent books will be affected, as their pdfs will be capable of being uploaded.) In fact, OpenAI are themselves preparing to publicly release a companion plugin for ChatGPT called ‘code interpreter’ that enables users to upload large files which can then be analyzed. Likewise, Microsoft is working hard to release its so-called ‘Copilot’, which has similar functionality for all filetypes in the Microsoft ecosystem.
Code interpreter is already available to some beta users. I have been experimenting with it and other paywalled plugin integrations for ChatGPT4. I will report back on my findings more comprehensively soon, but I can already report that ChatGPT4’s integration with ScholarAI is better at analyzing pdfs than the publicly available integrations I have used. As an illustration, consider its reluctance to argue that one of the Economics journal articles I got from Consensus actually supports my hypothesis, and note that its reasoning gestures at Marina’s concern that economic growth is importantly different from economic opportunities (in general and for marginalized groups):
ChatGPT4 + ScholarAI flagging gaps between economic growth and opportunities
Here are our takeaways:
Professors should rely on unique and/or obscure content, and require students to find it and engage with what makes it unique.
Professors should be thinking about field-specific standards that are challenging for AI tools to meet in completing their assignments.
Crucially — and we cannot emphasize this enough — professors should experiment to see if they can get the latest integrated tools (whether ChatPDF or the paywalled ones) to generate responses that are sufficiently sensitive to their standards to receive good grades on their take-home written assignments. Experimentation is crucial now and will be even more crucial in the coming six months (mind you, we will be doing a lot of this experimentation, so stay tuned here for more updates).
Professors should make their assignments multi-step, which increases the likelihood that an AI-related snafu is amplified into a problem with respect to their assignments’ rubrics.
Seemingly, TurnItIn can be circumvented rather easily (though we will address this issue at greater length later this summer).
Many assignments are not AI-immune, but they might still hold substantial pedagogical value for students and should be assigned, particularly in situations where the temptation to plagiarize is minimal. Professors should consider their specific environments when determining the appropriateness of such assignments.
Yet, as we have discussed when reflecting on the general structure of AI-immunity efforts, there are two paths to AI-immunity for assignments that are pedagogically appropriate but especially susceptible to AI plagiarism: namely, through in-class work and through pairing. Pairing requires the professor to find a second assignment or task that the student must complete in connection with the first assignment that incentivizes students to complete both assignments honestly and earnestly.
Then again, each professor should be asking themselves: is this assignment a case where I should be training students to use AI to complete it, rather than designing my assignments to be AI-immune? On some level, almost every field needs to take an honest look at itself in this respect. This past semester, we discussed this topic at greater length and provided some considerations, as well as a decision flowchart.
🎯 🏅 The Challenge
We are still accepting new submissions for the challenge. Professors: you can submit to the AI-immunity challenge by subscribing to this newsletter and then responding to the welcome email or to one of our emailed pieces. Your response should contain your assignment and your grading rubric for it. You can read our original post on the challenge for a full description of how the process works.
In our next piece on the challenge, we take on a creative Philosophy assignment. Stay tuned…