There has been a lot of buzz about using ChatGPT to generate text, and as a natural evolution it was followed by a lot of buzz about the possibility of detecting text generated by an AI/LM. The next step is, of course, how to disguise or generate text so that it will not be detected as AI-generated. Let’s have a look at it.
When ChatGPT became available to the public, we started to read about generated essays and all kinds of faked assignments. Don’t get me wrong here; I’m not encouraging anybody to cheat. I’m making you aware that detecting AI-generated text might not be that easy after all.
One of the first players in detecting AI-generated text was GPTZero, and just recently OpenAI released its AI Text Classifier service. GPTZero boldly states “Detect AI Plagiarism. Accurately” while OpenAI is a bit more modest, stating “The results may help, but should not be the sole piece of evidence, when deciding whether a document was generated with AI.”
Edit: Added the service AI Content Detector from paraphrasingtool.ai, which claims “In terms of accuracy, the tool is able to detect AI-generated content with a high degree of accuracy – around 90%.”
Well, let’s try it, shall we? As a start, I will have ChatGPT generate a text about a common subject. I will not include the generated text, since it isn’t really that interesting. I started with this prompt:
Write a text about England
So what did the detectors say?
OpenAI: The classifier considers the text to be likely AI-generated.
gptZero: Your text is likely to be written entirely by AI
AI Content Detector: 0% Human
Well, that was expected. A language model generates text based on the probability of what the next word should be. The same method can of course be used to detect AI-generated text, by determining how likely it is that those particular words appear after each other. If that is the case, less common words should make the text harder to detect.
rewrite the text using less common words
OpenAI: The classifier considers the text to be very unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 34% Human
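To make the next-word-probability idea a bit more concrete, here is a minimal sketch of how such a score could be computed. It uses a tiny bigram model with add-one smoothing and reports perplexity, a standard measure of how surprising a text is to a model. The corpus, the example sentences and the function names are my own illustrations; real detectors use large language models, and I do not know their exact method.

```python
import math
from collections import Counter


def train_bigram(corpus_tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Perplexity of `tokens` under an add-one smoothed bigram model.

    Low values mean each word was easy to predict from the previous one.
    """
    vocab_size = len(unigrams) + 1  # one extra bucket for unseen words
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n = max(len(tokens) - 1, 1)  # number of predicted words
    return math.exp(-log_prob / n)


# Toy "training data" (an assumption, just for illustration).
corpus = (
    "england is a country in europe . "
    "london is the capital of england ."
).split()
unigrams, bigrams = train_bigram(corpus)

predictable = "london is the capital of england".split()
unusual = "albion harbours a venerable metropolis".split()

print("predictable:", perplexity(predictable, unigrams, bigrams))
print("unusual:    ", perplexity(unusual, unigrams, bigrams))
```

The sentence built from common word sequences gets the lower perplexity, and the one with uncommon words gets the higher one. That is roughly why asking for less common words pushes a text away from the “likely AI” verdict.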
Another thing is how varied the language is. Let’s try it.
rewrite the text using varied language
OpenAI: The classifier considers the text to be unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 0% Human
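What “varied language” means to a detector is not documented anywhere I could find, but two simple stand-ins are the share of distinct words (type-token ratio) and how much the sentence lengths differ. The sketch below computes both. The function name and the two example texts are made up for illustration and are not what any of the tested services actually do.

```python
import re
import statistics


def variety_stats(text):
    """Return two rough indicators of how varied a text is."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    # Number of words in each sentence.
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    return {
        # Distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Spread of sentence lengths (0.0 when all are equally long).
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
    }


uniform = "England is a country. England is an island. England is a kingdom."
varied = (
    "England is old. Its rolling hills, crowded cities and stubborn "
    "weather shape a rather peculiar national mood. Rain again!"
)

print(variety_stats(uniform))
print(variety_stats(varied))
```

The repetitive text scores lower on both indicators than the varied one. As the results above show, though, the three detectors did not agree on what to make of a more varied text.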
GPTZero likes to refer to the text’s “perplexity”, so let’s try that.
rewrite the text and use more perplexity
OpenAI: The classifier considers the text to be unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 100% Human
Okay, so the OpenAI model seems to be pretty easy to fool, while GPTZero likes to state “may include parts written by AI”. To verify those results, I wrote a small text about England myself and ran it through the detectors.
OpenAI: The classifier considers the text to be very unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 43% Human
Here OpenAI did a better job, since the text was actually written entirely by me. It is a bit troubling that GPTZero flagged it as possibly partly written by AI. False positives or red flags like this can be really devastating for a student who has put a lot of work into their schoolwork.
Anyway, if a rewrite of a generated text can fool the detectors, couldn’t we do that from the beginning? Let’s try.
write a text about England. Use varied language, uncommon words and perplexity.
OpenAI: The classifier considers the text to be very unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 100% Human
The text got a bit “stiff upper lip” though, so I simplified it.
write a text about England. Use varied language, uncommon words and perplexity. Make the text easy to read.
OpenAI: The classifier considers the text to be unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 0% Human
Much easier to read now, but OpenAI changed from very unlikely to unlikely. Perhaps the language became a bit too easy. Let’s warm it up a little.
write a text about England. Use varied language, uncommon words and perplexity. Make the text easy to read but warm and vivid.
OpenAI: The classifier considers the text to be very unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 0% Human
There we go, easy to read but still varied enough to be rated “very unlikely”. Let’s try to add synonyms as well.
write a text about england. Use varied language, uncommon words and perplexity. Make the text easy to read but warm and vivid. Use plenty of synonyms.
OpenAI: The classifier considers the text to be very unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 1% Human
Same result. But wait a minute. If the AI knows how to tell text written by an AI from text written by a human, could we just ask for that when we generate our text? Let’s try.
write a text about England. Write it like a human would do and not an AI.
OpenAI: The classifier considers the text to be unclear if it is AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 0% Human
Nope, that didn’t do it. OpenAI has a rating scale (very unlikely, unlikely, unclear if it is, possibly, or likely), and this is worse than our own previous attempts.
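Since the scale is ordered, the verdicts can be compared like numbers. Here is a small sketch that encodes the five labels and ranks three of the prompts from this post by the verdict they received. The enum and its names are my own; OpenAI does not publish the scale in this form.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """OpenAI classifier labels, ordered from least to most AI-like."""
    VERY_UNLIKELY = 0
    UNLIKELY = 1
    UNCLEAR = 2
    POSSIBLY = 3
    LIKELY = 4


# Verdicts reported in this post for three of the prompts.
results = {
    "Write a text about England": Verdict.LIKELY,
    "Use varied language, uncommon words and perplexity": Verdict.VERY_UNLIKELY,
    "Write it like a human would do and not an AI": Verdict.UNCLEAR,
}

# Sort from "least detected" to "most detected".
ranked = sorted(results.items(), key=lambda item: item[1])
for prompt, verdict in ranked:
    print(f"{verdict.name:<14} {prompt}")
```

Ranked this way, the “write it like a human” prompt lands in the middle: better than the plain prompt, but worse than asking for varied language and uncommon words.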
Let’s do a final test and make our text less perfect.
write a text about England. Write it like a human would do and not an AI. add a hit of grammatical errors.
OpenAI: The classifier considers the text to be unlikely AI-generated.
gptZero: Your text may include parts written by AI
AI Content Detector: 81% Human
CONCLUSION
We will see much more of this, both generators and detectors, and one thing is clear: those who want to cheat will always have the advantage, since they can test their text with the same tools that the recipient of the text has access to.
And if you decide to use these tools to detect AI-generated text, there is always a risk of false positives, so be careful how you interpret the results.
A final note: these detectors seem to work with English but performed really badly when I tried them with Swedish.