I recently had to submit a time-off application at my sister's university after her surgery. When she asked me to write it, I was at my computer, so I typed and printed the application instead of writing it by hand. After it was submitted, I got a call from a university administrator who insisted that the application could not be accepted because it was "AI generated" and not written by me. When I asked for details, he disclosed that the university uses various "tools" for plagiarism and AI detection.

I wrote the application entirely myself, without the aid of any AI tool. Still, the conversation prompted me to run a small test of how reliable these AI detection tools are. The results were inconsistent, to say the least.

AI Detection Tools used for testing

I used three freely available online tools that did not require registration:

Undetectable.ai uses multiple tools under the hood, including GPTZero, OpenAI, Writer, CROSSPLAG, CopyLeaks, Sapling, ContentAtScale, and ZeroGPT.

Sample Data

For sample data, I used two articles that I had written myself before ChatGPT became mainstream. I wrote and structured these articles entirely on my own, without help from any AI tool.

To further test these tools, I asked ChatGPT (GPT-4o) to rewrite both articles in a different tone and with an improved overall structure. (I have since discarded the AI-generated versions, so I cannot post links to them.)

Simple, non-scientific testing method

I submitted both the human-written and the AI-generated versions of each article to each tool, one at a time, and noted the results.

The articles were simply copy-pasted without any modifications or additional context.

Results

As the results show, none of the tools performed with the same level of accuracy across the two articles.

GPTZero rated the AI version of "What is proof of residence?" as 97% human-written, yet it detected a 91% probability of AI-generated content in the second article.

Undetectable.ai, on the other hand, rated the human-written version of "What is mid-market exchange rate?" as only 5% human-generated and 95% AI-generated. Note that Undetectable.ai relies on multiple tools under the hood, including another tool in this test, GPTZero.

Conclusion

While some of these tools can estimate the probability of AI-generated content reasonably well under certain circumstances, it is important to keep the above results in mind when using them for official purposes, such as plagiarism checking and AI detection in assignments and papers submitted for academic evaluation.

Don't rely entirely on the results of such tools to punish students.