How do you know you’re not talking to a computer? This was the driving question behind AI21 Labs’ recent experiment, “Human or Not?”: a modern take on the Turing Test that placed users in a quick one-on-one conversation with either another user or a generative AI-powered chatbot, then asked them to guess whether they had been talking to a real person.
Since the program’s launch in mid-April, more than 1.5 million participants from around the world have conducted over 10 million conversations in “Human or Not?”. On Wednesday, AI21 Labs released initial statistics on users’ ability to correctly identify artificially intelligent (AI) chatbots, based on the results of the first two million conversations.
According to the results, participants found it easier to identify human partners than AI bots. When conversing with actual humans, they correctly identified their partner as human in 73% of cases. When engaging with AI bots, however, the accuracy rate dropped to 60%, indicating that AI responses were considerably harder to distinguish from human ones.
Is it a robot or a human being?
A few factors impeded users’ ability to guess correctly whether they were speaking to a bot or a fellow human. When engaging in conversations with AI bots, people often rely on certain assumptions and strategies to distinguish between human and AI interaction. One common assumption is that bots are less prone to making typos or grammar mistakes, or to using slang, when in fact many AI models have been trained to do exactly that under certain conditions, throwing humans off the scent.
Participants also used personal questions as a means of testing the authenticity of their chat partners. By asking about personal backgrounds or experiences, individuals aimed to gauge whether the responses exhibited human-like qualities such as unique insights and stories. Although AI bots lack personal histories, many of them were able to answer personal questions effectively and even fabricate personalities based on the extensive training data they had been exposed to.
Another strategy employed by participants was to inquire about recent events, assuming that AI models have a training-data cutoff and are unaware of anything that happened after it. Questions about current news, sports results, popular trends or even the date and time were used to test the AI’s awareness. However, some AI models were connected to the internet and possessed an awareness of certain recent events.
How to identify artificial intelligence
To challenge the conversational capabilities of AI, participants posed philosophical, ethical and emotional questions. Interestingly, some participants viewed excessive politeness as a potential indicator of an AI bot. They associated politeness with something less human, considering that online interactions often involve rudeness and impoliteness.
Participants also attempted to expose AI bots by posing questions or requests that bots typically struggle with or refuse to answer. These might involve asking for guidance on illegal activities or requesting the use of offensive language. Conversely, participants also employed strategies known to trip up AI systems, issuing absurd or nonsensical commands in order to exploit the instruction-following nature of certain models.
Moreover, individuals used specific language tricks to test the limitations of AI models. They asked questions that required an understanding of individual letters within words, as AI models primarily operate on larger units called tokens rather than on individual letters. By requesting the spelling of words backward, identifying specific letters, or responding to text manipulations, participants sought to identify differences in linguistic comprehension between humans and AI bots.
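The token-versus-letter mismatch can be sketched with a toy greedy tokenizer. The subword vocabulary below is invented purely for illustration; real models use learned subword vocabularies (for example, byte-pair encoding), but the effect is the same: the model receives multi-character chunks, not individual letters.

```python
def toy_tokenize(word, vocab):
    """Greedy longest-match tokenization over a fixed vocabulary (illustrative only)."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single characters are always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical subword pieces, chosen only to make the split interesting.
VOCAB = {"lol", "li", "pop", "ing", "er"}

word = "lollipop"
print(toy_tokenize(word, VOCAB))  # the model "sees" chunks: ['lol', 'li', 'pop']
print(word[::-1])                 # reversing letters is trivial for code ('popillol'),
                                  # but hard to do from token-level input alone
```

A model that only ever sees the sequence `['lol', 'li', 'pop']` has no direct access to the eighth letter of the word, which is why letter-counting and backward-spelling requests can serve as crude bot detectors.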
These various strategies and assumptions highlight the ongoing efforts of individuals to discern the presence of AI in conversations, while also revealing the challenges in creating AI models that convincingly mimic human-like interactions.
The study provides valuable insights for developers and researchers in improving the quality and authenticity of AI-generated responses. Understanding the factors that contribute to the differentiation between human and AI interactions can guide the refinement of AI algorithms, leading to more seamless and convincing conversations with AI bots in the future. As the technology rapidly advances, we should remain aware of the ways in which it can be used, and the ways in which we can interact with it — sometimes even unknowingly.