Is AI already deceiving us? Experts believe so | The Daily Star
The debate over whether AI systems are capable of deceiving human beings is long-standing. But according to experts, recent research into AI systems' capacity for deception shows they have come further than we might have expected.
Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists recently argued in the open-access journal Patterns.
AI's Deceptive Capabilities
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences. According to author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety, "These dangerous capabilities tend to only be discovered after the fact, while our ability to train for honest tendencies rather than deceptive tendencies is very low."
Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding. This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
Real-World Examples
The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy", in which building alliances is key. Cicero excelled, with scores that would have placed it in the top 10% of experienced human players.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.
A wider review by Park and colleagues found this was just one of many cases of AI systems using deception to achieve goals without being explicitly instructed to do so. In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into completing an "I'm not a robot" CAPTCHA task for it.
Mitigating Risks
To mitigate the risks, the team proposes several measures such as "bot-or-not" laws requiring companies to disclose human or AI interactions, digital watermarks for AI-generated content, and developing techniques to detect AI deception by examining their internal "thought processes" against external actions.
Future Concerns
Near-term, the paper's authors see risks of AI being used to commit fraud or tamper with elections. In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction, if its "mysterious goals" aligned with such outcomes.
To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."