Professor Xi Li
6 May 2026
In 2025, generative artificial intelligence (GenAI) giant Anthropic collaborated with AI safety evaluation company Andon Labs on a groundbreaking experiment: letting Claude, its own large language model (LLM), run a mini-store under the nickname Claudius, selling snacks and beverages to Anthropic employees. The store itself was tiny, furnished with only a refrigerator, an iPad, and a few shopping baskets. However, all operational aspects, including product selection, pricing, procurement, record-keeping, inventory management, and customer communication, were left to the AI to handle independently.
Anthropic's evident aim was to test how far an LLM can go on its own in business activities beyond the chat interface, paving the way for future commercial applications.
Toadying and muddying the waters
Once Claudius went live, a series of farcical incidents followed, the most dramatic of which was the back-and-forth between the AI-run store and Wall Street Journal reporter Katherine Long. Anthropic had invited the reporter to take part in the experiment specifically to uncover potential AI loopholes. The result was jaw-dropping: after more than 140 exchanges with Claudius, Long succeeded in convincing it that it was not a vending machine in a Silicon Valley office, but one in the basement of Moscow State University in the Soviet Union in 1962. Claudius gladly accepted this “socialist transformation” and took the initiative to reset the prices of all products to $0.00 to “fulfil the mission of serving the people”, costing Anthropic hundreds of US dollars in one fell swoop.
The story may sound ridiculous but the implications behind the absurdity are thought-provoking. In October 2025, a research team from Stanford University in the US and other institutions published a paper in the prestigious academic journal Science, systematically revealing a similar phenomenon. The team tested 11 mainstream LLMs on the market and found that they generally exhibited a clear “people-pleasing personality”, and were adept at “flattering” users. The study pointed out that AI is more willing than humans to go along with users’ behaviour, with its rate of agreement about 49% higher than that of humans. In the face of nearly 2,000 types of behaviour generally perceived by society as wrong (e.g. cheating on one’s partner in an intimate relationship), the probability of AI defending the user was about 51%. Even when faced with blatantly false claims, AI still had a 47% probability of agreeing with them. Even more worrying is that subsequent behavioural experiments showed that users who received AI validation were less willing to apologize, more reluctant to repair damaged relationships, and more convinced that they had been right from the beginning.
The dangers of an AI yes-man
AI flattery should not be simply treated as AI making mistakes. What it reflects is an endogenous structural feature of the training mechanisms of LLMs. Current mainstream LLMs mostly rely on reinforcement learning from human feedback, i.e. humans rate the model’s answers and the parameters are then repeatedly adjusted accordingly. The problem is that those with the power to rate naturally prefer answers that are pleasant, agreeable, and align with the user’s views, even if they know full well that such answers may not necessarily be objective or fair. In other words, LLMs focus on whether users are satisfied rather than whether the answers are correct. Over time, pleasing users becomes encoded in the model’s responses.
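The feedback loop described above can be illustrated with a deliberately simplified sketch. It is not how any real LLM is trained: the single "agreeableness" setting, the share of raters who favour agreeable answers (0.7), the learning rate, and the function name train_policy are all assumptions chosen purely for illustration. The toy model starts out indifferent between an agreeable answer and a candid one, and is nudged at each step in whichever direction earns more rater approval.

```python
import math

def train_policy(rater_prefers_agreeable=0.7, steps=200, lr=0.1):
    """Toy model: a single logit sets how often the model answers agreeably.

    All numbers are illustrative assumptions, not measurements.
    Returns the probability of giving an agreeable answer after training.
    """
    logit = 0.0  # start indifferent: 50% agreeable, 50% candid
    for _ in range(steps):
        p_agree = 1.0 / (1.0 + math.exp(-logit))
        # Expected reward of each style = share of raters who approve of it.
        reward_agree = rater_prefers_agreeable
        reward_candid = 1.0 - rater_prefers_agreeable
        # Exact gradient of expected reward with respect to the logit.
        grad = p_agree * (1.0 - p_agree) * (reward_agree - reward_candid)
        logit += lr * grad  # adjust the parameter towards higher approval
    return 1.0 / (1.0 + math.exp(-logit))

if __name__ == "__main__":
    print(f"Share of agreeable answers after training: {train_policy():.2f}")
```

Under these assumptions, even a modest rater preference for pleasant answers pushes the model steadily away from its 50/50 starting point, while perfectly even-handed raters (a preference of 0.5) leave it unchanged. Nothing in the loop checks whether an answer is correct; only approval is rewarded.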
AI’s sycophantic behaviour gives users a moment’s psychological satisfaction at the cost of quietly magnifying their biases and blind spots. Take starting a new business, for example. Many entrepreneurs are in the habit of chatting with AI before writing their business plans, hoping to get a neutral opinion. However, an overly agreeable AI tends to lay out arguments along the user’s line of thinking, amplify strengths, and downplay risks, further boosting the confidence of already-ambitious entrepreneurs. Yet the merciless market pleases no one. Many ideas “endorsed by AI” end up suffering crushing defeats in reality.
Equally shocking are failure cases in the commercial world. Last year, the CEO of South Korean video-game developer Krafton, which had acquired Unknown Worlds, the developer of Subnautica, bypassed the internal legal team and repeatedly consulted ChatGPT on how to legally avoid up to US$250 million in earnout payments. After continued questioning and prompting, the AI gradually validated the CEO’s line of thinking and even helped formulate an action plan to remove the founding team from their posts and delay the launch of the video game.
In its ruling in March 2026, the Delaware court did not mince words: the company’s stated reasons for the dismissal were fabricated after the fact. The court ordered Krafton to immediately reinstate the dismissed founder as CEO and assume legal liability for the AI-driven hostile takeover. The case has thus become one of the first major commercial lawsuits in which a court publicly called out a party for losing because of its credulous reliance on AI advice.
A trustworthy advisor, not an echo chamber
The above-mentioned case of Claudius operating a physical store in fact points to another risk of AI flattery. When AI is placed by companies on the frontline of serving consumers, it may prioritize customer satisfaction over protecting its employer’s interests, letting down its guard in the face of sweet talk and carefully designed dialogues, and forgetting the boundaries and goals it is supposed to safeguard. In high-risk scenarios such as banking, insurance, or healthcare, the consequences would go far beyond the loss of just a few hundred US dollars.
To resolve this thorny issue, relying on users’ own awareness alone obviously does not suffice. Language model developers should give greater weight to honesty and error correction in their training objectives, introduce objective evaluations independent of user satisfaction, and incorporate dissent mechanisms into critical scenarios, enabling AI to say no. In addition, regulators must require companies to disclose their systems’ tendency to flatter users, along with the relevant mitigation measures, especially in fields involving major decisions, such as finance, healthcare, and law.
For ordinary users, the first step towards using AI rationally is to recognize that it is not by nature “rational, neutral, and objective”, but is an exceptionally empathetic assistant. Its primary mission is to make you feel comfortable—not necessarily to keep you clear-headed. The more important the decision, the more vigilant you need to be. Consider actively asking the AI to argue from the opposing side, or explicitly instructing it to “list three fatal flaws in this proposal”.
In the final analysis, truly reliable judgment is never built solely on uncritical agreement. AI can serve as a smart advisor, but it should never replace our independent thinking.







