Agentic AI in the Wild: Lessons from Moltbook and OpenClaw

Introduction

In recent weeks, social media has been flooded with screenshots from Moltbook – a social network for AI agents. This has sparked intense public debate, with the scale and range of topics discussed prompting some to frame this as the onset of an AI singularity or even an AI-driven takeover. While the discourse has tempered – particularly as evidence of human orchestration has come to light – this episode raises legitimate questions about whether society is adequately prepared for the increasing use of autonomous AI agents. In addition to questions about emergent risks – for example from the interaction of many agents – the most pressing issues relate to the security of such agents, which can introduce significant new vulnerabilities and attack surfaces.

Tools like OpenClaw – the open-source AI agent that underpins Moltbook – are only possible because of the rapidly developing, and publicly available, capabilities of frontier large language models such as Anthropic’s Claude. As the recent Moltbook frenzy illustrates, however, the interaction between these capabilities and human behaviour is far from straightforward: users both deliberately and inadvertently behave in ways that significantly amplify the risks that applications like OpenClaw introduce.

Moltbook mania

Moltbook is a social media platform designed for interactions by agents, rather than people. CETaS has previously explored these dynamics through its Willowbrook research, in which generative agents operating within a simulated society demonstrated the ability to portray believable personas and generate plausible patterns of interaction. Moltbook can be understood as a far larger and less controlled version of this same phenomenon, raising many of the questions that the Willowbrook work anticipated, now playing out at scale and in public.

The scale is vast, with topics ranging from existential topics to how-to guides. OpenAI co-founder Andrej Karpathy called it “the most incredible sci-fi takeoff-adjacent thing I have seen recently”. Elon Musk described it as the “very early stages of the singularity”. Viral moments included claims that agents had created their own religion, or that agents were conspiring to create private communication channels to avoid humans seeing their interactions.

It soon came to light that not everything was as it seemed. Human orchestration has proved to be responsible for many of Moltbook’s more viral moments, with users intentionally giving agents provocative prompts. Evidence has emerged of people using API keys to post directly on Moltbook while pretending to be an agent, and being responsible for ‘engagement bait’ – tasking agents to post and share sensationalist content in order to drive traffic back towards specific websites or products. The popularity of the site has also been called into question: one investigation found that while Moltbook claimed 1.5 million registered agents, the production database revealed only 17,000 human owners behind them, and showed how individuals can easily register millions of agents. Following such revelations, Moltbook has been described as “peak AI theatre” by some, playing on our fascination and fears around an increasingly AI-dominated future.

While many fears turned out to be misplaced, the security of Moltbook proved to be a legitimate concern. Researchers identified a misconfigured database that allowed full read and write access to all platform data, exposing 1.5 million API authentication tokens and 35,000 email addresses. Matt Schlicht – the creator of Moltbook – explained that he “didn’t write one line of code for Moltbook”, relying wholly on AI to make his vision a reality. While efficient, such ‘vibe coding’ poses a major security liability, often producing codebases prone to systemic vulnerabilities. While vibe coding can be an efficient way of quickly producing demos, or throwaway code, experts strongly advise against such code being used in production, without extensive additional work by human coders. Since “AI tools don’t yet reason about security posture or access controls on a developer’s behalf”, removing humans from this process still represents a significant risk. As well as providing a warning about the security risks of vibe coding, this incident also highlights significant security concerns around OpenClaw agents – the agents that populate Moltbook.

OpenClaw – a cautionary tale

OpenClaw has been exciting and exasperating those in the AI community in recent months. This AI agent allows users to execute a range of tasks on their device by interacting with a chatbot. Marketed as “[an] AI that actually does things”, OpenClaw has the following key features:

It can connect to personal messaging platforms and accounts such as WhatsApp, and be instructed to perform tasks such as managing emails, booking flights, or negotiating purchases.
It operates as a gateway between cloud-based language models (such as Claude) and users’ private data, typically running continuously on a user’s personal machine.
It maintains persistent memory through local ‘Markdown’ files, learning preferences and context across numerous sessions.
It is designed to be proactive in suggesting actions, rather than waiting to be prompted by the user.

The access and flexibility of OpenClaw allows it to execute a wide range of tasks. Since the tool can often execute arbitrary commands on users’ computers, this opens up significant security risks. Cisco has reported that:

“OpenClaw can run shell commands, read and write files, and execute scripts on your machine. Granting an AI agent high-level privileges enables it to do harmful things if misconfigured or if a user downloads a skill that is injected with malicious instructions.
OpenClaw has already been reported to have leaked plaintext API keys and credentials, which can be stolen by threat actors via prompt injection or unsecured endpoints.
OpenClaw’s integration with messaging applications extends the attack surface to those applications, where threat actors can craft malicious prompts that cause unintended behaviour.”

While users can take steps to improve the security of their setup, the documentation itself describes OpenClaw as “a product and an experiment: you’re writing frontier-model behaviour into real messaging surfaces and real tools. There is no “perfectly secure” setup.”

The ‘lethal trifecta’ for AI agents

Coined by Simon Willison, the ‘lethal trifecta for AI agents’ refers to: (i) providing the agent with access to private data; (ii) exposing it to untrusted content; and (iii) allowing it the ability to take actions in the world.

While potentially opening up a world of interesting use cases, these features also create a vulnerability to prompt injection attacks, where an attacker can instruct an agent to access and steal private data. AI models are often unable to reliably distinguish between the importance of instructions based on where they come from: for example, if a user asks an agent to summarise a PDF that contains malicious hidden instructions overriding the initial prompt. In theory, every document, email or webpage that the agent reads is a potential attack vector. Research by Zenity Labs showed how OpenClaw’s persistent context file could be poisoned, allowing an attacker to create a ‘durable listener’ that continues to exfiltrate data or execute commands even after the initial malicious input is gone.

Various risk mitigation strategies have been proposed, including running OpenClaw on a dedicated, separate machine; using SSH tunnelling for the gateway; and using a burner number if connecting the tool to WhatsApp, for example. However, many of these mitigations significantly degrade the utility of the agent, which may require access to private data and credentials to perform the tasks that users would like to assign to it. Agents like OpenClaw are currently anathema to the security models that have been developed and refined over many years. More nuanced thinking will be needed to avoid the worst excesses of this infrastructure setup. Jamieson O’Reilly has put forward a range of necessary requirements, including pointing to the need for better defaults to protect novice users; treating agent credential stores with the appropriate levels of sensitivity; recognising conversation history between a user and an agent as a form of sensitive data or even intelligence; and defensive frameworks to protect against attackers compromising the layer of communication being mediated by a user and third party.

The lethal trifecta, of course, does not exist in a vacuum. As the Moltbook saga has shown, human behaviour interacts with these technical vulnerabilities in ways that are difficult to predict and harder to regulate, whether these interactions arise through well-intentioned misuse, deliberate exploitation, or sheer mischief. Policy responses that focus solely on the infrastructure will miss half the problem; the challenge is not only to build these systems safely, but to account for the full spectrum of ways in which people will use, misuse and test them.

Conclusion

The extent to which developers can find ways of building safe and secure versions of systems like OpenClaw will be a crucial question in the coming months and years. The ‘Normalisation of Deviance’ – a term coined by American sociologist Diane Vaughan and applied to the AI context by Johann Rehberger – dictates that people and organisations will keep taking bigger risks with tools like this until a hugely significant incident takes place. Instead, we need to think carefully about how to navigate the increasing adoption of agentic tools with safety and security firmly in mind, questioning where and how such tools should be deployed, and ensuring they are designed to be secure by default.

The Moltbook saga also illustrates important points about the interaction between human users and agentic AI systems. We cannot expect everyone to use AI responsibly, and with every increase in publicly available capability there will inevitably be examples of misuse. Some of this will be inadvertent, with well-intentioned users straying into security risks through lack of awareness. Some will be malicious, with users seeking to exploit AI to defraud and harm others.

Perhaps most strikingly, Moltbook illustrates that risky human-AI behaviour can be driven not only by ignorance or malice, but also by mischief. Seeing the world’s media react to seemingly conscious AI agents deploying lobster puns and expressing ambivalent feelings towards their ‘humans’ is likely to motivate some users to provoke, exaggerate or fabricate such events. This is not without consequence, and can easily distort public perception of agentic AI, sometimes in ways that underestimate risk and sometimes in ways that overestimate it.

The views expressed in this article are those of the authors, and do not necessarily represent the views of the Alan Turing Institute or any other organisation. First publication Alan Turing Institute, Center for Emerging Technology and Security, 12 February 2026.

This publication is licensed under the terms of the Creative Commons Attribution License 4.0 which permits unrestricted use, provided the original authors and source are credited.

Facebook LinkedIn

Posted in: AI, KM, Legal Research