Bridging the Gap: The Reality of Autonomous Agents

Autonomous agents have become a hot topic in recent weeks, with AutoGPT among the fastest-growing GitHub repositories in history. Although the demos are impressive, the acting capabilities of these agents still lag behind their planning capabilities. The question is, how do we make them truly autonomous?

The primary difference between an agent and a bare LLM is that an agent runs in a self-directed loop, with a lightweight prompting layer and some form of persistence or memory. Some agents prioritize tasks while others take a more conversational approach. Agents have a wide range of use cases, from personal assistants to automated go-to-market (GTM) teams.

Running an agent reliably in an unconstrained environment remains an open research problem. However, models are improving at an accelerating pace. AutoGPT-like agents provide an interesting practical experiment, one that could plausibly start to bridge the gap between AI doomer fantasy and reality. To achieve this, we need open-source contributions in the action space (plugins) and iteration on architecture and prompting strategy.

Observations on Core Autonomous Agent Loops

After looking through the codebases of some popular agents, a few observations stand out. The core autonomous agent loops are straightforward. Agents have access to an initial prompt, a set of actions to execute, a history of messages, and a workspace where they can write executable code and files on the fly.
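The four ingredients listed above (initial prompt, action set, message history, workspace) can be illustrated with a minimal sketch. This is not AutoGPT's actual code. Every name here (AgentState, ACTIONS, scripted_model, run_agent) is hypothetical, and the model is a scripted stand-in rather than a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Everything the loop carries between iterations (hypothetical layout)."""
    prompt: str                                              # initial goal
    history: List[str] = field(default_factory=list)         # message history
    workspace: Dict[str, str] = field(default_factory=dict)  # in-memory "files"


def write_file(state: AgentState, arg: str) -> str:
    """Action: store 'name=content' in the workspace."""
    name, _, content = arg.partition("=")
    state.workspace[name] = content
    return f"wrote {name}"


def finish(state: AgentState, arg: str) -> str:
    """Action: signal that the agent considers the goal complete."""
    return "done"


# The agent's action space: a finite mapping from command name to function.
ACTIONS: Dict[str, Callable[[AgentState, str], str]] = {
    "write_file": write_file,
    "finish": finish,
}


def scripted_model(state: AgentState) -> str:
    """Stand-in for an LLM call; returns a string 'action_name argument'.

    A real agent would send state.prompt and state.history to a model here.
    """
    if "notes.txt" not in state.workspace:
        return "write_file notes.txt=plan: " + state.prompt
    return "finish"


def run_agent(state: AgentState,
              model: Callable[[AgentState], str],
              max_steps: int = 5) -> AgentState:
    """Self-directed loop: decide, act, record the result, repeat."""
    for _ in range(max_steps):
        decision = model(state)
        name, _, arg = decision.partition(" ")
        action = ACTIONS.get(name)
        result = action(state, arg) if action else f"unknown action: {name}"
        state.history.append(f"{decision} -> {result}")
        if name == "finish":
            break
    return state


final = run_agent(AgentState(prompt="summarize repo"), scripted_model)
```

The max_steps cap is a design choice for the sketch: without a step limit or a finish action, a self-directed loop has no guaranteed stopping point.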

Although demos can be impressive, agent implementations are generally simple under the hood. AutoGPT, for example, is a light prompting layer running in a recursive loop with persistent memory, which can write executable code on the fly. LangChain has a partial implementation of AutoGPT that augments its base agent with an optional human feedback component. It’s worth noting that LangChain is a framework in which various agents can be implemented, rather than a single agent.
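To make the idea of an optional human feedback component concrete, here is a small sketch of an approval gate placed in front of each proposed action. It does not reproduce LangChain's or AutoGPT's implementation. The function names and the y/n/free-text convention are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_feedback(proposed_actions: Iterable[str],
                      ask_human: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Ask a human before each action (hypothetical protocol).

    'y' approves, 'n' skips, and any other reply is treated as feedback
    that would be passed back to the agent instead of executing.
    """
    log: List[Tuple[str, str]] = []
    for action in proposed_actions:
        reply = ask_human(action).strip()
        if reply.lower() in ("y", "yes"):
            log.append((action, "executed"))
        elif reply.lower() in ("n", "no"):
            log.append((action, "skipped"))
        else:
            log.append((action, f"feedback: {reply}"))
    return log


def scripted_human(replies: List[str]) -> Callable[[str], str]:
    """Non-interactive stand-in for a person, useful for testing."""
    remaining = iter(replies)
    return lambda action: next(remaining)


log = run_with_feedback(
    ["search web", "delete files", "write report"],
    scripted_human(["y", "n", "use markdown"]),
)
```

For interactive use, ask_human could be something like lambda a: input(f"Run {a!r}? [y/n/feedback] "). Injecting the callback keeps the gate optional and testable.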

An immediate next step toward making these agents useful is expanding their action space. LangChain tools or AutoGPT plugins define the extended set of commands that an agent can perform. Plugins could include searching Google, writing some code on the fly, or integrating with Twitter or payment systems. The open-source community might expand this finite set of actions, but the challenge is that the set of actions a user can take on the internet is almost infinite.
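One way to picture a finite, extensible action space is a registry that maps command names to functions. The sketch below is an assumption-laden illustration, not the LangChain tool API or the AutoGPT plugin interface. The names TOOLS, tool, describe_tools, and call_tool are all hypothetical, and the two example tools are deliberately trivial.

```python
from typing import Callable, Dict

# Hypothetical registry: the agent can only call what is registered here.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that adds a function to the action space under `name`."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("word_count")
def word_count(text: str) -> str:
    """Count whitespace-separated words in the input."""
    return str(len(text.split()))


@tool("shout")
def shout(text: str) -> str:
    """Return the input in upper case."""
    return text.upper()


def describe_tools() -> str:
    """Render the available commands, e.g. for inclusion in a prompt."""
    return "\n".join(f"{name}: {fn.__doc__}" for name, fn in sorted(TOOLS.items()))


def call_tool(name: str, arg: str) -> str:
    """Dispatch a command; anything unregistered is outside the action space."""
    if name not in TOOLS:
        return f"error: no tool named {name!r}"
    return TOOLS[name](arg)
```

Adding a plugin in this scheme means registering one more function. The error branch in call_tool shows the limitation discussed above: any action nobody has written a tool for is simply unavailable to the agent.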

The Reality of Agents

Currently, agents perform like an entry-level consultant or MBA graduate: they can describe plausible solutions but are often poor at executing them. The acting component of ReAct performs poorly in unconstrained environments, which is the gap AutoGPT-like agents attempt to address. Even so, these agents are not yet able to reason at a deeper level about novel situations or solve problems on the fly.

Building a working autonomous agent in an unconstrained environment is an open research problem, and we are still far from solving it. The current generation of agents still needs substantial human intervention and direction to be effective.

Conclusion

Research suggests that AI agents' reasoning ability is fairly good, but their action-taking remains rudimentary. With autonomous commercial agents, we are seeing the first experimental attempts to run agents unconstrained in the wild. With open-source contributions and iteration on architecture and prompting strategies, AutoGPT-like agents provide an exciting practical experiment that could bridge the gap between AI doomer fantasy and reality.

The potential benefits of autonomous agents are tremendous, and it’s vital to keep experimenting and iterating toward truly autonomous agents in unconstrained environments. The next step is expanding the action space. Although the set of actions a user can take on the internet is almost infinite, the open-source community can extend the finite set available to agents through plugins, bringing us closer to the reality of autonomous agents.