Behind the scenes of the Octopus Extension for GitHub Copilot
The generative aspect of GPT has received a lot of attention. It's easy to be impressed by the ability to create eye-catching images and videos or write sensible-sounding text from a few simple prompts. It's also easy to understand how AI would augment the workflows of writing or drawing tools. But how does AI benefit more traditional business processes? The answer is perhaps less exciting than being able to generate a highly detailed drawing of a kitten in a spacesuit eating rainbows from little more than that description. Still, it's in these business-as-usual workflows that AI can have a significant impact.
I explored this question with GitHub to develop the Octopus Extension for GitHub Copilot. Satya Nadella announced Copilot extensions at this year's Build conference, and Octopus was one of 16 extensions in the initial launch.
Improving Workflow Efficiency
The sentiment behind the phrase "Code is written once but read many times" holds for Octopus deployments and runbooks. After you configure your Octopus space, most of your interaction is through initiating deployments, running runbooks, and viewing the results. Most DevOps team members spend their days outside of Octopus. For example, developers spend most of their day writing and testing new features in their IDE. Octopus is a critical component of that workflow as it's responsible for deploying changes to various environments for internal teams and external customers to access. But often, developers only need to know where their changes have been deployed or extract some useful entries from the deployment logs.
Here we see the result of the prompt @octopus-ai-app Show dashboard space "<space name>", which is a markdown version of the Octopus dashboard.
Enhancing User Experience
This prompt shows how the Octopus extension keeps you in the flow by removing the need to switch between applications to access the information you need. With a simple prompt, you can review the state of your deployments in the same chat window you use as part of your development workflow. We can dig a little deeper with a prompt like @octopus-ai-app The status "Success" is represented with the 🟢 character. The status "In Progress" is represented by the 🔵 character. Other statuses are represented with the 🔴 character. Show the release version, release notes, and status of the last 5 deployments for the project "<project name>" in the "<environment name>" environment in the "<space name>" space in a markdown table.
Advanced AI Capabilities
The cool thing about this prompt is that the extension has no special logic for mapping statuses to Unicode characters or generating markdown tables. The ability to understand these instructions and generate the required output is inherent to the Large Language Model (LLM) that backs the Octopus extension. This prompt also highlights how an AI agent improves on more traditional chatbots. The prompt is written in plain text rather than the fixed and often robotic instructions you have to formulate for a chatbot. The ability to understand complex prompts also means the Octopus extension can generate results far beyond the limited set of interactions that have to be hard-coded into a traditional chatbot.
The extension's benefit is that it brings Octopus to the tools you already use. It keeps you in the flow by removing the need to jump between windows and tools. It also lets you use LLMs' ability to comprehend plain text requests to generate custom reports or extract useful information.
Real-time Data Challenges
A challenge with AI systems is that they don't inherently have access to real-time information. LLMs are essentially frozen in time and only know the state of the world at the point when they were trained. For example, GPT-3.5 was trained on data up to 2021 and knows nothing about the world after that date. Retrieval Augmented Generation (RAG) is a process that can overcome this limitation. It combines custom knowledge with a user's prompt to generate more accurate answers or to answer questions about custom data.
For example, you may combine the contents of a recent news article with your question in the LLM prompt. The context and question get placed in the LLM's context window, allowing it to consider content it was not trained on to provide an answer. The challenge for an extension interacting with Octopus is that the data we want to inspect is generated in real-time. The extension must query the Octopus API for the current state of a space to ensure any prompts get answered with live information.
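As a minimal sketch of that pattern, the snippet below places retrieved text and the user's question into a single prompt. It assumes the OpenAI Python client; the model name is purely illustrative, and the extension itself is not tied to any particular client:

```python
# A minimal RAG-style sketch: put retrieved, up-to-date text in the prompt
# alongside the question so the model can answer from content it was never
# trained on. The model name is illustrative.
from openai import OpenAI

client = OpenAI()

def answer_with_context(question: str, retrieved_text: str) -> str:
    """Combine retrieved content with the user's question in one prompt."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the context provided by the user."},
            {"role": "user",
             "content": f"Context:\n{retrieved_text}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```

For the Octopus extension, the retrieved text is live data pulled from the Octopus API rather than a static document.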
Serialization and Optimization
Log files are easy to handle with LLMs because they can be considered a stream of unstructured text, and LLMs are good at consuming such text blobs. However, questions about a space's configuration require us to serialize and present the space's state in a format that the LLM can reason about.
There are many formats for defining the configuration of a platform like Octopus as text, including JSON, XML, YAML, TOML, HCL, OCL (used by Octopus Config as Code), and more. Of these, HCL proved the best fit for serializing Octopus spaces. So, queries relating to the configuration of an Octopus space work by identifying the entities being requested, converting those entities into HCL, placing the HCL in the context, and having the LLM answer the question based on that context.
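The sketch below gives a rough sense of that flow; it is not the extension's code. It pulls every project in a space from the Octopus REST API, renders a minimal HCL block per project (loosely modelled on the octopusdeploy Terraform provider's resource names), and hands the result to the LLM as context. The HCL attributes and model name are assumptions, and a real implementation would serialize only the entities the question refers to:

```python
# A simplified version of the "serialize to HCL, then ask" flow.
import os
import requests
from openai import OpenAI

OCTOPUS_SERVER = os.environ["OCTOPUS_SERVER"]    # e.g. https://yourinstance.octopus.app
OCTOPUS_API_KEY = os.environ["OCTOPUS_API_KEY"]
client = OpenAI()

def get_projects(space_id: str) -> list[dict]:
    """Fetch every project in a space from the Octopus REST API."""
    response = requests.get(
        f"{OCTOPUS_SERVER}/api/{space_id}/projects/all",
        headers={"X-Octopus-ApiKey": OCTOPUS_API_KEY},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

def project_to_hcl(project: dict) -> str:
    """Render a deliberately tiny HCL representation of a project."""
    return (
        f'resource "octopusdeploy_project" "{project["Slug"]}" {{\n'
        f'  name        = "{project["Name"]}"\n'
        f'  description = "{project.get("Description") or ""}"\n'
        "}"
    )

def answer_space_question(question: str, space_id: str) -> str:
    """Place the HCL in the context and let the LLM answer the question."""
    # The real extension first identifies which entities the question refers to;
    # this sketch simply serializes every project in the space.
    hcl = "\n\n".join(project_to_hcl(p) for p in get_projects(space_id))
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer questions about the Octopus space using the HCL below."},
            {"role": "user", "content": f"{hcl}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```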
Future of AI Integration
I suspect this "smart AI, dumb search" approach is something we'll see more of in the coming years. Enterprise tools haven't done a great job implementing search capabilities, and there's no reason to think the situation will improve. But having an LLM identify the phrases or entities to search for, interact with an API on your behalf, and then provide an answer based on the search results means existing tools can continue to provide rudimentary search capabilities, and LLM agents can sift through broad search results. Ever-expanding LLM context windows only make implementing this approach easier (if potentially less efficient).
I'd even argue that this approach rivals solutions like vector databases. The primary purpose of a vector database is to co-locate items with similar attributes efficiently. For example, pants and socks would be co-located because they are both clothing items, while cars and bikes would be co-located because they are both vehicles. But there's no reason an LLM can't convert the prompt "Find me red clothes" into 5 API calls returning results for t-shirts, jeans, hoodies, sneakers, and jackets, thus relying on the capability of LLMs to generate high-quality zero-shot answers to common categorization tasks rather than having to build custom search capabilities.
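A toy sketch of that idea follows. The LLM plans the category-level queries, while search_catalog is a placeholder for whatever rudimentary keyword search the backend already exposes; nothing here is a real shopping API:

```python
# "Smart AI, dumb search": the LLM handles zero-shot categorization, the
# existing API only needs to answer simple keyword queries.
import json
from openai import OpenAI

client = OpenAI()

def plan_searches(prompt: str) -> list[dict]:
    """Ask the LLM to expand a natural-language request into concrete queries."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f'Expand the request "{prompt}" into specific product categories. '
                'Reply with only a JSON array of objects like '
                '{"category": "t-shirts", "colour": "red"}.'
            ),
        }],
    )
    # A production version would validate and repair the model's JSON output.
    return json.loads(completion.choices[0].message.content)

def search_catalog(category: str, colour: str) -> list[str]:
    """Placeholder for the backend's existing, rudimentary keyword search."""
    return [f"{colour} {category} (result from the existing search API)"]

def find_products(prompt: str) -> list[str]:
    """The LLM does the categorization; the API just searches."""
    results: list[str] = []
    for query in plan_searches(prompt):
        results.extend(search_catalog(query["category"], query["colour"]))
    return results

print(find_products("Find me red clothes"))
```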
Overall, this approach has worked well. It resulted in a lean architecture involving two Azure Functions (one to receive chat requests and query the Octopus API directly, and one to serialize Octopus resources to HCL) that's easy to manage and scale as needed, without the burden of maintaining a custom data source.
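For a sense of the shape of that architecture, here's a skeleton in the Azure Functions Python programming model. It collapses both functions into a single function app for brevity; the extension's actual implementation language, routes, and payloads aren't shown here and may differ:

```python
# Skeleton only: HTTP-triggered functions for receiving chat requests and
# serializing resources. The request/response payloads are placeholders.
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="chat", methods=["POST"])
def handle_chat(req: func.HttpRequest) -> func.HttpResponse:
    """Receive a chat request, query the Octopus API, and return the answer."""
    prompt = req.get_json().get("prompt", "")
    # ... query the Octopus API, build the context, and call the LLM here ...
    return func.HttpResponse(f"Answer for: {prompt}", status_code=200)

@app.route(route="serialize", methods=["POST"])
def serialize_resources(req: func.HttpRequest) -> func.HttpResponse:
    """Serialize the requested Octopus resources to HCL."""
    # ... fetch resources from the Octopus API and render HCL here ...
    return func.HttpResponse("# HCL output", status_code=200)
```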
Adapting Testing Strategies
Traditional automated testing is all about verifying that your code works. Test-driven development may encourage a small number of failing tests, but the expectation is that future work will focus on resolving test failures. Working with LLMs requires rethinking this approach. LLMs are non-deterministic by design, which means you can't be sure you'll get the same result even with exactly the same inputs.
This manifests most visibly when LLMs respond with different phrases to convey the same answer. However, the more serious concern for developers is that LLMs will sometimes provide incorrect results even when they previously provided correct results with the same inputs. The non-deterministic nature of LLMs means that developers need to adapt their testing strategies to accommodate this uncertainty.
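One common way to accommodate that uncertainty (a sketch of the general technique, not a description of how the Octopus extension is tested) is to assert on stable properties of the answer rather than its exact wording, and to tolerate a bounded failure rate across repeated runs. Here, ask_extension and CHAT_ENDPOINT are hypothetical stand-ins for the chat endpoint under test:

```python
# A pytest-style sketch for testing non-deterministic LLM output.
import os
import requests

# Hypothetical: the URL of whatever chat endpoint is under test.
CHAT_ENDPOINT = os.environ["CHAT_ENDPOINT"]

def ask_extension(prompt: str) -> str:
    """Send a prompt to the chat endpoint under test (hypothetical wrapper)."""
    response = requests.post(CHAT_ENDPOINT, json={"prompt": prompt}, timeout=60)
    response.raise_for_status()
    return response.text

def test_status_prompt_mentions_environment():
    """Assert on stable properties of the answer, not its exact wording."""
    attempts, passes = 5, 0
    for _ in range(attempts):
        answer = ask_extension(
            'Show the status of the latest deployment to the "Production" '
            'environment for the project "My Project" in the space "Default".'
        )
        if "Production" in answer:
            passes += 1
    # Accept the occasional miss rather than demanding a deterministic answer.
    assert passes >= 4
```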