Building a Question-Answer Bot With Langchain, Vicuna, and Sentence Transformers
In this article, we will explore how to build a question-answer bot using only open-source tools such as Langchain, Vicuna, and Sentence Transformers. We will also discuss how to use embeddings from LLaMA models and how to connect the bot to Langchain.
Using Embeddings with LLaMA Models
First, we need to extract the embeddings. In the LLaMA source code on Hugging Face, there are functions for extracting embeddings; we can take either the input or the output embeddings. For our bot, we went with the input embeddings and wrote a function that tokenizes the input and extracts the input embedding of each token:
import numpy as np

def embed_text(text: str) -> np.ndarray:
    # Tokenize the input, then look up the model's input embedding for each token
    tokens = tokenizer(text, return_tensors="pt")
    embeddings = model.get_input_embeddings()(tokens["input_ids"])
    return embeddings.detach().numpy()
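Here, tokenizer and model are assumed to be loaded up front. A minimal sketch using the transformers Auto classes, where the checkpoint path is a placeholder for your local LLaMA or Vicuna weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder path; point this at your local LLaMA/Vicuna checkpoint
MODEL_PATH = "path/to/vicuna-7b"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)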
We served this behind an HTTP server to make our work easier, though that is not a hard requirement. We defined the endpoint in FastAPI:
@app.post("/embeddings")
def get_embeddings(data: EmbeddingRequest):
    # NumPy arrays are not JSON-serializable, so convert to nested lists
    return {"embeddings": embed_text(data.text).tolist()}
Next, we loaded a file and converted it to embeddings with the Chroma.from_texts function, which expects an embeddings object. We took a peek at Langchain's OpenAIEmbeddings class to see how such an object is implemented, then wrote a small RemoteEmbeddings client in the same style that calls our own endpoint:
import requests
import numpy as np

class RemoteEmbeddings:
    def __init__(self, url: str):
        self.url = url

    def get_embeddings(self, text: str) -> np.ndarray:
        # POST the text to our embeddings endpoint and parse the response
        resp = requests.post(self.url, json={"text": text})
        resp.raise_for_status()
        return np.array(resp.json()["embeddings"])
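As a quick sanity check (assuming the FastAPI server above is running locally), the client returns one vector per token:

client = RemoteEmbeddings("http://localhost:8000/embeddings")
vectors = client.get_embeddings("What is Germany?")
print(vectors.shape)  # (1, num_tokens, embedding_dim)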
We then created our own embedding class. It implements the two methods Langchain's Embeddings interface expects, and mean-pools the per-token embeddings into a single fixed-size vector per text (one simple pooling choice among several):

from typing import List
from langchain.embeddings.base import Embeddings

class MyEmbeddings(Embeddings):
    def __init__(self, url: str):
        self.client = RemoteEmbeddings(url)

    def embed_query(self, text: str) -> List[float]:
        # mean-pool the per-token embeddings into one fixed-size vector
        return self.client.get_embeddings(text).mean(axis=1).squeeze(0).tolist()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]
Now we can glue it into a Langchain question-answering chain:
from langchain.llms import HuggingFacePipeline
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# instantiate the embeddings, backed by our HTTP service
embeddings = MyEmbeddings(url="http://localhost:8000/embeddings")

# index the text chunks from the file we loaded earlier
db = Chroma.from_texts(texts, embeddings)

# wrap Vicuna as a Langchain LLM (MODEL_PATH is the placeholder from earlier)
llm = HuggingFacePipeline.from_model_id(model_id=MODEL_PATH, task="text-generation")

# build the question-answering chain and ask a question against the index
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
print(qa.run("What is Germany?"))
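For reference, the texts variable above holds the chunks of the loaded file. One way to produce it, sketched with Langchain's stock loader and splitter and a placeholder filename:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

# load the file and split it into chunks small enough to embed
docs = TextLoader("my_file.txt").load()
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
texts = [chunk.page_content for chunk in chunks]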
Sentence Transformers
The Sentence Transformers library focuses on building embeddings for similarity search. It integrates tightly with Hugging Face, which makes it easy to use. Here is an example:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
questions = ["What is Germany?", "How is the climate in Germany?"]
embeddings = model.encode(questions)
for question, embedding in zip(questions, embeddings):
    print(question, embedding)
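Since the library is built for similarity search, comparing the two embeddings is a one-liner:

from sentence_transformers import util

# cosine similarity between the two question embeddings
print(util.cos_sim(embeddings[0], embeddings[1]))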
We can connect this to Langchain, which is already integrated with Sentence Transformers. We can also wrap the model ourselves behind the same Embeddings interface we used earlier:
from typing import List
from langchain.embeddings.base import Embeddings
from sentence_transformers import SentenceTransformer

class SentenceTransformerEncoder(Embeddings):
    def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
        self.model = SentenceTransformer(model_name)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # encode() already returns one vector per input text
        return self.model.encode(texts).tolist()

    def embed_query(self, text: str) -> List[float]:
        return self.model.encode([text])[0].tolist()
# index the documents and build the chain exactly as before,
# swapping in the Sentence Transformers encoder
db = Chroma.from_texts(texts, SentenceTransformerEncoder())
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
print(qa.run("How is the climate in Germany?"))
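Because of the built-in integration, the custom encoder above is optional; Langchain ships a ready-made wrapper around sentence-transformers models that can be dropped in instead:

from langchain.embeddings import HuggingFaceEmbeddings

# built-in wrapper; uses sentence-transformers under the hood
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')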
Conclusion
In this article, we explored how to build a question-answer bot using open-source tools such as Langchain, Vicuna, and Sentence Transformers. We showed how to extract embeddings from LLaMA models and connect them to Langchain, and we looked at Sentence Transformers as an alternative that is purpose-built for similarity search. Still, we hope the community finds a better way of leveraging embeddings from LLaMA models in the future.