Innovative Approaches for Continuous Knowledgebase Updation

Published On Fri Jan 24 2025

Daily Updation of Knowledgebase - API - OpenAI Developer Forum

I built a chatbot on top of 50 news articles and stored their embeddings in a .npy file; I am not using a vector DB. The main problem I am facing is the continuous daily influx of new articles. I want to automate updates to the knowledge base without recomputing embeddings for the existing data. The articles come from various news websites, and I prefer not to use third-party tools unless absolutely necessary, with the exception of a vector DB. What would be the best approach for this use case?
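Without a vector DB, retrieval over the .npy file comes down to a brute-force similarity search in NumPy. The sketch below is illustrative and not taken from the thread: it assumes one embedding row per article and uses cosine similarity, which is a common choice for text embeddings. In practice the array would come from something like `np.load("embeddings.npy")` (file name hypothetical); here toy 3-dimensional vectors stand in for real embeddings.

```python
import numpy as np


def top_k_similar(query_vec: np.ndarray, embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k rows of `embeddings` most similar to `query_vec`.

    Brute-force cosine similarity over every row; adequate for a few
    thousand articles held in memory.
    """
    # Normalize rows and the query so a dot product equals cosine similarity.
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query_vec / np.linalg.norm(query_vec)
    scores = emb_norm @ q_norm
    # argsort is ascending, so reverse it and keep the first k indices.
    return np.argsort(scores)[::-1][:k]


# Toy vectors standing in for real article embeddings (hypothetical data).
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([1.0, 0.1, 0.0])
print(top_k_similar(query, embeddings, k=2))
```

The cost is one matrix-vector product per query, which grows linearly with the number of articles; that linear growth is what eventually makes a dedicated vector DB worthwhile.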


Response to Query

I am not sure I fully understand your concern, but you may not need to recompute anything. Load your existing NumPy array from the .npy file, compute embeddings for the new batch of articles only, append them to the array, and save the result back to the .npy file. If the dataset grows very large (on the order of millions of articles), consider a dedicated vector DB such as Pinecone.
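The load, embed, append, and save loop described in the answer can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the thread: each article is keyed by a unique ID such as its URL, the IDs live in a sidecar JSON file (the file names are hypothetical) so that already-embedded articles are skipped, and `embed()` is a hypothetical offline placeholder that you would replace with your real embedding call.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def embed(texts: list[str]) -> np.ndarray:
    """Hypothetical placeholder: swap in your real embedding call.

    Returns one deterministic 4-dim vector per text so the sketch runs offline.
    """
    return np.array(
        [[len(t), t.count(" "), sum(map(ord, t)) % 97, 1.0] for t in texts],
        dtype=np.float32,
    )


def append_new_articles(articles: dict[str, str], emb_path: Path, ids_path: Path) -> int:
    """Embed only articles whose ID is not stored yet; return how many were added."""
    # Load what is already stored; start empty on the first run.
    if emb_path.exists():
        embeddings = np.load(emb_path)
        ids = json.loads(ids_path.read_text())
    else:
        embeddings, ids = None, []

    # Skip anything embedded on a previous run, so old data is never recomputed.
    known = set(ids)
    new_ids = [a for a in articles if a not in known]
    if not new_ids:
        return 0

    new_vecs = embed([articles[a] for a in new_ids])
    embeddings = new_vecs if embeddings is None else np.vstack([embeddings, new_vecs])

    # Row i of the array belongs to ids[i], so both files are rewritten together.
    np.save(emb_path, embeddings)
    ids_path.write_text(json.dumps(ids + new_ids))
    return len(new_ids)


# Usage: two daily runs, where day 2 re-fetches day 1's articles plus one new one.
with tempfile.TemporaryDirectory() as d:
    emb_path = Path(d) / "embeddings.npy"
    ids_path = Path(d) / "article_ids.json"
    day1 = {"https://example.com/a": "First story", "https://example.com/b": "Second story"}
    day2 = {**day1, "https://example.com/c": "Third story"}
    print(append_new_articles(day1, emb_path, ids_path))
    print(append_new_articles(day2, emb_path, ids_path))
    print(np.load(emb_path).shape)
```

Keeping the IDs next to the array is one design choice among several; it lets a scheduled job (for example, a daily cron run) re-scrape the same sources without duplicating rows. Because the two files are written separately, a crash between the writes could leave them out of sync, so a production version would want an atomic write or a small database for the ID list.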

Vector Search Database Solution
