How to create an LLM fallback from Gemini Flash to GPT-4o?
Generative AI has been the hottest technology trend of the past year, from enterprises to startups. Almost every brand is incorporating GenAI and Large Language Models (LLMs) into its solutions. However, an underexplored part of Generative AI is managing resiliency. It is easy to build on an API provided by an LLM vendor like OpenAI, but it is hard to cope when that vendor suffers a service disruption. In this blog, we will look at how you can create a resilient generative AI application that falls back from Gemini Flash to GPT-4o using the open-source AI Gateway's fallback feature.
Understanding Fallback Strategies
With traditional APIs, a fallback strategy for high availability is usually implemented with a load balancer: we configure both an active and a standby endpoint. When the active endpoint goes down, one of the configured secondary endpoints takes over and continues to serve incoming traffic. Fallbacks keep the application resilient in disaster scenarios and enable quick recovery.
In the context of Generative AI, having a fallback strategy is just as crucial for managing resiliency. The goal is no different from traditional server resiliency: maintaining an uninterrupted experience for users is paramount.

Implementing Fallbacks in Generative AI
While fallbacks for LLMs look conceptually similar to server resiliency, in practice they are harder to get right: the ecosystem is still growing, there are multiple competing standards, and each provider exposes different levers that change the output. Simply switching over while preserving similar output quality and experience is not trivial, and building this capability in a fast-changing landscape of LLMs and LLM providers is challenging for teams not specialized in managing LLMs.
To demonstrate the fallback feature, we'll build a sample Node.js application that integrates Google's Gemini. We'll use the OpenAI SDK together with Portkey's open-source AI Gateway to fall back to GPT-4o when Gemini is unavailable.

Setting Up the Project
To start, we need to set up a Node.js environment and install the project's dependencies: Express, body-parser, portkey-ai, and dotenv. Setting up API credentials from the Google Developers Console is also essential before using Gemini.
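The exact commands depend on your setup, but a minimal sketch of this step might look like the following; the package list comes from this post (with the openai package added for the gateway examples below), and the environment-variable names GOOGLE_API_KEY, OPENAI_API_KEY, and PORT are illustrative:

```javascript
// Install the dependencies mentioned above:
//   npm init -y
//   npm install express body-parser portkey-ai dotenv openai
//
// .env — keep credentials out of source control
//   GOOGLE_API_KEY=<your Gemini API key>
//   OPENAI_API_KEY=<your OpenAI API key>
//   PORT=3000

// Load the variables above into process.env at application startup.
require('dotenv').config();
```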

After setting up the environment, creating a basic Express server, and defining routes for Gemini integration, we can add Gemini to the project with specific routes that communicate with the Gemini API, as sketched below.
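Here is a minimal sketch of that server. It assumes the open-source AI Gateway is already running locally on its default address (http://localhost:8787/v1), that GOOGLE_API_KEY comes from the .env file above, and that the Gemini Flash model ID is gemini-1.5-flash; check the gateway and Google docs for the current values.

```javascript
// server.js — minimal Express server that talks to Gemini through the local AI Gateway.
require('dotenv').config();
const express = require('express');
const bodyParser = require('body-parser');
const OpenAI = require('openai');

const app = express();
app.use(bodyParser.json());

// Point the OpenAI SDK at the local AI Gateway and tell it to route requests to Google's Gemini.
const client = new OpenAI({
  apiKey: process.env.GOOGLE_API_KEY,
  baseURL: 'http://localhost:8787/v1',
  defaultHeaders: { 'x-portkey-provider': 'google' },
});

// POST /chat — forwards the user's prompt to Gemini Flash through the gateway.
app.post('/chat', async (req, res) => {
  try {
    const completion = await client.chat.completions.create({
      model: 'gemini-1.5-flash',
      messages: [{ role: 'user', content: req.body.prompt }],
    });
    res.json({ reply: completion.choices[0].message.content });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

const port = process.env.PORT || 3000;
app.listen(port, () => console.log(`Server listening on port ${port}`));
```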
Implementing the Fallback Feature
If the primary LLM fails to respond or returns an error, the AI Gateway automatically falls back to the next LLM in the list, keeping our application robust and reliable. The key step is configuring Portkey with routing rules that apply to every request reaching the gateway and specify the fallback order between LLM providers, for example:
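The sketch below shows one way to attach such a configuration; the field names follow Portkey's published config schema as best I understand it, and the model IDs (gemini-1.5-flash, gpt-4o) are assumptions to verify against the current docs.

```javascript
require('dotenv').config();
const OpenAI = require('openai');

// Fallback routing rules: try Gemini Flash first, then GPT-4o if Gemini errors out.
const fallbackConfig = {
  strategy: { mode: 'fallback' },
  targets: [
    {
      provider: 'google',
      api_key: process.env.GOOGLE_API_KEY,
      override_params: { model: 'gemini-1.5-flash' },
    },
    {
      provider: 'openai',
      api_key: process.env.OPENAI_API_KEY,
      override_params: { model: 'gpt-4o' },
    },
  ],
};

// Send the config with every request via the x-portkey-config header.
const resilientClient = new OpenAI({
  apiKey: 'unused', // real keys are supplied per target in the config above
  baseURL: 'http://localhost:8787/v1',
  defaultHeaders: { 'x-portkey-config': JSON.stringify(fallbackConfig) },
});

// Any chat completion made through resilientClient now falls back from Gemini Flash to GPT-4o.
```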
By incorporating the fallback feature, the application can seamlessly switch between LLMs based on their performance or availability, enhancing the overall user experience and ensuring continuous functionality.
Conclusion
Integrating Gemini into a Node.js application and leveraging the AI Gateway's fallback feature when Gemini is unavailable can significantly improve the resilience and reliability of generative AI applications. Understanding and implementing fallback strategies is essential for smooth operation even in challenging scenarios.
If you want to know more about Portkey's AI Gateway and explore similar topics, join our LLMs in Production Discord community to connect with other AI Engineers.