• by tmshapland on 2/10/2025, 9:41:44 PM

    Google's Gemini 2.0 solved the RAG problem for Conversational AI. Put your Knowledge Base (KB) in Gemini’s system prompt and have your agent make a tool call to Gemini.

    Accuracy: In our testing, it returned the right answer every time.

    Latency: Response time is about 900 ms.

    Cost: 300 queries per day on a 50-page KB costs $26 per month ($7 with prompt caching), on par with RAG-as-a-service providers.

    RAG is one of the last-mile problems for real-time conversational AI. It's very difficult to get production-worthy recall from a RAG pipeline. Model-Assisted Generation (MAG) with Gemini 2.0 Flash Lite just works. Period.
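
    To make the pattern concrete, here's a minimal sketch (not the code from our demo). It assumes the google-generativeai Python SDK; the API key, file path, model name, and example question are placeholders:

        # Minimal sketch: load the whole knowledge base into the system
        # instruction and query Gemini 2.0 Flash Lite directly, instead of
        # running a retrieval pipeline.
        import google.generativeai as genai

        genai.configure(api_key="YOUR_API_KEY")  # placeholder

        # Read the entire KB as plain text (placeholder path).
        with open("knowledge_base.txt") as f:
            kb_text = f.read()

        # The KB rides along in the system instruction on every call.
        model = genai.GenerativeModel(
            model_name="gemini-2.0-flash-lite",
            system_instruction=(
                "Answer questions using only the knowledge base below.\n\n"
                + kb_text
            ),
        )

        # In a voice agent, this call sits behind a tool/function call that
        # the conversational LLM invokes when it needs KB knowledge.
        response = model.generate_content("What is your refund policy?")
        print(response.text)

    In the conversational setup described above, the main voice agent exposes this query as a tool, so the agent decides when to call Gemini with the user's question.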

    The blog post has a link to an open source demo, which we built using the open source Pipecat Voice AI platform.

    We don't have any skin in the game here. We're not making money off this.

    It's open source, it's not a money-making project, and it talks about one of the newly released toys in tech... it seemed like a good post for Hacker News.

    Tom https://x.com/tom_shapland/status/1889041960293560540