Hacker News Clone

Hybrid search vs. Token pooling benchmarking

by jonathan-adly on 12/6/2024, 1:59:20 PM with 1 comment

by jonathan-adly on 12/6/2024, 1:59:20 PM
Hi HN!
We benchmarked two ways to reduce latency in RAG workflows that use a multi-vector setup: hybrid search using Postgres's native capabilities, and a relatively new method called token pooling. Token pooling cut latency by up to 70% with a <1% performance cost.
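
For readers new to the idea, here is a minimal sketch of what token pooling can look like in a multi-vector setup: similar token embeddings for a document are clustered and each cluster is replaced by its mean, so fewer vectors are stored and compared at query time. This is an illustration under assumptions, not the benchmarked implementation: the function name `pool_tokens`, the `pool_factor` parameter, and the greedy cosine-similarity clustering are all hypothetical choices made for the example.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def _dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def pool_tokens(embeddings, pool_factor=2):
    """Reduce a document's token embeddings to about len/pool_factor vectors.

    Hypothetical sketch: repeatedly merge the two clusters whose centroids
    are most similar, then return one unit-length mean vector per cluster.
    """
    if pool_factor < 1:
        raise ValueError("pool_factor must be >= 1")
    if not embeddings:
        return []

    target = max(1, math.ceil(len(embeddings) / pool_factor))
    # Start with one cluster per token embedding.
    clusters = [[_normalize(vec)] for vec in embeddings]

    while len(clusters) > target:
        centroids = [_normalize(_mean(cluster)) for cluster in clusters]
        best_pair, best_sim = None, -math.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = _dot(centroids[i], centroids[j])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters[j])
        del clusters[j]

    return [_normalize(_mean(cluster)) for cluster in clusters]


# Four toy token embeddings: two point roughly along x, two roughly along y.
tokens = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
pooled = pool_tokens(tokens, pool_factor=2)
print(len(tokens), "->", len(pooled), "vectors")
```

With `pool_factor=2` the number of stored vectors is halved, which is where the latency savings would come from: late-interaction scoring compares every query token against every document vector, so fewer document vectors means fewer comparisons. The pairwise search above is quadratic per merge and only suitable for illustration.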