• by mkw5053 on 2/4/2025, 7:59:09 PM

    This reminds me of the Agent Workflow Memory (AWM) paper [1], which also tries to find optimal decision paths for LLM-based agents but relies on in-context learning, whereas DeepRAG fine-tunes models to decide when to retrieve external knowledge.

    I’ve been thinking about how modifying AWM to use fine-tuning or an external knowledge system (RAG) might work, capturing the ‘good’ workflows it discovers rather than relying purely on prompting.

    [1] https://arxiv.org/abs/2409.07429 - Agent Workflow Memory (Wang et al., 2024)

  • by brunohaid on 2/4/2025, 5:30:04 PM

    Noice!

    Does anyone have a good recommendation for a local dev setup that does something similar with available tools? I.e. one that incorporates a bunch of PDFs (~10,000 pages of datasheets) and other docs, as well as a curl-style importer?

    Trying to wean myself off the next tech molochs, ideally with local functionality similar to OpenAI's Search + Reason. I gave up on LangChain during my first attempt 6 months ago.
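    For anyone sketching such a setup, the retrieval core itself is small. Below is a minimal, stdlib-only illustration: chunk extracted text into overlapping word windows and rank chunks against a query by bag-of-words cosine similarity. All names and parameters here are illustrative, not any particular library's API; a real pipeline would swap in a PDF text extractor for the datasheets and an embedding model in place of the word-count vectors, but the ingest-chunk-retrieve shape stays the same.

```python
# Minimal local retrieval skeleton (illustration only, not a library API):
# split documents into overlapping chunks, score each chunk against a
# query with bag-of-words cosine similarity, return the top-k chunks.
import math
import re
from collections import Counter


def chunk(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def vectorize(text):
    """Lowercased word counts as a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)),
                  reverse=True)[:k]
```

    A curl-style importer would just feed fetched page text through the same chunk/retrieve path, so the only format-specific work is text extraction.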

  • by jondwillis on 2/4/2025, 5:28:37 PM

    The title reads awkwardly to a native English speaker. A search of the PDF for "latency" returns one result, discussing how naive RAG can result in latency. What are the latency impacts and other trade-offs to achieve the claimed "[improved] answer accuracy by 21.99%"? Is there any way that I could replicate these results without having to write my own implementation?