by zuck_vs_musk on 4/23/2025, 6:44:19 AM with 0 comments
We process data and documents from various sources, then:
- convert them to text (using different OCR engines)
- pass that text to LLMs; depending on the customer, this can be a cheaper model, and we do have model fallbacks
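To make the setup concrete, here is a minimal Python sketch of that flow. The names (`run_pipeline`, `OcrEngine`, `Model`) are made up for illustration and are not our actual code; the fallback here simply tries each model in order, which is an assumption about how fallbacks might be wired.

```python
from typing import Callable, Sequence

# Hypothetical type aliases: an OCR engine maps document bytes to text,
# and a model maps extracted text to an output string.
OcrEngine = Callable[[bytes], str]
Model = Callable[[str], str]


def run_pipeline(document: bytes, ocr: OcrEngine, models: Sequence[Model]) -> str:
    """Extract text with one OCR engine, then try each model in order."""
    text = ocr(document)
    errors = []
    for model in models:
        try:
            return model(text)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    # Every model in the fallback chain failed (or the chain was empty).
    raise RuntimeError(f"all {len(models)} models failed: {errors}")
```

The point of the sketch is that the OCR engine and the model chain are both swappable per customer, so any evaluation has to cover combinations rather than one fixed path.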
How do engineers evaluate such systems?
1. New models and new libraries are released all the time
2. Even a third party's deployed model will change over time and might improve or regress our systems
Any good approach for writing evaluations for these?
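For reference, one shape such an evaluation could take is a golden-set regression check. This is only a hypothetical starting point: `field_accuracy`, `has_regressed`, the dict-of-fields output format, and the 0.02 tolerance are all assumptions for illustration, not something we run today.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical golden set: (input text, expected field values) pairs.
GoldenCase = Tuple[str, Dict[str, str]]
Extractor = Callable[[str], Dict[str, str]]


def field_accuracy(extract: Extractor, cases: List[GoldenCase]) -> float:
    """Fraction of expected fields that the extractor reproduces exactly."""
    total = 0
    correct = 0
    for text, expected in cases:
        predicted = extract(text)
        for key, value in expected.items():
            total += 1
            if predicted.get(key) == value:
                correct += 1
    return correct / total if total else 0.0


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True if the candidate score is below the baseline by more than tolerance."""
    return candidate < baseline - tolerance
```

Rerunning the same golden set whenever an OCR library, a model version, or a third-party deployment changes would at least flag the regressions mentioned in points 1 and 2, though exact-match scoring is likely too strict for free-form outputs.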