by navaed01 on 5/28/2025, 11:08:00 AM with 0 comments
Something that’s been bothering me is observability with LLMs and how to check that they’re giving customers the right answer.
There seem to be multiple failure points: hallucinations, partial responses (missing facts), claiming information doesn’t exist, and accuracy that varies with how and what is being asked.
How are you measuring this in production today?
- Thumbs up/down seems like a weak signal
- Running a sample of ‘known queries’ assumes you know what is being asked
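For what it’s worth, the ‘known queries’ idea can at least be automated as a regression check. Below is a minimal sketch: a golden set of queries paired with facts the answer must contain, scored as a pass rate. The `ask_llm` function, the queries, and the expected facts are all placeholders for illustration, not a real API.

```python
# Minimal sketch of a "known queries" regression check.
# ask_llm is a stand-in for whatever call your stack actually makes;
# the queries and expected facts below are illustrative only.

KNOWN_QUERIES = [
    # (query, facts the answer must mention)
    ("What is your refund window?", ["30 days"]),
    ("Do you ship internationally?", ["yes", "customs"]),
]

def ask_llm(query: str) -> str:
    # Placeholder: swap in your real model/API call.
    canned = {
        "What is your refund window?": "Refunds are accepted within 30 days of purchase.",
        "Do you ship internationally?": "Yes, we ship worldwide; customs fees may apply.",
    }
    return canned[query]

def run_eval() -> float:
    """Return the fraction of known queries whose answer contains all expected facts."""
    passed = 0
    for query, facts in KNOWN_QUERIES:
        answer = ask_llm(query).lower()
        if all(fact.lower() in answer for fact in facts):
            passed += 1
    return passed / len(KNOWN_QUERIES)

if __name__ == "__main__":
    print(f"pass rate: {run_eval():.0%}")
```

Substring matching is obviously crude; the same harness can swap in an LLM-as-judge or semantic-similarity scorer, but it still only covers queries you anticipated, which is exactly the limitation called out above.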
What have you tried that works for you?