Aman, I watched the video you did with Peter. It was very helpful in confirming what I had suspected about the work involved in evaluating the quality of an LLM's outputs. Setting up evals takes time for teams that aren't organised or data-driven minded: defining the rubric and evaluation criteria (plus the debates), then analysing and scoring the outputs (plus more debates). Given that, how much time would you say it takes such a team to get a first version of their AI agent ready for an internal, closed-user-group release? I'm asking because my observation is that few organisations are rigorous enough for a subject matter expert to define quality criteria at a granular enough level to say "This is good/bad/average." I sense the setup work alone is going to take some time.
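To make concrete what I mean by the setup work, here is a minimal sketch of the kind of rubric I have in mind: a handful of SME-agreed binary criteria plus a scoring loop over sampled outputs. Everything here is a hypothetical placeholder (the criterion names, the `judge` callable, the demo output); in practice each criterion would come out of the debates I mentioned, and the judge would be a human labeller or an LLM-as-judge call.

```python
# Minimal sketch of a rubric + scoring loop (all names are hypothetical).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str       # short label, e.g. "no_fabrication"
    question: str   # the yes/no question the SME signed off on

# Hypothetical rubric for an internal support agent.
RUBRIC = [
    Criterion("answers_question", "Does the reply address the user's actual question?"),
    Criterion("no_fabrication",   "Is every factual claim supported by the provided context?"),
    Criterion("correct_tone",     "Is the tone consistent with the internal style guide?"),
]

def score_output(output: str, judge: Callable[[str, Criterion], bool]) -> dict:
    """Apply each criterion via a judge (human labeller or LLM-as-judge)."""
    results = {c.name: judge(output, c) for c in RUBRIC}
    results["pass"] = all(results.values())  # overall verdict is binary, not an average
    return results

if __name__ == "__main__":
    # Stand-in judge so the sketch runs; replace with SME labels or a model call.
    demo_judge = lambda output, criterion: len(output) > 0
    print(score_output("The refund policy allows returns within 30 days.", demo_judge))
```

In my experience, binary pass/fail per criterion shortens those debates compared with a 1-5 quality scale, since the SME only has to defend a yes/no line rather than the difference between a 3 and a 4.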