Binary AI evals and why we need more than a verdict
When we build AI evaluations, we want an objective way to tell whether an evaluator is "correct". Outputting a bare true or false verdict often isn't enough: we want to be able to trust the evaluation itself, and to communicate why that verdict was reached.
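As a minimal sketch of what "more than a verdict" could look like, an eval result can carry a rationale alongside the boolean. The names here (`EvalResult`, `evaluate_answer`) and the substring-matching check are illustrative assumptions, not a specific framework's API:

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """A verdict plus the context needed to trust and communicate it."""
    verdict: bool      # the binary pass/fail outcome
    rationale: str     # human-readable explanation of the verdict


def evaluate_answer(answer: str, expected: str) -> EvalResult:
    # Toy evaluator: case-insensitive substring check, with the
    # reasoning recorded so a reviewer can audit the verdict.
    matched = expected.lower() in answer.lower()
    status = "found" if matched else "not found"
    return EvalResult(
        verdict=matched,
        rationale=f"Expected substring {expected!r} {status} in answer.",
    )


result = evaluate_answer("Paris is the capital of France.", "Paris")
```

A consumer of this result can surface `rationale` to a human reviewer instead of (or alongside) the raw boolean.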