Avi Builds

Options for Moving STT and TTS Pipelines to Australia in April 2026

If you're running a voice AI pipeline on Deepgram, AssemblyAI, or Cartesia out of US servers, and someone asks whether you can move it to the Australian jurisdiction, how would you answer? I would say yes, you can, but your options will depend on your specific requirements. Deepgram,

Your LLM-As-A-Judge Might Be Lying to You

You build a pairwise eval pipeline, run it over your dataset, and get a clear winner. You ship the change. A few days later you pull the logs to double-check, and something doesn't add up: A beat B, B beat C, and somehow C also beat A.
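
Spotting a loop like that doesn't need anything fancy. Here's a minimal sketch, assuming each judge comparison is stored as a `(winner, loser)` tuple; `find_preference_cycles` is a hypothetical helper, not something from the post or a library. It checks every triple of candidates for a cycle in either direction.

```python
from itertools import combinations

def find_preference_cycles(results):
    """Return every 3-item cycle (a beats b, b beats c, c beats a).

    `results` is a list of (winner, loser) tuples, one per judge comparison.
    Hypothetical helper for illustration.
    """
    beats = set(results)
    items = sorted({x for pair in results for x in pair})
    cycles = []
    for a, b, c in combinations(items, 3):
        # A triangle can loop in two directions; check both.
        if (a, b) in beats and (b, c) in beats and (c, a) in beats:
            cycles.append((a, b, c))
        elif (a, c) in beats and (c, b) in beats and (b, a) in beats:
            cycles.append((a, c, b))
    return cycles

print(find_preference_cycles([("A", "B"), ("B", "C"), ("C", "A")]))
```

Any non-empty result means the judge's preferences can't be turned into a consistent ordering for those items.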

Inference Options for Hosting a Sovereign Voice AI Stack in Australia

I spent some time recently trying to answer a specific question: Can you run a production-grade voice AI inference stack on Australian sovereign hardware, and what does that actually look like in practice? For this article, I'm focusing specifically on inference, instead of the STT or TTS

Stop Scoring. Start Ranking.

If you’ve asked an LLM to rate a document on a scale of 1 to 10, you might have noticed something strange: the scores cluster together, with most responses landing between 3 and 7, and the sorted list you end up with is nearly useless. LLMs
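
One common way to get a ranking out of an LLM without asking for numbers is to compare items head to head and count wins. The sketch below assumes a hypothetical `prefer(a, b)` function that returns the better of two items (in practice, a wrapper around a judge call). It isn't necessarily the method the post uses, and a toy length-based preference stands in for the judge.

```python
from itertools import combinations

def rank_by_pairwise_wins(items, prefer):
    """Rank items by how many head-to-head comparisons each one wins.

    `prefer(a, b)` must return whichever of a or b is better. In a real
    pipeline it would wrap an LLM judge call (hypothetical here).
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[prefer(a, b)] += 1
    # Most wins first; sorted() is stable, so ties keep their input order.
    return sorted(items, key=lambda item: -wins[item])

# Toy stand-in for a judge: prefer the longer string.
def longer(a, b):
    return a if len(a) >= len(b) else b

print(rank_by_pairwise_wins(["bb", "a", "ccc"], longer))
```

Comparing every pair costs n(n-1)/2 judge calls, so for long lists you'd likely switch to a sort-based comparator or sample pairs instead.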

CLI vs MCP: Which one should you use in 2026?

Most developers adding tool access to their agents just pick one and move on. They add a few MCP servers, notice things feel slow or weird, and assume it's the model. It's not the model. The choice between CLIs and MCPs has a direct, measurable

How do you turn time with your Subject Matter Expert into a long-term asset?

Here is a situation that comes up more than it should. A company has an AI workflow running in production. Maybe it’s extracting information from construction defect reports, or answering client questions from a structured knowledge base. The output isn’t quite right yet, so they bring in a

Why your agent can confidently do the wrong thing

Last week I asked Gemini CLI for a Product Requirements Document (PRD) and got back a generic spec. It was confident, nicely formatted... yet utterly USELESS. In fact, I'd fixed this problem before. A few days ago, I'd already created a skill defining how to write

How I stopped fighting prompts and started building feedback loops

The strange thing about the internet is that it’s only possible because of one weird maths fact: some operations are expensive to do but cheap to check. Factoring a huge number into two primes is hard, but multiplying those primes back together and verifying you got the
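
To make the asymmetry concrete, here's a minimal sketch using trial division as a deliberately naive factoring method (real cryptographic numbers are far too large for this). Finding the factors takes thousands of loop iterations; checking a claimed answer takes one multiplication. Both function names are illustrative.

```python
def factor_semiprime(n):
    """Expensive direction: search for p, q with p * q == n by trial division."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n has no nontrivial factor")

def verify_factors(n, p, q):
    """Cheap direction: a single multiplication checks a claimed answer."""
    return p > 1 and q > 1 and p * q == n

n = 10007 * 10009           # product of two primes
p, q = factor_semiprime(n)  # ~10,000 loop iterations
print(p, q, verify_factors(n, p, q))
```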

Reducing AI-Slop with Logit Bias

In an internal experiment, outputs using logit bias were judged "less AI-like" than the baseline 74.5% of the time across 353 blind A/B evaluations. Logit bias is a vocabulary knob: you push specific tokens down (ban) or up (boost) during generation. It works on tokens,
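
Mechanically, logit bias is an offset added to a token's raw score before the softmax turns scores into probabilities. Here's a minimal sketch over a toy three-word vocabulary; the words and numbers are made up, and real APIs (OpenAI's `logit_bias` parameter, for example) take tokenizer token IDs rather than whole words, with bias values from -100 to 100.

```python
import math

def apply_logit_bias(logits, bias):
    """Add a per-token bias to raw logits, then softmax into probabilities.

    `logits` maps token -> raw score; `bias` maps token -> offset
    (negative to suppress, positive to boost). Unbiased tokens are unchanged.
    """
    adjusted = {tok: s + bias.get(tok, 0.0) for tok, s in logits.items()}
    # Subtract the max before exponentiating for numerical stability.
    peak = max(adjusted.values())
    exps = {tok: math.exp(s - peak) for tok, s in adjusted.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Toy vocabulary: "delve" is the most likely next word before biasing.
logits = {"delve": 2.0, "explore": 1.5, "dig": 1.0}
before = apply_logit_bias(logits, {})
after = apply_logit_bias(logits, {"delve": -100.0})  # effectively a ban
print(before)
print(after)
```

Because the bias is applied per token, a word that the tokenizer splits into several pieces needs each piece handled, which is one reason results depend on the tokenizer.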

RED framework for binary AI evals. Stop debugging in the dark

A PASS/FAIL eval isn’t actionable unless it tells you WHY the judge reached that verdict. Ensure every binary LLM-as-judge evaluation returns REASONING and EVIDENCE alongside the DECISION. RED Framework = (Reasoning + Evidence + Decision). This allows you to identify failures in seconds and improve your prompts and
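
Here's a minimal sketch of enforcing that contract in code, assuming the judge is prompted to reply with a JSON object containing `reasoning`, `evidence`, and `decision` keys. The field names, the `RedVerdict` class, and `parse_red_verdict` are my own illustration, not a fixed schema from the post. Rejecting replies with a missing or empty field means a bare PASS/FAIL never slips through.

```python
import json
from dataclasses import dataclass

@dataclass
class RedVerdict:
    reasoning: str  # why the judge reached its decision
    evidence: str   # quoted span from the output that supports the reasoning
    decision: str   # normalised to "PASS" or "FAIL"

def parse_red_verdict(raw: str) -> RedVerdict:
    """Parse a judge's JSON reply, rejecting any reply missing a RED field."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("judge reply must be a JSON object")
    missing = [k for k in ("reasoning", "evidence", "decision") if not data.get(k)]
    if missing:
        raise ValueError(f"judge reply missing fields: {missing}")
    decision = str(data["decision"]).strip().upper()
    if decision not in ("PASS", "FAIL"):
        raise ValueError(f"unexpected decision: {data['decision']!r}")
    return RedVerdict(str(data["reasoning"]), str(data["evidence"]), decision)

reply = ('{"reasoning": "The answer cites a clause that is not in the contract.", '
         '"evidence": "Clause 14.2", "decision": "fail"}')
print(parse_red_verdict(reply))
```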

I help builders ship production AI features that stay reliable, safe, and maintainable. Read about the practices behind non-deterministic systems, including evals, RAG, routing, and fine-tuning.
