Ruminations on all things veterinary hospital operations, from the makers and supporters of Instinct.

In 2018, an audio clip broke the internet. Roughly half of listeners heard “Yanny.” The rest heard “Laurel.” Same recording. Wildly different outputs.

What made it so disorienting was that both sides were right. The clip genuinely contained acoustic information consistent with both words. Everyone’s brain was making a defensible interpretation in the face of ambiguity. 

That’s not a quirk of one weird recording. It’s a feature of speech itself.

Why AI Transcription Struggles With Veterinary Terminology

When you speak, you’re producing a continuous stream of overlapping acoustic events. Linguists call this coarticulation: the way each sound shapes and blends into the ones around it. “Carprofen” sounds different at the start of a sentence than it does mid-dictation when you’ve already said four things and your mouth is doing three things at once.

This is part of why reading lips is so hard and why even human transcriptionists make errors. The signal itself is ambiguous. The brain, or the model, is always making a probabilistic guess about what was most likely said given everything it knows about context, vocabulary, and acoustic patterns.

For general-purpose AI, that “everything it knows” doesn’t include much about veterinary medicine. A model trained on general English speech has seen very few examples of words like oclacitinib or lokivetmab, or of phrases like “erythema AU” and “grade II/VI murmur.” When it hears something that could match either a familiar everyday word or an unfamiliar veterinary term, it tends to choose what it knows. That can look like a bug, but the model is really doing exactly what it was designed to do… just with the wrong vocabulary.
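Here is a minimal sketch of that “probabilistic guess,” written as a toy calculation in Python. Everything in it is an assumption for illustration: the candidate phrases, the probabilities, and the function name `pick_transcription` are invented, and real speech models are far more complex. It only shows how the same audio evidence can produce different transcripts depending on how familiar the model is with a word.

```python
# Toy example: every number here is made up for illustration,
# not taken from any real speech model.

def pick_transcription(acoustic_likelihood, prior):
    """Return the most probable candidate and the full posterior."""
    # Unnormalized posterior: P(audio | phrase) * P(phrase)
    scores = {p: acoustic_likelihood[p] * prior[p] for p in acoustic_likelihood}
    total = sum(scores.values())
    posterior = {p: s / total for p, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# The audio itself slightly favors the drug name...
acoustic = {"carprofen": 0.6, "car profit": 0.4}

# ...but a general-English model has almost never seen that word.
general_prior = {"carprofen": 0.0001, "car profit": 0.01}

# A model exposed to veterinary language expects it.
vet_prior = {"carprofen": 0.02, "car profit": 0.0005}

general_best, _ = pick_transcription(acoustic, general_prior)
vet_best, _ = pick_transcription(acoustic, vet_prior)

print("General model hears:", general_best)
print("Vet-aware model hears:", vet_best)
```

With these made-up numbers, the general model picks “car profit” even though the audio leans toward “carprofen,” because its prior for the drug name is so small. Nothing about the recording changed; only the vocabulary did.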

Veterinary Clinic Noise Makes Transcription Errors More Likely

On top of the fundamental ambiguity of speech, there’s the environment it’s captured in.

Veterinary clinics are acoustically rough. You might be crouched on the floor with your phone on the counter across the room. A technician may be asking the client about food history while you’re dictating physical exam findings. There’s a dog barking. There’s often more than one dog barking.

Microphone position matters more than most people realize. The difference between a phone sitting on a countertop two feet away and one in your pocket while you’re bent over a patient isn’t minor. It can be the difference between a clean recording and one where consonants blur together in exactly the ways that cause misreadings. 

“No murmur” and “new murmur” differ by one phoneme. Under good acoustic conditions, they’re easy to tell apart. Under clinic conditions, the margin gets narrower.

Speed matters too. Normal conversational speech runs somewhere between 120 and 180 words per minute. Clinical dictation during a busy appointment often runs faster. Function words, like “not,” “no,” and “without,” are short, unstressed, and easy to lose. Those are also the words that, if dropped, can invert the meaning of a finding.

Why Word Error Rate Misses What Matters in Veterinary AI Transcription

The standard metric for transcription quality is word error rate (WER): the number of substituted, dropped, and inserted words divided by the number of words actually spoken. It’s a useful benchmark, but it treats all errors as equal. A missed “and” counts the same as a miswritten drug name. In a medical record, those two errors are nowhere near equally costly.
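To make that concrete, here is a short Python sketch of the textbook WER calculation: word-level edit distance divided by the length of the reference. The function name `word_error_rate` and the example sentences are our own, and the sketch assumes text that has already been stripped of punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra word in the transcript (insertion)
                prev[j - 1] + cost,  # wrong word (substitution) or a match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "no murmur and started carprofen today"
dropped_and = "no murmur started carprofen today"
flipped_finding = "new murmur and started carprofen today"

print(f"{word_error_rate(reference, dropped_and):.3f}")
print(f"{word_error_rate(reference, flipped_finding):.3f}")
```

Both transcripts contain exactly one error in a six-word sentence, so both score the same WER. One of them is harmless. The other reverses a cardiac finding.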

ScribbleVet developed an internal benchmark called WER-VET specifically to measure accuracy on veterinary terminology, the words where errors have actual clinical implications. That distinction is why a general-purpose model that performs well on everyday speech can still produce notes that need substantial editing when applied to a COHAT or a complex multi-system exam.
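The details of WER-VET aren’t described here, so the following is not that benchmark. It’s a rough Python sketch of the general idea of scoring only the words that matter clinically. The term list `CLINICAL_TERMS`, the function name `clinical_term_miss_rate`, and the simple counting approach (which ignores word order) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical term list for illustration; a real one would be far larger.
# Short negation words are included because dropping them can flip a finding.
CLINICAL_TERMS = {
    "carprofen", "oclacitinib", "lokivetmab",
    "murmur", "erythema", "no", "not", "without",
}

def clinical_term_miss_rate(reference, hypothesis, terms=CLINICAL_TERMS):
    """Fraction of clinical-term occurrences in the reference that the
    transcript failed to reproduce. Ignores word order (a simplification)."""
    ref_counts = Counter(w for w in reference.lower().split() if w in terms)
    hyp_counts = Counter(w for w in hypothesis.lower().split() if w in terms)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0  # nothing clinically important to get wrong
    missed = sum(max(0, n - hyp_counts[t]) for t, n in ref_counts.items())
    return missed / total


reference = "no murmur and started carprofen today"
dropped_and = "no murmur started carprofen today"
flipped_finding = "new murmur and started carprofen today"

print(f"{clinical_term_miss_rate(reference, dropped_and):.3f}")
print(f"{clinical_term_miss_rate(reference, flipped_finding):.3f}")
```

Under this sketch, the transcript that dropped “and” scores a perfect 0, while the one that turned “no murmur” into “new murmur” misses one of three clinical terms. A terminology-focused score separates two errors that plain WER can’t tell apart.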

No AI scribe gets it right every time. Audio ambiguity doesn’t have a complete solution. What matters is whether the output is accurate enough that errors are exceptions you can catch on review rather than a pattern you’re constantly correcting for.

How to Fix Repeated Veterinary AI Transcription Errors in ScribbleVet

If you’re seeing consistent errors on specific terms, like a medication name, an abbreviation you use often, or a template field that keeps populating incorrectly, that’s usually a fixable pattern. A few things worth trying:

Recording position. Closer is almost always better. If you’re dictating after stepping out of the exam room, holding the phone near your face makes a meaningful difference.

SmartFix. If ScribbleVet missed a detail, highlight the section and pull it directly from the recording. You don’t have to retype from scratch.

Flagging errors in-app. When you see a consistent misread on a specific term, you can flag it within ScribbleVet. That feedback is shared with the team and incorporated into model training. It’s genuinely how the veterinary vocabulary improves over time.

Ready to spend less time writing notes? Claim your 14-day free trial and try ScribbleVet on your next appointment!
