
When AI Should Say “I Don’t Know”: The Ethics of Honesty in Conversational AI


By Michelle Collins, Chief Operating Officer, CodeBaby

A few weeks ago, I asked one of the latest large language models a question about health insurance. The answer came back confident, polished, and completely wrong. Not dangerously wrong, but confidently inaccurate in a way that made me pause.

The scary part? If I hadn’t already known the correct answer, I probably wouldn’t have noticed.

That’s when it hit me. For all the progress we’ve made in making AI sound more human, we still haven’t taught it one of the most human skills of all: how to admit when it doesn’t know.

The Problem with Confidently Wrong

One of the biggest design challenges in conversational AI is what I call the illusion of certainty.

When a digital human or chatbot responds smoothly and fluently, our brains instinctively trust it, even if the content underneath isn’t right.

That’s not just a UX problem. That’s an ethical problem.

In sectors like education, healthcare, and workforce training (where CodeBaby’s avatars are often used), confidence without accuracy isn’t helpful. It’s misleading. And when people are making real decisions based on those answers, misleading quickly turns into harmful.

AI shouldn’t be rewarded for bluffing.

Why Honesty Builds More Trust Than Perfection

It’s funny. When we think about “trustworthy” technology, we tend to imagine systems that are flawless. But that’s not how trust actually works. Trust is built through transparency, not perfection.

When you talk to a colleague who occasionally says, “That’s a great question. I’m not sure, but let me find out,” you don’t lose confidence in them. You gain it. Because they’re being real with you.

AI needs to do the same thing. A well-designed system should be able to say some version of, “I don’t know that, but here’s what I can tell you,” or “I’m not certain. Would you like me to connect you with a human expert?” That’s not a failure. That’s integrity.

Designing for Uncertainty

The tricky part is that large language models are trained to predict, not to know. They generate the most probable next word, not the most accurate fact. So the burden falls on designers and developers to build in honesty by design.
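To make that concrete, consider what a “confidence score” can even mean for a model that only predicts next words. Many model APIs expose per-token log-probabilities, and one rough proxy is the geometric mean probability of the generated tokens. The sketch below is a simplified illustration under that assumption, not CodeBaby’s production method, and the logprob values in it are made up:

```python
import math

def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """Rough confidence proxy: the geometric mean probability of the
    generated tokens. Fluent text can still score high while being
    wrong, so treat this as one signal among several, not ground truth."""
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Hypothetical per-token log-probabilities from a model response:
print(round(confidence_from_logprobs([-0.05, -0.2, -1.8, -0.1]), 2))  # 0.58
```

A proxy like this is crude, but it gives designers something measurable to hang a policy on, which is where honesty by design starts.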

At CodeBaby, that means we think deeply about how our avatars respond in moments of uncertainty. Some of the design choices we focus on (the sketch after this list pulls them together):

Setting clear confidence thresholds. If the model’s confidence score is low, it should rephrase or defer, not just guess.

Context-aware fallback responses. “I’m not sure” doesn’t have to sound robotic. It can sound natural, empathetic, and human: “That’s a really good question. I don’t have that answer right now, but let me show you where you can find it.”

Escalation paths to humans. Especially in healthcare or education, the ethical response isn’t to “fake it till you make it.” It’s to connect the user to someone who truly knows.
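Here is a minimal sketch of how those three choices might combine into a confidence-gated response flow. It is an illustration under stated assumptions, not our production code: the threshold value, the ModelAnswer shape, and the domain names are all hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # hypothetical; tune per domain and stakes

@dataclass
class ModelAnswer:
    text: str
    confidence: float  # e.g., the logprob proxy sketched earlier

def respond(answer: ModelAnswer, domain: str) -> str:
    # Confident enough: answer directly.
    if answer.confidence >= CONFIDENCE_THRESHOLD:
        return answer.text
    # Low confidence in a high-stakes domain: escalate to a person.
    if domain in {"healthcare", "education"}:
        return ("I'm not certain about that one. Would you like me to "
                "connect you with a human expert?")
    # Otherwise: a natural, honest fallback instead of a guess.
    return ("That's a really good question. I don't have that answer "
            "right now, but let me show you where you can find it.")

# A low-confidence answer in a healthcare context gets deferred:
print(respond(ModelAnswer("Your plan covers that procedure.", 0.42),
              "healthcare"))
```

Notice that nothing here requires the model to be smarter; it only requires the system around the model to be honest about what the model’s signals are saying.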

Honesty can be designed in. It just takes intention.

The Real Cost of Overconfidence

Here’s the uncomfortable truth: a lot of organizations would rather their AI never say “I don’t know.” They think it makes the technology look weak or incomplete. But forcing AI to always have an answer isn’t strength. It’s hubris.

Overconfidence leads to misinformation, brand damage, and broken trust. Once users catch an AI being confidently wrong, it’s almost impossible to rebuild that trust.

In contrast, AI that admits uncertainty actually increases user confidence over time. Because people recognize humility. They recognize honesty. And that makes the interaction feel safe, even if the answer isn’t perfect.

The Ethics of “I Don’t Know”

There’s something deeply human about admitting you don’t know something. It shows self-awareness, humility, and a respect for truth.

AI, by contrast, doesn’t “know” in the human sense at all. It synthesizes. Which means the ethical responsibility sits with the people building, deploying, and training these systems to decide when silence is smarter than speculation.

If we want conversational AI to be genuinely trustworthy, we have to normalize uncertainty. We have to build models and frameworks that let digital humans pause, redirect, or defer when the facts aren’t solid. Because honesty isn’t weakness. It’s what keeps technology human.

Building Digital Humans That Tell the Truth

At CodeBaby, we talk a lot about empathy, trust, and psychological safety. But those ideas only mean something if they show up in how the technology behaves. That’s why we’re designing our avatars to be not just emotionally intelligent, but ethically intelligent, capable of recognizing when to stop talking and start helping.

I’ll be honest: getting our LLM avatars to reliably say “I don’t know” took real development effort. It’s not the default behavior of these models. But that work has meant everything to our users and to the confidence our clients have in the information our avatars provide. Because when they do get an answer, they know they can trust it.

Sometimes that means answering the question. Sometimes it means saying, “I don’t know.” And sometimes it means connecting you to a real human being who does.

That’s what honest AI looks like. And honestly, it’s the only kind that deserves our trust.