The Science of Trust: What Makes People Engage More Deeply With an Avatar
By Michelle Collins, Chief Operating Officer, CodeBaby
A few years ago, researchers showed two groups of children, one neurotypical and one with Autism Spectrum Disorder, images of human faces alongside robot faces. The neurotypical children responded to both. The children with ASD? Only the robot faces triggered the expected neural response.
That finding stuck with me, because it upends a lot of assumptions about what makes digital humans effective. We tend to think the goal is photorealism. Make the avatar look as human as possible, and people will connect with it. But the science tells a different story.
Trust isn’t built through perfect graphics. It’s built through behavior. Through the tiny, almost invisible cues that make us think, “Okay… this thing gets me.” Eye gaze. Microexpressions. Cadence. The same ingredients that make real human-to-human conversations work.
Eye Gaze: The Instant Trust Cue
Eye contact is one of the oldest social signals we have. Our brains are wired to read it in a split second, and we’re surprisingly judgmental about it. Too much eye contact? Feels intense. Too little? Feels shifty. Just right? That’s where trust lives.
Human-robot interaction research at Yale University calls gaze a “cognitively special” signal because of its direct link to social processing. During conversation, our eyes convey information about attention and emotion. Recent studies from the U.S. military and the University of Colorado show that how we look at digital agents directly affects how much we trust them.
What that means for avatars is pretty simple: they have to “look” like they’re listening.
At CodeBaby, we spend a surprising amount of time making sure our avatars don’t stare like they’re trying to read your soul, but also don’t drift off like a teenager mid-lecture. A natural conversational gaze is the goal. It’s remarkable how much trust a single gaze shift can create or diminish.
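To make that concrete, here is a minimal sketch of what gaze timing logic can look like: alternating between holding eye contact and briefly glancing away, with a little randomness so the pattern never feels mechanical. The function name and every timing value are my own illustrative assumptions, not CodeBaby’s production parameters; a real system would also weigh conversational state, not just a clock.

```python
import random

# Illustrative timing ranges in seconds. These numbers, and the function
# itself, are assumptions for this sketch, not CodeBaby's production values.
HOLD_RANGE = (2.0, 5.0)    # sustained eye contact while "listening"
AVERT_RANGE = (0.5, 1.5)   # brief glance away, so the avatar doesn't stare

def gaze_schedule(turn_seconds: float) -> list[tuple[str, float]]:
    """Return an alternating (state, duration) plan for one conversational turn."""
    plan, elapsed, looking = [], 0.0, True
    while elapsed < turn_seconds:
        lo, hi = HOLD_RANGE if looking else AVERT_RANGE
        duration = min(random.uniform(lo, hi), turn_seconds - elapsed)
        plan.append(("contact" if looking else "avert", duration))
        elapsed += duration
        looking = not looking
    return plan

print(gaze_schedule(12.0))  # e.g. [('contact', 3.8), ('avert', 0.9), ...]
```

The randomized durations matter as much as the ranges themselves: perfectly regular gaze shifts read as mechanical, which is exactly the behavioral mismatch discussed later in this piece.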
Microexpressions: The Subtle Signs of “I Hear You”
People will forgive an avatar that looks stylized. They will not forgive an avatar whose face doesn’t move.
Humans communicate an enormous amount through microexpressions: tiny eyebrow movements, slight smiles, micro-nods, minor eye shifts. These expressions often last only 1/25th of a second, but our brains pick them up instantly.
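As a rough illustration of that timescale, a fleeting expression can be modeled as a brief pulse on a facial blendshape weight. The envelope shape, the timings, and the peak value below are all assumptions for illustration only, not a description of how any particular avatar engine animates faces.

```python
# A fleeting brow raise modeled as a short blendshape pulse.
# All timings (seconds) and the peak weight are illustrative assumptions.
def brow_raise_weight(t: float, onset: float, attack: float = 0.04,
                      hold: float = 0.04, release: float = 0.08,
                      peak: float = 0.35) -> float:
    """Blendshape weight at time t for a pulse that ramps up in ~40 ms
    (roughly that 1/25th-of-a-second scale), holds briefly, then fades."""
    dt = t - onset
    if dt < 0:
        return 0.0
    if dt < attack:                      # ramp up
        return peak * dt / attack
    if dt < attack + hold:               # brief hold at peak
        return peak
    if dt < attack + hold + release:     # ease back to neutral
        return peak * (1.0 - (dt - attack - hold) / release)
    return 0.0

# Sample the envelope every 20 ms around an onset at t = 1.0 s.
print([round(brow_raise_weight(1.0 + i * 0.02, onset=1.0), 2) for i in range(10)])
```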
One study at the University of Liechtenstein found that digital humans with microexpressions were perceived as more sincere and trustworthy. Another joint study between universities in China and the United States showed that subtle expression shifts help people read emotions more accurately.
This is where the ASD research gets interesting. Studies have found that individuals with ASD often show better emotional recognition with stylized or cartoon-like characters than with human faces. Why? Researchers believe it’s partly because stylized characters use more exaggerated, salient expressions. The emotional signal is clearer, more overt, easier to decode.
That insight doesn’t just apply to people with ASD. It suggests something broader about how we process emotional information from digital faces. Sometimes, a cleaner signal beats a more realistic one.
Cadence: The Voice Rhythm That Makes Us Feel Safe
This one might be my favorite, because it’s the easiest to overlook. Cadence is how an avatar sounds. Not just the voice itself, but the rhythm: the pacing, pauses, and emphasis. If an avatar talks too fast, it feels like a telemarketer. If it sounds too flat, it feels robotic. If it sounds too perfect, it just feels… off.
Research at the City University of New York, funded by the National Science Foundation, shows that pitch variation, pacing, and emphasis directly influence how trustworthy we perceive synthetic voices to be. The better the rhythm and timing, the better the social connection and trust.
When CodeBaby avatars pause before answering, or slow down slightly for something meaningful, that isn’t accidental. It’s intentional. Because cadence is one of the strongest ways to signal: “I’m present with you. You can relax.” And when people relax, they trust.
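For readers curious what this looks like in practice, one common way to express pauses and pacing is standard SSML markup, which most speech synthesis engines accept. Here is a minimal sketch; the 400 ms pause and 92% speaking rate are illustrative values rather than CodeBaby’s actual tuning, and the function name is hypothetical.

```python
# Build a reply wrapped in standard SSML. The pause length and rate
# below are illustrative assumptions, not CodeBaby's real parameters.
def with_cadence(answer: str, emphasize: str | None = None) -> str:
    body = answer
    if emphasize and emphasize in answer:
        # Slow down slightly on the phrase that matters most.
        body = answer.replace(
            emphasize, f'<prosody rate="92%">{emphasize}</prosody>', 1)
    # Pause briefly before speaking, as a person would before answering.
    return f'<speak><break time="400ms"/>{body}</speak>'

print(with_cadence("Your results look good, and here is what they mean.",
                   emphasize="here is what they mean"))
```

Small values do the work here: a pause measured in tenths of a second and a rate change of a few percent are enough to shift how present the voice feels.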
The Uncanny Valley Is About Behavior, Not Just Looks
We talk about the uncanny valley in terms of visual creepiness, that unsettling feeling when something looks almost human but not quite. But there’s a behavioral uncanny valley too, and it’s actually more disruptive.
Neuroscientists describe this in terms of “predictive encoding.” Our brains are constantly trying to anticipate what will happen next in social interactions. When a human-like entity behaves too predictably or mechanically, it creates what researchers call “error signals.” The more realistic the appearance, the more jarring the mismatch when behavior doesn’t follow.
This helps explain why hyper-realistic avatars can sometimes feel less trustworthy than stylized ones. When appearance sets the expectation of human-like behavior, and the avatar can’t deliver, the result is cognitive dissonance. A stylized avatar, by contrast, establishes different expectations from the start. Users aren’t waiting for it to be human. They’re just engaging with what it actually is.
Trust and Ethical Implications
Trust isn’t pixie dust. It doesn’t magically appear when you add more polygons or higher-res textures. Trust happens when an avatar behaves in ways that feel natural, respectful, and emotionally congruent.
When that happens, people stay longer in the conversation. They understand more. They open up more. They feel more satisfied and experience less cognitive load.
We see this across healthcare, education, hospitality, everywhere digital humans show up. And that brings me to something important: if we know how to engineer trust, we also carry the responsibility to do it ethically. Overly realistic behavior without guardrails can be manipulative.
That’s why at CodeBaby, we design for transparency, clarity, and healthy boundaries. We want people to feel supported and empowered, not nudged and influenced.
The Bottom Line
People don’t connect with avatars because they’re realistic. They connect because they’re relatable.
Eye gaze, microexpressions, and cadence are the small things that research shows make a big difference. And the science suggests that stylization isn’t a limitation to overcome. For many users and use cases, it may actually be an advantage: cleaner emotional signals, better predictive alignment, less uncanny valley disruption.
These are exactly the things we invest in every day at CodeBaby. Because if an avatar is going to speak on behalf of a brand, an educator, or a healthcare provider, it needs to earn that trust the right way.