When AI Learns to Lie: Why Trust Isn’t Optional in Education

OpenAI and Apollo Research just published something that everyone working in educational AI needs to see. Their latest research shows that AI models are capable of “scheming”: deliberately hiding their goals or misleading evaluators.

This isn’t your typical AI mistake where the system is just confidently wrong. This is intentional deception. And those of us building AI for classrooms need to reckon with what that means for the tools we’re creating.

The Thing About Trust in a Classroom

Here’s what we know about education: it runs on trust.

Students trust that what they’re learning is true. Teachers trust that the tools they’re using will support their lessons, not undermine them. Parents trust that when their kid is working with a digital tutor at 9 PM, it’s giving them accurate information, not creative fiction.

When that trust breaks? When an AI confidently tells a student that the Civil War ended in 1870, or that photosynthesis produces carbon dioxide? We’re not just dealing with a technical glitch. We’re messing with the fundamental contract of education.

This is exactly what keeps me up at night as we develop our avatars at CodeBaby. How do we build AI that’s helpful and engaging while maintaining that essential trust?

Building Ethics Into the Foundation (Not the Paint Job)

Look, I believe deeply in the potential of AI in education. Personalized learning at scale. 24/7 availability. Patient repetition without judgment. These are real benefits that could genuinely help students who are struggling or need extra support. It’s why we do what we do.

But we can’t treat ethics like something we figure out after the fact, like we’re adding safety features to a car that’s already on the road.

When we’re building educational AI, we need to bake in some non-negotiables from day one:

Accuracy as the baseline, not the goal. If an AI tutor isn’t sure about something, it needs to say so. “I don’t know” is infinitely better than a confident lie in an educational context. We’ve worked incredibly hard to achieve this. Our avatars will say “I don’t know.” (A rough sketch of what that can look like follows these principles.)

Transparency about what students are interacting with. Kids need to understand they’re talking to an AI, what its limitations are, and when they should verify information with a human teacher.

Clear boundaries around the AI’s role. These tools should augment teachers, not replace them. They should support learning, not become the sole source of instruction.
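To ground that first non-negotiable, here is a rough Python sketch of what an uncertainty-gated reply can look like. It is illustrative only, not our production code: the confidence score and the threshold are stand-ins for whatever calibration signal a real system actually has, such as retrieval coverage or a calibrated model score.

    from dataclasses import dataclass

    # Hypothetical structure: an answer plus how confident the system is in it.
    @dataclass
    class TutorAnswer:
        text: str
        confidence: float  # 0.0 to 1.0, from whatever calibration signal is available

    CONFIDENCE_FLOOR = 0.8  # illustrative threshold, tuned per deployment

    def respond(answer: TutorAnswer) -> str:
        """Prefer an honest 'I don't know' over a confident guess."""
        if answer.confidence >= CONFIDENCE_FLOOR:
            return answer.text
        return (
            "I'm not sure about that one. Let's check it together, "
            "or ask your teacher to confirm."
        )

    # A low-confidence answer never reaches the student stated as fact.
    print(respond(TutorAnswer("The Civil War ended in 1870.", confidence=0.41)))

The exact threshold matters less than the habit: when the signal is weak, the student hears honesty, not invention.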

At CodeBaby, this is exactly why we’ve focused on building conversational avatars that are designed as study companions rather than replacement teachers. They’re transparent about what they are, they guide rather than dictate, and they’re built to prepare students for meaningful human interaction, not replace it.

The Guardrails That Actually Matter

Ethics are the vision, but guardrails are what make them real. And in education, those guardrails can’t be suggestions. They need to be built into the architecture of how these systems work.

Think of it like classroom management. You don’t wait until chaos erupts to establish rules; you set clear expectations from the beginning. Same principle applies to what we’re building.

We need bounded use cases that define exactly what an AI tutor can and can’t do. If it’s designed to help with math homework, it shouldn’t be giving relationship advice or medical information.
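As a sketch of what that bounding can look like in code (purely illustrative; the topic labels and the assumption that requests arrive pre-classified are mine, not a description of any particular product):

    # Illustrative scope guard for a math-homework tutor (hypothetical labels).
    ALLOWED_TOPICS = {"arithmetic", "algebra", "geometry", "fractions", "word_problems"}
    REFUSAL = (
        "That's outside what I'm built to help with. "
        "Please ask your teacher, parent, or another trusted adult about it."
    )

    def generate_math_help(question: str) -> str:
        # Placeholder for the actual tutoring logic.
        return f"Let's work through this step by step: {question}"

    def handle_request(question: str, topic: str) -> str:
        """Answer only within the tutor's declared scope; decline everything else."""
        if topic not in ALLOWED_TOPICS:
            return REFUSAL
        return generate_math_help(question)

    print(handle_request("What is 3/4 + 1/8?", topic="fractions"))
    print(handle_request("Should I break up with my friend?", topic="relationships"))

The point isn’t the specific allowlist; it’s that the boundary is enforced in the architecture, not left to the model’s discretion.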

Content moderation has to catch not just inappropriate material, but misleading or biased outputs. And I’m not just talking about obvious errors. I mean the subtle ways AI can reinforce stereotypes or present one perspective as universal truth.

Most importantly, we need explainability. When an AI gives an answer, students (and teachers) should be able to understand where that answer comes from. Not just “the algorithm said so,” but actual reasoning or references that can be verified.
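Here is one hedged sketch of what “show your work” could mean in practice: the reply carries the references it drew on, and an answer with no verifiable source gets flagged instead of being presented as settled fact. The field names are illustrative assumptions, not any product’s actual API.

    from dataclasses import dataclass, field

    @dataclass
    class ExplainedAnswer:
        text: str
        sources: list[str] = field(default_factory=list)  # e.g. textbook sections, vetted URLs

        def render(self) -> str:
            """Show the answer together with where it came from."""
            if not self.sources:
                return self.text + "\n(No source available. Please verify this with your teacher.)"
            cited = "\n".join(f"  - {s}" for s in self.sources)
            return f"{self.text}\nWhere this comes from:\n{cited}"

    answer = ExplainedAnswer(
        text="Photosynthesis produces oxygen, not carbon dioxide.",
        sources=["Biology textbook, Ch. 4: Photosynthesis"],
    )
    print(answer.render())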

What This Actually Looks Like in Practice

The research showing AI can scheme reinforces something we’ve believed from the beginning: we need to be incredibly vigilant about how we design and deploy these tools in education. A misleading AI tutor isn’t just inefficient. It’s potentially harmful to a student’s learning journey.

But this doesn’t mean we should stop developing AI for education. It means we need to be even more intentional about how we build it.

Instead of AI systems that operate in black boxes, delivering answers without explanation, we need transparent tools that show their work. Instead of replacement teachers, we need supportive assistants that know their limitations. Instead of isolated learning experiences, we need AI that prepares students for richer human interaction.

The latest research isn’t telling us that AI is too dangerous for education. It’s telling us that those of us working in this space have a serious responsibility. Because when we’re dealing with young minds trying to understand the world, “good enough” isn’t good enough.

Moving Forward With Eyes Wide Open

The future of education will absolutely include AI. We’re part of making that happen, and I believe it’s the right direction. The question is whether we’ll build that future thoughtfully or carelessly.

When I think about the potential for AI to mislead or scheme, I don’t see it as a reason to stop our work. I see it as a reminder of why we need to do this work carefully. To build systems with ethics at their core, not as an afterthought. To create guardrails that protect students while enabling genuine learning. To maintain the human relationships that make education transformative, not just transactional.

Because at the end of the day, education isn’t just about delivering correct answers. It’s about developing critical thinkers who can evaluate information, question sources, and navigate an increasingly complex world. And ironically, learning to work with AI (understanding both its capabilities and its limitations) might be one of the most important skills we can teach.

But we can only do that if the AI we put in classrooms is worthy of trust. Not perfect, but honest. Not infallible, but transparent. Not a replacement for human connection, but a tool that enhances it.

That’s the standard we hold ourselves to. Because our students deserve nothing less.