The Hippocratic Oath AI Needs

The Hippocratic Oath is best known for “do no harm,” but that principle rests on a less familiar declaration from the oath’s modern version: “I will not be ashamed to say ‘I know not.’”

You can’t “do no harm” if you can’t grasp — or won’t admit — what you don’t know.

Practicing safely requires two complex traits: 1) metacognition (thinking about your own thinking), and 2) the humble disposition to volunteer the limits of your knowledge.

These traits separate the best human experts from the charlatans. They are also traits that AI distinctly lacks. They must be engineered into AI systems and into how we deploy them. As we’ll see, this is no small feat.

The topic is on many minds in the industry because of a recent piece in the New England Journal of Medicine examining how well AI meets the oath’s “metacognitive standard.” In one study the piece cites, LLMs presented with scenarios containing a single fabricated detail accepted and amplified the falsehood between 50% and 82% of the time.

In another study, a Pokémon character’s name was added to a medication list. Most models ran with it and added dosing: Pikachu, 50mg daily.

Apparently, AI is not ashamed, and it does not know what it does not know.

The dynamics of shame & the machine mechanism

“Ashamed” is a key word here, and it shows how emotional intelligence helps cultivate professional excellence. A resident who admits uncertainty in front of an attending risks professional exposure: the judgment of peers and seniors. But she is backed by a clinical culture that reinforces “epistemic humility” (knowing what you don’t know, and being forthright about it). Better to self-regulate in the face of uncertainty than to make a serious mistake.

AI has no such mechanism. It is not a social creature and suffers no social consequences. LLMs are shameless by design. They can receive feedback, but they don’t lose sleep over being wrong. They never go through the feedback loop humans experience when we gamble our reputation on admitting ignorance. Sometimes such admissions are well received (as evidence of a responsible disposition); at other times we are punished for them (when others perceive unacceptable gaps in our knowledge). This accumulated experience calibrates our metacognitive radar. As a result, we develop a finely attuned sense of the limits of our knowledge, and of whether those limits are acceptable for a given context or problem.

Any AI user knows the style-over-substance problem: fluent, well-structured outputs that can conceal a shaky factual foundation. But the confidence problem runs deeper than fluency. Researchers have now located overconfidence mechanistically: specific circuits, concentrated in a model’s middle-to-late layers, that consistently write a confidence-inflation signal into the output regardless of whether the answer is correct. The silver lining, as the researchers note, is that “targeted inference-time interventions on these circuits substantially improve calibration.” In other words, the problem can be corrected, at least in part.

Further complicating matters is the training process we use to make models “better,” where better means “human-preferred.” RLHF (reinforcement learning from human feedback), or a close variant of it, is part of the fine-tuning behind virtually every major model. No surprise to students of human nature: we tend to prefer confident-sounding responses, so the training signal tends to reward sounding certain.

Models can introspect, to a point, but they are not reliable narrators. When a model says it is 95% certain, we should place far less than 95% confidence in that self-assessment. A 2025 study in the Journal of Medical Internet Research tested nine LLMs on US Medical Licensing Exam questions. Every model expressed near-maximum confidence regardless of whether its answer was correct, and self-reported certainty was barely better than a coin flip at predicting accuracy. The better signal was token probability: the model’s internal, unreported estimate of how likely each token (roughly, a word or word fragment) is to appear next in a sequence. If we apply a mind-brain metaphor to silicon, the model’s “unconscious” token probabilities are more honest than its “conscious,” stated confidence. What LLMs tell us they know and what they actually know can be two very different things.
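
To make the token-probability idea concrete, here is a minimal sketch that collapses per-token log-probabilities (which many model APIs can return alongside generated text) into a single sequence-level score: the geometric mean of the token probabilities. The function name and the log-prob values are hypothetical illustrations, not the method used in the JMIR study.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities, i.e. exp(mean log-probability)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probs for two answers the model might *state* with equal confidence.
confident = [-0.01, -0.02, -0.05, -0.01]
shaky = [-0.10, -1.60, -2.30, -0.40]

print(round(sequence_confidence(confident), 3))  # 0.978
print(round(sequence_confidence(shaky), 3))      # 0.333
```

The geometric mean is one common choice because, unlike a raw product of probabilities, it does not automatically penalize longer answers. Whether it predicts correctness well for a particular clinical task is something to measure, not assume.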

This is the wall clinical AI eventually hits: models architecturally disposed to sound right even when they hallucinate. To break through it, we must change how we build.

What does a Hippocratic oath for AI look like, operationally?

At Rain Stella Technologies we build AI agents across varied clinical contexts, from surgical workflow management to clinical insights for patient consultations to claims analysis for revenue cycle optimization, to name a few. The uncertainty problems look different in each. There is no universal architectural fix, only a set of context-specific calibrations.

Here is the framework we use to build “epistemic humility” into clinical AI:

Plot use cases on critical axes: risk and reasoning complexity

Before building anything, place your agent on two axes: how high the stakes are if it’s wrong, and how much open-ended reasoning it requires. A summarization agent working from a well-labeled, structured payload sits in a different quadrant from a decision-support agent synthesizing unstructured clinical notes from multiple sources. The acceptable uncertainty behavior (how much hedging is appropriate, what triggers a flag) differs in each. A summary agent that qualifies every sentence defeats its own purpose. A decision-support agent that doesn’t qualify conflicting evidence is downright dangerous.
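
One way to make the two-axis exercise explicit is to encode each quadrant’s uncertainty policy as configuration that agents are built against. This sketch is hypothetical: the quadrant keys, thresholds, and policy fields are illustrative placeholders, not recommended settings.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class UncertaintyPolicy:
    hedge_inline: bool          # qualify individual statements in the output?
    flag_threshold: float       # confidence below this raises a flag
    require_human_review: bool  # must a person sign off before use?


# Hypothetical policy table keyed by (stakes, reasoning complexity).
POLICIES = {
    (Level.LOW, Level.LOW): UncertaintyPolicy(False, 0.50, False),   # e.g. structured summary
    (Level.LOW, Level.HIGH): UncertaintyPolicy(True, 0.60, False),
    (Level.HIGH, Level.LOW): UncertaintyPolicy(False, 0.80, True),
    (Level.HIGH, Level.HIGH): UncertaintyPolicy(True, 0.90, True),   # e.g. decision support
}


def policy_for(stakes: Level, complexity: Level) -> UncertaintyPolicy:
    """Look up the uncertainty behavior for an agent's quadrant."""
    return POLICIES[(stakes, complexity)]


summary = policy_for(Level.LOW, Level.LOW)
decision_support = policy_for(Level.HIGH, Level.HIGH)
print(summary.hedge_inline, decision_support.hedge_inline)  # False True
```

Keeping the policy in data rather than scattered through prompts makes it easier to review and audit when an agent’s placement on the axes changes.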

Move beyond prompting to structural architecture

Generic uncertainty prompts such as “flag when you’re not sure” or “say I don’t know when evidence conflicts” can produce real but marginal gains. Research presented at EMNLP 2025 found that standard prompting approaches yield only slight improvements in faithful calibration, and that some calibration techniques make it worse. Prompts designed specifically around metacognition work better: they ask the model to reason about the edges of its own knowledge before generating an answer. Even that approach has a ceiling. You can engineer a model to express uncertainty more faithfully, but you cannot prompt it into actually knowing its epistemic limits.

For high-stakes outputs, structural redundancy is a decent answer. Disagreement between agents is a reliable uncertainty signal, and building those agents on different foundation models injects further diversity into their reasoning. A 2025 review found that multi-agent frameworks reduce hallucinations precisely because cross-validation between specialized agents surfaces inconsistencies that no single model would flag about itself (though the same review notes that these approaches remain computationally demanding and have not yet been comprehensively validated in real clinical settings).
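
Disagreement as an uncertainty signal can be sketched simply: ask several agents the same question, normalize their answers, and measure how strongly the majority agrees. This minimal sketch assumes short categorical answers; the 0.75 agreement threshold is an arbitrary placeholder, and free-text answers would need a semantic comparison rather than exact string matching.

```python
from collections import Counter
from typing import Optional


def consensus(answers: list[str], min_agreement: float = 0.75) -> tuple[Optional[str], float]:
    """Return (majority answer or None, agreement ratio) across agent answers."""
    if not answers:
        raise ValueError("no answers")
    normalized = [a.strip().lower() for a in answers]
    top, count = Counter(normalized).most_common(1)[0]
    agreement = count / len(normalized)
    # Below the threshold, the disagreement itself is treated as the uncertainty flag.
    return (top if agreement >= min_agreement else None), agreement


print(consensus(["Option A", "option a", "Option A ", "Option B"]))  # ('option a', 0.75)
print(consensus(["Option A", "Option B", "Option C"]))  # no majority: answer is None
```

Returning the ratio alongside the answer lets downstream logic treat weak agreement differently from outright conflict, rather than collapsing both into a yes/no.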

This is why many of our agents operate under a supervisor architecture — one agent audits another’s reasoning and outputs, sometimes multiple times, before anything reaches a clinical user.
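
A supervisor loop of this kind can be sketched with plain functions standing in for model calls. Everything here is hypothetical: `draft` and `audit` are placeholders for a worker agent and a supervising agent, and the review format (approve, or return feedback) is an assumption for illustration, not a description of any specific framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def supervised_answer(
    task: str,
    draft: Callable[[str, str], str],     # worker agent: (task, feedback) -> output
    audit: Callable[[str, str], Review],  # supervisor agent: (task, output) -> review
    max_rounds: int = 3,
) -> Optional[str]:
    """Return an approved output, or None if no draft passes audit (escalate)."""
    feedback = ""
    for _ in range(max_rounds):
        output = draft(task, feedback)
        review = audit(task, output)
        if review.approved:
            return output
        feedback = review.feedback  # feed the supervisor's critique into the next draft
    return None  # nothing unapproved reaches the clinical user


# Toy stand-ins: the worker fixes its output once it receives feedback.
def toy_draft(task: str, feedback: str) -> str:
    return "summary with source cited" if feedback else "summary"


def toy_audit(task: str, output: str) -> Review:
    ok = "cited" in output
    return Review(ok, "" if ok else "cite the source note")


print(supervised_answer("summarize note", toy_draft, toy_audit))  # summary with source cited
```

Returning `None` after the round limit, rather than the last draft, is the design choice that matters: exhausting the audit budget becomes an escalation, not a silent pass.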

Apply a tiered framework for levels of autonomy

The NEJM authors propose “AI-CBME,” modeled on competency-based medical education, which treats uncertainty expression as a measurable clinical AI competency with defined milestones, the way we assess residents before extending them more autonomy. This clinical frame carries over to agent architecture. For every agent in deployment, three questions need answers: what triggers escalation, who reviews it, and what gets logged about the chain of reasoning and action.
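
The three questions can be written down per agent as a small, auditable record. This sketch uses Python’s standard `logging` and `json` modules; the rule fields, the confidence-based trigger, the agent name, and the reviewer role are hypothetical placeholders.

```python
import json
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent.escalation")


@dataclass(frozen=True)
class EscalationRule:
    agent: str
    min_confidence: float  # trigger: escalate when confidence falls below this
    reviewer_role: str     # who reviews the escalated output


def route(rule: EscalationRule, output: str, confidence: float, reasoning: list[str]) -> str:
    """Return 'auto' or 'escalate', logging the decision and its reasoning chain."""
    decision = "escalate" if confidence < rule.min_confidence else "auto"
    log.info(json.dumps({
        "agent": rule.agent,
        "decision": decision,
        "reviewer": rule.reviewer_role if decision == "escalate" else None,
        "confidence": confidence,
        "reasoning": reasoning,
        "output": output,
    }))
    return decision


rule = EscalationRule("claims-analysis", min_confidence=0.8, reviewer_role="coding specialist")
print(route(rule, "Flag claim for missing modifier", 0.62, ["modifier absent on line 3"]))  # escalate
```

Logging every decision, not only escalations, is what makes it possible later to ask whether the trigger fired when it should have.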

Down-votes from other agents and moments when clinicians override or reject an output are the ground truth on whether uncertainty handling is actually working. It’s one thing to collect these signals as UX metrics and another to route them back into the system as calibration data.
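
Routing these signals back as calibration data can start with a standard measurement: pair each output’s stated confidence with whether a clinician accepted it, then compute expected calibration error (ECE), the bin-weighted gap between average confidence and observed acceptance rate. Treating “accepted without override” as a proxy for “correct” is an assumption, and the records below are made-up illustrations.

```python
def expected_calibration_error(records: list[tuple[float, bool]], n_bins: int = 5) -> float:
    """ECE over (confidence, accepted) pairs using equal-width confidence bins."""
    if not records:
        raise ValueError("no records")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, accepted in records:
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, accepted))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accept_rate = sum(a for _, a in b) / len(b)
        ece += (len(b) / len(records)) * abs(avg_conf - accept_rate)
    return ece


# Hypothetical log: the model claims ~95% confidence, but clinicians override half.
overconfident = [(0.95, True), (0.95, False), (0.96, True), (0.94, False)]
print(round(expected_calibration_error(overconfident), 3))  # 0.45
```

A falling ECE over successive releases is one concrete way to check that uncertainty handling is improving rather than just producing more hedges.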

The NEJM authors are candid about the questions that remain open: when AI tools should signal uncertainty, how they should express it, and how to evaluate whether the signal maps to real knowledge gaps rather than trained hedging behavior.

The original Hippocratic Oath was designed to be sworn before a physician began practice. With AI, we are retrofitting the oath onto systems already deep in global deployment. That asymmetry must be corrected.

Contemporary LLMs have passed many Turing tests and still fail in the ways described above, which shows that “indistinguishable from a human” is no guarantee of clinical reliability.

The only test that matters in a clinic or hospital is the test of “I know not.” Success means designing systems in which internal uncertainty triggers a meaningful, automated response: revisiting the evidence across multiple agents, checking whether the agents reach consensus, withholding or adjusting a response when warranted, and raising hard safety flags that require human review before any clinical action is taken.

This is how the oath can be sewn more carefully into the fabric of real AI systems.