The Illusion of Humans in the Loop

Et tu R2-D2?

In an emergency, would you follow a malfunctioning robot into an unfamiliar space?

This is exactly what participants in a controlled study at Georgia Tech did. When a simulated fire alarm went off and smoke started filling the building, a robot pointed the participants toward an unfamiliar corridor. The exit was clearly marked, but most people followed the robot into a dark room with no visible exit.

Paul Robinette and his colleagues called this overtrust. You’d think that human beings can easily calibrate how much trust to place in a robot, but it turns out that in an emergency, when it mattered, the machine won.

Bollywood meets Chipwood

I had to look carefully to make sure the video clip of Akshay Kumar interviewing Jensen Huang wasn’t fake. This isn’t the pairing you’d expect to discuss the future of human cognition, but Jensen said something I’ve been thinking about since I watched it. Akshay asked Jensen which human jobs can’t be taken over by AI. Jensen responded with what I’m calling a “dial” analogy. You figure out how much to turn the dial. 20 percent AI assist, 40, 50? It depends on what you’re doing and how much AI can help. You just don’t want to not be using AI.

The dial analogy is appealing. It puts you in the driver’s seat, tuning and calibrating how much assistance you need. But I’m not sure the dial is really mine to set.

Procedural Theater

“Human in the loop” has become the industry’s standard defense against AI risk. It’s an elegant model: the AI carries most of the burden by processing information and making a recommendation, and a human still reviews it.

Nic Spatola wants you to think about it differently. He calls it procedural theater: the human is formally present, but the conditions that make human judgment meaningful have been silently removed.

Automation bias and the erosion of scrutiny

A few years ago, I was running late dropping my son off at a baseball game. Google Maps found “a better route”. I’d been down this stretch many times and was unsure of the new route. I took it anyway, and as you might guess, it didn’t get me there faster.

I know how Google Maps works: imagery, crowdsourced data, incident reports, advanced routing algorithms I’d never replicate myself. That general knowledge felt like reason enough to trust the specific call it made that morning. But I didn’t actually know why it chose that route. I knew just enough to stop asking. The little voice in your head telling you that something feels off here gets quieter when you think you understand how the machine does its job.

That’s automation bias. It’s one thing to be late to a rec league game, but when lives are at stake, the bias really hurts.

In 2019, London’s Metropolitan Police ran a facial recognition trial. An independent review found that 81% of the people the system flagged were wrongly identified, which means around four in every five were innocent. Isn’t that crazy? The officers trusted the machine over their own eyes.

In a study of 28 pathology experts making AI-assisted diagnoses, Rosbach and colleagues found that in 7% of cases, doctors overturned their own initially correct evaluations and followed erroneous AI advice. The AI was wrong, the doctors knew the right answer, and they changed their minds anyway.

During the 2003 Iraq War, US Patriot missiles shot down two allied aircraft in separate friendly fire incidents. Congressional investigators found that operators had approved the system’s targeting recommendations without independent scrutiny of the information available to them. The operators simply confirmed what the system flagged. The ICAO termed this automation intimidation: the automated system carries an authority that makes disagreement feel presumptuous, like you’re the one making the error.

Humans are in the loop, but the machines are quietly overriding the humans.

Overtrust and the erosion of calibration

The experiment at Georgia Tech didn’t start off with malfunctioning robots. But when participants blindly followed the robot in an emergency situation, Robinette’s team felt they might be able to influence behavior by having participants observe robots making mistakes in non-emergency situations. It didn’t help. Even though the participants had observed the robot being unreliable, circling aimlessly and taking wrong turns, when the alarm went off, they followed it blindly anyway.

Human beings build trust through experience, a gut feeling built over time. Words like “safe”, “correct”, and “fair” mean what they do because of the feeling they evoke in us. Over time, we build an internal calibration of these terms. That’s what makes a human reviewer more than a rubber stamp.

But when the same words begin to mean whatever the system produces rather than what you actually feel, you’ve already accepted the machine’s framing. Procedural accuracy passes for moral correctness, an illusion of validity. Each deferral deepens it, and your calibration slowly shifts to match the machine’s.

Villegas-Galaviz and Martin call this a moral sedative: AI lowers the perceived cost of a decision while quietly inflating your sense of mastery. I think of this as Dunning-Kruger for AI: the less equipped you are to evaluate the machine’s decisions, the more confident you feel doing it. Eventually, the question is no longer “is this right?” It becomes “did the machine approve it?” And the two feel identical. The machine becomes the originating source of moral judgment, and the human in the loop is left ratifying decisions, not making them. That’s moral substitution.

So if the machine is the moral author, and something goes wrong, who’s accountable?

Moral diffusion and the erosion of accountability

Have you been to your annual, fully covered physical lately? Doctors have increasingly started asking for consent for AI transcription. And if you happen to make casual conversation about your health that falls outside the strict, yet vague, boundaries of what constitutes “covered questions”, well, buckle up.

This happened to me a few years ago. The bill arrived with an extra charge I didn’t recognize. What I thought was a standard physical had been coded as two separate consultations. An AI transcription system had matched keywords from a brief exchange of small talk and generated an additional billing code. What followed was three months of escalations. The billing department explained they processed what the system generated. The doctor’s office said they submitted what the billing system produced. The insurance company processed what was submitted. I was only looking for the person who made the decision. The answer, at every stop, was the same: nobody knew. By the time I found someone who could actually reverse the charge, they treated it like an extraordinary exception, not a correction.

That’s moral diffusion. There’s no malice or negligence. Just a system where every step points to the step before it, and the accountability fades before anyone has to own it.

In 2018, a Tesla Model X operating in Autopilot mode steered into a concrete median on US-101 in Mountain View, killing the driver. The NTSB found that Autopilot had actively directed the vehicle into the gore area due to system limitations, while the driver was distracted and overreliant on the automation. Blame was assigned to the driver. A car’s crumple zone is designed to absorb the force of a crash, protecting the people inside. Madeleine Clare Elish coined the term moral crumple zone for this kind of outcome. The human, positioned at the moment of failure, absorbs the moral impact (responsibility, blame and liability) while the automated system’s record stays clean.

The institutional version of this is what Dan Davies calls an accountability sink. Organizations deploying AI get to claim both the efficiency of algorithmic decision-making and the moral cover of human oversight. When something goes wrong, the human reviewer is responsible. When it goes right, the algorithm gets the credit.

“Human in the loop” doesn’t tell you whether the human’s judgment mattered. It only tells you the human was there.

What should I set the dial to?

Captain Chesley Sullenberger had just over three minutes between losing both engines and putting the plane down on the Hudson. Every automated system on that plane was pointing toward a different outcome: the flight computers, the checklists, the protocols. He overrode all of it. Not because he processed information faster than an algorithm, but because he had pattern recognition built from decades of embodied experience that no model had ever been trained on. The situation was genuinely novel, outside any training data. The thought experiment writes itself: what if there had been an AI co-pilot in the loop, trained on every commercial aviation incident in history? Would it have flagged Sully’s decision as a statistical anomaly? Would automation intimidation have made him hesitate in those three minutes? Would the moral crumple zone have activated, the system handing control back at the last moment?

What the Hudson landing preserved is exactly what the three failure modes erode: not the ability to follow the machine, but the capacity to know when not to. Jensen’s dial metaphor isn’t wrong. It’s describing a world the studies don’t support. In that world, the ratio is yours to set, you can hold it there, and the human judgment you’re counting on stays intact at whatever percentage you’ve dialed in. But automation bias, overtrust, and moral diffusion all move the dial without your permission. Your scrutiny erodes, your trust outpaces the system’s reliability, and accountability dissolves into a chain of people and processes, none of whom feel fully responsible.

“Human in the loop” survives as a design philosophy only if you pay as much attention to the trust relationship as you do to the AI system itself. I don’t know what Sully would have done with an AI co-pilot pointing the other way. I’m not sure he would have known either, until the moment came.

Rabbit Holes

Nicolas Spatola, AI efficiency can undermine accountability, even with humans in the loop (Tech Policy Press).

Cory Doctorow, AI’s “human in the loop” isn’t (Pluralistic).

Madeleine Clare Elish, Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction (2019).

NTSB, HAR-20/01 (2020).

Dan Davies, The Unaccountability Machine (2024).

Paul Robinette et al., Overtrust of Robots in Emergency Evacuation Scenarios.

The Jensen/Akshay Kumar dial exchange is here.

Big Brother Watch, facial recognition statistics explainer and Stop Facial Recognition.

Rosbach et al., arXiv:2411.00998 (2024).

SpaceNews, report on the Patriot incidents.

Zerilli, Bhatt and Weller, How transparency modulates trust in artificial intelligence, Patterns (2022).

Kruger and Dunning, Unskilled and Unaware of It, JPSP (1999).

Villegas-Galaviz and Martin, Moral distance, AI, and the ethics of care, AI & Society (2023).