In 1973, the science fiction author Arthur C. Clarke wrote what would become perhaps the most quoted observation in the history of technology:

“Any sufficiently advanced technology is indistinguishable from magic.”

I have always liked this quote, because he was right. Do you know how a quartz clock works? I’ll give you a hint: it’s vibrating crystals. Turns out the holistic hippies were right: crystals are magic. Right now in my pocket I have at my fingertips essentially all human knowledge and information. Somewhere in a server farm in Virginia, a computer the size of a wardrobe decides that I need to be enraged at a video of a monkey being bullied. This thinking silicon (a rock) converts this information into pulses of light, fires them down a glass wire thinner than a human hair, sends them under the Atlantic Ocean at roughly two-thirds of the speed of light, up through a cable landing station on a beach in Portugal that looks completely unremarkable from the outside, across Europe, under more oceans, up through the seafloor off the coast of Queensland, through a series of increasingly unimpressive concrete buildings, through the air invisibly as radio waves, through my wall, and into a small rectangle of glass and metal in my hand. The monkey has no idea any of this happened.

By any reasonable historical standard (including right now) this is completely indistinguishable from sorcery. If I had been born at any other time in human history, I would have been blissfully unaware of the plight of Punch the monkey.

This is fine. I don’t particularly need to know about the round trip of light waves and feats of engineering that have occurred to deliver that sweet baby monkey to my eyeballs. In fact, I would say most of us have no idea how most of the technology in our lives works. And this (already magical) technology gets more and more advanced every day.

Most people are only consumers of technological magic, and I think this is great. You don’t need to understand the complexities of global networking infrastructure to watch Benedict Bridgerton and Sophie fall in love. But some of us are not just consumers, but sorcerers ourselves.

Yes, this is another post about AI.


There Is No One Behind The Curtain

I want to talk about a specific kind of story that I am seeing online at the moment. It usually begins with “Claude did…”. There are lots of these, from “Claude Code wiped our production database” to a vibe-coded SaaS exposing its Stripe API keys in the front end and getting its customers charged!

Inevitably, when you read the comments, several of them say something like “Claude did this to me too” or “if you tell Claude ‘don’t expose API keys’ in your prompt then it won’t do this”.

I find these stories irritating because of the framing. “Claude” is no more responsible for any of this than your kitchen knife is if you cut your finger. To use language like “Claude said/did/suggested” attributes agency, intent and decision-making to something that has none of those things. This anthropomorphisation is very human, and the marketing surrounding these tools doesn’t help. “Thinking” models, and even the name “artificial intelligence” itself, make it sound as though sentience is actually involved.

Generative AI technology is incredibly powerful, and getting more powerful by the day. It can be used to accomplish incredible things, but, crucially, there is no one behind the curtain.

So what is an LLM actually doing? At its core, it is predicting the most probable next token given everything that came before. That is it. One of my colleagues recently described LLMs to me as a “big ball of math”. There is no reasoning, no fact-checking, no ground truth being consulted. There is no “it” in any meaningful sense. It is an extraordinarily sophisticated pattern-completion engine, trained on an enormous amount of human-generated text, making probabilistic guesses about what should come next, one token at a time.
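To make “predicting the next token” concrete, here is a deliberately tiny sketch. It is not how a real LLM is built (those use neural networks trained on vastly more text); it is a bigram counter over a made-up corpus, and every name in it is hypothetical. What it does share with the real thing is the shape of the loop: look at what came before, score the candidates, append one, repeat.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
CORPUS = "the key is secret . the key is public . the key is secret ."


def train_bigrams(text):
    """Count which token follows which in the training text."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn raw follow-counts into a probability for each candidate token."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}


def complete(counts, prompt, length):
    """Extend the prompt one token at a time, always taking the likeliest."""
    tokens = prompt.split()
    for _ in range(length):
        probs = next_token_probs(counts, tokens[-1])
        if not probs:  # nothing ever followed this token in training
            break
        tokens.append(max(probs, key=probs.get))
    return " ".join(tokens)


MODEL = train_bigrams(CORPUS)
print(next_token_probs(MODEL, "is"))
print(complete(MODEL, "the key", 2))
```

Nothing in this loop checks whether the key really is secret. “secret” wins only because it followed “is” more often than “public” did in the training text.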

I am not saying this to be dismissive. The fact that this process produces outputs as capable as the current generation of models is astonishing. But remarkable is not the same as magical, and understanding the difference matters.

“Hallucinations” are not bugs. They are the entirely expected output of a system that completes patterns rather than retrieves facts. Of course it states something false with confidence: it is predicting what a confident, knowledgeable answer looks like, not checking whether the answer is true. It is producing code that “works” (based on enormous training sets of working code), but, crucially, there is no intelligence or thinking behind the “decision” to expose API keys. The model has no idea this is wrong. It does not have ideas about anything.


Magic Does Not Excuse Us From Engineering Discipline

This feels somewhat counter-intuitive to an engineer, as we are trained to think deterministically. Given input x into function f, f(x) always produces result y. This is how the software we have used forever behaves. But LLMs are different: they are non-deterministic [1]. Due to things like model temperature settings, system prompts and other factors beyond the scope of this post, the exact same prompt can produce entirely different results. This holds even when you use things like skills (which are just another fancy way of injecting context): you are more likely to get a good result, but you will also sometimes get bizarre behaviour that flatly violates an explicit instruction in your prompt/skill/context. Again, this is because there is no sentience interpreting the context; there is a prediction engine predicting the next token.
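A rough sketch of where the temperature part of that variation comes from. The scores below are invented for illustration, as are all the function names; a real model produces scores over tens of thousands of tokens rather than three canned actions. The mechanism is the standard one, though: divide the scores by the temperature, apply a softmax, then draw at random from the resulting probabilities.

```python
import math
import random

# Hypothetical scores ("logits") for three candidate continuations.
LOGITS = {"fix the test": 2.0, "skip the test": 1.0, "delete the tests": 0.1}


def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities; higher temperature flattens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next(logits, temperature, rng):
    """Pick one continuation: greedy at temperature 0, random otherwise."""
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def distinct_outputs(temperature, runs=200, seed=0):
    """Collect every different answer seen across repeated identical calls."""
    rng = random.Random(seed)
    return {sample_next(LOGITS, temperature, rng) for _ in range(runs)}


print(distinct_outputs(0))
print(distinct_outputs(1.0))
```

At temperature 0 the same input gives the same answer every time. At temperature 1.0, with these made-up scores, “delete the tests” carries a probability of roughly 10%, so across enough identical calls it will eventually be the answer you get.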

For engineers joining a team I am on, I would not typically grant production access during their onboarding, until we have seen how they work. In fact, I almost feel like no one should have production write access (this is what CI/CD is for!) other than some form of “break glass” admin that is not used day to day.

I would not give this engineer terraform destroy privileges. I would not grant them superuser permissions.

If we would not do this for a smart human, why are we doing it for an ultra “smart” machine that has no judgement at all? Not limited judgement, zero judgement. You’re relying on a non-deterministic system to not non-deterministically determine that the best way to make your tests green is to delete all of them (no tests == no failing tests). Or that the best way to test idempotency of your change is to run a terraform destroy and apply with your new change on production – dropping your production database.

We must always limit the blast radius and put deterministic constraints around non-deterministic behaviour.
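One way to picture a deterministic constraint is a default-deny allowlist that sits between the agent and the shell. The sketch below is an assumption-laden illustration, not a hardened sandbox: the policy, the names and the choice of allowed commands are all hypothetical, and it assumes approved commands are later executed as an argument list without a shell. Real blast-radius control belongs in scoped credentials and permissions; a wrapper like this is only an extra layer.

```python
import shlex

# Hypothetical policy: the only command prefixes an agent may run unattended.
ALLOWED_PREFIXES = [
    ("terraform", "plan"),
    ("terraform", "validate"),
    ("pytest",),
    ("git", "status"),
    ("git", "diff"),
]

# Characters that could chain or redirect commands if a shell saw them.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def is_allowed(command):
    """Return True only if the command matches an allowlisted prefix."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # refuse anything that looks like shell chaining
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable input is rejected, not guessed at
    return any(tuple(argv[:len(prefix)]) == prefix for prefix in ALLOWED_PREFIXES)


for proposed in [
    "terraform plan -out=tfplan",
    "terraform destroy -auto-approve",
    "pytest -q; rm -rf tests",
]:
    print(proposed, "->", "run" if is_allowed(proposed) else "refuse")
```

The point of the design is that the check gives the same answer every time, regardless of how persuasive the model’s reasoning for running terraform destroy happened to be on this particular run.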

Now, is applying proper access controls slower than just letting the agent loose? In the same way that consistent investing is slower than betting it all on black, yes, technically. One of them can make you a millionaire faster, but the odds are against you.

None of this is specific to AI. We have always scoped credentials. We have always limited what outsourced resources can touch. We do not give contractors the keys to every system on their first day. The magic of the technology does not change the fundamentals of good engineering practice; if anything, it makes them more important.


It Generalises

Understanding what a tool actually is, rather than what it seems like, changes what you can do with it. This is true of every tool, not just AI.

A home cook watching a chef work a stainless steel pan sees something close to magic. The eggs do not stick, the fond develops perfectly, the whole thing looks effortless. What they are actually watching is someone who understands heat transfer, how protein behaves at different temperatures, what a thin layer of oil does in a hot pan. There is no magic. There is just understanding applied to a tool. Once you have that understanding yourself, the pan stops being mysterious and starts being controllable.

The same is true of any sufficiently complex tool. The engineers I have worked with who understand their query planner, who know why a query is slow rather than just that it is, make better decisions than those treating the database as a black box. They know which knobs exist to turn, and what turning them will do in terms of performance.

I have written before about not outsourcing what you do not understand. The idea there was that using AI to produce outputs you cannot evaluate is a specific kind of trap. This is the companion argument: understanding the tool itself, not just the output domain, is what allows you to use it well. It is the same principle applied at different levels.

The goal is not skepticism. The goal is not to avoid powerful tools or treat them with suspicion. The goal is to stop treating power as a substitute for understanding. A tool that can do almost anything is not the same as a tool that will do the right thing. Knowing the difference is what separates the “vibe coded my API key into the front end” engineer from the professional one.


You’re a Wizard, Claudy

Clarke was right that sufficiently advanced technology is indistinguishable from magic. But he was writing about perception, not about how engineers should approach their work.

The technology is extraordinary. I use it every day and continue to be impressed by what it can do. But I would argue that the practitioners who get the most out of it are not the ones who trust it the most. They are the ones who understand it most clearly, including exactly where its judgement ends, which is everywhere, because there is no judgement. They treat it as what it is: an enormously capable, completely literal, entirely non-judgemental tool that will do precisely what it is pointed at.

Pull back the curtain. There is no wizard. There is just a very impressive ball of math doing something well-defined and predictable.

Perhaps the models will become so good in the future that the problems I have described simply will no longer occur. Until then, it is up to us as professionals to weave our spells with care and caution, to distinguish hype from reality, and to learn our tools until we can distinguish them from magic.


  1. LLMs, in the sense of the underlying mathematical model, are actually deterministic. In practice, however, factors such as temperature settings, system prompts, and floating-point arithmetic running on different hardware mean they do not behave this way in real-world usage.