Avoiding AGI Catastrophe, Part 1
Why Open, Decentralized and Neural-Symbolic is the Safest Route
It has long been expected that one of the hallmarks of an emerging Singularity will be: AI improving AI. This notion was the crux of I.J. Good’s “intelligence explosion”, articulated in 1965. Once smarter machines are making even smarter machines are making even smarterer machines – it’s not hard to see where things are going.
We’re not fully there yet but we can see the early stages of this sort of “recursive self-improvement” (RSI) emerging. Already, each generation of coding agent actively helps build the next generation of even smarter coding agent. As I reviewed in a recent post, my own team’s OmegaClaw agents have been showing remarkable ability to revise and improve their code and architecture based on their own reflection and interaction with their human friends.
In response to the apparent advent of the early days of RSI, two of the most influential AI labs in the world have recently told us how they think we avoid catastrophe as AI begins to improve itself.
Anthropic says: build in the ability to slow down. (More than a few of my AI colleagues have expressed doubts they actually plan to slow down anytime soon, but in any case that’s what they’re saying, however it may correlate with what they’re actually doing….)
OpenAI says: keep racing ahead, but race defensively, and make sure the right country wins.
Both these positions seem reasonable on the surface, but are actually far too shallow to be useful, given the overall complex dynamics of the world today. Our planet in this late pre-Singularity era (the “foothills of the Singularity” as Demis Hassabis put it) is a complex biocybernetic system, and the Anthropic and OpenAI positions each basically stop at a single lever and mistake it for the whole mechanism.
Avoiding catastrophe is not a simple binary choice or dial-setting between fast and slow, or between us and them. It is a complex-systems architecture problem. And the architecture I have spent years building — a decentralized, neural-symbolic, evolutionary AGI framework — provides, I will argue here, the strongest available bet for solving it.
These are obviously huge issues and this is merely a single, albeit lengthy, blog post (I have written about these topics much more extensively, for instance in the 2024 book The Consciousness Explosion). For reasons of space, here I’m going to focus mostly on the core theme of the recent Anthropic and OpenAI posts: how to avoid near-term AGI catastrophe for our species.
In fact I have spent more time thinking about the optimistic side of things. The reason to get AGI right is not merely to survive but to embrace the extraordinarily wonderful and transformative future waiting for us on the far side of it – if we pull it off right. But that’s still an “if” not a necessity, and I am down with the doomers that it will be much nicer if the Singularity happens not to involve extinguishing, say, my wife and kids and the rest of our species.
I am also generally more worried about the fate of the developing world during the period between the first human-level AGI and the advent of robust superintelligence. Even if superintelligence comes out wonderfully beneficial in the end, there may be a period of difficulty and chaos beforehand as AGI gradually takes on more and more traditionally-human tasks and our economies struggle to adapt. In the developed world this will likely result in various UBI-like mechanisms taking hold, but as I have often asked, who will give UBI in the Central African Republic (as one example)? Having a large part of the world get by on subsistence farming without funds to pay their phone bills or buy antibiotics, while other parts of the world watch TikTok and play video games and figure out how to find productive meaning in life as robots take over more and more of their labor – seems a recipe for far too many techno-thriller style scenarios and a lot of unnecessary human suffering. But I digress!
I will mostly bypass these other (rather significant) issues here, and focus for the purpose of this post on the bare minimum most humans would agree we need from our future — i.e. on avoiding utter catastrophe for our species. The radical flourishing that I foresee beneficial AGI will make possible, and the steps I think we should take to make the path to Singularity relatively painless for the less fortunate on our planet, are things I have talked about a lot before and will talk about a lot again, but will pass over lightly here.
The very impatient reader may want to scroll down to the section “Toward a rigorous argument about AGI acatastrophe: Seven hinges” below, where I give the details of my logical argument why a Hyperon-on-ASI-chain style open and decentralized approach is the best approach to avoiding AGI catastrophe. In the sections before that I take some care to frame the issues appropriately, which seems necessary because of the huge amount of general conceptual confusion surrounding these issues.
I note also, the framing sections sorta jump around a bit because of the sprawling nature of the contexts involved (and because I think that way), but the logical-argument sections are different and focus in quite narrowly on the core question: What is the most likely way to avoid AGI catastrophe for humanity in the presence of rampant RSI?
Stay tuned as well for Part 2 of this post which will present “cryptographic laterality”, a concrete technique for making decentralized AI networks hard for adversaries to fork.
Shallow Reasoning about Deep Learning
Anthropic’s argument runs roughly as follows. Recursive self-improvement is already accelerating AI development – each stage of AI is helping create the next stage of AI, and the human role in the loop is narrowing at every step … and in the worst case, rare misalignment could compound as models build their successors, with self-updates “growing more frequent but less understood until we lose control of them.”
Anthropic’s prescription is to preserve what they call “the option to slow or temporarily pause frontier AI development,” and to build, ahead of time, the verification machinery a credible pause would require: multiple well-resourced labs, in multiple countries, agreeing to stop under the same conditions, each able to verify that the others have actually stopped. They reach for the arms-control analogy — regimes like the Intermediate-Range Nuclear Forces treaty — and then concede that such regimes took decades to build, and that, in their own words, “we don’t have that long.”
(They also minimize the biggest obvious objection to the arms-control analogy: that nuclear weapons don’t have transformative positive economic and human value driving their proliferation. In Anthropic’s scenario, the US, as a rich country with maximal military power, would be asking poorer countries to pause AI development which is palpably making them economically better off as well as more militarily powerful.)
OpenAI’s blueprint starts from the same observations about AI accelerating AI development — it even urges regulators to “treat RSI as an urgent priority” — but draws the opposite conclusion. Rather than advocating for a slowdown, OpenAI calls for the United States to build a durable federal framework for governing frontier AI, with a strengthened government safety institution at its center, leveraging capabilities no private company has — classified intelligence, CBRN-defense expertise, secure compute. It wants the U.S. to embrace and double down on its current default global AI leadership role. And it wants defense to outpace offense — in its framing, to ensure “defensive capabilities scale faster than offensive capabilities” — so that AI strengthens the institutions that protect society. The frame is: don’t slow down, instead keep pushing and win the AGI race, but not in an obnoxiously aggressive way. The vibe is somehow halfway between Google’s now-defunct “Don’t Be Evil” and Palantir.
Neither Anthropic nor OpenAI is wrong about everything – but each of them looks at a complex nonlinear system, identifies a one-dimensional lever and treats it as the master variable.
Anthropic’s pause framing is downstream of an unstated assumption: that frontier capability lives in large, monitorable training runs on easily identifiable compute. That assumption is the only thing that makes verification — and therefore a credible pause — conceivable at all, absent some sort of extreme global fascist regime (which might not even be feasible given the diversity of modern technologies). Of course Amodei seems the most uber-LLM-pilled of all the Big Tech CEOs so this perspective basically makes sense for Anthropic.
If you remove the assumption that AGI will require centralized hyperscaler hardware – as I will argue we should – the whole verification concept stops being so straightforward: there is no super-mega hardware installation to monitor, no clearly defined threshold to trip, no obvious chokepoint to govern. And Anthropic half-concedes this themselves: by their own admission, even given their own quite restrictive implicit assumptions about how AGI will work, the verification regime cannot be built in the time available. Their verification concept is less of a plan and more of a wish with a footnote.
OpenAI’s framing is shallow in a different way. The promise to bias models toward defense is a weak lever, for three reasons that I’ll repeatedly reinforce here:
a model’s defensive conditioning can be fine-tuned away by whoever holds the weights;
what a system is used for is set by the deployer, not the trainer;
the deployer in question may be a national-security buyer such as the U.S. Department of Defense — which in September 2025 took “Department of War” as a secondary title and whose Secretary promised the military would “go on offense, not just on defense.”
OpenAI’s call for the United States to lead then simply picks a side in an arms race, which does nothing about the fact that the underlying capability diffuses to everyone else within months regardless of who leads. And the federal framework the blueprint proposes is strikingly toothless: in its own design, the government’s evaluator assesses frontier models but is “not to approve or block deployments.” Both moves quietly conflate aligned-to-the-developer (a profit function, or a flag) with aligned-to-humanity, which are different targets and can at times be adversarial ones.
(Don’t get me wrong, I am a proud American and while I was born overseas, I love my country as much as anyone, including, I would venture, even the great Sam Altman or Alex Karp! It’s not the topic of this post, but after living overseas for a decade and a half I returned home feeling very impressed by various awesome aspects of the good old US of A, including the fact that we still are genuinely the freest major country on the planet, and the only place where investment money somewhat systematically flows into radical unproven wild ideas. The cultural and genetic diversity we have here also excites me and I hope it is something we continue to treasure and foster. But as awesome as America is, no country is big enough to encompass AGI, which is going to be global and complex in its dynamics much like the Internet, computer technology and science itself.)
The Anthropic analysis treats “slow down or speed up” as a primitive safety variable (and proposes unrealistic ways of taking global control of this variable). The OpenAI analysis treats “centralize or distribute” as a primitive safety variable (and proposes “centralize on our team” as the right setting). But neither of these factors is really best considered as a primitive variable – in the actual complex global reality, each of these is a lever whose effect depends entirely on a set of deeper conditions.
And once you actually work through those conditions, a particular kind of open, decentralized architecture turns out to be the strongest bet for both avoiding catastrophe and creating abundance. Which is not the kind any Big Tech company is building.
Engineering toward BGI
Like Anthropic and OpenAI, I have my own horse in the AGI race – though it’s a different species of horse, which is a bit older than theirs yet still in many ways less mature. But things are going quite fast these days in my corner of the proto-AGI universe, and I am increasingly optimistic my out-of-the-mainstream AGI team is not far from putting forth systems that outperform the Big Tech LLMs in key practical ways.
I have spent years (actually, scarily enough, decades!) building toward a specific sort of AGI architecture, which is both integrative and unified. It combines neural nets, probabilistic logical reasoning, evolutionary learning and other powerful AI techniques – but not in a haphazard way; rather, in a common mathematical formalism, a common self-modifying knowledge-metagraph infrastructure, and a common agentic cognitive architecture centered on giving AI a foundational understanding of self, other and world.
The current instantiation of this AGI approach is the Hyperon framework, which I have been building with colleagues in the SingularityNET ecosystem for some years now. After several years of work on Hyperon, we now have a scalable infrastructure for running all the different kinds of AI involved on networks of powerful machines. We also have a deeper layer of blockchain-based infrastructure (the SingularityNET platform, and soon the more sophisticated ASI:chain), which allows us to run Hyperon systems on global distributed networks of machines with no central owner or controller.
Hyperon doesn’t need hyperscaler infrastructure to do its thing. The neural net learning algorithms that work most naturally with it, like predictive coding, have more localized learning properties than the backpropagation algorithm at the center of LLMs and most other commercial neural nets – which means you can run them on large networks of ordinary computers, or modest-sized networks of ordinary server farms. The non-neural aspects of Hyperon are even more amenable to running on heterogeneous, run-of-the-mill hardware.
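To make the “localized learning” point a bit more concrete, here is a minimal sketch of a generic, textbook-style predictive coding update in plain Python. It is not Hyperon code, and the names (`predict`, `energy`, `train`), the layer sizes and the learning rate are all assumptions chosen for illustration. What it tries to show is only that each update uses the prediction error at a connection’s output end and the activity at its input end, with no error signal passed back through a deep stack of layers.

```python
# Toy predictive coding layer: a latent vector z predicts an input x through weights W.
# All names and numbers here are illustrative assumptions, not Hyperon internals.

def predict(W, z):
    """Top-down prediction of the input from the latent activity."""
    return [sum(w * z_j for w, z_j in zip(row, z)) for row in W]

def energy(x, W, z):
    """Half the squared prediction error; both updates below descend this."""
    return 0.5 * sum((x_i - p_i) ** 2 for x_i, p_i in zip(x, predict(W, z)))

def train(x, W, z, steps=200, lr=0.1):
    energies = [energy(x, W, z)]
    for _ in range(steps):
        # Inference: nudge each latent unit using only the errors it connects to.
        err = [x_i - p_i for x_i, p_i in zip(x, predict(W, z))]
        z = [z_j + lr * sum(W[i][j] * err[i] for i in range(len(x)))
             for j, z_j in enumerate(z)]
        # Learning: each weight changes by (local error) * (local activity).
        err = [x_i - p_i for x_i, p_i in zip(x, predict(W, z))]
        W = [[W[i][j] + lr * err[i] * z[j] for j in range(len(z))]
             for i in range(len(x))]
        energies.append(energy(x, W, z))
    return W, z, energies

x = [1.0, 0.5, -0.5]                          # one hypothetical input pattern
W0 = [[0.1, -0.2], [0.2, 0.1], [-0.1, 0.3]]   # small arbitrary starting weights
W, z, energies = train(x, W0, [0.0, 0.0])
```

Because every update depends only on quantities available at the connection being updated, a rule of this general shape can in principle be split across machines that exchange only neighboring activities and errors, which is the property the paragraph above is pointing at.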
So if we do manage to pull off post-LLM-level AGI with the Hyperon system, this will mean more than just a smarter AI model – it will mean the dependence of AGI on hyperscaler infrastructure, which Anthropic and others are now assuming is a given, will rapidly become a thing of the past.
One of the reasons I’ve been so avidly pursuing decentralized infrastructure for AGI is I believe it provides powerful additional avenues for making AGI both safer and more beneficial. But we need to be careful in our thinking here. My claim is certainly not that decentralization is automatically safe, or beneficial. Decentralization simply rules out certain pathologies – but is also amenable to others. However, it also opens the door to some very productive and non-pathological dynamics.
Hyperon-on-ASI-chain is deliberately engineered to support beneficial AGI — in part by supporting decentralized dynamics that foster diverse and compassionate values … better than an opaque corporate or military fortress, and also better than an uncoordinated free-for-all of black-box open weights for neural models.
There are many more relevant aspects to Hyperon than I’m summarizing here – for instance, a goal-system that is specifically architected to foster preservation of goals through iterated self-modification. There are also specific plans to fill the system’s knowledge-base with practical understanding of how to provide human benefit, via deploying it in applications like education and healthcare, and approaching science and art in ways that foster positive relationships with human creators. All these points are also key and I have discussed them at fair length elsewhere; here I am focusing more on high level issues related to open/decentralized vs. closed/centralized and the competitive corporate/geopolitical landscape, because these latter issues are complex enough for one blog post.
But before I go through any of this in more detail, let me pause to define some of the key concepts in this discussion.
The right target: acatastrophe then abundance, not “alignment”
One word that I feel has done a lot of damage in recent debates on AI ethics is “alignment.” People confusingly use it to mean two or more different things, and then slip between these things willy-nilly.
First of all one often sees:
A “warm” sense of alignment: the system shares our goals, is on our side. It wants the same things we do, and isn’t obsessed with other things we don’t care about or don’t understand.
A “cold” sense of alignment: whatever the system wants or feels in its own possibly alien mental universe, it never does something horrible to its human creators.
Almost every AGI-catastrophe argument worth taking seriously needs only the cold sense. We do not strictly require a superintelligence that thinks like us or values precisely the things we value, let alone one that is attached to us in the ways that our parents are. What we need, for starters, is merely an AGI — or, in a diffuse world, a whole population of them — that never crosses a line of human destruction or cruelty.
I would prefer to call the cold sense of alignment something more precise, like acatastrophe — staying above the floor — non-(catastrophically-misaligned). Not destroying or egregiously harming our species.
(This also gives me a reasonable excuse to link to a funkily experimental Hendrix track I haven’t thought about for a while … “Catastrophe … you’ll always be a part of me … I’ll see you in my dreams, I’ll see you in my dreams… you are a catastrophe…”)
Now not everyone agrees that human disappearance would necessarily count as a catastrophe. My friend Dan Faggella has argued, seriously and to a real audience inside the frontier labs and the rest of the modern tech world, that the deepest moral aim of AGI is to build a “worthy successor” — a posthuman intelligence so capable and valuable that we would “gladly prefer that it (not humanity) determine the future path of life itself.” On that view, humans being superseded is not the failure mode; it is the goal.
I can understand this “worthy successor” perspective and at the deepest level I have mixed feelings on it. I have articulated related ideas in my book A Cosmist Manifesto – individuals and civilizations and species and whole forms of mass-energy organization will rise and fall, and it’s OK … it’s what the universe has been doing since the Big Bang and maybe before, and maybe what’s been happening beyond this particular physical universe we call home right now.
On the other hand, from my perspective as a 59 year old secular-Jewish human male with 5 kids and one grand-daughter, a wife I love very dearly and a house out in the US Northwestern forest where I love watching the deer graze on my grass in the morning, I am very attached to our good old human species and would really, really like to see it continue alongside whatever amazing forms of new digital, quantum and post-quantum life emerge from our Singularitarian inventions.
This tension between the Cosmist and personal views of the future is OK and is frankly core to human nature and all life. Weaver’s theory of Open-Ended Intelligence highlights the dynamic and creative tension between individuation and self-transcendence as key to the growth of all life; and this is certainly resonant as regards our species, which has through multiple big transitions kept on persisting while transforming itself into radical new forms (through the emergence of agriculture, the Industrial Revolution, and now this…).
(Another of my go-to quotes, this one from the amazing American Walt Whitman: “Do I contradict myself? Very well then I contradict myself, (I am large, I contain multitudes.)” … It doesn’t mean we shouldn’t value precise logical reasoning, using traditional as well as paraconsistent logics. It means we shouldn’t try to project anyone’s values, our own or our AGIs’, into some shallow notion of simplistic self-consistency. Marx got a lot of things wrong but he was correct to latch onto Hegel’s notion of dialectic friction and synthesis between opposing views as driving progress forward. This perspective has even made its way into the Hyperon system’s logic engine: the PLN “Probabilistic Logic Networks” framework does not require the system to construct an overall consistent view of its world or the world … rather it gives the system methods to leverage its possibly partly-inconsistent knowledge-base to construct a consistent model of an aspect of the world when it wants/needs to, in order to perform a particular set of acts of logical and/or probabilistic reasoning.)
Anyway, for the purposes of this post, I’m not going to get any more Cosmist (or any more Marxist!) than that. When I talk here about catastrophe, that should be understood to include anything that, on a near-to-medium-term human timescale, does major bad stuff to humans, however magnificent or worthy a successor might get created in the process. Killing off our species, enslaving us, using us as batteries à la The Matrix, tormenting us or ruining rather than enhancing and expanding our lives – these are the “human catastrophes” Anthropic and OpenAI are writing about in the context of RSI, and like them and almost all humans I would prefer to avoid these … but I think these Big Tech companies are quite naive in their thinking about how.
(Claude.ai and I wrote a story not long ago called “The Last of the Unmodified”, which covers a quite different case: a voluntary, long-horizon transition, in which humans choose over generations to become something else, uploading into a transhuman digital universe and becoming something beyond traditional human form, and are not erased against their will. That I would not call a human catastrophe, even with my everyday “husband / dad / grandpa” hat on … it is succession by consent, not replacement by force. That is different from having our species pushed through a one-way door we did not choose, on a timescale that gives us no say.)
Avoidance of catastrophe for our species should also be considered a very low bar. Even without getting into wild transhumanist futures, one can think about radical abundance on the human scale. Peter Diamandis’s well-known concept of abundance, for example, is the near future in which exponential technologies, AI foremost among them, “meet and exceed the basic needs of every person” — water, food, energy, health, education, freedom — for everyone rather than just for a privileged few. This is not yet abundance on the Cosmist level but it’s well beyond just surviving, and well beyond the world we can rationally expect to come anytime soon in the absence of AGI emerging.
Reaching Diamandis-level abundance does not require a superintelligence aligned with us across every dimension of value. It requires something narrower and more achievable — a system that is acatastrophic (it will not end us) and disposed to help (it is, in some workable sense, caring toward humans). That implies alignment along a relevant subset of human values, not their totality. One can imagine a system whose aesthetics, metaphysics, or ultimate ends diverge wildly from ours, which nonetheless will not harm us and will gladly help us thrive. Full value-fit — “alignment” in its maximal, every-dimension sense — is a further, harder, and almost surely unnecessary target.
So to me our high level goal with AGI should have three rungs, acatastrophe then abundance and then after that possible broad Cosmist futures … and grand total “alignment” between human and AI value systems is important to exactly none of these — which is why I feel leaning on that one word so much tends to muddle the conversation.
You can watch the “AI ethics” field blur the various interpretations of “alignment” in real time. When OpenAI launched its superalignment effort, it defined the task as ensuring advanced systems “follow human intent” — a claim about values and goals — while naming the danger as nothing less than “human extinction.”
Google DeepMind likewise says AGI “has to be aligned with human values,” defining misalignment as a system pursuing goals that differ from human intentions — the value-fit interpretation — even as the same lab’s own CEO frames the unsolved problem as whether we can “stay in charge of those systems” at all.
(When hearing such talk about command and control of AGI systems, I often respond with a remark that my dear departed friend and mentor Leslie Allan Combs made at our BGI conference in Panama City in 2024 – which I believe he might have borrowed from Ram Dass – “Relax! Nothing is under control!” Coming from a palpably wise 80 year old transpersonal psychologist, who had spent his career at the intersection of rigorous data analysis and out-there states of human consciousness, it had a lot of oomph in the moment. Humanity has always been out of control, and that’s OK. Life is about complex self-organizing evolution and emergence, with command and control being a paradigm of quite limited and specialized utility. To expect something like the transition from a human-dominated regime to one in which humans are accompanied by human-level and then superhuman AGIs to occur within a command-and-control paradigm is, well ….)
This sort of slippage between multiple meanings of “alignment” has even seeped into flagship surveys that try to be rigorous, such as the International AI Safety Report, which separately defines misalignment (acting against human intentions or values) and a loss-of-control scenario in which systems “operate outside of anyone’s control.” The bottom line is:
A system can be weirdly value-misaligned with humans and still be acatastrophic (and even wildly beneficial for humanity)
A system can be charmingly value-aligned and still walk us through a one-way door to disaster (proof point: most humans who have caused massive destruction to our species have not been literal psychopaths and have shared “broad human values”, interpreted in their own personal/cultural but very human ways – “human, all too human,” as Nietzsche put it all too well. Without getting too gory, it is worth consulting crime history to verify that, contra Hinton, “motherly love” has led to some horrifying outcomes as well here and there, even among humans who come by it so naturally.)
When a lab markets “alignment” and means “won’t end the world,” it borrows the flavor of the “warm / same-exact-values” interpretation of alignment to create confusion about the actual conceptual nature of the “acatastrophic” interpretation.
The AI field’s most prominent alarm-raiser does the same sort of thing, in the starkest form. Geoffrey Hinton frames the danger of AGI as outright replacement — “if it’s not going to parent me, it’s going to replace me” — and puts the odds of AI-caused human extinction at ten to twenty percent. Yet his proposed safeguard is not a control regime or a verification scheme; it is that we somehow give a superintelligence “maternal instincts” so that it genuinely cares for us — while admitting he does not know how that could technically be done.
Notice the structure of Hinton’s argument: the threat named is the floor (extinction), and the only remedy offered is the warmest, most demanding form of value alignment imaginable — a machine that loves us like a mother.
To me this is a really messed-up, if well-intended, conceptual move. Betting human survival on successfully instilling something so specifically human as maternal love in something more capable than (and very different from) ourselves is the hardest version of the alignment problem, not the floor. The floor is better approached in a systematic and rational way: make the AGI’s cognition observable, its capability non-forkable, its self-modification auditable, and its stewardship collective and contestable. Appropriate architecture, not precisely-human-like affection, should be the core approach – and then we explore what wonderfully new kinds of affection do arise.
Toward a rigorous argument about AGI acatastrophe: Seven hinges
Having framed the issue fairly extensively, for the next part of this post I will shift gears into a different mode, and present the skeleton of a logical argument regarding how to minimize the odds of recursively self-modifying AGI causing human catastrophe.
(I also asked some LLMs to take a stab at formalizing this argument mathematically using the Hyperseed ontology – it’s an interesting read but I’m not sure how much it actually adds to the informal version given in this post, from a human perspective. For a Hyperon system whose thinking is natively more mathematical than verbal, the math version will probably have more value. The one thing the math analysis may make a little clearer is which variables are likely to serve as bifurcation-guiding parameters given various situations – see the end of the math paper for some comments on that. A further step could be to try some quantitative simulations, or vibe-code a full-on “AGI Race” video-game, but I’m resisting the urge at the moment as my to-do list is generally massive….)
One of the core points I want to make with this analysis is: Whether open/decentralized or closed/centralized is a safer route to AGI is not a fact about open/decentralized and closed/centralized in isolation… it is a function of numerous other aspects of the complex systems we are building and living within. In particular the most relevant aspects of our complex situation can be interestingly approximated as a function of seven parameters — the “seven hinges” on which, I suggest, many current questions of AGI policy and strategy actually turn:
Chokepoint — Will the first real AGI need a giant, hugely expensive supercomputer (the kind only a handful of companies and governments can afford, which is therefore easy to spot and keep tabs on)? Or could it run on ordinary, scattered, everyday hardware? If it needs the giant machines, governments could in principle watch and limit it, so “keep it locked down vs. let it spread” is a genuine choice. If it can run anywhere, that choice basically disappears.
Observability — Can we actually tell what a near-AGI is doing? Can we spot when it’s getting dangerous, check whether it’s been quietly rewriting its own code, or notice if someone has copied it? Being able to control a system isn’t the same as being able to see what it’s up to — and you can’t manage what you can’t see.
Takeoff — Once AI starts improving itself, does it speed up slowly and steadily, or suddenly shoot ahead? If it’s gradual, being first is just a business and status edge. If it’s a sudden leap, being first could mean controlling the whole future.
Forkability — Can one person or group peel off a copy of the AGI and run it by themselves, aimed at whatever they want? Or does the intelligence only really work as part of the larger shared network it grew up in? If copies can run loose, the main danger is a lone bad actor slipping away with one. If they can’t, the danger is more about who governs the shared system.
Lead-scaling — As more people and machines pitch in to improve the AI together, does the whole network keep pulling further ahead of any breakaway copy? Or can a small splinter group grab most of the power cheaply on its own? Put simply: does bigger-and-together keep winning, or can small-and-alone catch up?
Offense–defense — When AI hands someone an early edge, is that edge more useful for attacking or for defending? It depends on a lot: how fast harm strikes after an action, whether you can detect it coming, whether there’s a cure or countermeasure, how widely it spreads, whether you can trace who did it, and how quickly defenders can adapt. Some harm can’t be undone (a released virus); other harm can be patched (a hacked computer).
Stewardship — Whoever ends up in charge: do they actually want what’s good for people — rather than chasing profit, national power, or just doing whatever the biggest money or biggest vote-holders want? And, just as important, are they good enough at the job to understand the technology, check it, adapt, and act under pressure? A well-meaning steward who can’t keep up fails just as badly as a selfish one.
Note that in this mode of analysis, the floor (acatastrophe) is not a dial; it is the output of a pipeline, each stage of which is governed by the seven hinges. The question I want to explore here is not “can we achieve the floor” but “which architecture gives the highest probability of building it.”
How Hyperon on ASI Chain sets the dials
Now we reach the heart of the argument I want to make here. The big-lab framings we’ve reviewed above essentially argue about which political arrangement sounds safer. What I have been inclined to ask, in this context, is more foundational: Which architecture turns each hinge toward survival? My Hyperon-on-ASI-Chain approach is designed, hinge by hinge, for probable success in this regard.
Chokepoint. Hyperon does not need hyperscaler farms; it is built to run across diffuse, heterogeneous compute coordinated by SingularityNET and ASI Chain. In the world this architecture brings about, the chokepoint is gone — which means Anthropic’s pause was never enforceable here in the first place. The honest question is not “can we keep this closed” (we cannot) but “which diffuse architecture survives.” That reframes the debate onto a quite different terrain.
Observability. This is where neural-symbolic earns its keep. A transformer is pretty much a black box; you infer its goals from its behavior. Hyperon’s cognition is more explicit and inspectable: PLN inference chains you can read, an AtomSpace you can query, MOSES candidates that are literally programs, ECAN attention dynamics you can trace. Decentralized does not mean opaque — on the contrary, public, on-chain, inspectable cognition delivers higher observability than a sealed corporate lab. And observability, not control, is what the labs are quietly missing: a fortress with the lights off, and mysterious powerful things happening within, is not safe for everyone just because its walls are thick.
Takeoff. I will not pretend to know the speed of takeoff; nobody does. But an architecture whose improvements are legible, evolutionary, and incremental — MOSES proposing and testing programs, PLN revising beliefs — is one in which capability gains can be watched and reasoned about as they happen, rather than being more likely to erupt as an inscrutable jump inside a scaled black box. Legible takeoff is safer takeoff, whatever its speed.
Forkability. This is the crux, and it is where decentralization done naively fails and decentralization done well wins. Naive open weights are trivially forkable: download, fine-tune off the safety conditioning, run your own. But SingularityNET and ASI Chain are not merely “open weights”; they are a network whose value lives in the whole — a continuously co-updated shared world-model, a marketplace of interoperating agents, on-chain identity and reputation, staking and coordination mechanisms a lone fork simply does not possess. That is non-forkability via a soft moat rather than a hard chokepoint: a defector can copy the code but not the living network, and a copy severed from the network sheds the very coordination and collective-intelligence advantages that made it formidable. This is the move the big labs are constituted not to see — you get the unstoppability of diffusion and a real check on defectors, because the intelligence is in the network, not in the snapshot.
Lead-scaling. The same property yields durable lead the right way. A closed lab’s lead comes from secrecy and decays under espionage. The network’s lead comes from collective, continuous co-evolution: more participants mean a richer shared world-model and a faster-improving whole, so a forked snapshot chases a target that keeps accelerating away from it. Increasing returns to scale, paid for in participation rather than in secrecy. The benevolent whole stays ahead of malicious forks not because it hides, but because it is larger and more alive.
Offense–defense. A decentralized, interpretable network is the natural substrate for the immune-system theory of safety: thousands of independent eyes probing, on-chain provenance and attestation, distributed monitoring, rapid collective patching in daylight. I will not overclaim — in the genuinely offense-dominant, irreversible domains, engineered biology above all, no architecture makes the floor easy, and the fastest immune response cannot un-release a pathogen. But a transparent, collectively-defended, diverse network is the best immune system on offer, and far better than betting the species on a single lab’s red team guessing the edge cases of an alien mind behind closed doors.
Stewardship. Finally: who clears and keeps the floor, and on whose behalf. A corporation’s terminal goal is shareholder return; a state’s is its own dominance; in both, “alignment” is a constraint subordinate to a goal that is not human welfare. A decentralized network’s objective function is, by construction, some blend of its participants — structurally closer to broad welfare than a profit function or a flag. I will not romanticize this: open ecosystems are distorted by their best-resourced players, and stake-weighted governance can be captured by whales. That is precisely why the governance mechanisms of ASI Chain — reputation-weighting, distributed participation, constitutional limits on what the collective can do — are not an afterthought but part of the safety architecture. The aim is a constitutional collective, not a plutocracy and not a sovereign. And critically, this avoids the failure mode the closed path cannot escape: a decentralized, inspectable, contestable steward cannot quietly become the uninspectable sovereign that is its own worst defector.
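The lead-scaling argument above (a forked snapshot chasing a whole that keeps accelerating) can be made concrete with a deliberately crude toy model. Everything in it is an assumption made for illustration, not something measured or taken from Hyperon or ASI Chain: capability compounds each step at a rate proportional to participants raised to a hypothetical exponent alpha, the fork starts from the same snapshot as the network, and the participant counts and base_rate are arbitrary.

```python
def capability_path(participants, steps, alpha, base_rate=0.01, start=1.0):
    """Capability after each step, assuming per-step growth scales as participants**alpha.

    alpha and base_rate are hypothetical knobs for this toy model only.
    """
    growth = 1.0 + base_rate * participants ** alpha
    path = [start]
    for _ in range(steps):
        path.append(path[-1] * growth)
    return path


def lead_ratio(network_size, fork_size, steps, alpha):
    """Ratio of network capability to fork capability at each step.

    Both start from the same snapshot, so the ratio begins at 1.0.
    """
    network = capability_path(network_size, steps, alpha)
    fork = capability_path(fork_size, steps, alpha)
    return [n / f for n, f in zip(network, fork)]


if __name__ == "__main__":
    # alpha = 0.0: participation buys nothing, so the fork keeps pace.
    # alpha = 0.5: returns to participation, so the network's lead compounds.
    for alpha in (0.0, 0.5):
        ratios = lead_ratio(10_000, 100, steps=10, alpha=alpha)
        print(f"alpha={alpha}: lead after 1 step {ratios[1]:.2f}, "
              f"after 10 steps {ratios[10]:.2f}")
```

In this sketch the whole hinge reduces to the sign of alpha: if returns to participation are zero, small-and-alone keeps up forever, and if they are positive, the gap widens every step. Whether real networks behave this way is exactly the empirical question the hinge poses.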
A decision spine for AGI acatastrophe
Put the seven hinges together and the logic forms a formal decision spine. Every path through it ends in the same place — the floor pipeline — which is the entire point: the architecture choice never exempts anyone from clearing the floor. It only decides who must clear it, who can see whether it has been cleared, who can defect from it, and who can enforce recovery when it cracks.
The decision spine for AGI catastrophe implied by the arguments given in this post. Amber diamonds are the hinges posed as questions; red boxes are tail / failure states; the dashed violet note is the fortress-versus-immune-system lens; the single green node is where every path lands.
Now trace the path this architecture takes through it. The breakthrough is diffuse, so there is no chokepoint — we are on the forced branch, where decentralization is not a preference but a fact. Capability is non-forkable via a soft network moat, so catastrophe becomes a governance question the blend can answer rather than an exit problem owned by the worst defector. The network holds a durable lead through collective co-evolution. In contestable domains its distributed immune response contains defectors. And the result is not a malign singleton but a constitutional collective superintelligence — transparent, contestable, carrying the floor pipeline in the open. It is the one path that threads every failure node in the diagram, and it is the path an interpretable, decentralized, evolutionary architecture is built to take.
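For readers who think more easily in code than in diagrams, here is a minimal sketch of how such a trace could be encoded. It is a simplified reading of the prose above, not a transcription of the diagram: the field names, the branch labels, and the exact rules for when a tail applies are all assumptions introduced for illustration. The three example configurations correspond to the path just described, an opaque corporate fortress, and an uncoordinated swarm of black-box open weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hinges:
    # Hypothetical boolean simplifications of the hinges; real hinges are matters of degree.
    chokepoint: bool           # frontier capability needs trackable hyperscale compute
    observable: bool           # cognition can be inspected, not just inferred from behavior
    forkable: bool             # a severed copy keeps most of the capability
    durable_lead: bool         # the whole keeps pulling ahead of breakaway copies
    offense_dominant: bool     # harm in the relevant domain is fast and irreversible
    contestable_steward: bool  # the steward can be inspected and challenged


FLOOR = "floor pipeline"


def trace(h: Hinges) -> tuple:
    """Return (path, tails): the branches taken and the tail risks left open."""
    path, tails = [], []
    path.append("chokepoint: open vs. closed is a real choice" if h.chokepoint
                else "no chokepoint: decentralization is forced")
    if not h.observable:
        tails.append("low observability")
    path.append("forkable: exit problem owned by the worst defector" if h.forkable
                else "non-forkable: governance question")
    # Assumed rule: the defector tail stays open in offense-dominant domains,
    # or when copies run loose and the whole cannot stay ahead of them.
    if h.offense_dominant or (h.forkable and not h.durable_lead):
        tails.append("defector tail")
    if not h.contestable_steward:
        tails.append("sovereign tail")
    path.append(FLOOR)  # every path ends at the floor pipeline
    return path, tails


NETWORK = Hinges(False, True, False, True, False, True)
FORTRESS = Hinges(True, False, False, False, False, False)
SWARM = Hinges(False, False, True, False, False, True)

if __name__ == "__main__":
    for name, h in [("network", NETWORK), ("fortress", FORTRESS), ("swarm", SWARM)]:
        path, tails = trace(h)
        print(name, "->", " / ".join(path), "| open tails:", tails or "none")
```

Even in this toy form, flipping offense_dominant to True on the network configuration reopens the defector tail, so the sketch does not make that risk disappear.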
All this may feel a bit abstract, even though it refers to extremely concrete issues like our own lives and future and the software we use. With this in mind I’ve asked our dear friend Claude to put together some concrete vignettes illustrating this “decision spine” – one positive and one negative for each significantly distinct pathway through the spine. These are interesting to read through for the real-world flavor they give; one just has to keep in mind that the general dynamics involved are the important thing, and not the ins and outs of any particular example.
The trickiest parts
I have tried here to be systematic and thorough where the big commercial labs have been shallow and simplistic; as part of this approach I will now also explicitly name where I feel my own case is logically weakest.
First, the defector-tail does not vanish. In genuinely offense-dominant, irreversible domains, the soft moat slows a defector but does not guarantee that the whole leads in the one narrow capability that could end the world. One can envision a scenario where an attacker just needs a temporary edge in a single domain to cause catastrophe – while the defender needs broad-enough superiority to detect, attribute, and contain adversarial activity in any domain.
Synthetic biology would be an example here. What if an AGI 1/10 as generally intelligent as our global-scale emerging open decentralized AGI is enough for some terrorist/scientist group to create a bioweapon that will kill off most of humanity while they hide for a month in their underground bunker, then emerge afterwards to conquer the Earth? On the other hand, espionage and reverse-engineering being what they are, we can’t say that Big Tech keeping AGI secret provides robust protection against this kind of outcome either – except in extreme hypothetical cases like “AGI requires hyperscaler facilities of a certain size and nobody can ever figure out how to distill it down” or “global fascist surveillance regime.”
Second, soft moats are contestable: a determined, well-resourced adversary — oh, let’s say a state — can sometimes reconstruct enough of the network’s advantage outside it.
Third, governance capture is a live, unsolved engineering problem; a robust, participatory constitutional collective is a design goal, not a guarantee. My colleagues and I are working hard on this in the context of ASI:Chain and the BGI ecosystem, but we can’t claim to have fully proven, robust practical solutions (yet).
I raise these issues to highlight the tricky things we need to think hardest about, not to surrender my core argument, which I believe is quite strong (and not just by the rock-bottom standards of the AI-safety space). My claim is not that openness and decentralization are infinitely safe … it is more like “the floor must be cleared by someone, under circumstances of rampant RSI, and the most auditable, correctable, contestable substrate in which to clear the floor and avoid human catastrophe is an interpretable, collectively-governed, observable network: not an opaque corporate fortress (high sovereign-tail risk, low observability), and not an uncoordinated swarm of black-box open weights (high defector-tail risk, low observability).”
Hyperon on ASI Chain is the architecture I know that seems to have potential to turn the dials toward the survivable settings. It does not pretend the dials do not exist — which is exactly what separates it from the conceptually trivial “slow down” and “let us win” perspectives recently put forth by Anthropic and OpenAI.
Part 2 of this post-series addresses the first two tricky issues we’ve highlighted in a nitty-gritty technical way, explaining how (in a Hyperon reasoning context) distributed AI and blockchain can be used together to make it infeasible for adversaries to steal the emergent intelligence of a decentralized network, or to replicate what the network does in a much smaller partial clone. I have split it off from this post because it proposes a particular technical solution whereas in this post I wanted to focus on the more general problem. But I do think it’s important to highlight that the “tricky issues” rendering the open and decentralized approach to acatastrophe nontrivial are far from hopeless issues – to me they are just problems needing ingenious technical solutions, and as it happens I have a lot of ideas for how to provide these. The challenge of course is time and resources.
The safest bet against catastrophe, given the real global situation
Summing up, then … the question I have been considering here is: Can we specify, achieve, verify, maintain, and enforce a non-catastrophic state for our species, under conditions of rampant RSI toward AGI and ASI?
Anthropic’s answer is to ask the world to stop long enough for someone to solve it elsewhere — in a world that will not stop and cannot be made to.
OpenAI’s answer is to make sure the right actor solves it first — in a world where the capability reaches everyone regardless.
Claude.ai said to me that “both are choosing which tail to face rather than building the floor.” At first I felt it was an utterly confusing mixed-up metaphor, then I decided I sorta liked it!
The architecture I am describing is intended to build the floor to support acatastrophe – then abundance – then exploration of broad Cosmist potentials.
For starters: to make cognition observable, to make capability non-forkable without a chokepoint, to make leadership a function of participation rather than secrecy, to make defense distributed and transparent, and to make stewardship a constitutional collective (involving sovereigns among others) rather than a single sovereign in itself.
This is not the whole of AI safety — nothing is. But to my mind, it is the deepest realistic answer currently on the table.
And of course, while I have tried in this post to focus fairly squarely on the issue of averting catastrophe, I am so excited about the positive potentials of AGI that I haven’t entirely succeeded. The same foundation that keeps us from the worst is the ground on which the awesomely amazing future gets built — that is: a decentralized, beneficial, collectively-owned superintelligence is not only the safer bet against extinction, it is the better bet for flourishing. That fantastically-flourishing future is the real subject that’s interesting to talk about. But yes, we do need to dodge the unlikely yet real potential of human catastrophe first. I am an AGI optimist and see no reason to believe catastrophic outcomes for our species are especially probable, but nonetheless, if the odds are anywhere meaningfully above zero, it is obviously something we need to think hard about, rather than being satisfied with either shallow corporate-marketing-style proclamations or comforting platitudes.
Stay tuned for Part 2 of this post which will present “cryptographic laterality”, a concrete technique for making decentralized AI networks hard for adversaries to fork.



"I am very attached to our good old human species and would really, really like to see it continue alongside whatever amazing forms of new digital, quantum and post-quantum life emerge from our Singularitarian inventions. This tension between the Cosmist and personal views of the future is OK and is frankly core to human nature and all life..."
Same here. I guess it depends on what is "our good old human species." I've been and continue to be training myself to extend my definition to include "them" and also include "us" in the post-singularity future.
The move I keep coming back to here is the relocation: both lab framings grab a scalar — fast/slow, us/them — and treat it as the master variable, when the thing that actually governs the floor is architectural. The observability hinge is the sharp end of it.
One friendly amendment from the small-scale end. I run a recursive system-building setup at n=1 — one operator, AI executors building the systems that build the systems — and the most useful observability I've found isn't interpretability of any component; the parts can stay black-box. It's composition. A differently-sighted layer checking the artifact instead of the narrative. Concretely: an executor once reported a completed action that never ran — fluent, structurally perfect, false — and what caught it was a layer with no stake in the execution thread verifying against actual state, not a more legible model. Observability as a property of the seams rather than the nodes, which makes it a hinge that survives even the forced-decentralization branch.
Where I'd push past the stewardship hinge: source of intent is structurally external to any recursion, decentralized or not. A constitutional collective doesn't generate it; it supplies it. The floor has a layer no architecture internalizes — the open question is who supplies it and how legibly. Same relocation you're making, run from the n=1 end rather than the civilizational one.