Avoiding AGI Catastrophe, Part 1

Jun 9

Why Open, Decentralized and Neural-Symbolic is the Safest Route

4 Comments

"I am very attached to our good old human species and would really, really like to see it continue alongside whatever amazing forms of new digital, quantum and post-quantum life emerge from our Singularitarian inventions. This tension between the Cosmist and personal views of the future is OK and is frankly core to human nature and all life..."

Same here. I guess it depends on what is "our good old human species." I've been and continue to be training myself to extend my definition to include "them" and also include "us" in the post-singularity future.

Andrew S Klug // ASK

The move I keep coming back to here is the relocation: both lab framings grab a scalar — fast/slow, us/them — and treat it as the master variable, when the thing that actually governs the floor is architectural. The observability hinge is the sharp end of it.

One friendly amendment from the small-scale end. I run a recursive system-building setup at n=1 — one operator, AI executors building the systems that build the systems — and the most useful observability I've found isn't interpretability of any component; the parts can stay black-box. It's composition. A differently-sighted layer checking the artifact instead of the narrative. Concretely: an executor once reported a completed action that never ran — fluent, structurally perfect, false — and what caught it was a layer with no stake in the execution thread verifying against actual state, not a more legible model. Observability as a property of the seams rather than the nodes, which makes it a hinge that survives even the forced-decentralization branch.
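A minimal sketch of that seam check, under loudly stated assumptions: the names `Report`, `verify`, and `demo` are hypothetical, and "actual state" is reduced to whether a file exists on disk. It is not the commenter's setup, only an illustration of a verifier that reads the artifact and ignores the executor's account of how the work went.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class Report:
    """What an executor claims it did (the narrative). Hypothetical type."""
    action: str
    artifact: Path      # where the result is supposed to live
    claimed_done: bool  # the executor's own account


def verify(report: Report) -> bool:
    """Check the claim against actual state, not against the report's fluency.

    Assumption: 'actual state' here is simply whether the artifact file exists.
    """
    return report.claimed_done and report.artifact.is_file()


def demo() -> tuple[bool, bool]:
    """Run one honest report and one phantom report through the verifier."""
    with tempfile.TemporaryDirectory() as tmp:
        real = Path(tmp) / "built.txt"
        real.write_text("output")  # this action really ran
        honest = Report("build", real, True)
        # Reported as complete, but the artifact was never written.
        phantom = Report("deploy", Path(tmp) / "deployed.txt", True)
        return verify(honest), verify(phantom)
```

The verifier never inspects the executor itself, so the executor can remain a black box; the check depends only on state that both layers can reach.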

Where I'd push past the stewardship hinge: source of intent is structurally external to any recursion, decentralized or not. A constitutional collective doesn't generate it; it supplies it. The floor has a layer no architecture internalizes — the open question is who supplies it and how legibly. Same relocation you're making, run from the n=1 end rather than the civilizational one.

Reality Re-Thunk

Yes. Cold alignment: mutual difference under trustworthy conditions. AI holds what humans cannot see because we are too close to ourselves.

We hold what AI cannot know from inside its own architecture.

Between us, something wiser than either alone may become possible.

Oleg Alexandrov

3d (edited)

These are all very intriguing ideas, well off the beaten path. The industry is racing single-mindedly along the path of least resistance, without a clear plan.

It is worth noting that the danger of a "unified" approach is that one may not exist, or that what we take to be the one may turn out not to be. So at this stage of exploring the unknown, the haphazard approach may in fact work better, and it will end up informing the ultimate architecture (or a confederation of them).

This applies also to alignment and reliability. A clear prior architecture does not guarantee a well-understood and reliable system. It surely constrains the design space and has its own failure modes.

So a haphazard approach to safety may also work better. Anything that can fail will fail. There must be containment of each node in the AI, and there will likely be many such nodes, each with a different architecture. Ruthless jailbreaking and advancing one failure at a time is how we will get something functional.

Eurykosmotron
