7 Comments

The idea that you can "prove" *anything* that isn't purely mathematical is simply wrong. Unlike constructed mathematics, the world isn't governed by axioms. Try proving that an implemented NAND gate in a deep-submicron (say 5 nm) technology will *always* work in the presence of cosmic rays! I note that multi-core processors are already often used in single-core mode in safety-critical defence applications, because of the difficulty of proving the correctness of cache accesses.
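A rough back-of-envelope sketch of that point, using hypothetical soft-error numbers rather than real 5 nm data: with any nonzero upset rate, the probability of at least one upset over a device's lifetime is strictly positive, so "always works" is simply not a provable claim about physical hardware.

```python
import math

# Hypothetical numbers, purely illustrative.
# FIT = failures in time = expected upsets per 10^9 device-hours.

def prob_at_least_one_upset(fit_rate: float, device_hours: float) -> float:
    """Probability of at least one single-event upset, assuming a Poisson process."""
    expected_upsets = fit_rate * device_hours / 1e9
    return 1.0 - math.exp(-expected_upsets)

# Example: an assumed 100 FIT part operating for ~10 years (~87,600 hours).
print(prob_at_least_one_upset(fit_rate=100.0, device_hours=87_600))
# ~0.0087 -- small, but strictly greater than zero, so "always" cannot be proven.
```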

Not only that, but stopping research until provable AI exists simply won't work: there are thousands of researchers all over the world working on different aspects of AI, and some have ideas for novel techniques that don't need trailer parks full of computers to perform the training (quite apart from those working in countries that would completely ignore such a ban). It's the application of AI that needs regulation, but frankly I think that's impossible. AI (and not terribly good AI at that) is already used for targeting weapons in wars, and I can't see that stopping any time soon.


In my opinion, the key technical challenge is the gap between modeling/understanding and predictability in an open, complex, adaptive system such as our world. The "complex" part (sensitivity to initial conditions), which we understand best mathematically, is only the first instantiation of this; in an open system you also have sensitivity to *boundary* conditions throughout the system's lifetime (i.e., arguably you need to know everything about the system's surrounding environment to provide any guarantees). And for an adaptive system it's even worse: you need a model of the adaptation mechanism that lets you predict its future configuration changes and other "decisions" (or even "creations"), even as it keeps getting bombarded by arbitrarily novel signals from the environment. This is Stuart Kauffman's "adjacent possible".
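A minimal, self-contained illustration of just the "complex" part -- sensitivity to initial conditions -- using the logistic map (nothing AI-specific here, it just shows how quickly prediction error blows up even in a fully known, deterministic system):

```python
import numpy as np

# Logistic map x_{n+1} = r * x_n * (1 - x_n) in its chaotic regime (r = 4).
# Two trajectories starting 1e-12 apart diverge to O(1) differences quickly.

def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 60) -> np.ndarray:
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return np.array(xs)

a = logistic_trajectory(0.300000000000)
b = logistic_trajectory(0.300000000001)   # perturbed by 1e-12
print(np.abs(a - b)[::10])                # the gap grows by orders of magnitude every few steps
```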

Stuart argues (https://pubmed.ncbi.nlm.nih.gov/37065266/) that this entails that law-based modeling and prediction are simply not valid modes of thought in an open CAS. I counter-argue (https://www.sciencedirect.com/science/article/pii/S1571064523001847) that you *can* do modeling and prediction, if you have a meta-model of agents and modeling that allows for continuous contingency on the current context. The theory of Bayesian mechanics driven by the Free Energy Principle (https://royalsocietypublishing.org/doi/pdf/10.1098/rsfs.2022.0029), and related recent developments such as the theory of natural induction (https://www.biorxiv.org/content/10.1101/2024.02.28.582499v1.full), are concrete scaffolds for such higher-order models.
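For concreteness, here is a toy sketch (mine, not the cited authors' code) of the variational free energy those frameworks build on, F(q) = E_q[ln q(s) - ln p(o, s)]; minimizing F over beliefs q pulls q toward the posterior while bounding surprise:

```python
import numpy as np

# Toy discrete generative model: two hidden states, two observations.
def free_energy(q: np.ndarray, prior: np.ndarray, likelihood: np.ndarray, obs: int) -> float:
    """F(q) = sum_s q(s) * (ln q(s) - ln p(o, s)) for the observed o."""
    joint = prior * likelihood[:, obs]              # p(o, s)
    return float(np.sum(q * (np.log(q) - np.log(joint))))

prior = np.array([0.5, 0.5])                        # p(s)
likelihood = np.array([[0.9, 0.1],                  # p(o | s = 0)
                       [0.2, 0.8]])                 # p(o | s = 1)
obs = 0                                             # we observe o = 0

posterior = prior * likelihood[:, obs]
posterior /= posterior.sum()                        # exact p(s | o), for comparison

for q0 in (0.5, 0.7, posterior[0]):                 # candidate beliefs q(s)
    q = np.array([q0, 1.0 - q0])
    print(q0, free_energy(q, prior, likelihood, obs))
# F is lowest when q equals the true posterior, where F = -ln p(o) (the surprise).
```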

*However*, this doesn't rescue the "provable safety" idea as posed: no ab initio, context-independent proofs are possible in this setting, not even probabilistic ones. To rescue the idea, one would have to reframe it as continuously recalculating safety margins, re-evaluating acceptable risks (including the risk of model error) as contexts evolve, and emphasizing decision engineering (contingent robustness against uncertainty, including model error) rather than formal guarantees. This is much closer to how actual practitioners think about risk -- see, for instance, the extensive writings by Taleb (https://www.researchgate.net/publication/272305236_Silent_Risk_Lectures_on_Fat_Tails_AntiFragility_and_Asymmetric_Exposures).
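A toy sketch of what "continuously recalculating safety margins" could look like in practice (all numbers hypothetical): a Beta-Binomial update of a failure-rate belief per context window, with the acceptable-risk check re-derived from an upper credible bound after each batch of evidence rather than proven once and for all:

```python
from scipy.stats import beta

# Hypothetical evidence stream: (failures, trials) observed in each context window.
evidence = [(0, 500), (2, 500), (0, 500)]

alpha, beta_param = 1.0, 99.0                              # prior belief: failures are rare
for failures, trials in evidence:
    alpha += failures
    beta_param += trials - failures
    upper_bound = beta.ppf(0.99, alpha, beta_param)        # 99% upper credible bound on failure rate
    acceptable = upper_bound < 1e-2                        # example risk tolerance
    print(f"p_fail <= {upper_bound:.4f} (99% credible); within tolerance: {acceptable}")
```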

BTW, I've been making this case in meetings and emails with davidad and Steve O for a few months now. Also BTW, I lead an effort, the Gaia Network (https://forum.effectivealtruism.org/posts/BaoA3gz7xRaqn764J/gaia-network-an-illustrated-primer), which is explicitly an alternative approach to AI safety that acknowledges the above limitations and hence focuses on evidence-based robustness instead of formal proof, on context-aware, decentralized modeling instead of ab initio "fundamental models", and on incremental, decentralized adoption instead of top-down control. We are actively developing it and looking for contributors! If you want to learn more, I'm giving a talk at VAISU this Friday (https://vaisu.ai/) and an in-depth session on June 13 (https://lu.ma/qn8p4wp4).

Looking forward to chatting more!


I think the best example is Tesla's Summon feature. The idea is that the car drives you to the train station in the morning, and as your train nears the station that evening, you summon the car to come and pick you up as you arrive. A danger of this tech, likely NOT covered by maths or red teams, is the very real chance of the car quietly starting up and backing over the family pet. In the real world, versus what the car can "see" when summoned, there is a lot of missing information that even a system designed to test the system would not be aware of (and thus even a pass would be a fail in the real world).


Thanks for bringing this to light, Ben...


An admittedly rather lazy comment, because I'm too tired to be more intelligent: "Recently Ugg generated fire. This is a frightening development! Fire burns. True, it can warm us. But, be real. Fire burns! It is bad. Ugg says we should make use of fire to warm us. Ugg is irresponsible. We should form a committee of our 20-year-old elders to meet in the cave to explore why fire is dangerous and should be stopped and why any responsible citizen of the cave should stop any further experimentation with this deadly existential threat to caveman life. At least, let's enforce a pause. Anyone who violates the pause should be hit to death with a really sharp rock. The wisest of us -- who know that any change is bad -- can then put in place means of stopping further foolish ideas like using fire. Someone also mentioned "iron". That must also be stopped."


I agree that Provably Safe AGI is Potentially a Very Dangerous Concept, and frankly I think it is impossible.

"Advanced AI, including AI verging toward AGI with executive functions, has tremendous and obvious potential to both do good for the world, fulfill various of peoples’ wants instead of needs,"

Almost 20 years ago I was reading Eliezer's SL4 list. In response to the concept of a friendly AI, I wrote "The Clinic Seed", in which an extremely friendly medical AI, combined with human wants and needs, caused the human race to go biologically extinct.

I don't care how "Provably Safe" an AGI is; the combination of useful AIs and humans is somewhere between unpredictable and provably unsafe.

Keith


It's hard to buck the trend toward what modern AI has become -- a trickster's art form. I expected it to have more (any) cognitive structure, e.g., some kind of transparent or at least partially transparent world model.
