Folks have been passing around to me the paper Toward Guaranteed Safe AI by Davidad and crew, where the crew is a long list including famous folks like Yoshua Bengio, Joshua Tenenbaum, Max Tegmark, Stuart Russell and more, along with Steve Omohundro, who recently had a related paper on provably safe AI with Tegmark (see his video in SingularityNET’s Beneficial General Intelligence series).
This is a somewhat subtle one for me to respond to, but I’ll give it a shot!
On the one hand: The technical work proposed in the paper would be a step in the direction of something I’ve been advocating for a long time … the fusion of symbolic and subsymbolic AI for:
intelligence: improved intelligence beyond what can likely be done with too-purely subsymbolic approaches
ethics: improved ethics due to the ability to explicitly articulate ethical guidelines, and explicitly associate beliefs and actions with ethical principles
transparency: the greater ease of achieving transparency with symbolic as opposed to subsymbolic representations.
efficiency: improved efficiency, because modern CPUs are rather well suited to symbolic processing.
On the other hand, there are a few slightly frustrating things about this generally nice article:
Like many recent works veering toward AGI, it presents some fairly old ideas in moderately new language and gives the impression of significantly greater novelty than is actually there
It downplays the rather extreme practical difficulty of enacting the programme outlined in any context coming anywhere close to AGI
It seems designed with a not-that-well-hidden “ulterior motive” of advocating for heavy AI regulation that restricts AGI R&D to tightly controlled corporate and government labs until “provably safe AGI” technology matures
To explicate how these points tie together: Basically, the authors propose that AIs should come with formal, mathematically demonstrated safety guarantees (based on assumed premises generally considered reasonable) … but while they don’t emphasize this, at present there is clearly no feasible way to provide such guarantees except for quite narrow AI systems (and even in these cases it’s very hard). So then, is one supposed to conclude from their proposal that we should halt progress on AGI R&D until we have figured out how to give early-stage AGI systems formal safety guarantees in practice? The authors don’t quite say this, but the paper seems ideally configured to be interpreted this way … and it’s not hard to find this theme made more explicit in other recent works by many of the same authors.
It is in this sense that I see provably safe AI as a potentially quite dangerous idea. Yes, research on provably safe AI is going to be valuable and important — even though we will never really get there to a full extent. However, holding out provably safe AI as a prerequisite for ambitious AGI R&D seems to have the likely practical consequence of leading to tight control of all AGI R&D occurring outside a few sanctioned corporate and government labs.
Most powerful things with great benefit also have great dangers associated with them — so when I say provably safe AI is potentially a dangerous idea I don’t mean it’s bad. I just mean we need to understand the risks of focusing on this interesting and important concept too extensively in the current sensitive AI/political context.
Technical Challenges with Provably Safe AI
The basic idea of the Toward Guaranteed Safe AI paper, explicitly and proximally, is that an AI system should be described in terms of a formal (mathematical/logical) world-model, plus a formal spec of what the system is trying to do in the world, and a spec of how the system is architected. The spec of desired system behavior should include ethical as well as other functional aspects. There should then be a formal mathematical proof that the system will in fact fulfill its specs, assuming the world model is correct.
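To make the shape of this concrete, here is a minimal toy sketch of my own (not from the paper, and nothing like real verification tooling): a trivially small, discretized world-model of room temperature, a hypothetical thermostat controller, a spec (“the temperature stays within bounds”), and an exhaustive check standing in for a formal proof. Every assumption here (the disturbance model, the bounds, the horizon) is invented purely for illustration; the point is that this kind of guarantee is tractable only because the system and the world-model are tiny.

```python
# Toy illustration (my own sketch, not from the paper): exhaustively checking
# a safety spec against a trivially small, discretized world-model. This only
# works because the state space is tiny; nothing like it scales to an
# AGI-relevant world-model.
from itertools import product

DISTURBANCES = (-1, 0, 1)   # assumed world-model: ambient drift of at most 1 degree per step

def controller(temp):
    """Hypothetical heater/cooler policy: push the temperature toward 20."""
    if temp < 18:
        return +2
    if temp > 22:
        return -2
    return 0

def step(temp, disturbance):
    return temp + controller(temp) + disturbance

def verify_safety(lo=15, hi=25, horizon=5):
    """Spec: starting anywhere in [lo, hi], the temperature stays in [lo, hi]
    for every possible disturbance sequence up to the given horizon."""
    for start in range(lo, hi + 1):
        for seq in product(DISTURBANCES, repeat=horizon):
            temp = start
            for d in seq:
                temp = step(temp, d)
                if not (lo <= temp <= hi):
                    return False
    return True

print(verify_safety())  # True: a "guarantee" relative to this toy world-model
```

Replace the three-valued disturbance model with anything like the real world, or the thermostat with anything like an AGI, and both the world-model and the proof search blow up, which is exactly the gap the difficulties discussed below revolve around.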
This approach can’t ever give ultimate fundamental guarantees of safety, of course, because even if one places absolute faith in mathematics, there are no ultimate guarantees that the world-models provided will be correct. In fact we know we don’t understand the universe fully on a scientific level, e.g. we don’t have a thorough Grand Unified Physics Theory … and we also know from basic philosophical considerations that all our world-knowledge is fundamentally incomplete and uncertain. A superintelligence might be able to easily poke holes in our attempts to formally describe various aspects of our world, and thus identify important reasons why our formal proofs of safety don’t apply in real life, only in our simplistically constructed formal simulacra of reality.
In spite of these fundamental limitations, I think the technical approach outlined is a valuable one. Yes, we should try our best to formalize our world models and AI system goals and architectures and see what we and AI theorem-provers can learn from this! Yes, this approach has potential to have some genuine value for AI ethics.
I recall seeing this sort of approach proposed for handling lethal autonomous weapons systems back in the early aughts. I don’t have time to dig up all the references now, but I’ll point out this 2007 paper Machine Ethics: Creating an Ethical Intelligent Agent, which covers much the same ground as the recent paper by davidad et al, though with closer attention to different aspects. Similarly to the davidad paper, this 2007 paper by Michael and Susan Leigh Anderson advocates starting with highly focused narrow AI systems, e.g. systems offering medical advice. (There have been numerous papers and commentaries on similar themes in the formal reasoning community over the years; someone with more free time than me can dig up the refs…)
The major technical difficulties involved in applying this sort of methodology to anything verging remotely toward AGI are, basically:
Lack of decent formalizations of important aspects of our everyday world (before one even gets to the lack of a coherent formalization of all of modern physics)
Current automated theorem proving technology doesn’t deal well with tasks possessing the sort of complexity one finds in problems like “prove this complex dynamical system will fulfill this spec in the context of this complex world as approximately described by this model.”
The currently most powerful AI techniques for many practical problems — such as LLMs and other DNNs — are purely subsymbolic and nobody knows how to explain what they do in formal, symbolic terms. I don’t think these techniques are well suited to serve as the central component of an architecture achieving true human-level AGI, but I do suspect they are well suited to serve as significant components of human-level AGI systems with more flexible and abstraction-friendly mechanisms at the core.
The authors of Toward Guaranteed Safe AI get all these issues, and they get that they’re difficult to solve. I think they downplay the difficulty a bit, but they do acknowledge it. They suggest that more research should be put into these topics: formalizing the world, making more capable automated theorem-provers, creating formal characterizations of what DNNs do. All these are important research areas and I totally agree they should get more attention, e.g. as compared to the currently vastly larger literature exploring minor tweaks on LLMs.
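To give a concrete flavor of what “formal characterizations of what DNNs do” currently looks like in practice, here is a minimal sketch of my own (a generic interval bound propagation toy, not taken from the paper or from any particular verification tool). Propagating an input box through a small random ReLU network yields sound but over-approximate output bounds, and the looseness of such bounds tends to grow rapidly with depth and width, which is one concrete face of the third difficulty listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

def interval_forward(lower, upper, weights, biases):
    """Propagate an input box [lower, upper] through ReLU layers using
    interval arithmetic: the resulting bounds are sound, but increasingly
    loose as depth grows."""
    for W, b in zip(weights, biases):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        lower, upper = np.maximum(new_lower, 0), np.maximum(new_upper, 0)
    return lower, upper

# Random 4-layer toy network on a 2-D input box of width 0.2
weights = [rng.standard_normal((8, 2))] + [rng.standard_normal((8, 8)) for _ in range(3)]
biases = [np.zeros(8) for _ in range(4)]

lo, hi = interval_forward(np.array([-0.1, -0.1]), np.array([0.1, 0.1]), weights, biases)
print(hi - lo)  # width of the certified output box for that small input box
```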
Toward Corporate Controlled AGI
To understand the not-so-hidden agenda obviously serving as part of the motive for Toward Guaranteed Safe AI (at least on the part of some of the many co-authors), though, let’s turn to another recent paper with a similarly long and overlapping author list: Managing AI Risks in an Era of Rapid Progress.
The crux of this political tract is that, yes, we should direct a much higher percentage of AI research effort to safety-related work including work on provably safe AI … and that government should ALSO tightly regulate any AI work that has serious potential to lead to advancements in the direction of AGI.
Overfitting to the specific nature of recent progress with LLMs, the Managing AI Risks paper suggests it may be good enough to tightly regulate the training of AI models on huge compute facilities. This would likely have the effect of regulatory capture of the AI world by a few large companies, working with government agencies even more closely than they already do, making sure nobody else is allowed to train large AI models due to perceived safety risks. (Personally I suspect this would work about as well as restriction of biology research to “safe” government labs, which of course is what brought us COVID-19 courtesy of US/Chinese government collaboration. Hmm, what happens when the NSF and China decide to collaborate on “AGI gain of function” research for all of our safety? But I digress….)
If more symbolic or otherwise abstraction-oriented approaches to AGI start to gain more practical traction — which is what I think is going to happen over the next few years — then regulating AI training on big compute facilities wouldn’t be enough, of course, and to fulfill the mandate of the paper it would become necessary to get significantly more fascistic across numerous aspects of society.
We are verging here toward the darker and more autocratic aspects of Nick Bostrom’s vision in his 2014 book Superintelligence, where he hints that the only safe way to proceed may be to have a very small number of elite AGI researchers working secretly on superintelligence in government-sanctioned and protected labs. The difference is that we now have a more corporate flavor of the vision: these government-sanctioned and protected labs are now to be in a handful of large corporations with multibillion-dollar compute facilities. (See my lengthy retort to Bostrom’s book, from way back when, here.)
Bengio’s blog post, AI Scientists: Safe and Useful AI?, gets more explicit than the Managing AI Risks paper, suggesting
… a policy banning powerful autonomous AI systems that can act in the world (“executives” or “experimentalists” rather than “pure scientists”) unless proven safe. Another option, discussed below is to use the AI Scientist to make other AI systems safe, by predicting the probability of harm that could result from an action. However, such solutions would still leave open the political problem of coordinating people, organizations and countries to stick to such guidelines for safe and useful AI. The good news is that current efforts to introduce AI regulation (such as the proposed bills in Canada and the EU, but see action in the US as well) are steps in the right direction.
I should add that I know Yoshua Bengio a little — he spoke at one of the annual AGI conferences I organize some years ago — and I consider him a good-hearted and deep-thinking individual as well as an outstanding researcher. (And many of the other co-authors of these problematic screeds I’m reviewing here are also fantastic people who, I have no doubt, are speaking in substantial part out of genuine concern.) Anyway, though, this post from Bengio has a few issues.
Bengio’s idea of a “pure AI scientist” is very similar to what Eliezer Yudkowsky and others in the SIAI/MIRI world (whose perspective I have discussed extensively in a 2010 blog post and a later interview with then-MIRI director Luke Muehlhauser: part 1, part 2) used to refer to as an “Oracle AI”. Their analyses of this scenario a decade or two ago clearly made the point that a superintelligence acting purely as an oracle could pretty straightforwardly manipulate human society toward its ends. There are informal but reasonable narratives leading to the conclusion that in some senses oracle AIs might be safer for humanity than executive AIs, but it’s not the knock-down argument some might initially believe it to be. Heavily regulating executive AIs while permitting oracle AIs is not a simple solution, and seems likely to quickly verge toward heavily regulating oracle AIs as well — once it’s noted that the answers oracle AIs give have decisive impacts on human actions.
(Bengio’s suggestion in his post to resolve these issues via “epistemic humility” — AIs that don’t give answers they think will lead to damaging results, and have enough self-knowledge to realize when they can genuinely say this with confidence — is clearly just a vague gesture and doesn’t work around the core issues. Whether one’s AIs are working with first or higher order probability distributions, the same core issues of the relationship between oracle superintelligences and relatively ignorant humans exist.)
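To make concrete what an “epistemically humble” oracle might even look like mechanically, here is a crude caricature of my own, emphatically not Bengio’s actual proposal: an oracle that keeps a second-order (Beta) posterior over the probability that answers of a given kind cause harm, and abstains when it cannot confidently bound that probability. The sketch also makes the limitation plain: the harm evidence, the tolerance and the confidence threshold are all still supplied by fallible humans, so the core oracle-versus-human issues remain untouched.

```python
# A crude caricature (my construction, not Bengio's) of an "epistemically
# humble" oracle: it keeps a Beta posterior over the probability that answers
# like this one cause harm, and abstains unless it is confident that this
# probability is below a tolerance. The harm evidence, the tolerance and the
# confidence threshold are all still supplied by fallible humans.
from scipy.stats import beta

def humble_oracle(answer, harmful_cases, benign_cases,
                  max_harm=0.1, min_confidence=0.9):
    # Second-order uncertainty: a Beta(1 + harmful, 1 + benign) posterior over
    # the unknown probability that giving this kind of answer causes harm.
    posterior = beta(1 + harmful_cases, 1 + benign_cases)
    confidence_safe = posterior.cdf(max_harm)  # P(harm probability < tolerance)
    if confidence_safe >= min_confidence:
        return answer
    return "ABSTAIN: cannot confidently bound the harm of answering"

print(humble_oracle("Design X is stable.", harmful_cases=0, benign_cases=50))
print(humble_oracle("Design Y is stable.", harmful_cases=3, benign_cases=20))
```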
Bengio’s evident faith in governments to effectively coordinate AI developers and product owners to adhere to agreed, reasonable safety guidelines is touching but rather unrealistic, in my view. This is not a situation like nuclear weapons or genetically engineered pathogens, where one is dealing with technologies whose primary purpose is destructive. Advanced AI, including AI verging toward AGI with executive functions, has tremendous and obvious potential to do good for the world, fulfill various of people’s wants as well as needs, and make companies of all sizes lots of money. Heavily regulating this sort of thing — when it’s evolving super-rapidly month by month — is a quite different kettle of fish from anything our governments have ever dealt with.
But now we’re in the general domain of AI ethics, AI regulation, and so forth. I have argued extensively that the most reasonable path toward a beneficial future is not heavy-handed control of AI by megacorporations working closely with major governments, but rather democratic and decentralized guidance of the process of creating, deploying and teaching increasingly advanced AI and AGI systems. See my book The Consciousness Explosion for an in-depth presentation of this perspective — or countless talks I’ve given, available on SingularityNET’s YouTube channel.
My main point here is that discussions of “Provably Safe AI”, in the current AI/AGI political landscape, seem to often mix genuine interest in provably safe AI with use of the concept as a lead-in to arguments for heavy AI regulation and corporate control. The (sometimes explicit, sometimes unspoken) vibe seems to be: “The only allowable AI should be provably safe AI, so even though that’s extremely difficult and maybe not feasible, until we have it we should restrict anyone but big companies working closely with government agencies from moving toward AGI…. Then in the far-off future when we have provable safety solved, others can work on AGI too….”
A Disturbing Convergence
One disturbing convergence we see here is an increasing alignment between the following three groups:
AI researchers who are worried about AGI progressing in a bad direction, so until we have some sort of provably safe AI they want to keep AGI R&D as locked-down as possible
Megacorporations who want to keep AGI R&D in their own labs to avoid business competition from others who may have better ideas, and/or others who may roll out AGI systems that are less exploitative and gain more favor because of this
Governments who want to keep AGI R&D for themselves and away from their perceived enemies
The first group, even when coming from a sincere place, seems to largely be serving at the moment to advance the agendas of the latter two groups — agendas that I strongly feel are not in the best interest of our species or its hopeful transhuman mind children.
My favorite quote from the BGI-24 Beneficial General Intelligence conference my colleagues and I held in Panama City a few months ago was from my chaos-psychology mentor Leslie Allan Combs:
“Don’t worry, nothing is under control!”
Please let us keep it that way.
To phrase Allan’s point more elaborately and boringly: what we need is mostly creative co-evolution, not hierarchical control. Hierarchical control has its role to play, but it shouldn’t be the dominant factor in most contexts, and definitely not in the context of co-creating new species of minds. It’s not that control is always bad; rather, the overall paradigm guiding radical beneficial progress cannot be control. At the top level, it’s not control but creative evolution that got our species where it is today, and that will get us through the Singularity to a next, even more amazing phase.
If our current scientific understanding of the universe is remotely correct, we can never have fully provably safe AGI with power at the human level or beyond. Working to create AGIs about which we can prove various theorems establishing various important properties under various assumptions seems a quite useful research direction — and one with a long history before most of the authors of these recent papers ever started thinking about such issues. The infeasible notion of a fully provably safe AI should not be used as an excuse or justification to put into place hierarchical corporate or government control over AGI development until the fantastic future when fully provably safe AGI is achieved.
Democratic, decentralized, open AGI — free of corporate control and overly heavy government regulation — is the best way to work toward advanced AGI systems that can help us figure out how to solve the hard problems involved with formally proving interesting properties of advanced synthetic minds.
The idea that you can "prove" *anything* that isn't purely mathematical is simply wrong. The world isn't governed by axioms, unlike constructed mathematics. Try proving that an implemented NAND gate in deep submicron (say 5nm) technology will *always* work in the presence of cosmic rays! I note that multi-core processors are already often used in single-core mode in safety-critical defence applications, because of difficulties in proving cache-accessing correctness.
Not only that, but stopping research until provable AI exists simply won't work: there are thousands of researchers all over the world working on different aspects of AI, and some have ideas about novel techniques that don't need trailer parks full of computers to perform the training (quite apart from those working in countries that would completely ignore such a ban). It's the application of AI that needs regulation, but frankly I think it's impossible. Already AI (and not terribly good AI at that) is used for targeting weapons in wars, and I can't see that stopping anytime soon.
In my opinion, the key technical challenge is the gap between modeling/understanding and predictability in an open, complex, adaptive system such as our world. The "complex" part (sensitivity to initial conditions), which we mathematically understand the best, is only the first instantiation of this; in an open system you also have sensitivity to *boundary* conditions throughout the system's lifetime (i.e., arguably you need to know everything about the system's surrounding environment to provide any guarantees). And for an adaptive system, it's even worse: you need a model of the adaptation mechanism, one which will let you predict its future configuration changes and other "decisions" (or even "creations") throughout the future, even as it keeps getting bombarded by arbitrarily novel signals from the environment! This is Stuart Kauffman's "adjacent possible".
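To make just the first (and easiest) of these concrete, here is a textbook numerical illustration of sensitivity to initial conditions using the logistic map (generic chaos-theory material, not tied to any AI system): two trajectories starting a billionth apart become completely decorrelated within a few dozen iterations, so any finite-precision model loses predictive power over that horizon. The open and adaptive parts of the problem are strictly worse than this.

```python
# Textbook chaos illustration: two logistic-map trajectories that start
# 1e-9 apart diverge to order-1 differences within a few dozen steps.
def logistic(x, r=3.9):
    return r * x * (1.0 - x)

x, y = 0.2, 0.2 + 1e-9
for step in range(60):
    x, y = logistic(x), logistic(y)
    if step % 10 == 9:
        print(f"step {step + 1:2d}: |x - y| = {abs(x - y):.3e}")
```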
Stuart argues (https://pubmed.ncbi.nlm.nih.gov/37065266/) that this entails that law-based modeling and prediction are simply not valid modes of thought in an open CAS. I counter-argue (https://www.sciencedirect.com/science/article/pii/S1571064523001847) that you *can* do modeling and prediction, if you have a meta-model of agents and modeling that allows for continuous contingency on the current context. The theory of Bayesian mechanics driven by the Free Energy Principle (https://royalsocietypublishing.org/doi/pdf/10.1098/rsfs.2022.0029), and related recent developments such as the theory of natural induction (https://www.biorxiv.org/content/10.1101/2024.02.28.582499v1.full), are concrete scaffolds for such higher-order models.
*However*, this doesn't rescue the "provable safety" idea as posed: no ab initio, context-independent proofs are possible in this setting, not even probabilistic ones. To rescue the idea, one would have to reframe it as continuously recalculating safety margins, reevaluating acceptable risks (including risks of model error) as contexts evolve, and emphasizing decision engineering (contingent robustness against uncertainty, including model error) as opposed to formal guarantees. This is a lot closer to how actual practitioners think about risk -- see, for instance, the extensive writings by Taleb (https://www.researchgate.net/publication/272305236_Silent_Risk_Lectures_on_Fat_Tails_AntiFragility_and_Asymmetric_Exposures).
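As a minimal, hypothetical sketch of what I mean by continuously recalculated safety margins (the names, numbers and risk model are invented purely for illustration, not taken from any deployed system):

```python
# Hypothetical sketch: a runtime monitor that re-estimates risk every time the
# context changes, inflates it by its distrust of the model itself, and shrinks
# the allowed action envelope accordingly -- decision engineering rather than a
# one-time formal guarantee.
from dataclasses import dataclass

@dataclass
class Context:
    estimated_risk: float   # model's current estimate of harm probability
    model_surprise: float   # how badly recent observations fit the model (0..1)

def safety_margin(ctx, risk_budget=0.05):
    """Fraction of the nominal action envelope currently allowed."""
    effective_risk = ctx.estimated_risk * (1.0 + 4.0 * ctx.model_surprise)
    return max(0.0, 1.0 - effective_risk / risk_budget)

print(safety_margin(Context(estimated_risk=0.01, model_surprise=0.05)))  # wide envelope
print(safety_margin(Context(estimated_risk=0.01, model_surprise=0.9)))   # clamp down hard
```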
BTW, I've been making this case in meetings and emails with davidad and Steve O for a few months now. Also BTW, I lead an effort, the Gaia Network (https://forum.effectivealtruism.org/posts/BaoA3gz7xRaqn764J/gaia-network-an-illustrated-primer), which is explicitly an alternative approach to AI safety that acknowledges the above limitations and hence focuses on evidence-based robustness instead of formal proof, on context-aware, decentralized modeling vs ab initio "fundamental models", and on incremental, decentralized adoption instead of top-down control. We are actively developing it and looking for contributors! If you want to learn more, I'm giving a talk at VAISU this Friday (https://vaisu.ai/) and an in-depth session on June 13 (https://lu.ma/qn8p4wp4).
Looking forward to chatting more!