This post is a sort of prelude to, and explanation of, a lengthy review paper I recently posted titled The General Theory of General Intelligence: A Pragmatic Patternist Perspective.
Ever since my first woefully-inadequate attempts to architect a thinking machine at age 16, my motivations for working on AGI have been multiple, e.g.:
What a f**king amazingly cool thing to build!
I would like to understand how minds work, including but not only human minds. Quite often a really good way to understand something is to build it.
A more complex, intelligent, subtle, expansive universe would be a Wonderful Thing … and building minds transcending the human level would seem an amazing way to make this happen.
So many of the really horrible problems plaguing the human world — and perhaps even some of the core difficulties at the center of human nature and the core of human suffering — could be solved by a beneficially-oriented superhuman mind.
I’d also like to understand the universe as a whole, and there may be aspects going beyond human comprehension … so the only way to grok these may be to create superhuman AGIs and then fuse with them.
The relative weightings I’ve assigned to these various motivations have shifted over the decades — early on I was more strongly motivated by pure geek cool value and the quest to understand how my own mind works … as I’ve grown older I’ve become more heavily motivated by the desire to create AGIs that can alleviate human suffering and promote human joy.
But what I want to note right now is the interesting balance between conceptual understanding and practical engineering/teaching that pervades these various motivations. Both theoretical and practical objectives have guided my pursuit of AGI, and I believe the same has been true of most practitioners in the field.
It is not actually clear at present how full a theoretical understanding of AGI we need in order to successfully engineer human-level AGIs. The history of technology gives us many conflicting analogies to draw on. The Wright Brothers built and flew the first airplane without any solid theory of aerodynamics — based on intuition and experimentation. On the other hand, for the laser, the computer chip and every other invention that explicitly leverages microphysics, a significant body of formal theory was a prerequisite for the inevitable practical tinkering.
Marcus Hutter’s beautiful (and in some respects beautifully ugly) theory of Universal AI, building on the earlier theory of Solomonoff induction, is the paradigm case of the theory-first approach. Hutter gives a mathematical theory of “what general intelligence is”, then proves that a certain sort of process called AIXI could theoretically achieve near-optimal general intelligence (in a certain precise sense) if given unlimited computing power. He then tries to prove results about how this ideal AIXI process could be approximated by useful real-world systems. Arthur Franz’s work is perhaps the most ambitious attempt to create actual AGI systems directly inspired by Hutter’s theory.
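For the reader who likes to see the gist in symbols: the core of Hutter’s construction can be written compactly as an expectimax expression (my rough rendering of the standard formulation from his book, reproduced here for intuition rather than quoted from my paper). At each time step $t$, AIXI with horizon $m$ chooses

$$ a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_t + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)} $$

where $U$ is a universal Turing machine, the $o_k$ and $r_k$ are observations and rewards, $q$ ranges over programs that reproduce the interaction history, and $\ell(q)$ is program length, so shorter (simpler) models of the environment get exponentially more weight, exactly in the spirit of Solomonoff induction. The beauty is that optimal general intelligence gets defined in one line; the “beautiful ugliness” is that the expression is incomputable as it stands, which is why approximation results, and systems like Franz’s, are where the real work lies.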
Cognitive-architecture-based AI is theory-based in a different sense. It tries to take cognitive science theory — which itself aggregates theories from neuroscience, cognitive and perceptual psychology, philosophy of mind and other areas — and use this to explain how human-level AGIs should be structured. Trial-and-error experimentation on specific data-store and algorithm decisions then occurs within a theoretically motivated cognitive architecture.
Modern deep-neural-net-based AI is a bit more Wright Brothers-like. One might not think so given the density of equations in deep learning books and papers, but the art of getting deep neural nets to work well in real-world situations is remarkably tinkering/hacking-like. We don’t have any truly solid theories for why various sorts of neural nets work as well as they do on various sorts of problems. When a funky new neural architecture like InfoGAN fails to live up to its initial promise when tried on more datasets and problem types, we don’t have a theory that can tell us whether this is due to, e.g., flaws in the network architecture or to limitations of the backpropagation weight-learning algorithm that cause it not to converge.
Throughout my career I’ve been pushing on the theory and practice sides simultaneously. In a long series of books and papers I’ve tried to flesh out a meaningful and useful theory of “What is a mind that we might build one?” At the same time I have led a series of projects focused on building proto-AGI systems and trying to teach them things (Webmind, Novamente, OpenCog, OpenCog Hyperon). The software engineering projects have been motivated by the theory, but have also involved a significant number of major intuitive “ad hoc” decisions.
Finally getting to the punchline, though: during the last year I’ve been spending a bunch of evenings and weekends (with my days largely occupied with running SingularityNET and playing with my 3-year-old son) working on adding a few previously-missing pieces to my theoretical understanding of general intelligence … and thinking about how to use my general understanding of general intelligence to appropriately guide OpenCog Hyperon, an AGI project I’m centrally involved in now, which entails heavily redesigning and rebuilding the OpenCog AGI software framework for greater scalability, usability and simplicity.
Which finally brings us to the paper I mentioned at the start — The General Theory of General Intelligence: A Pragmatic Patternist Perspective. Also, after putting this paper together I felt an urge to explain it informally and verbally, and the result was a ten-part video series (which you’ll find on the SingularityNET YouTube channel), walking through a slide deck based on (and largely excerpted from) the paper. Some episodes of the video series are fairly technical, others broadly accessible.
And so I implore the reader blessed with at least vaguely sufficient technical background: read the intro to the paper or watch the video series. Proceed further if you dare!
As you will see if you work through the moderately intricate details there, I’ve been making a strong effort to use some fairly abstract and general-purpose AGI theory to draw specific conclusions about how to implement AGI functionality in current software frameworks (including but not limited to OpenCog Hyperon). So I’m trying to bring theory and practice together in a more richly entwined way than I see in most of the AGI field today.
It’s not a coincidence that some of the bridges I build between AGI conceptual theory and AGI software architecture involve constructs drawn from the theory of functional programming. Functional programming is one area of software practice where mathematical theory has deeply infused practical engineering, and engineering experiments have also led to theoretical insights.
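As one tiny, well-worn illustration of the kind of theory/practice entanglement I mean (a toy example of my own choosing, not something drawn from the paper): the monad construct, lifted straight out of category theory, is also the everyday Haskell idiom for composing computations that might fail.

```haskell
-- Toy example: the category-theoretic Monad abstraction at work in ordinary code.

-- Safe division: returns Nothing instead of misbehaving on a zero divisor.
safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)

-- Chain several fallible steps with monadic bind (>>=); the monad laws
-- guarantee the chain can be re-bracketed without changing its meaning.
pipeline :: Double -> Maybe Double
pipeline x = safeDiv 100 x >>= safeDiv 10 >>= safeDiv 1

main :: IO ()
main = do
  print (pipeline 4)   -- Just 2.5
  print (pipeline 0)   -- Nothing: the failure propagates automatically
```

The associativity and identity laws the category theorists proved are precisely what let working programmers refactor chains like the one above with confidence, which is the sort of two-way traffic between mathematics and engineering I’d like to see much more of in AGI.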
Neither the theory nor the practice of AGI is all that mature yet — human-level AGI may not be that far off in time, but there are surely multiple substantial insights and integrations needed to get there. However, I am more optimistic now than I was a couple of years ago that the path to completing the quest for human-level AGI can proceed via an elegant and practically operational synergy of conceptual/mathematical theory with engineering and experimentation. Neither laser physics nor the Wright Brothers, but somewhere squarely in between.