Can All Human Concepts Be Reduced to Combinations of a Few Primitives?
The notion of reducing all human concepts to combinations of a few "semantic primitives" (or a few simple classes of primitive concepts) is one of those things that's extremely appealing at first and then gets thornier and thornier as one digs in further.
But it’s a conceptual direction I’ve been thinking of plunging back into, in the context of the push to rebuild the CogPrime cognitive architecture atop the new OpenCog Hyperon infrastructure. CogPrime doesn’t require reduction of concepts to a small primitive-set — but if such a reduction could be carried out sensibly, it would have a lot of advantages in terms of streamlining CogPrime design, engineering and teaching.
So I thought I’d share here some of the related thoughts that have been swirling around in my head. Many of these have emerged in conversation with Zar Goertzel, whom for better or worse I recently nudged to take a look at David Chalmers’ related book Constructing the World, which I’ll mention a bit below.
Anna Wierzbicka's Natural Semantic Metalanguage is where I first encountered the notion of semantic primitives. However, the few dozen primitives she proposes are, to my mind, evidently not quite adequate in themselves as foundations for human intelligent world-building:
Wierzbicka is biased on the side of minimalism in her primitive-set. On the other hand, formal ontologies like Cyc and SUMO introduce a massive number of primitive concepts, in a way that gives the feeling that as the knowledge bases are extended, the number of primitives required will grow toward infinity all too rapidly.
In 2013 or so, in an unfinished rough-draft manuscript called "Composition of Mind and Reality", I attempted to create a middle ground where 50 or so primitives were used within an abstract functional/logical language to generate a wide set of concepts in OpenCog's native format.
Just to give a rough flavor of this work, here is a random selection from that manuscript: an attempted rough formalization of the notion of a “perceptual hierarchy” in terms of various other “simpler” concepts.
I don’t stand by this crude initial stab at a formalization of the perceptual hierarchy concept, and share it here only to give a rough flavor for what a reduction-to-primitives in a quasi-Atomese framework might look like.
David Chalmers' massive analytical-philosophy tome Constructing the World constitutes a different sort of contribution in this direction -- an in-depth argument that a reduction of all human concepts to a modest set of primitives is possible -- but it doesn't try to actually carry out such a reduction.
I have lately been thinking about resurrecting the "Composition of Mind and Reality" project in the context of the MeTTa language we are creating for OpenCog Hyperon, and using MeTTa to create a semantic-primitives-based seed ontology for the Minecraft world and the SophiaVerse metaverse that we will be using for experimenting with embodied learning in Hyperon.
And so — toward that end I've been musing a bit in a general way about the nature of semantic primitives... and some of that musing is summarized here...
What Does “Reduction to Primitives” Mean?
Firstly, what does "reduction to primitives" really mean?
One can answer this mathematically, and I pretty much do so in the combinational computational model I outline in sec. 3 of my paper on formalizing Occam’s Razor and simplicity measures. This is how I would formalize "reduction to primitives" at a first pass -- although I also think the formalism there could be abstracted much further in category-theory terms (I just didn't have need to do that in this paper).
Basically the formalism there sets up a bunch of atomic entities that can act on each other to produce new entities. It posits a bunch of combinatory operations specifying what “act on” means. Given costs for the entities and operations, one can measure when a reduction of one entity to a recursive combination of others comprises a simplification, aka a “reduction.” This kind of formalism is an extension of the “magician system” formalization I introduced in Chaotic Logic in 1995, and encompasses a variety of other formalisms like algorithmic information theory and pattern theory. It is a way of looking at general computation as a sort of “algorithmic chemistry.”
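To make the "act on" idea concrete, here is a minimal Python sketch of the cost-based notion of reduction described above. The entities, cost values and the single combination operation are entirely hypothetical illustrations, not the formalism from the paper:

```python
# A minimal sketch of the "algorithmic chemistry" idea: atomic entities
# with costs, a combination operation, and a test for when expressing an
# entity as a combination of others counts as a simplification
# ("reduction"). All names and cost values are hypothetical.

atomic_cost = {"move": 1.0, "see": 1.0, "thing": 1.0, "navigate": 5.0}
apply_cost = 0.5  # cost charged for each combination operation

def expression_cost(expr):
    """Cost of an expression tree: a primitive's atomic cost, or the
    summed cost of its sub-expressions plus the operation cost."""
    if isinstance(expr, str):
        return atomic_cost[expr]
    return apply_cost + sum(expression_cost(e) for e in expr)

def is_reduction(entity, expr):
    """An expression is a reduction of an entity if it is cheaper than
    treating the entity as an unanalyzed atom."""
    return expression_cost(expr) < atomic_cost[entity]

# "navigate" decomposed as a combination of simpler primitives:
decomposition = ("move", "see", "thing")
print(is_reduction("navigate", decomposition))  # True: 0.5 + 3.0 < 5.0
```

The same cost-comparison logic extends to recursive combinations of combinations, which is where the algorithmic-chemistry flavor comes from.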
It seems to me that if we take this sort of formalism as a basis, we find that "concepts reduce to combinations of primitives" is basically the same as Bateson's MetaPattern, "That it is pattern which connects.” -- patterns basically being combination operations that, from some perspective, provide a simplified view of the result of the combination...
The approach becomes more interesting, of course, if a small percentage of the mind's contents consists of primitives and a large percentage consists of combinations. Which brings us back to Wierzbicka -- her intuition is that a very small number of primitives is enough to combinatorially generate all common human concepts. But this remains a bit unclear… and as I’ll note below, I am more driven to interpret semantic-primitivity in a Probably Approximately Correct setting.
David Chalmers' "Constructing The World"
Rudolf Carnap and others in the early 20th century tried to formalize commonsense ideas by coming up with precise definitions for every commonsense concept. This is pretty much what Cyc and SUMO are attempting as well. It’s equivalent to “reduction to semantic primitives” in the sense that logical formulations consist of reductions of derived entities to logical combinations of logical atoms. But the core failure mode of this pursuit, so far, has been that clever wonks can find exceptions to any formalization proposed for any commonsense concept.
For instance, if you define knowledge as “justified true belief”, someone can find examples of beliefs that are true and justified but don’t commonsensically seem to be “known” (e.g. maybe the justification someone has for their belief is incorrect for reasons that only become clear much later, but the belief is still true all along). Trying to come up with more and more refined formal definitions of “knowledge” that will cover all the weird counterexamples anyone can come up with becomes an endless and pointless intellectual exercise.
Philosopher David Chalmers, in his impressive tome Constructing the World, interrogates what these wonks are doing when they come up with these exceptions -- they are appealing to some "a priori" understanding (scrutability) of the concept C. So then he tries to figure out what this sort of a priori scrutability means, basically considering it as inferential scrutability contingent on an assumed knowledge base. One way to think about this is: A commonsense concept is not typically summarized by a particular finite formalization. Rather, it's summarized by a poset of increasingly complex and detailed formalizations (a poset because one formalization can obviously generalize another). These more and more detailed formalizations can be arrived at inferentially from the assumed knowledge base.
The definition of knowledge then does not consist of any one formalization like “justified true belief.” Rather it consists of a family of increasingly complicated and refined formalizations — which never stop ramifying in their intricacy. But this whole huge family of more and more detailed formalizations is still a way of formalizing the definition of “knowledge.” It’s just that the closer you want to get to a totally precise definition, the larger and more messy a formal definition you need.
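As a toy illustration of a concept being a ramifying family of formalizations rather than a single definition, here is a hypothetical Python sketch of two successive refinements of "knowledge" -- the belief fields and the anti-Gettier condition are made-up simplifications, just to show the shape of the refinement series:

```python
# A concept as a family of increasingly refined formalizations.
# The belief record's fields and the Gettier-style condition are
# hypothetical simplifications for illustration only.

def knowledge_v1(b):
    # "justified true belief"
    return b["believed"] and b["true"] and b["justified"]

def knowledge_v2(b):
    # v1 plus a crude anti-Gettier refinement: the justification must
    # not rest on a false intermediate step
    return knowledge_v1(b) and not b["justification_rests_on_falsehood"]

# A Gettier-style case: believed, true, justified -- but the
# justification goes through a falsehood, so the refined definition
# rejects it while the cruder one accepts it
gettier_case = {"believed": True, "true": True, "justified": True,
                "justification_rests_on_falsehood": True}
print(knowledge_v1(gettier_case))  # True
print(knowledge_v2(gettier_case))  # False
```

One can imagine knowledge_v3, knowledge_v4 and so on handling ever-weirder counterexamples, without the series ever terminating in a final exception-free definition.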
A consequence of this perspective is that natural, commonsense concepts C are in some sense "open ended." Any one mind's specific conception of any one concept at any one point in time, may encapsulate only an approximation of that concept (within the overall, potentially infinite poset).
If we look at a series of more and more refined versions of a concept C -- say, C1 < C2 < C3 < C4 < ... then a given mind at a given time may have C15 (or whatever)... but the notion seems to be that once the mind gets to C16, it will feel like C16 is a natural continuation/refinement of C15.
This seems much like the idea of a person as possessing a persistent self-identity that exists even as the person ongoingly changes and evolves. For instance, one can posit a series of ever-expanding (iteratively self-transcended) Bens -- Ben1 < Ben2 < Ben3 < ... -- where Ben15 cannot conceive Ben16, but once Ben16 emerges, he clearly feels like a natural continuation of Ben15...
Of course the series Ben1 < ... < Ben999 or C1 < ... < C9999 or whatever may be comprehended by some supermind that is big enough to contain everything in the series and grok their interrelationships.
So a reduction to primitives like the one I was attempting in Composition of Mind and Reality gives some approximations to a variety of concepts, say C1_1, C2_1, etc. Then an AGI system fed these initial versions as a seed ontology will generate further approximations in the series, presumably overlapping a lot but not completely with the approximations generated by human minds...
Meaning of Meaning
Brief aside: In formally exploring these ideas, Chalmers considers the meaning of a proposition as the set of thoughts the proposition spurs in some agent's mind... which basically is the same as the "meaning as a fuzzy set of patterns in a mind" notion I’ve articulated in various writings ...
Formal semantics boffins may be interested that I've integrated this definition of meaning with possible-worlds semantics -- e.g. in the work on formalizing PLN quantifiers as third-order probabilities, summarized in the good old Probabilistic Logic Networks book, in ways going beyond what Chalmers has (so far) attempted.... But anyway, this way of conceiving meanings as fuzzy sets of mind-patterns lets one formalize partial orderings like I've indicated above...
A fuzzy set of patterns is basically a property-set (we manifest this idea in the PLN implementation now, it's what underlies e.g. our kernel-PCA vector embeddings of Atoms based on their intensions). So for those deep enough in the OpenCoggiverse, it’s easy to see why the sort of reduction Chalmers indicates is "intensional" in a PLN sense...
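To illustrate the property-set idea, here is a small sketch treating two concepts as fuzzy property-sets and computing an intensional overlap between them. The property names, degrees and the fuzzy-Jaccard measure are all illustrative choices of mine, not the actual PLN implementation or its kernel-PCA embeddings:

```python
# "Meaning as a fuzzy set of patterns": each concept is a fuzzy
# property-set (property -> degree of membership), and intensional
# similarity is overlap of those sets. All values are made up.

raven = {"black": 0.9, "flies": 0.9, "bird": 1.0, "talks": 0.1}
crow  = {"black": 0.95, "flies": 0.9, "bird": 1.0}

def intensional_similarity(a, b):
    """Fuzzy Jaccard overlap: sum of min degrees over sum of max
    degrees, across the union of properties."""
    props = set(a) | set(b)
    num = sum(min(a.get(p, 0.0), b.get(p, 0.0)) for p in props)
    den = sum(max(a.get(p, 0.0), b.get(p, 0.0)) for p in props)
    return num / den

print(round(intensional_similarity(raven, crow), 3))  # 0.949
```

Partial orderings of the kind indicated above then fall out naturally: one fuzzy property-set refines another when it extends and sharpens it.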
Communicational Foundations of Human Semantic Primitives
The "Embodied Communication Prior" I described in Engineering General Intelligence (and an earlier paper) can be interpreted as a hypothesis about what semantic primitives would need to be for human-like intelligence (i.e. for any intelligence that is centrally concerned with controlling a body in a spatiotemporal world, and communicating via this world with other similarly embodied entities).
ECP is basically a more precisely defined version of what Yoshua Bengio formulated roughly and years later as a “Consciousness Prior.” ECP looks at a community of agents controlling bodies in a shared world, communicating via various modalities such as
Linguistic communication, in a language whose semantics is largely (not necessarily wholly) interpretable based on mapping linguistic utterances into finite combinations of entities drawn from a finite vocabulary
Indicative communication, in which e.g. one agent points to some part of the world or delimits some interval of time, and another agent is able to interpret the meaning
Demonstrative communication, in which an agent carries out a set of actions in the world, and the other agent is able to imitate these actions, or instruct another agent as to how to imitate these actions
Depictive communication, in which an agent creates some sort of (visual, auditory, etc.) construction to show another agent, with a goal of causing the other agent to experience phenomena similar to what they would experience upon experiencing some particular entity in the shared environment
Intentional communication, in which an agent explicitly communicates to another agent what its goal is in a certain situation
These sorts of communication are argued to correspond closely to the types of memory in the human mind, and also to the types of learning humans are good at carrying out. In terms of probabilistic modeling, where human intelligence is concerned, we are generally working with a prior distribution in which processes associated with these forms of communication are relatively high probability. This basically means we are working with system components with which it’s fairly cheap and easy to build systems carrying out these forms of communication.
In semantic primitives terms, this suggests that human cognition is probably best decomposed into a vocabulary of primitives where there are primitive concepts corresponding to each of these communication modes. Wierzbicka’s minimalist primitive-set does seem to touch the ECP bases, as one would expect.
I didn’t feel like going there when I wrote up ECP before, but if we add psi to the ECP (1-1 telepathy and Global Consciousness type synchronicity) -- let's call it ECP++ -- then we would get primitives for spiritual/cosmic phenomenological experience as well.
So one point I want to get across here is: A systematic way to derive a set of primitives for human-like AGI might be to start with the ECP++ and the multiple modes of communication it involves, and look at what primitives are needed to support them.
This is something I will keep in mind when rebooting the Composition of Mind and Reality effort — which I may do in the context of Hyperon agents playing Minecraft as I’ll describe a little later in this post…
Primitivity, Physics, Phenomenology, Relativity
One might think that the easiest way to reduce concepts to primitives would be to use physics. Doesn't fundamental physics try to reduce everything to combinations of particles and their properties?
However, from a fundamental phenomenological/philosophical view, taking physical reality as primary is not satisfactory. From a psychological or metaphorical view, after all, physical-world models are best viewed as built up via inference from lower-level sense-data plus various a priori assumptions.
So one also needs a flavor more like what "Buddhist Logic" attempts, building up to physical and psychological realities from inner and outer sense-data plus various background assumptions.
Chalmers doesn’t try to articulate a specific list of primitive concepts but he does articulate that the primitive set should encompass what he calls PQTI: Physics, Qualia, Indexical and That’s All. “A set containing all microphysical and macrophysical truths, phenomenal truths, indexical truths and a “that’s-all” truth.”
Phenomenal truths here are things like what red looks like or what raindrops on the skin feel like. Indexical truths are pointing to things in a shared reality, e.g. “right now” or “that tree over there.” The “that’s all” truth is the most dubious sort, basically an Occam’s Razor sort of assumption indicating that what one can build from PQI is all there is.
So Chalmers’ argument is that if we take some primitives about macro and micro physics, some primitives about subjective experience and some indexical communicative primitives about our shared life and environment — then we can build up all human knowledge out of these. ECP is basically in the same direction, and each of the ECP communication modes as commonly described combines physics, subjectivity and indexicality in various ways.
When one digs into the details, one also starts to think it may be a bit arbitrary exactly which concept-sets one takes as primitive, much like it’s a bit arbitrary which set of vectors one takes as a basis for a vector space. Is the sense of “red” primitive or derived? That may depend on whether you’re talking relative to a blind or a sighted person. Whether “magenta” is primitive or derived may depend on the culture a sighted person comes from. How about the sense of wetness of a liquid, or the hardness of an object?
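The basis analogy can be made concrete in miniature; the color vectors below are hypothetical stand-ins for points in some quality space, chosen only to show that the same space supports different primitive sets:

```python
# The same 2-D "quality space" spanned by different primitive sets.
# Whether "magenta" is primitive or derived depends on which basis you
# picked. The vectors are hypothetical stand-ins.

red, blue = (1.0, 0.0), (0.0, 1.0)   # basis A: red and blue primitive
magenta = (0.5, 0.5)                 # derived in basis A

# Basis B takes red and magenta as primitive; now blue is derived:
# blue = 2*magenta - 1*red
derived_blue = tuple(2 * m - r for m, r in zip(magenta, red))
print(derived_blue == blue)  # True
```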
Along these lines … psychosocial self and physical world are obviously not primitives in the foundational phenomenological sense -- they are quite complex derived constructs. But they feel primitive to us in some states of mind.
Primitivity and Resource Restrictions
In the end, I think we need to acknowledge that primitive vs. derived is subjective, relative to a particular perceiving system at a particular point in time, etc. I.e. "X is primitive" is a provisional assumption made by a certain mind in a certain context, which is fine... It just means "I can't see how to analyze this into a composition right now" ...
Like so much else in cognitive science and AGI, this ultimately can be boiled down largely to resource restrictions ... i.e. "Given the amount of resources I'm willing/able to devote to this particular cognitive process, within the scope of this process I must provisionally assume X is indecomposable"
I.e. even if quarks can decompose into partons, and partons into subpartons, or whatever... maybe it goes all the way down... but for doing anything practical given my resource limitations, I have to stop decomposing somewhere and make a specific calculation.
PAC-ness of Primitivity
One clear consequence of Chalmers’ sophisticated conceptual treatment of notions of scrutability and primitivity is that we are not likely to find some small finite set of primitives in terms of which all human concepts can be expressed combinatorially. Rather, to get more and more accurate coverage of the human concept sphere, we will likely need to add more and more primitives. Any small finite set of primitives is likely to be merely what computer scientists call PAC, Probably Approximately Correct. The devilish details then come down to how many primitives you need to increase the probability and decrease the approximation error by a given amount.
Crudely, define ps(p,e) as the number of primitives needed to generate p% of human concepts within error e.
Then the question becomes: how does ps(p,e) grow as p grows and e shrinks?
Or, if you want a single variable, look at r = p*(1-e).
Let ps*(n) denote the inverse of ps, i.e. it tells you what r you get for a given (optimally chosen) selection of n primitives.
One hypothesis would be that ps* has a sigmoid shape as n increases... but then the question is where does the inflection point lie... if n is not too big at the inflection point then perhaps a relatively small set of primitives can get you above a certain threshold of coverage, and then to cover everything else will require a huge fat tail of phenomenal primitives etc.
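The sigmoid hypothesis can be sketched numerically; the functional form, inflection point n0 and slope k below are purely hypothetical placeholders, not empirical estimates:

```python
# A toy model of the hypothesis that coverage r = ps*(n) is sigmoidal
# in the number of primitives n. Parameters n0 (inflection point) and
# k (slope) are hypothetical placeholders.

import math

def coverage(n, n0=50, k=0.1):
    """Hypothetical sigmoid ps*(n): the r-value (probably,
    approximately) achieved by the best n primitives."""
    return 1.0 / (1.0 + math.exp(-k * (n - n0)))

# If the inflection point n0 is modest, a fairly small primitive set
# already crosses a useful coverage threshold, with a long tail of
# further primitives needed for the remainder
for n in (10, 50, 100, 200):
    print(n, round(coverage(n), 3))
```

Under this toy parameterization, coverage climbs steeply near n0 and then flattens, so the practically interesting question is whether the real inflection point sits at dozens of primitives or at many thousands.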
Seed Ontology for Minecraft-y Proto-AGI?
Chalmers' book takes a top-down approach to semantic primitives (trying to establish the feasibility of a reduction to primitives, rather than actually executing the reduction), at an intermediate level of formalization that has pluses and minuses....
As a guy trying to build AGI systems, I'm more inclined toward a bottom-up approach, with a fuller formalization (because a programming language in a sense has to be fully formalized, and a functional language like the MeTTa language we’re designing for OpenCog Hyperon is fully formalized in an elegant way).
Probably what I'll do if/when I plunge into this "reduction to primitives" area again is try to formalize a seed ontology based on my overall euryphysics / patternist-model-of-mind approach -- keeping the Embodied Communication Prior, the Natural Semantic Metalanguage and Chalmers’ PQTI in mind -- and then use this seed ontology to represent the Minecraft world and a community of Hyperon agents acting within it, etc. This would fit in with our current use of the Minecraft world for prototyping various Hyperon-related AI algorithms, which we plan to ramp up over the next year or two.
This has pluses and minuses too as Minecraft lacks the richness and nuance of the real human world, and Hyperon's infrastructure is less messy than the brain -- so philosophers could always argue that what appears to work for this simplified context won't work in the more general case.... However it would give a nice prototype for reduction-to-primitives -- a prototype for use both by AIs and by automated reasoning systems...
An optimistic possibility would be that the primitives needed to get a high level of PAC coverage for human-like concepts relevant to Minecraft and Sophiaverse, are also adequate to give a high level of PAC coverage for the everyday human world.
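Very roughly, the kind of seed-ontology reduction described above might look something like the following Python sketch. The primitives (loosely inspired by NSM / ECP / PQTI categories) and the derived Minecraft-flavored concepts are hypothetical placeholders, not the planned MeTTa ontology:

```python
# A rough sketch of a seed ontology: a few primitives plus derived
# concepts expressed as combinations, with recursive reduction back to
# primitives. All concept names are hypothetical placeholders.

PRIMITIVES = {"agent", "thing", "place", "move", "see", "want", "make",
              "near", "inside", "before"}

DERIVED = {
    "go_to":   ("move", "agent", "place"),    # agent moves to a place
    "mine":    ("make", "agent", "thing"),    # agent acts on a thing
    "shelter": ("place", "inside", "agent"),  # place an agent can be inside
    "explore": ("go_to", "see", "place"),     # built from a derived concept
}

def reduce_to_primitives(concept):
    """Recursively expand a concept into the multiset of primitives it
    bottoms out in."""
    if concept in PRIMITIVES:
        return [concept]
    return [p for part in DERIVED[concept]
            for p in reduce_to_primitives(part)]

print(reduce_to_primitives("explore"))
# ['move', 'agent', 'place', 'see', 'place']
```

In an actual Hyperon implementation, the derived concepts would of course be MeTTa expressions with real compositional semantics rather than flat tuples; this sketch only shows the reduction shape.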