What is "Science", Exactly?
A (Fairly) New Take Leveraging Modern AI Theory
Cultural/Pragmatic Probabilism: A New Philosophy of Science Balancing Rigor with Anarchism
There’s a tension at the heart of the philosophy of science that has annoyed and intrigued me for decades.
On one side, you have people trying to find simple, formalized criteria for what makes something “scientific”—Popperian falsificationism, Bayesian probabilism, and so on.
On the other side, you have thinkers like Paul Feyerabend pointing out that the history of science is a glorious mess, and the only honest conclusion is “anything goes.”
I grew up as a huge Feyerabend fan, and even corresponded with him briefly by post while I was a grad student in my late teens in the mid-1980s. (The main wisdom he transmitted to me was more personal than intellectual — I was trying to decide what to study for my PhD then, and considering philosophy as an option. He advised me that even if my main interest was philosophy, for purposes of career and life management I was probably better off avoiding a PhD in philosophy and should stick with math and science as my official professional focus, doing philosophy in my spare time. I took his advice and am glad I did!)
But despite my anarchist proclivities, I’ve never felt “anything goes” was quite adequate as a philosophy of science. It doesn’t really distinguish science from other areas of human pursuit. I always felt there should be some way to formalize science that acknowledges the diversity Feyerabend loved to highlight, while still capturing what makes science science.
In a new paper I just hacked together in “spare time” on a long flight home to Seattle from Africa (see the draft here), I propose an approach I call Cultural/Pragmatic Probabilism (CPP) that I believe threads this needle fairly well.
The core idea is elegantly simple (if I do say so myself ;p): a good scientific theory is one that refuses to make unnecessary distinctions. It doesn’t overfit the data, it doesn’t invoke baroque machinery, and it doesn’t split hairs where practical action doesn’t require it.
The main trick differentiating this approach from standard probabilistic/statistical/Bayesian thinking about “what is science” is: I suggest one has to be fairly flexible and inclusive about what sorts of distinctions to measure in evaluating a scientific theory — it’s not just about datasets and prediction, but also about practical applications and cultural context. And I propose to use some nice math called “quantale weakness” to capture this flexibility.
While this is not mainly a post about AI — more about using “AI theory” tools for philosophy of science — it also has some AI implications… in particular it leads to a very clear picture of what it would take to make a full-on “human-level AGI scientist” capable of creating new scientific paradigms, as opposed to an AI scientific assistant that is capable of helping flesh out existing ones. I’ll come to that at the end.
The Three Channels of Scientific Weakness
CPP evaluates theories along three dimensions, which I call “weakness channels”:
Evidential weakness: A theory is evidentially weak when it doesn’t make predictive distinctions that aren’t warranted by the data. If two situations produce the same observations, a good theory treats them the same rather than inventing spurious differences.
Cultural weakness: A theory is culturally weak when it doesn’t make representational distinctions beyond what the scientific community’s accepted language requires. This is essentially Occam’s Razor, but with an important twist: what counts as “simple” depends on the paradigm’s native concepts. A quantum physicist finds Hilbert spaces simple; a classical physicist finds them baroque.
Pragmatic weakness: A theory is pragmatically weak when it doesn’t make practical distinctions beyond what matters for action. If two situations call for the same intervention and produce the same outcomes, a good theory lumps them together rather than insisting on irrelevant fine-grained categories.
The key formal insight here is that all three of these are forms of the same thing: refusing to distinguish things that don’t need distinguishing. Science, in this view, is the practice of finding explanations that are maximally weak across all three channels simultaneously.
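To make the shared structure concrete, here is a toy sketch in Python. The representation of theories as partitions of situations, and the scoring, are my illustrative inventions for this post, not the paper's quantale formalism — but they capture the slogan: weakness means not distinguishing what needn't be distinguished.

```python
from itertools import combinations

def distinctions(partition):
    """Pairs of situations that a partition places in different cells."""
    cell_of = {x: i for i, cell in enumerate(partition) for x in cell}
    items = sorted(cell_of)
    return {(a, b) for a, b in combinations(items, 2)
            if cell_of[a] != cell_of[b]}

def channel_weakness(theory, required):
    """A theory is weak on a channel when it makes few distinctions
    beyond those the channel actually requires (0 = maximally weak)."""
    extra = distinctions(theory) - distinctions(required)
    return -len(extra)

# Four situations; the data only distinguishes {1,2} from {3,4},
# and practical action requires no distinctions at all.
evidence_required = [{1, 2}, {3, 4}]
action_required = [{1, 2, 3, 4}]

lumping = [{1, 2}, {3, 4}]        # matches the evidence exactly
splitting = [{1}, {2}, {3}, {4}]  # spurious fine-grained categories

for name, theory in [("lumping", lumping), ("splitting", splitting)]:
    score = (channel_weakness(theory, evidence_required)
             + channel_weakness(theory, action_required))
    print(name, score)  # lumping scores higher: fewer needless distinctions
```

The same `channel_weakness` function scores all the channels; only the `required` partition changes. That is the "all three are forms of the same thing" point in miniature.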
Paradigm Shifts as Quantale Shifts
What makes this framework powerful is how naturally it handles paradigm shifts. A paradigm, in CPP terms, is essentially a package that specifies: what counts as evidence, what counts as simple, and what counts as pragmatically relevant.
When paradigms shift, one or more of these specifications changes—and this can flip which theory is preferred even when the evidence stays largely the same.
Classical to Quantum Mechanics
Consider the shift from classical to quantum physics. In the classical paradigm, trajectories, forces, and phase spaces are the “native primitives”—they’re cheap to invoke, conceptually natural. Quantum concepts like Hilbert spaces and operators appear as exotic, expensive additions.
In the quantum paradigm, this inverts. Measurement contexts and noncommutative operators become native. Quantum contextuality—the fact that measuring A then B differs from measuring B then A—isn’t an anomaly requiring elaborate patches; it’s a natural consequence of the formalism.
The evidence didn’t change overnight. What changed was the simplicity channel: the community agreed to treat quantum primitives as cheap to invoke, which suddenly made quantum explanations “simpler” than classical workarounds for the same phenomena.
There’s also a pragmatic shift: as technology advanced and quantum distinctions became practically relevant (quantum computing, quantum sensing), the pragmatic weakness of quantum theory increased relative to classical theory. Distinctions that classical physics dismissed as irrelevant became actionable.
The Emergence of Genetics
This is another fairly straightforward historical example to think about. Before Mendel, inheritance was a puzzle with no compact language to describe its regularities. Explanations were ad hoc: “like begets like,” “blending of essences,” with endless exceptions. The description length for inheritance phenomena was enormous.
Mendel’s contribution wasn’t just empirical—it was conceptual. He introduced the gene as a primitive. Suddenly, inheritance regularities became short to state: each trait is controlled by a pair of factors, one from each parent; the two factors segregate when gametes form; factors for different traits assort independently; dominant factors mask recessive ones.
The simplicity channel shifted because the coding language gained new primitives. Hypotheses using genes became shorter and could cover more cases with fewer ad hoc distinctions. The pragmatic channel shifted too: the gene concept made breeding programs more effective, so distinctions between genotypes became practically relevant.
Linear to Exponential Thinking
The shift from “linear thinking” to “exponential thinking” (a la Kurzweil, let’s say) in futurology is another clean example. This involves changes in both how error is measured (the evidence channel) and which parameterizations are considered simple.
In a linear-thinking paradigm, additive change is the native primitive: next year’s value equals this year’s plus some constant. An exponential model requires specifying the exponential form—an extra conceptual commitment.
In an exponential-thinking paradigm, multiplicative change is native: next year’s value equals this year’s times some growth factor. Now the linear model requires the extra specification.
Same models, different description lengths, depending on which growth mode is primitive.
The pragmatic channel matters here too. In domains where exponential growth dominates—technology, epidemics, compound returns—predictions based on linear extrapolation fail precisely when it matters most. An exponential-thinking paradigm assigns high pragmatic weight to long-horizon distinctions, making exponential models more appropriate even when they’re similar to linear models over short timescales.
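The description-length flip can be mimicked with a crude two-part code. The bit counts and the "native vs. foreign primitive" charge below are made-up illustrative numbers of my own, not anything from the paper:

```python
import math

def residual_bits(series, predict):
    """Crude cost of encoding the data given the model's predictions."""
    return sum(math.log2(1.0 + abs(y - predict(t)))
               for t, y in enumerate(series))

# Paradigm-dependent model cost: the native growth primitive is cheap
# to invoke; the foreign one carries an extra specification charge.
NATIVE_BITS, FOREIGN_BITS = 1.0, 8.0  # illustrative, not principled

def total_dl(native_kind, model_kind, predict, series):
    """Two-part description length: model bits + residual bits."""
    model_bits = NATIVE_BITS if model_kind == native_kind else FOREIGN_BITS
    return model_bits + residual_bits(series, predict)

data = [2 ** t for t in range(10)]  # genuinely exponential data
linear = lambda t: 1 + 100 * t      # a (poor) linear fit
exponential = lambda t: 2 ** t      # the exact exponential model

# The *same* linear model is cheaper to state in the linear paradigm
# ("home") than in the exponential paradigm ("away"): same model,
# different description length, depending on which mode is primitive.
dl_lin_home = total_dl("linear", "linear", linear, data)
dl_lin_away = total_dl("exponential", "linear", linear, data)
```

On this exponential data the residual term dominates and the exponential model wins in either paradigm; the paradigm-dependent model cost only tips the balance when the two fits are otherwise comparable, e.g. over short horizons.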
Chaos Theory and Open-Ended Dynamics
Pre-chaos dynamical modeling privileged point prediction: given initial conditions, predict exactly where the system will be. Success meant minimizing trajectory error.
Chaos theory revealed that for many systems, sensitive dependence on initial conditions makes point prediction futile. But this doesn’t mean the system is lawless. Robust structure remains: attractors, Lyapunov exponents, statistical properties.
The evidence channel shifted: from trajectory error to distributional and invariant properties. A model is “good” not because it predicts specific trajectories but because it captures attractor geometry and stability properties.
The simplicity channel shifted too: concepts like strange attractors and fractal dimensions became cheap primitives, enabling compact explanations that would otherwise look hopelessly ad hoc. “The system has a strange attractor with fractal dimension 2.1” compresses information about infinitely many trajectories into a few numbers.
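A minimal numerical illustration of the shift (standard logistic-map facts, not from the paper): point prediction fails within a few dozen steps, yet an invariant property — the Lyapunov exponent, which is exactly ln 2 for the r = 4 logistic map — is easy to recover from a trajectory.

```python
import math

def logistic(x, r=4.0):
    return r * x * (1.0 - x)

# Sensitive dependence: two starts differing by 1e-10 diverge fast.
a, b = 0.3, 0.3 + 1e-10
max_sep = 0.0
for _ in range(60):
    a, b = logistic(a), logistic(b)
    max_sep = max(max_sep, abs(a - b))
# max_sep is now order 1: point-by-point prediction has failed.

# Yet a robust invariant survives: estimate the Lyapunov exponent as
# the average log-derivative |f'(x)| = |4(1 - 2x)| along a trajectory.
x, total, n = 0.3, 0.0, 10_000
for _ in range(n):
    # floor the argument to dodge log(0) if x ever lands on 0.5 exactly
    total += math.log(max(abs(4.0 * (1.0 - 2.0 * x)), 1e-12))
    x = logistic(x)
lyapunov = total / n   # should come out close to ln 2 ≈ 0.693
```

The first loop is the old evidence channel failing (trajectory error is hopeless); the second is the new one succeeding (an invariant compresses infinitely many trajectories into one number).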
Benchmarked ML to Open-Ended AGI
A similar shift is underway in AI evaluation. The benchmark paradigm evaluates systems by performance on fixed datasets—ImageNet accuracy, BLEU scores, perplexity. This paradigm drove remarkable progress but can be all too easily Goodharted: systems optimize measurable proxies without achieving robust capabilities.
An “open-ended AGI” paradigm emphasizes robustness, transfer, interactive competence, and unpredictable general capabilities. The evidence channel expands from fixed datasets to open-world tasks and stress tests. The simplicity channel shifts to favor agentic primitives—world-models, planning loops, self-correction—that may compress many behaviors into fewer generative mechanisms.
Most dramatically, the pragmatic channel shifts: benchmark performance may have little pragmatic relevance if the system fails in real deployment. Distinctions that suddenly matter include robustness to adversarial inputs, alignment with human values, and the ability to request help when confused.
What Would It Take for Science to Encompass Psi?
For a speculative example, consider what paradigm shifts would be needed if a mature, predictive theory of psi phenomena emerged.
The evidence channel would need to treat observer state as part of context—probabilities would depend on who is observing, not just what is being observed. The experimenter becomes a variable affecting outcomes rather than just a source of noise.
The simplicity channel would need observer-coupling terms as native primitives rather than expensive epicycles. If experimenter effects are real and systematic, a theory treating them as primitive would be simpler than one adding ad hoc corrections for each lab.
The pragmatic channel would make distinctions between observers relevant. Observer selection, training, and state would become actionable variables.
Crucially, falsifiability constraints remain: if the theory can explain anything by arbitrary experimenter-specific knobs, description length explodes and weakness drops. A good psi theory would need to compress many experimenter effects into a small number of principles—not just add a free parameter for each case.
Three Tiers of Artificial Scientist
The CPP framework has interesting implications for how we might build AI systems that do science at a fully, robustly human level. If paradigms are packages specifying evidence semantics, simplicity semantics, and pragmatic semantics, then an artificial scientist shouldn’t be a monolithic agent. Instead, there are at least three distinct tiers:
Tier 1: Paradigm-internal validator. This agent evaluates and critiques hypotheses expressed in the paradigm’s language. Given a hypothesis, it computes evidential adequacy, simplicity, and pragmatic weakness, reporting whether the hypothesis achieves high overall weakness. Think of it as an automated referee and proof-checker. It doesn’t propose new representational primitives or new evaluation semantics.
Tier 2: Paradigm-internal discoverer. This agent discovers new hypotheses by explicitly optimizing “weakest selection” inside the paradigm. It searches for explanations that are maximally weak across all three channels, using the paradigm’s established primitives. This is automated “normal science”—model-building and exploration within accepted frameworks.
Tier 3: Paradigm innovator. This agent attempts radical innovation by searching not just over hypotheses but over paradigm modifications themselves—new primitives, new compositional operators, new pragmatic relevance criteria, sometimes new evidence semantics. The meta-objective is to make an expanded body of evidence compressible with low total description length — but in a native language invented by the AGI innovator itself!
The same three-tier structure applies to experimental science: Tier 1 designs experiments to test fixed hypotheses; Tier 2 designs experiments to produce data enabling better weakest explanations; Tier 3 designs experiments to reveal which new weakness measures might fit an expanded evidence base.
Radical scientific innovation, in this view, is formalized as learning a new weakness measure—finding new ways to evaluate what counts as simple, what counts as pragmatically relevant, and sometimes what counts as evidence, such that newly admitted data becomes jointly explainable with fewer ad hoc distinctions.
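As a caricature of Tier 2's inner loop — the hypothesis space, consistency check, and scoring below are all invented for illustration — "normal science" becomes: enumerate hypotheses expressed in the paradigm's language and keep the weakest one consistent with the evidence.

```python
from itertools import product

# Toy world: four situations; the data can only tell "a"-situations
# from "b"-situations apart.
SITUATIONS = [0, 1, 2, 3]
OBSERVED = {0: "a", 1: "a", 2: "b", 3: "b"}

def hypotheses():
    """All labellings of the situations with up to four categories."""
    for labels in product(range(4), repeat=len(SITUATIONS)):
        yield dict(zip(SITUATIONS, labels))

def consistent(h):
    """A hypothesis must distinguish what the data distinguishes."""
    return all(h[i] != h[j]
               for i in SITUATIONS for j in SITUATIONS
               if OBSERVED[i] != OBSERVED[j])

def weakness(h):
    """Fewer categories => fewer distinctions => weaker (better)."""
    return -len(set(h.values()))

best = max((h for h in hypotheses() if consistent(h)), key=weakness)
# The weakest consistent hypothesis uses exactly two categories and
# reproduces the observational partition, nothing finer.
```

A Tier-1 validator would just run `consistent` and `weakness` on a proposed hypothesis; a Tier-3 innovator would be rewriting `weakness` itself — which is exactly why it is the hard, paradigm-creating case.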
Conclusion
Cultural/Pragmatic Probabilism offers a way to formalize scientific methodology that acknowledges the diversity Feyerabend highlighted while still capturing what makes science distinctive. The key moves are:
Science is anchored by a probabilistic evidence channel over community-accepted observations.
Cultural simplicity is a weakness channel whose standards are paradigm-dependent.
Pragmatic usefulness is also a weakness channel, on equal footing with evidence and simplicity.
Paradigm shifts become mathematically representable as shifts in these channels or their weightings. This makes “incommensurability” precise without abandoning probabilistic rationality.
For those interested in the mathematical foundations—including formal generalization bounds and tracking guarantees for paradigm-relative learning—the full paper develops these ideas with considerably more rigor. But the core intuition is simple: good science is weak science. It refuses to make distinctions that don’t need making.
And that, perhaps, is the one thing that doesn’t “go” in (my modern-AI-flavored variant of) Feyerabend’s anarchic kingdom: unnecessary complexity. Everything else is negotiable, morphable and evolvable — including how complexity is measured!
The full paper, “Cultural/Pragmatic Probabilism: A New Philosophy of Science Balancing Rigor with Anarchism via Quantale Weakness,” is available here.


Macroscopic Quantum Entanglement » Vlatko Vedral https://share.google/CDGqtYdiugFA4aXKW
It seems obvious to me where psi fits in. Vedral is the guy who first predicted entanglement in time and he recently concocted that experiment to test the equivalence principle: Penrose's hypothesis was falsified at the given resolution.
Why don't you write about Tller's Boogle factor?
Hi Ben, great post! I also grew up as a huge fan of Feyerabend's (against-) method, since much before hearing his name for the first time. I still am one.
I don't feel much of a need to "distinguish science from other areas of human pursuit." To me, science is not and can't be *essentially* different from hunting, foraging, fighting, building, politics, social interactions etc. (or if you prefer all these things are a kind of science). In all these areas of human pursuit one has to feel his way through uncharted territory, and all methods, schemes, and rules are eventually broken and replaced (and this is a good thing).
Of course we can try and describe the glorious mess of science and distinguish it from other glorious messes, at least for clarity, but the distinction can only be "weak" like the thing it describes.
Happy New Year!