It is a truth universally acknowledged that the more ‘insider knowledge’ one needs to get a joke, the funnier one finds it. Sure, there are funny jokes that everyone gets, and astrophysics jokes that are boring even to astrophysicists. However, the correlation is undeniable. For simplicity, let us imagine that it can be well-expressed by a linear relationship between insider knowledge and humour. This phenomenon may well be the result of some sort of clique-building instinct: we use ‘exclusive’ humour as a way to find others like ourselves, and exclude outsiders.

Whatever the cause, it has an interesting effect. In our society, lots of people have lots of knowledge – advanced knowledge, even. That is: the relationship between absolute amount of knowledge possessed, and the number of people with that much knowledge may not be linear. Some sorts of knowledge even cluster together in large institutions with great towers constructed of ivory – universities, some might call them. Other sorts of knowledge gather in the dark corners of the internet: forums like 4chan, for instance.

Consider a joke which requires knowledge in some particular field. Suppose that only those with knowledge ≥ *k* will get it. The humour potential *h* of a joke is the product of:

- The amount *k* of knowledge needed to get it,
- The number *n* of people with that much knowledge, and
- The fraction *f* of those people who will hear the joke.

Without too much loss of generality, let us assume that knowledge distribution in any field follows a log-normal distribution, with σ=1. This is consistent with the reasonable model of knowledge acquisition in which the marginal cost of learning *dk* is a monotonically increasing function of current knowledge *k*. For the moment, let us assume that *f*=1. We will return to this assumption later.

So, to maximise the humour potential of any particular joke, we must maximise *k*.*n*. Here, *n* is just the population *p* in that field multiplied by the fraction of the log-normal distribution lying *above* *k*, that is, one minus its cumulative distribution at *k*. So, we have that:

*h* = *k* . *p* . ( 1 – cdf(*k*) )

We know that 1 – cdf(*k*) is a decreasing function of *k*. Setting aside some jiggery-pokery (*h* is zero at *k* = 0, and the log-normal tail falls off faster than 1/*k*, so *h* sinks back towards zero as *k* grows), we thus have that there exists a finite value of *k* which maximises *h*. That is: there’s an optimal amount of insider knowledge to require for jokes.
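For the sceptical, here is a minimal numerical sketch of that claim. It assumes the log-normal has μ = 0 (the text above only fixes σ = 1), sets *p* = *f* = 1, and uses a golden-section search, which is just one convenient way to find the peak of a single-humped function. The function names are made up for illustration.

```python
import math

def survival(k, mu=0.0, sigma=1.0):
    """1 - cdf(k) for a log-normal: the fraction of the field with knowledge >= k.

    mu = 0 is an assumption; the essay only specifies sigma = 1.
    """
    if k <= 0:
        return 1.0
    z = (math.log(k) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

def humour_potential(k, p=1.0, f=1.0):
    """h = k * p * f * (1 - cdf(k))."""
    return k * p * f * survival(k)

def optimal_k(lo=1e-3, hi=50.0, tol=1e-6):
    """Golden-section search for the k that maximises humour_potential on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if humour_potential(c) > humour_potential(d):
            b = d  # the peak lies to the left of d
        else:
            a = c  # the peak lies to the right of c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

k_best = optimal_k()
print("optimal k:", k_best)
print("humour potential at optimum:", humour_potential(k_best))
```

Under these assumptions the optimum sits a little above the median knowledge level (which is *k* = 1 when μ = 0), so the funniest joke excludes somewhat more than half the field.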

What about *f*, though? At first glance, it may seem that *f* is completely independent of everything else here. But that’s not quite the case: the fraction of people who will hear a joke depends on its ability to move through a population, much like a virus. For a joke to reach a particular person, there must be a path to them through the social graph that includes only people who get the joke. This means that we want a value of *k* which doesn’t separate the ≥ *k* portion of the social graph into distinct areas. We also want a subject area which has a well-connected graph. There’s no point telling a joke that all research botanists will find funny if research botanists are typically in geographically remote areas without internet connections. We’d much rather tell a joke to computational physicists, who will immediately email it to all their friends.
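The path argument can be made concrete with a toy model. The sketch below assumes the social graph is a dictionary of friend lists, that each person has a single knowledge score, and that a joke is only passed on by people who get it. The names `friends`, `knowledge` and `fraction_reached` are invented for this example, and the four-person graph is not real data.

```python
from collections import deque

def fraction_reached(friends, knowledge, k, teller):
    """Fraction f of the people with knowledge >= k who end up hearing the joke.

    The joke starts at `teller` and spreads breadth-first, but only
    through people who get it (knowledge >= k).
    """
    gets_it = {person for person, level in knowledge.items() if level >= k}
    if teller not in gets_it:
        return 0.0
    heard = {teller}
    queue = deque([teller])
    while queue:
        person = queue.popleft()
        for friend in friends.get(person, ()):
            if friend in gets_it and friend not in heard:
                heard.add(friend)
                queue.append(friend)
    return len(heard) / len(gets_it)

# A chain a - b - c - d in which c knows less than everyone else.
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
knowledge = {"a": 3, "b": 3, "c": 1, "d": 3}

print(fraction_reached(friends, knowledge, k=1, teller="a"))  # everyone gets it
print(fraction_reached(friends, knowledge, k=2, teller="a"))  # c blocks the path to d
```

Raising *k* from 1 to 2 removes only one person from the audience, but that person was the only link to d, so the joke stalls: this is the graph-separating effect described above.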

In addition to a field well-adapted to the spread of a joke, we want one with a large population. We could pursue this issue further by collecting data for analysis, but Randall Munroe has already done significant work investigating this problem. Not only has he pinned down some appropriate field, he’s also managed to experimentally derive *k* with some accuracy (give or take some error function).

Think that this should be funny, but don’t know why? Have an alternate theory? Let me know in the comments.