Birthday problem
In probability theory, the birthday problem asks for the probability that, in a set of n randomly chosen people, at least two will share the same birthday. The birthday paradox is the counterintuitive fact that only 23 people are needed for that probability to exceed 50%.
The birthday paradox is a veridical paradox: it seems wrong at first glance but is, in fact, true. While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the birthday comparisons will be made between every possible pair of individuals. With 23 individuals, there are (23 × 22)/2 = 253 pairs to consider.
Real-world applications for the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function, as well as estimating the approximate risk that a hash collision exists among the hashes of a population of a given size.
The problem is generally attributed to Harold Davenport in about 1927, though he did not publish it at the time. Davenport did not claim to be its discoverer "because he could not believe that it had not been stated earlier".[1][2] The first publication of a version of the birthday problem was by Richard von Mises in 1939.[3]
Calculating the probability
Consider the event A that a group of n people does not have any repeated birthdays, and let the complementary event B be that a group of n people contains at least two people who share a birthday. Then the probabilities P(A) and P(B) of the two events are related by the equation P(B) = 1 − P(A). The probability P(A) can be computed using the perspective of permutations, as follows. Let $V_{nr}$ be the total number of ways that n people can have distinct birthdays, and let $V_t$ be the total number of ways n people can have birthdays arranged, including possibly repeated birthdays. The probability P(A) is the ratio of these two quantities, $V_{nr}$ divided by $V_t$. When n = 23, the two counts are given by
$$V_{nr} = 365 \times 364 \times \cdots \times 343 = \frac{365!}{342!}, \qquad V_t = 365^{23},$$
and their ratio is
$$P(A) = \frac{V_{nr}}{V_t} = \frac{365!}{342! \cdot 365^{23}} \approx 0.492703,$$
and so
$$P(B) = 1 - P(A) \approx 0.507297.$$
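The ratio of the two counts, 365!/342! arrangements with distinct birthdays over 365²³ arrangements in total, can be checked exactly with arbitrary-precision integers. This is a minimal Python sketch, assuming Python 3.8 or later for `math.perm`; the function name `prob_all_distinct` and its parameter `d` (the number of equally likely days) are illustrative and not part of the text.

```python
from math import perm

def prob_all_distinct(n: int, d: int = 365) -> float:
    """Probability that n people all have distinct birthdays among d equally likely days."""
    distinct = perm(d, n)    # d * (d - 1) * ... * (d - n + 1); perm returns 0 when n > d
    total = d ** n           # each person independently takes any of the d days
    return distinct / total  # exact big-integer division, rounded once to a float

p_a = prob_all_distinct(23)
print(f"P(A) = {p_a:.6f}, P(B) = {1 - p_a:.6f}")  # P(A) = 0.492703, P(B) = 0.507297
```

Working with exact integers avoids accumulating rounding error over the 23 factors; only the final division is rounded.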
The probability that in a group of n people at least two have the same birthday is usually computed for a simplified model. Leap years, twins, selection bias, and seasonal and weekly variations in birth rates are generally disregarded; instead, it is assumed that there are 365 possible birthdays, and that each person's birthday is equally likely to be any of these days, independent of the other people in the group.
For independent birthdays, a uniform distribution of birthdays minimizes the probability of two people in a group having the same birthday. Any unevenness increases the likelihood of two people sharing a birthday.[4][5] However, real-world birthdays are not sufficiently uneven to make much change: the real-world group size necessary to have a greater than 50% chance of a shared birthday is 23, as in the theoretical uniform distribution.[6]
The goal is to compute P(B), the probability that at least two people in the room have the same birthday. However, it is simpler to calculate P(A′), the probability that no two people in the room have the same birthday. Then, because B and A′ are the only two possibilities and are also mutually exclusive, P(B) = 1 − P(A′).
Here is the calculation of P(B) for 23 people. Let the 23 people be numbered 1 to 23. The event that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events be called Event 2, Event 3, and so on. Event 1 is the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using conditional probability: the probability of Event 2 is 364/365, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is 363/365, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is 343/365. Finally, the principle of conditional probability implies that P(A′) is equal to the product of these individual probabilities:
$$P(A') = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{343}{365} \qquad (1)$$
The terms of equation (1) can be collected to arrive at:
$$P(A') = \left(\frac{1}{365}\right)^{23} \times (365 \times 364 \times 363 \times \cdots \times 343) \qquad (2)$$
Evaluating equation (2) gives P(A′) ≈ 0.492703.
Therefore, P(B) ≈ 1 − 0.492703 = 0.507297 (50.7297%).
This process can be generalized to a group of n people, where p(n) is the probability of at least two of the n people sharing a birthday. It is easier to first calculate the probability $\bar p(n)$ that all n birthdays are different. According to the pigeonhole principle, $\bar p(n)$ is zero when n > 365. When n ≤ 365:
$$\bar p(n) = 1 \times \left(1 - \frac{1}{365}\right) \times \left(1 - \frac{2}{365}\right) \times \cdots \times \left(1 - \frac{n-1}{365}\right) = \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n} = \frac{365!}{365^n\,(365 - n)!} = \frac{n! \cdot \binom{365}{n}}{365^n} = \frac{{}_{365}P_n}{365^n}$$
where ! is the factorial operator, $\binom{365}{n}$ is the binomial coefficient and ${}_kP_r$ denotes permutation.
The equation expresses the fact that the first person has no one to share a birthday with, the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as either of the first two (363/365), and in general the nth birthday cannot be the same as any of the n − 1 preceding birthdays.
The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability p(n) is
$$p(n) = 1 - \bar p(n).$$
The following table shows the probability for some other values of n (for this table, the existence of leap years is ignored, and each birthday is assumed to be equally likely):

| n | p(n) |
|---|---|
| 1 | 0.0% |
| 5 | 2.7% |
| 10 | 11.7% |
| 20 | 41.1% |
| 23 | 50.7% |
| 30 | 70.6% |
| 40 | 89.1% |
| 50 | 97.0% |
| 60 | 99.4% |
| 70 | 99.9% |
| 75 | 99.97% |
| 100 | 99.99997% |
| 200 | (100 − 1.6×10⁻²⁸)% |
| 300 | (100 − 6×10⁻⁸⁰)% |
| 350 | (100 − 3×10⁻¹²⁹)% |
| 365 | (100 − 1.45×10⁻¹⁵⁵)% |
| ≥ 366 | 100% |
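The values in the table above can be reproduced by evaluating the product of the factors 1 − k/365 directly. This is a minimal Python sketch under the same assumptions (365 equally likely days, independent birthdays); `p_shared` is a hypothetical helper name.

```python
def p_shared(n: int, d: int = 365) -> float:
    """Probability that at least two of n people share a birthday, with d equally likely days."""
    if n > d:
        return 1.0  # pigeonhole principle: a repeat is certain
    p_distinct = 1.0
    for k in range(1, n):
        p_distinct *= 1 - k / d  # person k + 1 must avoid the k birthdays already taken
    return 1 - p_distinct

for n in (5, 10, 20, 23, 30, 40, 50):
    print(n, f"{100 * p_shared(n):.1f}%")

# Smallest group size for which a shared birthday is more likely than not
print(next(n for n in range(1, 367) if p_shared(n) > 0.5))  # 23
```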
Approximations
The Taylor series expansion of the exponential function (the constant e ≈ 2.718281828)
$$e^x = 1 + x + \frac{x^2}{2!} + \cdots$$
provides a first-order approximation for $e^x$ for $|x| \ll 1$:
$$e^x \approx 1 + x.$$
To apply this approximation to the first expression derived for $\bar p(n)$, set $x = -\frac{a}{365}$. Thus,
$$e^{-a/365} \approx 1 - \frac{a}{365}.$$
Then, replace a with non-negative integers for each term in the formula of $\bar p(n)$ until a = n − 1, for example, when a = 1,
$$e^{-1/365} \approx 1 - \frac{1}{365}.$$
The first expression derived for $\bar p(n)$ can be approximated as
$$\bar p(n) \approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-(n-1)/365} = e^{-(1 + 2 + \cdots + (n-1))/365} = e^{-\frac{n(n-1)/2}{365}} = e^{-\frac{n(n-1)}{730}}.$$
Therefore,
$$p(n) = 1 - \bar p(n) \approx 1 - e^{-\frac{n(n-1)}{730}}.$$
An even coarser approximation is given by
$$p(n) \approx 1 - e^{-\frac{n^2}{730}},$$
which, as the graph illustrates, is still fairly accurate.
The same approach can be applied to any number of "people" and "days". If rather than 365 days there are d, if there are n persons, and if n ≪ d, then using the same approach as above we obtain the result that if p(n, d) is the probability that at least two out of n people share the same birthday from a set of d available days, then:
$$p(n, d) \approx 1 - e^{-\frac{n(n-1)}{2d}} \approx 1 - e^{-\frac{n^2}{2d}}.$$
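To see how close the exponential approximations are, the following Python sketch compares the exact probability with $1 - e^{-n(n-1)/(2d)}$ and with the coarser $1 - e^{-n^2/(2d)}$. The helper names are illustrative. Because $1 - x < e^{-x}$ for $x \neq 0$, the first approximation slightly underestimates the exact probability for every n ≥ 2.

```python
import math

def p_exact(n: int, d: int = 365) -> float:
    p_distinct = 1.0
    for k in range(1, n):
        p_distinct *= 1 - k / d
    return 1 - p_distinct

def p_exp(n: int, d: int = 365) -> float:
    """1 - exp(-n(n-1)/(2d)): each factor 1 - k/d replaced by exp(-k/d)."""
    return 1 - math.exp(-n * (n - 1) / (2 * d))

def p_coarse(n: int, d: int = 365) -> float:
    """1 - exp(-n^2/(2d)): the coarser version."""
    return 1 - math.exp(-n * n / (2 * d))

for n in (10, 23, 40, 60):
    print(n, round(p_exact(n), 4), round(p_exp(n), 4), round(p_coarse(n), 4))
```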
Simple exponentiation
The probability of any two people not having the same birthday is 364/365. In a room containing n people, there are $\binom{n}{2} = \frac{n(n-1)}{2}$ pairs of people, i.e. $\binom{n}{2}$ events. The probability of no two people sharing the same birthday can be approximated by assuming that these events are independent and hence by multiplying their probabilities together. Being independent would be equivalent to picking, with replacement, any pair of people in the world, not just in a room. In short, 364/365 can be multiplied by itself $\binom{n}{2}$ times, which gives us
$$\bar p(n) \approx \left(\frac{364}{365}\right)^{\binom{n}{2}}.$$
Since this is the probability of no one having the same birthday, the probability of someone sharing a birthday is
$$p(n) \approx 1 - \left(\frac{364}{365}\right)^{\binom{n}{2}}.$$
And for the group of 23 people, the probability of sharing is
$$p(23) \approx 1 - \left(\frac{364}{365}\right)^{\binom{23}{2}} = 1 - \left(\frac{364}{365}\right)^{253} \approx 0.500477.$$
Poisson approximation
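Applying the Poisson approximation for the binomial on the group of 23 people,
$$\operatorname{Poi}\left(\frac{\binom{23}{2}}{365}\right) = \operatorname{Poi}\left(\frac{253}{365}\right) \approx \operatorname{Poi}(0.6932)$$
so, with X the number of pairs that share a birthday,
$$\Pr(X > 0) = 1 - \Pr(X = 0) \approx 1 - e^{-253/365} \approx 1 - 0.499998 = 0.500002.$$
As with the previous approximations, the result is over 50%. This approximation is the same as the one above based on the Taylor expansion that uses $e^x \approx 1 + x$.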
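Both estimates for 23 people can be reproduced in a few lines. This minimal Python sketch treats the 253 pair events as independent (simple exponentiation) and models the number of matching pairs as Poisson with mean 253/365, as in the two subsections above; the variable names are illustrative.

```python
import math

n, d = 23, 365
pairs = math.comb(n, 2)                # 253 pairs of people
p_simple = 1 - ((d - 1) / d) ** pairs  # independent pair events, each avoiding a match w.p. 364/365
lam = pairs / d                        # Poisson mean: expected number of matching pairs
p_poisson = 1 - math.exp(-lam)         # Pr(X > 0) = 1 - Pr(X = 0)

print(pairs, round(p_simple, 6), round(lam, 4), round(p_poisson, 6))  # 253 0.500477 0.6932 0.500002
```

Both land just above 0.5, slightly below the exact value 0.507297, because each ignores the dependence between pairs that share a person.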
Square approximation
A good rule of thumb which can be used for mental calculation is the relation
$$p(n) \approx \frac{n^2}{2m}$$
which can also be written as
$$n \approx \sqrt{2m \times p(n)}$$
which works well for probabilities less than or equal to 1/2. In these equations, m is the number of days in a year.
For instance, to estimate the number of people required for a 1/2 chance of a shared birthday, we get
$$n \approx \sqrt{2 \times 365 \times \tfrac{1}{2}} = \sqrt{365} \approx 19,$$
which is not too far from the correct answer of 23.
Approximation of number of people
This can also be approximated using the following formula for the number of people necessary to have at least a 1/2 chance of matching:
$$n \geq \tfrac{1}{2} + \sqrt{\tfrac{1}{4} + 2 \times \ln 2 \times 365} = 22.999943.$$
This is a result of the good approximation that an event with 1/k probability will have a 1/2 chance of occurring at least once if it is repeated k ln 2 times.[7]
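The rules of thumb in the last two subsections are easy to evaluate directly. This short Python sketch (variable names illustrative) computes the square approximation $\sqrt{2 \cdot 365 \cdot \tfrac12}$, the formula $\tfrac12 + \sqrt{\tfrac14 + 2 \ln 2 \cdot 365}$, and the k ln 2 rule with k = 365, which comes out at almost exactly the 253 pairs available among 23 people.

```python
import math

d = 365
n_square = math.sqrt(2 * d * 0.5)                        # n ≈ sqrt(2 m p) with p = 1/2
n_formula = 0.5 + math.sqrt(0.25 + 2 * math.log(2) * d)  # at least a 1/2 chance of a match
repeats = d * math.log(2)                                # k ln 2 repetitions of a 1/k event

print(round(n_square, 2), round(n_formula, 6))  # 19.1 22.999943
print(round(repeats, 1), math.comb(23, 2))      # 253.0 253
```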
Probability table
Number of hashed elements such that the probability of at least one hash collision is ≥ p

| Length of hex string | No. of bits (b) | Hash space size (2^b) | p = 10⁻¹⁸ | p = 10⁻¹⁵ | p = 10⁻¹² | p = 10⁻⁹ | p = 10⁻⁶ | p = 0.001 | p = 0.01 | p = 0.25 | p = 0.50 | p = 0.75 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 32 | 4.3×10⁹ | 2 | 2 | 2 | 2.9 | 93 | 2.9×10³ | 9.3×10³ | 5.0×10⁴ | 7.7×10⁴ | 1.1×10⁵ |
| (10) | (40) | (1.1×10¹²) | 2 | 2 | 2 | 47 | 1.5×10³ | 4.7×10⁴ | 1.5×10⁵ | 8.0×10⁵ | 1.2×10⁶ | 1.7×10⁶ |
| (12) | (48) | (2.8×10¹⁴) | 2 | 2 | 24 | 7.5×10² | 2.4×10⁴ | 7.5×10⁵ | 2.4×10⁶ | 1.3×10⁷ | 2.0×10⁷ | 2.8×10⁷ |
| 16 | 64 | 1.8×10¹⁹ | 6.1 | 1.9×10² | 6.1×10³ | 1.9×10⁵ | 6.1×10⁶ | 1.9×10⁸ | 6.1×10⁸ | 3.3×10⁹ | 5.1×10⁹ | 7.2×10⁹ |
| (24) | (96) | (7.9×10²⁸) | 4.0×10⁵ | 1.3×10⁷ | 4.0×10⁸ | 1.3×10¹⁰ | 4.0×10¹¹ | 1.3×10¹³ | 4.0×10¹³ | 2.1×10¹⁴ | 3.3×10¹⁴ | 4.7×10¹⁴ |
| 32 | 128 | 3.4×10³⁸ | 2.6×10¹⁰ | 8.2×10¹¹ | 2.6×10¹³ | 8.2×10¹⁴ | 2.6×10¹⁶ | 8.3×10¹⁷ | 2.6×10¹⁸ | 1.4×10¹⁹ | 2.2×10¹⁹ | 3.1×10¹⁹ |
| (48) | (192) | (6.3×10⁵⁷) | 1.1×10²⁰ | 3.5×10²¹ | 1.1×10²³ | 3.5×10²⁴ | 1.1×10²⁶ | 3.5×10²⁷ | 1.1×10²⁸ | 6.0×10²⁸ | 9.3×10²⁸ | 1.3×10²⁹ |
| 64 | 256 | 1.2×10⁷⁷ | 4.8×10²⁹ | 1.5×10³¹ | 4.8×10³² | 1.5×10³⁴ | 4.8×10³⁵ | 1.5×10³⁷ | 4.8×10³⁷ | 2.6×10³⁸ | 4.0×10³⁸ | 5.7×10³⁸ |
| (96) | (384) | (3.9×10¹¹⁵) | 8.9×10⁴⁸ | 2.8×10⁵⁰ | 8.9×10⁵¹ | 2.8×10⁵³ | 8.9×10⁵⁴ | 2.8×10⁵⁶ | 8.9×10⁵⁶ | 4.8×10⁵⁷ | 7.4×10⁵⁷ | 1.0×10⁵⁸ |
| 128 | 512 | 1.3×10¹⁵⁴ | 1.6×10⁶⁸ | 5.2×10⁶⁹ | 1.6×10⁷¹ | 5.2×10⁷² | 1.6×10⁷⁴ | 5.2×10⁷⁵ | 1.6×10⁷⁶ | 8.8×10⁷⁶ | 1.4×10⁷⁷ | 1.9×10⁷⁷ |
The entries in the body of this table show the number of hashes needed to achieve the given probability of collision (column) given a hash space of a certain size in bits (row). Using the birthday analogy: the "hash space size" resembles the "available days", the "probability of collision" resembles the "probability of shared birthday", and the "required number of hashed elements" resembles the "required number of people in a group". One could also use this chart to determine the minimum hash size required (given upper bounds on the number of hashes and on the acceptable probability of collision), or the probability of collision (for a fixed number of hashes and hash size).
For comparison, 10⁻¹⁸ to 10⁻¹⁵ is the uncorrectable bit error rate of a typical hard disk.[8] In theory, 128-bit hash functions, such as MD5, should stay within that range until about 8.2×10¹¹ documents, even though the number of possible outputs is far larger.
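The table entries follow from the approximation $n(p; H) \approx \sqrt{2H \ln\frac{1}{1-p}}$ with $H = 2^b$ possible hash values. This is a minimal Python sketch; the function name is hypothetical. It uses `math.log1p` so that very small probabilities such as 10⁻¹⁵ are not lost to rounding in 1 − p. Unlike the table, it does not round results smaller than 2 up to 2.

```python
import math

def hashes_for_collision(p: float, bits: int) -> float:
    """Approximate number of random hashes of `bits` bits giving collision probability p."""
    space = 2.0 ** bits                            # H = 2**b possible hash values
    return math.sqrt(2 * space * -math.log1p(-p))  # -log1p(-p) = ln(1 / (1 - p))

for bits in (32, 64, 128, 256):
    print(bits, f"{hashes_for_collision(0.5, bits):.1e}")

# The hard-disk comparison: 128-bit hashes at p = 1e-15
print(f"{hashes_for_collision(1e-15, 128):.1e}")  # 8.2e+11
```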
An upper bound on the probability and a lower bound on the number of people
The argument below is adapted from an argument of Paul Halmos.
As stated above, the probability that no two birthdays coincide is
$$1 - p(n) = \bar p(n) = \prod_{k=1}^{n-1}\left(1 - \frac{k}{365}\right).$$
As in earlier paragraphs, interest lies in the smallest n such that p(n) > 1/2; or equivalently, the smallest n such that $\bar p(n) < 1/2$.
Using the inequality $1 - x < e^{-x}$ in the above expression, we replace $1 - \frac{k}{365}$ with $e^{-k/365}$. This yields
$$\bar p(n) = \prod_{k=1}^{n-1}\left(1 - \frac{k}{365}\right) < \prod_{k=1}^{n-1} e^{-k/365} = e^{-\frac{n(n-1)}{730}}.$$
Therefore, the expression above is not only an approximation, but also an upper bound of $\bar p(n)$. The inequality
$$e^{-\frac{n(n-1)}{730}} \le \frac{1}{2}$$
implies $\bar p(n) < 1/2$. Solving for n gives
$$n^2 - n \ge 730 \ln 2.$$
Now, 730 ln 2 is approximately 505.997, which is barely below 506, the value of n² − n attained when n = 23. Therefore, 23 people suffice. Incidentally, solving n² − n = 730 ln 2 for n gives the approximate formula of Frank H. Mathis, $n = \tfrac{1}{2} + \sqrt{\tfrac{1}{4} + 730 \ln 2}$.
This derivation only shows that at most 23 people are needed to ensure the chances of a birthday match are at least even; it leaves open the possibility that n = 22 or fewer could also work.
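The bound and the remaining gap can both be checked numerically. This Python sketch (helper name illustrative) confirms that $e^{-n(n-1)/730}$ exceeds the exact probability of no match for every n ≥ 2, and that 22 people are in fact not enough, which the bound alone leaves open.

```python
import math

def p_no_match(n: int, d: int = 365) -> float:
    prod = 1.0
    for k in range(1, n):
        prod *= 1 - k / d
    return prod

print(round(730 * math.log(2), 3), 22 * 21, 23 * 22)  # 505.997 462 506

# The exponential is a strict upper bound on the exact probability for every n >= 2
print(all(p_no_match(n) < math.exp(-n * (n - 1) / 730) for n in range(2, 366)))  # True

# The bound alone does not rule out 22 people; the exact values do
print(round(p_no_match(22), 4), round(p_no_match(23), 4))  # 0.5243 0.4927
```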
Generalizations
Arbitrary number of days
Given a year with d days, the generalized birthday problem asks for the minimal number n(d) such that, in a set of n randomly chosen people, the probability of a birthday coincidence is at least 50%. In other words, n(d) is the minimal integer n such that
$$1 - \left(1 - \frac{1}{d}\right)\left(1 - \frac{2}{d}\right)\cdots\left(1 - \frac{n-1}{d}\right) \ge \frac{1}{2}.$$
The classical birthday problem thus corresponds to determining n(365). The first 99 values of n(d) are given here (sequence A033810 in the OEIS):

| d | 1–2 | 3–5 | 6–9 | 10–16 | 17–23 | 24–32 | 33–42 | 43–54 | 55–68 | 69–82 | 83–99 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n(d) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
A similar calculation shows that n(d) = 23 when d is in the range 341–372.
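The table of n(d) and the range 341–372 can be reproduced by searching for the smallest n directly. This Python sketch (function name hypothetical) uses exact rational arithmetic from the standard `fractions` module, because some cases sit exactly on or very near the 1/2 threshold: for d = 2, two people match with probability exactly 1/2.

```python
from fractions import Fraction

def n_of_d(d: int) -> int:
    """Smallest n such that n people have at least a 1/2 chance of a coincidence among d days."""
    p_distinct = Fraction(1)  # probability that the first n people are all distinct (n = 1)
    n = 1
    while 1 - p_distinct < Fraction(1, 2):
        p_distinct *= Fraction(d - n, d)  # the next person avoids the n days already taken
        n += 1
    return n

print([n_of_d(d) for d in (1, 2, 3, 5, 6, 9, 10, 16, 17)])  # [2, 2, 3, 3, 4, 4, 5, 5, 6]
hits = [d for d in range(300, 400) if n_of_d(d) == 23]
print(hits[0], hits[-1])  # 341 372
```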
A number of bounds and formulas for n(d) have been published.[9] For any d ≥ 1, the number n(d) satisfies[10]
$$\frac{3 - 2\ln 2}{6} < n(d) - \sqrt{2d\ln 2} \le 9 - \sqrt{86\ln 2}$$
These bounds are optimal in the sense that the sequence $n(d) - \sqrt{2d\ln 2}$ gets arbitrarily close to
$$\frac{3 - 2\ln 2}{6} \approx 0.27,$$
while it has
$$9 - \sqrt{86\ln 2} \approx 1.28$$
as its maximum, taken for d = 43.
The bounds are sufficiently tight to give the exact value of n(d) in most of the cases. For example, for d = 365 these bounds imply that 22.7633 < n(365) < 23.7736 and 23 is the only integer in that range. In general, it follows from these bounds that n(d) always equals either
$$\left\lceil \sqrt{2d\ln 2}\, \right\rceil \quad \text{or} \quad \left\lceil \sqrt{2d\ln 2}\, \right\rceil + 1$$
where ⌈·⌉ denotes the ceiling function. The formula
$$n(d) = \left\lceil \sqrt{2d\ln 2}\, \right\rceil$$
holds for 73% of all integers d.[11] The formula
$$n(d) = \left\lceil \sqrt{2d\ln 2} + \frac{3 - 2\ln 2}{6} \right\rceil$$
holds for almost all d, i.e., for a set of integers d with asymptotic density 1.[11]
The formula
$$n(d) = \left\lceil \sqrt{2d\ln 2} + \frac{3 - 2\ln 2}{6} + \frac{9 - 4(\ln 2)^2}{72\sqrt{2d\ln 2}} \right\rceil$$
holds for all d ≤ 10¹⁸, but it is conjectured that there are infinitely many counterexamples to this formula.[12]
The formula
$$n(d) = \left\lceil \sqrt{2d\ln 2} + \frac{3 - 2\ln 2}{6} + \frac{9 - 4(\ln 2)^2}{72\sqrt{2d\ln 2}} - \frac{2(\ln 2)^2}{135d} \right\rceil$$
holds for all d ≤ 10¹⁸, and it is conjectured that this formula holds for all d.[12]
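The two-sided bound $\frac{3 - 2\ln 2}{6} < n(d) - \sqrt{2d\ln 2} \le 9 - \sqrt{86\ln 2}$ can be checked against the exact n(d) for small d. This is a Python sketch under the same uniform model; `n_of_d` is a hypothetical helper that computes n(d) exactly with rational arithmetic, and the loop simply prints the gap for a few values of d.

```python
import math
from fractions import Fraction

LN2 = math.log(2)
LOWER = (3 - 2 * LN2) / 6        # about 0.27, approached but never reached
UPPER = 9 - math.sqrt(86 * LN2)  # about 1.28, attained at d = 43

def n_of_d(d: int) -> int:
    """Smallest n with 1 - prod_{k<n}(1 - k/d) >= 1/2, computed exactly."""
    p_distinct, n = Fraction(1), 1
    while 1 - p_distinct < Fraction(1, 2):
        p_distinct *= Fraction(d - n, d)
        n += 1
    return n

for d in (43, 365, 1000):
    s = math.sqrt(2 * d * LN2)
    n = n_of_d(d)
    print(d, n, round(n - s, 4), math.ceil(s))
```

The last column shows that n(d) is either this ceiling or one more, as the bounds imply.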
More than two people sharing a birthday
It is possible to extend the problem to ask how many people in a group are necessary for there to be a greater than 50% probability that at least 3, 4, 5, etc. of the group share the same birthday.
The first few values are as follows: for a greater than 50% probability of 3 people sharing a birthday, 88 people are needed; for 4 people sharing a birthday, 187 people are needed (sequence A014088 in the OEIS).[13]
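These values can be reproduced by exact counting. The number of ways to give n people birthdays so that no day is used k or more times is n! times the coefficient of $x^n$ in $\left(\sum_{j<k} x^j/j!\right)^d$. This exponential-generating-function technique is an assumption of this sketch, not something described in the text. The Python code below, with hypothetical names, scales each factor by (k − 1)! so that all arithmetic stays in exact integers.

```python
from math import factorial

def smallest_group(k: int, d: int = 365, n_max: int = 200) -> int:
    """Smallest n with P(some day is the birthday of at least k of the n people) > 1/2."""
    scale = factorial(k - 1)
    base = [scale // factorial(j) for j in range(k)]  # (k-1)! * x^j / j! for j = 0..k-1
    poly = [1]
    for _ in range(d):  # raise the one-day polynomial to the d-th power, truncated at x^n_max
        new = [0] * min(len(poly) + k - 1, n_max + 1)
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                if i + j <= n_max:
                    new[i + j] += a * b
        poly = new
    for n in range(1, n_max + 1):
        coeff = poly[n] if n < len(poly) else 0
        ways = factorial(n) * coeff // scale ** d  # assignments with no day used k or more times
        if 2 * ways < d ** n:  # more than half of all d**n assignments have a k-fold day
            return n
    raise ValueError("n_max too small")

print(smallest_group(2, n_max=30), smallest_group(3, n_max=100), smallest_group(4))  # 23 88 187
```

With k = 2 the counting reduces to the ordinary birthday problem, which is a useful sanity check.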
The strong birthday problem asks for the number of people that need to be gathered together before there is a 50% chance that everyone in the gathering shares their birthday with at least one other person. For d=365 days the answer is 3,064 people.[14][15]
The number of people needed for an arbitrary number of days is given by sequence A380129 in the OEIS.
The birthday problem can be generalized as follows:
- Given n random integers drawn from a discrete uniform distribution with range [1, d], what is the probability p(n; d) that at least two numbers are the same? (d = 365 gives the usual birthday problem.)[16]
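The generic results can be derived using the same arguments given above:
$$p(n; d) = \begin{cases} 1 - \prod_{k=1}^{n-1}\left(1 - \frac{k}{d}\right) & n \le d \\ 1 & n > d \end{cases}$$
$$p(n; d) \approx 1 - e^{-\frac{n(n-1)}{2d}}$$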
Conversely, if n(p; d) denotes the number of random integers drawn from [1, d] to obtain a probability p that at least two numbers are the same, then
$$n(p; d) \approx \sqrt{2d \cdot \ln\left(\frac{1}{1-p}\right)}.$$
The birthday problem in this more generic sense applies to hash functions: the expected number of N-bit hashes that can be generated before getting a collision is not $2^N$, but rather only $2^{N/2}$. This is exploited by birthday attacks on cryptographic hash functions and is the reason why a small number of collisions in a hash table is, for all practical purposes, inevitable.
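The $2^{N/2}$ behaviour can be observed with a toy birthday attack. The Python sketch below truncates SHA-256 (from the standard `hashlib` module) to its first 32 bits as a hypothetical stand-in for a 32-bit hash function. It hashes the decimal strings of 0, 1, 2 and so on, and stops at the first collision. With 2³² possible values, the search typically ends after a number of hashes on the order of 2¹⁶ = 65,536 rather than 2³².

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data), used as a toy hash of that width."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct messages until two share a truncated hash; return both and the count."""
    seen = {}  # truncated hash -> first message that produced it
    count = 0
    while True:
        msg = str(count).encode()
        count += 1
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, count
        seen[h] = msg

first, second, tries = find_collision(32)
print(first, second, tries, "hashes; 2**16 =", 2 ** 16)
```

Storing every hash costs memory proportional to the number of tries. Practical collision searches often use cycle-finding methods such as Pollard's rho to avoid this, which is beyond the scope of this sketch.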
The theory behind the birthday problem was used by Zoe Schnabel[17] under the name of capture-recapture statistics to estimate the size of fish populations in lakes. The birthday problem and its generalizations are also useful tools for modelling coincidences.[18]
Probability of a unique collision
The classic birthday problem allows for more than two people to share a particular birthday or for there to be matches on multiple days. The probability that among n people there is exactly one pair of individuals with a matching birthday given d possible days is[18]
$$\binom{n}{2} \frac{d!}{(d - n + 1)!\; d^{\,n}}$$
Unlike the standard birthday problem, as n increases the probability reaches a maximum value before decreasing. For example, for d = 365, the probability of a unique match has a maximum value of 0.3864 occurring when n = 28.
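The maximum can be located by evaluating $\binom{n}{2}\frac{d!}{(d-n+1)!\,d^n}$ for each n. This is a minimal Python sketch with a hypothetical function name; it computes the numerator with exact integers and divides once.

```python
from math import comb

def p_exactly_one_pair(n: int, d: int = 365) -> float:
    """Probability that exactly one pair among n people shares a birthday and all others differ."""
    if n < 2 or n - 1 > d:
        return 0.0
    distinct_days = 1
    for j in range(n - 1):  # d!/(d-n+1)! = d(d-1)...(d-n+2): the pair's day plus n-2 other days
        distinct_days *= d - j
    return comb(n, 2) * distinct_days / d ** n

best = max(range(2, 100), key=p_exactly_one_pair)
print(best, round(p_exactly_one_pair(best), 4))  # 28 0.3864
```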
Generalization to multiple types of people
The basic problem considers all trials to be of one "type". The birthday problem has been generalized to consider an arbitrary number of types.[19] In the simplest extension there are two types of people, say m men and n women, and the problem becomes characterizing the probability of a shared birthday between at least one man and one woman. (Shared birthdays between two men or two women do not count.) The probability of no shared birthdays here is
$$p_0 = \frac{1}{d^{m+n}} \sum_{i=1}^{m} \sum_{j=1}^{n} S_2(m, i)\, S_2(n, j) \prod_{k=0}^{i+j-1} (d - k)$$
where d = 365 and $S_2$ are Stirling numbers of the second kind. Consequently, the desired probability is $1 - p_0$.
This variation of the birthday problem is interesting because there is not a unique solution for the total number of people m + nScript error: No such module "Check for unknown parameters".. For example, the usual 50% probability value is realized for both a 32-member group of 16 men and 16 women and a 49-member group of 43 women and 6 men.
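The two group compositions can be checked with the Stirling-number formula $p_0 = d^{-(m+n)} \sum_{i,j} S_2(m,i)\,S_2(n,j) \prod_{k=0}^{i+j-1}(d-k)$. In this Python sketch (all names hypothetical), the men are split into i groups that share a day, the women into j groups, and the i + j groups receive distinct days. Both the 16 + 16 and the 43 + 6 groups come out slightly above 50%.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind: ways to split n labelled items into k non-empty blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def falling(d: int, r: int) -> int:
    """d (d - 1) ... (d - r + 1): ways to give r blocks distinct days."""
    out = 1
    for t in range(r):
        out *= d - t
    return out

def p_man_woman_match(m: int, n: int, d: int = 365) -> float:
    """Probability that at least one of m men shares a birthday with at least one of n women."""
    if m == 0 or n == 0:
        return 0.0
    no_match = sum(
        stirling2(m, i) * stirling2(n, j) * falling(d, i + j)
        for i in range(1, m + 1)
        for j in range(1, n + 1)
    )
    return 1 - no_match / d ** (m + n)

print(round(p_man_woman_match(16, 16), 4), round(p_man_woman_match(6, 43), 4))
```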
Other birthday problems
First match
A related question is, as people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room? That is, for what n is p(n) − p(n − 1) maximum? The answer is 20: if there is a prize for first match, the best position in line is 20th.
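The answer follows from maximizing $p(n) - p(n-1) = \bar p(n-1) - \bar p(n)$, which is the probability that person n is the first to match. This is a short Python sketch with illustrative names.

```python
def p_no_match(n: int, d: int = 365) -> float:
    prod = 1.0
    for k in range(1, n):
        prod *= 1 - k / d
    return prod

# p(n) - p(n - 1) equals p_no_match(n - 1) - p_no_match(n):
# the probability that person n is the first to match someone already in the room.
first_match = {n: p_no_match(n - 1) - p_no_match(n) for n in range(2, 367)}
best = max(first_match, key=first_match.get)
print(best)  # 20
```

Because the differences telescope, they sum to 1: someone among the first 366 people is certain to be the first match.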
Same birthday as you
In the birthday problem, neither of the two people is chosen in advance. By contrast, the probability q(n) that at least one other person in a room of n other people has the same birthday as a particular person (for example, you) is given by
$$q(n) = 1 - \left(\frac{364}{365}\right)^n$$
and for general d by
$$q(n; d) = 1 - \left(\frac{d-1}{d}\right)^n.$$
In the standard case of d = 365, substituting n = 23 gives about 6.1%, which is less than 1 chance in 16. For a greater than 50% chance that at least one other person in a roomful of n people has the same birthday as you, n would need to be at least 253. This number is significantly higher than 365/2 = 182.5: the reason is that it is likely that there are some birthday matches among the other people in the room.
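Both numbers follow from $q(n; d) = 1 - \left(\frac{d-1}{d}\right)^n$. This is a minimal Python sketch; the function name `q` mirrors the notation of the text, and the search bound of 10,000 is arbitrary.

```python
def q(n: int, d: int = 365) -> float:
    """Probability that at least one of n other people has one particular person's birthday."""
    return 1 - ((d - 1) / d) ** n

print(round(q(23), 3))                                   # 0.061
print(next(n for n in range(1, 10_000) if q(n) > 0.5))  # 253
```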
For any one person in a group of n people, the probability that they share their birthday with someone else is q(n − 1; d), as explained above. The expected number of people with a shared (non-unique) birthday can now be calculated easily by multiplying that probability by the number of people (n), so it is:
$$n\left(1 - \left(\frac{d-1}{d}\right)^{n-1}\right)$$
(This multiplication can be done this way because of the linearity of the expected value of indicator variables.) This implies that the expected number of people with a non-shared (unique) birthday is:
$$n\left(\frac{d-1}{d}\right)^{n-1}$$
Similar formulas can be derived for the expected number of people who share with three, four, etc. other people.
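The expected count $n\left(1 - \left(\frac{d-1}{d}\right)^{n-1}\right)$ relies on linearity of expectation, which is easy to confirm by brute force on a tiny year. This Python sketch (function names hypothetical) compares the formula with exhaustive enumeration of all $d^n$ equally likely assignments, using exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def expected_shared(n: int, d: int) -> Fraction:
    """n * q(n - 1; d): expected number of people whose birthday someone else also has."""
    return n * (1 - Fraction(d - 1, d) ** (n - 1))

def brute_force_shared(n: int, d: int) -> Fraction:
    """Average number of people with a shared birthday over all d**n equally likely assignments."""
    total = 0
    for days in product(range(d), repeat=n):
        counts = Counter(days)
        total += sum(1 for day in days if counts[day] > 1)
    return Fraction(total, d ** n)

print(expected_shared(3, 3), brute_force_shared(3, 3))  # 5/3 5/3
print(float(expected_shared(23, 365)), 23 - float(expected_shared(23, 365)))
```

The second line prints the expected numbers of people with shared and with unique birthdays in a group of 23.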
Number of people until every birthday is achieved
The expected number of people needed until every birthday is achieved is given by the coupon collector's problem. It can be calculated by $nH_n$, where n is the number of possible birthdays and $H_n$ is the nth harmonic number. For 365 possible dates (the birthday problem), the answer is 2365.
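The coupon-collector expectation $nH_n$ is a one-line computation. This Python sketch (function name hypothetical) sums the harmonic number exactly with `fractions.Fraction` before converting to a float.

```python
from fractions import Fraction

def expected_people_for_all_days(d: int = 365) -> Fraction:
    """Coupon collector expectation d * H_d, with H_d the d-th harmonic number (exact)."""
    return d * sum(Fraction(1, i) for i in range(1, d + 1))

print(round(float(expected_people_for_all_days())))  # 2365
```

For a two-day year the formula gives 2 × (1 + 1/2) = 3, which matches a direct argument: the first person covers one day, and each further person covers the other day with probability 1/2.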
Near matches
Another generalization is to ask for the probability of finding at least one pair in a group of n people with birthdays within k calendar days of each other, if there are d equally likely birthdays.[20]
The number of people required so that the probability that some pair will have a birthday separated by k days or fewer will be higher than 50% is given in the following table:

| k | n for d = 365 |
|---|---|
| 0 | 23 |
| 1 | 14 |
| 2 | 11 |
| 3 | 9 |
| 4 | 8 |
| 5 | 8 |
| 6 | 7 |
| 7 | 7 |
Thus in a group of just seven random people, it is more likely than not that two of them will have a birthday within a week of each other.[20]
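The table can be reproduced from the closed form $1 - \frac{(d - nk - 1)!}{d^{\,n-1}\,(d - n(k+1))!}$ for the probability that some pair of n people have birthdays at most k days apart, treating the year as circular. That formula is not given in the text above and is an assumption of this sketch; for k = 0 it reduces to the ordinary birthday probability. The function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def p_near_match(n: int, k: int, d: int = 365) -> Fraction:
    """P(some pair among n people has birthdays at most k days apart), assuming n * (k + 1) <= d."""
    # (d - n*k - 1)! / (d - n*(k + 1))! is a product of n - 1 consecutive integers
    no_near = Fraction(prod(range(d - n * (k + 1) + 1, d - n * k)), d ** (n - 1))
    return 1 - no_near

def people_needed(k: int, d: int = 365) -> int:
    """Smallest n for which a near match within k days is more likely than not."""
    n = 1
    while p_near_match(n, k, d) <= Fraction(1, 2):
        n += 1
    return n

print([people_needed(k) for k in range(8)])  # [23, 14, 11, 9, 8, 8, 7, 7]
```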
Number of days with a certain number of birthdays
Number of days with at least one birthday
The expected number of different birthdays, i.e. the number of days that are at least one person's birthday, is:
$$d - d\left(\frac{d-1}{d}\right)^n$$
This follows from the expected number of days that are no one's birthday:
$$d\left(\frac{d-1}{d}\right)^n$$
which follows from the probability that a particular day is no one's birthday, $\left(\frac{d-1}{d}\right)^n$, easily summed because of the linearity of the expected value.
For instance, with d = 365, you should expect about 21 different birthdays when there are 22 people, or about 47 different birthdays when there are 50 people. When there are 1000 people, there will be around 341 different birthdays (24 unclaimed birthdays).
Number of days with at least two birthdays
The above can be generalized from the distribution of the number of people with their birthday on any particular day, which is a binomial distribution with probability 1/d. Multiplying the relevant probability by d will then give the expected number of days. For example, the expected number of days which are shared, i.e. which are at least two (i.e. not zero and not one) people's birthday, is:
$$d - d\left(\frac{d-1}{d}\right)^n - d\binom{n}{1}\left(\frac{1}{d}\right)\left(\frac{d-1}{d}\right)^{n-1} = d - d\left(\frac{d-1}{d}\right)^n - n\left(\frac{d-1}{d}\right)^{n-1}$$
Number of people who repeat a birthday
The probability that the kth integer randomly chosen from [1, d] will repeat at least one previous choice equals q(k − 1; d) above. The expected total number of times a selection will repeat a previous selection as n such integers are chosen equals[21]
$$\sum_{k=1}^{n} q(k - 1; d) = n - d + d\left(\frac{d-1}{d}\right)^n$$
This can be seen to equal the number of people minus the expected number of different birthdays.
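The expectations in this section can be evaluated exactly and cross-checked by enumeration. This Python sketch (function names hypothetical) implements the expected number of distinct birthdays $d - d\left(\frac{d-1}{d}\right)^n$, the expected number of days that are at least two people's birthday, and the expected number of repeats as n minus the expected number of distinct birthdays, then confirms the first two by brute force on a 4-day year with 3 people.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def expected_distinct(n: int, d: int) -> Fraction:
    """Expected number of days that are at least one person's birthday."""
    return d - d * Fraction(d - 1, d) ** n

def expected_shared_days(n: int, d: int) -> Fraction:
    """Expected number of days that are the birthday of at least two people."""
    r = Fraction(d - 1, d)
    return d - d * r ** n - n * r ** (n - 1)

def expected_repeats(n: int, d: int) -> Fraction:
    """Expected number of choices that repeat an earlier one."""
    return n - expected_distinct(n, d)

for n in (22, 50, 1000):
    e = float(expected_distinct(n, 365))
    print(n, round(e, 1), round(365 - e, 1))  # distinct and unclaimed birthdays

# Brute-force check on a tiny year: d = 4 days, n = 3 people
outcomes = list(product(range(4), repeat=3))
avg_distinct = Fraction(sum(len(set(o)) for o in outcomes), len(outcomes))
avg_shared = Fraction(sum(sum(1 for c in Counter(o).values() if c >= 2) for o in outcomes), len(outcomes))
print(avg_distinct == expected_distinct(3, 4), avg_shared == expected_shared_days(3, 4))  # True True
```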
The distribution of the random variable reporting the number of integers to be chosen in order to get exactly a given number of repeats (held constant) converges to a chi-distributed random variable as d → ∞.[22]
In an alternative formulation of the birthday problem, one asks the average number of people required to find a pair with the same birthday. If we consider the probability function Pr[n people have at least one shared birthday], this average is the mean of the distribution, as opposed to the customary formulation, which asks for the median. The problem is relevant to several hashing algorithms analyzed by Donald Knuth in his book The Art of Computer Programming. It may be shown[23][24] that if one samples uniformly, with replacement, from a population of size M, the number of trials required for the first repeated sampling of some individual has expected value n = 1 + Q(M), where
$$Q(M) = \sum_{k=1}^{M} \frac{M!}{(M-k)!\,M^k}.$$
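The function
$$Q(M) = 1 + \frac{M-1}{M} + \frac{(M-1)(M-2)}{M^{2}} + \cdots + \frac{(M-1)(M-2)\cdots 1}{M^{M-1}}$$
has been studied by Srinivasa Ramanujan and has the asymptotic expansion:
$$Q(M) \sim \sqrt{\frac{\pi M}{2}} - \frac{1}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2M}} - \frac{4}{135M} + \cdots$$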
With M = 365 days in a year, the average number of people required to find a pair with the same birthday is n = 1 + Q(M) ≈ 24.61659, somewhat more than 23, the number required for a 50% chance. In the best case, two people will suffice; at worst, the maximum possible number of M + 1 = 366 people is needed; but on average, only 25 people are required.
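The expected value 1 + Q(M) can be computed directly from the sum, building each term M!/((M − k)! M^k) from the previous one so that no large factorials are formed, and compared with the first four terms of Ramanujan's expansion. The function names are hypothetical; this is a minimal sketch, not a reference implementation.

```python
import math


def expected_first_repeat(M: int) -> float:
    """Exact 1 + Q(M), with Q(M) = sum_{k=1}^{M} M! / ((M - k)! M^k)."""
    q = 0.0
    term = 1.0
    for k in range(1, M + 1):
        term *= (M - k + 1) / M   # now term == M! / ((M - k)! * M**k)
        q += term
    return 1 + q


def expected_first_repeat_asymptotic(M: int) -> float:
    """1 + Q(M) using the first four terms of the asymptotic expansion."""
    return 1 + (math.sqrt(math.pi * M / 2) - 1 / 3
                + math.sqrt(math.pi / (2 * M)) / 12 - 4 / (135 * M))


print(expected_first_repeat(365), expected_first_repeat_asymptotic(365))
```

For M = 1 the second draw always repeats, so the exact function returns 2, a quick check of the indexing.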
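An analysis using indicator random variables can provide a simpler but approximate analysis of this problem.[25] For each pair (i, j) of the k people in a room, we define the indicator random variable X_ij, for 1 ≤ i < j ≤ k, by
$$X_{ij} = I\{\text{person } i \text{ and person } j \text{ have the same birthday}\} = \begin{cases} 1 & \text{if person } i \text{ and person } j \text{ have the same birthday,} \\ 0 & \text{otherwise,} \end{cases}$$
so that, with n days in a year, E[X_ij] = Pr{person i and person j have the same birthday} = 1/n.
Let X be a random variable counting the pairs of individuals with the same birthday. By linearity of expectation,
$$X = \sum_{i=1}^{k}\sum_{j=i+1}^{k} X_{ij}, \qquad E[X] = \sum_{i=1}^{k}\sum_{j=i+1}^{k} E[X_{ij}] = \binom{k}{2}\frac{1}{n} = \frac{k(k-1)}{2n}.$$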
For n = 365, if k = 28, the expected number of pairs of individuals with the same birthday is (28 · 27)/(2 · 365) = 378/365 ≈ 1.0356, whereas for k = 27 it is 351/365 ≈ 0.9616. Therefore, we can expect at least one matching pair with at least 28 people.
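The expectation E[X] = k(k − 1)/(2n) is simple enough to evaluate exactly with rational arithmetic. This sketch (hypothetical function names) finds the smallest group size whose expected number of matching pairs reaches one.

```python
from fractions import Fraction


def expected_matching_pairs(k: int, n: int = 365) -> Fraction:
    """E[X] = C(k, 2) / n, the expected number of same-birthday pairs."""
    return Fraction(k * (k - 1), 2 * n)


def smallest_group_expecting_a_pair(n: int = 365) -> int:
    """Smallest k with E[X] >= 1."""
    k = 2
    while expected_matching_pairs(k, n) < 1:
        k += 1
    return k


print(expected_matching_pairs(28), float(expected_matching_pairs(28)))
print(smallest_group_expecting_a_pair())
```

Note that E[X] ≥ 1 is a statement about the average number of pairs, not about the probability of at least one pair, which is why this threshold (28) differs from the 50% threshold (23).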
In the 2014 FIFA World Cup, each of the 32 squads had 23 players. An analysis of the official squad lists suggested that 16 squads had at least one pair of players sharing a birthday: Argentina, France, Iran, South Korea and Switzerland each had two pairs, and Australia, Bosnia and Herzegovina, Brazil, Cameroon, Colombia, Honduras, Netherlands, Nigeria, Russia, Spain and USA each had one pair.[26]
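For comparison with these counts (16 squads, and 5 × 2 + 11 = 21 pairs), the following sketch computes what the basic model predicts for 32 squads of 23 players. It assumes independent, uniformly distributed birthdays over 365 days, ignoring leap days and seasonal variation; the helper name is hypothetical.

```python
import math


def p_shared(n: int, d: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d
    return 1 - p_distinct


SQUADS, SQUAD_SIZE = 32, 23
expected_squads = SQUADS * p_shared(SQUAD_SIZE)              # squads with a shared birthday
expected_pairs = SQUADS * math.comb(SQUAD_SIZE, 2) / 365     # total same-birthday pairs
print(f"{expected_squads:.2f} squads, {expected_pairs:.2f} pairs expected")
```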
Voracek, Tran and Formann showed that the majority of people markedly overestimate the number of people needed to achieve a given probability of people having the same birthday, and markedly underestimate the probability of people having the same birthday when a specific sample size is given.[27] Further results showed that psychology students and women did better on the task than casino visitors/personnel or men, but were less confident about their estimates.
Reverse problem
The reverse problem is to find, for a fixed probability p, the greatest n for which the probability p(n) is smaller than the given p, or the smallest n for which the probability p(n) is greater than the given p.
Taking the above formula for d = 365, one has
$$n(p; 365) \approx \sqrt{730 \ln\left(\frac{1}{1-p}\right)}.$$
The following table gives some sample calculations.
p | n ≈ √(730 ln(1/(1 − p))) | n↓ (n rounded down) | p(n↓) | n↑ (n rounded up) | p(n↑)
---|---|---|---|---|---
0.01 | 0.14178 √365 = 2.70864 | 2 | 0.00274 | 3 | 0.00820
0.05 | 0.32029 √365 = 6.11916 | 6 | 0.04046 | 7 | 0.05624
0.1 | 0.45904 √365 = 8.77002 | 8 | 0.07434 | 9 | 0.09462
0.2 | 0.66805 √365 = 12.76302 | 12 | 0.16702 | 13 | 0.19441
0.3 | 0.84460 √365 = 16.13607 | 16 | 0.28360 | 17 | 0.31501
0.5 | 1.17741 √365 = 22.49439 | 22 | 0.47570 | 23 | 0.50730
0.7 | 1.55176 √365 = 29.64625 | 29 | 0.68097 | 30 | 0.70632
0.8 | 1.79412 √365 = 34.27666 | 34 | 0.79532 | 35 | 0.81438
0.9 | 2.14597 √365 = 40.99862 | 40 | 0.89123 | 41 | 0.90315
0.95 | 2.44775 √365 = 46.76414 | 46 | 0.94825 | 47 | 0.95477
0.99 | 3.03485 √365 = 57.98081 | 57 | 0.99012 | 58 | 0.99166
Some values fall outside the bounds, which shows that the approximation is not always exact: for p = 0.01, 0.1 and 0.2, p(n↑) is still smaller than p, and for p = 0.99, p(n↓) already exceeds p.
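The rows of the table can be reproduced, and the approximate n compared with the exact smallest n for which p(n) exceeds p, with a short sketch like the following. It assumes d = 365 and uniform, independent birthdays; the function names are hypothetical.

```python
import math


def p_shared(n: int, d: int = 365) -> float:
    """Exact probability p(n) that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d
    return 1 - p_distinct


def approx_n(p: float, d: int = 365) -> float:
    """Approximation n(p; d) ≈ sqrt(2 d ln(1 / (1 - p)))."""
    return math.sqrt(2 * d * math.log(1 / (1 - p)))


def exact_smallest_n(p: float, d: int = 365) -> int:
    """Smallest n with p(n) > p, found by direct search."""
    n = 1
    while p_shared(n, d) <= p:
        n += 1
    return n


for p in (0.01, 0.1, 0.5, 0.99):
    n = approx_n(p)
    lo, hi = math.floor(n), math.ceil(n)
    print(p, round(n, 5), lo, round(p_shared(lo), 5), hi, round(p_shared(hi), 5),
          exact_smallest_n(p))
```

For p = 0.01, the rounded-up approximation gives 3, while the direct search finds that 4 people are needed, which is one of the out-of-bounds cases noted above.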
Partition problem
A related problem is the partition problem, a variant of the knapsack problem from operations research. Some weights are put on a balance scale; each weight is an integer number of grams randomly chosen between one gram and one million grams (one tonne). The question is whether one can usually (that is, with probability close to 1) transfer the weights between the left and right arms to balance the scale. (In case the sum of all the weights is an odd number of grams, a discrepancy of one gram is allowed.) If there are only two or three weights, the answer is very clearly no; although there are some combinations which work, the majority of randomly selected combinations of three weights do not. If there are very many weights, the answer is clearly yes. The question is, how many are just sufficient? That is, what is the number of weights such that it is equally likely for it to be possible to balance them as it is to be impossible?
Some people's intuition is that the answer is above …; most people's intuition is that it is in the thousands or tens of thousands, while others feel it should at least be in the hundreds. The correct answer is 23.
The reason is that the correct comparison is to the number of partitions of the weights into left and right. There are 2^(N − 1) different partitions for N weights, and the left sum minus the right sum can be thought of as a new random quantity for each partition. The distribution of the sum of weights is approximately Gaussian, with a peak at 500,000N and width 1,000,000√N, so that the transition occurs when 2^(N − 1) is approximately equal to 1,000,000√N. For N = 23, 2^(23 − 1) is about 4 million, while the width of the distribution is only about 5 million.[28]
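The heuristic can be illustrated numerically: the sketch below compares 2^(N − 1) with the width 1,000,000√N (the width is taken from the text) and reports the N at which the two are closest on a logarithmic scale. This is only a rough illustration of the order-of-magnitude argument under that assumption, not an algorithm for the partition problem itself; the function name is hypothetical.

```python
import math


def heuristic_transition(width_per_sqrt_n: float = 1_000_000, max_n: int = 60) -> int:
    """Return the N for which 2**(N - 1) is closest, on a log scale, to the
    width width_per_sqrt_n * sqrt(N) of the distribution of weight sums."""
    def log_gap(n: int) -> float:
        return abs((n - 1) * math.log(2) - math.log(width_per_sqrt_n * math.sqrt(n)))
    return min(range(1, max_n + 1), key=log_gap)


for n in (22, 23, 24):
    print(n, 2 ** (n - 1), round(1_000_000 * math.sqrt(n)))
print(heuristic_transition())
```

The printed rows show 2^(N − 1) growing past the width between N = 23 and N = 24, consistent with a transition near 23 weights.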
In fiction
Arthur C. Clarke's 1961 novel A Fall of Moondust contains a section where the main characters, trapped underground for an indefinite amount of time, are celebrating a birthday and find themselves discussing the validity of the birthday problem. As stated by a physicist passenger: "If you have a group of more than twenty-four people, the odds are better than even that two of them have the same birthday." Eventually, out of 22 present, it is revealed that two characters share the same birthday, May 23.
Notes
References
1. David Singmaster, Sources in Recreational Mathematics: An Annotated Bibliography, Eighth Preliminary Edition, 2004, section 8.B
2. H.S.M. Coxeter, "Mathematical Recreations and Essays, 11th edition", 1940, p. 45, as reported in I. J. Good, Probability and the weighing of evidence, 1950, p. 38
3. Richard von Mises, "Über Aufteilungs- und Besetzungswahrscheinlichkeiten", Revue de la faculté des sciences de l'Université d'Istanbul 4:145–163, 1939, reprinted in …
4. …
5. …
6. …
7. …
8. Jim Gray, Catharine van Ingen. Empirical Measurements of Disk Failure Rates and Error Rates
9. D. Brink, A (probably) exact solution to the Birthday Problem, Ramanujan Journal, 2012.
10. Brink 2012, Theorem 2
11. Brink 2012, Theorem 3
12. Brink 2012, Table 3, Conjecture 1
13. …
14. Anirban DasGupta, "The matching, birthday and the strong birthday problem: a contemporary review", Journal of Statistical Planning and Inference 130(1–2), 2005, 377–389.
15. Mario Cortina Borja, The Strong Birthday Problem, Significance, Volume 10, Issue 6, December 2013, Pages 18–20, https://doi.org/10.1111/j.1740-9713.2013.00705.x
16. …
17. Z. E. Schnabel (1938) The Estimation of the Total Fish Population of a Lake, American Mathematical Monthly 45, 348–352.
18. M. Pollanen (2024) A Double Birthday Paradox in the Study of Coincidences, Mathematics 12(24), 3882. https://doi.org/10.3390/math12243882
19. M. C. Wendl (2003) Collision Probability Between Sets of Random Variables, Statistics and Probability Letters 64(3), 249–254.
20. M. Abramson and W. O. J. Moser (1970) More Birthday Surprises, American Mathematical Monthly 77, 856–858.
21. …
22. Corollary 5 in …
23. …
24. …
25. …
26. …
27. …
28. …
External links
- The Birthday Paradox accounting for leap year birthdays
- A humorous article explaining the paradox
- SOCR EduMaterials activities birthday experiment
- Understanding the Birthday Problem (Better Explained)
- Eurobirthdays 2012. A birthday problem. A practical football example of the birthday paradox.
- Computing the probabilities of the Birthday Problem at WolframAlpha