Multivariate urn/coupon collector problem without replacement
Simply put, the problem I have is:
Given an urn containing $m$ total balls of $n$ different colors with $c_i$ balls of each color (i.e., $m = c_1 + ... + c_n$), what is the expected number of balls you would need to draw to see the $k^{\text{th}}$ color when drawing without replacement?
This is a variation of the coupon collectors problem without replacement, with the difference being the variable stopping criterion $k$. Naturally, if $k = n$, this reduces to the usual coupon collector problem (without replacement) of finding all colors.
Additional (optional) requirements:
- In my particular use case, there are actually also a number $c_b$ of balls without a color (referred to as black balls), which do not contribute to the number of colors seen.
- It can also happen in my use case that a single ball consists of multiple colors, which can be thought of as a multi-colored ball.
Although these additional requirements more accurately reflect the actual use case, they are optional as I believe I can either adjust a solution to the original problem to incorporate them, or ensure that they do not occur (removing cases where they do). Thus a solution which also easily handles black and/or multi-colored balls would be preferred, but is not necessary.
What I've found so far
I have been able to get a solution for a number of simplified cases:
- When there is only one color (with $c_1$ balls of that color) and $c_b$ black balls, the expected number of balls to draw before seeing the color is $\frac{m+1}{c_1+1}$, where $m = c_1 + c_b$.
- When there is only one ball of each color (i.e., $c_i = 1, \forall\ i$) and $c_b$ black balls, the expected number of balls before seeing the $k^{\text{th}}$ color is $\frac{k\cdot(m+1)}{n+1}$.
- The expected number of balls before seeing the first color is always: $$\frac{m+1}{\left(\sum^{n}_{i=1} c_i\right) + 1}$$
- The expected number of balls before seeing the last color (i.e., coupon collector) is always (taken from this SE answer): $$\displaystyle\sum\limits_{S\subseteq\{c_1, ..., c_n\}, S \neq \emptyset} (-1)^{|S|-1} \frac{m+1}{\left(\sum_{c_i \in S}c_i\right) + 1}$$
The similarities between the solutions of all of these special cases seems to indicate that there may exist a general solution that generalizes all of these.
There are also multiple related SE posts on the topic, however none of them seem to deal with quite the same problem. This post and this other post are perhaps the most related, as they deal with coupon collectors problem without replacement; However they do not factor in the variable stopping criteria. I suspect that due to the very similar problem, however, their solutions may be extendable to this situation.
1 answer
I must admit to not yet understanding how we get the expected values in your partial answers (1) and (2). But here is a solution to the main question.
Along the lines of your partial answer (4), we can use the Inclusion-Exclusion principle to figure out how many $n_2$-ball draws use exactly $n_1$ different colors (excluding black).
\[\sum_{K \subseteq \{C_1,…,C_n\} ∧ 0 < |K| ≤ n_1} (-1)^{n_1-|K|} \frac{(|\bigcup K|)_{n_2}}{(m)_{n_2}}\]where $C_i$ is the set of balls of color $i$.
This is because, if there are $b$ balls of one type and $m$ balls total, then the chance that, in an $n$-ball draw, all the balls will be of the said type is $\frac{(b)_n}{(m)_n}$, where $(m)_n$ is the falling Pochhammer symbol. The terms being summed then correspond to the probabilities that the first $n_2$ balls drawn will use only the colors within the elements of the set $K \subseteq \{C_1,…C_n\}$.
In accordance with the Inclusion-Exclusion Principle, the terms where $|K| = n_1$ (corresponding to unary intersections) are positive, those where $|K| = n_1-1$ (corresponding to binary intersections) are negative, those where $|K| = n_1-2$ (corresponding to ternary intersections) are positive, etc. The idea is analogous to the formula $\sum_{i=0}^n -1^i {n \choose i}$ for derangements and to the more general formula $\sum_{i=k}^n -1^{i-k} {n \choose i}$ for permutations that keep exactly $k$ elements fixed.

1 comment thread