$\newcommand{\tr}[1]{\text{Tr}\left\{#1\right\}}$ $\newcommand{\ket}[1]{\left|#1\right\rangle}$ $\newcommand{\bra}[1]{\left\langle#1\right|}$ $\newcommand{\braket}[2]{\left\langle#1\right| \left.#2\right\rangle}$ $\newcommand{\sandwich}[3]{\left\langle#1\right|#2 \left|#3\right\rangle}$
Last time I talked about the axioms of probability theory. They can be used to derive all other concepts and theorems of probability theory. Or one can use them to verify whether a given theory is a correct probabilistic theory.
Now, let me show that the probability calculations in QM obey the axioms of probability. Again, I'm repeating what is written in sections 1.5 (Probability Theory), 2.4 (Probability Distributions - Verification of Probability Axioms) and 9.6 (Joint and Conditional Probabilities) of Ballentine's book.
Dynamical variables are represented by Hermitian operators. These have real eigenvalues, and those values are the possible outcomes of a measurement of that dynamical variable (also called an "observable", a fancy word used to distinguish the quantum quantity from its classical counterpart).
Say $\left\{r_n\right\}$ is the set of all eigenvalues (they may be degenerate). $\Delta$ is a range in the eigenvalue space, i.e. a subset of $\left\{r_n\right\}$. In quantum mechanics we talk about the probability that the outcome of a measurement of an observable lies in the range $\Delta$. Ballentine's notation is this: $p(R \in \Delta | \rho)$. It means "the probability that the outcome of an $R$ measurement is in the range $\Delta$, if the system is in the state $\rho$".
The calculation of this probability is done using projection operators. First one has to find the subspace of the full Hilbert space $H$ in which all the eigenstates corresponding to the eigenvalues in the range $\Delta$ live. Ballentine's notation for the projection operator onto this subspace is $M_R(\Delta)$.
For the simplest case, pick a single non-degenerate eigenvalue, $r_n$, as the range, and a pure state, $\ket{\psi}$, as the system's state. The corresponding eigenvector is $\ket{r_n}$ and the projection operator onto it is $\ket{r_n}\bra{r_n}=M_R(r_n)$.
The quantum mechanical postulate on probability (for pure states) is that the probability equals the squared norm of the projection of the state vector onto the eigensubspace related to $\Delta$, i.e. the absolute square of the component of the state vector lying in that subspace. For the most general case: $$ p(R \in \Delta | \rho) = \tr{ \rho M_R(\Delta) }$$
In our simplest case, which is shown in most introductory texts, $$\begin{align}
p(R = r_n | \psi) &= |\braket{r_n}{\psi}|^2 \\
&= \braket{\psi}{r_n}\braket{r_n}{\psi} \\
&= \sandwich{\psi}{M_R(r_n)}{\psi} \\
&= \sandwich{\psi}{M_R(r_n)M_R(r_n)}{\psi} \quad \text{(projection operators satisfy } M^2=M \text{)} \\
&= \left(\bra{\psi}M_R(r_n)\right) \left( M_R(r_n)\ket{\psi} \right) \\
&= \braket{\psi_{r_n}}{\psi_{r_n}} \\
&= \| \ket{\psi_{r_n}} \|^2
\end{align}$$ where $\ket{\psi_{r_n}} = M_R(r_n)\ket{\psi} = c_n \ket{r_n}$ is the projection of $\ket{\psi}$ onto the eigensubspace. (This is the longest way of showing the relation.) Hence the squared norm is $|c_n|^2$, where $\left\{ c_n \right\}$ are the coefficients when $\ket{\psi}$ is expanded in the eigenstates of $R$.
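As a quick numerical sanity check of the chain of equalities above, here is a minimal sketch in plain Python. The two-dimensional state `psi` and the eigenvector `r_n` are made-up illustration values (they are not from Ballentine), and the helper names `inner`, `projector` and `apply` are hypothetical.

```python
import math

def inner(u, v):
    """Inner product <u|v>; the first argument is conjugated."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def projector(v):
    """Matrix of |v><v| for a normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]

def apply(m, v):
    """Matrix-vector product M|v>."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

s = 1 / math.sqrt(2)
psi = [0.6 + 0j, 0.8j]      # assumed state, already normalized
r_n = [s + 0j, s + 0j]      # assumed eigenvector |r_n>

M = projector(r_n)          # M_R(r_n) = |r_n><r_n|
projected = apply(M, psi)   # |psi_{r_n}> = M_R(r_n)|psi>

p_amplitude = abs(inner(r_n, psi)) ** 2       # |<r_n|psi>|^2
p_sandwich = inner(psi, projected).real       # <psi|M|psi>
p_norm = inner(projected, projected).real     # || M|psi> ||^2

print(p_amplitude, p_sandwich, p_norm)
```

All three numbers should agree up to floating-point rounding, which is exactly what the derivation claims.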
Axiom 1
$$0 \leq p(A|B) \leq 1$$
Here our $p(A|B)$ is $p(R \in \Delta | \rho)$. $A$ is the event that the outcome is in the range $\Delta$. $B$ is the event that our system is prepared in the state $\rho$.
The operator that represents a quantum state must satisfy these conditions: $$\tr{\rho}=1 \tag{1}$$ and, for every vector $\ket{u}$, $$\sandwich{u}{\rho}{u} \geq 0 \tag{2}$$
If we ask for the outcome to be any one of the eigenvalues, then the related projection operator covers the whole Hilbert space, hence it is the identity operator: $M_R(\Delta_\text{all}) = \sum_{r_n \in \left\{r_n\right\}} \ket{r_n}\bra{r_n} = I$. Hence the probability that we will get an eigenvalue of $R$ is $p(R \in \Delta_\text{all} | \rho)$ $=\tr{\rho M_R(\Delta_\text{all})}$ $=\tr{\rho I} = 1$ (according to (1)), which is the maximum probability.
For any other range, the projection does not cover the whole space: $M_R(\Delta) = \sum_{r_n \in \Delta}\ket{r_n}\bra{r_n}$ $\neq I$. Therefore $\tr{\rho M_R(\Delta)}$ $=\tr{\rho \sum_{r_n \in \Delta}\ket{r_n}\bra{r_n}}$ $= \sum_{r_n \in \Delta} \tr{\rho \ket{r_n}\bra{r_n}}$ $= \sum_{r_n \in \Delta} \sandwich{r_n}{\rho}{r_n}$. According to (2) all terms in this summation are non-negative, so the sum is non-negative, and it either increases or remains the same as we add more terms (as we enlarge the subspace). Covering the whole Hilbert space gives $1$, hence covering any subspace gives a number between $0$ and $1$, and axiom (1) holds.
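The argument can be illustrated numerically. The sketch below builds a made-up mixed state of a three-level system (an assumption for illustration only), computes $\tr{\rho \ket{r_n}\bra{r_n}}$ for each basis vector, and accumulates the probabilities as the range $\Delta$ is enlarged. The helper names `outer`, `matmul` and `trace` are hypothetical.

```python
import math

def outer(u, v):
    """Matrix of |u><v|."""
    return [[a * b.conjugate() for b in v] for a in u]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

s = 1 / math.sqrt(3)
a = [1 + 0j, 0j, 0j]                 # assumed pure component |a>
b = [s + 0j, s + 0j, s + 0j]         # assumed pure component |b>

# Assumed mixed state: rho = 0.5 |a><a| + 0.5 |b><b|, so Tr{rho} = 1
rho = [[0.5 * x + 0.5 * y for x, y in zip(row_a, row_b)]
       for row_a, row_b in zip(outer(a, a), outer(b, b))]

# Eigenvectors |r_0>, |r_1>, |r_2> of the (hypothetical) observable R
basis = [[1 + 0j, 0j, 0j], [0j, 1 + 0j, 0j], [0j, 0j, 1 + 0j]]

# p(R = r_n | rho) = Tr{rho |r_n><r_n|}
probs = [trace(matmul(rho, outer(r, r))).real for r in basis]

# Enlarge the range Delta one eigenvalue at a time
cumulative = [sum(probs[:k + 1]) for k in range(len(probs))]

print(probs)
print(cumulative)
```

Each entry of `probs` is non-negative, the running totals never decrease, and the last total (the whole Hilbert space) is $1$ up to rounding.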
Axiom 2
$$p(A|A)=1$$
Here, $A$ is both the state in which the system is prepared and the state found by the measurement. We are measuring an observable (the projector onto the state) whose eigenvalue $1$ corresponds to the state. In pure state notation, the state is $\ket{\psi}$, and $M_R(\Delta) = \ket{\psi}\bra{\psi}$. Then $\sandwich{\psi}{M_R(\Delta)}{\psi}$ $=\braket{\psi}{\psi}\braket{\psi}{\psi} = 1$. This is not surprising, because it is a bit tautological. What we are saying is this: "If the system is in the state $\ket{\psi}$, then the probability of finding it in the state $\ket{\psi}$ is $1$."
In mixed state notation, a state that lies entirely in the subspace satisfies $\rho = M_R(\Delta)\rho M_R(\Delta)$. Using $M^2=M$ and the cyclic property of the trace, the probability becomes $\tr{\rho M_R(\Delta)}$ $=\tr{\rho M_R(\Delta) M_R(\Delta)}$ $=\tr{M_R(\Delta) \rho M_R(\Delta)}$ $=\tr{\rho} = 1$. Hence axiom (2) holds.
Axiom 3
For axiom 3 Ballentine takes a different route. Instead of using our axiom 3, which was $p(\sim A|B) = 1 - p(A|B)$, he uses an alternative axiom, "addition of probabilities for exclusive events", which holds when $A$ and $B$ are mutually exclusive: $$p(A \vee B|C) = p(A|C) + p(B|C) \tag{3b}$$ One can either start with (3) and derive (3b), or vice versa. Hence proving (3b) is equivalent to proving (3).
Pick two ranges, $\Delta_1$ and $\Delta_2$, with no intersection. Then $R \in \Delta_1$ and $R \in \Delta_2$ are mutually exclusive events: they can't happen at the same time. Because the ranges are disjoint, the vectors belonging to one range are orthogonal to the vectors belonging to the other range. Hence successive application of their projection operators gives zero: $M_R(\Delta_1) M_R(\Delta_2) = 0$. And the projection onto the union of the ranges is the sum of the projections: $M_R(\Delta_1 \cup \Delta_2)$ $ = M_R(\Delta_1) + M_R(\Delta_2)$. Since the trace is linear, $$\begin{align}
p(R \in \Delta_1 \vee R \in \Delta_2 | \rho) & =\tr{\rho M_R(\Delta_1 \cup \Delta_2)} \\
&=\tr{\rho M_R(\Delta_1)}+\tr{\rho M_R(\Delta_2)} \\
& =p(R \in \Delta_1 | \rho)+p(R \in \Delta_2 | \rho)
\end{align}
$$ Hence axiom (3) holds.
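Here is a small sketch of the same statement in numbers, assuming a made-up pure state of a three-level system and two disjoint single-eigenvalue ranges. The state, the choice of ranges and the helper names are illustration-only assumptions.

```python
import math

def outer(u, v):
    """Matrix of |u><v|."""
    return [[a * b.conjugate() for b in v] for a in u]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

s = 1 / math.sqrt(3)
psi = [s + 0j, s * 1j, s + 0j]       # assumed normalized pure state
rho = outer(psi, psi)                # rho = |psi><psi|

r0 = [1 + 0j, 0j, 0j]                # eigenvector for the range Delta_1
r2 = [0j, 0j, 1 + 0j]                # eigenvector for the range Delta_2

M1 = outer(r0, r0)                   # M_R(Delta_1)
M2 = outer(r2, r2)                   # M_R(Delta_2)

# Disjoint ranges: successive projections give the zero operator
product = matmul(M1, M2)

# Projector onto the union is the sum of the two projectors
M_union = [[x + y for x, y in zip(row1, row2)] for row1, row2 in zip(M1, M2)]

p1 = trace(matmul(rho, M1)).real
p2 = trace(matmul(rho, M2)).real
p_union = trace(matmul(rho, M_union)).real

print(p1, p2, p_union)
```

The printed `p_union` should equal `p1 + p2` up to rounding, and every entry of `product` is zero.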
Axiom 4
After this point, I think, we have no doubt that axiom (4) will also hold for QM. But, anyways, let us do this. $$p(A\&B|C) = p(A|C)p(B|A\&C)$$
Here $C$ is the quantum state, $\rho$ or $\ket{\psi}$. $A$ is the event of getting an eigenvalue in the range $\Delta_a$ when $R$ is measured, and similarly $B$ is the event $S \in \Delta_b$ for an $S$ measurement. $A\&B$ means that simultaneously $R$ has a value in $\Delta_a$ and $S$ has a value in $\Delta_b$.
First let me show this for the pure state case with Kolmogorov's version of the axiom, which was $p(A\&B)=p(B|A)p(A)$.
$p(A) = p(R=r) = \sandwich{\psi}{M_R(r)}{\psi}$ $=\bra{\psi}\left( \ket{r}\bra{r} \right) \ket{\psi}$ $=|\braket{r}{\psi}|^2$.
$p(B|A)$ can be thought of as the result of two successive measurements. First the event $A$ happens. Then what is the probability that $B$ happens? QM tells us that if the eigenvalue $r$ is measured, then the state becomes (the wavefunction collapses to) $\ket{\psi} \rightarrow \ket{r}$. A more general expression is $$\ket{\psi} \rightarrow \frac{M_R(r) \ket{\psi}}{\sqrt{\sandwich{\psi}{M_R(r)}{\psi}}} = \frac{M_R(r) \ket{\psi}}{\sqrt{p(R=r)}} \equiv \ket{\psi^\prime}$$ Hence in $p(B|A)$ the condition $A$ is now the new state $\ket{\psi^\prime}$, and to do the probability calculation one has to sandwich $M_S$ between this new state. $$\begin{align}
p(B|A) &= p(S=s|\ket{\psi^\prime}) \\
& = \sandwich{\psi^\prime}{M_S(s)}{\psi^\prime} \\
& = \frac{\bra{\psi}M_R(r)}{\sqrt{p(R=r)}} M_S(s) \frac{M_R(r)\ket{\psi}}{\sqrt{p(R=r)}} \\
& = \frac{\sandwich{\psi}{M_R(r)M_S(s)M_R(r)}{\psi}}{p(R=r)} \\
& = \frac{\sandwich{\psi}{\left(M_S M_R + \left[M_R, M_S\right] \right)M_R}{\psi}}{p(R=r)} \\
& = \frac{\sandwich{\psi}{M_S M_R}{\psi} + \sandwich{\psi}{[M_R, M_S]M_R}{\psi}}{p(R=r)} \\
& = \frac{\sandwich{\psi}{M_S M_R}{\psi}}{p(R=r)}, \quad \text{if } [M_R, M_S]=0
\end{align}$$
When the operators $R$ and $S$ commute, they have simultaneous eigenvectors, hence $M_R$ and $M_S$ also commute. $p(R=r)$ was $p(A)$. Hence $p(B|A)p(A)$ $= \sandwich{\psi}{M_S M_R}{\psi}$ which has to be $p(A\&B)$, and fortunately it is.
To find $p(A\&B) = p(R=r \& S=s)$ one needs the projection operator onto the eigensubspace $\epsilon_{rs}$, where $R$ has the eigenvalue $r$ and $S$ has the eigenvalue $s$. $\epsilon_{rs}$ is the intersection of $\epsilon_{r}$ and $\epsilon_{s}$. One can project a vector onto $\epsilon_{rs}$ by successively applying the $M_R$ and $M_S$ projection operators. The order is not important when $[R,S]=0$, because then they have simultaneous eigenvectors. Those eigenvectors can be used to create an orthonormal basis. Projection operators constructed from that basis commute too. And the joint probability is defined only when the operators commute.
Therefore, the component of $\ket{\psi}$ in $\epsilon_{rs}$ is $\ket{\psi_{rs}} = M_R M_S \ket{\psi}$. Its squared norm is the probability: $\left\| \ket{\psi_{rs}} \right\|^2$ $= \braket{\psi_{rs}}{\psi_{rs}}$ $= \bra{\psi} M_S M_R M_R M_S \ket{\psi}$. Using their commutation and $M^2=M$, it becomes $p(R=r \& S=s)$ $= \bra{\psi} M_S M_R \ket{\psi}$. And this quantity is the numerator of the conditional probability result. Hence axiom (4) is also satisfied by quantum mechanics' postulates of probability calculations.
Using a mixed state and Cox's version, the calculation becomes:
$p(A|C) = p(R \in \Delta_a | \rho)$ $= \tr{\rho M_R(\Delta_a)}$.
$p(A\&B|C)$ $= p(R \in \Delta_a \& S \in \Delta_b | \rho)$ $= \tr{\rho M_R(\Delta_a) M_S(\Delta_b)}$.
$p(B|A\&C)$ $=p(S \in \Delta_b | R \in \Delta_a \& \rho)$. Here, $R \in \Delta_a \& \rho$ can be thought of as a new state, $\rho \rightarrow \rho^\prime$, which we get after the $R$ measurement, so $p(B|A\&C)=p(S \in \Delta_b | \rho^\prime)$. QM postulates that $\rho^\prime$ $= \frac{M_R(\Delta_a) \rho M_R(\Delta_a)}{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)}}$. Therefore, $p(B|A\&C)=\tr{\rho^\prime M_S(\Delta_b)}$. $$\begin{align}
p(A|C)p(B|A\&C) & = p(R \in \Delta_a | \rho) p(S \in \Delta_b | R \in \Delta_a \& \rho) \\
& = \tr{\rho M_R(\Delta_a)} \tr{\rho^\prime M_S(\Delta_b)} \\
& = \tr{\rho M_R(\Delta_a)} \tr{\frac{M_R(\Delta_a) \rho M_R(\Delta_a)}{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)}} M_S(\Delta_b)} \\
& = \tr{\rho M_R(\Delta_a)} \frac{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)M_S(\Delta_b)}}{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)}} \\
& = \tr{M_R(\Delta_a) \rho M_R(\Delta_a)M_S(\Delta_b)} \quad \text{(since } \tr{M_R \rho M_R} = \tr{\rho M_R} \text{)} \\
& = \tr{\rho M_R(\Delta_a)M_S(\Delta_b)M_R(\Delta_a)} \quad \text{(cyclic property of the trace)} \\
& = \tr{\rho M_R(\Delta_a)M_S(\Delta_b)} \quad \text{(using } [M_R, M_S]=0 \text{ and } M_R^2=M_R \text{)} \\
& = p(R \in \Delta_a \& S \in \Delta_b | \rho) \\
& = p(A\&B|C)
\end{align}$$
Hence axiom (4) holds too. So, as long as we are dealing with joint probability distributions of commuting observables, QM is a correct probability theory. Just in case you had any doubts about it...
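To see the product rule at work in numbers, here is a minimal sketch for two qubits. Everything specific in it is an assumption made for illustration: the mixed state (70% of a Bell state plus 30% of $\ket{01}$), the choice of $M_R$ as "first qubit is 0" and $M_S$ as "second qubit is 0" (which commute because they act on different qubits), and the helper names.

```python
import math

def outer(u, v):
    """Matrix of |u><v|."""
    return [[a * b.conjugate() for b in v] for a in u]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def diag(entries):
    """Diagonal matrix with the given entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Basis order: |00>, |01>, |10>, |11>
s = 1 / math.sqrt(2)
bell = [s + 0j, 0j, 0j, s + 0j]      # (|00> + |11>)/sqrt(2)
ket01 = [0j, 1 + 0j, 0j, 0j]         # |01>

# Assumed mixed state rho = 0.7 |bell><bell| + 0.3 |01><01|
rho = [[0.7 * x + 0.3 * y for x, y in zip(row_b, row_k)]
       for row_b, row_k in zip(outer(bell, bell), outer(ket01, ket01))]

M_R = diag([1, 1, 0, 0])             # projector: first qubit is 0
M_S = diag([1, 0, 1, 0])             # projector: second qubit is 0

p_A = trace(matmul(rho, M_R)).real                   # p(A|C)

collapsed = matmul(matmul(M_R, rho), M_R)            # M_R rho M_R
norm = trace(collapsed).real
rho_prime = [[x / norm for x in row] for row in collapsed]

p_B_given_A = trace(matmul(rho_prime, M_S)).real     # p(B|A&C)
p_joint = trace(matmul(matmul(rho, M_R), M_S)).real  # p(A&B|C)

print(p_A * p_B_given_A, p_joint)
```

The two printed numbers should coincide up to rounding. Swapping in projectors that do not commute would be the way to probe where the argument stops working, but that is outside what this sketch checks.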