Tuesday, February 21, 2012

An Example Showing that the Projection Operator is Independent of the Chosen Basis of the Subspace

$\newcommand{\tr}[1]{\text{Tr}\left\{#1\right\}}$ $\newcommand{\ket}[1]{\left|#1\right\rangle}$ $\newcommand{\bra}[1]{\left\langle#1\right|}$ $\newcommand{\braket}[2]{\left\langle#1\right| \left.#2\right\rangle}$ $\newcommand{\sandwich}[3]{\left\langle#1\right|#2 \left|#3\right\rangle}$ $\newcommand{\span}[1]{\text{Span}\left\{#1\right\}}$ $\newcommand{\proj}[2]{\text{Proj}_{#1}\left(#2\right )}$
Say we want to find the projection operator onto the subspace $\mathcal{V} = xy$-plane. If one uses the simplest orthonormal basis, $B = \left\{ \vec{x}, \vec{y} \right\}$ with $\vec{x} = \begin{pmatrix}1\\0\\0\end{pmatrix}$, $\vec{y} = \begin{pmatrix}0\\1\\0\end{pmatrix}$, the projection operator, defined through $\proj{\mathcal{V}}{\vec{v}}$ $\equiv M_\mathcal{V}\vec{v}$, becomes  $$\begin{align}  M_\mathcal{V} &= \ket{x}\bra{x} + \ket{y}\bra{y} \\
&= \vec{x}\vec{x}^\top+\vec{y}\vec{y}^\top \\
&= \begin{pmatrix}1\\0\\0\end{pmatrix}\begin{pmatrix}1&0&0\end{pmatrix}+
\begin{pmatrix}0\\1\\0\end{pmatrix}\begin{pmatrix}0&1&0\end{pmatrix}\\
&= \begin{pmatrix}1&0&0\\0&0&0\\0&0&0\end{pmatrix}
+\begin{pmatrix}0&0&0\\0&1&0\\0&0&0\end{pmatrix} \\
&= \begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}
\end{align}$$
Let's pick another basis $B^\prime = \left\{ \vec{u}_1, \vec{u}_2 \right\}$, with $\vec{u}_1 = \begin{pmatrix}\cos{\theta}\\ \sin{\theta}\\0\end{pmatrix}$, $\vec{u}_2 = \begin{pmatrix}-\sin{\theta}\\ \cos{\theta}\\0\end{pmatrix}$. $$\begin{align}  M_\mathcal{V} &= \ket{u_1}\bra{ u_1 } + \ket{ u_2}\bra{ u_2} \\
&= \vec{u}_1\vec{u}_1^\top+\vec{u}_2\vec{u}_2^\top \\

&= \begin{pmatrix}\cos{\theta}\\ \sin{\theta}\\0\end{pmatrix} \begin{pmatrix}\cos{\theta}& \sin{\theta}&0\end{pmatrix} +
\begin{pmatrix}-\sin{\theta}\\ \cos{\theta}\\0\end{pmatrix} \begin{pmatrix}-\sin{\theta}& \cos{\theta}&0\end{pmatrix}\\
&= \begin{pmatrix} \cos^2{\theta} & \cos{\theta}\sin{\theta} &0\\ \cos{\theta}\sin{\theta} & \sin^2{\theta} &0\\0&0&0\end{pmatrix}
+\begin{pmatrix} \sin^2{\theta} & -\cos{\theta}\sin{\theta} &0\\ -\cos{\theta}\sin{\theta} & \cos^2{\theta} &0\\0&0&0\end{pmatrix} \\
&= \begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}
\end{align}$$
As we see, the answer is independent of $\theta$; hence we get the same projection operator for every orthonormal basis that spans the $xy$-plane.

In QM, to get probabilities we calculate the squared norm of a vector, $\| \vec{v}_\mathcal{V} \|^2$. It can be found by first constructing the projection operator, applying it to get the projected component, and then taking that component's norm; the end result is the projection operator sandwiched with the state (the expectation value of the projection), $\sandwich{\psi}{M_\mathcal{V}}{\psi}$. Or it can be found by computing the projected vector's components on an orthonormal basis of the subspace, by taking the inner product of the vector with each element of the basis.
Say $\vec{r}_\mathcal{V}$ is the projection of a vector $\vec{r}$ onto the $xy$-plane. Let us express it in a basis (corresponding to some $\theta$ value) that spans the $xy$-plane: $\vec{r}_{\mathcal{V},u_1}$ $= \braket{u_1}{r}$ $=\vec{u}_1\cdot \vec{r}$, $\vec{r}_{\mathcal{V},u_2}$ $= \braket{u_2}{r}$ $=\vec{u}_2\cdot \vec{r}$. The same relations hold for any other basis, say $\left\{\vec{w}_i\right\}$. The norm square is $$\begin{align} \| \vec{r}_\mathcal{V} \|^2 & = \braket{r_\mathcal{V}}{r_\mathcal{V}} \\
&= \| \vec{r}_{\mathcal{V},u_1} \|^2 + \| \vec{r}_{\mathcal{V},u_2} \|^2 \\
&= \left| \braket{u_1}{r} \right|^2 + \left| \braket{u_2}{r} \right|^2 \\
&= \braket{r}{u_1}\braket{u_1}{r} + \braket{r}{u_2}\braket{u_2}{r} \\
&= \bra{r} \left( \ket{u_1}\bra{u_1} + \ket{u_2}\bra{u_2} \right) \ket{r} \\
&\equiv \sandwich{r}{M_\mathcal{V}}{r}
\end{align}$$
The Pythagorean theorem applies here: for any basis, the sum of the squares of the sides equals the square of the hypotenuse, which is the squared norm we are looking for. Therefore these two methods of probability calculation are equivalent: $\left| \braket{u_1}{\psi} \right|^2 + \left| \braket{u_2}{\psi} \right|^2$ $\equiv \sandwich{\psi}{M_\mathcal{V}}{\psi}$. By Euclidean geometry this holds for subspaces of any dimension: $\sum_i^K \left| \braket{u_i}{\psi} \right|^2 = \sandwich{\psi}{M_\mathcal{V}}{\psi}$.
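(Not part of the original derivation, but here is a minimal NumPy sketch of both claims above; the angle and the state vector are arbitrary choices of mine.)

```python
# Check that M_V built from the rotated basis equals the one built from {x, y},
# and that sum_i |<u_i|psi>|^2 equals <psi| M_V |psi>.
import numpy as np

theta = 0.7                                       # any angle gives the same M_V
u1 = np.array([np.cos(theta), np.sin(theta), 0.0])
u2 = np.array([-np.sin(theta), np.cos(theta), 0.0])

M_V = np.outer(u1, u1) + np.outer(u2, u2)         # |u1><u1| + |u2><u2|
print(np.allclose(M_V, np.diag([1.0, 1.0, 0.0]))) # True: same as the {x, y} result

psi = np.array([1.0, 2.0, 3.0])                   # an arbitrary (unnormalized) vector
lhs = np.dot(u1, psi)**2 + np.dot(u2, psi)**2     # sum of squared components
rhs = psi @ M_V @ psi                             # <psi| M_V |psi>
print(np.isclose(lhs, rhs))                       # True
```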

Essential Linear Algebra for Projection Calculations

$\newcommand{\tr}[1]{\text{Tr}\left\{#1\right\}}$ $\newcommand{\ket}[1]{\left|#1\right\rangle}$ $\newcommand{\bra}[1]{\left\langle#1\right|}$ $\newcommand{\braket}[2]{\left\langle#1\right| \left.#2\right\rangle}$ $\newcommand{\sandwich}[3]{\left\langle#1\right|#2 \left|#3\right\rangle}$ $\newcommand{\span}[1]{\text{Span}\left\{#1\right\}}$ $\newcommand{\proj}[2]{\text{Proj}_{#1}\left(#2\right )}$
In QM, the probability of a measurement outcome, $p(R \in \Delta)$, is the square of the norm of the component of the state vector on the related subspace, $\| \ket{\psi_\Delta} \|^2$. The components of vectors on subspaces are found by applying a projection operator to the vector. Therefore some elementary knowledge of how these projection operators are constructed is essential. :-) It is "remember your linear algebra!" time. (In this entry I'll assume that the operator has a discrete spectrum and is finite dimensional.)
The set on the left consists of the possible outcomes of the measurement, hence the eigenvalues of the observable. The set on the right depicts the related vector space corresponding to those eigenvalues.
The set on the left, $\left\{r_n\right\}$, is the set of all $N$ eigenvalues of the hermitian operator $R$, related to the observable/dynamic variable $R$. Each dot represents an eigenvalue. (There may be more than one eigenvalue with the same value, which results in degeneracy, and they are called degenerate eigenvalues.) $\Delta$ is a subset of the whole set of eigenvalues. A measurement of $R$ will give only one of the elements of $\left\{r_n\right\}$. We are looking for the probability that the outcome will be in the set $\Delta$. The probability distribution $p(R=r)$ (the probability that the outcome will be a certain $r$) depends both on the observable and on the quantum state.

The set on the right, $\mathcal{H}$, is the $N$ dimensional Hilbert space. (A Hilbert space is an abstract vector space on which an inner product is defined to measure the lengths of vectors.) The state of a quantum system is described by a vector $\ket{\psi}$ that lives in $\mathcal{H}$. For each eigenvalue $r_n$ in the set $\left\{r_n\right\}$ there is a corresponding vector $\ket{r_n}$ in $\mathcal{H}$ which satisfies the relation $R\ket{r_n} = r_n \ket{r_n}$. That vector is called the eigenvector of $R$ (corresponding to the eigenvalue $r_n$).

Linear algebra tells us that the eigenvectors of an $N\times N$ hermitian operator $R$, $\left\{ \ket{r_n} \right\}$, form a complete set that spans the whole $N$-dim Hilbert space: $\span{ \left\{ \ket{r_n} \right\}} = \mathcal{H}$. As we discussed previously, if $R$ is non-degenerate (all eigenvalues are distinct), then $\left\{ \ket{r_n} \right\}$ is not only a complete set but a complete orthonormal set (CONS) whose elements satisfy the relation $\braket{r_m}{r_n}=\delta_{mn}$. If $R$ is degenerate, then the eigenvectors in $\left\{ \ket{r_n} \right\}$ are only guaranteed to be linearly independent (LI) (assuming $R$ is not a pathological case), but one can always construct a CONS from a LI set (remember the Gram-Schmidt process). Any vector $\ket{\psi}$ in $\mathcal{H}$ can be written as a linear combination of the elements of $\left\{ \ket{r_n} \right\}$ or of its orthonormalized version $\left\{ \ket{u_n} \right\}$.

The set of eigenvectors $\left\{\ket{ r_\Delta }\right\}$ related to the subset of eigenvalues that we will call $\Delta$ cannot span the whole $\mathcal{H}$, but only a subspace of it, which we will call $\mathcal{V}$: $\span{\ket{ r_\Delta }} = \mathcal{V}$. If $\Delta$ has $M$ elements then so does $\left\{\ket{ r_\Delta }\right\}$, and $\mathcal{V}$ is an $M$-dimensional subspace of $\mathcal{H}$. We will call the rest of the Hilbert space $\mathcal{V}^\perp$, the orthogonal complement of $\mathcal{V}$: $\mathcal{H} = \mathcal{V}\oplus\mathcal{V}^\perp$ and $\mathcal{V}\cap\mathcal{V}^\perp = \left\{0\right\}$.

Any vector $\ket{\psi}$ can be written as a sum of its two components, one belonging to $\mathcal{V}$ and the other to $\mathcal{V}^\perp$: $\ket{\psi} = \ket{\psi_\mathcal{V}} + \ket{\psi_{\mathcal{V}^\perp}}$, with $\ket{\psi_\mathcal{V}} \in \mathcal{V}$ and $\ket{\psi_{\mathcal{V}^\perp}} \in \mathcal{V}^\perp$. (Or we could call them $\ket{\psi_\Delta}$ and $\ket{\psi_{\Delta^\perp}}$.)

Our aim is to construct the projection operator $M_R(\Delta)$ which will give $\ket{\psi_\mathcal{V}}$ when applied to any $\ket{\psi}$, for a chosen operator $R$ and range $\Delta$.

Constructing the Projection Operator onto a subspace using a LI set of vectors that span that subspace

Simple Case: Projection onto a line in $\mathbb{R}^2$


First let me demonstrate this by working in $\mathbb{R}^2$ and projecting onto a line $L$. Say our subspace is the $1D$ line $L$, which is spanned by a vector $\vec{r}$: $L=$ $\left\{c\vec{r} | c \in \mathbb{R} \right\}$ $=\span{\vec{r}}$. $\proj{L}{\vec{v}}\equiv \vec{v}_L$ and $\vec{v}$ $- \proj{L}{\vec{v}}$ $=\vec{v}_{L^\perp}$.

We can express the projection mathematically using this relation: $\vec{v} - \proj{L}{\vec{v}}$ $\perp L$. This means that the projection of $\vec{v}$ onto $L$ is a vector $\proj{L}{\vec{v}}$ on $L$ whose difference from $\vec{v}$ is perpendicular to all vectors on $L$. Any vector on $L$ can be described by a real multiple of $\vec{r}$; hence $\vec{v}_L = c\vec{r}$. To find the projection, we have to find this $c$.

The inner product of two perpendicular vectors is $0$: $\vec{a} \cdot \vec{b} = 0$ if $\vec{a} \perp \vec{b}$. Using the perpendicularity between $\vec{v} - \vec{v}_L = \vec{v}_{L^\perp}$ and $L$ we can write $$\begin{align}
\left( \vec{v} - c\vec{r} \right) \cdot \vec{r} & = 0 \\
\vec{v}\cdot \vec{r}- c\vec{r}\cdot \vec{r} & = 0 \\
\Rightarrow c & = \frac{\vec{v}\cdot \vec{r}}{\vec{r}\cdot \vec{r}} \\
\Rightarrow \proj{L}{\vec{v}} & = \frac{\vec{v}\cdot \vec{r}}{\vec{r}\cdot \vec{r}} \vec{r} \\
\proj{\span{\vec{r}}}{\vec{v}} & = \vec{r}\frac{\vec{r} \cdot \vec{v}}{\vec{r}\cdot \vec{r}}
\end{align}$$ $\vec{r}$ is not unique, it can be any vector on $L$. Picking a unit vector will simplify the calculations. Let $\vec{u} = \frac{\vec{r}}{\| \vec{r} \|}$, projection becomes $\proj{\span{\vec{u}}}{\vec{v}} = \vec{u}\left(\vec{u}\cdot\vec{v}\right)$

(Projection is a linear operation)
Let me show that this operation is linear.

  1. $\proj{L}{\vec{a}+\vec{b}}$ = $\vec{u}\left(\vec{u}\cdot\left(\vec{a}+\vec{b}\right)\right)$ $=\vec{u}\left(\vec{u}\cdot \vec{a}+\vec{u}\cdot \vec{b}\right)$ $=\vec{u}\left(\vec{u}\cdot \vec{a}\right)+\vec{u}\left(\vec{u}\cdot \vec{b}\right)$ $=\proj{L}{\vec{a}}+\proj{L}{\vec{b}}$. 
  2. $\proj{L}{c\vec{a}}$ $= \vec{u}\left(\vec{u}\cdot c \vec{a}\right)$ $=c \vec{u}\left(\vec{u}\cdot \vec{a}\right)$ $=c \proj{L}{\vec{a}}$.
Hence $\proj{L}{\vec{a}}$ is a linear operation and therefore it can be represented by a matrix: $\proj{L}{\vec{a}}$ $\equiv M\vec{a}$. Let's find the matrix elements of $M$ by sandwiching it between unit basis vectors, $M_{mn} = \sandwich{m}{M}{n}$, where $\ket{1} = \begin{pmatrix}1\\0\end{pmatrix}$ and $\ket{2} = \begin{pmatrix}0\\1\end{pmatrix}$.
$$\begin{align}M_{mn} &= \sandwich{m}{M}{n} \\
&= \bra{m} \proj{L}{\ket{n}} \\
&= \vec{m} \cdot \left( \vec{u}\left( \vec{u}\cdot \vec{n} \right) \right) \\
&= \vec{m}\cdot \left( \vec{u}u_n \right) = u_n \vec{m}\cdot \vec{u}\\
&= u_m u_n,\quad \text{OR} \\
&= \bra{m} \left(\ket{u}\braket{u}{n}\right) \\
&= \braket{m}{u}u_n \\
&= u_m u_n\end{align}$$
Hence $M_L = \begin{pmatrix} u_x^2 & u_x u_y \\ u_x u_y & u_y^2\end{pmatrix}$ $=\ket{u}\bra{u}$.
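A quick numerical sanity check of this $1D$ case (the vectors here are just numbers I picked):

```python
# Verify that the matrix |u><u| reproduces Proj_L(v) = u (u . v) and that the
# remainder v - Proj_L(v) is perpendicular to the line spanned by r.
import numpy as np

r = np.array([3.0, 4.0])
u = r / np.linalg.norm(r)                        # unit vector spanning L
v = np.array([2.0, -1.0])

M_L = np.outer(u, u)                             # |u><u|
print(np.allclose(M_L @ v, u * np.dot(u, v)))    # True
print(np.isclose(np.dot(v - M_L @ v, r), 0.0))   # True: remainder is perpendicular to L
```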

General Case: Projection onto a $K$-dimensional subspace of $\mathbb{R}^N$

In the general case $\mathcal{H}$ is $N$-dimensional, and the subspace $\mathcal{V}$ is $K$-dimensional. Say $\mathcal{V}$ is spanned by a set of $K$ LI vectors $\left\{ \vec{r}_K \right\}$. Then any vector in $\mathcal{V}$ can be expressed as a linear combination of the $\vec{r}_i$'s: $\vec{x} \in \mathcal{V}$ $\Rightarrow \vec{x} = \sum_i^K c_i \vec{r_i}$.

If we think of these $c_i$'s as components of a vector $\vec{c}$, then this relation between $\vec{x}$, the $c_i$'s and $\left\{ \vec{r}_K \right\}$ can be written as a matrix multiplication. Define an $N\times K$ dimensional matrix $A$ whose columns are the spanning vectors: $$A=
\begin{pmatrix}
\uparrow & \uparrow & & \uparrow\\
\vec{r}_1 & \vec{r}_2 & \ldots & \vec{r}_K \\
\downarrow & \downarrow & & \downarrow
\end{pmatrix}$$
Then, for each $\vec{x} \in \mathcal{V}$, $\vec{x} = A\vec{c}$ where $\vec{c}$ is unique to the chosen $\vec{x}$. The projection of a vector $\vec{v} \in \mathcal{H}$ onto $\mathcal{V}$, called $\vec{v}_\mathcal{V}$, is a vector in $\mathcal{V}$. Hence it can be expressed in the same matrix multiplication form: $\vec{v} =$ $\vec{v}_\mathcal{V}$ $+ \vec{v}_\mathcal{V^\perp}$, or $\vec{v} =$ $\proj{\mathcal{V}}{\vec{v}}$ $+\proj{\mathcal{V^\perp}}{\vec{v}}$, with $\proj{\mathcal{V}}{\vec{v}}$ $=A \vec{c}$.

According to this definition of $A$, $\mathcal{V}$ is the "column space" of $A$: $\mathcal{V}$ $=\text{C}(A)$. The column space of a matrix is the subspace spanned by its column vectors. Some deep and mysterious relations of linear algebra tell us that the complementary subspace of $\mathcal{V}$, which is $\mathcal{V}^\perp$, is the left null space of $A$, or the null space of $A$ transpose, $A^\top$: $\mathcal{V}^\perp$ $=\text{C}(A)^\perp$ $=\text{N}(A^\top)$. Therefore, $\proj{\mathcal{V^\perp}}{\vec{v}}$ $\in \text{N}(A^\top)$.

If a vector belongs to the null-space of a matrix, it means that, when the matrix is applied to that vector the result is the null-vector. $$\begin{align}
A^\top \vec{v}_{\mathcal{V}^\perp} &= \vec{0} \\
A^\top \left( \vec{v} - \proj{\mathcal{V}}{\vec{v}} \right) &= \vec{0} \\
A^\top \left( \vec{v} - A\vec{c} \right) &= \vec{0} \\
A^\top \vec{v} - A^\top A\vec{c} &= \vec{0} \\
\Rightarrow A^\top \vec{v} &= A^\top A\vec{c}
\end{align}$$
Another linear algebra proverb says that if the columns of a matrix $A$ are linearly independent, then $A^\top A$ (which is a $(K\times{}N)(N\times{}K)=K\times{}K$ dimensional square matrix) is invertible. Therefore $\vec{c}=\left(A^\top A\right)^{-1}A^\top \vec{v}$. This is how one calculates the $\left\{ c_i \right\}$ coefficients of the linear combination (in the form of $\vec{c}$) that expands the projection vector $\vec{v}_\mathcal{V}$ in this LI basis of $\text{C}(A)$. Remember $\vec{v}_\mathcal{V}$ $ = A\vec{c}$. Hence: $$\boxed{\proj{\mathcal{V}}{\vec{v}} = A\left(A^\top A\right)^{-1}A^\top \vec{v}}$$
$A\left(A^\top A\right)^{-1}A^\top$ is the projection matrix $M_\mathcal{V}\equiv M_R(\Delta)$ that we are looking for. It depends only on the subspace, not on the basis we chose to span that subspace. Different bases will give different $A$ matrices, but $M_\mathcal{V}$ will be the same for all of them. In QM, the eigenvectors of the observables, the hermitian matrices, are orthogonal (degenerate eigenvectors can be orthogonalized), so it is worth looking at the orthonormal basis case. And in general, from any linearly independent set of vectors one can create an orthonormal basis with the Gram-Schmidt process.
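Before specializing to orthonormal bases, here is a hedged NumPy sketch of the boxed formula; the dimensions and the random vectors are my own choices.

```python
# Two different LI bases of the same K-dimensional subspace of R^N give
# different A matrices but the same projection matrix A (A^T A)^{-1} A^T.
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 2
A1 = rng.standard_normal((N, K))        # K linearly independent columns
A2 = A1 @ rng.standard_normal((K, K))   # another basis of C(A1): a generic K x K
                                        # mixing matrix is invertible with probability 1

def proj_matrix(A):
    return A @ np.linalg.inv(A.T @ A) @ A.T

M1, M2 = proj_matrix(A1), proj_matrix(A2)
print(np.allclose(M1, M2))              # True: basis independent
print(np.allclose(M1 @ M1, M1))         # M^2 = M, as a projector should satisfy
```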

Assume the orthonormal basis $\left\{\vec{u}_K\right\}$ spans the $K$-dimensional subspace $\mathcal{V}$. This time $A^\top A$ will be the identity $I$. $$\begin{align}A^\top{}A &=
\begin{pmatrix}
\leftarrow & \vec{u}_1 & \rightarrow \\
\leftarrow & \vec{u}_2 & \rightarrow \\
\leftarrow & \vdots & \rightarrow \\
\leftarrow & \vec{u}_K & \rightarrow \\
\end{pmatrix}
\begin{pmatrix}
\uparrow & \uparrow & & \uparrow\\
\vec{u}_1 & \vec{u}_2 & \ldots & \vec{u}_K \\
\downarrow & \downarrow & & \downarrow
\end{pmatrix} \\
&=
\begin{pmatrix}
\vec{u}_1 \cdot{} \vec{u}_1 & \vec{u}_1 \cdot{} \vec{u}_2 & \cdots & \vec{u}_1 \cdot{} \vec{u}_K \\
\vec{u}_2 \cdot{} \vec{u}_1 & \vec{u}_2 \cdot{} \vec{u}_2 & \cdots & \vec{u}_2 \cdot{} \vec{u}_K \\
\vdots & \vdots & & \vdots \\
\vec{u}_K \cdot{} \vec{u}_1 & \vec{u}_K \cdot{} \vec{u}_2 & \cdots & \vec{u}_K \cdot{} \vec{u}_K \\
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\end{pmatrix}
\end{align}
$$

Hence the expression for the projection operator reduces to $$\proj{\mathcal{V}}{\vec{v}} = A A^\top \vec{v}$$ Let us explicitly calculate its matrix elements. Let $u_{m,n}$ be the $n$th component of the $\vec{u}_m$. $$\begin{align}A A^\top{} &=
\begin{pmatrix}
\uparrow & \uparrow & & \uparrow\\
\vec{u}_1 & \vec{u}_2 & \ldots & \vec{u}_K \\
\downarrow & \downarrow & & \downarrow
\end{pmatrix}\begin{pmatrix}
\leftarrow & \vec{u}_1 & \rightarrow \\
\leftarrow & \vec{u}_2 & \rightarrow \\
\leftarrow & \vdots & \rightarrow \\
\leftarrow & \vec{u}_K & \rightarrow \\
\end{pmatrix} \\
&=

\begin{pmatrix}
u_{1,1} & u_{2,1} & \cdots & u_{K,1} \\
u_{1,2} & u_{2,2} & \cdots & u_{K,2} \\
\vdots & \vdots & & \vdots \\
u_{1,N} & u_{2,N} & \cdots & u_{K,N} \\
\end{pmatrix}
\begin{pmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,N} \\
u_{2,1} & u_{2,2} & \cdots & u_{2,N} \\
\vdots & \vdots & & \vdots \\
u_{K,1} & u_{K,2} & \cdots & u_{K,N} \\
\end{pmatrix} \\
&=
\begin{pmatrix}
\sum_i^K u_{i,1} u_{i,1} & \sum_i^K u_{i,1} u_{i,2} & \cdots & \sum_i^K u_{i,1} u_{i,N} \\
\sum_i^K u_{i,2} u_{i,1} & \sum_i^K u_{i,2} u_{i,2} & \cdots & \sum_i^K u_{i,2} u_{i,N} \\
\vdots & \vdots & & \vdots \\
\sum_i^K u_{i,N} u_{i,1} & \sum_i^K u_{i,N} u_{i,2} & \cdots & \sum_i^K u_{i,N} u_{i,N} \\
\end{pmatrix} \\
&=
\sum_i^K
\begin{pmatrix}
u_{i,1} u_{i,1} & u_{i,1} u_{i,2} & \cdots & u_{i,1} u_{i,N} \\
u_{i,2} u_{i,1} & u_{i,2} u_{i,2} & \cdots & u_{i,2} u_{i,N} \\
\vdots & \vdots & & \vdots \\
u_{i,N} u_{i,1} & u_{i,N} u_{i,2} & \cdots & u_{i,N} u_{i,N} \\
\end{pmatrix} \\
&= \sum_i^K \ket{u_i}\bra{u_i}\end{align} $$
If the individual terms of the sum are thought of as projections onto the lines spanned by the $\vec{u_i}$'s, then the projection onto the subspace is the sum of the projections onto orthogonal lines. This seems plausible because each term gives one component of the projection vector in the $\left\{ \vec{u_K} \right\}$ basis. Hence, using an orthonormal basis to span $\mathcal{V}$, we get $$M_\mathcal{V} = A A^\top = \sum_i^K \ket{u_i}\bra{u_i} = \sum_i^K M_{L_i}$$

Note that for a full orthonormal basis of $\mathcal{H}$, $M_\mathcal{H} = \sum_i^N \ket{u_i}\bra{u_i} = I$.
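A small numerical check (my own random example) of the orthonormal-basis result and of the completeness remark:

```python
# For K orthonormal columns, A A^T equals the sum of the outer products
# |u_i><u_i|; using all N orthonormal vectors gives the identity.
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 2
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal basis of R^N
A = Q[:, :K]                                       # first K columns span V

M_V = sum(np.outer(A[:, i], A[:, i]) for i in range(K))
print(np.allclose(M_V, A @ A.T))                   # True
print(np.allclose(Q @ Q.T, np.eye(N)))             # completeness over the full basis
```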

[1] The whole idea of the $A$ matrix and the calculations of the projection operators is stolen from Khan Academy's online linear algebra lectures.

Thursday, February 16, 2012

Proof that Probability Calculations in Quantum Mechanics Obey the Axioms of Probability Theory

$\newcommand{\tr}[1]{\text{Tr}\left\{#1\right\}}$ $\newcommand{\ket}[1]{\left|#1\right\rangle}$ $\newcommand{\bra}[1]{\left\langle#1\right|}$ $\newcommand{\braket}[2]{\left\langle#1\right| \left.#2\right\rangle}$ $\newcommand{\sandwich}[3]{\left\langle#1\right|#2 \left|#3\right\rangle}$
Last time I talked about the axioms of probability theory. They can be used to derive all other concepts and theorems of probability theory. Or one can use them to verify whether a given theory is a correct probabilistic theory.

Now, let me show that the probability calculations in QM obey axioms of probability. Again, I'm repeating what is written in chapters 1.5 (Probability Theory), 2.4 (Probability Distributions - Verification of Probability Axioms) and 9.6 (Joint and Conditional Probabilities) of Ballentine's book.

Dynamical variables are represented by hermitian operators. They have real eigenvalues, and those values are the possible outcomes of a measurement of that dynamical variable (or "observable"; this fancy word is used to distinguish the classical quantities from their corresponding quantum versions).

Say $\left\{r_n\right\}$ is the set of all eigenvalues (they may be degenerate). $\Delta$ is a range in the eigenvalue space; it is a subset of $\left\{r_n\right\}$. In quantum mechanics we talk about the probability that the outcome of a measurement of an observable will be in the range $\Delta$. Ballentine's notation is this: $p(R \in \Delta | \rho)$. It means "the probability that the outcome of an $R$ measurement is in the range $\Delta$, if the system is in the state $\rho$".

The calculation of this probability is done using projection operators. First one has to find the subspace of the whole Hilbert space $\mathcal{H}$ in which all the eigenstates corresponding to the eigenvalues in the range $\Delta$ live. Ballentine's notation for this projection operator is $M_R(\Delta)$.

For the simplest case, pick a single non-degenerate eigenvalue, $r_n$, as the range, and a pure state as the system's state, $\ket{\psi}$. The corresponding eigenvector is $\ket{r_n}$ and the projection operator onto it is $ \ket{r_n}\bra{r_n}=M_R(r_n)$.

Quantum mechanics' postulate on probability (for pure states) is that the probability equals the absolute square of the component of the state vector that belongs to the eigensubspace related to $\Delta$, which is the square of the norm of the projection of the state onto that subspace. For the most general case: $$ p(R \in \Delta | \rho) = \tr{ \rho M_R(\Delta) }$$
In our simplest case, which is shown in most introductory texts, $p(R = r_n | \psi)$ $ = |\braket{r_n}{\psi}|^2$ $= \braket{\psi}{r_n}\braket{r_n}{\psi} $ $=\sandwich{\psi}{M_R(r_n)}{\psi}$ $=\sandwich{\psi}{M_R(r_n)M_R(r_n)}{\psi}$ (from the definition of projection operators, $M^2=M$) $=\left(\bra{\psi}M_R(r_n)\right) \left( M_R(r_n)\ket{\psi} \right)$ $=\braket{\psi_{r_n}}{\psi_{r_n}}$ $=\| \ket{\psi_{r_n}} \|^2$, where $ \ket{\psi_{r_n}}$ is the projection onto the eigensubspace, which is $c_n \ket{r_n}$. (The longest way of showing the relation.) Hence the squared norm is $|c_n|^2$, where $\left\{ c_n \right\}$ are the coefficients when $\ket{\psi}$ is expanded in the eigenstates of $R$.
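Here is a minimal NumPy sketch of that chain of equalities for a random Hermitian "observable" and a random pure state (everything in it is invented for the check, not taken from Ballentine):

```python
# p(R = r_n | psi) computed three ways: |<r_n|psi>|^2, <psi|M|psi>, Tr{rho M}.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
R = (X + X.conj().T) / 2                     # a Hermitian operator
_, evecs = np.linalg.eigh(R)
r_n = evecs[:, 0]                            # eigenvector |r_n>

psi = rng.standard_normal(3) + 1j * rng.standard_normal(3)
psi /= np.linalg.norm(psi)                   # normalized pure state

M = np.outer(r_n, r_n.conj())                # projector |r_n><r_n|
rho = np.outer(psi, psi.conj())              # pure-state density matrix

p1 = abs(np.vdot(r_n, psi))**2               # |<r_n|psi>|^2
p2 = np.vdot(psi, M @ psi).real              # <psi|M|psi>
p3 = np.trace(rho @ M).real                  # Tr{rho M}
print(np.isclose(p1, p2), np.isclose(p2, p3))   # True True
```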

Axiom 1
$$0 \leq p(A|B) \leq 1$$
Here our $p(A|B)$ is $p(R \in \Delta | \rho)$. $A$ is the event that the outcome is in the range $\Delta$. $B$ is the event that our system is prepared in the state $\rho$.

The operator that represents a quantum state must satisfy these conditions: $$\tr{\rho}=1 \tag{1}$$ and $$\sandwich{u}{\rho}{u} \geq 0 \tag{2}$$
If we expect that the outcome will be any one of the eigenvalues, then the related projection operator covers the whole Hilbert space, hence it is the identity operator: $M_R(\Delta_\text{all}) = \sum_{r_n \in \left\{r_n\right\}} \ket{r_n}\bra{r_n} = I$. Hence the probability that we will get some eigenvalue of $R$ is $p(R \in \Delta_\text{all} | \rho)$ $=\tr{\rho M_R(\Delta_\text{all})}$ $=\tr{\rho I} = 1$ (according to (1)), which is the maximum probability.

For any other range, the projection does not cover the whole space: $M_R(\Delta) = \sum_{r_n \in \Delta}\ket{r_n}\bra{r_n}$ $\neq I$. Therefore $\tr{\rho M_R(\Delta)}$ $=\tr{\rho   \sum_{r_n \in \Delta}\ket{r_n}\bra{r_n}}$ $= \sum_{r_n \in \Delta} \tr{\rho \ket{r_n}\bra{r_n}}$ $= \sum_{r_n \in \Delta} \sandwich{r_n}{\rho}{r_n}$. According to (2) all terms in this summation are non-negative, so the sum either increases or stays the same as we add more terms (as we enlarge the subspace). Covering the whole Hilbert space gives $1$, hence covering any subspace gives a number between $0$ and $1$, and axiom (1) holds.
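A numerical illustration of this argument (the observable and the density matrix below are random choices of mine):

```python
# Tr{rho M_R(Delta)} stays in [0, 1] as Delta grows, and reaches 1 when Delta
# covers all eigenvalues.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))
R = (X + X.T) / 2                              # a Hermitian observable
_, U = np.linalg.eigh(R)                       # columns are the |r_n>

w = rng.random(4); w /= w.sum()                # eigenvalues of a density matrix
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
rho = Q @ np.diag(w) @ Q.T                     # Tr{rho} = 1, rho >= 0

for Delta in ([0], [0, 2], [0, 1, 2, 3]):      # growing sets of eigenvalue indices
    M = sum(np.outer(U[:, n], U[:, n]) for n in Delta)
    print(Delta, round(np.trace(rho @ M), 6))  # non-decreasing; the last one is 1.0
```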

Axiom 2
$$p(A|A)=1$$
Here, $A$ is both the state of the system and the state after the measurement. We are measuring an observable whose eigenvalue-$1$ eigenvector is the state itself. In pure state notation, the state is $\ket{\psi}$, and $M_R(\Delta) = \ket{\psi}\bra{\psi}$. And $\sandwich{\psi}{M_R(\Delta)}{\psi}$ $=\braket{\psi}{\psi}\braket{\psi}{\psi} = 1$. This is not surprising because it is a bit tautological. What we are saying is this: "If the system is in the state $\ket{\psi}$ then the probability of finding it in the state $\ket{\psi}$ is $1$."

In mixed state notation, the projection onto the state has this relation: $\rho = M_R(\Delta)\rho M_R(\Delta)$. The probability becomes $\tr{\rho M_R(\Delta)}$ $=\tr{\rho M_R(\Delta) M_R(\Delta)}$ $=\tr{M_R(\Delta) \rho M_R(\Delta)}$ $=\tr{\rho} = 1$. Hence axiom (2) holds.

Axiom 3
For axiom 3, Ballentine follows a different route. Instead of using our axiom 3, which was $p(\sim A|B) = 1 - p(A|B)$, he uses an alternative axiom, the "addition of probabilities for exclusive events", which applies to mutually exclusive events: $$p(A \vee B|C) = p(A|C) + p(B|C) \tag{3b}$$ One can either start with (3) and derive (3b), or vice versa. Hence proving (3b) is equivalent to proving (3).

Pick two ranges, $\Delta_1$ and $\Delta_2$ with no intersection. Therefore $R \in \Delta_1$ and $R \in \Delta_2$ are mutually exclusive events, they can't happen at the same time. Because the ranges are disjoint, the vectors belonging to one range are perpendicular to the vectors belonging to the other range. Hence successive application of their projection operators will give zero: $M_R(\Delta_1) M_R(\Delta_2) = 0$. And the projection on the union of the ranges is the sum of the projections: $M_R(\Delta_1 \cup \Delta_2)$ $ = M_R(\Delta_1) + M_R(\Delta_2)$. $$\begin{align}
p(R \in \Delta_1 \vee R \in \Delta_2 | \rho) & =\tr{\rho M_R(\Delta_1 \cup \Delta_2)} \\
&=\tr{\rho M_R(\Delta_1)}+\tr{\rho M_R(\Delta_2)} \\
& =p(R \in \Delta_1 | \rho)+p(R \in \Delta_2 | \rho)
\end{align}
$$ Hence axiom (3) holds.

Axiom 4
After this point, I think, we have no doubt that axiom (4) will also hold for QM. But, anyways, let us do this. $$p(A\&B|C) = p(A|C)p(B|A\&C)$$

Here $C$ is the quantum state, $\rho$ or $\ket{\psi}$. $A$ is the event of getting an eigenvalue in the range $\Delta_a$ when $R$ is measured, and similarly $B = S \in \Delta_b$ of the $S$ measurement. $A\&B$ means that simultaneously $R$ has a value in $\Delta_a$ and $S$ has a value in $\Delta_b$.

First let me show this for the pure state case with the Kolmogorov's version of the axiom, which was: $p(A\&B)=p(B|A)p(A)$.

$p(A) = p(R=r) = \sandwich{\psi}{M_R(r)}{\psi}$ $=\bra{\psi}\left( \ket{r}\bra{r} \right) \ket{\psi}$ $=|\braket{r}{\psi}|^2$.

$p(B|A)$ can be thought of as the result of two successive measurements. First the event $A$ happened; then what is the probability that $B$ happens? QM tells us that if the eigenvalue $r$ is measured, then the state becomes (the wavefunction collapses to) $\ket{\psi} \rightarrow \ket{r}$. Or a more general expression is $$\ket{\psi} \rightarrow \frac{M_R(r) \ket{\psi}}{\sqrt{\sandwich{\psi}{M_R(r)}{\psi}}} = \frac{M_R(r) \ket{\psi}}{\sqrt{p(R=r)}} \equiv \ket{\psi^\prime}$$ Hence for $p(B|A)$ the condition $A$ is now encoded in the new state $\ket{\psi^\prime}$, and to do the probability calculation one has to sandwich $M_S$ with this new state. $$\begin{align}
p(B|A) &= p(S=s|\ket{\psi^\prime}) \\
& = \sandwich{\psi^\prime}{M_S}{\psi^\prime} \\
& = \frac{\bra{\psi}M_R(r)}{\sqrt{p(R=r)}} M_S(s) \frac{M_R(r)\ket{\psi}}{\sqrt{p(R=r)}} \\
& = \frac{\sandwich{\psi}{M_R(r)M_S(s)M_R(r)}{\psi}}{p(R=r)} \\
& = \frac{\sandwich{\psi}{\left(M_S M_R + \left[M_R, M_S\right] \right)M_R}{\psi}}{p(R=r)} \\
& = \frac{\sandwich{\psi}{M_S M_R}{\psi} + \sandwich{\psi}{[M_R, M_S]M_R}{\psi}}{p(R=r)} \\
& = \frac{\sandwich{\psi}{M_S M_R}{\psi}}{p(R=r)}, \quad \text{if } [M_R, M_S]=0
\end{align}$$

When the operators $R$ and $S$ commute, they have simultaneous eigenvectors, hence $M_R$ and $M_S$ also commute. $p(R=r)$ was $p(A)$. Hence $p(B|A)p(A)$ $= \sandwich{\psi}{M_S M_R}{\psi}$ which has to be $p(A\&B)$, and fortunately it is.

To find $p(A\&B) = p(R=r \& S=s)$ one needs the projection operator onto the eigensubspace $\epsilon_{rs}$ where $R$ has the eigenvalue $r$ and $S$ has the eigenvalue $s$. $\epsilon_{rs}$ is the intersection of $\epsilon_{r}$ and $\epsilon_{s}$. One can project a vector onto $\epsilon_{rs}$ by successively applying the $M_R$ and $M_S$ projection operators. The order is not important when $[R,S]=0$, because then they have simultaneous eigenvectors. Those eigenvectors can be used to create an orthonormal basis, and projection operators constructed from that basis will commute too. And the joint probability is defined only when the operators commute.

Therefore, the component of $\ket{\psi}$ on $\epsilon_{rs}$ is $\ket{\psi_{rs}} = M_R M_S \ket{\psi}$. Its norm square is the probability: $\left\| \ket{\psi_{rs}} \right\|^2$ $= \braket{\psi_{rs}}{\psi_{rs}}$ $= \bra{\psi} M_S M_R M_R M_S \ket{\psi}$. Using their commutation it becomes $p(R=r \& S=s)$ $= \bra{\psi} M_S M_R \ket{\psi}$. And this quantity is the numerator of the conditional probability result. Hence axiom (4) is also satisfied by quantum mechanics' postulates of probability calculations.
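As a sanity check on the pure-state argument, here is a hedged NumPy sketch with two commuting projectors built from a shared eigenbasis (all matrices are my own construction):

```python
# Verify p(A)p(B|A) = <psi|M_S M_R|psi> for commuting projectors and the
# collapse rule psi -> M_R psi / sqrt(p(A)).
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # shared eigenvectors
P = [np.outer(Q[:, n], Q[:, n]) for n in range(4)]  # rank-1 eigenprojectors

M_R = P[0] + P[1]                                   # "R has a value in Delta_a"
M_S = P[1] + P[2]                                   # "S has a value in Delta_b"
print(np.allclose(M_R @ M_S, M_S @ M_R))            # they commute

psi = rng.standard_normal(4); psi /= np.linalg.norm(psi)

pA = psi @ M_R @ psi                                # <psi|M_R|psi>
psi_prime = M_R @ psi / np.sqrt(pA)                 # collapsed state after A
pB_given_A = psi_prime @ M_S @ psi_prime
pAB = psi @ M_S @ M_R @ psi                         # <psi|M_S M_R|psi>
print(np.isclose(pA * pB_given_A, pAB))             # True
```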

Using a mixed state and Cox's version, the calculation becomes:

$p(A|C) = p(R \in \Delta_a | \rho)$ $= \tr{\rho M_R(\Delta_a)}$.

$p(A\&B|C)$ $= p(R \in \Delta_a \& S \in \Delta_b | \rho)$ $= \tr{\rho M_R(\Delta_a) M_S(\Delta_b)}$.

 $p(B|A\&C)$ $=p(S \in \Delta_b | R \in \Delta_a \& \rho)$. Here, $R \in \Delta_a \& \rho$ can be thought of as a new state $\rho \rightarrow \rho^\prime$ which we got after the $R$ measurement: $p=p(S \in \Delta_b | \rho^\prime)$. QM postulates that $\rho^\prime$ $= \frac{M_R(\Delta_a) \rho M_R(\Delta_a)}{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)}}$. Therefore, $p=\tr{\rho^\prime M_S(\Delta_b)}$. $$\begin{align}
p(A|C)p(B|A\&C) & = p(R \in \Delta_a | \rho) p(S \in \Delta_b | R \in \Delta_a \& \rho) \\
& = \tr{\rho M_R(\Delta_a)} \tr{\rho^\prime M_S(\Delta_b)} \\
& = \tr{\rho M_R(\Delta_a)} \tr{ \frac{M_R(\Delta_a) \rho M_R(\Delta_a)}{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)}}  M_S(\Delta_b)} \\
& = \tr{\rho M_R(\Delta_a)} \frac{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)M_S(\Delta_b)}}{\tr{M_R(\Delta_a) \rho M_R(\Delta_a)}} \\
& = \tr{M_R(\Delta_a) \rho M_R(\Delta_a)M_S(\Delta_b)} \\
& = \tr{\rho M_R(\Delta_a)M_S(\Delta_b)} \\
& = p(R \in \Delta_a \& S \in \Delta_b | \rho) \\
& = p(A\&B|C)
\end{align}$$

Hence axiom (4) holds too. So, as long as we are dealing with joint probability distributions of commuting observables, QM is a correct probability theory. Just in case you had any doubts about it...
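And a matching mixed-state check (again with matrices invented purely for the test), using the state-update rule $\rho \rightarrow M\rho M/\tr{M\rho M}$ quoted above:

```python
# Verify Tr{rho M_R} * Tr{rho' M_S} = Tr{rho M_R M_S} for commuting projectors.
import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
P = [np.outer(Q[:, n], Q[:, n]) for n in range(4)]
M_R, M_S = P[0] + P[1], P[1] + P[2]                 # commuting projectors

w = rng.random(4); w /= w.sum()
rho = Q @ np.diag(w) @ Q.T                          # a density matrix

pA = np.trace(rho @ M_R)
rho_prime = M_R @ rho @ M_R / np.trace(M_R @ rho @ M_R)
pB_given_A = np.trace(rho_prime @ M_S)
print(np.isclose(pA * pB_given_A, np.trace(rho @ M_R @ M_S)))   # True
```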

Axioms of Probability Theory and Some Definitions and Theorems

I was thinking about non-trivial probabilities in quantum mechanics, such as, the probability of getting a degenerate eigenvalue of an observable, or the joint probabilities of commuting observables, or when to use expectation values of projection operators etc.

Like all thinking processes, I started by asking the wrong questions. Then, by reading about different concepts of probability theory from random sources, a picture started to emerge in my mind. Thanks to my tendency to go deeper, I started by pondering why the joint probability is calculated by the formula $p(A=a,B=b)$ $=\left\langle\psi\right|P(A=a)P(B=b)\left|\psi\right\rangle$ and found myself going back to learning the axioms of probability theory.

Axioms of Probability Theory

The axioms of probability theory are a set of mathematical definitions and identities from which one can derive all of the relations that are related to probabilities. These mathematical objects can be used wherever the context is appropriate for probabilistic ideas.

The axioms are not unique; one can derive the same theorems by starting from different compatible sets of axioms. I read two versions. The first one is Kolmogorov's axioms, which I came across while reading the Conditional probability article on Wikipedia, where I saw the idea of drawing Venn diagrams to visualize the probability relations. The second one is chapter 1.5 of Ballentine's book, where he bases the form of the axioms on R. T. Cox's work The Algebra of Probable Inference.

We start with the empty concepts "event" and "probability", and we'll give them specific meaning according to the interpretation of probability we use. (Just as, while writing the axioms of geometry, we use the concept of "point" without defining it, but we have an intuition about its role.)

As far as I understand, the basic difference between the perspectives of Kolmogorov and Cox is that Cox is always talking about conditional probabilities, while in Kolmogorov's version probabilities without any conditions can be used.
Left: Cox, Right: Kolmogorov
Think of E as a region of area 1 which contains all possible events. The probability of the event A's occurrence, $p(A)$, is the area of A, $a(A)$. We can also talk about conditional probabilities such as "given the occurrence of A, what is the probability of the occurrence of B?", which is denoted by $p(B|A)$. Cox's version is not explicitly based on the existence of the set E and deals with conditional probabilities only; therefore probabilities are ratios of areas. I think one can reconcile the two by saying that $p(A)$ in Kolmogorov is $p(A|E)$ in Cox.

First start with the symbols. $A$ is the occurrence of the event A. $\sim A$, negation, is the non-occurrence of the event A. $A\& B$, conjunction, is the occurrence of both A and B. $A\vee B$, disjunction, is the occurrence of at least one of A and B.

$$ 0 \leq p(A|B) \leq 1 \tag{Axiom (1)}$$
Gives the limits of a probability. No negative numbers, hence no less than absolute improbability. No numbers bigger than 1, hence no probability higher than absolute certainty.

$$p(A|A) = 1 \tag{Axiom (2)}$$
The probability of certainty is one. "Given $A$ happened, what is the probability that $A$ happened."

$$p(\sim A|B) = 1 - p(A|B) \tag{Axiom (3)}$$
How negation is expressed in terms of certainty and conditional probability. This is an expression of the intuition that the probability of occurrence decreases as the probability of non-occurrence increases.

$$p(A\& B|C) = p(A|C)p(B|A\& C) \tag{Axiom (4)}$$
The most complicated one, which defines joint probabilities in terms of a product of conditional probabilities: "Given $C$, the probability that two events occur is the same as the probability of one of them (given $C$) times the probability of the other one given that the first one has already occurred." Let us think about it using the diagram on the left, which I think is the most general one.

$p(A|C)$ is related to the area of the intersection $A \cap C$, where our sample space is the set $C$. Divide the area of the intersection by the area of $C$: $p(A|C) = \frac{a(A \cap C)}{a(C)}$. $p(B|A \& C)$ means that we know both $A$ and $C$ have happened, hence our sample space is the intersection $A \cap C$; we ask which portion of that region also covers $B$: $p(B|A \& C)$ $= \frac{a(A \cap B \cap C)}{a(A \cap C)}$. $p(A \& B|C)$ is, given $C$, the joint probability that both $A$ and $B$ occur. If we express this as a ratio of two areas as usual, in the numerator we again have the area of the triple intersection, but this time we have the area of $C$ in the denominator: $p(A \& B|C)$ $=\frac{a(A \cap B \cap C)}{a(C)}$.

$$\frac{a(A \cap B \cap C)}{a(C)} = \frac{a(A \cap C)}{a(C)} \frac{a(A \cap B \cap C)}{a(A \cap C)}$$

Hence the fourth axiom makes sense if you visualize the probability ideas with Venn diagrams.

Kolmogorov's version of the axiom (4) is written like:
$$p(A \& B) = p(A|B) p(B) = p(B|A) p(A)$$
Which looks much more understandable to me, even without thinking about diagrams.

Let me write its Cox version by assuming that they all live inside the set $E$. $p(A \& B|E) = p(A|B) p(B|E)$. Check their equality using ratios of areas: $\frac{a(A \cap B)}{a(E)}$ $ =\frac{a(A \cap B)}{a(B)} \frac{a(B)}{a(E)}$. Yes!
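One can also check the fourth axiom by counting instead of measuring areas. Here is a tiny finite example of my own, with $p(X|Y) = |X \cap Y| / |Y|$:

```python
# A toy sample space: verify p(A & B | C) = p(A|C) p(B|A & C) exactly.
from fractions import Fraction

A = {0, 1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
C = {2, 3, 4, 6, 8, 9}

def p(X, given):
    return Fraction(len(X & given), len(given))

lhs = p(A & B, C)                  # p(A & B | C)
rhs = p(A, C) * p(B, A & C)        # p(A|C) p(B | A & C)
print(lhs, rhs, lhs == rhs)        # 1/3 1/3 True
```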

Some Definitions, Applications and Theorems

Now we have everything needed for the rest of probability theory. Let's check the plausibility of these tools and extend our toolbox.

Use (3) on $p(\sim A|A)$: it equals $1 - p(A|A)$; then use (2): $=1-1=0$. Given A occurred, its non-occurrence probability is zero. :-)

$\sim$ and $\&$ are defined with axioms. From them we can derive $\vee$. According to logic $A \vee B$ $=\sim (\sim A \& \sim B)$. "$A$ or $B$" means "not neither $A$ nor $B$." Hence,
$$ \begin{align}
p(A \vee B|C) & = p(\sim (\sim A \& \sim B)|C) \quad \text{Use (3)} \\
& = 1 - p(\sim A \& \sim B|C) \quad \text{Use (4)} \\
& = 1 - p(\sim A|C)p(\sim B|\sim A \& C) \quad \text{Use (3)} \\
& = 1-\left[1 - p(A|C) \right]\left[1 - p(B|\sim A \& C) \right] \\
& = 1-1+p(A|C)+p(B|\sim A \& C)-p(A|C)p(B|\sim A \& C) \\
& = p(A|C) + p(B|\sim A \& C)\left[1 - p(A|C)\right] \\
& = p(A|C) + p(\sim A|C) p(B|\sim A \& C)\\

& = p(A|C) + p(\sim A \& B|C) = p(A|C) + p(B\&{}\sim{}A |C) \\

& = p(A|C) + p(B|C)p(\sim A|B \& C) \\
& = p(A|C) + p(B|C)\left[ 1 - p(A|B \& C) \right] \\


& = p(A|C) + p(B|C) - p(B|C)p(A|B\& C) \\
& = p(A|C) + p(B|C) - p(B\& A|C)
\end{align}$$

Now we have derived the disjunction operation from the axioms. Its Kolmogorov version is $p(A \vee B) = p(A) + p(B) - p(B\& A)$, which really looks like the addition of the areas of two sets.
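The same toy counting check works for the disjunction rule just derived (again a made-up finite sample space):

```python
# Verify p(A v B) = p(A) + p(B) - p(B & A) on a finite sample space E.
from fractions import Fraction

E = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5, 6}

def p(X):
    return Fraction(len(X & E), len(E))

print(p(A | B) == p(A) + p(B) - p(B & A))   # True
```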

Ballentine derives the same thing in another way. He uses an important lemma; let me just show that lemma.
$$ \begin{align}
p(X\& Y|Z) + p(X\&{}\sim{}Y|Z) & = p(X|Z)p(Y|X\&{}Z)+p(X|Z)p(\sim{}Y|X\&{}Z) \\
& = p(X|Z)\left[p(Y|X\&{}Z) + 1 - p(Y|X\&{}Z) \right] \\
& = p(X|Z)
\end{align}$$

This really looks like the calculation of a marginal distribution from a joint distribution by summing/integrating over all values of one of the variables: $p(x) = \sum_y p(x, y)$.

Using the expression for $p(A \vee B|C)$ one can define mutual exclusiveness. Given $C$, $A$ and $B$ are mutually exclusive if $p(A\&{}B|C) = 0$. Inside the set $C$, $A$ and $B$ do not intersect.

If $A$ and $B$ are mutually exclusive then $p(A \vee B|C)$ $= p(A|C) + p(B|C)$. This is called "addition of probabilities for exclusive events".

By writing the joint probability down in two ways one gets Bayes' theorem:

$$\begin{align}
p(A\&{}B|C)& = p(A|C)p(B|A\&{}C) \\
p(B\&{}A|C)& = p(B|C)p(A|B\&{}C) \\
\Rightarrow p( B|A\&{}C) &= p(A|B\&{}C) \frac{p(B|C)}{p(A|C)} \\
\end{align}
$$

In the other notation: $p(B|A) = p(A|B)\frac{p(B)}{p(A)}$. This is also called the principle of inverse probability.

The last concept that Ballentine talks about in that chapter is the statistical independence. $B$ is statistically independent of $A$ if $p(B|A\&{}C)=p(B|C)$. The existence of $A$ in the conditions does not affect the probability of $B$.

Apply this to the fourth axiom:
$$\begin{align}
p(A\&{}B|C) & = p(A|C)p(B|A\& C) \\
& =  p(A|C)p(B|C) \\
p(B\&{}A|C) & = p(B|C)p(A|C) \\
\end{align}$$

Which means that the independence is mutual and that the joint probability becomes the product of marginal distributions when $A$ and $B$ are independent. (In the other notation: $p(A\&{}B)=p(A)p(B)$).

The definition of more than two independent events is this. Let our events be the set $\left\{ A_n \right\}$: $p(A_i\&{}A_j\&{}\cdots\&{}A_k|C)=p(A_i|C)p(A_j|C)\cdots p(A_k|C)$, where this equation holds for all subsets of $\left\{ A_n \right\}$ (including the case of all elements, all pairs, and all other cases). This looks interesting!


Monday, February 6, 2012

The Mystery of the Complete Set of Commuting Observables (CSCO) - Part 2

In the last post, Part I, I talked about how commutation of operators leads them to have the same set of eigenvectors, and how, if some of the eigenvectors of one operator belong to the same eigenvalue (degenerate in the first operator), they may belong to different eigenvalues of the other operator (nondegenerate in the second operator).

This time, let me go the opposite way and show that if two operators have the same eigenvectors, then they will commute. I'll use the explanation in Gilbert Strang's Linear Algebra and Its Applications [1].

The diagonalization matrix $S$
But first, "remember your linear algebra!" If one puts all eigenvectors of an operator into the columns of a matrix, that matrix will be the diagonalization matrix of the operator. What?

The matrix elements of an operator can be found in any (orthonormal) basis using the sandwich $\left\langle \alpha_m \left| A \right| \alpha_n \right\rangle = A_{mn}$. If the basis is chosen as the eigenvectors of the operator (and remember that, whether the eigenvalues are degenerate or not, one can build an orthonormal basis using the eigenvectors, as long as there are $n$ linearly independent eigenvectors), then the matrix representation of the operator will be diagonal. If $\left\{ \left| a_n \right\rangle \right\}$ is the set of orthonormal eigenvectors with corresponding eigenvalues $\left\{ a_n \right\}$, then $A_{mn}$ $= \left\langle a_m \left| A \right| a_n \right\rangle $ $= a_n \left\langle a_m | a_n \right\rangle$ which is $a_n \delta_{mn}$ due to orthonormality. Hence $$A_{mn} =

\begin{cases}
a_n & \text{ if } m=n \\
0 & \text{ if } m\neq n
\end{cases}$$
To express this process of 'representing an operator in its own eigenvector basis' with a simpler matrix multiplication operation, one uses the similarity transformation with the diagonalization matrix.

$$S=

\begin{pmatrix}
\uparrow & \uparrow &  & \uparrow\\
\left| a_1 \right\rangle & \left| a_2 \right\rangle & \ldots & \left| a_n \right\rangle \\
\downarrow & \downarrow &  & \downarrow
\end{pmatrix}

$$

whose inverse is
$$S^{-1}=

\begin{pmatrix}
\leftarrow & \left\langle a_1 \right| & \rightarrow \\
\leftarrow & \left\langle a_2 \right| & \rightarrow \\
 & \vdots & \\
\leftarrow & \left\langle a_n \right| & \rightarrow \\
\end{pmatrix}$$
Because their product must be the identity, $S^{-1}S=I$. Now we see that $\tilde{A}=S^{-1}AS$ is the diagonalized version of $A$, because this calculation involves all the sandwiches in the calculation of the diagonal $A_{mn}$ above.

Simultaneously diagonalization of two operators [1]
It is time to show that if $A$ and $B$ have the same eigenvectors, hence diagonalized with the same diagonalization matrix (a fancier way of putting this is "if they are simultaneously diagonalizable") then $\left[A,B\right]=0$.

$AB = S\tilde{A}S^{-1}S\tilde{B}S^{-1}$ $=S\tilde{A}\tilde{B}S^{-1}$. Similarly, $BA = S\tilde{B}\tilde{A}S^{-1}$. Therefore $AB-BA$ $=S\tilde{A}\tilde{B}S^{-1}$ $-S\tilde{B}\tilde{A}S^{-1}$ $=S\left(\tilde{A}\tilde{B}-\tilde{B}\tilde{A}\right)S^{-1}$ $=S\left[\tilde{A},\tilde{B}\right]S^{-1}$ which is $0$, because diagonal matrices always commute (another reason why we love diagonal matrices). (The point of using the same diagonalization matrix is that, it cancels its inverse in $S\tilde{A}S^{-1}S\tilde{B}S^{-1}$ and we get the commutation in the end.)
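A short NumPy illustration of both points (the eigenvector matrix and the eigenvalues below are random/arbitrary choices, not a physical example):

```python
# Build A and B from the same eigenvector matrix S; check that S^{-1} A S is
# diagonal and that [A, B] = 0.
import numpy as np

rng = np.random.default_rng(6)
S, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthonormal eigenvectors as columns
A = S @ np.diag([1.0, 2.0, 3.0]) @ S.T
B = S @ np.diag([5.0, -1.0, 0.5]) @ S.T            # same S, different eigenvalues

A_tilde = np.linalg.inv(S) @ A @ S
print(np.allclose(A_tilde, np.diag([1.0, 2.0, 3.0])))   # diagonalized
print(np.allclose(A @ B - B @ A, 0))                    # [A, B] = 0
```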

Lifting up the degeneracy
Let me show a hypothetical example of how the degeneracy is lifted by using a commuting operator.
Say $A$ is a $3\times 3$ (hermitian) matrix. When diagonalized using $S$, we get $$\tilde{A}=
\begin{pmatrix}
a_d & 0  & 0 \\
0 & a_d  & 0 \\
0 & 0 & a_3
\end{pmatrix}$$ The first two eigenvalues are equal; hence the first two eigenvectors are degenerate. The set of eigenvectors labelled according to their corresponding eigenvalues is $\left\{ \left| a_{d}^{(1)} \right\rangle, \left|  a_{d}^{(2)}  \right\rangle, \left| a_{3} \right\rangle \right\}$.

One can always find a commuting (hermitian) operator with different eigenvalues. How? Using the spectral theorem, of course! Say $B = b_1 \left| a_{d}^{(1)} \right\rangle \left\langle a_{d}^{(1)}\right|$ $+ b_d \left| a_{d}^{(2)} \right\rangle \left\langle a_{d}^{(2)}\right|$ $+ b_d \left| a_3 \right\rangle \left\langle a_3\right|$ and voila! $\tilde{B}$ in this basis is: $$
\begin{pmatrix}
b_1 & 0  & 0 \\
0 & b_d  & 0 \\
0 & 0 & b_d
\end{pmatrix}$$

Now $B$ commutes with $A$ and has different eigenvalues. Here we defined the eigenvectors of $B$ as $\left\{ \left| b_1 \right\rangle = \left| a_{d}^{(1)} \right\rangle \right.$, $\left| b_{d}^{(1)} \right\rangle = \left| a_{d}^{(2)} \right\rangle$, $\left. \left| b_{d}^{(2)} \right\rangle = \left| a_3 \right\rangle \right\}$.

If we label these eigenvectors not according to their corresponding eigenvalue of one operator, but according to the corresponding eigenvalues of both operators, we'll have $\left\{ \left| b_1, a_d \right\rangle \right.$ $, \left| b_d, a_d \right\rangle$ $,\left.  \left| b_d, a_3 \right\rangle \right\}$. Here we lifted the degeneracy, because each eigenvector has two corresponding eigenvalues, and for each eigenvector we have a different pair of eigenvalues. So lovely!
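Here is a hedged numerical version of this hypothetical example (the numbers and the eigenvector matrix are made up):

```python
# A has a doubly degenerate eigenvalue a_d; B, built via the spectral theorem
# from A's eigenvectors with distinct eigenvalues on the degenerate pair,
# commutes with A, and the (a, b) pairs label the common eigenvectors uniquely.
import numpy as np

rng = np.random.default_rng(7)
S, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # |a_d^(1)>, |a_d^(2)>, |a_3>
A = S @ np.diag([2.0, 2.0, 5.0]) @ S.T             # a_d = 2 (twice), a_3 = 5
B = S @ np.diag([1.0, 3.0, 3.0]) @ S.T             # b_1 = 1, b_d = 3 (twice)

print(np.allclose(A @ B - B @ A, 0))               # [A, B] = 0
for n in range(3):
    v = S[:, n]
    print(round(v @ A @ v, 6), round(v @ B @ v, 6))   # pairs (2,1), (2,3), (5,3)
```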

Next time I will talk about CSCOs in the quantum mechanical context of the hydrogen atom, the free particle, and spin-$\tfrac{1}{2}$ systems.

For curious people, an example of a "pathological" or "defective" matrix from [1]: $\begin{pmatrix}
0  & 1 \\
0 &  0 \\
\end{pmatrix}$ Both of its eigenvalues are $0$, but there is only one linearly independent eigenvector, $\begin{pmatrix}1\\0\end{pmatrix}$ (up to scaling), so the eigenvectors span a one dimensional space. Hence it is impossible to construct a complete orthonormal basis from them, namely there is no $S$ matrix.
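(A quick NumPy confirmation of the defectiveness, for the curious; the tolerance check is mine.)

```python
# The eigenvalue 0 is doubly repeated, but the two eigenvector columns that
# eig returns are parallel, so they cannot form an invertible S.
import numpy as np

D = np.array([[0.0, 1.0],
              [0.0, 0.0]])
evals, evecs = np.linalg.eig(D)
print(evals)                                   # [0. 0.]
print(abs(np.linalg.det(evecs)) < 1e-10)       # True: no invertible S exists
```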


[1] Gilbert Strang's Linear Algebra and Its Applications, third edition, Chapter 5.2.

The Mystery of the Complete Set of Commuting Observables (CSCO) - Part 1

I was thinking about joint probabilities in quantum mechanics. In order to have a joint probability distribution of two quantities, their corresponding observables should commute. Non-commutation prohibits the existence of a complete set of simultaneous eigenvectors, leads to uncertainty relations and all the complicated stuff.

Commuting Operators and Their Simultaneous Eigenvectors
Let's start with the simple proof of the sameness of eigenvectors of commuting operators. Take two commuting operators: \( \left[A,B\right] = 0 \). Say \( \psi \) is an eigenvector of \( A \): \( A\psi=a\psi\). Check the relation between \(B\) and \(\psi\): \( \left[A,B\right]\psi = AB\psi - BA\psi \) \( = AB\psi - Ba\psi = (A-a)B\psi = 0\) \( \Rightarrow A(B\psi) = a(B\psi) \) (Eq. 1).

"Remember your linear algebra!" course: If \(\psi\) is an eigenvector of \(A\) with the eigenvalue a, then \(\psi' = c\psi\) is an eigenvector too with the same eigenvalue. The parameter \(c\) defines a set of eigenvectors with the same direction but different magnitude, all of them are eigenvectors belonging to the same eigenvalue. But this set does not mean degeneracy. To have degenerate eigenvectors, the two vectors belonging to the same eigenvalue have to have different directions.

In this light, (Eq. 1) means that \( B\psi \) is also an eigenvector of \(A\) with the eigenvalue a, but possibly with a different magnitude. How can that be? There are two cases:

Either i) \(B\) only changes the magnitude of the vector \(\psi\). Changing only the magnitude is the idiosyncrasy of operator-eigenvector relations. Hence \(\psi\) is an eigenvector of \(B\) too: \(B\psi=b\psi\). Then (Eq. 1) \( \Rightarrow A(b\psi) = a(b\psi) \) \( \Rightarrow A\psi' = a\psi' \).

Or, ii) We have the more complicated case of "eigenvalue \(a\) of \(A\) is degenerate".

Again "remember your linear algebra!": If \(\psi_1\) and \(\psi_2\) are two eigenvectors with two different eigenvalues, then their sum (or any other linear combination) is not an eigenvector. \(A\psi_1=a_1\psi_1\), \(A\psi_2=a_2\psi_2\) \( \Rightarrow A\left(c_1\psi_1+c_2\psi_2\right) \) \( = \left(a_1 c_1\psi_1+a_2 c_2\psi_2\right) \) \( \neq a_n \left(c_1\psi_1+c_2\psi_2\right) = a_n \psi_n \).

But if  \(\psi_1\) and \(\psi_2\) are two degenerate eigenvectors with the same eigenvalue, then their linear combinations are eigenvectors with the same eigenvalue too. \(A\psi_1=a\psi_1\), \(A\psi_2=a\psi_2\) \( \Rightarrow A\left(c_1\psi_1+c_2\psi_2\right) \) \( = \left(a c_1\psi_1+a c_2\psi_2\right) \)  \( = a \left(c_1\psi_1+c_2\psi_2\right) = a \psi_3 \). One can call the space spanned by these two degenerate eigenvectors an "eigensubspace",\( \varepsilon_a \). [2] (Now, besides the eigen prefix, we are also borrowing the power of the German Language in building compound words.)

An arbitrary \( \psi_3 \) may not be an eigenvector of \(B\). But one can try to find eigenvectors of B living in  \( \varepsilon_a \) by constructing them using \(\psi_1\) and \(\psi_2\). (Assume \(\psi_i\)s are orthonormal, or find and use \(\psi_i''\)s which are orthonormal and found by applying Gram–Schmidt process on  \(\psi_i\)s.) The eigenvectors of \(B\) will satisfy \(B\psi_3 = b\psi_3 \). \( B\left(d_1\psi_1+d_2\psi_2\right) = b\left(d_1\psi_1+d_2\psi_2\right) \) (Eq. 2) Using the orthonormality of \(\psi_i\)s, one can get matrix elements of \(B\) by sandwiching it with \(\psi_i\)s. \(B_{ij} = \left\langle \psi_i | B | \psi_j \right\rangle\). Writing (Eq. 2) as \(d_1 B \left| \psi_1 \right\rangle + d_2 B \left| \psi_2 \right\rangle \) \(= d_1 b \left| \psi_1 \right\rangle + d_2 b \left| \psi_2 \right\rangle\) and hitting with first \( \left\langle \psi_1 \right| \) and then \( \left\langle \psi_2 \right| \) from the left, one gets two equations. \(d_1 B_{11} + d_2 B_{12} = b d_1 \) and \( d_1 B_{21} + d_2 B_{22} = b d_2 \). We have 2 equations and 2 unknowns (\(d_1\) and \(d_2\)). Expressing these equations in matrix form
\[

\begin{pmatrix}
B_{11} - b & B_{12} \\
B_{21} & B_{22} - b
\end{pmatrix}

\begin{pmatrix} d_1 \\ d_2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix} 


\]
The nontrivial solution for \(d\)s exists if the determinant is zero. \( \left(B_{11} - b\right)\left(B_{22} - b\right)-B_{12}B_{21}=0 \) (Eq. 3). From this second order equation in \(b\), one gets

ii.i) either two distinct \(b\) values, each one giving a different \(\left(d_1, d_2\right)\), meaning different linear combinations of \(\psi_1\) and \(\psi_2\), hence different eigenvectors of \(B\). This way the degeneracy is resolved. \(A\) had degenerate eigenvectors (which span \(\varepsilon_a\)) for the eigenvalue \(a\). We found two nondegenerate eigenvectors of \(B\) in \(\varepsilon_a\), with eigenvalues \(b_1\) and \(b_2\).

ii.ii) or, \(b_1\) and \(b_2\) are equal. We failed in our attempt to find distinct eigenvalues of \(B\) in \(\varepsilon_a\). \(\psi_3\) is a degenerate eigenvector of \(B\) too, with the eigenvalue \(b\): \(\varepsilon_a = \varepsilon_b\). But no worries, there is definitely a third operator \(C\) which commutes with both \(A\) and \(B\) and has nondegenerate eigenvectors in \(\varepsilon_a\). (I don't know why, yet.)

These results can be generalized to higher dimensions. The degeneracy will be reduced according to the number of distinct roots of (Eq. 3).
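A numerical sketch of case ii.i (my own example, not from the references): build a degenerate \(A\) and a commuting \(B\) whose eigenvectors sit rotated inside \(\varepsilon_a\), form the \(2\times 2\) block \(B_{ij}\), and diagonalize it.

```python
# Resolve the degeneracy of A by diagonalizing B restricted to eps_a.
import numpy as np

rng = np.random.default_rng(8)
S, _ = np.linalg.qr(rng.standard_normal((3, 3)))
psi1, psi2, psi3 = S[:, 0], S[:, 1], S[:, 2]

theta = 0.6                                    # B's eigenvectors are rotated inside eps_a
w1 = np.cos(theta) * psi1 + np.sin(theta) * psi2
w2 = -np.sin(theta) * psi1 + np.cos(theta) * psi2

A = 2.0 * (np.outer(psi1, psi1) + np.outer(psi2, psi2)) + 7.0 * np.outer(psi3, psi3)
B = 1.0 * np.outer(w1, w1) + 4.0 * np.outer(w2, w2) + 4.0 * np.outer(psi3, psi3)
print(np.allclose(A @ B - B @ A, 0))           # [A, B] = 0

B_block = np.array([[psi1 @ B @ psi1, psi1 @ B @ psi2],
                    [psi2 @ B @ psi1, psi2 @ B @ psi2]])
b_vals, d = np.linalg.eigh(B_block)            # the two roots b of (Eq. 3), and (d_1, d_2)

for k in range(2):
    phi = d[0, k] * psi1 + d[1, k] * psi2      # eigenvector of B living in eps_a
    print(np.allclose(B @ phi, b_vals[k] * phi),
          np.allclose(A @ phi, 2.0 * phi))     # simultaneous eigenvector: (b_k, a = 2)
```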

Why the fuss? What is the big deal with these simultaneous eigenvectors, and resolving the degeneracy business?
In QM, observables are hermitian operators, and the eigenvectors of a hermitian operator form a complete orthogonal basis if there is no degeneracy.

In the nondegenerate case i) \( A\left|\psi\right\rangle = a\left|\psi\right\rangle \) and \(B\left|\psi\right\rangle=b\left|\psi\right\rangle\). According to the convention of labeling eigenvectors by their corresponding eigenvalues, we can label \( \left|\psi\right\rangle \) as \( \left|a\right\rangle \), \( \left|b\right\rangle \) or \(\left|a,b\right\rangle\). If \(A\) operates on \(\left|a,b\right\rangle\) the eigenvalue will be \(a\), and if \(B\) operates on \(\left|a,b\right\rangle\) the eigenvalue will be \(b\). The eigenvectors of \(A\) and \(B\) are the same but they correspond to different eigenvalues for \(A\) and \(B\).

If there is degeneracy, the nondegenerate eigenvectors are orthonormal and the degenerate ones span subspaces, \(\varepsilon_{a_{d}}\), one for each degenerate eigenvalue \(a_d\). Any vector in \(\varepsilon_{a_{d}}\) is another eigenvector. One can use the Gram-Schmidt process and find \(N\) orthonormal vectors in each \(N\)-dimensional eigensubspace \(\varepsilon_{a_{d}}\). Actually it is always possible to find a complete orthonormal basis from the eigenvectors of \(A\) anyhow (whether some of them are degenerate or not). [3] The contribution of adding \(B\) is to change the eigenvalues of the vectors in \(\varepsilon_{a_{d}}\) and eliminate the degeneracy. (We don't like degeneracies in this town.)

In the degenerate case ii.i) \(B\) is used to try to lift the degeneracy. Again we could label \( \left|\psi\right\rangle \) as \( \left|a_d\right\rangle \), but this time \( \left|\psi\right\rangle \) is not unique. And, in general, it is not an eigenvector of \(B\). But it is possible to find eigenvectors of \(B\) in the eigensubspace \(\varepsilon_a\) spanned by the degenerate eigenvectors of \(A\) corresponding to the degenerate eigenvalue \(a_d\). If distinct eigenvectors of \(B\) are found, we can label the simultaneous eigenvectors as such: \( \left|a_d, b_1\right\rangle \), \( \left|a_d, b_2\right\rangle \) etc. Now they are unique, distinct functions. If \(A\) hits any of these vectors, \(a_d\) will be the eigenvalue. If \(B\) hits them, \(b_1\), \(b_2\) etc. will be the eigenvalue. Any vector can be written in this orthonormal basis, \( \left|\phi\right\rangle = \sum_{a,b} c_{a,b} \left|a,b\right\rangle \), where the sum runs over each possible combination of the eigenvalues \(a\) and \(b\). \(A\) and \(B\) form a CSCO.

In the degenerate case ii.ii), although again we got our basis, \(B\) could not lift the degeneracy. But there is surely an operator \(C\) which can do the job. (Why?!)

Next I will talk about the number of objects in CSCO and give some simple examples.

[1] http://faculty.physics.tamu.edu/herschbach/commuting%20observables%20and%20simultaneous%20eigenfunctions.pdf
[2] http://eecourses.technion.ac.il/046241/files/Rec2.pdf
[3] http://www.pa.msu.edu/~mmoore/Lect4_BasisSet.pdf

Saturday, February 4, 2012

Riesz Theorem and the true meaning of bras in Dirac notation

Prologue
It is time to start the obligatory research blog! When I put something on paper, it stays longer in my arsenal of usable concepts. And the fastest way of remembering something is reading my previous notes about it. Hence, it is a good idea to write about the new concepts that I learn, and about the problems I'm working on. I hope this blog will last long and not fall into oblivion, into the realm of blogs that were started on a whim, slowly faded away with diminishing enthusiasm, and were then eternally forgotten.

Ballentine's Quantum Book
Today's "look what I learned today" section is about the "bra"s in the Dirac notation. I started to read Ballentine's "Quantum Mechanics: A Modern Development". My research problem (about which I will talk later) involves joint probabilities in quantum mechanics. While searching about it on the web I came up to his book, read some pages on Google Books and liked it very much.

The book uses rigorous mathematics, and spends enough pages on fundamentals. (When I am left alone, I can't go further, but I tend to go deeper and think about the details of the concepts that I already know.) Just in the first chapter, I learned where the bras come from, what the rigged Hilbert space is and some probability theory. Thanks Prof. Ballentine! As a graduate student, I do not want anything to be kept hidden from me for pedagogical reasons. I am brave enough to confront the rigged version of Hilbert spaces or to call the Dirac delta a distribution, not a function. :-)

The Dual Space of Linear Functionals
I thought that I knew the Dirac notation. Unfortunately, that was an illusion. Let me share my enlightenment with you.

Start with a linear vector space \(V\), with an inner product \( (\psi, \phi) = c \) defined on it. Elements of \(V\) are symbolized with kets: \( | \psi \rangle \). \( (|\psi\rangle, |\phi\rangle) = c \). A ket can be a column vector, or a function etc.

Now define a dual space of linear functionals on \(V\). What is a functional? Operators are mathematical objects which map "vectors" to "vectors" (double quotes to indicate the abstractness of the vectors, the kets). They take a vector and give a vector in return: \(A:\psi \rightarrow \phi\), which is shown as \(A\psi=\phi\). Functionals are objects which map vectors to numbers. They take a "vector" and give a number: \(F:\psi \rightarrow c\), which may be shown as \(F \left\{ \psi \right\} = c \). For example, the norm of a vector is a functional (although not a linear one).

What makes an operator linear is the property that the equation \( A(\alpha\psi+\beta\phi) = \alpha A \psi + \beta A \phi \) holds for every \(\psi\) and \(\phi\). Similarly, a functional is linear when  \(F \left\{ \alpha \psi + \beta \phi \right\} = \alpha F \left\{ \psi \right\} + \beta F \left\{ \phi \right\} \) (1)

If \( F_1 \left\{ \phi \right\} + F_2\left\{ \phi \right\} = \left(F_1 + F_2\right)\left\{ \phi \right\} = F_3 \left\{ \phi \right\} \), which means that the addition of two functionals (two elements of the dual space) gives another element of the dual space, then the dual space is closed under addition and is itself a vector space, \(V'\).

Riesz Theorem
Riesz's theorem says that there is a one-to-one correspondence (isomorphism) between the elements of \(V\) and \(V'\). The operation of an arbitrary linear functional (from \(V'\)) on the elements of \(V\) can be imitated by an inner product of elements from \(V\): \( F \left\{ \phi \right\} = (f, \phi) \), where \(f\) is fixed for a given \(F\), and \(\phi\) is arbitrary.

Dirac assumed this isomorphism between \(f\) and \(F\). Riesz proved it, hence the assumption is not necessary. Say \(\left\{\phi_n\right\}\) is an orthonormal basis. Any vector in \(V\) can be expanded on that basis: \(\psi = \sum_n c_n \phi_n\). According to (1), when a linear functional operates on \(\psi\), \(F\left\{\psi\right\} = F\left\{ \sum_n c_n \phi_n \right\} = \sum_n c_n F\left\{\phi_n\right\}  \). Now we have to find a vector \(f\) whose inner product with the basis vectors has the same effect as the functional on those basis vectors.

\(f = \sum_m F\left\{\phi_m\right\}^* \phi_m\) does the job. \( \left(f,\psi \right) = \left(  \sum_m F\left\{\phi_m\right\}^* \phi_m, \sum_n c_n \phi_n \right) \) \( = \sum_m F\left\{\phi_m\right\} \sum_n c_n \left( \phi_m, \phi_n \right) = \sum_m \sum_n F\left\{\phi_m\right\} c_n \delta_{mn} \) \( = \sum_n c_n F\left\{\phi_n\right\} = F\left\{\psi\right\} \). There is a unique \(f\) in \(V\) for each \(F\) in \(V'\). Hence one can use these symbols interchangeably.
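A tiny finite-dimensional sketch of this construction (the functional below is an arbitrary linear rule I made up on \(\mathbb{C}^3\)):

```python
# Build f = sum_m F{phi_m}* phi_m from an orthonormal basis and check that the
# inner product (f, psi), conjugate-linear in the first slot, reproduces F{psi}.
import numpy as np

def F(phi):
    # an arbitrary linear functional on C^3
    return 2.0 * phi[0] + (1.0 - 3.0j) * phi[1] + 1.0j * phi[2]

basis = np.eye(3, dtype=complex)                    # orthonormal basis {phi_m}
f = sum(np.conj(F(phi_m)) * phi_m for phi_m in basis)

psi = np.array([0.5 + 1.0j, -2.0 + 0.0j, 3.0 + 0.5j])
print(np.isclose(np.vdot(f, psi), F(psi)))          # True: (f, psi) = F{psi}
```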

In Dirac notation, the number that the functional \(F\) gives, if it takes the vector \(\phi\) as its argument, \(F\left\{ \phi \right\}\), is written \(\langle F | \phi \rangle \). For an \(f\) defined as in the previous paragraph, \(\langle F | \phi \rangle = \left(f, \phi\right) \). Thanks to the uniqueness of \(f\), due to the one-to-one correspondence between \(f\) and \(F\), we use the symbol \(F\) for both the linear functional in \(V'\) and the isomorphic vector in \(V\): \( \left(f, \phi\right) \equiv \left(F, \phi\right) \). And we get
\[ \langle F | \phi \rangle = \left(F, \phi\right) \]
Hence the braket can be thought of as merely a new notation for the inner product. But the real motivation behind it was that the bras are linear functionals on the ket space.