Tuesday, February 21, 2012

Essential Linear Algebra for Projection Calculations

$\newcommand{\tr}[1]{\text{Tr}\left\{#1\right\}}$ $\newcommand{\ket}[1]{\left|#1\right\rangle}$ $\newcommand{\bra}[1]{\left\langle#1\right|}$ $\newcommand{\braket}[2]{\left\langle#1\right| \left.#2\right\rangle}$ $\newcommand{\sandwich}[3]{\left\langle#1\right|#2 \left|#3\right\rangle}$ $\newcommand{\span}[1]{\text{Span}\left\{#1\right\}}$ $\newcommand{\proj}[2]{\text{Proj}_{#1}\left(#2\right )}$
In QM, the probability of a measurement outcome, $p(R\in\Delta)$, is the square of the norm of the component of the state vector on the related subspace, $\| \ket{\psi_\Delta} \|^2$. The components of vectors on subspaces are found by applying a projection operator to the vector. Therefore, elementary knowledge of how these projection operators are constructed is essential. :-) It is "remember your linear algebra!" time. (In this entry I'll assume that the operator has a discrete spectrum and is finite dimensional.)
[Figure: the set on the left consists of the possible outcomes of the measurement, hence the eigenvalues of the observable; the set on the right depicts the related vector spaces corresponding to those eigenvalues.]
The set on the left, $\left\{r_n\right\}$, is the set of all $N$ eigenvalues of the Hermitian operator $R$ associated with the observable/dynamical variable $R$. Each dot represents an eigenvalue. (There may be more than one eigenvalue with the same value, which results in degeneracy; such eigenvalues are called degenerate.) $\Delta$ is a subset of the eigenvalues. A measurement of $R$ will give only one of the elements of $\left\{r_n\right\}$. We are looking for the probability that the outcome will be in the set $\Delta$. The probability distribution $p(R=r)$ (the probability that the outcome will be a certain $r$) depends both on the observable and on the quantum state.

The set on the right, $\mathcal{H}$, is the $N$-dimensional Hilbert space. (A Hilbert space is an abstract vector space on which an inner product is defined to measure the lengths of vectors.) The state of a quantum system is described by a vector $\ket{\psi}$ that lives in $\mathcal{H}$. For each eigenvalue $r_n$ in the set $\left\{r_n\right\}$ there is a corresponding vector $\ket{r_n}$ in $\mathcal{H}$ which satisfies the relation $R\ket{r_n} = r_n \ket{r_n}$. That vector is called the eigenvector of $R$ (corresponding to the eigenvalue $r_n$).

Linear algebra tells us that the eigenvectors of an $N\times N$ Hermitian operator $R$, $\left\{ \ket{r_n} \right\}$, form a complete set that spans all of the $N$-dimensional Hilbert space: $\span{ \left\{ \ket{r_n} \right\}} = \mathcal{H}$. As we discussed previously, if $R$ is non-degenerate (all eigenvalues are distinct), then $\left\{ \ket{r_n} \right\}$ is not only a complete set but a complete orthonormal set (CONS), whose elements satisfy the relation $\braket{r_m}{r_n}=\delta_{mn}$. If $R$ is degenerate, then the eigenvectors in $\left\{ \ket{r_n} \right\}$ are only guaranteed to be linearly independent (LI) (assuming $R$ is not a pathological case), but one can always construct a CONS from an LI set (remember the Gram-Schmidt process). Any vector $\ket{\psi}$ in $\mathcal{H}$ can be written as a linear combination of elements of $\left\{ \ket{r_N} \right\}$ or its orthonormalized version $\left\{ \ket{u_N} \right\}$.
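As an aside, here is a minimal numpy sketch (my own illustration, not part of the original argument) of the Gram-Schmidt process turning an LI pair into an orthonormal pair; in practice one would typically just reach for `np.linalg.qr`.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    ortho = []
    for v in vectors:
        w = np.array(v, dtype=float)
        # subtract the components along the already-built orthonormal directions
        for u in ortho:
            w = w - np.dot(u, w) * u
        ortho.append(w / np.linalg.norm(w))
    return ortho

# two linearly independent (but not orthogonal) vectors in R^3
r1 = np.array([1.0, 1.0, 0.0])
r2 = np.array([1.0, 0.0, 1.0])
u1, u2 = gram_schmidt([r1, r2])
print(np.dot(u1, u2))       # ~0  -> orthogonal
print(np.linalg.norm(u2))   # 1.0 -> normalized
```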

The set of eigenvectors $\left\{\ket{ r_\Delta }\right\}$ corresponding to the subset of eigenvalues $\Delta$ cannot span the whole of $\mathcal{H}$, but only a subspace of it, which we will call $\mathcal{V}$: $\span{\ket{ r_\Delta }} = \mathcal{V}$. If $\Delta$ has $M$ elements then so does $\left\{\ket{ r_\Delta }\right\}$, and $\mathcal{V}$ is an $M$-dimensional subspace of $\mathcal{H}$. We will call the rest of the Hilbert space $\mathcal{V}^\perp$, the orthogonal complement of $\mathcal{V}$: $\mathcal{H} = \mathcal{V}\oplus\mathcal{V}^\perp$ and $\mathcal{V}\cap\mathcal{V}^\perp = \left\{0\right\}$.

Any vector $\ket{\psi}$ can be written as a sum of two components, one belonging to $\mathcal{V}$ and the other to $\mathcal{V}^\perp$: $\ket{\psi} = \ket{\psi_\mathcal{V}} + \ket{\psi_{\mathcal{V}^\perp}}$, with $\ket{\psi_\mathcal{V}} \in \mathcal{V}$ and $\ket{\psi_{\mathcal{V}^\perp}} \in \mathcal{V}^\perp$. (Or we could call them $\ket{\psi_\Delta}$ and $\ket{\psi_{\Delta^\perp}}$.)

Our aim is to construct the projection operator $M_R(\Delta)$ which gives $\ket{\psi_\mathcal{V}}$ when applied to any $\ket{\psi}$, for a chosen operator $R$ and range $\Delta$.

Constructing the Projection Operator onto a subspace using an LI set of vectors that spans that subspace

Simple Case: Projection onto a line in $\mathbb{R}^2$


First let me demonstrate this by working in $\mathbb{R}^2$ with projection onto a line $L$. Say our subspace is the $1D$ line $L$, which is spanned by a vector $\vec{r}$: $L=$ $\left\{c\vec{r} \mid c \in \mathbb{R} \right\}$ $=\span{\vec{r}}$. $\proj{L}{\vec{v}}\equiv \vec{v}_L$ and $\vec{v}$ $- \proj{L}{\vec{v}}$ $=\vec{v}_{L^\perp}$.

We can express the projection mathematically using the relation $\vec{v} - \proj{L}{\vec{v}}$ $\perp L$. This means that the projection of $\vec{v}$ onto $L$ is the vector $\proj{L}{\vec{v}}$ on $L$ whose difference from $\vec{v}$ is perpendicular to all vectors on $L$. Any vector on $L$ can be described as a real multiple of $\vec{r}$; hence $\vec{v}_L = c\vec{r}$. To find the projection, we have to find this $c$.

The inner product of two perpendicular vectors is $0$. $\vec{a} \cdot \vec{b} = 0$ if $\vec{a} \perp \vec{b}$. Using perpendicularity between $\vec{v} - \vec{v}_L = \vec{v}_{L^\perp}$ and $L$ we can write $$\begin{align}
\left( \vec{v} - c\vec{r} \right) \cdot \vec{r} & = 0 \\
\vec{v}\cdot \vec{r}- c\vec{r}\cdot \vec{r} & = 0 \\
\Rightarrow c & = \frac{\vec{v}\cdot \vec{r}}{\vec{r}\cdot \vec{r}} \\
\Rightarrow \proj{L}{\vec{v}} & = \frac{\vec{v}\cdot \vec{r}}{\vec{r}\cdot \vec{r}} \vec{r} \\
\proj{\span{\vec{r}}}{\vec{v}} & = \vec{r}\frac{\vec{r} \cdot \vec{v}}{\vec{r}\cdot \vec{r}}
\end{align}$$ $\vec{r}$ is not unique; it can be any vector on $L$. Picking a unit vector simplifies the calculations: letting $\vec{u} = \frac{\vec{r}}{\| \vec{r} \|}$, the projection becomes $\proj{\span{\vec{u}}}{\vec{v}} = \vec{u}\left(\vec{u}\cdot\vec{v}\right)$
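As a quick numerical sanity check of this formula, here is a small numpy sketch (the vectors $\vec{r}$ and $\vec{v}$ are arbitrary choices for illustration):

```python
import numpy as np

def proj_onto_line(r, v):
    """Projection of v onto the line spanned by r: (v.r / r.r) r."""
    return (np.dot(v, r) / np.dot(r, r)) * r

r = np.array([2.0, 1.0])   # any vector on the line L
v = np.array([1.0, 3.0])   # the vector to be projected
v_L = proj_onto_line(r, v)
v_Lperp = v - v_L
print(v_L)                 # the component of v on L
print(np.dot(v_Lperp, r))  # ~0: the residual is perpendicular to L
```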

(Projection is a linear operation)
Let me show that this operation is linear.

  1. $\proj{L}{\vec{a}+\vec{b}}$ = $\vec{u}\left(\vec{u}\cdot\left(\vec{a}+\vec{b}\right)\right)$ $=\vec{u}\left(\vec{u}\cdot \vec{a}+\vec{u}\cdot \vec{b}\right)$ $=\vec{u}\left(\vec{u}\cdot \vec{a}\right)+\vec{u}\left(\vec{u}\cdot \vec{b}\right)$ $=\proj{L}{\vec{a}}+\proj{L}{\vec{b}}$. 
  2. $\proj{L}{c\vec{a}}$ $= \vec{u}\left(\vec{u}\cdot c \vec{a}\right)$ $=c \vec{u}\left(\vec{u}\cdot \vec{a}\right)$ $=c \proj{L}{\vec{a}}$.
Hence $\proj{L}{\vec{a}}$ is a linear operation and therefore it can be represented by a matrix: $\proj{L}{\vec{a}}$ $\equiv M\vec{a}$. Let's find the matrix elements of $M$ by sandwiching it between unit basis vectors, $M_{mn} = \sandwich{m}{M}{n}$, where $\ket{1} = \begin{pmatrix}1\\0\end{pmatrix}$ and $\ket{2} = \begin{pmatrix}0\\1\end{pmatrix}$.
$$\begin{align}M_{mn} &= \sandwich{m}{M}{n} \\
&= \bra{m} \proj{L}{\ket{n}} \\
&= \vec{m} \cdot \left( \vec{u}\left( \vec{u}\cdot \vec{n} \right) \right) \\
&= \vec{m}\cdot \left( \vec{u}u_n \right) = u_n \vec{m}\cdot \vec{u}\\
&= u_m u_n,\quad \text{OR} \\
&= \bra{m} \left(\ket{u}\braket{u}{n}\right) \\
&= \braket{m}{u}u_n \\
&= u_m u_n\end{align}$$
Hence $M_L = \begin{pmatrix} u_x^2 & u_x u_y \\ u_x u_y & u_y^2\end{pmatrix}$ $=\ket{u}\bra{u}$.
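The matrix $M_L=\ket{u}\bra{u}$ is just the outer product of $\vec{u}$ with itself, which is easy to check numerically (again with an arbitrary example vector):

```python
import numpy as np

u = np.array([2.0, 1.0])
u = u / np.linalg.norm(u)            # unit vector along L

M_L = np.outer(u, u)                 # |u><u|, the 2x2 projection matrix
v = np.array([1.0, 3.0])
print(M_L)                           # [[u_x^2, u_x u_y], [u_x u_y, u_y^2]]
print(M_L @ v)                       # same as u * (u . v)
print(np.allclose(M_L @ M_L, M_L))   # True: projectors are idempotent, M^2 = M
```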

General Case: Projection onto a $K$-dimensional subspace of $\mathbb{R}^N$

In the general case $\mathcal{H}$ is $N$-dimensional, and the subspace $\mathcal{V}$ is $K$-dimensional. Say $\mathcal{V}$ is spanned by a set of $K$ LI vectors $\left\{ \vec{r}_K \right\}$. Then any vector in $\mathcal{V}$ can be expressed as a linear combination of the $\vec{r}_i$'s: $\vec{x} \in \mathcal{V}$ $\Rightarrow \vec{x} = \sum_i^K c_i \vec{r}_i$.

If we think of these $c_i$'s as components of a vector $\vec{c}$, then this relation between $\vec{x}$, the $c_i$'s and $\left\{ \vec{r}_K \right\}$ can be written as a matrix multiplication. Define the $N\times K$ matrix $A$ $$A=
\begin{pmatrix}
\uparrow & \uparrow & & \uparrow\\
\vec{r}_1 & \vec{r}_2 & \ldots & \vec{r}_K \\
\downarrow & \downarrow & & \downarrow
\end{pmatrix}$$
Then, for each $\vec{x} \in \mathcal{V}$, $\vec{x} = A\vec{c}$ where $\vec{c}$ is unique to the chosen $\vec{x}$. The projection of a vector $\vec{v} \in \mathcal{H}$ onto $\mathcal{V}$, called $\vec{v}_\mathcal{V}$, is a vector in $\mathcal{V}$, hence it can be expressed in the same matrix-multiplication form: $\vec{v} =$ $\vec{v}_\mathcal{V}$ $+ \vec{v}_{\mathcal{V}^\perp}$, or $\vec{v} =$ $\proj{\mathcal{V}}{\vec{v}}$ $+\proj{\mathcal{V^\perp}}{\vec{v}}$, and $\proj{\mathcal{V}}{\vec{v}}$ $=A \vec{c}$.
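To make the role of $A$ concrete, here is a tiny numpy sketch (the spanning vectors are made up for illustration) of building $A$ from its columns and producing an element of $\mathcal{V}$ as $A\vec{c}$:

```python
import numpy as np

# two LI vectors in R^3 spanning a 2D subspace V (made-up example)
r1 = np.array([1.0, 0.0, 1.0])
r2 = np.array([0.0, 1.0, 1.0])
A = np.column_stack([r1, r2])   # N x K matrix whose columns span V

c = np.array([2.0, -1.0])       # coefficients of some vector in V
x = A @ c                       # x = 2*r1 - 1*r2, an element of V
print(x)                        # [ 2. -1.  1.]
```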

According to this definition of $A$, $\mathcal{V}$ is the "column space" of $A$: $\mathcal{V}$ $=\text{C}(A)$. The column space of a matrix is the subspace spanned by its column vectors. Some deep and mysterious relations of linear algebra tell us that the orthogonal complement of $\mathcal{V}$, which is $\mathcal{V}^\perp$, is the left null space of $A$, i.e., the null space of the transpose $A^\top$: $\mathcal{V}^\perp$ $=\text{C}(A)^\perp$ $=\text{N}(A^\top)$. Therefore $\proj{\mathcal{V^\perp}}{\vec{v}}$ $\in \text{N}(A^\top)$.

If a vector belongs to the null space of a matrix, then applying the matrix to that vector gives the zero vector: $$\begin{align}
A^\top \vec{v}_{\mathcal{V}^\perp} &= \vec{0} \\
A^\top \left( \vec{v} - \proj{\mathcal{V}}{\vec{v}} \right) &= \vec{0} \\
A^\top \left( \vec{v} - A\vec{c} \right) &= \vec{0} \\
A^\top \vec{v} - A^\top A\vec{c} &= \vec{0} \\
\Rightarrow A^\top \vec{v} &= A^\top A\vec{c}
\end{align}$$
Another linear algebra proverb says that if the columns of a matrix $A$ are linearly independent, then $A^\top A$ (which is a $(K\times{}N)(N\times{}K)=K\times{}K$ square matrix) is invertible. Therefore $\vec{c}=\left(A^\top A\right)^{-1}A^\top \vec{v}$. This is how we calculate the $\left\{ c_i \right\}$ coefficients of the linear combination (in the form of $\vec{c}$) that expands the projection vector $\vec{v}_\mathcal{V}$ in this LI basis of $\text{C}(A)$. Remembering that $\vec{v}_\mathcal{V}$ $ = A\vec{c}$, we get: $$\boxed{\proj{\mathcal{V}}{\vec{v}} = A\left(A^\top A\right)^{-1}A^\top \vec{v}}$$
$A\left(A^\top A\right)^{-1}A^\top$ is the projection matrix $M_\mathcal{V}\equiv M_R(\Delta)$ that we are looking for. It depends only on the subspace, not on the basis we chose to span that subspace. Different bases will give different $A$ matrices, but $M_\mathcal{V}$ will be the same for all of them. In QM, the eigenvectors of the observables, which are Hermitian matrices, are orthonormal (and degenerate eigenvectors can always be orthogonalized), so it is worth looking at the orthonormal basis case. In general, from any linearly independent set of vectors one can create an orthonormal basis with the Gram-Schmidt process.
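Here is a short numpy sketch of the boxed formula which also checks the basis-independence claim: two different LI bases of the same subspace give the same projector (all vectors are arbitrary illustrations):

```python
import numpy as np

def projector(A):
    """Projection matrix onto the column space of A: A (A^T A)^{-1} A^T."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

r1 = np.array([1.0, 0.0, 1.0])
r2 = np.array([0.0, 1.0, 1.0])
A = np.column_stack([r1, r2])            # one LI basis of V
B = np.column_stack([r1 + r2, r1 - r2])  # a different LI basis of the same V

M_V = projector(A)
v = np.array([1.0, 2.0, 3.0])
print(M_V @ v)                           # the projection of v onto V
print(np.allclose(M_V, projector(B)))    # True: same subspace, same projector
```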

Assume the orthonormal basis $\left\{\vec{u}_K\right\}$ spans the $K$-dimensional subspace $\mathcal{V}$. This time $A^\top A$ will be the identity $I$. $$\begin{align}A^\top{}A &=
\begin{pmatrix}
\leftarrow & \vec{u}_1 & \rightarrow \\
\leftarrow & \vec{u}_2 & \rightarrow \\
\leftarrow & \vdots & \rightarrow \\
\leftarrow & \vec{u}_K & \rightarrow \\
\end{pmatrix}
\begin{pmatrix}
\uparrow & \uparrow & & \uparrow\\
\vec{u}_1 & \vec{u}_2 & \ldots & \vec{u}_K \\
\downarrow & \downarrow & & \downarrow
\end{pmatrix} \\
&=
\begin{pmatrix}
\vec{u}_1 \cdot{} \vec{u}_1 & \vec{u}_1 \cdot{} \vec{u}_2 & \cdots & \vec{u}_1 \cdot{} \vec{u}_K \\
\vec{u}_2 \cdot{} \vec{u}_1 & \vec{u}_2 \cdot{} \vec{u}_2 & \cdots & \vec{u}_2 \cdot{} \vec{u}_K \\
\vdots & \vdots & & \vdots \\
\vec{u}_K \cdot{} \vec{u}_1 & \vec{u}_K \cdot{} \vec{u}_2 & \cdots & \vec{u}_K \cdot{} \vec{u}_K \\
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\end{pmatrix}
\end{align}
$$

Hence the expression for the projection operator reduces to $$\proj{\mathcal{V}}{\vec{v}} = A A^\top \vec{v}$$ Let us explicitly calculate its matrix elements. Let $u_{m,n}$ be the $n$th component of $\vec{u}_m$. $$\begin{align}A A^\top{} &=
\begin{pmatrix}
\uparrow & \uparrow & & \uparrow\\
\vec{u}_1 & \vec{u}_2 & \ldots & \vec{u}_K \\
\downarrow & \downarrow & & \downarrow
\end{pmatrix}\begin{pmatrix}
\leftarrow & \vec{u}_1 & \rightarrow \\
\leftarrow & \vec{u}_2 & \rightarrow \\
\leftarrow & \vdots & \rightarrow \\
\leftarrow & \vec{u}_K & \rightarrow \\
\end{pmatrix} \\
&=

\begin{pmatrix}
u_{1,1} & u_{2,1} & \cdots & u_{K,1} \\
u_{1,2} & u_{2,2} & \cdots & u_{K,2} \\
\vdots & \vdots & & \vdots \\
u_{1,N} & u_{2,N} & \cdots & u_{K,N} \\
\end{pmatrix}
\begin{pmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,N} \\
u_{2,1} & u_{2,2} & \cdots & u_{2,N} \\
\vdots & \vdots & & \vdots \\
u_{K,1} & u_{K,2} & \cdots & u_{K,N} \\
\end{pmatrix} \\
&=
\begin{pmatrix}
\sum_i^K u_{i,1} u_{i,1} & \sum_i^K u_{i,1} u_{i,2} & \cdots & \sum_i^K u_{i,1} u_{i,N} \\
\sum_i^K u_{i,2} u_{i,1} & \sum_i^K u_{i,2} u_{i,2} & \cdots & \sum_i^K u_{i,2} u_{i,N} \\
\vdots & \vdots & & \vdots \\
\sum_i^K u_{i,N} u_{i,1} & \sum_i^K u_{i,N} u_{i,2} & \cdots & \sum_i^K u_{i,N} u_{i,N} \\
\end{pmatrix} \\
&=
\sum_i^K
\begin{pmatrix}
u_{i,1} u_{i,1} & u_{i,1} u_{i,2} & \cdots & u_{i,1} u_{i,N} \\
u_{i,2} u_{i,1} & u_{i,2} u_{i,2} & \cdots & u_{i,2} u_{i,N} \\
\vdots & \vdots & & \vdots \\
u_{i,N} u_{i,1} & u_{i,N} u_{i,2} & \cdots & u_{i,N} u_{i,N} \\
\end{pmatrix} \\
&= \sum_i^K \ket{u_i}\bra{u_i}\end{align} $$
If the individual terms of the sum are thought of as projections onto the lines spanned by the $\vec{u}_i$, then the projection onto the subspace is the sum of the projections onto these orthogonal lines. This is plausible because each term gives one component of the projection vector in the $\left\{ \vec{u}_K \right\}$ basis. Hence, using an orthonormal basis to span $\mathcal{V}$, we get $$M_\mathcal{V} = A A^\top = \sum_i^K \ket{u_i}\bra{u_i} = \sum_i^K M_{L_i}$$
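A quick numerical check (with an arbitrary orthonormal pair in $\mathbb{R}^3$) that $A A^\top$ and the sum of outer products agree, and that $A^\top A = I$ for orthonormal columns:

```python
import numpy as np

# an orthonormal pair spanning a 2D subspace of R^3
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])
A = np.column_stack([u1, u2])

M1 = A @ A.T                              # A A^T
M2 = np.outer(u1, u1) + np.outer(u2, u2)  # sum_i |u_i><u_i|
print(np.allclose(M1, M2))                # True
print(np.allclose(A.T @ A, np.eye(2)))    # True: A^T A = I for orthonormal columns
```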

Note that if the sum runs over a complete orthonormal basis of $\mathcal{H}$, we get $M_\mathcal{H} = \sum_i^N \ket{u_i}\bra{u_i} = I$, the completeness relation.
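To close the loop with the QM motivation at the top, here is a toy sketch (a made-up $3\times 3$ Hermitian matrix standing in for the observable) of computing $p(R\in\Delta)=\| M_R(\Delta)\ket{\psi} \|^2$ from the eigenvectors whose eigenvalues fall in $\Delta$:

```python
import numpy as np

# toy 3-level example: a real symmetric (hence Hermitian) matrix standing in for R
R = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
eigvals, eigvecs = np.linalg.eigh(R)  # columns of eigvecs: orthonormal eigenvectors

# Delta: the outcomes we care about, say all eigenvalues below 4
in_delta = eigvals < 4.0
A = eigvecs[:, in_delta]              # eigenvectors spanning the subspace V
M = A @ A.T                           # projector M_R(Delta); real case, so A A^T suffices

psi = np.array([1.0, 1.0, 1.0])
psi = psi / np.linalg.norm(psi)       # normalized state vector

p = np.linalg.norm(M @ psi) ** 2      # p(R in Delta) = || psi_Delta ||^2
print(p)                              # ~0.667 for this made-up example
```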

[1] The whole idea of the $A$ matrix and the calculation of the projection operators is stolen from Khan Academy's online linear algebra lectures.
