The first day of a reading cycle of PRML

We began a reading cycle of "Pattern Recognition and Machine Learning" by Christopher M. Bishop. I have used pattern recognition techniques in my research, but I haven't studied pattern recognition systematically. Through this reading cycle, I want to learn the essentials of pattern recognition.

On the first day of the reading cycle, we read from the beginning of Chapter 1 through section 1.2.

  • 1. Introduction

x denotes the input data, for example the 784 real numbers representing a 28x28 pixel image of a hand-written digit. The target vector t denotes the category of the data, for example the identity of the digit in the automatic recognition of hand-written digits. y(x) denotes the function which takes a new input x and generates an output vector y, expressed in the same form as the target vectors.

Our goal is to determine the precise form of y(x) using the training data.
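A minimal sketch of what "determining y(x) from training data" can look like in practice (my own illustration, not from the book). It uses scikit-learn's bundled 8x8 digit images rather than the 28x28 images mentioned above, and the choice of classifier and its settings are arbitrary:

```python
# A minimal sketch (my own illustration): learning y(x) from training
# pairs (x, t) using scikit-learn's bundled 8x8 digit images.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()  # x: 64 pixel values per image, t: digit label 0-9
x_train, x_test, t_train, t_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# "Determining the form of y(x)" here means fitting the classifier's parameters.
y = LogisticRegression(max_iter=1000).fit(x_train, t_train)

print(y.predict(x_test[:5]))             # outputs in the same form as the targets t
print("accuracy:", y.score(x_test, t_test))
```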

We can classify problems by the kind of training data. If the training data consists of example input vectors together with their corresponding target vectors, the problem is called a supervised learning problem. If the training data consists of a set of input vectors x without any corresponding target vectors, the problem is called an unsupervised learning problem.

Classification problems are supervised learning problems in which each input vector x is assigned to one of a finite number of discrete categories (a discrete target vector t). If the target vector t consists of continuous variables, the problem is called regression. On the other hand, clustering, density estimation, and visualization are unsupervised learning problems.
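To make the distinction concrete, here is a small sketch (my own illustration; the synthetic data and model choices are arbitrary) contrasting regression, which has continuous targets, with clustering, which has no targets at all:

```python
# Regression vs. clustering on synthetic data (my own illustration).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Regression: supervised learning with a continuous target t.
x = rng.uniform(0, 1, size=(50, 1))
t = 2.0 * x[:, 0] + rng.normal(0, 0.1, size=50)
reg = LinearRegression().fit(x, t)
print("estimated slope:", reg.coef_[0])

# Clustering: unsupervised learning, input vectors only, no t.
points = np.vstack([rng.normal(0, 0.5, size=(30, 2)),
                    rng.normal(3, 0.5, size=(30, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print("cluster sizes:", np.bincount(labels))
```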

  • 1.1 Example: Polynomial Curve Fitting

I'll omit it here ;-) though we didn't omit it in the reading cycle.

  • 1.2 Probability Theory

The rules of probability

sum rule:
p(X) = \sum_{Y}{p(X,Y)}
product rule:
p(X,Y) = p(Y|X)p(X)
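A quick numeric check of both rules (my own illustration; the 2x3 joint distribution below is made up):

```python
# Verify the sum rule and product rule on an arbitrary joint distribution p(X, Y).
import numpy as np

p_xy = np.array([[0.10, 0.20, 0.10],    # rows: X = 0, 1
                 [0.25, 0.05, 0.30]])   # cols: Y = 0, 1, 2

# Sum rule: p(X) = sum_Y p(X, Y)
p_x = p_xy.sum(axis=1)
print("p(X):", p_x)

# Product rule: p(X, Y) = p(Y|X) p(X)
p_y_given_x = p_xy / p_x[:, None]
print("p(Y|X) p(X) recovers p(X,Y):",
      np.allclose(p_y_given_x * p_x[:, None], p_xy))
```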

From the product rule, together with the symmetry property p(X,Y) = p(Y,X), we can obtain Bayes' theorem.
\begin{eqnarray}p(X,Y) &=& p(Y,X) \\ p(Y|X)p(X) &=& p(X|Y)p(Y) \\ p(Y|X) &=& \frac{p(X|Y)p(Y)}{p(X)}\end{eqnarray}

From the sum rule and the product rule, the denominator in Bayes' theorem can be viewed as a normalization constant, required to ensure that the conditional probability p(Y|X), summed over all values of Y, equals one.
\begin{eqnarray}p(X) &=& \sum_Y{p(X,Y)} \\ &=& \sum_Y{p(X|Y)p(Y)}\end{eqnarray}
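A tiny numeric illustration of Bayes' theorem and of p(X) acting as the normalization constant (my own example; the prior and likelihood values are made-up numbers):

```python
# Bayes' theorem on a binary Y with made-up numbers (my own illustration).
p_y = [0.4, 0.6]            # prior p(Y) for Y = 0, 1
p_x_given_y = [0.8, 0.3]    # likelihood p(X=1 | Y) for Y = 0, 1

# Denominator via the sum rule: p(X=1) = sum_Y p(X=1|Y) p(Y)
p_x = sum(l * p for l, p in zip(p_x_given_y, p_y))

# Bayes' theorem: p(Y | X=1) = p(X=1 | Y) p(Y) / p(X=1)
posterior = [l * p / p_x for l, p in zip(p_x_given_y, p_y)]
print("p(X=1) =", p_x)
print("posterior p(Y|X=1) =", posterior, "sums to", sum(posterior))
```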

Covariance of two random vectors x and y (E denotes expectation):
\begin{eqnarray}\mathrm{cov}(x,y) &=& E\left( \left( x - E(x) \right) \left( y - E(y) \right)^T\right) \\ &=& E\left( xy^T - xE(y^T) - E(x)y^T + E(x)E(y^T) \right) \\ &=& E(xy^T) - E(x)E(y^T) - E(x)E(y^T) + E(x)E(y^T) \\ &=& E(xy^T) - E(x)E(y^T) \end{eqnarray}
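A small numeric check (my own illustration with arbitrary synthetic samples) that the centered form and the moment form of the covariance matrix agree:

```python
# Check E[(x - E[x])(y - E[y])^T] == E[x y^T] - E[x] E[y]^T on sample means.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=(n, 3))                                  # 3-dimensional x
y = x @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))    # correlated 2-dimensional y

mean_x = x.mean(axis=0)
mean_y = y.mean(axis=0)

cov_centered = (x - mean_x).T @ (y - mean_y) / n
cov_moments = x.T @ y / n - np.outer(mean_x, mean_y)

print(np.allclose(cov_centered, cov_moments))   # True, up to floating point error
```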