On the density estimation


For probability space (\Omega, \mathcal{F}, \mathbb{P}) with A \in \mathcal{F} the indicator random variable

{\bf 1}_A : \Omega \rightarrow \mathbb{R} = \left\{ \begin{array}{cc} 1, & \omega \in A \\ 0, & \omega \notin A \end{array} \right.

Than expected value of the indicator variable is the probability of the event \omega \in A .

E( {\bf 1}_A )= \int_{\Omega} {\bf 1}_A(\omega) d \mathbb{P} = P(A).

Next we are interested in real valued random variable X:\Omega \rightarrow \mathbb{R}. We are now interested to find, assuming absolutely continuous measure \mathbb{P} , the probability density p(X), given a finite set of observations X_i= X(\omega_i),\ i=1..N .

P(A)= E( {\bf 1}_A )= \int_{\Omega} {\bf 1}_A(\omega) d \mathbb{P} = \int_{ \mathbb{R} } {\bf 1}_A (x) p(x) dx = \int_{ \mathbb{R}^2 } {\bf 1}_A (x) p(x) \delta(x-y) {\bf 1}_A (y) dx dy = \int_{ \mathbb{R}^2 } {\bf 1}_A (x) K(x, y) {\bf 1}_A (y) dx dy ,

where \delta(x-y) is Dirac delta function, and K(x, y)= p(x) \delta(x-y).

Next we want another complication – a resolution of identity, e.g. to represent all indicator function in some basis, e.g {\bf 1}_A (x) = \sum_i | i > < i | {\bf 1}_A (x)

We have

P(A)= \sum_{i,k} \int_{ \mathbb{R}^2 } {\bf 1}_A (x) | i > < i | K(x, y) | k > < k | {\bf 1}_A(y) dx dy= \sum_{i,k} \int_{ \mathbb{R}^2 } {\bf 1}_A (x) | i > \rho < k|{\bf 1}_A (y) dx dy ,

where \rho is the density matrix by the analogy with quantum mechanics.

To be concrete we can use the real valued Hermite functions \psi_n(x) as a complete basis set. Here we have \sum \limits_{n=0}^\infty \psi_n(z) \psi_n(z') = \delta( z-z') , so
P(A)= \sum_{i,k} \int_{ \mathbb{R}^2 } {\bf 1}_A (x) | i > \rho < k|{\bf 1}_A (y) dx dy =
\sum_{i,k} \int_{ \mathbb{R}^4 } {\bf 1}_A (x) \psi_i( x ) \psi_i( z_1 ) p(z_1) \delta( z_1 - z_2 ) \psi_k( z_2) \psi_k( y ) {\bf 1}_A (y) dx dy dz_1 dz_2 .

The density matrix elements \rho_{i,k}= \int_{ \mathbb{R}^2 } \psi_i( z_1 ) p(z_1) \delta( z_1 - z_2 ) \psi_k( z_2) dz_1 dz_2 = \int_{ \mathbb{R} } \psi_i( z ) p(z) \psi_k( z) dz. The density p(x)= \sum_{i,k} \psi_i( x ) \rho_{i,k} \psi_k( x ).

In order to estimate density one need to use samples to estimate density matrix, e.g. replace the integral with sample mean \hat {\rho}_{i,k} = \frac{1}{N} \psi_i( X_i ) \psi_k( X_i ) . In terms of asymptotic behavior, each element of the density matrix will converge to the true value (asymptotically unbiased), and the variance of the expected value decay as \frac{1}{N} . Think of it as a Monte-Carlo integration.

Comments.

  1. This seems (since no theorems are proven here) to be more efficient than the kernel method of density estimation.
  2. The truncated version of Hermite functions will lead to systematic bias. Since Hermite functions are the basis function of continuous Fourier transform, the bias will consist of truncating the bandwidth of probability density function.
  3. The complexity is in the complexity of matrix computation, e.g. N*M^2, where M is the number of Hermite functions used, compared to just N for kernel method.
  4. The basis set does not have to be real, and if complex resemble quantum mechanical description of reality, where \psi_k(x) represent an experimental setup, and p(x) “classical” probabilities.
  5. Depending on the M/N ratio one will get either smooth or sharp approximation of the density.
  6. The mothod allow to compare different distributions on the same basis set.
  7. There are too many details need to be filled in to become publication quality. This will take unreasonable long time for me, so if anyone sees rational grain in it and can use as the basis, please go ahead.
  8. The method works in practice. It has some additional information, not mentioned here.
Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s