## On the density estimation

For probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with $A \in \mathcal{F}$ the indicator random variable

${\bf 1}_A : \Omega \rightarrow \mathbb{R} = \left\{ \begin{array}{cc} 1, & \omega \in A \\ 0, & \omega \notin A \end{array} \right.$

Than expected value of the indicator variable is the probability of the event $\omega \in A$.

$E( {\bf 1}_A )= \int_{\Omega} {\bf 1}_A(\omega) d \mathbb{P} = P(A)$.

Next we are interested in real valued random variable $X:\Omega \rightarrow \mathbb{R}$. We are now interested to find, assuming absolutely continuous measure $\mathbb{P}$, the probability density $p(X)$, given a finite set of observations $X_i= X(\omega_i),\ i=1..N$.

$P(A)= E( {\bf 1}_A )= \int_{\Omega} {\bf 1}_A(\omega) d \mathbb{P} =$ $\int_{ \mathbb{R} } {\bf 1}_A (x) p(x) dx = \int_{ \mathbb{R}^2 } {\bf 1}_A (x) p(x) \delta(x-y) {\bf 1}_A (y) dx dy = \int_{ \mathbb{R}^2 } {\bf 1}_A (x) K(x, y) {\bf 1}_A (y) dx dy$,

where $\delta(x-y)$ is Dirac delta function, and $K(x, y)= p(x) \delta(x-y)$.

Next we want another complication – a resolution of identity, e.g. to represent all indicator function in some basis, e.g ${\bf 1}_A (x) = \sum_i | i > < i | {\bf 1}_A (x)$

We have

$P(A)= \sum_{i,k} \int_{ \mathbb{R}^2 } {\bf 1}_A (x) | i >$ $< i | K(x, y) | k >$ $< k |$ ${\bf 1}_A(y) dx dy=$ $\sum_{i,k} \int_{ \mathbb{R}^2 } {\bf 1}_A (x) | i > \rho < k|{\bf 1}_A (y) dx dy$,

where $\rho$ is the density matrix by the analogy with quantum mechanics.

To be concrete we can use the real valued Hermite functions $\psi_n(x)$ as a complete basis set. Here we have $\sum \limits_{n=0}^\infty \psi_n(z) \psi_n(z') = \delta( z-z')$, so
$P(A)= \sum_{i,k} \int_{ \mathbb{R}^2 } {\bf 1}_A (x) | i > \rho < k|{\bf 1}_A (y) dx dy =$
$\sum_{i,k} \int_{ \mathbb{R}^4 } {\bf 1}_A (x) \psi_i( x ) \psi_i( z_1 ) p(z_1) \delta( z_1 - z_2 ) \psi_k( z_2) \psi_k( y ) {\bf 1}_A (y) dx dy dz_1 dz_2$.

The density matrix elements $\rho_{i,k}= \int_{ \mathbb{R}^2 } \psi_i( z_1 ) p(z_1) \delta( z_1 - z_2 ) \psi_k( z_2) dz_1 dz_2 = \int_{ \mathbb{R} } \psi_i( z ) p(z) \psi_k( z) dz$. The density $p(x)= \sum_{i,k} \psi_i( x ) \rho_{i,k} \psi_k( x )$.

In order to estimate density one need to use samples to estimate density matrix, e.g. replace the integral with sample mean $\hat {\rho}_{i,k} = \frac{1}{N} \psi_i( X_i ) \psi_k( X_i )$. In terms of asymptotic behavior, each element of the density matrix will converge to the true value (asymptotically unbiased), and the variance of the expected value decay as $\frac{1}{N}$. Think of it as a Monte-Carlo integration.

3. The complexity is in the complexity of matrix computation, e.g. $N*M^2$, where $M$ is the number of Hermite functions used, compared to just $N$ for kernel method.
4. The basis set does not have to be real, and if complex resemble quantum mechanical description of reality, where $\psi_k(x)$ represent an experimental setup, and $p(x)$ “classical” probabilities.
5. Depending on the $M/N$ ratio one will get either smooth or sharp approximation of the density.