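For a probability space $(\Omega, \mathcal{F}, P)$ and an event $A \in \mathcal{F}$, consider the indicator random variable $1_A(\omega)$, which equals 1 if $\omega \in A$ and 0 otherwise.
Then the expected value of the indicator variable is the probability of the event: $E[1_A] = P(A)$.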
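As a quick numerical illustration of the statement above, here is a minimal sketch. The choice of a standard normal variable and the event $\{X > 1\}$ is an assumption made only for this example; the names `p_hat` and `p_exact` are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: X ~ N(0, 1) and the event A = {X > 1}.
x = rng.standard_normal(200_000)

# The indicator 1_A evaluated on every sample (1.0 inside A, 0.0 outside).
indicator = (x > 1.0).astype(float)

# The sample mean of the indicator estimates E[1_A] = P(A).
p_hat = indicator.mean()

# Exact value for comparison: P(X > 1) = erfc(1 / sqrt(2)) / 2.
p_exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))

print(p_hat, p_exact)
```

The two printed numbers should agree up to Monte Carlo noise, which shrinks as the number of samples grows.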
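Next, we are interested in a real-valued random variable $X$. Assuming an absolutely continuous measure, we want to find the probability density $p(x)$ given a finite set of observations $x_1, \dots, x_N$:
$$p(x) = E[\delta(x - X)] \approx \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i),$$
where $\delta$ is the Dirac delta function and $i = 1, \dots, N$ indexes the observations.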
Next we add one more complication: a resolution of the identity, i.e. representing all indicator functions in some basis, e.g.
where $\rho$ is the density matrix, by analogy with quantum mechanics.
To be concrete, we can use the real-valued Hermite functions as a complete basis set. Here we have , so
The density matrix elements . The density .
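The Hermite basis mentioned above can be generated with the standard three-term recurrence for normalized Hermite functions. The following is a minimal sketch, not a reference implementation; the helper name `hermite_functions` and the grid limits are assumptions made for illustration. It also checks orthonormality numerically.

```python
import numpy as np

def hermite_functions(x, m):
    """Return an (m, len(x)) array holding phi_0 .. phi_{m-1} evaluated at x.

    Uses the recurrence for normalized Hermite functions:
      phi_0(x)     = pi**(-1/4) * exp(-x**2 / 2)
      phi_1(x)     = sqrt(2) * x * phi_0(x)
      phi_{n+1}(x) = sqrt(2/(n+1)) * x * phi_n(x) - sqrt(n/(n+1)) * phi_{n-1}(x)
    """
    x = np.asarray(x, dtype=float)
    phi = np.empty((m, x.size))
    phi[0] = np.pi ** -0.25 * np.exp(-0.5 * x**2)
    if m > 1:
        phi[1] = np.sqrt(2.0) * x * phi[0]
    for n in range(1, m - 1):
        phi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * phi[n]
                      - np.sqrt(n / (n + 1)) * phi[n - 1])
    return phi

# Orthonormality check on an assumed grid wide enough for the first 8 functions.
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
phi = hermite_functions(x, 8)

# Riemann-sum approximation of the Gram matrix  integral phi_m(x) phi_n(x) dx.
gram = phi @ phi.T * dx
print(np.abs(gram - np.eye(8)).max())
```

The recurrence works directly with the normalized functions, which avoids the overflow that comes from evaluating high-order Hermite polynomials and normalizing afterwards.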
To estimate the density, one needs to use the samples to estimate the density matrix, i.e. replace the integral with the sample mean over the observations. Asymptotically, each element of the density matrix converges to its true value (asymptotically unbiased), and the variance of the estimate decays as $1/N$, where $N$ is the number of samples. Think of it as Monte Carlo integration.
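The Monte Carlo step described above can be sketched as follows. This is only one possible reading: it assumes that the matrix elements are expectations of products of basis functions, $\rho_{mn} = E[\phi_m(X)\phi_n(X)]$, so that the sample mean is $\frac{1}{N}\sum_i \phi_m(x_i)\phi_n(x_i)$. The helper names `hermite_functions` and `density_matrix`, the standard normal test data, and the basis size of 4 are all hypothetical choices for illustration.

```python
import numpy as np

def hermite_functions(x, m):
    """Normalized Hermite functions phi_0 .. phi_{m-1} at x, shape (m, len(x))."""
    x = np.asarray(x, dtype=float)
    phi = np.empty((m, x.size))
    phi[0] = np.pi ** -0.25 * np.exp(-0.5 * x**2)
    if m > 1:
        phi[1] = np.sqrt(2.0) * x * phi[0]
    for n in range(1, m - 1):
        phi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * phi[n]
                      - np.sqrt(n / (n + 1)) * phi[n - 1])
    return phi

def density_matrix(samples, m):
    """Sample-mean estimate of rho_mn = E[phi_m(X) phi_n(X)] (assumed definition)."""
    phi = hermite_functions(samples, m)      # shape (m, N)
    return phi @ phi.T / phi.shape[1]        # average over the N samples

rng = np.random.default_rng(1)
rho_small = density_matrix(rng.standard_normal(1_000), 4)
rho_large = density_matrix(rng.standard_normal(100_000), 4)

# Reference values by quadrature against the known N(0, 1) density.
grid = np.linspace(-12.0, 12.0, 4001)
p = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
phi_g = hermite_functions(grid, 4)
rho_exact = (phi_g * p) @ phi_g.T * (grid[1] - grid[0])

err_small = np.abs(rho_small - rho_exact).max()
err_large = np.abs(rho_large - rho_exact).max()
print(err_small, err_large)
```

With 100 times more samples the error should drop by roughly a factor of 10, consistent with a variance that decays as $1/N$.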
- This seems (since no theorems are proven here) to be more efficient than the kernel method of density estimation.
- Truncating the Hermite basis introduces a systematic bias. Since the Hermite functions are eigenfunctions of the continuous Fourier transform, the bias amounts to truncating the bandwidth of the probability density function.
- The cost is dominated by the matrix computation, e.g. , where is the number of Hermite functions used, compared to just for the kernel method.
- The basis set does not have to be real. With a complex basis, the construction resembles the quantum mechanical description of reality, where represents an experimental setup and the “classical” probabilities.
- Depending on the ratio , one gets either a smooth or a sharp approximation of the density.
- The method allows comparing different distributions on the same basis set.
- Too many details still need to be filled in for this to reach publication quality. That would take me an unreasonably long time, so if anyone sees a grain of reason in it and can use it as a basis, please go ahead.
- The method works in practice. There are some additional details not mentioned here.
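The remark in the notes above about Hermite functions and the continuous Fourier transform can be checked numerically. The sketch below assumes the unitary convention $\hat f(k) = \frac{1}{\sqrt{2\pi}} \int f(x) e^{-ikx}\,dx$, under which $\phi_n$ is mapped to $(-i)^n \phi_n$. The helper name `hermite_functions` and the grids are hypothetical choices for illustration.

```python
import numpy as np

def hermite_functions(x, m):
    """Normalized Hermite functions phi_0 .. phi_{m-1} at x, shape (m, len(x))."""
    x = np.asarray(x, dtype=float)
    phi = np.empty((m, x.size))
    phi[0] = np.pi ** -0.25 * np.exp(-0.5 * x**2)
    if m > 1:
        phi[1] = np.sqrt(2.0) * x * phi[0]
    for n in range(1, m - 1):
        phi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * phi[n]
                      - np.sqrt(n / (n + 1)) * phi[n - 1])
    return phi

x = np.linspace(-12.0, 12.0, 2401)   # integration grid
dx = x[1] - x[0]
k = np.linspace(-3.0, 3.0, 13)       # frequencies at which to test

phi_x = hermite_functions(x, 6)      # shape (6, len(x))
phi_k = hermite_functions(k, 6)      # shape (6, len(k))

# Riemann-sum Fourier transform with the unitary 1/sqrt(2*pi) convention.
kernel = np.exp(-1j * np.outer(k, x)) / np.sqrt(2.0 * np.pi)   # (len(k), len(x))
ft = phi_x @ kernel.T * dx                                      # (6, len(k))

# Expected result: phi_n is an eigenfunction with eigenvalue (-i)**n.
expected = ((-1j) ** np.arange(6))[:, None] * phi_k
max_dev = np.abs(ft - expected).max()
print(max_dev)
```

Because each basis function is reproduced by the transform up to a phase, cutting the basis at a finite order limits the extent of the representation in both position and frequency in the same way.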