Aapo Hyvärinen suggested a method for estimating non-normalized statistical models by means of score matching. There is an alternative way to derive his objective function.
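We start with a probability density function that is absolutely continuous and positive on its domain $\Omega$,
$$p(x) = \frac{e^{-U(x)}}{Z},$$
where $Z$ is the partition function and $U(x)$ is a potential. Then, for any sufficiently well-behaved vector-valued probing function $g(x)$ with $g(x)\,p(x) = 0$ on the boundary of the domain ($\partial\Omega$), the expected value of $\nabla \cdot g$ is, by integration by parts,
$$\mathbb{E}\left[\nabla \cdot g\right] = \int_\Omega p \,\nabla \cdot g \, dx = -\int_\Omega g \cdot \nabla p \, dx = \mathbb{E}\left[g \cdot \nabla U\right],$$
where the boundary term vanishes and $\nabla p = -p\,\nabla U$.
In other words, for any sufficiently well-behaved probability density function and any sufficiently well-behaved probing function, the following identity holds:
$$\mathbb{E}\left[\nabla \cdot g - g \cdot \nabla U\right] = 0. \qquad (1)$$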
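Identity (1) is easy to check numerically. Below is a minimal sketch, assuming a one-dimensional standard normal density (so $U(x) = x^2/2$) and the probing function $g(x) = \sin x$. Both choices, as well as the helper name `identity_residual`, are mine for illustration only.

```python
import numpy as np


def identity_residual(samples, g, div_g, grad_U):
    """Sample mean of div g(x) - g(x) * grad U(x) for one-dimensional x."""
    return np.mean(div_g(samples) - g(samples) * grad_U(samples))


rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # samples from p(x) proportional to exp(-x^2 / 2)

residual = identity_residual(
    x,
    g=np.sin,            # assumed probing function; sin(x) * p(x) -> 0 at infinity
    div_g=np.cos,        # derivative of the probing function
    grad_U=lambda t: t,  # dU/dx for U(x) = x^2 / 2
)
print(f"sample-mean residual of identity (1): {residual:.4f}")
```

The residual should shrink towards zero as the sample grows, at the usual $1/\sqrt{N}$ Monte Carlo rate.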
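Now choose a set of probing functions. For a potential $U(x;\theta)$ with parameters $\theta_k$, one choice that works is $g_k = \nabla_x \, \partial U / \partial \theta_k$, one probing function per parameter. Imposing identity (1) for all of these probing functions gives exactly the stationarity conditions $\partial J / \partial \theta_k = 0$ of
$$J(\theta) = \mathbb{E}\left[\tfrac{1}{2}\left|\nabla_x U\right|^2 - \Delta_x U\right],$$
i.e. Hyvärinen's cost function. In other words, my interpretation of score matching is that it is a particular way to impose identity (1) on the distribution, given a set of data points, by replacing the expectation with a sample mean.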
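To make this interpretation concrete, here is a small sketch for a zero-mean Gaussian with unknown precision $\lambda$, i.e. $U(x) = \lambda x^2 / 2$. I assume Hyvärinen's objective in the form $\mathbb{E}[\tfrac{1}{2}(U')^2 - U'']$ and the probing function $g(x) = x$, for which the sample-mean version of identity (1) reads $\frac{1}{N}\sum_i (1 - \lambda x_i^2) = 0$. The variable names and the grid search are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
true_precision = 4.0
x = rng.normal(0.0, 1.0 / np.sqrt(true_precision), size=100_000)


def cost(lam, samples):
    """Sample mean of 0.5 * (dU/dx)^2 - d2U/dx2 for U(x) = lam * x^2 / 2."""
    return np.mean(0.5 * (lam * samples) ** 2 - lam)


# Identity (1) with g(x) = x, imposed via the sample mean, solved for lam.
lam_identity = 1.0 / np.mean(x ** 2)

# Brute-force minimisation of the score matching cost over a grid.
grid = np.linspace(0.5, 8.0, 1501)
lam_grid = grid[np.argmin([cost(lam, x) for lam in grid])]

print(f"from identity (1): {lam_identity:.3f}, from the cost: {lam_grid:.3f}")
```

Up to the grid resolution, both numbers coincide: minimising the cost and imposing the identity are the same estimator here.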
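Now consider the exponential family of distributions, i.e.
$$p(x;\theta) = \frac{1}{Z(\theta)} \exp\Big(\sum_k \theta_k f_k(x)\Big),$$
so that the potential is $U(x;\theta) = -\sum_k \theta_k f_k(x)$. Then the cost function reads
$$J(\theta) = \tfrac{1}{2}\,\theta^T A\,\theta + b^T \theta, \qquad A_{kl} = \mathbb{E}\left[\nabla f_k \cdot \nabla f_l\right], \quad b_k = \mathbb{E}\left[\Delta f_k\right],$$
and it is minimized by $\theta = -A^{-1} b$.
That will fail miserably when $A$ is ill-conditioned, for any realistic sample size. Moreover, if $A$ is singular the problem is ill-defined, and one needs some kind of regularization to obtain $\theta$, which biases the estimated distribution.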
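A minimal sketch of this linear problem, assuming one-dimensional data, the features $f = (x, x^2)$ (a Gaussian written as an exponential family), and plain ridge regularization $A + \epsilon I$ as the "some kind of regularization". The function name `score_matching_fit` and the value of `ridge` are my own choices for illustration, not part of Hyvärinen's method.

```python
import numpy as np


def score_matching_fit(x, grad_f, lap_f, ridge=0.0):
    """Solve (A + ridge * I) theta = -b for p(x) proportional to exp(theta . f(x)).

    grad_f(x) and lap_f(x) return arrays of shape (n_samples, n_features)
    with the first and second derivatives of the features (1-D x only).
    """
    G = grad_f(x)
    A = G.T @ G / len(x)          # A_kl = sample mean of f_k' * f_l'
    b = lap_f(x).mean(axis=0)     # b_k  = sample mean of f_k''
    theta = np.linalg.solve(A + ridge * np.eye(len(b)), -b)
    return theta, np.linalg.cond(A)


def grad_f(t):
    # Derivatives of the assumed features f = (x, x^2).
    return np.stack([np.ones_like(t), 2.0 * t], axis=1)


def lap_f(t):
    # Second derivatives of the assumed features f = (x, x^2).
    return np.stack([np.zeros_like(t), 2.0 * np.ones_like(t)], axis=1)


rng = np.random.default_rng(2)
x = rng.normal(1.0, 0.5, size=50_000)  # mean 1, variance 0.25

theta, cond_A = score_matching_fit(x, grad_f, lap_f)
theta_ridge, _ = score_matching_fit(x, grad_f, lap_f, ridge=0.1)

print("unregularized theta:", theta)        # exact values: mu/s^2 = 4, -1/(2 s^2) = -2
print("ridge theta:        ", theta_ridge)  # pulled towards zero, i.e. biased
print("condition number of A:", cond_A)
```

Even in this well-behaved two-feature case a modest ridge term shifts the estimate noticeably, and the condition number of $A$ grows quickly as more (or more correlated) features are added.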