Sanitized `density' bandwidth calculation for very dense data sets

Now, for example, (plot (density '(0))) works.

Calculated bandwidth is now bounded below by both 1e-308, under
which `kde' produces nonsense, and 1e-14 * the maximum absolute
value of the rational samples. The latter bound ensures the
bandwidth is wide enough to make a smooth-looking curve even in
the presence of floating-point rounding in the domain, by ensuring
that at least 100 floating-point numbers or so in the domain get
nonzero density.
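
The two lower bounds can be seen in isolation with a minimal sketch.
`bounded-bandwidth' is a hypothetical helper, not part of `plot'; it
takes an already-computed rule-of-thumb bandwidth `h0' as an argument
and assumes the data contain at least one rational sample.

```racket
#lang racket/base

;; Hypothetical helper (not part of plot): clamp a rule-of-thumb
;; bandwidth h0 from below, mirroring the two bounds described above.
;; Assumes xs contains at least one rational (finite) sample.
(define (bounded-bandwidth xs h0)
  (max 1e-308                                                 ; absolute floor
       (* 1e-14 (apply max (map abs (filter rational? xs))))  ; relative floor
       h0))                                                   ; original estimate

(bounded-bandwidth '(0) 0.0)        ; only the absolute floor is positive
(bounded-bandwidth '(1e6 1e6) 0.0)  ; the relative floor wins
(bounded-bandwidth '(1 2 3 4) 0.8)  ; the estimate is already wide enough
```

With a single sample at 0, both the relative floor and the estimate
are zero, so the absolute floor is what keeps the bandwidth positive.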

It's a little weird to use the gap between floating-point numbers
for this, but it ensures density estimates aren't jagged because
of rounding (at least until you zoom in, in some instances), and
it's at least a decent method of estimating bandwidth for single-
sample density estimators.
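
As a rough check on the floating-point gap argument, the sketch below
counts how many gaps between adjacent flonums fit into the relative
bound 1e-14 * |x|. `gaps-per-bandwidth' is a hypothetical name; the
sketch uses `flulp' from `math/flonum' as the gap size at x, which is
only an approximation of the spacing around x.

```racket
#lang racket/base
(require math/flonum)

;; Hypothetical helper: approximate number of adjacent-flonum gaps
;; spanned by the relative bound 1e-14 * |x|, using the ulp at x
;; as the gap size. x must be a nonzero flonum.
(define (gaps-per-bandwidth x)
  (/ (* 1e-14 (abs x)) (flulp x)))

(gaps-per-bandwidth 1.0)  ; low end of a binade: about 45
(gaps-per-bandwidth 1.9)  ; near the top of the same binade: about 86
(gaps-per-bandwidth 1e6)
```

Assuming the kernel covers at least one bandwidth on each side of a
sample, that is on the order of 100 floating-point numbers receiving
nonzero density.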
Neil Toronto 2014-02-14 16:05:47 -07:00
parent 9a6cdf420b
commit b92b6c8337

@@ -160,7 +160,9 @@
         [ws (if ws (sequence->list ws) #f)])
     (define n (length xs))
     (define sd (stddev xs ws))
-    (define h (* bw-adjust 1.06 sd (expt n -0.2)))
+    (define h (max 1e-308
+                  (* 1e-14 (apply max (map abs (filter rational? xs))))
+                  (* bw-adjust 1.06 sd (expt n -0.2))))
     (define-values (f fx-min fx-max) (kde xs h ws))
     (let ([x-min (if x-min x-min fx-min)]
           [x-max (if x-max x-max fx-max)])