#lang scribble/manual @(require scribble/eval racket/sandbox (for-label racket/base racket/promise racket/list math plot (only-in typed/racket/base Flonum Real Boolean Any Listof Integer case-> -> U Sequenceof Positive-Flonum Nonnegative-Flonum Nonnegative-Real)) "utils.rkt") @(define typed-eval (make-math-eval)) @interaction-eval[#:eval typed-eval (require)] @title[#:tag "stats"]{Statistics Functions} @(author-neil) @defmodule[math/statistics] This module exports functions that compute summary values for collections of data, or @deftech{statistics}, such as means, standard devations, medians, and @italic{k}th order statistics. It also exports functions for managing collections of sample values. Most of the functions that compute statistics also accept a sequence of nonnegative reals that correspond one-to-one with the data values. These are used as weights; equivalently counts, pseudocounts or proportions. While this makes it easy to work with weighted samples, it introduces some subtleties in bias correction. In particular, central moments must be computed without bias correction by default. See @secref{stats:expected-values} for a discussion. @local-table-of-contents[] @section{Counting} @defthing[samples->hash Any]{ This stub represents forthcoming documentation. } @defthing[count-samples Any]{ This stub represents forthcoming documentation. } @section[#:tag "stats:expected-values"]{Expected Values} Functions documented in this section that compute higher central moments, such as @racket[variance], @racket[stddev] and @racket[skewness], can optionally apply bias correction to their estimates. For example, when @racket[variance] is given the argument @racket[#:bias #t], it multiplies the result by @racket[(/ n (- n 1))], where @racket[n] is the number of samples. The meaning of ``bias correction'' becomes less clear with weighted samples, however. Often, the weights represent counts, so when moment-estimating functions receive @racket[#:bias #t], they interpret it as ``use the sum of @racket[ws] for @racket[n].'' In the following example, the sample @racket[4] is first counted twice and then given weight @racket[2]; therefore @racket[n = 5] in both cases: @interaction[#:eval typed-eval (variance '(1 2 3 4 4) #:bias #t) (variance '(1 2 3 4) '(1 1 1 2) #:bias #t)] However, sample weights often do not represent counts. For these cases, the @racket[#:bias] keyword can be followed by a real-valued pseudocount, which is used for @racket[n]: @interaction[#:eval typed-eval (variance '(1 2 3 4) '(1/2 1/2 1/2 1) #:bias 5)] Because the magnitude of the bias correction for weighted samples cannot be known without user guidance, in all cases, the bias argument defaults to @racket[#f]. @defproc[(mean [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f]) Real]{ When @racket[ws] is @racket[#f] (the default), returns the sample mean of the values in @racket[xs]. Otherwise, returns the weighted sample mean of the values in @racket[xs] with corresponding weights @racket[ws]. @examples[#:eval typed-eval (mean '(1 2 3 4 5)) (mean '(1 2 3 4 5) '(1 1 1 1 10.0)) (define d (normal-dist)) (mean (sample d 10000)) (define arr (array-strict (build-array #(5 1000) (λ (_) (sample d))))) (array-map mean (array->list-array arr 1))] } @deftogether[(@defproc[(variance [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Nonnegative-Real] @defproc[(stddev [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Nonnegative-Real] @defproc[(skewness [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Real] @defproc[(kurtosis [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Nonnegative-Real])]{ If @racket[ws] is @racket[#f], these compute the sample variance, standard deviation, skewness and excess kurtosis the samples in @racket[xs]. If @racket[ws] is not @racket[#f], they compute weighted variations of the same. @examples[#:eval typed-eval (stddev '(1 2 3 4 5)) (stddev '(1 2 3 4 5) '(1 1 1 1 10))] See @secref{stats:expected-values} for the meaning of the @racket[bias] keyword argument. } @deftogether[(@defproc[(variance/mean [mean Real] [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Nonnegative-Real] @defproc[(stddev/mean [mean Real] [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Nonnegative-Real] @defproc[(skewness/mean [mean Real] [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Real] @defproc[(kurtosis/mean [mean Real] [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f] [#:bias bias (U #t #f Real) #f]) Nonnegative-Real])]{ Like @racket[variance], @racket[stddev], @racket[skewness] and @racket[kurtosis], but computed using known mean @racket[mean]. } @section[#:tag "stats:running"]{Running Expected Values} The @racket[statistics] object allows computing the sample minimum, maximum, count, mean, variance, skewness, and excess kurtosis of any number of samples in O(1) space. To use it, start with @racket[empty-statistics], then use @racket[update-statistics] to obtain a new statistics object with updated values. Use @racket[statistics-min], @racket[statistics-mean], and similar functions to get the current estimates. @examples[#:eval typed-eval (let* ([s empty-statistics] [s (update-statistics s 1)] [s (update-statistics s 2)] [s (update-statistics s 3)] [s (update-statistics s 4 2)]) (values (statistics-mean s) (statistics-stddev s #:bias #t)))] @defstruct*[statistics ([min Flonum] [max Flonum] [count Nonnegative-Flonum])]{ Represents running statistics. The @racket[min] and @racket[max] fields are the minimum and maximum value observed so far, and the @racket[count] field is the total weight of the samples (which is the number of samples if all samples are unweighted). The remaining, hidden fields are used to compute moments, and their number and meaning may change in future releases. } @defthing[empty-statistics statistics]{ The empty statistics object. @examples[#:eval typed-eval (statistics-min empty-statistics) (statistics-max empty-statistics) (statistics-range empty-statistics) (statistics-count empty-statistics) (statistics-mean empty-statistics) (statistics-variance empty-statistics) (statistics-skewness empty-statistics) (statistics-kurtosis empty-statistics)] } @defproc[(update-statistics [s statistics] [x Real] [w Real 1.0]) statistics]{ Returns a new statistics object that includes @racket[x] in the computed statistics. If @racket[w] is given, @racket[x] is weighted by @racket[w] in the moment computations. } @defproc[(update-statistics* [s statistics] [xs (Sequenceof Real)] [ws (U #f (Sequenceof Real)) #f]) statistics]{ Like @racket[update-statistics], but includes all of @racket[xs], possibly weighted by corresponding elements in @racket[ws], in the returned statistics object. @examples[#:eval typed-eval (define s (update-statistics* empty-statistics '(1 2 3 4) '(1 1 1 2))) (statistics-mean s) (statistics-stddev s #:bias #t)] } @deftogether[(@defproc[(statistics-range [s statistics]) Nonnegative-Flonum] @defproc[(statistics-mean [s statistics]) Flonum] @defproc[(statistics-variance [s statistics] [#:bias bias (U #t #f Real) #f]) Nonnegative-Flonum] @defproc[(statistics-stddev [s statistics] [#:bias bias (U #t #f Real) #f]) Nonnegative-Flonum] @defproc[(statistics-skewness [s statistics] [#:bias bias (U #t #f Real) #f]) Flonum] @defproc[(statistics-kurtosis [s statistics] [#:bias bias (U #t #f Real) #f]) Nonnegative-Flonum])]{ Compute the range, mean, variance, standard deviation, skewness, and excess kurtosis of the observations summarized in @racket[s]. See @secref{stats:expected-values} for the meaning of the @racket[bias] keyword argument. } @section{Correlation} @section{Order Statistics} @(close-eval typed-eval)