An article by Vincent Granville posted to Hadoop360 introduces a formal method to generalize the notion of variance based on L^p norms. Whereas the formal generalization suggested in the article did meet several desired criteria, it left others unmet. In particular, there was no formal connection between the generalized variance and an associated generalized mean, and there was no guidance for determining whether the generalized mean or variance exists for a given distribution. Leaving aside the motivation for stable, efficient computation, I wanted to find out whether a formulation existed that would meet all the desired criteria and provide robust estimates of both location and scale.
As it happens, such a generalization has recently been published in the open-access journal Entropy by George Livadiotis of the Southwest Research Institute. In his article he introduces a generalized expectation value and an associated variance based on L^p norms. His construct differs from the more familiar Kolmogorov-Nagumo means, which include the geometric and harmonic means. For those, one applies a strictly monotone function to each data point, takes the arithmetic mean of the results, and then applies the inverse function to that mean. Livadiotis adopts the term “quasiarithmetic” for this type of mean to distinguish it from his own construct.
To summarize his construct concisely, we define the L^p mean m_p as the solution to the following balance condition:
Sum_k |y_k – m_p|^(p – 1) sgn(y_k – m_p) = 0
For p = 1, the condition reduces to Sum_k sgn(y_k – m_p) = 0, a “mass balance” equation: each data point is a unit mass placed on the left (–) or right (+) pan of a balance, so m_p is the median of y.
For p = 2, the condition reduces to Sum_k (y_k – m_p) = 0, a “torque balance” equation: each data point is a unit mass placed on a lever at position y_k, that is, at a distance |y_k – m_p| to the left (–) or right (+) of the fulcrum at m_p. The fulcrum position that balances the lever is the arithmetic mean of y.
For p between 1 and 2, this is the only construct I’m aware of that uses L^p norms to make a smooth transition between the median and the arithmetic mean while preserving all of the desirable properties of a location parameter. In his paper Livadiotis proves that, where it exists, the L^p mean is unbiased. The L^p mean exists provided the (p–1)-norm of y over its distribution is finite.
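The balance condition has no closed-form solution for general p, but for p ≥ 1 its left-hand side is a nonincreasing function of m_p that is nonnegative at min(y) and nonpositive at max(y), so plain bisection will find a root. The Python sketch below assumes that approach (bisection is my choice of solver, not something specified in the paper), and the helper names balance and lp_mean are hypothetical. On a small sample with one outlier it shows m_p moving from the median (p = 1) toward the arithmetic mean (p = 2).

```python
def balance(y, m, p):
    """Left-hand side of the balance condition: Sum_k |y_k - m|^(p-1) sgn(y_k - m)."""
    total = 0.0
    for yk in y:
        d = yk - m
        sign = (d > 0) - (d < 0)  # sgn(d), with sgn(0) = 0
        total += abs(d) ** (p - 1) * sign
    return total


def lp_mean(y, p, iters=200):
    """Hypothetical helper: solve the balance condition for m_p by bisection (p >= 1)."""
    if p < 1:
        raise ValueError("this sketch assumes p >= 1")
    lo, hi = min(y), max(y)  # balance(y, lo, p) >= 0 >= balance(y, hi, p)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if balance(y, mid, p) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 100.0]  # median 3, arithmetic mean 22
    for p in (1.0, 1.25, 1.5, 1.75, 2.0):
        print(f"p = {p:4.2f}  m_p = {lp_mean(y, p):.4f}")
```

Each bisection step halves the bracket, so 200 iterations is more than double precision needs. A derivative-based solver such as Newton's method would converge faster for p > 1, but it needs care near data points, where the derivative of the balance function, proportional to |y_k – m_p|^(p – 2), is unbounded for p < 2.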
Livadiotis defines the associated L^p variance V_p as a quantity proportional to the sum of p-deviations, Sum_k |y_k – m_p|^p, so that minimizing it with respect to m_p yields, as desired, the balance equation given above. His actual construction, in parallel to the standard variance, is:
V_p = E[(Y – E(Y))L_p(Y – E(Y))],
where E() denotes the expectation value, Y denotes the random variable generating y and L_p is an L^p norm operator defined so that m_p = E[L_p(y)]. This construct leads to:
V_p = [Sum_k |y_k – m_p|^p] / [(p – 1) Sum_k |y_k – m_p|^(p – 2)]
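The variance formula can be evaluated directly once m_p is known. The Python sketch below uses hypothetical function names, includes its own bisection solver for the balance condition so it runs on its own, and evaluates the expression exactly as written. It is therefore restricted to p > 1, where the factor (p – 1) in the denominator is nonzero, and it rejects inputs where a data point coincides with m_p while p < 2, since |y_k – m_p|^(p – 2) is then undefined. For p = 2 it should reduce to the population variance (dividing by N).

```python
def lp_mean(y, p, iters=200):
    """Solve Sum_k |y_k - m|^(p-1) sgn(y_k - m) = 0 for m by bisection (p >= 1)."""
    def balance(m):
        # (yk > m) - (yk < m) is sgn(yk - m) as an int
        return sum(abs(yk - m) ** (p - 1) * ((yk > m) - (yk < m)) for yk in y)

    lo, hi = min(y), max(y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lp_variance(y, p):
    """Return (m_p, V_p) with V_p = Sum|d|^p / ((p - 1) * Sum|d|^(p - 2)), d = y_k - m_p."""
    if p <= 1:
        raise ValueError("formula as written needs p > 1 (the (p - 1) factor vanishes at p = 1)")
    m = lp_mean(y, p)
    dev = [abs(yk - m) for yk in y]
    if p < 2 and any(d == 0.0 for d in dev):
        raise ValueError("|y_k - m_p|^(p - 2) is undefined when y_k == m_p and p < 2")
    num = sum(d ** p for d in dev)
    den = (p - 1) * sum(d ** (p - 2) for d in dev)
    return m, num / den


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 100.0]
    for p in (1.25, 1.5, 2.0):
        m, v = lp_variance(y, p)
        print(f"p = {p:4.2f}  m_p = {m:.4f}  V_p = {v:.4f}")
```

For this sample, p = 2 gives m_p = 22 and V_p = 7610 / 5 = 1522, the population variance.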
For p = 1, V_p is the mean absolute deviation (MAD) of y; for p = 2, it is the standard Fisher variance of y. Livadiotis further shows that, with this form of the variance, the N-point L^p mean estimator converges to the true L^p mean with a variance that decreases as 1/N regardless of p, thus preserving a generalized form of the central limit theorem. The existence of the L^p variance, however, requires that the p-norm of y over its distribution be finite.
Thus we have a generalized L^p variance that, provided it exists for the distribution that generates the data, is stable to compute, has a clear connection to a mean value, and satisfies all seven of Granville’s desired properties, and then some. It does incur some computational cost, however, because solving the balance equation is in general a nonlinear root-finding problem.
Comment
L^p norms can also be used for regression. Here too one gains robustness against thick-tailed distributions. In the almost prehistoric year of 1972 I published a paper with the following abstract:
It is well known that the mean is very sensitive to deviations from normality, especially due to outliers or long tails. We propose the use of an estimator which has been demonstrated to be more robust than least squares for estimating the simple mean, that is, the estimator which minimizes the pth power of the deviations for a power of p between one and two. We also show that a reasonably fast and widely available computer subroutine is available to solve the problem.
"Robust estimation of straight line regression coefficients by minimizing pth power deviations". Technometrics. 02/1972; 14(1):159-166.
If you want a copy, I think I can get one from ResearchGate and pass it to you. Of course, a lot has appeared in the literature since then.
Alan B. Forsythe
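The straight-line regression described in the comment above minimizes Sum_i |y_i – a – b x_i|^p over the intercept a and slope b. One common way to do this, and the one assumed in the Python sketch below, is iteratively reweighted least squares (IRLS): repeatedly solve a weighted least-squares fit whose weights |r_i|^(p – 2) come from the previous residuals r_i. This is a swapped-in technique for illustration, not necessarily the subroutine used in the 1972 paper. The function name lp_line_fit and the small floor eps on |r_i|, which keeps the weights finite, are assumptions.

```python
def lp_line_fit(x, y, p, iters=200, eps=1e-8):
    """Hypothetical helper: fit y ~ a + b*x by minimizing Sum_i |y_i - a - b*x_i|^p via IRLS."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        # closed-form weighted least squares for a straight line
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = ybar - b * xbar
        # IRLS reweighting: the gradient of |r|^p is p*|r|^(p-2)*r, so a weighted LS
        # step with w = |r|^(p-2) shares its stationary points; eps caps huge weights
        w = [max(abs(yi - a - b * xi), eps) ** (p - 2) for xi, yi in zip(x, y)]
    return a, b


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 * xi + 1.0 for xi in x]  # points on y = 2x + 1 ...
    y[-1] = 100.0                     # ... except one gross outlier (should be 19)
    for p in (2.0, 1.5, 1.1):
        a, b = lp_line_fit(x, y, p)
        print(f"p = {p:3.1f}  intercept = {a:8.4f}  slope = {b:7.4f}")
```

With p = 2 every weight stays at 1 and the fit is ordinary least squares, which the outlier pulls to a slope of about 6.42; smaller p down-weights large residuals, so the fitted slope should sit much closer to the true value of 2.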