In statistics and probability theory, the median is the numerical value separating the higher half of a data sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 5, 9} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values, which corresponds to interpreting the median as the fully trimmed mid-range. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data is contaminated, the median will not give an arbitrarily large result.
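The sort-and-pick procedure and the resistance to contamination can be illustrated with a minimal Python sketch. The function names `median` and `mean` and the sample lists are chosen here for illustration only; the contaminated list is a hypothetical example in which fewer than half of the values are replaced by an arbitrarily large number.

```python
def median(values):
    """Return the median of a non-empty list of numbers by sorting."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle observation exists.
        return ordered[mid]
    # Even count: take the mean of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    """Arithmetic mean, shown only for comparison with the median."""
    return sum(values) / len(values)


clean = [3, 5, 9, 11, 13]
# Hypothetical contamination: 2 of 5 values (fewer than half) are replaced.
contaminated = [3, 5, 9, 1e9, 1e9]

print(median([3, 5, 9]))        # 5
print(median([3, 5, 9, 11]))    # 7.0
print(median(clean), median(contaminated))  # 9 9
print(mean(clean), mean(contaminated))      # the mean is dragged far upward
```

In this example the median stays at 9 after contamination, while the mean moves by several orders of magnitude.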
A median is only defined on ordered one-dimensional data, and is independent of any distance metric. A geometric median, on the other hand, is defined in any number of dimensions.
In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the median (in the case of an even sample size); if there is such a member, there may be more than one so that the median may not uniquely identify a sample member. Nonetheless, the value of the median is uniquely determined with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.
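As a rough illustration of these cases, the sketch below uses Python's standard `statistics` module together with a hypothetical helper, `medoid`, written for this example. The helper assumes absolute difference as the dissimilarity between values, which is one possible choice rather than a definition taken from the text; ties are broken by taking the first minimizer.

```python
import statistics


def medoid(values):
    """Sample member with the smallest total absolute difference to all members.

    Assumes absolute difference as the dissimilarity (an illustrative choice).
    """
    return min(values, key=lambda c: sum(abs(c - x) for x in values))


even_sample = [1, 2, 3, 4]
m = statistics.median(even_sample)
print(m, m in even_sample)      # the median is not a member of the sample
print(medoid(even_sample))      # the medoid is always a member

repeated = [1, 2, 2, 3]
m = statistics.median(repeated)
print(m, repeated.count(m))     # two members share the median value
```

The standard library's `statistics.median_low` and `statistics.median_high` offer another way to force the result onto a sample member when the sample size is even.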
At most half the population have values strictly less than the median, and at most half have values strictly greater than the median. If each group contains less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c)/2.
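The counting property can be checked numerically. The following is a small sketch using Python's standard `statistics.median`; the helper name `split_around_median` and the two sample lists are invented for this illustration.

```python
import statistics


def split_around_median(values):
    """Return (median, count strictly below, count equal, count strictly above)."""
    m = statistics.median(values)
    below = sum(1 for v in values if v < m)
    equal = sum(1 for v in values if v == m)
    above = sum(1 for v in values if v > m)
    return m, below, equal, above


for data in ([1, 2, 2, 2, 5], [1, 2, 3, 4]):
    m, below, equal, above = split_around_median(data)
    half = len(data) / 2
    # At most half of the values lie strictly on either side of the median.
    assert below <= half and above <= half
    print(data, "median:", m, "below:", below, "equal:", equal, "above:", above)
```

In the first list each side holds fewer than half of the values, so some values must equal the median; in the second list each side holds exactly half and no value equals it.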
The median can be used as a measure of location when a distribution is skewed, when end-values are not known, or when one requires reduced importance to be attached to outliers, e.g., because they may be measurement errors.
In terms of notation, some authors represent the median of a variable x either as $\tilde{x}$ or as $\mu_{1/2}$, sometimes also M. There is no widely accepted standard notation for the median, so the use of these or other symbols for the median needs to be explicitly defined when they are introduced.
The median is the 2nd quartile, 5th decile, and 50th percentile.
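This equivalence can be checked on a small sample with Python's standard `statistics.quantiles` (available from Python 3.8). The sample below is made up for illustration, and the check relies on that function's default interpolation method; other libraries use other quantile conventions, so treat this as a sketch rather than a general guarantee.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical sample with an even count

med = statistics.median(data)

# quantiles(data, n=k) returns the k - 1 cut points dividing the data into k groups.
second_quartile = statistics.quantiles(data, n=4)[1]        # 2nd of 3 cut points
fifth_decile = statistics.quantiles(data, n=10)[4]          # 5th of 9 cut points
fiftieth_percentile = statistics.quantiles(data, n=100)[49] # 50th of 99 cut points

print(med, second_quartile, fifth_decile, fiftieth_percentile)
```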