02-ECDF and Histogram
Difference between the ECDF (Empirical Cumulative Distribution Function) and the CDF:
The empirical CDF is built from an actual data set. The CDF is a theoretical construct.
Let X be a random variable.
- The cumulative distribution function F(x) gives P(X ≤ x) under the theoretical distribution of X.
- The empirical CDF G(x) gives P(X ≤ x) in your actual sample, i.e. the fraction of observations that are ≤ x.
The distinction is which probability measure is used. For the ECDF, you use the probability measure implicitly defined by the frequency counts in your sample.
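To make this concrete, write the sample as x_1, ..., x_n (this notation is an assumption, it does not appear in the notes above). The frequency-count measure puts mass 1/n on each observation, so the ECDF can be written as:

$$
G(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}
$$

Here \mathbf{1}\{x_i \le x\} equals 1 when x_i ≤ x and 0 otherwise, so G(x) is simply the proportion of sample values at or below x.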
Simple example (coin flip):
Let X be a random variable denoting the result of a single coin flip, where X = 1 denotes heads and X = 0 denotes tails.
The CDF for a fair coin is given by:
$$
F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{2} & \text{for } 0 \le x < 1 \\ 1 & \text{for } 1 \le x \end{cases}
$$
If you flipped 2 heads and 1 tail, the empirical CDF would be:
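$$
G(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{3} & \text{for } 0 \le x < 1 \\ 1 & \text{for } 1 \le x \end{cases}
$$
The value 1/3 on 0 ≤ x < 1 is the fraction of tails (X = 0), and the jump of 2/3 at x = 1 reflects that 2/3 of your flips were heads.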
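The coin-flip ECDF can be checked numerically. The following is a minimal sketch, not a library implementation: the helper name `ecdf` is made up for illustration, and it simply counts the fraction of sample values at or below x. With heads coded as 1 and tails as 0, evaluating it at 0 gives the share of tails, and the jump at 1 gives the share of heads.

```python
from fractions import Fraction

def ecdf(sample):
    """Return a function G where G(x) is the fraction of sample values <= x.

    Hypothetical helper for illustration; uses exact fractions so the
    output matches the hand calculation.
    """
    data = list(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def G(x):
        # Count observations at or below x and divide by the sample size.
        return Fraction(sum(1 for v in data if v <= x), n)

    return G

# Coin-flip sample: 2 heads (coded 1) and 1 tail (coded 0).
flips = [1, 1, 0]
G = ecdf(flips)

for x in (-0.5, 0, 0.5, 1, 2):
    print(f"G({x}) = {G(x)}")

# Size of the jump at x = 1, i.e. the share of heads in the sample.
jump_at_heads = G(1) - G(0.5)
print("jump at 1:", jump_at_heads)
```

Between 0 and 1 the function stays flat at 1/3, then jumps by 2/3 at x = 1, which is the step shape described above.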
Why is the ECDF useful?
- It is a natural first step in data visualization, usually alongside a histogram;
- It helps us identify which distribution the dataset follows.
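As a rough sketch of how the two views relate, the snippet below computes the ECDF step points and histogram bin counts for the same sample using only the standard library. The sample values, bin edges, and helper names (`ecdf_points`, `histogram_counts`) are all made up for illustration. In practice you would usually plot both with a plotting library.

```python
from bisect import bisect_right

def ecdf_points(sample):
    """Return sorted values xs and heights ys with ys[i] = (i + 1) / n.

    With tied values, the ECDF at that value is the largest height listed for it.
    """
    xs = sorted(sample)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys

def histogram_counts(sample, edges):
    """Count values in half-open bins [edges[k], edges[k+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in sample:
        if v < edges[0] or v > edges[-1]:
            continue  # ignore values outside the binning range
        k = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[k] += 1
    return counts

# Hypothetical sample and bin edges, chosen only for illustration.
sample = [2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.5, 7.3]
edges = [2, 4, 6, 8]

xs, ys = ecdf_points(sample)
counts = histogram_counts(sample, edges)

# Running total of bin counts divided by n: the ECDF evaluated at each
# right bin edge (exact here because no observation lies on an edge).
cumulative = [sum(counts[: k + 1]) / len(sample) for k in range(len(counts))]

print("histogram counts:", counts)
print("cumulative share:", cumulative)
print("ECDF points:", list(zip(xs, ys)))
```

The histogram depends on the choice of bin edges, while the ECDF uses every observation directly and needs no binning, which is one reason the two are often looked at together.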