LabPlot/2DPlotting/BoxPlot: Difference between revisions
Revision as of 09:09, 20 June 2021
Basic Concepts
A box plot (also known as a box-and-whisker plot) visualizes a data set by means of a small number of quantities that summarize the distribution of its values.
Elements of a box plot:
- Box - the upper and the lower lines of the box correspond to the third (Q3) and the first (Q1) quartiles respectively. The difference between Q3 and Q1 is called the interquartile range (IQR). The height of the box represents the IQR.
- Median line - the line dividing the box into two parts and representing the median value of the data set.
- Imaginary inner fences (not shown by default) - the upper inner fence is the value that lies 1.5 times the IQR above Q3, and the lower inner fence is the value that lies 1.5 times the IQR below Q1.
- Imaginary outer fences (not shown by default) - the upper outer fence is the value that lies 3 times the IQR above Q3, and the lower outer fence is the value that lies 3 times the IQR below Q1.
- Adjacent values - these are the outermost values on each end that are still within the corresponding inner fence.
- Caps - the lines marking the upper and lower adjacent values.
- Whiskers - the lines extending to the caps, i.e. they lead from Q3 and Q1 to the upper and lower adjacent values respectively.
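The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not LabPlot's implementation: the function name `box_plot_summary` is hypothetical, and the quartiles are computed with the "inclusive" interpolation method of Python's `statistics.quantiles`, which is only one of several common quartile conventions and may differ from the one LabPlot uses.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Compute the box plot quantities described above (hypothetical helper)."""
    data = sorted(values)

    # Assumption: quartiles via the "inclusive" interpolation method.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Imaginary fences: 1.5 * IQR (inner) and 3 * IQR (outer) beyond the box.
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

    # Adjacent values: the outermost data values still within the inner fences.
    lower_adjacent = min(v for v in data if v >= inner[0])
    upper_adjacent = max(v for v in data if v <= inner[1])

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        "adjacent_values": (lower_adjacent, upper_adjacent),
    }


# Made-up sample data, for illustration only.
sample = [-12, 2, 3, 4, 5, 6, 7, 8, 14]
summary = box_plot_summary(sample)
```

For this sample, Q1 = 3, the median is 5 and Q3 = 7, so the IQR is 4, the inner fences lie at -3 and 13, the outer fences at -9 and 19, and the whiskers end at the adjacent values 2 and 8.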
Note that there are also other ways to define whiskers in LabPlot, namely: min/max, mean +/- 1 standard deviation, mean +/- 3 standard deviations, median +/- 1 median absolute deviation, median +/- 3 median absolute deviations, 10/90 percentiles, 5/95 percentiles, 1/99 percentiles.
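As a rough sketch of what these alternative whisker definitions mean numerically, the following Python helpers compute the whisker end points for each family. All function names are hypothetical, and the conventions are assumptions that may differ from LabPlot's: the sample standard deviation is used, the median absolute deviation is not rescaled, and percentiles use the "inclusive" interpolation method.

```python
from statistics import mean, median, quantiles, stdev


def whiskers_min_max(data):
    # Whiskers extend to the smallest and largest values.
    return min(data), max(data)


def whiskers_mean_sd(data, k):
    # mean +/- k standard deviations (assumption: sample standard deviation).
    m, s = mean(data), stdev(data)
    return m - k * s, m + k * s


def whiskers_median_mad(data, k):
    # median +/- k median absolute deviations (assumption: unscaled MAD).
    med = median(data)
    mad = median(abs(v - med) for v in data)
    return med - k * mad, med + k * mad


def whiskers_percentiles(data, lower):
    # lower/(100 - lower) percentiles, e.g. lower=10 gives the 10/90 pair.
    cuts = quantiles(data, n=100, method="inclusive")  # 99 cut points
    return cuts[lower - 1], cuts[100 - lower - 1]


# Made-up sample data, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
median_mad_1 = whiskers_median_mad(sample, 1)
percentiles_10_90 = whiskers_percentiles(sample, 10)
```

Calling `whiskers_mean_sd(sample, 3)` or `whiskers_percentiles(sample, 5)` gives the remaining variants from the list.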
Outliers
With the above definitions, we can use the following rule of thumb to identify and plot outliers:
- Outliers - smaller black circles (or other symbols) marking values that lie beyond either inner fence.
- Far outliers - larger black circles (or other symbols) marking values that lie beyond either outer fence.
Note that the rule of thumb for outliers is just a handy guideline and is no substitute for good judgment.
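The rule of thumb can be sketched in Python as follows. This is an illustration under stated assumptions, not LabPlot's code: the function name `classify_outliers` is hypothetical, quartiles use the "inclusive" method of `statistics.quantiles`, and a value beyond an outer fence is reported only as a far outlier, so that each value gets exactly one symbol size.

```python
from statistics import quantiles


def classify_outliers(values):
    """Split values into (outliers, far_outliers) using the fence rule of thumb."""
    data = sorted(values)

    # Assumption: quartiles via the "inclusive" interpolation method.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    outliers, far_outliers = [], []
    for v in data:
        if v < outer_low or v > outer_high:
            far_outliers.append(v)   # beyond an outer fence
        elif v < inner_low or v > inner_high:
            outliers.append(v)       # beyond an inner fence only
    return outliers, far_outliers


# Made-up sample data, for illustration only.
sample = [-12, 2, 3, 4, 5, 6, 7, 8, 14]
outliers, far_outliers = classify_outliers(sample)
```

With this sample the inner fences lie at -3 and 13 and the outer fences at -9 and 19, so 14 is classified as an outlier and -12 as a far outlier.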
Jittering
A box plot summary doesn't provide any insight into the actual distribution of numerical values in the data set and can hide potentially important information. To get more information about the distribution of values, jittering can be added on top of the actual box plot visualization.
TODO: use the data sets presented on https://www.autodesk.com/research/publications/same-stats-different-graphs, which have completely different distributions but lead to the same box plot visualizations, and show how jittering can provide more insights.
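To illustrate the general idea of jittering, the sketch below spreads the data points horizontally by a small random offset around the position of the box, so that points with equal or similar values no longer overlap. It is not LabPlot's implementation; the function name `jitter_positions` and its parameters `center`, `width` and `seed` are hypothetical, and a uniform random offset is assumed.

```python
import random


def jitter_positions(values, center=1.0, width=0.4, seed=None):
    """Return (x, y) points with x randomly spread around `center`.

    The y coordinates (the data values) are left unchanged; only the
    horizontal position is perturbed, by at most width / 2 to either side.
    """
    rng = random.Random(seed)  # a seed makes the layout reproducible
    half = width / 2
    return [(center + rng.uniform(-half, half), v) for v in values]


# Made-up sample data with repeated values, for illustration only.
points = jitter_positions([5, 5, 5, 6, 6, 7], center=1.0, width=0.4, seed=42)
```

The resulting points can be drawn on top of a box located at `center`; a smaller `width` keeps them closer to the box.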
Note: instead of (or in addition to) jittering, a combined visualization of histogram and box plot can be used. TODO: produce an example similar to Darek's visualization.
Notches
TODO: https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/boxplot.stats.html