LabPlot/2DPlotting/BoxPlot

From KDE Wiki Sandbox

Basic Concepts

A box plot visualizes a data set by means of a small number of quantities that summarize the distribution of its values.

Elements of a box plot:

  • box - the upper and the lower lines of the box correspond to the third (Q3) and the first (Q1) quartiles, respectively. The difference between Q3 and Q1 is called the interquartile range (IQR), so the height of the box shows the size of the IQR.
  • median line - the line dividing the box into two parts and representing the median of the data set.
  • whiskers
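
The quantities behind the box and the median line can be computed in a few lines. The following is a minimal sketch in Python using only the standard library; it is not LabPlot's implementation. The function name box_summary and the sample values are hypothetical, and the "inclusive" quartile convention is an assumption, since several conventions exist and they can give slightly different Q1 and Q3 values.

```python
import statistics


def box_summary(values):
    """Return Q1, median, Q3 and IQR for a sequence of numbers.

    Assumption: quartiles use the "inclusive" convention of
    statistics.quantiles; other conventions may differ slightly.
    """
    # n=4 splits the data into four groups and returns the three cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical sample data, used only for illustration.
summary = box_summary([2, 4, 4, 5, 7, 8, 9, 11, 12])
print(summary)
```

For this sample the box would extend from Q1 = 4 to Q3 = 9, giving a height (IQR) of 5, with the median line at 7.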

Outliers

Jittering

The box plot summary doesn't provide any insight into the actual distribution of the numerical values in the data set and can hide potentially important information. To get more information about the distribution of values, jittering can be added on top of the actual box plot visualization.
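
The idea behind jittering can be sketched as follows: every data value keeps its position along the value axis, while a small random offset is added along the other axis so that equal or nearby values don't overlap. This is an illustrative Python sketch, not LabPlot's implementation; the function name jitter_positions and the parameters center, width and seed are hypothetical, and a uniform offset is assumed.

```python
import random


def jitter_positions(values, center=1.0, width=0.2, seed=0):
    """Pair each value with a randomly shifted horizontal position.

    Assumption: offsets are drawn uniformly from
    [center - width / 2, center + width / 2].
    """
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    half = width / 2
    # The value itself (y) is unchanged; only the x position is jittered.
    return [(center + rng.uniform(-half, half), v) for v in values]


# Hypothetical sample with repeated values that would otherwise overlap.
points = jitter_positions([1, 1, 1, 5, 9, 9, 9])
for x, y in points:
    print(f"x={x:.3f}, y={y}")
```

Plotting these points on top of the box shows where the values cluster, which the box alone doesn't reveal.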


TODO: use the data sets presented on https://www.autodesk.com/research/publications/same-stats-different-graphs, which have completely different distributions but lead to the same box plot visualization, and show how jittering can provide more insight.

Note: instead of (or in addition to) jittering, a combined visualization of histogram and box plot can be used. TODO: produce an example similar to Darek's visualization.

Notches

TODO: https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/boxplot.stats.html