Distributions


This post is a great resource on probability distributions.

The key insight is that the long tails of distributions are what set them apart:

How much all this matters depends on what type of question you want to answer with your model. If you just want to know where the bulk of the data is, then your choice of distribution doesn’t matter that much. If you are trying to determine the probability of exceptional events, events far away from the mean, then the size of the tails is the only thing that matters. An event six standard deviations from the center of a logistic distribution is several thousand times more likely than an event six standard deviations from the center of a gaussian distribution.

The section at the bottom of that post, titled Menagerie, is the log of various distributions, plotted with different parameters. It’s a good way to visualize the possible configurations of each distribution’s tails.