A histogram, one of the Seven Basic QC Tools, is a very good way to picture what a set of data looks like. It gives shape to a set of data by grouping the data into “cells.” It shows you the spread (dispersion) and the central tendency, which you can compare to a standard or to another group of data. That makes it an excellent troubleshooting tool: use it to compare different suppliers, equipment, or processes and reveal their differences or similarities.
Although most statistical or spreadsheet software can create a histogram for you very easily, I am going to talk you through how to create a good histogram by hand. The real key to a good histogram is to get the correct number of “cells” for the size of the data set you have. If you have too few or too many, it will not give you much of a feel for the spread or center of your data. Too few looks like a big clump and too many looks like a broad scatter of points. Neither tells you much about your data. So here is what you do to build a histogram by hand.
- Find your largest and smallest number in the data and calculate the data range by subtracting the smallest value from the largest one.
- Now we determine the all-important number of cells for our histogram. These cells will be the columns you see in a histogram. The “Six Sigma Handbook” by Thomas Pyzdek shows two ways to get the correct number of cells for your data. The number you get will change a bit as you do the later calculations, but either method gives a very good starting point. The first is to use the table below.
Sample Size | Number of Cells
100 or less | 7 to 10
101-200 | 11 to 15
201 or more | 13 to 20
The second method is to use a calculator: take the square root of the sample size and round that number to the nearest integer.
- Next we determine the width of each cell by dividing the range you found in step 1 by the number of cells we determined in step 2.
Once you have calculated the cell width, round it to a convenient number. Doing this will affect the number of cells in your histogram, but that is OK.
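The arithmetic in steps 1 through 3 can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and `convenient_width` encodes just one possible reading of "a convenient number" (the nearest of 1, 2, 5, or 10 times a power of ten), which is my assumption and not something taken from the handbook.

```python
import math


def data_range(data):
    """Step 1: largest value minus smallest value."""
    return max(data) - min(data)


def cells_from_table(n):
    """Step 2, first method: (low, high) cell counts from the table above."""
    if n <= 100:
        return (7, 10)
    if n <= 200:
        return (11, 15)
    return (13, 20)


def cells_from_sqrt(n):
    """Step 2, second method: square root of the sample size, rounded."""
    return round(math.sqrt(n))


def convenient_width(raw_width):
    """Step 3: round a raw cell width to a 'convenient' number.

    Assumption: convenient means the closest of 1, 2, 5, or 10
    times a power of ten. Other choices are equally valid.
    """
    if raw_width <= 0:
        raise ValueError("cell width must be positive")
    base = 10.0 ** math.floor(math.log10(raw_width))
    candidates = [m * base for m in (1, 2, 5, 10)]
    return min(candidates, key=lambda c: abs(c - raw_width))


# Example: 200 values with a range of 8.0
print(cells_from_table(200))        # (11, 15)
print(cells_from_sqrt(200))         # 14
print(convenient_width(8.0 / 15))   # raw width 0.5333...
print(convenient_width(8.0 / 14))   # raw width 0.5714...
```

For 200 values with a range of 8.0, both starting points lead to the same rounded cell width of 0.5.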
- Next we will compute the “cell boundaries.” Think of a cell as a range of values of your data. The cell boundaries define the start and end points of each cell in your histogram. Because they are start and end points, we give them one more decimal place than our data values, so that no data value can land exactly on a boundary. Thus if our data values are integers (1, 12, 36), our cell boundaries will have one decimal place (xx.x).
- Now we determine the low boundary of the first cell. This boundary has to be set less than the smallest value of your data set.
- Now that the lowest cell boundary is determined all the other cell boundaries are determined by adding the cell width to the previous boundary. Continue this until the upper boundary is larger than the largest value in the data set.
- Now go through the data that you have and determine in what cell each value goes and make a tick mark in that cell (bounded by the boundaries you calculated).
- Count the ticks in each cell and record the total count in each cell.
- Now we have all the statistics to create the histogram. First, on graph paper, draw a horizontal line near the bottom of the page. Leave room below to label the cell boundaries on this line.
- Starting with the lowest cell boundary, equally space all the boundaries along this line.
- Next, at the left end of the horizontal line, draw a vertical line. This line's length should be just longer than the largest cell count that you found. Label this line from 0 to the largest cell count or just beyond. This is the count or frequency axis.
- Last, draw in the columns (or bars) for each of the cells up to the count/frequency of that cell.
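The boundary, tally, and drawing steps above can be sketched in Python as follows. The data set here is invented purely for illustration, and the function names are hypothetical. The text histogram is drawn on its side, with one row per cell, instead of as columns on graph paper.

```python
import bisect


def cell_boundaries(lowest, highest, width, first_boundary):
    """Add the cell width to the first boundary until the upper
    boundary is larger than the largest data value."""
    if first_boundary >= lowest:
        raise ValueError("first boundary must be below the smallest value")
    bounds = [first_boundary]
    steps = 0
    while bounds[-1] <= highest:
        steps += 1
        # round() keeps floating-point noise out of the boundaries
        bounds.append(round(first_boundary + steps * width, 10))
    return bounds


def tally(data, bounds):
    """Count how many values fall between each pair of boundaries."""
    counts = [0] * (len(bounds) - 1)
    for value in data:
        index = bisect.bisect_right(bounds, value) - 1
        if not 0 <= index < len(counts):
            raise ValueError(f"{value} is outside the cell boundaries")
        counts[index] += 1
    return counts


def text_histogram(bounds, counts):
    """One row per cell: boundaries, a bar of '#' marks, and the count."""
    return [
        f"{bounds[i]:>6} to {bounds[i + 1]:<6} | {'#' * count} {count}"
        for i, count in enumerate(counts)
    ]


# Made-up integer data, so the boundaries get one decimal place.
data = [12, 15, 11, 18, 14, 15, 13, 16, 15, 14]
bounds = cell_boundaries(min(data), max(data), width=1, first_boundary=10.5)
counts = tally(data, bounds)
print("\n".join(text_histogram(bounds, counts)))
```

Because the boundaries carry one more decimal place than the data, no value can sit exactly on a boundary, so each value lands in exactly one cell.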
Below is a histogram made in Minitab, but first let me give you the basic information about its data.
- Lowest Value = 596.2
- Highest Value = 604.2
- Range = 8.0
- There are 200 values in this set of data
Now, let’s see how close it is to the manual method.
- Number of cells: Table value 15 (the top of the 11 to 15 range); Square root method √200 = 14.142
- Cell Width: Table: 8/15 = 0.5333 ~ 0.5; Square Root: 8/14.142 = 0.5657 ~ 0.5
- Lowest Cell Boundary < 596.2 (and one decimal more) = 596.15
- All the other boundaries (the largest must be larger than the highest value [604.2])
Lowest Val = 596.2 | Cell Width = 0.5 | Highest Val = 604.2
Cell # | Lower Boundary | Cell Center | Upper Boundary
1 | 596.15 | 596.4 | 596.65
2 | 596.65 | 596.9 | 597.15
3 | 597.15 | 597.4 | 597.65
4 | 597.65 | 597.9 | 598.15
5 | 598.15 | 598.4 | 598.65
6 | 598.65 | 598.9 | 599.15
7 | 599.15 | 599.4 | 599.65
8 | 599.65 | 599.9 | 600.15
9 | 600.15 | 600.4 | 600.65
10 | 600.65 | 600.9 | 601.15
11 | 601.15 | 601.4 | 601.65
12 | 601.65 | 601.9 | 602.15
13 | 602.15 | 602.4 | 602.65
14 | 602.65 | 602.9 | 603.15
15 | 603.15 | 603.4 | 603.65
16 | 603.65 | 603.9 | 604.15
17 | 604.15 | 604.4 | 604.65
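As a cross-check on the table above, the boundaries can be regenerated from the three figures given in the example (first boundary 596.15, cell width 0.5, highest value 604.2). This is a minimal sketch: the function name is hypothetical, and Python's `decimal` module is used only so that repeated additions of 0.5 do not pick up floating-point noise.

```python
from decimal import Decimal


def boundary_table(first_boundary, width, highest):
    """Return (cell number, lower boundary, center, upper boundary) rows,
    adding cells until the upper boundary exceeds the highest value."""
    lower = Decimal(first_boundary)
    width = Decimal(width)
    highest = Decimal(highest)
    rows = []
    cell = 0
    while lower <= highest:
        cell += 1
        upper = lower + width
        center = lower + width / 2
        rows.append((cell, lower, center, upper))
        lower = upper
    return rows


# Figures from the worked example, passed as strings so they stay exact.
rows = boundary_table("596.15", "0.5", "604.2")
for cell, lower, center, upper in rows:
    print(f"{cell:>2} | {lower} | {center} | {upper}")
```

This produces 17 cells, ending at an upper boundary of 604.65. That is more than the 14 or 15 cells we started with, because the cell width was rounded down to 0.5 and the first boundary sits just below the lowest value.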
Well, there you have it: how to build a histogram by hand. If you have questions or comments, please feel free to contact me by leaving a comment below, emailing me, calling me, or leaving a comment on my website.
Bersbach Consulting
Peter Bersbach
Six Sigma Master Black Belt
http://sixsigmatrainingconsulting.com
peter@bersbach.com
1.520.829.0090