Posts Tagged ‘DMAIC Measure’

How to build a good Histogram

Monday, February 20th, 2012

A histogram, one of the Seven Basic QC Tools, is a very good tool for picturing what a set of data looks like. It gives shape to a set of data by grouping the data into “cells.” It shows you the spread (dispersion) and the central tendency, which you can compare to a standard or to another group of data. That makes it an excellent troubleshooting tool: use it to compare different suppliers, equipment, or processes and reveal their differences or similarities.

Although most statistical or spreadsheet software can create a histogram for you very easily, I am going to talk you through how to create a good histogram by hand. The real key to a good histogram is to get the correct number of “cells” for the size of the data set you have. If you have too few or too many, it will not give you much of a feel for the spread or center of your data. Too few looks like a big clump, and too many looks like a broad scatter of points. Neither tells you much about your data. So here is what you do to build a histogram by hand.

  1. Find your largest and smallest number in the data and calculate the data range by subtracting the smallest value from the largest one.
  2. Now we determine the all-important number of cells for our histogram. These cells will be the columns you see in a histogram. The “Six Sigma Handbook” by Thomas Pyzdek shows two ways to get the correct number of cells for your data. The number you get will change a bit as you work through the later calculations, but either method gives a very good starting point. The first is to use the table below.

Sample Size      Number of Cells
100 or less      7 to 10
101 to 200       11 to 15
201 or more      13 to 20

The second method is to use a calculator: take the square root of the sample size and round that number to the nearest integer.

  3. Next we determine the width of each cell by dividing the range that you found in step 1 by the number of cells we determined in step 2.

Once you have calculated the cell width, round it to a convenient number. Doing this will affect the number of cells in your histogram, but that is OK.

  4. Next we will compute the “cell boundaries.” Think of a cell as a range of values of your data. The cell boundaries define the start and end point of each cell in your histogram. Because they are the start and end points, we give them one more decimal place than our data values so that no data value can fall exactly on a boundary. Thus if our data values are integers (1, 12, 36), then our cell boundaries will have one decimal place (xx.x).
  5. Now we determine the low boundary of the first cell. This boundary has to be set lower than the smallest value in your data set.
  6. Once the lowest cell boundary is determined, all the other cell boundaries are found by adding the cell width to the previous boundary. Continue this until the upper boundary is larger than the largest value in the data set.
  7. Now go through your data, determine which cell each value goes in, and make a tick mark in that cell (bounded by the boundaries you calculated).
  8. Count the ticks in each cell and record the total count for each cell.
  9. Now we have all the statistics to create the histogram. First, on graph paper, draw a horizontal line near the bottom of the page. Leave room below to label the cell boundaries on this line.
  10. Starting with the lowest cell boundary, equally space all the boundaries along this line.
  11. Next, at the left end of the horizontal line, draw a vertical line. This line's length will be just longer than the largest cell count that you found. This line should be labeled from 0 to the largest cell count or just beyond. This is the count or frequency axis.
  12. Last, draw in the columns (or bars) for each of the cells up to the count/frequency of that cell.
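The manual steps above can also be sketched in a few lines of Python, which is handy for checking a hand-drawn histogram. This is only a sketch under stated assumptions: the function names (`cells_from_table`, `cells_from_sqrt`, `tally_cells`) and the sample data are made up for illustration, the table's middle row is assumed to cover 101 to 200 values, and the "convenient" cell width is still chosen by hand.

```python
import bisect
import math


def cells_from_table(sample_size):
    """Return the (low, high) cell-count range suggested by the table.

    Assumption: the middle row of the table covers 101 to 200 values.
    """
    if sample_size <= 100:
        return (7, 10)
    if sample_size <= 200:
        return (11, 15)
    return (13, 20)


def cells_from_sqrt(sample_size):
    """Square root of the sample size, rounded to the nearest integer."""
    return round(math.sqrt(sample_size))


def tally_cells(data, first_boundary, cell_width):
    """Build the cell boundaries and count the values in each cell.

    first_boundary must be below the smallest value and carry one more
    decimal place than the data, so no value lands on a boundary.
    """
    boundaries = [first_boundary]
    # Keep adding the cell width until the upper boundary passes the maximum.
    while boundaries[-1] <= max(data):
        boundaries.append(round(boundaries[-1] + cell_width, 10))
    counts = [0] * (len(boundaries) - 1)
    for value in data:
        # Index of the cell whose lower boundary sits just below the value.
        counts[bisect.bisect_right(boundaries, value) - 1] += 1
    return boundaries, counts


# Made-up integer data, for illustration only.
data = [3, 7, 8, 12, 12, 15, 18, 21, 22, 29]
data_range = max(data) - min(data)      # range: 29 - 3 = 26
n_cells = cells_from_sqrt(len(data))    # number of cells: sqrt(10) rounds to 3
raw_width = data_range / n_cells        # cell width before rounding: about 8.67
cell_width = 10                         # rounded by hand to a convenient number
boundaries, counts = tally_cells(data, first_boundary=2.5, cell_width=cell_width)

# Text version of the chart: one row of marks per cell.
for lower, upper, count in zip(boundaries, boundaries[1:], counts):
    print(f"{lower:>5} to {upper:<5} {'#' * count} ({count})")
```

The tally uses a sorted-list lookup rather than arithmetic on the cell width, so it mirrors the "which cell does this value go in" step directly.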


Below is a histogram made in Minitab, but first let me give you the basic information about its data.

    • Lowest Value = 596.2
    • Highest Value = 604.2
    • Range = 8.0
    • There are 200 values in this set of data



Now, let’s see how close it is to the manual method.

  1. Number of cells: Table value 15; Square root method 14.142
  2. Cell width: Table: 8/15 = 0.5333 ≈ 0.5; Square root: 8/14.142 = 0.5657 ≈ 0.5
  3. Lowest cell boundary < 596.2 (and one more decimal place) = 596.15
  4. All the other boundaries: add the cell width (0.5) to the previous boundary until the upper boundary is larger than the highest value (604.2)
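The boundary arithmetic for this example can be reproduced with a short script. This is a sketch, not the Minitab calculation: the function name `cell_table` is hypothetical, the cell center is assumed to be the midpoint of the two boundaries, and `Decimal` is used only so that values such as 596.15 stay exact.

```python
from decimal import Decimal


def cell_table(lowest_boundary, cell_width, highest_value):
    """List (cell number, lower boundary, center, upper boundary) rows.

    Cells are added until the upper boundary is larger than the
    highest data value. The center is assumed to be the midpoint.
    """
    lower = Decimal(lowest_boundary)
    width = Decimal(cell_width)
    top = Decimal(highest_value)
    rows = []
    cell = 1
    while True:
        upper = lower + width
        rows.append((cell, lower, lower + width / 2, upper))
        if upper > top:
            break
        lower = upper
        cell += 1
    return rows


# Values taken from the example: lowest boundary 596.15, width 0.5, highest value 604.2.
rows = cell_table("596.15", "0.5", "604.2")
for number, lower, center, upper in rows:
    print(f"{number:>2}  {lower}  {center}  {upper}")
```

With these inputs the loop stops at 17 cells, the last one running from 604.15 to 604.65. That is a few more than the starting estimate of 14 or 15, which is the effect of rounding the cell width down to 0.5.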


[Table: cell number, lower boundary, cell center, and upper boundary for each cell, with lowest value = 596.2, cell width = 0.5, and highest value = 604.2]



Well, there you have how to build a histogram by hand. If you have questions or comments, please feel free to contact me by leaving a comment below, emailing me, calling me, or leaving a comment on my website.

Bersbach Consulting
Peter Bersbach
Six Sigma Master Black Belt


The Second step of DMAIC – Measure

Thursday, November 5th, 2009

Measure is the second step of the Six Sigma five-step process, DMAIC. The objectives of Measure are twofold. The first is to use GEMBA to factually understand the existing process. For those who have not heard of GEMBA, it is a Japanese term for “the actual place,” often summarized as “go see.” In Six Sigma we use the term to imply that you need to get up, go out to the process that you are going to improve, and actually “see” what is really happening. You can NOT do this in a conference room or alone at your desk. The process may in fact all be done at your desk, but you need to have others with you, looking at what is done, to capture the actual way things are done. The second objective is to compile that data into a characterization of the current state. Many times you will hear this called the process Baseline.

In Measure it is important that you capture exactly what is really happening and do not take for granted that everything is being done exactly to procedure. It usually is not. If more than one person performs the process, check them all, as you will find they all do it slightly differently. Different is not necessarily bad, but that difference IS the variation in your process. To help me (and you) capture the current state, I have a series of questions that, if you address them, will help ensure you complete this step. Here they are:

  1. How does it work now?
  2. What are the key metrics for this process? Are they valid?
  3. What is the current Sigma level of the process?
  4. What are the detailed steps of the process?
  5. What is good about the process that you want to keep?
  6. What are the problems and their causes?

It is very important that you get all of these answered. Two that stand out are identifying the key metrics and identifying what is good about the process. Always find out how the process is currently being measured. How does anyone know whether things are going well or badly with this process? They have to have some “measure” they take to get that idea. Yes, sometimes it is a gut feel, but you will have to back it up with data. So if they say it takes too long, time it and see how long it takes. The second key question, which many forget, is what works well in the process. This is key because you want to make sure that when you improve the bad you have not made things worse in the good areas. Those you do not want to touch.

Answering the questions above may take several tools and techniques to collect the “facts and data.” We like opinions, they help guide us, but we need data that validates them. Many times you will get opinions in answer to these questions, and you still need to gather the data to show whether each opinion is true or false. Here in Measure, there are several good tools and techniques that can help get you that data.

  • Descriptive Statistics – These are generally calculated from a sample of information (data) from your process. They tend to cover three areas of interest about your data group (distribution): its location, its spread, and its shape.

Location, Spread, Shape

  • Location or Central Tendency – These are the Mean (average), the Median (middle value), and the Mode (most frequent value).
  • Spread or Dispersion – The most popular are the Range (the difference between the highest value and the lowest value) and the Standard Deviation, a calculated measure of variation around the Mean.
  • Shape – Two things determine shape. One is whether the distribution is skewed to one side or the other (you can calculate this as skewness), and two is whether it is flat or peaked (again you can calculate this, as kurtosis).
  • Dot Plots – These are just that: a plot of every data point on a chart.

Dot Plot

  • Histograms – These are a picture of the data. They are created by grouping the data into what are called bins or cells.


  • Run Charts – Similar to the dot plot, the run chart plots the data over time, in time sequence.

run chart

  • Control Charts – Like Run Charts, Control Charts plot data over time. Unlike Run Charts, they have control limits plotted on the chart as well. Although you can plot individual data points like the one below, a control chart often plots summary data over time.

Control Chart x

  • Gauge R&R (Repeatability & Reproducibility) – These studies are important to understanding how much of the variation that you see is due to the process and how much is due to the measurement system (the way the measurements are taken). Many times this is overlooked, but you have to understand that everything varies, including the way you measure (collect) your data.
  • Pareto Charts – This is a special type of Histogram that arranges the bins or cells (categories) in order from the highest frequency to the lowest. This is done so one can see what the major categories are.


  • Process Observation Log – This is just a log sheet on which you list the process steps in order, how long they take, and what the yield is at each step. Sometimes people include a column that identifies each step as value added or non-value added.
  • Process Flow Diagramming – This is a diagram drawn to show the sequence of steps from the process observation log. People draw these because it is easier to see and understand what is happening in the diagram than in the log list.

Flow Chart

  • DE & UDE (Desired Effects and Undesired Effects) – This is just a list of those things that are desirable (things you want to keep) and undesirable (things you do not want). This seems simple, and it is, but if we do not write these things down, we cannot remember them once we get into the heat of things.

Plus here are two more that I talked about in Define: Brainstorming and the 5 Whys.
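The descriptive statistics in the list above (location, spread, and shape) are easy to calculate once the data is collected. Here is a minimal sketch using Python's standard `statistics` module. The function name `describe` and the sample data are made up for illustration, and the skewness and kurtosis use the simple moment-based formulas; statistical packages often apply small-sample corrections, so their values may differ slightly.

```python
import statistics


def describe(data):
    """Summarize the location, spread, and shape of a list of numbers."""
    n = len(data)
    mean = statistics.mean(data)
    # Central moments: average of the deviations raised to a power.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        # Location
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Spread
        "range": max(data) - min(data),
        "std_dev": statistics.stdev(data),   # sample standard deviation
        # Shape (moment-based; 0 means symmetric / normal-like peak)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
    }


# Made-up measurements, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

A positive skewness means the longer tail is on the high side, and a negative kurtosis (as calculated here) means the data is flatter than a normal distribution.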

Well there you have it. A little more understanding of the Measure step of the Six Sigma 5 step DMAIC process.

Peter Bersbach

Six Sigma Master Black Belt

Bersbach Consulting

From Process to Profits


The Power of the “X” Chart

Monday, June 29th, 2009

X charts can be used for almost anything, and in this article Tom Pyzdek shows just how powerful this little chart can be using percentages, and how the X chart gives a good approximation of a p chart.