Hypothesis Testing

The following steps describe how to conduct a hypothesis test for a difference in means. The same steps apply to a hypothesis test on any other population parameter that a Black Belt may conduct.

1. Define the problem or issue to be studied.

2. Define the objective.

3. State the null hypothesis, identified as H0.

• The null hypothesis is a statement of no difference between the before and after states (similar to a defendant being presumed not guilty in court).

H0: μbefore = μafter

The goal of the test is to either reject or not reject H0.

4. State the alternative hypothesis, identified as Ha.

• The alternative hypothesis is what the Black Belt is trying to prove and can be one of the following:

• Ha: μbefore ≠ μafter (a two-sided test)

• Ha: μbefore < μafter (a one-sided test)

• Ha: μbefore > μafter (a one-sided test)

• The alternative chosen depends on what the Black Belt is trying to prove. In a two-sided test, it is important to detect differences from the hypothesized mean, μbefore, that lie on either side of μbefore. The α risk in a two-sided test is split between the two tails of the distribution. In a one-sided test, it is only important to detect a difference on one side or the other.

5. Determine the practical difference (δ).

• The practical difference is the meaningful difference the hypothesis test should detect.

6. Establish the α and β risks for the test.

7. Determine the number of samples needed to obtain the desired β risk.

• Remember that the power of the test is (1 - β).

8. Collect the samples and conduct the test to determine a p-value.

• Use a software package to analyze the data and determine a p-value.

9. Compare the p-value to the decision criterion (α risk) and determine whether to reject H0 in favor of Ha, or not to reject H0.

• If the p-value is less than the α risk, then reject H0 in favor of Ha.

• If the p-value is greater than the α risk, there is not enough evidence to reject H0.
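Steps 5 through 7 can be sketched with the common normal-approximation formula for comparing two means, n = 2((z_α + z_β)σ/δ)² samples per group. This is only a sketch under stated assumptions: the function name, the assumption that the standard deviation σ is known in advance, and the example values are illustrative and not taken from any particular software package. Software that works from the t distribution will usually report a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, beta=0.10, two_sided=False):
    """Approximate samples per group needed to detect a mean shift of `delta`.

    Assumes both groups share a known standard deviation `sigma` and that
    the normal approximation is adequate (hypothetical helper, not a
    library function).
    """
    z = NormalDist().inv_cdf
    # A two-sided test splits the alpha risk between both tails.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)  # power of the test is 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative values only: detect a shift of 3 units when sigma is about 3.2.
n_one_sided = samples_per_group(delta=3.0, sigma=3.2)
n_two_sided = samples_per_group(delta=3.0, sigma=3.2, two_sided=True)
print(n_one_sided, n_two_sided)
```

Tightening either risk, or shrinking the practical difference δ, increases the required sample size.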

The risks associated with making an incorrect decision are described in the following table.

 

           

 

 

Decision Table

                               If the decision is:
                               H0                         Ha
If the correct answer is H0    Right Decision             α Risk (Type I Error)
If the correct answer is Ha    β Risk (Type II Error)     Right Decision
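The two error types in the decision table can be illustrated with a small simulation. The sketch below repeatedly draws "before" and "after" samples, runs a one-sided z-test with a known standard deviation, and counts how often H0 is rejected. All names and numbers (group size, mean, σ, shift, seed) are hypothetical choices for illustration, and the z-test stands in for whatever test a software package would actually run.

```python
import random
from statistics import NormalDist, mean


def one_sided_p_value(before, after, sigma):
    """p-value for Ha: mean(after) > mean(before), assuming a known sigma."""
    se = sigma * (1 / len(before) + 1 / len(after)) ** 0.5
    z = (mean(after) - mean(before)) / se
    return 1 - NormalDist().cdf(z)


def rejection_rate(true_shift, n=20, sigma=3.2, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        before = [rng.gauss(28.0, sigma) for _ in range(n)]
        after = [rng.gauss(28.0 + true_shift, sigma) for _ in range(n)]
        if one_sided_p_value(before, after, sigma) < alpha:
            rejections += 1
    return rejections / trials


# H0 is true (no shift): every rejection is a Type I error.
type_i_rate = rejection_rate(true_shift=0.0)
# Ha is true (shift of 3): every failure to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(true_shift=3.0)
print(f"Type I rate ~ {type_i_rate:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

With these settings the Type I rate should land near the chosen α of 0.05, and the Type II rate near 0.10, because 20 samples per group is roughly what a 10% β risk requires for this shift and σ.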

 

 

 

Depending on the population parameter of interest, there are different types of hypothesis tests; these types are described in the following table.

 

Note: The table is divided into two sections: parametric and non-parametric. Parametric tests are used when the underlying distribution of the data is known or can be assumed (e.g., the data used for t-testing should follow the normal distribution). Non-parametric tests are used when there is no assumption of a specific underlying distribution of the data.

Different Hypothesis Tests

Hypothesis Test                   Underlying Distribution  Purpose

Parametric (assumes the data follows a distribution)

1 Sample t-Test                   Normal                   Compares one sample average to a historical average or target
2 Sample t-Test                   Normal                   Compares two independent sample averages
Paired t-Test                     Normal                   Compares two dependent sample averages
Test for Equal Variances          Chi-square               Compares two or more independent sample variances or standard deviations
1 Proportion Test                 Binomial                 Compares one sample proportion (percentage) to a historical average or target
2 Proportion Test                 Binomial                 Compares two independent proportions
Chi-square Goodness of Fit        Chi-square               Determines whether a data set fits a known distribution
Chi-square Test for Independence  Chi-square               Determines whether probabilities classified for one variable are associated with the classification of a second

Non-Parametric (makes no assumption about the underlying distribution of the data)

1 Sample Sign Test                None                     Compares one sample median to a historical median or target
Mann-Whitney Test                 None                     Compares two independent sample medians
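To show why a non-parametric test needs no distributional assumption, here is a minimal sketch of an exact two-sided 1 Sample Sign Test. It only counts how many observations fall above or below the target median and compares that split with a fair coin flip. The function name and the data are hypothetical, and statistical packages may handle ties or one-sided alternatives differently.

```python
from math import comb


def sign_test_p_value(sample, target_median):
    """Exact two-sided sign test p-value (hypothetical helper).

    Under H0 each observation is equally likely to fall above or below
    the target median, so the count above follows Binomial(n, 0.5).
    """
    above = sum(x > target_median for x in sample)
    below = sum(x < target_median for x in sample)
    n = above + below  # observations equal to the target are dropped
    k = min(above, below)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Made-up yields compared against a historical median of 28.0.
yields = [29.1, 30.4, 27.5, 31.2, 28.9, 30.8, 29.7, 32.0, 28.4, 30.1]
p_value = sign_test_p_value(yields, target_median=28.0)
print(f"p-value = {p_value:.4f}")
```

Only the signs of the differences enter the calculation, which is why the shape of the underlying distribution does not matter.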

 

2 Sample t-Test Example:

A Black Belt is interested in determining whether temperature has an impact on the yield of a process. The current process runs at 100℃ and results in a nominal yield of 28 kg. The Black Belt would like to change the temperature to 110℃ with the hope of detecting a 3-kg increase in output. The null hypothesis is defined as:

H0: μ100 ≥ μ110 (one-sided)

and the alternative hypothesis is chosen as:

Ha: μ100 < μ110 (one-sided)

The practical difference the Black Belt would like to detect is 3 kg (an increase to 31 kg). The test is conducted with an α and β risk of 5% and 10%, respectively. To achieve a β risk of 10%, twenty-one samples need to be collected at both 100℃ and 110℃; therefore, twenty-one samples were collected at 100℃, the process temperature was changed to 110℃, and twenty-one more samples were collected. The respective averages and standard deviations were 28.2 and 3.2 at 100℃, and 32.4 and 3.2 at 110℃. The data was entered into a software program and the p-value was determined to be 0.01. After comparing the p-value (0.01) to the α risk (0.05), H0 is rejected in favor of Ha, because the p-value is smaller than the 5% risk the Black Belt was willing to take.

2 Proportion Test Example:

A Black Belt is interested in determining whether a new method of processing forms will result in fewer defective forms. The old method resulted in 5.2% defectives. The Black Belt would like to change to a new method with the hope of reducing the percent defectives to 2.0%. The null hypothesis is defined as:

H0: Pold method ≤ Pnew method

and the alternative hypothesis is chosen as:

Ha: Pold method > Pnew method

The practical difference the Black Belt would like to detect is a reduction of 3.2 percentage points. The test will be conducted with an α and β risk of 5% and 10%, respectively. To achieve a β risk of 10%, 579 forms need to be collected under each of the old and new methods; therefore, 579 samples were collected with the old method, the new method was implemented, and 579 more samples were collected. The respective percentages were 5.2% (thirty defectives) and 2.9% (seventeen defectives). The data was entered into a software program and the p-value was determined to be 0.026. Because the p-value (0.026) is less than the α risk (0.05), H0 is rejected in favor of Ha.
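The numbers in this example can be checked with a short sketch. The article does not say which software or method was used, so the code below assumes a one-sided, pooled, normal-approximation z-test for the p-value and a standard normal-approximation formula for the sample size; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def forms_per_method(p_old, p_new, alpha=0.05, beta=0.10):
    """Approximate forms needed per method for a one-sided test (assumed formula)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    p_bar = (p_old + p_new) / 2
    # Spread of the difference under H0 (pooled) and under Ha (unpooled).
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_old * (1 - p_old) + p_new * (1 - p_new))
    return ceil(((null_term + alt_term) / (p_old - p_new)) ** 2)


def two_proportion_p_value(defects_old, n_old, defects_new, n_new):
    """One-sided p-value for Ha: P(old method) > P(new method), pooled z-test."""
    p_old = defects_old / n_old
    p_new = defects_new / n_new
    pooled = (defects_old + defects_new) / (n_old + n_new)
    se = sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_old - p_new) / se
    return 1 - STD_NORMAL.cdf(z)


n = forms_per_method(p_old=0.052, p_new=0.020)
p_value = two_proportion_p_value(30, 579, 17, 579)
reject_h0 = p_value < 0.05  # step 9: compare the p-value with the alpha risk
print(n, round(p_value, 3), reject_h0)
```

Under these assumptions the sketch gives 579 forms per method and a p-value of about 0.026, in line with the figures quoted above.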