P – Value
In this post we discuss the role of the p-value in statistical experiments, and why it is the deciding factor for rejecting, or failing to reject, a hypothesis we set up before an experiment.
Problem Statement:
You have launched a product (e.g. a phone) in the market, and customer feedback tells you that the phone has an overheating problem. Because the phone is already on the market, you can't recall every unit to test whether the majority of phones overheat because of a manufacturing defect.
To address the issue, you decide to run surveys about the phone and do a statistical test to check whether your concern about a manufacturing issue is justified. You ask all the employees of your company to share their feedback on the overheating problem, and you also run an online survey of customers. You now have a random sample of 500 responses out of the 250000 phones you have sold.
Population size = 25000, Sample size = 500
However, before releasing the product you had already run a test and found that at most 3% of the phones may overheat because of random events unrelated to any manufacturing issue, such as overcharging or overuse. This rate is acceptable to your company. If the true rate is higher, you will have to recall all the phones from the market for re-evaluation.
Now you have to decide whether or not to recall the phone from the market.
Hypothesis:
The first important step of a statistical experiment is to set up the null hypothesis and the alternate hypothesis.
Null hypothesis (H0): The overheating of phones is as expected and is due to the random events that were observed during the production process.
Alternate hypothesis (H1): The overheating of phones is not due to random events alone. There must be some other strong reason behind it.
If the p-value is large (greater than the significance level α), you fail to reject the null hypothesis.
If the p-value is small (less than or equal to α), you reject the null hypothesis in favour of the alternate hypothesis. Your test result is statistically significant.
Data:
We will demonstrate the test with two data scenarios.
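Scenario 1: 20 of the 500 surveyed phones (4%) are reported as overheating.
Scenario 2: 50 of the 500 surveyed phones (10%) are reported as overheating.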
The sample size is n = 500.
The number of categories is m = 2 (Overheating and Non-overheating, or Yes and No), so the test has m - 1 = 1 degree of freedom.
Experiments and Results:
Before we proceed with the experiment, let's set the confidence level and review the two types of error.
Type I error: We reject the null hypothesis even though it is true. (In our example, the overheating is due to random events, but we reject the null hypothesis and conclude that it is caused by a manufacturing issue.)
Type II error: We retain the null hypothesis even though it is false. (In our example, the overheating is caused by a manufacturing issue, but we retain the null hypothesis and conclude that it is due to random events.)
Confidence level: We run the test at a 95% confidence level. This means that if the null hypothesis is true, the test will correctly retain it about 95% of the time.
α (significance level) is the probability of a Type I error. Here α = 1 - 0.95 = 0.05.
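To make the meaning of α concrete, here is a minimal sketch that computes how often the test would reject a true null hypothesis. It assumes the number of overheating phones in the sample follows a binomial distribution with n = 500 and rate 0.03, and that we reject when the chi-square statistic (computed on counts) exceeds 3.841, the 5% critical value for one degree of freedom. The function names are illustrative only.

```python
from math import comb

def chi_square_stat(k, n, p0):
    """Goodness-of-fit statistic for k overheating phones out of n."""
    expected_yes = n * p0
    expected_no = n * (1 - p0)
    return ((k - expected_yes) ** 2 / expected_yes
            + ((n - k) - expected_no) ** 2 / expected_no)

def type_i_error_rate(n=500, p0=0.03, critical=3.841):
    """Probability of rejecting H0 when the true overheating rate is p0."""
    rate = 0.0
    for k in range(n + 1):
        if chi_square_stat(k, n, p0) > critical:
            # Binomial probability of seeing exactly k overheating phones
            rate += comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
    return rate

print(f"Exact Type I error rate: {type_i_error_rate():.4f}")
```

Because the count of overheating phones is discrete, the exact rate will only be close to 0.05, not equal to it.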
We will use the chi-square (χ²) goodness-of-fit test to run the experiment for the two scenarios below.
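As a reminder, the chi-square goodness-of-fit statistic compares the observed count in each category with the count expected under the null hypothesis. The symbols O_i (observed count) and E_i (expected count) are notation introduced here for illustration; they do not appear in the original post.

```latex
\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}, \qquad \text{df} = m - 1
```

With n = 500 and an expected overheating rate of 3%, the expected counts under H0 are 15 overheating and 485 non-overheating phones, and with m = 2 categories the test has one degree of freedom.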
Scenario 1:
From the experiment, the chi-square (χ²) value is 0.3436 and the calculated p-value is 0.56.
The p-value is greater than α (= 0.05).
We can also look up the critical values of χ² in the table provided below. The next critical value above χ² = 0.3436 is 2.706, whose corresponding p-value is 0.10, so our χ² value lies between χ²_{0.90} and χ²_{0.10}. This confirms that the p-value calculated above is between 0.10 and 0.90, and therefore not smaller than 0.05.
(Figure: critical values of chi-square (χ²).)
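The table lookup can be sketched in a few lines. The critical values below are the standard upper-tail values for one degree of freedom; the function name and the choice of tail probabilities are assumptions made for this illustration.

```python
# Upper-tail critical values of the chi-square distribution, 1 degree of freedom:
# (tail probability, critical value), ordered by increasing critical value.
CRITICAL_DF1 = [(0.90, 0.016), (0.10, 2.706), (0.05, 3.841), (0.01, 6.635)]

def bracket_p_value(stat):
    """Return (lower, upper) bounds on the p-value implied by the table."""
    lower, upper = 0.0, 1.0
    for tail_prob, critical in CRITICAL_DF1:
        if stat >= critical:
            upper = tail_prob   # statistic is at or beyond this critical value
        else:
            lower = tail_prob   # first critical value the statistic does not reach
            break
    return lower, upper

low, high = bracket_p_value(0.3436)
print(f"p-value lies between {low} and {high}")
```

A table can only bracket the p-value; the exact figure still has to be calculated.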
We find no significant evidence against the null hypothesis, so we can't reject it. This means the number of overheating phones found in the survey is not significantly different from what we observed during our production process.
Scenario 2:
From the experiment, the chi-square (χ²) value is 16.8384 and the calculated p-value is 4.10E-05. The p-value is smaller than α.
We can also look up the critical values of χ² in the table provided below. Our value χ² = 16.83 falls beyond (to the right of) χ²_{0.01}, which is 6.635 for one degree of freedom, so the corresponding p-value is less than 0.01. This confirms that the p-value calculated above is less than 0.01, and therefore also less than 0.05.
(Figure: critical values of chi-square (χ²).)
We reject the null hypothesis. The number of overheating phones found in the survey is significantly different from what we observed during our production process, and the difference is unlikely to be due to random events alone.
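The statistics reported for the two scenarios can be reproduced with a short sketch. It assumes the survey found 20 and 50 overheating phones out of 500 (the figures quoted in the reader comments). The reported values of 0.3436 and 16.8384 appear to match the statistic computed on percentages (for example 4% observed against 3% expected); the sketch also computes it on raw counts, which is the usual convention and gives larger values. The helper names are illustrative.

```python
from math import erfc, sqrt

def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(stat):
    """Upper-tail p-value for 1 degree of freedom: erfc(sqrt(stat / 2))."""
    return erfc(sqrt(stat / 2))

n, p0 = 500, 0.03
scenarios = {"Scenario 1": 20, "Scenario 2": 50}  # assumed overheating counts

results = {}
for name, k in scenarios.items():
    # Statistic on percentages (reproduces the values reported above)
    stat_pct = chi_square_gof([100 * k / n, 100 * (n - k) / n],
                              [100 * p0, 100 * (1 - p0)])
    # Statistic on raw counts (the usual convention)
    stat_cnt = chi_square_gof([k, n - k], [n * p0, n * (1 - p0)])
    results[name] = (stat_pct, p_value_df1(stat_pct),
                     stat_cnt, p_value_df1(stat_cnt))
    print(f"{name}: percentages chi2={stat_pct:.4f} p={p_value_df1(stat_pct):.3g}; "
          f"counts chi2={stat_cnt:.4f} p={p_value_df1(stat_cnt):.3g}")
```

Under these assumptions the decision at α = 0.05 is the same either way: Scenario 1 is not significant and Scenario 2 is.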
Outcome of the Problem Statement:
For the 1st scenario we fail to reject the null hypothesis: the survey gives no evidence that our phones have a manufacturing problem.
For the 2nd scenario we have to check the phones for a manufacturing problem, as we have strong evidence against the null hypothesis.
Conclusion:
A hypothesis test never proves the alternate hypothesis true. You can only reject, or fail to reject, the null hypothesis. Rejecting it means the data give only weak support for the null hypothesis. Hence the experiment in Scenario 2 suggests that the phones are not overheating because of random events alone, and that there may be an underlying cause that needs to be addressed. Along with the critical values of the test, the p-value and the confidence level play a major role in hypothesis testing.
Dear readers, I will be pleased to receive your comments/suggestions on this post. Please feel free to post.
Thank you.
Comment
Amlan,
P value can be important in small scale research, but it loses meaning when you have n >10,000. Effect size is also needed to see if your p value has any real impact. For Chi-square, effect size is estimated by phi. Phi = sqrt(ChiSquare/n). In the first and second case, your phi is small, meaning that although you had a significant result, the effect was small. In practical terms, if you react to this result, you may end up costing the company more in fixing the issue than it is worth. The p value of your second case is .00004, far less than your alpha of .05 - a significant result. The phi however is .18, which is a fairly small effect size. Statistically significant is different from real world impact. Something we have to keep in mind. Always look for effect size along with P value. P is just one of several outcome parameters we look at in stats.
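The phi values mentioned in this comment follow directly from the formula phi = sqrt(ChiSquare/n). A minimal sketch, using the chi-square values reported in the post and n = 500 (the function name is illustrative):

```python
from math import sqrt

def phi_effect_size(chi_square, n):
    """Effect size for a chi-square test: phi = sqrt(chi_square / n)."""
    return sqrt(chi_square / n)

for label, chi_square in [("Scenario 1", 0.3436), ("Scenario 2", 16.8384)]:
    print(f"{label}: phi = {phi_effect_size(chi_square, 500):.3f}")
```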
This post demonstrates the poverty of frequentist statistical inference. First, the procedure does not answer the question. The engineers' question was "Given the data y, what's Pr(theta >= .03 | y)?". The hypothesis testing procedure addresses the question "What's Pr(y | theta=.03)"? Answering the first question by the second can lead to error.
Consider the Bayesian approach.
Assume the propensity to failure follows a beta density.
Assume further the engineers' beliefs are beta(.3, 9.7), so that the mean propensity to failure is 3% and they are 95% confident that the propensity lies between 0 and 18%. Their prior probability that the propensity (designated theta) is greater than 3% is 28%.
According to the first scenario, the observed failure rate is 20 out of 500. Using Bayes' theorem, the posterior is beta(20+.3, 480+9.7). The posterior mean is 4% and they are now 95% confident that it lies between 2% and 7%. Their posterior Pr(theta>.03 | y= 20 out of 500) = .88. This is roughly three times their prior probability of 28%, and in a business and safety climate might be worrisome.
According to the second scenario, the observed failure rate is 50 out of 500. Using Bayes' theorem, the posterior is beta(50+.3, 450+9.7). The posterior mean is 10% and the engineers are now 95% confident that it lies between 8% and 13%. Their posterior Pr(theta>.03 | y= 50 out of 500) = 1. In this scenario, the engineers can clearly conclude that the propensity to failure exceeds the 3% threshold.
The Bayesian approach gives a clear answer to the question posed. The engineers' prior information is taken into account. The confidence interval is interpreted naturally as the probability that an uncertain parameter falls within fixed bounds (and not as one in a hypothetical infinite sequence of such intervals that cover the unknown parameter 95% of the time). The author makes this common misinterpretation of frequentist confidence intervals.
Furthermore, the Bayesian and frequentist disagree on the import of the evidence from the first scenario. The frequentist makes the dogmatic claim that there is insufficient evidence to worry. The Bayesian makes a more nuanced claim that the evidence is sufficiently but not definitively worrisome.
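The probabilities quoted in this comment can be checked numerically. The sketch below assumes the beta(.3, 9.7) prior and the 20 and 50 out of 500 counts stated above, and integrates the beta density from 0.03 to 1 with Simpson's rule. The integration method, step count, and function names are choices made for this illustration, not part of the comment.

```python
from math import exp, lgamma, log

def beta_pdf(theta, a, b):
    """Density of a beta(a, b) distribution at theta."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0  # adequate here: only evaluated on [0.03, 1] with b > 1
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))

def prob_above(threshold, a, b, steps=20000):
    """P(theta > threshold) via composite Simpson's rule on [threshold, 1]."""
    h = (1.0 - threshold) / steps  # steps must be even
    total = beta_pdf(threshold, a, b) + beta_pdf(1.0, a, b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(threshold + i * h, a, b)
    return total * h / 3

prior = prob_above(0.03, 0.3, 9.7)
posterior_1 = prob_above(0.03, 20 + 0.3, 480 + 9.7)  # 20 failures out of 500
posterior_2 = prob_above(0.03, 50 + 0.3, 450 + 9.7)  # 50 failures out of 500

print(f"Prior Pr(theta > .03)       = {prior:.2f}")
print(f"Scenario 1 Pr(theta > .03)  = {posterior_1:.2f}")
print(f"Scenario 2 Pr(theta > .03)  = {posterior_2:.4f}")
```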
Hi Amlan,
Thank you. I have a doubt that I hope you can clarify.
If the p-value (not alpha) is 1%, which one of these is correct?
Hi Amlan
It is a very interesting example. Would you answer the following questions?
How do you calculate the p-value (is there a formula)?
If the p-value is 1%, which one of these is correct?
1. The probability that null hypothesis is true is 1 in 100
2. The probability that null hypothesis is false is 1 in 100
3. The probability that alternative hypothesis is true is 1 in 100
4. The probability of getting data as extreme as we have done is 1 in 100 if the null hypothesis is true
© 2018 Data Science Central™ Powered by