<p><strong>p-value and level of significance explained</strong></p> <p>Smrati Sharma, Data Science Central, 30 November 2017</p> <p>The concepts of p-value and level of significance are vital components of hypothesis testing and of more advanced methods such as regression. However, they can be tricky to understand, especially for beginners, and a good understanding of them goes a long way toward understanding advanced concepts in statistics and econometrics. Here, we try to explain them in a simple, logical manner. Hope this helps.</p> <p><strong>P-value</strong></p> <p>In hypothesis testing, we set a <strong>null hypothesis</strong> (let's say that the population mean is 10) and then test this hypothesis using a sample. After running the test, we get a result (let's say that the sample mean is 12). 
Next, we compute a probability: if the <strong>population mean</strong> really were 10, how likely is it that we would get a <strong>sample mean</strong> at least as far from 10 as 12 is?</p> <p>If that <strong>probability</strong> is too <strong>low</strong>, we <strong>reject</strong> the null hypothesis, that is, we say that, based on the current evidence, the null hypothesis is <strong>not true</strong>. If that probability is too <strong>high</strong>, we <strong>do not reject</strong> (informally, <strong>accept</strong>) the null hypothesis, that is, we say that the current evidence is <strong>consistent</strong> with it. This does not prove that the null hypothesis is true; it only means that the data give us no strong reason to doubt it. This probability is the <strong>p-value</strong>. It is a result that we obtain after conducting our statistical test (e.g. a t-test on a mean, or the tests on the coefficients of a regression).</p> <p>To explain further, it is important to understand what we are trying to do. We have a population. We are <strong>assuming</strong> something about that population (let's say that its mean is 10), and we want to <strong>test</strong>, using a given sample, whether it is true that the mean is 10. How do we do that? We perform our statistical test on the sample (NOT the population) and get a result. Let's say the result is a sample mean of 12.</p> <p>Now, it is important to separate what we have <strong>assumed</strong> from what we have <strong>obtained</strong>. We have assumed that the population mean is 10, and we have obtained a sample mean of 12. In other words, the hypothesized <strong>population mean</strong> is an <strong>assumption</strong>, a possibility: it is what we are assuming the value to be. The <strong>sample mean</strong> is a <strong>result</strong>: it is what we actually obtained from our data.</p> <p>Now we have to check whether what we have obtained (the sample mean) is <strong>consistent</strong> with what we have assumed (the population mean). 
In other words, what are the chances of getting this result (the sample mean) if the assumption (the population mean) is actually true? What are the chances of getting a sample mean as far from 10 as 12 is, or farther, if the population mean really is 10? That chance, or probability, is called the p-value.</p> <p>If the p-value is <strong>low</strong>, a sample mean like 12 would be very unlikely if the population mean really were 10. Thus, something is wrong. We take the sample mean as given, since it is what our sample data say. So the thing we call into question is the assumption about the population mean. In other words, it appears that the <strong>assumption</strong> that the population mean is 10 (our null hypothesis) is itself <strong>wrong</strong>, and we should <strong>reject</strong> it. In this case, we say that our result is <strong>SIGNIFICANT</strong>, which means we have concluded that our sample mean is <strong>significantly different</strong> from the hypothesized population mean.</p> <p>If the p-value is <strong>high</strong>, a sample mean like 12 could quite easily occur by chance even if the population mean were 10. Thus, the data give us no reason to doubt the assumption that the population mean is 10 (our null hypothesis), and we should <strong>not reject</strong> it (informally, <strong>accept</strong> it). In this case, we say that our result is <strong>INSIGNIFICANT</strong>, which means we have concluded that our sample mean is <strong>NOT significantly different</strong> from the hypothesized population mean.</p> <p><strong>Level of significance</strong></p> <p>The next question is: how do we know whether the p-value obtained from our statistical test is <strong>too high</strong> or <strong>too low</strong>, so that we should accept or reject the null hypothesis? 
Is 0.03 (3%) too low or too high? What about 0.07 (7%)?</p> <p>To decide whether the p-value is too low or too high, we have to set a <strong>standard</strong> (a checkpoint or benchmark). If the obtained p-value is <strong>lower</strong> than that standard, we conclude that the p-value is <strong>too low</strong>, that our results are <strong>significant</strong>, and that we should <strong>reject</strong> the null hypothesis. If the obtained p-value is <strong>higher</strong> than that standard, we conclude that the p-value is <strong>too high</strong>, that our results are <strong>insignificant</strong>, and that we should <strong>not reject</strong> (informally, <strong>accept</strong>) the null hypothesis.</p> <p>This standard or checkpoint is called the <strong>LEVEL OF SIGNIFICANCE</strong>. It is up to us, as statistical investigators, to choose the level of significance, and we should choose it before looking at the results of the test. Most often, a level of significance of 5% is chosen as standard practice. However, levels such as 1% and 10% can also be used.</p> <p>For example, if our p-value is <strong>0.07</strong>, we say that our results are <strong>insignificant at the 5% level</strong> (and we should not reject our null hypothesis at this level) but <strong>significant at the 10% level</strong> (and we should reject our null hypothesis at this level).</p>
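<p>To make the running example concrete, here is a minimal Python sketch. The post does not give a standard deviation or a sample size, so the sketch assumes (hypothetically) a known population standard deviation of 5 and a sample of 25 observations; with a known standard deviation, the simplest test is a one-sample z-test, which is what the sketch uses. It computes the two-sided p-value for a sample mean of 12 under the null hypothesis that the population mean is 10, checks that number by repeatedly sampling from a population in which the null hypothesis is true, and then compares the p-value with the 1%, 5% and 10% levels of significance. The function names <code>z_test_p_value</code>, <code>simulated_p_value</code> and <code>decide</code> are illustrative only.</p>

```python
import random
from statistics import NormalDist

# The null mean (10) and the sample mean (12) come from the post; the standard
# deviation and sample size are hypothetical assumptions for illustration.
MU_0 = 10.0   # null hypothesis: the population mean is 10
X_BAR = 12.0  # observed sample mean
SIGMA = 5.0   # assumed known population standard deviation
N = 25        # assumed sample size


def z_test_p_value(x_bar: float, mu_0: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test for a mean."""
    standard_error = sigma / n ** 0.5
    z = (x_bar - mu_0) / standard_error
    # Probability, if H0 is true, of a sample mean at least this far
    # from mu_0 in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_p_value(x_bar: float, mu_0: float, sigma: float, n: int,
                      trials: int = 20_000, seed: int = 0) -> float:
    """Estimate the same probability by sampling from a population where H0 holds."""
    rng = random.Random(seed)
    distance = abs(x_bar - mu_0)
    extreme = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu_0, sigma) for _ in range(n)) / n
        if abs(sample_mean - mu_0) >= distance:
            extreme += 1
    return extreme / trials


def decide(p_value: float, alpha: float) -> str:
    """Compare a p-value with a level of significance chosen in advance.

    Rejecting when p_value <= alpha is a common convention (an assumption here);
    the post only covers p-values strictly below or above the level.
    """
    return "reject H0" if p_value <= alpha else "fail to reject H0"


if __name__ == "__main__":
    p = z_test_p_value(X_BAR, MU_0, SIGMA, N)
    print(f"p-value (formula):    {p:.4f}")  # 0.0455
    print(f"p-value (simulation): {simulated_p_value(X_BAR, MU_0, SIGMA, N):.4f}")
    for alpha in (0.01, 0.05, 0.10):
        print(f"alpha = {alpha:.2f}: {decide(p, alpha)}")
    # The p = 0.07 example from the text.
    for alpha in (0.05, 0.10):
        print(f"p = 0.07, alpha = {alpha:.2f}: {decide(0.07, alpha)}")
```

<p>Under these assumptions the standard error is 5/√25 = 1, so z = 2 and the p-value is about 0.0455: significant at the 5% and 10% levels, but not at the 1% level. The simulated estimate should land close to the formula value, which is exactly the "chance of getting a sample mean this far from 10 if the population mean really is 10" described above. The last loop reproduces the p = 0.07 example: not rejected at 5%, rejected at 10%. With a different assumed standard deviation or sample size, the same sample mean of 12 would give a different p-value.</p>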