]]>

]]>

]]>

]]>

I’m unsure if it is a common misconception, but I know I used to believe that the only thing needed to present findings from statistical analysis was a P-value less than the chosen alpha; that if there was evidence to reject the null hypothesis it could be concluded that a meaningful discovery had been made. What I learned during the Mod 3 project is that statistical significance is not indicative of a substantive significance in what is being measured or compared. In order to determine the true impact of a test drug on treatment or a new webpage design on sales, etc., it is important to also calculate effect size.Effect size is a quantitative measure of the strength of a test group and is a description of the size of the difference or relationship between two compared samples. Picture two normal bell curves, representing two samples one is comparing. Imagine the samples have the same mean, so the curves lay atop one another. Now imagine that the mean of the alternate sample, the group hypothesized to have some statistical difference than a control group, increases. The alternate bell curve begins to slide along the x-axis away from the control curve and the distance between the vertical lines in the middle of each curve (representing their means) begins to widen. That gap between their means is the effect size.It is important to be able to present the effect because that will provide tangible value to statistical findings. An example is a restaurant owner wanting to know if she should renovate her New York City based location. The renovation will cost $100,000 and the owner needs to know if the cost will be covered by new business the renovation brings. A data scientist is contracted to see if renovated restaurants conduct more business than those left unrenovated. Data is collected and a t-test is run that finds a significant p-value, stating that yes, renovated restaurants do fair better. However, until the data scientist determines the effect size between the sample means, it would not be known if renovations return an additional $200 in annual revenue or an additional $200,000. The analysis would not offer the business owner the full picture about their investment.A common way to calculate effect size is to find Cohen’s d term. d is calculated by taking the difference in means of two samples divided by the standard deviations of the samples. Cohen determined d values of 0.2, 0.5, and 0.8 to denote small, medium, and large effect sizes, respectively. When comparing samples from different groups – e.g. test scores from students at schools in California vs students from Michigan – the samples might have different variance so standard deviation is pooled for the denominator.When two samples have an effect size of 0, the mean of the alternate group is at the 50th percentile of the control group and the bell curves would completely overlap. With an effect size of 0.8, the mean of the alternate group falls at the 79th percentile of the control group, so an average case of the alternate group would be higher than 79% of the control group.Absolute effect size can be useful when the units of the means we are comparing are well known – something like distance or cost. When presenting information on data that is comparing means of something with less meaning to a general audience, we can best understand effect size by calculating a standardized measure of effect, which transform the effect to a digestible scale. This could be useful if you were comparing the magnitudes of earthquakes in various regions of the world and wanted to determine how significant a difference was in their measure on the Richter scale. An additional metric that can be used when describing the outcome of statistical analysis when comparing two groups is Power. Power is a product of many things but what it tells us is how likely we are to reject the null Hypothesis when it is false, thereby giving Beta, the likelihood of committing a Type II error. Power considers effect size, the chosen alpha, and the sample size, and determines how often will we be able to find a statistically significant outcome that will provide enough evidence to reject a false null hypothesis (which is what any good statistician wants!). A generally acceptable level for power is 80% - if you cannot produce a power level of at least 80%, many statisticians would suggest you should not continue with the analysis.Here is a quick refresher on the potential outcomes of a statistical test and how α and β play in: Null Hypothesis is: TrueFalseJudgement of Null(Statistical Result)Reject Ho (p<.05)Type I ErrorFalse PositiveαCorrect inferenceTrue positive(1 – β)Fail to reject (p>.05)Correct inferenceTrue negative(1- α)Type II ErrorFalse NegativeβWhere α is alpha (probability of Type I Error) and β is beta (probability of Type II Error)A data scientist does not control the effect size, as that is dependent on the means of the two groups. However, a larger effect size will lead to a higher power of the significance test and reduce the probability of making a Type II Error (β). Because power = (1 – β), so if power goes from .8 to .9, that would be a reduction in β, the P(Type II Error) goes from .2 to .1.Operating with a lower power can present a problem when conducting study replication because we would treat a replication as an independent occurrence and have to multiply powers to find a combined likelihood of finding real effect. E.g. if both the original and the replication studies both have a power of 80%, then we are only 64% likely to detect a real effect in both.A helpful acronym to remember methods to increase the power of a test is APRONS. APRONS stands for:Relax alpha levelUse a parametric statisticIncrease the reliability of the measureUse one tailed testIncrease the sample size(n)Increase the sensitivity of the design and or analysisBelow are some visual examples of how relaxing the alpha level and increasing the sample size increase power:Given a sample of 20, an alpha level of 0.05, and a calculated effect size of 0.63 (a relatively medium effect size), you can see we are sure that the given sample size will help us avoid throwing a Type II error 80% of the time (power presents as the light blue shaded region):Fig. 1If we increase the acceptable rate for a Type I error by increasing alpha to 0.1, we see that power increases to .88:Fig. 2By increasing alpha, we are pulling the Zcrit line to the left, thereby increasing the amount of the alternative distribution that lies to the right of the Zcrit threshold.And finally, if we were to maintain our alpha level, we could also choose to increase sample size to 30, thereby increasing power from .8 to .93:Fig. 3By increasing sample size, one can effectively decrease the standard error between the two groups and reduce the amount of overlap between them, which will also cause the range and variability of the sample distributions to decrease.While using Python, one can use the TTestIndPower method, found in statsmodels.stats.power package, to input three known metrics of alpha, effect size, desired power, and sample size in order to determine a fourth. The method completes Statistical Power calculations for t-test for two independent samples using pooled variance.Hopefully this provides a better idea of why calculating effect size and knowing the power of a test is important for statistical analysis.Sources:Information on p-value: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/Charts: https://rpsychologist.com/d3/NHST/See More

By the time I graduated university four years ago, I sensed that maybe I had missed out on something by not fitting any computer science or coding courses into my schedule. “Computer Science: two words that still held every ounce of mystery when I received my diploma as they had when I first met a few of the guys who lived down the hall that were majoring in it. Wasn’t computer science just something you studied if you wanted to work in IT and ask befuddled office workers, “Have you tried turning it off and back on again?” Soon after entering the professional working world, I understood that computer science was, in fact, a whole new way to work with information ……My math skills in high school were strong enough to be accepted into three top engineering schools—I had plans to pursue a career as a civil engineer. However, after my first year, I decided that pursuing a course of study that was so rooted in numbers and theory was cutting out the part of my personality that excels in approaching conflict with open eyes, gathering information and facts from multiple involved parties, and developing solutions with optimal outcomes for all stakeholders. Thus, in my second year at university, I pivoted out of engineering into Political Science, hitting a stride with a balanced curriculum of political theory and economics.Needing a new outlet for my future work, I realized I didn’t need to look further than my own backyard –Harlem. There was (and still is!) a national need for quality public education to combat the systemic under-education of traditionally underserved minority communities, especially in urban areas. Charter schools looked like the solution to me, so I jumped aboard the train.Over the last 4 years, I have worked at two district-sized charter systems in New York City–two systems that couldn’t be more different from each other in so many ways: Their metrics of success, rates of growth, ability to be self-reflective as an organization, office culture—you name it. I’ve been able to work in positions focused on performance of the organization and in positions that focused on the performance of the people that make the organization. No matter the approach, the goal is the same: offering kids the opportunity to become the first generation of college graduates in their families. One thing I’ve learned is that numbers talk. If you can prove there is a problem area that has negative impact on the organization, or can show that a small positive occurrence could drive positive results if multiplied, leadership is often willing to devote the necessary resources. This understanding is what awakened and then sustained my interest to learn to paint pictures with information. I want to dig into the nitty gritty and uncover the drivers of the deepest impact. Unfortunately, my toolbox has been somewhat limited. Spreadsheets don’t make great brushes–but I have had the opportunity to play with some of the tools of a true artist. …For most people, finishing a graduate degree means you are done being tested forever. Not the teachers at our schools. My current role has me assessing the performance of teachers; and let me tell you, we’re tough graders. What helps me get comfortable with the rigor of our instructional staff assessments is the thought that, eventually, I will want the best-of-the-best teaching my kids. That, and we have some seriously sweet perks for teachers who knock it out of the park.So, to capture all the great work these often-under-appreciated professionals are doing, I’ve learned a little SQL and figured out how to build a report in Tableau. It’s exciting to wrangle information together and present it, and the self-teaching that has accompanied learning these programs helps me relate to all of the students in our schools. Personally and professionally, I have always gained satisfaction when I can give someone a helpful answer; so imagine my appreciation for these newfound tools and the answers they could help me deliver.These are the exciting days for me in the office, because it is when I am doing this work that I get back to learning and challenging my mind. I’ve been exposed to new concepts that have started to build off one another–basic data structures, data types, joins, etc. When I am given an assignment for a report to build, I open my small toolbox and get to work. And when I hit a groove in these projects, or when I hit a block that I eventually solve, I’m transported back in time. Once again, I am 10 years old, sitting on my living room floor, with my 12-gallon tub of LEGO pieces spread out in front of me, and the world is a blank slate for me to build upon. Just like sorting the parts out to build the next great structure, I can sort information into helpful piles and look for the piece that will bring everything together to complete my masterpiece. There is a sense of freedom knowing that I have access to instruction manuals written by fellow builders, some master, some junior, all creators. I wanted to just glance at a picture to get an idea for a project, and when I needed to really follow some instructions step-by-step to get what was on the box—and believe me, I'll be reading a lot of instructions for the time being! The first few steps along the path of learning Data Science—dabbling in SQL and Python, differentiating data by types, building math equations with pandas functions—have solidified this vision in my mind. It is incredibly exciting for me because I know that while today, I’m stacking big blocks to build rudimentary houses, soon I’ll be constructing spaceships.P.S. Yes, I still have that 12-gallon tub of LEGOs.See More

By the time I graduated university four years ago, I sensed that maybe I had missed out on something by not fitting any computer science or coding courses into my schedule. “Computer Science: two words that still held every ounce of mystery when I received my diploma as they had when I first met a few of the guys who lived down the hall that were majoring in it. Wasn’t computer science just something you studied if you wanted to work in IT and ask befuddled office workers, “Have you tried turning it off and back on again?” Soon after entering the professional working world, I understood that computer science was, in fact, a whole new way to work with information ……My math skills in high school were strong enough to be accepted into three top engineering schools—I had plans to pursue a career as a civil engineer. However, after my first year, I decided that pursuing a course of study that was so rooted in numbers and theory was cutting out the part of my personality that excels in approaching conflict with open eyes, gathering information and facts from multiple involved parties, and developing solutions with optimal outcomes for all stakeholders. Thus, in my second year at university, I pivoted out of engineering into Political Science, hitting a stride with a balanced curriculum of political theory and economics.Needing a new outlet for my future work, I realized I didn’t need to look further than my own backyard –Harlem. There was (and still is!) a national need for quality public education to combat the systemic under-education of traditionally underserved minority communities, especially in urban areas. Charter schools looked like the solution to me, so I jumped aboard the train.Over the last 4 years, I have worked at two district-sized charter systems in New York City–two systems that couldn’t be more different from each other in so many ways: Their metrics of success, rates of growth, ability to be self-reflective as an organization, office culture—you name it. I’ve been able to work in positions focused on performance of the organization and in positions that focused on the performance of the people that make the organization. No matter the approach, the goal is the same: offering kids the opportunity to become the first generation of college graduates in their families. One thing I’ve learned is that numbers talk. If you can prove there is a problem area that has negative impact on the organization, or can show that a small positive occurrence could drive positive results if multiplied, leadership is often willing to devote the necessary resources. This understanding is what awakened and then sustained my interest to learn to paint pictures with information. I want to dig into the nitty gritty and uncover the drivers of the deepest impact. Unfortunately, my toolbox has been somewhat limited. Spreadsheets don’t make great brushes–but I have had the opportunity to play with some of the tools of a true artist. …For most people, finishing a graduate degree means you are done being tested forever. Not the teachers at our schools. My current role has me assessing the performance of teachers; and let me tell you, we’re tough graders. What helps me get comfortable with the rigor of our instructional staff assessments is the thought that, eventually, I will want the best-of-the-best teaching my kids. That, and we have some seriously sweet perks for teachers who knock it out of the park.So, to capture all the great work these often-under-appreciated professionals are doing, I’ve learned a little SQL and figured out how to build a report in Tableau. It’s exciting to wrangle information together and present it, and the self-teaching that has accompanied learning these programs helps me relate to all of the students in our schools. Personally and professionally, I have always gained satisfaction when I can give someone a helpful answer; so imagine my appreciation for these newfound tools and the answers they could help me deliver.These are the exciting days for me in the office, because it is when I am doing this work that I get back to learning and challenging my mind. I’ve been exposed to new concepts that have started to build off one another–basic data structures, data types, joins, etc. When I am given an assignment for a report to build, I open my small toolbox and get to work. And when I hit a groove in these projects, or when I hit a block that I eventually solve, I’m transported back in time. Once again, I am 10 years old, sitting on my living room floor, with my 12-gallon tub of LEGO pieces spread out in front of me, and the world is a blank slate for me to build upon. Just like sorting the parts out to build the next great structure, I can sort information into helpful piles and look for the piece that will bring everything together to complete my masterpiece. There is a sense of freedom knowing that I have access to instruction manuals written by fellow builders, some master, some junior, all creators. I wanted to just glance at a picture to get an idea for a project, and when I needed to really follow some instructions step-by-step to get what was on the box—and believe me, I'll be reading a lot of instructions for the time being! The first few steps along the path of learning Data Science—dabbling in SQL and Python, differentiating data by types, building math equations with pandas functions—have solidified this vision in my mind. It is incredibly exciting for me because I know that while today, I’m stacking big blocks to build rudimentary houses, soon I’ll be constructing spaceships.P.S. Yes, I still have that 12-gallon tub of LEGOs.See More