]]>

The Ultimate R Cheat Sheet, simply put, makes it easy to learn R. The Ultimate R Cheat Sheet saves you time by providing hyperlinks to the documentation and package-level cheat sheets for the most important packages in R. The latest update doubles its ultimateness by providing a second page that includes Special Topics:

- Time Series, Forecasting, and Financial Analysis
- Network Analysis and Natural Language Processing
- Geospatial Analysis
- Machine Learning and Deep Learning
- Speed and Scale, Interoperability, and Miscellaneous Topics

About The Ultimate R Cheat Sheet

The Ultimate R Cheat Sheet links to every document you need by including visuals for where each package fits in the process.

About The Upgrade - Special Topics (Page 2)

We’ve now added a second sheet that includes special topics.

Learn R for Business With The Ultimate R Cheat Sheet

Learn tools that will advance your career while generating business value for your organization.

- Learn by following the Ultimate R Cheat Sheet
- Follow a clear-cut path from data import to business reporting
- Learn dplyr and tidyr for cleaning and wrangling data
- Learn ggplot2 for visualization
- Learn rmarkdown for business reporting
- Learn 2 modeling techniques: regression and clustering

Like our cheat sheets? Get them all!

- BUSINESS SCIENCE PROBLEM FRAMEWORK (BSPF)
- ULTIMATE R CHEAT SHEET
- ULTIMATE PYTHON CHEAT SHEET

The Ultimate R Cheatsheet

We are developing a revolutionary new system for teaching Business Analysis with R (Business Analysis with R is a new course we are developing at Business Science University). The system is revolutionary for a number of reasons (we’ll get to these in a minute). The cornerstone of our teaching process is the Data Science with R Workflow that was originally taught by Hadley Wickham and Garrett Grolemund in the excellent book, R For Data Science. The Ultimate R Cheatsheet links the documentation, cheatsheets, and key resources available for every R package in the data science with R workflow into one meta-cheatsheet that illustrates the workflow.

The Ultimate R Cheatsheet is available here.

HOW TO USE THE CHEATSHEET

The cheatsheet contains every resource you need for referencing the tidyverse documentation in one spot. Let’s take a look.

THE WORKFLOW

The first thing you will notice is the workflow, which is prominently presented. You can see where the various R packages are used.

[Figure: Data Science Workflow]

LINKS TO DOCUMENTATION

Here’s the beauty of the R cheatsheet. With one click, you can easily get to the web documentation for any of the key tidyverse R packages.

[Figure: One-Click To Documentation]

LINKS TO PACKAGE CHEATSHEETS

By clicking “CS”, you can even get the individual R package cheatsheets. These are PDF documents maintained by RStudio that provide snapshots of the most important functions contained in each package.

[Figure: One-Click To Package Cheatsheets]

LINKS TO KEY RESOURCES

We didn’t stop at documentation and cheatsheets. We also added important references to get you up to speed quickly.

[Figure: One-Click To Important References]

LEARNING DATA SCIENCE FOR BUSINESS WITH R

To be efficient as a data scientist, you need to learn R. Take the course that has cut data science projects in half and has progressed data scientists more than anything they have tried before.

Over 10 weeks, you learn what has taken data scientists 10 years to learn:

- Our systematic data science for business framework
- R and H2O for Machine Learning
- How to produce Return-On-Investment from data science
- And much more.

]]>

]]>

]]>

One of the most difficult and most critical parts of implementing data science in business is quantifying the return-on-investment or ROI. In this article, we highlight three reasons you need to learn the Expected Value Framework, a framework that connects the machine learning classification model to ROI.

3 REASONS YOU NEED TO LEARN THE EXPECTED VALUE FRAMEWORK

Here are the 3 reasons you need to know about Expected Value if you want to tie data science to ROI for a machine learning classifier. We’ll use an example related to employee churn (also called employee turnover or employee attrition).

REASON #1: CLASSIFICATION MACHINE LEARNING ALGORITHMS OFTEN MAXIMIZE THE WRONG METRIC

F1 is the harmonic mean of precision and recall, so the threshold that maximizes F1 aims to reduce both the false positives and the false negatives, settling on a relative balance between the two. The problem is that, in business, the costs associated with false positives (Type 1 Errors) and false negatives (Type 2 Errors) are rarely equal. In fact, in many cases false negatives are much more costly (by a factor of 3 to 1 or more!).

EXAMPLE: COST OF TYPE 1 AND TYPE 2 ERRORS FOR EMPLOYEE ATTRITION

We develop a prediction algorithm that finds employees are 5X more likely to quit when they work too much overtime.

[Figure: Calculating Expected Attrition Cost From H2O + LIME Results]

We develop a proposal to reduce overtime using the extremely powerful H2O classification model along with LIME, which explains the results. Like many algorithms, by default we optimize by treating Type 1 and Type 2 errors equally. This ends up misclassifying people who quit as staying (Type 2 error) at roughly the same rate as we misclassify people who stay as leaving (Type 1 error). The cost of overtime reduction for an employee is estimated at 30% of the lost productivity if the employee quits.

However, this means the cost of reducing overtime incorrectly for someone who stays (Type 1 Error) is only 30% of the cost of missing someone who quits (Type 2 Error). In other words, Type 2 Errors are roughly 3x more costly than Type 1 Errors, yet we are treating them the same! The optimal threshold for business problems is almost always lower than the max-F1 threshold. This leads us to our second reason you need to know the Expected Value Framework.

REASON #2: THE SOLUTION IS MAXIMIZING FOR EXPECTED VALUE

When we have a calculation to determine the expected value using business costs, we can perform the calculation iteratively to find the optimal threshold that maximizes the expected profit or savings of the business problem. By iteratively calculating the savings generated at different thresholds, we can see which threshold optimizes the targeting approach.

In the detailed example, the threshold optimization results show that the maximum savings ($546K) occurs at a threshold of 0.149, which is 16% more than the savings at the max-F1 threshold ($470K). It’s worth mentioning that the threshold that maximizes F1 was 0.280, and that for a test set containing 15% of the total population, using it cost $76K due to being sub-optimal ($546K - $470K). Extending this inefficiency to the full population (train + test data), this is a missed opportunity of roughly $500K annually ($76K / 0.15 ≈ $507K)!

However, the model is based on a number of assumptions including the average overtime percentage, the anticipated net profit per employee, and so on.

REASON #3: EXPECTED VALUE CAN TEST FOR VARIABILITY IN ASSUMPTIONS

We can use Sensitivity Analysis along with Expected Value. We test the effect of model assumptions on the expected profit (or savings) associated with an employee quitting.

In the human resources example, we tested a range of values for average overtime percentage and net revenue per employee because our estimates for the future may be off.
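The sensitivity test described above can be sketched as a small grid search. This is a minimal Python illustration, not the article's code: the confusion-matrix counts, the cost constants, and the `policy_savings` formula are all hypothetical assumptions chosen only to show the mechanics of varying two assumptions at once.

```python
from itertools import product

# Hypothetical confusion-matrix counts at one fixed threshold (illustrative only).
TRUE_POSITIVES = 30    # targeted employees who would otherwise have quit
FALSE_POSITIVES = 40   # targeted employees who would have stayed anyway

# Hypothetical cost assumptions, not taken from the article.
REPLACEMENT_COST = 20_000.0       # assumed fixed cost to replace a leaver
LOST_PRODUCTIVITY_SHARE = 0.25    # assumed share of net revenue lost per leaver


def policy_savings(avg_overtime_pct, net_revenue_per_employee):
    """Expected savings of a targeted overtime policy under one set of assumptions."""
    attrition_cost = REPLACEMENT_COST + LOST_PRODUCTIVITY_SHARE * net_revenue_per_employee
    # Assumed cost of the policy: revenue forgone by removing overtime hours.
    policy_cost = avg_overtime_pct * net_revenue_per_employee
    targeted = TRUE_POSITIVES + FALSE_POSITIVES
    # Every targeted employee incurs the policy cost; only true positives avoid attrition.
    return TRUE_POSITIVES * attrition_cost - targeted * policy_cost


overtime_pcts = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
net_revenues = [200_000, 250_000, 300_000]

# One expected-savings value per combination of assumptions (the heat-map cells).
grid = {
    (ot, rev): policy_savings(ot, rev)
    for ot, rev in product(overtime_pcts, net_revenues)
}

# Print the grid as a plain-text table: rows are overtime percentages, columns are revenues.
print("overtime".ljust(10) + "".join(f"{rev:>14,}" for rev in net_revenues))
for ot in overtime_pcts:
    row = "".join(f"{grid[(ot, rev)]:>14,.0f}" for rev in net_revenues)
    print(f"{ot:<10.0%}" + row)
```

Cells with positive values are combinations of assumptions under which the targeted policy would save money in this toy model; negative cells are combinations where it would not.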
In the Sensitivity Analysis Results, the profitability heat map shows that as long as the average overtime percentage is less than or equal to 25%, implementing a targeted overtime policy saves the organization money.

[Figure: Sensitivity Analysis Results (Profitability Heat Map)]

Wow! Not only can we test for the optimal threshold that maximizes the business case, we can also use expected value to test a range of inputs that vary from year to year and person to person. If you’re interested in learning how to apply the expected value framework for your business, we show you how, provide code, have a video, and show you other industries where this may apply on Business Science University.
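As a rough illustration of the threshold search described under Reason #2, the sketch below compares the threshold that maximizes F1 with the one that maximizes expected savings. It is written in Python with invented toy data; the `Costs` values, the `expected_savings` formula, and the assumption that reducing overtime always retains a would-be leaver are hypothetical simplifications, not the article's H2O-based calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-employee business costs (illustrative values only)."""
    attrition_cost: float = 100_000.0  # assumed cost when an employee quits
    policy_cost_ratio: float = 0.30    # overtime-reduction cost as a share of attrition cost


def expected_savings(probs, quits, threshold, costs=Costs()):
    """Savings from targeting everyone with p >= threshold, versus doing nothing.

    Simplifying assumption: reducing overtime always retains a would-be leaver.
    """
    policy_cost = costs.policy_cost_ratio * costs.attrition_cost
    savings = 0.0
    for p, will_quit in zip(probs, quits):
        if p >= threshold:
            savings -= policy_cost               # every targeted employee costs the policy
            if will_quit:
                savings += costs.attrition_cost  # true positive: attrition avoided
    return savings


def f1_score(probs, quits, threshold):
    """F1 of the rule 'predict quit when p >= threshold'."""
    tp = sum(1 for p, y in zip(probs, quits) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, quits) if p >= threshold and not y)
    fn = sum(1 for p, y in zip(probs, quits) if p < threshold and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, quits, score_fn, steps=100):
    """Evaluate score_fn on an evenly spaced grid; return the first best threshold."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: score_fn(probs, quits, t))


# Invented toy predictions: quit probability and whether the employee actually quit.
toy_probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70, 0.90]
toy_quits = [False, False, True, False, False, True, False, True, True, True]

t_f1 = best_threshold(toy_probs, toy_quits, f1_score)
t_ev = best_threshold(toy_probs, toy_quits, expected_savings)

print("max-F1 threshold:", t_f1, "savings:", expected_savings(toy_probs, toy_quits, t_f1))
print("max-EV threshold:", t_ev, "savings:", expected_savings(toy_probs, toy_quits, t_ev))
```

On this toy data the savings-maximizing threshold lands below the max-F1 threshold, the same direction the article reports (0.149 versus 0.280), though the numbers here are invented and the gap depends entirely on the assumed costs.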
