An example machine learning notebook

This notebook was written by Dr. Randal S. Olson and shared on GitHub. In it, Randal walks through a basic Python data analysis pipeline from start to finish to show what a typical data science workflow looks like. Besides providing code examples, he hopes to instill good practices that will make you a more effective, and more collaborative, data scientist. Randal follows the data analysis checklist from The Elements of Data Analytic Style, which he strongly recommends as a free and quick guide to performing outstanding data analysis.

In the time it took you to read this sentence, terabytes of data have been collectively generated across the world: more data than any of us could ever hope to process, much less make sense of, on the machines we’re using to read this notebook.

In response to this massive influx of data, the field of Data Science has come to the forefront in the past decade. Cobbled together by people from a diverse array of fields (statistics, physics, computer science, design, and many more), the field of Data Science represents our collective desire to understand and harness the abundance of data around us to build a better world.

Table of contents:

  1. Introduction

  2. License

  3. Required libraries

  4. The problem domain

  5. Step 1: Answering the question

  6. Step 2: Checking the data

  7. Step 3: Tidying the data

  8. Step 4: Exploratory analysis

  9. Step 5: Classification

  10. Step 6: Reproducibility

  11. Conclusions

  12. Further reading

  13. Acknowledgements
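
The steps in the outline above can be pictured as a chain of small functions, one per step. The sketch below is a minimal, standard-library-only illustration under stated assumptions, not Randal's code: the toy records, the feature names `x1` and `x2`, the helper functions (`check`, `tidy`, `explore`, `split`, `classify`, `run`), and the nearest-centroid classifier are all hypothetical stand-ins chosen for brevity. The notebook's own dataset and tooling are not reproduced here.

```python
import random
import statistics

# Step 1 (hypothetical question): can two numeric features, x1 and x2,
# distinguish class "a" from class "b"? These records are made-up toy
# data for illustration only.
RAW = [
    {"x1": 1.0, "x2": 1.2, "label": "a"},
    {"x1": 0.9, "x2": 1.1, "label": "A "},  # inconsistent label spelling
    {"x1": 1.1, "x2": None, "label": "a"},  # missing value
    {"x1": 1.2, "x2": 0.8, "label": "a"},
    {"x1": 3.0, "x2": 3.1, "label": "b"},
    {"x1": 3.2, "x2": 2.9, "label": "b"},
    {"x1": 2.8, "x2": 3.3, "label": "B"},   # inconsistent label spelling
    {"x1": 3.1, "x2": 3.0, "label": "b"},
]
FEATURES = ("x1", "x2")


def check(rows):
    """Step 2: return the indices of rows with a missing feature value."""
    return [i for i, r in enumerate(rows) if any(r[f] is None for f in FEATURES)]


def tidy(rows):
    """Step 3: drop incomplete rows and normalize label spelling."""
    bad = set(check(rows))
    return [
        {**r, "label": r["label"].strip().lower()}
        for i, r in enumerate(rows)
        if i not in bad
    ]


def explore(rows):
    """Step 4: per-class mean of each feature (the class centroids)."""
    labels = sorted({r["label"] for r in rows})
    return {
        lab: tuple(
            statistics.mean(r[f] for r in rows if r["label"] == lab)
            for f in FEATURES
        )
        for lab in labels
    }


def split(rows, seed):
    """Stratified split: roughly 2/3 of each class goes to training."""
    rng = random.Random(seed)  # Step 6: a fixed seed makes the split reproducible
    train, test = [], []
    for lab in sorted({r["label"] for r in rows}):
        group = [r for r in rows if r["label"] == lab]
        rng.shuffle(group)
        k = len(group) * 2 // 3  # integer arithmetic avoids float rounding
        train += group[:k]
        test += group[k:]
    return train, test


def classify(train, test):
    """Step 5: nearest-centroid classifier; returns accuracy on the test set."""
    centroids = explore(train)

    def predict(r):
        # Pick the class whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda lab: sum(
                (r[f] - c) ** 2 for f, c in zip(FEATURES, centroids[lab])
            ),
        )

    return sum(predict(r) == r["label"] for r in test) / len(test)


def run(seed=42):
    """Run steps 2 to 6 end to end and return the test accuracy."""
    train, test = split(tidy(RAW), seed)
    return classify(train, test)


if __name__ == "__main__":
    print("rows with missing values:", check(RAW))
    print("class means:", explore(tidy(RAW)))
    print("test accuracy:", run())
```

On this toy data the two classes are well separated, so `run()` reports an accuracy of 1.0 for any seed. The split is stratified so that every class appears in the training set; on real data the seed matters more, because it changes which rows land in the test set, which is exactly why fixing it supports reproducibility.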

