
88 Resources & Tools to Become a Data Scientist

Harvard Business Review has called data scientist the sexiest job of the 21st century. In this article, with the assistance of Octoparse V7, we have aggregated the resources and tools that you may need to become a data scientist.

1. Learning resources: Courses, Degrees/Certificates, Books

2. Tools: Data Extractors, Data Analytics, Reporting

3. Data Science Competitions/Programs

 

 

20 Online Courses about Data Science

 

1. Data Science Specialization

Creator: Johns Hopkins University

This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.

https://www.coursera.org/specializations/jhu-data-science

 

2. Introduction to Data Science in Python

Creator: University of Michigan

This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library.

https://www.coursera.org/learn/python-data-analysis
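
As a small taste of the techniques this course description names, the sketch below reads CSV data and ranks it with a lambda. The sample data and the column names are made up for illustration, and the sketch deliberately sticks to Python's standard library rather than NumPy.

```python
import csv
import io

# Hypothetical sample data; a real script would open a file instead.
SAMPLE_CSV = """name,score
Ada,91
Grace,88
Linus,79
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)

# A lambda is a small anonymous function; here it supplies the sort key.
ranked = sorted(rows, key=lambda row: int(row["score"]), reverse=True)
top_names = [row["name"] for row in ranked]

# Mean score, computed by hand from the parsed rows.
mean_score = sum(int(row["score"]) for row in rows) / len(rows)

print(top_names, mean_score)
```

Note that csv.DictReader returns every field as a string, which is why the scores are converted with int() before sorting or averaging.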

 

3. Applied Plotting, Charting & Data Representation in Python

Creator: University of Michigan

This course will introduce the learner to information visualization basics, with a focus on reporting and charting using the matplotlib library.

https://www.coursera.org/learn/python-plotting

 

 

4. Applied Machine Learning in Python

Creator: University of Michigan

This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods.

https://www.coursera.org/learn/python-machine-learning

 

5. Applied Text Mining in Python

 

6. Applied Social Network Analysis in Python

Creator: University of Michigan

This course will introduce the learner to network analysis through tutorials using the NetworkX library.

This course should be taken after: Introduction to Data Science in Python, Applied Plotting, Charting & Data Representation in Python, and Applied Machine Learning in Python.

https://www.coursera.org/learn/python-social-network-analysis
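
To give a feel for what network analysis involves, here is a minimal sketch of degree centrality, one of the basic measures covered by libraries such as NetworkX. It uses plain dictionaries instead of NetworkX itself, the edge list is invented for the example, and it assumes an undirected graph without self-loops.

```python
from collections import defaultdict

# Hypothetical friendship edges, invented for this example.
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]


def degree_centrality(edges):
    """Return each node's degree divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        # Undirected graph: record the link in both directions.
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


centrality = degree_centrality(EDGES)
most_central = max(centrality, key=centrality.get)

print(most_central, centrality[most_central])
```

A node connected to every other node scores 1.0, so the measure is a quick way to spot the best-connected members of a small network.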

 

7. What is Data Science?

Creator: IBM

In this course, we will meet some data science practitioners and we will get an overview of what data science is today.

https://www.coursera.org/learn/what-is-datascience

 

8. Open Source tools for Data Science

Creator: IBM

In this course, you'll learn about Jupyter Notebooks, RStudio IDE, Apache Zeppelin and Data Science Experience.

https://www.coursera.org/learn/open-source-tools-for-data-science

 

9. Data Science Methodology

Creator: IBM

You will learn:

- The major steps involved in tackling a data science problem.

- The major steps involved in practicing data science, from forming a concrete business or research problem, to collecting and analyzing data, to building a model, and understanding the feedback after model deployment.

- How data scientists think!

https://www.coursera.org/learn/data-science-methodology

 

10. Applied Data Science

Creator: IBM

This action-packed specialization is for data science enthusiasts who want to acquire practical skills for real-world data problems. It appeals to anyone who is interested in pursuing a career in data science and already has foundational skills (or has completed the Introduction to Applied Data Science specialization). You will learn Python, with no prior programming knowledge necessary. You will then learn data visualization and data analysis. Through our guided lectures, labs, and projects you’ll get hands-on experience tackling interesting data problems.

https://www.coursera.org/specializations/applied-data-science

 

11. Databases and SQL for Data Science

The purpose of this course is to introduce relational database concepts and help you learn and apply knowledge of the SQL language. It is also intended to get you started with performing SQL access in a data science environment.

https://www.coursera.org/learn/sql-data-science
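
As an illustration of SQL access from a data science environment, the following sketch runs a grouped query against an in-memory SQLite database using Python's built-in sqlite3 module. The table, its columns, and the rows are hypothetical and are not taken from the course.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, provider TEXT, weeks INTEGER)")

# Hypothetical rows, inserted with placeholders rather than string formatting.
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("Intro SQL", "A", 4), ("Stats 101", "B", 6), ("Advanced SQL", "A", 5)],
)

# Count the courses and total the weeks for each provider.
totals = conn.execute(
    "SELECT provider, COUNT(*), SUM(weeks) "
    "FROM courses GROUP BY provider ORDER BY provider"
).fetchall()
conn.close()

print(totals)
```

The same SELECT ... GROUP BY pattern carries over to other relational databases; only the connection code would differ.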

 

12. Data Science Math Skills

Duke University

This course is designed to teach learners the basic math you will need in order to be successful in almost any data science math course and was created for learners who have basic math skills but may not have taken algebra or pre-calculus.

https://www.coursera.org/learn/datasciencemathskills

 

13. Data Science: Wrangling

HarvardX

This course covers several standard steps of the data wrangling process like importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining.

https://www.edx.org/course/data-science-wrangling-harvardx-ph125-6x

 

14. Data Science: Productivity Tools

HarvardX

https://www.edx.org/course/data-science-productivity-tools-harvardx...

 

15. Data Science Research Methods: Python Edition

Microsoft

https://www.edx.org/course/research-methods-for-data-science-python...

 

16. How to Win a Data Science Competition: Learn from Top Kagglers

Created by: National Research University Higher School of Economics

If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience and improve and harness your data modelling skills in various domains such as credit, insurance, marketing, natural language processing, sales forecasting, and computer vision, to name a few.

https://www.coursera.org/learn/competitive-data-science

 

17. Introduction to Computational Thinking and Data Science

Instructors: Prof. Eric Grimson; Prof. John Guttag; Dr. Ana Bell

6.0002 is the continuation of 6.0001 Introduction to Computer Science and Programming in Python and is intended for students with little or no programming experience. It aims to provide students with an understanding of the role computation can play in solving problems and to help students, regardless of their major, feel justifiably confident of their ability to write small programs that allow them to accomplish useful goals. The class uses the Python 3.5 programming language.

https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/

 

18. Introduction to Computer Science and Programming in Python

Instructors: Dr. Ana Bell; Prof. Eric Grimson; Prof. John Guttag

Introduction to Computer Science and Programming in Python is intended for students with little or no programming experience. It aims to provide students with an understanding of the role computation can play in solving problems and to help students, regardless of their major, feel justifiably confident of their ability to write small programs that allow them to accomplish useful goals. The class uses the Python 3.5 programming language.

 

https://ocw.mit.edu/courses/electrical-engineering-and-computer-sci...

 

19. Statistical Thinking and Data Analysis

Instructor(s): Prof. Cynthia Rudin; Allison Chang (Teaching Assistant); Dimitrios Bisias (Teaching Assistant)

This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and nonparametric statistics.

https://ocw.mit.edu/courses/sloan-school-of-management/15-075j-stat...
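
Linear regression, one of the topics listed for this course, can be sketched in a few lines. The example below fits a straight line by ordinary least squares using only built-in Python; the data points are invented, and the function name fit_line is a hypothetical helper, not something from the course materials.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

print(slope, intercept)
```

Real data will not fall exactly on a line, so in practice the fitted slope and intercept are estimates whose uncertainty is assessed with the hypothesis-testing tools the course also covers.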

 

20. SQL for Data Science

University of California, Davis

This course is designed to give you a primer in the fundamentals of SQL and working with data so that you can begin analyzing it for data science purposes.

https://www.coursera.org/learn/sql-for-data-science

 

 

 

 

Data Science Degrees/Certificates

 

1. Master of Computer Science

 

University of Illinois at Urbana-Champaign

Tuition: $21,000

The Master of Computer Science is a non-thesis degree that requires 32 credit hours of coursework. Students can complete the eight courses required for the Master of Computer Science at their own pace, in as little as one year or as many as five years. Students receive lectures through the Coursera platform, but are advised and assessed by Illinois faculty and teaching assistants on a rigorous set of assignments, projects, and exams required for university degree credit.

The Master of Computer Science assesses $19,200 in tuition for the 32 credit-hour degree.

https://www.coursera.org/degrees/master-of-computer-science-illinois

 

2. Bachelor of Science in Computer Science

University of London

Tuition: £9,600-£17,000, depending upon geographic location of student.

The degree, created by the team at Goldsmiths, University of London, is designed to give you a strong foundation in Computer Science and specialized knowledge of topics such as Data Science, Artificial Intelligence, Virtual Reality and Web Development. Your learning will involve industry and academic case studies to help you understand your studies in terms of real-world problems.

https://www.coursera.org/degrees/bachelor-of-science-computer-scien...

 

3. Data Science

Harvard University

Tuition: $441.90 USD for the entire program.

You will learn: fundamental R programming skills; statistical concepts such as probability, inference, and modeling, and how to apply them in practice; the tidyverse, including data visualization with ggplot2 and data wrangling with dplyr; essential tools for practicing data scientists such as Unix/Linux, git and GitHub, and RStudio; how to implement machine learning algorithms; and fundamental data science concepts in depth, through motivating real-world case studies.

https://www.edx.org/professional-certificate/harvardx-data-science

 

4. Microsoft Professional Program in Data Science

Creator: Microsoft

Tuition: $1,089 for the entire program

You will learn: Use Microsoft Excel to explore data; Use Transact-SQL to query a relational database; Create data models and visualize data using Excel or Power BI; Apply statistical methods to data; Use R or Python to explore and transform data; Follow a data science methodology; Create and validate machine learning models with Azure Machine Learning; Write R or Python code to build machine learning models; Apply data science techniques to common scenarios; Implement a machine learning solution for a given data problem.

 

https://www.edx.org/microsoft-professional-program-data-science

 

6. Master of Computer Science

Arizona State University

Tuition: $15,000

You will choose 10 courses out of 20 course options in order to develop expertise on emerging in-demand technologies. Choose from areas of focus such as AI, Software Engineering, Cloud Computing, Big Data, and Cybersecurity. You’ll also create a project portfolio that you’ll use to showcase your experience to prospective employers.

https://www.coursera.org/degrees/master-of-computer-science-asu

 

 

Books

 

 

1. The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists

Author: Carl Shan

In this handbook, 25 industry experts share their advice, which is very helpful for beginners.

 

2. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

Author: Foster Provost and Tom Fawcett

Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

 

3. Doing Data Science: Straight Talk from the Frontline

Author: Cathy O'Neil and Rachel Schutt

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

 

4. Data Science From Scratch With Python: Step By Step Guide

Author: Peters Morgan

If you are looking for a complete step-by-step guide to data science using Python from scratch, this book is for you. After the great success of his first book, “Data Analysis from Scratch with Python”, Peters Morgan published his second book, which focuses on data science and machine learning. It is considered by practitioners to be the easiest guide ever written in this domain.

 

5. Data Science For Dummies (For Dummies (Computers))

Author: Lillian Pierson

Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value.

 

6. Introduction to Probability, Statistics, and Random Processes

Author: Hossein Pishro-Nik

This book introduces students to probability, statistics, and stochastic processes. It can be used by both students and practitioners in engineering, various sciences, finance, and other related fields. It provides a clear and intuitive approach to these topics while maintaining mathematical accuracy. You can also find courses and videos online.

https://www.probabilitycourse.com

 

7. OpenIntro Statistics

Author: David M Diez and Christopher D Barr

The OpenIntro project was founded in 2009 to improve the quality and availability of education by producing exceptional books and teaching tools that are free to use and easy to modify. Its inaugural effort is OpenIntro Statistics. Corresponding courses and videos can be found at:

https://www.openintro.org

 

8. Statistical Inference

Author: George Casella and Roger L. Berger

It is used as a textbook for first-year graduate students at many colleges.

Discusses both theoretical statistics and the practical applications of the theoretical developments. Includes a large number of exercises covering both theory and applications.

 

9. Applied Linear Statistical Models

Author: Kutner

Applied Linear Statistical Models is the long established leading authoritative text and reference on statistical modeling. The Fifth edition provides an increased use of computing and graphical analysis throughout, without sacrificing concepts or rigor. In general, the 5e uses larger data sets in examples and exercises, and where methods can be automated within software without loss of understanding, it is so done.

 

10. An Introduction to Generalized Linear Models

Author: Annette J. Dobson and Adrian G. Barnett

It provides a cohesive framework for statistical modelling, with an emphasis on numerical and graphical methods. This new edition of a bestseller has been updated with new sections on non-linear associations, strategies for model selection, and a Postface on good statistical practice.

 

 

 

 

Data Extractors: to create your own database by aggregating big data from websites.

 

1. Octoparse

License: Free

Website: https://www.octoparse.com/

Data Export Format: Excel, HTML, CSV, JSON, and Databases

Octoparse is a free web data extractor with comprehensive features that supports extracting almost all kinds of data from websites. It offers two modes, Wizard Mode and Advanced Mode, so that non-programmers can quickly get used to Octoparse.

Moreover, its Cloud Extraction lets you run the scraper in the cloud and save the data in the Octoparse cloud, which gives everyone access to scraping dynamic information in real time. Besides its SaaS offering, Octoparse also provides customization services for web scraper setup and data collection.

 

2. Mozenda

License: Commercial

Website: https://www.mozenda.com/

Mozenda is a cloud web scraping service (SaaS) with useful utility features for data extraction. Mozenda Web Console is a web-based application that allows you to run your Agents (scrape projects), view and organize your results, and export or publish the extracted data to cloud storage such as Dropbox, Amazon and Microsoft Azure. Agent Builder is a Windows application used to build your data project.

 

3. Scraper

License: Free

https://chrome.google.com/webstore/category/extensions

Scraper is a Chrome extension with limited data extraction features, but it is helpful for online research and for exporting data to Google Spreadsheets. This tool is intended for beginners as well as experts, who can easily copy data to the clipboard or store it in spreadsheets using OAuth. Scraper is a free web crawler tool that works right in your browser and auto-generates smaller XPaths for defining URLs to crawl. It may not offer all-inclusive crawling services, but novices also needn’t tackle messy configurations.
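
For readers unfamiliar with XPath, the sketch below shows the general idea of selecting page elements with an XPath expression. It is not how the Scraper extension works internally; it uses the limited XPath subset in Python's xml.etree.ElementTree, and the markup is a made-up, well-formed snippet (real web pages are often not valid XML and usually need a dedicated HTML parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """<html><body>
<table>
  <tr><td class="name">Octoparse</td><td class="license">Free</td></tr>
  <tr><td class="name">Mozenda</td><td class="license">Commercial</td></tr>
</table>
</body></html>"""

root = ET.fromstring(PAGE)

# ".//td[@class='name']" selects every <td> whose class attribute is "name".
names = [cell.text for cell in root.findall(".//td[@class='name']")]
licenses = [cell.text for cell in root.findall(".//td[@class='license']")]

# Pair each tool name with its license, row by row.
records = list(zip(names, licenses))

print(records)
```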

 

4. Docparser

Starting Price: $25.00/month/user

Docparser allows you to extract specific data fields from PDFs and scanned documents, convert PDF to text, PDF to JSON, PDF to XML, convert PDF tables into CSV or Excel, etc. 

 

5. Visual Scraper

VisualScraper is another great free, non-coding web scraper with a simple point-and-click interface that can be used to collect data from the web. You can get real-time data from several web pages and export the extracted data as CSV, XML, JSON or SQL files. Besides the SaaS, VisualScraper offers web scraping services such as data delivery and the creation of software extractors.

 

6. Datahut

Starting Price: $2000/month

With no coding, servers, or expensive DIY software required, Datahut is a fully managed web data extraction service that delivers ready-to-use data feeds from the web to help you quickly build apps and conduct business analysis.

http://datahut.co

 

7. WebHarvy

WebHarvy Single User License: USD 129.00/year

WebHarvy is point-and-click web scraping software designed for non-programmers. It can automatically scrape text, images, URLs, and emails from websites and save the scraped content in various formats. It also provides a built-in scheduler and proxy support, which enables anonymous crawling and helps prevent the scraper from being blocked by web servers: you have the option to access target websites via proxy servers or a VPN.

https://www.webharvy.com

 

8. OutWit Hub

License: Free

https://www.outwit.com/products/hub

OutWit Hub is a Firefox add-on with dozens of data extraction features to simplify your web searches. This web crawler tool can browse through pages and store the extracted information in a proper format. OutWit Hub offers a single interface for scraping tiny or huge amounts of data per needs. OutWit Hub lets you scrape any web page from the browser itself and even create automatic agents to extract data and format it per settings.

 

9. Data Integration

Free Version: Yes

https://www.talend.com

Talend Data Fabric is an integration platform that lets customers seamlessly move between batch, streaming and real-time while running on-premises, in the cloud or with big data. It can easily connect big data sources, cloud applications, and databases with a secure cloud integration platform-as-a-service (iPaaS).

 

10. Dexi.io

https://dexi.io/

As a browser-based web crawler, Dexi.io allows you to scrape data from any website in your browser and provides three types of robots for creating a scraping task: Extractor, Crawler, and Pipes. The freeware provides anonymous web proxy servers for your web scraping, and your extracted data will be hosted on Dexi.io’s servers for two weeks before the data is archived, or you can directly export the extracted data to JSON or CSV files. It offers paid services to meet your needs for getting real-time data.

 

Data Analytics Tools

 

1. WebFOCUS

by Information Builders

Information Builders WebFOCUS is the industry's most flexible and pervasive BI and analytics platform, able to deliver a broad range of governed analytical tools, applications, reports, and documents to any and all business stakeholders.

www.ibi.com

 

2. Minitab 18

by Minitab

Starting Price: $1,495.00/one-time/user

Minitab is the leading statistical software used for quality improvement and statistics education worldwide.

 

3. Stata

by StataCorp

Stata is the solution for your data science needs. Obtain and manipulate data. Explore. Visualize. Model. Make inferences. Collect your results into reproducible reports.

https://www.stata.com/

 

4. SAS/STAT

by SAS Institute

SAS (Statistical Analysis System) provides a wide range of statistical software, ranging from traditional analysis of variance to exact methods and dynamic data visualization techniques.

https://www.sas.com/en_us/home.html

 

5. MicroStrategy Enterprise Analytics

by MicroStrategy

A comprehensive enterprise analytics and mobility platform that delivers a full range of analytical and reporting capabilities.

www.microstrategy.com

 

6. IDEA

by CaseWare International

CaseWare IDEA® is a comprehensive, powerful and easy-to-use data analysis tool that quickly analyzes 100% of your data, guarantees data integrity and speeds your analysis, paving the way to faster, more effective audits.

https://www.casewareanalytics.com/products/idea-data-analysis

 

7. NVivo

by QSR International

More than just a tool for organizing and managing data, NVivo helps you think differently about your research, uncover more and back it all up with rigorous evidence.

http://www.qsrinternational.com/nvivo/nvivo-products

 

8. ATLAS.ti

by Scientific Software Development

ATLAS.ti offers sophisticated tools that help you arrange, reassemble, and manage your material in creative yet systematic ways.

https://atlasti.com/

 

9. QueryStorm

by Stormy Range Software

Free Version: Yes

QueryStorm is a development and data processing plugin for Excel. It offers SQL and C# support in Excel, making it much easier for tech people to interact with data in spreadsheets.

https://www.querystorm.com/

 

10. Toucan Toco

by Toucan Toco

Gartner’s Comment: Toucan's a great company to work with. The tool is user friendly, easy to install, easy to deploy, and does a great job at making data digestible. The team is helpful, professional and takes you through their agile methodology which allowed us to push the project out quickly to put it in the hands of our collaborators.

https://toucantoco.com/en/

 

 

 

Reporting Tools

 

1. QlikView

by Qlik

QlikView combines ETL, data storage, multi-dimensional analysis and the end-user interface in the same package - so deployments are lightning fast and ongoing maintenance is simple.

www.qlik.com

 

2. TapReports

by TapClicks

TapReports is a cloud-based collaboration and reporting solution which allows businesses to manage communication with their clients and generate customizable marketing reports and interactive sales reports for their clients.

https://www.tapclicks.com/

 

3. IBM Cognos Analytics

by IBM

IBM Cognos Analytics is a cohesive performance management and business intelligence solution, with budgeting, strategic planning, forecasting, and consolidations. 

www.ibm.com/products/cognos-analytics

 

4. Zoho Reports

by Zoho

Zoho Reports is a self-service business intelligence and analytics software that allows you to create insightful dashboards and data visualizations.

https://www.zoho.com/reports/

 

5. SAP Crystal Reports

by SAP Crystal Reports

With SAP Crystal Reports, you can create powerful, richly formatted, dynamic reports from virtually any data source, delivered in dozens of formats, in up to 24 languages.

www.sap.com

 

6. BI360

by Solver

Solver specializes in providing world-class financial reporting, budgeting and analysis with push-button access to all data sources that drive company-wide profitability. BI360 is available for cloud and on-premise deployment, focusing on reporting, budgeting, dashboards and datawarehouse.

www.solverglobal.com

 

7. Domo

by Domo, Inc.

Domo is a cloud-based business management suite that integrates with multiple data sources, including spreadsheets, databases, social media and any existing cloud-based or on-premise software solution.

https://www.domo.com/product

 

8. Exchange Reporter Plus

by ManageEngine

Microsoft Exchange serves as the hub of all email communications in most corporate environments that use the Active Directory technology. 

https://www.manageengine.com/products/exchange-reports

 

9. Izenda Reports

Izenda is a business intelligence (BI) platform that enables real-time data exploration and report creation.

https://www.izenda.com/

 

10. Grow BI Dashboard

Grow is a cloud-based business analytics and reporting solution suitable for small to midsize organizations. The solution allows users to create customizable dashboards for monitoring business workflows and key activities.

https://www.domo.com/product

 

 

 

12 Data Science Competitions/Programs

1. Kaggle

Kaggle, now a subsidiary of Alphabet, is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users.

www.kaggle.com

 

2. CrowdAI

CrowdAI is an open source platform of the École Polytechnique Fédérale de Lausanne in Switzerland, for hosting open data challenges and gaining insight into how the problems in question were solved.

 

https://crowdai.com/

 

3. CrowdANALYTIX

CrowdANALYTIX is a crowdsourcing platform for building customized AI solutions built by a global community of numerous data scientists. It is also an AI driven platform for auto-creating context-aware product attributes and meta-tags for retail product catalogs.

 

https://crowdanalytix.com/community

 

4. Datascience.net

Datascience.net is the first French-speaking data science platform, launched in 2013 by a pool of data specialists. It bridges the gap between organizations that have complex data-centric problems and the best data scientists willing to solve them.

https://www.datascience.net/fr/challenge#

 

5. HackerRank

HackerRank focuses on competitive programming challenges for both consumers and businesses, where developers compete by trying to program according to provided specifications. HackerRank's programming challenges can be solved in a variety of programming languages (including Java, C++, PHP, Python, SQL, JavaScript) and span multiple computer science domains. (Wikipedia)

https://www.hackerrank.com/contests

 

6. InnoCentive

InnoCentive is an open innovation and crowdsourcing company with its worldwide headquarters in Waltham, MA. They enable organizations to put their unsolved problems and unmet needs, which are framed as ‘Challenges’, out to the crowd to address.

 

https://www.innocentive.com/ar/challenge/browse

 

 

7. Topcoder

Topcoder is a crowdsourcing company with an open global community of designers, developers, data scientists, and competitive programmers. Topcoder sells community services to corporate, mid-size, and small-business clients, and pays community members for their work on the projects. Topcoder also organizes the annual Topcoder Open Tournament and a series of smaller regional events. (Wikipedia)

https://www.topcoder.com/

 

8. HackerEarth

HackerEarth is a startup technology company based in Bangalore, India that provides recruitment solutions. Its clients include Adobe, Altimetrik, Citrix Systems, InMobi, Symantec and Wipro. It has a competitive programming platform which supports over 32 programming languages (including C, C++, Python, Java, and Ruby). (Wikipedia)

 

https://www.hackerearth.com/challenges/

 

9. Analytics Vidhya

https://datahack.analyticsvidhya.com/contest/all/

 

10. DrivenData

https://www.drivendata.org/competitions/

 

11. Codility

https://app.codility.com/programmers/challenges/

 

12. CodaLab

https://competitions.codalab.org/competitions/


Comment


Comment by Lee Schlesinger on October 12, 2018 at 4:21am

Allow me to add one more tool to your list: Stitch, which is a SaaS platform for ETL – a cloud-native data pipeline. Stitch's website also has links to dozens of reporting, data visualization, and analysis tools to add to the ones above.

© 2018 Data Science Central