Data Geek's Blog (78)

Handbook of Statistical Analysis and Data Mining Applications - 2nd Edition

Handbook of Statistical Analysis and Data Mining Applications, Second Edition, is a comprehensive professional reference book that guides business analysts, scientists, engineers and researchers, both academic and industrial, through all stages of data analysis, model building and implementation. The handbook helps users discern technical and business problems, understand the strengths and weaknesses of modern data mining algorithms and employ the right statistical methods for…

Added by Data Geek on November 15, 2017 at 6:00pm — No Comments

The Gaussian Correlation Inequality in One Picture

Yet another one of these One Picture tutorials, and in some ways, in the same old-fashioned style as our Type I versus Type II Errors in One Picture.…

Added by Data Geek on November 15, 2017 at 5:30pm — 1 Comment

6 Types of Programmers in One Picture

Which type are you? Can you recognize the programming language used in this illustration? Click on the picture to zoom in. …

Added by Data Geek on October 16, 2017 at 8:00am — 1 Comment

Free Book: Probability and Statistics Cookbook

The format is very similar to a BIG cheat sheet. This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature and in-class material from courses of the statistics department at the University of California in Berkeley but also influenced by other sources.

Author: Matthias Vallentin…

Added by Data Geek on October 2, 2017 at 3:00pm — 5 Comments

Quick Guide to R and Statistical Programming

Guest blog by Rob Kabacoff. Rob is Professor of Quantitative Analytics at Wesleyan University.

R is an elegant and comprehensive statistical and graphical programming language. Unfortunately, it can also have a steep learning curve. I created this website for both current R users and experienced users of other statistical packages (e.g., SAS, SPSS, Stata) who…

Added by Data Geek on August 21, 2017 at 10:00am — No Comments

Evolution of Machine Learning - Infographics

Interesting infographics produced by PwC. To view the original article, download the infographics in PDF format, and read the comments, click here.

Added by Data Geek on August 21, 2017 at 10:00am — No Comments

New Book: Data Science: Mindset, Methodologies, and Misconceptions

From the author of the bestsellers Data Scientist and Julia for Data Science, this book covers four foundational areas of data science. The first is the data science pipeline, including methodologies and the data scientist's toolbox. The second covers the essential practices needed to understand the data, including questions and hypotheses. The third covers pitfalls to avoid in the data science process. The fourth is an awareness of future trends…

Added by Data Geek on August 21, 2017 at 10:00am — No Comments

Book: R for Data Science

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.…

Added by Data Geek on August 21, 2017 at 10:00am — No Comments

Development of AI and its future state

Below is an extract from a 36-page report entitled "Technology and Innovation for the Future of Production: Accelerating Value Creation", available for free here, and produced by the World Economic Forum.

The extract below, about the future of AI, is Figure 7 on page 13. This long report also discusses other interesting topics and is peppered with many useful…

Added by Data Geek on May 2, 2017 at 1:30pm — No Comments

What to look for when hiring an entry-level data scientist?

The question was posted on Quora as "What do you look for when hiring an entry-level data scientist? Would a master’s in Data Science or a bootcamp be beneficial?" The answer below is from Eduardo Arino de la Rubia, Chief Data Scientist at Domino Data Lab.

I think…

Added by Data Geek on May 2, 2017 at 1:00pm — No Comments

Deep Learning Cheat Sheet (using Python Libraries)

This cheat sheet was produced by DataCamp, and it is based on the Keras library. Keras is an easy-to-use and powerful library for Theano and TensorFlow that provides a high-level neural networks API to develop and evaluate deep learning models. Originally posted here in PDF format. Click on the image below to zoom in. …

Added by Data Geek on April 27, 2017 at 4:30pm — No Comments

18 Data Science Certificates Rated

This infographic was produced by Springboard, and it lists a few short, inexpensive online courses along with some university programs, each leading to a certificate. The infographic provides some highlights for each program, for comparison purposes. For more data science programs and certificates, click here or…

Added by Data Geek on April 26, 2017 at 10:00am — No Comments

The seven deadly sins of statistical misinterpretation, and how to avoid them

By Winnifred Louis, Associate Professor, Social Psychology, The University of Queensland, and Cassandra Chapman, PhD Candidate in Social Psychology, The University of Queensland.…

Added by Data Geek on March 29, 2017 at 8:30am — No Comments

Matplotlib Cheat Sheet

This Python cheat sheet was produced by DataCamp. Click on the image to zoom in. The original, published here, is available as a PDF document. The translation from PDF to image format was done using PDF2PNG.

To check out dozens of data science related cheat sheets,…

Added by Data Geek on March 6, 2017 at 5:30pm — No Comments

Most cited deep learning papers

This is a curated list of the most cited deep learning papers (since 2012) posted by Terry Taewoong Um.

Source for picture: …

Added by Data Geek on March 6, 2017 at 8:00am — No Comments

Big Data's Most Influential Rock Stars: 10 Must-Follow Leaders

This list of hand-picked leaders was compiled by Wojtek Aleksander, from GetResponse.com.

Other, bigger lists (sometimes created by robots) can be found here and are usually based on your Klout score, which in my opinion is not accurate. The list below is truly original and, I would even add, somewhat unexpected, as you won't find…

Added by Data Geek on May 14, 2016 at 9:00am — No Comments

32 New External Machine Learning Resources and Updated Articles

Starred articles are candidates for the picture of the week. A comprehensive list of all past resources is found here. We are in the process of automatically categorizing them using indexation and automated tagging…

Added by Data Geek on May 6, 2016 at 8:30am — No Comments

70 MongoDB Interview Questions and Answers

According to Wikipedia, MongoDB is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. MongoDB is developed by MongoDB Inc. and is published as free and open-source software. MongoDB is the fourth most popular type of database…

Added by Data Geek on May 3, 2016 at 9:00am — No Comments

Least enjoyable and most time consuming data science tasks

These are the findings from a CrowdFlower survey. Data preparation accounts for about 80% of the work of data scientists. Cleaning data is the least enjoyable and most time-consuming data science task, according to the survey. Interestingly, when we put the question to our data scientist, his answer was:…

Added by Data Geek on April 5, 2016 at 9:30am — 1 Comment

Machine Learning Algorithm Identifies Tweets Sent Under the Influence of Alcohol

An interesting article was recently published in MIT Technology Review. What kind of metrics would help detect such tweets? We think the following might be useful:

  1. Local time (such as late at night)
  2. Whether a picture is associated with the tweet
  3. Whether a link is associated with the tweet
  4. Number of typos in the tweet in question, compared with the average for the user in question
  5. Frequency of tweets (sudden spike) for user in…
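
As a rough illustration, the first four metrics above could be turned into features along the following lines. This is only a sketch under stated assumptions: the `Tweet` record, its field names, the `known_words` vocabulary, and the late-night window (23:00 to 05:00) are all hypothetical choices made for the example, and counting out-of-vocabulary tokens is a crude stand-in for a real spell checker. It is not the method used in the article.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record; field names are illustrative, not a Twitter API schema.
    text: str
    local_time: datetime
    has_picture: bool
    has_link: bool


def typo_count(text, known_words):
    """Count alphabetic tokens missing from known_words (a crude typo proxy)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token not in known_words)


def tweet_features(tweet, user_avg_typos, known_words, night_start=23, night_end=5):
    """Build a feature dict for metrics 1 to 4 of the list.

    night_start and night_end are assumed hour thresholds for "late at night".
    """
    hour = tweet.local_time.hour
    late_night = hour >= night_start or hour < night_end
    typos = typo_count(tweet.text, known_words)
    return {
        "late_night": late_night,                       # metric 1
        "has_picture": tweet.has_picture,               # metric 2
        "has_link": tweet.has_link,                     # metric 3
        "typos_vs_user_avg": typos - user_avg_typos,    # metric 4
    }
```

A feature dict like this could then be fed to any standard classifier; the thresholds and the typo proxy would need tuning on real data.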

Added by Data Geek on March 18, 2016 at 6:30am — 1 Comment

© 2017 Data Science Central

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service