All Blog Posts (7,714)

10 Python Machine Learning Projects on GitHub

Here is a list of the top Python machine learning projects on GitHub. A continuously updated list of open source learning projects is available on Pansop.



scikit-learn is a Python…


Added by Pansop on May 21, 2015 at 8:00pm — 2 Comments

Data Integrity: The Rest of the Story Part II

Buzzwords are one of my least favorite things, but as buzzwords go, I can appreciate the term “Data Lake.” It is one of the few buzzwords that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large-scale data processing, there would be a need to name the location where lots of data resides: ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…


Added by Randall Shane on May 21, 2015 at 3:13pm — 1 Comment

Measuring Information Retrieval Performance Using Extrapolated Precision

This is a brief overview of my paper “Information Retrieval Performance Measurement Using Extrapolated Precision,” which I’ll be presenting on June 8th at the DESI VI workshop at ICAIL 2015.  The paper provides a novel method for extrapolating a precision-recall point to a different level of recall, and…


Added by Bill Dimm on May 21, 2015 at 2:44pm — No Comments

9 Python Analytics Libraries

Python & data analytics go hand in hand. Here is a list of 9 Python data analytics libraries. This list is going to be…


Added by Pansop on May 21, 2015 at 4:30am — No Comments

Weekly Digest - May 25

The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.


  • Webinar: Flipping the 80/20 Rule for Analytics - Hear how Teradata helps businesses flip the 80/20 model so they can spend only 20% of their time preparing and organizing data and 80% on analytics, accelerating time to value.…

Added by Vincent Granville on May 20, 2015 at 5:30pm — No Comments

100 Best Data Science Companies to Work for in 2015

This is an interesting article recently published in Forbes. To rank companies, the author gathered data from a website where employees comment on and rate their company, and can even post their job title and salary range. Keep in mind that the author is not a statistician, and his analysis is…


Added by Mirko Krivanek on May 20, 2015 at 10:00am — 2 Comments

What Defines a Big Data Scenario?

Big data is a new marketing term that highlights the ever-increasing and exponential growth of data in every aspect of our lives. The term big data originated within the open-source community, where there was an effort to develop analytics processes that were faster and more scalable than traditional data warehousing, and that could extract value from the vast amounts of unstructured and semi-structured data produced daily by web users. Consequently, big data origins are tied to web data,…


Added by Khosrow Hassibi on May 20, 2015 at 7:51am — No Comments

How Do I Become a Data Scientist? / Data Science Aspects

I asked myself this question a few months ago. Next I thought: What is the definition of Data Science? So the first thing I did was read as many posts on the topic as I could get my hands on and also look up definitions of related topics such as Data Mining and Machine Learning. Looking at the discussions and posts around Data Science it …


Added by Michael Laux on May 20, 2015 at 5:30am — 1 Comment

Machine Learning Resources for Spam Detection

Spam is a kind of messaging where the cost of sending is usually negligible, while the receiver and the ISP pay the cost in terms of bandwidth usage.

An example of a manual approach to detecting spam is knowledge engineering. When you know what is spam and what is not, you can usually filter it by creating a set of rules such as:

  • If the subject line of an email contains the words ‘Buy viagra’, it’s…
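The knowledge-engineering approach described above amounts to matching hand-written rules against message fields. Below is a minimal sketch, assuming each rule is a case-insensitive pattern applied to the subject or body; the `Email` class, the `is_spam_by_rules` function, and every rule except the ‘Buy viagra’ subject check are hypothetical illustrations, not taken from the original post.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical minimal stand-in for a parsed message.
    subject: str
    body: str


# Hand-written rules: (field name, case-insensitive pattern).
# The second rule is a hypothetical example added for illustration.
RULES = [
    ("subject", re.compile(r"\bbuy viagra\b", re.IGNORECASE)),
    ("body", re.compile(r"\bclick here to claim\b", re.IGNORECASE)),
]


def is_spam_by_rules(email: Email) -> bool:
    """Return True if any hand-written rule matches its field of the email."""
    for field_name, pattern in RULES:
        if pattern.search(getattr(email, field_name)):
            return True
    return False


if __name__ == "__main__":
    print(is_spam_by_rules(Email("Buy Viagra now!", "...")))          # True
    print(is_spam_by_rules(Email("Meeting notes", "See attached.")))  # False
```

Every new spam variant needs another rule written by hand, which is the maintenance burden that learning-based spam filters aim to reduce.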


Added by Pansop on May 19, 2015 at 1:00am — 1 Comment

Predictive Analytics Demystified

This 30-minute video aims to demystify predictive analytics and present the IBM SPSS predictive analytics portfolio. The video covers the following:

  • Evolution of Analytics 5:45
  • Why is Predictive Analytics Important? 11:35
  • Demystifying Predictive Analytics 21:30
  • IBM…

Added by Venky Rao on May 18, 2015 at 11:30am — No Comments

Welcome to Sparkling Land

Note: Opinions expressed are solely my own and do not express the views or opinions of my employer.

As a data scientist who has been munging data and building machine learning models in tools like R, Python and other software (open source and proprietary), I had always longed for a world without technical limitations: a world that would allow me to create data structures (data scientists usually call them vectors, matrices or dataframes) of virtually any…


Added by Fawad Alam on May 18, 2015 at 8:30am — No Comments

Data science to understand and fight cancer

For higher-resolution, interactive Tableau charts, read the original article. This version displays only static screenshots, which do not do justice to Tableau.

Coming up with a topic for today's blog post was tough. My last blog post about wine got attention from wine entrepreneurs…


Added by Tatiana Sorokina on May 18, 2015 at 6:30am — No Comments

An Introduction to Deep Learning and its role for IoT/future cities

By Ajit Jaokar (@ajitjaokar). Please connect with me on LinkedIn if you want to stay in touch and receive future updates.

Cross-posted from my blog. I look forward to discussion and feedback here…


Added by ajit jaokar on May 18, 2015 at 6:30am — 1 Comment

Web Crawling & Analytics Case Study - Database vs. Self-Hosted Message Queuing vs. Cloud Message Queuing

The Business Problem:


To build a repository of used car prices and identify trends based on data available from used car dealers. The solution to the problem necessarily involved building large scale crawlers to crawl & parse thousands of used…


Added by Pansop on May 17, 2015 at 6:50pm — 1 Comment

Experimenting with AWS Machine Learning for Classification

In this post, I'll explore the new AWS Machine Learning services.

The problem we are trying to solve is to classify auto accident severity given a set of features. I won't go into the details of the data set or the classification algorithms here, since the goal of this post is to explore the new AWS Machine Learning service step by step.

In the next blog post, I'll explore another service: Microsoft Azure Machine…


Added by Peter Chen on May 17, 2015 at 6:00pm — 3 Comments

The Handbook of Data Science

“If you treat an individual as he is, he will stay as he is, but if you treat him as if he were what he ought to be and could be, he will become what he ought to be and could be.” (Johann Wolfgang von Goethe)

For the last few years I have been trying to get a handle on the field that encompasses analytics, big data, modeling, prediction, machine learning, algorithms, data mining techniques, rules, computational complexity, latency, data products, data engineering, statistical…


Added by Vasanth Gopal on May 17, 2015 at 3:00am — 2 Comments

Self-learning Machines & Deep Convolutional Neural Networks Classify Scenes & Identify Objects

Recent research using deep convolutional neural networks and new system architectures has demonstrated the ability of smart machines to autonomously learn to classify image scenes and identify…


Added by Michael Walker on May 16, 2015 at 2:38pm — 1 Comment

The Institutional Response

When I talk about "the institutional response," I am referring to an increasingly common occurrence: a standardized or large-scale approach is supported, promoted, and applied by a particular institution (sometimes governmental in nature), premised on its apparent suitability or superiority to achieve desirable outcomes. I suspect that in recent years, there has been a push to get citizens to file their income tax returns electronically. I know that in Canada, it has become difficult…


Added by Don Philip Faithful on May 16, 2015 at 8:48am — No Comments

There is no analytics without data management: an imperative for digital marketers

In my experience at startups and large companies, good analytics often boils down to the availability of organized data to answer business questions. This is especially important for digital marketers, with audience data pouring in from many channels and the need to stay on top of key metrics.

Seemingly simple questions can spin up the entire MarTech engineering team!

“If I increase my spend on display ads retargeting by 20%, for middle of the funnel prospects, what can I…


Added by Sri Desikan on May 15, 2015 at 1:02pm — No Comments

Data Science and its problems

A very warm welcome back to everyone here at Data Science Central. I decided to post today because a friend on a social network shared a link with me that I thought would interest the community of good and responsible data scientists, as it were.

It concerns a blog post from Quantopian, which is an interesting new crowd-sourced investing platform vendor, a new…


Added by Nuno Fernandes on May 14, 2015 at 8:00am — No Comments

© 2021 TechTarget, Inc.