Here is a list of top Python machine learning projects on GitHub. A continuously updated list of open-source learning projects is available on Pansop.
scikit-learn is a Python…Continue
Buzz words are one of my least favorite things, but as buzz words go, I can appreciate the term “Data Lake.” It is one of the few buzz words that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large scale data processing, there would be a need to name the location where lots of data resides, ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…Continue
This is a brief overview of my paper “Information Retrieval Performance Measurement Using Extrapolated Precision,” which I’ll be presenting on June 8th at the DESI VI workshop at ICAIL 2015. The paper provides a novel method for extrapolating a precision-recall point to a different level of recall, and…Continue
Added by Bill Dimm on May 21, 2015 at 2:44pm — No Comments
Python & data analytics go hand in hand. Here is a list of 9 Python data analytics libraries. This list is going to be…Continue
Added by Pansop on May 21, 2015 at 4:30am — No Comments
The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.
Added by Vincent Granville on May 20, 2015 at 5:30pm — No Comments
This is an interesting article recently published in Forbes. The author gathered data from Glassdoor.com to rank companies. Glassdoor.com is a website where employees comment on and rate their company, and can even post their job title and salary range. Keep in mind that the author is not a statistician, and his analysis is…Continue
Big data is a new marketing term that highlights the ever-increasing and exponential growth of data in every aspect of our lives. The term big data originated from within the open-source community, where there was an effort to develop analytics processes that were faster and more scalable than traditional data warehousing, and could extract value from the vast amounts of unstructured and semi-structured data produced daily by web users. Consequently, big data origins are tied to web data,…Continue
Added by Khosrow Hassibi on May 20, 2015 at 7:51am — No Comments
I asked myself this question a few months ago. Next I thought: What is the definition of Data Science? So the first thing I did was read as many posts on the topic as I could get my hands on and look up definitions of related topics such as Data Mining and Machine Learning. Looking at the discussions and posts around Data Science it …Continue
Spam is a kind of messaging where the cost of sending is usually negligible and the receiver and the ISP pay the cost in terms of bandwidth usage.
An example of a manual approach to detecting spam is knowledge engineering. When you know what is spam and what is not, you can usually filter it by creating a set of rules such as:
If the subject line of an email contains the words ‘Buy viagra’ its…
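A rule set of this kind can be sketched in a few lines of Python. This is only an illustration of the knowledge-engineering idea, not the original author's filter: the rule name, the `RULES` list, and the functions `matched_rules` and `is_spam` are all hypothetical, and since the excerpt cuts off before stating what happens when the rule fires, flagging the message as spam is an assumption.

```python
import re

# Hypothetical hand-written rules: each is a (name, compiled pattern) pair
# that is applied to the subject line of an email.
RULES = [
    # Mirrors the rule in the text: the subject contains the words "Buy viagra".
    ("buy_viagra", re.compile(r"\bbuy\s+viagra\b", re.IGNORECASE)),
]


def matched_rules(subject):
    """Return the names of all rules whose pattern occurs in the subject."""
    return [name for name, pattern in RULES if pattern.search(subject)]


def is_spam(subject):
    """Assumed outcome: flag the message as spam if at least one rule matches."""
    return bool(matched_rules(subject))


if __name__ == "__main__":
    for subject in ["Buy Viagra now", "Meeting agenda for Monday"]:
        print(subject, "->", "spam" if is_spam(subject) else "not spam")
```

The drawback of this manual approach is visible in the sketch: every new spam pattern needs another hand-written entry in the rule list.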
This 30 minute video aims to demystify predictive analytics and present the IBM SPSS predictive analytics portfolio. The contents of the video are as follows:
Added by Venky Rao on May 18, 2015 at 11:30am — No Comments
Note: Opinions expressed are solely my own and do not express the views or opinions of my employer.
As a data scientist who has been munging data and building machine learning models in tools like R, Python and other software (open source and proprietary), I had always longed for a world without technical limitations. A world which would allow me to create data structures (data scientists usually call them vectors, matrices or dataframes) of virtually any…Continue
Added by Fawad Alam on May 18, 2015 at 8:30am — No Comments
For higher-resolution, interactive Tableau charts, read the original article. In this version, only static screenshots are displayed, which do not do justice to Tableau.
Coming up with a topic for today's blog post was tough. My last blog about Wine got attention from wine entrepreneurs…Continue
Added by Tatiana Sorokina on May 18, 2015 at 6:30am — No Comments
Cross posted from my blog - I look forward to discussion/feedback here…Continue
The Business Problem:
To build a repository of used car prices and identify trends based on data available from used car dealers. The solution to the problem necessarily involved building large scale crawlers to crawl & parse thousands of used…Continue
In this post, I'll explore the new AWS Machine Learning services.
The problem we are trying to solve is to classify auto accident severity given a set of features. I won't go into further detail here about the data set, the classification algorithms, etc., since the goal of this blog is to explore the new AWS Machine Learning service step by step.
In the next blog post, I'll explore another service: Microsoft Azure Machine…
“If you treat an individual as he is, he will stay as he is, but if you treat him as if he were what he ought to be and could be, he will become what he ought to be and could be.” —JOHANN WOLFGANG VON GOETHE
For the last few years I have been trying to get a handle on the field which encompasses analytics, big data, modeling, prediction, machine learning, algorithms, data mining techniques, rules, computational complexity, latency, data products, data engineering, statistical…Continue
Recent research using deep convolutional neural networks and new system architectures has demonstrated the ability of smart machines to autonomously learn to classify image scenes and identify…Continue
When I talk about "the institutional response," I am referring to an increasingly common occurrence: a standardized or large-scale approach is supported, promoted, and applied by a particular institution - sometimes governmental in nature - premised on its apparent suitability or superiority to achieve desirable outcomes. I suspect that in recent years, there has been a push to get citizens to file their income tax returns electronically. I know that in Canada, it has become difficult…
Added by Don Philip Faithful on May 16, 2015 at 8:48am — No Comments
In my experience at startups and large companies, good analytics often boils down to the availability of organized data to answer business questions. This is especially important for digital marketers, with the audience data from many channels pouring in and the need to stay on top of key metrics.
Seemingly simple questions can spin up the entire MarTech engineering team!
“If I increase my spend on display ads retargeting by 20%, for middle of the funnel prospects, what can I…Continue
Added by Sri Desikan on May 15, 2015 at 1:02pm — No Comments
A very warm welcome back to all here in Data Science Central. I decided to post today because a friend on a shared social network sent me a link that I thought would interest the community of good and responsible Data Scientists, as it were.
It concerns a blog post from Quantopian, which is an interesting new crowd-sourced investing platform vendor, a new…Continue
Added by Nuno Fernandes on May 14, 2015 at 8:00am — No Comments