For statistical models, selecting predictors is what tests the mettle of data scientists. It is challenging to lay out the steps in advance, because at every step they must evaluate the situation and decide what to do next. It is a completely different story when running predictive models: if the relationship among the variables is not the main focus, things get easier. Data analysts can go ahead and run step-wise regression models, empowering the data to give best…Continue
Added by Chirag Shivalker on July 31, 2017 at 10:30pm — No Comments
Hadoop – Introduction & features
Let us start with what Hadoop is and which features make it so popular.
Hadoop is an open-source software framework for distributed storage and distributed processing of extremely large data sets. Important features of Hadoop are:
Hadoop is an open-source project, which means its code can be modified to suit business requirements.
In Hadoop, data is highly available and…
Added by Sheetal Sharma on July 31, 2017 at 7:30pm — No Comments
This article was written by Jacob Joseph.
Unlike evaluating the accuracy of models that predict a continuous or discrete dependent variable, such as Linear Regression models, evaluating the accuracy of a classification model can be more complex and time-consuming. Before measuring the accuracy of classification models, an analyst would first…Continue
Added by Amelia Matteson on July 31, 2017 at 9:00am — No Comments
Added by Sandipan Dey on July 31, 2017 at 4:00am — No Comments
Abstract – Blockchain is a mystery story or provides the foundation for cryptocurrencies like Bitcoin. What’s different about blockchains compared to traditional big-data distributed databases like MongoDB? It’s like featuring a product that contains small blocks of brain in the form of dust, but consider that the innovation efforts of several publicly traded asset managers and banks are also on this brain block dust quest. Computers start simulating the brain’s sensation,…Continue
Added by Vinod Sharma on July 31, 2017 at 4:00am — No Comments
1. Introduction to Big Data and Cloud Computing
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). It’s a virtualization framework.
It works like resources on demand, whether storage, computing, or something else. Cloud follows a pay-per-usage model: you pay for the amount of resources you use.
This computing service by cloud charges…Continue
Added by Shreya Gupta on July 30, 2017 at 8:00pm — No Comments
This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, Hadoop, decision trees, ensembles, correlation, outliers, regression, Python, R, Tensorflow, SVM, data reduction, feature selection, experimental design, time series, cross-validation, model fitting,…Continue
Added by Vincent Granville on July 30, 2017 at 6:00pm — No Comments
Let’s start with the bottom line - there is no excuse for virtually any company today, regardless of size or manpower (and within reason), not to be making data analytics a part of its normal business routines. Traditional objections such as cost, resources and expertise no longer cut the mustard. As many observers have noted, a company’s internally generated data is a key asset that needs to be leveraged in the same way as any other corporate asset if the…Continue
Added by Gregory Thompson on July 30, 2017 at 4:30pm — No Comments
Customer analytics has been one of the hottest buzzwords for years. A few years back it was solely the marketing department’s monopoly, carried out with limited volumes of customer data stored in relational databases like Oracle or appliances like Teradata and Netezza. SAS & SPSS were the leaders in providing customer analytics, but it was restricted to segmenting customers who are likely to buy your products or services. In the 90’s came web…
Added by Sandeep Raut on July 29, 2017 at 7:30pm — No Comments
Many neural network applications implemented in Java, such as Neuroph, Encog and Joone, may look rather different when switching from the Java language to Python with the help of the DMelt computing environment. First of all, they look simpler. You can use your favorite Python tricks to load and display data. The Python coding is simpler for viewing and fast modifications. It does not require recompiling after each change. At the same time, the platform…Continue
Added by jwork.ORG on July 29, 2017 at 1:00pm — No Comments
The following problems appeared as assignments in the Coursera course Data-Driven Astronomy.
Added by Sandipan Dey on July 29, 2017 at 12:00pm — No Comments
Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.
Added by Vincent Granville on July 29, 2017 at 11:00am — No Comments
Guest blog by Vinod Sharma.
Abstract – Blockchain is a mystery story or provides the foundation for cryptocurrencies like Bitcoin. What’s different about blockchains compared to traditional big-data distributed databases like MongoDB? It’s like featuring a product that contains small blocks of brain in the form of dust, but consider that the innovation efforts of several publicly traded asset managers and banks are also on this brain block dust quest. Computers…Continue
Added by Vincent Granville on July 29, 2017 at 10:30am — No Comments
This article was contributed by Nikita Johnson.
The cost of large scale data collection and annotation often makes the application of machine learning algorithms to new tasks or datasets prohibitively expensive. One…
Added by Emmanuelle Rieuf on July 28, 2017 at 6:00pm — No Comments
To simplify this task, my team has prepared an overview of the main existing recommendation system…Continue
Added by Luba Belokon on July 28, 2017 at 4:00am — No Comments
Here is an interesting visualization of machine learning algorithms:
Originally posted here. Also check out the following great visual summaries:…Continue
This article was written by John Hammink. John Hammink is an American engineer, musician, artist and linguist, with his own entry in Wikipedia. …
Added by Amelia Matteson on July 26, 2017 at 2:00pm — No Comments
This is part of a new series of articles: once or twice a month, we post previous articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back.
12 Great Blogs Posted in the last 12…Continue
Added by Vincent Granville on July 26, 2017 at 12:00pm — No Comments
Summary: There are a variety of new Automated Machine Learning (AML) platforms emerging that led us recently to ask if we’d be automated and unemployed any time soon. In this article we’ll cover the “Professional AML tools”. They require that you be fluent in R or Python which means that Citizen Data Scientists won’t be using them. They also significantly enhance productivity and reduce the redundant and tedious work that’s part of model…Continue
Added by William Vorhies on July 25, 2017 at 1:30pm — No Comments
Data collection is at record levels. Social media platforms, websites, and smartphones are just some examples of what most consumers use on a daily basis and are where significant data collection takes place. This infographic takes a look at the ethics of big-data collection.
Added by Jay Taylor on July 25, 2017 at 8:00am — No Comments