Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share SparkContext across Scala, Python, and…
A PDF version of this document, created using LaTeX, can be downloaded by clicking here.
Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence…
Added by Jake Drew Ph.D. on May 22, 2016 at 6:32pm — No Comments
Big data is a term for data sets so large and complex that, only a few short years ago, they could not be processed with traditional data processing applications. Challenges in big data include capture, search, sharing, storage, transfer, visualization, querying, and privacy, among other concerns. Data sets are growing rapidly because there are increasingly more avenues for data, including mobile devices, software logs, cameras, microphones, wireless networks,…
Added by Sam Carr on May 22, 2016 at 10:00am — No Comments
Thousands of articles and tutorials have been written about data science and machine learning. Hundreds of books, courses and conferences are available. You could spend months just figuring out what to do to get started, even to understand what data science is about.
In this short contribution, I share what I believe to be the most valuable resources - a small list of top resources and starting points. This will be most valuable to any data practitioner who has very little free…
Single regression on Exxon's stock
[Introduction of Multi-regression]
Let's recall our last job. We ran a single regression of Exxon Mobil's stock on the WTI crude oil spot price. The result was fantastic: the model accounts for 25% of the variation in the stock's movement. Put another way, the R-squared is 0.25. The problem is "are you happy with the…
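To make the R-squared figure concrete, here is a minimal sketch of how it is computed for a single regression. The numbers are made up for illustration and are not Exxon Mobil or WTI data; the function names `fit_line` and `r_squared` are hypothetical and not taken from the original post.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Variation left unexplained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values, NOT real oil or stock prices.
oil_changes = [1.0, 2.0, 3.0, 4.0]
stock_changes = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(oil_changes, stock_changes), 2))  # prints 0.64
```

An R-squared of 0.25, as reported for the single regression, would mean the fitted line explains a quarter of the variation and leaves three quarters unexplained.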
Machine Learning today tends to be “open-loop” – collect tons of data offline, process them in batches and generate insights for eventual action. There is an emerging category of ML business use cases that are called “In-Stream Analytics (ISA)”. Here, the data is processed as soon as it arrives and insights are generated quickly. However, action may be taken offline and the effects of the actions are not immediately incorporated back into the learning process. If we did, it is an…
Added by PG Madhavan on May 20, 2016 at 5:30am — No Comments
Today we are really happy to host a post from Ariadni-Karolina Alexiou, or Caroline for short. Caroline is a Data Engineer, with deep expertise in Python for Data Applications, Web…
Added by George Psistakis on May 20, 2016 at 3:00am — No Comments
At the Data Science Association our members often complain about the major data engineering problem of finding the right tools and programming models to build both robust data processing pipelines and efficient ETL processes for data transformation and integration.…
Added by Michael Walker on May 19, 2016 at 10:00pm — No Comments
The healthcare industry was a pioneer in consistently applying data mining techniques and analytics procedures to identify areas subject to optimization and potential improvements of clinical practice. The research methodology was typically focused on accepting or discarding an initial…
Added by Rafael San Miguel Carrasco on May 19, 2016 at 10:12am — No Comments
As part of the Data Science tutorial series, my previous post covered basic data types in R. I have kept the tutorial very simple so that beginners in R programming can take off immediately.
Please find the online R editor at the end of the post so that you can execute the code on the page itself.
In this section we learn about control structures and loops used…
Added by dataperspective on May 18, 2016 at 8:30pm — No Comments
The bagged trees algorithm is a commonly used classification method. By resampling our data and creating trees for the resampled data, we can get an aggregated vote of classification predictions. In this blog post I will demonstrate how bagged trees work, visualizing each step.…
Added by Maiia Bakhova on May 18, 2016 at 2:12pm — No Comments
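The resample, fit, and vote idea in the bagged trees teaser can be sketched in a few lines. This is a minimal illustration and not the author's implementation: it assumes a single numeric feature, binary labels 0/1, and one-split trees (decision stumps) standing in for full trees. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-split tree on (x, label) pairs with labels 0 or 1."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        for left_label in (0, 1):
            right_label = 1 - left_label
            # Count training points this candidate split gets wrong.
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return threshold, left_label


def predict_stump(stump, x):
    threshold, left_label = stump
    return left_label if x <= threshold else 1 - left_label


def fit_bagged_stumps(points, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]
        stumps.append(fit_stump(sample))
    return stumps


def predict_bagged(stumps, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in stumps)
    return votes.most_common(1)[0][0]


# Toy data made up for illustration: small x is class 0, large x is class 1.
toy_data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
bagged = fit_bagged_stumps(toy_data)
print(predict_bagged(bagged, 1), predict_bagged(bagged, 9))
```

Each stump sees a slightly different bootstrap sample, so individual split points differ; the majority vote smooths out those differences.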
Starred articles are new additions posted between Thursday and Sunday, published in the Monday edition exclusively. The Monday edition has six sections: (1) Featured Resources and Technical Contributions, (2) Featured Articles and Case Studies, (3) From our Sponsors, (4) News, Events, Books, Training, Forum Questions, (5) Picture of the Week, and (6) Syndicated Content. The Thursday edition covers articles…
Added by Vincent Granville on May 18, 2016 at 9:30am — No Comments
Here are three useful resources for learning about Data Science:
Added by Ujjwal Karn on May 18, 2016 at 8:59am — No Comments
I created an R package for exploratory data analysis. You can read about it and install it here.
The package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as performing different kinds of analyses, such as univariate, bivariate, and multivariate investigation, which is the first step of any…
Added by Ujjwal Karn on May 18, 2016 at 8:30am — No Comments
Recently, I rediscovered a TED Talk by David McCandless, a data journalist, called “The beauty of data visualization.” It’s a great reminder of how charts (though scary to many) can help you tell an actionable story about a topic in a way that bullet points alone usually cannot. If you have not seen the talk, I recommend you take a look for some inspiration about visualizing big…
There can be no doubt that technology trends over the years point to a rapid change in user requirements. The days of relying on a large, clunky desktop PC to provide a portal to the internet and other traditional, desktop-only applications are quickly diminishing. While PC sales continue to plummet, smartphone sales continue to…
Added by Tom Jardine on May 17, 2016 at 11:00pm — No Comments
Bernard Marr is a best-selling business author, keynote speaker and consultant in big data, analytics and enterprise performance. As the founder and CEO of the Advanced Performance Institute, he is one of the world's most highly respected thought leaders when it comes to data in business. He regularly advises companies and government organisations on how to improve their performance and gain better insights from their data. …
Added by Vincent Granville on May 16, 2016 at 11:02am — No Comments
Summary: Proof of Concept projects are a popular place to start, but they may be the wrong solution. To ensure success, focus on Proof of Value and alignment with the company’s strategy. Get the right executive sponsor and keep them involved.
Added by William Vorhies on May 16, 2016 at 9:17am — No Comments
In the healthcare industry, what could be more important than having better healthcare outcomes? Each and every day healthcare workers around the globe are striving hard to find more ways of improving our lives. However, the world is changing, and frankly, at a faster rate than most of us can keep up with. Intuition alone will no longer be enough for quality patient outcomes. The amount of healthcare data continues to mount every second, making it harder and harder to find any form of…
We are constantly generating increasing volumes of data with everything we do. During a recent business trip, I started thinking about how travelling presents a great example of this. With the explosion of the Internet of Things into our lives, the amount of analysable data we leave behind us as we go about our day-to-day lives is growing exponentially. So I decided to try and identify some key bits of data I generated and left behind on a trip from my home in…
Added by Bernard Marr on May 14, 2016 at 9:30am — No Comments