Featured Blog Posts – May 2016 Archive (71)

Crime Analysis with Zeppelin, R & Spark

Apache Zeppelin, a web-based notebook, enables interactive data analytics, including data ingestion, data discovery, and data visualization, all in one place. Zeppelin's interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share a SparkContext across Scala, Python, and…

Added by Raghavan Madabusi on May 23, 2016 at 1:30am — 1 Comment

Polymorphic Malware Detection Using Sequence Classification Methods

A PDF version of this document, created using LaTeX, can be downloaded by clicking here.

Abstract



Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence…

Added by Jake Drew Ph.D. on May 22, 2016 at 6:32pm — No Comments

Big Data on a Smaller Scale in Healthcare

Big data is a term for data sets so large and complex that, only a few years ago, they could not be processed with traditional data processing applications. Challenges in big data include capture, search, sharing, storage, transfer, visualization, querying and privacy, among other concerns. Data sets are growing rapidly because there are more and more sources of data, including mobile devices, software logs, cameras, microphones, wireless networks,…

Added by Sam Carr on May 22, 2016 at 10:00am — No Comments

Hitchhiker's Guide to Data Science, Machine Learning, R, Python

Thousands of articles and tutorials have been written about data science and machine learning. Hundreds of books, courses and conferences are available. You could spend months just figuring out what to do to get started, even to understand what data science is about.

In this short contribution, I share what I believe to be the most valuable resources - a small list of top resources and starting points. This will be most valuable to any data practitioner who has very little free…

Added by Vincent Granville on May 20, 2016 at 11:30am — 8 Comments

Multi-Regression in R (Exxon Mobil stock price ~ WTI, Gas, and S&P500)

[Previous Post]

Single regression on Exxon's stock



[Introduction of Multi-regression]



Let's recall our last job. We ran a single regression of Exxon Mobil's stock on the WTI crude oil spot price. The result was fantastic: the model accounts for 25% of the variation in stock movement. Put another way, that is the R-squared. The problem is "are you happy with the…

Added by Gregory Choi on May 20, 2016 at 9:05am — 1 Comment
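
The R-squared mentioned in the excerpt above is the share of variation in the response that a fitted regression explains. A minimal sketch of how that number can be computed for a single-predictor regression, using only the Python standard library; the function name `r_squared` and the small data set are hypothetical and are not taken from the original post:

```python
def r_squared(x, y):
    """Fit y = intercept + slope * x by least squares and return R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept for one predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical illustration data, not real prices.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(x, y), 2))
```

An R-squared of 0.25 would mean the residual sum of squares is still 75% of the total, which is why adding further predictors, as a multiple regression does, is a natural next step.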

ADAPTIVE Machine Learning

Machine Learning today tends to be “open-loop” – collect tons of data offline, process them in batches and generate insights for eventual action. There is an emerging category of ML business use cases that are called “In-Stream Analytics (ISA)”. Here, the data is processed as soon as it arrives and insights are generated quickly. However, action may be taken offline and the effects of the actions are not immediately incorporated back into the learning process. If we did, it is an…

Added by PG Madhavan on May 20, 2016 at 5:30am — No Comments

Interview with Karolina Alexiou: Building Data pipelines

Today we are really happy to host a post from Ariadni-Karolina Alexiou, or Caroline for short. Caroline is a Data Engineer with deep expertise in Python for Data Applications, Web…

Added by George Psistakis on May 20, 2016 at 3:00am — No Comments

Apache Beam - Create Data Processing Pipelines

At the Data Science Association our members often complain about the major data engineering problem of finding the right tools and programming models to build both robust data processing pipelines and efficient ETL processes for data transformation and integration.…



Added by Michael Walker on May 19, 2016 at 10:00pm — No Comments

Visual Analytics: Upcoming Revolution in Clinical Research?

Big data in Healthcare

The healthcare industry was a pioneer in consistently applying data mining techniques and analytics procedures to identify areas subject to optimization and potential improvements of clinical practice. The research methodology was typically focused on accepting or discarding an initial…

Added by Rafael San Miguel Carrasco on May 19, 2016 at 10:12am — No Comments

Control Structures Loops in R

As part of the Data Science tutorial series, my previous post covered basic data types in R. I have kept the tutorial very simple so that beginners in R programming can take off immediately.

Please find the online R editor at the end of the post so that you can execute the code on the page itself.

In this section we learn about control structures loops used…

Added by dataperspective on May 18, 2016 at 8:30pm — No Comments

Visualizing Bagged Trees as Approximating Borders

The bagged trees algorithm is a commonly used classification method. By resampling our data and creating trees for the resampled data, we can get an aggregated vote of classification prediction. In this blog post I will demonstrate how bagged trees work by visualizing each step.…

Added by Maiia Bakhova on May 18, 2016 at 2:12pm — No Comments
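
The bagging procedure described in the excerpt above (resample the data, fit a tree to each resample, aggregate the votes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the post's own code: it uses one-split decision stumps as stand-ins for full trees, relies only on the Python standard library, and the helper names `fit_stump`, `fit_bagged`, `predict_bagged` and the toy data are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(points, labels):
    """Fit a one-split 'tree': the axis-aligned threshold with fewest errors."""
    best = None
    for f in range(len(points[0])):
        for threshold in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= threshold]
            right = [y for p, y in zip(points, labels) if p[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda p: left_label if p[f] <= threshold else right_label


def fit_bagged(points, labels, n_models=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(points)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([points[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagged(models, point):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(point) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters in the plane.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0), (9.0, 9.0)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

models = fit_bagged(points, labels)
print(predict_bagged(models, (0.5, 0.5)), predict_bagged(models, (9.5, 9.5)))
```

Each stump draws a slightly different border because it sees a different resample; the majority vote averages those borders, which is the effect a step-by-step visualization of bagging would show.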

Weekly Digest, May 23

Starred articles are new additions posted between Thursday and Sunday, published in the Monday edition exclusively. The Monday edition has six sections: (1) Featured Resources and Technical Contributions, (2) Featured Articles and Case Studies, (3) From our Sponsors, (4) News, Events, Books, Training, Forum Questions, (5) Picture of the Week, and (6) Syndicated Content. The Thursday edition covers articles…

Added by Vincent Granville on May 18, 2016 at 9:30am — No Comments

Curated Lists of Data Science, Machine Learning, Deep Learning and NLP resources

Here are three useful resources for learning about Data Science:

Added by Ujjwal Karn on May 18, 2016 at 8:59am — No Comments

xda: R package for exploratory data analysis (plotting, univariate, bivariate)

I created an R package for exploratory data analysis. You can read about it and install it here.  

The package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as performing different kinds of analyses such as univariate, bivariate and multivariate investigation which is the first step of any…

Added by Ujjwal Karn on May 18, 2016 at 8:30am — No Comments

Visualizing Social Media Analytics: Beyond the Bar Chart

Recently, I rediscovered a TED Talk by David McCandless, a data journalist, called “The beauty of data visualization.” It’s a great reminder of how charts (though scary to many) can help you tell an actionable story about a topic in a way that bullet points alone usually cannot. If you have not seen the talk, I recommend you take a look for some inspiration about visualizing big…

Added by Chris Atwood on May 18, 2016 at 3:57am — 3 Comments

Data Migration and Cloud-Based Analytics

There can be no doubt that technology trends over the years point to a rapid change in user requirements. The days of relying on a large, clunky desktop PC to provide a portal to the internet and other traditional, desktop-only applications are quickly diminishing. While PC sales continue to plummet, smartphone sales continue to…

Added by Tom Jardine on May 17, 2016 at 11:00pm — No Comments

10 Great Data Science Articles by Bernard Marr

Bernard Marr is a best-selling business author, keynote speaker and consultant in big data, analytics and enterprise performance. As the founder and CEO of the Advanced Performance Institute, he is one of the world's most highly respected thought leaders when it comes to data in business. He regularly advises companies and government organisations on how to improve their performance and gain better insights from their data. …

Added by Vincent Granville on May 16, 2016 at 11:02am — No Comments

The POC Problem

Summary: Proof of Concept projects are a popular place to start, but they may be the wrong solution. To ensure success, focus on Proof of Value and alignment with the company’s strategy. Get the right executive sponsor and keep them involved.

 

If you Google ‘Data Science Proof of Concept’ you will find dozens if not hundreds of articles…

Added by William Vorhies on May 16, 2016 at 9:17am — No Comments

How Healthcare industry will benefit by embracing Data Sciences

In the healthcare industry, what could be more important than having better healthcare outcomes? Each and every day, healthcare workers around the globe are striving hard to find more ways of improving our lives. However, the world is changing, and frankly, at a faster rate than most of us can keep up with. Intuition alone will no longer be enough for quality patient outcomes. The amount of healthcare data continues to mount every second, making it harder and harder to find any form of…

Added by Sameer Dhanrajani on May 16, 2016 at 6:05am — 2 Comments

The Astonishing Big Data Generated In A Single Journey

We are constantly generating increasing volumes of data with everything we do. During a recent business trip, I started thinking about how travelling presents a great example of this. With the explosion of the Internet of Things into our lives, the amount of analysable data we leave behind us as we go about our day-to-day lives is growing exponentially. So I decided to try and identify some key bits of data I generated and left behind on a trip from my home in…

Added by Bernard Marr on May 14, 2016 at 9:30am — No Comments

© 2019 Data Science Central ®
