All Blog Posts (2,751)

Boltzmann and MCMC

I want to learn about Boltzmann and MCMC techniques from the ground up, explained in layman's terms. Can someone guide me?
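In plain terms, and assuming the question is about the Boltzmann distribution rather than Boltzmann machines: the Boltzmann distribution gives each state x a probability proportional to exp(−E(x)/T), so low-energy states are more likely and the temperature T controls how strongly they are favored. MCMC (Markov chain Monte Carlo) draws samples from such a distribution by taking a random walk whose moves depend only on energy differences, so the normalizing constant Z never has to be computed. The minimal Python sketch below applies the Metropolis rule to four hypothetical energies; the ring-shaped proposal is an illustrative choice, not the only option.

```python
import math
import random
from collections import Counter


def boltzmann_probs(energies, temperature=1.0):
    """Exact Boltzmann probabilities p_i = exp(-E_i / T) / Z (feasible only for tiny state spaces)."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)  # partition function Z
    return [w / z for w in weights]


def metropolis_sample(energies, temperature=1.0, n_steps=200_000, seed=0):
    """Estimate state frequencies with the Metropolis algorithm, a basic MCMC method."""
    rng = random.Random(seed)
    n = len(energies)
    state = 0
    counts = Counter()
    for _ in range(n_steps):
        # Propose a neighbouring state on a ring (symmetric: left or right with equal chance).
        proposal = (state + rng.choice((-1, 1))) % n
        delta = energies[proposal] - energies[state]
        # Always accept downhill moves; accept uphill ones with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state = proposal
        counts[state] += 1  # a rejected move counts the current state again
    return [counts[i] / n_steps for i in range(n)]


energies = [0.0, 1.0, 2.0, 0.5]  # hypothetical energies for four states
exact = boltzmann_probs(energies)
estimated = metropolis_sample(energies)
for i, (p, q) in enumerate(zip(exact, estimated)):
    print(f"state {i}: exact {p:.3f}  MCMC {q:.3f}")
```

The two columns should roughly agree. Raising `temperature` flattens the distribution; lowering it concentrates the samples on the lowest-energy state.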

Added by Malay Kapoor on May 24, 2016 at 8:13am

Tips for Effectively Communicating Complex Ideas to Non-Technical Clients

As a data scientist, your job doesn’t always make sense to others. Ever tried explaining what you do to your parents? They may nod their heads, but their eyes scream confusion.

Well, aside from possibly stifling job-related conversations, this isn’t a big deal. However, when it comes to explaining what you do to potential clients, who happen to be just as technology-averse, it’s a major issue.

Here are some helpful tips for explaining exactly what you do to…

Added by Larry Alton on May 24, 2016 at 7:30am

Why Hadoop? Streamlined Nature, Scalability and Cost-Effectiveness

Since its inception in 2008, the global Hadoop market has grown at a tremendous pace. This market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018, and by the end of 2018 it could reach US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is expected to continue growing significantly.
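A quick arithmetic check of these figures, as a small Python sketch: compounding US$1.5 billion at 54.7% per year for the six years from 2012 to 2018 gives about US$20.6 billion, slightly below the quoted US$20.9 billion; equivalently, the quoted endpoints imply a CAGR of about 55.1%.

```python
def project_value(start_value, cagr, years):
    """Compound growth: V_end = V_start * (1 + CAGR) ** years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Invert the compound-growth formula: CAGR = (V_end / V_start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


years = 2018 - 2012
projected = project_value(1.5, 0.547, years)  # values in US$ billions
rate = implied_cagr(1.5, 20.9, years)
print(f"Projected 2018 value: US${projected:.1f} billion")  # US$20.6 billion
print(f"CAGR implied by US$20.9 billion: {rate:.1%}")       # 55.1%
```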

Why…

Added by Ankit Jain on May 23, 2016 at 11:00pm

Statistical Attribution & Optimization in the B2B World.

There has been a lot of activity recently around revenue attribution: marketers want to better understand their customer acquisition funnel and to measure progress against it. Most of this attention has focused on the B2C space; less work has been done on measuring the performance of B2B marketing activities.
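The post's statistical approach is not shown in this excerpt. For orientation only, here is a minimal Python sketch of the simplest rule-based attribution schemes (first-touch, last-touch, and linear, i.e. equal credit per touchpoint), which are commonly used as baselines. The function name, channel names, and revenue figures are hypothetical.

```python
from collections import defaultdict


def attribute(journeys, model="linear"):
    """Split each converted journey's revenue across its marketing touchpoints.

    journeys: list of (touchpoints, revenue) pairs, touchpoints in time order.
    model: "first" (all credit to the first touch), "last" (all credit to the
           last touch), or "linear" (equal credit to every touch).
    """
    credit = defaultdict(float)
    for touchpoints, revenue in journeys:
        if not touchpoints:
            continue  # no recorded touches: nothing to credit
        if model == "first":
            credit[touchpoints[0]] += revenue
        elif model == "last":
            credit[touchpoints[-1]] += revenue
        elif model == "linear":
            share = revenue / len(touchpoints)
            for channel in touchpoints:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Hypothetical B2B journeys: (ordered touchpoints, closed-deal revenue)
journeys = [
    (["webinar", "whitepaper", "sales_call"], 30_000.0),
    (["paid_search", "sales_call"], 10_000.0),
]
for model in ("first", "last", "linear"):
    print(model, attribute(journeys, model))
```

Every scheme hands out exactly the total closed revenue; they differ only in which channels receive it, which is why the choice of model matters so much for budget decisions.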

Certainly, the marketing automation segment is very vibrant, with a large number of vendors (both big and small) providing solutions that…

Added by Gregory Thompson on May 23, 2016 at 4:33pm

Data Science & Machine Learning Encyclopedia - 4,000 Entries

This is one of the first comprehensive machine learning, data science, statistical science, and computer science repositories, featuring many brand-new scalable big-data algorithms published in the last two years (such as automated cataloging, causation detection, and model-free tests of hypotheses) in addition to the classics. The original title for this project was Handbook of Data Science, but over time it grew much bigger than a handbook. This is still an ongoing…

Added by Vincent Granville on May 23, 2016 at 2:10pm

Data Visualization: U.S. Smoking Rates, Cancer and Cigarette Tax

Data science student project contributed by Brian. Brian took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Sept 23 to Dec 18, 2015. This post is based on his first class project (due in the 2nd week of the program).

Overview

This project utilizes publicly available data to visualize…

Added by SupStat on May 23, 2016 at 9:00am

Big Data on a Smaller Scale in Healthcare

Big data is a term for data sets so large and complex that, only a few short years ago, they could not be processed with traditional data processing applications. Challenges in big data include capture, search, sharing, storage, transfer, visualization, querying, and privacy, among other concerns. Data sets are growing rapidly because there are increasingly more sources of data, including mobile devices, software logs, cameras, microphones, wireless networks,…

Added by Sam Carr on May 22, 2016 at 10:00am

Hitchhiker's Guide to Data Science, Machine Learning, R, Python

Thousands of articles and tutorials have been written about data science and machine learning. Hundreds of books, courses and conferences are available. You could spend months just figuring out how to get started, or even just understanding what data science is about.

In this short contribution, I share what I believe to be the most valuable resources: a short list of top resources and starting points. This will be especially useful to any data practitioner who has very little free…

Added by Vincent Granville on May 20, 2016 at 11:30am

ADAPTIVE Machine Learning

Machine Learning today tends to be “open-loop”: collect tons of data offline, process them in batches, and generate insights for eventual action. There is an emerging category of ML business use cases called “In-Stream Analytics (ISA)”. Here, the data is processed as soon as it arrives and insights are generated quickly. However, action may be taken offline, and the effects of the actions are not immediately incorporated back into the learning process. If they were, it would be an…
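To make the batch versus in-stream distinction concrete, here is a minimal Python sketch (an illustration, not the author's method): the batch estimate needs the whole stored data set, while the in-stream estimator updates in constant time and memory as each point arrives. The class name and the optional forgetting factor `alpha` are hypothetical additions; `alpha` weights recent data more heavily, one simple way for an estimate to adapt when the data drifts. Feeding the effects of actions back into learning, the closed loop the excerpt alludes to, is beyond this sketch.

```python
from statistics import fmean


class StreamingMean:
    """Running estimate updated per observation (in-stream), O(1) memory.

    alpha=None gives the exact running mean; 0 < alpha <= 1 gives an
    exponentially weighted mean that gradually forgets old data
    (a hypothetical adaptive variant for drifting streams).
    """

    def __init__(self, alpha=None):
        self.alpha = alpha
        self.n = 0
        self.value = 0.0

    def update(self, x):
        self.n += 1
        if self.n == 1:
            self.value = float(x)  # the first observation initialises the estimate
        elif self.alpha is None:
            self.value += (x - self.value) / self.n  # exact incremental mean
        else:
            self.value += self.alpha * (x - self.value)  # exponential forgetting
        return self.value


stream = [2.0, 4.0, 6.0, 8.0]
batch = fmean(stream)  # open-loop style: requires the full batch

online = StreamingMean()
for x in stream:  # in-stream style: one update per arriving point
    online.update(x)
print(batch, online.value)
```

Both approaches agree on the plain mean; the in-stream version simply never needs to revisit past data.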

Added by PG Madhavan on May 20, 2016 at 5:30am

Interview with Karolina Alexiou: Building Data pipelines

Today we are really happy to host a post from Ariadni-Karolina Alexiou, or Caroline for short. Caroline is a Data…

Added by George Psistakis on May 20, 2016 at 3:00am

Apache Beam - Create Data Processing Pipelines

At the Data Science Association our members often complain about the major data engineering problem of finding the right tools and programming models to build both robust data processing pipelines and efficient ETL processes for data transformation and integration.…

Added by Michael Walker on May 19, 2016 at 10:00pm

Collaborative Business Intelligence Aspects

Collaborative business intelligence is an environment in which users can easily communicate and collaborate with each other, sharing information and ideas and making decisions within their communities.

Information retention

Every day, no single person can retain the millions of items of intellectual property (telephone calls, conversations, and e-mails) generated in companies and organizations across the world. Using important collaborative software to…

Added by Priyanka Jain on May 19, 2016 at 9:00pm

Blending Marketing Mix and Attribution

Marketing measurement has long been an arcane field. Companies that wanted to understand how their marketing programs affected revenue (or brand value) hired expensive consultants, who labored long and hard to deliver complex models at great cost to help their clients set high-level marketing strategies and advertising budgets.

This worked well until the internet came along and changed the game: new digital channels and online marketing techniques were embraced by…

Added by Gregory Thompson on May 19, 2016 at 11:00am

Visual Analytics: Upcoming Revolution in Clinical Research?

Big data in Healthcare

The healthcare industry was a pioneer in consistently applying data mining techniques and analytics procedures to identify areas open to optimization and potential improvements in clinical practice. The research methodology was typically focused on accepting or discarding an initial…

Added by Rafael San Miguel Carrasco on May 19, 2016 at 10:12am

Control Structures: Loops in R

As part of the Data Science tutorial series, my previous post covered basic data types in R. I have kept the tutorial very simple so that R programming beginners can take off immediately.

An online R editor at the end of the post lets you execute the code on the page itself.

In this section, we learn about the loop control structures used…

Added by dataperspective on May 18, 2016 at 8:30pm

Visualizing Bagged Trees as Approximating Borders

The bagged trees algorithm is a commonly used classification method. By resampling our data and fitting a tree to each resample, we get an aggregated vote for the class prediction. In this blog post I will demonstrate how bagged trees work by visualizing each step.…
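The resample, fit, and vote loop described above can be sketched in a few lines of Python. To keep it short, this sketch assumes 1-D numeric data and uses decision stumps (depth-1 trees) in place of full trees; the function names and the labels "a"/"b" are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a depth-1 tree (decision stump) on 1-D data by minimising training errors."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]  # never empty: threshold is a data value
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n points with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Aggregate the individual stumps' predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
models = fit_bagged_stumps(xs, ys)
print([predict(models, x) for x in (1, 4, 4.5, 8)])
```

Because each resample sees a slightly different subset of points, the individual stumps place their thresholds in slightly different spots; the vote averages these into an approximate border, and near that border the vote share is less lopsided than far from it.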

Added by Maiia Bakhova on May 18, 2016 at 2:12pm

Weekly Digest, May 23

Starred articles are new additions posted between Thursday and Sunday, published in the Monday edition exclusively. The Monday edition has six sections: (1) Featured Resources and Technical Contributions, (2) Featured Articles and Case Studies, (3) From our Sponsors, (4) News, Events, Books, Training, Forum Questions, (5) Picture of the Week, and (6) Syndicated Content. The Thursday edition covers articles…

Added by Vincent Granville on May 18, 2016 at 9:30am

Curated Lists of Data Science, Machine Learning, Deep Learning and NLP resources

Here are three useful resources for learning about Data Science:

Added by Ujjwal Karn on May 18, 2016 at 8:59am

xda: R package for exploratory data analysis (plotting, univariate, bivariate)

I created an R package for exploratory data analysis. You can read about it and install it here.  

The package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as for performing different kinds of analyses, such as univariate, bivariate, and multivariate investigation, which is the first step of any…

Added by Ujjwal Karn on May 18, 2016 at 8:30am

Visualizing Social Media Analytics: Beyond the Bar Chart

Recently, I rediscovered a TED Talk by David McCandless, a data journalist, called “The beauty of data visualization.” It’s a great reminder of how charts (though scary to many) can help you tell an actionable story about a topic in a way that bullet points alone usually cannot. If you have not seen the talk, I recommend you take a look for some inspiration about visualizing big…

Added by Chris Atwood on May 18, 2016 at 3:57am

© 2016 Data Science Central