May 2016 Blog Posts (94)

Twitter Analytics using Tweepsmap

This morning I saw #tweepsmap on my twitter feed and decided to…

Added by Salman Khan on May 25, 2016 at 5:00am — No Comments

Four Effective Field Data Collection Software/App for Your Team

When we talk about understanding what a business needs to improve and scale up its operations, it is widely recognized that data must be integrated in real time. Taking a retail business as an example, we need to understand how, and to what extent, performance reports and field data inputs can be used to improve an existing business. A business’ success depends…

Added by Daina Martin on May 25, 2016 at 2:51am — No Comments

Identify, describe, plot, and remove the outliers from the dataset with R (rstats)

In statistics, an outlier is defined as an observation that lies far away from most of the other observations. Often an outlier is present due to measurement error. Therefore, one of the most important tasks in data analysis is to identify and, if necessary, remove the outliers.

There are different methods to detect outliers, including the standard deviation approach and Tukey’s method, which uses the interquartile range (IQR). In this post I will use…
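The two detection approaches named above can be sketched as follows. This is a minimal illustration, not the post's own code: the post works in R, while this sketch uses Python's standard library. The function names, the sample values, and the cutoffs (`k=1.5` for Tukey's fences, `n_sd=2.0` for the standard deviation rule) are assumptions chosen for the example.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


def sd_outliers(values, n_sd=2.0):
    """Flag values more than n_sd sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > n_sd * sd]


# Hypothetical measurements with one suspicious reading.
sample = [10, 12, 11, 13, 12, 11, 14, 12, 13, 95]

lower, upper, flagged = tukey_outliers(sample)
cleaned = [v for v in sample if v not in flagged]

print("Tukey fences:", lower, upper)
print("Flagged by Tukey's method:", flagged)
print("Flagged by the SD rule:", sd_outliers(sample))
print("Data after removal:", cleaned)
```

One design note: an extreme value inflates both the mean and the standard deviation, so the standard deviation rule can miss outliers in small samples, whereas the quartile-based fences are less affected.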

Added by Klodian on May 24, 2016 at 11:07pm — No Comments

Boltzmann and MCMC

I want to know more about the Boltzmann and MCMC techniques from a very basic level in a layman's language. Can someone guide me?

Added by Malay Kapoor on May 24, 2016 at 8:13am — 3 Comments

Tips for Effectively Communicating Complex Ideas to Non-Technical Clients

As a data scientist, your job doesn’t always make sense to others. Ever tried explaining what you do to your parents? They may nod their heads, but their eyes scream confusion.

Well, aside from possibly stifling job-related conversations, this isn’t a big deal. However, when it comes to explaining what you do to potential clients, who happen to be just as technology averse, it’s a major issue.

Here are some helpful tips for explaining exactly what you do to…

Added by Larry Alton on May 24, 2016 at 7:30am — No Comments

Why Hadoop? Streamlined Nature, Scalability and Cost-Effectiveness

Since its inception in the year 2008, the global Hadoop market has grown at a tremendous pace. This market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018. By the end of 2018, this market could amass a net worth of US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is anticipated to see significant growth in the future as well.
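As a quick check of the arithmetic behind these figures, the sketch below compounds the 2012 value at the quoted CAGR and, conversely, backs out the growth rate implied by the two dollar figures. It assumes six annual compounding periods (2012 to 2018); the function names are illustrative only.

```python
def project_value(start_value, cagr, years):
    """Compound start_value at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the teaser, in billions of US dollars.
# Assumption: 6 compounding periods between 2012 and 2018.
projected_2018 = project_value(1.5, 0.547, 6)
rate = implied_cagr(1.5, 20.9, 6)

print(f"Projected 2018 value at 54.7% CAGR: {projected_2018:.2f} billion")
print(f"CAGR implied by 1.5 -> 20.9 billion: {rate:.1%}")
```

Under that assumption the projection comes out slightly below US$20.9 billion and the implied rate slightly above 54.7%, so the quoted figures presumably rest on rounding or a slightly different base period; the teaser does not say which.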

Why…

Added by Ankit Jain on May 23, 2016 at 11:00pm — No Comments

Statistical Attribution & Optimization in the B2B World.

There has been a lot of activity recently around revenue attribution - marketers want to develop a better understanding of their customer acquisition funnel and be able to measure progress against it. Most of this attention has been focused on the B2C space. However, less work has been done measuring the performance of B2B marketing activities.

Certainly the marketing automation segment is very vibrant with a large number of vendors (both big and small) providing solutions that…

Added by Gregory Thompson on May 23, 2016 at 4:33pm — No Comments

Data Science & Machine Learning Encyclopedia - 4,000 Entries

This is one of the first comprehensive machine learning, data science, statistical science, and computer science repositories -- featuring many brand new scalable, big-data algorithms published in the last two years, such as automated cataloging, causation detection, or model-free tests of hypotheses, in addition to the classics. The original title for this project was Handbook of Data Science, but over time, it grew much bigger than a handbook. This is still an ongoing…

Added by Vincent Granville on May 23, 2016 at 2:10pm — No Comments

Data Visualization: U.S. Smoking Rates, Cancer and Cigarette Tax

Data science student project contributed by Brian. Brian took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Sept 23 to Dec 18, 2015. The post was based on his first class project (due the 2nd week of…

Added by NYC Data Science Academy on May 23, 2016 at 9:00am — No Comments

Crime Analysis with Zeppelin, R & Spark

Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share SparkContext across Scala, Python, and…

Added by Raghavan Madabusi on May 23, 2016 at 1:30am — 1 Comment

Polymorphic Malware Detection Using Sequence Classification Methods

A PDF version of this document, created using LaTeX, can be downloaded by clicking here.

Abstract



Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence…

Added by Jake Drew Ph.D. on May 22, 2016 at 6:32pm — No Comments

Big Data on a Smaller Scale in Healthcare

Big data is a term for data sets so large and complex that, only a few short years ago, they could not be processed with traditional data processing applications. Challenges in big data include capture, search, sharing, storage, transfer, visualization, querying and privacy, among other concerns. Data sets are growing rapidly because there are more and more sources of data, including mobile devices, software logs, cameras, microphones, wireless networks,…

Added by Sam Carr on May 22, 2016 at 10:00am — No Comments

Hitchhiker's Guide to Data Science, Machine Learning, R, Python

Thousands of articles and tutorials have been written about data science and machine learning. Hundreds of books, courses and conferences are available. You could spend months just figuring out what to do to get started, even to understand what data science is about.

In this short contribution, I share what I believe to be the most valuable resources - a small list of top resources and starting points. This will be most valuable to any data practitioner who has very little free…

Added by Vincent Granville on May 20, 2016 at 11:30am — 8 Comments

Multi-Regression in R (Exxon Mobil stock price ~ WTI, Gas, and S&P500)

[Previous Post]

Single regression on Exxon's stock



[Introduction of Multi-regression]



Let's recall our last job. We ran a single regression of Exxon Mobil's stock on the WTI crude oil spot price. The result was fantastic: the model accounts for 25% of the variation in the stock's movement. Put another way, that is the R-squared. The problem is "are you happy with the…
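To make the R-squared idea concrete, here is a minimal sketch that fits a one-predictor and a three-predictor least-squares model and compares their R-squared values. It is not the post's code: the post uses R, this sketch uses plain Python, and the numbers below are made-up illustrative values, not Exxon Mobil, WTI, gas, or S&P 500 data. The helper names `solve` and `r_squared` are likewise hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """Fit y ~ intercept + columns by ordinary least squares; return R-squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up values standing in for periodic changes; NOT real market data.
wti = [1.0, -0.5, 0.8, -1.2, 0.3, 0.9, -0.7, 0.1]
gas = [0.4, 0.6, -0.9, 0.2, -0.3, 1.1, -0.8, 0.5]
sp500 = [0.5, 0.2, -0.4, -0.9, 0.7, 0.3, 0.1, -0.6]
stock = [0.9, 0.1, 0.0, -1.0, 0.6, 0.8, -0.3, -0.2]

r2_single = r_squared([wti], stock)
r2_multi = r_squared([wti, gas, sp500], stock)

print(f"R-squared, stock ~ WTI:               {r2_single:.3f}")
print(f"R-squared, stock ~ WTI + gas + S&P500: {r2_multi:.3f}")
```

Adding predictors to a least-squares model with an intercept never lowers R-squared on the data used for fitting, so a higher value alone does not show that the extra variables matter; adjusted R-squared or out-of-sample checks are the usual follow-up.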

Added by Gregory Choi on May 20, 2016 at 9:05am — 1 Comment

ADAPTIVE Machine Learning

Machine Learning today tends to be “open-loop” – collect tons of data offline, process them in batches and generate insights for eventual action. There is an emerging category of ML business use cases that are called “In-Stream Analytics (ISA)”. Here, the data is processed as soon as it arrives and insights are generated quickly. However, action may be taken offline and the effects of the actions are not immediately incorporated back into the learning process. If we did, it is an…

Added by PG Madhavan on May 20, 2016 at 5:30am — No Comments

4 Small Business Cyber Security Weaknesses and Tips to Avoid Them

Think cyber attacks are only a problem for large, multi-national companies? Not quite.

Small businesses are increasingly becoming the preferred target for cyber criminals and hackers. Research from online security company Symantec showed that more than half of all spear phishing attacks in 2015, which are done using fake emails, were against small…

Added by Carmelo Hannity on May 20, 2016 at 4:10am — No Comments

Interview with Karolina Alexiou: Building Data pipelines

Today we are really happy to host a post from Ariadni-Karolina Alexiou, or Caroline for short. Caroline is a Data…

Added by George Psistakis on May 20, 2016 at 3:00am — No Comments

Apache Beam - Create Data Processing Pipelines

At the Data Science Association our members often complain about the major data engineering problem of finding the right tools and programming models to build both robust data processing pipelines and efficient ETL processes for data transformation and integration.…



Added by Michael Walker on May 19, 2016 at 10:00pm — No Comments

Collaborative Business Intelligence Aspects

Collaborative business intelligence is an environment in which users can communicate and collaborate with each other with ease, sharing information, ideas, and decision making within their communities.

Information retention

Each and every day, no one holds the millions of data items of intellectual property (telephone calls, conversations, and e-mails) in companies and organizations across the world. Using important collaborative software to…

Added by Priyanka Jain on May 19, 2016 at 9:00pm — No Comments

Blending Marketing Mix and Attribution

Marketing measurement has long been an arcane field - companies interested in understanding how their marketing programs impacted revenue (or brand value) would hire expensive consultants who labored long and hard to deliver complex models at great cost to help their clients set high level marketing strategies and advertising budgets.

 

This worked well until the internet came along and changed the game - new digital channels and online marketing techniques were embraced by…

Added by Gregory Thompson on May 19, 2016 at 11:00am — No Comments
