Added by Salman Khan on May 25, 2016 at 5:00am
When we talk about understanding what a business requires to improve and scale up its operations, it is widely recognized that real-time data integration is needed. Take the example of a retail business: we need to understand how, and to what extent, performance reports and field data inputs can be used to improve an existing business. A business’s success depends…
Added by Daina Martin on May 25, 2016 at 2:51am
In statistics, an outlier is an observation that lies far away from most of the other observations. Outliers are often caused by measurement error. Therefore, one of the most important tasks in data analysis is to identify and, if necessary, remove outliers.
There are different methods for detecting outliers, including the standard deviation approach and Tukey’s method, which uses the interquartile range (IQR). In this post I will use…
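Both approaches named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the method from the truncated post: the function names, the thresholds (k = 3 standard deviations, and Tukey's conventional k = 1.5 × IQR), and the sample data are assumptions. Quartiles are computed with `statistics.quantiles`, whose default method is only one of several quartile conventions.

```python
import statistics


def zscore_outliers(values, k=3.0):
    """Return values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


def tukey_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one suspicious reading.
data = [10, 12, 11, 13, 12, 11, 10, 14, 12, 95]
print(zscore_outliers(data))       # []
print(zscore_outliers(data, k=2))  # [95]
print(tukey_outliers(data))        # [95]
```

On this small sample the 3-standard-deviation rule flags nothing: with n observations, no point can lie more than (n − 1)/√n sample standard deviations from the mean (about 2.85 for n = 10), and the outlier itself inflates the standard deviation. Tukey's fences are built from quartiles, which the extreme value barely moves, so they flag 95.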
Added by Klodian on May 24, 2016 at 11:07pm
I want to learn about Boltzmann and MCMC techniques from a very basic level, in layman's terms. Can someone guide me?
As a data scientist, your job doesn’t always make sense to others. Ever tried explaining what you do to your parents? They may nod their heads, but their eyes scream confusion.
Well, aside from possibly stifling job-related conversations, this isn’t a big deal. However, when it comes to explaining what you do to potential clients, who may be just as technology-averse, it’s a major issue.
Here are some helpful tips for explaining exactly what you do to…
Added by Larry Alton on May 24, 2016 at 7:30am
Since its inception in 2008, the global Hadoop market has grown at a tremendous pace. The market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018. By the end of 2018, it could reach US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is expected to keep growing significantly.
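The growth figures above follow compound-growth arithmetic: a value V growing at rate r for n years becomes V × (1 + r)^n, and the rate implied by a start and end value is (end / start)^(1/n) − 1. A minimal sketch that checks the quoted numbers, assuming 2012 to 2018 counts as six compounding years (the helper names are illustrative):

```python
def project_value(start, cagr, years):
    """Compound `start` at annual growth rate `cagr` for `years` years."""
    return start * (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted above, in US$ billions; 2012 -> 2018 assumed to be six years.
projected_2018 = project_value(1.5, 0.547, 6)
rate = implied_cagr(1.5, 20.9, 6)
print(f"Projected 2018 value: US${projected_2018:.1f} billion")  # US$20.6 billion
print(f"Implied CAGR: {rate:.1%}")                               # 55.1%
```

Compounding 54.7% for six years from US$1.5 billion gives roughly US$20.6 billion, close to the quoted US$20.9 billion; the small gap likely reflects rounding of the growth rate.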
Added by Ankit Jain on May 23, 2016 at 11:00pm
There has been a lot of recent activity around revenue attribution: marketers want to better understand their customer acquisition funnel and be able to measure progress against it. Most of this attention has focused on the B2C space; less work has been done on measuring the performance of B2B marketing activities.
Certainly, the marketing automation segment is very vibrant, with a large number of vendors (both big and small) providing solutions that…
Added by Gregory Thompson on May 23, 2016 at 4:33pm
This is one of the first comprehensive repositories covering machine learning, data science, statistical science, and computer science. It features many brand-new scalable big-data algorithms published in the last two years, such as automated cataloging, causation detection, and model-free tests of hypotheses, in addition to the classics. The original title for this project was Handbook of Data Science, but over time it grew much bigger than a handbook. This is still an ongoing…
Added by Vincent Granville on May 23, 2016 at 2:10pm
Data science student project contributed by Brian. Brian took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Sept 23 to Dec 18, 2015. The post was based on his first class project (due the 2nd week of…
Added by NYC Data Science Academy on May 23, 2016 at 9:00am
Apache Zeppelin, a web-based notebook, enables interactive data analytics, including data ingestion, data discovery, and data visualization, all in one place. Zeppelin’s interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters, such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and can share a SparkContext across Scala, Python, and…
A PDF version of this document, created using LaTeX, is available for download.
Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence…
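The method the post goes on to describe is cut off in this excerpt. As a generic illustration of the sequence analogy only (not necessarily the author's technique), one simple way to compare two byte sequences that differ by small "mutations" is the Jaccard similarity of their k-mers, the length-k substrings also used in gene-sequence comparison. The byte strings, k = 4, and the function names are hypothetical choices for this sketch.

```python
def kmers(seq: bytes, k: int = 4) -> set:
    """Return the set of all length-k substrings (k-mers) of a byte sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: bytes, b: bytes, k: int = 4) -> float:
    """Shared k-mers divided by all distinct k-mers (1.0 means identical k-mer sets)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)


# Hypothetical byte sequences standing in for samples of a program.
original = b"ABCDEFGHIJKLMNOP"
variant = b"ABCDEFGXIJKLMNOP"    # one byte "mutated"
unrelated = b"qrstuvwxyz012345"

print(jaccard_similarity(original, variant))    # 9/17, about 0.53
print(jaccard_similarity(original, unrelated))  # 0.0
```

A single changed byte alters at most k of the k-mers, so a mutated variant still scores well above an unrelated sequence; that graceful degradation is what makes sequence-style comparison attractive for mutating code.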
Added by Jake Drew Ph.D. on May 22, 2016 at 6:32pm
Big data is a term for data sets so large and complex that, only a few short years ago, they could not be processed with traditional data processing applications. Challenges in big data include capture, search, sharing, storage, transfer, visualization, querying, and privacy, among other concerns. Data sets are growing rapidly because there are more and more sources of data, including mobile devices, software logs, cameras, microphones, wireless networks,…
Added by Sam Carr on May 22, 2016 at 10:00am
Thousands of articles and tutorials have been written about data science and machine learning, and hundreds of books, courses, and conferences are available. You could spend months just figuring out how to get started, or even just understanding what data science is about.
In this short contribution, I share what I believe to be the most valuable resources: a small list of top resources and starting points. This will be most valuable to any data practitioner who has very little free…
Single regression on Exxon's stock
[Introduction to Multiple Regression]
Let's recall our last job. We ran a simple (single-variable) regression of Exxon Mobil's stock on the WTI crude oil spot price. The result was fantastic: the model accounts for 25% of the variation in the stock's movement; put another way, the R-squared is 0.25. The problem is "are you happy with the…
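R-squared is the share of the variance in y explained by the fitted line, R² = 1 − SS_res / SS_tot. A minimal plain-Python sketch of a single-variable least-squares fit and its R², assuming toy numbers because the excerpt does not include the actual Exxon and WTI series; in the post's setting, x would be the oil price series and y the stock series.

```python
def simple_regression(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total
    return a, b, 1 - ss_res / ss_tot


# Hypothetical toy data, not the post's market data.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
a, b, r2 = simple_regression(x, y)
print(f"intercept={a:.2f} slope={b:.2f} R^2={r2:.2f}")  # intercept=0.60 slope=0.80 R^2=0.64
```

An R² of 0.25, as in the post, would mean the single predictor leaves three quarters of the variance unexplained, which is the usual motivation for adding more regressors.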
Machine learning today tends to be “open-loop”: collect tons of data offline, process it in batches, and generate insights for eventual action. There is an emerging category of ML business use cases called “In-Stream Analytics (ISA)”. Here, the data is processed as soon as it arrives and insights are generated quickly. However, action may still be taken offline, and the effects of those actions are not immediately incorporated back into the learning process. If we did incorporate them, it would be an…
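The contrast between batch learning and processing data as soon as it arrives can be illustrated with online learning, where the model is updated with each observation instead of being retrained on a stored batch. A minimal sketch, assuming a one-feature linear model trained by stochastic gradient descent (the least-mean-squares rule); the class name, learning rate, and the synthetic stream y = 2x + 1 are hypothetical, not the author's system.

```python
class OnlineLinearModel:
    """Linear model y ~ w*x + b, updated one sample at a time (LMS / SGD rule)."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Fold one newly arrived observation into the model immediately."""
        error = self.predict(x) - y
        # Gradient step on the squared error for this single sample.
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return error


# Hypothetical stream: replay samples of y = 2x + 1 many times.
model = OnlineLinearModel()
for _ in range(500):
    for i in range(11):
        x = i / 10
        model.update(x, 2 * x + 1)
print(model.w, model.b)  # should end up very close to 2 and 1
```

Closing the loop as the post describes would additionally require feeding the observed effects of actions back in as new samples; this sketch only shows the per-sample update that makes such feedback possible.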
Added by PG Madhavan on May 20, 2016 at 5:30am
Think cyber attacks are only a problem for large, multi-national companies? Not quite.
Small businesses are increasingly becoming the preferred target of cyber criminals and hackers. Research from online security company Symantec showed that more than half of all spear-phishing attacks in 2015, which are carried out using fake emails, were against small…
Added by Carmelo Hannity on May 20, 2016 at 4:10am
Today we are really happy to host a post from Ariadni-Karolina Alexiou, or Caroline for short. Caroline is a Data…
Added by George Psistakis on May 20, 2016 at 3:00am
At the Data Science Association, our members often complain about the major data engineering problem of finding the right tools and programming models to build both robust data processing pipelines and efficient ETL processes for data transformation and integration.…
Added by Michael Walker on May 19, 2016 at 10:00pm
Collaborative business intelligence is an environment in which users can easily communicate and collaborate with each other, sharing information and ideas and making decisions within their communities.
Every day, companies and organizations across the world generate millions of data items of intellectual property (telephone calls, conversations, and e-mails), and no single person holds all of them. Using important collaborative software to…
Added by Priyanka Jain on May 19, 2016 at 9:00pm
Marketing measurement has long been an arcane field: companies interested in understanding how their marketing programs affected revenue (or brand value) would hire expensive consultants, who labored long and hard to deliver complex models at great cost, to help their clients set high-level marketing strategies and advertising budgets.
This worked well until the internet came along and changed the game: new digital channels and online marketing techniques were embraced by…
Added by Gregory Thompson on May 19, 2016 at 11:00am