Featured Blog Posts – February 2017 Archive (79)

For tax purposes, how do you define a robot?

More and more people are talking about the new economy, and in particular, the role played by robots. As jobs are being eliminated and replaced by robots, governments are losing tax money. There are discussions as to whether robots should be taxed. …


Added by Vincent Granville on February 21, 2017 at 5:00pm — 1 Comment

In Search of Artificial General Intelligence (AGI)

Summary:  Looking beyond today’s commercial applications of AI, where and how far will we progress toward an Artificial Intelligence with truly human-like reasoning and capability?  This is about the pursuit of Artificial General Intelligence (AGI).


There is no question that we’re making a lot of progress in artificial intelligence (AI).  So much so that we are rapidly approaching or have already arrived at a plateau in development where more effort is…


Added by William Vorhies on February 21, 2017 at 8:30am — No Comments

Data Matching – Entity Identification, Resolution & Linkage

Data matching is the task of identifying, matching, and merging records that correspond to the same entities from several source systems. The entities under consideration most commonly refer to people, places, publications or citations, consumer products, or businesses. Besides data matching, the names most prominently used are record or data linkage, entity resolution, object identification, or field matching.

A major challenge in data matching is the lack of common entity…
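To make the matching task described above concrete, here is a minimal Python sketch of pairwise record matching. It normalizes names, scores every cross-source pair with the standard library's `difflib.SequenceMatcher`, and keeps pairs above a threshold. The record layout, the helper names (`normalize`, `similarity`, `match_records`) and the 0.85 threshold are illustrative assumptions, not the author's method.

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (hypothetical cleaning rule)."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized forms of two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_records(source_a, source_b, threshold=0.85):
    """Return (id_a, id_b, score) for every cross-source pair scoring at or above threshold."""
    matches = []
    for id_a, name_a in source_a.items():
        for id_b, name_b in source_b.items():
            score = similarity(name_a, name_b)
            if score >= threshold:
                matches.append((id_a, id_b, round(score, 3)))
    return matches

# Hypothetical records from two source systems.
crm = {"c1": "Acme Corp.", "c2": "Globex Corporation"}
billing = {"b7": "ACME Corp", "b9": "Initech LLC"}
print(match_records(crm, billing))  # [('c1', 'b7', 1.0)]
```

The nested loop compares every pair, which grows quadratically with the number of records; real entity-resolution systems usually add a blocking step and compare several fields (names, addresses, dates) rather than one.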


Added by Raghavan Madabusi on February 20, 2017 at 2:30pm — No Comments

Python vs R: 4 Implementations of Same Machine Learning Technique

Actually, this is about two R versions (standard and improved), a Python version, and a Perl version of a new machine learning technique recently published here. We asked for help to translate the original Perl script to Python and R, and finally decided to work with …


Added by Vincent Granville on February 20, 2017 at 1:30pm — 4 Comments

Storytelling And Bot Making

Posted with permission from author: Vaisagh Viswanathan

Chatbot frameworks, toolkits and the like make it easy to build bots these days, and everyone seems to be building some kind of bot or other. How do you design a good chatbot? And what does that have to do with storytelling?…


Added by Sudhanshu Ahuja on February 20, 2017 at 4:30am — No Comments

Top Hadoop Interview Questions & Answers

Q1. What exactly is Hadoop?

A1. Hadoop is a Big Data framework for processing huge amounts of data of different types in parallel, to gain performance benefits.

Q2. What are the 5 Vs of Big Data?

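A2. Volume – Size of the data

Velocity – Speed at which data is generated and changes

Variety – Different types of data: structured, semi-structured, and unstructured

Veracity – Trustworthiness and quality of the data

Value – Usefulness of the data to the business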

Q3. Give me examples of unstructured data.

A3. Images, videos, audio, etc.

Q4. Tell me about the Hadoop file system…
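A1 describes Hadoop as processing data in parallel. Hadoop's original processing model is MapReduce: a map phase emits key/value pairs from each input split, a shuffle groups those pairs by key, and a reduce phase aggregates each group. The pure-Python sketch below simulates that flow for a word count on one machine, with threads standing in for parallel map tasks. It uses no Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(split: str):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_outputs):
    """Shuffle: group emitted values by key across all mappers' outputs."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: aggregate the grouped values for one key."""
    return key, sum(values)

def word_count(splits, workers=4):
    # Threads stand in for the map tasks Hadoop would schedule across cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input splits, as if a file had been divided into blocks.
splits = ["big data big", "data in parallel", "Big Data"]
print(word_count(splits))  # {'big': 3, 'data': 3, 'in': 1, 'parallel': 1}
```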


Added by Sarvesh Kumar on February 20, 2017 at 1:30am — No Comments

Executive Guide to Artificial Intelligence

Of all the descendants of Homo erectus, only Homo sapiens survived, whereas other species such as Homo soloensis, Homo denisova, Homo neanderthalensis and Homo floresiensis faded away more than 40,000 years ago. What advantages did Homo sapiens possess that helped it flourish while the other species went extinct? Apparently a cognitive revolution (according to Prof. Yuval Harari in his famous book Sapiens), triggered by some kind of genetic mutation, provided Homo sapiens with more…


Added by Amith Parameshwara on February 19, 2017 at 7:30am — No Comments

Timestamp Data Visualization by Matplotlib

Large volumes of timestamp data are a reality, especially when we are dealing with networked devices. Typically, a network of devices generates a large number of alerts, and mining the alert dataset provides insights about the network.

Recently, I came across a situation where a business user was looking for a multidimensional visualization of timestamp data. The data was about a network of more than a thousand devices and the alarms those devices generated about the status of the network…
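A common first step toward this kind of multidimensional view is to bucket alert timestamps into a device-by-hour count matrix, which can then be drawn as a heatmap. The sketch below does the bucketing with the standard library only. The `(device_id, ISO timestamp)` tuple format, the device names and the helper name are hypothetical, and this is not necessarily the approach the author took.

```python
from collections import Counter
from datetime import datetime

def alerts_per_device_hour(alerts):
    """Count alerts per (device, hour of day) and return a dense matrix.

    alerts: iterable of (device_id, ISO-8601 timestamp string) tuples (hypothetical format).
    Returns (devices, matrix), where matrix[i][h] counts alerts from devices[i] in hour h.
    """
    counts = Counter()
    for device, ts in alerts:
        hour = datetime.fromisoformat(ts).hour
        counts[(device, hour)] += 1
    devices = sorted({device for device, _ in counts})
    # Counter returns 0 for missing keys, so every cell of the 24-hour grid is filled.
    matrix = [[counts[(d, h)] for h in range(24)] for d in devices]
    return devices, matrix

# Hypothetical alert records.
alerts = [
    ("router-01", "2017-02-19T07:05:00"),
    ("router-01", "2017-02-19T07:45:00"),
    ("switch-02", "2017-02-19T23:10:00"),
]
devices, matrix = alerts_per_device_hour(alerts)
print(devices)       # ['router-01', 'switch-02']
print(matrix[0][7])  # 2
```

With matplotlib installed, `plt.imshow(matrix, aspect="auto")` followed by `plt.yticks(range(len(devices)), devices)` would render the matrix as a device-by-hour heatmap.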


Added by Jishnu Bhattacharya on February 19, 2017 at 7:00am — 3 Comments

Weekly Digest, February 20

Monday newsletter published by Data Science Central. Previous editions can be found here.  The contribution flagged with a + is our selection for the picture of the week.

Upcoming DSC Webinar


Added by Vincent Granville on February 18, 2017 at 10:30am — No Comments

Analytics as Value lever in Oil and Gas industry

Over the decades, oil and gas companies have built core skills in many areas, such as engineering innovation, project execution, process management and risk management. These core capabilities have traditionally served as the key value levers for companies in this sector. As the benefits from these levers plateau, and with pressure from oil prices, policy risks, political risks and more, these companies are looking to fortify these levers with big data analytics as well as…


Added by Amith Parameshwara on February 18, 2017 at 9:30am — No Comments

Internal Capacity, External Demand, and the Metrics of Consumption

In my blogs, I often distinguish between event data and metrics.  I usually say something to the effect that events help to explain the metrics - or events “provide the story behind the metrics.”  In this blog, I will be discussing two competing lines of thought behind events:  internal capacity and external demand.  Why do sales appear much lower for the month of June compared to July?  Some explanations relating to internal capacity are as follows:  “There weren’t enough agents in June to…


Added by Don Philip Faithful on February 18, 2017 at 6:30am — No Comments

The Twilight Zone Between True and False

Recently we have read a lot about fake news, “alternative facts” and journalistic lies. Companies like Facebook are developing data science algorithms to detect these postings, based, among other things, on crowdsourcing (collective intelligence).

But can the data scientist, with her inquisitive mind and strong sense of numbers and probabilities, use her brain to assess how true a piece…


Added by Vincent Granville on February 16, 2017 at 4:30pm — No Comments

The Mathematics of Machine Learning

Guest blog post by Wale Akinfaderin, PhD Candidate in Physics. 

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and…


Added by Vincent Granville on February 15, 2017 at 8:00pm — 7 Comments

How Uber Depends on Data Analytics to Deliver Extreme Customer Service – Face To Face With Uber’s Chief Data Architect

From a simple limo-hailing app for friends, Uber has grown into the world’s go-to taxi app. Its growth in its approximately 7 years of existence can be described in one word: “phenomenal”.

But there’s another way to define Uber, one that not many have given thought to. Uber is a Big Data company, in the same league as Google and Amazon. It not only uses existing…


Added by Raj Dalal on February 14, 2017 at 7:00pm — No Comments

Indicator Based Recommenders – The One We Missed

Summary:  In our recent article on “5 Types of Recommenders” we failed to mention Indicator-Based Recommenders.  These have some unique features and ease of implementation that may be important in your selection of a recommender strategy.


A few weeks ago in the midst of our series on recommenders we published an article “5 Types of Recommenders” in which…


Added by William Vorhies on February 14, 2017 at 9:38am — 1 Comment

Data Engineering vs. Data Science Infographic

If you're interested in the field of analytics, you've probably heard the terms Data Engineering and Data Science, but do you know the difference? Although there has historically been considerable overlap between the two professions, they are each becoming more distinct. Here is an infographic to help you understand the skills and responsibilities of each role. You'll also get a chance to compare salaries, popular software and tools used by each, and some educational resources to…


Added by Jake Moody on February 14, 2017 at 8:30am — 2 Comments

Selecting Forecasting Methods in Data Science

We are dealing with a plethora of data and information in the world today, and the expectation is to predict and forecast how we can gain competitive advantage from the information we have, so that we can act in advance. We aim to define and furnish various methods, based on gut feel, historical data, simple mathematical averages and many more, to get as precise a prediction as possible. With advanced analytics and data science, we develop “always-on” forecasting…
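As an illustration of the “simple mathematical averages” end of that spectrum, here is a minimal sketch of two baseline forecasters: a trailing moving average and simple exponential smoothing. The window size, the smoothing factor `alpha` and the sample series are illustrative assumptions, not values from the article.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window

def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: level = alpha * y_t + (1 - alpha) * previous level.

    The final smoothed level serves as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Hypothetical monthly sales figures.
sales = [10, 12, 14, 13, 15]
print(moving_average_forecast(sales))         # 14.0
print(exponential_smoothing_forecast(sales))  # 13.875
```

The moving average weights the last few observations equally and ignores older ones, while exponential smoothing uses the whole history with geometrically decaying weights; a larger `alpha` makes it react faster to recent changes.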


Added by Kamala Kanta Mishra on February 13, 2017 at 11:30pm — 1 Comment

A Quick Guide on How to Prevail in the Graph Database Arena


There are endless discussions in the database arena about which DBMS is best suited for operational or data warehousing analytics, which one is the most efficient for online transaction processing, or which one is suitable for semantic integration. Recently, graph databases have been growing in popularity, especially in the enterprise space, and perhaps that adds to the headache for vendors trying to differentiate themselves from the competition…


Added by Athanassios Hatzis on February 13, 2017 at 10:30pm — 2 Comments

A Discussion: IT Data, Ambiguities & Classification model performance

“Ambiguity is pervasive”: true to that statement, as ever more data is generated and system connectivity reaches its peak, data and outcomes are diverging. IT systems are evolving from “BIG DATA” to “BIGGER DATA” systems. Not all of this data is structured and easily consumable, so the nexus of technology and “Data Greed” poses a challenge.

Having said this, the fact is that the future is found in ambiguity and chaos. We will never have complete and perfect information or a full…


Added by Awadesh Tiwari on February 13, 2017 at 7:00pm — No Comments

23 types of regression

This contribution is from David Corliss. David teaches a class on this subject, giving a (very brief) description of 23 regression methods in just an hour, with an example and the package and procedures used for each case. 

Here you can watch the webcast given for Central Michigan University. The slide deck can be found…


Added by Vincent Granville on February 13, 2017 at 5:00pm — 3 Comments

© 2021 TechTarget, Inc.