All Blog Posts (5,250)

Building an expert system for NLP

Smart tags process: an algorithm for efficiently extracting useful information from a piece of text and storing it in a retrieval system.

The knowledge is extracted by asking the reader to answer a certain number of questions. Every time the answer to a question is yes, specific tags are collected and stored. Every time the answer to a question is no, specific tags are also collected and stored. Some questions ask the user to select from a list. In this case, all the elements…
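The yes/no part of the process described above can be sketched in Python. This is only an illustration under stated assumptions, not the author's implementation: the names Question, collect_tags and TagIndex are hypothetical, the "retrieval system" is assumed to be a simple in-memory inverted index, and list-selection questions are left out because the excerpt is cut off before it explains them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A yes/no question with the tags to collect for each answer (hypothetical type)."""
    text: str
    tags_if_yes: tuple
    tags_if_no: tuple


def collect_tags(questions, answers):
    """Return the set of tags collected for a list of yes/no answers.

    answers[i] is True for "yes" and False for "no" to questions[i].
    """
    tags = set()
    for question, answer in zip(questions, answers):
        # Both outcomes contribute tags, as the description states.
        tags.update(question.tags_if_yes if answer else question.tags_if_no)
    return tags


class TagIndex:
    """Assumed retrieval system: an inverted index from tag to document ids."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)

    def store(self, doc_id, tags):
        for tag in tags:
            self._docs_by_tag[tag].add(doc_id)

    def lookup(self, tag):
        # Return a copy so callers cannot mutate the index.
        return set(self._docs_by_tag.get(tag, set()))
```

A reader would answer the questions for one piece of text, pass the answers to collect_tags, and store the result under that text's id; lookup then returns every stored text that carries a given tag.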


Added by Issoufou Seidou Sanda on August 5, 2018 at 7:30am — No Comments

BI strategy beyond Excel

(This post originally appeared here) …


Added by Matthew Gierc on August 4, 2018 at 10:30am — No Comments

22 Differences Between Junior and Senior Data Scientists

What do experienced data scientists know that beginner data scientists don't know? Here is a quick overview.

  1. Automating tasks. Writing code that writes code.
  2. Outsourcing tasks to junior members or to consultants.
  3. Managing people, hiring the right people, managing managers who report to you.
  4. Training colleagues who might not be tech-savvy. Being an adviser for senior managers.
  5. Identifying the right tools and assessing the benefits and minuses of…

Added by Vincent Granville on August 4, 2018 at 8:00am — No Comments

Weekly Digest, August 6

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.

Featured Resources and Technical Contributions


Added by Vincent Granville on August 4, 2018 at 6:30am — No Comments

Open Peer-to-Peer Communications to Facilitate Real-time Insights Sharing

My blog “Blockchain + Analytics: Enabling Smart IOT” drew some great feedback asking me to clarify my autonomous vehicle example that used blockchain as a means of near real-time, peer-to-peer communications between clusters of intelligent devices and machines.  But first, some background.

Edge analytics within an Internet of Things (IOT) world is very…


Added by Bill Schmarzo on August 4, 2018 at 6:08am — No Comments

Overview and Classification of Machine Learning Problems

#  | Topic       | Difficulty Level (High / Low) | Questions | Refs / Answers
1. | Text Mining | L | Explain: TF-IDF, Stanford NLP, Sentiment Analysis, Topic Modelling |
2. | Text Mining | H | Explain Word2Vec. Explain how word vectors are… |

Added by Rohit Walimbe on August 4, 2018 at 5:00am — No Comments

Everipedia as a desk reference for data mining topics

One interesting way to check the usefulness of Everipedia as a desk reference for data mining is to compare the number of relevant articles. Go to Everipedia and search for "data mining". You will get 7 articles. Then go to Wikipedia and search for "data mining". You will see 4 articles (overlapping with similar Everipedia articles).

Another example. Try the word "smoothing" which is a popular topic in data analysis.…


Added by jwork.ORG on August 2, 2018 at 1:34pm — No Comments

Harnessing the power of data to transform asset-intensive value chains

In the twentieth century, oil was the most valuable resource – but not anymore. In today’s digital age data is the new oil. It will play a similar, perhaps bigger role, becoming a game changer that provides power in terms of information and competitive advantage through actionable insights. Some…


Added by Amit Supe on August 2, 2018 at 10:15am — No Comments

Thursday News: Apache Spark, ML with C++, Deep Learning, AI, R, Trend Analysis...

Here is our selection of featured articles, forum questions, and resources posted since Monday.



Added by Vincent Granville on August 2, 2018 at 9:00am — No Comments

Scalable IoT ML Platform with Apache Kafka + Deep Learning + MQTT

I built a scenario for a hybrid machine learning infrastructure leveraging Apache Kafka as a scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using TensorFlow and TPUs on Google Cloud Platform (GCP) via Google ML Engine). The predictions (i.e.…


Added by Kai Waehner on August 1, 2018 at 11:29pm — No Comments

Need guidance: beginner data/business analyst

I have completed the following courses on Coursera:

R-Programming, Getting and cleaning data – Johns Hopkins University

Introduction to SQL – University of Michigan

Managing Big Data with MySQL and TERADATA – Duke University

Data Visualization and Communication with Tableau – Duke University

I am currently working on Python courses, so I think I have the basics covered.

I have two questions:

1) What should I study next? I am having a…


Added by Faaran Saleem on July 31, 2018 at 2:00pm — No Comments

Comparing the Four Major AI Strategies

Summary: Now that we’ve detailed the four main AI-first strategies (Data Dominance, Vertical, Horizontal, and Systems of Intelligence), it’s time to pick. Here we provide a side-by-side comparison and our opinion on the winner(s) for your own AI-first startup.




Added by William Vorhies on July 31, 2018 at 8:20am — No Comments

It's Not Digital Transformation; It’s “Intelligence Transformation” We Seek

Forrester published a report titled “The Sorry State of Digital Transformation in 2018” (love the brashness of the title) that found that 21% of 1,559 business and IT decision makers consider their digital transformations complete.  Complete? Say what?!

The concept of “Digital Transformation” is confusing because many CIOs (at least 21%) and their…


Added by Bill Schmarzo on July 30, 2018 at 3:47pm — No Comments

Top 10 Challenges to Practicing Data Science at Work

This article was written by Bob Hayes

A recent survey of over 16,000 data professionals showed that the most common challenges to data science included dirty data (36%), lack of data science talent (30%) and lack of management support (27%). Also, data professionals reported experiencing around three challenges in…


Added by Kelly Quintana on July 30, 2018 at 12:15pm — No Comments

Practical Apache Spark in 10 minutes. Part 5 - Streaming

Spark is a powerful tool that can be applied to many interesting problems. Some of them have been discussed in our previous posts. Today we will consider another important application: streaming. Streaming data is data that arrives continuously as small records…


Added by Igor Bobriakov on July 30, 2018 at 3:53am — No Comments

Machine Learning with C++ - Classification with Shark-ML

Shark-ML is an open-source machine learning library which offers a wide range of machine learning algorithms together with nice documentation, tutorials and samples. In this post I will show how to use this library to solve a classification problem with two different algorithms, SVM and Random Forest. The post covers how to use the API for:

1. Loading data

2. Performing normalization and dimension…


Added by Kyrylo Kolodiazhnyi on July 30, 2018 at 2:40am — No Comments

AutoEncoders with Non-Linear Parameters — KernelML

By Rohan Kotwani.


KernelML is a brute-force optimizer that can be used to train machine learning models. The package uses a combination of machine learning and Monte Carlo simulations to optimize a parameter vector with a…


Added by Vincent Granville on July 29, 2018 at 5:30am — No Comments

Weekly Digest, July 30

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.

Featured Resources and Technical Contributions


Added by Vincent Granville on July 29, 2018 at 3:30am — No Comments

R Code for Cox & Stuart Test for Trend Analysis

Below is R code for the Cox & Stuart test for trend analysis. Simply copy and paste the code into your R workspace and use it. Unlike cox.stuart.test in the R package "randtests", this version of the test does not return a p-value greater than one. That problem occurs when the test statistic, T, is half of the number of untied pairs, N.

Here is a simple example that reveals the situation:

> x

[1] 1 4 6 7 9 7 1 6

> cox.stuart.test(x)

Cox Stuart…
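To see why a p-value above one can appear when T equals N/2, here is a minimal Python sketch of a Cox & Stuart style sign test. It is not the author's R code and not the randtests implementation; it assumes the two-sided p-value is computed as twice the smaller binomial tail, which is one common formulation that exceeds one in exactly this case. The function name cox_stuart and its return layout are hypothetical.

```python
from math import comb


def cox_stuart(x):
    """Sign test on pairs (first half, second half) of the series x.

    Returns (t, n_untied, raw_p, capped_p), where t is the number of
    positive differences and n_untied the number of non-zero differences.
    """
    n = len(x)
    half = n // 2
    first = x[:half]
    second = x[n - half:]  # the middle value is dropped when n is odd
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n_untied = len(diffs)
    t = sum(1 for d in diffs if d > 0)

    # Smaller tail of Binomial(n_untied, 0.5), doubled for a two-sided test.
    k = min(t, n_untied - t)
    tail = sum(comb(n_untied, i) for i in range(k + 1)) / 2 ** n_untied
    raw_p = 2 * tail

    # Assumed fix: a probability cannot exceed one, so cap it.
    return t, n_untied, raw_p, min(1.0, raw_p)
```

For the series 1 4 6 7 9 7 1 6 used above, the pairs are (1, 9), (4, 7), (6, 1) and (7, 6), so T = 2 and N = 4. The doubled tail is 2 × 11/16 = 1.375, which the cap reduces to 1.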


Added by Okan OYMAK on July 29, 2018 at 3:00am — No Comments

Digital Marketing: Are you avoiding these common problems?

Target audience: Marketers, analysts, campaign managers, and decision makers.

Preface: I teach multiple tools under Adobe's experience cloud and I often get to have a look at the shape of digital marketing in multiple companies and across various business domains. This post is a summary of the most common problems and ways of resolving them at early stages before they become blunders.

1. The accuracy (and single…


Added by Abhishek Srivastava on July 28, 2018 at 7:30pm — No Comments

© 2018   Data Science Central ®