All Blog Posts (7,236)

Do you know what is bigger than Big Data?

From episode 10 of my Naked Analyst Channel on YouTube.

I think I do - and it is the ‘appification’ of analytics. What I mean by this is the reduction of a complex analytic activity such as market segmentation, down to a single button on your computer interface. Very much like the…


Added by Steve Bennett on October 6, 2014 at 2:07pm — 4 Comments

To build a MEL, you need a Private Detective

Over the past six months we have been working with the reliability and maintenance organization within a large oil and gas client to build out their Master Equipment List (MEL). Like many asset-intensive organizations, they have implemented an Enterprise Asset Management (EAM) tool to give them visibility and control over their capital equipment to optimize their maintenance strategies, reduce operating costs, and better manage their workforce and spare parts inventories. The challenge is…


Added by Sullexis LLC on October 6, 2014 at 8:00am — No Comments

Ebola Death Cases - Admitted, Recovered Cases

Having found a dataset on Ebola cases, I thought I would take a quick look at what the statistics really look like.

The dataset contains 3 countries, each with multiple regions.

So, using just the high-level, country-level information, this is what we can see in a simple line chart.

In the chart below:

The blue line > Total death cases

The green line > Total cases

The orange line > Currently admitted

And the Red…


Added by Nilesh Jethwa on October 6, 2014 at 6:57am — No Comments

Top 30 DSC blogs, based on new scoring technology

Most of you will read this article to discover the most popular blogs, but the real purpose here is to show what goes wrong with many data science projects, even ones as simple as this, and how it can easily be fixed. In the process, we created a new popularity score, much more robust than any ranking used in similar articles (top bloggers, popular books, best websites, etc.). This scoring, based on a decay function, could be incorporated in recommendation engines.…
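To make the idea of a decay-based popularity score concrete, here is a minimal sketch. It is not the formula from the article, which this excerpt does not give. It assumes exponential decay with a hypothetical 30-day half-life, and the function name `decayed_score`, the field names, and the sample numbers are all made up for illustration.

```python
import math


def decayed_score(page_views, age_days, half_life_days=30.0):
    """Weight raw page views by an exponential decay in the post's age.

    Assumption: the weight halves every `half_life_days` days. The actual
    decay function used in the article is not shown in this excerpt.
    """
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return page_views * decay


# Hypothetical posts: an old post with many views and a recent one with fewer.
posts = [
    {"title": "older post", "page_views": 10000, "age_days": 90},
    {"title": "recent post", "page_views": 3000, "age_days": 10},
]

# Rank by decayed score, highest first.
ranked = sorted(
    posts,
    key=lambda p: decayed_score(p["page_views"], p["age_days"]),
    reverse=True,
)

for post in ranked:
    score = decayed_score(post["page_views"], post["age_days"])
    print(f"{post['title']}: {score:.1f}")
```

With these made-up numbers the recent post outranks the older one even though it has fewer raw page views, which is the kind of correction a time-aware score is meant to provide.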


Added by Vincent Granville on October 4, 2014 at 9:00am — No Comments

Is data science bad at detecting bogus Amazon or Yelp reviews?

There are lies, damn lies, and Amazon reviews. Why are so many Amazon or Yelp reviews bogus? Do they have bad data scientists who can't detect fraudulent reviews? No, they have unethical CEOs who are ready to do anything to make money in the short term, and who then complain about being unable to find real data scientists to solve their problems. This is a challenge for ethical data scientists who want to create value, but get punished by top management for not condoning their misdeeds.…


Added by Mirko Krivanek on October 2, 2014 at 8:00pm — 3 Comments

Optimizing Disease Management Programs Using Predictive Modeling

Summary: Here’s an easy-to-understand example of how predictive analytics can reduce cost while increasing the efficacy of disease management programs.

Healthcare providers have made major breakthroughs over the last two decades by creating and implementing increasingly sophisticated disease management programs (DMPs). At their core there are always two motives: improve the human condition by…


Added by William Vorhies on October 2, 2014 at 11:05am — No Comments

Weekly Digest, October 6

The full version is always published Monday. Starred articles or sections are new additions or updated content, posted between Thursday and Sunday. 



Added by Vincent Granville on October 1, 2014 at 3:00pm — No Comments

The end of the Data Scientist Bubble

This was the subject of a provocative article posted on Oracle's blog two days ago. It certainly shows how far from reality some big companies are. They confuse people who call themselves data scientists (or get assigned that job title) with those who are true data scientists and might use a different job title. Many times, the issue is internal politics that create the…


Added by Mirko Krivanek on October 1, 2014 at 8:00am — 9 Comments

The 22 Skills of a Data Scientist

There have been a number of interesting articles recently discussing the skills a data scientist should or might have. The one entitled The 22 Skills of a Data Scientist is a popular one (see the 22 skills listed below, or click on the link to read the full article). Earlier this morning, I read another one on LinkedIn: …


Added by Vincent Granville on September 29, 2014 at 1:00pm — 7 Comments

Elements of machine learning

The official title of this free book, available in PDF format, is Machine Learning Cheat Sheet. But it's more about the elements of machine learning, with a strong emphasis on classic statistical modeling, and rather theoretical: maybe something like a rather comprehensive theoretical foundation (or handbook) of statistical science. Anyway, it is very interesting, and it's free. See the table of contents screenshot below. …


Added by Marcel Remon on September 29, 2014 at 9:30am — 1 Comment

More about Shifting Culture, Less about Investing in Potential

Data Science is often brought to companies as a potential game changer. An investment that may pay off if the company's data can be leveraged to provide insight and gain a competitive edge. But bringing analytical offerings to organizations as a "maybe solution" to their pain points misses the mark. Data science is today's answer to our most pressing enterprise and socially innovative challenges given the data-driven nature of our markets and society as a whole. If an investment in data…


Added by Sean McClure on September 29, 2014 at 9:03am — 1 Comment

New Beginnings in Facial Recognition

As humans, we navigate our lives largely by the recognition of patterns. These patterns include the sound of a mother’s voice, the appearance of a dangerous animal or poisonous food, the familiarity of kin, and the attraction to potential mates. Accurate pattern recognition is key to an animal’s survival and progress, and has allowed humans to become the socially complex and advanced species we are today. 

It should come as no surprise that…


Added by Sean McClure on September 29, 2014 at 9:01am — No Comments

Keeping Corporate Data Safe: 5 Ways Lax BYOD Policies Create Security Risks

The proliferation of smartphones, tablets, and other mobile devices — here come the “wearables” — has opened up new opportunities for businesses to leverage employee-owned technology for competitive advantage. That being said, the use of such devices in the workplace can compromise sensitive data, especially when comprehensive BYOD policies are not implemented and…


Added by Beau Winchester on September 28, 2014 at 11:00am — No Comments

Apache Spark: distributed data processing faster than Hadoop

This blog post is excerpted from DataScience Hacks by the author himself.

Apache Spark is another Apache-licensed top-level project that can perform large-scale data processing much faster than Hadoop (I am referring to MR1.0 here). This is possible thanks to the Resilient Distributed Dataset (RDD) concept behind this fast data processing. An RDD is basically a collection of objects,…


Added by Pavan Kumar on September 28, 2014 at 7:00am — 1 Comment

Data Instrumentalism

Being the son of a mechanic, I have spent many years handling power tools. I'm especially fond of a couple of hammer-drills in my possession. They can effortlessly drill holes through concrete. At least, this is what my father once claimed. He handed down his most treasured tools to me. I'm big on pliers and screwdrivers. This might be due to my vocational training as a technician. Even today - long after I completed my diploma and continued to further my education - I still carry a licence…


Added by Don Philip Faithful on September 27, 2014 at 7:39am — No Comments

Hadoop is Dead. DataFlow is Alive!

We've given Hadoop almost 10 years to mature, invested billions, and very few companies are seeing the return on investment. Several companies have tried to make Hadoop a real-time analytical platform, incorporating SQL-like facades on top, but the latency is still not where it needs to be for interactive applications. Even Google, a true big data user, has moved on and is using more dataflow / flow-based programming approaches. Why? It just makes sense...

  • Why should I…

Added by Lars Fiedler on September 27, 2014 at 7:30am — No Comments

Decipher Neo4J Cypher Query Language (CQL)

This blog post is a follow-up to Embrace Relationships with Neo4J, R & Java.

Neo4j Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store. Cypher is a relatively simple but still very powerful language. Very complicated database queries can easily be expressed through Cypher. This allows…


Added by Raghavan Madabusi on September 26, 2014 at 2:17am — No Comments

How Tracking Analytics Can Improve Content Marketing

Inbound and content marketing are not going anywhere anytime soon. The content marketing association reports that over 90% of both enterprise B2B and B2C companies are using the tactic. There are a million different ways to leverage content strategy, and here at TechnologyAdvice, we’ve experimented with plenty of them. It’s been a fun, albeit educational, experience to say the…


Added by Keith Cawley on September 25, 2014 at 4:59am — No Comments

Weekly Digest - September 29

The full version is always published Monday. Starred articles or sections are new additions or updated content, posted between Thursday and Sunday. 



Added by Vincent Granville on September 24, 2014 at 5:30pm — No Comments

Caveat Data Scientist: Public Trust Low for Science

A new paper entitled "Gaining Trust as well as Respect in Communicating Science Topics" published in the Proceedings of the…


Added by Michael Walker on September 24, 2014 at 5:07pm — No Comments
