All Blog Posts (1,024)

Weekly Digest - July 28

The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.

The first article below, posted this Monday, is extremely popular: chances are, you've already read it. I invite you to check it out again, as great comments and more regressions have been added. Also, read explanations (in the comments…


Added by Vincent Granville on July 23, 2014 at 2:30pm — No Comments

The Target Security Breach: What Happened and What It Can Teach Us About Cyber Security

On December 13th, 2013, a blog devoted to IT security news broke a startling story — Target, one of the country’s largest big-box retailers, had been the victim of a security breach that exposed the credit card data of thousands of shoppers.

The attackers targeted the data stored in the magnetic strips of customers’ cards. The website reported…


Added by Beau Winchester on July 23, 2014 at 11:24am — No Comments

10 Steps for Big data projects

I tried to put together a 10-step process for big data projects. Please correct it or suggest any additions or changes. Your input is highly appreciated.

1. You really need to have one or more problems that you cannot solve directly with your existing metrics and reports.

2. You can have data of any size; however, if it is small, you don't need to build a complex model around it. It's good if the size is good enough. Never use just a sample of…


Added by Ishaq Mohammed on July 23, 2014 at 2:09am — 1 Comment

Beyond the 4 Ms of Manufacturing

The industrial revolution of the 1800s established the building blocks of manufacturing as we know it today. Man, Machine, Material and Method were connected to form an intricate system on which manufacturing processes and their operational dynamics were based. The complexity of such a system, however, has led to ineptitudes that have become difficult to circumvent. The significance of each element of the 4M’s and the consequent manner in which they mesh together…


Added by Sumit Prasad on July 23, 2014 at 1:08am — No Comments

Banned on Google: How to fight EU Censorship?

The new European laws about "the right to be forgotten", however absurd they might be, are a new government threat to data engines.

First, these laws are absurd because:

  • Very difficult to enforce: how do you (or Google) prove that a guy requesting content removal is real, as opposed to a fake, and that…

Added by Mirko Krivanek on July 22, 2014 at 7:30pm — No Comments

5 Industries That Need Big Data

“Big data” isn’t just a trendy buzzword, and it’s not some revolutionary concept. It’s exactly what it sounds like: large amounts of data that may benefit a company’s marketing endeavors by helping it understand its demographics better. Big data can come from numerous sources, both internal and external, and can include things like customer surveys, massive surveys such as the Census, and even an email list or analytics report from a social media…


Added by Larry Alton on July 22, 2014 at 3:55am — 1 Comment

10 types of regressions. Which one to use?

Should you use linear or logistic regression? In what contexts? There are hundreds of types of regressions. Here is an overview for data scientists and other analytic practitioners, to help you decide on what regression to use depending on your context. Many of the referenced articles are much better written (fully edited) in my data science Wiley book.…


Added by Vincent Granville on July 21, 2014 at 7:30pm — 7 Comments

Social Media for App Capture and Question Answering in Open Vocabulary Executable English over Web Databases.

Imagine government and other web sites answering an open-ended collection of English questions, and also explaining the answers in English. Imagine people socially networking, Wikipedia-style, to continually expand the range of questions that can be answered.

The approach starts from the observation that data by itself is necessary, but not enough, for many practical uses of an intranet or the Web.

What's also needed is knowledge about how to use the data to answer an ever…


Added by Adrian Walker on July 21, 2014 at 3:03pm — No Comments

Common Pitfalls that Can Doom a BYOD Program

From what you hear from all the latest technology news, Bring Your Own Device (BYOD) is the end-all, be-all of technology policies, guaranteed to make businesses run smoother, employees feel happier, and cash pile up like never before. BYOD can certainly be helpful, but it’s far from the solution to all of a business’s problems, mainly because there are many pitfalls companies can easily fall into. These mistakes are easy for anyone to make, which is why knowing them before implementing a…


Added by Rick Delgado on July 21, 2014 at 1:35pm — No Comments

Can A 50-Person Startup Threaten Oracle, IBM, And Microsoft?

Originally posted on Forbes.

Companies spend $35 billion on so-called SQL databases to store and retrieve their data. Goliaths Oracle, IBM and Microsoft all sell such databases. But a 50-employee, Cambridge, Mass.-based startup claims to be winning business from their enterprise customers. Should investors in those Goliaths be worried?…


Added by Mirko Krivanek on July 21, 2014 at 9:30am — No Comments

How big data analytics can help to avoid fx rate fixing & Benchmark rate rigging?

The foreign exchange market is the world's largest and least regulated financial market. Its estimated daily turnover is $5.35 trillion, according to the Bank for International Settlements’ triennial survey of 2013. Speculative trading dominates commercial transactions in the forex market, as the constant fluctuation of currency rates makes it an ideal venue for institutional players with deep pockets, such as large banks and hedge funds, to generate profits through speculative currency…


Added by ganeshbabu on July 20, 2014 at 8:03pm — No Comments

How many of us think Big Data is Big BS?

The digital world is continuously churning out vast amounts of data, and the volume is growing ever larger, ever more rapidly. Some analysts are saying that we are producing more than 200 exabytes of data each…


Added by Ali Syed on July 20, 2014 at 4:07pm — No Comments

New batch of machine learning resources and articles from niche bloggers

Starred articles were candidates for the picture of the week to be featured on July 24.



Added by Amy on July 20, 2014 at 4:00pm — No Comments

An Ethics Framework for Big Data

What do Target and the NSA have in common?

They both used Big Data and analytics in a way that inspired a major slap on the wrist. Target was predicting which of its customers were pregnant and sending them targeted coupons for baby products, which prompted the ire of a father whose teenage daughter hadn’t told him yet. The NSA incurred the wrath of world leaders, the US Congress, and even a good portion of the American public when the…


Added by Randal Scott King on July 20, 2014 at 3:53pm — No Comments

Participation in Data and the Effectiveness of Intervention

The role of statistics in data science is often debated. Despite rapid developments in technology giving access to algorithmically sophisticated approaches, I feel that statistics can still provide many worthwhile insights. If I have a database of sales figures spanning many years, I feel that I can become more aware of historic trends and seasonal patterns through the use of statistics. Statistics offers a sense of state, direction, pace, and progress. Statistics can also enable estimation…


Added by Don Philip Faithful on July 19, 2014 at 7:57am — No Comments

Great list of resources - NoSQL, Big Data, Machine Learning and more | GitHub

Here I only posted the two categories of biggest interest to DSC readers, but it covers plenty of other topics, including:

  • Distributed Programming
  • Graph Data Model
  • NewSQL Databases
  • Time-Series Databases
  • SQL-like processing
  • Data Ingestion
  • R-Studio - IDE for R.
  • Service Programming
  • Scheduling
  • Benchmarking
  • Security
  • Search engine and framework
  • Memcached forks and…

Added by Amy on July 18, 2014 at 9:30am — 1 Comment

Interesting Infographics on Big Data, by NJ Institute of Technology


Effective use of Big Data can produce the following results:

- 8% savings on national health care costs…


Added by Amy on July 18, 2014 at 8:30am — No Comments

This Just In: How Big Data is Changing Journalism

When people talk about journalists or reporters, the image of a man or woman eagerly talking to people with a notepad and pen in hand likely comes to mind. It’s a look that has permeated much of pop culture, but it should probably come as little surprise that the modern-day journalist is much different. While they may still interview people on the street or sit down to talk with them, today’s journalist is turning to numbers, spreadsheets, and computer programs more often…


Added by Rick Delgado on July 17, 2014 at 5:41am — No Comments

The fastest growing data science / big data profiles on Twitter

This article is co-authored with Dr. Livan Alonso Sarduy. The two charts below are based on the number of Twitter followers and their growth rate, after filtering out irrelevant or fake followers. Our next step is to identify top data science Twitter accounts that are themselves followed by many other top data science Twitter accounts: in our opinion,…


Added by Vincent Granville on July 16, 2014 at 6:30pm — 1 Comment

Weekly Digest - July 21

The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.

Featured Articles


Added by Vincent Granville on July 16, 2014 at 2:30pm — No Comments

© 2014   Data Science Central
