All Blog Posts (1,001)

Data sets and other machine learning resources from UC Irvine

UC Irvine maintains 284 data sets as a service to the machine learning community.…


Added by Mirko Krivanek on April 21, 2014 at 6:00pm — No Comments

Nine new, great articles and resources posted externally

Here we go. Enjoy the reading!

MapReduce NextGen Architecture

Illustration of YARN (from first article below)

Articles from external publishers and bloggers:


Added by Vincent Granville on April 21, 2014 at 12:08pm — No Comments

Is survival analysis the right model for you?


The analytics industry is heavily biased towards statistical techniques and data handling software. But this does not mean that you cannot be an analytics champion if you come from a non-statistical background. What differentiates a champion analyst from others is not statistical knowledge but the ability to apply the right statistical tool to the right business problem. Survival…


Added by Tavish Srivastava on April 21, 2014 at 9:04am — No Comments

Manage performance of enterprise applications

Managing the performance of enterprise applications, and achieving high performance with minimum resources, is a topic of discussion in today’s large enterprises. Database administrators (DBAs) must resolve performance issues when they occur; however, it is best to address problems proactively. Proactive management requires a very high level of attention, and help in making sense of the overwhelming data provided by the database engine.

In database management being…


Added by Muhammad Saeed on April 21, 2014 at 3:43am — No Comments

Alternative method to perform complex SQL join query

Based on its generic data type, esProc provides the sequence and the TSeq, which implement complete set-oriented computation and much more convenient relational queries.

The relation between the department and the employee is one-to-many and that between the employee and the SSN (Social Security…


Added by Jim King on April 20, 2014 at 5:12pm — No Comments

Data Embodiment – an Ecosystemic Approach

Embodiment is comparable to the idea of an “ecosystemic” or “holistic” approach. In an ecosystem, each thing affects everything else. In light of the interrelationship, a person would not attempt to correct a problem by considering only a single piece of the puzzle. Instead, there is a need to bring together many aspects of the body. To understand embodiment, it is necessary to recognize how “the body” separates an organism from its environment; in a manner of speaking, the body represents…


Added by Don Philip Faithful on April 19, 2014 at 7:30am — No Comments

Business Intelligence and Data Warehousing

In the modern era, the business environment is changing rapidly. Organisations are seeking valuable business information as an essential asset that will not only lead them towards success but also help sustain the business in a competitive environment. Business Intelligence (BI) is both a model that relates managerial values and a tool that an organisation uses to handle and filter information in order to make sound business decisions. It refers to the appropriate…


Added by Avesh Dhakal on April 18, 2014 at 9:30pm — No Comments

The data science alphabet

Feel free to add your keywords. Here's a start:

The alphabet:

  • Algorithm (also: API, accountability)
  • Big…

Added by Vincent Granville on April 17, 2014 at 6:30pm — 1 Comment

Employee Churn 201: Calculating Employee Value

Guest blog post by Pasha Roberts, Chief Scientist, Talent Analytics @pasharoberts

Much has been written about customer churn - predicting who, when, and why customers will stop buying, and how (or…


Added by Vincent Granville on April 17, 2014 at 4:00pm — No Comments

Weekly digest - April 21

Sponsored Announcements

  • Predictive Analytics World, June 16-19 in Chicago is the business event for predictive analytics professionals, managers and commercial practitioners, covering today's commercial deployment of predictive analytics, across industries and across software vendors.…

Added by Vincent Granville on April 17, 2014 at 3:30pm — No Comments

esProc Helps Database realize Real-time Big Data computing

A real-time big data application returns computation and analysis results in real time even when the amount of data is huge. This has been an emerging demand on database applications in recent years.


In the past, because there was not much data, computation was simple, and there was little parallelism, the pressure on the database was not great. A high-end or middle-range database server or cluster can allocate…


Added by Jim King on April 16, 2014 at 9:30pm — No Comments

Can you identify this decoy data scientist?

Using data science, could you identify this profile? Explain the methodology that you used, and win $200. The first participant providing the correct answer will win the award. The solution will be posted here once we have a winner. This profile was created as a test to check whether data science algorithms can successfully solve this type of problem.…


Added by Vincent Granville on April 16, 2014 at 9:00pm — 2 Comments

How to identify the right data scientist for your company

Should you hire someone who knows all the most recent flavors of logistic regression? Or a Hadoop developer?

In my opinion, this is the wrong strategy. These employees are very expensive (at least $120k per year), and they might not bring the ROI that you expect. At least, if going in that direction, hire someone favoring simple, scalable, robust, automated solutions over anything else. To automate, you need someone great at developing…


Added by Vincent Granville on April 16, 2014 at 4:30pm — 1 Comment

Read Chapter 1 of the Big Data, Mining, and Analytics: Components of Strategic Decision Making* book

Do you want to better understand Big Data and what it really means to businesses? It’s not just huge volumes and high velocity…there’s another important factor that provides an essential element to decision support, and that’s variety of data. Regardless of the data source, value creation lies in the application of the right analytic approach to a given strategic endeavor.

Read the Big Data, Mining, and Analytics: Components of Strategic Decisions* book in…


Added by Alesia on April 16, 2014 at 5:33am — 1 Comment

Revolution R Enterprise on AWS Marketplace

Big Data R, In the Cloud

Now you can get all of the power, performance and productivity of Revolution R Enterprise on Amazon Web Services. Revolution R Enterprise 7 on AWS Marketplace includes:

  • High-performance …

Added by Gregory Todd on April 15, 2014 at 10:56am — No Comments

esProc assists Hadoop to replace IOE

What is IOE? I=IBM, O=Oracle, and E=EMC. They represent the typical high-end database and data warehouse architecture. The high-end servers include HP, IBM, and Fujitsu; the high-end database software includes Teradata, Oracle, and Greenplum; the high-end storage includes EMC, Violin, and Fusion-io.

In the past, this typical high-performance database architecture was the preference of large and mid-sized organizations. They can run stably with superior performance, and became…


Added by Jim King on April 14, 2014 at 5:42pm — No Comments

Tricky Base SAS interview questions : Part-II

For those of us working in the analytics industry, SAS has become an inevitable part of our lives. This article is the second part of the series we have published on SAS interview questions. These articles will help you optimize your SAS routines/algorithms, make your code efficient and follow best practices for coding in SAS. We would also like to hear your solutions to the problem statements in the article. Following is a description of the two parts of this series:

1. Part I :…


Added by Tavish Srivastava on April 13, 2014 at 2:45am — No Comments

Top 10 List – The V's of Big Data

Recently I wrote about the "Top 10 Big Data Challenges – A Serious Look at 10 Big Data V’s", which summarizes some of the big issues associated with the deployment of big data projects. The use of the letter V may seem forced and contrived, but it is used primarily as a mnemonic device to label and recall these critical challenges, in much the same way the…


Added by Kirk Borne on April 12, 2014 at 8:30am — 9 Comments

4000 copies of data science book pre-ordered

I've heard from Wiley that our data science book already had 4,133 pre-ordered copies, which is (according to Wiley) a great start. It was published last Monday.

I invite you to check the final table of contents or check out the book on…


Added by Vincent Granville on April 12, 2014 at 8:00am — 5 Comments

MongoDB performance optimization strategies

MongoDB performance tuning and scalability


This post is a live blog of the Enteros, Inc. presentation at the NoSQL Now 2013 conference.

Start by tuning the operating system: check ulimits and follow the production notes in the MongoDB manual. On Linux, try iostat, vmstat, mpstat, sar, and free -tm. You can also try open…


Added by Muhammad Saeed on April 11, 2014 at 8:35am — No Comments

© 2014   Data Science Central
