
# The "Island of Games" Data Puzzle

This data puzzle was originally posted on my blog, The Well-Tempered Spreadsheet.  In a Data Science Central article, Kirk Borne cited the puzzle as an example of a data relationship that eludes many data discovery tools.

The Island of Games

Life is fun on the Island of Games. The thousand inhabitants enjoy competing at chess, checkers, and contests to solve the Rubik’s Cube puzzle as fast as possible. The islanders are rated for their skill at each of the three games. The ratings fall between 0 and 1.

The ratings for each category seem to follow a uniform distribution. For example, here is a histogram for the Chess ratings:

There isn't much correlation between the skills for the three games:

Correlations between Skills

| | Chess | Checkers | Rubik's Cube |
|---|---|---|---|
| Chess | 1.0000 | | |
| Checkers | 0.0530 | 1.0000 | |
| Rubik's Cube | 0.0452 | -0.0049 | 1.0000 |

The next three charts confirm this lack of correlation. The first chart compares the ratings for Chess and Checkers, and includes a linear regression:

Here is Chess vs Rubik’s Cube:

And Checkers vs Rubik’s Cube:

The raw data and some statistical analysis can be found here. All of this looks like pure noise. But there is a hidden structure. Can you find it?

Comment by Christoph Fretter on August 1, 2016 at 4:13am

I figured that if I can't see two-point correlations there may still be three-point correlations.

The first thing I tried was plotting checkers+cube vs. chess, which turns out to be a good starting point.

Comment by Winthrop Smith on September 12, 2014 at 6:57am

Thank you again.

You made an excellent comment about the art of data analysis.  It highlights the importance of thorough visual exploration combined with deep subject matter understanding.

I wonder how many important relationships remain undetected because they are hard to visualize.

Comment by Gustavo Pereira on September 12, 2014 at 3:01am

I would probably have run some kPCA analysis to denoise the data.

Anyhow, I doubt that with higher dimensions it'd be easy to spot this kind of relationship. One of the main problems I see is that distances between points will be very similar, so discretization might become helpful as well.

That's the "art" part of data science! You cannot go in blindly with your favourite algorithm. You first need to understand where the data comes from and find ways to manipulate and visualize it that make sense, which is something no computer can yet do automatically.

Comment by Winthrop Smith on September 11, 2014 at 2:43pm

Thanks Gustavo!  Very interesting.

Let me ask you another question, if you don't mind.  Let's say that it is possible to visualize up to n dimensions (using color, marker size, animated time, etc.).  I could construct an n+1 dimensional checkerboard pattern that would be impossible to visualize.

When you solved my puzzle, it seems you used a combination of visualization and LSSVM.  If visualization were not helpful for a higher dimensional puzzle, do you think you still would have solved it?  Thanks!

Comment by Gustavo Pereira on September 11, 2014 at 9:53am

I looked at the data in 3D, and it looked "blocky". I then ran an LSSVM regression of Rubik ~ Chess, Checkers. The result was the following picture:

I then transformed my Rubik data into a logical (Rubik > 0.5).

I also transformed the Chess and Checkers into belonging to one or the other checkerboard squares.

The confusion matrix was a perfect match.

Another way I thought about it was to make a translation of all the data by -0.5. In that case you would have a perfect match of the sign of (Chess*Checkers) and the sign of Rubik.
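The sign check described above can be sketched in a few lines of Python. This is only an illustration: the ratings are synthetic, generated here from the checkerboard rule with a fixed seed, and are not the raw data from the puzzle. The helper `pearson` and all variable names are assumptions made for this sketch.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable
chess = [rng.random() for _ in range(1000)]
checkers = [rng.random() for _ in range(1000)]

# Synthetic Rubik ratings (assumption): drawn from the upper half when Chess
# and Checkers are in the same half of [0, 1], from the lower half otherwise.
rubik = [
    (0.5 if (c < 0.5) == (k < 0.5) else 0.0) + 0.5 * rng.random()
    for c, k in zip(chess, checkers)
]

# Translate everything by -0.5 and compare signs.
product = [(c - 0.5) * (k - 0.5) for c, k in zip(chess, checkers)]
sign_matches = all((p > 0) == (r - 0.5 > 0) for p, r in zip(product, rubik))

print("signs always agree:", sign_matches)
print("corr(Chess, Rubik):         ", round(pearson(chess, rubik), 4))
print("corr(Checkers, Rubik):      ", round(pearson(checkers, rubik), 4))
print("corr(Chess*Checkers, Rubik):", round(pearson(product, rubik), 4))
```

On data built this way, the two pairwise correlations stay close to zero while the centered product correlates clearly with Rubik, which is why the pairwise charts look like noise.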

Comment by Winthrop Smith on September 11, 2014 at 8:41am

Gustavo,

You are correct!

Most of the people who solved the puzzle did so by inspecting the data.  What did you do?

Comment by Gustavo Pereira on September 11, 2014 at 8:26am

It's a checkerboard!

If 0 < Chess < 0.5 and 0 < Checkers < 0.5, then Rubik > 0.5.

If 0.5 < Chess < 1 and 0.5 < Checkers < 1, then Rubik > 0.5.

Otherwise, Rubik < 0.5.
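For readers who want to experiment, the rule above can be written as a small Python function. This is a sketch under stated assumptions: the names `rubik_is_high` and `make_islander` are hypothetical, and the islanders are simulated from the rule rather than taken from the puzzle's raw data.

```python
import random

def rubik_is_high(chess: float, checkers: float) -> bool:
    """Checkerboard rule: Rubik > 0.5 exactly when Chess and Checkers
    fall in the same half of the [0, 1] range."""
    return (chess < 0.5) == (checkers < 0.5)

def make_islander(rng: random.Random) -> tuple[float, float, float]:
    """Simulate one islander (assumption: uniform ratings obeying the rule)."""
    chess = rng.random()
    checkers = rng.random()
    # Draw Rubik uniformly from the half that the rule dictates.
    low = 0.5 if rubik_is_high(chess, checkers) else 0.0
    rubik = low + 0.5 * rng.random()
    return chess, checkers, rubik

rng = random.Random(0)  # fixed seed so the sketch is repeatable
islanders = [make_islander(rng) for _ in range(1000)]

# Count how many simulated islanders the rule classifies correctly.
matches = sum(rubik_is_high(c, k) == (r > 0.5) for c, k, r in islanders)
print(matches, "of", len(islanders), "synthetic islanders follow the rule")
```

Running the same comparison against the real ratings would be the analogue of the confusion matrix mentioned in the thread.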