Added by Vincent Granville on July 30, 2013 at 4:00pm — No Comments
Are you a hacker with a huge heart? Code for America is looking for developers, data scientists, designers, researchers, and product managers for its 2014 Fellowship. We connect talented technologists with municipal governments to explore new ways of resolving local challenges and create new web apps.…Continue
Added by Vincent Granville on July 30, 2013 at 2:03pm — No Comments
Data management in the Hadoop ecosystem is still in the early stages of development. The goal…Continue
Added by Michael Walker on July 30, 2013 at 12:13pm — No Comments
A while back I was running a data mining project for a customer and made a conversational blunder. In one of the meetings, I mentioned seeing one interesting relationship in the data: customers who purchased one particular product tended to buy and implement a second product at a later time. I did not realize that everyone in the room INTUITIVELY knew that there is absolutely no relationship between the two products. A big blunder. After the meeting, two friends told me that my standing in…Continue
There have been various attempts to integrate the D3.js visualization framework into IPython Notebook, in order to provide more visualization options than are available with the standard Matplotlib. In my blog post today, I take one of the better integration attempts out there, port it from Windows to the Mac, and demonstrate:
2. Generating geo color maps in D3.js (not a built-in…Continue
Added by Michael Malak on July 29, 2013 at 4:23am — No Comments
The context here is increasing the conversion rate, from website visitor to active, converting user, or from passive newsletter subscriber to a lead (a user who opens the newsletter, clicks on the links, and converts). Here we will discuss the newsletter conversion problem, although it applies to many different settings.…Continue
What seemed to be an intractable problem involving trillions of quadrillions of computations - far more than required to process all the data produced or collected on Earth since the beginning of time - has been reduced to something computationally feasible and even possibly quite simple. One applicant…Continue
I was offered a Surface for Father's Day this year. I had an old iPad that I've used for several years, and I was curious to know if you can use the Surface just like a Windows laptop. While it has great features, faster Internet, and much more, the answer is clearly no.…
This is another example where, if you lack analytic skills, you will jump to the wrong conclusions. This news article was published in MyNorthWest. It's about the new law that went into effect a year ago in WA, allowing grocery stores to sell hard liquor. Here we provide 16 reasons that…Continue
Added by Vincent Granville on July 17, 2013 at 3:30pm — No Comments
Big data and data science are not just for good guys. If properly leveraged, they also provide competitive advantages for criminals, over their competitors, or to avoid detection.…Continue
By Nicholas Hartman, Director
Recent revelations regarding the National Security Agency's (NSA) extensive data interception and monitoring practices (aka PRISM) have brought a branch of "Big Data's" research into the broader public light. The basic premise of such work is that computer algorithms can study…Continue
Added by Nicholas Hartman on July 15, 2013 at 8:51am — No Comments
Added by Vincent Granville on July 11, 2013 at 6:30pm — No Comments
Debugging Hadoop jobs can be a huge pain. The cycle time is slow, and error messages are often uninformative, especially if you're using Hadoop streaming, or working on EMR.
I once found myself trying to debug a job that took a full six hours to fail. It took more than a week -- a whole week! -- to find and fix the problem. Of course, I was doing other things at the same time, but the need to constantly check up on the status of the job was a huge drain on my energy and…
Machine Learning: A Probabilistic Perspective, by Kevin Murphy.
Boosting: Foundations and Algorithms, by Robert E. Schapire.
Models Behaving Badly: Why Confusing Illusion with Reality Can Lead to Disaster, by Emanuel Derman.
Doing Data Science, by Cathy O'Neil and Rachel…
Recently, Bernard Wehbe at StatSlice Systems wrote an intriguing and thought-provoking whitepaper on information singularity and the principles of the analytics rock star.
Successful analytics professionals should follow a set of guiding principles which are very important and often missed by traditional…
Added by Jared Decker on July 9, 2013 at 8:23am — No Comments
I am not an expert in database design, since most of my career I have worked with alternate data storage / data access solutions. But one of the very first projects I had to do back in 1985 when I was a student was to write the code for a fully functional database architecture, in Pascal, from scratch. You will probably find some of my questions naive, and some intriguing.…Continue
Big data has the potential to alter the calculus by which data management groups buy, manage, and structure information storage.
Article originally published on TDWI, by Stephen…Continue
Added by Vincent Granville on July 8, 2013 at 1:00pm — No Comments