Mirko Krivanek's Blog (75)

100+ Interesting Data Sets for Data Science

Read full list if you find these examples interesting.…


Added by Mirko Krivanek on June 6, 2014 at 5:00pm — 1 Comment

List of NoSQL Databases

Interesting article posted on NoSQL-database.org, listing 150 databases. Here are some highlights.

Databases are categorized in the following categories:

  • Wide Column Store / Column Families
  • Document Store
  • Key Value / Tuple Store
  • Graph Databases 
  • Multimodel Databases
  • Object Databases
  • Grid & Cloud Database Solutions
  • XML…

Added by Mirko Krivanek on May 27, 2014 at 4:30pm — No Comments

Three interesting but little known programming languages

The descriptions below are from Wikipedia.

Julia Set



Added by Mirko Krivanek on May 27, 2014 at 4:00pm — 2 Comments

The 10 Algorithms That Dominate Our World

Interesting article published by…


Added by Mirko Krivanek on May 25, 2014 at 8:00am — No Comments

77 People Who Truly Have Written Interesting Things About Data

This list is a bit old (I think 2011), but it features a bunch of very interesting people, true data scientists who can't afford to waste their time posting on Twitter - unlike other similar lists published by journalists.

Featured DSC…


Added by Mirko Krivanek on May 23, 2014 at 2:00pm — No Comments

50 big data companies to follow

Posted on Sandhill.…


Added by Mirko Krivanek on May 20, 2014 at 4:30pm — 2 Comments

Proposal for a new type of scoring system

In digital analytics, scoring Internet traffic is used to detect click fraud, and to find types of search keywords that convert well (to a sale). Quite often (for large ad networks) conversion data is poor or challenging: some clicks have a 0.2% conversion rate, some have a 30% rate - depending on the type of website, price, product, conversion type and other factors (even the hour of the day has an impact).…


Added by Mirko Krivanek on May 18, 2014 at 6:30pm — No Comments

The Science News Cycle

Interesting cartoon, epitomizing innumeracy (or simulated innumeracy). Necessary in today's academia to survive and get grants.

Source: http://tapastic.com/episode/12010

Added by Mirko Krivanek on May 13, 2014 at 6:30pm — 1 Comment

How the gap between data science and statistics grew over time

Very interesting article published by the American Statistical Association. The picture below compares computer science with statistical science - before (I guess the early nineties) versus now. The column labeled CS3 (CS for Computer Science) represents modern computer science; in practice, this is data science. What's left in statistics is for the reader to guess, I suppose.…


Added by Mirko Krivanek on May 6, 2014 at 7:36am — 1 Comment

72 Infographics about big data

From BigData-Startups. The infographic below is just one of them.

Here's the list:

  1. How The USA Federal Government Thinks Big With Data
  2. Are You Ready For The Future of the Internet of Things?
  3. How Big Data Centers Impact the Environment
  4. A Look Into How Data Centers Actually Work
  5. How Big Data Gives Retailers a…

Added by Mirko Krivanek on May 2, 2014 at 9:00am — No Comments

45 surprising facts about big data

Posted by…


Added by Mirko Krivanek on May 2, 2014 at 7:00am — No Comments

How invisible data could provide the tobacco industry a second life

This indeed applies to all industries and all products. In short, how do you detect new uses of a declining product (cigars in this case) that could turn it into a good product and revive the industry in question? The tobacco industry has tried to sell outside the US (especially in Asia) to make up for declining sales in the US. They came up with e-cigarettes. But they never thought of a tobacco use in a context not associated with addiction.…


Added by Mirko Krivanek on April 29, 2014 at 9:30am — No Comments

Interesting new contests on Kaggle

This morning, I received the following in my mailbox, from Kaggle:…


Added by Mirko Krivanek on April 24, 2014 at 8:06am — No Comments

Data sets and other machine learning resources from UC Irvine

They maintain 284 data sets as a service to the machine learning community.…


Added by Mirko Krivanek on April 21, 2014 at 6:00pm — No Comments

Data Science for business hacking

You can call it business or data hacking, but the idea is to use analytic intelligence to reverse-engineer algorithms, and to transform, manipulate and modify data in external databases, without even accessing the databases in question, for your business advantage.

A few examples:

  • Query tag hijacking. You find an…

Added by Mirko Krivanek on April 5, 2014 at 1:00pm — 1 Comment

Nate Silver's famous run of successful predictions came to a halt

This is a classic. A guy who correctly predicted election results in all 50 states, and made many other correct predictions, now fails.

Nate Silver

First, Nate is well known not because of his previous correct predictions, but because he got hired by the Times magazine where he contributed as a…


Added by Mirko Krivanek on March 29, 2014 at 6:30am — 1 Comment

Is Data Scientist the right career path for you?

According to Paco Nathan, a data scientist should:

  • prepare an analysis and visualization of an unknown data set, while impatient stakeholders watch over your shoulder and ask pointed questions; be prepared to make quantitative arguments about the confidence of the results
  • describe “loss function” and “regularization term” each in 25 words…

Added by Mirko Krivanek on March 28, 2014 at 5:00pm — 11 Comments

Big Data Without Statisticians: BD2K Symposium At UCLA

Posted by Robert Weiss (biostatistician) on his UCLA webpage.

UCLA is having a big data conference on Thursday and Friday Mar 27, 28 2014.  The conference is organized by four computer science and genomic biology types. Speakers cluster [one of the rare appropriate uses of cluster analysis I know of] into three types of folks. Big biologists [they must be big, they're doing big data] doing big data, genomic stuff; computer scientists doing topic models, and a few math…


Added by Mirko Krivanek on March 27, 2014 at 8:20pm — No Comments

Cute but flawed API: What your name says about your politics

Published in the Wall Street Journal, designed by Clarity Campaigns, but not by someone statistically savvy.…


Added by Mirko Krivanek on March 26, 2014 at 7:30pm — No Comments

Interesting chart

Published in The Economist. It shows the difference in cost-of-living between 2003 and 2013. However, I see two issues:

  • Making index = 100 for New York both in 2003 and 2013 is wrong. The reader will think New York prices stayed flat over 10 years, and it makes all 2003-2013 comparisons for other cities meaningless, as the index might not have evolved the same way outside New York.
  • The choice of cities listed below is questionable. Why is Mexico City not…
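
To make the first point concrete, here is a minimal Python sketch. The price levels and the name "City A" are made up for illustration (they are not taken from The Economist's chart), and `rebase` is a hypothetical helper, not anything used by the publisher.

```python
# Hypothetical price levels in arbitrary currency units. NOT real data.
prices = {
    2003: {"New York": 100.0, "City A": 80.0},
    2013: {"New York": 150.0, "City A": 120.0},
}


def rebase(year_prices, base_city="New York"):
    """Express each city's price level as an index with base_city = 100."""
    base = year_prices[base_city]
    return {city: 100.0 * p / base for city, p in year_prices.items()}


index_2003 = rebase(prices[2003])
index_2013 = rebase(prices[2013])

# Actual price growth in City A over the decade, from the raw levels.
growth_city_a = prices[2013]["City A"] / prices[2003]["City A"] - 1

print(index_2003)      # New York is 100 by construction
print(index_2013)      # New York is 100 again by construction
print(growth_city_a)   # growth that the rebased index does not show
```

In this made-up case, prices rise by 50% in both cities, yet City A's index is the same in both years: rebasing to New York each year removes any information about absolute price changes and only shows movement relative to New York.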

Added by Mirko Krivanek on March 26, 2014 at 7:00pm — No Comments
