Here is a selection containing both external and internal papers, focusing on various technical aspects of data science and big data. Feel free to add your favorites.

*Complex Open Text Analysis: Source: …*

Added by Mirko Krivanek on August 9, 2015 at 8:30am — 1 Comment

Interesting article posted here. I've listed some of the most popular below. To find out about those not listed here (Redis, RavenDB, Riak, Perst, Voldemort, Terrastore, NeoDatis, MyOODB, OrientDB, InfoGrid, DB4objects), read the original article.…

Added by Mirko Krivanek on August 6, 2015 at 2:30pm — No Comments

This is an interesting listing created by Bernard Marr. I would add the following great sources:

- DataScienceCentral selection of big data sets - check out the first itemized bullet list after clicking on …

Added by Mirko Krivanek on August 4, 2015 at 2:30pm — 1 Comment

*Guest blog post.*

After reading many blog posts, articles and books, I have collected the ingredients of data science. I've classified them in the lists below, so that anyone who wants to build their own career road map can easily cook up the dish called data science. If you wonder why I am not giving the recipes, it is because I do not have any real-life experience. …

Added by Mirko Krivanek on August 2, 2015 at 8:30am — No Comments

These presentations have been viewed more than 25,000 times on average, though older articles obviously have more pageviews than newer ones (assuming identical popularity), and some articles get more than 50% of their traffic more than 3 months after being published. Indeed, it's a very interesting statistical problem to adjust for this natural time bias.…
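
One crude way to adjust for this time bias is to compare views per day online rather than raw totals. The sketch below assumes a constant daily view rate, which is only a first approximation given that some articles collect most of their traffic late. The function name `views_per_day` and all records and numbers are hypothetical, made up for illustration.

```python
from datetime import date

def views_per_day(total_views, published, as_of):
    """Crude age adjustment: average daily views since publication."""
    days_online = max((as_of - published).days, 1)  # avoid division by zero
    return total_views / days_online

# Hypothetical records: (title, total views, publication date)
presentations = [
    ("Older deck", 40000, date(2013, 8, 2)),
    ("Newer deck", 12000, date(2015, 5, 2)),
]

as_of = date(2015, 8, 2)

# Rank by age-adjusted rate instead of raw pageview totals
ranked = sorted(
    presentations,
    key=lambda p: views_per_day(p[1], p[2], as_of),
    reverse=True,
)
for title, views, published in ranked:
    print(title, round(views_per_day(views, published, as_of), 1))
```

With these made-up numbers the newer presentation ranks first despite its lower raw total, which is the effect the adjustment is meant to expose.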

Added by Mirko Krivanek on August 2, 2015 at 8:30am — 2 Comments

What are your thoughts on this? What would be your answers?

Here's my list of questions:

- What best practices do you recommend when starting and working on enterprise analytics projects?
- How do you see data science and the exploitation of big data evolving over the next 5-10 years?
- What are the bottlenecks and other issues that…

Added by Mirko Krivanek on August 2, 2015 at 7:30am — No Comments

This reference was first posted here on Galvanize by Dynelle Abeyta, and several authors contributed.

Here we provide a…

Added by Mirko Krivanek on July 19, 2015 at 7:30am — No Comments

A lot of interesting images can be found on Google. You can search for machine learning cartoons, fake data scientists, Excel maps or any keyword, and get a bunch of interesting images or charts, though the images barely change over time (Google algorithms are very conservative).

Anyway, here's some really interesting stuff. It definitely proves how popular infographics are, and the growth of big data. Many of these infographics are of high quality, well thought out and based on real…

Added by Mirko Krivanek on July 19, 2015 at 7:30am — No Comments

**For Python**:

- Seaborn - A visualization library based upon matplotlib. Although not interactive, the visualizations can be very nice.
- Bokeh - Bokeh provides a bit more interaction than Seaborn, but it is still not fully interactive.

**For R**: …

Added by Mirko Krivanek on June 21, 2015 at 2:30pm — 1 Comment

Originally posted in OCR. I had never heard of many of these programs (see top 10 below), and I have questions regarding the methodology used for the rankings:

- **Affordability (1/3):** the overall cost of the program and/or credit hour.
- **Flexibility (1/3):** the number of credit hours and/or the length of time it takes to obtain the degree
- **Requirements…**

Added by Mirko Krivanek on June 21, 2015 at 2:30pm — 3 Comments

Usually I tend to criticize this type of article, but in this case I pretty much agree with BurtchWorks, the author of this article, even though the article is more than 6 months old. Note that BurtchWorks is a recruiting firm that recently posted interesting salary surveys for data…

Added by Mirko Krivanek on June 21, 2015 at 2:30pm — No Comments

These companies gather and process gigantic amounts of data to serve their clients and/or users. They make money by selling summarized, processed, real-time data. They are poised to succeed in the IoT (Internet of Things) revolution, leveraging all sorts of devices and APIs to gather data, and

- send alerts to users via text messages or other technology
- sell intelligence extracted from data, to other businesses

It is worth spending some time figuring out…

Added by Mirko Krivanek on May 22, 2015 at 8:00pm — No Comments

This is an interesting article recently published in Forbes. The author gathered data from Glassdoor.com to rank companies. Glassdoor.com is a website where employees comment on and rate their company, and can even post their job title and salary range. Keep in mind that the author is not a statistician, and his analysis is…

Added by Mirko Krivanek on May 20, 2015 at 10:00am — 2 Comments

I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester and I’m planning to blog the class. Here’s what happened yesterday at the first meeting.…

Added by Mirko Krivanek on May 13, 2015 at 10:42am — No Comments

Very interesting list of algorithm, data science, machine learning, and computer science keywords. To check the definition of any keyword, go to xlinux.nist.gov. For whatever reason, as in many similar lists, the top three letters have more entries than subsequent letters, as if the editor suddenly became lazy when hitting letter D (maybe product developers create products that start with letter A, B, or C, to show up at the…

Added by Mirko Krivanek on April 23, 2015 at 5:00pm — No Comments

Broken down in eight categories.

1. Algorithms and Data Structures

- Big O Notation
- Sorting Algorithms
- Recursion

**Big Data** - Let’s assume you have a leak in a water pipe in your garden. You take a bucket and some sealing materials to fix the problem. After a while, you see that the leak is so much bigger that you need a plumber to bring bigger tools. In the meantime, you are still using the bucket to drain the water. After a while,…

Added by Mirko Krivanek on April 23, 2015 at 12:30pm — No Comments

This was a great question posted on Quora.com, and it attracted many comments. Here we summarize the most interesting contributions for you.

*Source for picture: …*

Added by Mirko Krivanek on April 19, 2015 at 6:30pm — 2 Comments

Many data set resources have been published on DSC, covering both big and little data, some of them associated with our data science apprenticeship. A list can be found here. Below is a repository published on GitHub, originally posted here. …

Added by Mirko Krivanek on April 19, 2015 at 1:30pm — 5 Comments

First, let's start with an article featuring many great Excel functions, entitled *11 Advanced Excel Tricks That Will Help You Get An Instant Raise At Work*. It describes the following Excel functions:

**Vlookup**: You can use the VLOOKUP function to search the first column of a range…
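
For readers more at home in Python than in Excel, the exact-match behavior of VLOOKUP can be sketched as follows. This is a minimal illustration, not Excel's implementation: the function name `vlookup`, the sample table, and the choice to return `None` where Excel would show `#N/A` are all assumptions made here. Unlike Excel, the text comparison below is case-sensitive.

```python
def vlookup(lookup_value, table, col_index):
    """Mimic an exact-match VLOOKUP (range_lookup set to FALSE).

    table is a list of rows; col_index is 1-based, as in Excel.
    Returns None when no row matches (Excel would show #N/A).
    """
    for row in table:
        if row[0] == lookup_value:      # search the first column only
            return row[col_index - 1]   # convert 1-based index to 0-based
    return None

# Hypothetical price list: product code, name, unit price
prices = [
    ("A100", "Widget", 2.50),
    ("B200", "Gadget", 7.25),
    ("C300", "Doohickey", 1.10),
]

print(vlookup("B200", prices, 3))  # unit price of product B200
print(vlookup("Z999", prices, 2))  # no such code in the first column
```

As in Excel, the first matching row wins, so duplicate keys in the first column are silently ignored after the first hit.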

Added by Mirko Krivanek on April 9, 2015 at 10:00pm — 6 Comments

More than a thousand keywords with detailed explanations, and hundreds of machine learning / data science books categorized by programming language used to illustrate the concepts.

**Here's a selection of keywords, from the mega-list**

10 keywords starting with A; this is indeed a small subset of all the keywords starting with…

Added by Mirko Krivanek on April 3, 2015 at 11:30am — No Comments

- 38 Seminal Articles Every Data Scientist Should Read
- 18 Open Source NoSQL Databases
- 20 Big Data Repositories You Should Check Out
- Ingredients Of Data Science
- 11 most popular data science presentations on Slideshare
- Eleven interesting questions about data science / big data
- 7 Python Tools All Data Scientists Should Know How to Use

© 2019 Data Science Central ®
