38 Seminal Articles Every Data Scientist Should Read

Here is a selection of both external and internal papers covering various technical aspects of data science and big data. Feel free to add your favorites.

Complex Open Text Analysis (source: Avinash Kaushik)

External Papers

  1. Bigtable: A Distributed Storage System for Structured Data
  2. A Few Useful Things to Know about Machine Learning
  3. Random Forests
  4. A Relational Model of Data for Large Shared Data Banks
  5. Map-Reduce for Machine Learning on Multicore
  6. Pasting Small Votes for Classification in Large Databases and On-Line
  7. Amazon.com Recommendations: Item-to-Item Collaborative Filtering
  8. Recursive Deep Models for Semantic Compositionality Over a Sentimen...
  9. Spanner: Google's Globally-Distributed Database
  10. Megastore: Providing Scalable, Highly Available Storage for Interac...
  11. F1: A Distributed SQL Database That Scales
  12. APACHE DRILL: Interactive Ad-Hoc Analysis at Scale
  13. A New Approach to Linear Filtering and Prediction Problems
  14. Top 10 Algorithms in Data Mining
  15. The PageRank Citation Ranking: Bringing Order to the Web
  16. MapReduce: Simplified Data Processing on Large Clusters
  17. The Google File System
  18. Amazon's Dynamo

DSC Internal Papers

  1. How to detect spurious correlations, and how to find the ...
  2. Automated Data Science: Confidence Intervals
  3. 16 analytic disciplines compared to data science
  4. From the trenches: 360-degree data science
  5. 10 types of regressions. Which one to use?
  6. Practical illustration of Map-Reduce (Hadoop-style), on real data
  7. Jackknife logistic and linear regression for clustering and predict...
  8. A synthetic variance designed for Hadoop and big data
  9. Fast Combinatorial Feature Selection with New Definition of Predict...
  10. Internet topology mapping
  11. 11 Features any database, SQL or NoSQL, should have
  12. 10 Features all Dashboards Should Have
  13. Clustering idea for very large datasets
  14. Hidden decision trees revisited
  15. Correlation and R-Squared for Big Data
  16. What Map Reduce can't do
  17. Excel for Big Data
  18. Fast clustering algorithms for massive datasets
  19. The curse of big data
  20. Interesting Data Science Application: Steganography

Comment by Dr. Vijay Srinivas Agneeswaran on November 19, 2018 at 11:47pm

Many good papers seem to be missing. Here is a list I compiled recently:

 Vijay's List of Seminal ML Papers
