Here is selection containing both external and internal papers, focusing on various technical aspects of data science and big data. Feel free to add your favorites.

*Complex Open Text Analysis: Source: Avinash Kaushik*

**External Papers**

- Bigtable: A Distributed Storage System for Structured Data
- A Few Useful Things to Know about Machine Learning
- Random Forests
- A Relational Model of Data for Large Shared Data Banks
- Map-Reduce for Machine Learning on Multicore
- Pasting Small Votes for Classification in Large Databases and On-Line
- Recommendations Item-to-Item Collaborative Filtering
- Recursive Deep Models for Semantic Compositionality Over a Sentimen...
- Spanner: Google's Globally-Distributed Database
- Megastore: Providing Scalable, Highly Available Storage for Interac...
- F1: A Distributed SQL Database That Scales
- APACHE DRILL: Interactive Ad-Hoc Analysis at Scale
- A New Approach to Linear Filtering and Prediction Problems
- Top 10 algorithms on Data mining
- The PageRank Citation Ranking: Bringing Order to the Web
- MapReduce: Simplified Data Processing on Large Clusters
- The Google File System
- Amazon's Dynamo

**DSC Internal Papers**

- How to detect spurious correlations, and how to find the ...
- Automated Data Science: Confidence Intervals
- 16 analytic disciplines compared to data science
- From the trenches: 360-degree data science
- 10 types of regressions. Which one to use?
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- Jackknife logistic and linear regression for clustering and predict...
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict...
- Internet topology mapping
- 11 Features any database, SQL or NoSQL, should have
- 10 Features all Dashboards Should Have
- Clustering idea for very large datasets
- Hidden decision trees revisited
- Correlation and R-Squared for Big Data
- What Map Reduce can't do
- Excel for Big Data
- Fast clustering algorithms for massive datasets
- The curse of big data
- Interesting Data Science Application: Steganography

Tags:

© 2021 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions