Here is a recently published set of easy interview questions, covering:

- Statistics
- Programming (General, Big Data, Python, R, SQL)
- Modeling
- Behavioral
- Culture Fit
- Problem-Solving

**Statistics**

- Prove that a random variable with a distribution on [0, 1] (that is, the density function is equal to 0 outside [0, 1]) always has an expectation between 0 and 1. Prove that its variance is at most 1/4, and that the uniform distribution on [0, 1] has variance 1/12 (so the uniform distribution does not maximize the variance).
- What is the Central Limit Theorem and why is it important?
- What is sampling? How many sampling methods do you know?
- What is the difference between a Type I and a Type II error?
- What is linear regression? What do the terms p-value, coefficient, and R-squared mean? What is the significance of each of these components?
- What are the assumptions required for linear regression? - There are four major assumptions: 1. There is a linear relationship between the dependent variable and the regressors, meaning the functional form of the model matches the data. 2. The errors (residuals) are normally distributed and independent of each other. 3. There is minimal multicollinearity between the explanatory variables. 4. Homoscedasticity: the variance of the errors around the regression line is the same for all values of the predictor variables.
- What is a statistical interaction?
- What is selection bias?
- What is an example of a dataset with a non-Gaussian distribution?
- What is the Binomial Probability Formula?
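
For the last question in this list, the binomial probability formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be sketched in a few lines of Python. The function name `binomial_pmf` and its argument names are illustrative choices, not something the original post defines.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p: C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible outcomes have probability zero
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))  # 6 * 0.25 * 0.25 = 0.375
```

Summing `binomial_pmf(k, n, p)` over k = 0, ..., n should give 1, which is a quick sanity check on any implementation.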

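For the first question in the statistics list, here is a short derivation sketch. It assumes X has a density f that vanishes outside [0, 1], and it writes μ for E[X]; both notations are introduced here for illustration. It gives 1/4 as the upper bound on the variance, while the uniform distribution has variance 1/12, which lies below that bound.

```latex
\begin{align*}
0 \le \mathbb{E}[X] &= \int_0^1 x\, f(x)\, dx \le \int_0^1 f(x)\, dx = 1, \\
\mathbb{E}[X^2] &= \int_0^1 x^2 f(x)\, dx \le \int_0^1 x\, f(x)\, dx = \mu
  \quad \text{since } x^2 \le x \text{ on } [0, 1], \\
\operatorname{Var}(X) &= \mathbb{E}[X^2] - \mu^2 \le \mu (1 - \mu) \le \tfrac{1}{4}, \\
\text{uniform case:}\quad \operatorname{Var}(X) &= \int_0^1 x^2\, dx - \left(\tfrac{1}{2}\right)^2
  = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}.
\end{align*}
```

With a density, the bound 1/4 is approached (by concentrating mass near 0 and near 1) but not attained; it is attained only by the discrete distribution that puts probability 1/2 on each endpoint.
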
**R programming**

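- What are the different types of sorting algorithms available in the R language? - The built-in sort() function offers the radix, shell, and quick methods through its method argument. Insertion, bubble, and selection sorts are classic algorithms, but you would have to implement them yourself.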
- What are the different data objects in R?
- What packages are you most familiar with? What do you like or dislike about them?
- How do you access the element in the 2nd column and 4th row of a matrix named M?
- What is the command used to store R objects in a file?
- What is the best way to use Hadoop and R together for analysis?
- How do you split a continuous variable into different groups/ranks in R?
- Write a function in R language to replace the missing value in a vector with the mean of that vector.

**SQL**

- What is the purpose of the group functions in SQL? Give some examples of group functions. - Group (aggregate) functions compute summary statistics over sets of rows. COUNT, MAX, MIN, AVG, and SUM are all group functions. DISTINCT is not a group function: it is a keyword that removes duplicates, and it can be used inside one, as in COUNT(DISTINCT col).
- Tell me the difference between an inner join, left join/right join, and union.
- What does UNION do? What is the difference between UNION and UNION ALL?
- What is the difference between SQL and MySQL or SQL Server?
- If a table contains duplicate rows, does a query result display the duplicate values by default? How can you eliminate duplicate rows from a query result?
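
The questions about duplicates and about UNION versus UNION ALL can be explored with a small experiment. The sketch below uses Python's built-in `sqlite3` module and an in-memory database; the tables `a` and `b`, the `city` column, and the helper `cities` are made up for illustration, and other database engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE a (city TEXT);
    CREATE TABLE b (city TEXT);
    INSERT INTO a VALUES ('Paris'), ('Paris'), ('Rome');
    INSERT INTO b VALUES ('Rome'), ('Oslo');
    """
)


def cities(sql: str) -> list:
    """Run a single-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


plain = cities("SELECT city FROM a")              # duplicates kept by default
distinct = cities("SELECT DISTINCT city FROM a")  # duplicates removed
union = cities("SELECT city FROM a UNION SELECT city FROM b")
union_all = cities("SELECT city FROM a UNION ALL SELECT city FROM b")

print(plain)      # ['Paris', 'Paris', 'Rome']
print(distinct)   # ['Paris', 'Rome']
print(union)      # ['Oslo', 'Paris', 'Rome']
print(union_all)  # ['Oslo', 'Paris', 'Paris', 'Rome', 'Rome']
```

A plain SELECT returns duplicate rows, DISTINCT removes them, UNION deduplicates the combined result, and UNION ALL keeps every row from both queries.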

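As a rough illustration of the group (aggregate) functions question in the SQL list, the sketch below runs COUNT, SUM, AVG, MIN, and MAX with GROUP BY through Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ann', 10.0), ('ann', 30.0), ('bob', 5.0);
    """
)

# One summary row per customer; each aggregate collapses that customer's rows.
summary = conn.execute(
    """
    SELECT customer,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS mean,
           MIN(amount) AS smallest,
           MAX(amount) AS largest
    FROM orders
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()

for row in summary:
    print(row)
# ('ann', 2, 40.0, 20.0, 10.0, 30.0)
# ('bob', 1, 5.0, 5.0, 5.0, 5.0)
```

Without the GROUP BY clause, the same functions would return a single row summarizing the whole table.
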
The full list is available here. Many lists of interview questions have been published over the last few years. Here is a selection, including some from me:

- 50 Questions to Test True Data Science Knowledge
- 66 job interview questions for data scientists
- 20 Job Interview Questions for IoT Professionals
- Answers to dozens of data science job interview questions
- 46 SQL Job Interview Questions for Data Scientists
- R Programming: 35 Job Interview Questions and Answers
- Top Hadoop Big Data Interview Questions and Answers
- 70 MongoDB Interview Questions and Answers
- 100 Data Science Interview Questions and Answers
- 40 Interview Questions asked at Startups in Machine Learning
- 19 Worst Mistakes at Data Science Job Interviews

© 2020 TechTarget, Inc.
