Although this book was originally written for chemical engineers, it is just as relevant to data scientists and machine learning engineers. For more free books, visit this page.

**Contents**

**1. Visualizing Process Data**

1.1. Data visualization in context

1.2. References and readings

1.3. Time-series plots

1.4. Bar plots

1.5. Box plots

1.6. Relational graphs: scatter plots

1.7. Tables as a form of data visualization

1.8. Topics of aesthetics and style

1.9. General summary: revealing complex data graphically

1.10. Exercises

**2. Univariate Data Analysis**

2.1. Univariate data analysis in context

2.2. References and readings

2.3. What is variability?

2.4. Histograms and probability distributions

2.5. Binary (Bernoulli) distribution

2.6. Uniform distribution

2.7. The normal distribution and checking for normality

2.8. The t-distribution

2.9. Poisson distribution

2.10. Confidence intervals

2.11. Testing for differences and similarity

2.12. Paired tests

2.13. Other types of confidence intervals

2.14. Statistical tables for the normal- and t-distribution

2.15. Exercises

**3. Process Monitoring**

3.1. Process monitoring in context

3.2. References and readings

3.3. What are process monitoring charts?

3.4. Shewhart charts

3.5. CUSUM charts

3.6. EWMA charts

3.7. Other types of monitoring charts

3.8. Process capability

3.9. The industrial practice of process monitoring

3.10. Industrial case study

3.11. Summary

3.12. Exercises

**4. Least Squares Modelling Review**

4.1. Least squares modelling in context

4.2. References and readings

4.3. Covariance

4.4. Correlation

4.5. Some definitions

4.6. Least squares models with a single x-variable

4.7. Least squares model analysis

4.8. Investigating an existing linear model

4.9. Summary of steps to build and investigate a linear model

4.10. More than one variable: multiple linear regression (MLR)

4.11. Outliers: discrepancy, leverage, and influence of the observations

4.12. Enrichment topics

4.13. Exercises

**5. Design and Analysis of Experiments**

5.1. Design and analysis of experiments in context

5.2. Terminology

5.3. Usage examples

5.4. References and readings

5.5. Why learning about systems is important

5.6. Experiments with a single variable at two levels

5.7. Changing one single variable at a time (COST)

5.8. Full factorial designs

5.8.1. Using two levels for two or more factors

5.8.2. Analysis of a factorial design: main effects

5.8.3. Analysis of a factorial design: interaction effects

5.8.4. Analysis by least squares modelling

5.8.5. Example: design and analysis of a three-factor experiment

5.8.6. Assessing significance of main effects and interactions

5.8.7. Summary so far

5.8.8. Example: analysis of systems with 4 factors

5.9. Fractional factorial designs

5.9.1. Half fractions

5.9.2. Generators and defining relationships

5.9.3. Generating the complementary half-fraction

5.9.4. Generators: to determine confounding due to blocking

5.9.5. Highly fractionated designs

5.9.6. Design resolution

5.9.7. Saturated designs for screening

5.9.8. Design foldover

5.9.9. Projectivity

5.10. Blocking and confounding for disturbances

5.11. Response surface methods

5.12. Evolutionary operation

5.13. General approach for experimentation

5.14. Extended topics related to designed experiments

5.15. Exercises

**6. Latent Variable Modelling**

6.1. In context

6.2. References and readings

6.3. Extracting value from data

6.4. What is a latent variable?

6.5. Principal Component Analysis (PCA)

6.5.1. Visualizing multivariate data

6.5.2. Geometric explanation of PCA

6.5.3. Mathematical derivation for PCA

6.5.4. More about the direction vectors (loadings)

6.5.5. PCA example: Food texture analysis

6.5.6. Interpreting score plots

6.5.7. Interpreting loading plots

6.5.8. Interpreting loadings and scores together

6.5.9. Predicted values for each observation

6.5.10. Interpreting the residuals

6.5.11. PCA example: analysis of spectral data

6.5.12. Hotelling’s T²

6.5.13. Preprocessing the data before building a model

6.5.14. Algorithms to calculate (build) PCA models

6.5.15. Testing the PCA model

6.5.16. Determining the number of components to use in the model with cross-validation

6.5.17. Some properties of PCA models

6.5.18. Latent variable contribution plots

6.5.19. Using indicator variables in a latent variable model

6.5.20. Visualizing latent variable models with linking and brushing

6.5.21. PCA Exercises

6.6. Principal Component Regression (PCR)

6.7. Introduction to Projection to Latent Structures (PLS)

6.7.1. Advantages of the projection to latent structures (PLS) method

6.7.2. A conceptual explanation of PLS

6.7.3. A mathematical/statistical interpretation of PLS

6.7.4. A geometric interpretation of PLS

6.7.5. Interpreting the scores in PLS

6.7.6. Interpreting the loadings in PLS

6.7.7. How the PLS model is calculated

6.7.8. Variability explained with each component

6.7.9. Coefficient plots in PLS

6.7.10. Analysis of designed experiments using PLS models

6.7.11. PLS Exercises

6.8. Applications of Latent Variable Models

The book can be accessed online or downloaded as a PDF document, here.


© 2020 Data Science Central ®
