Mega collection of data science books and terminology

More than a thousand keywords with detailed explanations, and hundreds of machine learning / data science books categorized by programming language used to illustrate the concepts.

Here's a selection of keywords from the mega-list.

Below are 10 keywords starting with A, a small subset of all the keywords beginning with that letter.

  • A/B Testing - In marketing, A/B testing is a simple randomized experiment with two variants, A and B, which are the control and treatment in the controlled experiment. It is a form of statistical hypothesis testing. Other names include randomized controlled experiments, online controlled experiments, and split testing. In online settings, such as web design (especially user experience design), the goal is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement).
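To make the hypothesis-testing part concrete, the comparison of two variants can be sketched as a two-proportion z-test in plain Python (a minimal illustration; the `ab_test` helper and the click counts are invented for this example):

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for an A/B experiment.
    Returns the z statistic and a two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical banner ad: variant B gets 120 clicks per 1000 impressions vs 90/1000 for A
z, p = ab_test(90, 1000, 120, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the p-value comes out below 0.05, so the lift of variant B would be judged statistically significant at the conventional level.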
  • Adaptive Boosting (AdaBoost) - AdaBoost, short for “Adaptive Boosting”, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire who won the prestigious “Gödel Prize” in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner. While every learning algorithm will tend to suit some problem types better than others, and will typically have many different parameters and configurations to be adjusted before achieving optimal performance on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative ‘hardness’ of each training sample is fed into the tree growing algorithm such that later trees tend to focus on harder to classify examples.
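The reweighting loop described above can be sketched in pure Python using one-dimensional threshold "stumps" as the weak learners (a minimal illustration only; the function names and the toy interval dataset are invented for this sketch, and a real application would use a library implementation):

```python
import math

def train_stump(X, y, w):
    """Find the best threshold stump on 1-D data under sample weights w."""
    best = None
    for thresh in sorted(set(X)):
        for polarity in (1, -1):
            pred = [polarity if x >= thresh else -polarity for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, thresh, polarity)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1 / n] * n                       # uniform initial sample weights
    ensemble = []                         # (alpha, thresh, polarity) per round
    for _ in range(rounds):
        err, thresh, pol = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this weak learner
        ensemble.append((alpha, thresh, pol))
        # re-weight: boost misclassified samples, shrink correctly classified ones
        preds = [pol if x >= thresh else -pol for x in X]
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data: label +1 inside [2, 4], -1 outside (no single stump separates it)
X = [0, 1, 2, 3, 4, 5, 6]
y = [-1, -1, 1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=20)
print([predict(model, x) for x in X])   # → [-1, -1, 1, 1, 1, -1, -1]
```

Each stump alone has error above zero here, but the weighted vote of stumps carves out the interval, illustrating how weak learners that are only slightly better than random combine into a strong classifier.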
  • Algorithmic Complexity (AC) - The information content or complexity of an object can be measured by the length of its shortest description. For instance, the string “01010101010101010101010101010101” has the short description “16 repetitions of 01”, while “11001000011000011101111011101100” presumably has no simpler description than writing down the string itself. More formally, the Algorithmic (Kolmogorov) Complexity (AC) of a string x is defined as the length of the shortest program that computes or outputs x, where the program is run on some fixed reference universal computer.
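True Kolmogorov complexity is uncomputable, but compressed length is a common, crude proxy for "length of the shortest description". The sketch below (an illustration, not part of the original glossary entry) compares a highly regular string with a pseudo-random one, using longer strings than the 32-character examples so the gap is unmistakable:

```python
import random
import zlib

def compressed_len(s: str) -> int:
    """Length of the zlib-compressed bytes: a computable, very rough
    stand-in for the (uncomputable) shortest-description length."""
    return len(zlib.compress(s.encode(), 9))

random.seed(0)
regular = "01" * 500                                           # highly regular
irregular = "".join(random.choice("01") for _ in range(1000))  # no obvious pattern
print(compressed_len(regular), compressed_len(irregular))
```

The regular string compresses to a handful of bytes ("500 repetitions of 01"), while the pseudo-random string stays close to one bit per character plus overhead, mirroring the intuition in the definition.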
  • Agglomerative Hierarchical Clustering (AHC) - Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering or HAC. Top-down clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached.
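The bottom-up merge loop can be shown in a few lines of Python on one-dimensional points (a minimal single-linkage sketch invented for this post; real implementations use efficient linkage algorithms and dendrograms):

```python
def single_linkage(points, k):
    """Agglomerative clustering: start from singleton clusters and repeatedly
    merge the two closest clusters (single-linkage distance) until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        # find the pair of clusters with the smallest inter-point distance
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)    # agglomerate the closest pair
    return clusters

print(single_linkage([1.0, 1.2, 1.1, 9.0, 9.3], 2))  # → [[1.0, 1.1, 1.2], [9.0, 9.3]]
```

Stopping at k = 1 instead of k = 2 would reproduce the full bottom-up hierarchy described in the definition, with every merge recorded along the way.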
  • Analysis of Covariance (ANCOVA) - Covariance is a measure of how much two variables change together and how strong the relationship is between them. Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV). Therefore, when performing ANCOVA, the DV means are adjusted to what they would be if all groups were equal on the CV.
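The phrase "DV means adjusted to what they would be if all groups were equal on the CV" can be made concrete with a small sketch (the `ancova_adjusted_means` helper and the synthetic data are invented for this illustration; a full ANCOVA would also produce an F-test, e.g. via a statistics package):

```python
from statistics import mean

def ancova_adjusted_means(groups, covariate, dv):
    """Covariate-adjusted group means, the core idea behind ANCOVA:
    estimate a pooled within-group slope of the DV on the covariate,
    then shift each group's DV mean to the grand covariate mean."""
    labels = sorted(set(groups))
    grand_x = mean(covariate)
    num = den = 0.0
    for g in labels:
        xs = [x for gr, x in zip(groups, covariate) if gr == g]
        ys = [y for gr, y in zip(groups, dv) if gr == g]
        mx, my = mean(xs), mean(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    b = num / den                                   # pooled within-group slope
    adjusted = {}
    for g in labels:
        xs = [x for gr, x in zip(groups, covariate) if gr == g]
        ys = [y for gr, y in zip(groups, dv) if gr == g]
        adjusted[g] = mean(ys) - b * (mean(xs) - grand_x)
    return adjusted

# Group B starts with higher covariate values; DV = 2*covariate, plus 1 for group B
groups = ["A"] * 4 + ["B"] * 4
covariate = [1, 2, 3, 4, 3, 4, 5, 6]
dv = [2, 4, 6, 8, 7, 9, 11, 13]
print(ancova_adjusted_means(groups, covariate, dv))  # → {'A': 7.0, 'B': 8.0}
```

In this synthetic data the raw DV means differ by 5, but after adjusting both groups to the grand covariate mean (3.5) the difference shrinks to 1, which is exactly the group effect built into the data: the covariate's head start has been controlled for.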
  • Anomaly Detection - In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or finding errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.
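One of the simplest concrete detectors for "observations which do not conform to an expected pattern" is a z-score rule (a toy sketch with invented sensor readings; production systems use more robust methods, since a gross outlier inflates the mean and standard deviation themselves):

```python
from statistics import mean, stdev

def zscore_outliers(data, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(data), stdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]

# One suspicious spike among otherwise stable readings
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 42.0]
print(zscore_outliers(readings, threshold=2.0))  # → [42.0]
```

Note the lowered threshold of 2.0 here: the single large outlier drags the mean up and widens the standard deviation, which is precisely why robust alternatives (median absolute deviation, isolation forests, etc.) are preferred in practice.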
  • Ant Colony Optimization (ACO) - In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. This algorithm is a member of the ant colony algorithms family, within swarm intelligence methods, and it constitutes a metaheuristic optimization. Initially proposed by Marco Dorigo in 1992 in his PhD thesis, the first algorithm aimed to search for an optimal path in a graph, based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several variants have emerged, drawing on various aspects of the behavior of ants.
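The colony-and-food-source idea can be sketched on a toy graph (a deliberately minimal illustration; the graph, parameter values, and function name are invented here, and real ACO variants add heuristics such as elitist deposits and min-max pheromone bounds):

```python
import random

def aco_shortest_path(graph, start, goal, n_ants=20, n_iter=50,
                      evaporation=0.5, q=1.0, seed=0):
    """Minimal ant colony optimization sketch: ants walk from start to goal,
    choosing edges with probability proportional to pheromone * (1/length);
    shorter tours then deposit more pheromone."""
    rng = random.Random(seed)
    pher = {(u, v): 1.0 for u in graph for v in graph[u]}
    best_path, best_len = None, float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            node, path, visited = start, [start], {start}
            while node != goal:
                choices = [v for v in graph[node] if v not in visited]
                if not choices:            # dead end: abandon this ant
                    path = None
                    break
                weights = [pher[(node, v)] / graph[node][v] for v in choices]
                node = rng.choices(choices, weights=weights)[0]
                path.append(node)
                visited.add(node)
            if path:
                length = sum(graph[a][b] for a, b in zip(path, path[1:]))
                tours.append((path, length))
                if length < best_len:
                    best_path, best_len = path, length
        # evaporate, then deposit pheromone inversely proportional to tour length
        for e in pher:
            pher[e] *= (1 - evaporation)
        for path, length in tours:
            for e in zip(path, path[1:]):
                pher[e] += q / length
    return best_path, best_len

# Toy graph: the shortest A→D path is A→B→D with total length 3
graph = {"A": {"B": 1, "C": 4}, "B": {"D": 2, "C": 1},
         "C": {"D": 3}, "D": {}}
print(aco_shortest_path(graph, "A", "D"))  # → (['A', 'B', 'D'], 3)
```

The evaporation step is what keeps the colony from locking onto an early mediocre path: old pheromone fades, so trails must keep being reinforced by good tours to stay attractive.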
  • Apache Hive - The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides:
      • tools to enable easy data extract/transform/load (ETL)
      • a mechanism to impose structure on a variety of data formats
      • access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM)
      • query execution via MapReduce
    Hive defines a simple SQL-like query language, called HiveQL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. HiveQL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).
  • AutoCorrelation Function (ACF) - The auto-correlation function measures the correlation of a signal x(t) with itself shifted by some time delay tau. It can be used to detect repeats or periodicity in a signal, for example to assess the effect of fluctuations (noise) on a periodic signal.
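Computing the sample ACF takes only a few lines of Python (a minimal sketch on an invented period-4 signal; libraries additionally provide confidence bands and fast FFT-based estimators):

```python
from statistics import mean

def acf(x, max_lag):
    """Sample autocorrelation of series x at lags 0..max_lag."""
    n, mu = len(x), mean(x)
    c0 = sum((v - mu) ** 2 for v in x)          # lag-0 sum of squares
    return [sum((x[t] - mu) * (x[t + k] - mu) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# A signal with period 4: the ACF should peak again near lag 4
signal = [0, 1, 0, -1] * 8
print([round(r, 2) for r in acf(signal, 4)])  # → [1.0, 0.0, -0.94, 0.0, 0.88]
```

The large positive value at lag 4 (and the strong negative value at lag 2, half a period) is exactly the repeat-detection behavior the definition describes.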
  • Autoregressive Integrated Moving Average (ARIMA) - In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). They are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied to remove the non-stationarity. The model is generally referred to as an ARIMA(p,d,q) model, where the parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively. ARIMA models form an important part of the Box-Jenkins approach to time-series modelling. When one of the three terms is zero, it is usual to drop “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1).
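The "integrated" differencing step (the d in ARIMA(p,d,q)) is easy to show on its own (a minimal sketch with an invented linear-trend series; fitting the AR and MA parts would be done with a time-series library):

```python
def difference(series, d=1):
    """Apply the 'I' step of ARIMA: difference the series d times
    to remove trend-style non-stationarity."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trend = [2 * t + 5 for t in range(8)]   # linear trend: clearly non-stationary
print(difference(trend, d=1))           # → [2, 2, 2, 2, 2, 2, 2]
```

A single differencing pass turns the linear trend into a constant (stationary) series, which is why ARIMA(0,1,0) reduces to the I(1) model mentioned above.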

Here are all the eBooks starting with the letter A.

  • A Beginner’s Guide to R 228 Pages 2009
  • A Brief Introduction to Neural Networks 286 Pages 2005
  • A Computational Approach to Statistics 492 Pages 2006
  • A Course in Machine Learning 189 Pages 2013
  • A Course in Machine Learning 229 Pages 2012
  • A Course in Machine Learning 191 Pages 2014
  • A Field Guide to Genetic Programming 250 Pages 2008
  • A First Encounter With Machine Learning 93 Pages 2010
  • A First Encounter With Machine Learning 93 Pages 2011
  • A Handbook of Statistical Analyses Using R 207 Pages 2005
  • A History of Mathematical Notations 870 Pages 1993
  • A Little Book of R For Bioinformatics 77 Pages 2011
  • A Little Book of R For Multivariate Analysis 51 Pages 2013
  • A Little Book of R For Time Series 75 Pages 2014
  • A Modern Introduction to Probability and Statistics 483 Pages 2005
  • A Nonparametric Statistical Approach to Clustering via Mode Identification 37 Pages 2007
  • A Probabilistic Theory of Pattern Recognition 661 Pages 1995
  • A Probability Course for the Actuaries 599 Pages 2014
  • A Programmer’s Guide to Data Mining 305 Pages 2013
  • A Tiny Handbook of R 94 Pages 2011
  • A Trio of Texts 
  • Advanced Data Analysis from an Elementary Point of View 697 Pages 2014
  • Advanced R 2014
  • AI Algorithms Data Structures and Idioms 463 Pages 2009
  • Algorithms for Reinforcement Learning 98 Pages 2010
  • All of Nonparametric Statistics 271 Pages 2006
  • An Example of Statistical Data Analysis 147 Pages 2014
  • An Introduction to Graphical Models 102 Pages 1997
  • An Introduction to Information Retrieval 569 Pages 2009
  • An Introduction to Mathematical Optimal Control Theory 126 Pages 2014
  • An Introduction to R 106 Pages 2013
  • An Introduction to Statistical Inference and Its Applications with R 459 Pages 2008
  • An Introduction to Statistical Learning 4th 440 Pages 2014
  • An Introduction to Statistical Learning 441 Pages 2013
  • An Introduction to Statistics 192 Pages 2015
  • Analysing spatial point patterns in R 232 Pages 2010
  • AnalyticBridge Data Science eBook 123 Pages 2013
  • Apache Hadoop YARN 337 Pages 2014
  • Applied Data Science 141 Pages 2014
  • Applied Numerical Computing 274 Pages 2011
  • Applied Numerical Linear Algebra 421 Pages 1996
  • Art and Visual Perception 263 Pages 1974
  • Artificial Intelligence – A Modern Approach 946 Pages 1995
  • Artificial Intelligence: Foundations of Computational Agents 2010

To check the entire list, with clickable links rather than just titles, visit DSC Resources.


