We created a data science dictionary in 2012, and we are still adding keywords. It is also in our Wiley book (better English, recent update). Here we share with you another similar dictionary, from BigDataProjects.org.

Here are the first few enties, from the Techniques sections (there are two sections: techniques and technologies):

**A/B testing:**

A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site.

Big data enables huge numbers of tests to be executed and analyzed, ensuring that groups are of sufficient size to detect meaningful (i.e., statistically significant) differences between the control and treatment groups When more than one variable is simultaneously manipulated in the treatment, the multivariate generalization of this technique, which applies statistical modeling, is often called “A/B/N” testing.

**Association rule learning: **

A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules.

One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). Used for data mining.

**Classification:**

A set of techniques to identify the categories in which new data points belong, based on a training set containing data points that have already been categorized. One application is the prediction of segment-specific customer behavior (e.g., buying decisions, churn rate, consumption rate) where there is a clear hypothesis or objective outcome.

These techniques are often described as supervised learning because of the existence of a training set; they stand in contrast to cluster analysis, a type of unsupervised learning. Used for data mining.

**Cluster analysis:**

A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance. An example of cluster analysis is segmenting consumers into self-similar groups for targeted marketing.

This is a type of unsupervised learning because training data are not used. This technique is in contrast to classification, a type of supervised learning. Used for data mining.

**Crowd sourcing:**

A technique for collecting data submitted by a large group of people or community (i.e., the “crowd”) through an open call, usually through networked media such as the Web. This is a type of mass collaboration and an instance of using Web 2.0.

**Data mining:**

A set of techniques to extract patterns from large datasets by combining methods from statistics and machine learning with database management. These techniques include association rule learning, cluster analysis, classification, and regression. Applications include mining customer data to determine segments most likely to respond to an offer, mining human

**Other links**

- From the trenches: 360-degree data science
- 17 short tutorials all data scientists should read (and practice)
- 10 types of data scientists
- 66 job interview questions for data scientists
- Data Science Certification
- Update about our Data Science Apprenticeship
- Our Wiley Book on Data Science
- Data Science Top Articles
- Our Data Science Weekly Newsletter
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- What makes up data science?
- Data science webinars
- Data science competition

© 2020 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central