Practicing data science is a long-term effort rather than a matter of learning a handful of skills. You need a solid academic foundation to take up this challenge. However, if you have come a long way from your academic days but still have the zeal and passion to extract the oil from data and fill your data science skill gap, here are some **warm-up** tips. The points below should be **exercised** before jumping into any data science or data mining problem:

Not all datasets are in the form of a data matrix. For instance, more complex datasets can be in the form of sequences, text, time series, images, audio, video, and so on, which may need special techniques for analysis. However, in many cases, even if the raw data is not a data matrix, it can usually be transformed into that form via feature extraction. A practical example of feature extraction is explained in my last post on the scikit-learn library.
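
To make the idea of turning raw data into a data matrix concrete, here is a minimal sketch using only the Python standard library. It is not the scikit-learn approach from the earlier post; the function name `bag_of_words_matrix` and the two sample sentences are made up for illustration. Each document becomes a row, and each distinct word becomes an attribute (column) holding a word count.

```python
from collections import Counter


def bag_of_words_matrix(documents):
    """Turn raw text documents into an n x d matrix of word counts.

    Hypothetical helper for illustration: n is the number of documents,
    d is the number of distinct words across all documents.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The sorted vocabulary defines the d attributes (columns).
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Made-up sample data, not taken from the original post.
docs = ["data mining finds patterns", "data science uses data"]
vocabulary, X = bag_of_words_matrix(docs)

print(vocabulary)  # column names of the data matrix
for row in X:
    print(row)     # one row per document
```

Here the dimensionality of the resulting matrix is the size of the vocabulary, which is why text data tends to produce very wide matrices.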

- The number of attributes defines the dimensionality of the data matrix. Keep the dimensionality in mind whenever you think about matrix operations.
- Each row may be considered as a d-dimensional column vector (all vectors are assumed to be column vectors by default). You should also understand the terms row space and column space.
- Treating data instances and attributes as vectors, and the entire dataset as a matrix, enables one to apply both geometric and algebraic methods to aid in data mining and analysis tasks. At the very least, you must be familiar with the unit vector, the identity matrix, and so on.
- Dust off what you learned in school about matrix manipulation, i.e. matrix addition, multiplication, transpose, inverse, etc. The same applies to algebraic formulas such as the distance between two points and the *Pythagorean theorem* (*Pythagoras' theorem*).
- A thorough understanding of matrix manipulation will help you implement multiplication and summation of elements.
- Skipping probability is probably not a good idea. Work through some short probability problems and exercises before you go into the details of any supervised learning model.
- You may need to practice the topics that you might have skipped in school, such as *orthogonal projection of a vector* (projecting one vector onto another), the *probabilistic view of the data*, and the *probability density function*. (I admit I avoided these topics during my graduation :) )
- Get comfortable with all the formulas of descriptive statistical analysis, from mean, median, and mode to the normal distribution and skewness, and most importantly don't forget to cover variance and standard deviation. You should be ready to do basic statistical analysis of univariate and multivariate numeric data. Believe me, the choice of distance measure changes with the distribution of the data (use the Euclidean distance score when the data is normally distributed, otherwise the Pearson coefficient score).
- Generalization, correlation, and regression concepts are widely used across statistics and mathematical modeling, so rehearse them broadly before you go into modeling techniques.
- Practice normalizing a vector. Vector normalization is a must-know concept in prediction algorithms.
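
Several of the points above (unit vectors, the identity matrix, matrix multiplication and transpose, distance between two points, orthogonal projection, and vector normalization) can be practiced in a few lines of plain Python. The following is only a warm-up sketch: all function names are my own choices for illustration, and in real work you would more likely reach for a numerical library.

```python
import math


def dot(u, v):
    """Dot product: multiply element-wise, then sum."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean length of a vector (Pythagorean theorem in d dimensions)."""
    return math.sqrt(dot(v, v))


def normalize(v):
    """Scale a vector to unit length."""
    length = norm(v)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [a / length for a in v]


def project(u, onto):
    """Orthogonal projection of u onto the direction of `onto`."""
    scale = dot(u, onto) / dot(onto, onto)
    return [scale * a for a in onto]


def euclidean_distance(u, v):
    """Distance between two points given as vectors."""
    return norm([a - b for a, b in zip(u, v)])


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product: each entry is a row of a dotted with a column of b."""
    b_columns = transpose(b)
    return [[dot(row, col) for col in b_columns] for row in a]


u = [3.0, 4.0]
v = [1.0, 0.0]
print(norm(u))                   # length of u
print(normalize(u))              # unit vector in the direction of u
print(project(u, v))             # component of u along v
print(euclidean_distance(u, v))  # distance between the two points

A = [[1, 2], [3, 4]]
I = [[1, 0], [0, 1]]             # 2 x 2 identity matrix
print(matmul(A, I))              # multiplying by the identity returns A
print(transpose(A))
```

Working through these by hand first, and then checking the results against the code, is a quick way to confirm that the school-level formulas are still fresh.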

"In fact, data mining is part of a larger knowledge discovery process, which includes pre-processing tasks like data extraction, data cleaning, data fusion, data reduction and feature construction, as well as post-processing steps like pattern and model interpretation, hypothesis confirmation and generation, and so on. This knowledge discovery and data mining process tends to be highly iterative and interactive."

**CRUX**: The algebraic, geometric, and probabilistic viewpoints of data play a key role in data mining. Exercise them beforehand so that you can sail your boat smoothly through data science!
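
As a warm-up for the probabilistic viewpoint (descriptive statistics and correlation), here is a small sketch built on Python's standard `statistics` module. The `pearson` function is written out by hand for practice, and the `hours` and `scores` values are made-up numbers used purely for illustration.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Sum of products of deviations (the covariance, up to a constant factor).
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (the variances, up to the same factor).
    sq_dev_x = sum((a - mean_x) ** 2 for a in x)
    sq_dev_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical sample data, not from the original post.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))
print("population variance:", statistics.pvariance(scores))
print("population std dev:", statistics.pstdev(scores))
print("Pearson correlation:", pearson(hours, scores))
```

Note that `statistics.variance` and `statistics.stdev` give the sample versions (dividing by n - 1), while `pvariance` and `pstdev` give the population versions (dividing by n); knowing which one a formula calls for is part of the warm-up.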

Original post: http://datumengineering.wordpress.com/2013/10/18/warm-up-exercise-b...
