Feature-engine: a Python package for feature engineering


In this post, we explore Feature-engine, a new Python package for feature engineering.


Feature engineering is the process of using domain knowledge of the data to transform existing features or to create new variables from existing ones, for use in machine learning. Using feature engineering, we can pre-process raw data and make it suitable for use in machine learning algorithms.


The package covers the following areas:

1. Missing Data Imputation

  1. Complete Case Analysis
  2. Mean / Median / Mode Imputation
  3. Random Sample Imputation
  4. Replacement by Arbitrary Value
  5. End of Distribution Imputation
  6. Missing Value Indicator
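Here is a minimal pure-Python sketch of one of these techniques: median imputation with an optional missing-value indicator, written in the fit/transform style the package uses. The class name `MedianImputerSketch`, the `_na` column suffix and the list-of-dicts data layout are hypothetical choices for illustration. This is not feature-engine's implementation, which works on pandas DataFrames (recent releases expose transformers such as `MeanMedianImputer` and `AddMissingIndicator`; check the documentation for your version).

```python
import statistics


# Hypothetical sketch, not feature-engine's implementation.
class MedianImputerSketch:
    """fit() learns each column's median from training rows; transform()
    fills missing values (None) with it and can add a 0/1 indicator column."""

    def __init__(self, add_indicator=False):
        self.add_indicator = add_indicator

    def fit(self, rows):
        # rows: list of dicts mapping column name -> number or None
        self.medians_ = {}
        for col in rows[0]:
            observed = [r[col] for r in rows if r[col] is not None]
            self.medians_[col] = statistics.median(observed)
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            new = {}
            for col, median in self.medians_.items():
                value = r[col]
                new[col] = median if value is None else value
                if self.add_indicator:
                    # 1 means the original value was missing
                    new[col + "_na"] = int(value is None)
            out.append(new)
        return out


train = [{"age": 20.0}, {"age": None}, {"age": 40.0}, {"age": 30.0}]
imputer = MedianImputerSketch(add_indicator=True).fit(train)
print(imputer.medians_)  # {'age': 30.0}
print(imputer.transform([{"age": None}, {"age": 55.0}]))
# [{'age': 30.0, 'age_na': 1}, {'age': 55.0, 'age_na': 0}]
```

Swapping `statistics.median` for `statistics.fmean` gives mean imputation. The medians are computed once in `fit()` and reused by every later call to `transform()`.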


2. Categorical Encoding

  1. One-hot encoding
  2. Count and frequency encoding
  3. Target encoding / Mean encoding
  4. Ordinal encoding
  5. Weight of Evidence
  6. Rare label encoding
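Count and frequency encoding are simple enough to sketch directly: each category is replaced by how often it appears in the training data. The snippet below is an illustration under stated assumptions, not feature-engine's code. `CountFrequencyEncoderSketch` is a hypothetical name, and mapping categories unseen during training to 0 is a choice made here for simplicity; a real encoder (such as `CountFrequencyEncoder` in recent feature-engine releases) may instead raise an error or return a missing value.

```python
from collections import Counter


# Hypothetical sketch, not feature-engine's implementation.
class CountFrequencyEncoderSketch:
    """Replaces each category with its count or frequency in the training data."""

    def __init__(self, encoding_method="count"):
        if encoding_method not in ("count", "frequency"):
            raise ValueError("encoding_method must be 'count' or 'frequency'")
        self.encoding_method = encoding_method

    def fit(self, values):
        counts = Counter(values)
        n = len(values)
        if self.encoding_method == "count":
            self.mapping_ = dict(counts)
        else:
            self.mapping_ = {cat: c / n for cat, c in counts.items()}
        return self

    def transform(self, values):
        # Assumption for this sketch: categories unseen in fit() are encoded as 0.
        return [self.mapping_.get(v, 0) for v in values]


colours = ["red", "blue", "red", "green", "red", "blue"]
encoder = CountFrequencyEncoderSketch("count").fit(colours)
print(encoder.mapping_)  # {'red': 3, 'blue': 2, 'green': 1}
print(encoder.transform(["blue", "red", "purple"]))  # [2, 3, 0]
```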


3. Variable transformation

  1. Logarithm transformation - log(x)
  2. Reciprocal transformation - 1 / x
  3. Square root transformation - sqrt(x)
  4. Exponential transformation - exp(x)
  5. Yeo-Johnson transformation
  6. Box-Cox transformation
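The first four transformations map directly onto standard functions. Box-Cox and Yeo-Johnson are families of transformations indexed by a parameter λ; the sketch below applies them for a fixed, hand-picked λ. Libraries typically estimate λ from the training data, which is not shown here. The function names are hypothetical; in recent feature-engine releases the corresponding transformers are `BoxCoxTransformer` and `YeoJohnsonTransformer`. Note that Box-Cox needs strictly positive values, while Yeo-Johnson also handles zero and negative values.

```python
import math

# Hypothetical sketch, not feature-engine's implementation.
# The simpler transformations map onto the standard library:
# log(x) -> math.log(x), reciprocal -> 1 / x, sqrt(x) -> math.sqrt(x), exp(x) -> math.exp(x)


def box_cox(x, lmbda):
    """Box-Cox transform of one strictly positive value for a fixed lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of one value of any sign for a fixed lambda."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)  # log(x + 1)
        return ((x + 1) ** lmbda - 1) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)  # -log(1 - x)
    return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)


print(box_cox(4.0, 0.5))       # 2.0
print(yeo_johnson(3.0, 0.5))   # 2.0
print(yeo_johnson(-3.0, 1.5))  # -2.0
print(yeo_johnson(-2.5, 1.0))  # -2.5 (lambda = 1 leaves values unchanged)
```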


4. Discretisation

  1. Equal width discretisation
  2. Equal frequency discretisation
  3. Discretisation using decision trees
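Equal width and equal frequency discretisation differ only in how the cut points are chosen. The minimal sketch below uses a small right-skewed list to show how equal-width bins end up mostly empty while equal-frequency bins stay balanced. The function names and the right-closed interval convention are assumptions for illustration, and decision-tree discretisation is not shown. In recent feature-engine releases, the corresponding transformers are `EqualWidthDiscretiser` and `EqualFrequencyDiscretiser`.

```python
import bisect
import statistics

# Hypothetical sketch, not feature-engine's implementation.


def equal_width_edges(values, bins):
    """Inner cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def equal_frequency_edges(values, bins):
    """Inner cut points at quantiles, so each interval holds about the same number of values."""
    return statistics.quantiles(values, n=bins, method="inclusive")


def assign_bins(values, edges):
    """Map each value to a bin index 0..len(edges) using right-closed intervals (a, b].

    The outermost bins are open-ended, so values outside the training range still get a bin.
    """
    return [bisect.bisect_left(edges, v) for v in values]


train = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]  # right-skewed toy data

for name, edges in [("equal width", equal_width_edges(train, 5)),
                    ("equal frequency", equal_frequency_edges(train, 5))]:
    labels = assign_bins(train, edges)
    print(name, [labels.count(b) for b in range(5)])
# equal width [8, 0, 1, 0, 1]
# equal frequency [2, 2, 2, 2, 2]
```

The edges are learned from training data only; new values are then binned with the same edges.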


5. Outliers

  1. Outlier removal
  2. Treating outliers as missing values
  3. Top / bottom / zero coding
  4. Discretisation
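Top/bottom coding and outlier removal both rely on limits learned from the training data. One common way to set them is the inter-quartile range (IQR) proximity rule, sketched below in pure Python. The class name, the `fold=1.5` default and the quantile method are assumptions for illustration. Recent feature-engine releases offer `Winsorizer` for capping and `OutlierTrimmer` for removal, each with several ways of computing the limits.

```python
import statistics


# Hypothetical sketch, not feature-engine's implementation.
class IQRCapperSketch:
    """fit() learns limits Q1 - fold * IQR and Q3 + fold * IQR from training data."""

    def __init__(self, fold=1.5):
        self.fold = fold

    def fit(self, values):
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        self.lower_ = q1 - self.fold * iqr
        self.upper_ = q3 + self.fold * iqr
        return self

    def transform(self, values):
        # Top/bottom coding: clip values to the learned limits, keeping every row.
        return [min(max(v, self.lower_), self.upper_) for v in values]

    def trim(self, values):
        # Outlier removal: drop values outside the learned limits instead.
        return [v for v in values if self.lower_ <= v <= self.upper_]


train = [10, 12, 12, 13, 14, 15, 16, 18, 95]
capper = IQRCapperSketch(fold=1.5).fit(train)
print(capper.lower_, capper.upper_)   # 6.0 22.0
print(capper.transform([1, 14, 95]))  # [6.0, 14, 22.0]
print(capper.trim([1, 14, 95]))       # [14]
```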


6. Feature Scaling

  1. Standardisation
  2. Min-Max Scaling
  3. Maximum Absolute Scaling
  4. Robust scaling
  5. Mean normalisation
  6. Scaling to unit length
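Most of the scaling methods listed above are one-line formulas. The functions below are a pure-Python sketch under stated assumptions: the names are hypothetical, each function works on a single list of numbers, and none of them guards against zero spread (for example, a constant variable). Standardisation uses the population standard deviation here; some tools use the sample version instead. Scaling to unit length differs from the others because it rescales one observation (a row) rather than one variable.

```python
import math
import statistics

# Hypothetical sketch, not feature-engine's implementation.


def standardise(values):
    # z = (x - mean) / standard deviation (population version)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def max_abs_scale(values):
    # x / max(|x|), which maps values into [-1, 1]
    m = max(abs(v) for v in values)
    return [v / m for v in values]


def robust_scale(values):
    # (x - median) / IQR, which is less sensitive to outliers than the mean and std
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    return [(v - med) / (q3 - q1) for v in values]


def mean_normalise(values):
    # (x - mean) / (max - min)
    mu = statistics.fmean(values)
    spread = max(values) - min(values)
    return [(v - mu) / spread for v in values]


def unit_length(vector):
    # Divide one observation by its Euclidean norm so the result has length 1.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standardise(data))                # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(max_abs_scale([-4.0, 2.0, 8.0]))  # [-0.5, 0.25, 1.0]
print(robust_scale(data)[-1])           # 3.0
print(unit_length([3.0, 4.0]))          # [0.6, 0.8]
```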


7. Feature Creation


8. Aggregating Transaction Data


From the GitHub page:

Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.
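The fit()/transform() split matters because parameters should be learned from the training data only and then applied unchanged to new data. A minimal sketch with a hypothetical `MinMaxScalerSketch` class (not part of feature-engine or scikit-learn) illustrates this: test values outside the training range map outside [0, 1], because `transform()` reuses the minimum and maximum stored during `fit()`. The trailing underscore on learned attributes follows the scikit-learn naming convention, and returning `self` from `fit()` allows `fit(...)` to be chained.

```python
# Hypothetical sketch of the fit/transform pattern, not feature-engine's code.
class MinMaxScalerSketch:
    """fit() learns min and max; transform() reuses them on any later data."""

    def fit(self, values):
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        return [(v - self.min_) / span for v in values]


train = [10.0, 20.0, 30.0]
test = [15.0, 40.0]

scaler = MinMaxScalerSketch().fit(train)  # parameters come from training data only
print(scaler.transform(train))  # [0.0, 0.5, 1.0]
print(scaler.transform(test))   # [0.25, 1.5]
```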

Feature-engine package on GitHub

Feature-engine documentation

Package created by Dr Soledad Galli

I plan to contribute to this package. In August, at Data Science Central, I also plan to create a mini e-book on feature engineering (co-authored with Aysa Tajeri) which will use this package. Feature engineering is a complex, multifaceted domain. Our goal is to present an overview of feature engineering for various domains. The proposed outline is:

  • Understanding the feature engineering pipeline
  • Concepts and mathematical techniques you need to understand feature engineering
  • Implementing feature engineering using the package above
  • Applications in industries

© 2021 TechTarget, Inc.