Summarize and explore the data using SmartEDA

SmartEDA is an R package for exploratory data analysis, now available on CRAN. It includes multiple custom functions that perform an initial exploratory analysis on any input data, describing the structure of the data and the relationships present in it. The output can be produced in both summary (tabular) and graphical form, and the charts can also be exported as reports.

The SmartEDA package helps you build a good base of data understanding. Its capabilities and functionalities are listed below:

  • SmartEDA lets you apply different types of EDA without having to (1) remember the names of different R packages, (2) write lengthy R scripts, or (3) prepare the EDA report manually
  • There is no need to categorize the variables yourself. SmartEDA functions automatically assign each feature to the right data type (Character, Numeric, Factor etc.) based on the input data
  • ggplot2 functions are used for the graphical presentation of data
  • Rmarkdown and knitr functions are used to build the HTML reports

To summarize, the SmartEDA package gives you a complete exploratory data analysis just by running a function, instead of writing lengthy R code. The SmartEDA documentation is available on the package's CRAN page.
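As a minimal sketch of what "just running a function" looks like, the snippet below calls ExpData on the built-in mtcars data set. The choice of mtcars and the object names are illustrative assumptions. The `type` values (1 for a data-level overview, 2 for a variable-level structure) reflect the package documentation as commonly described, so check `?ExpData` for your installed version.

```r
# Install once from CRAN (left commented so the script has no side effects)
# install.packages("SmartEDA")
library(SmartEDA)

# mtcars ships with base R; it is used here purely as example data
overview      <- ExpData(data = mtcars, type = 1)  # data-level overview
var_structure <- ExpData(data = mtcars, type = 2)  # per-variable structure

print(overview)
print(var_structure)
```

Both calls return a table, so the results can be inspected, filtered, or written to a file like any other data frame.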

“ExpCustomStat” is a generic function built on data.table functionality, with multiple options to customize the table reports produced during EDA.

Key functionalities of ExpCustomStat:

  1. Categorical data descriptive statistics (Frequencies, Proportions)
  2. Numerical data descriptive statistics (Mean, Median, Sum, Variance etc.)
  3. Comparison of numerical data based on categorical data
  4. Filtering of rows/cases where conditions are true, with options to apply filters at the variable level or on the complete data set, similar to base subsetting
  5. Options to calculate basic statistics such as Mean, Median, Std.Dev, Variance, Count, Proportions, Quantiles, IQR and Percentages of Shares (PS) for numerical data
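The functionalities listed above can be sketched with a few calls on the built-in mtcars data set. This is an illustrative sketch, not the author's own example: the data set, the chosen variables and the object names are assumptions, and the argument names (`Cvar`, `Nvar`, `stat`, `gpby`, `filt`) follow the ExpCustomStat help page as commonly documented, so verify them with `?ExpCustomStat`.

```r
library(SmartEDA)

# Categorical summary: frequencies and proportions for gear and carb,
# each variable reported separately (gpby = FALSE)
freq_tbl <- ExpCustomStat(mtcars, Cvar = c("gear", "carb"), gpby = FALSE)

# Numerical summary compared across a categorical variable:
# count, mean and median of mpg and disp within each gear group
grp_tbl <- ExpCustomStat(mtcars,
                         Cvar = c("gear"),
                         Nvar = c("mpg", "disp"),
                         stat = c("Count", "mean", "median"),
                         gpby = TRUE)

# Same grouped summary, restricted to rows where vs == 1
# (a filter applied to the complete data set)
filt_tbl <- ExpCustomStat(mtcars,
                          Cvar = c("gear"),
                          Nvar = c("mpg", "disp"),
                          stat = c("Count", "mean", "median"),
                          gpby = TRUE,
                          filt = "vs==1")

print(freq_tbl)
print(grp_tbl)
print(filt_tbl)
```

Because the function is built on data.table, each result is a table that can be passed on to further data.table or data frame operations.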

ExpCustomStat examples and R code can be found in the package documentation.