All Blog Posts Tagged 'cleansing' (5)

Data Cleansing with Apache Spark and Optimus

Outdated, inaccurate, or duplicated data won’t drive optimal data-driven solutions. When data is inaccurate, leads are harder to track and nurture, and insights may be flawed. The data on which you base your big data strategy must be accurate, up to date, and as complete as possible, and it should not contain duplicate entries. Clean data results in…

Added by Favio Vázquez on August 18, 2017 at 8:00am — No Comments

Are you brave enough to change your Data Habits?

Do you often go with gut feeling rather than data and insights? Is your data stored in separate databases, in different formats with different values? We all have bad habits and some are a little hard to kick. However, if there is one you must break, it is surely to make your bad data habits a thing of the…

Added by Martin Doyle on March 6, 2017 at 2:30am — No Comments

The Data Quality Tipping Point

Whatever your business sector, data is your most valuable asset. Along with the machinery and stock you hold, data and insights hold the key to profit and growth. Data also has the unique ability to unite every department and every function. It can reveal problems in processes, drive productivity among your staff and ensure everyone is ‘singing from the same hymn…

Added by Martin Doyle on April 6, 2016 at 3:30am — No Comments

Resolving Skewness

A fundamental assumption in many predictive models is that the predictors are normally distributed. A normal distribution is unskewed, meaning it is symmetric about its mean: a value is as likely to fall to the right of the mean as to the left of it.

This article outlines the steps to detect…

Added by Shahram Abyari on December 25, 2015 at 7:00am — 3 Comments
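
The excerpt above is cut off before it lists the article's own steps, so the following is not the author's method. It is only a minimal sketch of one common way to quantify the symmetry the excerpt describes: the moment coefficient of skewness, g1 = m3 / m2^1.5, which is 0 for perfectly symmetric data and positive when the right tail is longer. The function name `skewness` and the sample lists are hypothetical, chosen for illustration.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    # Second and third central moments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical sample data, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 50]

print(skewness(symmetric))      # 0.0 for symmetric data
print(skewness(right_skewed))   # positive: long right tail

# A log transform is one common way to reduce right skew
# (assumes all values are strictly positive).
log_transformed = [math.log(x) for x in right_skewed]
print(skewness(log_transformed))  # smaller than for right_skewed
```

A value near zero suggests a roughly symmetric predictor. How far from zero is acceptable depends on the model and is a judgment call that this sketch does not settle.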

Data Cleansing vs Data Maintenance: Which One Is Most Important?

There are always two aspects to data quality improvement. Data cleansing is the one-off process of tackling the errors within the database, ensuring retrospective anomalies are automatically located and removed. Another term, data maintenance, describes ongoing correction and verification – the process of continual improvement and regular checks. 

Often, businesses ask…

Added by Martin Doyle on August 25, 2015 at 3:34am — 4 Comments
