
You need to analyze data to make more informed decisions. Many tools can help you analyze data visually or statistically, but they only work well if the data is already clean and consistent.

Here is a list of five data cleansing tools.

Drake

Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. You define data processing steps along with their inputs and outputs, and Drake automatically resolves their dependencies and calculates:

  • which commands to execute (based on file timestamps)

  • in what order to execute the commands (based on dependencies)

Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows.
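To make the two calculations above concrete, here is a minimal Make-style planner sketched in Python (3.9+ for `graphlib`). It is not Drake's code or workflow syntax; `Step`, `plan` and the file names are hypothetical. It assumes a step must rerun when one of its outputs is missing, when an input is newer than its oldest output, or when an upstream step that produces one of its inputs is itself scheduled to run.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, NamedTuple


class Step(NamedTuple):
    """A hypothetical workflow step: a command with declared inputs and outputs."""
    name: str
    inputs: List[str]
    outputs: List[str]


def plan(steps: List[Step], mtimes: Dict[str, float]) -> List[str]:
    """Return the names of the steps that need to run, in dependency order.

    `mtimes` maps file names to modification times; a missing key means
    the file does not exist yet.
    """
    # Which step produces each file, and which steps each step depends on.
    producer = {out: s.name for s in steps for out in s.outputs}
    by_name = {s.name: s for s in steps}
    graph = {
        s.name: {producer[i] for i in s.inputs if i in producer}
        for s in steps
    }
    # Upstream steps always come before the steps that consume their outputs.
    order = list(TopologicalSorter(graph).static_order())

    to_run = set()
    for name in order:
        step = by_name[name]
        out_times = [mtimes.get(o) for o in step.outputs]
        if not out_times or None in out_times:
            stale = True  # no declared outputs, or an output is missing
        else:
            oldest_output = min(out_times)
            stale = any(
                # An input regenerated upstream forces a rerun...
                producer.get(i) in to_run
                # ...otherwise compare timestamps, as Make does.
                # A missing input that nothing produces counts as "newer".
                or mtimes.get(i, float("inf")) > oldest_output
                for i in step.inputs
            )
        if stale:
            to_run.add(name)
    return [n for n in order if n in to_run]


if __name__ == "__main__":
    steps = [
        Step("clean", ["raw.csv"], ["clean.csv"]),
        Step("report", ["clean.csv"], ["report.txt"]),
    ]
    # raw.csv changed after clean.csv was built, so both steps rerun.
    print(plan(steps, {"raw.csv": 3.0, "clean.csv": 2.0, "report.txt": 5.0}))
```

Propagating "will be rebuilt" downstream, instead of only comparing current timestamps, is what lets a dry run list the full chain of work before any command executes.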

OpenRefine

OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
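One common cleaning task is spotting values that are written differently but mean the same thing. OpenRefine's clustering feature addresses this, and one of its key-collision methods is called "fingerprint". The Python below only approximates that idea and is not OpenRefine's code; `fingerprint`, `cluster` and the sample values are hypothetical. It trims and lower-cases a value, strips accents and punctuation, then sorts and de-duplicates its tokens, so values that collide on the same key land in one cluster.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Key that ignores case, accents, punctuation and word order."""
    value = value.strip().lower()
    # Decompose accented characters, then drop the combining marks ("é" -> "e").
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation, keeping letters, digits, underscores and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Sort and de-duplicate tokens so "Inc. Acme" matches "Acme Inc.".
    return " ".join(sorted(set(value.split())))


def cluster(values: List[str]) -> Dict[str, List[str]]:
    """Group values by fingerprint; keep only keys shared by 2+ values."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(vs) > 1}


if __name__ == "__main__":
    names = ["Acme Inc.", "acme, inc", "Inc. Acme", "Café Nero", "Cafe Nero", "Globex"]
    for key, members in cluster(names).items():
        print(key, "->", members)
```

A person still has to review each cluster and pick the canonical spelling; the key only proposes candidates.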

DataWrangler

Wrangler is an interactive tool for data cleaning and transformation, designed so you spend less time formatting and more time analyzing your data. It interactively transforms messy, real-world data into the data tables that analysis tools expect, and exports the results for use in Excel, R, Tableau, Protovis, ...

DataCleaner

The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values. Profiling is an essential activity of any Data Quality, Master Data Management or Data Governance program. If you don't know what you're up against, you have poor chances of fixing it.
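Column profiling of the kind described above can be sketched in a few lines of Python. This is not DataCleaner's engine or API; `profile` and `value_pattern` are hypothetical helpers, and mapping letters to `a` and digits to `9` is a convention assumed here for illustration. For one column, the sketch reports missing values (None or blank), distinct values, how often each value pattern occurs, and how many values contain non-ASCII characters.

```python
from collections import Counter
from typing import Dict, List, Optional


def value_pattern(value: str) -> str:
    """Map letters to 'a', digits to '9', and keep every other character."""
    return "".join(
        "a" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def profile(column: List[Optional[str]]) -> Dict[str, object]:
    """Summarize one column: missing, distinct, patterns and non-ASCII values."""
    present = [v for v in column if v is not None and v.strip() != ""]
    return {
        "rows": len(column),
        "missing": len(column) - len(present),
        "distinct": len(set(present)),
        # Most frequent pattern first; rare patterns often flag bad entries.
        "patterns": Counter(value_pattern(v) for v in present).most_common(),
        "non_ascii": sum(1 for v in present if not v.isascii()),
    }


if __name__ == "__main__":
    phones = ["555-1234", "555-9876", "5551234", None, "  ", "555-12a4"]
    print(profile(phones))
```

In this example the single `999-99a9` value stands out immediately as a likely typo, which is the point of profiling before fixing.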

Winpure Data Cleaning Tool

Data quality is an important contributor to the overall success of a project or campaign. Inaccurate data leads to wrong assumptions and flawed analysis, and consequently to the failure of the project or campaign. Duplicate data in particular causes all sorts of hassles, such as slow loading and accidental deletions. A good data cleaning tool tackles these problems by clearing your database of duplicate data, bad entries and incorrect information.
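The simplest form of duplicate removal treats two records as duplicates when their key fields match after normalization. The sketch below is not how WinPure works internally; `deduplicate`, `match_key` and the `name`/`email` fields are hypothetical. It trims, collapses internal whitespace and lower-cases the chosen fields, then keeps the first record seen for each key.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def match_key(record: Record, fields: Tuple[str, ...]) -> Tuple[str, ...]:
    """Trim, collapse internal whitespace and lower-case each chosen field."""
    return tuple(" ".join(record.get(f, "").split()).lower() for f in fields)


def deduplicate(records: Iterable[Record], fields: Tuple[str, ...]) -> List[Record]:
    """Keep the first record for each normalized key, preserving input order."""
    seen = set()
    kept: List[Record] = []
    for record in records:
        key = match_key(record, fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


if __name__ == "__main__":
    customers = [
        {"name": "Jane Doe", "email": "JANE@EXAMPLE.COM"},
        {"name": " jane  doe ", "email": "jane@example.com"},
        {"name": "John Roe", "email": "john@example.com"},
    ]
    print(deduplicate(customers, ("name", "email")))
```

This only catches records that become identical after normalization; near-matches caused by typos need fuzzy matching instead.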

Patnab collected data on data cleansing tools.


Comment by Venu Madhav on May 23, 2017 at 9:26pm

You can try TIBCO Clarity for free. It provides fuzzy matching service for de-duplication.

TIBCO Clarity is the data cleaning and standardization component of the TIBCO Software System. It serves as a single solution for business users to handle massive amounts of messy data across various sources, applications and systems, such as databases, cloud storage, TIBCO Jaspersoft, Spotfire, ActiveSpaces, MDM, Marketo and Salesforce. TIBCO® Clarity makes it easy for business users to profile, validate, deduplicate, standardize, transform and visualize data, and to cleanse addresses, so that trends can be identified and smart decisions made quickly. It is available as both a cloud version and an enterprise edition.

TIBCO Clarity Overview/features:
https://clarity.cloud.tibco.com/landing/feature-summary.html

Over 40 TIBCO Clarity demo videos (watch me!):
https://clarity.cloud.tibco.com/landing/tutorial.html

Comment by Linda Boudreau on December 9, 2016 at 8:52am

DataMatch by Data Ladder is also an excellent fuzzy matching and address standardization/address parsing tool used in business and among many Fortune 500 companies. There is a complimentary trial for new users.
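Both comments mention fuzzy matching for de-duplication. As a rough illustration of the idea, and not how TIBCO Clarity or DataMatch work, the Python below scores every pair of strings with the standard library's `difflib.SequenceMatcher` and flags pairs at or above a similarity threshold. The 0.85 threshold and the helper names are assumptions. Comparing every pair is O(n²), so large datasets usually need some way to narrow the candidate pairs first.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1] after trimming and lower-casing both strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_duplicates(
    values: List[str], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """Compare every pair of values and keep those scoring >= threshold."""
    pairs = []
    for a, b in combinations(values, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


if __name__ == "__main__":
    names = ["Jonathan Smith", "Jonathon Smith", "Maria Garcia"]
    print(likely_duplicates(names))
```

The threshold trades missed duplicates against false merges, so flagged pairs are best reviewed rather than merged automatically.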

 

© 2018   Data Science Central ®   Powered by
