Why Address Standardization and Validation Matters & What You Can Do About It

  • FarahKim 

Poor address data is a complex data quality challenge that affects customers, businesses, and mailing services. Each year, millions of dollars are wasted resolving the consequences of poor address data. Mailers spend over $20 billion on UAA (undeliverable-as-addressed) mail, while direct costs to the USPS are over $1.5 billion a year. All this unnecessary cost is the result of poor, mismanaged, unvalidated address data.

Over the years, working with Fortune 500 clients, we have seen the consequences of poor address data – disgruntled customers, ballooning costs, inefficient operations, marketing blunders, embarrassing mistakes... the list goes on.

Take a moment and imagine this.

You have over a million customer records and nearly 23% of those records are either incomplete or inaccurate – and that is before counting records that are duplicated or unstructured.

That’s nearly 230,000 records that may be rendered useless or may cost you thousands of dollars in managing returned mail. This is a situation most companies face today, regardless of the controls they’ve put in place. When the data input is done by humans, it will always be significantly flawed.

What Does Bad Address Data Really Look Like? 

Here’s an image of what typical unstructured, raw address data looks like. Poor address data puts a severe strain on businesses and their employees. Imagine having to fix these very basic issues for every mailing campaign, every promotional activity, and every customer report that you have to run. It’s not only mind-bogglingly frustrating but also counterproductive, as you have to match and verify each address to ensure it’s accurate and complete. Data scientists, analysts, or business users who need this data can spend days or even months fixing these issues.

Sure, it’s human nature to make mistakes. Most of the time, consumers are lax when it comes to providing their address information on physical or web forms. They may misspell a state name, use nonstandard abbreviations, leave out a street number, or forget their ZIP Code. It’s inevitable that some mistakes will be made and incorrect data will be entered.

Does that mean companies are helpless? Should poor address data be resolved manually, by calling up customers or using other records such as bank statements and bills to verify it? You could do that, but it’s going to cost you time and effort. Worse, you would not be addressing the core problem: the address data has not been standardized and validated according to the USPS or the equivalent authority in your country.

Let me elaborate on this further.

Address Standardization and Validation Limitations 

If your CRM data looks anything like the image above, you have a significant address standardization problem. According to the USPS guidelines (given below), address data is supposed to be in a format like this:

[Image: address formatted according to the USPS guidelines]

Unless you place strict data entry controls on your web form or physical form, there is very little chance your data will be in this perfect state. So the first limitation here is address standardization, and there is no way you can do this manually for hundreds of thousands of rows of data.

Here are the USPS guidelines:

  • Always put the address and the postage on the same side of your mailpiece.
  • On a letter, the address should be parallel to the longest side.
  • All capital letters.
  • No punctuation.
  • At least 10-point type.
  • One space between city and state.
  • Two spaces between state and ZIP Code.
  • Simple type fonts.
  • Left justified.
  • Black ink on white or light paper.
  • No reverse type (white printing on a black background).
  • If your address appears inside a window, make sure there is at least 1/8-inch clearance around the address. Sometimes parts of the address slip out of view behind the window and mail processing machines can’t read the address.
  • If you are using address labels, make sure you don’t cut off any important information. Also make sure your labels are on straight. Mail processing machines have trouble reading crooked or slanted information.
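Several of these rules (all capitals, no punctuation, one space between city and state, two spaces between state and ZIP Code) are mechanical enough to apply in code. Here is a minimal Python sketch. The function name `format_address_block` and the sample address are made up for illustration, and the choice to keep hyphens and slashes (for ZIP+4 codes and fractional house numbers) is an assumption, not an official rule set.

```python
import string

# Assumption: keep "-" (ZIP+4, e.g. 62701-1234) and "/" (fractions, e.g. 12 1/2).
_PUNCT_TO_DROP = "".join(ch for ch in string.punctuation if ch not in "-/")
_STRIP_TABLE = str.maketrans("", "", _PUNCT_TO_DROP)


def _norm(text: str) -> str:
    """Uppercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(text.translate(_STRIP_TABLE).upper().split())


def format_address_block(recipient, street, city, state, zip_code):
    """Return the address as a list of lines ready to print on a mailpiece."""
    # One space between city and state, two spaces between state and ZIP Code.
    last_line = f"{_norm(city)} {_norm(state)}  {_norm(zip_code)}"
    return [_norm(recipient), _norm(street), last_line]


lines = format_address_block(
    "Ms. Jane Doe", "123 n. Main St., Apt. 4", "Springfield,", "il", "62701"
)
print("\n".join(lines))
# MS JANE DOE
# 123 N MAIN ST APT 4
# SPRINGFIELD IL  62701
```

The physical rules in the list (type size, ink color, window clearance) are about printing and are outside what a script like this can check.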

Next, let’s talk about validation.

The USPS maintains the official database of addresses in the United States. If you want to check the validity of your address data, you’re going to have to match it against the USPS database. To do that, you will need access to a CASS Certified vendor who will validate your addresses by matching them against that database. These vendors have up-to-date CASS files, which means any new addresses or changes in location recorded by the USPS will be available to the vendor.

Here’s the tricky part.

To validate this data, you have to standardize it.

To standardize it, you have to clean and dedupe this data. 
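To make the dedupe step concrete, here is a small Python sketch under simple assumptions: records are dictionaries with hypothetical `name`, `street`, and `zip` fields, and two records count as duplicates when those fields are equal once case, punctuation, and spacing are ignored. Real dedupe tools are more forgiving than this.

```python
import re


def dedupe_key(record: dict) -> tuple:
    """Build a comparison key that ignores case, punctuation, and spacing."""

    def squash(value: str) -> str:
        return re.sub(r"[^A-Z0-9]", "", value.upper())

    return (squash(record["name"]), squash(record["street"]), squash(record["zip"]))


def dedupe(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedupe_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample data: the first two rows are the same customer.
records = [
    {"name": "Jane Doe", "street": "123 Main St.", "zip": "62701"},
    {"name": "JANE DOE", "street": "123  main st", "zip": "62701"},
    {"name": "John Roe", "street": "9 Elm Ave", "zip": "62702"},
]
unique = dedupe(records)
print(len(unique))  # 2
```

A key like this would still treat "123 Main St" and "123 Main Street" as different records, which is part of why cleaning, deduping, and standardizing feed into one another.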

Note though that address standardization tools can only validate records based on certain geographical parameters. They cannot, for instance, validate addresses that are:

  • Valid on record, but for a location that no longer exists
  • Structurally correct, but do not belong to the customer
  • Not registered in the USPS database

Then again, once you’ve cleansed, standardized, and validated the data, the number of invalid or non-existent records goes down significantly. You can filter those records, verify the legitimacy of the entity and, if necessary, call them up to ask for accurate information.

So How Can You Manage this Dilemma Smartly? 

In our experience, companies already understand the problem. They are just not sure of the right solution. Instead, they hire data specialists or analysts, who are then tasked with cleaning up this data.

Let me be clear – a data scientist’s job is not to clean dirty data – it’s to study data, improve data acquisition, and make efficient use of this data. To make a data scientist spend 80% of their time cleaning data is to waste their talent.

A better strategy would be to equip the data scientist, or even a business user, with the right address standardization software to help them manage this better. Most validation tools today are pretty much DIY and do not require a user to learn a new language or be technically proficient. They do involve a learning curve, as is the case with most software, but it’s not out of reach.

It’s important though to choose a tool that lets you tackle all three aspects of this problem:

  1. Cleaning: Cleaning up data by identifying common data quality errors (typos, format, non-printable characters, negative spacing, etc.).
  2. Standardization: Converting this data into an acceptable USPS format.
  3. Validation or Verification: Verifying this data by matching it with the USPS database.
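The three aspects above chain together naturally, as this rough Python sketch shows. Everything in it is illustrative: the `SUFFIXES` table holds only a handful of street-suffix abbreviations of the kind USPS Publication 28 lists, and `REFERENCE` is a made-up in-memory stand-in for the lookup a CASS Certified vendor would perform against the real USPS database.

```python
import re
import unicodedata

# A few street-suffix abbreviations; a real table is far longer.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD",
            "ROAD": "RD", "DRIVE": "DR"}

# Stand-in for a vendor lookup; these entries are invented for the example.
REFERENCE = {("123 MAIN ST", "62701"), ("9 ELM AVE", "62702")}


def clean(text: str) -> str:
    """Drop non-printable characters and collapse repeated whitespace."""
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")
        elif not unicodedata.category(ch).startswith("C"):
            kept.append(ch)  # skip control and format characters
    return " ".join("".join(kept).split())


def standardize(street: str) -> str:
    """Uppercase, strip punctuation, and abbreviate known suffix words."""
    words = re.sub(r"[^\w\s-]", "", street.upper()).split()
    # Naive: abbreviates any matching word, not only the final suffix.
    return " ".join(SUFFIXES.get(word, word) for word in words)


def validate(street: str, zip_code: str) -> bool:
    """Check the standardized pair against the stand-in reference set."""
    return (street, zip_code) in REFERENCE


# Raw input with a zero-width space, extra spaces, a period, and a tab.
raw_street, raw_zip = "123\u200b  main   Street.", " 62701\t"
street = standardize(clean(raw_street))
zip_code = clean(raw_zip)
print(street, zip_code, validate(street, zip_code))
# 123 MAIN ST 62701 True
```

The order matters here: the lookup in `validate` only succeeds because the record was cleaned and standardized first.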

Most address verification software does not have strong data matching capabilities, even though matching is at the heart of this function. Your choice of software should be able to match your address data and give a 100% accuracy rate. If it misses matches because the content is not an exact match, it is not the right solution for you.
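To show what matching non-exact content means in practice, here is a small sketch using Python’s standard-library `difflib`. It is only one simple way to score similarity, not how any particular product works. The `best_match` helper, the sample reference list, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0.0 and 1.0; 1.0 means identical strings."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def best_match(candidate: str, reference: list, threshold: float = 0.85):
    """Return the closest reference address, or None if none clears the threshold."""
    scored = [(similarity(candidate, ref), ref) for ref in reference]
    score, match = max(scored)
    return match if score >= threshold else None


# Made-up reference addresses standing in for a validated address list.
reference = ["123 MAIN ST", "9 ELM AVE", "77 OAK BLVD"]

print(best_match("123 Mian St", reference))  # 123 MAIN ST (typo still matches)
print(best_match("500 Pine Rd", reference))  # None (nothing is close enough)
```

An exact comparison would have rejected "123 Mian St" outright. The threshold controls the trade-off: set it too low and unrelated addresses start matching, set it too high and typos are missed again.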

At the end of the day though, tools and gadgets can only do so much. You will also need to implement certain business strategies that can help you take care of this problem. These could be:

Training:

The first step towards quality is training – make sure people who are handling, interacting, using, and entering data know the impact they have in the process and on downstream applications. They need to understand the consequences of bad data on the entire organization and not just on one member or customer. Employees practicing data quality rules should be rewarded and appreciated.

Involve Business Users in the Quality Process:

Data is not just an IT problem. Business users are equally responsible for managing data. In fact, they are the sole owners of the customer data that is often used for marketing and sales purposes. This is why they need to be involved in the process and trained to use data management tools.

Data Governance:

Set up a data governance team to create a data management plan and ensure that the organization follows it. Each employee should understand the plan, their role within it, and the expectations that come with that role.

Lock Down Data & User Roles:

If anyone in your team can open up the CRM or the data source, muddle around with data, and leave no footprints, you are in for serious trouble. It’s necessary to create master data holders who have the right to access, enter, or process critical data. This should be covered in the data management plan.

Remember though, you don’t have to do a blanket address quality upgrade. Start small. Identify departments or activities that require address data to carry out tasks such as mail or package delivery, newsletters, or billing, and start optimizing the data for each process.

You’re not a victim of bad data. With plenty of tools and solutions now available, you can sort your data and prevent negative outcomes.

What are you doing in your organization to manage bad address data? 

This post is a condensed version of an original guide published here.