
The Art of Relationship Matching – Identifying Key Data Elements



mThink Knowledge - Posted on 29 October 2002

Authored by: Bob Orf


A well-matched customer/prospect database is critical for optimal relationship matching. Without it, good customer relationships are jeopardized.

Let's assume a clean, well-matched customer/prospect database is the basis for all successful CRM activity. Let's also assume we understand that truly knowing your customer is the first step in solving the challenges that confront any business. Surprisingly, we have found that even those who agree that optimal relationship matching is important still lack a clear understanding of just how important it is, or of the substantial costs of a substandard database. When they do take the time to examine the missed opportunities, they are amazed. Not only are opportunities lost to mistargeted marketing, but existing good customer relationships are jeopardized by continued solicitation for products and services the customer already owns. In addition, unnecessary costs are incurred from wasted and duplicate mailings. A perfect database remains the goal, and major improvements in today's data cleansing technology make a higher level of accuracy achievable.

Now let's take a look at some of the challenges that lie between you and the perfect database, and provide a basis for you to evaluate your current database. Our examples treat the household as the primary marketing unit, composed of individuals owning one or more accounts. The goal is to define the issues so you can better understand them and see where and how your own database should be improved. Along the way, we identify key questions to ask when evaluating data quality and relationship linking solutions.

The first chore is to find the data. Once the data is found, the job of cleaning the data will need to be tackled.

Finding the Data

Finding the data may seem obvious and easy to an outsider, but it is not. Often, especially on older legacy systems, there are six or seven lines of name and address information. The first challenge is finding the name or names, the street address line or lines, and the city, state, and zip or post code. Is the name an individual or a company? Are we even dealing with a U.S. address?

Who is the Customer?

Once the name and address lines have been properly identified, the name lines must be analyzed. Syntaxes are used for this identification process. A syntax is a definition or code that reflects each word on a line and how it relates to other words on the same line. Many simple syntaxes will be obvious. The syntaxes that follow present several different types of challenges.
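
To make the idea of a syntax concrete, here is a minimal Python sketch that assigns a one-letter code to each word and joins the codes into a pattern. The code letters, the tiny lookup tables, and the function names are all assumptions made for illustration; they are not taken from any particular product, and a real system would use far larger tables.

```python
# Hypothetical lookup tables; a production system would use much larger ones.
TITLES = {"MR", "MRS", "MS", "DR"}
SUFFIXES = {"JR", "SR", "II", "III"}
CONNECTORS = {"AND", "OR", "&"}
FIRST_NAMES = {"ROBERT", "JOHN", "MARY", "JANE", "KATHERINE", "JAMES"}


def classify(word):
    """Return a one-letter code describing the role a word might play."""
    w = word.strip(".,").upper()
    if w in TITLES:
        return "T"  # title
    if w in SUFFIXES:
        return "S"  # generational suffix
    if w in CONNECTORS:
        return "C"  # connector
    if len(w) == 1 and w.isalpha():
        return "I"  # single letter, possibly an initial
    if w in FIRST_NAMES:
        return "F"  # known first name
    return "W"      # any other word


def syntax_of(line):
    """Build the syntax pattern for a whole name line."""
    return "".join(classify(w) for w in line.split())


print(syntax_of("Robert Smith A Minor"))  # FWIW
print(syntax_of("Robert A Minor"))        # FIW
```

Later rules can then be written against the pattern rather than against the raw words, which is what lets two similar-looking lines be treated differently.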

Is Robert a "Minor?"

Robert Smith A Minor

Robert A Minor

The phrase "A Minor" is a common phrase used in financial systems. Can your software accurately distinguish between these two examples? Removing phrases from the end of the name line can be dangerous if done independently of the syntax.
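
As a rough illustration of why phrase removal should depend on the syntax, the sketch below strips a trailing "A Minor" only when at least two name words come before it. That two-word threshold and the function name are assumptions chosen for this example, not a rule taken from any specific system.

```python
def strip_minor_phrase(line):
    """Remove a trailing 'A Minor' only if a full name precedes it.

    Returns (name_line, is_minor_account). The rule that two or more
    words must precede the phrase is an illustrative assumption.
    """
    words = line.split()
    tail = [w.upper() for w in words[-2:]]
    if tail == ["A", "MINOR"] and len(words) >= 4:
        return " ".join(words[:-2]), True
    return line, False


print(strip_minor_phrase("Robert Smith A Minor"))  # ('Robert Smith', True)
print(strip_minor_phrase("Robert A Minor"))        # ('Robert A Minor', False)
```

In the second line, "Minor" is left in place as the likely surname and "A" as a middle initial.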

What's the Surname?

Dela Smith

Dela Hoya Maria

A prefix word like Dela could also be someone's first name. Simply combining prefix-type words with the word that follows can lead to incorrect standardization. In the second name line, the last name comes first, but the syntax is especially tricky because the line also appears to begin with a first name.

John O Connor

John L O Connor

Mary Jane O Connor

Here the problem is how to determine the last name with certainty. While guessing will often work, guessing incorrectly will almost certainly result in a poor match, since Connor would be sorted far from O Connor.
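
One way to avoid committing to a single guess is to keep every plausible surname and generate a match key for each. The Python sketch below does this under simple assumptions: the prefix table is a small hypothetical sample, and the surname is assumed to sit at the end of the line.

```python
# Hypothetical sample of words that may be a surname prefix.
SURNAME_PREFIXES = {"O", "MC", "DE", "LA", "DELA", "VAN", "VON"}


def surname_candidates(line):
    """Return possible surnames, most specific first.

    Assumes the surname is at the end of the line. When the word before
    the last one could be a prefix, both readings are kept.
    """
    words = [w.strip(".,").upper() for w in line.split()]
    if not words:
        return []
    candidates = [words[-1]]
    if len(words) >= 3 and words[-2] in SURNAME_PREFIXES:
        candidates.insert(0, words[-2] + " " + words[-1])
    return candidates


print(surname_candidates("John O Connor"))       # ['O CONNOR', 'CONNOR']
print(surname_candidates("Mary Jane O Connor"))  # ['O CONNOR', 'CONNOR']
print(surname_candidates("John Smith"))          # ['SMITH']
```

Records can then be compared under either candidate, so a record stored as Connor is not separated from one stored as O Connor.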

Law Firm or Not?

John Smith and Dawn

Barnett Jones and Long

These two examples contain the same syntax. If the comma is missing as it is here, will your system be intelligent enough to distinguish between the two?

Estate Account or Realtor?

Kathy Dickins Real Estate

Kathy Dickins Estate

The ability to test phrases as single entities is extremely important under many circumstances. In the first example, two common errors would be to: identify Estate as Kathy's surname; or categorize it as an estate account, assigning Real as the surname. Either error would cause these two records to fail to match.
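
Testing phrases as single entities can be sketched as a lookup that tries longer phrases before shorter ones, so that "Real Estate" is recognized before "Estate" is considered on its own. The phrase table, category labels, and function name below are hypothetical.

```python
# Hypothetical phrase table: trailing phrase -> category.
TRAILING_PHRASES = {
    ("REAL", "ESTATE"): "business",
    ("ESTATE",): "estate account",
}


def classify_trailing_phrase(line):
    """Split a name line into (name, category), testing longest phrases first."""
    original = line.split()
    words = [w.upper() for w in original]
    for phrase in sorted(TRAILING_PHRASES, key=len, reverse=True):
        n = len(phrase)
        # Require at least one name word in front of the phrase.
        if len(words) > n and tuple(words[-n:]) == phrase:
            return " ".join(original[:-n]), TRAILING_PHRASES[phrase]
    return line, "individual"


print(classify_trailing_phrase("Kathy Dickins Real Estate"))
# ('Kathy Dickins', 'business')
print(classify_trailing_phrase("Kathy Dickins Estate"))
# ('Kathy Dickins', 'estate account')
```

Both lines yield the same name, Kathy Dickins, and neither "Real" nor "Estate" is mistaken for a surname.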

Individual or Business?

Katherine Organ

Sussex Organ

Tampa Branch

James Branch

Once again, the same simple syntax exists for two very different situations. A complete and accurate first name table is the only way to distinguish one from the other.
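
In its simplest form, a first name table check might look like the sketch below. The table here holds only a handful of hypothetical entries; the point of the paragraph above is that the real table must be complete and accurate for this test to be reliable.

```python
# Tiny illustrative sample; a real first name table would be far larger.
FIRST_NAMES = {"KATHERINE", "JAMES", "JOHN", "MARY"}


def looks_like_individual(line):
    """Treat the line as a person if its first word is a known first name."""
    words = line.split()
    return bool(words) and words[0].upper() in FIRST_NAMES


for name in ["Katherine Organ", "Sussex Organ", "Tampa Branch", "James Branch"]:
    print(name, "->", "individual" if looks_like_individual(name) else "business")
```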

Too Many Titles!

Mr. Major Bishop Junior

All four words could be titles. Will this confuse your current software?

Snow Removal

Owen Marketing Corp Trustee IRA DTD 9/01/98

John Owen

Since the only text the two lines share is the four-character word Owen, applying any matching algorithm to the raw lines would surely fail. To successfully match John to his company, the "snow" must first be removed, leaving the clean company name Owen Marketing Corp. Even then, Owen makes up only 4 of the 17 letters in the line (not counting spaces), or 23.5 percent. Only after determining an appropriate weighting factor for each word can these lines be accurately matched so that Owen, the only important word in the first example, can be cross-referenced to John's last name.
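
The arithmetic and the effect of word weighting can be sketched in Python as follows. The noise-word list, the date pattern, and especially the weight values are assumptions picked for illustration; choosing real weights is the hard part and is not shown here.

```python
import re

# Hypothetical "snow" words and word weights.
NOISE_WORDS = {"TRUSTEE", "IRA", "DTD"}
WORD_WEIGHTS = {"MARKETING": 0.1, "CORP": 0.05}  # common business words count for little
DEFAULT_WEIGHT = 1.0
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")


def remove_snow(line):
    """Drop noise words and date-like tokens, returning the remaining words."""
    kept = []
    for w in line.upper().split():
        if w in NOISE_WORDS or DATE_PATTERN.fullmatch(w):
            continue
        kept.append(w)
    return kept


def char_share(word, words):
    """Share of the line's letters contributed by one word (spaces ignored)."""
    return len(word) / sum(len(w) for w in words)


def weighted_share(word, words):
    """Share of the line's total weight contributed by one word."""
    total = sum(WORD_WEIGHTS.get(w, DEFAULT_WEIGHT) for w in words)
    return WORD_WEIGHTS.get(word, DEFAULT_WEIGHT) / total


clean = remove_snow("Owen Marketing Corp Trustee IRA DTD 9/01/98")
print(clean)                                     # ['OWEN', 'MARKETING', 'CORP']
print(round(char_share("OWEN", clean), 3))       # 0.235
print(weighted_share("OWEN", clean) > 0.8)       # True
```

With these illustrative weights, Owen goes from under a quarter of the line by letter count to most of the line by weight, which is what allows it to be cross-referenced to a surname.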

How Many Names?

Mr. Morris V Bates Jr. (Ben) Donna M as Trustee

Parsing this syntax is no trivial matter. Donna's last name, Bates, never appears with her first name. It's impossible to know your customers if you don't even know their names.

Complex Titles

Robert Elliot Director of Mktg

Can your system correctly identify Robert's last name? Can it place the title, Director of Mktg, in a separate fixed field?

Company vs. Contact

James Garrison Charles Schwab

Charles Schwab James Garrison

Since all words match exactly but are transposed, a medium-quality system may match them successfully. However, it is doubtful that the same system will correctly separate the company name from the contact name. Advanced systems will in fact recode "Charles Schwab" to "Charles Schwab & Company." This advanced recoding should be available for all business names.
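
The difference between matching the two lines and actually separating company from contact can be sketched as below. The order-independent key shows why even a simple system can match them; the separation step relies on a table of known business names, which here is a one-entry hypothetical stand-in.

```python
# Hypothetical table of known business names and their recoded forms.
KNOWN_COMPANIES = {("CHARLES", "SCHWAB"): "Charles Schwab & Company"}


def word_key(line):
    """Order-independent key: transposed lines produce the same key."""
    return tuple(sorted(w.upper() for w in line.split()))


def split_company_contact(line):
    """Return (company, contact); company is None if no known name is found."""
    words = line.split()
    upper = [w.upper() for w in words]
    for key, full_name in KNOWN_COMPANIES.items():
        n = len(key)
        for i in range(len(upper) - n + 1):
            if tuple(upper[i:i + n]) == key:
                contact = " ".join(words[:i] + words[i + n:])
                return full_name, contact
    return None, line


a = "James Garrison Charles Schwab"
b = "Charles Schwab James Garrison"
print(word_key(a) == word_key(b))    # True
print(split_company_contact(a))      # ('Charles Schwab & Company', 'James Garrison')
print(split_company_contact(b))      # ('Charles Schwab & Company', 'James Garrison')
```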

Where Is the Customer?

It should be obvious by now that naming syntaxes can present considerable challenges, yet overcoming them only solves part of the overall problem. Once you have successfully standardized your names, you must tackle these same issues with your addresses.

Dual Standardization

If your current software relies solely on a CASS-certified product for address standardization, then your matching results aren't what they should be. CASS certification, by definition, applies a strict set of rules that must be adhered to. Addresses not found within the address coding guide are often not broken out into fixed fields and therefore cannot be matched to other records. In the following example,

John Smith, 123 Hawthorne Ave., Tampa, FL 33618

John Smith, 123 Hawt, Tampa, FL 33618

HAWT street does not exist in the guide and was not properly identified as a street by CASS software. Therefore these records could not be matched. The answer is to use a dual standardization method that standardizes outside the strict CASS certified rules and then utilizes certain CASS returned fields such as corrected zip codes. Using this process, the best result is chosen, field-by-field, from each system. Let's look at one more example:
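
The field-by-field choice can be sketched as a merge of two parsed records, one from an internal standardizer and one from CASS software. Everything here is an assumption for illustration: the dictionary layout, the field names, and the rule that CASS wins only for the fields listed in `prefer_cass`. No real CASS product's interface is implied.

```python
def merge_standardized(internal, cass, prefer_cass=("zip",)):
    """Pick the best value for each field from two standardization results.

    CASS values are preferred for the fields named in prefer_cass;
    otherwise the internal value is used, falling back to the CASS value
    when the internal one is missing.
    """
    merged = {}
    for field in sorted(set(internal) | set(cass)):
        ours, theirs = internal.get(field), cass.get(field)
        if field in prefer_cass and theirs:
            merged[field] = theirs
        else:
            merged[field] = ours or theirs
    return merged


# The CASS result has no street because HAWT is not in its guide.
internal = {"house_number": "123", "street": "HAWT", "zip": "33618"}
cass = {"house_number": None, "street": None, "zip": "33618"}
print(merge_standardized(internal, cass))
# {'house_number': '123', 'street': 'HAWT', 'zip': '33618'}
```

The merged record keeps the internally parsed street, so the abbreviated address still has fixed fields that can be compared with the full Hawthorne Ave. record.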

24 A Federal Dr.

24A Federal Dr.

24 Federal Dr. Ste. A

Most, if not all, CASS software won't correctly identify the second syntax. So, you should internally standardize this type of data.
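
A minimal sketch of such internal standardization is shown below: it reduces all three forms to the same house number, unit, and street. The regular expressions and field names are assumptions for this example, and the rule is deliberately naive. In particular, it would misread a single-letter directional, as in "24 E Main St.", as a unit, so a real implementation would need a directional check.

```python
import re


def parse_address(line):
    """Split a street line into house number, unit, and street name."""
    text = line.upper().replace(".", "").strip()
    unit = None
    # Trailing unit designator, e.g. "STE A".
    m = re.search(r"\s+(?:STE|APT|UNIT)\s+(\w+)$", text)
    if m:
        unit = m.group(1)
        text = text[:m.start()]
    # House number, optional single-letter unit, then the street.
    m = re.match(r"(\d+)\s*([A-Z])?\s+(.+)$", text)
    if not m:
        return None
    number, letter, street = m.groups()
    return {"number": number, "unit": unit or letter, "street": street}


for addr in ["24 A Federal Dr.", "24A Federal Dr.", "24 Federal Dr. Ste. A"]:
    print(parse_address(addr))
# Each line prints {'number': '24', 'unit': 'A', 'street': 'FEDERAL DR'}
```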

The examples could go on endlessly. The key point to remember is that although matching is far from an exact science, major improvements have been made in data cleansing technology. Evaluating your system's data quality and relationship linking capabilities is highly recommended. In fact, the cost of not addressing your data quality issues could be significant. Proper identification and standardization of names and addresses is often overlooked, but it should be the first step.

About the Author
Datamentors

Bob Orf, president and chief executive officer of DataMentors, Inc., began his technical career in 1979 as a consultant with Time Sharing Resources, a New York-based time-sharing company. In 1987, Mr. Orf co-founded OKRA Marketing Corporation, a database marketing company. Nine years later, OKRA was acquired by John H. Harland Company. In January 1999, Mr. Orf co-founded DataMentors, working with former staff of expert programmers and analysts.
