The Art of Relationship Matching – Identifying Key Data Elements
Let's assume that a clean, well-matched customer/prospect database is the basis for all successful CRM activity. Let's also assume that truly knowing your customer is the first step in solving the challenges that confront any business. Surprisingly, we have found that even those who agree that optimal relationship matching is important often lack a clear understanding of just how important it is, or of the substantial costs of a substandard database. When they do take the time to examine the missed opportunities, they are amazed. Not only are opportunities lost through mistargeted marketing, but good existing customer relationships are jeopardized by continued solicitation for products and services the customer already owns. In addition, unnecessary costs are incurred from wasted and duplicate mailings. As you strive for perfection, you can take advantage of major improvements in today's data cleansing technology to realize a higher level of accuracy.
Now let's take a look at some of the challenges that lie between you and the perfect database, and provide a basis for you to evaluate your current database. Our examples treat the household as the primary marketing unit, composed of individuals who own one or more accounts. Using this model, our goal is to define the issues so you can better understand them and see where and how your own database should be improved. We've identified key questions to ask when evaluating data quality and relationship linking solutions.
The first chore is to find the data. Once it is found, the job of cleaning it can be tackled.
Finding the Data
Finding the data may look easy to an outsider, but it is not. Often, especially on older legacy systems, there are six or seven lines of name and address information. The first challenge is finding the name or names, the street address line or lines, and the city, state, and zip or postal code. Is the name an individual or a company? Are we even dealing with a U.S. address?
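As a rough illustration of this first step, the Python sketch below labels each raw line as a name, street, or city/state/zip line. The regular expressions and the function name label_lines are assumptions made for this example. They cover only simple U.S.-style lines, and a production system would need far richer rules.

```python
import re

# Assumed pattern: "City, ST 12345" or "City ST 12345-6789" (U.S. style only).
CITY_STATE_ZIP = re.compile(r"^(.+?),?\s+([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$")
# Assumed pattern: a street line starts with a house number.
STREET = re.compile(r"^\d+\s+\S+")

def label_lines(lines):
    """Label each raw line as 'city_state_zip', 'street', or 'name'."""
    labels = []
    for line in lines:
        text = line.strip()
        if CITY_STATE_ZIP.match(text):
            labels.append("city_state_zip")
        elif STREET.match(text):
            labels.append("street")
        else:
            # Anything else is treated as a name line in this sketch.
            labels.append("name")
    return labels
```

Anything that fails both patterns falls through to "name", which is exactly where the harder questions in the next section begin.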
Who is the Customer?
Once the name and address lines have been properly identified, the name lines must be analyzed. Syntaxes are used for this identification process. A syntax is a definition or code that reflects each word on a line and how it relates to other words on the same line. Many simple syntaxes will be obvious. The syntaxes that follow present several different types of challenges.
Is Robert a "Minor?"
Robert Smith A Minor
Robert A Minor
The phrase "A Minor" is common in financial systems. Can your software accurately distinguish between these two examples? Removing phrases from the end of the name line can be dangerous if done independently of the syntax.
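One way to picture a syntax-aware rule is the Python sketch below: it strips a trailing "A Minor" only when a complete first and last name would remain. The tiny FIRST_NAMES table and the function name are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical first-name table; a production table would be far larger.
FIRST_NAMES = {"robert", "john", "mary"}

def strip_minor_suffix(name_line):
    """Return (cleaned_line, is_minor); strip 'A Minor' only if the syntax allows it."""
    words = name_line.split()
    ends_with_phrase = [w.lower() for w in words[-2:]] == ["a", "minor"]
    remaining = words[:-2]
    # Syntax check: what is left must still be at least FIRST + LAST.
    if (ends_with_phrase and len(remaining) >= 2
            and remaining[0].lower() in FIRST_NAMES):
        return " ".join(remaining), True
    return name_line, False
```

Under these assumptions, "Robert Smith A Minor" becomes "Robert Smith" flagged as a minor, while "Robert A Minor" is left alone because removing the phrase would leave no surname.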
What's the Surname?
Dela Smith
Dela Hoya Maria
A prefix word like Dela could also be someone's first name. Simply combining prefix-type words with the word that follows can lead to incorrect standardization. In the second name line, the last name comes first, but the syntax is especially tricky because the line also appears to begin with a first name.
John O Connor
John L O Connor
Mary Jane O Connor
Here the problem is how to determine the last name with certainty. While guessing will often work, guessing incorrectly will almost certainly result in a poor match, since Connor would be sorted far from O Connor.
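The Python sketch below shows one possible heuristic for the O Connor lines: join a known surname prefix to the final word, and flag the case that could just as easily be first name, middle initial, last name. The prefix table, the function name, and the ambiguity rule are all assumptions for illustration, and the heuristic does not remove the underlying uncertainty.

```python
# Hypothetical table of surname prefixes.
SURNAME_PREFIXES = {"o", "mc", "de", "van"}

def split_surname(name_line):
    """Return (given_names, surname, ambiguous) for a FIRST ... LAST line."""
    words = name_line.split()
    if len(words) >= 3 and words[-2].lower() in SURNAME_PREFIXES:
        given = words[:-2]
        surname = f"{words[-2]} {words[-1]}"
        # A lone given name followed by a single letter could also be
        # FIRST + MIDDLE INITIAL + LAST, so mark it for review.
        ambiguous = len(given) == 1 and len(words[-2]) == 1
        return given, surname, ambiguous
    return words[:-1], words[-1], False
```

All three example lines come out with the surname "O Connor", but only "John O Connor" is flagged as ambiguous, so it could be routed to a stricter rule or a manual check rather than guessed.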
Law Firm or Not?
John Smith and Dawn
Barnett Jones and Long
These two examples contain the same syntax. If the comma is missing as it is here, will your system be intelligent enough to distinguish between the two?
Estate Account or Realtor?
Kathy Dickins Real Estate
Kathy Dickins Estate
The ability to test phrases as single entities is extremely important under many circumstances. In the first example, two common errors would be to identify Estate as Kathy's surname, or to categorize the record as an estate account and assign Real as the surname. Either error would cause these two records to fail to match.
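Testing phrases as single entities can be sketched in Python by checking the longest phrase first, so that "Real Estate" is recognized before the single word "Estate" is considered. The phrase table, record-type labels, and function name below are hypothetical.

```python
# Hypothetical phrase table, ordered longest phrase first.
PHRASES = [
    (("real", "estate"), "business"),
    (("estate",), "estate_account"),
]

def classify_trailing_phrase(name_line):
    """Return (base_name, record_type) after testing trailing phrases as units."""
    words = name_line.split()
    lowered = [w.lower() for w in words]
    for phrase, record_type in PHRASES:
        n = len(phrase)
        # Require at least one word to remain as the base name.
        if len(lowered) > n and tuple(lowered[-n:]) == phrase:
            return " ".join(words[:-n]), record_type
    return name_line, "individual"
```

Both example lines reduce to the base name "Kathy Dickins" with different record types, which gives a matching step something consistent to link on.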
Individual or Business?
Katherine Organ
Sussex Organ
Tampa Branch
James Branch
Once again, the same simple syntax exists for two very different situations. A complete and accurate first name table is the only way to distinguish one from the other.
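In its simplest form, a first-name table lookup could look like the Python sketch below. The four-entry table is a stand-in for the complete table the text calls for, and the function name is an assumption.

```python
# Hypothetical first-name table; completeness is what makes this work in practice.
FIRST_NAMES = {"katherine", "james", "john", "mary"}

def classify_two_word_line(name_line):
    """Classify a two-word line as 'individual' or 'business' by its first word."""
    first_word = name_line.split()[0].lower()
    return "individual" if first_word in FIRST_NAMES else "business"
```

With this table, Katherine Organ and James Branch are classified as individuals, while Sussex Organ and Tampa Branch are classified as businesses. A missing first name in the table would flip the result, which is why completeness matters.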
Too Many Titles!
Mr. Major Bishop Junior
All four words could be titles. Will this confuse your current software?
Snow Removal
Owen Marketing Corp Trustee IRA DTD 9/01/98
John Owen
Since the only word the two lines share is the four-character Owen, applying a matching algorithm to these two examples as they stand would surely fail. To successfully match John to his company, the "snow" must first be removed, leaving the clean company name Owen Marketing Corp. Even then, Owen makes up only 4 of the line's 17 characters (spaces excluded), or 23.5 percent. Only after determining an appropriate weighting factor for each word can these lines be accurately matched so that Owen, the only important word in the first example, can be cross-referenced to John's last name.
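The two steps described here, snow removal followed by word weighting, can be sketched in Python as follows. The snow word list, the date pattern, and the weights are illustrative assumptions, not values taken from any particular product.

```python
import re

# Hypothetical "snow" vocabulary and a simple date pattern such as 9/01/98.
SNOW_WORDS = {"trustee", "ira", "dtd"}
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")

# Hypothetical weights: common business words count for very little.
WORD_WEIGHTS = {"marketing": 0.1, "corp": 0.05}
DEFAULT_WEIGHT = 1.0

def remove_snow(line):
    """Drop snow words and dates, keeping the remaining words in order."""
    kept = [w for w in line.split()
            if w.lower() not in SNOW_WORDS and not DATE_PATTERN.fullmatch(w)]
    return " ".join(kept)

def weighted_overlap(line_a, line_b):
    """Share of line_a's total word weight that also appears in line_b."""
    words_a = {w.lower() for w in line_a.split()}
    words_b = {w.lower() for w in line_b.split()}
    total = sum(WORD_WEIGHTS.get(w, DEFAULT_WEIGHT) for w in words_a)
    shared = sum(WORD_WEIGHTS.get(w, DEFAULT_WEIGHT) for w in words_a & words_b)
    return shared / total if total else 0.0
```

With these illustrative weights, the cleaned line Owen Marketing Corp scores roughly 0.87 against John Owen, while the original line with its snow intact scores under 0.2.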
How Many Names?
Mr. Morris V Bates Jr. (Ben) Donna M as Trustee
Parsing this syntax is no trivial matter. Donna's last name, Bates, never appears with her first name. It's impossible to know your customers if you don't even know their names.
Complex Titles
Robert Elliot Director of Mktg
Can your system correctly identify Robert's last name? Can it place the title, Director of Mktg, in a separate fixed fielded area?
Company vs. Contact
James Garrison Charles Schwab
Charles Schwab James Garrison
Since all words match exactly but are transposed, a medium-quality system may match them successfully. However, it is doubtful that the same system will correctly separate the company name from the contact name. Advanced systems will in fact recode "Charles Schwab" to "Charles Schwab & Company." This advanced recoding should be available for all business names.
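One hedged way to separate company from contact is to look for a known business name anywhere in the line and treat whatever is left as the contact. The Python sketch below assumes a small lookup table, KNOWN_COMPANIES, that maps a matched phrase to its recoded form. The table and function name are hypothetical.

```python
# Hypothetical table: matched words -> recoded (standard) business name.
KNOWN_COMPANIES = {
    ("charles", "schwab"): "Charles Schwab & Company",
}

def split_company_contact(line):
    """Return (standard_company, contact); company is None when nothing matches."""
    words = line.split()
    lowered = [w.lower() for w in words]
    for key, standard_name in KNOWN_COMPANIES.items():
        n = len(key)
        # Slide over the line so the company is found in any position.
        for i in range(len(words) - n + 1):
            if tuple(lowered[i:i + n]) == key:
                contact = " ".join(words[:i] + words[i + n:])
                return standard_name, contact
    return None, line
```

Both word orders then yield the same pair, the recoded company name and the contact James Garrison, so the records match and are also fielded consistently.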
Where Is the Customer?
It should be obvious by now that naming syntaxes can present considerable challenges, yet overcoming them only solves part of the overall problem. Once you have successfully standardized your names, you must tackle these same issues with your addresses.
Dual Standardization
If your current software relies solely on a CASS-certified product for address standardization, your matching results are probably not as good as they could be. CASS certification, by definition, applies a strict set of rules that must be adhered to. Addresses not found within the address coding guide are often not fixed-fielded and therefore will not be matched to other records. In the following example,
John Smith 123 Hawthorne Ave. Tampa, FL 33618
John Smith 123 Hawt Tampa, FL 33618
HAWT street does not exist in the guide and was not properly identified as a street by the CASS software. Therefore, these records could not be matched. The answer is to use a dual standardization method that standardizes outside the strict CASS-certified rules and then utilizes certain CASS-returned fields, such as corrected zip codes. Using this process, the best result is chosen, field by field, from each system. Let's look at one more example:
24 A Federal Dr.
24A Federal Dr.
24 Federal Dr. Ste. A
Most, if not all, CASS software won't correctly identify the second syntax. So, you should internally standardize this type of data.
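The dual standardization idea can be sketched in Python in two parts: an internal standardizer that fields a street line on its own, and a merge step that picks a value field by field. Everything here is an assumption for illustration. The field names are invented, the CASS result is represented as a plain dictionary rather than any real product's output, and the single-letter unit rule would misread a directional such as "24 E Main St" without an additional directional table.

```python
import re

def standardize_internally(address_line):
    """Field a street line into house number, street, and unit."""
    text = address_line.replace(".", "").strip()
    # Form: "24 Federal Dr Ste A"
    m = re.fullmatch(r"(\d+)\s+(.+?)\s+(?:Ste|Apt|Unit)\s+(\w+)", text, re.IGNORECASE)
    if m:
        return {"house_number": m.group(1), "street": m.group(2),
                "unit": m.group(3).upper()}
    # Forms: "24 A Federal Dr" and "24A Federal Dr"
    m = re.fullmatch(r"(\d+)\s*([A-Za-z])\s+(.+)", text)
    if m:
        return {"house_number": m.group(1), "street": m.group(3),
                "unit": m.group(2).upper()}
    # Plain form: "123 Hawt"
    m = re.fullmatch(r"(\d+)\s+(.+)", text)
    if m:
        return {"house_number": m.group(1), "street": m.group(2), "unit": ""}
    return {"house_number": "", "street": text, "unit": ""}

# Hypothetical choice: trust the CASS value for these fields when it is present.
CASS_PREFERRED_FIELDS = {"zip"}

def merge_fields(internal, cass):
    """Choose each field from the CASS result or the internal result."""
    merged = {}
    for field in sorted(internal.keys() | cass.keys()):
        internal_value = internal.get(field)
        cass_value = cass.get(field)
        if field in CASS_PREFERRED_FIELDS and cass_value:
            merged[field] = cass_value
        else:
            # Fall back to CASS only when the internal pass found nothing.
            merged[field] = internal_value or cass_value
    return merged
```

Under these assumptions, all three Federal Dr. forms produce the same fields (house number 24, street Federal Dr, unit A). For 123 Hawt, the internal pass still supplies a street when the CASS dictionary has none, while the zip code is taken from the CASS side.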
The examples could go on endlessly. The key point to remember is that although matching is far from an exact science, major improvements have been made in data cleansing technology. Evaluating your system's data quality and relationship linking capabilities is highly recommended. In fact, the cost of not addressing your data quality issues could be significant. Proper identification and standardization of names and addresses is often overlooked, but it should be the first step.

