
Customer Data Quality: Beyond the Bucket and Broom



mThink Knowledge - Posted on 29 October 2002

Authored by: Doug Laney, META Group
The competitive need to extend available customer data requires data quality solutions that transcend mere batch-oriented data cleansing.

During 2001 and 2002, leading enterprises began managing and leveraging information as tangible assets. Over the next two years, evolving accounting and auditing principles will engender accepted information valuations and tags for purposes of origination, privacy, quality, usage, and so forth. By 2005, these changes will drive online intellectual capital marketplaces and information service "banks."

CRM implementations continue to be limited by data integration and quality issues, which are more often stumbled upon during the course of development than anticipated or planned for. So pervasive are data-related issues that data quality ranks in the top 35 percent of project management concerns, and data integration is the top architectural challenge noted by users in META Group's latest data warehouse and analytics industry study. Still, most data quality implementations fail to leverage the variety of solutions available to achieve competitive advantage.

Indeed, attention to data quality is considered one of IT's many "necessary evils," and being diligent about data quality can seem like janitorial work far removed from generating business value. The truth is the opposite. Incremental improvements, particularly to customer data, lead to significant business performance gains in each phase of the customer lifecycle (engage, transact, fulfill, service) by improving prospecting, transaction and fulfillment accuracy, personalization, and customer satisfaction. Just as data warehousing and analytic solutions have become a common denominator for enhanced business performance, data quality practices, techniques, or technologies will become embedded into 95 percent of all CRM and e-business initiatives in the near future. By 2003, leading packaged operational, analytical, and information management solutions, such as middleware, information logistics, and metadata, will evolve to embody a variety of data quality capabilities through company acquisitions and partnerships. Following in the wake of these mergers (2004-2005) will be a noticeable (and likely off-shore) black market for private household and business information.

While data quality needs are typically believed to concern only data accuracy, myriad other types of data quality issues persist, each demanding its own distinct solution. Enterprises typically require data quality solutions that fall into more than one of four patterns (validation, standardization, correction, enrichment), and vendors similarly offer a mix of capabilities within each pattern (Figure 1).

Figure 1 — Data Quality Solution Providers

Data Quality Patterns

Data Validation

At any point in an organization's information supply chain, data is subject to injected correctness, completeness, and integrity errors. Particularly during the course of a manually entered transaction, data should be parsed, matched, and confirmed against an authoritative source: either an internal master database or an information content provider. Parsing identifies tokens such as surname or postal code; matching performs a lookup against an existing source; and confirmation completes the validation process by applying business rules or templates that indicate the data's degree of fitness to continue flowing through the information supply chain.
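The parse, match, and confirm steps can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the comma-separated input layout, the in-memory `MASTER_POSTAL_CODES` table standing in for an authoritative source, and the three verdicts (`accept`, `review`, `reject`) are all assumptions made for the example.

```python
# Hypothetical authoritative source: postal code -> city.
# A real deployment would query a master database or a content provider.
MASTER_POSTAL_CODES = {"06901": "Stamford", "10001": "New York"}


def parse(raw):
    """Parsing: split 'Surname, City, PostalCode' into named tokens."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        return None  # input does not fit the assumed layout
    surname, city, postal_code = parts
    return {"surname": surname, "city": city, "postal_code": postal_code}


def match(tokens):
    """Matching: look the postal code up in the authoritative source."""
    return MASTER_POSTAL_CODES.get(tokens["postal_code"])


def confirm(tokens, matched_city):
    """Confirmation: apply illustrative business rules to grade fitness."""
    if not tokens["surname"]:
        return "reject"   # a record without a surname cannot proceed
    if matched_city is None:
        return "review"   # postal code unknown to the source
    if matched_city.lower() != tokens["city"].lower():
        return "review"   # city and postal code disagree
    return "accept"


def validate(raw):
    """Run the three steps in order and return the verdict."""
    tokens = parse(raw)
    if tokens is None:
        return "reject"
    return confirm(tokens, match(tokens))
```

Under these assumptions, `validate("Laney, Stamford, 06901")` is accepted, while the same surname paired with a city that disagrees with the postal code is routed to review rather than silently passed along.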

Data Standardization

The ever-increasing variety of data sources flowing into organizations drives the need for robust functions that transform validated data into enterprise-accepted and application-digestible formats. Even two XML documents adhering to the same document type definition (DTD) may differ in scale, precision, or format. During the standardization process, tokens are rearranged, reformatted, and/or integrated into defined templates; for example, all address data might be converted into a four-line format.
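As a rough sketch of standardization, the function below rearranges already-parsed address tokens into a four-line template. The article does not specify what the four lines contain, so the layout used here (name, street, optional unit, city/region/postal code) and the token names are assumptions for illustration only.

```python
def squeeze(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def to_four_line_address(tokens):
    """Arrange parsed tokens into an assumed four-line template.

    Line 1: name, line 2: street, line 3: unit (may be empty),
    line 4: city, region, and postal code.
    """
    name = squeeze(tokens["name"]).title()
    street = squeeze(tokens["street"]).title()
    unit = squeeze(tokens.get("unit", "")).title()
    city_line = "{}, {} {}".format(
        squeeze(tokens["city"]).title(),
        tokens["region"].strip().upper(),
        tokens["postal_code"].strip(),
    )
    return [name, street, unit, city_line]
```

Returning a fixed-length list, with an empty third line when no unit is present, keeps every record in the same shape for downstream applications, which is the point of a template.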

Data Correction

A second offshoot of the validation process involves repairing data that is determined to be wrong, such as misspelled, transposed, out-of-date, or otherwise inaccurate information. To fix this corporate "flat tire," enterprises must often select between two distinct methods of data correction: heuristic, which applies an intelligent repair process; and lookup, which replaces values with those believed to be more correct based on established "survivorship" rules.
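The two correction methods can be contrasted in a short sketch. Here the heuristic repair is approximated with fuzzy string matching from Python's standard `difflib` module, and the survivorship rule is assumed to be "the most recently updated non-empty value wins." Both choices, along with the `KNOWN_CITIES` list and the record layout, are illustrative assumptions rather than methods described in the article.

```python
import difflib

# Hypothetical reference list used by the heuristic repair.
KNOWN_CITIES = ["Stamford", "Hartford", "New Haven"]


def correct_heuristic(value, candidates=KNOWN_CITIES, cutoff=0.8):
    """Heuristic correction: replace a value with its closest known spelling.

    If no candidate is similar enough, the value is returned unchanged.
    """
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def correct_by_lookup(records):
    """Lookup correction under an assumed survivorship rule.

    Each record is a dict with 'value' and an ISO-format 'updated' date;
    the most recently updated non-empty value survives.
    """
    populated = [r for r in records if r["value"]]
    if not populated:
        return None
    return max(populated, key=lambda r: r["updated"])["value"]
```

The heuristic path repairs a transposition such as "Stamfrod" without needing a second source, while the lookup path depends entirely on how the survivorship rule ranks competing sources.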

Data Enrichment

For both users and vendors, advancing into the seemingly infinite data quality frontier requires the extension and expansion of existing data. More and more, enterprises are looking to their business partners, industry organizations, and information content providers to enrich their stores of customer data. Data enrichment through extension most often takes the form of list generation (e.g., "households with characteristics akin to our most profitable customers," or general demographic/spatial/census data), while enrichment via expansion includes everything from completing missing data, to tacking on syndicated geographic, household, or postal fields (for example, using barcodes for discounted mailing).
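Enrichment via expansion can be pictured as filling gaps in a customer record and appending fields from a syndicated source. The sketch below assumes a hypothetical syndicated data set keyed by postal code and a rule that supplied data never overwrites a value the enterprise already holds; neither detail comes from the article.

```python
def enrich(customer, syndicated_by_postal_code):
    """Return a copy of the record expanded with syndicated fields.

    Missing or empty fields are completed; new fields are appended;
    existing non-empty values are left untouched.
    """
    extra = syndicated_by_postal_code.get(customer.get("postal_code"), {})
    enriched = dict(customer)  # leave the caller's record unmodified
    for field, value in extra.items():
        if enriched.get(field) in (None, ""):
            enriched[field] = value
    return enriched
```

Working on a copy rather than in place makes it easier to compare the record before and after enrichment, which matters once auditing and privacy concerns enter the picture.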

Users planning data quality solutions must not only consider which data quality pattern(s) they need to apply and where, but also determine the overall characteristics of these solutions.

Prominent enterprise-class customer data quality solutions for one or more patterns include those from Firstlogic, Group 1, Trillium Software, and Vality, while those targeting mid-tier, departmental, or vertical solutions include Arkidata, Data Mentors, DataFlux/SAS, and Sagent. Through 2002 and 2003, users should expect vastly improved partnering (including M&A) by data quality providers with other information supply chain component vendors (e.g., ETLM, EAI, data profiling, business intelligence/analytic applications, data mining, DBMS), along with an array of data quality-related ASP offerings and ICPs serving up a limitless palette of certified, privatized, aggregated, industry-specific, benchmark, and unstructured information.

Data Quality Solution Characteristics and Concerns

Latency

E-business applications, for example, generally require real-time execution in all four areas, whereas sales/marketing applications may find batch execution more cost-effective.

Customer class

Business data quality solutions in B2B applications are often entirely distinct from consumer/household (B2C) solutions.

Globalization

Enterprises doing business outside the United States or North America must consider data quality solutions that can expressly handle other countries’ name/address idiosyncrasies and provide access to international postal files.

Auditing

Each of the four data quality patterns involves a process that may require auditing of the logic used to make validity determinations and record modifications.
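One way to make such logic auditable is to record every modification alongside the rule that caused it. The helper below is a minimal sketch under assumed names (`apply_with_audit`, `audit_log`, and the log entry fields are hypothetical), and it expects the supplied transform to handle whatever value the field currently holds.

```python
from datetime import datetime, timezone


def apply_with_audit(record, field, rule_name, transform, audit_log):
    """Apply a transform to one field and log the change if there is one.

    Returns a new record; appends an entry to audit_log only when the
    value actually changes.
    """
    before = record.get(field)
    after = transform(before)
    if after != before:
        audit_log.append({
            "field": field,
            "rule": rule_name,
            "before": before,
            "after": after,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    updated = dict(record)
    updated[field] = after
    return updated
```

For example, calling it with `str.upper` as the transform on a lowercase state code produces the corrected record plus one log entry naming the rule, the old value, and the new value.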

Platform Support

Some premier data quality solutions still require flat file input (i.e., no DBMS support), which may be time/cost prohibitive, or require staging data to a supported computing environment.

Tool Integration

Some Extract-Transform-Load (ETL) vendors offer hooks into one or more data quality products.

Application Integration

Many data quality products are offered in the form of one or more types of API or object (e.g., COM+, EJB, CORBA, Java/JNI) to interoperate with production systems rather than be executed in a standalone manner.

Synchronization and Redundancy

Enterprises must determine how to manage the flow of standardized, corrected, and enriched data to each place it may be replicated, and/or how to eliminate unnecessary duplicates.
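Duplicate elimination usually hinges on a match key that treats superficially different records as the same customer. The sketch below assumes a deliberately simple key (case- and whitespace-normalized name plus postal code) and a "first record seen survives" rule; real products apply far richer matching, so both choices are illustrative only.

```python
def match_key(record):
    """Build an assumed match key: normalized name plus postal code."""
    name = " ".join(record["name"].lower().split())
    return (name, record["postal_code"].strip())


def eliminate_duplicates(records):
    """Keep the first record seen for each match key, in input order."""
    survivors = {}
    for record in records:
        survivors.setdefault(match_key(record), record)
    return list(survivors.values())
```

The same key could also drive synchronization, since it identifies which replicated copies refer to one customer and therefore need to receive the same corrected values.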

Privacy

With the power of some data enrichment solutions, enterprises are increasingly finding that they can triangulate to derive customer information that has not been explicitly offered by the customer.

Business Impact and the Bottom Line

Formal data quality practices are no longer optional for maintaining sufficient levels of business performance and managing operational costs.

Data quality solutions have moved beyond simple cleansing functions to address the pressing need to enrich enterprise information assets. IT organizations are ill-advised to hand-code data quality solutions or to use tools not explicitly suited for the purpose (e.g., ETLM, EAI, DBMS triggers), and should plan accordingly for selecting and integrating specific technologies to handle each type of data quality need.

About the Author
META Group
Doug Laney is a vice president with META Group’s Application Delivery Strategies service. Mr. Laney is an experienced data warehouse practitioner and author on business intelligence information architecture, decision support system project methodology, consulting practice management, information valuation, and data warehouse development tools.
