Using Data With Integrity

A crucial element of building and maintaining a database is data integrity – making sure your information is accurate and in the correct format, unchanged by technical glitches. You can personally verify information if you are entering it manually, but when users are filling out forms on your Web site, and that information goes directly into your database, you need another approach.

There are two places to check data validity: on the client and on the server. Checking data on the client involves adding scripts to your Web pages that examine what the users enter before they submit forms to your server. A script checks that the data in the form meets the criteria you have established. If there is a violation, the script can pop up an alert message and request a correction.

To check data on the server, the user must submit the form and let the code on your server verify the data. If there is a violation, your server can return the form again with a notation of the error that needs correction.

In general, server-side checks can be more thorough, as you can check other resources on your server to validate the information. For example, you might have a database of valid discount codes, or you might check that an email address is not already in your system. However, client-side checks provide more immediate feedback to the user, and can save iterations of submitting a form and then correcting any errors. A balanced combination may prove the best choice.

A simple check on the client side could be a matter of confirming whether the user entered any value at all.

More Sophistication

Your checks can be more sophisticated. For example, you can make sure an area code was entered as three digits. And you can see if an email address contains invalid characters like a space, or if it’s missing the @ character.

In some cases you may need to examine more than one form element at the same time, such as only validating an area code if the user has selected the US or Canada. You can add a name property to the form tag. Your JavaScript function would be passed the name of the form, let’s call it “myform,” and could reference a specific form element value like “myform.test.value.”

Once your form has passed all of its client-side checks, it gets sent to your server where more sophisticated checks can be performed. If you have a database of area codes, you can validate that the user entered a legitimate one. Or, you can ensure that a user has not already signed up for your program or for your service.

If the data you receive from a form is placed directly into a database, then it is even more important that you check the information on the server before storing it. Depending on how your server performs its database access, it is possible that users may include data-base commands in their submission that could directly affect your database.

Another example is a search form, where your server performs a search through your database. If the user can search for an email address based on a name in your maillist, and the server executes “select email from maillist where name=”<name>'”, where <name> comes directly from a form submission, then the user could get your entire maillist.

If they submit the name text “‘; select * from maillist where name matches “*”, then both selects would get executed. The first would produce no results, but the second would match every entry in your maillist. If your code looped through and displayed the results of the select, assuming that only one or two matches would occur, then this example would end up displaying your entire maillist.

In both cases the user would have to guess the name of your table, but it seems there are people out there with nothing better to do.

EDWARD ARENBERG, vice president and CTO of EPage, created one of the first fully dynamic Web sites. He manages and develops for EP.com, Epage.com and AdConnect.com.

Setting the Data Table

The last issue of Revenue gave an overview of databases and how they can be used. Let’s delve a little deeper into how you, as manager of an affiliate program, can use a database to improve your service and provide customized information for all of your affiliates.

When creating a database, the first step is to understand what information you want to record, and the important relationships among the data. Similar information is grouped into a table in the database. An affiliate will have a variety of contact information such as an email address, a postal address and perhaps even a separate payment address. All of this information could be placed into a single table. Let’s call this table affiliate_contact.

You may want to record certain accounting information about an affiliate, such as the date a sale was made, what item was sold, how much the affiliate earned and the total dollar amount generate by the sale. This information could be placed into the affiliate_contact table we already created, but we will place it instead into a new table called affiliate_sales. I’ll explain why later.

In database design, you want to create tables that group similar information and then link these tables together based on their relationships. This is where the term “relational” comes from when describing a database. Relational databases, such as Oracle, DB2, SQL Server and MySQL, provide very rich tools for extracting information based on these relationships.

Planning and mapping the information you have into tables is just the first, but perhaps most important, step in developing your database. You could change a database’s design once it is running, but if you have a lot of data, or a lot of code using the database, changes can be difficult and time consuming. So, it is worthwhile to take some time and care in planning your tables. In 10 years, my company, Epage, has gone through a few database redesigns, but there are some tables that have not changed structure since the first design.

Creating basic relationships between tables can be quite easy. Usually, it’s accomplished by having a common item such as a table column in related tables. If your affiliates all have a unique identifier, such as their contact email address, this can be used to link tables together. The affiliate_contact table and the affiliate_sales tables would both have an “email” column with the affiliate’s email address. If you want to retrieve information from both tables, like the affiliate’s first name and last sales date, you could query both tables using the same lookup key (the affiliate’s email address).

There are other ways to generate relationships among tables. We like to generate a unique number or string of characters to identify one of our users. This unique identifier is only used internally to form table relationships, and may never be seen by the user. This way, if a user needs to change their email address, it would only need to be changed in one table. In our example above, both tables, and perhaps many more, would need to be updated.

There are many reasons to break your information into multiple tables. Tables with many columns (email, address, phone, etc.) can be very difficult to manage. Database servers are designed to efficiently deliver results to your queries. But, they can get bogged down when you have a lot of columns that you might want to select from. For example, when you insert a new row, such as adding a new affiliate to the affiliate_contact table, the database must re-optimize the way it retrieves data from that table. The more columns that are in a table, the more work the database must perform.

Efficiency is another reason for multiple tables. Some tables may have only one row (entry) for each affiliate, such as the affiliate_contact table. Other tables, like the affiliate_sales table, may have many rows, one for each sale. If these two tables were combined, there would be a lot of wasted space for repeating the contact information for each sale.

Consider what unique information you want to record for each affiliate when planning your tables. You may want to know certain business information. For example, you may want to know whether the affiliate prefers to be paid by check or electronically. Or you may want to review the payment terms for certain affiliates, such as the percent of the purchase price they earn. A database can record these unique terms for each affiliate, allowing you to personalize how your program works. You might want to offer better terms to a desired affiliate or during a promotional period. When a sale is made, the percentage earned by the affiliate would be read from the database, and the result would be stored into the affiliate_sales table.

If you send multiple mailings to your affiliates, some might not want to receive all of the messages. You could store which type of messages they don’t want in the affiliate_contact table. Or, you might want to contact your top-performing affiliates. Each month you could query the affiliate_sales table to find those top performers.

Once you have the information recorded, how you use it is limited only by your imagination. You could send a special message on the anniversary of an affiliate’s signup. You could determine which affiliates had a big drop-off in month-to-month sales – perhaps they are having a problem you can solve. You could determine characteristics of your best affiliates – perhaps it’s their location – and target more like them.

Another good piece of information to record is how new affiliates found out about your program. If you use a tracking code in your advertising, you can record the code in the affiliate_contact table. Then you could determine not only how many affiliates were generated with a specific code, but how much revenue that advertising generated. One last idea to consider: If your users can refer new affiliates to you, then you could record who referred each affiliate. Offer an incentive to these users, such as a percentage of sales generated by the affiliate, and you have the potential for a huge force generating new affiliates for you, with almost no work on your part.

EDWARD ARENBERG, vice president and CTO of Epage, created one of the first fully dynamic Web sites. He manages and develops for EP.com, Epage.com and AdConnect.com.