Understanding Selection Bias

Selection bias can be defined as “a nonrandom participation in a program offer
leading to damaging financial results.” It can affect the profitability of any
product line where a group of consumers is offered products or services with
varying profitability among the customers. Selection bias may occur when consumers
have insight into the relative desirability of their individual offer based
on knowledge that the seller does not have or cannot control. For example, if
a randomly selected, unsegmented group of consumers is offered a single-price
furnace warranty, those consumers who own older, inefficient furnaces are more
likely to enroll than those whose furnaces are relatively new. If the seller
bases its price on the average age of furnaces, selection bias will damage the
warranty program’s financial performance.

Although selection bias may occur in virtually any consumer program, this paper
focuses on the fixed utility bill product because it is gaining significant
acceptance in the marketplace and it can be significantly affected by selection
bias. A fixed bill is a predetermined, guaranteed utility bill that provides
the consumer a known monthly payment amount with no adjustments or true-ups.
These offers are generally customized to reflect the usage patterns of the individual
consumer.

Fixed bill providers use mathematical techniques of varying sophistication
to generate consumer offers. Selection bias is introduced by imperfections in
the data and techniques used to generate the consumer quotes. For the purposes
of this paper, a “perfect quote” is defined as the amount that an average consumer
would expect to be charged for this product, whether or not that consumer is
willing to accept the offer. In almost any offer, some consumers will accept
and others will reject. “Perfect” is therefore defined with respect to average
customer expectations and, in this study, unlike in the real world, the perfect
quote is the same for all customers.

Case 1: No Selection Bias

Fifty consumers are offered perfect fixed bill quotes of $1,000, each with
a $200 margin. Assuming 100 percent participation, the program revenue is $50,000
with $10,000 of expected margin.

A second set of 50 consumers is offered fixed bill quotes generated with inaccurate
data and/or poor modeling techniques. The quotes are randomly priced but maintain
an average $200 margin per customer. Assuming everyone accepts their offer from
these quotes, the expected margin is still $10,000 and the financial results
do not change.

From this example, we can see that selection bias does not occur either with
perfect quotes or with 100 percent acceptance.
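
The arithmetic in Case 1 can be checked with a short sketch. The individual
quote errors below are invented for illustration, since the paper does not list
the mispriced quotes; they are re-centered so that the average margin stays at
$200, as the case assumes. The $800 cost to serve is implied by a $1,000 quote
with a $200 margin.

```python
import random

N_CONSUMERS = 50
PERFECT_QUOTE = 1_000.0   # from Case 1
COST_TO_SERVE = 800.0     # implied: $1,000 quote less $200 margin

# Hypothetical quote errors: random, then re-centered so they average zero.
rng = random.Random(0)
raw_errors = [rng.uniform(-150, 150) for _ in range(N_CONSUMERS)]
mean_error = sum(raw_errors) / N_CONSUMERS
errors = [e - mean_error for e in raw_errors]

perfect_quotes = [PERFECT_QUOTE] * N_CONSUMERS
random_quotes = [PERFECT_QUOTE + e for e in errors]

def total_margin(quotes, cost=COST_TO_SERVE):
    """Total margin when every consumer accepts (100 percent participation)."""
    return sum(q - cost for q in quotes)

print(round(total_margin(perfect_quotes), 2))
print(round(total_margin(random_quotes), 2))
```

Both totals come to $10,000 (up to floating-point rounding): with full
participation, overpriced and underpriced quotes offset one another.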

Case 2: Selection Bias

For the two scenarios in Table 1, assume penetration is less than 100 percent.
In the first scenario, penetration is flat – the same level of acceptance occurs
at all quote amounts. In the second scenario, the acceptance rate decreases
as the quote amounts go up. It is reasonable to assume that quotes that are
lower than they should be are more likely to be accepted than quotes that are
higher than they should be. The flat scenario illustrates results with no selection
bias.

The selection bias in the decreasing acceptance rate scenario yields a shortfall
in margin of $1,000 over the flat penetration scenario. This shortfall means
that margins will be $1,000 lower than anticipated in any weather condition.
(Misquoting does not have an impact on the cost to serve the customer.) Selection
bias only occurs with imperfect quoting and penetration that is related to the
price of the quote.
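
One way to see why both conditions are needed is to write the expected margin
as a sum over consumers. The notation is introduced here for illustration and
is not the paper's: p_i is the probability that consumer i accepts, m_i is the
margin embedded in that consumer's quote, and bars denote averages over the N
consumers who receive offers.

```latex
\begin{align*}
\text{Expected margin}
  &= \sum_{i=1}^{N} p_i\, m_i
   = N\,\bar{p}\,\bar{m} + N\,\operatorname{Cov}(p, m), \\
\operatorname{Cov}(p, m)
  &= \frac{1}{N}\sum_{i=1}^{N} (p_i - \bar{p})(m_i - \bar{m}).
\end{align*}
```

The first term is the margin anticipated under flat penetration; the second is
the selection bias effect. It vanishes when quotes are perfect (every m_i equals
the average margin) or when penetration is flat (every p_i equals the average
acceptance rate), and it is negative when underpriced quotes are accepted more
often than overpriced ones.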

Accuracy in quotes yields a tighter relationship between price and margin and
a higher likelihood of evenly distributed quote acceptance. Therefore, the more
accurately a quoting model predicts, the lower the selection bias will be.

How do you determine if selection bias has occurred? It can only be detected
after the fact by looking at program results. It cannot be predicted prior to
marketing since, by definition, selection bias results when consumers have knowledge
that sellers do not have.

In a fixed bill program, selection bias can be recognized by studying the difference
between the consumers’ actual energy usage and the usage predicted by the quoting
models, given the actual weather that occurred. The difference between the calculated
quote and the actual bills is called non-weather variance. Non-weather variance
has many components, including selection bias, model error, behavior change due
to fixed bill program participation, and behavior change in the normal population,
such as the purchase of more efficient equipment. If the non-weather variance
is consistently small over several years, selection bias has been low. Low selection
bias minimizes the adverse financial impact, particularly in a fixed bill program.
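
A minimal sketch of the calculation follows. All figures are made up for
illustration and are not program results; expressing the variance as a fraction
of predicted usage, with positive values meaning usage above prediction, is an
assumption of this sketch.

```python
# Hypothetical program totals by program year, in energy units.
actual_usage = {1: 1_212_000, 2: 1_187_500, 3: 1_240_300}

# Usage the quoting model predicts when re-run with the weather that occurred.
predicted_at_actual_weather = {1: 1_200_000, 2: 1_190_000, 3: 1_235_000}

def non_weather_variance(actual, predicted):
    """Return (actual - predicted) / predicted for each year, as a fraction."""
    return {yr: (actual[yr] - predicted[yr]) / predicted[yr] for yr in actual}

nwv = non_weather_variance(actual_usage, predicted_at_actual_weather)
for year, value in sorted(nwv.items()):
    print(f"year {year}: {value:+.2%}")
```

Values that stay close to zero year after year would be consistent with low
selection bias, although the other components listed above also contribute.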

Model Quality Scenarios

In the furnace warranty example, modeling can be used to create customized
individual prices offered to the consumers, or to set the overall pricing for
tightly segmented groups of customers equal to that group’s risk.

The critical success factor in implementing a consumer fixed bill program and
ensuring low selection bias is the use of precise predictive models. With precision
models, the total non-weather variance over a long-running fixed bill program
can be near zero. Such precision modeling can eliminate the need for the large
price adders to cover selection bias, which are usually required by less precise
modeling techniques. Without precision modeling, consumers are forced to pay
higher costs to cover the risks associated with selection bias.

Four scenarios follow to illustrate the difference that model quality can make
in reducing selection bias. The results described in each of these scenarios
were developed using a highly simplified simulation. Each scenario was run assuming
5,000 consumers and using Monte Carlo methods to determine if each consumer
accepts or declines the offer. In each scenario, two different predictive models
are used. Model 2 uses a more accurate prediction method, as demonstrated by
the lower standard deviation associated with the individual quotes. In addition,
Model 2 is assumed to have no predictive biases. The results for each scenario
show the impact of selection bias that results from the less accurate Model 1.

Scenario 1: Average Quotes Are Attractively Priced

In this scenario, both models produce quotes that are in a range that is reasonably
attractive to consumers.

Scenario 1 assumptions:

  • The perfect quote is $1,000 for all consumers.
  • Model 1 quotes an average $1,000 with a 5 percent standard deviation ($50).
  • Model 2 quotes an average $1,000 with a 0.5 percent standard deviation ($5).
  • Costs are $800.
  • Acceptance rate is linear.

    – If the quote is 40 percent of a perfect quote, the acceptance rate is
    10 percent.

    – If the quote is 160 percent of a perfect quote, the acceptance rate is
    0 percent.
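
The acceptance assumption can be written as a small function. This is a sketch
of the two anchor points listed above; the function name and the clamping of
quotes outside the 40 to 160 percent range are assumptions, since the paper
does not say what happens beyond those points.

```python
def acceptance_rate(quote, perfect_quote=1_000.0):
    """Probability that a consumer accepts a quote.

    Linear between the two anchor points in the Scenario 1 assumptions:
    10 percent at 40 percent of the perfect quote, 0 percent at 160 percent.
    Clamping outside that range is an assumption of this sketch.
    """
    ratio = quote / perfect_quote
    rate = 0.10 * (1.6 - ratio) / (1.6 - 0.4)
    return min(0.10, max(0.0, rate))

print(acceptance_rate(1_000.0))
```

At a quote equal to the perfect quote, the rate works out to 5 percent, so
roughly 250 of the 5,000 simulated consumers would be expected to accept.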

As shown in Table 2, the cost associated with selection bias is $1,074. The
difference between the models’ margin is the selection bias difference – in
this scenario 1.9 percent. A positive difference indicates a cost from using
Model 1 versus Model 2. The percentage is the difference divided by the Model
2 margin. The difference in Scenario 1 results entirely from the higher standard
deviation of Model 1.
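
Under the stated assumptions, the expected cost can also be worked out by hand,
which shows why only the standard deviation matters in this scenario. The
derivation is a sketch added for illustration, not the paper's simulation. It
takes expected values instead of Monte Carlo draws, writes q for an individual
quote with mean μ and standard deviation σ, c for the cost, and N for the
number of consumers, and assumes quotes stay inside the $400 to $1,600 range
where the acceptance line applies.

```latex
\begin{align*}
a(q) &= 0.10 \cdot \frac{1600 - q}{1200} = \frac{1600 - q}{12{,}000}, \\
\mathbb{E}\big[a(q)\,(q - c)\big]
  &= \frac{(1600 - \mu)(\mu - c) - \sigma^{2}}{12{,}000}, \\
\text{Expected cost of selection bias}
  &= N \cdot \frac{\sigma_{1}^{2} - \sigma_{2}^{2}}{12{,}000}
   = 5{,}000 \cdot \frac{50^{2} - 5^{2}}{12{,}000}
   \approx \$1{,}031.
\end{align*}
```

Because both models share the same mean quote and cost, the first term cancels
and only the variance term remains. The expected figure of about $1,031 is
close to the $1,074 reported from the simulation; a single Monte Carlo run will
not match an expected value exactly.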

Scenario 2: Quotes Are Higher Than Perfect

  • Perfect quote is $1,000 for all consumers.
  • Model 1 quotes an average $1,300 with a 5 percent standard deviation ($65).
  • Model 2 quotes an average $1,300 with a 0.5 percent standard deviation ($6.50).
  • Costs are $1,100.
  • Acceptance rate is as stated in Scenario 1.

Table 2 shows the cost of selection bias is $3,026. The differences in model
accuracy are the same as in Scenario 1. However, note the significant increase
in selection bias risk to 12.4 percent. The further the price quotes deviate
from what consumers view as expected or fair prices, the more selection bias
is magnified and the greater the adverse impact with a less precise predictive
model.

Scenario 3: Model Bias Predicts Higher Than Perfect

In this scenario, there is a significant modeling bias (a structural problem
in the model) in Model 1 that results in higher quotes than with Model 2, which
has no bias. (Ordinary least squares regression models inherently have such data-induced
modeling biases. There are many other causes of modeling bias; these causes
are beyond the scope of this paper.)

Scenario 3 assumptions:

  • Perfect quote is $1,000 for all consumers.
  • Model 1 quotes an average $1,300 with a 5 percent standard deviation ($65).
  • Model 2 quotes an average $1,000 with a 0.5 percent standard deviation ($5).
  • Costs are $800.
  • Acceptance rate is as stated in Scenario 1.

As shown in Table 2, there is a gain associated with selection bias of $7,599.
In this case, inaccuracy in modeling actually produces favorable results for
the seller. There is a 14 percent gain: the less accurate Model 1 produced
more value because its offers were overpriced. In this year, the bias worked
in the seller’s favor because of favorable temperature inputs. However, as
the next scenario will illustrate, this type of model bias can cause significant
swings from year to year. The swings are not symmetrical, with losses being
significantly larger than gains.

Scenario 4: Model Bias Predicts Lower Than Perfect

Selection bias is most dangerous when a model with a wide standard deviation
underpredicts, as in Scenario 4. In this final scenario, Model 1 has the
same bias as in Scenario 3 but, due to differing data input such as temperature
changes, it produces quotes lower than those of the more accurate Model 2.

Scenario 4 assumptions:

  • Perfect quote is $1,000 for all consumers.
  • Model 1 quotes an average $700 with a 5 percent standard deviation ($35).
  • Model 2 quotes an average $1,000 with a 0.5 percent standard deviation ($5).
  • Costs are $800.
  • Acceptance rate is as stated in Scenario 1.

In this scenario, Table 2 shows that Model 1 predictions result in a loss of
$95,740, or 180 percent of the Model 2 margin. When such biased models underprice
offers, the impact is significantly greater because selection bias increases.
While Scenario 3 shows that in some years the bias may produce favorable results,
the results from negative years are much greater in magnitude.

These scenarios clearly illustrate how model quality can create selection bias
and ultimately be detrimental to a program’s financial results.
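
The four scenarios can be approximated with a short simulation. This is a
sketch under stated assumptions, not the paper's simulation code: quote errors
are assumed to be normally distributed, acceptance is clamped outside the 40 to
160 percent range, and the function names and seed are arbitrary. The
closed-form expected margin is included because a single run is noisy.

```python
import random

N = 5_000
PERFECT = 1_000.0

def acceptance_rate(quote):
    """Linear acceptance: 10% at 40% of perfect, 0% at 160%; clamped (assumed)."""
    rate = 0.10 * (1.6 - quote / PERFECT) / 1.2
    return min(0.10, max(0.0, rate))

def expected_margin(mean, sd, cost):
    """Expected program margin for a $1,000 perfect quote, ignoring clamping."""
    return N * ((1600 - mean) * (mean - cost) - sd ** 2) / 12_000

def simulate_margin(mean, sd, cost, rng):
    """One Monte Carlo run: draw a quote per consumer, then accept or decline."""
    total = 0.0
    for _ in range(N):
        quote = rng.gauss(mean, sd)  # normal quote errors are an assumption
        if rng.random() < acceptance_rate(quote):
            total += quote - cost
    return total

# (Model 1 mean, Model 1 sd, Model 2 mean, Model 2 sd, cost) per scenario.
SCENARIOS = {
    1: (1_000, 50.0, 1_000, 5.0, 800),
    2: (1_300, 65.0, 1_300, 6.5, 1_100),
    3: (1_300, 65.0, 1_000, 5.0, 800),
    4: (700, 35.0, 1_000, 5.0, 800),
}

rng = random.Random(2024)  # arbitrary seed
for number, (m1, s1, m2, s2, cost) in SCENARIOS.items():
    # Positive cost means Model 1 earns less than Model 2.
    exp_cost = expected_margin(m2, s2, cost) - expected_margin(m1, s1, cost)
    sim_cost = simulate_margin(m2, s2, cost, rng) - simulate_margin(m1, s1, cost, rng)
    print(f"Scenario {number}: expected {exp_cost:>10,.0f}   one run {sim_cost:>10,.0f}")
```

Under these assumptions the expected costs are about $1,031, $1,743, a gain of
$10,750, and $88,000 for Scenarios 1 through 4. These have the same signs and
ordering as the Table 2 figures, but they do not match them exactly. With only
about 5 percent of 5,000 consumers accepting, the margin from one run can vary
by roughly a few thousand dollars, so averaging many runs gives a steadier
estimate than any single run.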

Some consultants suggest that by limiting offers to high R-square (a statistical
measure of goodness-of-fit) quotes, the effects of selection bias can be reduced.
A model R-square of 0.995 indicates that the model describes 99.5 percent of the
variation in the data it was fitted to. However, the 0.995 R-square does not mean
that the model is predictively accurate to within one-half percent. Correlation
is not causation.

For example, you can build a relatively high R-square regression relationship
between northern hemisphere gas consumption and hours of sunlight in New Zealand.
This does not indicate that sunlight in the Southern Hemisphere causes consumption
in the Northern Hemisphere. It is just that the hours of sunlight correspond
to northern seasonality. In this case, the R-square of the model is high – the
model is descriptive but has almost no predictive power.

Predictive accuracy, as measured by the standard deviation of blind predictions,
is a completely different topic and is not established by R-square. Predictive
accuracy, rather than the descriptive power of the model, is the important issue
in consumer programs such as fixed bill products.
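
The sunlight example can be made concrete with synthetic data. Everything in
this sketch is invented for illustration: the monthly series, the 20 percent
colder blind-test year, and the helper names. It fits the same usage history
once against Southern Hemisphere daylight hours and once against heating degree
days, then scores both fits on a year neither has seen.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def r_squared(actual, fitted):
    """In-sample goodness of fit: 1 - residual SS / total SS."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error of blind predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Synthetic monthly series (index 0 = January); none of this is real data.
season = [math.cos(2 * math.pi * m / 12) for m in range(12)]
nz_daylight = [12 + 3 * s for s in season]    # long days in December/January
hdd_normal = [500 + 450 * s for s in season]  # northern heating degree days
hdd_cold = [1.2 * h for h in hdd_normal]      # a colder blind-test year
wiggle = [(-1) ** m for m in range(12)]       # small deterministic noise

usage_train = [20 + 0.1 * h + w for h, w in zip(hdd_normal, wiggle)]
usage_blind = [20 + 0.1 * h for h in hdd_cold]

results = {}
for name, x_train, x_blind in [
    ("daylight proxy", nz_daylight, nz_daylight),  # daylight repeats every year
    ("heating degree days", hdd_normal, hdd_cold),
]:
    a, b = fit_line(x_train, usage_train)
    r2 = r_squared(usage_train, [a + b * x for x in x_train])
    err = rmse(usage_blind, [a + b * x for x in x_blind])
    results[name] = (r2, err)
    print(f"{name}: in-sample R-square {r2:.4f}, blind RMSE {err:.2f}")
```

Both fits describe the training year almost perfectly, yet only the one built
on the actual driver of usage holds up in the colder year. In this synthetic
setup, the two models have the same R-square and very different blind
prediction errors.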


To revisit the original furnace warranty case, a seller preparing such a furnace
warranty offer would be wise to carefully analyze the age and other characteristics
of furnaces in the potential consumer base to segment the customer base and
to price the offers.

To mitigate selection bias, the seller should replace poor modeling techniques
with high-quality, predictive models and eliminate data inaccuracies. The seller
may use Monte Carlo simulation to understand selection bias and risk profiles
relating to its specific product.

Selection bias results from random margin levels by customer and from the individual
consumer having better knowledge than the seller. If unchecked, selection bias
can significantly impact the financial performance of a consumer program.