Understanding Selection Bias
by mThink, May 23, 2005

Selection bias can be defined as nonrandom participation in a program offer that leads to damaging financial results. It can affect the profitability of any product line in which a group of consumers is offered products or services whose profitability varies from customer to customer. Selection bias may occur when consumers have insight into the relative desirability of their individual offer based on knowledge that the seller does not have or cannot control. For example, if a randomly selected, unsegmented group of consumers is offered a single-price furnace warranty, those who own older, inefficient furnaces are more likely to enroll than those whose furnaces are relatively new. If the seller bases its price on the average age of furnaces, selection bias will damage the warranty program's financial performance.

Although selection bias may occur in virtually any consumer program, this paper focuses on the fixed utility bill product because it is gaining significant acceptance in the marketplace and can be significantly affected by selection bias. A fixed bill is a predetermined, guaranteed utility bill that gives the consumer a known monthly payment amount with no adjustments or true-ups. These offers are generally customized to reflect the usage patterns of individual consumers. Fixed bill providers use mathematical techniques of varying sophistication to generate consumer offers. Selection bias is introduced by imperfections in the data and techniques used to generate the consumer quotes.

For the purposes of this paper, a perfect quote is defined as the amount that an average consumer would expect to be charged for this product, whether or not they are willing to accept the offer. With almost any offer, some consumers will reject it and others will accept it.
Therefore, "perfect" is defined with respect to average customer expectations and, in this study, unlike in the real world, perfect quotes are the same for all customers.

Case 1: No Selection Bias

Fifty consumers are offered perfect fixed bill quotes of $1,000, each with a $200 margin. Assuming 100 percent participation, program revenue is $50,000 with $10,000 of expected margin. A second set of 50 consumers is offered fixed bill quotes generated with inaccurate data and/or poor modeling techniques. The quotes are randomly priced but maintain an average $200 margin per customer. Assuming everyone accepts their offer, the expected margin is still $10,000 and the financial results do not change. This example shows that selection bias does not occur with perfect quotes or with 100 percent acceptance.

Case 2: Selection Bias

For the two scenarios in Table 1, assume penetration is less than 100 percent. In the first scenario, penetration is flat: the same level of acceptance occurs at all quote amounts. In the second scenario, the acceptance rate decreases as the quote amounts go up. It is reasonable to assume that quotes that are lower than they should be are more likely to be accepted than quotes that are higher than they should be.

The flat scenario illustrates results with no selection bias. The selection bias in the decreasing acceptance rate scenario yields a margin shortfall of $1,000 relative to the flat penetration scenario. This shortfall means that margins will be $1,000 lower than anticipated in any weather condition. (Misquoting does not affect the cost to serve the customer.)

Selection bias occurs only when quoting is imperfect and penetration is related to the price of the quote. Accurate quotes yield a tighter relationship between price and margin and a higher likelihood of evenly distributed quote acceptance. Therefore, the more accurately a quoting model predicts, the lower the selection bias will be.
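The mechanism in Case 2 can be illustrated with a small expected-value calculation. The sketch below is not the calculation behind Table 1; the quote spread ($900 to $1,100), the 50 percent flat acceptance rate, and the 70-to-30 percent decreasing acceptance rule are all hypothetical values chosen for illustration. Only the $1,000 perfect quote, the implied $800 cost, and the 50-consumer group come from the text.

```python
# Hypothetical setup: 50 consumers, perfect quote $1,000, cost to serve $800.
COST = 800.0
PERFECT = 1000.0
N = 50

# Assumption: imperfect quotes are spread evenly from $900 to $1,100, so the
# average quote is still $1,000 and the average margin is still $200.
quotes = [900.0 + 200.0 * i / (N - 1) for i in range(N)]

def flat_acceptance(quote):
    """Same acceptance probability at every quote amount (assumed 50%)."""
    return 0.5

def decreasing_acceptance(quote):
    """Assumed rule: acceptance falls linearly from 70% at $900 to 30% at $1,100."""
    return 0.5 - 0.002 * (quote - PERFECT)

def expected_margin(quotes, acceptance):
    """Sum over consumers of P(accept) * (quote - cost)."""
    return sum(acceptance(q) * (q - COST) for q in quotes)

flat = expected_margin(quotes, flat_acceptance)
biased = expected_margin(quotes, decreasing_acceptance)
shortfall = flat - biased
print(f"flat: {flat:.2f}  decreasing: {biased:.2f}  shortfall: {shortfall:.2f}")
```

Both acceptance rules enroll the same share of consumers on average, yet the decreasing rule enrolls more of the underpriced quotes and fewer of the overpriced ones, so its expected margin is lower. Setting every quote to the perfect $1,000 makes the two results identical, which matches the observation that selection bias requires both imperfect quotes and price-dependent acceptance.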
How do you determine whether selection bias has occurred? It can be detected only after the fact, by looking at program results. It cannot be predicted before marketing because, by definition, selection bias results when consumers have knowledge that sellers do not have. In a fixed bill program, selection bias can be recognized by studying the difference between the consumers' actual energy usage and the usage predicted by the quoting models, given the actual weather that occurred. The difference between the calculated quote and the actual bills is called non-weather variance. Non-weather variance has many components, including selection bias, model error, behavior change due to fixed bill program participation, and behavior change in the normal population, such as the purchase of more efficient equipment. If the non-weather variance is consistently small over several years, selection bias has been low. Low selection bias minimizes the adverse financial impact, particularly in a fixed bill program.

Model Quality Scenarios

In the furnace warranty example, modeling can be used to create customized individual prices offered to consumers, or to set the overall pricing for tightly segmented groups of customers equal to that group's risk. The critical success factor in implementing a consumer fixed bill program and ensuring low selection bias is the use of precise predictive models. With precision models, the total non-weather variance over a long-running fixed bill program can be near zero. Such precision modeling can eliminate the need for the large price adders that less precise modeling techniques usually require to cover selection bias. Without precision modeling, consumers are forced to pay higher costs to cover the risks associated with selection bias.

Four scenarios follow to illustrate the difference that model quality can make in reducing selection bias. The results described in each of these scenarios were developed using a highly simplified simulation.
Each scenario was run assuming 5,000 consumers and using Monte Carlo methods to determine whether each consumer accepts or declines the offer. In each scenario, two different predictive models are used. Model 2 uses a more accurate prediction method, as demonstrated by the lower standard deviation associated with the individual quotes. In addition, Model 2 is assumed to have no predictive biases. The results for each scenario show the impact of selection bias that results from the less accurate Model 1.

Scenario 1: Average Quotes Are Attractively Priced

In this scenario, both models produce quotes in a range that is reasonably attractive to consumers.

Scenario 1 assumptions:
The perfect quote is $1,000 for all consumers.
Model 1 quotes an average $1,000 with a 5 percent standard deviation ($50).
Model 2 quotes an average $1,000 with a 0.5 percent standard deviation ($5).
Costs are $800.
Acceptance rate is linear. If the quote is 40 percent of a perfect quote, the acceptance rate is 10 percent. If the quote is 160 percent of a perfect quote, the acceptance rate is 0 percent.

As shown in Table 2, the cost associated with selection bias is $1,074. The difference between the two models' margins is the selection bias difference, which in this scenario is 1.9 percent. A positive difference indicates a cost from using Model 1 versus Model 2. The percentage is the difference divided by the Model 2 margin. The difference in Scenario 1 results entirely from the higher standard deviation of Model 1.

Scenario 2: Quotes Are Higher Than Perfect

Scenario 2 assumptions:
Perfect quote is $1,000 for all consumers.
Model 1 quotes an average $1,300 with a 5 percent standard deviation ($65).
Model 2 quotes an average $1,300 with a 0.5 percent standard deviation ($6.50).
Costs are $1,100.
Acceptance rate is as stated in Scenario 1.

Table 2 shows that the cost of selection bias is $3,026. The differences in model accuracy are the same as in Scenario 1.
However, note the significant increase in selection bias risk to 12.4 percent. The further the price quotes deviate from what consumers view as expected or fair prices, the more selection bias is magnified and the greater the adverse impact of a less precise predictive model.

Scenario 3: Model Bias Predicts Higher Than Perfect

In this scenario, there is a significant modeling bias (a structural problem in the model) in Model 1 that results in higher quotes than with Model 2, which has no bias. (Ordinary least squares regression models inherently have such data-induced modeling biases. There are many other causes of modeling bias; they are beyond the scope of this paper.)

Scenario 3 assumptions:
Perfect quote is $1,000 for all consumers.
Model 1 quotes an average $1,300 with a 5 percent standard deviation ($65).
Model 2 quotes an average $1,000 with a 0.5 percent standard deviation ($5).
Costs are $800.
Acceptance rate is as stated in Scenario 1.

As shown in Table 2, there is a gain associated with selection bias of $7,599. In this case, inaccuracy in modeling actually produces favorable results for the seller. There is a 14 percent gain: the less accurate Model 1 produced more value because the offers were overpriced. This year's bias worked in the seller's favor due to a favorable temperature bias. However, as the next scenario will illustrate, this type of model bias can cause significant swings from year to year. The swings are not symmetrical; losses are significantly larger than gains.

Scenario 4: Model Bias Predicts Lower Than Perfect

The underprediction case in Scenario 4 is where selection bias is most dangerous with wide standard deviation models. In this final scenario, Model 1 has the same bias as in Scenario 3 but, because of differing data inputs such as temperature changes, it produces quotes lower than those of the more accurate Model 2.

Scenario 4 assumptions:
Perfect quote is $1,000 for all consumers.
Model 1 quotes an average $700 with a 5 percent standard deviation ($35).
Model 2 quotes an average $1,000 with a 0.5 percent standard deviation ($5).
Costs are $800.
Acceptance rate is as stated in Scenario 1.

In this scenario, Table 2 shows that Model 1 predictions result in a 180 percent loss of $95,740. When such biased models underpredict offers, the impact is significantly greater because selection bias increases. While Scenario 3 shows that in some years the bias may produce favorable results, the results from negative years are much greater in magnitude.

These scenarios clearly illustrate how model quality can create selection bias and ultimately be detrimental to a program's financial results.

Some consultants suggest that the effects of selection bias can be reduced by limiting offers to quotes with a high R-square (a statistical measure of goodness of fit). A model R-square of 0.995 indicates that the model is 99.5 percent descriptive. However, a 0.995 R-square does not mean that the model is predictively accurate to within one-half percent. Correlation is not causation. For example, you can build a relatively high R-square regression relationship between Northern Hemisphere gas consumption and hours of sunlight in New Zealand. This does not indicate that sunlight in the Southern Hemisphere causes consumption in the Northern Hemisphere; the hours of sunlight simply correspond to northern seasonality. In this case, the R-square of the model is high: the model is descriptive but has almost no predictive power. Predictive accuracy, measured by the standard deviation of blind predictions, is a completely different topic and is not captured by R-square. Predictive accuracy, rather than the descriptive power of the model, is the important issue in consumer programs such as fixed bill products.
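The four model quality scenarios can be approximated with a short Monte Carlo sketch built from the stated assumptions (5,000 consumers, the linear acceptance rule, and each model's average quote and standard deviation). This is not the simulation used to produce Table 2. The normally distributed quote errors, the treatment of acceptance outside the 40-to-160 percent range, the random seed, and all function names are assumptions introduced here, so the output will not match Table 2 exactly.

```python
import random

PERFECT_QUOTE = 1000.0
N_CONSUMERS = 5000

def acceptance_rate(quote, perfect=PERFECT_QUOTE):
    """Linear rule from the text: 10% at 40% of perfect, 0% at 160%."""
    ratio = quote / perfect
    rate = 0.10 * (1.6 - ratio) / (1.6 - 0.4)
    # Assumption: hold the rate at its endpoint values outside the stated range.
    return max(0.0, min(0.10, rate))

def simulate_margin(mean_quote, sd_fraction, cost, rng, n=N_CONSUMERS):
    """Total margin from the consumers who accept a noisy quote."""
    total = 0.0
    for _ in range(n):
        # Assumption: quote errors are normally distributed around the model mean.
        quote = rng.gauss(mean_quote, sd_fraction * mean_quote)
        # Monte Carlo draw: does this consumer accept the offer?
        if rng.random() < acceptance_rate(quote):
            total += quote - cost
    return total

# (Model 1 average quote, Model 2 average quote, cost) for each scenario.
SCENARIOS = {
    1: (1000.0, 1000.0, 800.0),
    2: (1300.0, 1300.0, 1100.0),
    3: (1300.0, 1000.0, 800.0),
    4: (700.0, 1000.0, 800.0),
}

def run_all(seed=0):
    """Return {scenario: (Model 1 margin, Model 2 margin, Model 2 - Model 1)}."""
    rng = random.Random(seed)
    results = {}
    for number, (mean1, mean2, cost) in SCENARIOS.items():
        margin1 = simulate_margin(mean1, 0.05, cost, rng)    # 5% standard deviation
        margin2 = simulate_margin(mean2, 0.005, cost, rng)   # 0.5% standard deviation
        results[number] = (margin1, margin2, margin2 - margin1)
    return results

results = run_all()
for number, (m1, m2, diff) in results.items():
    print(f"Scenario {number}: Model 1 {m1:,.0f}  Model 2 {m2:,.0f}  difference {diff:,.0f}")
```

Because only a small share of the 5,000 consumers accept in any run, a single run is noisy relative to the smaller differences, such as the one in Scenario 1. Averaging `run_all` over many seeds gives steadier estimates. The large loss in the underprediction case is visible even in a single run.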
Conclusion

To revisit the original furnace warranty case, a seller preparing such an offer would be wise to analyze carefully the age and other characteristics of furnaces in the potential consumer base, and to use that analysis to segment the customer base and price the offers. To mitigate selection bias, the seller should replace poor modeling techniques with high-quality predictive models and eliminate data inaccuracies. The seller may use Monte Carlo simulation to understand the selection bias and risk profiles of its specific product. Selection bias results from random margin levels by customer and from the individual consumer having better knowledge than the seller. If unchecked, selection bias can significantly impact the financial performance of a consumer program.