
I work at an insurance company, and a process currently exists here where we remove vehicles from auto policies if we find that there’s no insurable interest in the vehicle.  To determine whether that insurable interest exists, a vehicle registration report is ordered, which identifies the individual to whom the vehicle is titled.  If that individual isn’t a driver on the policy, the case is reviewed by an underwriter and if no legitimate insurable interest is found, the vehicle is removed from the policy.  There is a clear business case for using a predictive model to help identify which cases should be reviewed by an underwriter, as the existing process (i.e., manually reviewing anything with a mismatching vehicle registration report) yields too many cases to review.  We can identify situations when vehicles have been removed for a lack of insurable interest, which will serve as the predictive model's target.


The predictive model would suggest whether it’s worth having an underwriter manually review a particular case, reducing their backlog.  However, my real concern is that by implementing such a model, the only cases where a vehicle could ever be removed will be those positively predicted by the model.  Future model refits will continue to target vehicle removals, but vehicles would only be removed in cases where the previous model gave a positive prediction.  We’ll essentially know when previous positive predictions end up true or false, but we won’t know when our negative predictions are true or false since generally they won’t be reviewed and would never result in a vehicle removal.  We can always review a sample of the negative predictions to see if our model is slipping over time, but we’ll never have the full history to re-analyze if we generally don’t pursue negative predictions.  I’m curious if this is something anyone has come across or considered.  I appreciate any insight offered!



Replies to This Discussion

What you mentioned with review of negative cases is prudent. You'll have 100% underwriter review on your positive cases, and some level of review based on random sampling for the negative cases. Just make sure you have large enough random sampling of the negative cases--you'll want to dial that in based on the specifics of the model and the risks involved.

The review process should be ongoing, but you have an extra sort of risk when the process is first implemented. Consider higher, more frequent sampling of negative results early on to make sure all was implemented as expected, dialing it back later.
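The early-heavier, lighter-later sampling described above could be sketched roughly as follows; the `audit_sample` helper and the rates shown are illustrative, not prescriptive:

```python
import random

def audit_sample(negative_cases, rate, seed=None):
    """Randomly select a fraction of negatively-predicted cases for
    underwriter audit. `negative_cases` is any list of case IDs."""
    rng = random.Random(seed)
    k = max(1, round(len(negative_cases) * rate))
    return rng.sample(negative_cases, k)

# Higher audit rate right after deployment, dialed back once the
# process has proven itself (rates here are made up for illustration).
negatives = list(range(8000))
early_audits = audit_sample(negatives, rate=0.10, seed=42)   # ~800 cases
steady_audits = audit_sample(negatives, rate=0.02, seed=42)  # ~160 cases
```

The seed is there only to make the sample reproducible for a given audit period; in production you would log which cases were drawn.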

If there are other audits or reviews going on as part of other processes, you may look to tie this new process in with the other audits, too (either cross-reference new process results with findings of the other audit, or add a new checkbox to the other audit to hook into your process results).

Another thing: make sure you have reporting in place, and make sure someone reviews the reporting. If reporting shows a drop in positive cases, be ready to dig into why that might be the case.

Thanks for the reply, Justin.  Our plan will certainly include auditing a sample of the negative predictions, at the very least to investigate whether the model is wavering in its ability to identify the cases that we are looking to review.  Initially, my dilemma was that in order to alleviate the strain on our underwriting department and reduce their workload, by definition we'll be intentionally setting aside a significant number of cases not to be reviewed.  The larger the negatively-predicted sample we audit, the less workload alleviation we're able to provide.

That said, I wonder whether you're saying something beyond just this.  By reviewing negative cases and profiling the false negatives, we can compare those profiles to profiles of our true positives and true negatives.  This feedback would inform subsequent model refits and retrains in the future.  Am I interpreting this correctly?  Again, we'll need to make sure that the negative sample is large enough to support profiles that are credible, and balance this against underwriter workload.

I do appreciate the feedback on the various types of opportunities for audits and reviews, as well as a greater frequency immediately following deployment.  Along that line, I also wonder if a good approach would be to begin with a model that's trained where recall is prioritized just ahead of precision.  By casting a wider net at first, we ease ourselves into a reduced workload by making sure we're still finding as many actual positive cases as we can.
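Assuming the model emits a score rather than a bare yes/no, prioritizing recall in practice often comes down to threshold selection on validation data. A minimal sketch (the function name and the toy data are hypothetical):

```python
def threshold_for_recall(scores, labels, target_recall):
    """Return the highest score threshold whose recall on the
    validation data is at least `target_recall`. Cases scoring at or
    above the threshold go to underwriter review."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    needed = target_recall * len(positives)
    captured = 0
    # lower the threshold until enough true positives are captured
    for s in positives:
        captured += 1
        if captured >= needed:
            return s
    return min(scores)

# toy validation data: model scores, 1 = vehicle was actually removed
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0   ]
t = threshold_for_recall(scores, labels, target_recall=0.75)
# reviewing everything scoring >= t captures at least 75% of removals
```

Starting with a high `target_recall` widens the net as described above; dialing it down later trades recall for workload reduction.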

Jason said:


That said, I wonder whether you're saying something beyond just this.  By reviewing negative cases and profiling the false negatives, we can compare those profiles to profiles of our true positives and true negatives.  This feedback would inform subsequent model refits and retrains in the future.  Am I interpreting this correctly?  Again, we'll need to make sure that the negative sample is large enough to support profiles that are credible, and balance this against underwriter workload.


That's it exactly. So let's say during one time period we have 10,000 suspect policies, and of this input the predictive model classified 2,000 as needing underwriter review (predicting no insurable interest) and the other 8,000 as not needing review (predicting insurable interest). We will have a 100% sample for the 2,000 positive predictions, as all 2,000 go to an underwriter. We sample from the 8,000 negative predictions--some of those are also selected for review.  Some of the sample from the 8,000 will prove to be false negatives, meaning the model predicted insurable interest but we found none. Some of the 2,000 will prove to be false positives. Now we have both types of failure from which to improve the model, and should our model go off the rails we'll see it in an increase in false negatives or false positives. This review should be ongoing.
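Working through the arithmetic on those illustrative numbers, with a hypothetical audit sample of the negatives:

```python
# Illustrative numbers from the discussion above: 10,000 suspect
# policies, 2,000 positive predictions (all reviewed) and 8,000
# negative predictions, of which only a random sample is audited.
n_negative = 8000
sample_size = 400     # audited negatives (hypothetical)
fn_in_sample = 6      # audits that found no insurable interest

# Estimate the false-negative rate among negatives from the sample,
# then extrapolate to the full negative population.
fn_rate = fn_in_sample / sample_size        # 0.015
est_false_negatives = fn_rate * n_negative  # ~120 missed removals
```

A sustained rise in `fn_rate` from one period to the next is exactly the "model going off the rails" signal mentioned above.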

If the model outputs some kind of score as opposed to a simple yes/no, it may indeed be best to start more conservatively, excluding only the most extreme scores from underwriter review, then dialing it up later, weighing the cost and probability of each type of mistake.

Thanks again!

Dear Jason,

I fully support all that Justin has said already, and just want to add a little remark. Since you explicitly say that you are interested in views/opinions from other fields, I suggest you have a look into the field of reject inference. This term is used in the credit industry to refer to the same type of problem you are facing. Consider a financial institution that needs to decide whether to approve or reject a pending application for credit. Predictive models inform such decisions at large scale. By definition, when updating - recalibrating - the predictive models, all labeled data available comes from applicants that have been accepted in the past, whereas no information is available on clients whose applications were rejected. I believe this problem is very similar to yours.

Some techniques to address the problem of reject inference have been proposed; see below for a selection of papers. In general, however, it is fair to say that it is a tough problem. I am not aware of any "golden rule" for dealing with it; which brings me back to the excellent explanation from Justin.

Good luck and best regards


References on reject inference (selection)

- I. D. Wu, and D. J. Hand, Handling selection bias when choosing actions in retail credit applications, European Journal of Operational Research 183(3) (2007) 1560-1568.

- J. Banasik, and J. Crook, Reject inference, augmentation, and sample selection, European Journal of Operational Research 183(3) (2007) 1582-1594.

- M. Bücker, M. van Kampen, and W. Krämer, Reject inference in consumer credit scoring with nonignorable missing data, Journal of Banking & Finance 37(3) (2013) 1040-1045.

- J. Banasik, and J. Crook, Reject inference in survival analysis by augmentation, Journal of the Operational Research Society 61(3) (2010) 473-485.

- Z. Li, Y. Tian, K. Li, F. Zhou, and W. Yang, Reject inference in credit scoring using Semi-supervised Support Vector Machines, Expert Systems with Applications 74 (2017) 105-114.

Thanks for these references, Stefan.  I've found it difficult googling general approaches to dealing with this issue, as most search terms send me in other directions (searching on 'bias' or 'confirmation bias' sends me into behavioral psychology topics, variations of 'true/false negative identification' just sends me to basic concepts around what confusion matrices are, etc.).  This doesn't seem to be a scenario that's so contrived or specialized that no one's ever dealt with it before, so it's got to have a name.  I'll look at these and see if 'reject inference' yields any other terms more general than for financial services.  Thank you!

I've been reading a few different sources on the topic of Reject Inference, which seems to be a pretty common moniker for the scenario I've described here.  It's almost exclusively related to credit/loan application situations, but it's close enough to be useful.

There are some different techniques and approaches for imputing results for your negative predictions, about which you otherwise have no knowledge, but I'm seeing lots of disclaimers that the results aren't all that superior.  The general idea is very interesting, but ultimately it appears that the marginal improvement from incorporating these imputed results is either very slim, or a whole lot of work for nothing (my paraphrase).

A difference I see in the sources I've reviewed, relative to my own situation, is that the full picture was never known in the credit examples, whereas I'm able to start with the full picture but will then have that scope reduced going forward.  I think I could build an initial model based on the full picture of outcomes, then build a second model based on the incomplete picture that the model would have generated (treat negatives as unknown - ignoring whether they were true or false; treat true positives as my target).  I could compare the performance of the two models and see how much drop-off there is.  I could then play around with one of the "recommended" approaches for this problem (parcelling seems to resonate the most), and see how the results of that third model compare to the second.
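For concreteness, a rough sketch of the parcelling idea (the function and the equal-width bands are illustrative, not taken from any of the papers above): score the unknown cases with the model trained on known outcomes, band them by score, and impute outcomes within each band at the rate observed among known cases in that band.

```python
import random

def parcel_labels(scores_known, labels_known, scores_unknown,
                  n_bands=5, seed=0):
    """Parcelling sketch: band cases by model score; within each band,
    impute labels for unknown cases at the positive rate observed
    among known-outcome cases in that band."""
    rng = random.Random(seed)
    edges = [i / n_bands for i in range(1, n_bands)]  # equal-width bands

    def band(s):
        return sum(s >= e for e in edges)

    # observed positive rate per band among known-outcome cases
    rates = {}
    for b in range(n_bands):
        in_band = [y for s, y in zip(scores_known, labels_known)
                   if band(s) == b]
        rates[b] = sum(in_band) / len(in_band) if in_band else 0.0

    # each unknown case is labeled 1 with its band's observed rate
    return [1 if rng.random() < rates[band(s)] else 0
            for s in scores_unknown]

# toy usage: known outcomes are the reviewed cases; unknowns are
# negative predictions that were never sent to an underwriter
imputed = parcel_labels([0.95, 0.9, 0.85, 0.1, 0.05, 0.15],
                        [1, 1, 1, 0, 0, 0],
                        [0.92, 0.08])
```

The imputed labels would then be appended to the training data for the third model described above, so its performance can be compared against the model trained on the censored picture alone.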

Here are some of the links I found helpful.




© 2020   TechTarget, Inc.