How to do inverse prediction in data science/machine learning

In machine learning, we usually train models to predict X -> Y. For example, a dataset might have 20 input features X = (x0, x1, ..., x19) and 3 output variables Y = (y0, y1, y2). The training and test sets are usually small, such as fewer than 1000 items, or even fewer than 100 in the training set.

But in industry, the problem I have is the reverse: I want to know how I should set X so that Y falls in a specific range or takes a specific value.

Does anyone know how to solve this kind of "inverse prediction"? I have two approaches in mind:

  1. Train the model in the normal way, X -> Y, then build a dense mesh in the high-dimensional X space (20 dimensions in this example). Feed every point in the mesh to the trained model as input and select all the input points where the predicted output meets the target, for example y1 > 100. Finally, use a method such as clustering to look for patterns in the selected points.
  2. Directly learn models from Y to X. Then build a dense mesh in the Y space restricted to the target region, for example y1 > 100, and use the trained models to calculate the corresponding X data points.
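
A rough sketch of method 1 in Python follows. It makes several assumptions that are not in the question: a dense mesh is swapped for random sampling, because even 10 grid points per axis in 20 dimensions gives 10^20 points; `true_process` is a made-up function standing in for the real data; the forward model is a small hand-written k-nearest-neighbour regressor rather than any particular library model; and a per-feature mean stands in for clustering as the pattern summary.

```python
import random

random.seed(0)
N_FEATURES = 20      # matches the 20 inputs in the question
THRESHOLD = 100.0    # target condition: predicted y1 > 100

def true_process(x):
    """Hypothetical stand-in for the real process; only x0 and x1 matter."""
    return 80.0 * x[0] + 80.0 * x[1]

def random_point():
    """One point drawn uniformly from the unit hypercube [0, 1]^20."""
    return [random.random() for _ in range(N_FEATURES)]

# Small training set, as in the question (fewer than 1000 items).
train_X = [random_point() for _ in range(100)]
train_y = [true_process(x) for x in train_X]

def knn_predict(x, k=3):
    """Forward surrogate X -> y1: mean y1 of the k nearest training points."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in scored[:k]) / k

# Random candidates replace the dense mesh.
candidates = [random_point() for _ in range(500)]
selected = [x for x in candidates if knn_predict(x) > THRESHOLD]

# Crude pattern summary: mean of each feature over the selected points.
if selected:
    means = [sum(x[j] for x in selected) / len(selected)
             for j in range(N_FEATURES)]
    print(f"{len(selected)} of {len(candidates)} candidates selected")
    print("mean x0, x1:", round(means[0], 2), round(means[1], 2))
```

Features whose mean over the selected points drifts away from 0.5 are the ones the surrogate associates with a high y1. With so little training data in 20 dimensions the surrogate is noisy, so any selected region would need checking against real measurements.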

The second method might work when Y is also high-dimensional. But in my application Y is usually very low-dimensional and X is very high-dimensional, which makes me think method 2 is not very practical.

Does anyone have any other thoughts? I think this must be quite common in industry, and some people may have met a similar situation before.

Thank you!

Replies to This Discussion

Perhaps use Bayesian optimization, where you solve for the inputs that get you closest to the targeted Y value?
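
The idea behind this suggestion is to treat inverse prediction as a search over X that minimises the distance between the predicted Y and the target. The sketch below is not Bayesian optimization itself: it swaps in plain random local search to keep the example short and dependency-free. The `surrogate` function is a hypothetical stand-in for a trained forward model, and the target value of 100 and the [0, 1] input bounds are assumptions for illustration.

```python
import random

random.seed(0)
N_FEATURES = 20
TARGET = 100.0   # desired value of y1 (assumed for illustration)

def surrogate(x):
    """Hypothetical stand-in for a trained forward model X -> y1."""
    return 80.0 * x[0] + 80.0 * x[1] + 10.0 * x[2] ** 2

def loss(x):
    """Squared distance between the predicted y1 and the target."""
    return (surrogate(x) - TARGET) ** 2

def clip(v):
    """Keep each input inside its allowed range, here [0, 1]."""
    return min(1.0, max(0.0, v))

# Start from a random point and keep only the steps that reduce the loss.
x = [random.random() for _ in range(N_FEATURES)]
initial_loss = loss(x)
best_loss = initial_loss

for _ in range(2000):
    trial = [clip(v + random.gauss(0.0, 0.05)) for v in x]
    trial_loss = loss(trial)
    if trial_loss < best_loss:
        x, best_loss = trial, trial_loss

print("predicted y1 at solution:", round(surrogate(x), 3))
```

A Bayesian optimizer would replace the search loop with one that picks each trial point using a probabilistic model of the loss, which matters most when every evaluation is an expensive real experiment rather than a cheap model call. Either way the search returns a single X; since many different X can give the same Y, running it from several starting points gives a picture of the range of feasible settings.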
