What method/model should I use for this parameter fitting problem?

I am analyzing data from a type of sensor my company makes. I want to quantify the health of each sensor from three features using the following formula:

sensor health index = feature1 * A + feature2 * B + feature3 * C

We also need to pick a threshold so that a sensor is considered bad if its index exceeds the threshold.
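To make the scoring rule concrete, here is a minimal sketch of the index and threshold check. The weights, threshold, sensor ids, and feature values are made up for illustration and are not taken from our data.

```python
def health_index(features, weights):
    """Return feature1*A + feature2*B + feature3*C for one sensor."""
    return sum(f * w for f, w in zip(features, weights))


def is_bad(features, weights, threshold):
    """A sensor is flagged as bad when its index exceeds the threshold."""
    return health_index(features, weights) > threshold


# Hypothetical values, chosen only to show the mechanics.
weights = (1.0, 2.0, 3.0)  # A, B, C
threshold = 5.0
sensors = {"s1": (0.2, 1.0, 0.5), "s2": (2.0, 3.0, 1.0)}

flags = {sid: is_bad(f, weights, threshold) for sid, f in sensors.items()}
print(flags)
```

With these made-up numbers, s1 has an index of about 3.7 and stays below the threshold, while s2 has an index of 11 and is flagged.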

We only have a legacy list that identifies about 100 sensors as bad, but we now have data for more than 10,000 sensors. Sensors not on that list are unlabeled: they may be good or bad. So I guess linear regression methods don't work in this scenario.

The only approach I can think of is brute-force fitting. Pseudocode is as follows:

# sensors: all 10,000+ sensors, each with id, feature1, feature2, feature3
# known_bad_sensors: set of ids from the legacy list of ~100 bad sensors

# dictionary mapping (a, b, c, thold) to accuracy
results = {}

for thold in range(1, 21):
    for a in range(1, 11):
        for b in range(1, 11):
            for c in range(1, 11):
                # sensors that this parameter set flags as bad
                bad_list = set()
                for sensor in sensors:
                    health_index = sensor.feature1 * a + sensor.feature2 * b + sensor.feature3 * c
                    if health_index > thold:
                        bad_list.add(sensor.id)
                # share of known bad sensors that also appear in bad_list
                accuracy = len(bad_list & known_bad_sensors) / len(known_bad_sensors)
                results[(a, b, c, thold)] = accuracy

# the params with the highest accuracy are the best model
best_params = max(results, key=results.get)
print(best_params, results[best_params])
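Some of the cost in the loops above is redundant. As a sketch, assuming "accuracy" means the fraction of known bad sensors that get flagged, only the roughly 100 known bad sensors need to be scored, and the index can be computed once per (a, b, c) and reused for every threshold. The function name `grid_search` and the dict layout of `sensors` are assumptions made for this example.

```python
import itertools


def grid_search(sensors, known_bad_ids, weight_values, thresholds):
    """Search weights and threshold; return (best_score, (a, b, c, threshold)).

    sensors: dict mapping sensor id -> (feature1, feature2, feature3).
    The score is the fraction of known bad sensors whose index exceeds
    the threshold (an assumed reading of "accuracy").
    """
    known_bad = [sensors[sid] for sid in known_bad_ids if sid in sensors]
    if not known_bad:
        raise ValueError("no known bad sensors found in the data")

    best_score, best_params = -1.0, None
    for a, b, c in itertools.product(weight_values, repeat=3):
        # Compute each index once per weight combination,
        # then reuse it for every threshold.
        indices = [f1 * a + f2 * b + f3 * c for f1, f2, f3 in known_bad]
        for thold in thresholds:
            score = sum(i > thold for i in indices) / len(known_bad)
            if score > best_score:
                best_score, best_params = score, (a, b, c, thold)
    return best_score, best_params
```

It could be called as `grid_search(sensors, known_bad_ids, range(1, 11), range(1, 21))`. One caveat with this score: a parameter set that flags every sensor also reaches 100%, so the score alone cannot separate a useful model from a trivial one.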

I really don't like this method since it uses 5 nested for loops, which is very inefficient. But the list of 100 bad sensors is all I have; there is no way to get more labeled data points, including "good" ones. I wonder if there is a better way to do it, perhaps using something from an existing library such as scikit-learn?

 


© 2018   Data Science Central ®   Powered by
