
# ML Algorithm selection for many Timestamp features

I have a data set with many timestamp features, such as:

• Employee duty report time
• Employee duty release time
• Employee work start time
• Employee work end time
• Employee rest start time
• Employee rest end time, etc.

We currently have a Java application that uses this data to determine whether it is legal for an employee to work on a given day. The work is not continuous: on some days an employee works only 2 hours, and on other days they work 8 hours straight, depending on various shift assignment rules. If an employee has worked continuously for 8 hours, they must take at least 8 hours of rest before they can start work again.
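To make the rest rule concrete, here is a minimal Python sketch of that single rule. It is only one simplified reading of the sentence above, not the actual Java rule engine: the function name `violates_rest_rule` and the `(work_start, work_end)` input shape are assumptions made for illustration.

```python
from datetime import datetime, timedelta

EIGHT_HOURS = timedelta(hours=8)


def violates_rest_rule(shifts):
    """Return True if a continuous shift of 8+ hours is followed by < 8 hours of rest.

    `shifts` is a list of (work_start, work_end) datetime pairs, one pair per
    continuous work period (hypothetical input shape).
    """
    ordered = sorted(shifts)
    # Compare each shift with the one that follows it.
    for (start, end), (next_start, _) in zip(ordered, ordered[1:]):
        worked = end - start
        rest = next_start - end
        if worked >= EIGHT_HOURS and rest < EIGHT_HOURS:
            return True
    return False


# Made-up example shifts.
long_shift = (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 16))  # 8 h of work
too_soon = (datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 0))    # starts 6 h later
rested = (datetime(2024, 1, 2, 0), datetime(2024, 1, 2, 2))       # starts 8 h later

print(violates_rest_rule([long_shift, too_soon]))  # True
print(violates_rest_rule([long_shift, rested]))    # False
```

The real engine combines many such rules over a 30-day window, so this covers only the one rule spelled out here.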

We feed 30 days of data to a Java Rule Engine to find out whether an employee is legal to work. The rule engine applies many rules, such as whether an employee worked 8 hours in one day or across multiple days, to decide whether he/she is legal to work. From the rule engine's output I collected all the data into two sets: one labeled legal to work and the other not legal.

Now I want to apply an ML algorithm to the labeled data (legal and not legal) to predict whether an employee is legal to work. What is the best approach to modeling this problem? Can it be treated as time series, logistic regression, or anomaly detection?
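Whichever model is chosen, classifiers such as logistic regression or decision trees expect a fixed-length numeric vector rather than raw timestamps. One possible way to get there is to turn each employee's window of shifts into duration features. The sketch below uses only the standard library; the feature names and the `(work_start, work_end)` input shape are hypothetical, chosen for illustration.

```python
from datetime import datetime


def hours_between(start, end):
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600.0


def window_features(shifts):
    """Summarise (work_start, work_end) pairs as numeric features.

    Feature names are illustrative assumptions, not taken from the rule engine.
    """
    ordered = sorted(shifts)
    worked = [hours_between(start, end) for start, end in ordered]
    # Rest is the gap between the end of one shift and the start of the next.
    rests = [
        hours_between(prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return {
        "num_shifts": len(ordered),
        "total_hours_worked": sum(worked),
        "longest_shift_hours": max(worked, default=0.0),
        # None when there are fewer than two shifts; the caller has to impute it.
        "shortest_rest_hours": min(rests) if rests else None,
    }


# Made-up example window with three shifts.
example_shifts = [
    (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 16)),
    (datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 0)),
    (datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 11)),
]
features = window_features(example_shifts)
print(features)
```

Summary features like these discard the order of events, so a rule that depends on sequence may not be fully recoverable from them.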


### Replies to This Discussion

Ummm... err... a fundamental of machine learning is that 100% accuracy is impossible, and you have to expect a failure rate. In this use case, is a potential lawsuit the consequence of a failure? If so, rule engines are the right technology.

But that was not your question :) This is a sequential problem with state. You would need a probabilistic time series analyzer. If you want to try deep learning, LSTM/GRU or this technology might be the right tool:

Thanks for responding to my question.

I tried an sklearn decision tree and got 97% accuracy. I am trying to see whether I can get a better result, so I'll try your suggestion. Do you think a time series model will give better accuracy than a decision tree for this use case?
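A single accuracy number does not show which kind of mistake the model makes, and here predicting "legal" for a not-legal case is presumably the costly one. The sketch below counts each error type separately. The labels are made up purely to show the arithmetic and are not results from the data set discussed here; `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(actual, predicted, positive="legal"):
    """Count each combination of actual and predicted label."""
    counts = {
        "true_legal": 0,
        "false_legal": 0,      # predicted legal, actually not legal
        "true_not_legal": 0,
        "false_not_legal": 0,  # predicted not legal, actually legal
    }
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["true_legal" if a == positive else "false_legal"] += 1
        else:
            counts["false_not_legal" if a == positive else "true_not_legal"] += 1
    return counts


# Made-up labels: 90 legal cases and 10 not-legal cases.
actual = ["legal"] * 90 + ["not_legal"] * 10
# Hypothetical model output: all legal cases right, 3 of the 10 not-legal cases missed.
predicted = ["legal"] * 90 + ["legal"] * 3 + ["not_legal"] * 7

counts = confusion_counts(actual, predicted)
accuracy = (counts["true_legal"] + counts["true_not_legal"]) / len(actual)
missed_share = counts["false_legal"] / actual.count("not_legal")

print(counts)
print(accuracy)      # 0.97
print(missed_share)  # 0.3
```

In this invented case the accuracy is 97%, yet 3 of the 10 not-legal cases are passed as legal, so it is worth comparing models on per-class errors as well as overall accuracy.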

In reply to Lance Norskog's comment above.