Hi, I am new to analytics, and studying at college is totally different from what the industry really wants from a data engineer.

I am a summer intern at IDRC. We have a cab company as our client, and we want to run analytics on their data to increase their revenue.

We have two types of data:

1. GPS data polled from 100+ taxis (position data), and

2. Client’s call-centre data (demand data).

Bookings with immediate requirements are easy to accept or deny with high certainty (you either have a cab available, or you don’t). The tricky bit is committing to bookings that ask for travel some time away, such as 4, 6, or 8 hours from the time of booking.

We want to create a probability dashboard for such bookings. Let’s say each booking begins at 60% certainty; as ‘factors’ change, this number should change in real time (for each booking). If we see the number drop below a threshold, say 30% (a red zone), we should be able to take an ‘intervening action’.
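
To make the idea concrete, here is a rough sketch of what I imagine, not a working solution. It assumes the certainty can be modelled as a baseline of 60% adjusted on the log-odds scale by a few factors. The factor names (`free_cabs_nearby`, `competing_bookings`) and their weights are made up for illustration; real ones would have to be estimated from the GPS and call-centre data.

```python
import math

BASELINE = 0.60   # starting certainty for a new advance booking
RED_ZONE = 0.30   # below this, an intervening action is needed

# Hypothetical factor weights (assumptions, not fitted values).
WEIGHTS = {
    "free_cabs_nearby": 0.4,     # each free cab near the pickup raises the odds
    "competing_bookings": -0.5,  # each overlapping booking lowers the odds
}

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def booking_probability(factors, baseline=BASELINE, weights=WEIGHTS):
    """Adjust the baseline certainty using the current factor values."""
    score = logit(baseline)
    for name, value in factors.items():
        # Factors without a known weight are ignored.
        score += weights.get(name, 0.0) * value
    return sigmoid(score)

def needs_intervention(probability, threshold=RED_ZONE):
    """True when the booking has fallen into the red zone."""
    return probability < threshold

# Re-run this whenever new GPS or call-centre data arrives for a booking.
p = booking_probability({"free_cabs_nearby": 0, "competing_bookings": 4})
print(round(p, 2), needs_intervention(p))
```

With no factors the function returns the 60% baseline, and the number moves up or down as the factor values are refreshed, which is the behaviour the dashboard would need per booking.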

Can anybody tell me how to proceed with building an algorithm for this type of data? I have attached an Excel sheet that contains the data.

