Driving Behaviour as a Telematic Fingerprint

The objective of my final project at Metis (weeks 9 to 12) is to categorize drivers based on their behaviour on the road: their driving style and the types of roads they follow.

The challenge associated with this objective is to uniquely identify a driver (and hence his or her own “driving behaviour”) based on the GPS log of a mobile phone located inside the car.

My idea for solving this is to experiment with Topic Modeling techniques, especially Latent Semantic Indexing/Analysis (LSI/LSA) and Latent Dirichlet Allocation (LDA), to explain the observed trips by the unobserved behaviours of drivers.

The following is an executive summary … you can also browse through the slides that I am presenting at Metis on the 7th of April 2015 during the Career Event, or check the Python code available on my blog: http://nasdag.org

The raw data received for each trip is a CSV file of (x, y) coordinates logged once per second.

My approach consists of first preprocessing the data using statistical smoothing and compression algorithms:
- Kalman filtering and
- Ramer–Douglas–Peucker simplification,
then extracting Road and Driving Style features:
- per Segment: Length, Slip Angle, Convexity, Radius
- per Meter: Speed, Accelerations (tangential and normal), Jerk, Yaw, Pauses
then binning the output to generate the “Driving Alphabet” (e.g. d0, d1, d2… v0, v1, v2… a0, a1, a2… etc.),
and finally building the Driving Vocabulary, made of “Driving Slides” (e.g. d3L4v2n3y1), for various preprocessing sensitivities or feature combinations (the languages).
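The compression and binning steps above can be sketched as follows. This is a minimal illustration, not the project code: the Ramer–Douglas–Peucker routine is a standard textbook implementation, and the bin edges in `bin_letter` are purely hypothetical (the real sensitivities are tuned per language).

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # find the interior point farthest from the start-end chord
    dmax, idx = 0.0, 0
    for i, (x, y) in enumerate(points[1:-1], 1):
        d = abs(dy * (x - x1) - dx * (y - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:
        # keep the far point and recurse on both halves
        return rdp(points[:idx + 1], epsilon)[:-1] + rdp(points[idx:], epsilon)
    return [points[0], points[-1]]

def bin_letter(prefix, value, edges):
    """Map a feature value to a letter of the Driving Alphabet, e.g. v0, v1, v2."""
    return f"{prefix}{sum(value >= e for e in edges)}"

# Toy track: the zig-zag in the middle survives, the near-collinear point is dropped
track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6)]
simplified = rdp(track, epsilon=0.5)

# Speed of 12 m/s with hypothetical bin edges becomes the token "v2"
token = bin_letter("v", 12.0, edges=[5, 10, 15, 20])
```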

I then translate each trip's GPS log into a document; after tokenizing and filtering, … the data is ready!

I will use the GENSIM library to transpose trips into an LDA or LSI space where each trip becomes a combination of “Driving Behaviours” made of “Driving Slides”.
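To keep the example self-contained, here is a numpy sketch of what that LSI transposition does (gensim's `LsiModel` wraps the same truncated SVD); the trips and Driving Slide tokens below are made up for illustration.

```python
import numpy as np

# Toy corpus: each trip is a "document" made of Driving Slide tokens
trips = [
    ["d3L4v2n3y1", "d3L4v2n3y1", "d1L0v0n0y0"],
    ["d3L4v2n3y1", "d2L3v2n2y1"],
    ["d1L0v0n0y0", "d0L1v1n0y0", "d0L1v1n0y0"],
]
vocab = sorted({tok for trip in trips for tok in trip})
index = {tok: j for j, tok in enumerate(vocab)}

# Term-document count matrix (the densified equivalent of gensim's doc2bow)
A = np.zeros((len(vocab), len(trips)))
for j, trip in enumerate(trips):
    for tok in trip:
        A[index[tok], j] += 1

# LSI: a truncated SVD keeps the top-k latent "Driving Behaviours"
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
trip_coords = (np.diag(s[:k]) @ Vt[:k]).T  # each trip as a point in behaviour space
```

Each row of `trip_coords` is one trip expressed as a combination of k latent behaviours, which is the representation the rest of the pipeline works with.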

In order to validate my model I am using it to compete in the AXA Kaggle challenge, where I need to come up with a “telematic fingerprint” capable of distinguishing when a trip was driven by a given driver, knowing that among the 200 provided trips for each of the 2736 drivers, a small number of trips were not driven by that driver.

Submissions are judged on the area under the ROC curve, calculated globally (all predictions pooled together).
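That global AUC can be computed directly from the pooled predictions via the Mann–Whitney statistic; a minimal sketch:

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive gets a higher score than a randomly chosen negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One swapped pair out of four gives 0.75; perfect separation gives 1.0
result = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.75
```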

My approach is the following:
- transpose all trips into the new Driving Behaviours space
- take the trips of a selected Driver one by one
- build a prediction model trained with all other trips in the dataset:
  True if they belong to the selected Driver,
  False if they do not belong to this Driver
- predict, with the trained model, whether the selected Trip belongs to the Driver, then Ensemble several predictions using various sensitivities to enhance the score …

For performance reasons I proceed in batches of 10 or 20 selected trips, each time comparing against a limited, randomly selected number of False trips.
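The one-vs-rest scoring with sampled negatives can be sketched like this. The nearest-centroid scorer below is a hypothetical stand-in for the actual classifier, just to show the shape of the loop: score each held-out trip of the selected driver against its remaining trips and a random sample of other drivers' trips.

```python
import random
import numpy as np

rng = random.Random(0)

def trip_score(trip_vec, driver_trips, other_trips, n_neg=200):
    """Score how much trip_vec resembles the driver's own trips, against a
    random sample of negatives (centroid distance as a toy classifier)."""
    neg = rng.sample(other_trips, min(n_neg, len(other_trips)))
    pos_centroid = np.mean(driver_trips, axis=0)
    neg_centroid = np.mean(neg, axis=0)
    # higher = closer to the driver's centroid than to the negatives'
    return (np.linalg.norm(trip_vec - neg_centroid)
            - np.linalg.norm(trip_vec - pos_centroid))

# Toy behaviour-space vectors: two trips by the driver, fifty by others
driver = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
others = [np.array([0.0, 1.0]) for _ in range(50)]
own = trip_score(np.array([0.95, 0.05]), driver, others)    # > 0: looks like the driver
foreign = trip_score(np.array([0.0, 0.9]), driver, others)  # < 0: looks like an outlier
```

In the real pipeline these raw scores, computed under several preprocessing sensitivities, are what get ensembled before submission.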

Other outlier detection and clustering techniques that I tried performed less well.

The 3.3 million generated documents are stored in MongoDB, and parallel processing is set up on 4 DigitalOcean Droplets with 8 CPUs each.

An AUC of 0.9 has been measured by Kaggle without any ensembling technique, which confirms the robustness of this approach …

