As more and more businesses are facing credit card fraud and identity theft, the popularity of “fraud detection” is rising in Google Trends.
Companies are looking for credit card fraud detection software that will help eliminate this problem or at least reduce the risk. Before looking at the SPD Group credit card fraud detection project, let’s answer the most common questions:
Fraud detection is a set of activities undertaken to prevent money or property from being obtained under false pretenses.
Models make predictions based on information about a transaction and some contextual (historical) information. To make the model more robust, we kept only the most important features. These were selected using the χ² (chi-square) test, which measures how expected values compare with the data actually observed, and recursive feature elimination.
Neural networks are highly effective when the data scientist has access to a large dataset (say, 100,000 or more data samples). They can find patterns and detect new behavior that deviates too far from the normal flow. However, in our case, we decided to rely on other Machine Learning models such as classification trees, because the RNN did not reach the accuracy we expected, most likely because the dataset was not large enough.
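To make the χ² idea concrete, here is a minimal sketch that scores categorical features against a fraud label and ranks them. It is an illustration only, not the project's pipeline: the feature names `is_new_device` and `is_weekend` and the toy data are hypothetical, and recursive feature elimination is not shown.

```python
from collections import Counter

def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and a label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for y_value, y_count in label_totals.items():
            # Count expected if the feature and the label were independent.
            expected = f_count * y_count / n
            score += (observed[(f_value, y_value)] - expected) ** 2 / expected
    return score

# Hypothetical toy data: 1 = fraudulent, 0 = normal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "is_new_device": [1, 1, 1, 0, 0, 0, 0, 1],
    "is_weekend": [1, 0, 1, 0, 1, 0, 1, 0],
}

# Rank features from most to least associated with the label.
ranked = sorted(
    features,
    key=lambda name: chi_square_score(features[name], labels),
    reverse=True,
)
print(ranked)
```

A higher score means the feature's distribution differs more between fraudulent and normal transactions, so features with the lowest scores are the first candidates to drop.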
Development time – 3 months
Team size – 6 experts
Platform – Web
SPD Group was contacted by an e-commerce and financial services company whose products and services can be paid for using Mobile Money or a bank card (e.g., Visa or MasterCard). The company wanted to make its platform a safer place for online transactions. As a growing number of customers reported money suddenly disappearing or being transferred to an unknown account, the client decided to implement a modern fraud prevention method for the platform, based on Machine Learning, and contacted us.
To dive into the challenges and obstacles of this project, we got a quote from a Machine Learning Engineer from our development team:
“The most complicated part of the solution was to achieve good metrics for users who have made only a few transactions. We could apply the regular model, which is good for users with a rich transaction history, but it would give worse scores if there is a lack of historical data (for example, a new user). Another obvious solution is to treat such users as empty accounts that have only identity information without any transaction history. In this case, we lose the advantage of having at least some data about the users, but the results that such a model provides are quite stable (underfitting). After making a weekly stand up on the matter, we decided to look into ‘few-shot learning’ techniques, which could help us improve our metrics. We have prepared a PoC, but it didn’t give us the drastic improvement we had expected. Nevertheless, we proceeded with experimenting and diving into our client business domain; it allowed us to develop features which have made a huge impact on our model that is based on ‘few-shot learning’ techniques. Because of the domain features, our main score improved by more than 15% and it became the production solution.”
Our R&D team worked on the project for 3 months, using Classification rather than classical Anomaly Detection methods. After an intense feature generation phase (about 700 features in total), they moved on to feature selection to keep only the most relevant ones. Finally, it was a blend of Classification methods such as XGBoost, CatBoost, and LightGBM that got us close to the desired score.
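The article does not say how the models were blended. One common approach is a weighted average of each model's predicted fraud probability, sketched below. The function name, the equal default weights, and the probability values are assumptions for illustration; in practice the inputs would come from the trained classifiers.

```python
def blend_probabilities(model_outputs, weights=None):
    """Weighted average of per-model fraud probabilities for each transaction."""
    names = list(model_outputs)
    if weights is None:
        # Assumption: equal weights unless specified otherwise.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_rows = len(model_outputs[names[0]])
    return [
        sum(weights[name] * model_outputs[name][i] for name in names) / total_weight
        for i in range(n_rows)
    ]

# Hypothetical predicted probabilities for three transactions.
model_outputs = {
    "xgboost": [0.10, 0.80, 0.40],
    "catboost": [0.20, 0.90, 0.50],
    "lightgbm": [0.30, 0.70, 0.30],
}
blended = blend_probabilities(model_outputs)
print(blended)
```

Averaging tends to smooth out the errors of individual models; the weights can be tuned on a validation set if one model is consistently stronger.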
The platform is an e-commerce and financial service app serving 12,000+ customers daily. The dataset included a sample of approximately 140,000 transactions that occurred between October 2018 and April 2019. One of the challenges in fraud detection is that the data is highly imbalanced: around 130,000 transactions were normal, and only about 6% of all transactions were fraudulent. We addressed the imbalance with various techniques such as data oversampling (augmenting the existing data samples) and data sample generation.
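As an illustration of oversampling, the sketch below duplicates randomly chosen minority-class rows until both classes are the same size. This is plain random oversampling under assumed names (`random_oversample`, `minority_label`), run on a toy dataset with a 6% fraud rate; the source does not specify which oversampling or sample generation methods were actually used.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate random minority rows until the classes are balanced."""
    rng = random.Random(seed)
    minority = [row for row, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority samples to oversample")
    majority_count = sum(1 for y in labels if y != minority_label)
    extra_needed = max(0, majority_count - len(minority))
    # Sample with replacement from the existing minority rows.
    extras = [rng.choice(minority) for _ in range(extra_needed)]
    return rows + extras, labels + [minority_label] * extra_needed

# Toy dataset: 6 fraudulent rows (label 1) and 94 normal rows (label 0).
rows = list(range(100))
labels = [1] * 6 + [0] * 94

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(1), balanced_labels.count(0))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the validation or test data.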
Once the Machine Learning-driven fraud protection module was integrated into the e-commerce platform, it started tracking transactions. Whenever a user requests a transaction, the module takes some time to process it. Depending on the predicted fraud probability, there are 3 kinds of possible output:
The model estimates the probability of a fraudulent transaction based on the following transaction information: Date and Time, Product Category, Amount, Provider (Seller), Client Information, Agent Information, Location, and Client’s Behavioral Patterns. Contextual and aggregated data is produced by a Machine Learning engineer based on the previously mentioned data.
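One way to picture the contextual and aggregated data is as per-client statistics computed from earlier transactions only. The sketch below is a hypothetical example: the field names (`client_id`, `timestamp`, `amount`) and the two derived features are assumptions, not the project's actual feature set.

```python
from collections import defaultdict

def add_history_features(transactions):
    """Attach aggregates of each client's earlier transactions to every row."""
    history = defaultdict(list)  # client_id -> amounts seen so far
    enriched = []
    for tx in sorted(transactions, key=lambda t: t["timestamp"]):
        past = history[tx["client_id"]]
        enriched.append({
            **tx,
            "prior_tx_count": len(past),
            "prior_mean_amount": sum(past) / len(past) if past else 0.0,
        })
        # Record the amount only after the features are computed, so a
        # transaction never contributes to its own history.
        past.append(tx["amount"])
    return enriched

# Hypothetical transactions for two clients.
transactions = [
    {"client_id": "a", "timestamp": 1, "amount": 10.0},
    {"client_id": "b", "timestamp": 2, "amount": 5.0},
    {"client_id": "a", "timestamp": 3, "amount": 30.0},
    {"client_id": "a", "timestamp": 4, "amount": 20.0},
]
enriched = add_history_features(transactions)
print(enriched[-1])
```

Using only past transactions matters here: a feature that includes the current or future rows would not be available at prediction time and would inflate offline scores.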
After this solution was implemented, the entire e-commerce platform received tangible benefits. Only 6 months after going into production, we can highlight the following areas:
What we achieved with the solution: More than 140,000 transactions were analyzed, with 6% of fraudulent data points a year. Fewer customers reported fraudulent transactions. Our client’s online card transaction platform became a safer service and gained more loyalty from its customers. We continued to support the project after release because it is very important to retrain the Fraud Detection model whenever new data arrives, so that new fraud schemes and patterns can be learned and detected as early as possible.