With the growth of ecommerce platforms like Shopify, the number of small- and medium-sized ecommerce businesses is rising at an impressive rate. But this growth also creates market opportunities for the online villains and fraudsters out there who are looking to make a quick buck. It used to be that only huge corporations had the resources to detect fraud and protect themselves from its damages. In this era of big data and data science for all, even small mom-and-pop ecommerce shops have access to the tools they need to protect themselves from fraudsters. This article introduces some common sources of fraud in ecommerce and shows how you can use data science technologies and techniques to protect your business (or soon-to-be business) from risk.

At first glance, it’s somewhat difficult to imagine the types of fraud to which a typical ecommerce business is exposed. I mean, you really don’t hear much about ‘ecommerce fraud’, do you? Well, don’t let the silence fool you. As Elli Bishop wrote in a guest post on the Kissmetrics blog, online fraud caused $3.5 billion in damages to the ecommerce industry in 2012 alone, and the damages are increasing every year. Let’s take a look at the ways in which fraudsters are defrauding honest, upstanding online merchants.

One serious problem in ecommerce is Card-Not-Present (CNP) fraud. CNP fraud is a type of credit card transaction fraud in which fraudsters typically use stolen card numbers to make online purchases. Each time a transaction like this goes undetected, the selling merchant is held financially responsible for refunding the fraudulent credit card charge and also absorbs the loss of the merchandise already shipped to the fraudster. Since CNP fraud represents a double loss to selling merchants, most merchants want to do everything they can to prevent fraudulent CNP transactions.

Ecommerce businesses are also exposed to fraud problems associated with *account takeovers*. Account takeovers occur when fraudsters steal account credentials and then use those credentials to unlawfully log in to customers’ accounts and make fraudulent purchases. As with CNP fraud, ecommerce merchants are left liable for the cost of the merchandise that was shipped to the fraudster, plus the expense of reimbursing customers for the fraudulent charges made during the account takeover.

Yikes, ecommerce sounds like risky business, right? Well, that’s not entirely untrue, which is why it’s extremely important to choose your ecommerce solution provider wisely. Ecommerce is hot right now, so new vendors are popping up left and right. While some of these vendors offer very competitively priced packages, more mature vendors offer you an array of options for the support, analytics integrations, and best-of-breed ecommerce security add-ons that you need to keep your business safe and secure.


Sift Science and Feedzai are two excellent fraud detection and prevention add-ons for ecommerce businesses. They’re available to all ecommerce businesses that run on the Shopify ecommerce platform. Since Shopify has been in the ecommerce game longer than most other solution providers, it has had time to put together a solid suite of support offerings. In fact, Shopify offers over 800 add-on applications that its customers can purchase to make their lives easier and more worry-free.

For ecommerce fraud detection and prevention, the Feedzai add-on application is an excellent selection. Feedzai prides itself on delivering powerful machine learning, data science, and big data solutions to small- and medium-sized businesses. Feedzai’s star offering is fraud prevention software that runs on Feedzai’s proprietary risk and fraud detection engine. While of course Feedzai isn’t giving away its secret sauce, it has published a white paper that at least clarifies its basic methods. To summarize the white paper in a few brief words, Feedzai has combined behavioral modeling and profiling, machine learning clustering algorithms, and a rule engine to detect and prevent ecommerce fraud for businesses that run on the Shopify platform.

In fact, it’s not reasonable to expect that you can detect and prevent online fraud simply by deploying a few simple machine learning or statistical algorithms. Ecommerce fraud is a lot more complicated than that: you should expect to incorporate behavioral modeling, a rule engine, and some solid domain expertise to even begin moving towards a solution that will work. That said, machine learning and statistical algorithms are one essential ingredient.
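To make the "rule engine" ingredient a little more concrete, here is a minimal sketch in R. It is not Feedzai’s method or anyone else’s production logic. The column names, the sample orders, and every threshold below are made-up assumptions, chosen only to show how a handful of hand-written rules can flag orders for manual review.

```r
# Hypothetical orders; every column name and value is an assumption for illustration.
orders <- data.frame(
  order_id        = 1:4,
  amount          = c(35, 1200, 60, 80),
  billing_country = c("US", "US", "CA", "US"),
  ship_country    = c("US", "GB", "CA", "US"),
  orders_last_hr  = c(1, 1, 7, 1),
  stringsAsFactors = FALSE
)

# Each rule takes the orders data frame and returns one TRUE/FALSE per order.
# The thresholds are arbitrary placeholders, not recommended values.
rules <- list(
  high_amount      = function(o) o$amount > 1000,
  country_mismatch = function(o) o$billing_country != o$ship_country,
  high_velocity    = function(o) o$orders_last_hr > 5
)

# Apply every rule: rows are orders, columns are rules.
hits <- sapply(rules, function(rule) rule(orders))

# Count the rules that fired for each order; any hit sends the order to review.
orders$rules_fired <- rowSums(hits)
orders$review      <- orders$rules_fired > 0

print(orders[, c("order_id", "rules_fired", "review")])
```

In a real system, rules like these would sit alongside a model score and behavioral profiles instead of replacing them, and the thresholds would come from domain expertise and historical data.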

So, what types of algorithms are useful for detecting and preventing ecommerce fraud? That’s a good question, and there are many options depending on the approach you’d like to take. You can use time series anomaly detection algorithms to automatically detect suspicious or unusual events and trends as they occur. If your time series is periodic, then you’re likely to get good results by applying an aggregating window function and then following that with a k-nearest neighbor algorithm. In R, you can use the following two families of window functions to aggregate your time series data.

- *Cumulative aggregate* window functions in the dplyr package
- *Rolling aggregate* window functions in the RcppRoll package
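If you’d like to see what these two families of aggregates look like, here is a small sketch. To keep it runnable without installing anything, it uses base R only; the comments note the dplyr and RcppRoll functions that compute the same quantities. The daily order counts are made-up numbers.

```r
# Hypothetical daily order counts (made-up numbers for illustration).
daily_orders <- c(10, 12, 11, 13, 12, 40, 11)

# Cumulative aggregates: running total and running mean.
# dplyr::cummean(daily_orders) computes the same running mean.
running_total <- cumsum(daily_orders)
running_mean  <- running_total / seq_along(daily_orders)

# Rolling aggregate: mean over a sliding window of 3 observations.
# embed() lays out each complete window as one row of a matrix;
# RcppRoll::roll_mean(daily_orders, n = 3) is the package equivalent.
window_size  <- 3
rolling_mean <- rowMeans(embed(daily_orders, window_size))

print(running_mean)
print(rolling_mean)
```

Notice how the spike of 40 orders barely moves the running mean but pulls the rolling mean up sharply, which is why rolling aggregates tend to be the more useful of the two for spotting short-lived anomalies.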

As a general approach, you would first divide your time series into windows, and then use a similarity function to calculate an anomaly score for each window. In this manner, you can automate anomaly detection on a time series.
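Here is one way that general approach could be sketched in base R. The series, the window size, the choice of Euclidean distance as the similarity function, and the choice of k are all assumptions made for illustration. The anomaly score used here is the distance from each window to its k-th nearest neighboring window, which is one common option among several.

```r
# Hypothetical transaction counts: six periods of four readings each.
# The fifth period contains a made-up spike.
series <- c(5,  9,  8, 4,
            6, 10,  7, 5,
            5,  9,  9, 4,
            6,  8,  8, 5,
            5, 30, 28, 4,
            6,  9,  8, 5)
window_size <- 4

# Divide the series into windows: one row per window.
windows <- matrix(series, ncol = window_size, byrow = TRUE)

# Similarity function: Euclidean distance between every pair of windows.
d <- as.matrix(dist(windows))

# Anomaly score: distance from each window to its k-th nearest other window.
# sort(row)[1] is the window's distance to itself (0), hence the k + 1.
k <- 2
anomaly_score <- apply(d, 1, function(row) sort(row)[k + 1])

# The window with the highest score is the most unusual one.
most_anomalous <- which.max(anomaly_score)
print(round(anomaly_score, 2))
print(most_anomalous)
```

Windows that look like their neighbors get a small score, while the window holding the spike sits far from all of the others and gets a large one.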

And, just in case you have never used R before, here is a quick intro to get you going in a hurry. The easiest way to set up R on your machine is to download and install the RStudio IDE. The k-nearest neighbor algorithm is in R’s ‘class’ package. You can actually just copy and paste the sample code below to start playing around with classifying data using R’s knn() function.

```r
# Window A cases
A1 <- c(6, 6)
A2 <- c(5.5, 7)
A3 <- c(6.5, 5)

# Window B cases
B1 <- c(9, 8)
B2 <- c(2.2, 2.5)
B3 <- c(100, 0)

# Window C cases
C1 <- c(0, 0)
C2 <- c(1, 1)
C3 <- c(2, 2)

# Build a classification matrix from the points in each of the windows
train <- rbind(A1, A2, A3, B1, B2, B3, C1, C2, C3)

# Window labels vector (attached to each class instance)
cl <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))

# Specify the object to be classified, i.e., the test case
test <- c(2, 98)

# Load the class package so that you can access the knn() function
library(class)

# Call the knn() function and get its summary
summary(knn(train, test, cl, k = 1))
```

**Output results here indicate that the test case has been classified as part of the Window B cluster.**

**A B C**

**0 1 0**


```r
# Window A cases
A1 <- c(6, 6)
A2 <- c(5.5, 7)
A3 <- c(6.5, 5)

# Window B cases
B1 <- c(9, 8)
B2 <- c(2.2, 2.5)
B3 <- c(100, 0)

# Window C cases
C1 <- c(0, 0)
C2 <- c(1, 1)
C3 <- c(2, 2)

# Build a classification matrix from the points in each of the windows
train <- rbind(A1, A2, A3, B1, B2, B3, C1, C2, C3)

# Window labels vector (attached to each class instance)
cl <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))

# The object to be classified, i.e., the test case
test <- c(6, 6.2)

# Load the class package so that you can access the knn() function
library(class)

# Call the knn() function and get its summary
summary(knn(train, test, cl, k = 1))
```

**Output results here indicate that the test case has been classified as part of the Window A cluster.**

**A B C**

**1 0 0**
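Since knn() accepts a matrix of test cases as well as a single vector, the two examples above could also be run in one call. The sketch below reuses the same training points and labels, and stacks the two test cases as rows of a matrix; the variable names `tests` and `predicted` are just illustrative choices.

```r
# Load the class package so that you can access the knn() function
library(class)

# Same training points and window labels as in the examples above
train <- rbind(A1 = c(6, 6),   A2 = c(5.5, 7),   A3 = c(6.5, 5),
               B1 = c(9, 8),   B2 = c(2.2, 2.5), B3 = c(100, 0),
               C1 = c(0, 0),   C2 = c(1, 1),     C3 = c(2, 2))
cl <- factor(rep(c("A", "B", "C"), each = 3))

# One test case per row
tests <- rbind(c(2, 98), c(6, 6.2))

# Classify both test cases at once; the result is one label per row of tests
predicted <- knn(train, tests, cl, k = 1)
print(predicted)
```

The first label corresponds to the test case (2, 98) and the second to (6, 6.2), so you get the same Window B and Window A classifications as before.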

© 2019 Data Science Central ® Powered by
