It seems like every tech company is using machine learning these days. Or, at the very least, talking about it. It’s a cutting-edge concept that’s getting a lot of buzz. But what does it really mean? In this post, I’ll first explain what the term means, and then show how Riskified uses machine learning to drive accuracy when vetting orders for fraud.
How does Netflix know what I want to watch?
Why is Netflix so popular? Yes, they have a great selection of old movies, and their original content keeps getting better. But one of the features that really sets them apart from the competition is the accuracy of their viewing suggestions, tailored to your taste based on what you’ve watched previously. Sometimes it feels like they know exactly what you want to watch. How do they do it?
The answer, as you’ve likely guessed, is machine learning. But before explaining what this term means, let’s understand the alternative:
Before the advent of machine learning, writing code to produce personalized movie suggestions was a tall order. A programmer would have to write explicit if/then rules to cover a nearly infinite range of possibilities. An oversimplified example: if the last movie a user watched to completion and rated highly was a comedy, suggest the most popular comedy on the site that they haven’t yet watched. By programming thousands of rules like this into a decision tree, Netflix could generate a constant stream of title recommendations.
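To see why this approach gets cumbersome, here is a minimal sketch of what such hand-written rules might look like in code. The catalog, field names, and rules below are all invented for illustration; a real system would need thousands of these branches:

```python
# A hypothetical, hand-coded recommender. Every situation a viewer can be in
# must be anticipated by a programmer and turned into an explicit rule.
CATALOG = [
    {"title": "Comedy Hit", "genre": "comedy", "popularity": 98},
    {"title": "Action Flick", "genre": "action", "popularity": 95},
    {"title": "Drama Gem", "genre": "drama", "popularity": 90},
    {"title": "Another Comedy", "genre": "comedy", "popularity": 88},
]

def most_popular_unwatched(user, genre=None):
    """Return the most popular title the user hasn't seen, optionally by genre."""
    candidates = [m for m in CATALOG
                  if m["title"] not in user["watched"]
                  and (genre is None or m["genre"] == genre)]
    return max(candidates, key=lambda m: m["popularity"])["title"] if candidates else None

def recommend(user):
    last = user["last_watched"]
    # Rule 1: finished a comedy and rated it highly -> top unwatched comedy.
    if last["genre"] == "comedy" and last["completed"] and last["rating"] >= 4:
        return most_popular_unwatched(user, genre="comedy")
    # ...thousands more hand-written rules would be needed here...
    return most_popular_unwatched(user)  # fallback: most popular overall
```

Even this toy version shows the problem: the quality of every recommendation is capped by the rules a human thought to write down.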
But is this the best way to do it? Probably not. For one thing, the task of deciding upon and programming these rules is extremely cumbersome. Moreover, the algorithm can perform only as well as the rules the programmer decides on. And regardless of how much research or intuition the programmer has at her disposal, it’s highly unlikely that she will understand all the variables that should be taken into account when deciding on a recommendation, as well as how these different data points should be weighted.
In other words, consistently recommending the best movie to a viewer is a task that is so complex, it’s beyond the capability of a person, or even a group of people. And so an algorithm that is explicitly told how to behave by people will be subject to the same limitations.
This is where machine learning comes in. Instead of telling the computer how to give recommendations, programmers can feed the algorithm historical data and let the machine learn how best to make recommendations.
How do machines “learn”?
There are many subfields of machine learning, but behind all of them is a basic idea: rather than telling a computer how to solve a problem, show the computer relevant information and let it figure out how best to solve it.
So the first thing Netflix had to do was gather a ton of in-depth data. Things like: what titles a user had searched for; metadata on the titles they’d watched (actors, directors, release year); external data about the user, like demographics, region, and language; and much, much more.
In essence, what the architects of the system did was ask the machine: in the past, how good of a predictor was each of these data points when trying to find another film this user would like?
The process for developing a fraud detection algorithm (which I’ll discuss in greater detail below) is very similar. We show a computer millions of orders – all of which are tagged as either legitimate or fraudulent – and ask it to determine retroactively how order data points could have best been considered and weighted to arrive at the correct assessment.
In either case, the resulting algorithm will not be developed by rules, but rather based on historical trends over millions and millions of movie views or online shopping orders.
The inner workings of how a computer might arrive at this learned algorithm can be extremely complicated, but we can use an oversimplified example to get the idea.
Let’s say there are only two movies on Netflix: Die Hard and Snow White. And we have only one user data point to use when figuring out which of these films to recommend: how old the viewer is. We want to figure out which movie to recommend to a 32-year-old.
We gather data from fifty users about their age and whether they liked Die Hard or Snow White more. Here’s what the data looks like:
It might be clear to the naked eye that older viewers tend to prefer Die Hard, and younger ones prefer Snow White. But it’s not obvious what to do with our 32-year-old. So we tell a statistics program like SPSS to plot a line or curve, which summarizes what this data tells us. The results might look like this:
This extremely simple, but perfectly valid, example of a machine learning algorithm is known as logistic regression. It uses the logistic (sigmoid) function to estimate the relationship between a continuous variable (age) and a binary one (which movie the viewer prefers). You can read more about logistic regression here.
The interpretation of this curve is straightforward: at any given age, it gives the computer’s best estimate of the probability that a user prefers one movie over the other (0 being an absolute preference for Snow White, 1 an absolute preference for Die Hard).
So for a 32-year-old, this model tells us there’s about a 40% chance that the user prefers Snow White and a 60% chance that they prefer Die Hard. Die Hard it is.
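In code, reading a recommendation off a fitted logistic curve might look like the sketch below. The coefficients here are invented for illustration, picked so that a 32-year-old lands near the 60/40 split described above; a real model would learn them from data:

```python
import math

def sigmoid(z):
    """The logistic function: squashes any number into a probability (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients (invented for this example):
# the probability of preferring Die Hard rises with age.
INTERCEPT, AGE_WEIGHT = -2.8, 0.1

def p_die_hard(age):
    """Model's estimated probability that a viewer of this age prefers Die Hard."""
    return sigmoid(INTERCEPT + AGE_WEIGHT * age)

print(round(p_die_hard(32), 2))  # about 0.60 -> recommend Die Hard
```

The decision rule is then just a threshold: recommend Die Hard whenever the estimated probability tops 0.5.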
Critically, any model like this will also give a statistic describing the strength of the relationship between each independent variable and the dependent one. In other words, in addition to its movie suggestion, the computer also tells us how strong a predictor age is of which movie the user will like.
Wait. How does the machine know to draw a line like that?
For complex problems: trial and error. This is where the computing power of a machine is important. Unlike the above example, which could theoretically be solved by hand, most machine learning problems deal with hundreds or thousands of input variables. There is no ‘right’ answer to these questions, only a ‘best’ answer. In order to estimate the ideal weight of each input variable, machine learning programs will try a possible weighting scheme, backtest it against the data to see how far off it was, make an adjustment, and try again, and again. This brute-force iterative way of solving problems is sort of like how you know when to stop eating ice cream. When you’re a kid, you always overeat and feel sick. As you gain more experience eating ice cream, you get closer and closer to knowing the ‘optimal’ amount to consume, thanks to simple trial and error.
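Here is a rough sketch of that trial-and-error loop, fitting the age-vs-movie example from earlier with gradient descent. The data, scaling, and learning rate are all made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up training data: (age, 1 = preferred Die Hard, 0 = preferred Snow White).
# Ages are divided by 100 so the weight updates stay well behaved.
data = [(age / 100, liked) for age, liked in
        [(8, 0), (12, 0), (15, 0), (18, 1), (21, 0),
         (25, 1), (30, 0), (35, 1), (44, 1), (52, 1), (60, 1)]]

b0, b1, lr = 0.0, 0.0, 0.5  # start from arbitrary weights
for _ in range(3000):
    # 1) Try the current weights on every example and measure how off each guess is.
    errs = [(sigmoid(b0 + b1 * x) - y, x) for x, y in data]
    # 2) Nudge both weights in the direction that shrinks the error, then repeat.
    b0 -= lr * sum(e for e, _ in errs) / len(errs)
    b1 -= lr * sum(e * x for e, x in errs) / len(errs)

def p_die_hard(age):
    """Learned estimate of the probability a viewer of this age prefers Die Hard."""
    return sigmoid(b0 + b1 * age / 100)
```

After enough iterations, the learned curve separates younger viewers (low probability of preferring Die Hard) from older ones, without anyone ever writing an explicit age rule.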
How can Machine Learning be used to detect fraud?
The same process that Netflix uses to generate tailored film recommendations can be applied to detecting fraud: gather reams of historical data on both legitimate and fraudulent transactions, then tell a computer to find which data points (or combinations of data points) are most important, and how they should be weighted.
Step one is easier said than done; to build an accurate machine learning model you need a LOT of data. Google invested hundreds of millions of dollars in paying people to drive around, in order to amass enough data about routes and traffic to develop the machine learning algorithm behind Google Maps. Many startups hoping to use machine learning solutions run out of funding in the process of gathering enough data to ensure their model’s accuracy.
Riskified took an innovative approach to this problem. We offered to review merchants’ risky orders, ones they planned to decline anyway. They would pay only for orders we approved, and any mistakes we made were backed by our chargeback guarantee. There was no downside for merchants to give us a try. By manually reviewing and tagging orders for a year, we amassed enough data to develop our first machine learning models.
For every order we process, we have thousands of data points, including some we collect ourselves: our proprietary web beacon gathers information on shoppers’ behavior while they’re on our clients’ eCommerce sites, and we use technology developed in-house to determine whether or not a shopper is using a proxy server.
Though our human analysts are extremely accurate at fraud review, there are advantages to having a computer do this task:
- It’s scalable – crucial for a rapidly growing company
- It’s faster. Even the best minds need a few minutes to make an order decision. Computers take milliseconds.
- Computers can be recalibrated far more easily.
It’s the idea of recalibration, in particular, that makes machine learning such a good fit for fraud detection. And that’s where deep learning comes in.
Deep learning, neural networks, and fraud detection
The cutting edge of machine learning is what’s known as ‘deep learning.’ This is a sophisticated form of problem-solving, which utilizes a learning structure inspired by the human brain.
In this formulation, a sample first passes through a series of nodes that classify it according to very basic criteria. From there, it travels to a second series of nodes, which classify it according to slightly narrower criteria. This process is repeated until the computer reaches a ‘decision’.
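A minimal sketch of that layered structure is below. The weights and input features are invented for illustration; in a real network they would be learned from data, not hand-picked:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One series of nodes: each node weighs every incoming signal,
    adds its bias, and emits a new signal for the next series."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Invented numbers for a tiny 3-input network: two hidden layers of nodes,
# then a single output node producing the final 'decision'.
order_features = [0.9, 0.1, 0.4]  # e.g. hypothetical normalized order signals
h1 = layer(order_features, [[1.2, -0.7, 0.3], [0.5, 0.9, -1.1]], [0.1, -0.2])
h2 = layer(h1, [[0.8, -0.4], [-0.6, 1.0]], [0.0, 0.3])
decision = layer(h2, [[1.5, -1.5]], [0.0])[0]  # a probability between 0 and 1
```

Each layer refines the previous one’s output, which is what lets deep networks capture subtler patterns than a single curve can.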
The key advantage of this setup is its superior ability to learn from its own mistakes. While more traditional machine learning programs tend to experience diminishing returns as they process more and more data, neural networks are likely to ‘learn’ nearly as much from the billionth sample as from the millionth.
This makes deep learning very useful to the field of fraud prevention. Unlike movie suggestions, where the factors influencing customer preferences are unlikely to change dramatically, fraudsters are constantly honing their craft. When prevention systems decline fraudsters’ orders, they don’t try the same thing again; they figure out why they were caught and try to adapt. For this reason, a fraud prevention tool that is sluggish to adapt to new information will struggle to vet orders efficiently.
I hope you found this introduction to machine learning and its applications to fraud detection helpful. To learn more about how Riskified’s deep learning fraud solution can help your business detect fraud more efficiently, request a demo of our product.