Improved Fraud Detection with RediSearch

By Ron, August 30, 2022 (Basics)

If you’re somewhat involved in the world of Event Driven Design (or Software Engineering for that matter), you most certainly have heard of Kafka. At the same time, being part of any software community, I presume you’re familiar with our beloved Redis.

Kafka is a distributed streaming platform with tons of cool applications that allows us to build scalable and performant distributed systems. It generally serves as a high-performance communication channel between Microservices, while keeping an immutable log of data in store for other applications.

Redis is an in-memory data store which also has lots of interesting applications.

When it comes to Redis, you’re likely more familiar with its open-source, data-store-only version, which we can use as a high-performance cache. Through the years, though, Redis has been building a collection of data models called Redis Stack, most of them under their enterprise offering.

Why is this relevant at all?

In [smart] Software Engineering, we’re in the business of not reinventing the wheel, and Redis Stack adds quite a few ready-made modern data models that make our life so much simpler.

In this article I will focus on RediSearch, but feel free to have a look at their full offering in the official documentation.

A little bit more about RediSearch

I particularly like Redis in general for many of my EDA applications. I work in an environment where event-driven is the norm, and Redis data stores allow us to have efficient caches all over our Microservices in order to speed things up. The case for using Redis as a cache is pretty well made for sure.

When you’re working in event-driven setups, more often than not you’ll need a local cache to aggregate event data. For example, if you have incoming user profile change events, you may have a Microservice that reads from the stream and aggregates a consolidated user profile. This is where Redis comes in handy.

You want something that is really fast as a local cache, and because of the nature of Kafka, you can rebuild these caches by reprocessing the Kafka topics, for example when launching a new Microservice.

[Figure: Simple service consuming from a topic and writing to Redis]
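To make the "rebuild the cache by reprocessing the topic" idea a bit more concrete, here is a minimal sketch. It uses a plain Map as a stand-in for Redis and an array as a stand-in for the Kafka topic, and the event shape (userId plus a changes object) is an assumption of mine, not something Kafka or Redis prescribe.

```javascript
// Hypothetical event shape: { userId: string, changes: { [field]: string } }.
// Folds a single profile-change event into the cache (a Map standing in for Redis).
function applyProfileEvent(cache, event) {
  const current = cache.get(event.userId) ?? {};
  // Later events overwrite earlier values for the same field.
  cache.set(event.userId, { ...current, ...event.changes });
  return cache;
}

// Rebuilding the cache amounts to replaying every event from the start of the topic.
function rebuildProfiles(events) {
  return events.reduce(applyProfileEvent, new Map());
}

const replayed = rebuildProfiles([
  { userId: 'u1', changes: { name: 'John' } },
  { userId: 'u1', changes: { email: 'john.wick@defonotfrauddomain.com' } },
  { userId: 'u1', changes: { name: 'Jonathan' } },
]);
console.log(replayed.get('u1'));
// { name: 'Jonathan', email: 'john.wick@defonotfrauddomain.com' }
```

Because the consolidated profile is derived purely from the events, a brand new service can start from an empty cache and end up with the same state as its older siblings.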

Now, RediSearch is a bit more interesting.

According to Redis (formerly Redis Labs), RediSearch is a powerful indexing, querying, and full-text search engine for Redis.

So, what does that mean exactly?

As you may have guessed by now, RediSearch allows you to run searches or queries over your Redis data in a performant way.

Take for example standard Redis. You can have a primary index on a user profile hash in the following format:

users:profiles:user_id

You can perform a search operation with that key, and Redis will use the primary index to find it and retrieve what’s stored in there. This is the standard HGETALL users:profiles:user_id command.

With RediSearch, you could create a secondary index on the same key, based on its contents. If the user profile contains a name and an email, you could index the name field and perform search operations based on that name as well.

Pretty cool, huh?
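If the primary versus secondary index distinction feels abstract, this toy in-memory model may help. It is only an illustration of the concept, not how RediSearch is implemented, and the function names are made up for the example.

```javascript
// Primary index: key -> profile, which is what plain Redis gives us.
const primary = new Map();
// Secondary index: name -> set of keys, the kind of structure RediSearch maintains for us.
const byName = new Map();

function saveProfile(key, profile) {
  const previous = primary.get(key);
  // Drop the stale secondary entry when an existing profile changes its name.
  if (previous) byName.get(previous.name)?.delete(key);
  primary.set(key, profile);
  if (!byName.has(profile.name)) byName.set(profile.name, new Set());
  byName.get(profile.name).add(key);
}

// Lookup by content instead of by key.
function findKeysByName(name) {
  return [...(byName.get(name) ?? [])];
}

saveProfile('users:profiles:1', { name: 'John', email: 'john.wick@defonotfrauddomain.com' });
saveProfile('users:profiles:2', { name: 'John', email: 'john@example.com' });
console.log(findKeysByName('John'));
// [ 'users:profiles:1', 'users:profiles:2' ]
```

The bookkeeping in saveProfile is exactly the part you no longer have to write or keep consistent yourself once the index lives in RediSearch.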

You can run complex aggregations, store analytics data, search using plain text or even fuzzy matching and more.

I bet you can see how this can be used in a distributed environment. You could potentially have different data streams, collecting and aggregating data from different parts of your system, then build local caches across your different services and index data in multiple ways using RediSearch.

For example, you could aggregate key metrics about your business in order to give decision makers the upper hand, or you could store real time analytics and monitor performance, etc.

How is this different to using Redis out of the box?

RediSearch is a module built on top of Redis, and it’s a ready-made solution, meaning we don’t have to model any of the data stores it contains. The interface is also different, and it comes super-loaded with tons of new commands and features. It’s a product of its own that you would otherwise have had to build yourself.

How can we use this for fraud detection?

Let’s pick a particular use case and see how RediSearch can help us out.

Fraud detection is a field full of interesting challenges and opportunities. Problems are complex and bad actors are always at the door waiting to try new approaches.

Then it’s up to the poor souls in Software Engineering to stand our ground and do something about it 😅

I’m going to describe a very generic use case of mob fraud, which can be adapted to any type of fraud detection, as the specifics of the fraud detection itself are not relevant right now.

The problem

Let’s say we have a system that is prone to be abused by groups of users with similar behaviours, which means that a user committing fraud is very likely to have similar characteristics to other users committing fraud.

You can approach this problem in many different ways. You could cluster users with similar behaviours using Machine Learning, or you could have specific rules that determine and detect what fraud means to you, or you could have analysts checking users’ profiles and looking for unusual flags, or all of the above.

In fact, it is likely that you’d be using multiple techniques and checks to capture abusers.

Knowing that similar abusers are likely to behave in similar ways, how can you take advantage of this behaviour and capture others once you’ve determined that one user is fraudulent?

Using standard Redis caches

In a standard modern architecture we may have a system with a few Microservices, communicating via Kafka, and storing their data in local Redis caches.

Something that looks like this:

[Figure: Simple EDA setup, with multiple consumers and producers]

In this scenario we have a service exclusively for fraud-detection, listening to multiple data streams and aggregating all that data into Redis.

A user profile could be stored in a Redis hash like this:

# key: users:profiles:my_user_id
{
  name: John,
  lastname: Wick,
  email: john.wick@defonotfrauddomain.com,
  registration_ip_address: 156.33.241.5,
  latest_ip_address: 156.33.241.5,
  country: usa,
}

Let’s create it using the node-redis library:

import { createClient, SchemaFieldTypes } from 'redis';
const client = createClient();
await client.connect();

await client.hSet('users:profiles:my_user_id', {
  name: 'John',
  lastname: 'Wick',
  email: 'john.wick@defonotfrauddomain.com',
  registration_ip_address: '156.33.241.5',
  latest_ip_address: '156.33.241.5',
  country: 'usa'
});

Note that in practice we would’ve been building this profile from the different incoming data streams that we have.

Now that the data is aggregated in one place, we can run checks on it.

For example, we could run checks on the incoming location against the registration IP address of the user.

Let’s say we receive a login event with the following IP address: 91.218.114.206. We can store it, check against the registration IP of the user and notice they’re wildly different, so this could be a basic location check.

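await client.hSet('users:profiles:my_user_id', {
  latest_ip_address: '91.218.114.206'
});

// Read both addresses back from the aggregated profile
const [registrationIp, latestIp] = await client.hmGet(
  'users:profiles:my_user_id',
  ['registration_ip_address', 'latest_ip_address']
);

// isIpAddressSuspicious is a placeholder for your own location rule
if (isIpAddressSuspicious(registrationIp, latestIp)) {
  // Do something about this login attempt
}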

This is a pretty straightforward example of a basic fraud check using Redis as our data store.
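The isIpAddressSuspicious helper is left undefined above, so here is one deliberately naive way it could look. The rule itself (two IPv4 addresses are suspicious when they don't share their first two octets) is purely an assumption for illustration; a real check would more likely rely on a geolocation or reputation lookup.

```javascript
// Parses a dotted-quad IPv4 string into four octets, or returns null if malformed.
function parseIpv4(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// ASSUMPTION: "suspicious" means the addresses do not share their first two octets.
// Malformed input is treated as suspicious so it gets reviewed rather than ignored.
function isIpAddressSuspicious(registrationIp, latestIp) {
  const a = parseIpv4(registrationIp);
  const b = parseIpv4(latestIp);
  if (!a || !b) return true;
  return a[0] !== b[0] || a[1] !== b[1];
}

console.log(isIpAddressSuspicious('156.33.241.5', '91.218.114.206')); // true
console.log(isIpAddressSuspicious('156.33.241.5', '156.33.10.9')); // false
```

Keeping the rule in a small pure function like this makes it easy to swap in something smarter later without touching the Redis code around it.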

Including RediSearch

To make things interesting, we could leverage the data already in Redis about our users and create relations between them via RediSearch.

Having this IP address data inside Redis, we could potentially assume that users flagged as suspicious were connecting from suspicious locations.

Based on that assumption we could search for users that have used this IP before.

One solution is to store all incoming IP addresses and their users in a primary index, in a key like users:ips:my_ip, but this is probably not a nice solution, as it would entail storing more data than we need.

An alternative is to create a secondary index on the user’s IP address using RediSearch, which will allow us to search for all users whose profile contains this IP, and then run individual quality checks on the resulting users.

RediSearch will now fit into our model attached to the Redis local cache as a module:

[Figure: RediSearch attaches to Redis as a module]

To create the secondary index we could use the same node-redis library:

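await client.ft.create('idx:user_ip_address', {
  latest_ip_address: {
    // TAG gives exact matching; a TEXT field would split the IP at every dot
    type: SchemaFieldTypes.TAG,
    SORTABLE: true
  }
}, {
  ON: 'HASH',
  PREFIX: 'users:profiles:'
});

This will create a secondary index on latest_ip_address called idx:user_ip_address, and it will operate on the hashes whose keys are prefixed with users:profiles:. The field is indexed as a TAG rather than TEXT because we want exact matches on the whole address, and the @field:{...} query syntax used in the search only applies to TAG fields.

With this index in place, we can now search for all related users with this IP address and run quality checks on them:

const ipAddressToCheck = '91.218.114.206';

// Punctuation inside a tag query must be backslash-escaped (91\.218\.114\.206)
const escapedIp = ipAddressToCheck.replace(/[^a-zA-Z0-9]/g, '\\$&');

// Finding users whose latest IP address matches
const usersWithSameIp = await client.ft.search(
  'idx:user_ip_address',
  `@latest_ip_address:{${escapedIp}}`
);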

// Sending users for further inspection
if (usersWithSameIp.total > 0) {
  kafkaProduce('users.suspicious-ips', usersWithSameIp.documents)
}

Notice how we sent all the matching users into Kafka, so other consumers in the system can run further quality checks on each of these users and implement more extensive fraud detection rules for them.
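If you would rather send one message per user instead of the whole documents array, a small mapping step does the trick. The sketch below assumes the reply shape node-redis returns for ft.search ({ total, documents: [{ id, value }] }); the message shape (key plus a JSON string value) and the function name are my own choices for the example.

```javascript
const KEY_PREFIX = 'users:profiles:';

// Turns a search reply into one message per matching user.
// Keying each message by user id lets a partitioner keep a user's messages together.
function toSuspiciousIpMessages(searchReply, matchedIp) {
  return searchReply.documents.map(({ id, value }) => {
    // Strip the Redis key prefix so downstream consumers only see the user id.
    const userId = id.startsWith(KEY_PREFIX) ? id.slice(KEY_PREFIX.length) : id;
    return {
      key: userId,
      value: JSON.stringify({ userId, matchedIp, profile: value }),
    };
  });
}

// Hand-written reply, standing in for a real ft.search result.
const exampleReply = {
  total: 1,
  documents: [
    { id: 'users:profiles:my_user_id', value: { name: 'John', latest_ip_address: '91.218.114.206' } },
  ],
};
console.log(toSuspiciousIpMessages(exampleReply, '91.218.114.206')[0].key); // my_user_id
```

Including the matched IP in every message means downstream consumers know why a user was flagged without having to query Redis again.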

Just like that we have slightly improved our fraud detection by checking users with similar traits using RediSearch.

You can slice and dice this use case however you see fit, but in a nutshell this is the principle to follow:

1. Aggregate some data from your streams into Redis.
2. Create secondary indexes on your data based on its nature.
3. Run further searches to extend your checks.

Improving using RedisJSON

Similarly to RediSearch, you can make your life easier with RedisJSON. Instead of having your data stored in Hashes, you could store a complex JSON object for your users (or any data model), and index the paths in the same way.

For example, you could have your user’s profile segmented in a different way, like so:

{
  profile: {
    name: John,
    lastname: Wick,
    email: john.wick@defonotfrauddomain.com,
  },
  location: {
    registration_ip_address: 156.33.241.5,
    latest_ip_address: 156.33.241.5,
    country: usa,
  },
  ...
}

Storing data in this format is fairly simple using RedisJSON:

await client.json.set('users:json_profiles:my_user_id', '$', {
  profile: {
    name: 'John',
    lastname: 'Wick',
    email: 'john.wick@defonotfrauddomain.com',
  },
  location: {
    registration_ip_address: '156.33.241.5',
    latest_ip_address: '156.33.241.5',
    country: 'usa',
  }
});

Again, you would probably build this object slowly and enrich it over time.

Then to create an index on the IP address, all you have to do is to reference the path:

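// The index needs its own name: idx:user_ip_address is already taken by the hash index
await client.ft.create('idx:user_json_reg_ip_address', {
  '$.location.registration_ip_address': {
    // TAG gives exact matching on the whole address
    type: SchemaFieldTypes.TAG,
    SORTABLE: true,
    AS: 'reg_ip_address'
  }
}, {
  ON: 'JSON',
  PREFIX: 'users:json_profiles:'
});

And to use it, the same idea applies:

// Punctuation inside a tag query must be backslash-escaped
const escapedIp = ipAddressToCheck.replace(/[^a-zA-Z0-9]/g, '\\$&');

const usersWithSameIp = await client.ft.search(
  'idx:user_json_reg_ip_address',
  `@reg_ip_address:{${escapedIp}}`
);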

The applications for this are numerous. You could check for users’ email addresses with the same domain, or with similar names, or look for relations in any other user data you are collecting.
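As a rough sketch of the email-domain idea: one option is to store the domain in its own field when you write the profile and index that field as a TAG. The email_domain field name and the helper names below are hypothetical, and the escaping follows the usual rule that punctuation inside a tag query needs a backslash.

```javascript
// Extracts the lower-cased domain of an email address, or null if there is none.
function emailDomain(email) {
  const at = email.lastIndexOf('@');
  if (at < 1 || at === email.length - 1) return null;
  return email.slice(at + 1).toLowerCase();
}

// Backslash-escapes every character that is not a letter, digit or underscore.
function escapeTagValue(value) {
  return value.replace(/[^a-zA-Z0-9_]/g, '\\$&');
}

// Builds a tag query against the hypothetical email_domain field.
function buildSameDomainQuery(email) {
  const domain = emailDomain(email);
  return domain === null ? null : `@email_domain:{${escapeTagValue(domain)}}`;
}

console.log(buildSameDomainQuery('john.wick@defonotfrauddomain.com'));
// @email_domain:{defonotfrauddomain\.com}
```

The resulting string can be passed to client.ft.search in the same way as the IP address queries.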

Alternative solutions

If you weren’t using RediSearch to find these associations, you would still have many options available, but not many that are this simple and efficient.

Storing data in a relational database

You could have your IP data stored in a relational database and run queries against the incoming IP addresses, but this will typically be slower than querying data that is already in memory.

Storing relations in a graph database

You could model the relations between users and their IP addresses in a graph database, but for a simple lookup like this one it is unlikely to be as fast as RediSearch, plus the data model is more complex and requires more maintenance.

Use only the primary index

You could create another primary index, but as mentioned before we would be duplicating data without any real need.

Conclusions

Fraud detection is a world with interesting problems to solve, and as you can see, it’s worth exploring what options are out there to stay on top of your users’ behavioural patterns.

Redis Stack offers a big suite of nice tools to help you in your quest, and to be honest they’re quite fun to use.

Don’t try to reinvent the wheel; use battle-tested solutions instead.

Additionally, Redis has a nice Enterprise offering with Redis Cloud, which includes all these tools and is the fastest way to get up to speed with Redis and all of its components.

You can get started for free and build up from there.

Happy coding!

This post is in collaboration with Redis.
