Sessionization is a powerful tool for combating abuse and is often used to detect anomalies, identify bots, cluster accounts, and surface other risky behavior. In this post we will explore how sessionization works and some interesting properties associated with sessions.
A session is “a unit of measurement of a user’s actions taken within a period of time or with regard to completion of a task.” In other words, sessions allow us to cluster activity from a particular time window together.
SQRL (the Smyte Query and Rules Language) lets you create sessions using custom keys across arbitrary timespans, in real time, as your data streams in. Smyte’s definition of a session is slightly broader in that we do not restrict sessions to just users. In fact, we can create sessions keyed off of IP addresses, credit cards, URLs, or any other feature, without having to wait for complex, expensive batch jobs to run.
Let’s imagine a simple case where we wanted to create a one hour session for a user. We can accomplish this with a single line of code:
LET UserHourSession := sessionize(BY User EVERY 1 HOUR);
From now on, whenever the user performs an action (logs in, sends a message, makes a payment, etc.), we will check to see if a session exists. If no session exists, we will create a session that lasts for one hour.
If the session does exist, we extend the current session’s window so that it expires one hour after the latest action.
If the specified time elapses with no activity, the current session is destroyed. Any subsequent event will create a new session.
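The lifecycle described above can be modeled in a few lines of Python. This is only a toy, in-memory sketch of an inactivity-based session, not Smyte’s implementation: the Sessionizer class, its observe method, and the integer IDs are hypothetical names chosen for illustration, and timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Dict

HOUR = 3600  # seconds


@dataclass
class Session:
    session_id: int
    created_at: int
    expires_at: int


class Sessionizer:
    """Toy model of an inactivity-based session (hypothetical, for illustration)."""

    def __init__(self, window_seconds: int = HOUR) -> None:
        self.window = window_seconds
        self.sessions: Dict[str, Session] = {}
        self.next_id = 1

    def observe(self, key: str, now: int) -> Session:
        current = self.sessions.get(key)
        if current is None or now >= current.expires_at:
            # No live session for this key: start a new one with a fresh ID.
            current = Session(self.next_id, now, now + self.window)
            self.next_id += 1
            self.sessions[key] = current
        else:
            # Live session: push the expiry to one window past this event.
            current.expires_at = now + self.window
        return current


sessionizer = Sessionizer()
first_id = sessionizer.observe("alice", 0).session_id
second_id = sessionizer.observe("alice", 1800).session_id   # 30 minutes later
third_id = sessionizer.observe("alice", 5000).session_id    # still inside the window
fourth_id = sessionizer.observe("alice", 8600).session_id   # a full idle hour has passed
```

In this sketch the first three events share one session because each arrives before the current expiry, while the fourth arrives exactly one idle hour after the third and therefore starts a new session.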
It is important to note that each session gets its own unique ID. This identifier can be used in simple counters, unique counters, or rate-limits to identify suspect behavior and can even be used to determine the age of the session since we embed timestamps in all Smyte IDs.
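To illustrate how an age can be recovered from an ID alone, here is a minimal Python sketch. The bit layout is an assumption made up for this example (a millisecond timestamp in the high bits and a 16-bit sequence number in the low bits); the actual format of Smyte IDs is not described in this post.

```python
SEQUENCE_BITS = 16  # hypothetical layout, chosen only for illustration
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


def make_id(created_at_ms: int, sequence: int) -> int:
    """Pack a creation timestamp and a sequence number into one integer."""
    return (created_at_ms << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK)


def created_at_ms(session_id: int) -> int:
    """Recover the creation timestamp from the high bits of the ID."""
    return session_id >> SEQUENCE_BITS


def age_in_hours(session_id: int, now_ms: int) -> float:
    """Age of the session at time now_ms, derived from the ID alone."""
    return (now_ms - created_at_ms(session_id)) / 3_600_000


example_id = make_id(1_000_000, 7)
example_age = age_in_hours(example_id, 1_000_000 + 7_200_000)  # two hours later
```

Because the creation time travels with the ID, no separate lookup is needed to ask how old a session is.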
In our first example, we created a simple session with no conditions; however, we can easily add qualifiers with a WHERE clause.
Let’s imagine we want to create a session for any time a user sends a message.
We could accomplish this with the following line of code:
LET UserMessageHourSession := sessionize(BY User EVERY 1 HOUR WHERE ActionName = "send_message");
In this case, we only consider message events as valid actions to be grouped in a session.
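The effect of the condition can be sketched in Python as a predicate applied before any session bookkeeping happens. The assign_sessions function, the event tuple shape, and the where parameter are hypothetical names for this illustration, not part of SQRL.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[str, str, int]  # (user, action_name, timestamp_seconds)


def assign_sessions(
    events: List[Event],
    window: int = 3600,
    where: Callable[[Event], bool] = lambda event: True,
) -> List[Optional[int]]:
    """Return a session ID per event, or None for events the filter rejects."""
    state: Dict[str, Tuple[int, int]] = {}  # user -> (session_id, expires_at)
    next_id = 1
    assigned: List[Optional[int]] = []
    for event in events:
        user, _action, ts = event
        if not where(event):
            # Filtered events neither create nor extend a session.
            assigned.append(None)
            continue
        current = state.get(user)
        if current is None or ts >= current[1]:
            session_id = next_id
            next_id += 1
        else:
            session_id = current[0]
        state[user] = (session_id, ts + window)
        assigned.append(session_id)
    return assigned


events = [
    ("alice", "send_message", 0),
    ("alice", "login", 3000),
    ("alice", "send_message", 4000),
]
unfiltered = assign_sessions(events)
messages_only = assign_sessions(events, where=lambda e: e[1] == "send_message")
```

Without the filter, the login at 3000 seconds keeps the session alive and all three events share it. With the filter, the login is ignored, so the second message arrives after the hour has lapsed and starts a new session.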
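As mentioned above, Smyte IDs embed timestamps, so a session’s age can be read from its unique ID. To show how this is useful in detecting bots, let’s return to the message session from the example above. By calling the dateDiff function on the session, we can see how many hours have elapsed since it was created; the rule further below refers to this value as MessageHourSessionAgeInHours.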
LET UserMessageHourSession := sessionize(BY User EVERY 1 HOUR WHERE ActionName = "send_message");
Remember that as soon as an hour elapses with no activity, the session is destroyed, and the next event creates a new one. This means that if a user has a message session that is 24 hours old, they have been sending messages for 24 hours without ever pausing for a full hour! This type of behavior often indicates that the user is in fact a bot.
With this insight, you can easily create a rule to flag and review this suspicious behavior.
CREATE RULE DayLongMessageSession WHERE MessageHourSessionAgeInHours >= 24
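To see why a 24-hour threshold separates the two kinds of traffic, the following Python sketch replays message timestamps for a single user and applies the same comparison as the rule. The function names and the sample traffic are assumptions made up for this illustration.

```python
from typing import List


def session_ages_in_hours(timestamps: List[int], window: int = 3600) -> List[float]:
    """For each message time (in seconds), the age of the session it falls in."""
    ages: List[float] = []
    created_at = None
    expires_at = 0
    for ts in timestamps:
        if created_at is None or ts >= expires_at:
            created_at = ts  # previous session lapsed, start a new one
        expires_at = ts + window
        ages.append((ts - created_at) / 3600)
    return ages


def day_long_message_session(age_in_hours: float, threshold: float = 24) -> bool:
    """Mirror of the rule condition: session age of at least 24 hours."""
    return age_in_hours >= threshold


# A sender that posts every 30 minutes for 25 hours never lets the session lapse.
bot_times = list(range(0, 25 * 3600 + 1, 1800))
# A sender with long breaks keeps starting fresh sessions.
human_times = [0, 1800, 10 * 3600, 10 * 3600 + 600, 30 * 3600]

bot_flagged = any(day_long_message_session(a) for a in session_ages_in_hours(bot_times))
human_flagged = any(day_long_message_session(a) for a in session_ages_in_hours(human_times))
```

The steady sender’s session age climbs past the threshold, while the sender who takes breaks longer than an hour keeps resetting to zero and is never flagged.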