How NimbleEdge enables on-device event stream capture to power session-aware AI
Introduction: What is session-aware personalization?
Mobile apps across verticals today offer a staggering variety of choices: thousands of titles on OTT apps, hundreds of restaurants on food delivery apps, and dozens of hotels on travel booking apps. While this abundance should be a convenience for users, the sheer breadth of the assortment often causes choice paralysis, contributing to low conversion and high customer churn for apps.
Many apps have turned to user experience personalization to tackle this challenge, delivering personalized listings on the app homepage, in search results, and in recommendations, based on the customer’s historical preferences. This approach has been partially successful, but most enterprises rely only on historical customer data, which is often outdated, and miss out on rich insights from real-time user interactions that more accurately indicate a user’s immediate intent.
That is where session-aware personalization comes in. Session-aware personalization leverages users’ real-time, in-session interactions (e.g. clicks, search queries, cart additions) to understand their current intent, and accordingly tailor their app experience. Leading apps like Netflix, Instacart, and Alibaba’s Taobao have deployed such systems, driving strong conversion and engagement benefits. At NimbleEdge, we take this further by enabling session-aware personalized experiences directly on users’ mobile devices using our on-device AI platform, making real-time personalization efficient and endlessly scalable.
Building session-aware personalization models
Session-aware personalization is performed using AI models that take the user’s real-time clickstream data as input and return personalized rankings as output. These rankings form the basis for real-time tailored user experiences (e.g. top search results for a query, top items to display on a user’s homepage feed). Creating such AI models is therefore the first step towards enabling session-aware personalization; the models are then deployed and executed on users’ mobile devices using NimbleEdge.
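To make the input and output concrete, here is a toy sketch of the idea: in-session events go in, a reordered candidate list comes out. It is a simple counting heuristic, not a trained model and not NimbleEdge's implementation; the event weights, the `rerank` function, and the sample data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical event weights: stronger intent signals count for more.
EVENT_WEIGHTS = {"click": 1.0, "search": 2.0, "add_to_cart": 3.0}


def rerank(session_events, candidates):
    """Order candidates by affinity to categories seen in the current session.

    session_events: list of (event_type, category) tuples, oldest first.
    candidates: list of (item_id, category) tuples in their default order.
    """
    affinity = Counter()
    for event_type, category in session_events:
        affinity[category] += EVENT_WEIGHTS.get(event_type, 0.0)
    # sorted() is stable, so candidates with equal affinity keep their default order.
    return sorted(candidates, key=lambda item: -affinity[item[1]])


# Illustrative session on a food delivery app (made-up data).
session = [("search", "pizza"), ("click", "pizza"), ("click", "sushi")]
candidates = [("r1", "burgers"), ("r2", "sushi"), ("r3", "pizza")]
ranked = rerank(session, candidates)
```

A production model would learn these signals from large volumes of clickstream data rather than hard-coding weights, which is why data collection matters so much.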
Building such a model involves several challenges, which make session-aware modeling a highly time- and resource-intensive effort:
- Need for massive volume of data:
  - Session-aware modeling requires a huge amount of granular user clickstream data for training, such as clicks, search queries, cart additions, and purchase completions
  - This data captures the sequential nature of user behavior, enabling the model to understand user intent and how it correlates with user interactions within and across sessions
  - At scale, this can mean capturing billions of interactions for apps with high traffic, such as e-commerce apps processing millions of user sessions daily
  - A dataset of this size is essential to ensure the model generalizes across diverse user behavior and delivers personalized, relevant responses that improve user engagement and experience
- Data accuracy and pre-processing:
  - After collection, the data must be cleaned to remove incomplete or erroneous records (e.g. accidental clicks)
  - The massive raw user event dataset then needs to be transformed into a format suitable for further analysis, for example by filtering user event streams down to the data points of interest, or by enriching data with contextual information such as product category names or prices
- Bandwidth and cost-intensiveness:
  - Performing these accuracy checks and transformations is highly time-consuming for AI teams, whose time and expertise come at a premium. Combined with the infrastructure costs of storing and transferring large datasets, these processes can quickly overwhelm budgets and timelines
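The cleaning and enrichment steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Event` fields, the 0.5-second "accidental click" rule, and the `CATALOG` lookup are hypothetical, and real pipelines would typically run at far larger scale with more careful heuristics.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    event_type: str
    item_id: str
    timestamp: float  # seconds since session start


# Hypothetical product catalog used for enrichment.
CATALOG = {
    "p1": {"category": "shoes", "price": 59.0},
    "p2": {"category": "bags", "price": 120.0},
}

# Assumption: a click followed by another event within 0.5 s was accidental.
ACCIDENTAL_CLICK_WINDOW = 0.5


def clean(events):
    """Drop incomplete events and clicks that look accidental."""
    complete = [e for e in events if e.user_id and e.item_id]
    complete.sort(key=lambda e: (e.user_id, e.timestamp))
    kept = []
    for current, following in zip(complete, complete[1:] + [None]):
        bounced = (
            current.event_type == "click"
            and following is not None
            and following.user_id == current.user_id
            and following.timestamp - current.timestamp < ACCIDENTAL_CLICK_WINDOW
        )
        if not bounced:
            kept.append(current)
    return kept


def enrich(events, catalog):
    """Attach category and price to each event (None if the item is unknown)."""
    rows = []
    for e in events:
        product = catalog.get(e.item_id, {})
        rows.append({
            "user_id": e.user_id,
            "event_type": e.event_type,
            "item_id": e.item_id,
            "timestamp": e.timestamp,
            "category": product.get("category"),
            "price": product.get("price"),
        })
    return rows


raw = [
    Event("u1", "click", "p1", 10.0),        # followed 0.2 s later: treated as accidental
    Event("u1", "click", "p2", 10.2),
    Event("u1", "add_to_cart", "p2", 15.0),
    Event("u1", "click", "", 20.0),          # incomplete: no item_id
]
rows = enrich(clean(raw), CATALOG)
```

Even in this toy form, the code has to encode product knowledge (what counts as accidental, which fields matter), which is part of why these steps consume so much of an AI team's time.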
Data collection for session-aware models and the challenge with CDPs
We’ve already established that creating large, accurate, clickstream datasets is essential for building session-aware personalization models. We’ll now shift our attention to a popular current approach for data collection (i.e. using CDPs), and associated challenges.
Customer Data Platforms (CDPs), such as Segment, Amperity, and Clevertap, are software solutions that consolidate and organize data from various sources to create comprehensive, unified user profiles. They integrate with customers’ app, website, CRM software and marketing automation tools, and keep a record of users’ demographic data, marketing campaign data, as well as behavioral data, such as product-user interactions. This data is then primarily used to inform customer segmentation, personalization, and analytics for marketing campaigns.
Given these use cases, the users and buyers of these platforms are usually marketing teams. While CDPs collect clickstream data that is valuable for AI teams, they are not purpose-built to support session-aware modeling, leading to several major challenges when leveraging CDPs for this task:
- Data transfer costs: Transferring data from CDPs to your own cloud storage is often time-consuming as well as prohibitively expensive for the large datasets required to train session-aware models
- Format inflexibility: The data formats for event streams in CDPs are not optimized for analysis by AI teams, requiring significant bandwidth for transformation or ETL tasks to make them usable for training
- Low customizability: Additionally, CDPs offer AI teams limited customizability in terms of data formats, as their primary users are largely marketing and front-end teams. Since data collection exercises for AI use-cases are often run only for short periods and needs vary from use-case to use-case, CDPs are especially reluctant to offer customizations to cater to ML teams
To illustrate these challenges, we share a quote below by the VP of AI and Data Science at a leading food delivery app, highlighting why they have struggled to use clickstream data from their CDP for session-aware modeling use-cases.
The solution: On-device clickstream data collection with NimbleEdge
Given our focus on session-aware personalization, NimbleEdge offers an on-device data warehouse that can help circumvent the challenges associated with clickstream data collection from CDPs. This on-device data warehouse captures and securely stores real-time user interactions directly on users’ own devices. These interactions, such as clicks, search queries, and cart additions, can then be seamlessly transferred to cloud storage, and easily leveraged for training session-aware models.
Purpose-built for AI teams, this solution enables them to quickly collect the requisite clickstream data with high accuracy to train session-aware models. Unlike CDPs, this solution also eliminates data transfer costs by leveraging over-the-air updates, making it cost-effective to create the large clickstream datasets required. By offering high-accuracy, readily usable data, NimbleEdge reduces the time ML teams spend on preprocessing, enabling faster iterations on session-aware models. This accelerates deployment timelines and improves model performance through quicker feedback loops.
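As a rough mental model of the capture-then-transfer flow, the sketch below buffers events in a local SQLite table and drains them in batches to an upload callback. It does not show the NimbleEdge SDK: the `LocalEventStore` class, its `add_event` and `drain` methods, and the upload callback are hypothetical names, and Python's built-in `sqlite3` stands in for whatever storage a real on-device warehouse uses.

```python
import json
import sqlite3
import time


class LocalEventStore:
    """Hypothetical on-device buffer: append events locally, drain them in batches."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "event_type TEXT NOT NULL, payload TEXT NOT NULL, ts REAL NOT NULL)"
        )

    def add_event(self, event_type, payload):
        """Persist one user interaction on the device."""
        self.db.execute(
            "INSERT INTO events (event_type, payload, ts) VALUES (?, ?, ?)",
            (event_type, json.dumps(payload), time.time()),
        )
        self.db.commit()

    def drain(self, upload, batch_size=100):
        """Send the oldest events to `upload`; delete them only if it succeeds."""
        rows = self.db.execute(
            "SELECT id, event_type, payload, ts FROM events ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return 0
        batch = [
            {"event_type": t, "payload": json.loads(p), "ts": ts}
            for _, t, p, ts in rows
        ]
        upload(batch)  # if this raises, the rows stay on the device for a retry
        self.db.execute("DELETE FROM events WHERE id <= ?", (rows[-1][0],))
        self.db.commit()
        return len(rows)


store = LocalEventStore()
store.add_event("search", {"query": "pizza"})
store.add_event("add_to_cart", {"item_id": "r3"})

uploaded = []
sent = store.drain(uploaded.extend)  # list.extend stands in for a cloud upload call
```

Deleting rows only after the upload call returns means a failed transfer leaves events on the device, at the cost of possible duplicates if a failure happens after the server has already received the batch.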
In the diagram below, we share a high-level overview of how this system operates:
Impact
NimbleEdge’s on-device data warehouse lets AI teams collect user event stream data at scale, with the following key benefits:
- Lower data transfer costs: With NimbleEdge's on-device data warehouse enabling direct transfer of event stream data from end-user devices to cloud storage, AI teams can eliminate the data transfer costs typically incurred when using CDPs
- Control over data format: AI teams can toggle the user event stream data points they want to ingest using the NimbleEdge SaaS platform, providing flexibility in data collection and minimizing the time required to pre-process collected data into a format suitable for further analysis
- Faster time to market: With simplified user event stream data collection in the requisite formats, AI teams can train session-aware models faster, as well as iterate quicker, cutting down the time to deployment
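The idea of toggling which data points are ingested can be illustrated with a small filter driven by a configuration object. The config shape, the event names, and the `apply_config` function are assumptions for illustration only; the article does not describe the actual format used by the NimbleEdge platform.

```python
# Hypothetical collection config, of the kind a team might toggle from a dashboard.
COLLECTION_CONFIG = {
    "search": {"enabled": True, "fields": ["query"]},
    "add_to_cart": {"enabled": True, "fields": ["item_id", "price"]},
    "scroll": {"enabled": False, "fields": []},
}


def apply_config(event_type, payload, config):
    """Return the trimmed payload, or None if the event type is not collected."""
    rule = config.get(event_type)
    if rule is None or not rule["enabled"]:
        return None
    return {key: payload[key] for key in rule["fields"] if key in payload}


kept = apply_config(
    "add_to_cart",
    {"item_id": "r3", "price": 12.5, "debug": "x"},
    COLLECTION_CONFIG,
)
dropped = apply_config("scroll", {"offset": 300}, COLLECTION_CONFIG)
```

Filtering at the point of collection means unwanted event types and fields never reach cloud storage, so less pre-processing is needed before training.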
This enhancement reinforces NimbleEdge's commitment to delivering efficient, cost-effective, and scalable solutions for session-aware personalization, even in the most demanding data environments. In the next post in this series, we share more about how NimbleEdge enables on-device processing of user event streams using Python scripts, which further refines the data ingested into cloud storage to limit storage costs and minimize the time needed for pre-processing. This is especially relevant in scenarios where the event payload is very large, such as when it is the response from a backend API. Stay tuned!
To learn more about how NimbleEdge drives real-time AI-driven personalized experiences at scale, visit nimbleedge.com or reach out to contact@nimbleedge.com