Session Replays: Technical Documentation and Data Collection
To record website visitor sessions, we use rrweb, an open source JavaScript library for recording and replaying user interactions in web applications. Session Replay is available to everyone with a Marketing Analytics subscription, but it is not enabled by default: an account admin or owner must list the sites and domains they want to enable it for. This ensures that no visitor data is collected unless the website owner has knowingly and deliberately approved it.
How it works:
When the Session Replay feature is enabled for a domain or site through the Siteimprove platform, the analytics script is updated with additional code that records a number of visits to the selected websites. These recordings can then be played back in the Siteimprove platform.
- Initial baseline: The first thing collected is a full snapshot of the page content, because the replay system needs a complete description of the page (DOM tree, styles, scroll positions, input states, etc.) to accurately reconstruct the starting point.
- Snapshot contains:
- The full DOM tree (elements, attributes, and text nodes).
- Applied styles.
- The current state of inputs (e.g., text fields, checkboxes, radio buttons).
- Viewport size and scroll positions.
- NOTE: masking and blocking of content will be handled prior to the snapshot being taken and sent to Siteimprove.
- Incremental snapshots: After the baseline is established, we primarily record incremental snapshots, which are small diffs (e.g., “user clicked here,” “this text changed,” “scroll moved”) rather than capturing the entire DOM again.
- Incremental snapshots are more efficient because they only record changes.
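The relationship between the baseline and the diffs that follow it can be illustrated with a minimal sketch. The types and field names below are simplified, hypothetical stand-ins and not the actual rrweb or Siteimprove event format; the only point is that a replayable recording starts with one full snapshot and continues with incremental ones.

```typescript
// Simplified, hypothetical stand-ins for recorded events (not the real wire format).
type FullSnapshot = { kind: "full"; timestamp: number; html: string };
type IncrementalSnapshot = { kind: "incremental"; timestamp: number; change: string };
type ReplayEvent = FullSnapshot | IncrementalSnapshot;

// A recording can only be reconstructed if it begins with a full snapshot.
function startsWithBaseline(events: ReplayEvent[]): boolean {
  return events.length > 0 && events[0].kind === "full";
}

// Counts how many events of each kind a recording contains.
function countByKind(events: ReplayEvent[]): { full: number; incremental: number } {
  const totals = { full: 0, incremental: 0 };
  for (const event of events) {
    totals[event.kind] += 1;
  }
  return totals;
}

// Example recording: one baseline followed by two small diffs.
const recording: ReplayEvent[] = [
  { kind: "full", timestamp: 0, html: "<html><body><input id='q'><p>Welcome</p></body></html>" },
  { kind: "incremental", timestamp: 120, change: "click #q" },
  { kind: "incremental", timestamp: 480, change: "scroll y=300" },
];
```

In this example, `startsWithBaseline(recording)` is `true` and `countByKind(recording)` reports one full snapshot and two incremental snapshots.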
What type of changes do incremental snapshots collect?
The following list of changes will be captured and sent to Siteimprove endpoints:
- DOM changes
  - Node creation, deletion
  - Node attribute changes
  - Text changes
- Mouse movement
- Mouse interaction
  - mouse up, mouse down
  - click, double click, context menu
  - focus, blur
  - touch start, touch move, touch end
- Page or element scrolling
- Window size changes
- Input
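One way to picture the categories above is as a tagged union, where each incremental change carries a source and a small payload. This is a sketch under assumptions: the `source` labels and payload fields are hypothetical names chosen to mirror the list, not rrweb's internal identifiers.

```typescript
// Hypothetical model of the change categories listed above.
type IncrementalChange =
  | { source: "dom"; detail: "nodeAdded" | "nodeRemoved" | "attributeChanged" | "textChanged" }
  | { source: "mouseMove"; x: number; y: number }
  | {
      source: "mouseInteraction";
      action:
        | "mouseUp" | "mouseDown"
        | "click" | "doubleClick" | "contextMenu"
        | "focus" | "blur"
        | "touchStart" | "touchMove" | "touchEnd";
    }
  | { source: "scroll"; x: number; y: number }
  | { source: "resize"; width: number; height: number }
  | { source: "input"; value: string };

// Produces a human-readable description of a single change.
function describeChange(change: IncrementalChange): string {
  switch (change.source) {
    case "dom":
      return `DOM change: ${change.detail}`;
    case "mouseMove":
      return `Mouse moved to (${change.x}, ${change.y})`;
    case "mouseInteraction":
      return `Mouse interaction: ${change.action}`;
    case "scroll":
      return `Scrolled to (${change.x}, ${change.y})`;
    case "resize":
      return `Window resized to ${change.width}x${change.height}`;
    case "input":
      return `Input changed (${change.value.length} characters)`;
  }
}
```

For example, `describeChange({ source: "scroll", x: 0, y: 300 })` returns `"Scrolled to (0, 300)"`.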
Data masking and blocking
To ensure that no undesired data is collected, you can either mask inputs and elements before the data is transferred to Siteimprove or block an input or element from being tracked altogether. Both can be configured using a number of predefined content and input types or by setting up custom rules based on CSS or element attributes.
You can access the configuration of data masking and blocking under Analytics -> Analytics Settings -> Tracking -> Session replay settings.
Masking and blocking are two distinct things:
Masking: You can mask the content and input of selected elements. Input fields and elements often contain sensitive information, so consider which of them you want to mask. Masking converts the input to an anonymized value, e.g., the text "hello" would be turned into "*****".
Blocking: Elements and inputs that are blocked will be disregarded completely. This means that, e.g., inputting the text “hello” would not pass any information about the interaction to Siteimprove.
NOTE: masking and blocking of content will be handled prior to the initial snapshot being taken. This means that if, e.g., tel is blocked, the initial snapshot will not include information about elements with the type tel.
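The difference between masking and blocking can be sketched as a small function that decides what would be transferred for a given input value. The `Rule` type and `applyRule` function are hypothetical names used for illustration only; they are not part of the Siteimprove script or rrweb.

```typescript
// Hypothetical rule names: leave the value as-is, mask it, or block it.
type Rule = "none" | "mask" | "block";

// Returns what would be transferred for an input value under the given rule.
// A blocked value yields null, meaning nothing about it is sent at all.
function applyRule(value: string, rule: Rule): string | null {
  switch (rule) {
    case "mask":
      // Replace every character with an asterisk, keeping only the length.
      return value.replace(/[\s\S]/g, "*");
    case "block":
      return null;
    default:
      return value;
  }
}
```

With this sketch, `applyRule("hello", "mask")` returns `"*****"`, while `applyRule("hello", "block")` returns `null`.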
The predefined content and input types available for masking are listed below:
Input Type | Description |
---|---|
Color | The color type is used for allowing user selection of a color. |
Date | The date type is used when the user should input a date. We have decided to treat all date- and time-related inputs as a single entity, which means this setting will also be used for masking inputs of type: datetime-local, month, time, week. |
Email | The email type is used when the user should input an e-mail. |
Hidden | The hidden type is used when the value is not supposed to be shown but still be part of a form payload. Even though the value is visually hidden, it can still be viewed by inspecting the DOM. Hidden input fields can contain sensitive data. This type of input will always be masked. |
Number | The number type is used when the user should input a number. |
Password | The password type is used when confidential input, such as a password, is to be entered. This type of input will always be masked. |
Range | The range type is used when the user should input a number within a specific range. |
Search | The search type is used when the user should input something to search for. |
Select | Select elements are used when the user should select one or more values from a predefined list of values. |
Tel | The tel type is used when the user should input a telephone number. |
Text | The text type is used when the user should input some generic text. |
Textarea | Textarea elements are used when the user should be able to input one or more lines of text. |
URL | The url type is used when the user should input a URL. |
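Two rules from the table lend themselves to a short sketch: hidden and password inputs are always masked, and the Date setting also covers the other date- and time-related input types. The function below expands a list of selected settings into the concrete input types that would end up masked. The function name and the shape of the selection are assumptions made for illustration.

```typescript
// Input types that the table above says are always masked.
const ALWAYS_MASKED: string[] = ["hidden", "password"];

// The Date setting covers all date- and time-related input types.
const DATE_GROUP: string[] = ["date", "datetime-local", "month", "time", "week"];

// Expands a (hypothetical) list of selected settings into concrete input types.
function expandMaskedTypes(selected: string[]): string[] {
  const result: string[] = ALWAYS_MASKED.slice();
  for (const setting of selected) {
    const types = setting === "date" ? DATE_GROUP : [setting];
    for (const type of types) {
      // Avoid duplicates, e.g. when "password" is selected explicitly.
      if (result.indexOf(type) === -1) {
        result.push(type);
      }
    }
  }
  return result;
}
```

For example, `expandMaskedTypes(["email", "date"])` yields hidden, password, email, date, datetime-local, month, time, and week.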
Additional information about sessionStorage
As part of collecting data for Session Replay, we make use of a sessionStorage property to identify the same active session across different windows and tabs within the same browser.
The sessionStorage object will be named _si_ctxm. The data in sessionStorage is only kept for the duration of the page session.
Window: sessionStorage property - Web APIs | MDN
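As a rough illustration of how a script might reuse a value kept in sessionStorage, the sketch below reads an identifier stored under the `_si_ctxm` key and creates one if none exists. Only the key name comes from this article. The stored value format, the `getOrCreateSessionId` function, and the `createId` callback are assumptions, and the in-memory storage exists only so the example can run outside a browser.

```typescript
// Minimal subset of the Web Storage interface used by this sketch.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key name taken from the article; what is stored under it is an assumption.
const SESSION_KEY = "_si_ctxm";

// Returns the identifier stored under the key, creating one on first use.
function getOrCreateSessionId(storage: StorageLike, createId: () => string): string {
  const existing = storage.getItem(SESSION_KEY);
  if (existing !== null) {
    return existing;
  }
  const id = createId();
  storage.setItem(SESSION_KEY, id);
  return id;
}

// In-memory stand-in for window.sessionStorage, for illustration only.
function createMemoryStorage(): StorageLike {
  const data: { [key: string]: string } = {};
  return {
    getItem: (key: string) =>
      Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null,
    setItem: (key: string, value: string) => {
      data[key] = value;
    },
  };
}
```

In a browser, `window.sessionStorage` could be passed in place of the in-memory storage, since it provides the same `getItem` and `setItem` methods.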
IP anonymization
To ensure no IP address can be directly connected to a session replay, you can enable IP anonymization for accounts and sites where session replay data is being collected. You can read about IP anonymization here: IP Anonymization in Siteimprove Analytics
Data retention and sampling for session replays
Data retention
Data collected for the Session Replays feature has a different retention period from the other data collected by the Siteimprove analytics script. We have chosen to limit the retention period for various reasons:
- Specific replays lose relevance over time - it’s rarely relevant to view how a specific visitor engaged with your website two years ago
- To ensure high standards within data privacy - not storing data longer than needed
- Session replays require a substantial amount of data storage
The default data retention period for session replay data is 30 days. It is possible for customers to purchase an extended retention period of up to 365 days.
Sampling rate
When you enable Session Replay, the default sampling rate is 0.25% of visits to your website. This is included in your Marketing Analytics subscription. If you need to collect and watch additional replays, reach out to your Siteimprove contact to inquire about a higher sampling rate.
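To get a feel for what the default rate means in practice, the expected number of recorded sessions can be estimated as visits multiplied by the sampling percentage. The helper below is a simple arithmetic sketch with a hypothetical function name. Because sampling is applied per visit, the actual number of replays will vary around this estimate.

```typescript
// Default sampling rate from the article, expressed as a percentage.
const DEFAULT_SAMPLING_PERCENT = 0.25;

// Estimates how many visits would be recorded at a given sampling percentage.
function expectedReplays(
  visits: number,
  samplingPercent: number = DEFAULT_SAMPLING_PERCENT
): number {
  return Math.round((visits * samplingPercent) / 100);
}
```

For example, a site with 100,000 visits in a period would expect roughly `expectedReplays(100000)`, i.e. 250 replays, at the default rate.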
Additional resources
For more information about the rrweb technology, please see the technical documentation provided on GitHub.
https://github.com/rrweb-io/rrweb/blob/master/docs/observer.md