evitaDB - Fast e-commerce database

Change data capture

Change data capture (CDC) is a design pattern used to track and capture changes made to the schema and data of a database. evitaDB supports CDC through all its APIs, allowing developers to monitor and respond to data changes in near real time in their preferred programming language. This document explains how to implement CDC using our API.

The database maintains a so-called Write-Ahead Log (WAL) that records all changes made to the database. This log is used to ensure data integrity and durability, but it can also be, and in evitaDB actually is, leveraged to implement change data capture (CDC) functionality. Once the catalogue is switched to the ACTIVE (transactional) stage, clients can start consuming information about changes made to both the schema and the data in the catalogue.
There is also a special CDC available for the entire database engine that allows clients to monitor high-level operations such as catalogue creation, deletion, and other global events (for more details, consult the Control Engine chapter).
Change data capture is not available for catalogues in the WARMING_UP stage, since the WAL is not recorded during that phase. This phase is considered "introductory" and clients should not query the data in that phase anyway. Clients should wait until the catalogue reaches the ACTIVE stage and treat all the data at that moment as a consistent snapshot of the first version of the catalogue.

Engine and catalogue-level CDCs cannot be combined into a single stream since they operate on different levels (engine vs. catalogue). Catalogue-level CDC is always tied to a particular catalogue (name). If you need to capture all changes across all catalogues, you need to subscribe to engine-level CDC and then, for each catalogue separately, to its catalogue-level CDC. The engine-level CDC notifies about catalogue creation/deletion events, so clients can dynamically subscribe/unsubscribe to catalogue-level CDCs as catalogues are created/deleted.

The basic principle in all APIs is the same:

  1. clients define a predicate/condition that specifies which changes they are interested in,
  2. define a starting point in the form of a catalogue version from which they want to start receiving changes,
  3. and subscribe to the change stream.

From that point onwards, clients will receive notifications about all changes that match their criteria. The changes are delivered in the order they were made, ensuring that clients can process them sequentially. The second step is optional — if no starting version is specified, the change stream will start from the next version of the catalogue.
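The three steps above can be sketched with the standard java.util.concurrent.Flow API that evitaDB's CDC publishers are built on. This is a minimal, self-contained illustration: the String event type, the FilteringSubscriber class and the predicate are stand-ins for the real capture classes and request criteria, not the actual evitaDB API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Predicate;

public class CdcSketch {

    /** Subscriber that keeps only the events matching the client-defined predicate (step 1). */
    static final class FilteringSubscriber implements Flow.Subscriber<String> {
        final Predicate<String> criteria;
        final List<String> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        FilteringSubscriber(Predicate<String> criteria) { this.criteria = criteria; }

        @Override public void onSubscribe(Flow.Subscription s) {
            s.request(Long.MAX_VALUE);   // unbounded demand for the demo
        }
        @Override public void onNext(String event) {
            if (criteria.test(event)) received.add(event);
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    public static List<String> demo() {
        FilteringSubscriber subscriber = new FilteringSubscriber(e -> e.startsWith("DATA"));
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);             // step 3: subscribe
            publisher.submit("SCHEMA: attribute added"); // events are delivered in order
            publisher.submit("DATA: entity 1 upserted");
            publisher.submit("DATA: entity 2 removed");
        }                                                // close() signals onComplete
        try {
            subscriber.done.await();                     // wait for async delivery to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // only the "DATA" events survive the predicate
    }
}
```

The same publisher/subscriber contract (demand signalling, ordered delivery, completion) applies to the real CDC streams; only the event and request types differ.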

Hierarchy of mutations

Not all mutations operate on the same level, and some mutations may encapsulate others. For example, when an entity is upserted, the upsert may contain multiple nested mutations (multiple attribute, associated data, price operations, etc.). Mutations thus form a hierarchy in which container-level mutations wrap the local mutations they consist of.

When you don't specify any filtering criteria, you will receive all mutations in flattened form, i.e. you will receive all mutations regardless of their hierarchy. So, for example, an entity attribute upsert will be delivered once as part of the entity upsert mutation and once as a standalone attribute upsert mutation. In practice, a client usually wants either high-level information about entity changes (so only entity mutations) or very specific low-level changes (e.g. only changes to attributes of a particular name). The approach with a simple flattened stream that is filtered by a single predicate covers all these use cases very well, and it is very easy to understand and implement.
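The flattened delivery can be illustrated with a small model; the Mutation record and the type names below are illustrative stand-ins, not evitaDB classes. A container mutation and each of its nested local mutations all appear as separate items in the flattened stream, and a single predicate picks the level of detail the client cares about:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenedStream {
    record Mutation(String type, String name, List<Mutation> nested) {
        Mutation(String type, String name) { this(type, name, List.of()); }
        /** Depth-first flattening: the mutation itself, then all nested mutations. */
        Stream<Mutation> flatten() {
            return Stream.concat(Stream.of(this), nested.stream().flatMap(Mutation::flatten));
        }
    }

    public static List<String> filter(String wantedType) {
        // an entity upsert encapsulating three local mutations
        Mutation entityUpsert = new Mutation("ENTITY_UPSERT", "product-1", List.of(
            new Mutation("ATTRIBUTE_UPSERT", "name"),
            new Mutation("ATTRIBUTE_UPSERT", "code"),
            new Mutation("PRICE_UPSERT", "basic")
        ));
        return entityUpsert.flatten()
            .filter(m -> m.type().equals(wantedType))   // the client's single predicate
            .map(Mutation::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // a low-level client filters on ATTRIBUTE_UPSERT;
        // a high-level client would filter on ENTITY_UPSERT instead
        System.out.println(filter("ATTRIBUTE_UPSERT"));
    }
}
```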

Engine change capture

The engine-level capture stream accepts a request for creating the Java Flow Publisher. One or more clients may then subscribe to this publisher to receive event instances representing the changes made to the engine.

The request allows you to specify the following parameters:

long sinceVersion (optional)

The catalogue version (inclusive) from which you want to start receiving changes. If not specified, the change stream will start from the next version of the catalogue (i.e. the changes made to the catalogue in the future).

int sinceIndex (optional)

The index of the mutation within the same transaction from which you want to start receiving changes. If not specified, the change stream will start from the first mutation of the specified version. The index allows you to precisely specify the starting point in case you have already processed some mutations of the specified version.

content

Enumeration that specifies whether the client wants detailed information about each mutation or only high-level information that a particular type of mutation occurred. The enumeration has the following values:

  • HEADER - only the header of the event is sent
  • BODY - the entire body of the mutation triggering the event is sent
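As a sketch, the request parameters described above could be modelled as a simple record; the CaptureRequest and Content names below are hypothetical stand-ins, not the actual evitaDB request classes:

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

public class EngineCaptureRequestSketch {
    enum Content { HEADER, BODY }

    record CaptureRequest(OptionalLong sinceVersion, OptionalInt sinceIndex, Content content) {
        /** No starting point: the stream begins with the next version of the catalogue. */
        static CaptureRequest fromNextVersion(Content content) {
            return new CaptureRequest(OptionalLong.empty(), OptionalInt.empty(), content);
        }
    }

    public static void main(String[] args) {
        // resume precisely: version 42, skipping the first 3 already-processed mutations
        CaptureRequest resume = new CaptureRequest(OptionalLong.of(42), OptionalInt.of(3), Content.BODY);
        CaptureRequest fresh  = CaptureRequest.fromNextVersion(Content.HEADER);
        System.out.println(resume + " / " + fresh);
    }
}
```

The optional sinceVersion/sinceIndex pair is what makes exactly-once resumption possible after a client restart.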

Engine capture events are represented by event instances that contain the following information:
long version

The version of evitaDB where the mutation occurs.

int index

The index of the mutation within the same transaction. Index 0 is always reserved for an infrastructure mutation.
operation

Classification of the mutation, defined by an enumeration with the following values:

  • UPSERT - Create or update operation. If there was data with such identity before, it was updated. If not, it was created.
  • REMOVE - Remove operation - i.e. there was data with such identity before, and it was removed.
  • TRANSACTION - Delimiting operation signaling the beginning of a transaction.
body (optional)

Optional body of the operation, present only when requested by the content parameter of the request.

Catalogue change capture

The catalogue-level capture stream accepts a request for creating the Java Flow Publisher. One or more clients may then subscribe to this publisher to receive event instances representing the changes made to the catalogue.

The request allows you to specify the following parameters:

long sinceVersion (optional)

The catalogue version (inclusive) from which you want to start receiving changes. If not specified, the change stream will start from the next version of the catalogue (i.e. the changes made to the catalogue in the future).

int sinceIndex (optional)

The index of the mutation within the same transaction from which you want to start receiving changes. If not specified, the change stream will start from the first mutation of the specified version. The index allows you to precisely specify the starting point in case you have already processed some mutations of the specified version.

criteria (optional)

Array of criteria that specify which changes you are interested in. If not specified, all changes are captured. If multiple criteria are specified, matching any of them is sufficient (OR logic). Each criterion consists of:

  • area - the capture area to watch (SCHEMA, DATA, or INFRASTRUCTURE)
  • site - the capture site providing fine-grained filtering within the selected area
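The OR semantics of multiple criteria can be sketched as follows; the Event and Criterion records are simplified stand-ins for the real capture classes, and an unset (null) entityType on a criterion is treated as "match any":

```java
import java.util.List;

public class CriteriaOrLogic {
    record Event(String area, String entityType) {}

    record Criterion(String area, String entityType) {
        boolean matches(Event e) {
            return area.equals(e.area())
                && (entityType == null || entityType.equals(e.entityType()));
        }
    }

    /** Combine criteria with OR; an empty criteria list means "capture everything". */
    static boolean captured(List<Criterion> criteria, Event e) {
        return criteria.isEmpty() || criteria.stream().anyMatch(c -> c.matches(e));
    }

    public static void main(String[] args) {
        List<Criterion> criteria = List.of(
            new Criterion("SCHEMA", null),      // all schema changes
            new Criterion("DATA", "Product")    // data changes of Product entities only
        );
        System.out.println(captured(criteria, new Event("DATA", "Product"))); // matches 2nd criterion
        System.out.println(captured(criteria, new Event("DATA", "Brand")));   // matches neither
    }
}
```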
content

Enumeration that specifies whether the client wants detailed information about each mutation or only high-level information that a particular type of mutation occurred. The enumeration has the following values:

  • HEADER - only the header of the event is sent
  • BODY - the entire body of the mutation triggering the event is sent

Catalogue capture events are represented by event instances that contain the following information:
long version

The version of the catalogue where the mutation occurs.

int index

The index of the mutation within the same transaction. Index 0 is always reserved for an infrastructure mutation.
area

The area of the operation that was performed:

  • SCHEMA - changes in the schema are captured
  • DATA - changes in the data are captured
  • INFRASTRUCTURE - infrastructural mutations that are neither schema nor data
String entityType (optional)

The name of the entity type that was affected by the operation. This field is null when the operation is executed on the catalogue schema itself.

Integer entityPrimaryKey (optional)

The primary key of the entity that was affected by the operation. Only present for data area operations.

operation

Classification of the mutation, defined by an enumeration with the following values:

  • UPSERT - Create or update operation. If there was data with such identity before, it was updated. If not, it was created.
  • REMOVE - Remove operation - i.e. there was data with such identity before, and it was removed.
  • TRANSACTION - Delimiting operation signaling the beginning of a transaction.
body (optional)

Optional body of the operation, present only when requested by the content parameter of the request.

Capture areas and sites

Catalogue CDC distinguishes between three different capture areas that correspond to different types of operations:

Schema capture area

The schema capture area tracks changes to the catalogue schema and entity schemas. This includes operations like:

  • Creating, updating, or removing entity schemas
  • Modifying entity attributes, references, and associated data definitions
  • Changing catalogue-level schema settings
The schema area uses a schema capture site for filtering, which allows you to specify:
String entityType (optional)

Filter by specific entity type name. If not specified, changes to all entity types are captured.

operation (optional)

Filter by one or more operation types. If not specified, all operations are captured. Possible values:

  • UPSERT - Create or update operation
  • REMOVE - Remove operation
containerType (optional)

Filter by one or more container types. If not specified, changes to all container types are captured. Possible values:

  • CATALOG - Catalogue-level schema changes
  • ENTITY - Entity schema changes
  • ATTRIBUTE - Attribute schema changes
  • ASSOCIATED_DATA - Associated data schema changes
  • PRICE - Price schema changes
  • REFERENCE - Reference schema changes

Data capture area

The data capture area tracks changes to entity data within the catalogue. This includes operations like:

  • Creating, updating, or removing entities
  • Modifying entity attributes, references, and associated data values
  • Updating prices and hierarchical placement
The data area uses a data capture site for filtering, which allows you to specify:
String entityType (optional)

Filter by specific entity type name. If not specified, changes to all entity types are captured.

Integer entityPrimaryKey (optional)

Filter by specific entity primary key. If not specified, changes to all entities are captured.

operation (optional)

Filter by one or more operation types. If not specified, all operations are captured. Possible values:

  • UPSERT - Create or update operation
  • REMOVE - Remove operation
containerType (optional)

Filter by one or more container types. If not specified, changes to all container types are captured. Possible values:

  • ENTITY - Entity-level changes
  • ATTRIBUTE - Attribute value changes
  • ASSOCIATED_DATA - Associated data value changes
  • PRICE - Price changes
  • REFERENCE - Reference changes
String[] containerName (optional)

Filter by one or more specific container names (e.g. a specific attribute name such as name or code). If not specified, changes to all containers are captured.
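A data-site filter with these optional fields can be sketched as follows, treating every unset (null) field as "match everything"; the DataSite and Change records are illustrative stand-ins, not the actual evitaDB API:

```java
import java.util.Set;

public class DataSiteFilter {
    record Change(String entityType, int primaryKey, String operation,
                  String containerType, String containerName) {}

    record DataSite(String entityType, Integer entityPrimaryKey,
                    Set<String> operation, Set<String> containerType, Set<String> containerName) {
        /** Every non-null field narrows the filter; null means "no restriction". */
        boolean matches(Change c) {
            return (entityType == null || entityType.equals(c.entityType()))
                && (entityPrimaryKey == null || entityPrimaryKey.intValue() == c.primaryKey())
                && (operation == null || operation.contains(c.operation()))
                && (containerType == null || containerType.contains(c.containerType()))
                && (containerName == null || containerName.contains(c.containerName()));
        }
    }

    public static void main(String[] args) {
        // capture only UPSERTs of the "code" attribute on Product entities
        DataSite site = new DataSite("Product", null,
            Set.of("UPSERT"), Set.of("ATTRIBUTE"), Set.of("code"));
        Change hit  = new Change("Product", 1, "UPSERT", "ATTRIBUTE", "code");
        Change miss = new Change("Product", 1, "UPSERT", "ATTRIBUTE", "name");
        System.out.println(site.matches(hit) + " " + site.matches(miss));
    }
}
```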

Infrastructure capture area

The infrastructure capture area tracks transaction-related and other infrastructural mutations that don't fit into schema or data categories. This includes:

  • Transaction delimiting operations
  • System-level operations
The infrastructure area does not use any capture site for filtering; currently, it captures all infrastructure mutations.

No filtering parameters

The infrastructure area captures all transaction and system-level mutations without any filtering options. To capture infrastructure mutations, specify CaptureArea.INFRASTRUCTURE in your criteria without a capture site.

This area exists separately because transaction boundaries and system operations are orthogonal to both schema and data changes, and clients may need to track transaction boundaries independently for proper event grouping and consistency guarantees.
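Because TRANSACTION delimiters arrive in-stream, a client can regroup the flattened events into per-transaction batches. A minimal sketch, using a simplified event representation rather than the real capture classes:

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionGrouping {
    record Event(String operation, String payload) {}

    /** Start a new batch whenever a TRANSACTION delimiter arrives. */
    static List<List<Event>> groupByTransaction(List<Event> stream) {
        List<List<Event>> batches = new ArrayList<>();
        List<Event> current = null;
        for (Event e : stream) {
            if ("TRANSACTION".equals(e.operation())) {
                current = new ArrayList<>();   // transaction boundary: open a new batch
                batches.add(current);
            } else if (current != null) {
                current.add(e);                // mutation belongs to the open transaction
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<Event>> batches = groupByTransaction(List.of(
            new Event("TRANSACTION", "v42"),
            new Event("UPSERT", "entity 1"),
            new Event("UPSERT", "entity 2"),
            new Event("TRANSACTION", "v43"),
            new Event("REMOVE", "entity 1")
        ));
        System.out.println(batches.size() + " transactions");
    }
}
```

Processing a whole batch atomically (e.g. applying it to a downstream cache only when the next delimiter arrives) is one way to preserve the consistency guarantees mentioned above.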

How to set up a new catalogue change capture

Setting up catalogue change capture differs from engine change capture in that it operates on the catalogue level.

Frequently asked questions regarding the change capture mechanism

Do I need to keep a reference to the publisher to prevent it from being garbage collected?

No, you can let it be garbage collected. The publisher is just a factory for creating subscribers. Once the subscriber is created and subscribed, it maintains its own state and connection to the engine. A reference to the subscriber is kept in the evitaDB (client) instance, which prevents it from being garbage collected as long as the instance is alive.

You only need to keep the reference to the publisher if you plan to subscribe multiple subscribers to it.

Do I need to keep an active session for the subscription to work?

No, you only need a session to create the publisher. Once the publisher is created, subscribers can subscribe to it without an active session. The publisher internally opens a dedicated session for each subscriber if the subscription is not created within an active session.

From which catalogue version will a subscriber receive changes if it subscribes later?

The publisher freezes the CDC request parameters (including the starting version) at the moment of its creation. If the request contains a starting catalogue version, each subscriber will receive changes starting from that version, regardless of when it subscribes to the publisher. If the request does not contain a starting version, each subscriber will receive changes starting from the next version of the catalogue at the moment of its subscription.

Do I need to close the subscriber manually?

If your subscriber class implements the AutoCloseable interface, you can rely on the evitaDB (client) instance to close it automatically: close is called when the subscription is cancelled or when the client instance is closed.

Author: Ing. Jan Novotný

Date updated: 21.10.2025
