
Change data capture
Change data capture (CDC) is a design pattern used to track and capture changes made to the schema and data in a database. evitaDB supports CDC through all its APIs, so developers can monitor and respond to data changes in near real time in their preferred programming language. This document explains how to implement CDC using our API.
Engine-level and catalogue-level CDCs cannot be combined into a single stream since they operate on different levels (engine vs. catalogue). Catalogue-level CDC is always tied to a particular catalogue (name). To capture all changes across all catalogues, subscribe to engine-level CDC and then separately to catalogue-level CDC for each catalogue. The engine-level CDC notifies about catalogue creation/deletion events, so clients can dynamically subscribe/unsubscribe to catalogue-level CDCs as catalogues are created/deleted.
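The dynamic subscribe/unsubscribe flow described above can be sketched as follows. This is a minimal illustration, not the evitaDB API: `EngineEvent`, `CatalogueSubscriptions`, and the `Function<String, Runnable>` hook that subscribes to one catalogue and returns a cancel handle are all hypothetical stand-ins for whatever your client uses.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Keeps exactly one catalogue-level subscription per existing catalogue. */
class CatalogueSubscriptions {
    // catalogue name -> handle that cancels its catalogue-level subscription
    private final Map<String, Runnable> active = new TreeMap<>();
    // hypothetical hook: subscribes to one catalogue and returns a cancel handle
    private final Function<String, Runnable> subscribe;

    CatalogueSubscriptions(Function<String, Runnable> subscribe) {
        this.subscribe = subscribe;
    }

    /** Call this for every event received from the engine-level stream. */
    void onEngineEvent(EngineEvent event) {
        if (event.created()) {
            // subscribe only once, even if the creation event is seen twice
            active.computeIfAbsent(event.catalogueName(), subscribe);
        } else {
            Runnable cancel = active.remove(event.catalogueName());
            if (cancel != null) {
                cancel.run();
            }
        }
    }

    /** Names of catalogues that currently have a subscription, sorted. */
    String activeCatalogues() {
        return String.join(",", active.keySet());
    }

    public static void main(String[] args) {
        CatalogueSubscriptions subs = new CatalogueSubscriptions(
            name -> () -> System.out.println("unsubscribed from " + name));
        subs.onEngineEvent(new EngineEvent("shop", true));
        subs.onEngineEvent(new EngineEvent("blog", true));
        subs.onEngineEvent(new EngineEvent("shop", false));
        System.out.println(subs.activeCatalogues());
    }
}

/** Hypothetical engine-level event: a catalogue was created or deleted. */
record EngineEvent(String catalogueName, boolean created) {}
```

Keeping the cancel handles in one map makes it easy to tear down every catalogue-level subscription when the engine-level stream itself ends.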
The basic principle in all APIs is the same:
- clients define a predicate/condition that specifies which changes they are interested in,
- define a starting point in the form of a catalogue version from which they want to start receiving changes,
- and subscribe to the change stream.
From that point onwards, clients will receive notifications about all changes that match their criteria. The changes are delivered in the order they were made, so clients can process them sequentially. The second step is optional: if no starting version is specified, the change stream will start from the next version of the catalogue.
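The three steps can be illustrated with the JDK's own `java.util.concurrent.Flow` machinery. The sketch below is conceptual only: `ChangeEvent` and `ChangeStreamSketch` are hypothetical names, the filtering is done in the consumer purely for illustration, and nothing here is the evitaDB API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Predicate;

class ChangeStreamSketch {
    /**
     * Publishes the committed events in order and returns those the client
     * receives: version >= sinceVersion (step 2) and matching the predicate (step 1).
     */
    static List<ChangeEvent> collect(List<ChangeEvent> committed,
                                     long sinceVersion,
                                     Predicate<ChangeEvent> interest) {
        List<ChangeEvent> received = new CopyOnWriteArrayList<>();
        SubmissionPublisher<ChangeEvent> publisher = new SubmissionPublisher<>();
        // step 3: subscribe; items arrive in the order they were submitted
        CompletableFuture<Void> done = publisher.consume(event -> {
            if (event.version() >= sinceVersion && interest.test(event)) {
                received.add(event);
            }
        });
        committed.forEach(publisher::submit);
        publisher.close(); // completes the stream after pending items are delivered
        done.join();       // wait until the consumer has seen everything
        return received;
    }

    public static void main(String[] args) {
        List<ChangeEvent> committed = List.of(
            new ChangeEvent(1, "Product"),
            new ChangeEvent(2, "Brand"),
            new ChangeEvent(3, "Product"));
        System.out.println(collect(committed, 2, e -> e.entityType().equals("Product")));
    }
}

/** Hypothetical change notification: catalogue version plus affected entity type. */
record ChangeEvent(long version, String entityType) {}
```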
Hierarchy of mutations
Not all mutations operate on the same level and some mutations may encapsulate others. For example, when an entity is upserted, it may contain multiple mutations within it (multiple attribute, associated data, price operations etc.). The hierarchy of mutations is as follows:
-  (complete listing, available in engine change capture)
- (complete listing, available in catalog schema change capture)
- (complete listing, available in catalog data change capture)
 
- (available in all change capture streams)
When you don't specify any filtering criteria, you will receive all mutations in flattened form, i.e. you will receive all mutations regardless of their hierarchy. So, for example, an entity attribute upsert will be delivered once as part of the entity upsert mutation and once as a standalone attribute upsert mutation. In practice, a client usually wants either high-level information about entity changes (so only entity mutations) or very specific low-level changes (e.g. only changes to attributes of a particular name). The approach with a simple flattened stream that is filtered by a single predicate covers all these use cases very well, and it is very easy to understand and implement.
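To make the flattening concrete, here is a small sketch under stated assumptions: `Mutation` is a hypothetical tree node (kind, name, nested mutations), not an evitaDB type, and the depth-first emission order is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FlattenedStreamSketch {
    /** Emits the mutation itself, then every nested mutation (depth-first). */
    static List<Mutation> flatten(Mutation root) {
        List<Mutation> out = new ArrayList<>();
        out.add(root);
        for (Mutation child : root.nested()) {
            out.addAll(flatten(child));
        }
        return out;
    }

    /** A single predicate over the flattened stream selects what the client sees. */
    static List<Mutation> capture(Mutation root, Predicate<Mutation> interest) {
        return flatten(root).stream().filter(interest).toList();
    }

    public static void main(String[] args) {
        Mutation upsert = new Mutation("ENTITY", "Product", List.of(
            new Mutation("ATTRIBUTE", "name", List.of()),
            new Mutation("ATTRIBUTE", "code", List.of()),
            new Mutation("PRICE", "basic", List.of())));
        // no filter: the entity mutation and its three nested mutations
        System.out.println(flatten(upsert).size());
        // high-level view: entity mutations only
        System.out.println(capture(upsert, m -> m.kind().equals("ENTITY")).size());
        // low-level view: only the attribute named "code"
        System.out.println(capture(upsert,
            m -> m.kind().equals("ATTRIBUTE") && m.name().equals("code")).size());
    }
}

/** Hypothetical mutation node: an entity upsert may wrap attribute, price, ... mutations. */
record Mutation(String kind, String name, List<Mutation> nested) {}
```

Both the high-level and the low-level use case fall out of the same `capture` call with a different predicate, which is the point of the flattened design.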
Engine change capture
Request allows you to specify the following parameters:
- long sinceVersion (optional): The catalogue version (inclusive) from which you want to start receiving changes. If not specified, the change stream will start from the next version of the catalogue (i.e. the changes made to the catalogue in the future).
- int sinceIndex (optional): The index of the mutation within the same transaction from which you want to start receiving changes. If not specified, the change stream will start from the first mutation of the specified version. The index allows you to precisely specify the starting point in case you have already processed some mutations of the specified version.
- content: Enumeration that specifies whether the client wants detailed information about each mutation or only high-level information that a particular type of mutation occurred. The enumeration has the following values:
  - HEADER: only the header of the event is sent
  - BODY: the entire body of the mutation triggering the event is sent
Each event contains the following fields:
- long version: The version of evitaDB in which the mutation occurs.
- int index
- operation: Classification of the mutation, defined by an enumeration:
  - UPSERT: Create or update operation. If data with this identity existed before, it was updated; if not, it was created.
  - REMOVE: Remove operation, i.e. data with this identity existed before and was removed.
  - TRANSACTION: Delimiting operation signaling the beginning of a transaction.
- body (optional)
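The interplay of `sinceVersion` and `sinceIndex` when resuming a stream can be sketched as below. `Position` and `ResumeSketch` are hypothetical helpers, and the sketch assumes that `sinceIndex` is inclusive in the same way `sinceVersion` is documented to be; verify that assumption against your client before relying on it.

```java
class ResumeSketch {
    /**
     * True if an event at the given position falls at or after the requested start.
     * Later versions always qualify; within the starting version the index decides.
     */
    static boolean shouldDeliver(Position event, long sinceVersion, int sinceIndex) {
        if (event.version() != sinceVersion) {
            return event.version() > sinceVersion;
        }
        return event.index() >= sinceIndex; // assumption: index is inclusive
    }

    /** Start position to request after the event at `last` was fully processed. */
    static Position resumeAfter(Position last) {
        return new Position(last.version(), last.index() + 1);
    }

    public static void main(String[] args) {
        Position lastProcessed = new Position(5, 2);
        Position start = resumeAfter(lastProcessed);
        System.out.println(shouldDeliver(new Position(5, 2), start.version(), start.index()));
        System.out.println(shouldDeliver(new Position(5, 3), start.version(), start.index()));
        System.out.println(shouldDeliver(new Position(6, 0), start.version(), start.index()));
    }
}

/** Hypothetical stream position: version plus mutation index within that version. */
record Position(long version, int index) {}
```

Persisting the last processed `(version, index)` pair alongside the results of processing is what lets a client restart without replaying or skipping mutations.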
Catalogue change capture
Request allows you to specify the following parameters:
- long sinceVersion (optional): The catalogue version (inclusive) from which you want to start receiving changes. If not specified, the change stream will start from the next version of the catalogue (i.e. the changes made to the catalogue in the future).
- int sinceIndex (optional): The index of the mutation within the same transaction from which you want to start receiving changes. If not specified, the change stream will start from the first mutation of the specified version. The index allows you to precisely specify the starting point in case you have already processed some mutations of the specified version.
- [] criteria (optional): Array of criteria that specify which changes you are interested in. If not specified, all changes are captured. If multiple criteria are specified, matching any of them is sufficient (OR logic). Each criterion consists of a capture area and an optional capture site, both described under "Capture areas and sites".
- content: Enumeration that specifies whether the client wants detailed information about each mutation or only high-level information that a particular type of mutation occurred. The enumeration has the following values:
  - HEADER: only the header of the event is sent
  - BODY: the entire body of the mutation triggering the event is sent
Each event contains the following fields:
- long version: The version of the catalogue in which the mutation occurs.
- int index
- area: The area of the operation that was performed:
  - SCHEMA: changes in the schema are captured
  - DATA: changes in the data are captured
  - INFRASTRUCTURE: infrastructural mutations that are neither schema nor data
- String entityType (optional): The name of the entity type that was affected by the operation. This field is null when the operation is executed on the catalogue schema itself.
- Integer entityPrimaryKey (optional): The primary key of the entity that was affected by the operation. Only present for data area operations.
- operation: Classification of the mutation, defined by an enumeration:
  - UPSERT: Create or update operation. If data with this identity existed before, it was updated; if not, it was created.
  - REMOVE: Remove operation, i.e. data with this identity existed before and was removed.
  - TRANSACTION: Delimiting operation signaling the beginning of a transaction.
- body (optional)
Capture areas and sites
Schema capture area
The schema capture area tracks changes to the catalogue schema and entity schemas. This includes operations like:
- Creating, updating, or removing entity schemas
- Modifying entity attributes, references, and associated data definitions
- Changing catalogue-level schema settings
Changes in the schema area can be narrowed down with the following filters:
- String entityType (optional): Filter by specific entity type name. If not specified, changes to all entity types are captured.
- [] operation (optional): Filter by operation type. If not specified, all operations are captured. Possible values:
  - UPSERT: Create or update operation
  - REMOVE: Remove operation
- [] containerType (optional): Filter by container type. If not specified, changes to all container types are captured. Possible values:
  - CATALOG: Catalogue-level schema changes
  - ENTITY: Entity schema changes
  - ATTRIBUTE: Attribute schema changes
  - ASSOCIATED_DATA: Associated data schema changes
  - PRICE: Price schema changes
  - REFERENCE: Reference schema changes
Data capture area
The data capture area tracks changes to entity data within the catalogue. This includes operations like:
- Creating, updating, or removing entities
- Modifying entity attributes, references, and associated data values
- Updating prices and hierarchical placement
Changes in the data area can be narrowed down with the following filters:
- String entityType (optional): Filter by specific entity type name. If not specified, changes to all entity types are captured.
- Integer entityPrimaryKey (optional): Filter by specific entity primary key. If not specified, changes to all entities are captured.
- [] operation (optional): Filter by operation type. If not specified, all operations are captured. Possible values:
  - UPSERT: Create or update operation
  - REMOVE: Remove operation
- [] containerType (optional): Filter by container type. If not specified, changes to all container types are captured. Possible values:
  - ENTITY: Entity-level changes
  - ATTRIBUTE: Attribute value changes
  - ASSOCIATED_DATA: Associated data value changes
  - PRICE: Price changes
  - REFERENCE: Reference changes
- String[] containerName (optional): Filter by specific container name (e.g. a specific attribute name such as name or code). If not specified, changes to all containers are captured.
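The way these filters might combine can be sketched as follows. Every type here (`DataFilter`, `DataChange`, and the two enums) is a hypothetical model written for this example, not an evitaDB class. The sketch assumes that the fields of one filter combine with AND, that the values inside an array-valued field combine with OR, and that an unset field places no restriction; only the OR logic between separate criteria is stated in the text above.

```java
import java.util.List;
import java.util.Set;

class DataCriteriaSketch {
    /** No criteria means "capture everything"; otherwise any matching criterion suffices (OR). */
    static boolean captures(List<DataFilter> criteria, DataChange change) {
        return criteria.isEmpty() || criteria.stream().anyMatch(f -> f.matches(change));
    }

    public static void main(String[] args) {
        DataFilter productNames = new DataFilter("Product", null,
            Set.of(Operation.UPSERT), Set.of(ContainerType.ATTRIBUTE), Set.of("name", "code"));
        DataChange change = new DataChange(
            "Product", 42, Operation.UPSERT, ContainerType.ATTRIBUTE, "name");
        System.out.println(captures(List.of(productNames), change));
    }
}

enum Operation { UPSERT, REMOVE }

enum ContainerType { ENTITY, ATTRIBUTE, ASSOCIATED_DATA, PRICE, REFERENCE }

/** Hypothetical header of a data-area event; containerName may be null. */
record DataChange(String entityType, int entityPrimaryKey, Operation operation,
                  ContainerType containerType, String containerName) {}

/** Hypothetical data-area filter; a null field or an empty set means "no restriction". */
record DataFilter(String entityType, Integer entityPrimaryKey, Set<Operation> operations,
                  Set<ContainerType> containerTypes, Set<String> containerNames) {
    boolean matches(DataChange c) {
        return (entityType == null || entityType.equals(c.entityType()))
            && (entityPrimaryKey == null || entityPrimaryKey.intValue() == c.entityPrimaryKey())
            && (operations.isEmpty() || operations.contains(c.operation()))
            && (containerTypes.isEmpty() || containerTypes.contains(c.containerType()))
            && (containerNames.isEmpty()
                || (c.containerName() != null && containerNames.contains(c.containerName())));
    }
}
```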
Infrastructure capture area
The infrastructure capture area tracks transaction-related and other infrastructural mutations that don't fit into schema or data categories. This includes:
- Transaction delimiting operations
- System-level operations
The infrastructure area has no filtering parameters: it captures all transaction and system-level mutations without any filtering options. To capture infrastructure mutations, specify CaptureArea.INFRASTRUCTURE in your criteria without a capture site.
This area exists separately because transaction boundaries and system operations are orthogonal to both schema and data changes, and clients may need to track transaction boundaries independently for proper event grouping and consistency guarantees.
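As an illustration of event grouping, the sketch below splits a delivered stream into per-transaction groups, opening a new group whenever a `TRANSACTION` operation arrives. `Captured` is a hypothetical, simplified event with the operation held as a plain string; the real event types are not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

class TransactionGroupingSketch {
    /** Starts a new group at every TRANSACTION event; delivery order is preserved. */
    static List<List<Captured>> group(List<Captured> stream) {
        List<List<Captured>> groups = new ArrayList<>();
        for (Captured event : stream) {
            // events seen before the first boundary still get a group of their own
            if (event.operation().equals("TRANSACTION") || groups.isEmpty()) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(event);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Captured> stream = List.of(
            new Captured(1, 0, "TRANSACTION"),
            new Captured(1, 1, "UPSERT"),
            new Captured(1, 2, "UPSERT"),
            new Captured(2, 0, "TRANSACTION"),
            new Captured(2, 1, "REMOVE"));
        group(stream).forEach(g -> System.out.println(g.size() + " events"));
    }
}

/** Hypothetical simplified event: version, index, and operation name. */
record Captured(long version, int index, String operation) {}
```

A client that applies changes to a downstream store can then commit once per group instead of once per mutation.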
How to set up a new catalogue change capture
Setting up catalogue change capture differs from engine change capture in that it operates on the catalogue level.
Frequently asked questions regarding the change capture mechanism
You do not need to keep a reference to the publisher; you can let it be garbage collected. The publisher is just a factory for creating subscribers. Once the subscriber is created and subscribed, it maintains its own state and connection to the engine. A reference to the subscriber is kept in the evitaDB (client) instance, which prevents it from being garbage collected as long as the instance is alive.
You only need to keep the reference to the publisher if you plan to subscribe multiple subscribers to it.
You do not need to keep the session open either: a session is required only to create the publisher. Once the publisher is created, subscribers can subscribe to it without an active session. If a subscription is not created within an active session, the publisher internally opens a dedicated session for that subscriber.
The publisher freezes the CDC request parameters (including the starting version) at the moment of its creation. If the request contains a starting catalogue version, each subscriber will receive changes starting from the version specified in the CDC request used to create the publisher, regardless of when the subscriber subscribes to the publisher. If the request does not contain a starting version, each subscriber will receive changes starting from the next version of the catalogue at the moment of its subscription.
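The starting-version rule can be modelled in a few lines. `FrozenRequestPublisherSketch` is a hypothetical toy that tracks only the catalogue version and the frozen request parameter; it is meant to show the rule, not how the publisher is implemented.

```java
import java.util.OptionalLong;

class FrozenRequestPublisherSketch {
    // request parameter captured once, when the publisher is created
    private final OptionalLong sinceVersion;
    // version of the catalogue as it advances over time
    private long currentCatalogueVersion;

    FrozenRequestPublisherSketch(OptionalLong sinceVersion, long currentCatalogueVersion) {
        this.sinceVersion = sinceVersion;
        this.currentCatalogueVersion = currentCatalogueVersion;
    }

    /** Simulates one committed transaction in the catalogue. */
    void commit() {
        currentCatalogueVersion++;
    }

    /** First version a subscriber would receive if it subscribed right now. */
    long startVersionForNewSubscriber() {
        // explicit start: same for every subscriber; otherwise the next version at subscribe time
        return sinceVersion.orElse(currentCatalogueVersion + 1);
    }

    public static void main(String[] args) {
        FrozenRequestPublisherSketch pinned =
            new FrozenRequestPublisherSketch(OptionalLong.of(5), 10);
        FrozenRequestPublisherSketch live =
            new FrozenRequestPublisherSketch(OptionalLong.empty(), 10);
        System.out.println(pinned.startVersionForNewSubscriber() + " / "
            + live.startVersionForNewSubscriber());
        pinned.commit();
        live.commit();
        System.out.println(pinned.startVersionForNewSubscriber() + " / "
            + live.startVersionForNewSubscriber());
    }
}
```

In this model a late subscriber to a publisher with a pinned version replays history from that version, while a late subscriber to a publisher without one sees only what is committed after it subscribes.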
