evitaDB - Fast e-commerce database

Change data capture

Change data capture (CDC) is a design pattern used to track and capture changes made to the schema and data of a database. evitaDB supports CDC through all its APIs, allowing developers to monitor and respond to data changes in near real time in their preferred programming language. This document explains how to implement CDC using our API.

The database maintains a so-called Write-Ahead Log (WAL) that records all changes made to the database. This log is used to ensure data integrity and durability, but it can also be, and in evitaDB actually is, leveraged to implement change data capture (CDC) functionality. Once the catalogue is switched to the ACTIVE (transactional) stage, clients can start consuming information about changes made to both the schema and the data in the catalogue.
There is also a special CDC available for the entire database engine that allows clients to monitor high-level operations such as catalogue creation, deletion, and other global events (for more details, consult the Control Engine chapter).
Change data capture is not available for catalogues in the WARMING_UP stage, since the WAL is not recorded during that phase. This phase is considered "introductory" and clients should not query the data in that phase anyway. Clients should wait until the catalogue reaches the ACTIVE stage and treat all the data at that moment as a consistent snapshot of the first version of the catalogue.

Engine and catalogue-level CDCs cannot be combined into a single stream since they operate on different levels (engine vs. catalogue). Catalogue-level CDC is always tied to a particular catalogue (name). If you need to capture all changes across all catalogues, you need to subscribe to engine-level CDC and then, for each catalogue separately, to its catalogue-level CDC. The engine-level CDC notifies about catalogue creation/deletion events, so clients can dynamically subscribe/unsubscribe to catalogue-level CDCs as catalogues are created/deleted.

The basic principle in all APIs is the same:

  1. clients define a predicate/condition that specifies which changes they are interested in,
  2. define a starting point in the form of a catalogue version from which they want to start receiving changes,
  3. and subscribe to the change stream.

From that point onwards, clients will receive notifications about all changes that match their criteria. The changes are delivered in the order they were made, ensuring that clients can process them sequentially. The second step is optional — if no starting version is specified, the change stream will start from the next version of the catalogue.
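The three steps above can be sketched with the standard java.util.concurrent.Flow API that evitaDB's CDC publishers are built on. This is a minimal, self-contained illustration: the String event type, the FilteringSubscriber class and the predicate are stand-ins for the real capture classes and request criteria, not the actual evitaDB API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Predicate;

public class CdcSketch {

    /** Subscriber that keeps only the events matching the client-defined predicate (step 1). */
    static final class FilteringSubscriber implements Flow.Subscriber<String> {
        final Predicate<String> criteria;
        final List<String> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        FilteringSubscriber(Predicate<String> criteria) { this.criteria = criteria; }

        @Override public void onSubscribe(Flow.Subscription s) {
            s.request(Long.MAX_VALUE);   // unbounded demand for the demo
        }
        @Override public void onNext(String event) {
            if (criteria.test(event)) received.add(event);
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    public static List<String> demo() {
        FilteringSubscriber subscriber = new FilteringSubscriber(e -> e.startsWith("DATA"));
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);             // step 3: subscribe
            publisher.submit("SCHEMA: attribute added"); // events are delivered in order
            publisher.submit("DATA: entity 1 upserted");
            publisher.submit("DATA: entity 2 removed");
        }                                                // close() signals onComplete
        try {
            subscriber.done.await();                     // wait for async delivery to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // only the "DATA" events survive the predicate
    }
}
```

The same publisher/subscriber contract (demand signalling, ordered delivery, completion) applies to the real CDC streams; only the event and request types differ.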

Hierarchy of mutations

Not all mutations operate on the same level, and some mutations may encapsulate others. For example, when an entity is upserted, the upsert may contain multiple nested mutations (multiple attribute, associated data, price operations, etc.). Mutations thus form a hierarchy in which container-level mutations wrap the local mutations they consist of.

When you don't specify any filtering criteria, you will receive all mutations in flattened form, i.e. you will receive all mutations regardless of their hierarchy. So, for example, an entity attribute upsert will be delivered once as part of the entity upsert mutation and once as a standalone attribute upsert mutation. In practice, a client usually wants either high-level information about entity changes (so only entity mutations) or very specific low-level changes (e.g. only changes to attributes of a particular name). The approach with a simple flattened stream that is filtered by a single predicate covers all these use cases very well, and it is very easy to understand and implement.
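The flattened delivery can be illustrated with a small model; the Mutation record and the type names below are illustrative stand-ins, not evitaDB classes. A container mutation and each of its nested local mutations all appear as separate items in the flattened stream, and a single predicate picks the level of detail the client cares about:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenedStream {
    record Mutation(String type, String name, List<Mutation> nested) {
        Mutation(String type, String name) { this(type, name, List.of()); }
        /** Depth-first flattening: the mutation itself, then all nested mutations. */
        Stream<Mutation> flatten() {
            return Stream.concat(Stream.of(this), nested.stream().flatMap(Mutation::flatten));
        }
    }

    public static List<String> filter(String wantedType) {
        // an entity upsert encapsulating three local mutations
        Mutation entityUpsert = new Mutation("ENTITY_UPSERT", "product-1", List.of(
            new Mutation("ATTRIBUTE_UPSERT", "name"),
            new Mutation("ATTRIBUTE_UPSERT", "code"),
            new Mutation("PRICE_UPSERT", "basic")
        ));
        return entityUpsert.flatten()
            .filter(m -> m.type().equals(wantedType))   // the client's single predicate
            .map(Mutation::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // a low-level client filters on ATTRIBUTE_UPSERT;
        // a high-level client would filter on ENTITY_UPSERT instead
        System.out.println(filter("ATTRIBUTE_UPSERT"));
    }
}
```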

Engine change capture

The engine-level capture stream accepts a request for creating the Java Flow Publisher. One or more clients may then subscribe to this publisher to receive event instances representing the changes made to the engine.

The request allows you to specify the following parameters:

long sinceVersion (optional)

The catalogue version (inclusive) from which you want to start receiving changes. If not specified, the change stream will start from the next version of the catalogue (i.e. the changes made to the catalogue in the future).

int sinceIndex (optional)

The index of the mutation within the same transaction from which you want to start receiving changes. If not specified, the change stream will start from the first mutation of the specified version. The index allows you to precisely specify the starting point in case you have already processed some mutations of the specified version.

content

Enumeration that specifies whether the client wants detailed information about each mutation or only high-level information that a particular type of mutation occurred. The enumeration has the following values:

  • HEADER - only the header of the event is sent
  • BODY - the entire body of the mutation triggering the event is sent
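As a sketch, the request parameters described above could be modelled as a simple record; the CaptureRequest and Content names below are hypothetical stand-ins, not the actual evitaDB request classes:

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

public class EngineCaptureRequestSketch {
    enum Content { HEADER, BODY }

    record CaptureRequest(OptionalLong sinceVersion, OptionalInt sinceIndex, Content content) {
        /** No starting point: the stream begins with the next version of the catalogue. */
        static CaptureRequest fromNextVersion(Content content) {
            return new CaptureRequest(OptionalLong.empty(), OptionalInt.empty(), content);
        }
    }

    public static void main(String[] args) {
        // resume precisely: version 42, skipping the first 3 already-processed mutations
        CaptureRequest resume = new CaptureRequest(OptionalLong.of(42), OptionalInt.of(3), Content.BODY);
        CaptureRequest fresh  = CaptureRequest.fromNextVersion(Content.HEADER);
        System.out.println(resume + " / " + fresh);
    }
}
```

The optional sinceVersion/sinceIndex pair is what makes exactly-once resumption possible after a client restart.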

Engine capture events are represented by event instances that contain the following information:
long version

The version of evitaDB where the mutation occurs.

int index

The index of the mutation within the same transaction. Index 0 is always reserved for an infrastructure mutation.
operation

Classification of the mutation, defined by an enumeration with the following values:

  • UPSERT - Create or update operation. If there was data with such identity before, it was updated. If not, it was created.
  • REMOVE - Remove operation - i.e. there was data with such identity before, and it was removed.
  • TRANSACTION - Delimiting operation signaling the beginning of a transaction.
body (optional)

Optional body of the operation, present only when requested by the content parameter of the request.

Catalogue change capture

The catalogue-level capture stream accepts a request for creating the Java Flow Publisher. One or more clients may then subscribe to this publisher to receive event instances representing the changes made to the catalogue.

The request allows you to specify the following parameters:

long sinceVersion (optional)

The catalogue version (inclusive) from which you want to start receiving changes. If not specified, the change stream will start from the next version of the catalogue (i.e. the changes made to the catalogue in the future).

int sinceIndex (optional)

The index of the mutation within the same transaction from which you want to start receiving changes. If not specified, the change stream will start from the first mutation of the specified version. The index allows you to precisely specify the starting point in case you have already processed some mutations of the specified version.

criteria (optional)

Array of criteria that specify which changes you are interested in. If not specified, all changes are captured. If multiple criteria are specified, matching any of them is sufficient (OR logic). Each criterion consists of:

  • area - the capture area to watch (SCHEMA, DATA, or INFRASTRUCTURE)
  • site - the capture site providing fine-grained filtering within the selected area
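The OR semantics of multiple criteria can be sketched as follows; the Event and Criterion records are simplified stand-ins for the real capture classes, and an unset (null) entityType on a criterion is treated as "match any":

```java
import java.util.List;

public class CriteriaOrLogic {
    record Event(String area, String entityType) {}

    record Criterion(String area, String entityType) {
        boolean matches(Event e) {
            return area.equals(e.area())
                && (entityType == null || entityType.equals(e.entityType()));
        }
    }

    /** Combine criteria with OR; an empty criteria list means "capture everything". */
    static boolean captured(List<Criterion> criteria, Event e) {
        return criteria.isEmpty() || criteria.stream().anyMatch(c -> c.matches(e));
    }

    public static void main(String[] args) {
        List<Criterion> criteria = List.of(
            new Criterion("SCHEMA", null),      // all schema changes
            new Criterion("DATA", "Product")    // data changes of Product entities only
        );
        System.out.println(captured(criteria, new Event("DATA", "Product"))); // matches 2nd criterion
        System.out.println(captured(criteria, new Event("DATA", "Brand")));   // matches neither
    }
}
```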
content

Enumeration that specifies whether the client wants detailed information about each mutation or only high-level information that a particular type of mutation occurred. The enumeration has the following values:

  • HEADER - only the header of the event is sent
  • BODY - the entire body of the mutation triggering the event is sent

Catalogue capture events are represented by event instances that contain the following information:
long version

The version of the catalogue where the mutation occurs.

int index

The index of the mutation within the same transaction. Index 0 is always reserved for an infrastructure mutation.
area

The area of the operation that was performed:

  • SCHEMA - changes in the schema are captured
  • DATA - changes in the data are captured
  • INFRASTRUCTURE - infrastructural mutations that are neither schema nor data
String entityType (optional)

The name of the entity type that was affected by the operation. This field is null when the operation is executed on the catalogue schema itself.

Integer entityPrimaryKey (optional)

The primary key of the entity that was affected by the operation. Only present for data area operations.

operation

Classification of the mutation, defined by an enumeration with the following values:

  • UPSERT - Create or update operation. If there was data with such identity before, it was updated. If not, it was created.
  • REMOVE - Remove operation - i.e. there was data with such identity before, and it was removed.
  • TRANSACTION - Delimiting operation signaling the beginning of a transaction.
body (optional)

Optional body of the operation, present only when requested by the content parameter of the request.

Capture areas and sites

Catalogue CDC distinguishes between three different capture areas that correspond to different types of operations:

Schema capture area

The schema capture area tracks changes to the catalogue schema and entity schemas. This includes operations like:

  • Creating, updating, or removing entity schemas
  • Modifying entity attributes, references, and associated data definitions
  • Changing catalogue-level schema settings
The schema area uses a schema capture site for filtering, which allows you to specify:
String entityType (optional)

Filter by specific entity type name. If not specified, changes to all entity types are captured.

operation (optional)

Filter by one or more operation types. If not specified, all operations are captured. Possible values:

  • UPSERT - Create or update operation
  • REMOVE - Remove operation
containerType (optional)

Filter by one or more container types. If not specified, changes to all container types are captured. Possible values:

  • CATALOG - Catalogue-level schema changes
  • ENTITY - Entity schema changes
  • ATTRIBUTE - Attribute schema changes
  • ASSOCIATED_DATA - Associated data schema changes
  • PRICE - Price schema changes
  • REFERENCE - Reference schema changes

Data capture area

The data capture area tracks changes to entity data within the catalogue. This includes operations like:

  • Creating, updating, or removing entities
  • Modifying entity attributes, references, and associated data values
  • Updating prices and hierarchical placement
The data area uses a data capture site for filtering, which allows you to specify:
String entityType (optional)

Filter by specific entity type name. If not specified, changes to all entity types are captured.

Integer entityPrimaryKey (optional)

Filter by specific entity primary key. If not specified, changes to all entities are captured.

operation (optional)

Filter by one or more operation types. If not specified, all operations are captured. Possible values:

  • UPSERT - Create or update operation
  • REMOVE - Remove operation
containerType (optional)

Filter by one or more container types. If not specified, changes to all container types are captured. Possible values:

  • ENTITY - Entity-level changes
  • ATTRIBUTE - Attribute value changes
  • ASSOCIATED_DATA - Associated data value changes
  • PRICE - Price changes
  • REFERENCE - Reference changes
String[] containerName (optional)

Filter by one or more specific container names (e.g. a specific attribute name such as name or code). If not specified, changes to all containers are captured.
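A data-site filter with these optional fields can be sketched as follows, treating every unset (null) field as "match everything"; the DataSite and Change records are illustrative stand-ins, not the actual evitaDB API:

```java
import java.util.Set;

public class DataSiteFilter {
    record Change(String entityType, int primaryKey, String operation,
                  String containerType, String containerName) {}

    record DataSite(String entityType, Integer entityPrimaryKey,
                    Set<String> operation, Set<String> containerType, Set<String> containerName) {
        /** Every non-null field narrows the filter; null means "no restriction". */
        boolean matches(Change c) {
            return (entityType == null || entityType.equals(c.entityType()))
                && (entityPrimaryKey == null || entityPrimaryKey.intValue() == c.primaryKey())
                && (operation == null || operation.contains(c.operation()))
                && (containerType == null || containerType.contains(c.containerType()))
                && (containerName == null || containerName.contains(c.containerName()));
        }
    }

    public static void main(String[] args) {
        // capture only UPSERTs of the "code" attribute on Product entities
        DataSite site = new DataSite("Product", null,
            Set.of("UPSERT"), Set.of("ATTRIBUTE"), Set.of("code"));
        Change hit  = new Change("Product", 1, "UPSERT", "ATTRIBUTE", "code");
        Change miss = new Change("Product", 1, "UPSERT", "ATTRIBUTE", "name");
        System.out.println(site.matches(hit) + " " + site.matches(miss));
    }
}
```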

Infrastructure capture area

The infrastructure capture area tracks transaction-related and other infrastructural mutations that don't fit into schema or data categories. This includes:

  • Transaction delimiting operations
  • System-level operations
The infrastructure area does not use any capture site for filtering; currently, it captures all infrastructure mutations.

No filtering parameters

The infrastructure area captures all transaction and system-level mutations without any filtering options. To capture infrastructure mutations, specify CaptureArea.INFRASTRUCTURE in your criteria without a capture site.

This area exists separately because transaction boundaries and system operations are orthogonal to both schema and data changes, and clients may need to track transaction boundaries independently for proper event grouping and consistency guarantees.
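Because TRANSACTION delimiters arrive in-stream, a client can regroup the flattened events into per-transaction batches. A minimal sketch, using a simplified event representation rather than the real capture classes:

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionGrouping {
    record Event(String operation, String payload) {}

    /** Start a new batch whenever a TRANSACTION delimiter arrives. */
    static List<List<Event>> groupByTransaction(List<Event> stream) {
        List<List<Event>> batches = new ArrayList<>();
        List<Event> current = null;
        for (Event e : stream) {
            if ("TRANSACTION".equals(e.operation())) {
                current = new ArrayList<>();   // transaction boundary: open a new batch
                batches.add(current);
            } else if (current != null) {
                current.add(e);                // mutation belongs to the open transaction
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<Event>> batches = groupByTransaction(List.of(
            new Event("TRANSACTION", "v42"),
            new Event("UPSERT", "entity 1"),
            new Event("UPSERT", "entity 2"),
            new Event("TRANSACTION", "v43"),
            new Event("REMOVE", "entity 1")
        ));
        System.out.println(batches.size() + " transactions");
    }
}
```

Processing a whole batch atomically (e.g. applying it to a downstream cache only when the next delimiter arrives) is one way to preserve the consistency guarantees mentioned above.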

How to set up a new catalogue change capture

Setting up catalogue change capture differs from engine change capture in that it operates on the catalogue level.

Frequently asked questions regarding the change capture mechanism

Do I need to keep a reference to the publisher to prevent it from being garbage collected?

No, you can let it be garbage collected. The publisher is just a factory for creating subscribers. Once the subscriber is created and subscribed, it maintains its own state and connection to the engine. A reference to the subscriber is kept in the evitaDB (client) instance, which prevents it from being garbage collected as long as the instance is alive.

You only need to keep the reference to the publisher if you plan to subscribe multiple subscribers to it.

Do I need to keep an active session for the subscription to work?

No, you only need a session to create the publisher. Once the publisher is created, subscribers can subscribe to it without an active session. The publisher internally opens a dedicated session for each subscriber if the subscription is not created within an active session.

From which catalogue version will a subscriber receive changes if it subscribes later?

The publisher freezes the CDC request parameters (including the starting version) at the moment of its creation. If the request contains a starting catalogue version, each subscriber will receive changes starting from that version, regardless of when it subscribes to the publisher. If the request does not contain a starting version, each subscriber will receive changes starting from the next version of the catalogue at the moment of its subscription.

Do I need to close the subscriber manually?

If your subscriber class implements the AutoCloseable interface, you can rely on the evitaDB (client) instance to close it automatically: close is called when the subscription is cancelled or when the client instance is closed.

Author: Ing. Jan Novotný

Date updated: 21.10.2025
