Choosing the Right Modes

Datalus has built-in modes for different stages of the data generation pipeline: two modes that control where the generation processes are run, and two modes that control where the synthetic data is stored.

Choosing the Right Storage Mode

Datalus supports two storage modes: db and aws. The storage mode configuration determines where the data is stored after generation.

Database mode - Database mode stores generated data directly in your PostgreSQL database as a gzipped Base64 string. This is ideal when you are generating small sets of data and do not want to rely on the cloud for storage. This mode currently limits generated data to roughly 50 MB (TODO) per table per shard after compression.

AWS S3 mode (recommended) - If you generate large amounts of data frequently, it is recommended to store the output directly in S3. Once generated, the data is stored in an S3 bucket as a gzipped Base64 string. This mode currently supports up to 5 GB of generated data per table per shard after compression.
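
Both storage modes describe the stored payload as a gzipped Base64 string. The sketch below illustrates one plausible reading of that format, assuming the data is gzip-compressed first and the compressed bytes are then Base64-encoded into text. The function names encode_payload and decode_payload are hypothetical and are not part of Datalus; the exact encoding order Datalus uses should be confirmed before relying on this.

```python
import base64
import gzip


def encode_payload(raw: bytes) -> str:
    """Gzip-compress raw bytes, then Base64-encode them as ASCII text.

    Assumption: compression happens before Base64 encoding.
    """
    compressed = gzip.compress(raw)
    return base64.b64encode(compressed).decode("ascii")


def decode_payload(encoded: str) -> bytes:
    """Reverse encode_payload: Base64-decode, then gunzip."""
    compressed = base64.b64decode(encoded)
    return gzip.decompress(compressed)


if __name__ == "__main__":
    # Hypothetical sample rows standing in for generated table data.
    sample = b"id,name\n1,Ada\n2,Grace\n"
    stored = encode_payload(sample)
    restored = decode_payload(stored)
    print(restored == sample)
```

Note that Base64 text is about one third larger than the bytes it encodes, so the size of the stored string and the size of the gzipped bytes differ. Which of the two the per-table limits above refer to is not stated here.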

Choosing the Right Generation Mode

Datalus supports two generation modes: local and aws. The generation mode configuration determines where the generation process container is run.

Local mode - (TODO)

AWS ECS mode (recommended) - (TODO)