Choosing the Right Modes
Datalus has different built-in modes for different stages of the data generation pipeline. It supports two modes for where the generation processes are run and two modes for where the synthetic data is stored.
Choosing the Right Storage Mode
Datalus supports two storage modes: db and aws. The storage mode configuration determines where the generated data is stored after generation.
Database mode - Database mode stores generated data directly in your PostgreSQL database as a gzipped Base64 string. This is ideal when you generate small data sets and do not want to rely on the cloud for storage. This mode currently limits generated data to roughly 50 MB (TODO) per table per shard after compression.
AWS S3 mode (recommended) - When you generate large amounts of data frequently, it is recommended to store the output directly in S3. Once generated, the data is stored in an S3 bucket as a gzipped Base64 string. This mode currently supports up to 5 GB of generated data per table per shard after compression.
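The size limits above can be turned into a quick pre-flight check. The following is a minimal sketch in Python, not part of Datalus itself: the function names (encode_payload, decode_payload, pick_storage_mode) and the constants are hypothetical, the limits are read as binary megabytes and gigabytes, and the db figure is still marked TODO in the text, so treat it as approximate. It also shows what a "gzipped Base64 string" looks like in practice using only the standard library.

```python
import base64
import gzip

# Limits copied from the descriptions above (assumption: binary units).
# The db limit is marked TODO in the docs, so it is only approximate.
DB_LIMIT_BYTES = 50 * 1024 ** 2
S3_LIMIT_BYTES = 5 * 1024 ** 3


def encode_payload(raw: bytes) -> str:
    """Gzip the raw bytes, then Base64-encode the result as ASCII text."""
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decode_payload(encoded: str) -> bytes:
    """Reverse encode_payload: Base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(encoded))


def pick_storage_mode(compressed_size: int) -> str:
    """Return the smallest storage mode whose limit fits the given size.

    compressed_size is the per-table, per-shard size in bytes after
    compression. Hypothetical helper, not a Datalus API.
    """
    if compressed_size <= DB_LIMIT_BYTES:
        return "db"
    if compressed_size <= S3_LIMIT_BYTES:
        return "aws"
    raise ValueError("payload exceeds the documented limit for both modes")
```

For example, pick_storage_mode(len(encode_payload(table_bytes))) gives a rough indication of which mode a single table shard would fit in. Whether Datalus measures the limit before or after Base64 encoding is not stated here, so the check is only a guide.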
Choosing the Right Generation Mode
Datalus supports two generation modes: local and aws. The generation mode configuration determines where the generation process container is actually run.
Local mode - (TODO)
AWS ECS mode (recommended) - (TODO)