Quick Start
Get up and running with Datalus.
What you need
For selected generation mode
For selected storage mode
See Requirements for more details
Downloading Datalus
Download the latest version of Datalus from the GitHub page and move the folder to where you'd like your service to exist.
Setting up the data generators
Setting up ECS Fargate
AWS ECS Fargate is a serverless compute engine that can run Docker containers. Datalus recommends ECS to generate data as it offers increased security when running untrusted code.
This section walks you through setting up the Datalus Generator service on AWS ECS Fargate.
First, you need an AWS account with proper permissions.
Go to Amazon Elastic Container Service on your AWS account, and click on "Task definitions".
Create a new task definition. Fill out the following options. For any options not mentioned, assume the default option.
- Task definition family - This can be whatever you'd like. We will call it datalus-generator.
- Launch type - Make sure AWS Fargate is enabled.
- CPU and Memory - These dictate the overall performance of your data generation. We'll leave them at the defaults of 1 vCPU and 4 GB, but they can always be changed later.
- Container - 1 - This will be the container that runs the entire generator.
  - Name - You can call this whatever you'd like. We'll call it generator.
  - Image URI - public.ecr.aws/j2n6g9f5/datalus-generator (enter only the URI, without the docker pull prefix)
  - Port Mapping
    - Container Port - 1864 (TODO can this be anything)
    - Protocol - TCP
    - Port name - generator-port. This can be anything you'd like.
    - App protocol - HTTP

Feel free to adjust the other settings, such as Resource limits or Storage, based on your workloads.
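For readers who prefer to script this step, the console settings above map roughly onto the request body of the ECS RegisterTaskDefinition API. The following is a minimal sketch, not part of Datalus itself: the function name `build_task_definition` is hypothetical, and the values are simply the ones chosen in this guide.

```python
import json


def build_task_definition(
    family="datalus-generator",
    container_name="generator",
    image="public.ecr.aws/j2n6g9f5/datalus-generator",
    container_port=1864,
    port_name="generator-port",
    cpu_units=1024,   # 1 vCPU expressed in ECS CPU units
    memory_mib=4096,  # 4 GB expressed in MiB
):
    """Return a dict shaped like an ECS RegisterTaskDefinition request."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use awsvpc networking
        "cpu": str(cpu_units),    # task-level sizes are passed as strings
        "memory": str(memory_mib),
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image,
                "essential": True,
                "portMappings": [
                    {
                        "containerPort": container_port,
                        "protocol": "tcp",
                        "name": port_name,
                        "appProtocol": "http",
                    }
                ],
            }
        ],
    }


task_definition = build_task_definition()
print(json.dumps(task_definition, indent=2))
```

If boto3 is installed and your credentials are configured, a dict like this could be passed as `boto3.client("ecs").register_task_definition(**task_definition)`. Depending on your account setup you may also need fields not shown here, such as an execution role.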
Now that we have a task definition, we can create a cluster that will manage the workloads. Go to "Clusters" in ECS and create a new cluster. For our purposes, you just need to give it a name, such as datalus-generator-cluster, and make sure AWS Fargate (serverless) is enabled. Feel free to configure the other settings, then click Create.
AWS Subnets & Security Groups
You need to set up networking so that your Fargate instances can run and be reached. Datalus requires a list of subnets and a list of security groups in your configuration file to successfully launch a generator instance on ECS Fargate.
These generally look something like this:
AWS_ECS_SUBNETS=subnet-7a4bd2b4091595b0,subnet-750a54e04ab3827c,subnet-aa8fb7f07baf7bb4,subnet-150e617293292cdf,subnet-0068d5025383ea37f,subnet-41761b6d8ccef3a2
AWS_ECS_SECURITY_GROUPS=sg-ddfa6927717d772b
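As a rough illustration of how comma-separated values like these could be checked and turned into the network settings that ECS expects when launching a Fargate task, here is a small sketch. It is an assumption about usage, not Datalus's actual implementation: `parse_id_list` and `build_network_configuration` are hypothetical helper names, and the ID check only verifies the prefix and hexadecimal characters.

```python
import re

# Loose patterns: prefix plus hex characters. Real AWS IDs have fixed lengths,
# but only the general shape is checked here.
_ID_PATTERNS = {
    "subnet": re.compile(r"^subnet-[0-9a-f]+$"),
    "sg": re.compile(r"^sg-[0-9a-f]+$"),
}


def parse_id_list(raw, kind):
    """Split a comma-separated config value and validate each ID's shape."""
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if not ids:
        raise ValueError(f"expected at least one {kind} ID")
    pattern = _ID_PATTERNS[kind]
    bad = [item for item in ids if not pattern.match(item)]
    if bad:
        raise ValueError(f"malformed {kind} ID(s): {bad}")
    return ids


def build_network_configuration(subnets_raw, security_groups_raw,
                                assign_public_ip=False):
    """Build the awsvpcConfiguration block used by the ECS RunTask API."""
    return {
        "awsvpcConfiguration": {
            "subnets": parse_id_list(subnets_raw, "subnet"),
            "securityGroups": parse_id_list(security_groups_raw, "sg"),
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }


# Values copied from the example configuration above.
AWS_ECS_SUBNETS = (
    "subnet-7a4bd2b4091595b0,subnet-750a54e04ab3827c,subnet-aa8fb7f07baf7bb4,"
    "subnet-150e617293292cdf,subnet-0068d5025383ea37f,subnet-41761b6d8ccef3a2"
)
AWS_ECS_SECURITY_GROUPS = "sg-ddfa6927717d772b"

network_configuration = build_network_configuration(
    AWS_ECS_SUBNETS, AWS_ECS_SECURITY_GROUPS
)
print(network_configuration)
```

Whether tasks need a public IP depends on how your VPC reaches the image repository, so `assign_public_ip` is left as an explicit choice rather than a fixed value.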
The high-level steps to do this are:
- Create a Virtual Private Cloud (VPC) - You should already have a default one in your AWS account
- Create subnets in the VPC - The default one should already have a list of subnets to use
- Create a security group in the VPC that accepts incoming/outgoing HTTP traffic to and from the Datalus API server and allows outgoing traffic to the Docker image repository. See here for more details on Fargate Networking.
Be careful not to make the security group publicly accessible. Only the Datalus API server should have access to the generator instances. Generator instances should also be siloed from each other, as there is no need for them to communicate. All orchestration is done through the Datalus API.
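One way to express "only the Datalus API server may reach the generators" is an inbound rule whose source is the API server's security group rather than an IP range. The sketch below builds such a rule in the shape used by the EC2 AuthorizeSecurityGroupIngress API. The helper names and the security group ID `sg-0123456789abcdef0` are hypothetical, and the port assumes the generator listens on the container port 1864 chosen earlier.

```python
def build_generator_ingress_rule(api_server_sg_id, port=1864):
    """Inbound TCP rule that admits traffic only from one security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [
            {
                "GroupId": api_server_sg_id,
                "Description": "Datalus API server only",
            }
        ],
    }


def is_publicly_open(rule):
    """Return True if the rule admits traffic from any IPv4 or IPv6 address."""
    open_v4 = any(
        r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
    )
    open_v6 = any(
        r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", [])
    )
    return open_v4 or open_v6


# Hypothetical ID standing in for the API server's security group.
rule = build_generator_ingress_rule("sg-0123456789abcdef0")
print(rule)
print("publicly open:", is_publicly_open(rule))
```

With boto3, a rule like this could be applied with `boto3.client("ec2").authorize_security_group_ingress(GroupId=..., IpPermissions=[rule])`, where `GroupId` is the generator security group. Because the rule does not reference the generator group itself, generator instances are not granted access to each other.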
A more comprehensive guide can be found here.
Setting up generated data storage
AWS S3 storage mode stores all generated data in an S3 bucket as Gzipped files. This allows for generating large amounts of data without slowing down your database. This is the recommended mode for most purposes.
To use S3 Storage Mode, simply create a bucket and name it whatever you'd like. Make sure that your AWS Access Key has permissions to access this bucket.
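To give a feel for what storing generated data as gzipped files involves, here is a small round-trip sketch using only the Python standard library. The on-disk layout Datalus actually uses is not described in this guide, so the newline-delimited JSON format and the function names below are assumptions made for illustration.

```python
import gzip
import json


def compress_records(records):
    """Serialize records as newline-delimited JSON and gzip the result."""
    lines = "\n".join(json.dumps(record) for record in records)
    return gzip.compress(lines.encode("utf-8"))


def decompress_records(blob):
    """Reverse compress_records: gunzip, then parse one JSON object per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Made-up sample rows standing in for generated data.
sample = [{"id": i, "name": f"user-{i}"} for i in range(1000)]
blob = compress_records(sample)
raw_size = len("\n".join(json.dumps(r) for r in sample).encode("utf-8"))
print(f"raw bytes: {raw_size}, gzipped bytes: {len(blob)}")
assert decompress_records(blob) == sample
```

A blob like this could then be uploaded with boto3's `put_object(Bucket=..., Key=..., Body=blob)`, which is why the AWS Access Key needs write access to the bucket.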
PostgreSQL Database
As long as your database is running and accessible, no further action is required before installing. The Datalus CLI will automatically generate the tables in a schema of your choosing.
Installing with CLI
The Datalus CLI helps configure everything for you to quickly get started. Open up your terminal in the root directory of your Datalus download.
Run npm run setup. This will download the required packages and start the Datalus setup CLI.
Datalus Settings
You'll be prompted to fill out the configuration options for Datalus. Make sure you choose the correct data storage and data generation modes.
You will then need to enter an admin password. This will be the password you use to log in to the admin account. This account has full control over the users in the Datalus instance. The password can be changed later.
If you are running this instance on a domain (not localhost), make sure to enter the Cookie Domain where you will host it. This is important for user sessions.
Database Configuration
You will be prompted to enter your database credentials and schema name. Verify the schema you select is not already being used on your database. Datalus expects an empty schema.
Enabling SSL - Reject unauthorized connections will verify that the connected database matches the Certificate Authority you provide next. Disabling this will skip this check and will always connect.
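The difference between the two settings can be illustrated with Python's standard `ssl` module. This is only an analogy for the behavior described above, not Datalus code: the function name `build_ssl_context` and its parameters are hypothetical.

```python
import ssl


def build_ssl_context(reject_unauthorized, ca_file=None):
    """Create a TLS context that does or does not verify the server."""
    # With ca_file=None the system's default trusted CAs are loaded instead.
    context = ssl.create_default_context(cafile=ca_file)
    if not reject_unauthorized:
        # Hostname checking must be turned off before verification can be.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = build_ssl_context(reject_unauthorized=True)
relaxed = build_ssl_context(reject_unauthorized=False)
print("strict verifies certificates:", strict.verify_mode == ssl.CERT_REQUIRED)
print("relaxed verifies certificates:", relaxed.verify_mode == ssl.CERT_REQUIRED)
```

With verification disabled the connection is still encrypted, but the client no longer confirms it is talking to the intended database, so that setting is best reserved for local testing.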
The CLI will now ask if you'd like to skip the schema generation. If you have not generated the schema, do NOT skip this step. The CLI will test the connection and then generate the entire Datalus schema in your database.
AWS Configuration
If you are using either AWS storage or data generation modes, you will need to enter your AWS API keys, region, S3 options, and ECS options.
LLM Configuration
You will be prompted to input your OpenAI API key.
Getting started
The CLI will then finish a few more tasks, such as building the front end, pulling the Docker image (for local generation mode), and generating authentication keys. Give this a few moments to complete.
Running Datalus
Now, run npm run start from the root of your Datalus directory, and Datalus should start running.