ECS Fargate (Recommended)
AWS ECS Fargate is a serverless compute engine that can run Docker containers. Datalus recommends ECS to generate data as it offers increased security when running untrusted code.
This guide walks you through setting up the Datalus Generator service on AWS ECS Fargate.
First, you need an AWS account with proper permissions.
In your AWS account, go to Amazon Elastic Container Service and click on "Task definitions".
Create a new task definition. Fill out the following options. For any options not mentioned, assume the default option.
Task definition family
- This can be whatever you'd like. We will call it datalus-generator
Launch type
- Make sure AWS Fargate is enabled
CPU and Memory
- These dictate the overall performance of your data generation. We'll leave them at the defaults of 1 vCPU and 4 GB, but they can always be changed later
Container - 1
- This will be the container that runs the entire generator
- Name - You can call this whatever you'd like. We'll call it generator
- Image URI - public.ecr.aws/j2n6g9f5/datalus-generator
- Port Mapping
  - Container Port - 1864 (TODO: can this be anything?)
  - Protocol - TCP
  - Port name - generator-port. This can be anything you'd like
  - App protocol - HTTP
Feel free to adjust the other settings, such as Resource limits or Storage, based on your workloads.
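For readers who prefer to script this step, the console settings above correspond roughly to the following task definition document. This is a minimal sketch, not an official Datalus template: the field names follow the ECS RegisterTaskDefinition API, the values are the ones chosen in this walkthrough, and it assumes no execution role or logging configuration is needed.

```python
import json

# Values mirror the console walkthrough above; adjust them to your own choices.
task_definition = {
    "family": "datalus-generator",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",  # Fargate tasks must use awsvpc networking
    "cpu": "1024",            # 1 vCPU, expressed in CPU units
    "memory": "4096",         # 4 GB, expressed in MiB
    "containerDefinitions": [
        {
            "name": "generator",
            # Image reference only, without a "docker pull" prefix.
            "image": "public.ecr.aws/j2n6g9f5/datalus-generator",
            "essential": True,
            "portMappings": [
                {
                    "name": "generator-port",
                    "containerPort": 1864,
                    "protocol": "tcp",
                    "appProtocol": "http",
                }
            ],
        }
    ],
}

# Print the document so it can be reviewed or saved as JSON.
print(json.dumps(task_definition, indent=2))
```

If boto3 is installed and credentials are configured, a dictionary like this could be passed to `boto3.client("ecs").register_task_definition(**task_definition)` instead of filling in the console form.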
Now that we have a task definition, we can create a cluster that will run and manage the workloads. Go to "Clusters" in ECS, and create a new cluster. For our purposes, you just need to give it a name, such as datalus-generator-cluster, and make sure AWS Fargate (serverless) is enabled. Feel free to configure the other settings, then click create.
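The same cluster could also be created from a script. The sketch below assumes boto3 is the client library in use; the helper name `create_generator_cluster` is hypothetical, and the ECS client is passed in as an argument so the function itself has no dependency on how credentials are configured.

```python
def create_generator_cluster(ecs_client, name="datalus-generator-cluster"):
    """Create an ECS cluster with the Fargate capacity provider enabled.

    `ecs_client` is expected to behave like boto3.client("ecs").
    Returns the name of the created cluster as reported by ECS.
    """
    response = ecs_client.create_cluster(
        clusterName=name,
        capacityProviders=["FARGATE"],
    )
    return response["cluster"]["clusterName"]


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   create_generator_cluster(boto3.client("ecs", region_name="us-east-1"))
```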
AWS Subnets & Security Groups
You need to set up networking so that your Fargate instances can run and be reached. Datalus requires a list of subnets and a list of security groups in your configuration file to successfully launch a generator instance on ECS Fargate.
These generally look something like this:
AWS_ECS_SUBNETS=subnet-7a4bd2b4091595b0,subnet-750a54e04ab3827c,subnet-aa8fb7f07baf7bb4,subnet-150e617293292cdf,subnet-0068d5025383ea37f,subnet-41761b6d8ccef3a2
AWS_ECS_SECURITY_GROUPS=sg-ddfa6927717d772b
The high-level steps are:
- Create a Virtual Private Cloud (VPC) - You should already have a default one in your AWS account
- Create subnets in the VPC - The default one should already have a list of subnets to use
- Create a security group in the VPC that allows incoming and outgoing HTTP traffic with the Datalus API server, and outgoing traffic to the Docker image repository. See here for more details on Fargate Networking.
Be careful to keep the security group from being publicly accessible. Only the Datalus API server should have access to the generator instances. In addition, generator instances should be siloed from each other, as there is no need for them to access each other. All orchestration is done through the Datalus API.
A more comprehensive guide can be found here.
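As an illustration of the access rule described above, the sketch below builds a single inbound permission that admits TCP traffic on the generator port only from the security group of the Datalus API server. It assumes the API server runs in its own security group; the ID `sg-EXAMPLE-API-SERVER` and the helper name are hypothetical. The dictionary follows the `IpPermissions` shape used by the EC2 API.

```python
GENERATOR_PORT = 1864  # container port chosen in the task definition


def generator_ingress_rule(api_server_sg_id, port=GENERATOR_PORT):
    """Build an inbound rule allowing TCP on `port` from one security group only."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing a security group rather than a CIDR range keeps the
        # port closed to the public internet and to other generator tasks.
        "UserIdGroupPairs": [{"GroupId": api_server_sg_id}],
    }


rule = generator_ingress_rule("sg-EXAMPLE-API-SERVER")  # hypothetical ID
print(rule)
```

With boto3, a rule like this could be attached to the generator security group via `boto3.client("ec2").authorize_security_group_ingress(GroupId=..., IpPermissions=[rule])`. Because the rule names only the API server's group, generator instances are not granted access to each other.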
Configuration
Now that your AWS resources are set up, make sure your datalus.conf file has the AWS ECS options properly configured:
VM_MODE=aws
AWS_ACCESS_KEY={{AWS ACCESS KEY GOES HERE}}
AWS_SECRET_ACCESS_KEY={{AWS SECRET ACCESS KEY GOES HERE}}
AWS_REGION=us-east-1
AWS_ECS_SUBNETS={{subnet 1}},{{subnet 2}},...,{{subnet N}}
AWS_ECS_SECURITY_GROUPS={{security group 1}},{{security group 2}},...,{{security group N}}
AWS_ECS_CLUSTER={{CLUSTER NAME GOES HERE (e.g. datalus-generator-cluster)}}
AWS_ECS_TASK_DEFINITION={{TASK DEFINITION NAME GOES HERE (e.g. datalus-generator)}}
Verify that the AWS_REGION field matches the region the cluster is in.
Restart your API server, and jobs should now be generated through Fargate.
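Before restarting, it can help to sanity-check the configuration. The sketch below is not part of Datalus: it assumes datalus.conf is a plain KEY=VALUE file (with `#` comment lines, which is an assumption), and the helper names `parse_conf` and `check_conf` are hypothetical. It only checks that the options listed above are present, that no `{{...}}` placeholder is left in, and that subnet and security group entries carry the usual `subnet-` and `sg-` prefixes.

```python
REQUIRED_KEYS = [
    "VM_MODE", "AWS_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_ECS_SUBNETS", "AWS_ECS_SECURITY_GROUPS",
    "AWS_ECS_CLUSTER", "AWS_ECS_TASK_DEFINITION",
]


def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_conf(settings):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = settings.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif "{{" in value:
            problems.append(f"{key} still contains a placeholder")
    if settings.get("VM_MODE") not in (None, "", "aws"):
        problems.append("VM_MODE should be aws")
    # Loose prefix check on the comma-separated ID lists.
    for key, prefix in (("AWS_ECS_SUBNETS", "subnet-"),
                        ("AWS_ECS_SECURITY_GROUPS", "sg-")):
        value = settings.get(key, "")
        if "{{" in value:
            continue  # already reported as a placeholder
        for item in value.split(","):
            item = item.strip()
            if item and not item.startswith(prefix):
                problems.append(f"{key} entry {item!r} does not start with {prefix}")
    return problems


# Example with an unfinished configuration; a real check would read the file,
# e.g. parse_conf(open("datalus.conf").read()).
sample = "VM_MODE=aws\nAWS_REGION=us-east-1\nAWS_ECS_CLUSTER={{CLUSTER NAME GOES HERE}}\n"
for problem in check_conf(parse_conf(sample)):
    print(problem)
```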