
ECS Fargate (Recommended)

AWS ECS Fargate is a serverless compute engine that runs Docker containers. Datalus recommends running data generation on ECS Fargate because it offers increased security when running untrusted code.

This guide walks you through setting up the Datalus Generator service on AWS ECS Fargate.

First, you need an AWS account with proper permissions.

Go to Amazon Elastic Container Service on your AWS account, and click on "Task definitions".

Create a new task definition and fill out the following options. Keep the default for any option not mentioned.

Task definition family - This can be whatever you'd like. We will call it datalus-generator

Launch type - Make sure AWS Fargate is enabled

CPU and Memory - These dictate the overall performance of your data generation. We'll leave them at the default 1 vCPU and 4 GB, but this can always be changed later

Container - 1 - This will be the container that runs the entire generator

  • name - You can call this whatever you'd like. We'll call it generator
  • Image URI - public.ecr.aws/j2n6g9f5/datalus-generator
  • Port Mapping
    • Container Port 1864
    • Protocol TCP
    • Port name generator-port. This can be anything you'd like
    • App protocol HTTP

Feel free to adjust the other settings, such as Resource limits or Storage, based on your workloads.
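The console settings above can also be written down as a task definition payload. The following is a minimal sketch in Python, assuming the boto3 library is installed and AWS credentials are configured. The function names are illustrative and not part of Datalus, and the values simply mirror the ones chosen in this guide (family datalus-generator, 1 vCPU, 4 GB, container port 1864).

```python
def build_task_definition(family="datalus-generator",
                          container_name="generator",
                          image="public.ecr.aws/j2n6g9f5/datalus-generator",
                          port=1864,
                          port_name="generator-port"):
    """Return keyword arguments for the ECS RegisterTaskDefinition call."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks must use awsvpc networking
        "cpu": "1024",            # 1 vCPU, expressed in CPU units
        "memory": "4096",         # 4 GB, expressed in MiB
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image,   # the image URI only, without "docker pull"
                "essential": True,
                "portMappings": [
                    {
                        "containerPort": port,
                        "protocol": "tcp",
                        "name": port_name,
                        "appProtocol": "http",
                    }
                ],
            }
        ],
    }


def register_task_definition(region="us-east-1"):
    """Register the task definition. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    ecs = boto3.client("ecs", region_name=region)
    return ecs.register_task_definition(**build_task_definition())
```

Calling build_task_definition() on its own lets you inspect or version-control the payload before anything is sent to AWS.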

Now that we have a task definition, we can create a cluster that will manage the workloads. Go to "Clusters" in ECS and create a new cluster. For our purposes, you just need to give it a name, such as datalus-generator-cluster, and make sure AWS Fargate (serverless) is enabled. Feel free to configure the other settings, then click create.
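If you prefer to script the cluster step, a rough equivalent is sketched below. It assumes boto3 is installed and credentials are configured; the helper names are hypothetical, and the cluster name is the one used in this guide.

```python
def build_cluster_request(name="datalus-generator-cluster"):
    """Return keyword arguments for the ECS CreateCluster call."""
    return {
        "clusterName": name,
        # Enables the Fargate capacity provider on the cluster.
        "capacityProviders": ["FARGATE"],
    }


def create_cluster(region="us-east-1"):
    """Create the cluster and return its name. Requires boto3 and credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    ecs = boto3.client("ecs", region_name=region)
    response = ecs.create_cluster(**build_cluster_request())
    return response["cluster"]["clusterName"]
```

The region passed here should be the same one you later put in AWS_REGION.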

AWS Subnets & Security Groups

You need to set up the networking settings to properly run and connect to your Fargate instances. Datalus requires a list of subnets and security groups in your configuration file to successfully launch a generator instance on ECS Fargate.

These generally look something like this:

AWS_ECS_SUBNETS=subnet-7a4bd2b4091595b0,subnet-750a54e04ab3827c,subnet-aa8fb7f07baf7bb4,subnet-150e617293292cdf,subnet-0068d5025383ea37f,subnet-41761b6d8ccef3a2
AWS_ECS_SECURITY_GROUPS=sg-ddfa6927717d772b

The high-level steps are:

  1. Create a Virtual Private Cloud (VPC) - You should already have a default one in your AWS account
  2. Create subnets in the VPC - The default one should already have a list of subnets to use
  3. Create a security group in the VPC that accepts incoming HTTP traffic from the Datalus API server and allows outgoing traffic to the Docker image repository. See here for more details on Fargate Networking.

Make sure the security group is not publicly accessible. Only the Datalus API server should have access to the generator instances. Generator instances should also be siloed from each other, as they have no need to communicate with one another. All orchestration is done through the Datalus API.
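One way to express the "API server only" rule is to allow inbound traffic from the API server's own security group rather than from an IP range. The sketch below builds such a rule for the EC2 AuthorizeSecurityGroupIngress call. It assumes the API server runs in the same VPC with its own security group; both group IDs and all function names are hypothetical, and port 1864 is the container port used earlier in this guide.

```python
def build_ingress_rule(generator_sg_id, api_server_sg_id, port=1864):
    """Return kwargs for EC2 AuthorizeSecurityGroupIngress.

    Allows TCP traffic on `port` only from members of the API server's
    security group, so no CIDR range (public or otherwise) is opened.
    """
    return {
        "GroupId": generator_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "UserIdGroupPairs": [
                    {
                        "GroupId": api_server_sg_id,
                        "Description": "Datalus API server",
                    }
                ],
            }
        ],
    }


def is_publicly_open(rule):
    """Return True if any permission in the rule allows the whole internet."""
    for permission in rule["IpPermissions"]:
        for ip_range in permission.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                return True
        for ip_range in permission.get("Ipv6Ranges", []):
            if ip_range.get("CidrIpv6") == "::/0":
                return True
    return False


def apply_ingress_rule(rule, region="us-east-1"):
    """Apply the rule. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.authorize_security_group_ingress(**rule)
```

Because the generator security group is not listed as its own source, tasks that share it cannot reach each other on this port through this rule.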

A more comprehensive guide can be found here.

Configuration

Now that your AWS resources are set up, make sure your datalus.conf file has the AWS ECS options properly configured:

VM_MODE=aws
AWS_ACCESS_KEY={{AWS ACCESS KEY GOES HERE}}
AWS_SECRET_ACCESS_KEY={{AWS SECRET ACCESS KEY GOES HERE}}
AWS_REGION=us-east-1

AWS_ECS_SUBNETS={{subnet 1}},{{subnet 2}},...,{{subnet N}}
AWS_ECS_SECURITY_GROUPS={{security group 1}},{{security group 2}},...,{{security group N}}
AWS_ECS_CLUSTER={{CLUSTER NAME GOES HERE (e.g. datalus-generator-cluster)}}
AWS_ECS_TASK_DEFINITION={{TASK DEFINITION NAME GOES HERE (e.g. datalus-generator)}}

Verify that the AWS_REGION field matches the region the cluster is in.
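Before restarting, it can help to sanity-check the file. The script below is a minimal sketch, not a Datalus tool: it assumes datalus.conf is a plain KEY=VALUE file (treating lines starting with # as comments is also an assumption), and it only checks that every option listed above is filled in, that no {{...}} placeholder is left over, and that the subnet and security group IDs carry the usual subnet- and sg- prefixes. It cannot verify that AWS_REGION matches the cluster's region.

```python
REQUIRED_KEYS = (
    "VM_MODE", "AWS_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_ECS_SUBNETS", "AWS_ECS_SECURITY_GROUPS",
    "AWS_ECS_CLUSTER", "AWS_ECS_TASK_DEFINITION",
)

# Expected ID prefix for each comma-separated list option.
ID_PREFIXES = {"AWS_ECS_SUBNETS": "subnet-", "AWS_ECS_SECURITY_GROUPS": "sg-"}


def parse_conf(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_conf(values):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif "{{" in value:
            problems.append(f"{key} still contains a template placeholder")
        elif key in ID_PREFIXES:
            prefix = ID_PREFIXES[key]
            for item in value.split(","):
                if not item.strip().startswith(prefix):
                    problems.append(
                        f"{key}: {item.strip()!r} does not start with {prefix!r}"
                    )
    if values.get("VM_MODE") not in (None, "", "aws"):
        problems.append("VM_MODE must be 'aws' to use ECS Fargate")
    return problems
```

For example, check_conf(parse_conf(open("datalus.conf").read())) returns an empty list when nothing looks wrong, and one message per issue otherwise.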

Restart your API server, and jobs should now be generated through Fargate.