In this chapter we will enable Prometheus metrics collection from an ECS cluster. In this scenario, we will use the Prometheus Receiver to scrape metrics from the application and the AWS ECS Container Metrics Receiver to collect infrastructure metrics.
We will deploy a sample app that runs the ADOT Collector alongside a Prometheus metric emitter.
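Our ADOT Collector configuration will contain two pipelines: one that scrapes application metrics with the Prometheus Receiver, and one that collects infrastructure metrics with the AWS ECS Container Metrics Receiver.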
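As a rough illustration of a two-pipeline layout, the sketch below models a collector configuration as a Python dictionary that mirrors the YAML structure. It is not the contents of `ecs-fargate-adot-config.yaml`: the pipeline names, the `prometheusremotewrite` exporter, and the omitted receiver settings are assumptions made for the example. The helper `undefined_components` is hypothetical and only checks that each pipeline refers to components that are actually declared.

```python
# Hypothetical two-pipeline layout; names and settings are illustrative only.
ADOT_CONFIG_SKETCH = {
    "receivers": {
        "prometheus": {},              # scrape settings omitted in this sketch
        "awsecscontainermetrics": {},  # collection settings omitted in this sketch
    },
    "exporters": {
        # {{endpoint}} is the placeholder that is filled in before deployment.
        "prometheusremotewrite": {"endpoint": "{{endpoint}}"},
    },
    "service": {
        "pipelines": {
            # Application metrics scraped from the sample app.
            "metrics/application": {
                "receivers": ["prometheus"],
                "exporters": ["prometheusremotewrite"],
            },
            # Infrastructure metrics reported for the task and its containers.
            "metrics/infrastructure": {
                "receivers": ["awsecscontainermetrics"],
                "exporters": ["prometheusremotewrite"],
            },
        }
    },
}


def undefined_components(config: dict) -> list:
    """Return a message for every pipeline component that is never declared."""
    missing = []
    for name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append(f"{name}: {kind[:-1]} {component!r} is not defined")
    return missing


if __name__ == "__main__":
    print(undefined_components(ADOT_CONFIG_SKETCH))
```

Both pipelines share one exporter here, so application and infrastructure metrics would end up in the same workspace.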
In the Cloud9 workspace, run the following commands:
cd ~/environment/ecsdemo-amp/cdk
export AMP_WORKSPACE_ID=$(aws amp list-workspaces --alias ecs-workshop --query 'workspaces[*].workspaceId' --output text | awk '{print $1;}')
export AMP_Prometheus_Endpoint=$(aws amp describe-workspace --workspace-id $AMP_WORKSPACE_ID --query 'workspace.prometheusEndpoint' --output text)
export AMP_Prometheus_Remote_Write_Endpoint='"'${AMP_Prometheus_Endpoint}api/v1/remote_write'"'
sed -i -e "s~{{endpoint}}~$AMP_Prometheus_Remote_Write_Endpoint~" ecs-fargate-adot-config.yaml
sed -i -e "s~{{region}}~$AWS_REGION~" ecs-fargate-adot-config.yaml
cdk synth
cdk diff
cdk deploy --require-approval never
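The two `sed` commands above fill in the `{{endpoint}}` and `{{region}}` placeholders before the stack is synthesized. The sketch below shows the same substitution in Python, which may help when checking the result by hand. The function names and the two-line template are hypothetical, and the example endpoint is a made-up URL rather than a real workspace.

```python
def build_remote_write_endpoint(prometheus_endpoint: str) -> str:
    """Append the remote-write path to a workspace's Prometheus endpoint."""
    # The shell commands rely on the endpoint ending with "/"; normalise it here.
    if not prometheus_endpoint.endswith("/"):
        prometheus_endpoint += "/"
    return prometheus_endpoint + "api/v1/remote_write"


def render_adot_config(template: str, prometheus_endpoint: str, region: str) -> str:
    """Fill the {{endpoint}} and {{region}} placeholders in a config template."""
    # The endpoint is wrapped in double quotes, as in the export command above.
    quoted_endpoint = '"' + build_remote_write_endpoint(prometheus_endpoint) + '"'
    # str.replace substitutes every occurrence, whereas sed without the "g" flag
    # replaces only the first match on each line.
    rendered = template.replace("{{endpoint}}", quoted_endpoint)
    return rendered.replace("{{region}}", region)


if __name__ == "__main__":
    # Hypothetical template; the real file contains a full collector configuration.
    template = "endpoint: {{endpoint}}\nregion: {{region}}\n"
    print(render_adot_config(template, "https://example.invalid/workspaces/ws-123/", "us-west-2"))
```

Because the substitution edits the file in place, rerunning the `sed` commands after the placeholders are gone has no effect; restore the original template first if the endpoint or region needs to change.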
For the Prometheus sample application, we simply want to run containers from a Docker image, but we still need to figure out how to deploy it and get it behind a scheduler. To do this on our own, we would need to build a VPC, an ECS cluster, a task definition, and an ECS service. Building these components by hand would take hundreds of lines of CloudFormation, whereas with the higher-level constructs that the CDK provides, we are able to build everything with about 80 lines of code.
# Imports assume AWS CDK v2 (aws-cdk-lib); adjust them if the project uses CDK v1.
from os import getenv

import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecr_assets as ecr_a,
    aws_ecs as ecs,
    aws_iam as iam,
    aws_logs as logs,
)


class AmpService(cdk.Stack):

    def __init__(self, scope: cdk.Stack, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)

        # Networking and the ECS cluster that hosts the service
        self.vpc = ec2.Vpc(self, "VPC")
        self.ecs_cluster = ecs.Cluster(self, "DemoCluster", vpc=self.vpc)

        # Collector configuration, with placeholders already filled in by sed
        with open("ecs-fargate-adot-config.yaml", 'r') as f:
            adot_config = f.read()

        # One task definition shared by the collector sidecar and the sample app
        self.fargate_task_def = ecs.TaskDefinition(
            self, "aws-otel-FargateTask",
            compatibility=ecs.Compatibility.EC2_AND_FARGATE,
            cpu='256',
            memory_mib='1024'
        )

        # Separate log groups, removed together with the stack
        self.adot_log_grp = logs.LogGroup(
            self, "AdotLogGroup",
            removal_policy=cdk.RemovalPolicy.DESTROY
        )

        self.app_log_grp = logs.LogGroup(
            self, "AppLogGroup",
            removal_policy=cdk.RemovalPolicy.DESTROY
        )

        # ADOT Collector sidecar; its configuration is passed in through
        # the AOT_CONFIG_CONTENT environment variable
        self.otel_container = self.fargate_task_def.add_container(
            "aws-otel-collector",
            image=ecs.ContainerImage.from_registry("public.ecr.aws/aws-observability/aws-otel-collector:latest"),
            memory_reservation_mib=512,
            logging=ecs.LogDriver.aws_logs(
                stream_prefix='/ecs/ecs-aws-otel-sidecar-collector-cdk',
                log_group=self.adot_log_grp
            ),
            environment={
                "REGION": getenv('AWS_REGION'),
                "AOT_CONFIG_CONTENT": adot_config
            },
        )

        # Prometheus sample app, built from the local ../prometheus directory
        self.prom_container = self.fargate_task_def.add_container(
            "prometheus-sample-app",
            image=ecs.ContainerImage.from_docker_image_asset(
                asset=ecr_a.DockerImageAsset(
                    self, "PromAppImage",
                    directory='../prometheus'
                )
            ),
            memory_reservation_mib=256,
            logging=ecs.LogDriver.aws_logs(
                stream_prefix='/ecs/prometheus-sample-app-cdk',
                log_group=self.app_log_grp
            ),
            environment={
                "REGION": getenv('AWS_REGION')
            },
        )

        # Fargate service that keeps one copy of the task running
        self.fargate_service = ecs.FargateService(
            self, "AmpFargateService",
            service_name='aws-otel-FargateService',
            task_definition=self.fargate_task_def,
            cluster=self.ecs_cluster,
            desired_count=1,
        )

        # Task role permissions: write logs, read SSM parameters,
        # and remote-write metrics to the AMP workspace
        self.fargate_task_def.add_to_task_role_policy(
            iam.PolicyStatement(
                actions=[
                    "logs:PutLogEvents",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:DescribeLogStreams",
                    "logs:DescribeLogGroups",
                    "ssm:GetParameters",
                    "aps:RemoteWrite"
                ],
                resources=['*']
            )
        )
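
The task definition above pairs `cpu='256'` with `memory_mib='1024'` and reserves 512 MiB and 256 MiB for the two containers. The sketch below checks such numbers before deploying. It is a hypothetical helper, not part of the CDK: the table lists only the smaller Fargate task sizes, and the rule that container reservations should fit within the task memory is stated here as an assumption to verify against the ECS documentation.

```python
# Fargate CPU units -> allowed task memory in MiB (smaller tiers only).
FARGATE_MEMORY_BY_CPU = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def check_task_size(cpu: str, memory_mib: str, reservations: dict) -> list:
    """Return a list of problems found in a task's CPU and memory settings."""
    problems = []
    cpu_units, memory = int(cpu), int(memory_mib)

    allowed = FARGATE_MEMORY_BY_CPU.get(cpu_units)
    if allowed is None:
        problems.append(f"cpu={cpu_units} is not in the table of known task sizes")
    elif memory not in allowed:
        problems.append(f"memory_mib={memory} is not valid for cpu={cpu_units}")

    # Assumption: the containers' memory reservations must fit in the task memory.
    reserved = sum(reservations.values())
    if reserved > memory:
        problems.append(f"containers reserve {reserved} MiB but the task has {memory} MiB")
    return problems


if __name__ == "__main__":
    # Values taken from the stack above.
    print(check_task_size("256", "1024", {
        "aws-otel-collector": 512,
        "prometheus-sample-app": 256,
    }))
```

With the values used in the stack, the check reports no problems and leaves 256 MiB of task memory unreserved.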