Schedule edge services with native service discovery
Edge computing lets organizations run workloads closer to their users. This proximity unlocks several benefits:
- Decreased latency. Data does not need to travel to distant data centers for processing. This decreases network latency which provides a better user experience. This benefit is crucial for CDN providers and online game servers.
- Privacy and Compliance. Edge computing increases privacy by storing and processing user data close to the user, ensuring data doesn't leave the geographic region. This benefit is especially important for regulated industries like healthcare and financial services and with regulations like GDPR.
- Smart device fleet management. Edge computing lets you collect data, monitor, and control internet of things (IoT) devices and sensors. This benefit is useful for any industries that need to manage fleets of remote devices, like agriculture, manufacturers, and more.
However, when organizations adopt edge computing, they run into challenges like managing heterogeneous devices (different processors, operating systems, etc.), resource-constrained devices, and intermittent connectivity.
Nomad addresses these challenges, making it an attractive edge orchestrator. The Nomad client agent is a single binary with a small footprint, limited resource consumption, and the ability to run on different types of devices. In addition, Nomad supports geographically distant clients, which means a Nomad server cluster does not need to run near the client.
Since Nomad 1.3, native service discovery simplifies connecting Nomad tasks where you cannot use a single service mesh and removes the need to manage a separate Consul cluster. Nomad's native service discovery also removes the need to install a Consul agent on each edge device. This reduces Nomad's resource footprint even further, so you can run and support more workloads on the edge. Additionally, disconnected client allocations reconnect gracefully, handling situations when edge devices experience network latency or temporary connectivity loss.
In this tutorial, you will deploy a single Nomad server cluster with distant clients edge architecture in two AWS regions. One region, representing an on-premise data center, will host the Nomad server cluster and one client. The other region, representing the edge data center, will host two Nomad clients. Then, you will schedule HashiCups, a demo application, on both on-prem and edge data centers, connecting its services with Nomad's native service discovery. Finally, you will simulate unstable network connectivity between the Nomad clients and the server to test how Nomad handles client disconnection and reconnection. In the process, you will learn how these features make Nomad an ideal edge scheduler.
HashiCups overview
HashiCups is a demo application that lets you view and order customized HashiCorp branded coffee. The HashiCups application consists of a frontend React application and multiple backend services. The HashiCups backend consists of a GraphQL backend (public-api), products API (product-api), a Postgres database, and a payments API (payment-api). The product-api connects to both the public-api and database to store and return information about HashiCups coffees, users, and orders.
You will deploy the HashiCups application to two Nomad data centers. The primary data center will host the HashiCups database and product API. The edge data center will host the remaining HashiCups backend (public API, payments API) and the frontend (frontend and NGINX reverse proxy). This architecture decreases latency for users by placing the frontend services closer to them. In addition, sensitive payment information remains on the edge — HashiCups does not need to send this data to the primary data center, reducing potential attack surfaces.
Prerequisites
The tutorial assumes that you are familiar with Nomad. If you are new to Nomad itself, refer first to the Get Started tutorials.
For this tutorial, you will need:
- Packer 1.8 or later installed locally.
- Terraform 1.1.7 or later installed locally.
- Nomad 1.3 or later installed locally.
- An AWS account with credentials set as local environment variables.
Note
This tutorial creates AWS resources that may not qualify as part of the AWS free tier. Be sure to follow the Cleanup process at the end so you don't incur any additional unnecessary charges.
Clone the example repository
In your terminal, clone the example repository. This repository contains all the Terraform, Packer, and Nomad configuration files you will need to complete this tutorial.
$ git clone https://github.com/hashicorp-education/learn-nomad-edge
Navigate to the cloned repository.
$ cd learn-nomad-edge
Now, check out the tagged version verified for this tutorial.
$ git checkout tags/v1.0.0
Create SSH key
Later in this tutorial, you will need to connect to your Nomad agent to bootstrap ACLs.
Create a local SSH key to pair with the terraform user so you can securely connect to your Nomad agents.
Generate a new SSH key called learn-nomad-edge. The argument provided with the -f flag creates the key in the current directory and creates two files called learn-nomad-edge and learn-nomad-edge.pub. Change the placeholder email address to your email address.
$ ssh-keygen -t rsa -C "your_email@example.com" -f ./learn-nomad-edge
When prompted, press enter to leave the passphrase blank on this key.
Review and build Nomad images
Navigate to the packer directory.
$ cd packer
This directory contains all the files used to build AMIs in the us-east-2 and us-west-1 AWS regions that contain the Nomad 1.5.3 binary and your previously created SSH public key.
$ tree
.
├── config
│   ├── nomad.hcl
│   ├── nomad-acl-user.hcl
│   ├── nomad-client.hcl
│   └── nomad.service
├── scripts
│   ├── client.sh
│   ├── server.sh
│   └── setup.sh
└── nomad.pkr.hcl
The config directory contains configuration files for the Nomad agents.
The nomad.hcl file configures the Nomad servers. Since the primary and edge data centers are on different networks, the server must advertise its public IP address so the Nomad clients can successfully connect to the server cluster.
packer/config/nomad.hcl
advertise {
  http = "IP_ADDRESS:4646"
  rpc  = "IP_ADDRESS:4647"
  serf = "IP_ADDRESS:4648"
}
The scripts/server.sh script will replace the placeholders (IP_ADDRESS, SERVER_COUNT, and RETRY_JOIN) when the server starts. The Nomad servers also have ACL enabled.
The nomad-acl-user.hcl file defines the ACL policies.
The nomad-client.hcl file configures the Nomad clients. Since the primary and edge data centers are on different networks, the client must advertise its public IP address so the Nomad clients can successfully connect to the other Nomad clients.
packer/config/nomad-client.hcl
advertise {
  http = "IP_ADDRESS:4646"
  rpc  = "IP_ADDRESS:4647"
  serf = "IP_ADDRESS:4648"
}
The scripts/client.sh script will replace the placeholders (DATACENTER, SERVER_NAME, and RETRY_JOIN) when the client starts. The Nomad clients also have ACL enabled.
The nomad.service file defines a systemd service. This makes it easier to start, stop, and restart Nomad on the agents.
The scripts directory contains helper scripts. The setup.sh script creates the terraform user, adds the public SSH key, and installs Nomad 1.5.3 and Docker. The client.sh and server.sh scripts configure their respective Nomad agents.
The nomad.pkr.hcl Packer template file defines the AMIs. It uses scripts/setup.sh to set up Nomad agents on an Ubuntu 20.04 image.
Build Nomad images
Initialize Packer to retrieve the required plugins.
$ packer init nomad.pkr.hcl
Build the image.
$ packer build nomad.pkr.hcl
## ...
Build 'amazon-ebs.nomad-secondary' finished after 3 minutes 42 seconds.
Build 'amazon-ebs.nomad-primary' finished after 8 minutes 52 seconds.
==> Wait completed after 8 minutes 52 seconds
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs.nomad-secondary: AMIs were created:
us-west-1: ami-0183bb1e3ab40da53
--> amazon-ebs.nomad-primary: AMIs were created:
us-east-2: ami-08a59e91d881df603
Packer will display the two AMIs. You will use these AMIs in the next section to deploy the Nomad server cluster and clients.
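If you captured the Packer output in a file, a small helper can pull out the AMI ID for each region so you do not have to copy it by hand. The following is a minimal sketch: the function name ami_for_region and the log file name packer-build.log are assumptions made for illustration, and the helper relies on the "region: ami-id" line format shown in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the example repository).
# Reads Packer build output on stdin and prints the AMI ID reported for
# the given region, matching lines such as "us-west-1: ami-0183bb1e3ab40da53".
ami_for_region() {
  local region="$1"
  awk -v region="$region" '$1 == (region ":") && $2 ~ /^ami-/ { print $2 }'
}

# Example usage, assuming you ran:
#   packer build nomad.pkr.hcl | tee packer-build.log
# then:
#   ami_for_region us-east-2 < packer-build.log
#   ami_for_region us-west-1 < packer-build.log
```

The helper only reads text, so it is safe to run repeatedly. If the log contains several builds for the same region, it prints every matching AMI ID.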
Review and deploy Nomad cluster and clients
Navigate to the cloned repository's root directory. This directory contains Terraform configuration to deploy all the resources you will use in this tutorial.
$ cd ..
Open main.tf. This file contains the Terraform configuration to deploy the underlying shared resources and Nomad agents to the two AWS regions through the single server cluster and distant client (SCDC) edge architecture. As opposed to deploying a Nomad server cluster at every edge location, this edge architecture is simpler, scalable, has a smaller resource consumption footprint, and avoids server federation. However, it requires more client to server connection configuration, especially around heartbeats and unstable connectivity.
The primary_shared_resources and edge_shared_resources modules use the shared-resources module to deploy a VPC, security groups, and IAM roles into their respective regions.
The primary_nomad_servers module uses the nomad-server module to deploy a three node Nomad server cluster in the primary data center (us-east-2). Notice that it uses var.primary_ami for its AMI.
main.tf
module "primary_nomad_servers" {
  source = "./nomad-server"
  region = "us-east-2"
  ## ...
  ami                  = var.primary_ami
  server_instance_type = "t2.micro"
  server_count         = 3
}
The primary_nomad_clients module uses the nomad-client module to deploy one Nomad client in the primary data center (us-east-2). Notice that it uses the same AMI (var.primary_ami) as the server agent and defines nomad_dc as dc1. The user script (nomad-client/data-scripts/user-data-client.sh) configures the Nomad agent as a client.
main.tf
module "primary_nomad_clients" {
  source = "./nomad-client"
  region = "us-east-2"
  ## ...
  ami                  = var.primary_ami
  client_instance_type = "t2.small"
  client_count         = 1
  nomad_dc             = "dc1"
}
The edge_nomad_clients module uses the nomad-client module to deploy two Nomad clients in the edge data center (us-west-1). Notice that it uses var.edge_ami for its AMI and defines nomad_dc as dc2.
main.tf
module "edge_nomad_clients" {
  source = "./nomad-client"
  region = "us-west-1"
  ## ...
  ami                  = var.edge_ami
  client_instance_type = "t2.small"
  client_count         = 2
  nomad_dc             = "dc2"
}
Define AMI IDs
Update terraform.tfvars to reflect the AMI IDs you built with Packer. The primary_ami should reference the AMI created in us-east-2; the edge_ami should reference the AMI created in us-west-1.
terraform.tfvars
primary_ami = "REPLACE_WITH_BUILD_AMI_ID"
edge_ami = "REPLACE_WITH_BUILD_EDGE_AMI_ID"
Deploy Nomad cluster and clients
Initialize your Terraform configuration.
$ terraform init
Initializing modules...
- edge_nomad_clients in nomad-client
- edge_shared_resources in shared-resources
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.13.0 for edge_shared_resources.vpc...
- edge_shared_resources.vpc in .terraform/modules/edge_shared_resources.vpc
- primary_nomad_clients in nomad-client
- primary_nomad_servers in nomad-server
- primary_shared_resources in shared-resources
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.13.0 for primary_shared_resources.vpc...
- primary_shared_resources.vpc in .terraform/modules/primary_shared_resources.vpc
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Installing hashicorp/aws v4.6.0...
- Installed hashicorp/aws v4.6.0 (signed by HashiCorp)
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Then, apply your configuration to create the resources. Respond yes to the prompt to confirm the apply.
$ terraform apply
## ...
Plan: 41 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 41 added, 0 changed, 0 destroyed.
Outputs:
edge_dc_nomad_client = "184.169.204.238"
nomad_lb_address = "http://learn-nomad-edge-server-lb-1934725976.us-east-2.elb.amazonaws.com:4646"
nomad_primary_dc_clients = [
"3.15.5.228",
]
nomad_server = "18.191.0.46"
nomad_server_1 = "3.145.196.167"
nomad_server_2 = "3.144.15.124"
nomad_servers = [
"18.191.0.46",
"3.145.196.167",
"3.144.15.124",
]
primary_dc_nomad_client = "3.15.5.228"
Once Terraform finishes provisioning the resources, display the nomad_lb_address Terraform output.
$ terraform output -raw nomad_lb_address
http://learn-nomad-edge-server-lb-1934725976.us-east-2.elb.amazonaws.com:4646
Open the link in your web browser to go to the Nomad UI. It should show an unauthorized page, since you have not provided the ACL bootstrap token.
Bootstrap Nomad ACL
Connect to one of your Nomad servers via SSH.
$ ssh terraform@$(terraform output -raw nomad_server) -i ./learn-nomad-edge
Run the following command to bootstrap the initial ACL token, parse the bootstrap token, and export it as an environment variable.
Inside Nomad server
$ export NOMAD_BOOTSTRAP_TOKEN=$(nomad acl bootstrap | grep -i secret | awk -F '=' '{print $2}')
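The pipeline above splits each line on = and keeps the second field, which still carries the space that follows the equals sign. That works when the variable is expanded unquoted, as in the commands in this section. If you prefer a value with no surrounding whitespace, the following sketch shows one way to trim it. The function name secret_id is an assumption for illustration, and it expects the "Secret ID = value" line that nomad acl bootstrap and nomad acl token create print.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Nomad ACL token output on stdin and print
# only the Secret ID value, with leading and trailing whitespace removed.
secret_id() {
  awk -F '=' 'tolower($1) ~ /^secret id/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim spaces and tabs around the value
    print $2
  }'
}

# Example usage on the Nomad server:
#   export NOMAD_BOOTSTRAP_TOKEN="$(nomad acl bootstrap | secret_id)"
```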
Then, apply the ACL policy. This is the ACL policy defined in packer/config/nomad-acl-user.hcl.
Inside Nomad server
$ nomad acl policy apply -token $NOMAD_BOOTSTRAP_TOKEN -description "Policy to allow reading of agents and nodes and listing and submitting jobs in all namespaces." node-read-job-submit /ops/config/nomad-acl-user.hcl
Successfully wrote "node-read-job-submit" ACL policy!
Finally, create an ACL token for that policy. Keep this token in a safe place; you will use it in the next section to authenticate to the Nomad UI and view the Nomad agents and jobs.
Inside Nomad server
$ nomad acl token create -token $NOMAD_BOOTSTRAP_TOKEN -name "read-token" -policy node-read-job-submit | grep -i secret | awk -F "=" '{print $2}'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Create a management token. Unlike the previous ACL token, this management token can perform all operations. You will use this in future sections to authenticate the Nomad CLI to deploy jobs.
Inside Nomad server
$ nomad acl token create -token $NOMAD_BOOTSTRAP_TOKEN -type="management" -global=true -name="Replication Token" | grep -i secret | awk -F "=" '{print $2}'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Close the SSH connection.
Inside Nomad server
$ exit
Verify Nomad cluster and clients
Go to the Nomad UI and click on ACL Tokens in the top right corner. Enter the ACL token you created for the node-read-job-submit policy in the Secret ID field and click on Set Token. You now have read permissions in the Nomad UI.
Click on Servers to confirm there are three nodes in your Nomad server cluster.
Click on Clients to confirm there are three clients: one in the primary data center (dc1) and two in the edge data center (dc2).
Connect to Nomad servers
You need to set the NOMAD_ADDR and NOMAD_TOKEN environment variables so your local Nomad binary can connect to the Nomad cluster.
First, set the NOMAD_ADDR environment variable to one of your Nomad servers.
$ export NOMAD_ADDR="http://$(terraform output -raw nomad_server):4646"
Then, set the NOMAD_TOKEN environment variable to the management token you created in the previous step.
$ export NOMAD_TOKEN=
List the Nomad server members to verify you successfully configured your Nomad binary.
$ nomad server members
Name Address Port Status Leader Raft Version Build Datacenter Region
ip-10-0-101-18.global 18.191.0.46 4648 alive false 3 1.5.3 dc1 global
ip-10-0-101-57.global 3.145.196.167 4648 alive true 3 1.5.3 dc1 global
ip-10-0-101-69.global 3.144.15.124 4648 alive false 3 1.5.3 dc1 global
Review HashiCups jobs
The jobs directory contains the HashiCups jobs you will schedule in the primary and edge data centers.
Review the HashiCups job
Open jobs/hashicups.nomad.hcl. This Nomad job file will deploy the HashiCups database and product-api to the primary data center.
The hashicups job contains a hashicups group which defines the HashiCups database and product-api tasks. Nomad will only deploy this job in the primary datacenter (var.datacenters).
jobs/hashicups.nomad.hcl
## ...
# Begin Job Spec
job "hashicups" {
type = "service"
region = var.region
datacenters = var.datacenters
## ...
}
In the db task, find the service stanza.
jobs/hashicups.nomad.hcl
## ...
job "hashicups" {
## ...
group "hashicups" {
## ...
task "db" {
driver = "docker"
meta {
service = "database"
}
service {
port = "db"
tags = ["hashicups", "backend"]
provider = "nomad"
address = attr.unique.platform.aws.public-ipv4
}
## ...
}
}
Since this job file defines the service provider as nomad, Nomad will register the service in its built-in service discovery. This will enable other Nomad tasks to query and connect to the service. Nomad's native service discovery lets you register and query services. Unlike Consul, it does not provide a service mesh or route traffic. This is preferable for edge computing, where unstable connectivity could impact a service mesh. In addition, it reduces resource consumption since you do not need to run a Consul agent on each edge device.
Notice that the service stanza sets the address to the attribute associated with the EC2 instance's public IP address. Since the EC2 instance's kernel is unaware of its public IP address, Nomad cannot advertise the public IP address by default. For edge workloads that want to communicate with each other over the public Internet (like the HashiCups demo application), you must set the address to the attribute associated with the EC2 instance's public IP address for Nomad's native service discovery to list the correct address to connect to.
jobs/hashicups.nomad.hcl
service {
port = "db"
tags = ["hashicups", "backend"]
provider = "nomad"
address = attr.unique.platform.aws.public-ipv4
}
The product-api task has a similar service stanza. This advertises the product-api's address and port number, letting the public-api query Nomad's service discovery to connect to the product-api service.
In the product-api task, find the template stanza.
jobs/hashicups.nomad.hcl
## ...
job "hashicups" {
## ...
group "hashicups" {
## ...
task "product-api" {
driver = "docker"
meta {
service = "product-api"
}
template {
data = <<EOH
{{ range nomadService "hashicups-hashicups-db" }}
DB_CONNECTION="host={{ .Address }} port={{ .Port }} user=${var.postgres_user} password=${var.postgres_password} dbname=${var.postgres_db} sslmode=disable"
{{ end }}
EOH
destination = "local/env.txt"
env = true
}
## ...
}
}
This template queries Nomad's native service discovery for the hashicups-hashicups-db service's address and port. It uses these values to populate the DB_CONNECTION environment variable which lets the product-api connect to the database.
Review the HashiCups edge job
Open jobs/hashicups-edge.nomad.hcl. This Nomad job file will deploy the remaining HashiCups backend and the frontend to the edge data center.
The hashicups-edge job contains a hashicups-edge group, which defines the remaining HashiCups tasks. Nomad will only deploy this job in the edge datacenter (var.datacenters).
jobs/hashicups-edge.nomad.hcl
## ...
# Begin Job Spec
job "hashicups-edge" {
type = "service"
region = var.region
datacenters = var.datacenters
## ...
}
Find the max_client_disconnect attribute inside the group stanza.
jobs/hashicups-edge.nomad.hcl
## ...
job "hashicups-edge" {
## ...
group "hashicups-edge" {
## ...
max_client_disconnect = "1h"
## ...
}
}
If you do not set this attribute, Nomad runs its default behavior: when a Nomad client fails its heartbeat, Nomad will mark the client as down and the allocation as lost. Nomad will automatically schedule a new allocation on another client. However, if the down client reconnects to the server, it will shut down its existing allocations. This is suboptimal since Nomad will stop running allocations on a reconnected client just to place identical ones.
For many edge workloads, especially ones with high latency or unstable network connectivity, this is disruptive since a disconnected client does not necessarily mean the client is down. The allocations may continue to run on the temporarily disconnected client. For these cases, you want to set the max_client_disconnect attribute to gracefully handle disconnected client allocations.
If max_client_disconnect is set, when the client disconnects, Nomad will still schedule the allocation on another client. However, when the client reconnects:
- Nomad will mark the reconnected client as ready.
- If there are multiple job versions, Nomad will select the latest job version and stop all other allocations.
- If Nomad rescheduled the lost allocation to a new client and the new client has a higher node rank, Nomad will continue the allocations in the new client and stop all others.
- If the new client has a worse node rank or there is a tie, Nomad will resume the allocations on the reconnected client and stop all others.
This is the preferred behavior for edge workloads with high latency or unstable network connectivity, and especially true when the disconnected allocation is stateful.
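The reconnect rules listed above can be sketched as a small decision function. This is a simplified model for illustration only and is not Nomad's scheduler code: the function name pick_allocation is hypothetical, and it assumes job versions and node ranks are passed in as whole numbers where a higher rank is better.

```shell
#!/usr/bin/env bash
# Simplified model of the reconnect rules described in this section.
# Arguments:
#   $1 job version of the original (reconnected) allocation
#   $2 job version of the replacement allocation
#   $3 node rank of the original client    (assumed integer, higher is better)
#   $4 node rank of the replacement client (assumed integer, higher is better)
# Prints which allocation keeps running: "original" or "replacement".
pick_allocation() {
  local orig_version="$1" repl_version="$2" orig_rank="$3" repl_rank="$4"

  if [ "$orig_version" -ne "$repl_version" ]; then
    # Different job versions: the allocation on the latest version wins.
    if [ "$orig_version" -gt "$repl_version" ]; then
      echo original
    else
      echo replacement
    fi
  elif [ "$repl_rank" -gt "$orig_rank" ]; then
    # Same version and the new client ranks higher: keep the replacement.
    echo replacement
  else
    # Same version and a worse rank or a tie: resume the original.
    echo original
  fi
}

# Example: same job version, tied node rank.
#   pick_allocation 0 0 5 5
```

In the tie case the sketch keeps the original allocation, which matches the last rule in the list: Nomad resumes the allocation on the reconnected client and stops the others.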
In the public-api task, find the template stanza.
jobs/hashicups-edge.nomad.hcl
## ...
job "hashicups-edge" {
## ...
group "hashicups-edge" {
## ...
task "public-api" {
driver = "docker"
meta {
service = "public-api"
}
template {
data = <<EOH
{{ range nomadService "hashicups-hashicups-product-api" }}
PRODUCT_API_URI="http://{{.Address}}:{{.Port}}"
{{ end }}
EOH
change_mode = "noop"
destination = "local/env.txt"
env = true
}
## ...
}
}
This template queries Nomad's native service discovery for the hashicups-hashicups-product-api service's address and port. In addition, this template stanza sets change_mode to noop. By default, change_mode is set to restart, which will cause your task to fail if your client is unable to connect to the Nomad server. Since Nomad is scheduling this job on the edge datacenter, setting change_mode to noop means that if the edge client disconnects from the Nomad server (and therefore service discovery), the service keeps using the previously configured address and port.
Schedule HashiCups jobs
Submit the hashicups job to deploy the tasks to the primary data center.
$ nomad job run jobs/hashicups.nomad.hcl
==> 2022-04-24T14:47:17-07:00: Monitoring evaluation "2e20a4db"
2022-04-24T14:47:17-07:00: Evaluation triggered by job "hashicups"
==> 2022-04-24T14:47:18-07:00: Monitoring evaluation "2e20a4db"
2022-04-24T14:47:18-07:00: Evaluation within deployment: "9b7f3d50"
2022-04-24T14:47:18-07:00: Allocation "3442ce01" created: node "0643838d", group "hashicups"
2022-04-24T14:47:18-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-04-24T14:47:18-07:00: Evaluation "2e20a4db" finished with status "complete"
==> 2022-04-24T14:47:18-07:00: Monitoring deployment "9b7f3d50"
✓ Deployment "9b7f3d50" successful
2022-04-24T14:47:30-07:00
ID = 9b7f3d50
Job ID = hashicups
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
hashicups 1 1 1 0 2022-04-24T21:57:28Z
Submit the hashicups-edge job to deploy the tasks to the edge data center.
$ nomad job run jobs/hashicups-edge.nomad.hcl
==> 2022-04-24T14:47:47-07:00: Monitoring evaluation "8756c237"
2022-04-24T14:47:47-07:00: Evaluation triggered by job "hashicups-edge"
2022-04-24T14:47:47-07:00: Evaluation within deployment: "66d4779e"
2022-04-24T14:47:47-07:00: Allocation "48af7a5e" created: node "6ba84888", group "hashicups-edge"
2022-04-24T14:47:47-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-04-24T14:47:47-07:00: Evaluation "8756c237" finished with status "complete"
==> 2022-04-24T14:47:47-07:00: Monitoring deployment "66d4779e"
✓ Deployment "66d4779e" successful
2022-04-24T14:48:29-07:00
ID = 66d4779e
Job ID = hashicups-edge
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
hashicups-edge 1 1 1 0 2022-04-24T21:58:28Z
Verify HashiCups jobs
List the Nomad services. Notice the service name contains the job name, group name, and task name, separated by a dash (-).
$ nomad service list
Service Name Tags
hashicups-edge-hashicups-edge-frontend [frontend,hashicups]
hashicups-edge-hashicups-edge-nginx [frontend,hashicups]
hashicups-edge-hashicups-edge-payments-api [backend,hashicups]
hashicups-edge-hashicups-edge-public-api [backend,hashicups]
hashicups-hashicups-db [backend,hashicups]
hashicups-hashicups-product-api [backend,hashicups]
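The naming pattern in the listing above can be reproduced with a one-line helper, which is handy when you need to work out the name to pass to nomadService in a template. This is a small sketch; the function name service_name is an assumption for illustration, and it only covers services that rely on the job-group-task naming described here rather than an explicit service name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a service name from the job, group, and task
# names, joined by dashes, as seen in the `nomad service list` output.
service_name() {
  local job="$1" group="$2" task="$3"
  printf '%s-%s-%s\n' "$job" "$group" "$task"
}

# Example usage:
#   service_name hashicups hashicups db
#   service_name hashicups-edge hashicups-edge nginx
```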
Retrieve detailed information about the nginx service. Since there are two Nomad clients on the edge datacenter, this command is useful to locate which client the service is running on. Notice that the nginx service's address reflects the address defined by the advertise stanza (the client's public IP address).
$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID Address Tags Node ID Alloc ID
hashicups-edge 184.169.204.238:80 [hashicups,frontend] 6ba84888 e3b69fc2
Open the nginx service's address in your web browser to go to HashiCups.
Simulate client disconnect
When running and managing edge services, the network connection between your Nomad servers and edge services may be unstable. In this step, you will simulate the client running the hashicups-edge job disconnecting from the Nomad servers to learn how Nomad reacts to disconnected clients.
Retrieve the nginx service's client IP address. For the example below, the client IP address is 184.169.204.238.
$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID Address Tags Node ID Alloc ID
hashicups-edge 184.169.204.238:80 [hashicups,frontend] 6ba84888 e3b69fc2
Export the client IP address as an environment variable named CLIENT_IP. Do not include the port. For example, the client IP address for this example would be 184.169.204.238.
$ export CLIENT_IP=
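If you copy the Address column as-is, it includes the port. The following sketch strips the port using shell parameter expansion. The function name strip_port is an assumption for illustration, and it is meant for IPv4 address:port values like the one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a trailing ":port" from an IPv4 address.
# "${1%:*}" deletes the shortest suffix that starts with a colon, so a
# value without a port is returned unchanged.
strip_port() {
  printf '%s\n' "${1%:*}"
}

# Example usage:
#   export CLIENT_IP="$(strip_port 184.169.204.238:80)"
```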
Run the following command to drop all packets from the Nomad servers to the Nomad client that is currently hosting the hashicups-edge job.
$ ssh terraform@$CLIENT_IP -i ./learn-nomad-edge \
'sudo iptables -I INPUT -s '$(terraform output -raw nomad_server)' -j DROP && \
sudo iptables -I INPUT -s '$(terraform output -raw nomad_server_1)' -j DROP && \
sudo iptables -I INPUT -s '$(terraform output -raw nomad_server_2)' -j DROP'
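The command above repeats one iptables rule per Nomad server. As a sketch of the same idea for any number of servers, the helper below prints one rule per IP address so you can review the rules before running them. The function name iptables_cmds is hypothetical; it only prints text and does not change any firewall rules itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one iptables command per server IP address.
#   $1     action letter: I inserts the DROP rule, D deletes it again
#   $2...  Nomad server IP addresses
iptables_cmds() {
  local action="$1"
  shift
  local ip
  for ip in "$@"; do
    printf 'sudo iptables -%s INPUT -s %s -j DROP\n' "$action" "$ip"
  done
}

# Example usage, with the server addresses from the Terraform outputs:
#   iptables_cmds I 18.191.0.46 3.145.196.167 3.144.15.124
```

Passing D instead of I prints the matching delete rules, which is the form used later to restore connectivity.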
Verify disconnected client
Retrieve the hashicups-edge job's status. Notice that one of the allocations' status is now unknown and Nomad rescheduled the allocation onto a different client.
Tip
If the allocation status does not change, wait a couple of seconds before retrieving the job's status. If it does not change, verify that you dropped packets on the correct client.
$ nomad status hashicups-edge
## ...
Allocations
ID Node ID Task Group Version Desired Status Created Modified
40f52550 da109b44 hashicups-edge 0 run pending 9s ago 8s ago
48af7a5e 6ba84888 hashicups-edge 0 run unknown 2m39s ago 9s ago
This is the preferred behavior since the client instance is still up but cannot connect to the Nomad servers, as happens when an edge network has an unstable connection.
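Rather than re-running the status command by hand while you wait for the change, you can poll for it. The following is a minimal sketch of a generic retry loop. The function name wait_for_output, the attempt count, and the delay are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until its output contains a pattern.
#   $1     pattern to look for (passed to grep)
#   $2     maximum number of attempts
#   $3     delay between attempts, in seconds
#   $4...  the command to run
# Returns 0 as soon as the pattern appears, 1 if it never does.
wait_for_output() {
  local pattern="$1" attempts="$2" delay="$3"
  shift 3
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example usage: check up to 10 times, 3 seconds apart, for an allocation
# whose status is "unknown":
#   wait_for_output unknown 10 3 nomad status hashicups-edge
```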
List the nginx service. Notice that Nomad lists both services. This is because even though the original client cannot connect to the Nomad servers, it does not necessarily mean that the client is unavailable. As a result, Nomad continues to list the original client as available.
$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID Address Tags Node ID Alloc ID
hashicups-edge 13.57.34.53:80 [hashicups,frontend] da109b44 40f52550
hashicups-edge 184.169.204.238:80 [hashicups,frontend] 6ba84888 48af7a5e
Visit both addresses to find the HashiCups dashboard.
Re-enable client connection
Run the following command to re-accept packets from the Nomad servers.
$ ssh terraform@$CLIENT_IP -i ./learn-nomad-edge \
'sudo iptables -D INPUT -s '$(terraform output -raw nomad_server)' -j DROP && \
sudo iptables -D INPUT -s '$(terraform output -raw nomad_server_1)' -j DROP && \
sudo iptables -D INPUT -s '$(terraform output -raw nomad_server_2)' -j DROP'
Retrieve the hashicups-edge job's status. Notice that the original allocation's status is now running and the rescheduled allocation on the new client is now complete.
Tip
If the allocation status does not change, wait a couple of seconds before retrieving the job's status. If it does not change, verify that you re-accepted packets on the correct client.
$ nomad status hashicups-edge
## ...
Allocations
ID Node ID Task Group Version Desired Status Created Modified
40f52550 da109b44 hashicups-edge 0 stop complete 3m42s ago 3s ago
48af7a5e 6ba84888 hashicups-edge 0 run running 6m12s ago 4s ago
Since the original client reconnected and the node rank on the rescheduled allocation is equal to or worse than the original client, Nomad resumed the original allocation and stopped the new one.
Retrieve the re-connected allocation's status to find the reconnect event, replacing ALLOC_ID with your re-connected allocation ID. In this example, it is 48af7a5e.
$ nomad alloc status ALLOC_ID
## ...
Recent Events:
Time Type Description
2022-04-24T14:53:55-07:00 Reconnected Client reconnected
2022-04-24T14:48:01-07:00 Started Task started by client
2022-04-24T14:47:48-07:00 Driver Downloading image
2022-04-24T14:47:47-07:00 Task Setup Building Task Directory
2022-04-24T14:47:47-07:00 Received Task received by client
List the nginx service. Notice that Nomad removed the service for the completed allocation and only lists the original service.
$ nomad service info hashicups-edge-hashicups-edge-nginx
Job ID Address Tags Node ID Alloc ID
hashicups-edge 184.169.204.238:80 [hashicups,frontend] 6ba84888 48af7a5e
Clean up resources
Run terraform destroy to clean up your provisioned infrastructure. Respond yes to the prompt to confirm the operation.
$ terraform destroy
## ...
Plan: 0 to add, 0 to change, 20 to destroy.
## ...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
## ...
Destroy complete! Resources: 20 destroyed.
Your AWS account still has the AMI and its S3-stored snapshots, which you may be charged for depending on your other usage. Delete the AMI and snapshots stored in your S3 buckets.
Note
Remember to delete the AMI images and snapshots in both regions where you created them. If you didn't update the region variable in the terraform.tfvars file, they will be in the us-east-2 and us-west-1 regions.
In your us-east-2 AWS account, deregister the AMI by selecting it, clicking on the Actions button, then the Deregister AMI option, and finally confirming by clicking the Deregister AMI button in the confirmation dialog.
Delete the snapshots by selecting the snapshots, clicking on the Actions button, then the Delete snapshot option, and finally confirm by clicking the Delete button in the confirmation dialog.
Then, delete the AMI images and snapshots in the us-west-1 region.
In your us-west-1 AWS account, deregister the AMI by selecting it, clicking on the Actions button, then the Deregister AMI option, and finally confirming by clicking the Deregister AMI button in the confirmation dialog.
Delete the snapshots by selecting the snapshots, clicking on the Actions button, then the Delete snapshot option, and finally confirm by clicking the Delete button in the confirmation dialog.
Next steps
In this tutorial, you deployed a single server cluster and distant client edge architecture. Then, you scheduled HashiCups on both on-prem and edge data centers, connecting its services with Nomad's native service discovery. Finally, you tested the disconnected client allocation by simulating unstable network connectivity between the Nomad clients and the server.
For more information, check out the following resources.
- Learn more about Nomad's native service discovery by visiting the Nomad documentation
- Read more about disconnected client allocation handling by visiting the Nomad documentation
- Complete the tutorials in the Nomad ACL System Fundamentals collection to configure a Nomad cluster for ACLs, bootstrap the ACL system, author your first policy, and grant a token based on the policy.