Logstash Cluster
This folder contains a Terraform module to deploy a Logstash cluster in AWS on top of an Auto Scaling Group. The idea is to create an Amazon Machine Image (AMI) that has Logstash installed using the install-logstash module.
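As a sketch, usage of the module might look like the following. The source path, AMI ID, and all other values here are placeholders for illustration, not values taken from this repo:

```hcl
module "logstash_cluster" {
  # Hypothetical path; point this at wherever the logstash-cluster module lives for you
  source = "./modules/logstash-cluster"

  cluster_name  = "logstash-stage"
  aws_region    = "us-east-1"
  ami_id        = "ami-0123456789abcdef0" # an AMI built with the install-logstash module
  instance_type = "t2.micro"
  size          = 3

  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  lb_target_group_arns = [aws_lb_target_group.logstash.arn]

  # A User Data boot script; this path is a placeholder
  user_data = file("${path.module}/user-data.sh")
}
```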
What's included in this module?
This module creates the following:
Auto Scaling Group
This module runs Logstash on top of an Auto Scaling Group (ASG). Typically, you should run the ASG with multiple Instances spread across multiple Availability Zones. Each of the EC2 Instances should be running an AMI that has Logstash installed via the install-logstash script. You pass in the ID of the AMI to run using the ami_id input parameter.
Load Balancer
We use a Network Load Balancer (1) so that we can perform ongoing health checks on each Logstash node, and (2) so that Filebeat can access the Logstash cluster via a single endpoint, which forwards each connection to a healthy Logstash node.
Security Group
Each EC2 Instance in the ASG has a Security Group that allows minimal connectivity:
- All outbound requests
- Inbound SSH access from the CIDR blocks and security groups you specify
The ID of the security group is exported as an output variable, which you can use with the logstash-security-group-rules module to open up all the ports necessary for Logstash.
Check out the Security section for more details.
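For example, the exported security group ID could be wired into the logstash-security-group-rules module roughly like this. The source path, the output name security_group_id, and the rules module's input names are assumptions based on this doc, not verified against that module:

```hcl
module "logstash_security_group_rules" {
  # Hypothetical path to the logstash-security-group-rules module
  source = "./modules/logstash-security-group-rules"

  # The security group ID exported by the logstash-cluster module
  security_group_id = module.logstash_cluster.security_group_id

  # Open the Beats and CollectD ports to your internal network (placeholder CIDR)
  beats_port_cidr_blocks    = ["10.0.0.0/16"]
  collectd_port_cidr_blocks = ["10.0.0.0/16"]
}
```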
How do you roll out updates?
If you want to deploy a new version of Logstash across the cluster, you can do so in one of two ways:

Rolling deploy:

1. Build a new AMI.
2. Set the ami_id parameter to the ID of the new AMI.
3. Run terraform apply.

Because the logstash-cluster module uses the Gruntwork server-group modules under the hood, running terraform apply will automatically perform a zero-downtime rolling deployment. Specifically, one EC2 Instance at a time will be terminated, a new EC2 Instance will spawn in its place, and only once the new EC2 Instance passes the Load Balancer health checks will the next EC2 Instance be rolled out.

Note that there will be a brief period of time during which EC2 Instances based on both the old ami_id and the new ami_id will be running.
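The rolling deploy amounts to a one-line change plus an apply. For example (both AMI IDs here are placeholders):

```hcl
module "logstash_cluster" {
  # ...all other parameters unchanged from the running cluster...

  # Before: ami_id = "ami-0123456789abcdef0"
  # After building the new AMI, bump the ID and run `terraform apply`:
  ami_id = "ami-0fedcba9876543210"
}
```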
New cluster:

1. Build a new AMI.
2. Create a totally new ASG using the logstash-cluster module with the ami_id set to the new AMI, but all other parameters the same as the old cluster.
3. Wait for all the nodes in the new ASG to join the cluster and catch up on replication.
4. Remove each of the nodes from the old cluster.
5. Remove the old ASG by removing that logstash-cluster module from your code.
Security
Here are some of the main security considerations to keep in mind when using this module:
Security groups
This module attaches a security group to each EC2 Instance that allows inbound requests as follows:
SSH: For the SSH port (default: 22), you can use the allowed_ssh_cidr_blocks parameter to control the list of CIDR blocks that will be allowed access, and the allowed_inbound_ssh_security_group_ids parameter to control the list of source Security Groups that will be allowed access.

The ID of the security group is exported as an output variable, which you can use with the logstash-security-group-rules module to open up all the ports necessary for Logstash.
SSH access
You can associate an EC2 Key Pair with each of the EC2 Instances in this cluster by specifying the Key Pair's name in the ssh_key_name variable. If you don't want to associate a Key Pair with these servers, set ssh_key_name to an empty string.
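A minimal sketch of both options (the Key Pair name is a placeholder):

```hcl
# Associate an existing EC2 Key Pair with the cluster's instances
ssh_key_name = "my-ops-key"

# Or opt out of Key Pairs entirely
ssh_key_name = ""
```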
Reference
- Inputs
- Outputs
Required
- ami_id (string): The ID of the AMI to run in this cluster.
- aws_region (string): The AWS region the cluster will be deployed in.
- cluster_name (string): The name of the Logstash cluster (e.g. logstash-stage). This variable is used to namespace all resources created by this module.
- instance_type (string): The type of EC2 Instances to run for each node in the cluster (e.g. t2.micro).
- lb_target_group_arns (list(string)): The ALB target groups with which to associate instances in this server group.
- size (number): The number of nodes to have in the Logstash cluster.
- subnet_ids (list(string)): The subnet IDs into which the EC2 Instances should be deployed.
- user_data (string): A User Data script to execute while the server is booting.
- vpc_id (string): The ID of the VPC in which to deploy the Logstash cluster.
Optional
- allowed_ssh_cidr_blocks (list(string)): A list of CIDR-formatted IP address ranges from which the EC2 Instances will allow SSH connections. Default: []
- allowed_ssh_security_group_ids (list(string)): A list of security group IDs from which the EC2 Instances will allow SSH connections. Default: []
- If set to true, associate a public IP address with each EC2 Instance in the cluster. Default: false
- beats_port_cidr_blocks (list(string)): A list of IP address ranges in CIDR format from which access to the Filebeat port will be allowed. Default: []
- beats_port_security_groups (list(string)): The list of Security Group IDs from which to allow connections to the beats_port. If you update this variable, make sure to update num_beats_port_security_groups too! Default: []
- collectd_port (number): The port on which CollectD will communicate with the Logstash cluster. Default: 8080
- collectd_port_cidr_blocks (list(string)): A list of IP address ranges in CIDR format from which access to the CollectD port will be allowed. Default: []
- collectd_port_security_groups (list(string)): The list of Security Group IDs from which to allow connections to the collectd_port. If you update this variable, make sure to update num_collectd_port_security_groups too! Default: []
- ebs_optimized (bool): If true, the launched EC2 instance will be EBS-optimized. Default: false
- ebs_volumes: A list that defines the EBS Volumes to create for each server. Each item in the list should be a map that contains the keys 'type' (one of standard, gp2, or io1), 'size' (in GB), and 'encrypted' (true or false). Each EBS Volume and server pair will get matching tags with a name of the format ebs-volume-xxx, where xxx is the index of the EBS Volume (e.g., ebs-volume-0, ebs-volume-1, etc). These tags can be used by each server to find and mount its EBS Volume(s). Default: []
  Type:
  list(object({
    type      = string
    size      = number
    encrypted = bool
  }))
Example
default = [
{
type = "standard"
size = 100
encrypted = false
},
{
type = "gp2"
size = 300
encrypted = true
}
]
- filebeat_port (number): The port on which Filebeat will communicate with the Logstash cluster. Default: 5044
- Time, in seconds, after an instance comes into service before checking health. Default: 600
- health_check_type (string): The type of health check to use. Must be one of: EC2 or ELB. If you associate any load balancers with this server group via elb_names or alb_target_group_arns, you should typically set this parameter to ELB. Default: "EC2"
- num_beats_port_security_groups (number): The number of security group IDs in beats_port_security_groups. We should be able to compute this automatically, but due to a Terraform limitation, if there are any dynamic resources in beats_port_security_groups, then we won't be able to: https://github.com/hashicorp/terraform/pull/11482. Default: 0
- num_collectd_port_security_groups (number): The number of security group IDs in collectd_port_security_groups. We should be able to compute this automatically, but due to a Terraform limitation, if there are any dynamic resources in collectd_port_security_groups, then we won't be able to: https://github.com/hashicorp/terraform/pull/11482. Default: 0
- Whether the volume should be destroyed on instance termination. Default: true
- root_volume_size (number): The size, in GB, of the root EBS volume. Default: 50
- root_volume_type (string): The type of the root EBS volume. Must be one of: standard, gp2, or io1. Default: "gp2"
- If set to true, skip the rolling deployment and destroy all the servers immediately. You should typically NOT enable this in prod, as it will cause downtime! The main use case for this flag is to make testing and cleanup easier. It can also be handy in case the rolling deployment code has a bug. Default: false
- ssh_key_name (string): The name of an EC2 Key Pair that can be used to SSH to the EC2 Instances in this cluster. Set to an empty string to not associate a Key Pair. Default: null
- ssh_port (number): The port used for SSH connections. Default: 22
- tags (map(string)): A map of key-value pairs that represent custom tags to propagate to the resources that correspond to this Logstash cluster. Default: {}
Example
default = {
foo = "bar"
}
- A maximum duration that Terraform should wait for ASG instances to be healthy before timing out. Setting this to '0' causes Terraform to skip all Capacity Waiting behavior. Default: "10m"