This guide describes the process to configure an AWS pipeline to load data into a customer-managed Elasticsearch or OpenSearch cluster.
ℹ️ Note: We currently support loading into Elasticsearch OSS version 6.x only. Version 7.x introduces breaking changes that we have not yet addressed.
Prerequisites
Before starting, please confirm the following:
- The domain endpoint is reachable from Snowplow’s IP addresses (publicly accessible over the Internet and restricted by an IP allowlist).
- The cluster is configured to use basic authentication for security.
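Both prerequisites can be checked with a short script before you share any details. The following is a minimal sketch, not an official Snowplow tool: it assumes Python 3 with only the standard library, and the function names `basic_auth_header` and `check_cluster` are illustrative. It also has to run from a machine whose IP address your cluster already allows.

```python
import base64
import json
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authentication header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def check_cluster(endpoint: str, port: int, username: str, password: str) -> str:
    """Request the cluster root URL and describe the outcome."""
    request = urllib.request.Request(
        f"{endpoint.rstrip('/')}:{port}/",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            info = json.load(response)
    except urllib.error.HTTPError as err:
        # The endpoint answered but rejected the request
        # (a 401 usually points at the credentials).
        return f"reachable, but HTTP {err.code}"
    except OSError as err:
        # No usable answer: DNS, firewall, allowlist, or timeout problem.
        return f"unreachable: {err}"
    # The root endpoint normally reports the cluster version.
    version = info.get("version", {}).get("number", "unknown")
    return f"reachable, authenticated, version {version}"
```

For example, `check_cluster("https://search.example.com", 443, "admin", "change-me")` (placeholder values) returns a one-line summary that distinguishes a network problem from an authentication problem.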
Information Needed
Once the prerequisites are confirmed, securely share the following details using the Freeform message feature in the BDP Console:
Required Information
- Elasticsearch/OpenSearch Cluster:
  - Cluster endpoint (URL)
  - Domain name associated with the cluster
  - Port number
- Access Details:
  - Username with admin privileges
  - Password associated with the username
- Additional Information for OpenSearch Clusters:
  - AWS region where the cluster is hosted
  - Whether requests to the cluster need to be signed
Snowplow’s IP Address Allowlist
ℹ️ Note: Support will provide the two customer-specific IPv4 addresses of your VPC NAT Gateways.
Below is the list of IP addresses that must be allowed for Snowplow to access your cluster:
Region | Purpose | IP Address |
---|---|---|
Global | Orchestration | 54.152.94.171/32 |
EU (eu-central-1) | SP VPN | 18.194.133.57/32 |
EU (eu-central-1) | SP Ops 1 | 3.124.46.197/32 |
EU (eu-central-1) | SP Ops 2 | 35.158.184.151/32 |
EU (eu-central-1) | SP Ops 3 | 3.123.243.200/32 |
US (us-east-1) | SP VPN | 54.209.175.161/32 |
US (us-east-1) | SP Ops 1 | 52.204.48.193/32 |
US (us-east-1) | SP Ops 2 | 3.229.60.33/32 |
US (us-east-1) | SP Ops 3 | 3.216.241.66/32 |
AP (ap-southeast-2) | SP VPN | 54.66.204.91/32 |
AP (ap-southeast-2) | SP Ops 1 | 13.211.15.150/32 |
AP (ap-southeast-2) | SP Ops 2 | 3.105.98.15/32 |
AP (ap-southeast-2) | SP Ops 3 | 13.228.26.124/32 |
Customer-specific | VPC NAT Gateway 1 | <VPC NAT Gateway 1> |
Customer-specific | VPC NAT Gateway 2 | <VPC NAT Gateway 2> |
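When copying these addresses into a firewall rule or access policy, it can help to validate them first. The sketch below is one possible way to do that with Python's standard `ipaddress` module. The helper names `build_allowlist` and `is_allowed` are illustrative. The two NAT Gateway values used in the usage note are documentation-range placeholders, not real Snowplow addresses.

```python
import ipaddress

# Addresses copied from the table above.
SNOWPLOW_CIDRS = [
    "54.152.94.171/32",   # Global, Orchestration
    "18.194.133.57/32",   # eu-central-1, SP VPN
    "3.124.46.197/32",    # eu-central-1, SP Ops 1
    "35.158.184.151/32",  # eu-central-1, SP Ops 2
    "3.123.243.200/32",   # eu-central-1, SP Ops 3
    "54.209.175.161/32",  # us-east-1, SP VPN
    "52.204.48.193/32",   # us-east-1, SP Ops 1
    "3.229.60.33/32",     # us-east-1, SP Ops 2
    "3.216.241.66/32",    # us-east-1, SP Ops 3
    "54.66.204.91/32",    # ap-southeast-2, SP VPN
    "13.211.15.150/32",   # ap-southeast-2, SP Ops 1
    "3.105.98.15/32",     # ap-southeast-2, SP Ops 2
    "13.228.26.124/32",   # ap-southeast-2, SP Ops 3
]


def build_allowlist(nat_gateway_cidrs):
    """Combine the Snowplow CIDRs with the customer-specific NAT Gateway CIDRs.

    ip_network raises ValueError for a malformed entry, for example a
    placeholder that was never filled in.
    """
    cidrs = SNOWPLOW_CIDRS + list(nat_gateway_cidrs)
    return [str(ipaddress.ip_network(cidr)) for cidr in cidrs]


def is_allowed(source_ip, allowlist):
    """Return True if source_ip falls inside any CIDR in the allowlist."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowlist)
```

Calling `build_allowlist(["203.0.113.10/32", "203.0.113.11/32"])` returns all fifteen entries, and `is_allowed` can then confirm that a given source address is covered before the rule is applied.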
Cluster Configuration
- Instance Types: We recommend memory-optimized instances (Rx) over general-purpose instances (Mx) for performance and scalability.
- Example Configurations:
  - Typical traffic: 2 x r5.large.elasticsearch
  - Higher traffic: 8 x r5.large.elasticsearch or 2 x r5.xlarge.elasticsearch
  - Minimal traffic: 2 x t3.medium.elasticsearch
ℹ️ AWS docs: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/supported-instance-types.html
Notes
- Supported Versions:
  - We currently support Elasticsearch OSS version 6.x and 7.x. We also support upgrading from version 6.x to version 7.x.
  - OpenSearch versions are not supported at this time, but we are implementing the code changes needed to support them.
- Authentication: The only authentication method we support is HTTP Basic Authentication.
- Incident Handling: Elasticsearch is considered an ephemeral destination. In the event of an incident, we may pause or disable loading into the cluster to ensure the stability of the overall pipeline. We will notify you if such action is required.
FAQ
Why are ES Loaders lagging?
- Cluster Issue: If the issue is with the Elasticsearch cluster, we may need to contact you for resolution.
- Traffic Spikes: Unexpected traffic spikes can overwhelm the cluster, requiring adjustments to instance size or count.
- Bad Data: An upstream issue causing excessive bad data may trigger lag in the bad stream.
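To rule out the first cause, you can look at the cluster's own health report. The following is a rough sketch that queries the standard `_cluster/health` endpoint over HTTP Basic Authentication. The function names and the choice of which fields to flag are assumptions made for illustration, not Snowplow's monitoring logic.

```python
import base64
import json
import urllib.request


def fetch_cluster_health(endpoint, port, username, password):
    """GET /_cluster/health using HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{endpoint.rstrip('/')}:{port}/_cluster/health",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_health(health):
    """Turn a _cluster/health response into a list of warning strings."""
    warnings = []
    status = health.get("status")
    if status != "green":
        # Anything other than green is flagged here, even though yellow
        # can be normal on a cluster with unallocated replicas.
        warnings.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        warnings.append(f"{unassigned} unassigned shards")
    pending = health.get("number_of_pending_tasks", 0)
    if pending > 0:
        warnings.append(f"{pending} pending cluster tasks")
    return warnings
```

An empty list from `summarize_health(fetch_cluster_health(...))` suggests the lag is more likely caused by traffic volume or bad data than by the cluster itself.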