How to enable the OpenSearch/Elasticsearch Loader: Customer Instructions

Dimitris Zoutsos
Snowplow Team

This guide describes the process to configure an AWS pipeline to load data into a customer-managed Elasticsearch or OpenSearch cluster.

ℹ️ Note: We currently support loading into Elasticsearch OSS version 6.x only. Version 7.x introduces breaking changes that we have not yet addressed.


Prerequisites

Before starting, please confirm the following:

  1. The domain endpoint is publicly accessible over the Internet, with access restricted by an IP allowlist that includes Snowplow's IP addresses (listed below).
  2. The cluster uses HTTP Basic Authentication.
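To sanity-check the second prerequisite, you can send an authenticated request to the cluster root endpoint yourself. The sketch below is not Snowplow's tooling; it assumes only the Python standard library, and `fetch_cluster_info` and `basic_auth_header` are hypothetical helper names. A successful response from your own machine confirms that Basic Authentication works, but it says nothing about whether Snowplow's IP addresses are allowlisted.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authentication header (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def fetch_cluster_info(endpoint: str, username: str, password: str,
                       timeout: float = 10.0) -> dict:
    """GET the cluster's root endpoint with Basic auth and return the parsed JSON.

    Raises urllib.error.HTTPError if the credentials are rejected (e.g. 401)
    and urllib.error.URLError if the endpoint cannot be reached at all.
    """
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_cluster_info("https://<your-endpoint>:443", "<admin-user>", "<password>")` should return the cluster's root document, including `version.number`. A 401 error usually points to Basic Authentication not being enabled or to wrong credentials; a timeout from an allowlisted machine usually points to network access rules.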

Information Needed

Once the prerequisites are confirmed, securely share the following details using the Freeform message feature in the BDP Console:

Required Information

  • Elasticsearch/OpenSearch Cluster:

    • Cluster endpoint (URL)
    • Domain name associated with the cluster
    • Port number
  • Access Details:

    • Username with admin privileges
    • Password associated with the username
  • Additional Information for OpenSearch Clusters:

    • AWS region where the cluster is hosted
    • Whether requests to the cluster need to be signed
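Before sending your message, it can help to check that nothing is missing. The following is a minimal checklist sketch; the `ClusterDetails` class and its field names are hypothetical and are not a Snowplow API. The details themselves should still be shared through the Freeform message in the BDP Console.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class ClusterDetails:
    """Checklist of the details to share with Snowplow (illustrative field names)."""
    endpoint: str                           # cluster endpoint (URL)
    domain_name: str                        # domain name associated with the cluster
    port: int                               # port number
    username: str                           # username with admin privileges
    password: str                           # password for that username
    is_opensearch: bool = False
    aws_region: Optional[str] = None        # OpenSearch only: AWS region of the cluster
    signed_requests: Optional[bool] = None  # OpenSearch only: must requests be signed?

    def problems(self) -> List[str]:
        """Return missing or obviously invalid fields (an empty list means complete)."""
        issues = [name for name in ("endpoint", "domain_name", "username", "password")
                  if not getattr(self, name)]
        if self.endpoint and urlparse(self.endpoint).scheme not in ("http", "https"):
            issues.append("endpoint must start with http:// or https://")
        if not 0 < self.port < 65536:
            issues.append("port must be between 1 and 65535")
        if self.is_opensearch:
            if not self.aws_region:
                issues.append("aws_region")
            if self.signed_requests is None:
                issues.append("signed_requests")
        return issues
```

For an OpenSearch cluster, `problems()` flags the region and request-signing answers if they have not been filled in.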

Snowplow’s IP Address Allowlist

ℹ️ Note: Support will provide the IPv4 addresses of your two customer-specific VPC NAT Gateways (the last two rows of the table).

Add the following IP addresses to your cluster's allowlist so that Snowplow can access it:

Region              | Purpose           | IP Address
Global              | Orchestration     | 54.152.94.171/32
EU (eu-central-1)   | SP VPN            | 18.194.133.57/32
EU (eu-central-1)   | SP Ops 1          | 3.124.46.197/32
EU (eu-central-1)   | SP Ops 2          | 35.158.184.151/32
EU (eu-central-1)   | SP Ops 3          | 3.123.243.200/32
US (us-east-1)      | SP VPN            | 54.209.175.161/32
US (us-east-1)      | SP Ops 1          | 52.204.48.193/32
US (us-east-1)      | SP Ops 2          | 3.229.60.33/32
US (us-east-1)      | SP Ops 3          | 3.216.241.66/32
AP (ap-southeast-2) | SP VPN            | 54.66.204.91/32
AP (ap-southeast-2) | SP Ops 1          | 13.211.15.150/32
AP (ap-southeast-2) | SP Ops 2          | 3.105.98.15/32
AP (ap-southeast-2) | SP Ops 3          | 13.228.26.124/32
Customer-specific   | VPC NAT Gateway 1 | <VPC NAT Gateway 1>
Customer-specific   | VPC NAT Gateway 2 | <VPC NAT Gateway 2>
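If your allowlist is expressed as CIDR ranges (for example, the `aws:SourceIp` values in an Amazon OpenSearch Service domain access policy), a short script can confirm that every Snowplow address from the table above is covered. This is a sketch using Python's standard `ipaddress` module; `uncovered_ranges` is a hypothetical helper name, and the two customer-specific NAT Gateway addresses must be added once Support provides them.

```python
import ipaddress
from typing import Iterable, List

# Snowplow ranges copied from the allowlist table. The two customer-specific
# VPC NAT Gateway addresses come from Support and are not included here.
SNOWPLOW_RANGES = [
    "54.152.94.171/32",   # Global: Orchestration
    "18.194.133.57/32",   # EU (eu-central-1): SP VPN
    "3.124.46.197/32",    # EU (eu-central-1): SP Ops 1
    "35.158.184.151/32",  # EU (eu-central-1): SP Ops 2
    "3.123.243.200/32",   # EU (eu-central-1): SP Ops 3
    "54.209.175.161/32",  # US (us-east-1): SP VPN
    "52.204.48.193/32",   # US (us-east-1): SP Ops 1
    "3.229.60.33/32",     # US (us-east-1): SP Ops 2
    "3.216.241.66/32",    # US (us-east-1): SP Ops 3
    "54.66.204.91/32",    # AP (ap-southeast-2): SP VPN
    "13.211.15.150/32",   # AP (ap-southeast-2): SP Ops 1
    "3.105.98.15/32",     # AP (ap-southeast-2): SP Ops 2
    "13.228.26.124/32",   # AP (ap-southeast-2): SP Ops 3
]


def uncovered_ranges(policy_cidrs: Iterable[str],
                     required: Iterable[str] = SNOWPLOW_RANGES) -> List[str]:
    """Return the required ranges that no CIDR in policy_cidrs fully contains."""
    policy = [ipaddress.ip_network(cidr, strict=False) for cidr in policy_cidrs]
    missing = []
    for cidr in required:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError when mixing IPv4 and IPv6, so compare versions first.
        if not any(p.version == net.version and net.subnet_of(p) for p in policy):
            missing.append(cidr)
    return missing
```

An empty result means every Snowplow range is allowed by at least one entry in your policy.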


Cluster Configuration

  • Instance Types: We recommend memory-optimized instances (R family, such as r5) over general-purpose instances (M family, such as m5) for better performance and scalability.

  • Example Configurations:

    • Typical traffic: 2 x r5.large.elasticsearch
    • Higher traffic: 8 x r5.large.elasticsearch or 2 x r5.xlarge.elasticsearch
    • Minimal traffic: 2 x t3.medium.elasticsearch

ℹ️ For the full list of supported instance types, see the AWS documentation: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/supported-instance-types.html
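To compare the example configurations, it helps to add up their aggregate capacity. The sketch below assumes that each instance type has the same vCPU and memory as its published EC2 counterpart (t3.medium: 2 vCPU, 4 GiB; r5.large: 2 vCPU, 16 GiB; r5.xlarge: 4 vCPU, 32 GiB); please verify these figures against the AWS documentation before relying on them. The tier labels are taken from the list above, and `cluster_totals` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Per-node (vCPU, memory GiB). ASSUMPTION: mirrors the published EC2 specs
# for the corresponding instance types; check the AWS docs for current values.
INSTANCE_SPECS: Dict[str, Tuple[int, int]] = {
    "t3.medium.elasticsearch": (2, 4),
    "r5.large.elasticsearch": (2, 16),
    "r5.xlarge.elasticsearch": (4, 32),
}

# Example configurations from this guide, as (node count, instance type).
EXAMPLE_CONFIGURATIONS: Dict[str, List[Tuple[int, str]]] = {
    "minimal": [(2, "t3.medium.elasticsearch")],
    "typical": [(2, "r5.large.elasticsearch")],
    "higher (option A)": [(8, "r5.large.elasticsearch")],
    "higher (option B)": [(2, "r5.xlarge.elasticsearch")],
}


def cluster_totals(nodes: List[Tuple[int, str]]) -> Tuple[int, int]:
    """Return the aggregate (vCPU, memory GiB) of a list of (count, instance type)."""
    vcpu = sum(count * INSTANCE_SPECS[itype][0] for count, itype in nodes)
    memory = sum(count * INSTANCE_SPECS[itype][1] for count, itype in nodes)
    return vcpu, memory


for name, nodes in EXAMPLE_CONFIGURATIONS.items():
    vcpu, memory = cluster_totals(nodes)
    print(f"{name:18} {vcpu:3} vCPU {memory:4} GiB")
```

Under these assumptions, 8 x r5.large totals 16 vCPU and 128 GiB, while 2 x r5.xlarge totals 8 vCPU and 64 GiB, so the two higher-traffic options are not equivalent in size; choose between them based on your observed load.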


Notes

  1. Supported Versions:

    • We currently support Elasticsearch OSS version 6.x and 7.x. We also support upgrading from version 6.x to version 7.x.
    • OpenSearch is not yet supported; we are implementing the code changes needed to support it.
  2. Authentication: The only authentication method we support is HTTP Basic Authentication.

  3. Incident Handling: Elasticsearch is considered an ephemeral destination. In the event of an incident, we may pause or disable loading into the cluster to ensure the stability of the overall pipeline. We will notify you if such action is required.


FAQ

Why are the Elasticsearch Loaders lagging?

  • Cluster Issue: If the issue is with the Elasticsearch cluster, we may need to contact you for resolution.
  • Traffic Spikes: Unexpected traffic spikes can overwhelm the cluster, requiring adjustments to instance size or count.
  • Bad Data: An upstream issue that produces an unusually high volume of bad data can cause the loader for the bad stream to lag.
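For the cluster-side cause, the standard `_cluster/health` endpoint (available in both Elasticsearch and OpenSearch) is a good first check. The sketch below is not a description of Snowplow's own monitoring; it only shows how you might read the `status`, `unassigned_shards`, and `number_of_pending_tasks` fields. `fetch_cluster_health` and `diagnose` are hypothetical helper names. Lag caused by traffic spikes or bad data will not necessarily show up here.

```python
import base64
import json
import urllib.request
from typing import List


def fetch_cluster_health(endpoint: str, username: str, password: str,
                         timeout: float = 10.0) -> dict:
    """GET /_cluster/health using HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/_cluster/health",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def diagnose(health: dict) -> List[str]:
    """Turn a _cluster/health response into human-readable findings."""
    findings = []
    status = health.get("status")
    if status == "red":
        findings.append("status red: at least one primary shard is unassigned")
    elif status == "yellow":
        findings.append("status yellow: all primary shards are assigned, "
                        "but one or more replica shards are not")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        findings.append(f"{unassigned} unassigned shard(s)")
    pending = health.get("number_of_pending_tasks", 0)
    if pending:
        findings.append(f"{pending} pending cluster task(s)")
    return findings
```

A green status with no findings suggests the lag more likely comes from traffic volume or bad data than from the cluster itself.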
