How to set up an AWS VPC Endpoint Service and Endpoint (PrivateLink)

Kyla Oyamot  

Overview

To establish a secure connection between your services (e.g., Kafka MSK, Redshift, or Databricks) and the Snowplow pipeline, you need to configure VPC endpoints across both your AWS account (Account B) and the Snowplow pipeline account (Account A).

Note:
Account A is the AWS account in your organization that is associated with your Snowplow pipeline. The pipeline itself runs in a Snowplow-hosted sub-account, but any infrastructure you need to add, such as VPC endpoints and security configurations, must be created in an account or sub-account outside the Snowplow-hosted environment.

Instructions

Step 1: Create an AWS VPC Endpoint Service in Account B

  1. Log in to your AWS Console in Account B.
  2. Navigate to VPC > Endpoint Services and click Create Endpoint Service.
  3. Select the service you want to expose (e.g., Kafka MSK, Redshift, Databricks).
  4. Attach the appropriate security group to the service, ensuring it:
    • Allows inbound communication on the required ports.
    • Permits traffic from the Snowplow CIDR ranges (listed in Step 3 and confirmed in Step 4).
  5. Enable PrivateLink for secure communication.
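
For teams that script this step instead of using the console, the following is a minimal sketch, not a definitive implementation. It assumes a Network Load Balancer already fronts the service you want to expose (the EC2 API attaches an endpoint service to a load balancer), and that `ec2_client` is a boto3 EC2 client for Account B, for example `boto3.client("ec2")`. The function names and the choice to allow the whole of Account A as a principal are illustrative assumptions.

```python
def endpoint_service_requests(nlb_arn, consumer_account_id, acceptance_required=True):
    """Build the two EC2 API requests used to expose a service over PrivateLink.

    nlb_arn: ARN of the Network Load Balancer in front of the service (assumed to exist).
    consumer_account_id: the 12-digit ID of Account A, which will create the endpoint.
    """
    create_request = {
        "NetworkLoadBalancerArns": [nlb_arn],
        # When True, each connection request from Account A must be accepted manually.
        "AcceptanceRequired": acceptance_required,
    }
    permission_request = {
        # Allow principals in Account A to create endpoints against this service.
        "AddAllowedPrincipals": [f"arn:aws:iam::{consumer_account_id}:root"],
    }
    return create_request, permission_request


def create_endpoint_service(ec2_client, nlb_arn, consumer_account_id):
    """Create the endpoint service and return its service name for use in Step 2."""
    create_request, permission_request = endpoint_service_requests(
        nlb_arn, consumer_account_id
    )
    response = ec2_client.create_vpc_endpoint_service_configuration(**create_request)
    config = response["ServiceConfiguration"]
    ec2_client.modify_vpc_endpoint_service_permissions(
        ServiceId=config["ServiceId"], **permission_request
    )
    return config["ServiceName"]
```

The returned service name is the value entered as Service Name when the endpoint is created in Account A.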

Step 2: Create an AWS VPC Endpoint in Account A (Non-Snowplow-Hosted Account)

  1. Log in to Account A, the non-Snowplow-hosted account associated with your Snowplow pipeline.
  2. Navigate to VPC > Endpoints and click Create Endpoint.
  3. Configure the VPC endpoint:
    • Service Name: Enter the endpoint service name created in Account B.
    • Type: Ensure the endpoint type is set to Interface.
    • VPC: Select the VPC in Account A.
    • Subnets: Choose subnets corresponding to the availability zones you plan to use.

      ⚠️ If your endpoint service requires three availability zones, ensure an additional subnet is created in Account A to match the required zone.

  4. Review and create the endpoint.
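
The endpoint settings above map onto the parameters of the EC2 `create_vpc_endpoint` call. The sketch below only builds the request, so it can be reviewed before anything is created. The helper name and all IDs are hypothetical placeholders, and the request is assumed to be passed to a boto3 EC2 client for Account A.

```python
def interface_endpoint_request(service_name, vpc_id, subnet_ids, security_group_ids):
    """Build the request for an Interface VPC endpoint in Account A.

    service_name: the endpoint service name created in Account B (Step 1).
    subnet_ids: one subnet per availability zone you plan to use.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet is required for an Interface endpoint")
    return {
        "VpcEndpointType": "Interface",
        "ServiceName": service_name,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


# Placeholder values for illustration only.
request = interface_endpoint_request(
    service_name="com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0",
    vpc_id="vpc-0aaa",
    subnet_ids=["subnet-0aaa", "subnet-0bbb"],
    security_group_ids=["sg-0aaa"],
)
# With a boto3 EC2 client for Account A: ec2_client.create_vpc_endpoint(**request)
```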

Step 3: Update the Security Group in Account A

  1. Identify the security group attached to the VPC endpoint in Account A.
  2. Modify the security group to allow inbound traffic on the required ports and CIDRs:
    • For Kafka MSK (PrivateLink): Open port 9095/tcp.
    • Apply the CIDR ranges:
      • For EKS-based Snowplow components:
        • CGNAT CIDRs:
          • cg_nat_cidr_private_subnet_1_cidr_block (CGNAT CIDR 1)
          • cg_nat_cidr_private_subnet_2_cidr_block (CGNAT CIDR 2)
      • For EC2/ECS-based Snowplow components:
        • Private Subnet CIDRs:
          • private_subnet_1_cidr_block
          • private_subnet_2_cidr_block
          • private_subnet_3_cidr_block

Note: If you do not already have your CIDRs, please reach out to Snowplow Support at support@snowplow.io.
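
As an illustration of the rule set described in this step, the sketch below turns a list of CIDRs into the `IpPermissions` structure accepted by the EC2 `authorize_security_group_ingress` call. The CIDR values are made-up placeholders, not real Snowplow ranges; use the values Snowplow Support provides. The port defaults to 9095 as stated above for Kafka MSK, and the helper name is hypothetical.

```python
import ipaddress


def ingress_permissions(cidrs, port=9095, description="Snowplow pipeline"):
    """Build one TCP ingress rule on `port` covering every CIDR in `cidrs`."""
    ranges = []
    for cidr in cidrs:
        # ip_network validates the CIDR and raises ValueError if host bits are set.
        network = ipaddress.ip_network(cidr)
        ranges.append({"CidrIp": str(network), "Description": description})
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": ranges,
        }
    ]


# Placeholder CIDRs standing in for the two CGNAT subnet blocks.
permissions = ingress_permissions(["100.64.0.0/24", "100.64.1.0/24"])
# With a boto3 EC2 client for Account A:
# ec2_client.authorize_security_group_ingress(GroupId="sg-0aaa", IpPermissions=permissions)
```

Validating each CIDR before the call catches typos such as `10.0.0.1/24` locally rather than as an API error.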


Step 4: Confirm CIDR Requirements

  • Ensure the CIDRs from Step 3 are included in the security group rules for both the endpoint service in Account B and the VPC endpoint in Account A.
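
One way to confirm this on both sides is to compare the required CIDRs against the ranges a security group already allows. The following is a small sketch using Python's standard `ipaddress` module. The function name and the sample ranges are hypothetical, and the allowed ranges are assumed to have been read from the security group's existing ingress rules.

```python
import ipaddress


def missing_cidrs(required, allowed):
    """Return the required CIDRs that no allowed range fully contains."""
    allowed_networks = [ipaddress.ip_network(cidr) for cidr in allowed]
    missing = []
    for cidr in required:
        network = ipaddress.ip_network(cidr)
        covered = any(
            # subnet_of only works within one IP version, so compare versions first.
            network.version == candidate.version and network.subnet_of(candidate)
            for candidate in allowed_networks
        )
        if not covered:
            missing.append(cidr)
    return missing


# Placeholder ranges: 10.0.0.0/23 spans 10.0.0.0 to 10.0.1.255.
gaps = missing_cidrs(["10.0.1.0/24", "10.0.2.0/24"], ["10.0.0.0/23"])
```

An empty result means every required range is already permitted. Any CIDR returned still needs a rule in that security group.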

Notes and Tips

  • Default Snowplow Configuration: The Snowplow pipeline VPC typically supports two availability zones. If your setup requires three zones, create an additional subnet in Account A.
  • Ensure all configurations align with the specific requirements of the service you are connecting to (e.g., Kafka MSK, Redshift, Databricks).
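
The availability zone point can be checked before creating the endpoint. The sketch below compares the zones an endpoint service uses with the zones covered by your subnets in Account A. It assumes zones are given as AZ IDs (for example `euw1-az1`) rather than AZ names, because a name such as `eu-west-1a` can refer to different physical zones in different accounts. The function name and sample values are hypothetical.

```python
def missing_zone_ids(service_zone_ids, subnet_zone_ids):
    """Return the service's AZ IDs that have no subnet in Account A.

    service_zone_ids: AZ IDs the endpoint service is available in.
    subnet_zone_ids: mapping of subnet ID to the AZ ID that subnet lives in.
    """
    covered = set(subnet_zone_ids.values())
    return sorted(set(service_zone_ids) - covered)


# A service spanning three zones against a default two-zone pipeline VPC.
uncovered = missing_zone_ids(
    ["euw1-az1", "euw1-az2", "euw1-az3"],
    {"subnet-0aaa": "euw1-az1", "subnet-0bbb": "euw1-az2"},
)
```

Each AZ ID returned is a zone where an additional subnet would be needed in Account A.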