Current state of IPv6 on AWS

I had to redo an application setup on AWS EC2, and while doing that I thought I could save some money by using IPv6 instead of v4, since allocated v4 addresses now cost money. It turned out that the overall IPv6 experience on AWS is not so great.

First, I created a new VPC and subnets with IPv6-only address allocation. IPv6-only means exactly that: no IPv4 at all, which is not feasible on its own since most of the internet and AWS itself are still IPv4-only. You can enable DNS64 + NAT64, a clever way of rewriting IPv4 addresses at DNS lookup time so that traffic to them traverses an IPv6-to-IPv4 gateway that translates v6 to v4. Each lookup that resolves to an IPv4-only record set gets rewritten to an address in the prefix 64:ff9b::/96, i.e.:

$ host ipv4.google.com
ipv4.google.com is an alias for ipv4.l.google.com.
ipv4.l.google.com has address 64:ff9b::8efb:1ace

(Note: 64:ff9b::8efb:1ace can also be written as 64:ff9b::142.251.26.206. The last 32 bits are just the original IPv4 address, here 142.251.26.206, as resolved at the time and location of writing.)
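The mapping between the two notations is plain bit arithmetic, and can be sketched with Python's standard ipaddress module. This is only an illustration of the well-known 64:ff9b::/96 prefix; the function names are made up for this example, and 192.0.2.33 is a documentation address, not a real host.

```python
import ipaddress

# Well-known NAT64 prefix, the same one that shows up in the lookup above.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(ipv4: str) -> ipaddress.IPv6Address:
    """Place the 32-bit IPv4 address in the low 32 bits of the /96 prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the original IPv4 address from a NAT64-mapped IPv6 address."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in NAT64_PREFIX:
        raise ValueError(f"{v6} is not inside {NAT64_PREFIX}")
    # The IPv4 address is simply the last 32 bits.
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = embed_ipv4("192.0.2.33")
    print(mapped)                     # hex notation of the mapped address
    print(extract_ipv4(str(mapped)))  # back to dotted-quad notation
```

Feeding an address from a DNS64 lookup into extract_ipv4 is a quick way to see which IPv4 host the NAT64 gateway will actually contact.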

Second, after getting this to work, I added Docker, where IPv6 has to be enabled with experimental flags:

{
  "experimental": true,
  "ip6tables": true
}

Though flagged as experimental, it does work, except when a DNS name has both types of records, IPv4 and IPv6. DNS64 only rewrites IPv4 addresses to prefixed IPv6 ones when there are no v6 records at all; when both are available, the records get handed out unchanged. Within Docker containers the preferred network stack is v4, so applications in containers that have the choice try to speak IPv4, which does not work when the host does not have IPv4 configured. There seems to be a way to configure the preferred stack at the kernel level, but I couldn’t get that to work.
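To make the DNS64 rule concrete, here is a small Python model of how a DNS64 resolver answers an AAAA query. It is a simplified sketch of the behaviour described above, not AWS's resolver: the function name is invented, and the addresses are documentation examples.

```python
import ipaddress

# Integer value of the well-known NAT64 prefix 64:ff9b::/96.
NAT64_PREFIX = int(ipaddress.IPv6Address("64:ff9b::"))


def dns64_answer(a_records: list[str], aaaa_records: list[str]) -> list[str]:
    """Model of what a DNS64 resolver returns for an AAAA query."""
    if aaaa_records:
        # Real AAAA records exist: they are handed out unchanged.
        return list(aaaa_records)
    # No AAAA records: synthesize one per A record inside the NAT64 prefix.
    return [
        str(ipaddress.IPv6Address(NAT64_PREFIX | int(ipaddress.IPv4Address(a))))
        for a in a_records
    ]


if __name__ == "__main__":
    # IPv4-only name: the answer is synthesized and routed via NAT64.
    print(dns64_answer(["192.0.2.33"], []))
    # Dual-stack name: nothing is rewritten.
    print(dns64_answer(["192.0.2.33"], ["2001:db8::1"]))
```

The second case is the problematic one: the name keeps its plain A record next to the real AAAA record, so a client that prefers IPv4 picks an address it cannot reach from an IPv6-only host.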

Third, to compensate, I added private IPv4 support for the subnets and a NAT gateway for these networks, in addition to the egress-only IPv6 gateway. Now I have to pay again for the NAT gateway’s IPv4 address, but at least that is only a single one - not one for each VM as before.

Now we can drop the whole DNS64 thing again, since we have plain IPv4 in the stack - nice, since most of the AWS API endpoints are IPv4 only and I fear this clever rewriting might cause issues in the long run. Or, to put it differently, I prefer less clever solutions.

Fourth, these EC2 machines are part of an autoscaling setup, and I have added scheduled actions that change the desired instance count (around office hours and weekdays). But I need graceful shutdown for the services on the machines, since they’re CI machines and I don’t want false build failures just because a scheduler decided to shut down a host.

Turns out, it is actually not easy at all to implement this in AWS, but my current solution is to have an SNS topic (publish / subscribe) with AWS Lambda and some code that manages the service shutdown with a grace period.
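A rough sketch of what such a Lambda could look like in Python follows. It assumes the SNS topic is fed by an Auto Scaling lifecycle hook on instance termination, which is an assumption of this example rather than a detail given above. The drain_ci_host function is a hypothetical placeholder, since how the CI service is told to stop is specific to the setup.

```python
import json


def parse_termination(event: dict) -> list[dict]:
    """Extract terminating-instance notifications from an SNS-triggered event."""
    notices = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # Skip anything that is not a termination notice (e.g. test messages).
        if message.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
            continue
        notices.append(message)
    return notices


def handle(event: dict, drain, complete) -> list[str]:
    """Drain each terminating instance, then let Auto Scaling proceed."""
    done = []
    for notice in parse_termination(event):
        instance_id = notice["EC2InstanceId"]
        drain(instance_id)  # blocks until running builds have finished
        complete(notice)    # tells Auto Scaling it may terminate the host
        done.append(instance_id)
    return done


def drain_ci_host(instance_id: str) -> None:
    """Hypothetical placeholder: stop accepting CI jobs and wait for running ones."""
    # How to reach the CI agent (SSM, HTTP, SSH, ...) depends on the setup.


def lambda_handler(event, context):
    import boto3  # imported lazily so the logic above can be tested without AWS

    autoscaling = boto3.client("autoscaling")

    def complete(notice: dict) -> None:
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=notice["LifecycleHookName"],
            AutoScalingGroupName=notice["AutoScalingGroupName"],
            LifecycleActionToken=notice["LifecycleActionToken"],
            LifecycleActionResult="CONTINUE",
            InstanceId=notice["EC2InstanceId"],
        )

    return handle(event, drain_ci_host, complete)
```

Passing drain and complete in as plain callables keeps the AWS calls at the edge, so the event handling can be exercised locally with fake events.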

So I then deployed the Lambda and got an error: EAFNOSUPPORT - address family not supported. Some brief research revealed that AWS Lambda does not have an IPv6 stack enabled by default, though since October 2023 you can deploy your functions within a VPC with IPv6 for outbound connections. Given that I usually deploy Lambdas without a VPC I disliked that, but tried it anyway. It did not work together with the Serverless framework, because the feature is too new to be supported there (basically there’s a boolean flag that one needs to pass but cannot).
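For reference, the flag can be set directly through the Lambda API. The sketch below uses boto3 and assumes the flag in question is Ipv6AllowedForDualStack in the function's VpcConfig, that the subnets are dual-stack, and that boto3 and credentials are available. The helper names are made up for this example.

```python
def dual_stack_vpc_config(subnet_ids, security_group_ids) -> dict:
    """Build a Lambda VpcConfig that allows outbound IPv6 traffic."""
    return {
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Assumed to be the boolean flag mentioned above.
        "Ipv6AllowedForDualStack": True,
    }


def enable_ipv6_egress(function_name: str, subnet_ids, security_group_ids) -> dict:
    """Apply the dual-stack VPC configuration to an existing function."""
    import boto3  # imported lazily; only needed when actually calling AWS

    client = boto3.client("lambda")
    return client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig=dual_stack_vpc_config(subnet_ids, security_group_ids),
    )
```

Running something like this once after a deployment could work around tooling that cannot pass the flag yet, at the cost of the setting living outside the deployment config.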

Anyway, since I had the Lambda within a VPC now, I could just fall back to using private IPv4 addresses and reach the VMs via that route.

It feels like it’s all been quickly baked in, with just the most crucial parts adapted and the rest of AWS left as it was. It also seems no one is really using this, since there are so many fragile parts. And that is after how many years of IPv6 availability?