Not to diminish this work, but I think it's worth noting that it's increasingly ...

easton · on Sept 2, 2022

I’m kind of curious if AWS is ever going to launch a firecracker as a service thing independent from lambda. It would be wonderful for CI or other tasks where you want to rapidly spin up a box and you don’t know how long it needs to be up. EC2 and Fargate take enormous amounts of time to provision compared to firecracker.

Dunedan · on Sept 2, 2022

AWS Fargate uses Firecracker as well.

capableweb · on Sept 2, 2022

Strange, Fargate is anything but fast.

sexy_panda · on Sept 2, 2022

From my experience the allocation of resources and other tasks preparing the run of a container are consuming quite a lot of time.

Pulling the image and building the container is actually just a matter of a few seconds.

I have no data about it though.

nijave · on Sept 2, 2022

From testing a couple years ago (things are likely different now), image pull/setup made a pretty noticeable difference. A 1GB container was about 20 seconds slower than a 500MB one--I assume I/O since Fargate instance size didn't make a difference

On the other hand, ECS still seems slow compared to k8s, where things are nearly instant unless you're measuring, so ECS control plane speed might be part of the issue, too.

easton · on Sept 2, 2022

This is still a thing, Fargate pull times are super slow: https://github.com/aws/containers-roadmap/issues/696. We run all of our workloads on fargate, and it's really annoying when you're trying to iterate on something and you have to sit there waiting on "Provisioning..." for 1-2 minutes every time you launch a task. I don't think the control plane is that slow, as EC2 based ECS launches tasks really fast if the images are already cached on the machine.

acdha · on Sept 2, 2022

People have mentioned image loading but one other shockingly slow thing is allocating ENIs (this also affects Lambda, VPC endpoints, etc.). I've had a few times where I've looked at the logs and it's basically been like 5 minutes to launch something where 4 of those were waiting for the ENI.

rfoo · on Sept 2, 2022

I'd also like to see a Firecracker powered EC2 (with some constraints, of course), but ~6s provision time of current EC2 is already pretty awesome and TBH I don't care about 6s for CI things much.

easton · on Sept 2, 2022

We use Azure DevOps at work for our CI/CD, and although they provide an ephemeral runner setup (where you can run the agent with a --once flag, and it will exit after a single job runs so you know to destroy the container/VM), jobs will fail if there are no runners in the pool when the build starts. If we could get VM starts down to milliseconds or a second at most in AWS, we could scale our CI runners down to zero and use a webhook (for PR/commit) from ADO to trigger a VM launch on AWS, and by the time the pipeline actually started, there would be an agent ready to take the job.

A very specific use case, I know, but if I could have the CI runners run as needed, we could get instances that are way bigger so our builds run faster, and pay around the same amount since they don't have to sit around when they aren't being used.

1337shadow · on Sept 2, 2022

Well that's going to be a very expensive CI, when virt-lightning spawns a VM in less than 10 seconds with virtio, and you can have plenty on a dedicated server, which you probably have for CI because CI runs faster on dedicated hardware.

JoshTriplett · on Sept 2, 2022

I would love to see this as well. I currently can launch a Linux VM in milliseconds, but EC2 takes ~6s before the first user-provided instruction gets to run.

staticassertion · on Sept 2, 2022

How fast do you want? My bet is that you can get EC2 to boot up very quickly, i.e. ~1 minute or less, with a bit of effort.

eyberg · on Sept 2, 2022

Worth noting that a small hello-world C unikernel can load in a ridiculously small amount of time, but some multi-gigabyte JVM unikernel might take hundreds of ms.

If you need super fast boot times, Firecracker is definitely worth looking at, but the numbers come with caveats depending on what precisely you are going to run there.

vlovich123 · on Sept 2, 2022

I think you may be ignoring the aspect of cloning the codebase and handling writes transparently and then being able to quickly clone/snapshot that VM.

cperciva · on Sept 2, 2022

Cloning the codebase is what I'm getting at with preparing a disk image.

CompuIves · on Sept 2, 2022

I'm very eager to see more developments in the fresh start times!

The main reason why snapshotting became interesting for us, is because we're running development servers defined by our users. A development server could take a long time to start, sometimes minutes.

So even if we can start the VM fast, the most important speedup for us is on the user code that we cannot control.

visarga · on Sept 2, 2022

Say the user code initiates a download, what happens if we clone during the run of the operation? Will the clone be able to finish the download?

The opposite case - say the user code binds to an IP:port to run a service. Will the clone try to step over the parent, binding to a port that is already taken?

CompuIves · on Sept 2, 2022

The TCP connection gets "paused", it doesn't get broken but packets don't arrive. The packets that don't arrive are seen as packet loss, and so they get resent. If the connection stays frozen too long it will lead to disconnection (at least of the websocket connection to the VM).
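
To make the "paused" behaviour concrete, here is a rough sketch of how long a sender keeps retrying one unacknowledged segment. The model (timeout doubles after every unanswered send, up to a cap) and the default values (200 ms starting timeout, 120 s cap, 15 retries) are assumptions modelled on commonly cited Linux defaults, not measurements from this setup. Application-level timeouts, such as WebSocket keepalive pings, can drop a connection much sooner than this.

```python
def retransmit_intervals(retries=15, rto_min=0.2, rto_max=120.0):
    """Waits (in seconds) between successive sends of one unacknowledged segment.

    Assumed model: the timeout starts at rto_min, doubles after every
    unanswered send, and is capped at rto_max. Real stacks derive the
    starting timeout from the measured round-trip time.
    """
    intervals = []
    rto = rto_min
    # `retries` retransmissions, plus the final wait before giving up.
    for _ in range(retries + 1):
        intervals.append(rto)
        rto = min(rto * 2, rto_max)
    return intervals


def survives_pause(pause_seconds, **kwargs):
    """True if the sender would still be retrying when the VM resumes."""
    return pause_seconds < sum(retransmit_intervals(**kwargs))


if __name__ == "__main__":
    print(sum(retransmit_intervals()))   # total retry window under the assumed defaults
    print(survives_pause(30))            # a 30 second freeze
```

Under these assumed defaults the retry window adds up to roughly 924.6 seconds, so at the TCP level a short freeze only looks like packet loss.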

For IP uniqueness, we give every VM the same IP, but we put every VM in its own network namespace. Then we have iptables rules to rewrite the src/dest IP on every packet that enters the network namespace.
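
A minimal sketch of what such per-namespace rules could look like. It only builds the command strings and does not run them. The namespace names, the `veth0` interface, the guest address and the address pool are all hypothetical, and the veth/tap setup that a real deployment needs is left out. It illustrates the idea, not the actual configuration described above.

```python
import ipaddress

# Hypothetical values, chosen only for illustration.
GUEST_IP = "172.16.0.2"                              # address every VM believes it has
UNIQUE_POOL = ipaddress.ip_network("10.200.0.0/16")  # one unique address per VM


def nat_commands(vm_index):
    """Return the shell commands (as strings) for one VM's network namespace."""
    ns = f"vm{vm_index}"
    unique_ip = UNIQUE_POOL[vm_index + 1]  # +1 skips the network address
    iptables = f"ip netns exec {ns} iptables -t nat"
    return [
        f"ip netns add {ns}",
        # Inbound: packets addressed to the unique IP are rewritten to the shared guest IP.
        f"{iptables} -A PREROUTING -i veth0 -d {unique_ip} -j DNAT --to-destination {GUEST_IP}",
        # Outbound: packets leaving the guest get the unique IP as their source.
        f"{iptables} -A POSTROUTING -o veth0 -s {GUEST_IP} -j SNAT --to-source {unique_ip}",
    ]


if __name__ == "__main__":
    for index in range(2):
        print("\n".join(nat_commands(index)))
```

Because the rewrite happens inside each namespace, a cloned VM can keep its snapshotted network configuration while the host still sees distinct addresses.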

iam-TJ · on Sept 2, 2022

Have you considered, or tested, using ECMP (Equal Cost Multiple Path routing) and anycast for that?

I did some extensive IPv4 and IPv6 ECMP anycast testing a couple years ago where we'd randomly bring up and kill hosts and containers.

The network layer provided the fault tolerance and could be tweaked to react very quickly to missing hosts.
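
As a rough illustration of the ECMP part: a router with several equal-cost routes to the same anycast address typically hashes some packet fields to pick one next hop per flow. The sketch below assumes a 5-tuple hash with SHA-256 and plain modulo selection; real routers use their own hash functions and field choices, and the addresses are documentation-range placeholders.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Choose one of several equal-cost next hops for a flow.

    flow is a (src_ip, src_port, dst_ip, dst_port, protocol) tuple.
    The same flow always maps to the same hop while the hop list is unchanged.
    """
    if not next_hops:
        raise ValueError("no live next hops for this anycast address")
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    hops = ["2001:db8::a", "2001:db8::b", "2001:db8::c"]  # hypothetical hosts
    flow = ("2001:db8:1::10", 50000, "2001:db8:ffff::1", 443, "tcp")
    chosen = pick_next_hop(flow, hops)
    # Simulate the chosen host disappearing: the flow lands on a surviving hop.
    alive = [hop for hop in hops if hop != chosen]
    print(chosen, "->", pick_next_hop(flow, alive))
```

One caveat of plain modulo selection is that removing a hop can also move flows that were on surviving hops, which matters for long-lived TCP connections.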

CompuIves · on Sept 2, 2022

That is very interesting. Would it also be able to handle paused VMs, where it buffers the packets up to a certain threshold?

iam-TJ · on Sept 4, 2022

You know I'm not sure... TCP is stream oriented and supposed to handle lost packets so I'd think the TCP layer itself would handle the pause. If the sender doesn't get an ACK for a packet then it'll resend that packet later (TCP has sequence numbers so the stream can be reconstructed from out-of-order delivery and resends).
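
A toy sketch of the reassembly idea, to show why resends and reordering are harmless to the stream. It assumes segments are aligned (a resend carries exactly the same bytes at the same sequence number) and ignores everything else a real TCP stack does, such as windows, partial overlaps and sequence wraparound.

```python
def reassemble(segments, initial_seq=0):
    """Rebuild a byte stream from (sequence_number, payload) pairs.

    Segments may arrive out of order or more than once (resends).
    Returns the contiguous bytes starting at initial_seq; anything after
    a gap is held back until the missing segment shows up.
    """
    pending = {}
    for seq, payload in segments:
        if payload:                           # empty segments carry no data
            pending.setdefault(seq, payload)  # a duplicate resend is ignored
    stream = bytearray()
    next_seq = initial_seq
    while next_seq in pending:
        payload = pending[next_seq]
        stream += payload
        next_seq += len(payload)
    return bytes(stream)


if __name__ == "__main__":
    received = [(5, b" worl"), (0, b"hello"), (0, b"hello"), (10, b"d")]
    print(reassemble(received))
```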

I revisited my proof-of-concept test scripts when I wrote the previous comment. I'll try in the next week to add some additional tests in there to determine stream reliability and packet delay/loss.

UDP of course doesn't have the same benefits.

I'm using ECMP + Anycast in a project I've been developing for the last couple of years (K18S or Keep It Simples Stupids) to effectively replace Kubernetes functionality with standard protocols and tooling that is in almost all distros.

We started out with the challenge of replacing the major parts of CNIs and that is where the ECMP + Anycast work arose from.

Native IPv6 with only VLANs and direct routing (no messing about with IPv4, NAT or overlay networks); ECMP + Anycast gives load-balanced routing to pods with automatic detection of lost hosts. Pods exposed to the public get a public IPv6 address in addition to a ULA (Unique Local Address, formerly called site-local). ULAs are used for private routing.

Systemd-networkd is configured automatically by systemd-nspawn so there doesn't need to be a massive, foreign, orchestration control system.

Systemd-nspawn/systemd-machined manage container lifecycles with OCI-compliant images, or leverage nspawn's support for overlayfs to build machine images from several different file-system images (rather like Docker's layers, but always separate, not combined) that can be used in a pick-and-mix fashion to assemble a container that has several related but separately packaged components.

Configs for /etc/ of each container mapped in from external storage using the same overlayfs method. In most cases everything is read-only but some hosts/pods can be allowed to write into the /etc/ overlay and those changes can be optionally committed to the external storage.
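
For readers unfamiliar with overlayfs, here is a small sketch of how stacked layers map onto mount options. It uses the kernel's plain `mount -t overlay` syntax rather than nspawn's own overlay support, only builds the argument list without running it, and all paths are hypothetical.

```python
def overlay_mount_command(layers, target, upper=None, work=None):
    """Build a `mount -t overlay` argument list.

    layers: read-only directories ordered bottom to top.
    upper/work: optional writable layer and its work directory; without
    them the merged view is read-only.
    """
    if not layers:
        raise ValueError("at least one lower layer is required")
    if (upper is None) != (work is None):
        raise ValueError("upper and work must be given together")
    # overlayfs expects the top-most layer first in lowerdir.
    options = "lowerdir=" + ":".join(reversed(layers))
    if upper is not None:
        options += f",upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


if __name__ == "__main__":
    # Hypothetical layout: shared base config, a per-role layer, writable changes on top.
    print(" ".join(overlay_mount_command(
        ["/srv/etc/base", "/srv/etc/web"],
        "/var/lib/machines/web1/etc",
        upper="/srv/etc-rw/web1/upper",
        work="/srv/etc-rw/web1/work",
    )))
```

Writes land only in the upper directory, which is what makes it possible to review them and optionally commit them back to external storage.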

Adopting IPv6 and dropping IPv4 was the best thing we ever did in terms of keeping things simple and straightforward and relying on the existing network protocols and layers, instead of re-inventing it all (badly).

At the time we started, Kubernetes didn't even have IPv6 support, and even once it did, many CNIs couldn't handle it properly.