To my disappointment, this was obviously not the case.
Now, 15 years of technological development later, would such a service be possible?
What is the closest service to a truly "elastic" VM instance to date?
Edit: e2 not n2 - https://www.google.com/amp/s/cloudblog.withgoogle.com/produc...
But when you say "the price would vary according to how much RAM or CPU resources you use" you get into the real complexity: resource sharing. If your VM temporarily gives up some RAM, can another VM use that RAM? This is very hard to do, because the provider doesn't know when/if you'll want that RAM back. They don't want a physical server to get into a situation where the RAM demand is higher than the installed RAM because there is no good solution to that scenario. If you're running hundreds of "micro" VMs/containers on one server you can rely on statistical multiplexing and luck, but it doesn't really work for large workloads.
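To put a rough number on that bet, here's a toy Monte Carlo sketch of packing many micro VMs onto one host; every figure in it (host RAM, VM count, demand model) is invented for illustration:

```python
# Toy model of the statistical-multiplexing bet: pack many small VMs onto one
# host and estimate how often their combined RAM demand exceeds installed RAM.
# All numbers here are made up, not any provider's real sizing.
import random

HOST_RAM_GB = 256           # installed RAM on the physical server (assumed)
VM_COUNT = 400              # "micro" VMs packed onto it (assumed)
MEAN_GB, PEAK_GB = 0.4, 2.0 # each VM usually idles low but occasionally spikes (assumed)

def sample_demand() -> float:
    # Crude model: 95% of the time a VM sits near its mean, 5% of the time it spikes.
    return PEAK_GB if random.random() < 0.05 else max(0.0, random.gauss(MEAN_GB, 0.1))

trials = 10_000
overflow = sum(
    sum(sample_demand() for _ in range(VM_COUNT)) > HOST_RAM_GB
    for _ in range(trials)
)
print(f"P(demand > installed RAM) ≈ {overflow / trials:.4%}")
```

With lots of small, mostly idle tenants the overflow probability is tiny, which is why overcommitting works there; a handful of large workloads each holding a big chunk of the host would break the model.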
A provider called NearlyFreeSpeech has been charging based on "the use of one gigabyte of RAM for one minute, or the equivalent amount of CPU power" since even before EC2 existed AFAIK, but I suspect this complexity is more scary than attractive for most people. https://www.nearlyfreespeech.net/services/hosting
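As a toy illustration of that pricing model (the rate and the "CPU equivalence" below are made-up assumptions, not their actual prices):

```python
# Toy GB-minute billing of the kind described above; rate and equivalence are invented.
RATE_PER_GB_MINUTE = 0.00002   # hypothetical $ per (GB of RAM * minute)

def charge(ram_gb: float, minutes: float, cpu_core_minutes: float) -> float:
    """Bill RAM by GB-minutes, and CPU as an 'equivalent' amount of resource."""
    ram_units = ram_gb * minutes
    cpu_units = cpu_core_minutes   # assumption: 1 core-minute counted as 1 GB-minute
    return (ram_units + cpu_units) * RATE_PER_GB_MINUTE

# e.g. 0.5 GB resident for a day, plus 30 core-minutes of CPU
print(f"${charge(0.5, 24 * 60, 30):.4f}")
```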
Consider memory usage -- operating systems (and some applications) are designed to grab all the memory they can and use it for caching etc. So it's hard for the VM host to know when it can reclaim memory and stop billing the user for it.
But there is this idea called memory ballooning -- you have a little process running on the VM guest OS that grabs lots of memory, but is actually in cahoots with the host, and just tells the host -- "hey I got all this memory, you can take it back and use it somewhere else".
Okay, so doesn't ballooning solve the problem? There are a few problems with it -- you can't balloon when you need the memory, because it's not fast enough. So you have to balloon pro-actively. And you don't know how much to balloon, so you have to guess: do it wrong and the guest OS will start swapping, or it might activate its OOM killer and start killing processes.
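A naive pro-active balloon loop looks something like this sketch; reading /proc/meminfo is real, but set_balloon_target() is a hypothetical stand-in for whatever channel the balloon driver (e.g. virtio-balloon) uses to talk to the host:

```python
# Sketch of a pro-active ballooning loop running inside the guest.
import time

def mem_available_mb() -> int:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) // 1024   # value is in kB
    raise RuntimeError("MemAvailable not found")

def set_balloon_target(mb_to_give_back: int) -> None:
    # Placeholder: in a real guest this goes through the balloon driver to the host.
    print(f"telling host it can reclaim {mb_to_give_back} MB")

HEADROOM_MB = 512   # the guess the whole approach hinges on

while True:
    spare = mem_available_mb() - HEADROOM_MB
    if spare > 0:
        # Guess the headroom too small and the guest starts swapping or OOM-killing.
        set_balloon_target(spare)
    time.sleep(30)
```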
So making memory usage follow the application is kinda sorta possible but comes with hairy problems. What about CPU? CPU usage already follows the application, so you could just measure and bill accordingly -- except that nothing is gained if memory doesn't also follow usage.
All in all it's way simpler to get towards this goal with clearly-defined higher level services like Lambda.
There are videos of it happening, but no blog post yet.
So running a t3 would allow you to pay for a baseline and then only pay for the CPU you end up needing beyond that baseline.
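Roughly, the bill under that model works out like this (the rates below are placeholders, not AWS's actual t3 pricing):

```python
# Sketch of baseline-plus-burst billing; rates and the baseline figure are assumptions.
BASELINE_RATE_PER_HOUR = 0.0104     # hypothetical flat hourly price for the instance
SURPLUS_RATE_PER_VCPU_HOUR = 0.05   # hypothetical charge for CPU used above baseline

def monthly_cost(hours: float, avg_utilization: float,
                 baseline_utilization: float, vcpus: int = 2) -> float:
    baseline = hours * BASELINE_RATE_PER_HOUR
    surplus_fraction = max(0.0, avg_utilization - baseline_utilization)
    burst = hours * vcpus * surplus_fraction * SURPLUS_RATE_PER_VCPU_HOUR
    return baseline + burst

# e.g. 730 hours averaging 35% CPU on an instance with a 20% baseline
print(f"${monthly_cost(730, 0.35, 0.20):.2f}")
```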
Doing the same in a VM might be possible (the technologies exist), but it's often the task or workload that needs to be modified to support it, and once you're already doing that, a step to containers, functions, or horizontal scaling is just as easy (or hard). Horizontal scaling based on load is pretty common (classic ASGs, but also overcapacity cost-bidding-based scaling).
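A load-based scaling rule can be as simple as this sketch (the per-instance capacity and fleet bounds are assumptions, not any provider's defaults):

```python
# Minimal load-based horizontal scaling decision.
import math

def desired_instances(current_rps: float, rps_per_instance: float,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Scale the fleet so each instance stays near its target load."""
    needed = math.ceil(current_rps / rps_per_instance)
    return max(min_instances, min(max_instances, needed))

print(desired_instances(current_rps=3400, rps_per_instance=500))  # -> 7
```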
It raises some isolation concerns, however. To make this work, the host needs to know how much memory is allocated to a given tenant, and that's difficult without having access to the OS running inside the machine, which is easy with containers, but not so much with VMs. The tenant can turn off virtual CPU cores in the VM and the hypervisor can pick that signal up (at least in Linux) but I'm not sure it's possible right now to do the same to virtual memory modules. I'd love if AWS allowed me to do that because the boot process of some of my workloads is more CPU-bound than the rest of the machine's lifetime and having a bunch of extra cores at boot would do wonders. If, under memory pressure, I could "plug in" more virtual memory modules, it'd also be quite nice.
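For the CPU side, at least, the guest-visible knob really is just a sysfs write on Linux (needs root; whether the host does anything useful with the signal is another matter):

```python
# Taking a virtual CPU offline from inside a Linux guest via sysfs.
from pathlib import Path

def set_cpu_online(cpu: int, online: bool) -> None:
    Path(f"/sys/devices/system/cpu/cpu{cpu}/online").write_text("1" if online else "0")

# e.g. give back cores 2 and 3 once the CPU-heavy boot phase is done
for cpu in (2, 3):
    set_cpu_online(cpu, False)
```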
There could be a market opportunity, but whoever does it would need to beat the current incumbents in price or this would not fly. Also there would be some difficulty scaling up - all these extra CPUs and memory would need to come from somewhere and would necessarily fail (or need to trigger a live VM migration, which, IIRC, nobody does in cloud environments right now) if the host box's resources are fully allocated.
The need to add RAM or CPU to a running instance dates from the time you used to have a single long-living instance serving an application.
However, we've kind of moved on to disposable servers which can be scaled on demand, in which case you just add additional servers or adjust the server type depending on requirements. Same with containers.
Simply allocate each VM one core per hyperthread and record the total CPU time used.
Nobody actually does this because it makes billing complicated, both practically and for sales.
My guess is it would also mean less revenue overall; AWS is surely making lots of money selling the same CPU time to a dozen customers who aren't actually using it.
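For what it's worth, the raw accounting is already there on a cgroup-v2 host: each VM's or container's consumed CPU time shows up in cpu.stat, so a billing sketch could just read it periodically (the cgroup path and rate below are examples, not anything a provider actually exposes):

```python
# Read a workload's consumed CPU time from cgroup v2 and price it.
def cpu_seconds(cgroup_path: str) -> float:
    with open(f"{cgroup_path}/cpu.stat") as f:
        for line in f:
            key, value = line.split()
            if key == "usage_usec":
                return int(value) / 1_000_000
    raise RuntimeError("usage_usec not found")

RATE_PER_CPU_SECOND = 0.000012   # hypothetical price

used = cpu_seconds("/sys/fs/cgroup/machine.slice/my-vm.scope")  # example cgroup path
print(f"{used:.0f} CPU-seconds -> ${used * RATE_PER_CPU_SECOND:.4f}")
```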
It's easier with CPUs where you can just yield back if there's no work.