HACKER Q&A
📣 sscarduzio

Dynamic memory/CPU provisioning for VMs?


When AWS EC2 (Elastic Compute Cloud) launched, the young, inexperienced me initially understood it to be a service for hiring virtual servers by the hour, with the price varying (in an "Elastic" way) according to how much RAM or CPU you actually used.

To my disappointment, this was obviously not the case.

Now, 15 years of technological development later, would such a service be possible?

What is the closest service to a truly "elastic" VM instance to date?


  👤 dilyevsky Accepted Answer ✓
GCP e2 instances (which are roughly 30% cheaper) are the closest match to what you're asking for. These VMs run on overcommitted capacity and are migrated to a different physical host seamlessly when the resources are reclaimed.

Edit: e2 not n2 - https://www.google.com/amp/s/cloudblog.withgoogle.com/produc...


👤 wmf
For CPU you have hotplug and quota scheduling; for memory you have hotplug and ballooning.
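
As a rough illustration (not from the comment), here's what driving two of those mechanisms looks like from the host with the libvirt Python bindings. The domain name is hypothetical, and the guest needs vCPU/memory headroom plus a balloon device configured:

```python
# Host-side resize of a running QEMU/KVM guest via libvirt.
# "guest1" is a hypothetical domain name.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("guest1")

# CPU hotplug: raise the live vCPU count (up to the domain's configured max).
dom.setVcpusFlags(4, libvirt.VIR_DOMAIN_AFFECT_LIVE)

# Ballooning: ask the guest's balloon driver to shrink it to 2 GiB
# (libvirt takes KiB). Advisory, not a hard cap.
dom.setMemoryFlags(2 * 1024 * 1024, libvirt.VIR_DOMAIN_AFFECT_LIVE)

conn.close()
```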

But when you say "the price would vary according to how much RAM or CPU resources you use" you get into the real complexity: resource sharing. If your VM temporarily gives up some RAM, can another VM use that RAM? This is very hard to do, because the provider doesn't know when/if you'll want that RAM back. They don't want a physical server to get into a situation where the RAM demand is higher than the installed RAM because there is no good solution to that scenario. If you're running hundreds of "micro" VMs/containers on one server you can rely on statistical multiplexing and luck, but it doesn't really work for large workloads.

A provider called NearlyFreeSpeech has been charging based on "the use of one gigabyte of RAM for one minute, or the equivalent amount of CPU power" since before EC2 existed, AFAIK, but I suspect this complexity is scarier than attractive to most people. https://www.nearlyfreespeech.net/services/hosting
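
To make that pricing model concrete, here is a toy metering calculation in the same style; the rate and samples are invented, not NearlyFreeSpeech's actual numbers:

```python
# Toy GB-minute metering; the rate is an assumption, not NFS's pricing.
RATE_PER_GIB_MINUTE = 0.01 / 60   # assume $0.01 per GiB-hour

def bill(samples):
    """samples: (gib_in_use, minutes) pairs from periodic metering."""
    return sum(gib * mins * RATE_PER_GIB_MINUTE for gib, mins in samples)

# 0.5 GiB all day, plus a one-hour burst to 4 GiB:
print(f"${bill([(0.5, 24 * 60), (4.0, 60)]):.2f}")  # -> $0.16
```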


👤 soamv
Turns out to be a somewhat annoying problem at the VM level. Not impossible, but complex enough that maybe higher-level solutions like functions are better.

Consider memory usage: operating systems (and some applications) are designed to grab all the memory they can and use it for caching, etc. So it's hard for the VM host to know when it can reclaim memory and stop billing the user for it.

But there is this idea called memory ballooning: you have a little driver running in the guest OS that grabs lots of memory but is actually in cahoots with the host, and tells it, "hey, I've got all this memory, you can take it back and use it somewhere else".

Okay, so doesn't ballooning solve the problem? There are a few catches. You can't wait to balloon until the host actually needs the memory, because inflation isn't fast enough, so you have to balloon proactively. And you don't know how much to balloon, so you have to guess: guess wrong and the guest OS will start swapping, or its OOM killer will start killing processes.
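
A sketch of that guessing game from the host side, using the libvirt Python bindings; the domain name and the 25% safety margin are assumptions:

```python
# Guess-and-balloon policy, host side, via libvirt. "guest1" and the
# 25% margin are assumptions; guess too low and the guest swaps/OOMs.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("guest1")
dom.setMemoryStatsPeriod(2)          # make the balloon driver report stats

stats = dom.memoryStats()            # values in KiB, needs guest cooperation
in_use = stats["actual"] - stats["unused"]
target = int(in_use * 1.25)          # the guess: used memory plus a margin

dom.setMemoryFlags(target, libvirt.VIR_DOMAIN_AFFECT_LIVE)  # advisory
conn.close()
```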

So making memory usage follow the application is kinda sorta possible, but comes with hairy problems. What about CPU? CPU usage already follows the application, so you could just measure and bill accordingly -- except that nothing is gained if memory doesn't also follow usage.
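
The measuring half, at least, is easy to sketch: libvirt reports cumulative guest CPU time, so a hypothetical per-use biller only needs to sample the delta (domain name and rate are invented):

```python
# Metering actual guest CPU use from the host; libvirt's dom.info()
# reports cumulative CPU time in nanoseconds. Rate is invented.
import time
import libvirt

RATE_PER_CPU_SECOND = 0.00001        # assumed $/CPU-second

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("guest1")    # hypothetical domain

t0 = dom.info()[4]                   # cumulative guest CPU time (ns)
time.sleep(60)
t1 = dom.info()[4]

print(f"this minute: ${(t1 - t0) / 1e9 * RATE_PER_CPU_SECOND:.6f}")
conn.close()
```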

All in all, it's way simpler to get toward this goal with clearly defined, higher-level services like Lambda.


👤 purpleidea
https://github.com/purpleidea/mgmt/ can dynamically add vCPUs to and remove them from a running VM. Each change has sub-second precision, and a second https://github.com/purpleidea/mgmt/ instance running in the VM can detect this and charge workloads accordingly, if desired.

There are videos of it happening, but no blog post yet.


👤 phamilton
t3 instances on AWS have burst capacity charges if you choose "unlimited" mode. It's $0.05 per vCPU-hour, and it's only charged if you exceed the accumulated CPU credits.

So running a t3 would allow you to pay for a baseline and then only pay for the CPU you end up needing beyond that baseline.
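
A back-of-envelope of how that works out; the $0.05 surplus rate is from the comment above, but the credit accounting here is simplified (real t3 credits accrue per instance size):

```python
# Simplified t3 "unlimited" surplus math; $0.05/vCPU-hour is the rate
# from the comment, credit accrual is reduced to a single number.
SURPLUS_RATE = 0.05                  # $/vCPU-hour beyond earned credits

def burst_charge(vcpu_hours_used: float, credit_vcpu_hours: float) -> float:
    return max(0.0, vcpu_hours_used - credit_vcpu_hours) * SURPLUS_RATE

print(burst_charge(30, 24))          # 6 surplus vCPU-hours -> 0.30
```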


👤 trebligdivad
Removing RAM from VMs turns out to be quite tricky - hot-unplug very rarely works, because the OS tends to have allocated stuff all over, so you don't have a nice, large, DIMM-like quantity to unplug. Ballooning kind of works, but it's more advisory; there's nothing to stop a guest from gobbling that RAM up again (and it has other issues). David Hildenbrand's virtio-mem might help solve this; see: https://www.youtube.com/watch?v=H65FDUDPu9s

👤 oneplane
Not for VMs, but definitely for containers and functions (like AWS Lambda). You can configure them with soft and hard limits, but also with invocation counts and runtime limits.
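
For a concrete picture of those soft and hard limits, here is roughly the cgroup v2 mechanic container runtimes sit on top of; the cgroup path and PID are placeholders, and this needs root:

```python
# cgroup v2 soft vs. hard memory limits, the primitive behind container
# limits. Path and PID are placeholders; needs root and cgroup v2.
from pathlib import Path

cg = Path("/sys/fs/cgroup/demo")
cg.mkdir(exist_ok=True)

(cg / "memory.high").write_text(str(256 * 1024 * 1024))  # soft: throttle/reclaim
(cg / "memory.max").write_text(str(512 * 1024 * 1024))   # hard: OOM past this
(cg / "cgroup.procs").write_text("12345")                # enroll a PID
```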

Doing the same in a VM might be possible (the technologies exist), but it's often the task or workload that needs to be modified to support it, and once you're already doing that, the step to containers, functions, or horizontal scaling is just as easy (or hard). Horizontal scaling based on load is pretty common (classic ASGs, but also scaling based on overcapacity cost bidding).


👤 rbanffy
It would be possible. Paying for CPU and memory was the business model of hundreds of data processing companies that ran mainframe batch jobs and time-sharing services on partitioned machines. Mainframe companies billed users by machine usage (with the machine on-site). This is obviously doable.

It raises some isolation concerns, however. To make this work, the host needs to know how much memory is allocated to a given tenant, and that's difficult without access to the OS running inside the machine; easy with containers, not so much with VMs. The tenant can turn off virtual CPU cores in the VM and the hypervisor can pick that signal up (at least in Linux), but I'm not sure it's currently possible to do the same with virtual memory modules. I'd love it if AWS allowed me to do that, because the boot process of some of my workloads is more CPU-bound than the rest of the machine's lifetime, and having a bunch of extra cores at boot would do wonders. If, under memory pressure, I could "plug in" more virtual memory modules, that would also be quite nice.
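
The "turn off virtual CPU cores" signal mentioned here is just a sysfs write from inside a Linux guest; a minimal sketch (needs root, and CPU 0 usually can't be offlined):

```python
# Offlining a vCPU from inside a Linux guest; the hypervisor can observe
# this. Needs root; CPU 0 usually can't be offlined.
from pathlib import Path

def set_cpu_online(cpu: int, online: bool) -> None:
    Path(f"/sys/devices/system/cpu/cpu{cpu}/online").write_text(
        "1" if online else "0")

set_cpu_online(3, False)   # hand core 3 back once the CPU-heavy boot is done
```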

There could be a market opportunity, but whoever does it would need to beat the current incumbents on price or this would not fly. There would also be some difficulty scaling up: all these extra CPUs and memory would need to come from somewhere, and requests would necessarily fail (or trigger a live VM migration, which, IIRC, nobody does in cloud environments right now) if the host box's resources were fully allocated.


👤 PaywallBuster
It's possible in VMware to add additional CPUs to a running VM: https://blogs.vmware.com/performance/2019/12/cpu-hot-add-per...
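
A hedged sketch of that hot-add via pyVmomi; the host, credentials, and VM name are placeholders, and it only works if cpuHotAddEnabled was set while the VM was powered off:

```python
# CPU hot-add on vSphere via pyVmomi; host, credentials, and VM name
# are placeholders. Requires cpuHotAddEnabled set while powered off.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "guest1")

vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(numCPUs=4))  # raise live vCPU count
Disconnect(si)
```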

However, we have largely moved on to disposable servers that can be scaled on demand, in which case you just add additional servers or adjust the server type depending on requirements. Same with containers.

The need to add RAM or CPU to a running instance dates from the time when you had a single long-lived instance serving an application.


👤 pstrateman
Yeah of course you can.

Simply allocate each VM one core for each hyperthread and record the total CPU time used.

Nobody actually does this, because it makes billing complicated, both practically and for sales.

My guess is this would also result in less revenue overall; AWS is surely making lots of money selling the same CPU time to a dozen people who aren't actually using it.


👤 the8472
As far as memory goes, your OS will normally gobble up as much RAM as it can get for caching. You would have to go out of your way to make it memory-frugal so it could yield RAM back to the hypervisor, and that could hurt performance.
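
As a crude sketch of what "going out of your way" could mean inside the guest: dropping the page cache so a balloon can reclaim it (needs root, and it trades away cache hits):

```python
# Bluntly making a guest frugal: drop the page cache so the balloon can
# reclaim it. Needs root; real systems rarely do this, it costs cache hits.
from pathlib import Path

Path("/proc/sys/vm/drop_caches").write_text("3")  # pagecache + dentries/inodes

for line in Path("/proc/meminfo").read_text().splitlines():
    if line.startswith(("MemFree", "Cached")):
        print(line)   # see what was freed
```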

It's easier with CPUs, where you can just yield them back when there's no work.


👤 spullara
AWS's Aurora Serverless is billed in this manner, albeit with a minimum capacity level.

👤 hacker_newz
Your title implies dynamic hardware provisioning, while the post is about pricing by usage. Which is it?

👤 gautamkmr89
Use AWS Fargate, Lambda, or another serverless technology.