HACKER Q&A
📣 masterofall2612

Cheaper way to do model inference?


Does anyone know of a way to save on GPU compute while a server is idle? Is there a managed service that can shut a pod down and spin it back up when I need it? I'm currently doing model inference, and most of the time I'm just paying for compute without serving any user requests.


  👤 brianjking Accepted Answer ✓
Hugging Face Inference Endpoints can scale to zero and cost nothing while not in use.
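A minimal sketch of what that could look like with the huggingface_hub client, assuming the scale-to-zero settings are exposed via min_replica and scale_to_zero_timeout; the model, vendor, region, and instance values below are placeholders to check against the current Inference Endpoints docs:

    from huggingface_hub import create_inference_endpoint

    # Create a GPU endpoint that scales down to zero replicas after 15 idle
    # minutes, so you stop paying for compute between requests. Repository,
    # vendor, region, and instance values are illustrative placeholders.
    endpoint = create_inference_endpoint(
        "my-inference-endpoint",
        repository="meta-llama/Llama-3.1-8B-Instruct",
        framework="pytorch",
        task="text-generation",
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        type="protected",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=0,             # allow scale-to-zero
        max_replica=1,
        scale_to_zero_timeout=15,  # minutes of inactivity before shutdown
    )

    # Blocks until the endpoint is up. Note that the first request after a
    # scale-to-zero pause will incur a cold start while a replica comes back.
    endpoint.wait()
    print(endpoint.client.text_generation("Hello!"))

The tradeoff is cold-start latency: the first request after an idle period waits for the replica to boot, which is fine for batch or low-traffic workloads but may not suit latency-sensitive ones.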

👤 PaulHoule
Are you running inference on something like an EC2 instance?
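If so, one low-tech option is to stop the instance between requests, since a stopped instance isn't billed for compute (you still pay for attached EBS storage). A rough sketch with boto3; the region and instance ID are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    INSTANCE_ID = "i-0123456789abcdef0"  # placeholder GPU instance ID

    def stop_when_idle():
        # Stopping halts compute billing; EBS volumes still accrue charges.
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])
        ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

    def start_on_demand():
        # Starting takes on the order of a minute or two before the model
        # server is ready to accept requests again.
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

You'd still need something in front of it (a queue, a small always-on proxy, or a scheduled job) to decide when to start and stop, which is the part managed services like the one above handle for you.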