A random and non-exhaustive list of things that bother me from time to time:
— Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual).
— If I need a quick hot fix RIGHT NOW, I can't just log in, change a couple of lines and restart; I must go through the full deploy cycle.
— Must remember that launched containers do not close when the SSH connection breaks, and they can easily linger for a couple of weeks.
I generally find it harder to change how things work together. It's probably possible to spend lots of effort to fix these things, but I don't remember having to do all this cruft with old school infrastructure.
Hence the question: has anyone migrated off containerized infrastructure? Are you satisfied? Or am I misremembering things, and do horrible things await me in the old-school ways?
What you are complaining about isn't really containers (you could still pretty easily run stuff in a container and set it up/treat it like a "pet" rather than "cattle"), it's the CI/CD and immutable infrastructure best practices you are really sad about. Your complaints are totally valid: but there is another side to it.
Before adopting containers it wasn't unusual to SSH in and change a line of code on a broken server and restart. In fact that works fine while the company/team is really small. Unfortunately it becomes a disaster and huge liability when the team grows.
Additionally in regulated environments (think a bank or healthcare) one person with the ability to do that would be a huge threat. Protecting production data is paramount and if you can modify the code processing that data without a person/thing in your way then you are a massive threat to the data. I know you would never do something nefarious - neither would I. We just want to build things. But I promise you it's a matter of time until you hire somebody that does. And as a customer, I'd rather not trust my identity to be protected because "we trust Billy. He would never do that."
I pine for the old days - I really do. Things are insanely complex now and I don't like it. Unfortunately there are good reasons for the complexity.
If you don’t use CI, it’s easy to get fast deploys with containers. Just build the images on your dev box, tag with the branch and commit hash, and push directly to a docker registry (docker push is smart enough to only push the layers the registry doesn’t already have). Code is running 30 seconds after compiling finishes.
(Don’t want to pay for a registry? It’s trivial to run one yourself)
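That workflow fits in a few lines of shell. A sketch, with a hypothetical registry/image name and a DRY_RUN guard that echoes the docker commands instead of running them:

```shell
# Build locally, tag with branch + commit, push straight to a registry.
# DRY_RUN=1 (the default here) prints the docker commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

BRANCH=${BRANCH:-$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo main)}
COMMIT=${COMMIT:-$(git rev-parse --short HEAD 2>/dev/null || echo dev)}
TAG="registry.example.com/myapp:${BRANCH}-${COMMIT}"   # hypothetical registry/name

run docker build -t "$TAG" .
run docker push "$TAG"   # push only uploads layers the registry doesn't already have
echo "pushed $TAG"
```

The branch-plus-commit tag is what makes this traceable later: every running container can be mapped back to an exact commit.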
These aren’t foolproof, fully reproducible builds, but practically they’re pretty close if your tools require your repo to be clean and pushed before building the image, and if your build system is sane. Besides, if you’re used to editing code as it’s running on servers, you don’t care about reproducible builds.
Also, if you’re starting containers manually on the command line, you’re doing it wrong. At least use compose so your setup is declarative and lifetime-managed.
(Edit: s/swarm/compose/)
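A minimal compose setup along those lines might look like this (service and image names are hypothetical); the restart policy is part of what makes the lifetime management declarative:

```shell
# Write a minimal compose file; `docker compose up -d` then owns the container
# lifecycle (restart policy, teardown via `docker compose down`).
cat > /tmp/docker-compose.example.yml <<'EOF'
services:
  web:
    image: myapp:latest        # hypothetical image
    restart: unless-stopped    # restarted by the daemon, no orphaned one-offs
    ports:
      - "8080:8080"
EOF
echo "+ docker compose -f /tmp/docker-compose.example.yml up -d"
```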
Being able to just ssh into a machine is one of the problems that we did solve with containers. We didn't want to allow anyone to SSH into a machine and change state. Everything must come from a well-defined state checked into version control. That's where containers helped us a lot.
Not sure what you mean by lingering containers. Do you launch yours through SSH manually? That's terrible. We had automation that would launch containers for us. We also had monitoring that notified us of any discrepancies between intended and actual state.
Maybe containers aren't the right tool for your usecase - but I wouldn't want to work with your setup.
Btw. most of this is possible with VMs, too. So if you prefer normal GCE / EC2 VMs over containers that's fine, too. But then please build the images from checked in configuration using e.g. Packer and don't SSH into them at all.
Also, running services is hard already. Adding another layer that caused existing assumptions to break (networking, storage, etc) made it even harder.
Bare VMs are crazy fast and running apt install is crazy easy. Not looking back at the moment!
The image is already built.
CI already certified it.
The RIGHT NOW fix is just a rollback and deploy. Which takes less time than verifying new code in any situation. I know you don't want to hear it but really, if you need a RIGHT NOW fix that isn't a rollback you need to look at how you got there in the first place. These systems are literally designed around never needing a RIGHT NOW fix again. Blue/Green, canary, auto deploys, rollbacks. Properly designed container infrastructure takes the guesswork and stress out of deploying. Period. Fact. If yours doesn't, it's not set up correctly.
I have worked with a few companies moving from containers to serverless and a few moving from containers to VMs.
I think that serverless often gives people what they were looking for from containers in terms of reliable deployments, less infra management, and cloud providers worrying about the hairy pieces of managing distributed compute infrastructure.
Moving to more traditional infrastructure has also often simplified workflows for my customers. Sometimes the containerization layer is just bloat. And an image that can scale up or down is all they really need.
In any of these cases, devops is non-negotiable and ssh to prod should be avoided. Canary deployments should be utilized to minimize the impact of bad production builds and to automate rollback. If prod is truly down, pushing an old image version to production or reverting to an old serverless function version should be baked into process.
The real crux of your issue seems to be bad devops more than bad containers and that's where I'd try to guide your focus.
If you want to deploy fast you need to skip steps and reduce I/O - stateful things are _good_ -- deploying with rsync/ssh and HUPing services is very fast, but people seem to have lost this as an option in a world with Docker.
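As a sketch of that rsync-and-reload flow (host, path, and service names are hypothetical; DRY_RUN echoes the commands instead of running them):

```shell
# Push code to a server and reload the service in place: no image build at all.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

HOST=${HOST:-app1.example.com}    # hypothetical host
APP_DIR=${APP_DIR:-/srv/myapp}    # hypothetical path

# --delete keeps the remote tree an exact mirror of the local one
run rsync -az --delete ./ "deploy@${HOST}:${APP_DIR}/"
# many daemons re-read config or re-exec workers on reload without dropping connections
run ssh "deploy@${HOST}" "sudo systemctl reload myapp"
echo "deployed to ${HOST}"
```

Whole thing runs in seconds because the only I/O is the changed files themselves.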
I consult in this space - first 30m on the phone is free - hmu if interested.
You can still have a perfectly good, quickly executing, end-to-end CI/CD pipeline for which the deployment step is (gasp) put some files on a server and start a process.
The inflexion point for this varies by organisation, but I've never seen an environment with less than three services where managing container scaffolding is a net positive.
You write code, you push it, and 5 minutes later it is rolling out, tested, with health checks and health metrics.
Your infrastructure itself is keeping itself up to date (nightly image builds, e2e tests etc.)
It just works and runs. It doesn't make the same mistake twice, it doesn't need an expert to be used.
I'm not saying it's for everyone! Put three or four VMs on AWS, add a managed database and you are good to go with your Ansible. Use a Jira plugin to create recurring tickets for doing your maintenance and that should work fine.
Nonetheless, based on your 'random list of things' it does sound like you are not doing it right.
There is something very wrong if you really think it's critical to be able to 'hot fix', a.k.a. play hero by jumping on your VMs and hacking around. If you only have a single server for your production environment, there is no risk of forgetting a server to patch, but there is still the risk of forgetting to backport the fix (which is probably the wrong term if you don't hot-fix your release branch).
Most mistakes I make are mistakes I make because I was able to make them.
That might sound unfortunate, but there is a feeling of trust in your setup. At least I get that feeling, and I get it through automation. Knowing and seeing the deploy process just working, day in, day out. Knowing that my monitoring and alerting are set up properly, knowing that the systems keep themselves up to date, knowing there are proper tests in place.
1. "Hot fix now" -> I do just log in and enter the container, change a couple of lines and restart; not sure what your problem is here
2. "containers do not close when ssh breaks" -> I guess that's also going to save you if you run
3. "Harder to change how things work" -> actually, it makes it much easier to add services to a stack: just add them to the compose file and use container names for hostnames in configurations!
4. "Must remember launched containers" -> why not use NetData for monitoring? It really does all the monitoring/alerting you need out of the box! It will show containers, and tell you before /var/lib/docker runs out of space (use BtrFS to save a lot of time and space).
I'll add that containers make it easy to go from DevOps to eXtreme DevOps, which will let you maintain a clean master branch, and that is priceless! Details -> https://blog.yourlabs.org/posts/2020-02-08-bigsudo-extreme-d...
Where I'm migrating to: replacing Ansible/Docker/Compose with a new micro-framework that lets you do all of these with a single script per repo, instead of a bunch of files, but that's just because I can :)
It's the same jump, from non-containerisation to containerisation, as it is from non-SCM to SCM. People who upload their files via FTP have a hard time picking up Git (or well, they did, ten years ago or so.) You'd have people complaining that they have to run a whole bunch of commands: git add, git commit, git push, then on the other side git pull, when they used to just drag the files into FileZilla and be done with it.
The thing is though, if you change the way you work, if you change the process and the mindset, you can be in a much better position by utilising the technology. And that requires that you buy in.
But, as for your questions: no, I haven't. I have always taken legacy or new projects and gone containerisation with continuous integration, delivery, and deployment.
Every now and then I have long discussions with friends who swear by containerization.
All the benefits they mention are theoretical. I have never run into one of the problems that containerization would solve.
- Deploy times are certainly slower, up to 50x slower than non-containerized. However, we're talking 30s deploys versus 20-minute deploy times, all-inclusive. The sidenote here is that you can drastically reduce containerized deploy times by putting in some effort: make sure the (docker) containers inherit from other containers (preferably self-built) with the executable versions you need. For instance, you might inherit version X of program A and version Y of program B before building only a container with version Z of program C, as A and B barely change (and if they do, it's just a version bump in the final container). Even better, just build a code container during deploy (so a container with essentially only code and dependencies), and keep all the executables as separate images/containers that are built during development time;
- Containers do allow high-speed fixes, in the form of extremely simplified rollbacks. It is built into the entire fabric of containers to allow this, as you just change a version number in a config (usually) and can then rollback to a non-broken situation. Worst case, the deploy of fixed code in my case does take only 20 minutes (after the time it takes to fix/mitigate the issue, which is usually much longer);
- Local environment is _so much easier_ with containers. It takes 10 minutes to setup a new machine with a working local environment with containers, versus the literal hours it can take on bare metal, disregarding even supporting multiple OS'es. On top of that, any time production wants a version bump, you can start that game all over again without containers. Most of my devs don't ever worry about the versions of PHP or Node they are running in their containerized envs, whereas the non-container system takes a day to install for a new dev.
Containers can be heavy and cumbersome, but in many cases, a good responsibility split can make them fast and easily usable. In the specific case of docker, I find treating the containers like just the executable they are (which is a fairly default way of dealing with it) works wonders for the easy-and-quick approach.
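The inheritance trick from the first bullet can be sketched in a couple of lines (image names here are hypothetical):

```shell
# Sketch of the base-image split: slow-changing runtimes (programs A/B in the
# example above) live in a self-built base rebuilt only on version bumps; the
# per-deploy image adds just the code layer.
cat > /tmp/Dockerfile.example <<'EOF'
# rebuilt rarely: pinned versions of the heavy runtimes
FROM mycompany/base-runtime:1.4
# rebuilt on every deploy: only the code layer changes
COPY . /app
CMD ["/app/run"]
EOF
wc -l < /tmp/Dockerfile.example
```

Because only the final `COPY` layer changes between deploys, both the build and the push stay small.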
* Resource and process isolation -> capability-based security
* Dependency management conflict resolution -> nix / guix style package management
* Cluster orchestration and monitoring -> ansible, salt, chef, puppet, etc.
If you need all of those things at the same time, maybe containers are the right choice. But I hate the fact that the first thing we do when we run into a pip package conflict is to jump to the overhead of containerization.
We have an internal tool that listens to a message queue, dumps a database (from a list of permitted databases), encrypts it and sends it to S3 to be investigated by developers.
When running on a container, the process takes 2-3 minutes with small databases, about an hour or more with larger ones. When running on a regular EC2 image, the process takes about 5 minutes in the worst case scenario and is borderline instant with smaller databases.
Mapping internal volumes, external volumes, editing LVM settings, contacting AWS support, etc. yielded nothing. Only migrating it to a regular EC2 instance had any results, and they were dramatic.
We run Docker containers for local development too but when restoring very large databases we need to use MySQL in a real VM instead of in our Docker setup because Docker crashes when faced with a disk-heavy workload.
So to conclude, the only reason I wouldn't want to containerise a workload is when the workload is very disk IO heavy, whereas most of our apps are very network IO heavy instead.
I think containerization is another one of those things that you're told is great for everyone, but really you need to have many teams with many services that all need to act in concert in order for containerization to be worth the effort / overhead.
That being said, I conceptually prefer how with tools like K8s you can have fully declarative infra as code, rather than the hybrid imperative/declarative mix of a tool like Ansible.
Everyone on HN starts criticizing vague container statements. This has really turned into an Apple vs PC debate.
You can run lxc/nspawn containers as lightweight VMs and save a lot of (runtime, management) overhead without having to worry about any of Docker's or k8s's quirks.
We're quite happy with that approach, Docker isn't production grade IMO and k8s doesn't make sense at our scale.
We use Go binaries, so dependencies are compiled in, hosted on VMs on a cloud provider, setup with cloud config, and using systemd to manage services sitting behind a load balancer, one vm per service. Automated test and deploy so it's simple to make updates.
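That per-service setup is essentially one small unit file each. A sketch with hypothetical names and paths:

```shell
# Minimal systemd unit for a self-contained Go binary sitting behind a load
# balancer. Names and paths are illustrative.
cat > /tmp/myservice.example.service <<'EOF'
[Unit]
Description=myservice (hypothetical)
After=network.target

[Service]
ExecStart=/usr/local/bin/myservice
Restart=on-failure
DynamicUser=yes

[Install]
WantedBy=multi-user.target
EOF
# install: copy to /etc/systemd/system/, then:
echo "+ sudo systemctl daemon-reload && sudo systemctl enable --now myservice"
```

`Restart=on-failure` gives you the supervision people often reach for an orchestrator to get, and `DynamicUser=yes` avoids managing a dedicated service account.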
Never really felt the need for containers or the extra layer of abstraction and pain it brings.
Re hot fixes, automate your deploy process so that it's fast, and you won't mind checking in a fix and then deploying because it should take seconds to do this. You don't need containers to do this.
If you can't deploy changes quickly and easily in a reproducible way something is wrong. Having a record of every change and reproducible infrastructure is incredibly important, so we check everything into git then deploy (service config, code changes, data migrations). You don't need containers to get there though, and I'm not really sure they make it easier - perhaps in very big shops they help to standardise setup and infrastructure.
But after working with it, it's pretty visible that the abstraction layer is really huge and you need to learn the tools well. When you deploy to linux VPS, you probably have already worked on unix system and know plenty of the commands.
Another thing: I think having a designated person for the infrastructure makes it much less trying for a team. On the other hand, you have 'code' sitting in the repos and everyone feels like they can do devops. I don't think that's exactly true, because e.g. k8s is a pretty complex solution.
If you have the need for that kind of thing, I don't know why you would use containers.
Containers are for organizations that have processes.
Unfortunately, nowadays we teach every developer that containers, ci/cd, terraform, test coverage, ... are a requirement
Pick the tools that suits your flow.
Nothing wrong with bare metal or virtual servers.
EDIT: to add, some years ago I was managing a PHP shop where all production was bare metal and development/staging was replicated in containers. Everybody was happy. Hope it helps.
It has trade offs (eg worse docs), but you might like them better than eg Docker’s
One big change in the last 2 years is that documentation on "how to use this image" has become more common. Figuring out how to use an image used to take hours: inspecting its internals, learning the config files for that specific tool, modifying just the lines you needed, or writing/mounting a custom file/folder, etc. Now many images have docker-compose examples, and many have loader scripts that can read env variables and configure themselves properly. Having a good entrypoint.sh is a huge benefit to a docker image's usability, and having a docker-compose example is good documentation.
Why did I switch back? The isolation finally became significantly more useful for me. Perhaps the range of my 'hobbies' increased - I started running many more technologies. Multiple tools had conflicting dependencies, or only supported super-old options (looking at you, Unifi Controller still depending on the EOL mongo3.4)
When I joined, the team was in the process of migrating to cloud, yet with no understanding of what that means. The basic plan was to split the app into smaller services and get them running in containers, with no provision to get the team to learn to support the app, debug problems, deal with spurious errors, etc.
We are currently migrating off to be able to focus on improving the app instead of spending the entire learning budget (I mean developers' limited focus, not money) on learning cloud tech. Improving here means refactoring, properly separating modules, building robust error handling, etc. There might be a time when we decide to go back to the cloud, but currently I see it as only distracting from the really important issues.
My system now usually deploys to a workbench: automated installation of the services I need there, then automated creation of a disk image I can use to provision any number of machines on the fly through a load-balancer daemon (which monitors CPU load, network in, and other custom metrics on the slaves to decide whether to scale up or down), while still having the flexibility to automate scp'ing (fabric) code to the slaves as things update (also through the daemon) without re-provisioning everything on the slaves from a boot image.
An aws consultant tried to move our monolith to a full on aws monstrosity with docker + elb + cloudfront + bunch of other stuff, went from about ~$15/day to ~$180/day in infrastructure costs, and a bunch of stuff was (more) broken. Decided to roll our own, and were around ~$20/day now, and can still bring it down below what we were paying before.
Not strictly true.
This is how we do it with Kubernetes and Docker:
For testing: have test images with the basic supporting systems built into the image, but the application gets built from the entry point before starting, with a configuration point providing the git credentials and branch to build.
Startup is 5 minutes instead of instant, but that's handled by minReadySeconds or initialDelaySeconds, and there's no image build involved, just a change to the deployment metadata to trigger the pod cycling.
For production: you split out the basic supporting image and add your built application as a layer on top, depending on it, so instead of building everything from a single Dockerfile you only push the binaries, and the docker push is basically instant.
If the performance of that step concerns you because the binaries come with lots of assets, you can run CI directly on the docker registry host, so push bandwidth becomes a non-issue, or you can bundle assets in an intermediate image with its own lifecycle.
Maybe some books? Much appreciated!
But the app/services are not installed manually; instead I use Debian packages. Every package copies itself into a 'backup' directory during installation, so in case of rollback I reinstall a copy from there. It has worked this way for 2 years without issues. Configuration is preloaded inside the packages.
- Must build an image before deploying and it takes time ... still the same.
- If I need quick hot fix RIGHT NOW, I can't just log in ... perhaps this is more related to having fixed instances vs auto scaling.
- Must remember that launched containers do not close when ssh breaks ... + memory issues, disk issues ... this is fixed, and it is the biggest benefit for me.
You actually can get a shell in a container if you're debugging a deployment problem: put something like `sleep 9999d` in your container, then do `kubectl exec -it podname -- bash`.
You can debug what the hell went wrong: was it a config? An env var? Fix all that, see if your deployment is working, then fix it all at once with your changes in code and deploy. But agreed with the sentiment that ssh sucks. It sounds like your iteration cycle for checking deployments is long because of either tooling or not knowing these small tricks that aren't written in manuals.
As for dealing with runtime issues/hot fixes, put something like supervisor in your containers that might need temporary interventions in production and use that to restart the service without having to restart the container.
The reason I moved away from containers was because of a linux kernel bug which slowed down network requests through docker. I was working on a latency sensitive application at the time, so I just moved nginx and my containers to real machines.
Setting up things manually wasn't great, especially when deploying to multiple machines, so I just wrote a few different nix configurations and created some Digital Ocean machines with nixos-infect as a init script. There was definitely a learning curve as the language is peculiar (2-3 days to get nginx + postgres + redis + python app), but after doing it once I can pretty much deploy anything I want in a fast and immutable way. Replicating a similar system with a node.js app took less than a hour.
Changing something on the fly is still possible and you have access to everything. I run everything as systemd services that I can limit with cgroups.
You may run into problems if you're relying on an old package that's not on nixpkgs, but adding local derivations is quite straightforward (and you can always take inspiration from nixpkgs).
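For a sense of scale, a NixOS configuration for the stack described above really is short. A minimal sketch (service names and the app unit are illustrative, not the commenter's actual config):

```nix
{ config, pkgs, ... }:
{
  services.nginx.enable = true;
  services.postgresql.enable = true;

  # a custom app as a systemd service, resource-limited via cgroups
  systemd.services.myapp = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.python3}/bin/python3 -m http.server 8000";
      MemoryMax = "512M";      # cgroup memory limit, as mentioned above
      Restart = "on-failure";
    };
  };
}
```

Rebuilding the machine from this file (`nixos-rebuild switch`) is what gives the fast, immutable redeploys described above.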
Overall, everything was easier, though that might be because of not using SF rather than because of moving away from containers. An end-to-end deployment (IaC, app and integration tests) of ALL microservices would take between 18 and 45 minutes.
The lower figure was if it skipped IaC (limitations on ARM templates and Vnets forced serial deployment) and the upper figure was slowness on our agents. You'd have to add anywhere between 2 and 30 minutes for the build itself (30 minutes was the remnants of the monolith that was being slowly and methodically dismantled)
Could save another 8-16 minutes by not running any tests; work was ongoing to allow a more pick-and-choose approach, as the full deployment method was starting to become too slow as the number of microservices increased (I think we had 12 when I left).
There was nothing that, say k8s, offered and we needed that wasn't available on our setup, plus it was miles cheaper than SF or AKS.
If it works for you then fine; if it doesn't, then try something else.
It helped us solve a few pain points with deploying to VMs:
- dependencies are packaged and shipped with your app (no apt-get install), so dev and prod environments are the same
- local development happens against a docker container that has all our prod services running inside it (redis, postgres, Kafka, etc.)
- built-in CD and rollbacks via deployment channels
Tooling such as skaffold.dev would alleviate some of your complaints around deploy lifecycle. It will watch for code changes and automatically build, tag and deploy the image.
Paketo Buildpacks are great as well: no more Dockerfiles and a consistent build process across different languages and frameworks.
As somebody working on the ops side, this saved our asses so many times from making an already bad situation worse. For example: the CI system once caught a typo in a db migration script meant to fix a bug that only showed up on production; it would have brought down the whole site and we would have had to do a full DB restore.
Most of the senior engineers I worked with knew there was a good chance they could tell what their change would affect, but that if they forgot something it could cause massive problems. They understood well that our manager/team lead/etc. would be asked the really uncomfortable question from the business side: why did we go down and lose money? And I don't want to put them in a position where the answer is "we made a change on production without testing, which brought us down, because we were in a hurry", and as a result a 10-minute outage became a 12-hour outage.
I built https://m3o.com with this primarily in mind. I spent 5 years building an open source framework so that M3O would not be a silo and multi-environment is a thing. We made it so simple that `micro run [git url]` just works. And what's even better. We just run your code, we don't build a container, there's no CI/CD cruft, we just literally pull your code and run it. That simple.
Containers are a great way to package an app and its dependencies, but we can prebake them and inject the source code. What's more, devs don't need to even think about it.
Reforming your architecture so that it is reproducible without the overhead of containers, orchestration, etc is definitely possible.
What containers enforce upon you is the rigor of reproducibility. If you are disciplined enough to wipe your infrastructure and redeploy using shell scripts instead, you will get the same benefits without the infra overhead (training, bulkiness, etc. — looking at you, Ansible.)
Be prepared to implement a fair few things you used to get “for free” though. Also, you must strongly resist the urge to “just quickly implement [feature we used to have with Docker]”.
It’s quicker to think your way out of needing the feature than it is to realize how hard it was for Docker to reliably implement it.
I recommend at least basing yourself on LXC or LXD, and starting from there. It’s much easier to scriptably reinstall an LXC host than it is to reimage bare metal.
So I'd rather deal with containers. However, never do Kubernetes; it has nice features but it's not worth it for small and medium companies.
The overhead in some of the csp kubernetes platforms is quite annoying for managing the infrastructure & container runtime ops. We've "hacked" many setup approaches to get what we need.
Other than that, no way.
But consider this. You can mount the entire host fs, say under `/var/host` or the like, and you're right back to the code on the machine. You can use the host network stack with `--net=host`. And you can even skip user and process space segmentation. And what would that get you?
Containers are just processes with good default isolation. By default, the system capabilities are greatly reduced in a container (PTRACE for example, though sometimes that one hurts a little too). Systemd does the exact same thing with its units, carefully segmenting its units into process groups with constrained capabilities.
The point being that containers are just processes with good default isolation. That's a win for versioning and security, at the cost of complexity.
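To make the systemd comparison concrete, here's a sketch of a unit with container-style isolation (the directive choices are illustrative, not a complete hardening profile):

```shell
# A systemd unit that gets container-like isolation out of plain process
# sandboxing directives: private namespaces, read-only system, no capabilities.
cat > /tmp/sandboxed.example.service <<'EOF'
[Service]
ExecStart=/usr/local/bin/myapp
# namespace isolation, container-style
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=yes
# drop privileges; ptrace and friends are gone
NoNewPrivileges=yes
CapabilityBoundingSet=
EOF
# systemd can score how isolated a unit actually is:
echo "+ systemd-analyze security myapp.service"
```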
Specific points..
> Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual).
How large are your images? Build times can be very variable, but if you have a good base image it can be very fast. I think on average my build times have been about 5-20 seconds. There's definitely an art to it, but dockerfiles are not in any way complicated and setting up a multi-stage build to cache the expensive parts (compiling C libs, etc.) is fairly straightforward.
> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.
Oh my god, just don't do this. This is such an antipattern. If you think you need this you're really doing things wrong. If you need a hotfix RIGHT NOW, you should hit that fancy roll-back button (which containers and their orchestrators make easy..) and then figure out what went wrong, instead of trying to use vim to fix some line of code in production.
> Must remember that launched containers do not close when ssh breaks connection and they can easily linger for a couple of weeks.
Huh? TBH I don't understand why you would be expecting a closed SSH connection to shut it down -- these things are almost always meant for running a service -- but this is a really minor thing.
It sounds like you just don't want to change any of your current habits, not that the habits that containers encourage are somehow worse.
These days I just keep a Provision file that has all the commands the target OS needs to run the program. Service files are copied to their respective places under $HOME/.config/systemd and are ready to use after `systemctl --user daemon-reload` and `systemctl --user enable`. I install programs as services using systemd and monitor a bunch of them with monit. I also can't help but notice that they work better on bare metal.
On my own projects I even deploy with circa 200 lines of Bash. It's super super fast, I understand every bit of it and can fix things quickly.
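For anyone curious what such a script can look like, here's a skeleton in the same spirit (hypothetical hosts and paths; it echoes its steps rather than executing them):

```shell
# Tiny deploy pipeline as plain bash functions; swap the echoes for real commands.
step() { echo "==> $*"; }

build()   { step "go build -o bin/app ./cmd/app"; }
upload()  { step "scp bin/app deploy@prod.example.com:/srv/app/app.new"; }
migrate() { step "ssh deploy@prod.example.com /srv/app/migrate.sh"; }
restart() { step "ssh deploy@prod.example.com 'mv /srv/app/app.new /srv/app/app && sudo systemctl restart app'"; }

DEPLOY_STEPS="build upload migrate restart"
for s in $DEPLOY_STEPS; do "$s"; done
```

Keeping each step a named function is what keeps a 200-line script readable: you can run any step alone, and the happy path reads top to bottom.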
You should move to containers/CD/immutable infra when it makes sense to you and your project. But as someone already mentioned you can make containers fast as well.
Anyway if anybody is interested in actually understanding the basics of deployment I am writing a book on this topic called Deployment from Scratch[0]. I am also looking for beta readers.
Question number one: what is the business cost of the system being offline? If the cost is moderate, I find it difficult to argue against running two copies on a single server or VPS in different availability zones behind a robust (managed as a service) load balancer. Scale the servers or VPSes for the anticipated workload.
Managed container services or platforms like AppEngine or Heroku have always made sense to me to reduce labor costs.
Containerized infrastructure makes sense when the benefits outweigh the labor costs.
I have seen what you'd call old-school, maybe: the VMware era, Xen, colocated infra, owned infra, ... the Cloud(s) ... and finally docker (and later kubernetes), over the years.
Now I can say I happily switched jobs 3 years ago, to a place that never entered the virtualization vogue, nor the containers one.
On a team of three and a half people (enough to cover one week of on-call each month), we manage our nearly 500 physical servers (all of them with 2x Xeon, between 32 and 256G of RAM, 10G networks (on 40G switches/uplinks) and all kinds of storage layouts with SATA, SSD and NVMe) in different colocations. Sometimes with the help of remote hands (mainly for hard disk replacements).
During those 3 years (and before) I have seen lots of drama among my friends, with all kinds of issues related to container technologies.
Complexity issues, high availability issues, bugs, maintenance issues, team issues, cross-team issues, management issues, security issues, networking issues, chaos issues, nobody-knows issues, cost issues, operational issues, etc
Yes, you can have all of them with bare metal servers too, indeed, but I look at my daily work, and then I talk with friends at pro-container companies, and I feel good.
Nothing stops you from using Ansible on bare metal servers; indeed, you need to automate everything if you want to be happy: IPMI setup, firmware updates, OS installs, service management, operations, monitoring... The more of that you fully automate, the better your daily routine will be.
Also, really important: going "old school" doesn't free you from needing a full staging environment, a well-thought-out and fault-tolerant architecture, and good backups.
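A minimal sketch of the kind of playbook that automation might use (the host group, package, service, and file names here are all made up for illustration):

```yaml
# deploy.yml - hypothetical bare-metal deploy via Ansible
- hosts: web
  become: true
  tasks:
    - name: Install the application package
      ansible.builtin.apt:
        name: myapp
        state: latest
        update_cache: true

    - name: Deploy the service configuration
      ansible.builtin.template:
        src: myapp.conf.j2
        dest: /etc/myapp/myapp.conf
      notify: Restart myapp

  handlers:
    - name: Restart myapp
      ansible.builtin.systemd:
        name: myapp
        state: restarted
```

The same playbook run through CI gives you a reviewable, repeatable deploy without any container layer in between.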
Regarding your random list:
> Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual).
Maybe going "quick and dirty" will spare you this, but doing things well, no matter whether it's containers or bare metal, won't.
I need to build packages (or Ansible roles), test the change's effects, and pass tests and CI all the same (it's not mandatory, but it's convenient).
> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.
True, but then we're at the same point... "old school" is not the same as "quick and dirty". You can change containers in a quick-and-dirty way too if you want (and your company allows it).
> Must remember that launched containers do not close when ssh breaks connection and they can easily linger for a couple of weeks
Well, each platform has its own issues to deal with. On bare metal you could have to deal with hardware compatibility and degradation (or even breakage) issues, for example.
I think a good balance between all this could be: develop "quick and dirty" and freely, release via CI, deploy properly. No matter the platform.
If developers don't have agile environments, and they need to commit, wait for a build, wait for CI, review the results, etc. for each and every line of code they want to try... I get what you mean; it's a pain.
http://www.smashcompany.com/technology/my-final-post-regardi...
If we need to deploy a hot fix, it is pushed to master directly with admin access, or if there is something truly critical (which should never happen), we just block connections on our load balancers and/or clusters.
Containers are IMHO one of the best things out there for deployment, as you can deploy changes to a cluster of servers without needing to reconfigure them and without the code/containers interfering with each other.
Stateless is a pain, but it forces you to decouple from the state (in the database, presumably?) so you can roll back (or forward) easily.
I know everyone in the industry probably thinks: haven't you read all these manuals and guides? Even as a tourist, that is what we have been telling you.
As a tourist: no, I didn't see what the guides thought they were showing me. I've seen this a lot where I read through an intro just to learn (I don't program sites professionally), and then at the end I have an "aha" moment that was never (in my opinion) explicitly stated.
Yes, you can. You need to have your containers run sshd. Despite what people tell you, even Facebook does this with Tupperware (slide 18 of https://www.slideshare.net/Docker/aravindnarayanan-facebook1... ).
Our containers are built using supervisord, running the app itself and sshd.
Containers allow you to do anything. It's people who are religious.
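The supervisord setup described above can be sketched like this (program names and the app path are assumptions; sshd also needs its usual host keys and run directory in the image):

```ini
; supervisord.conf - run both the app and sshd in one container
[supervisord]
nodaemon=true            ; stay in the foreground as PID 1

[program:sshd]
command=/usr/sbin/sshd -D  ; -D keeps sshd in the foreground
autorestart=true

[program:app]
command=/usr/local/bin/myapp
autorestart=true
```

With `nodaemon=true`, supervisord itself is the container's foreground process, and it restarts either child if it dies.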
Our new CI happens to use containers, but:
- We aren't managing the containers ourselves, because we're an arts platform not a cloud infrastructure provider.
- We don't spawn new instances to run a (expletive) linting tool.
> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.
That is actually a feature, not a bug. You bypass QA at your peril, especially when you need a fix RIGHT NOW because you are likely going to make a bad situation much worse.
Keep image size as small as (reasonably) possible, and optimize your images so that the things that change the most (usually code and config) are in the last layers, to take advantage of caching the things that don't change often.
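That layer-ordering advice can be sketched as a Dockerfile (a hypothetical Python app; the point is the ordering, not the specifics):

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Dependencies change rarely: copy only the manifest first, so this
# layer stays cached across most builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Code and config change often: copy them last, so a typical change
# only rebuilds these final layers.
COPY . .
CMD ["python", "main.py"]
```

If `COPY . .` came before the `pip install`, every code change would invalidate the dependency layer and force a full reinstall.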
I have felt it worth my time to:
- separate the installation from the configuration
- have uniform environments (where a test environment is the same as prod in terms of tech stack and configuration, not necessarily volume)
- have consistent automated deployment approaches
- have consistent automated configuration approaches
With such an investment of time and effort, it has helped me:
- construct a prod-like environment (tech stack + configuration) to test in with the assurance that the app will experience the same constraints in prod.
- provide a tested deployment and configuration approach that environment owners can use to change production.
- push/place my hot fix in the test environment, validate it via automated tests, and then push the same hot fix to prod.
This has helped me ensure that the time between code-commit to go-live on prod is under an hour (including gatekeeping approvals from the environment owners) in even regulated environments. (I'm working on a book on this and will share more in a few weeks)
Depending upon the organisation and the specific project's automated test suite maturity, sometimes testing in prod may be the only option. If you must use containers but wish to retain the flexibility to change the binaries, then consider mounting the binaries into the container from a file system and use the containers for containment.
However, you should strongly consider moving to a package-once push-anywhere process.
If you face a situation where an intermediate environment is broken or blocked and you must push the artefact to prod, then by all means do so manually - after all, you ought to be certifying artefacts and not environments. An automated process that doesn't permit an authorised manual override is only getting in the way. Such a manual override should provide for the right auditability and traceability, though.
Having said all this, the ideal would be uniform environments, consistent deployment and configuration, package-once deploy-anywhere, auditability, and traceability.
It's important to go with the choice that works best for you and the devs you have.
A lot of great stuff runs with and without containers.
If you have CI/CD, and you must have CI/CD, this should never be done. As soon as it is allowed, you will eventually have changes applied to production that are not in your VCS.
This is a feature, not a bug. Really this speaks to the entire post; as these aren't "quirks". It has a side effect of blocking the "Hero Programmer".
Enable rollbacks?
Docker images are extremely fast to build if you use a .dockerignore or a staging directory properly. Our multi-GB image builds in seconds.
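The .dockerignore trick works because it shrinks the build context the client sends to the daemon before any layer is built. A sketch (these entries are just typical examples, not from the original setup):

```
# .dockerignore - keep the build context small
.git
node_modules
*.log
tmp/
test-data/
```

Excluding a large .git directory or local data sets alone can turn a multi-minute context upload into a near-instant one.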
The ability to do this is usually abused by developers, it is best not to have it. Cattle, not pets.
Not for any of the reasons that you pointed out, but primarily because running container orchestration platforms is a pain in the rear end.
Running between 8 and 14 Linode and DigitalOcean VMs, I wanted to research whether k8s or any Docker setup would be good for me.
I have been doing Linux server maintenance for over 16 years now, and have set up and grown three webhosting companies. My current VMs are for my own tooling (self-hosted mail, Nextcloud, Matomo), my startup (mostly servers crunching OSM databases), and some leftover client hosting (mostly Rails and Sinatra).
The issues I ran into with Docker and k8s were all small and could probably be overcome given extra time and effort. But combined, I decided that containerization does not solve enough problems for me to warrant all the new problems it introduces.
In no particular order:
* Firewalling: on a normal Linux machine or in a cluster, a long-solved problem (resp. iptables/ufw or dedicated firewall servers/hardware). With containers: no idea. It seems a largely unsolved issue, probably because "containerization should not do any firewalling". And partly because of how network mapping works, the problem is mitigated (but not solved!) there.
* Monitoring: at least three popular Docker images that I used forgot to add log rotation, crashing after long-term running. That in itself is bad (and shows a lack of detailed finish in many images), but it shows you need to monitor. I used Munin and am now migrating towards Prometheus/Grafana, but I really have no idea how to properly monitor a flock of containers. Another problem that has been solved for ages, but requires time and effort in a containerized env.
* Timed jobs (cron jobs): there is tooling to spin up and run containers on timed schedules, but none as easy and stable as having Ansible write a file to /etc/cron.d/osm_poi_extract (or use systemd timers, fine with me).
* fail2ban, tripwire, etc: small tooling that is hard to do or requires extra attention in a containerized setup.
* Unity: if you rely on 3rd-party containers you'll quickly have a rainbow spectrum of machines: Ubuntu LTS, Ubuntu edge, Ubuntu ancient, Alpine, CentOS, a rare FreeBSD. The interface is consistent, the underlying tech is not: troubleshooting is a terror if you first have to spend 20 minutes googling "how do I know which Alpine version I have, and how do I get curl on this damned thing to see if Elasticsearch is maybe giving results on localhost".
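For the cron point above, the file Ansible drops in place is just a plain cron.d entry; the user, path, and schedule here are assumptions for illustration:

```
# /etc/cron.d/osm_poi_extract - sketch of a plain old cron job
# m  h  dom mon dow  user  command
 30  4   *   *   *   osm   /usr/local/bin/osm_poi_extract >> /var/log/osm_poi_extract.log 2>&1
```

Unlike /etc/crontab-style user entries, files in /etc/cron.d need the explicit user field, and cron picks up changes without a restart.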
I realize that my 16+ years of prior linux-sysadmin experience hinder me here: I'm probably trying to make a containerized setup do the same that I'm used to, but which is not really needed (tripwire, fail2ban, firewalls?).
But for me, containers - in production - solve no problem. Yet they introduce a whole range of problems that in a more classic setup have been solved for, sometimes literally, decades. "Infra and state in documented revision control" is solved mostly with Ansible, SaltStack, Chef, or Puppet; this can and should certainly evolve and improve. Networking is solved with, well, networking. Hell, one of my previous hosting setups had a cron job that would fetch "the latest /etc/hosts" hourly: that was our entire DNS setup. It worked. Reliably. As for /etc/hosts and Docker, don't get me started (the reply would probably be: but you don't need /etc/hosts in k8s).
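On the firewalling point, there is one supported hook worth knowing: traffic to published container ports goes through the FORWARD chain rather than INPUT (which is why plain ufw/INPUT rules never see it), and Docker honors rules placed in its DOCKER-USER chain. A hedged sketch, with the interface and subnet as assumptions:

```
# Run as root on the Docker host. Block outside access to a published
# port (5432 here) while allowing a hypothetical internal subnet.
iptables -I DOCKER-USER -i eth0 ! -s 10.0.0.0/8 -p tcp --dport 5432 -j DROP
```

It doesn't make container firewalling pleasant, but it is the documented place to put such rules so Docker's own NAT rules don't bypass them.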
Beautiful summary of why trends are cyclical. Problems with current trend lead to invention of new trend. Everyone jumps to new trend. Eventually people find problems with new trend but they've now forgotten the problems with old trend so begin to move back.
No, nor would I ever.
> Are you satisfied?
Yes, I am. I'm not claiming that it's perfect - everything has its benefits and downsides, and it's always a question of using the right tool for the job. But in my case, the way I work, the benefits of containers very clearly outweigh the downsides by a large margin. So much so that I'd still be using containers for local development even if the production environment were non-containerized. Switching between projects with different setups has never been so easy.
> Must build an image before deploying and it takes time, so deploys are slow
Deployments sure can be slow. A full deployment to AWS can take 10 to 15 minutes for me. But it comes with zero downtime - which is top priority for my systems. The load balancer only spins down the old instances, after the new ones are up and running (yes, I have multiple containers, and multiple instances of each container running at the same time). I can build the new containers, fully test them locally, and only deploy them AFTER I know everything is fine. And I can 100% rely on the fact that the deployed containers are exactly identical to what I tested locally.
> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.
I completely stopped doing that over 15 years ago. Back then I started modifying files locally, pushing them to version control, and then pulling the new version on production. That avoided so much pain that I'm never going back. No way.
While it would be possible to set up a container to pull the newest source on every startup (thus allowing hot fixes through version control as described above), I actually prefer to build containers, test them locally, and only deploy them once I know everything works. This way I rarely ever need fast hot fixes in the first place.
It's just my way of doing things, and for me containers are the right tool for the job. That does not, of course, mean that they are the right choice for everyone. But I'm currently doing a lot of things that I think are pretty great, which would be outright impossible without containers.
> Must remember that launched containers do not close when ssh breaks connection and they can easily linger for a couple of weeks.
What? How? I don't even...
- Before, when I just SSHed into servers and rsynced files, I had many situations where I forgot what change I had made on a server, how it was set up, and so on. I found Docker when I was looking for tools to put those various commands and bash scripts into one place for central storage and easy re-creation of the environment. Dockerfiles and containers make everything reproducible, reduced to the smallest number of steps needed to get something set up correctly.
- I would find that something worked locally but not on the remote, due to different versions of some dependency. Docker images ensured I could test in almost identical environments. It's also easy to try new apps without worrying about polluting the current environment, so I'm now faster at trying out solutions and rolling dependencies back/forward.
- I would test things on the server, because I was not able to run the exact setup on my local computer. This takes time and risks breaking real stuff. Docker images fix this.
- I would struggle to know which services were running. Part of this came from me not knowing all the ins and outs of Linux, so I felt it was hard to get an overview of what's running. docker ps makes it easy to see what's running.
- Updating a traditional app often required me to change more than just the source tree. It could mean starting/stopping services or adding files in other places. So updates tended to become manual and error-prone (I didn't do them often enough to remember by heart what was needed). Docker, and primarily docker-compose, encapsulates all of this into simple commands.
- Before, my apps would use mixed sources of configuration: environment variables, config files in different places, command-line arguments. More importantly, they would often be stateful, saving things to files that needed to be managed. With Docker, I was somewhat forced to align all config in one place and make everything else stateless, and that makes things much cleaner.
- As a hobbyist, I rarely had the time before to go over the security of my servers. I find that Docker provides more secure defaults, in terms of making it clear which services are open to the world and by reducing the attack surface.
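The encapsulation point above can be sketched as a minimal docker-compose.yml (service names, images, and ports here are made up):

```yaml
# docker-compose.yml - the whole setup in one declarative file:
# `docker compose up -d` starts it, `docker compose pull && docker compose up -d` updates it.
services:
  web:
    image: ghcr.io/example/myapp:latest
    ports:
      - "8080:8080"
    env_file: .env            # all config aligned in one place
    depends_on:
      - db
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data   # the only stateful part

volumes:
  pgdata:
```

Everything that used to be scattered start/stop commands and file placements becomes a couple of compose invocations against this one file.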
Of course, containers have brought some problems too:
- Lots of apps were not ready to be containerised, or required a lot of hacks to do so. So I've done a lot of debugging and sleuthing to find the right way to run and configure various apps in Docker. These days, the official Docker images are much better and exist for almost all apps, but there is still a lot of "messy magic" built into their Dockerfiles.
- More often than not you want to get into the container to run something, debug, ping, check a file, etc. This gets more convoluted than before, and you need to learn some new tricks, e.g. for piping data in and out. It's made harder by the fact that many images are so minimalistic that you don't have access to the full range of tools inside the containers.
- Logging in Linux has IMO been a mess since before, but with Docker it's still not great: just mashing together stdout from all containers, with unclear rotation procedures. There are many ways to deal with it, but they often require a lot more tooling, and it still gives me headaches.
- Yes, waiting for builds and transfers of images adds a bit of time to deploys. And it's somewhat annoying to deal with two versioning systems, e.g. git and Docker Hub. I haven't gone all-in on CI yet, but that would automate more and just let me use git.
I can assure you with nix and cmake it's 1000x more complicated than it needs to be.