Containers (meaning Docker) happened because CGroups and namespaces were arcane and required lots of specialized knowledge to create what most of us can intuitively understand as a "sandbox".
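For a sense of what that specialized knowledge looks like, here's a minimal, Linux-only Go sketch of roughly what Docker wraps in a friendly CLI: spawning a shell in fresh namespaces with raw clone flags. It's illustrative only; cgroup limits, the root filesystem, and networking are all left out, and that wiring is exactly the arcane part.

```go
// A minimal sketch of creating new namespaces directly, roughly what Docker
// hides behind its CLI. Linux-only; error handling, cgroup limits, and
// filesystem/network setup are omitted.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Ask the kernel for fresh UTS, PID, mount, and user namespaces.
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER,
		// Map the invoking user to root inside the new user namespace.
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
	}
	// The child gets PID 1 in its own PID namespace, but it still sees the
	// host's root filesystem; pivot_root, a veth pair, and cgroup limits
	// would each be separate, fiddly steps.
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```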
Cgroups and namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.).
It's really not going all that well, and I hope something like SEL4 can replace Linux for cloud server workloads eventually. Most applications use almost none of the Linux kernel's features. We could have very secure, high performance web servers, which get capabilities to the network stack as initial arguments, and don't have access to anything more.
Drivers for virtual devices are simple; we don't need Linux's vast driver support for cloud VMs. We essentially need a virtual ethernet device driver for SEL4, a network stack that runs on SEL4, and a simple init process that loads the network stack with capabilities for the network device, and loads the application with a capability to the network stack. Make building an image for that as easy as compiling a binary, and you could eliminate maybe tens of millions of lines of complexity from the deployment of most server applications. No Linux, no Docker.
Because SEL4 is actually well designed, you can run a sub kernel as a process on SEL4 relatively easily. Tada, now you can get rid of K8s too.
Containers and namespaces are not about security. They are about not having singleton objects at the OS level. They would have been called virtualization if the word weren't so overloaded already. There is a big difference that somehow everyone misses: a bypassable security mechanism is worse than useless, but a bypassable virtualization mechanism is useful. It is useful to be able to have a separate root filesystem just for this program - even if a malicious program is still able to detect it's not the true root.
As for SEL4 - it is so elegant because it leaves all the difficult problems to the upper layer (coincidentally making them much more difficult).
> As for SEL4 - it is so elegant because it leaves all the difficult problems to the upper layer (coincidentally making them much more difficult).
I completely buy this as an explanation for why SEL4 for user environments hasn't taken off (and probably never will). But there's just not that much to do to connect a server application to the network, where it can access all of its resources. I think a better explanation for the lack of server-side adoption is poor marketing, a lack of good documentation, and no company selling support for it as a best practice.
The lack of adoption is because it’s not a complete operating system.
Using sel4 on a server requires complex software development to produce an operating environment in which you can actually do anything.
I'm not speaking ill of sel4; I'm a huge fan, and things like its take-grant capability model are extremely interesting and valuable contributions.
It's just not a usable standalone operating system. It's a toolkit for purpose-built appliances, or something that you could, with an enormous amount of effort, build a complete operating system on top of.
Yes. I really hope someone builds a nice, usable OS with SeL4 as a base. If SeL4 is like the Linux kernel, we need a userland (a GNU equivalent), and a distribution that's simple to install and make use of.
I'd love to work on this. It'd be a fun problem!
Is that why containers started? I seem to recall them taking off because of dependency hell, back in that weird period when easy virtualization wasn't yet widely available to everyone.
Trying to get the versions of software you needed to use all running on the same server was an exercise in fiddling.
I think there were multiple reasons why containers started to gain traction. If you ask 3 people why they started using containers, you're likely to get 4 answers.
For me, it was avoiding dependencies and making it easier to deploy programs (not services) to different servers w/o needing to install dependencies.
I seem to remember a meetup in SF around 2013 where Docker (was it still dotCloud back then?) was describing easier deployment of services as a primary use case.
I'm sure for someone else, it was deployment/coordination of related services.
The big selling points for me were what you said about simplifying deployments, but also the fact that a container has significantly less resource overhead than a full-blown virtual machine. Containers really only work if your code runs in user space and doesn't need anything super low level (e.g. a custom TCP network stack), but as long as you stay in user space it's amazing.
Yes, totally agree that's a contributor too. I should expand that by namespaces I mean user, network, and mount-table namespaces. The initial contents of those are something you have to provide when creating the sandbox. Most of it is small enough to be shipped around in a JSON file, but the initial contents of a mount table require filesystem images to be useful.
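As a rough illustration of the "small enough to ship as JSON" point, here's a hypothetical Go sketch of what such a sandbox spec could look like. The structure and field names are invented for illustration, not any real tool's format; only the filesystem images behind the mounts would be heavyweight.

```go
// A hypothetical sandbox spec: initial contents of the user, network, and
// mount-table namespaces as a small JSON document. Field names are made up
// for illustration; only the images backing the mounts are heavyweight.
package main

import (
	"encoding/json"
	"fmt"
)

type Mount struct {
	Image  string `json:"image"`  // filesystem image backing this mount
	Target string `json:"target"` // where it appears inside the sandbox
}

type SandboxSpec struct {
	UIDMap    []string `json:"uid_map"`   // user namespace, e.g. "0:1000:1"
	Addresses []string `json:"addresses"` // network namespace, e.g. "10.0.0.2/24"
	Mounts    []Mount  `json:"mounts"`    // mount-table namespace
}

func main() {
	spec := SandboxSpec{
		UIDMap:    []string{"0:1000:1"},
		Addresses: []string{"10.0.0.2/24"},
		Mounts:    []Mount{{Image: "app-rootfs.img", Target: "/"}},
	}
	out, _ := json.MarshalIndent(&spec, "", "  ")
	fmt.Println(string(out))
}
```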
On a personal level, that's why I started using them for self-hosting. At work, I think the simplicity of scaling from a pool of resources is a huge improvement over having to provision a new device. I'm currently on an on-prem team, and even moving to Kubernetes without going to the cloud would solve some of the more painful operational problems that page us or that we have to meet with our prod support team about.
IIRC full virtualization was expensive (VMware) and paravirtualization was pretty heavyweight and slow (Xen). I think Docker was like a friendlier front end to cgroups and everyone loved it. I can't remember the name, but there was a "web hosting company in a box" software package that relied heavily on LXC and was probably some inspiration for containerization too.
edit: came back in to add the reference to LXC; it's been probably two decades since I've thought about that.
This makes sense if you look at containers as simply a means to an end of setting up a sandbox, but not really much sense at all if you think of containers as a way to make it easy to get an arbitrary application up and running on an arbitrary server without altering host system dependencies.
I suspect that containers would have taken off even without isolation. I think the important innovation of Docker was the image. It let people deploy a consistent version of their software or download outside software.
All of the hassle of installing things was in the Dockerfile, and since it ran in containers it was more reliable.
I agree: I think the container image is what matters. As it turns out, getting more (or less) isolation given that image format is not a very hard problem.
Agreed. There was a point where I thought AMIs would become the unit of open source deployment packaging, and I think Docker filled that niche in a cloud-agnostic way.
PS: I still miss the alternate universe where Kenton won the open source deployment battle :-)
It would be great if we got "kernel independent" Nvidia drivers. I have some experience with bare-metal development and it really seems like most of what an operating system provides could be provided in a much better way as a set of libraries that make specific pieces of hardware work, plus a very good "build" system.
cgroups first came from resource management frameworks that IIRC came out of IBM and got into some distro kernels for a time but not upstream.
Namespaces were not an attempt to add security; they just grew out of work to make interfaces more flexible, like bind mounts. And Unix security is fundamentally good; not having namespaces wasn't much of a point against it in the first place, and now it does have them.
And it's going pretty well indeed. All applications use many kernel features, and we do have very secure, high-performance web and other servers.
L4 systems have been around for as long as Linux, and SEL4 in particular for two decades. They haven't moved the needle much, so I'd say it's not really going all that well for them so far. SEL4 is a great project that has done some important things, don't get me wrong, but it doesn't seem to be a Unix replacement poised for a coup.
> Because SEL4 is actually well designed, you can run a sub kernel as a process on SEL4 relatively easily. Tada, now you can get rid of K8s too.
k8s is about managing clusters of machines as if they were a single resource. Hence the name "Borg" for its predecessor.
AFAIK, this isn't a use case handled by SEL4?
The K8s master is just a scheduling application. It can run anywhere, and doesn't depend on much (just etcd). The kubelet (which runs on each node) is what manages the local resources. It has a plugin architecture, and when you include one of each necessary plugin, it gets very complicated. There are plugins for networking, containerization, and storage.
If you are already running SEL4 and you want to spawn an application that is totally isolated, or even an entire sub-kernel, it's no different from spawning a process on UNIX. There is no need for the containerization plugins on SEL4. Additionally, the isolation for the storage and networking plugins would be much better on SEL4, and wouldn't even really require additional specialized code. A reasonable init system would be all you need to wire up isolated components that provide storage and networking.
Kubernetes is seen as this complicated and impressive piece of software, but it's only impressive given the complexity of the APIs it is built on. Providing K8s functionality on top of SEL4 would be trivial in comparison.
I understand what you're saying, and I'm a fan of SEL4. But isolation isn't one of the primary points of k8s.
Containerization is after all, as you mentioned, a plugin. As is network behavior. These are things that k8s doesn't have a strong opinion on beyond compliance with the required interface. You can switch container plugin and barely notice the difference. The job of k8s is to have control loops that manage fleets of resources.
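To make the control-loop point concrete, here's a minimal, hypothetical Go sketch of the reconcile pattern (not the real client-go API): compare desired state with observed state and take a corrective step each iteration.

```go
// A minimal, hypothetical sketch of the reconcile pattern behind Kubernetes
// controllers (not the real client-go API): compare desired and observed
// state and take one corrective step per iteration.
package main

import (
	"fmt"
	"time"
)

type ReplicaState struct{ Replicas int }

func reconcile(desired, actual *ReplicaState) {
	switch {
	case actual.Replicas < desired.Replicas:
		actual.Replicas++ // stand-in for "start another pod"
		fmt.Println("scaled up to", actual.Replicas)
	case actual.Replicas > desired.Replicas:
		actual.Replicas-- // stand-in for "stop a pod"
		fmt.Println("scaled down to", actual.Replicas)
	}
}

func main() {
	desired := &ReplicaState{Replicas: 3}
	actual := &ReplicaState{Replicas: 0}
	for i := 0; i < 5; i++ { // a real controller loops forever on watch events
		reconcile(desired, actual)
		time.Sleep(100 * time.Millisecond)
	}
}
```

Everything else (containerization, networking, storage) is pluggable detail underneath that loop.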
That's why containers are called "containers". They're for shipping services around like containers on boats. Isolation, especially security isolation, isn't (or at least wasn't originally) the main idea.
You manage a fleet of machines and a fleet of apps. k8s is what orchestrates that. SEL4 is a microkernel -- it runs on a single machine. From the point of view of k8s, a single machine is disposable. From the point of view of SEL4, the machine is its whole world.
So while I see your point that SEL4 could be used on k8s nodes, it performs a very different function than k8s.
https://archive.kernel.org/oldwiki/tiny.wiki.kernel.org/
My headcanon is that Docker exists because Python packaging and dependency management was so bad that dotCloud had no choice but to invent some porcelain on top of Linux containers, just to provide a pleasant experience for deploying Python apps.
That's basically correct. But the more general problem is that engineers simply lost the ability to succinctly package applications and their dependencies into simple-to-distribute, runnable packages. Somehow, around the same time Java made .jar files mainstream (just zip all the crap with a manifest), the rest of the world completely forgot how to do the equivalent of statically linking in libraries, and forgot that we're all running highly scheduled, multithreaded operating systems now.
The "solution" for a long time was to spin up single-application virtual machines, which was a heavy way to solve it and reduced the overall system resources available to the application, making them stupidly inefficient solutions. The modern cloud was invented during this phase, which is why one of the base primitives of all current cloud systems is the VM.
Containers both "solved" the dependency distribution problem as well as the resource allocation problem sort of at once.
Sure, they definitely were using Docker for their own applications, but dotCloud was also itself a PaaS, so they were trying to compete with Heroku and similar offerings, which had buildpacks.
The problem is/was that buildpacks aren't as flexible and only work if the buildpack exists for your language/runtime/stack.
Pretty much this; systems with coherent isolated dependency management, like Java, never required OS-level container solutions.
They did have what you could call userspace container management via application servers, though.
NodeJS, Ruby, etc. also have this problem, as does Go with CGO. So the problem is binary dependencies on C/C++ code and make, configure, autotools, etc. The whole C/C++ compilation story is such a mess that, almost five decades on, inventing containers was pretty much the only sane way of tackling it.
Java at least uses binary dependencies very rarely, and they usually have the decency of bundling the compiled dependencies... But it seems Java and Go just saw the writing on the wall and mostly reimplement everything. I did have problems with the Snappy compression in the Kafka libraries, though, for instance.
The issue is with cross-platform package management that lacks proper hooks for the platforms themselves. That may be OK if the library is pure, but as soon as you have bindings to another ecosystem (C/C++ in most cases), it should be user-configurable instead of the provider doing the configuration with post-install scripts and other hacky stuff.
If you look at most projects in the C world, they only provide the list of dependencies and some build config (Makefile/Meson/CMake/...). But the latter is more of a sample, and if your platform is not common or differs from the developer's, you have the option of modifying it (which is what most distros and ports systems do).
But good luck doing that with the sprawling tree of modern package managers, where there are multiple copies of the same library inside the same project, just because.
Exactly this, but not just Python. The traditional way most Linux apps work is that they are splayed over your filesystem with hard-coded references to absolute paths, and they expect you to provide all of their dependencies for them.
Basically, the Linux world was actively designed to make apps difficult to distribute.
> Basically, the Linux world was actively designed to make apps difficult to distribute.
It has "too many experts", meaning that everyone has enough decision-making power to force their own tiny variations into existing tools. So you end up needing 5+ different Python versions spread all over the filesystem just to run basic programs.
The author suggests that Docker doesn't help development and that devs just spin up databases, but I have to disagree with that, and I'm pretty sure I am not the only one.
All my projects (primarily web apps) use Docker Compose, which configures multiple containers (PHP/Python/Node runtime, nginx server, database, scheduler, etc.) and runs as a dev environment on my machine. The source code is mounted as a volume. This same compose file is then also used for the deployment to the production server (with minor changes that remove debug settings, for example).
This approach has worked well for me as a solo dev creating web apps for my clients.
It has also enabled extreme flexibility in the stacks that I use; I can switch dev environments easily and quickly.
> I was always surprised someone didn't invent a tool for ftping to your container and updating the PHP
We thought of it, and were thankful that it was not obvious to our bosses, because lord forbid they would make it standard process and we would be right back where we started, with long lived images and filesystem changes, and hacks, and managing containers like pets.
My take: containers forced developers to declare various aspects of the application in a standardized, opinionated way:
- Persistent state? Must declare a volume.
- IO with external services? Must declare the ports (and maybe addresses).
- Configurable parameters? Must declare some env variables.
- Transitive dependencies? Must declare them, but using a mechanism of your choosing (e.g. via the package manager of your base image distro).
Separation of state (as in persistency) and application (as in binaries, assets) makes updates easy. Backups also.
Having almost all IO visible and explicit simplifies operation and integration.
And a single, (too?!?) simple config mechanism increases reusability by enabling, e.g., lightweight tailoring of generic application service containers (such as mariadb).
Together, this bunch of forced yet leaky abstractions is just good enough to foster immense reuse and composability onto a plethora of applications, all while allowing us to treat them almost entirely like black boxes. IMHO that is why OCI containers became this big, compared to other virtualization and (application-) container technologies.
It happened because the story of dependencies (system and application) was terrible. Running the app on a different distribution/kernel/compiler/etc. was hard. There were different solutions like Vagrant, but they were heavy and the DX wasn't there.
Containers happened because nobody can be bothered to build an entire application into a single distributable executable anymore - heck even the tooling barely exists anymore. But instead of solving problems like dependency management and linking, today's engineers simply build piles of abstraction into the problem space until the thing you want to do more than anything (i.e. execute an application) becomes a single call.
Of course you now need to build and maintain those abstract towers, so more jobs for everybody!
Because dependencies on Unix are terrible for some languages that assume things are installed globally.
"The compute we are wasting is at least 10x cheaper, but we have automation to waste it at scale now."
So much this. keep it simple, stupid (muah)
Containers happened because running an ad network and search engine means serving a lot of traffic for as little cost as possible, and part of keeping the cost down is bin packing workloads onto homogeneous hardware as efficiently as possible.
https://en.wikipedia.org/wiki/Cgroups
(arguably FreeBSD jails and various mainframe operating systems preceded Linux containers but not by that name)
What does the 'ad network and search engine' have to do with it? Wouldn't any organization that serves lots of traffic have the same cost-cutting goals you mentioned?
It's an oblique way to say that Linux cgroups and namespaces were developed by Google.
Yes, to expand: Both search and ads mean serving immense amounts of traffic and users while earning tiny amounts of revenue per unit of each. The dominant mid-90s model of buying racks of Sun and NetApp gear, writing big checks to Oracle, etc, would have been too expensive for Google. Instead they made a big investment in Linux running on large quantities of commodity x86 PC hardware, and building software on top of that to get the most out of it. That means things like combining workloads with different profiles onto the same servers, and cgroups kind of falls out of that.
Other companies like Yahoo, WhatsApp, and Netflix also followed interesting patterns, leaning on a strong understanding of how to be efficient on cheap hardware. Notably, those three were all FreeBSD users, at least in their early days.
I love this sentence about DevOps: "Somehow it seems easier for people to relate to technology than culture, and the technology started working against the culture."
For me the main reason to use containers is "one-line install any linux distro userspace". So much simpler than installing a dozen VirtualBox boxes to test $APP on various versions of ubuntu, debian, nixos, arch, fedora, suse, centos etc.
Yeah nowadays we have the distrobox(1) command. Super useful. But certainly that's not why containers happened.
You can laugh or not, but it's because they never finished GNU/Hurd :D
Fascinating documentary on Kubernetes for those who have 50 minutes. Gives more background to the "Container Wars". The filmmakers also have documentaries on the history of Python, Argo, etc.
Some highlights:
- How far behind Kubernetes was at the time of launch. Docker Swarm was significantly simpler to use, and the Apache Mesos scheduler could already handle 10,000 nodes (and was being used by Netflix).
- RedHat's early contributions were key, despite having the semi-competing project of OpenShift.
- The decision to open source K8S came down to one brief meeting at Google. Many of the senior engineers attended remotely from Seattle, not bothering to fly out because they thought their request to go open source was going to get shut down.
- A brief part at the end where Kelsey Hightower talks about what he thinks might come after Kubernetes. He mentions, and I thought this was very interesting... serverless making a return. It really seemed like serverless would be "the thing" in 2016-2017, but containers were too powerful. Maybe now with Knative or some future fusing of container orchestration + K8s?
[1] - https://youtu.be/BE77h7dmoQU
I feel that's going to be more interesting than this video. The speaker is very unpracticed.
FreeBSD added jails years ago based upon a user request:
>hosting provider's ... desire to establish a clean, clear-cut separation between their own services and those of their customers
https://en.wikipedia.org/wiki/FreeBSD_jail
My guess is Linux had been getting requests from various orgs for a while, so in true Linux fashion, we got a few different container-type methods years later.
I still think jails are the best of the bunch, but they can be a bit hard to set up. Once set up, jails work great.
So here we are :)
Likely because Plan9's 'everything-is-a-filesystem' failed.
The standard answer is, "because inventing and implementing them was easier than fixing Python packaging."
I think "fixing distro packaging" is more apropos.
In a past life, I remember having to juggle third-party repositories in order to get very specific versions of various services, which resulted in more than a few instances of hair-pull-inducing untangling of dependency weirdness.
This might be controversial, but I personally think that distro repos being the assumed first resort of software distribution on Linux has done untold amounts of damage to the software ecosystem on Linux. Containers, alongside Flatpak and Steam, are thankfully undoing the damage.
> This might be controversial, but I personally think that distro repos being the assumed first resort of software distribution on Linux has done untold amounts of damage to the software ecosystem on Linux.
Hard agree. After getting used to "system updates are... system updates; user software that's not part of the base system is managed by a separate package manager from system updates, doesn't need root, and approximately never breaks the base system (to include the graphical environment); development/project dependencies are not and should not be managed by either of those but through project-specific means" on macOS, the standard Linux "one package manager does everything" approach feels simply wrong.
> development/project dependencies are not and should not be managed by either of those but through project-specific means" on macOS, the standard Linux "one package manager does everything" approach feels simply wrong.
This predates macOS. The mainframe folks did this separation eons ago (see IBM VM/CMS).
On Unix, it's mostly the result of getting rid of your sysadmins who actually had a clue. Even in Unix-land in the Bad Old Days(tm), we used to have "/usr/local" for a reason. You didn't want the system updating your Perl version and bringing everything to a screeching halt; you used the version of Perl in /usr/local that was under your control.
I wonder if it can be traced back to something RedHat did somewhere, because it may have all begun once you COULDN'T be absolutely certain that anything even remotely "enterprise" was running on a RedHat.
I think it's a natural outgrowth of what Linux is.
Linux is just a kernel - you need to ship your own userland with it. Therefore, early distros had to assemble an entire OS around this newfangled kernel from bits and pieces, and those bits and pieces needed a way to be installed and removed at will. Eventually this installation mechanism got scope creep, and suddenly things like FreeCiv and XBill are distributed using the same underlying system that bash and cron use.
This system of distro packaging might be good as a selling point for a distro - so people can brag that their distro comes with 10,000 packages or whatever. That said, I can think of no other operating system out there where the happiest path for releasing software is to simply release a tarball of the source, hope a distro maintainer packages it for you, hope they do it properly, and hope that nobody runs into a bug due to a newer or older version of a dependency you didn't test against.
Yours is a philosophy I encounter more and more: there should be one unified platform, ideally fast-moving, where software is only tested against $latest. Stability is a thing of the past. The important thing is more features.
Instead of designing a solution and perfecting it over time, it's endless tweaking, with a new redesign every year. And you're supposed to use the exact same computer as the dev to get their code to work.
Ngl this is why I started using them
It never grew popular, perhaps. But I'm not sure it failed, and I'm not sure how much the Venn diagram of Plan9's concerns really overlaps with containers.
Yes, Plan9 had the idea of creating bespoke filesystems and custom mount structures for apps, and containers do something semi-parallel to that. But container images as read-only overlays (with a final read-write top overlay) feel like a very narrow craft. Plan9 had a lot more to it (everything as a file), and containers have a lot more to them (process, user, and net namespaces; container images as pre-assembled layers).
I can see some shared territory, but these concerns feel mostly orthogonal. I could easily imagine a Plan9-like entity arising amid the containerized world: these aren't really in tension with each other. There's also a decade-and-a-half-plus gap between Plan9's heyday and the rise of containers.
Original sin.
Because Linux devs generally suck at making portable packages that are easy to install.
I loved the assertion that AI ate up all the budget and that K8s is now "boring" technology. That's fine because it was getting pretty annoying with all the clone competitors for practically everything that were popping up every month!
Do you use K8s? No! That's old! I use Thrumba! It's just a clone of K8s by some startup because people figured out that the easiest way to make gobs of money is/was to build platform products and then get people to use them.
I mean, containers do lend themselves to cargo culting by their very nature.