In my homelab I use LXD/LXC quite a bit. Linux containers provide a pretty decent alternative to full-blown virtualization, with smaller overhead and some extra perks like easier access to the local filesystem. All things considered, I’ve been happy with LXD for many years.
However, there are places where the illusion of completely separate virtual machines shows some cracks when you look closely enough. This is a story of one such crack that was haunting me for months.
After years of using LXD, one of the containers started having what on the surface looked like memory issues. The container itself had an 8GB memory limit, and the workload running on it needed around 5GB of RAM. It should run within these constraints just fine - and in fact it did for many years. But then one day…
A couple of weeks after the last reboot, the service starts to misbehave in a weird way. The web interface is super slow to load, and eventually just won’t load at all. This is a non-critical service in my homelab, and by the time I get to investigate, I can’t even ssh in.
Fortunately this is where the convenience of containers helps, so I can still lxc exec in and have a look. Still, I can hardly run anything. The system behaves as if it’s absolutely out of memory and sure enough, free -m reports 0MB free, yet over 4GB of RAM as available, most of it in buff/cache according to the utility.
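For context, this is roughly the shape of output I mean - the numbers below are illustrative, not the exact ones from that night:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            8192        3500           0        1800        4692        4300
Swap:              0           0           0

(The shared column is the one that will turn out to matter later in this story.)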
Now if you try to search online why there’s low free memory on Linux, the most likely answers you’re going to find are that "this is normal", "it’s expected behavior of Linux" and "free RAM is wasted RAM". I know that, we all know that. It is such a common question from Linux newbies that there is an actual webpage dedicated to this problem. So when I try to search why free actually is zero, my chances of finding anything relevant are drowned in a sea of "this is how Linux memory management works" answers. Understandably so, that would be my first reaction as well.
Eventually the system is so starved of memory that I fail to do anything reasonable within the container shell, and I decide to just restart the container. The problem disappears with the reboot and all is well with the world, up until a couple of weeks later.
Over the next few months this repeats about once every 5-ish weeks. I’m super busy at the time, so I just reboot the container to quickly restore the service and promise to have a look later. Which I avoid for a while.
Finally there’s a free evening and I still feel fresh enough to do some investigation, so I log into the container and have a look. This time the service is still running just fine and the system is not running out of memory at all, so I wonder if there will be anything to find. Free memory is about where I’d expect it (a couple hundred MB), so I peek at available, sitting somewhere in the 3GB area. Again, nothing unusual.
But now I’m in discovery mode. free -m probably won’t tell me much more, so I turn to the contents of /proc/meminfo. There’s a bunch of metrics there, so I open the documentation to confirm what each one means. Some are pretty obvious, some can be a bit tricky.
One particular sentence caught my eye:
Shmem: Total memory used by shared memory (shmem) and tmpfs
🤔 Hmmm..
It sure looks like that’s the case! df tells me there’s a tmpfs mounted on /run with almost 2GB of space used (at the time of investigation). This memory is then reported in the buff/cache column and also as available. Which sounds wrong, but it actually makes sense - this memory can be dropped to swap if needed. So even if it’s not really a "buffer", it can sort of be freed like one. The thing is, you need to have swap, which I don’t. (for reasons)
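If you want to check this connection directly, the Shmem line in /proc/meminfo is the counter that includes tmpfs. The value below is only an illustration of the roughly 2GB ballpark I was seeing:

$ grep Shmem: /proc/meminfo
Shmem:           1843200 kB

That number tracking the used column of df on the tmpfs mounts is a pretty good hint you’re looking at the same memory.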
So now I know: it’s tmpfs eating my memory, unable to free it up when needed.
That explains why memory management behaved the way it did, but why is it a problem now and not over the last couple of years? I can very confidently rule out the application, knowing it does not use /run at all. Something else must have changed.
A quick inspection with du points to the journal using up all this tmpfs space with its logs. I can already hear the systemd hate pitchforks getting ready as I’m typing this, but surely the journald folks can’t be this reckless and just use all the RAM? And of course they aren’t. In fact the internal logic is pretty sane and defaults to using up to 10% of the filesystem for logs, while making sure 15% stays free, with a hard limit of 4GB.
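For reference, those 10% / 15% / 4GB defaults are what you get when the Runtime* options in journald.conf are left unset, and journalctl can report how much the journal currently occupies. Both snippets below are illustrative - the 1.8G figure and the 200M cap are example values, not something this story prescribes:

$ journalctl --disk-usage
Archived and active journals take up 1.8G in the file system.

# /etc/systemd/journald.conf - pin the volatile (/run) journal to an explicit cap
[Journal]
RuntimeMaxUse=200M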
Just as I was looking at this, journald was already using almost 2GB of RAM - way above the 10% I’d expect (which with 8GB of RAM should be around 800MB). And that is even before the service is impacted; looking at the graphs, journald was peaking at 4GB of space used.
Huh?
Well, the explanation for that ends up being quite simple, if maybe a bit unexpected. When mounting a tmpfs, an optional size flag can be provided, which sets the desired size for the mount. The default is "half of your physical RAM without swap". The somewhat surprising (although very logical) fact is that this default does not in any way reflect the amount of memory available to a specific container. So on my server with 96GB RAM, the /run is created with 48GB of capacity:
$ df -h /run/
Filesystem Size Used Avail Use% Mounted on
tmpfs 48G 1.8G 46G 3% /run
💡 Aha!
So when journald starts and looks at /run, it sees 48GB available. 10% of that is just under 5GB, so the hard maximum of 4GB is used instead. Looking back at the graphs from when the service was impacted, journald really was near the 4GB limit - almost half of the memory occupied by logs, leaving the rest to the system and apps, which is just not enough.
At no stage is the container using more memory than allocated. Even if we tried to fill up the whole /run with files, Linux wouldn’t let the container use more than the 8GB allocated and tmpfs would simply fail writes with "file system full".
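For completeness, the size flag from earlier is an ordinary tmpfs mount option and can be changed on a live mount. A minimal sketch - the 1G figure is an arbitrary example, not a recommendation and not what this setup ended up using:

$ mount -o remount,size=1G /run
$ df -h /run

tmpfs picks up the new cap immediately on remount; making it persistent depends on how /run gets mounted on your particular distro.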
So now that we’ve come all the way from a failing web interface down to the default size in tmpfs, there’s just one question left unanswered: why did the LXD container run happily for years and only start failing a couple of months back?
Well, you see, I managed to buy some cheap memory a couple of months back (it’s an older DDR3 system) and bumped the server RAM from 48GB to 96GB. This in turn changed the tmpfs default from 24GB to 48GB, which bumped the journald limit from about 2.4GB to 4GB. And the 1.6GB difference was juuuust enough to no longer fit together with the application.
So here’s the punchline: my container ran out of memory because I added more RAM to the server.