Date: Fri, 27 Sep 2019 18:17:30 +0300
From: croco@...nwall.com
To: owl-users@...ts.openwall.com
Subject: x86 containers under x86_64 hardware node: threads don't work

Hi All,

I've got a kind of funny story, heh; but maybe there's a way to work
around my problem somehow, which would save me a week or so of my life.

I run an Owl server with several OpenVZ containers (with Owl inside them
as well).  Until fairly recently (about a year ago) both the HN and the
containers were 32-bit (even though the computer was 64-bit capable, I
kept the system that had been installed earlier on a 32-bit machine), but
when the HN died yet again due to a hardware failure, I replaced it with a
64-bit netbook and, since I failed to get 32-bit Owl to run on it, I had
to move to Owl x86_64.

However, as for the containers, I simply copied them from the old HN's HDD
and started them, so they remained 32-bit.  I almost immediately noticed
that _some_ programs failed to run inside these containers; namely, I had
to move my MySQL to the HN itself and let the websites running in the
containers use it, as MySQL wouldn't start.  Another notable thing was my
ftp site: xinetd wouldn't run either, so I moved the ftp to the HN, too.
All the other software I use, including Apache, runs with no problems,
and the whole setup has served me for a year or so.

Today I tried to install git inside one of the containers, and it refused
to work, complaining like this:

   error: cannot create thread: Resource temporarily unavailable

After some unsuccessful attempts to 'solve' this by changing the
container's config parameters, I started suspecting that the problem is
with threads as such, and to check, I tried running a small (really
small!) pthread-based demo program.  It failed to create any threads at
all; that is, pthread_create always returns that damn EAGAIN.
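
A minimal sketch of a test of this kind is given here for illustration; it
is not the exact program that was run, and the names worker and
try_one_thread are made up.  It creates a single thread with default
attributes and reports the error number that pthread_create returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Thread body: does nothing except record that it actually ran. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Try to create and join one thread with default attributes.
 * Returns 0 on success, or the error number reported by
 * pthread_create/pthread_join (EAGAIN in the failing case). */
static int try_one_thread(void)
{
    pthread_t tid;
    int ran = 0;
    int err;

    err = pthread_create(&tid, NULL, worker, &ran);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s (%d)\n", strerror(err), err);
        return err;
    }

    err = pthread_join(tid, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_join: %s (%d)\n", strerror(err), err);
        return err;
    }

    /* A thread that was joined successfully must have run its body. */
    assert(ran == 1);
    return 0;
}
```

Built with something like gcc -pthread, a main() that calls
try_one_thread() and exits non-zero on failure gives a quick yes/no check
that can be repeated in each container and on the HN.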

Looking at strace's output, I noticed that the problem definitely has
nothing to do with "temporary failures" or "insufficient resources".
When the main process clone()s the thread, the clone syscall succeeds,
and then the thread acts like this:

1003  old_mmap(0xbf600000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbf400000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbf200000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbf000000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbee00000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbec00000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbea00000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
1003  old_mmap(0xbe800000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x261000
1003  munmap(0x261000, 2097152)         = 0
[....]

Well, it even walks through the whole address space several times (I
didn't try to count how many times exactly), then it seems to get tired
and sends a signal to the parent, after which pthread_create actually
returns the failure.  As far as I can tell, the thread simply attempts to
allocate stack space for itself, but the kernel refuses to allocate it at
the address the thread wants.
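
The pattern in the trace (an address passed as a hint without MAP_FIXED,
and a mapping returned somewhere else) can be probed in isolation.  The
following is only a sketch under assumptions: probe_hint is a made-up
name, the hint and length in the usage note are taken from the first
line of the trace, and PROT_EXEC is left out so that the probe also runs
on systems that forbid writable and executable mappings.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Ask the kernel for an anonymous private mapping at `hint` without
 * MAP_FIXED, the way the trace shows, and compare the result.
 * Returns 1 if the mapping landed at the hint, 0 if the kernel placed
 * it elsewhere, -1 if mmap failed.  The mapping is always released. */
static int probe_hint(uintptr_t hint, size_t len)
{
    void *p;
    int honored;

    assert(len > 0);

    p = mmap((void *)hint, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    honored = ((uintptr_t)p == hint);
    printf("hint %p -> got %p (%s)\n", (void *)hint, p,
           honored ? "honored" : "placed elsewhere");

    munmap(p, len);
    return honored;
}
```

Calling probe_hint(0xbf600000u, 2097152) from a 32-bit binary inside a
container and again on a system where threads work would show whether
the hint itself is what gets ignored.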

So far, this is the only thing that doesn't work in x86 (32-bit) containers
under a 64-bit kernel.

Maybe someone has experienced something like this and knows a solution?
Well, the most obvious solution is simply to move to x86_64 containers
(yes, threads work there on the same HN, I checked, so the problem is
specific to 32-bit x86 containers), but this will take several days and
I've got better ideas on how to spend those days.  So, if this problem can
be worked around, I'd prefer to keep the containers I have now.


Thanks in advance!

--
Croco
