Date: Tue, 26 Jun 2018 10:14:34 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com, John Mudd <johnbmudd@...il.com>
Subject: Re: ERROR: epoll_create1 failed: Function not implemented ?

On Tue, Jun 26, 2018 at 01:46:15AM +0200, Szabolcs Nagy wrote:
> * John Mudd <johnbmudd@...il.com> [2018-06-25 16:49:36 -0400]:
> > I build a dynamically linked version of Postgres using musl. It's been
> > working well for years. I just built a new version and I'm getting the
> > following Postgres error on some machines. Any suggestions?
> > 
> >     ERROR:  epoll_create1 failed: Function not implemented
> > 
> 
> try to run it with strace to see how epoll_create1 is called
> 
> > I build on 32-bit Linux Mint 18.3 Sylvia with 4.13.0-39-generic kernel.
> > 
> > It runs on some machines such as 64-bit Ubuntu with 4.4.0-121-generic
> > kernel. But fails on CentOS release 5.4 (Final) with 2.6.18-416.el5 #1 SMP
> > kernel.
> > 
> > My previous musl builds of Postgres run on all of my machines.

Linux 2.6.18 did not have the SYS_epoll_create1 syscall; it was added
in 2.6.27 (according to man 2 syscalls), around the time the rest of
the O_CLOEXEC-family interfaces were added. I suspect the new version
of Postgres you updated to is (correctly) passing the EPOLL_CLOEXEC
flag to make opening the epoll fd safe against fd leak races, and
there is fundamentally (well, without horrible hacks) no way to
emulate this on old kernels that lack the functionality.

For some other interfaces we emulate the functionality non-atomically
with fcntl after the open, but this isn't really a good solution.

Really you should update the kernel to something capable of dealing
safely with fd-leak races. For correct behavior of many interfaces,
musl needs a minimum kernel version of around 2.6.28; behavior with
earlier versions will be best-effort.

If you really can't upgrade the kernel, consider patching Postgres to
remove the EPOLL_CLOEXEC flag (pass 0 for the flags argument) and
possibly adding a fcntl call (F_SETFD with FD_CLOEXEC) to set the
close-on-exec flag after epoll_create[1] succeeds. Or you can see if
there's an option to build without epoll at all, using the standard
poll instead, which does not use an fd and is not affected by this
issue.
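
A rough sketch of what such a patch could look like, in case it helps.
The helper name epoll_create_cloexec_compat is made up for this
example and is not taken from the Postgres source; where the call
actually lives in Postgres is something you would need to look up.
The fallback path is not atomic, so another thread that forks and
execs between epoll_create and fcntl can still leak the fd.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not part of Postgres or musl. */
static int epoll_create_cloexec_compat(void)
{
    /* Preferred path: atomic close-on-exec (needs Linux >= 2.6.27). */
    int fd = epoll_create1(EPOLL_CLOEXEC);
    if (fd >= 0 || errno != ENOSYS)
        return fd;

    /* Old kernel: plain epoll_create; the size hint only has to be > 0. */
    fd = epoll_create(1);
    if (fd < 0)
        return -1;

    /* Non-atomic emulation: set FD_CLOEXEC after the fact. */
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

On a kernel that has epoll_create1 this never reaches the fallback, so
the same binary keeps the race-free behavior wherever it is available.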

Rich
