Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Tue, 18 Oct 2022 16:51:56 +0000
From: Pierre Boulay <Pierre.Boulay@...rosoft.com>
To: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
CC: Ben Hillis <Ben.Hillis@...rosoft.com>, Brian Perkins
	<Brian.Perkins@...rosoft.com>, Peter Martincic <pmartincic@...rosoft.com>
Subject: Potential deadlock when fork() is called while __aio_get_queue() is
 allocating memory

Hello,

I'm working on the Windows Subsystem Linux, which uses MUSL and I'm currently working on solving WSL complete freeze * Issue #8824 * microsoft/WSL (github.com)<https://github.com/microsoft/WSL/issues/8824>.

I believe that this freeze is caused by a bug causing a deadlock in MUSL.

If we look at this sample program:

#include <unistd.h>
#include <pthread.h>
#include <aio.h>
#include <fcntl.h>
#include <time.h>
#include <stdio.h>
#include <errno.h>

void* thread2(void* unused) // Thread 2
{
    unsigned char buf[256] = {0};
    struct aiocb aiocb = {};
    aiocb.aio_fildes = open("/proc/self/cmdline", O_RDONLY, S_IRUSR | S_IWUSR);
    aiocb.aio_buf = buf;
    aiocb.aio_nbytes = sizeof(buf) - 1;
    aiocb.aio_lio_opcode = LIO_READ;

    if (aio_read(&aiocb) < 0)
    {
        printf("aio_read failed, %i\n", errno);
        return NULL;
    }

    int error = -1;
    while(error = aio_error(&aiocb) == EINPROGRESS)
    {
        printf("In progress...\n");
        sleep(1);
    }

    if (error != 0)
    {
        printf("aio_error: %i\n", error);
    }

    printf("aio result: %s\n", buf);
    return NULL;
}

int main()
{
    pthread_t thread;
    pthread_create(&thread, NULL, &thread2, NULL);

    if (fork() > 0) // Thread 1
    {
        pthread_join(thread, NULL);
        printf("aio complete");
    }

}

Here we have two threads. The main thread is scheduling an asynchronous io and the other is calling fork().
If we look at what the main thread is doing:


  *   After calling pthread_create, the fork() call goes here<https://git.musl-libc.org/cgit/musl/tree/src/process/fork.c#n47>, in fork.c.
  *   Before calling _Fork(), this thread will acquire the __malloc_lock (through __malloc_atfork()),
  *   Then _Fork()<https://git.musl-libc.org/cgit/musl/tree/src/process/_Fork.c#n12> is called, which calls __aio_atfork(), which waits for the maplock

In the meantime, the second thread will:


  *   Call aio_read(), which calls submit(),<https://git.musl-libc.org/cgit/musl/tree/src/aio/aio.c#n276> in aio.c
  *   submit() then calls __aio_get_queue(), which acquires the maplock
  *   At this point the map structure needs to be allocated so submit() calls calloc()
  *   calloc() calls malloc<https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/malloc.c#n299>(), in malloc.c
  *   If the allocation overflows, malloc() then calls wrlock(), which waits for the __malloc_lock

So to summarize: Thread1 holds __malloc_lock and waits for maplock, and thread2 holds maplock and waits for ___malloc_lock. We have a deadlock.
This is my understanding of the issue, but I have a (very) limited knowledge of MUSL's codebase, so let me know if it's correct.

If it is, I think this could be solved by  moving __aio_atfork() from _Fork() to fork(), before __malloc_atfork() so the locks are acquired in the correct order to prevent the deadlock.

I do see that _Fork() calls __block_all_sigs before calling __aio_atfork() so I wonder if the order of these calls is important though (let me know if that's the case).
If this solution makes sense to you, I'm happy to submit a contribution with the fix.

Thank you,

--
Pierre Boulay














Content of type "text/html" skipped

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.