Date: Wed, 5 Feb 2020 19:17:05 +0900
From: Leesoo Ahn <yisooan@...olink.co.kr>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Memory leak issue in multi-threaded program

Dear Rich,

My coworker and I have been trying to solve this leak issue for our 
product, an embedded system based on OpenWRT on ARM64 that currently 
uses musl-1.1.16. However, we found that backporting the musl-1.1.24 
patch you referred to below into 1.1.16 is quite difficult: with the 
patch applied we get translation faults, and without it, our own 
attempt to serialize malloc/free with a mutex[1] runs into a 
double-locking problem (see the note after the diff below).

Not only 1.1.16 but also 1.1.24, which we tested as well, shows the 
same problems. So we currently feel stranded in the middle of the sea 
without any food; it is a big risk and very dangerous for our product.

We are considering keeping 1.1.16 as the base of our product, because 
even though a lot of bugs are fixed in 1.1.24, nobody can guarantee 
how our product will behave once we put 1.1.24 on it.

Could you please give us any ideas for fixing the issue in v1.1.16? 
Ah, we are in so much pain...

Or, what do you think about the approach of always asking the kernel 
for memory via the mmap syscall, in every process? Would that solve 
the issue, even though the performance would be bad? A rough sketch of 
what we mean is below.
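
To make the idea concrete, here is a minimal sketch of the kind of 
mmap-only allocation we have in mind. This is only an illustration, 
not musl code; the names mmap_only_alloc/mmap_only_free are made up, 
and the page size is assumed to be 4096 for simplicity.

#include <stddef.h>
#include <sys/mman.h>

/* Every allocation is served directly by mmap() and every free returns
 * the pages to the kernel with munmap(), so freed memory is never
 * retained by a user-space heap.  A 16-byte header stores the mapping
 * length so the free routine knows how much to unmap. */
void *mmap_only_alloc(size_t n)
{
	size_t len = (n + 16 + 4095) & ~(size_t)4095;
	void *p = mmap(0, len, PROT_READ|PROT_WRITE,
	               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) return 0;
	*(size_t *)p = len;              /* remember mapping length */
	return (char *)p + 16;           /* keep 16-byte alignment  */
}

void mmap_only_free(void *p)
{
	if (!p) return;
	char *base = (char *)p - 16;
	munmap(base, *(size_t *)base);
}

Of course this costs at least one syscall and one page per allocation, 
which is why we expect the performance to be bad.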

I hope we can solve this problem soon.

Best regards,
Leesoo

----
[1]
diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
index 9698259..f914cff 100644
--- a/src/malloc/malloc.c
+++ b/src/malloc/malloc.c
@@ -14,6 +14,10 @@
  #define inline inline __attribute__((always_inline))
  #endif

+#include <pthread.h>
+pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
  static struct {
  	volatile uint64_t binmap;
  	struct bin bins[64];
@@ -281,8 +285,25 @@ static void trim(struct chunk *self, size_t n)
  	__bin_chunk(split);
  }

+#if 1
+static void *__malloc(size_t n);
+
  void *malloc(size_t n)
  {
+	void *new_heap;
+
+	pthread_mutex_lock(&lock);
+	new_heap = __malloc(n);
+	pthread_mutex_unlock(&lock);
+
+	return new_heap;
+}
+
+static void *__malloc(size_t n)
+#else
+void *malloc(size_t n)
+#endif
+{
  	struct chunk *c;
  	int i, j;

@@ -516,8 +537,21 @@ static void unmap_chunk(struct chunk *self)
  	__munmap(base, len);
  }

+#if 1
+static void __free(void *p);
+
  void free(void *p)
  {
+	pthread_mutex_lock(&lock);
+	__free(p);
+	pthread_mutex_unlock(&lock);
+}
+
+static void __free(void *p)
+#else
+void free(void *p)
+#endif
+{
  	if (!p) return;

  	struct chunk *self = MEM_TO_CHUNK(p);
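
About the double-locking we mentioned above: we have not confirmed the 
exact cause, but if it is the same thread re-entering malloc/free while 
already holding the lock, one variant we are considering is a recursive 
mutex instead of the statically initialized default one. A rough sketch 
of the malloc side only (free would be wrapped the same way); this is 
our own untested idea, not anything from musl:

#include <pthread.h>
#include <stddef.h>

/* Renamed internal allocator from the diff above. */
static void *__malloc(size_t n);

static pthread_mutex_t lock;
static pthread_once_t lock_once = PTHREAD_ONCE_INIT;

static void lock_init(void)
{
	/* A recursive mutex lets a nested malloc/free call from the same
	 * thread re-lock instead of deadlocking. */
	pthread_mutexattr_t a;
	pthread_mutexattr_init(&a);
	pthread_mutexattr_settype(&a, PTHREAD_MUTEX_RECURSIVE);
	pthread_mutex_init(&lock, &a);
	pthread_mutexattr_destroy(&a);
}

void *malloc(size_t n)
{
	void *p;
	pthread_once(&lock_once, lock_init);
	pthread_mutex_lock(&lock);
	p = __malloc(n);
	pthread_mutex_unlock(&lock);
	return p;
}

An error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) would also at least 
turn the deadlock into a visible EDEADLK return, which may help us 
find where the re-entry happens.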


On 2020-01-28 10:29 PM, Rich Felker wrote:
> On Tue, Jan 28, 2020 at 02:44:07PM +0900, Leesoo Ahn wrote:
>> Dear musl developers,
>>
>> Hello! It seems that musl currently has a memory leak issue in
>> multi-threaded programs. It occurs in the situation below with the
>> latest (v1.1.24) source, on both 32-bit[1] and 64-bit[2] builds.
>>
>> When a program creates and runs at least two threads with the
>> pthread APIs, the program's VSZ as reported by ps keeps increasing.
>> But here is the weird thing: it is fine IF ONLY ONE pthread is
>> created and run.
>>
>> To confirm the issue in your host machine, please follow the instructions,
>>
>> 0. Clone the musl git and get inside.
>> 1. Build with these options for static build, ./configure
>> --prefix=$(pwd)/_build_dir --disable-shared
>> 2. Download the test code[3], then build with the command,
>> ../_build_dir/bin/musl-gcc ./test.c
>> 3. Run this script: ./a.out & while [ 1 ]; do { ps aux | grep
>> '[a].out' | grep -v grep; sleep 1; }; done
>>
>> You will see that the VSZ keeps increasing.
>>
>> BUT, when I make it allocate memory from the kernel via mmap all
>> the time with this diff[4] as a workaround, the issue never happens,
>> even when more than two pthreads are created.
>>
>> We would be really thankful if you could confirm it and find a way
>> to fix the bug.
> 
> This is a known issue described in:
> 
> https://www.openwall.com/lists/musl/2018/10/30/2
> 
> and likely several times before that, though it was not realized that
> people were hitting it in practice (vs it just being theoretical)
> until around that time. I posted an experimental mitigation patch last
> spring:
> 
> https://www.openwall.com/lists/musl/2019/04/12/4
> 
> but it's not heavily tested and its impact on performance is
> significant. I think it should be ok if you need an immediate fix, but
> you should do some testing to make sure. If you go this route, reports
> of any problems (or success) would be nice to hear about.
> 
> Further work in that direction was not done because it was already
> planned that musl's malloc implementation will be replaced, and that
> the replacement will solve this and other problems in much better
> ways. This is work in progress and is intended for merge in the next
> release cycle:
> 
> https://www.openwall.com/lists/musl/2019/10/22/3
> https://github.com/richfelker/mallocng-draft
> 
> Hope this information helps.
> 
> Rich

