Date: Sat, 17 Dec 2011 03:13:40 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: 1.7.9's --external + OpenMP fails on Cygwin

On Sat, Dec 17, 2011 at 01:46:48AM +0400, Solar Designer wrote:
> src/winsup/cygwin/thread.cc:
> 
> int
> pthread_mutex::init (pthread_mutex_t *mutex,
>                      const pthread_mutexattr_t *attr,
>                      const pthread_mutex_t initializer)
> {
>   if (attr && !pthread_mutexattr::is_good_object (attr))
>     return EINVAL;
> 
>   mutex_initialization_lock.lock ();
>   if (initializer == NULL || pthread_mutex::is_initializer (mutex))
> 
> Notice how the not yet initialized mutex is checked with
> "pthread_mutex::is_initializer (mutex)".  And yes, it catches faults:
...

This was close, but not quite it.  The same approach is used in other
parts of the Cygwin threads code, including in:

int
semaphore::init (sem_t *sem, int pshared, unsigned int value)
{
  /*
     We can't tell the difference between reinitialising an
     existing semaphore and initialising a semaphore whose
     contents happen to be a valid pointer
   */
  if (is_good_object (sem))
    {
      paranoid_printf ("potential attempt to reinitialise a semaphore");
    }

where:

inline bool
semaphore::is_good_object (sem_t const * sem)
{
  if (verifyable_object_isvalid (sem, SEM_MAGIC) != VALID_OBJECT)
    return false;
  return true;
}

While paranoid_printf() is probably not reached, a fault often is:
is_good_object() dereferences whatever garbage pointer the
not-yet-initialized semaphore happens to contain.  And apparently
there's something wrong with Cygwin's handling of that fault.
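Here is a minimal sketch (not the actual Cygwin source) of why this check
can fault: Cygwin's sem_t is a pointer, so before sem_init() it holds
stack or heap garbage, and the magic-number check reads through that
garbage.  The magic value and its offset of 4 match the
"cmpl $0xdf0df04c,0x4(%eax)" seen in the objdump below; the header field
before the magic is an assumption.

```c
#include <stdint.h>

#define SEM_MAGIC 0xdf0df04cu  /* value seen in the objdump below */

/* Assumed layout: some 4-byte header, then the magic at offset 4 */
struct verifyable_object_sketch {
    uint32_t header;
    uint32_t magic;
};

/* sem is the address of the user's sem_t; *sem is the pointer it
   contains, which is garbage before sem_init().  Dereferencing it to
   read the magic may touch an unmapped address and fault. */
int is_good_object_sketch(struct verifyable_object_sketch *const *sem)
{
    struct verifyable_object_sketch *obj = *sem;  /* may be garbage */
    return obj->magic == SEM_MAGIC;  /* potential fault here */
}
```

On a not-yet-initialized semaphore this read is exactly the
"mov (%esi),%eax; cmpl $0xdf0df04c,0x4(%eax)" pair patched out below.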

Since this stuff is not needed, I binary-patched it out of my copy of
cygwin1.dll.  As seen with "objdump -d" and "diff -u":

 610ecff6:      e8 75 d4 06 00          call   6115a470 <__Z11__set_errnoPKcii>
 610ecffb:      b8 ff ff ff ff          mov    $0xffffffff,%eax
 610ed000:      eb 42                   jmp    610ed044 <__ZN9semaphore4initEPPS_ij+0x194>
-610ed002:      8b 06                   mov    (%esi),%eax
-610ed004:      81 78 04 4c f0 0d df    cmpl   $0xdf0df04c,0x4(%eax)
+610ed002:      33 c0                   xor    %eax,%eax
+610ed004:      40                      inc    %eax
+610ed005:      90                      nop
+610ed006:      90                      nop
+610ed007:      90                      nop
+610ed008:      90                      nop
+610ed009:      90                      nop
+610ed00a:      90                      nop
 610ed00b:      0f 85 0f ff ff ff       jne    610ecf20 <__ZN9semaphore4initEPPS_ij+0x70>
 610ed011:      8b 95 14 ff ff ff       mov    -0xec(%ebp),%edx
 610ed017:      64 a1 04 00 00 00       mov    %fs:0x4,%eax
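To spell out the patch (my reading of the encodings above, with
is_good_object() apparently inlined into semaphore::init()): the load and
magic compare are replaced by "xor %eax,%eax; inc %eax" plus nops.  INC
leaves ZF clear for a nonzero result, so the following jne is always
taken, the semaphore is always treated as not an existing object, and the
garbage pointer is never dereferenced.  In C terms, roughly:

```c
/* Sketch of the effective post-patch semantics (an interpretation, not
   Cygwin source): the check unconditionally reports "not a good
   object", so init proceeds on the fresh-initialization path. */
int is_good_object_patched(const void *sem)
{
    (void)sem;  /* the pointer is no longer read at all */
    return 0;   /* always "not an existing semaphore object" */
}
```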

After this change, the problem went away.

Another workaround that worked was to add:

free(calloc(1, 0x100));
free(calloc(1, 0x1000));
free(calloc(1, 0x10000));
free(calloc(1, 0x100000));

right before one of the parallel regions where the problem was otherwise
triggered, but this obviously has a performance impact.

Alexander
