Date: Sat, 17 Dec 2011 03:13:40 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: 1.7.9's --external + OpenMP fails on Cygwin

On Sat, Dec 17, 2011 at 01:46:48AM +0400, Solar Designer wrote:
> src/winsup/cygwin/thread.cc:
> 
> int
> pthread_mutex::init (pthread_mutex_t *mutex,
>                      const pthread_mutexattr_t *attr,
>                      const pthread_mutex_t initializer)
> {
>   if (attr && !pthread_mutexattr::is_good_object (attr))
>     return EINVAL;
> 
>   mutex_initialization_lock.lock ();
>   if (initializer == NULL || pthread_mutex::is_initializer (mutex))
> 
> Notice how the not yet initialized mutex is checked with
> "pthread_mutex::is_initializer (mutex)".  And yes, it catches faults:
...

This was close, but not quite it.  The same approach is used in other
parts of the Cygwin threads code, including in:

int
semaphore::init (sem_t *sem, int pshared, unsigned int value)
{
  /*
     We can't tell the difference between reinitialising an
     existing semaphore and initialising a semaphore whose
     contents happen to be a valid pointer
   */
  if (is_good_object (sem))
    {
      paranoid_printf ("potential attempt to reinitialise a semaphore");
    }

where:

inline bool
semaphore::is_good_object (sem_t const * sem)
{
  if (verifyable_object_isvalid (sem, SEM_MAGIC) != VALID_OBJECT)
    return false;
  return true;
}

While paranoid_printf() is probably not reached, a fault often is:
is_good_object() dereferences whatever garbage pointer the
not-yet-initialized semaphore happens to contain.  And apparently
there's something wrong with Cygwin's handling of that fault.
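Here is a minimal sketch (not the actual Cygwin source) of why this check
can fault: Cygwin's sem_t is a pointer, so before sem_init() it holds
stack or heap garbage, and the magic-number check reads through that
garbage.  The magic value and its offset of 4 match the
"cmpl $0xdf0df04c,0x4(%eax)" seen in the objdump below; the header field
before the magic is an assumption.

```c
#include <stdint.h>

#define SEM_MAGIC 0xdf0df04cu  /* value seen in the objdump below */

/* Assumed layout: some 4-byte header, then the magic at offset 4 */
struct verifyable_object_sketch {
    uint32_t header;
    uint32_t magic;
};

/* sem is the address of the user's sem_t; *sem is the pointer it
   contains, which is garbage before sem_init().  Dereferencing it to
   read the magic may touch an unmapped address and fault. */
int is_good_object_sketch(struct verifyable_object_sketch *const *sem)
{
    struct verifyable_object_sketch *obj = *sem;  /* may be garbage */
    return obj->magic == SEM_MAGIC;  /* potential fault here */
}
```

On a not-yet-initialized semaphore this read is exactly the
"mov (%esi),%eax; cmpl $0xdf0df04c,0x4(%eax)" pair patched out below.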

Since this stuff is not needed, I binary-patched it out of my copy of
cygwin1.dll.  As seen with "objdump -d" and "diff -u":

 610ecff6:      e8 75 d4 06 00          call   6115a470 <__Z11__set_errnoPKcii>
 610ecffb:      b8 ff ff ff ff          mov    $0xffffffff,%eax
 610ed000:      eb 42                   jmp    610ed044 <__ZN9semaphore4initEPPS_ij+0x194>
-610ed002:      8b 06                   mov    (%esi),%eax
-610ed004:      81 78 04 4c f0 0d df    cmpl   $0xdf0df04c,0x4(%eax)
+610ed002:      33 c0                   xor    %eax,%eax
+610ed004:      40                      inc    %eax
+610ed005:      90                      nop
+610ed006:      90                      nop
+610ed007:      90                      nop
+610ed008:      90                      nop
+610ed009:      90                      nop
+610ed00a:      90                      nop
 610ed00b:      0f 85 0f ff ff ff       jne    610ecf20 <__ZN9semaphore4initEPPS_ij+0x70>
 610ed011:      8b 95 14 ff ff ff       mov    -0xec(%ebp),%edx
 610ed017:      64 a1 04 00 00 00       mov    %fs:0x4,%eax
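To spell out the patch (my reading of the encodings above, with
is_good_object() apparently inlined into semaphore::init()): the load and
magic compare are replaced by "xor %eax,%eax; inc %eax" plus nops.  INC
leaves ZF clear for a nonzero result, so the following jne is always
taken, the semaphore is always treated as not an existing object, and the
garbage pointer is never dereferenced.  In C terms, roughly:

```c
/* Sketch of the effective post-patch semantics (an interpretation, not
   Cygwin source): the check unconditionally reports "not a good
   object", so init proceeds on the fresh-initialization path. */
int is_good_object_patched(const void *sem)
{
    (void)sem;  /* the pointer is no longer read at all */
    return 0;   /* always "not an existing semaphore object" */
}
```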

After this change, the problem went away.

Another workaround that worked was to add:

free(calloc(1, 0x100));
free(calloc(1, 0x1000));
free(calloc(1, 0x10000));
free(calloc(1, 0x100000));

right before one of the parallel regions where the problem was otherwise
triggered, but this obviously has a performance impact.

Alexander
