Date: Wed, 5 Aug 2020 21:45:26 -0800
From: Royce Williams <royce@...ho.org>
To: john-users@...ts.openwall.com
Subject: Re: sha512crypt-opencl / Self test failed (cmp_all(1))

On Wed, Aug 5, 2020 at 9:29 PM Albert Veli <albert.veli@...il.com> wrote:

> On Thu, Aug 6, 2020 at 1:39 AM Royce Williams <royce@...ho.org> wrote:
>
> > On Wed, Aug 5, 2020 at 11:07 AM Solar Designer <solar@...nwall.com> wrote:
> >
> > > On Wed, Aug 05, 2020 at 10:47:23AM -0800, Royce Williams wrote:
> > > > FWIW, I have recently been consistently getting the same self-test
> > > > errors on sha512crypt-ztex.
> > >
> > > You can try troubleshooting your setup.  You can also try lowering the
> > > clock rate.
> > >
> > > Out of the 5 bitstreams, the sha512crypt+Drupal7 one consumes the most
> > > power (and produces the most heat) when in active use.  It's about 44W
> > > per board, vs. e.g. 27W for bcrypt.  (As measured at the 12V input.)  So
> > > if e.g. some of your cooling fans failed, that may show up on
> > > sha512crypt and Drupal7 first.
> >
> > I had no idea whether this issue was fork-related or OpenCL-related. For
> > all I knew, the issue might only emerge when forking reaches a certain
> > device count, which would explain why I'm seeing it on ZTEX when others
> > might not.
> >
>
> I did not see the self test error on ZTEX. But I saw some other errors
> on my setup, Aleksey saw them too on his setup. Something like this:
>
> SN 04A36E226F FPGA #2 error: pkt_comm_status=0x01, debug=0x0000
> SN 04A36E226F error -1 doing r/w of FPGAs (LIBUSB_ERROR_IO)
> SN 04A36E226F: Timeout.
>
> It happens after a while. Not every time but sometimes. It is usually
> enough to power off the boards and power them on again (I have connected
> the PSU to a Silver Shield power manager to do so remotely, a modbus I/O
> could also be used for this).
>

When this happened to me, I dropped the clock on the affected boards by
10 MHz or so until the errors stopped, using the "Frequency_[serial] = 999"
syntax in that particular algorithm's section.

If enough boards need a clock lower than the default, it's easier to just
lower the default and create exceptions for the remainder.
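
For illustration, the result might look something like the fragment below.
This is only a sketch: the section name assumes the sha512crypt bitstream,
the first serial is the one from Albert's log quoted above, the second
serial is made up, and the MHz values are placeholders, not recommendations.

```
[ZTEX:sha512crypt]
# Default clock for all boards running this bitstream (placeholder value).
Frequency = 200
# Per-board exception; serial taken from the error log quoted above.
Frequency_04A36E226F = 190
# Hypothetical second board that needs to run slower still.
Frequency_04A36E0000 = 180
```

Boards without a Frequency_[serial] line just use the section's default.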

If that doesn't work, you have other issues (flaky USB connector, flaky USB
cable, unstable power, etc.)

Royce
