Date: Wed, 20 Jun 2012 20:59:56 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bf-opencl

On Wed, Jun 20, 2012 at 05:44:50PM +0530, SAYANTAN DATTA wrote:
> But as far as I know IL codes are independent of ASICs. So it shouldn't
> matter whether its GCN or not.However ISA is ASIC dependent. So, did you
> mean ISA instead of IL?

I meant programming in IL or at least reading OpenCL-generated IL, but
having the specific ISA and device in mind. And ideally we need to be
skimming over and be able to understand the generated GCN code. For
example, for bcrypt, we need to know what execution units are actually
in use (e.g., are the scalar units that are normally used for control in
use for actual computation here?), whether scatter/gather addressing
from the SIMD units is in use or not, what memory types and regions are
actually in use. With pure OpenCL, we're kind of blind.

IL is mostly but not fully independent from the underlying ISA; some IL
instructions are documented to be specific to some GPU model ranges.
It's akin to use of intrinsics and OpenMP in C sources: the exact
instructions that are generated may vary (e.g., the same intrinsic may
produce SSE2 or AVX depending on compiler settings), we don't do
register allocation, and some intrinsics are specific to some CPU model
ranges. Yet we happen to have enough control to achieve decent speed
when we review and benchmark the generated code and adjust our source.

Continuing this analogy, OpenCL is akin to C sources without intrinsics
and OpenMP, but with enabled auto-vectorization and auto-parallelization.
I had poor luck achieving decent performance in this way.
Of course, OpenCL is more suitable for this than C, so better results
are achieved, yet specifying things more explicitly at the lower level
may help - especially when implementing things that don't fit the device
perfectly (e.g., OpenCL is fine for perfect match things like MD5, but
we may need something more explicit for poor matches like bcrypt on
GPU).

> I think two or more john builds collided on the same card resulting in
> crash after 15 min.

No, I did not run a second instance of John, and no one else logged in.

> Otherwise the implementation is perfectly stable at
> stocks(I tested using pw-fake-unix hashes for 28 minutes). And yes the
> card is unstable after 1225Mhz core.

OK. So we did not trigger the same problem again yet.

Alexander
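
As an illustration of the intrinsics analogy in the message above (this
is not code from the thread or from John the Ripper), here is a minimal
C sketch. It assumes an x86 compiler that defines __SSE2__ when SSE2 is
enabled; the function names xor_plain and xor_explicit are hypothetical.
The first function leaves vectorization entirely to the compiler, much
like pure OpenCL leaves the mapping to the device; the second requests
128-bit operations explicitly, although the exact instructions emitted
and the register allocation are still the compiler's choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Plain C: whether this loop gets vectorized is up to the compiler
 * and its settings; the source gives no direct control over it. */
void xor_plain(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* Intrinsics: four 32-bit words are processed per step explicitly.
 * The same source may still compile to SSE2 or to VEX-encoded (AVX)
 * instructions depending on compiler flags. */
void xor_explicit(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && a != NULL && b != NULL);
#ifdef __SSE2__
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }
#endif
    /* Scalar tail; handles everything when SSE2 is not available. */
    for (; i < n; i++)
        dst[i] = a[i] ^ b[i];
}
```

Both functions compute the same result; what differs is how much of the
mapping to hardware is stated in the source. Comparing the generated
assembly of the two (for example with the compiler's -S option) is the
CPU-side counterpart of reading the IL or ISA output of an OpenCL
kernel.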