bcrypt on FPGA

- Blowfish S-boxes fit Xilinx Block RAMs perfectly
- Can't reasonably use pipelining, but can improve resource usage by implementing multiple instances of bcrypt per core (not completed in the GSoC project)
- Low clock rate, thus high latency (compared to CPU)
- Reasonable throughput may still be achieved due to the large number of cores
- Estimate: an optimal implementation on Pico's M-501 (one Virtex-6 LX240T) could be ~5x faster than optimal code on a quad-core CPU (without AVX2)
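The S-box and throughput points above can be checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes a 36 Kbit Virtex-6 block RAM, counts only Blowfish block encryptions (key XORs and control overhead are ignored), and uses made-up helper names. The core count, clock rate and cycles per round in the usage part are placeholders, not measurements from the project.

```python
# Sizes taken from the Blowfish design: four 256-entry S-boxes and an
# 18-entry P-array, all made of 32-bit words.
SBOX_COUNT = 4
SBOX_ENTRIES = 256
P_ENTRIES = 18
WORD_BITS = 32

# Assumption: one Virtex-6 block RAM holds 36 Kbit.
BRAM_BITS = 36 * 1024


def sbox_bits() -> int:
    """Bits needed for the four S-boxes of one bcrypt instance."""
    return SBOX_COUNT * SBOX_ENTRIES * WORD_BITS


def state_bits() -> int:
    """Bits needed for the S-boxes plus the P-array."""
    return sbox_bits() + P_ENTRIES * WORD_BITS


def blowfish_encryptions(cost: int) -> int:
    """Blowfish block encryptions in one bcrypt hash at the given cost."""
    # One key expansion rewrites every P and S word, two words per encryption.
    per_expand = (P_ENTRIES + SBOX_COUNT * SBOX_ENTRIES) // 2
    # One salted expansion, then 2**cost iterations of two expansions each.
    expands = 1 + 2 * (1 << cost)
    # Final stage: three 64-bit blocks encrypted 64 times.
    final = 64 * 3
    return per_expand * expands + final


def hashes_per_second(cores: int, clock_hz: float,
                      cycles_per_round: float, cost: int) -> float:
    """Rough throughput model: every core works on its own hash."""
    rounds = 16 * blowfish_encryptions(cost)  # 16 Feistel rounds per block
    return cores * clock_hz / (rounds * cycles_per_round)


if __name__ == "__main__":
    print("S-box bits per instance:", sbox_bits())
    print("Fits in one block RAM:", state_bits() <= BRAM_BITS)
    # Placeholder parameters, chosen only to exercise the formula.
    print("Model c/s at cost 5:", hashes_per_second(100, 100e6, 1, 5))
```

Under these assumptions the four S-boxes take exactly 32 Kbit, which is why they map so neatly onto block RAM. The model also shows why core count matters more than clock rate here: throughput scales linearly with both, but each hash is a long serial chain of dependent rounds, so a single instance cannot be sped up by pipelining.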