1997: bitslice DES

  "A Fast New DES Implementation in Software", Eli Biham, 1997
    "This implementation is about five times faster than the fastest known DES implementation on a (64-bit) Alpha computer, and about three times faster than our new optimized DES implementation on 64-bit computers. [...] view the processor as a SIMD computer, i.e., as 64 parallel one-bit processors computing the same instruction."
    ~100 gates per S-box

  "Reducing the Gate Count of Bitslice DES", Matthew Kwan, 1998+
    S-box expressions released in 1998, technique presented in 1999, paper posted online many years later
    51 to 56 gates per S-box on average, depending on which gate types are available

  The gate count was further reduced in later years by:
    Marc Bevand (45.5, using Cell's "bit select" instruction)
    Dango-Chu (39.875, also using "bit select")
    Roman Rusakov (32.875 with "bit selects", 44.125 without)
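To make the "64 parallel one-bit processors" idea concrete, here is a minimal sketch in Python. It is an illustration only: the 3-input function is a made-up toy, not a DES S-box, and the helper names (`to_bitslice`, `bit_select`, `toy_gate_network`, `toy_reference`) are hypothetical. Python integers are masked to 64 bits to imitate one machine word. The `bit_select` helper emulates with three ordinary operations what a hardware "bit select" instruction would do in one, which is presumably why the gate counts quoted above differ with and without it.

```python
MASK = (1 << 64) - 1  # emulate a single 64-bit machine word


def to_bitslice(values, nbits):
    """Transpose up to 64 nbits-wide values into nbits 64-bit words.

    Word i holds bit i of every value; bit position (lane) j of each
    word belongs to values[j].
    """
    slices = [0] * nbits
    for lane, v in enumerate(values):
        for i in range(nbits):
            slices[i] |= ((v >> i) & 1) << lane
    return slices


def bit_select(a, b, c):
    """Per lane: take b where c is 1, otherwise a.

    Emulated here with XOR, AND, XOR (three operations); hardware with
    a bit select instruction would need only one.
    """
    return (a ^ ((a ^ b) & c)) & MASK


def toy_gate_network(x0, x1, x2):
    """Hypothetical 3-input function (NOT a DES S-box), written as gates.

    Each bitwise operation acts on all 64 lanes at once.
    """
    return bit_select(x0, x1, x2) ^ (x0 & x1)


def toy_reference(v):
    """The same toy function computed the ordinary way, one input at a time."""
    x0, x1, x2 = v & 1, (v >> 1) & 1, (v >> 2) & 1
    return (x1 if x2 else x0) ^ (x0 & x1)


# 64 independent 3-bit inputs, one per lane.
inputs = [(i * 5 + 3) % 8 for i in range(64)]
x0, x1, x2 = to_bitslice(inputs, 3)

# One pass through the gate network evaluates all 64 inputs.
out = toy_gate_network(x0, x1, x2)
results = [(out >> lane) & 1 for lane in range(64)]
```

Each entry of `results` matches `toy_reference` applied to the corresponding input, so the cost per block is the gate count divided by the word width. That is why reducing the number of gates per S-box translates directly into speed.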