Why was DES slow in software?

- Each S-box uses only 6 bits and produces 4 bits
- Typical CPUs have much wider word sizes (16- to 64-bit, then even wider SIMD)
- Possible optimizations (late 1980s)
  - Spread the 6 input and 4 output bits across words of up to 64 bits to save on other overhead (E and P lookups, shifts to produce array indices)
  - Do two S-box lookups at once (12-to-8)
  - It is usually not practical to go further (combined tables become too large for fast access)
- Wasteful even with the above optimizations

This diagram illustrating one round of DES has been released into the public domain by its author, Matt Crypto
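
The "two S-box lookups at once (12-to-8)" idea can be illustrated with a minimal sketch. Everything named here (`build_pair_table`, `SBOX_A`, `SBOX_B`, `PAIR`) is hypothetical, and the two tables are arbitrary stand-ins, not the real DES S-boxes. The sketch only shows how two 6-to-4 tables merge into one 12-to-8 table, and leaves out the E and P steps.

```python
def build_pair_table(sbox_a, sbox_b):
    """Merge two 6-bit-in, 4-bit-out tables into one 12-bit-in, 8-bit-out table."""
    assert len(sbox_a) == 64 and len(sbox_b) == 64
    table = bytearray(1 << 12)  # 4096 one-byte entries
    for hi in range(64):
        for lo in range(64):
            # Upper 6 index bits select from sbox_a, lower 6 from sbox_b;
            # the two 4-bit outputs are packed into a single byte.
            table[(hi << 6) | lo] = (sbox_a[hi] << 4) | sbox_b[lo]
    return table


# Stand-in tables for illustration only (NOT the real DES S-boxes).
SBOX_A = [(5 * i + 3) % 16 for i in range(64)]
SBOX_B = [(7 * i + 1) % 16 for i in range(64)]
PAIR = build_pair_table(SBOX_A, SBOX_B)


def lookup_separate(x12):
    """Two 6-to-4 lookups plus the shifts and masks needed to index them."""
    hi = (x12 >> 6) & 0x3F
    lo = x12 & 0x3F
    return (SBOX_A[hi] << 4) | SBOX_B[lo]


def lookup_combined(x12):
    """One 12-to-8 lookup that replaces the two lookups above."""
    return PAIR[x12 & 0xFFF]
```

The combined table has 2^12 = 4096 one-byte entries. Merging three S-boxes the same way would need 2^18 = 262144 entries of more than one byte each, which is consistent with the point that going further is usually not practical.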