CPUs run so fast these days that they're essentially instant computation surrounded by bottlenecks. The secret to optimization today has
almost nothing to do with assembly code, for two reasons: one is sheer speed, the other is that code from a good compiler is only slightly slower than
perfect hand-written assembly. The real trick is keeping the data you work on at any moment on the order of cache size, so the hardware (cache controllers and prefetchers) can copy more from system RAM to
cache, and from one cache level to the next, while the processor is chugging away, so it spends as little time as possible waiting for the next data to arrive.
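One common way to act on this is loop tiling (also called blocking): instead of sweeping across a whole large array, you process it in small square blocks sized so that the block being read and the block being written both fit in cache. Below is a minimal sketch in C using a matrix transpose, where the naive version writes down columns and keeps touching new cache lines. The function names and the tile size of 32 are illustrative assumptions, not anything from the post; 32 x 32 doubles is 8 KiB per tile, so a source tile plus a destination tile (16 KiB) should fit in a typical L1 data cache, but the right value depends on the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge: 32 x 32 doubles = 8 KiB per tile.
   Tune for the actual cache sizes of the target machine. */
#define TILE 32

/* Straightforward transpose: reads src row by row, but writes dst
   column by column, so for large n nearly every write lands on a
   different cache line. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Tiled (blocked) transpose: same result, but handles one
   tile x tile block at a time, so the source and destination lines
   it touches can stay in cache until the block is finished.
   Edge blocks are clipped when n is not a multiple of tile. */
void transpose_tiled(const double *src, double *dst, size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = (ii + tile < n) ? ii + tile : n;
        for (size_t jj = 0; jj < n; jj += tile) {
            size_t j_end = (jj + tile < n) ? jj + tile : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce identical output; the only difference is the order in which memory is touched. Timing them on a matrix much larger than cache (say n = 4096) is an easy way to see the effect on your own hardware, though the size of the gap will vary from machine to machine.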
Tim