Articles on Optimization
Last updated: 2023/02/01
Top deep-dives on Optimization
Israel Lot tells the story of five different developers (looks like no genius entry yet, maybe it'll be one of you?), at different levels of expertise, attempting to optimize a checksum method with varying degrees of success. Optimizing stuff like this is overkill in 99% of cases, but it's a fun exercise nonetheless.
- Pro unrolled loops
- Senior used unsafe
- Hacker converted a long to a short vector
- Expert reversed the endianness
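The article's code is C#, but the loop-unrolling idea translates directly to C. Here is a minimal sketch, not the article's actual code; the `checksum16_*` names and the 4x unroll factor are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* Naive 16-bit additive checksum: one add per iteration. */
uint16_t checksum16_naive(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return (uint16_t)(sum & 0xFFFF);
}

/* Unrolled 4x with independent accumulators: fewer loop-bound checks
 * and more instruction-level parallelism for the CPU to exploit. */
uint16_t checksum16_unrolled(const uint8_t *data, size_t len) {
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < len; i++)  /* handle the leftover tail */
        s0 += data[i];
    return (uint16_t)((s0 + s1 + s2 + s3) & 0xFFFF);
}
```

Both functions return identical results; the unrolled one simply gives the compiler and CPU more to work with per iteration.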
There are few things more frustrating than navigating to a webpage and having to wait over three seconds before you can do anything. "Alex, you're just a spoiled brat too used to fast internet," you say. No, dear reader, I'm just a professional who knows there are ways to solve this if you're not a lazy dev. Paweł Urbanek has written an in-depth article on the topic, not only providing a variety of methods for measuring load times, but also offering a number of specific tips for cutting them down.
Nima Badizadegan explains how CPU caches affect the performance of lookup tables of different sizes, then demonstrates it practically with examples.
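A minimal C sketch of the lookup-table pattern the article benchmarks; the byte-popcount example is illustrative, not Nima's actual code. The key point is table size: this 256-byte table fits comfortably in L1 cache, while tables keyed on wider values spill into L2/L3 or DRAM, and each level adds latency per lookup.

```c
#include <stdint.h>

/* 256-entry table mapping a byte to its popcount: 256 bytes total,
 * small enough to stay resident in L1 cache. */
static uint8_t popcount_table[256];

void init_popcount_table(void) {
    for (int i = 0; i < 256; i++) {
        int c = 0;
        for (int b = i; b; b >>= 1)
            c += b & 1;
        popcount_table[i] = (uint8_t)c;
    }
}

/* Popcount of a 32-bit word via four byte-wise table lookups. */
int popcount32(uint32_t x) {
    return popcount_table[x & 0xFF]
         + popcount_table[(x >> 8) & 0xFF]
         + popcount_table[(x >> 16) & 0xFF]
         + popcount_table[(x >> 24) & 0xFF];
}
```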
People have been obsessed with measuring time since its beginning, probably because we have so little of it! Philosophy aside, Brendan Gregg's article describes how he fixed a 30% increase in write latency that appeared in a Cassandra database cluster when switching from CentOS to Ubuntu, just by changing how time is measured.
Aaron Batilo shares how he increased the speed of docker builds by using a persistent cache.
- "builds went from about 3.5 minutes down to about 50 seconds in the cold case and about 15 seconds when a given container in my monorepo hadn’t changed"
- Uses Docker BuildKit with Kubernetes
- Includes setup for a GitHub Actions workflow
Simon Hørup Eskildsen explains how to keep optimizing a Go program once you've exhausted the easy wins that profiling surfaces.
Graeme Connell discusses how Signal updated its enclaves to use ORAM for better performance and obscurity.
Irfan Sharif uses control theory, studies CPU scheduler latencies, builds forms of cooperative scheduling, and patches the Go runtime to reduce the impact of CPU utilization on tail latencies.
- Tail latency is the small percentage of a system's responses, out of all the I/O requests it serves, that take the longest compared to the majority (e.g. the slowest 1% of requests, the p99)
- With CPUs, what looks like high utilization is sometimes actually moderate over-utilization
- The patch to the Go runtime tracks CPU use at the level of individual goroutines
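To make "tail latency" concrete, here is a minimal C sketch of computing a percentile such as the p99 from a sample of response times (the function names and the simple floor-based rank formula are illustrative, not from the article):

```c
#include <stdlib.h>

/* qsort comparator for doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Value at percentile p (0..100) of n samples: sort, then index at
 * floor(p/100 * n), clamped. Mutates the input array by sorting it. */
double percentile(double *samples, size_t n, double p) {
    qsort(samples, n, sizeof(double), cmp_double);
    size_t rank = (size_t)(p * (double)n / 100.0);
    if (rank >= n) rank = n - 1;
    return samples[rank];
}
```

The p99 of a latency distribution is what a "tail" optimization like Irfan's targets: the median can look healthy while the slowest 1% of requests dominate user-visible slowness.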
Simon Boehm discusses how to optimize a CUDA matrix multiplication kernel for performance.
- Simon begins with a naive kernel and then applies optimizations to improve performance
- The goal is to get within 80% of the performance of cuBLAS, NVIDIA's official matrix library
- Dives into coalescing global memory accesses, shared memory caching, occupancy optimizations, and more
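CUDA aside, the tiling idea behind the post's shared-memory caching step has a direct CPU analogue: cache blocking. A minimal C sketch, with an illustrative tile size (`BLOCK` would be tuned to the cache in practice):

```c
#include <stddef.h>

#define BLOCK 32  /* tile size; illustrative, tune to cache size */

/* C = A * B for n x n row-major matrices, processed in BLOCK x BLOCK
 * tiles so each tile of A and B stays cache-resident while it is
 * reused, the CPU analogue of caching tiles in GPU shared memory. */
void matmul_tiled(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                        double a = A[i * n + k];  /* reused across j */
                        for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The arithmetic is identical to the naive triple loop; only the traversal order changes, so each loaded tile is reused many times before being evicted.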