Articles on Optimization
Last updated: 2023/02/01
Top deep-dives on Optimization
Refactoring a checksum until it runs 100 times faster
Israel Lot tells the story of five different developers (looks like no genius entry yet; maybe it'll be one of you?), each at a different level of expertise, attempting to optimize a checksum method with varying degrees of success. Optimization like this is overkill in 99% of cases, but it's a fun exercise nonetheless.
- Pro unrolled loops
- Senior used unsafe code
- Hacker converted a long to a short vector
- Expert reversed the endianness
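The "pro" trick above, loop unrolling, is easy to sketch. This is an illustration of the technique in Go with a toy byte-sum checksum, not the article's actual code (function names are made up):

```go
package main

import "fmt"

// checksumNaive sums bytes one at a time.
func checksumNaive(data []byte) uint64 {
	var sum uint64
	for _, b := range data {
		sum += uint64(b)
	}
	return sum
}

// checksumUnrolled processes 4 bytes per iteration into independent
// accumulators, cutting loop overhead and letting the CPU run the
// additions in parallel.
func checksumUnrolled(data []byte) uint64 {
	var s0, s1, s2, s3 uint64
	i := 0
	for ; i+4 <= len(data); i += 4 {
		s0 += uint64(data[i])
		s1 += uint64(data[i+1])
		s2 += uint64(data[i+2])
		s3 += uint64(data[i+3])
	}
	for ; i < len(data); i++ { // remainder bytes
		s0 += uint64(data[i])
	}
	return s0 + s1 + s2 + s3
}

func main() {
	data := []byte("hello, checksum")
	fmt.Println(checksumNaive(data) == checksumUnrolled(data)) // true
}
```

The independent accumulators are the point: a single `sum` variable would chain every addition behind the previous one, while four accumulators give the CPU four independent dependency chains.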
Low Hanging Fruits in Frontend Performance Optimization
There are few things more frustrating than navigating to a webpage and having to wait over three seconds before you can do anything. "Alex, you're just a spoiled brat too used to fast internet," you say. No, dear reader, I'm just a professional who understands that there are ways to solve this if you're not a lazy dev. Paweł Urbanek has written an in-depth article on the topic, not only covering a variety of methods for testing, but also offering a bunch of specific tips for cutting down load times.
You (Probably) Shouldn't use a Lookup Table
Nima Badizadegan explains how CPU caches affect the performance of lookup tables of different sizes, then demonstrates it practically with examples.
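The core trade-off: a lookup table only wins while it fits in cache; once it spills, recomputing can be faster. A minimal Go sketch of the two styles using population count (names and the popcount example are illustrative, not from the article):

```go
package main

import (
	"fmt"
	"math/bits"
)

// popTable holds the popcount of every byte value.
// At 256 bytes it lives comfortably in L1 cache.
var popTable [256]uint8

func init() {
	for i := range popTable {
		popTable[i] = uint8(bits.OnesCount8(uint8(i)))
	}
}

// popcountTable answers from the precomputed table.
func popcountTable(data []byte) int {
	n := 0
	for _, b := range data {
		n += int(popTable[b])
	}
	return n
}

// popcountCompute recomputes each answer: no table, no cache pressure.
func popcountCompute(data []byte) int {
	n := 0
	for _, b := range data {
		n += bits.OnesCount8(b)
	}
	return n
}

func main() {
	data := []byte{0xFF, 0x0F, 0x01}
	fmt.Println(popcountTable(data), popcountCompute(data)) // 13 13
}
```

A 256-byte table like this is harmless; the article's warning applies to tables in the megabytes, which evict useful data from L1/L2 and end up bottlenecked on memory instead of saving compute.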
The Speed of Time
People have been obsessed with measuring time since its beginning; probably because we have so little of it! Philosophy aside, Brendan Gregg's article describes how he tracked down and fixed a 30% increase in write latency for a Cassandra database cluster that appeared when switching from CentOS to Ubuntu, just by changing how time is measured.
A comprehensive guide for the fastest possible Docker builds in human existence
Aaron Batilo shares how he sped up Docker builds by using a persistent cache.
- "builds went from about 3.5 minutes down to about 50 seconds in the cold case and about 15 seconds when a given container in my monorepo hadn’t changed"
- Uses Docker BuildKit with Kubernetes
- Includes setup for a GitHub Actions workflow
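The persistent-cache idea can be sketched with BuildKit's cache mounts (a generic example assuming a Go project, not Aaron's exact setup; image tag and paths are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.19 AS build
WORKDIR /src
COPY . .
# Persist the module and build caches across builds, so an unchanged
# dependency or package is never recompiled from scratch.
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o /out/app .
```

The cache mounts survive between builds on the same BuildKit instance, which is why the article pairs this with a long-lived BuildKit deployment on Kubernetes rather than a throwaway CI runner.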
Scaling Causal's Spreadsheet Engine from Thousands to Billions of Cells: From Maps to Arrays
Simon Hørup Eskildsen explains how to keep optimizing a Go program once the easy wins surfaced by profiling have been exhausted.
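The "maps to arrays" move in the title can be sketched in Go: a sparse map of cells is flexible, but a dense slice indexed by arithmetic removes hashing and scattered allocations from the hot path (illustrative types, not Causal's actual engine):

```go
package main

import "fmt"

// MapSheet keys cells by (row, col): flexible, but every access
// pays for hashing and pointer-chasing.
type MapSheet map[[2]int]float64

// ArraySheet stores cells in one contiguous block; a cell's
// address is computed with arithmetic instead of a hash lookup.
type ArraySheet struct {
	rows, cols int
	cells      []float64
}

func NewArraySheet(rows, cols int) *ArraySheet {
	return &ArraySheet{rows, cols, make([]float64, rows*cols)}
}

func (s *ArraySheet) At(r, c int) float64     { return s.cells[r*s.cols+c] }
func (s *ArraySheet) Set(r, c int, v float64) { s.cells[r*s.cols+c] = v }

func main() {
	m := MapSheet{{0, 1}: 42}
	a := NewArraySheet(2, 3)
	a.Set(0, 1, 42)
	fmt.Println(m[[2]int{0, 1}], a.At(0, 1)) // 42 42
}
```

Beyond skipping the hash, the contiguous layout means iterating over a row walks memory sequentially, which is exactly what prefetchers and caches reward.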
Technology Deep Dive: Building a Faster ORAM Layer for Enclaves
Graeme Connell discusses how Signal updated its enclaves to use ORAM, which hides memory-access patterns, and how they made that layer fast.
Rubbing control theory on the Go scheduler
Irfan Sharif uses control theory, studies CPU scheduler latencies, builds forms of cooperative scheduling, and patches the Go runtime to reduce the impact of CPU utilization on tail latencies.
- Tail latency is the slowest small percentage of a system's response times (e.g. the 99th percentile): the outliers that take far longer than the majority of requests it serves
- With CPUs, seemingly high utilization can already be moderate over-utilization: queueing effects degrade latency well before utilization hits 100%
- The Go runtime patch tracks CPU usage at the level of individual goroutines
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
Simon Boehm discusses how to optimize a CUDA matrix multiplication kernel for performance.
- Simon begins with a naive kernel and then applies optimizations to improve performance
- The goal is to get within 80% of the performance of cuBLAS, NVIDIA's official BLAS library
- Dives into coalescing global memory accesses, shared memory caching, occupancy optimizations, and more
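The worklog's kernels are CUDA, but the memory-access idea behind coalescing has a CPU analogue that can be sketched in Go: the naive loop order strides through one matrix column-wise, while reordering the loops makes every access contiguous (a CPU illustration of the access-pattern point, not the worklog's code):

```go
package main

import "fmt"

// matmulNaive computes C = A*B with the textbook loop order.
// The inner loop reads b column-wise, a cache-unfriendly stride
// (the CPU analogue of uncoalesced global-memory access).
func matmulNaive(a, b []float32, n int) []float32 {
	c := make([]float32, n*n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			var sum float32
			for k := 0; k < n; k++ {
				sum += a[i*n+k] * b[k*n+j]
			}
			c[i*n+j] = sum
		}
	}
	return c
}

// matmulReordered swaps the j and k loops so every slice is read
// sequentially (the analogue of coalesced access).
func matmulReordered(a, b []float32, n int) []float32 {
	c := make([]float32, n*n)
	for i := 0; i < n; i++ {
		for k := 0; k < n; k++ {
			aik := a[i*n+k]
			for j := 0; j < n; j++ {
				c[i*n+j] += aik * b[k*n+j]
			}
		}
	}
	return c
}

func main() {
	n := 2
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(matmulNaive(a, b, n))     // [19 22 43 50]
	fmt.Println(matmulReordered(a, b, n)) // [19 22 43 50]
}
```

Same arithmetic, same result, very different memory traffic; on a GPU the equivalent restructuring is what lets a warp's loads collapse into a few wide memory transactions.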