Articles on Optimization

Last updated: 2023/02/01

Top deep-dives on Optimization

Refactoring a checksum until it runs 100 times faster

Israel Lot tells the story of five developers at different levels of expertise (no genius entry yet, so maybe it'll be one of you?), each attempting to optimize a checksum method with varying degrees of success. Optimizing code like this is overkill in 99% of cases, but it's a fun exercise nonetheless.
Some highlights:

Pro unrolled loops
Senior used unsafe
Hacker converted a long to a short vector
Expert reversed the endianness
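
To give a feel for the first highlight, here is a minimal sketch of loop unrolling. It assumes a hypothetical additive checksum (sum of all bytes, modulo 2**32); the checksum in the article may well be different, and the function names are made up for illustration. In Python the speed-up is negligible; the point is only the shape of the transformation.

```python
def checksum_simple(data: bytes) -> int:
    """Hypothetical additive checksum: sum of all bytes, modulo 2**32."""
    total = 0
    for b in data:
        total = (total + b) & 0xFFFFFFFF
    return total


def checksum_unrolled(data: bytes) -> int:
    """Same checksum, with the loop body unrolled four times."""
    total = 0
    n = len(data)
    limit = n - (n % 4)  # largest multiple of 4 that does not exceed n
    i = 0
    while i < limit:
        # Four additions per iteration: loop overhead is paid once per 4 bytes.
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]
        i += 4
    # Handle the 0 to 3 leftover bytes that did not fill a group of four.
    for j in range(limit, n):
        total += data[j]
    return total & 0xFFFFFFFF
```

Both functions return the same value for any input; the unrolled version just needs the extra tail loop for lengths that are not a multiple of four.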

Low Hanging Fruits in Frontend Performance Optimization

There are few things more frustrating than navigating to a webpage and having to wait over three seconds before you can do anything. "Alex, you're just a spoiled brat too used to fast internet," you say. No, dear reader, I'm just a professional who understands that there are ways to solve this if you're not a lazy dev. Paweł Urbanek has written an in-depth article on the topic, providing not only a variety of testing methods but also a bunch of specific tips for cutting down load times.

You (Probably) Shouldn't use a Lookup Table

Nima Badizadegan explains how CPU caches affect the performance of lookup tables of different sizes, then demonstrates it practically with examples.

The Speed of Time

People have been obsessed with measuring time since its beginning; probably because we have so little of it! Philosophy aside, Brendan Gregg's article describes how he managed to fix a 30% increase in write latency for a Cassandra database cluster when switching from CentOS to Ubuntu, just by changing how the time is measured.

A comprehensive guide for the fastest possible Docker builds in human existence

Aaron Batilo shares how he sped up Docker builds by using a persistent cache.
Some highlights:

"builds went from about 3.5 minutes down to about 50 seconds in the cold case and about 15 seconds when a given container in my monorepo hadn’t changed"
Uses Docker BuildKit with Kubernetes
Includes setup for use in a GitHub Actions workflow

Tail latency is the small percentage of a system's response times, out of all its responses to the input/output (I/O) requests it serves, that take the longest compared with the majority of its response times
With CPU, high utilization is sometimes actually moderate over-utilization
The patch to the Go language was to track CPU use at the level of individual goroutines
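
The tail latency definition above is easier to see with numbers. The sketch below uses the nearest-rank percentile method on made-up latency samples; the `percentile` helper and the sample values are assumptions for illustration, not taken from any of the linked articles.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up data: 98 fast responses and two slow ones.
samples = [10] * 98 + [250, 900]

median = percentile(samples, 50)   # what a typical request sees
p99 = percentile(samples, 99)      # the tail
worst = percentile(samples, 100)   # the slowest request
```

Here the median stays at the fast value while the 99th percentile lands on one of the slow outliers, which is why averages and medians tend to hide tail latency.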

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog

Simon Boehm discusses how to optimize a CUDA matrix multiplication kernel for performance.
Some highlights:

Simon begins with a naive kernel and then applies optimizations to improve performance
The goal is to get within 80% of the performance of cuBLAS, NVIDIA's official matrix library
Dives into coalescing global memory accesses, shared memory caching, occupancy optimizations, and more
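
As a rough CPU-side analogue of the shared memory caching step, the sketch below contrasts a naive matrix multiplication with a tiled one in plain Python. This is not Simon's CUDA code: the function names and the `tile` parameter are assumptions, and real kernels would load each tile into on-chip shared memory rather than just reordering loops.

```python
def matmul_naive(a, b):
    """Textbook triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile so each small block of A and B is reused."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, m, tile):      # tile column of C
            for p0 in range(0, k, tile):  # slide along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

On a GPU, the inner three loops correspond to work done on a tile that has already been copied into fast memory, so each value fetched from global memory is used several times instead of once.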

Scaling Causal's Spreadsheet Engine from Thousands to Billions of Cells: From Maps to Arrays

Technology Deep Dive: Building a Faster ORAM Layer for Enclaves

Rubbing control theory on the Go scheduler
