Articles on Sql
Last updated: 2023/02/28
Top deep-dives on Sql
Building a DB from scratch will give you a pretty good idea of how it works, and this series by João Henrique Machado Silva does exactly that. João also explains a lot of the decisions he makes, which gets a thumbs up from me.
SQL queries are a great way to get data from a database (durr). I like to abstract away more complicated ones to the database layer, because it just looks cleaner and is more understandable to me. One great tool for building more complicated queries is CTEs. Preetam Jinka presents how you can use these for your queries, using a real life example inspired by Github issues.
Subqueries are a pretty intuitive way of connecting data from different tables in SQL. Personally I use them quite a lot, but I never realized the implications. Jamie Brandon's article underlines the potential inefficiencies of subqueries, and educates on the approaches used by different databases to better optimize queries. Pretty rad stuff.
A typical problem in programming is when two processes need to use the same resource, at the same time. Fortunately it's a well documented problem, and the internet is full of information on how you can avoid it. In this short article by Ke Pi, we're brought the problem as it relates to databases. Ke Pi covers how SQL blocking happens and how you can troubleshoot it for your own database.
It's not a bad ideas to get to know how SQL databases work. A lot of projects use them. In this article of an extensive series, Pamela Mooney breaks down how the SQL sserver works and how to tune queries.
Doing a meta analysis of merges/commits to a software project can tell you a lot about the characteristics of the code. Łukasz Langa's concise article does precisely that, by analyzing the CPython repo. Łukasz presents the hot parts of the codebase, who is contributing and where, and stats for merging a PR. Not super technically heavy, but some interesting data to explore.
Databases access patterns are common throughout many applications, because most applications have similar requirements; some data should be accessed a lot, and some data should barely be accessed. How do you go about optimizing a system for this when you usually don't explicitly state where the boundary is? In this informative article, Arpit Bhayani presents the caching system used by Postgres to solve common database access patterns using multiple queues.
After writing SQL for a while, you start realizing that some queries take A LOT longer than others, especially when you're using CTEs or subqueries. Being a proponent of "do first, read later", I've just developed an intuitive feel. Well now I have an explantion, because Jamie Brandon explains the effect nested loops have on SQL queries, how other databases attempt to replace them, and a deep dive into the approaches of CockroachDB and Materialize.
Have you had to setup or refactor three different databases recently? No? Well I have, and let me tell you what, it's a lot easier to do it when you have an idea of what you're doing. Fortunately if you don't, but are planning on pulling a DB hat-trick anytime soon, Pamela Mooney's article is for you. In this series, Pamela covers all the essentials of being the big bad database admin, but in this one specifically she illuminates a plethora of backup and restore strategies that'll save your ass when everything goes to shit.
Semih Salihoğlu dives into what Worst-case Optimal Join (wcoj) algorithms are, how they can be useful in databases management systems, and how they're implemented in the Kùzu database.
- Wcoj algorithms propose that queries involving complex “cyclic joins” over many-to-many relationships should be evaluated column at a time instead of table at a time
- "Worst-case optimal" means that the worst-case runtime of these algorithms meets a known lower bound for the worst-case runtime of any join algorithm
- Kùzu generates plans that mix binary joins and wcoj-style multiway intersections
Tim Zöller explains different types of temporal databases, and how they are used to store information about time.
- A bitemporal database utilizes two axes of time simultaneously, which enables us to query data in regard to both transaction time and valid time
- This type of database is useful for organizations that need to keep track of data changes over time, while also being able to reproduce documents from the past
- There are SQL examples for the implementation
Jonathan Lewis illuminates some common database query optimizations one can make when the Oracle Optimizer is being underwhelming.
Jonathan Lewis discusses the Or Expansion Transformation, a feature of the Oracle Optimizer, and how it can be used to improve query performance.
- The optimizer is able to take a single query block and transform it into a UNION ALL of 2 or more query blocks which can then be optimized and run separately
- "A critical difference between Concatenation and Or-Expansion is that the OR’ed access predicates for the driving table must all be indexed before Concatenation can be used"
- The or expansion allows queries to be optimized a lot more, but requires a lot more time for actual optimization to take place