Articles on Database

Last updated: 2023/02/21

Top deep-dives on Database

How Does a Database Load Balancer Work?

I was recently talking to a fellow programmer and asking him questions about distributed systems. He said they were a nightmare, and one of the worst things to have to maintain. But that doesn't make them any less essential to many large systems. In this introductory article, Agus Syafaat presents a load balancer's architecture, covers a couple of different load balancing algorithms, highlights a few advantages of using one, and then describes a couple of database replication methods. It's not super in-depth, but it covers the essentials.
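We can't say exactly which algorithms Agus picks, but two common ones are round robin and least connections. Here's a minimal Python sketch of both, plus a naive router that sends writes to a primary and reads to replicas, assuming a single-primary replication setup. The class names and the `SELECT`-prefix check are hypothetical simplifications; a real database proxy parses queries properly and tracks replica health.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Replica:
    """A hypothetical database node tracked by the balancer."""
    name: str
    active_connections: int = 0


class RoundRobinBalancer:
    """Hands out replicas in a fixed, rotating order."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Hands out the replica with the fewest open connections."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def pick(self):
        # min() returns the first replica among ties, so list order breaks ties.
        replica = min(self.replicas, key=lambda r: r.active_connections)
        replica.active_connections += 1
        return replica

    def release(self, replica):
        # Call when a connection closes so the counts stay accurate.
        replica.active_connections -= 1


def route(query, primary, read_balancer):
    """Naively send reads to a replica and everything else to the primary."""
    is_read = query.lstrip().upper().startswith("SELECT")
    return read_balancer.pick() if is_read else primary
```

Round robin is simplest and works well when replicas and queries are uniform; least connections adapts better when some queries run much longer than others.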

The usefulness of currying

I'm not a fan of Facebook, but its size leads to interesting problems that don't normally occur at smaller scales. In this article, Harish Dattatraya Dixit discusses a data corruption problem occurring in Scala due to big exponents. Although the article is merely an introduction, the paper it links to goes far more in-depth if your interest is piqued.

Feature Casualties of Large Databases

Big databases can get out of hand pretty quickly. Throw in hastily written code and ignored best practices, and they can become a complete nightmare. In this article, Brandur Leach covers some of the features that (wrongly) get dropped first when databases start becoming a mess.

Will you pay the consistency costs?

I really wish my love for video games had pushed me into coding earlier. It almost did a couple of times, but learning Java to write RuneScape bots (when so many were readily available with a simple download) never held my attention for more than a couple of days. Anyway, now I'm into coding, but not into video games. Articles like this one by Ayende Rahien, however, always pique my interest again. In it, Ayende discusses the problems with consistency across distributed systems.

Practical Uses Of Blockchain Technology

William Kennedy discusses "how the different technical aspects of blockchain technology could be used to build a single, append-only, publicly available, transparent, and cryptographically auditable database that runs in a distributed and decentralized environment for managing version of source code". I've featured a number of articles that were anti-blockchain, so I figured it's only fair to feature one from the other side of the spectrum.
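The "append-only" and "cryptographically auditable" parts of that description can be illustrated with a hash chain, where every entry stores the hash of the one before it, so editing any past entry invalidates the links that follow. This is a minimal Python sketch under that assumption, not William's design: it leaves out the distributed and decentralized parts (consensus, replication, signatures) entirely, and the `HashChainLog` class and its payload format are hypothetical.

```python
import hashlib
import json


class HashChainLog:
    """A minimal append-only log where each entry commits to the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, payload):
        # Serialize deterministically so the same payload always hashes the same way.
        data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
        return hashlib.sha256(data.encode("utf-8")).hexdigest()

    def append(self, payload):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "prev": prev_hash,
            "payload": payload,
            "hash": self._digest(prev_hash, payload),
        }
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every link; an edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Anyone holding a copy of the log can run `verify()` independently, which is what makes the history auditable without trusting whoever stores it.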

Implementing a MIME database in XXXX

Drew DeVault walks us through implementing a database for media types in a redacted language. Mystery, intrigue, knowledge: this article is chock-full of them!
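Since the language is redacted, here's the general idea in Python instead. This sketch assumes the database is built from text in the widely used mime.types format (as found at /etc/mime.types on many Unix-like systems), where each line lists a media type followed by its file extensions; the article's actual data source and API may differ, and `parse_mime_types` and `lookup_by_extension` are hypothetical names.

```python
def parse_mime_types(text):
    """Build lookup tables from text in the mime.types format.

    Each non-comment line holds a media type followed by zero or more
    file extensions, separated by whitespace.
    """
    by_ext = {}   # extension -> media type
    by_type = {}  # media type -> list of extensions
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        mime, *exts = line.split()
        by_type.setdefault(mime, []).extend(exts)
        for ext in exts:
            # Keep the first mapping seen if an extension appears twice.
            by_ext.setdefault(ext.lower(), mime)
    return by_ext, by_type


def lookup_by_extension(by_ext, filename):
    """Return the media type for a filename, or None if it has no known extension."""
    _, dot, ext = filename.rpartition(".")
    return by_ext.get(ext.lower()) if dot else None
```

Keeping two maps makes both directions cheap: extension to type for serving files, and type to extensions for picking a filename when saving.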

Poor schemas, poor cataloguing: why music tagging sucks

The author illuminates the data schemas of songs and albums, and how their departures from typical conventions make working with the data difficult.
Some highlights:

Songs have duplicate album information on them
No standardized tagging system
Versioning and cataloguing are nontrivial problems
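The first highlight is essentially a normalization problem: tag formats typically copy album fields onto every song file. As a rough Python sketch of the alternative (hypothetical classes, not a real tagging format), storing each album once and having tracks refer to it by ID means a correction only has to happen in one place.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Album:
    album_id: int
    title: str
    artist: str
    year: int


@dataclass(frozen=True)
class Track:
    title: str
    track_number: int
    album_id: int  # a reference instead of a copy of the album's fields


class Library:
    """Hypothetical normalized store: album data lives in exactly one place."""

    def __init__(self):
        self.albums = {}  # album_id -> Album
        self.tracks = []

    def add_album(self, album):
        self.albums[album.album_id] = album

    def add_track(self, track):
        # Reject dangling references, like a foreign key constraint would.
        if track.album_id not in self.albums:
            raise KeyError(f"unknown album {track.album_id}")
        self.tracks.append(track)

    def album_of(self, track):
        return self.albums[track.album_id]

    def rename_album(self, album_id, new_title):
        # One update corrects the title for every track on the album.
        self.albums[album_id] = replace(self.albums[album_id], title=new_title)
```

The trade-off is that a single audio file no longer carries everything about itself, which is exactly why embedded tags duplicate the album data in the first place.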

Seeing is believing: a client-centric specification of database isolation

Understanding the limitations and possible issues with a database isn't always easy. It requires not only a solid grasp of how queries work, but also knowledge about the inner workings of the black boxes that databases can be. Adrian Colyer has written an article summarizing a paper that specifies database isolation levels in terms of the states clients observe. There's an explanation of the motivation, an example, and plenty of definitions to get you up to speed on this important but jargon-heavy topic.

Write-ahead logging and the ARIES crash recovery algorithm

Kevin Sookocheff dives deep into how database changes survive crashes using write-ahead logging and the ARIES crash recovery algorithm.
Some highlights:

Write-ahead logging is a fundamental primitive that ensures all changes to data are first written safely to stable storage before being applied
Databases execute transactions in main-memory, which is facilitated by the buffer pool
The buffer pool holds an in-memory representation of the state of the database and periodically writes that state to permanent storage on disk
Both file systems and databases use a buffer residing in main memory to cache frequently or recently used data
Buffers provide excellent performance advantages when accessing data, but they introduce reliability concerns when true crash recovery is required
A STEAL policy states whether or not the database allows an uncommitted transaction to overwrite the most recent committed value of an object in non-volatile storage
A FORCE policy requires that all updates made by a transaction are reflected on non-volatile storage before the transaction is allowed to be committed
A database log is a file on disk that stores a sequential list of operations on the database
The WAL protocol ensures that all log records updating a page are written to non-volatile storage before the page itself is overwritten in non-volatile storage, and that a transaction is not considered committed until all of its log records have been written to non-volatile storage
The database uses the log sequence numbers to facilitate recovery by tracking the state of database pages, transactions, and flushed data during normal database operation
The most widely known and emulated implementation of crash recovery is ARIES (Algorithms for Recovery and Isolation Exploiting Semantics), which has three phases: analysis, redo, and undo
Analysis: The analysis phase reads the log from the last recorded checkpoint to identify any dirty pages and active transactions at the time of the crash.
Redo: The redo phase repeats all changes to the database from a point in the log forward. This includes transactions that will abort. This retrace brings the database back to the exact state it was in before the crash.
Undo: Reverse the changes made by transactions that did not commit before the system crashed.
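To make the three phases concrete, here's a toy Python model of a STEAL/NO-FORCE database with a write-ahead log. It's a deliberately simplified sketch, not ARIES as specified: there are no checkpoints, page LSNs, dirty page table, or compensation log records, redo simply starts at the beginning of the log, and Python lists and dicts stand in for the log file, the disk, and the buffer pool. `ToyDatabase`, `LogRecord`, and `recover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str                     # "update" or "commit" in this toy model
    key: Optional[str] = None
    before: Optional[int] = None  # undo information
    after: Optional[int] = None   # redo information


class ToyDatabase:
    def __init__(self):
        self.log = []     # stands in for the durable log file
        self.disk = {}    # stands in for pages on non-volatile storage
        self.buffer = {}  # stands in for the in-memory buffer pool

    def _append(self, **fields):
        record = LogRecord(lsn=len(self.log) + 1, **fields)
        self.log.append(record)  # WAL rule: the log record comes first...
        return record

    def write(self, txn, key, value):
        before = self.buffer.get(key, self.disk.get(key))
        self._append(txn=txn, kind="update", key=key, before=before, after=value)
        self.buffer[key] = value  # ...and only then is the cached page changed

    def commit(self, txn):
        # NO-FORCE: committing writes a log record but flushes no data pages.
        self._append(txn=txn, kind="commit")

    def flush(self, key):
        # STEAL: a dirty page may reach disk before its writer commits.
        self.disk[key] = self.buffer[key]

    def crash(self):
        self.buffer = {}  # main memory is lost; the log and disk survive


def recover(db):
    # Analysis: find transactions with no commit record (the "losers").
    seen, committed = set(), set()
    for rec in db.log:
        seen.add(rec.txn)
        if rec.kind == "commit":
            committed.add(rec.txn)
    losers = seen - committed

    # Redo: repeat history by reapplying every update, losers included.
    for rec in db.log:
        if rec.kind == "update":
            db.disk[rec.key] = rec.after

    # Undo: roll back loser updates newest-first using their before-images.
    for rec in reversed(db.log):
        if rec.kind == "update" and rec.txn in losers:
            if rec.before is None:
                db.disk.pop(rec.key, None)
            else:
                db.disk[rec.key] = rec.before
    return losers
```

For example, if T1 commits without its page being flushed and T2's uncommitted page is stolen to disk before a crash, `recover` identifies T2 as the only loser, redo restores T1's value, and undo removes T2's.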

Exploring Row Level Security In PostgreSQL

The author dives into how row-level security works in PostgreSQL.
Some highlights:

Unlike other RDBMSs, Postgres has a role system that helps separate identity, authentication, and authorization
Covers policies, checks, inheritance, partitioning, performance, and more concepts related to the topic
It ends with a sales pitch for the company's product, but all of the information beforehand is solid
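In PostgreSQL itself, policies are written in SQL with `CREATE POLICY ... USING (...) WITH CHECK (...)`, where the USING expression decides which existing rows are visible and the WITH CHECK expression decides which new or updated rows are accepted. To stay in one language, the following is only a rough Python model of that behavior for permissive policies (OR-ed together, with no rows visible when no policy applies). The `Policy` and `SecuredTable` names are hypothetical, and details like per-command and per-role policies, restrictive policies, and owner bypass are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    """Rough analogue of one permissive policy's USING and WITH CHECK expressions."""
    using: Callable[[dict, str], bool]       # may this role see an existing row?
    with_check: Callable[[dict, str], bool]  # may this role write this new row?


class SecuredTable:
    def __init__(self, policies):
        self.rows = []
        self.policies = policies

    def select(self, role):
        # Permissive policies are OR-ed; with no policies, nothing is visible.
        return [row for row in self.rows
                if any(p.using(row, role) for p in self.policies)]

    def insert(self, row, role):
        # New rows must pass at least one policy's check before they are stored.
        if not any(p.with_check(row, role) for p in self.policies):
            raise PermissionError("new row violates row-level security policy")
        self.rows.append(row)


# A typical "users only see and write their own rows" policy.
owner_only = Policy(
    using=lambda row, role: row["owner"] == role,
    with_check=lambda row, role: row["owner"] == role,
)
```

The key point the model captures is that filtering happens inside the database layer, so every query gets the same restriction without the application adding its own WHERE clauses.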

Scaling Attributed Network Embedding to Massive Graphs [pdf]

Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, Juncheng Liu, and Sourav S. Bhowmick present a novel method for computing attributed network embeddings on massive graphs.