Articles on Database

Last updated: 2022/11/23

Top deep-dives on Database

How Does a Database Load Balancer Work?

I was recently talking to a fellow programmer and asking questions about distributed systems. He said they were a nightmare, and one of the worst things to have to maintain. But that doesn't make them any less essential to many large systems. In this introductory article, Agus Syafaat presents a load balancer's architecture, covers a couple of different load balancing algorithms, highlights a few advantages of using one, and then describes a couple of database replication methods. It's not super in-depth, but it covers the essentials.

The usefulness of currying

Not a fan of Facebook, but their size leads to interesting problems that don't normally occur at smaller scales. In this article, Harish Dattatraya Dixit discusses a data corruption problem occurring in Scala due to big exponents. Although the article is merely an introduction, the paper it links to goes far more in-depth if your interest is piqued.

Feature Casualties of Large Databases

Big databases can get out of hand pretty quickly. Throw in hastily written code and a disregard for best practices, and it can be a complete nightmare. In this article, Brandur Leach covers some of the first things that (wrongly) get dropped when databases start becoming a mess.

Will you pay the consistency costs?

I really wish my love for video games had pushed me into coding earlier. It almost did a couple of times, but getting into Java to write Runescape bots (when so many were readily available with a simple download) never held my attention for more than a couple of days. Anyway, now I'm into coding, but not into video games. Articles like this one by Ayende Rahien, however, always pique my interest again. In it, Ayende discusses the problems with consistency across distributed systems.

Practical Uses Of Blockchain Technology

William Kennedy discusses "how the different technical aspects of blockchain technology could be used to build a single, append-only, publicly available, transparent, and cryptographically auditable database that runs in a distributed and decentralized environment for managing version of source code". I've featured a number of articles that were anti-blockchain, so I figured it's only fair to feature one from the other side of the spectrum.

Helios: hyperscale indexing for the cloud & edge – part 1

Currently, a lot of the world's computing infrastructure is based on a very centralized model. You have clients, who query some central server, which carries out any computation and data storage, then returns the results. Although this is fine when you're not processing a lot of data or requests, it becomes very expensive or very slow, very quickly, when scaled up. In this introductory article, Adrian Colyer presents Microsoft's solution for large-scale computation and data storage: 'a federated differential dataflow style system that processes and materializes just what is needed at each layer'.

Implementing a MIME database in XXXX

Drew DeVault walks us through implementing a database for media types in a redacted language. Mystery, intrigue, knowledge. This article is chock full of them!

Seeing is believing: a client-centric specification of database isolation

Understanding the limitations and possible issues with a database isn't always easy. It requires not only a solid grasp of how queries work, but also knowledge of the inner workings of the black boxes that databases can be. Adrian Colyer has written an article summarizing a paper on database snapshot isolation states. There's an explanation of the motivation, an example, and plenty of definitions to get you up to speed with this important but jargon-heavy topic.

It's about time - Approaching Bitemporality (Part 1)

Tim Zöller explains different types of temporal databases, and how they are used to store information about time.
Some highlights:

  • A bitemporal database utilizes two axes of time simultaneously, which enables us to query data in regard to both transaction time and valid time
  • This type of database is useful for organizations that need to keep track of data changes over time, while also being able to reproduce documents from the past
  • There are SQL examples for the implementation
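To make the two time axes concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from Tim's article: the `price` table, its column names, the sample rows, and the `price_as_of` helper are all assumptions made up for illustration. Each row carries a valid-time interval (when the fact holds in the real world) and a transaction-time interval (when the database believed it), both half-open.

```python
import sqlite3

FOREVER = "9999-12-31"  # hypothetical sentinel for "still open"


def make_db():
    """Build an in-memory table with one valid-time and one transaction-time interval per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE price (
               item TEXT, amount INTEGER,
               valid_from TEXT, valid_to TEXT,
               tx_from TEXT, tx_to TEXT
           )"""
    )
    rows = [
        # Recorded on 2022-01-01: the price is 10 from 2022-01-01 onwards.
        # This belief was superseded on 2022-03-01, so its transaction time is closed.
        ("widget", 10, "2022-01-01", FOREVER, "2022-01-01", "2022-03-01"),
        # Correction recorded on 2022-03-01: the price was 10 only until 2022-02-01 ...
        ("widget", 10, "2022-01-01", "2022-02-01", "2022-03-01", FOREVER),
        # ... and has been 12 since 2022-02-01.
        ("widget", 12, "2022-02-01", FOREVER, "2022-03-01", FOREVER),
    ]
    conn.executemany("INSERT INTO price VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def price_as_of(conn, item, valid_at, known_at):
    """Price that applied on `valid_at`, according to what the database knew on `known_at`."""
    row = conn.execute(
        """SELECT amount FROM price
           WHERE item = ?
             AND valid_from <= ? AND ? < valid_to
             AND tx_from <= ? AND ? < tx_to""",
        (item, valid_at, valid_at, known_at, known_at),
    ).fetchone()
    return row[0] if row else None


conn = make_db()
# What did we believe on 2022-02-20 about the price on 2022-02-15?
print(price_as_of(conn, "widget", "2022-02-15", "2022-02-20"))  # 10
# What do we believe after the correction about that same day?
print(price_as_of(conn, "widget", "2022-02-15", "2022-03-05"))  # 12
```

The first query is the "reproduce a document from the past" case: it returns the old answer even though it has since been corrected, because nothing is ever overwritten, only closed off on the transaction-time axis.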

Write-ahead logging and the ARIES crash recovery algorithm

Kevin Sookocheff dives deep into how database changes survive crashes using write-ahead logging and the ARIES crash recovery algorithm.
Some highlights:

  • Write-ahead logging is a fundamental primitive that ensures a record of every change to data is written safely to stable storage before the change itself is applied
  • Databases execute transactions in main memory, which is facilitated by the buffer pool
  • The buffer pool holds an in-memory representation of the state of the database and periodically writes that state to permanent storage on disk
  • Both file systems and databases use a buffer residing in main memory to cache frequently or recently used data
  • Buffers provide excellent performance advantages when accessing data, but they introduce reliability concerns when true crash recovery is required
  • A STEAL policy states whether or not the database allows an uncommitted transaction to overwrite the most recent committed value of an object in non-volatile storage
  • A FORCE policy requires that all updates made by a transaction are reflected on non-volatile storage before the transaction is allowed to be committed
  • A database log is a file on disk that stores a sequential list of operations on the database
  • The WAL protocol ensures that all log records updating a page are written to non-volatile storage before the page itself is overwritten in non-volatile storage, and that a transaction is not considered committed until all of its log records have been written to non-volatile storage
  • The database uses the log sequence numbers to facilitate recovery by tracking the state of database pages, transactions, and flushed data during normal database operation
  • The most widely known and emulated implementation of crash recovery is ARIES (Algorithms for Recovery and Isolation Exploiting Semantics), which has three phases: analysis, redo, and undo
  • Analysis: The analysis phase reads the log from the last recorded checkpoint to identify any dirty pages and active transactions at the time of the crash
  • Redo: The redo phase repeats all changes to the database from a point in the log forward. This includes transactions that will abort. This retrace brings the database back to the exact state it was in before the crash
  • Undo: The undo phase reverses the changes made by transactions that did not commit before the system crashed
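The three phases can be sketched in a few lines of Python. This is a heavily simplified toy, not ARIES as Kevin's article or the original paper describes it: there are no checkpoints, page LSNs, or compensation log records, the "database" is a plain dict, and the `LogRecord` type and `recover` function are names invented for this example. A `before` value of `None` is assumed to mean "the key did not exist".

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int               # log sequence number, increasing with each record
    txid: str              # transaction that wrote the record
    kind: str              # "update" or "commit"
    key: str = ""
    before: object = None  # before-image, used by undo (None = key absent)
    after: object = None   # after-image, used by redo


def recover(log, disk):
    """Rebuild a consistent state from the durable log and whatever pages reached disk."""
    state = dict(disk)

    # Analysis: any transaction without a commit record was active at the crash.
    seen = {r.txid for r in log}
    committed = {r.txid for r in log if r.kind == "commit"}
    losers = seen - committed

    # Redo: repeat history in log order, including updates by loser transactions.
    for r in log:
        if r.kind == "update":
            state[r.key] = r.after

    # Undo: roll back the losers' updates, newest first, using before-images.
    for r in reversed(log):
        if r.kind == "update" and r.txid in losers:
            if r.before is None:
                state.pop(r.key, None)
            else:
                state[r.key] = r.before

    return state, losers


log = [
    LogRecord(1, "T1", "update", "a", None, 1),
    LogRecord(2, "T2", "update", "b", None, 2),
    LogRecord(3, "T1", "commit"),
    LogRecord(4, "T2", "update", "a", 1, 5),
]
# STEAL: T2's uncommitted write to "b" reached disk before the crash.
# No FORCE: T1's committed write to "a" did not.
disk = {"b": 2}

state, losers = recover(log, disk)
print(state)   # {'a': 1}
print(losers)  # {'T2'}
```

The example crash state shows why both passes are needed: redo restores T1's committed write that never reached disk, and undo removes T2's uncommitted write that did.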

Scaling Attributed Network Embedding to Massive Graphs [pdf]

Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, Juncheng Liu, and Sourav S. Bhowmick present a novel method for computing attributed network embeddings on massive data sets.

Oracle optimizer Or Expansion Transformations

Jonathan Lewis discusses the Or Expansion Transformation, a feature of the Oracle Optimizer, and how it can be used to improve query performance.
Some highlights:

  • The optimizer is able to take a single query block and transform it into a UNION ALL of 2 or more query blocks which can then be optimized and run separately
  • "A critical difference between Concatenation and Or-Expansion is that the OR’ed access predicates for the driving table must all be indexed before Concatenation can be used"
  • Or-Expansion gives the optimizer more ways to optimize a query, but the optimization step itself takes considerably longer
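The shape of the rewrite can be checked by hand. The sketch below uses Python's sqlite3 module as a stand-in, so it says nothing about Oracle's actual plans or costs; the table `t`, its columns, and the sample rows are made up for illustration. It splits a query with an OR into a UNION ALL of two blocks, where the second block must exclude rows the first already returned (including rows where the first predicate is NULL rather than false).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t (id, a, b) VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 9), (3, 7, 2), (4, None, 2), (5, 7, 9)],
)

# Single query block with an OR across two columns.
original = "SELECT id FROM t WHERE a = 1 OR b = 2"

# Hand-written expansion: one block per predicate. The extra condition on the
# second block keeps rows matching both predicates from appearing twice, and
# the IS NULL check keeps rows where "a = 1" evaluates to NULL.
expanded = """
    SELECT id FROM t WHERE a = 1
    UNION ALL
    SELECT id FROM t WHERE b = 2 AND (a <> 1 OR a IS NULL)
"""

original_ids = sorted(row[0] for row in conn.execute(original))
expanded_ids = sorted(row[0] for row in conn.execute(expanded))

print(original_ids)  # [1, 2, 3, 4]
print(expanded_ids)  # [1, 2, 3, 4]
```

Once the query is split like this, each block has a single simple predicate, which is what lets an optimizer plan the blocks separately.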
