Querying Parquet with Millisecond Latency

Published: 7 December 2022
Tags: data, data processing, database

The authors explain how data stored in the Parquet format can be queried quickly and efficiently.
Some highlights:

  • Apache Parquet is an increasingly popular open format for storing analytic datasets, and has become the de facto standard for cost-effective, DBMS-agnostic data storage
  • Data in a Parquet file is broken into horizontal slices called RowGroups, and each RowGroup contains a single ColumnChunk for each column in the schema
  • "Parquet achieves impressive compression ratios by using sophisticated encoding techniques such as run length compression, dictionary encoding, delta encoding, and others"
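
To make the encoding techniques named in the quote more concrete, here is a minimal sketch of dictionary, run-length, and delta encoding in plain Python. It only illustrates the general ideas on a single column of values; it is not Parquet's actual on-disk format, and the function names and sample data are made up for this example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in `dictionary`
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


# Hypothetical column of repetitive strings: dictionary + run-length work well.
cities = ["Oslo", "Oslo", "Oslo", "Lima", "Lima", "Oslo"]
dictionary, indices = dictionary_encode(cities)
runs = run_length_encode(indices)

# Hypothetical column of slowly increasing integers: deltas stay small.
timestamps = [1000, 1001, 1003, 1006]
deltas = delta_encode(timestamps)
```

In this sketch the six city strings shrink to a two-entry dictionary plus three runs, and the timestamps become one base value followed by small differences, which is the kind of redundancy these encodings exploit.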


When to use gRPC vs GraphQL

Published: 28 November 2022
Tags: graphql, grpc

Loren Sands-Ramshaw compares the two protocols gRPC and GraphQL in terms of interface design, message format, and overfetching.
Some highlights:

  • gRPC was released in 2016 by Google as an efficient and developer-friendly method of server-to-server communication
  • GraphQL was released in 2015 by Meta as an efficient and developer-friendly method of client-server communication
  • While gRPC is better suited for server-to-server communication, GraphQL is better for client-server communication, with some exceptions
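
The overfetching point from the summary can be sketched without either framework. The snippet below uses no gRPC or GraphQL library; the record and both handler functions are hypothetical and only contrast a fixed response shape with a client-selected one.

```python
# Hypothetical stored record with more fields than most callers need.
USER = {
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "bio": "Long profile text that few clients display",
}


def get_user_fixed():
    """RPC-style handler: the response shape is fixed by the interface."""
    return dict(USER)


def get_user_selected(fields):
    """Query-style handler: the caller lists the fields it wants."""
    return {name: USER[name] for name in fields}


# A client that only shows the name still receives everything here ...
fixed = get_user_fixed()

# ... but receives just the requested field here.
selected = get_user_selected(["name"])
```

With the fixed shape, the client gets four fields to use one; with field selection it gets exactly what it asked for. A real RPC interface could narrow its responses by defining more specific methods, so this is a simplification of the trade-off, not a statement about either protocol's limits.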


Delimiter-first code

Published: 29 November 2022
Tags: language design, philosophy

Alex Rogozhnikov proposes a new top-level syntax for programming languages, in which delimiters come first, and argues for its advantages.
Some highlights:

  • Write all code with the delimiter first
  • The new syntax is arguably as simple, but more consistent, better preserves visual structure, and solves some issues in code formatting
  • Exploring new ideas like this is helpful for the evolution of programming language design
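
The highlights do not reproduce the proposed syntax, so the snippet below is only a loose analogy in existing Python rather than the author's notation: the same list written once with trailing delimiters and once in the comma-first convention, where each continuation line starts with its delimiter. The variable names are made up for this example.

```python
# Conventional layout: the delimiter ends each line.
trailing = [
    "alpha",
    "beta",
    "gamma",
]

# Comma-first layout: the delimiter starts each continuation line,
# so every item after the first has the same visual prefix.
leading = [
    "alpha"
    , "beta"
    , "gamma"
]

# Both spellings build the same list; only the layout differs.
same = trailing == leading
```

Python accepts both forms, which makes it a convenient way to see how leading delimiters line up vertically, even though the article argues for a dedicated syntax rather than a formatting habit.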


