Tableland Notes - 2023
This post contains links to several articles I wrote in 2023 for Tableland Weeknotes, a weekly newsletter by the Tableland team focusing on the intersection of databases and blockchains. These six articles focus on the replication of Postgres databases, with an emphasis on ensuring that the replicated data is cryptographically verifiable.
The first post describes how to extract Write Ahead Logs from a Postgres instance, sign them with an Ethereum wallet, and send them to a remote server for replication onto another Postgres instance. The remote server verifies the signature to ensure that the database change is authorized and materializes the WAL message into a row within a table.
The second post accomplishes the same goals as the first but uses a different approach. Instead of using Postgres' logical replication protocol, we use DuckDB with the postgres_scanner extension. In this approach, we perform a table scan instead of streaming Write Ahead Logs and produce a materialized parquet file as the output. This file is the point-in-time copy of the original database. While DuckDB does all the heavy lifting, this approach has its limits. Scanning the table too often is resource-intensive, especially if the table is large.
In the third post, we modify the approach described in the first. We materialize the WAL locally instead of sending the signed updates to a remote server. We use a DuckDB instance to replicate the source Postgres database locally. At regular intervals, we export the state of the local DuckDB as parquet files, sign it, and send it to the remote storage of persistence.
The fourth post describes the same thing as the third, but I implemented the prototype in Rust this time. I found that the libraries implementing Postgres's replication protocol are less mature than in Go. Rust libraries are more low-level and lack documentation. At the time of the writing, the rust-postgres
project has a few open Pull Requests to enable working with logical replication, but these PRs have been open for a long time. However, Materialize's fork is stable and offers these features.
The fifth and sixth posts are about ECMH and its applications. ECMH stands for Elliptic Curve MultiSet hashes. It is a cryptographic hashing scheme with set semantics. The output of the usual hash function is a digest that looks like a string of random characters. The same applies to ECMH hashes, but they have an additional structure. ECMH hash digests represent a set. You can perform set operations on these, such as inserting or deleting elements. I describe implementing an ECMH-based hashing scheme in Go using the go-ristretto library in the last two posts.
The sixth and the last post describes an application for ECMH. I used set-based hashes to fingerprint a directory in the file system. The fingerprint provides data integrity. By keeping track of the 32-byte fingerprint, we can tell if any of its files are tampered with.