TL;DR: The B+ tree has long been the dominant data structure for database indexing, while the LSM tree has become increasingly popular in the distributed database world. Let's consider building an OLTP (online transaction processing) data store from scratch.
Many of the notes here came from my reading of Designing Data-Intensive Applications.
The simplest storage: a single file and some bash commands
Write performance is hard to beat; read performance is abysmal.
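To make the trade-off concrete, here is a hypothetical Python sketch of this single-file store (the function names `db_set` and `db_get` are mine): writes append one line to the end of a file; reads scan the whole file and keep the last match.

```python
# Minimal append-only key-value store: appends are cheap,
# but every read is a full scan. A sketch, not a real design.

def db_set(path, key, value):
    # Append a "key,value" record; never rewrite old data.
    with open(path, "a") as f:
        f.write(f"{key},{value}\n")

def db_get(path, key):
    # Scan every record; the LAST occurrence of the key wins,
    # because later appends supersede earlier writes.
    result = None
    with open(path) as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                result = v
    return result
```

Setting the same key twice and reading it back returns the later value, which is exactly the "last write wins" semantics the prose above relies on.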
Suppose the file is very large and spans many pages; traversing all the relevant pages to find a particular key becomes expensive. It would be great to have an index that tells us which page contains the latest write of the target key.
To speed up reads, keep an in-memory hash index that maps each key to its byte offset on disk.
However, this assumes that all keys fit into memory, and range queries still perform poorly because a hash index cannot exploit key ordering.
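A sketch of that hash index on top of the append-only file (the class name `HashIndexedLog` is hypothetical): each write records the byte offset it landed at, so a read is one seek instead of a full scan.

```python
class HashIndexedLog:
    """Append-only log with an in-memory hash index of byte offsets."""

    def __init__(self, path):
        self.path = path
        self.index = {}          # key -> byte offset of the latest record
        open(path, "ab").close() # ensure the log file exists

    def set(self, key, value):
        record = f"{key},{value}\n".encode()
        with open(self.path, "ab") as f:
            offset = f.tell()    # append mode: position is the end of file
            f.write(record)
        self.index[key] = offset # later writes overwrite the offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)       # jump straight to the latest record
            line = f.readline().decode().rstrip("\n")
        return line.partition(",")[2]
```

Note that the index holds every key, which is exactly the "all keys must fit into memory" limitation, and since the dict is unordered, a range query would still have to scan.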
Let's maintain the index on disk, then. A B+ tree is exactly that.
There's usually a primary index built on the primary key. The primary index is usually also the clustered index, whose leaf nodes contain the actual data. But nothing stops us from building a secondary index, ordered by a different key, whose leaves point back to the same rows.
Range queries on the primary index are more efficient because they are sequential. Range queries on a secondary index are typically accessing the pages in random order. But this difference may be less relevant on SSDs.
Note: some differences between a B tree and a B+ tree: a B tree stores values in internal nodes as well as leaves, whereas a B+ tree keeps all values in the leaf nodes; B+ tree internal nodes hold only keys and so achieve a higher fan-out; and B+ tree leaves are typically linked together, which makes range scans a sequential walk along the leaves.
Such an index is not very efficient for answering 2D queries; it can only fetch all data that matches the first dimension and then filter on the second dimension. One can map the 2D space into a 1D space (e.g., with a space-filling curve) and then do a linear scan, which likely touches many fragments. But more commonly, spatial indexes such as the R-tree are used to narrow down multiple dimensions simultaneously.
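As an illustration of the 2D-to-1D mapping idea (not of R-trees), here is a sketch of a Z-order (Morton) encoding, one well-known space-filling curve: the bits of the two coordinates are interleaved, so points near each other in 2D tend to get nearby 1D keys, with discontinuities at tile boundaries.

```python
def z_order(x, y, bits=16):
    # Interleave the bits of x and y into one integer:
    # bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z
```

Indexing on `z_order(x, y)` turns a 2D box query into a handful of 1D range scans over the curve, which is where the "many fragments" caveat comes from.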
Coming back to the in-memory hash index: what if we keep an in-memory map of keys to values to serve reads and writes? When there are too many keys, we must flush the in-memory map to disk. Before writing to disk, we'd better reformat the content so that lookups can be performed efficiently later: we sort the keys and write them out as SSTables (Sorted String Tables). The sorted property allows us to optimize in several ways.
This is an LSM tree (log-structured merge-tree).
Checksums can be used to detect corrupted data (partially written records).
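One common framing (a sketch under my own record layout, not any particular engine's format): prefix each record with its length and a CRC32, so that on recovery a truncated or corrupted record is detected instead of silently read.

```python
import struct
import zlib

def write_record(f, payload: bytes):
    # Header: 4-byte big-endian length + 4-byte CRC32 of the payload.
    f.write(struct.pack(">II", len(payload), zlib.crc32(payload)))
    f.write(payload)

def read_record(f):
    header = f.read(8)
    if len(header) < 8:
        return None  # clean end of log (or a torn header: stop here)
    length, crc = struct.unpack(">II", header)
    payload = f.read(length)
    if len(payload) < length or zlib.crc32(payload) != crc:
        # The record was only partially written or has bit rot.
        raise ValueError("corrupt or partially written record")
    return payload
```

On crash recovery, a reader walks the log record by record and truncates at the first record that fails its checksum.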