LSM Compaction Strategy

Overview

Compaction operations are expensive in terms of CPU, memory, and Disk I/O，而由于immutable特质，该操作在LSM架构上有必不可少。

Log Structured Merge (LSM)

data过来之后会写到memory table (MemTable)，当mem满了之后，会flush到disk形成不可变的immutable Sorted String Table (SSTable)。当SSTable太多，os所打开的文件句柄也会过多，所以此时需要将多个同质的SSTable合并成一个SSTable。

leveldb architecture

Amplification

写放大：一份数据被顺序写几次（还是那一份数据，只是需要一直往下流）。第一次写到L0，之后由于compaction可能在L1~LN各写一次
读放大：一个query会在多个sstable，而这些sstable分布在L0~LN
空间放大：一份原始数据被copy/overwritten（一份用于服务query，一份用于snapshot之后用于compaction）

Size-tiered Compaction

Triggered when the system has enough similarly sized SSTables, merged together to form one larger sstable. A disadvantage of this strategy is that very large SSTable will stay behind for a long time and utilize a lot of disk space (recommended for write-intensive workloads)
每一个tiers的单片大小逐渐变大，但是每一个tiers的sstables数量一致
如果某一个tier满了（即sstables数量达到阈值）就会进行compaction，从而将该tier的所有数据merge为一个然后丢给下一个tier作为下一个tier的一个sstable。而在这个merge的过程，会copy一份原数据snapshot用于merge，merge之后再删除

tiered (num same，size grow)

Leveled Compaction

Triggered when unleveled SSTables (newly flushed SSTable files in Level 0) exceeds 4 (recommended for read-intensive workloads)
每一个tier里面的 sstable大小都是一致的，区别是每一个tier的sstable数量是逐渐变大的（一个数量级）
tier1里面的sstables会跟tier2的sstables一起进行merge操作，最终在tier2（量大者）上形成一个有序的sstable

leveled (num grow，size same)

Summary

Size-tiered Compaction vs. Leveled Compaction

data in one SSTable which is later modified or deleted in another SSTable wastes space as both tables are present in the system
when data is split across many SSTables, read requests are processed slower as many SSTables need to be read

Scylla compaction summary

Lucene Merge Policy

这里同样有三个放大问题，

写放大（doc在segment之间的迁移）
读放大（doc不同版本在不同segment，而打开一个segment需要一个indexReader）
空间放大（segments tmp空间） lucene write-once vs. random write

Reference

PREVIOUSes Snapshot and Restore

NEXT20181216深马&1223汕马