LSM Compaction Strategy

Overview

Compaction operations are expensive in terms of CPU, memory, and Disk I/O,而由于immutable特质,该操作在LSM架构上有必不可少。

image

Log Structured Merge (LSM)

data过来之后会写到memory table (MemTable),当mem满了之后,会flush到disk形成不可变的immutable Sorted String Table (SSTable)。当SSTable太多,os所打开的文件句柄也会过多,所以此时需要将多个同质的SSTable合并成一个SSTable。

image

leveldb architecture


Amplification

  • 写放大:一份数据被顺序写几次(还是那一份数据,只是需要一直往下流)。第一次写到L0,之后由于compaction可能在L1~LN各写一次
  • 读放大:一个query会在多个sstable,而这些sstable分布在L0~LN
  • 空间放大:一份原始数据被copy/overwritten(一份用于服务query,一份用于snapshot之后用于compaction)

Size-tiered Compaction

  • Triggered when the system has enough similarly sized SSTables, merged together to form one larger sstable. A disadvantage of this strategy is that very large SSTable will stay behind for a long time and utilize a lot of disk space (recommended for write-intensive workloads)
  • 每一个tiers的单片大小逐渐变大,但是每一个tiers的sstables数量一致
  • 如果某一个tier满了(即sstables数量达到阈值)就会进行compaction,从而将该tier的所有数据merge为一个然后丢给下一个tier作为下一个tier的一个sstable。而在这个merge的过程,会copy一份原数据snapshot用于merge,merge之后再删除

image

tiered (num same,size grow)


Leveled Compaction

  • Triggered when unleveled SSTables (newly flushed SSTable files in Level 0) exceeds 4 (recommended for read-intensive workloads)
  • 每一个tier里面的 sstable大小都是一致的,区别是每一个tier的sstable数量是逐渐变大的(一个数量级)
  • tier1里面的sstables会跟tier2的sstables一起进行merge操作,最终在tier2(量大者)上形成一个有序的sstable

image

leveled (num grow,size same)


Summary

image

Size-tiered Compaction vs. Leveled Compaction

  • data in one SSTable which is later modified or deleted in another SSTable wastes space as both tables are present in the system
  • when data is split across many SSTables, read requests are processed slower as many SSTables need to be read

image

Scylla compaction summary


Lucene Merge Policy

这里同样有三个放大问题,

  • 写放大(doc在segment之间的迁移)
  • 读放大(doc不同版本在不同segment,而打开一个segment需要一个indexReader)
  • 空间放大(segments tmp空间) lucene write-once vs. random write

Reference