Zoned flash: The next big thing in enterprise SSDs

As good as SSDs are -- and for all the improvements they've made -- they're not perfect.

Three main issues:

  • Write amplification chews up the increasingly limited number of lifetime writes -- as few as 500 -- that newer (cheaper!) flash supports.
  • Over-provisioning of media to ensure sufficient lifetime writes and efficient garbage collection.
  • Costly DRAM buffers -- the BOM's highest cost after the flash -- to handle incoming writes and the physical address translation table.
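
To see why write amplification hurts at 500 lifetime writes, here is a rough back-of-the-envelope sketch. It assumes the usual definition of the write amplification factor (data written to flash divided by data written by the host). The 1TB capacity and the WAF values are illustrative assumptions, not measurements, and the function name is made up for this example.

```python
def lifetime_host_writes_tb(capacity_tb: float, pe_cycles: int, waf: float) -> float:
    """Approximate host data (in TB) a drive can absorb before the flash wears out.

    capacity_tb * pe_cycles is the total data the flash itself can take;
    dividing by the write amplification factor gives the host's share.
    """
    if waf < 1.0:
        raise ValueError("write amplification factor cannot be below 1")
    return capacity_tb * pe_cycles / waf


if __name__ == "__main__":
    capacity_tb = 1.0   # assumed drive size
    pe_cycles = 500     # the low-end figure quoted above
    for waf in (1.0, 2.0, 4.0):  # assumed amplification factors
        total = lifetime_host_writes_tb(capacity_tb, pe_cycles, waf)
        print(f"WAF {waf}: about {total:.0f} TB of host writes")
```

Under these assumptions, every doubling of write amplification halves the useful life of the drive, which is why cheaper flash makes the problem more pressing.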

Background

SSDs excel at random read workloads. Yet sequential-write and write-once/read-many patterns are increasingly common in hyperscale and big data workloads.

The Flash Translation Layer (FTL), which lets a block-based medium emulate a disk drive's 4KB sectors through read/program/erase I/O and garbage collection, is suboptimal for these newer workloads. Hence the work on Open-Channel and zoned flash drives.

Fortunately, this work builds on what has already been done for Shingled Magnetic Recording (SMR) drives, whose characteristics are logically similar to those of NAND flash. Like SMR, Open-Channel writes sequentially within a Logical Block Address (LBA) range, with the option of parallel units: the LBAs can be divided among different workloads and spread across the underlying media for maximum performance.

Zoned namespaces (ZNS)

ZNS divides SSD capacity into zones, and each zone is written sequentially, with the interface optimized for SSDs. The zone size is aligned to flash block sizes, and the zone capacity is aligned to physical media sizes.
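
The sequential-write rule can be pictured with a toy model. This is a minimal sketch, not the NVMe command set: the `Zone` class and its method names are hypothetical, and it assumes the common zoned-storage convention of a per-zone write pointer that advances on each write and is rewound by a zone reset.

```python
class Zone:
    """Toy model of one zone: writes must land exactly at the write pointer."""

    def __init__(self, start_lba: int, size: int, capacity: int):
        if capacity > size:
            raise ValueError("zone capacity cannot exceed zone size")
        self.start_lba = start_lba      # first LBA of the zone
        self.size = size                # LBA range covered by the zone
        self.capacity = capacity        # blocks that can actually be written
        self.write_pointer = start_lba  # next LBA that may be written

    def write(self, lba: int, nblocks: int) -> None:
        # Random writes inside a zone are rejected: only appends are allowed.
        if lba != self.write_pointer:
            raise ValueError("write must start at the zone's write pointer")
        if self.write_pointer + nblocks > self.start_lba + self.capacity:
            raise ValueError("write would exceed zone capacity")
        self.write_pointer += nblocks

    def reset(self) -> None:
        # Rewinding the pointer stands in for erasing the zone as a whole.
        self.write_pointer = self.start_lba


if __name__ == "__main__":
    zone = Zone(start_lba=0, size=8, capacity=6)
    zone.write(0, 4)                 # sequential append is accepted
    try:
        zone.write(0, 1)             # overwrite in place is not
    except ValueError as err:
        print("rejected:", err)
    zone.reset()                     # the whole zone is reclaimed at once
```

Because data is only ever appended and then discarded a zone at a time, the drive never has to relocate live pages out of a half-stale block.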

This enables full flash block writes, dramatically reducing the need for partial block updates (and the associated write amplification) and for garbage collection. For hyperscalers, the ZNS logical-to-physical mapping can be integrated with filesystems, which improves performance, reduces host overhead, and eliminates the requirement for 1GB of DRAM per terabyte of flash media.
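
One plausible way to arrive at the 1GB-per-terabyte figure is a mapping table with one entry per 4KB unit. The sketch below works through that arithmetic. The 4-byte entry size and the 1GiB zone size are assumptions chosen for illustration, and the function name is hypothetical.

```python
def mapping_table_bytes(flash_bytes: int, unit_bytes: int, entry_bytes: int = 4) -> int:
    """Size of a flat table holding one entry per mapped unit of flash."""
    return (flash_bytes // unit_bytes) * entry_bytes


KIB = 1 << 10
GIB = 1 << 30
TIB = 1 << 40

# Conventional FTL: one entry for every 4KB logical sector.
page_level = mapping_table_bytes(TIB, 4 * KIB)

# Zoned drive: one entry per zone (assumed 1GiB zones).
zone_level = mapping_table_bytes(TIB, GIB)

if __name__ == "__main__":
    print(f"4KB-granularity table: {page_level / GIB:.0f} GiB per TiB of flash")
    print(f"zone-granularity table: {zone_level / KIB:.0f} KiB per TiB of flash")
```

With these assumed sizes the table shrinks from DRAM scale to something that fits in a few kilobytes, which is the intuition behind dropping the large DRAM buffer.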

ZNS code is already in use at hyperscale sites and is available in Linux. It will be part of the next major NVMe spec, due for release next year.

The take

NVMe has largely replaced SATA on systems and high-performance notebooks. With the increasingly data-centric nature of today's workloads, anything that reduces latency, cost, and overhead is a Good Thing.

ZNS isn't likely to make sense on notebooks, but it will help our cloud infrastructures serve us faster and at a lower cost.

Comments welcome.  


