Finding a needle in Haystack: Facebook's photo storage
Created on 2021-11-17T21:03:52-06:00
Background
NFS-based storage has significant overhead, largely from filesystem metadata lookups on every read, which degrades its performance at web scale.
Haystack is designed as an object store for data that is written once, read often, and rarely deleted.
Structure
Space on hard drives is divided into physical volumes.
Logical volumes group physical volumes on different machines, so one logical volume spans several machines.
The Directory keeps track of which physical volumes belong to each logical volume.
Retrieving a file starts by asking the Directory for it. The Directory returns a URL that points to a CDN and also names a fallback: a specific server node and logical volume. The CDN tries to serve the file; on failure, the request falls back to the machine and volume named in the URL.
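The lookup and fallback path can be sketched roughly as follows. This is a toy model, not Facebook's implementation: the URL layout `http://<cdn>/<machine>/<logical volume>/<photo id>`, the `cdn.example.com` host, and the names `Directory`, `url_for`, `parse_url`, and `fetch` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Toy Directory: maps logical volumes to the physical volumes behind them."""
    # logical volume id -> list of (machine, physical volume path); hypothetical layout
    volumes: dict = field(default_factory=dict)
    # photo id -> logical volume id
    photos: dict = field(default_factory=dict)

    def url_for(self, photo_id, cdn="cdn.example.com"):
        """Build a URL that names the CDN first and the fallback machine/volume after it."""
        logical = self.photos[photo_id]
        machine, _physical = self.volumes[logical][0]  # assumption: use the first machine listed
        return f"http://{cdn}/{machine}/{logical}/{photo_id}"


def parse_url(url):
    """Split the hypothetical URL back into (cdn, machine, logical volume, photo id)."""
    cdn, machine, logical, photo = url[len("http://"):].split("/")
    return cdn, machine, int(logical), int(photo)


def fetch(url, cdn_cache, stores):
    """Serve from the CDN if it has the photo, else fall back to the named machine."""
    _cdn, machine, logical, photo = parse_url(url)
    if photo in cdn_cache:
        return cdn_cache[photo]
    data = stores[machine][(logical, photo)]  # fallback: machine + logical volume
    cdn_cache[photo] = data  # the CDN keeps a copy for later requests
    return data
```

Because the fallback location is encoded in the URL itself, the CDN does not need to consult the Directory again when it misses.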
Haystack is also designed to run on the XFS filesystem, chosen for its low overhead.
Volumes
Volumes are large append-only flat files which contain headers and data.
Data is stored by appending records (the 'needles') to the end of the volume.
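After an append, the needle's access keys and its offset inside the volume are recorded in an in-memory index on the machine that holds the volume (separate from the Directory described above).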
Data is read by opening the volume and seeking directly to the needle's recorded offset.
Updates append a new needle with the same key instead of overwriting the old one, then point the key at the new offset. When conflicting needles exist, the one at the highest offset wins.
Deletes involve setting a deletion flag in the needle.
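The volume operations above can be sketched as a minimal in-memory model. The needle layout here (an 8-byte key, a 1-byte flags field, a 4-byte size, then the payload) and the names `Volume`, `HEADER`, and `FLAG_DELETED` are assumptions for illustration; the real on-disk format has more fields, and this sketch omits the volume header entirely.

```python
import io
import struct

HEADER = struct.Struct("<QBI")  # key, flags, payload size (hypothetical layout)
FLAG_DELETED = 0x01


class Volume:
    """Toy append-only volume backed by an in-memory file."""

    def __init__(self):
        self.file = io.BytesIO()
        self.index = {}  # key -> offset of the newest needle for that key

    def append(self, key, data):
        """Write a needle at the end of the volume and record its offset."""
        offset = self.file.seek(0, io.SEEK_END)
        self.file.write(HEADER.pack(key, 0, len(data)) + data)
        self.index[key] = offset  # an update simply re-points the key
        return offset

    def read(self, key):
        """Seek straight to the recorded offset; return None if missing or deleted."""
        offset = self.index.get(key)
        if offset is None:
            return None
        self.file.seek(offset)
        _key, flags, size = HEADER.unpack(self.file.read(HEADER.size))
        if flags & FLAG_DELETED:
            return None
        return self.file.read(size)

    def delete(self, key):
        """Set the deletion flag inside the needle; the payload stays in the file."""
        offset = self.index.get(key)
        if offset is None:
            return False
        self.file.seek(offset + 8)  # the flags byte follows the 8-byte key
        self.file.write(bytes([FLAG_DELETED]))
        return True

    def rebuild_index(self):
        """Scan the volume front to back; later needles override earlier ones."""
        index = {}
        self.file.seek(0)
        while True:
            offset = self.file.tell()
            header = self.file.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key, flags, size = HEADER.unpack(header)
            self.file.seek(size, io.SEEK_CUR)  # skip over the payload
            if flags & FLAG_DELETED:
                index.pop(key, None)
            else:
                index[key] = offset
        self.index = index
```

In this sketch, `rebuild_index` is where "highest offset wins" becomes concrete: scanning in file order means the last needle seen for a key is the one the index keeps. The deletion flag is the only in-place write; everything else is an append.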