Finding a needle in Haystack: Facebook's photo storage

Created on 2021-11-17T21:03:52-06:00

This card pertains to a resource available on the internet.

Background

NFS-based photo storage has significant per-file metadata overhead, which degrades its performance at web scale.

Haystack is designed as an object store for data that is written once, read many times, and rarely deleted.

Structure

Space on hard drives is organized into physical volumes.

Logical volumes bridge physical volumes across machines.

The Directory keeps track of which physical volumes are associated with each logical volume.
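A minimal sketch of that bookkeeping, assuming the Directory is nothing more than an in-memory map from logical volume ids to the physical volumes behind them. The class names, host names, and paths are hypothetical and do not come from Haystack itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalVolume:
    machine: str  # hypothetical host name of the storage machine
    path: str     # hypothetical path of the volume file on that host


@dataclass
class Directory:
    # logical volume id -> physical volumes associated with it
    logical_to_physical: dict[int, list[PhysicalVolume]] = field(default_factory=dict)

    def register(self, logical_id: int, volume: PhysicalVolume) -> None:
        """Associate one more physical volume with a logical volume."""
        self.logical_to_physical.setdefault(logical_id, []).append(volume)

    def replicas(self, logical_id: int) -> list[PhysicalVolume]:
        """Return the physical volumes for a logical volume (empty if unknown)."""
        return list(self.logical_to_physical.get(logical_id, []))


directory = Directory()
directory.register(7, PhysicalVolume("store-a", "/hay/volume_7"))
directory.register(7, PhysicalVolume("store-b", "/hay/volume_7"))
```

Here logical volume 7 spans two machines, so a lookup for it returns both physical volumes.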

Retrieving a file starts by asking the Directory for it. The Directory returns a URL that points to a CDN and also names a specific server node and logical volume as a fallback. The CDN tries to serve the file; if it cannot, the request falls back to the machine and volume named in the URL.
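The fallback can be sketched as follows. The URL layout, the function names, and the two fetcher callables are assumptions made for illustration; the real URL format may carry more components than shown here.

```python
from typing import Callable, Optional

# A fetcher takes a location string and returns the bytes, or None on failure.
Fetcher = Callable[[str], Optional[bytes]]


def build_url(cdn_host: str, machine: str, logical_volume: int, photo_id: int) -> str:
    """Hypothetical layout: the CDN host first, then the fallback location."""
    return f"http://{cdn_host}/{machine}/{logical_volume}/{photo_id}"


def fetch_photo(url: str, cdn_fetch: Fetcher, store_fetch: Fetcher) -> Optional[bytes]:
    """Try the CDN first; on failure, use the machine and volume in the URL."""
    data = cdn_fetch(url)
    if data is not None:
        return data
    # Drop the scheme and the CDN host; what remains is machine/volume/photo.
    _, _, rest = url.partition("://")
    _, _, fallback_path = rest.partition("/")
    return store_fetch(fallback_path)


url = build_url("cdn.example", "store-a", 7, 12345)
```

Because the fallback location travels inside the URL, the CDN does not need to ask the Directory again when it cannot serve the file.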

Haystack is also designed to run on an XFS filesystem, because XFS has low overhead.

Volumes

Volumes are large append-only flat files which contain headers and data.

Data is stored by appending records (the 'needles') to the end of the volume.

After each append, the needle's access keys and its offset inside the volume are kept in the directory.

Data is found by opening the volume and seeking directly to the offset of the wanted needle.
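A rough sketch of the append and read paths, assuming a made-up needle layout (key, alternate key, flags, payload size, then the payload) and a plain dict standing in for wherever the keys and offsets are kept. An in-memory `io.BytesIO` plays the role of the volume file.

```python
import io
import struct

# Hypothetical needle header: 8-byte key, 4-byte alternate key,
# 1-byte flags, 4-byte payload size (big-endian, no padding).
HEADER = struct.Struct(">QIBI")


def append_needle(volume, index, key, alt_key, payload):
    """Append a needle at the end of the volume and record its offset."""
    volume.seek(0, io.SEEK_END)
    offset = volume.tell()
    volume.write(HEADER.pack(key, alt_key, 0, len(payload)))
    volume.write(payload)
    index[(key, alt_key)] = offset
    return offset


def read_needle(volume, index, key, alt_key):
    """Seek straight to the recorded offset and read one needle's payload."""
    volume.seek(index[(key, alt_key)])
    _key, _alt, _flags, size = HEADER.unpack(volume.read(HEADER.size))
    return volume.read(size)


volume = io.BytesIO()  # stands in for the large flat volume file
index = {}             # (key, alt_key) -> offset inside the volume
append_needle(volume, index, 1, 0, b"first photo")
append_needle(volume, index, 2, 0, b"second photo")
```

A read costs one index lookup and one seek, with no per-file filesystem metadata involved.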

Updates append a new needle carrying the new payload under the same keys instead of rewriting the old one in place. When conflicting needles exist, the one at the highest offset wins.

Deletes involve setting a deletion flag in the needle.
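The update and delete rules can be illustrated by rebuilding an index from a scan of the volume. This is only a sketch under assumptions: the needle layout (key, flags, payload size), the `FLAG_DELETED` bit, and all function names are hypothetical.

```python
import io
import struct

HEADER = struct.Struct(">QBI")  # hypothetical: key, flags, payload size
FLAG_DELETED = 0x01             # hypothetical deletion bit


def append(volume, key, payload):
    """Append a needle and return the offset it was written at."""
    volume.seek(0, io.SEEK_END)
    offset = volume.tell()
    volume.write(HEADER.pack(key, 0, len(payload)) + payload)
    return offset


def mark_deleted(volume, offset):
    """Set the deletion flag in the header of the needle at this offset."""
    volume.seek(offset)
    key, flags, size = HEADER.unpack(volume.read(HEADER.size))
    volume.seek(offset)
    volume.write(HEADER.pack(key, flags | FLAG_DELETED, size))


def rebuild_index(volume):
    """Scan the volume front to back; for each key the last needle wins."""
    index = {}
    volume.seek(0)
    while True:
        offset = volume.tell()
        header = volume.read(HEADER.size)
        if len(header) < HEADER.size:
            break
        key, flags, size = HEADER.unpack(header)
        volume.seek(size, io.SEEK_CUR)  # skip over the payload
        if flags & FLAG_DELETED:
            index.pop(key, None)
        else:
            index[key] = offset  # a higher offset replaces a lower one
    return index


volume = io.BytesIO()
old = append(volume, 1, b"v1")
new = append(volume, 1, b"v2")   # update: same key, higher offset
gone = append(volume, 2, b"x")
mark_deleted(volume, gone)
index = rebuild_index(volume)
```

After the scan, key 1 points at the newer needle and key 2 is absent, even though the bytes of both the stale and the deleted needle still sit in the file.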