Hyperband hyperparameter tuning algorithm
Created on 2022-05-30T21:37:59-05:00
Attempts to speed up "random search" methods, which are attractive for their conceptual simplicity, by adaptively allocating resources among candidate configurations.
Successive halving: run a set of candidates under a fixed budget, evaluate them, and throw out the worst-performing half; repeat with more resource per survivor until few candidates remain.
Works by rationing the amount of a resource used by each candidate.
The rationed resource can be training epochs for a neural network, the amount of RAM, or the amount of data used for training.
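A minimal Python sketch of successive halving, assuming a toy evaluate(config, resource) stand-in for real training; the learning-rate search space and the halving factor of 2 are illustrative choices, not from the paper:

    import random

    def evaluate(config, resource):
        # Toy objective: pretend lr near 0.1 is best, and that more
        # resource (e.g. epochs) reduces evaluation noise.
        noise = random.gauss(0, 1.0 / resource)
        return abs(config["lr"] - 0.1) + noise

    def successive_halving(configs, min_resource=1, factor=2):
        """Evaluate all survivors, keep the best half, double the resource."""
        resource = min_resource
        while len(configs) > 1:
            scored = [(evaluate(c, resource), c) for c in configs]
            scored.sort(key=lambda pair: pair[0])
            configs = [c for _, c in scored[: max(1, len(configs) // factor)]]
            resource *= factor  # survivors get more resource next round
        return configs[0]

    candidates = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(16)]
    print("best candidate:", successive_halving(candidates))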
Algorithm
    s_max = floor(log_nu(R))
    B = (s_max + 1) * R                  # total budget per bracket
    for s in {s_max, s_max - 1, ..., 0}:
        n = ceil((B / R) * nu^s / (s + 1))
        r = R * nu^(-s)
        T = get_hyperparameter_configuration(n)
        for i in {0, ..., s}:
            n_i = floor(n * nu^(-i))
            r_i = r * nu^i
            L = {run_then_return_val_loss(t, r_i) : t in T}
            T = top_k(T, L, floor(n_i / nu))
- R: Maximum amount of resource to allocate to a single configuration
- nu: input that controls the proportion of configurations discarded each round (default is 3, i.e. only the top 1/3 survive a round)
- get_hyperparameter_configuration(n): samples n configurations for the network to be trained. The paper samples from a uniform distribution, but smarter distributions are allowed and may work better.
- run_then_return_val_loss(t, r_i): trains a given configuration under a given resource cap and returns its validation loss
- s_max is set to floor(log_nu(R)); s_max + 1 brackets are run, each a successive-halving run with a different tradeoff between the number of configurations and the resource per configuration
- top_k(T, L, k): returns the k best configurations, given the configurations and their loss scores
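Worked numbers: with R = 81 and nu = 3, s_max = floor(log_3(81)) = 4 and B = 5 * 81 = 405, so five brackets run, from (n, r) = (81, 1) at s = 4 down to (n, r) = (5, 81) at s = 0. Below is a minimal Python sketch of the loop above; get_hyperparameter_configuration and run_then_return_val_loss are toy stand-ins you would replace with a real sampler and training routine:

    import math
    import random

    def get_hyperparameter_configuration(n):
        # Toy sampler: uniform over log learning rate (the paper samples uniformly).
        return [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(n)]

    def run_then_return_val_loss(config, resource):
        # Toy trainer: noisy loss whose noise shrinks as the resource grows.
        return abs(config["lr"] - 0.1) + random.gauss(0, 1.0 / resource)

    def top_k(configs, losses, k):
        order = sorted(range(len(configs)), key=lambda i: losses[i])
        return [configs[i] for i in order[:k]]

    def hyperband(R=81, nu=3):
        s_max = int(math.log(R, nu) + 1e-9)  # epsilon guards against float error
        B = (s_max + 1) * R                  # total budget per bracket
        best, best_loss = None, float("inf")
        for s in range(s_max, -1, -1):       # one bracket per s, most to least aggressive
            n = int(math.ceil((B / R) * nu ** s / (s + 1)))
            r = R * nu ** (-s)
            T = get_hyperparameter_configuration(n)
            for i in range(s + 1):           # successive halving within the bracket
                n_i = int(math.floor(n * nu ** (-i)))
                r_i = r * nu ** i
                L = [run_then_return_val_loss(t, r_i) for t in T]
                for loss, t in zip(L, T):
                    if loss < best_loss:
                        best, best_loss = t, loss
                T = top_k(T, L, int(math.floor(n_i / nu)))
        return best, best_loss

    print(hyperband())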
Envelopes
The envelope is formed from the standard deviation of a model's score during training.
If the envelopes do not overlap at termination, the models can be considered meaningfully different (see the sketch below).
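A minimal sketch of the overlap check, assuming each model's recent validation scores are collected in a list and that the envelope width is one standard deviation around the mean (the note does not pin the width down):

    import statistics

    def envelope(scores):
        """Return the (low, high) band: mean +/- one standard deviation."""
        mu = statistics.mean(scores)
        sd = statistics.stdev(scores)
        return mu - sd, mu + sd

    def meaningfully_different(scores_a, scores_b):
        """Models differ meaningfully if their envelopes do not overlap."""
        lo_a, hi_a = envelope(scores_a)
        lo_b, hi_b = envelope(scores_b)
        return hi_a < lo_b or hi_b < lo_a

    model_a = [0.81, 0.82, 0.83, 0.82]  # hypothetical validation accuracies
    model_b = [0.70, 0.72, 0.71, 0.73]
    print(meaningfully_different(model_a, model_b))  # True: the envelopes are disjoint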