Hyperband hyperparameter tuning algorithm
Created on 2022-05-30T21:37:59-05:00
Attempts to speed up "random search" methods, which are attractive for their conceptual simplicity, by adaptively allocating resources among candidate configurations.
Successive halving: run a set of candidates under a fixed budget, evaluate them, and throw out the worst-performing half; repeat with more resource per survivor until few candidates remain.
Works by rationing the amount of a resource used by each candidate.
The rationed resource can be training epochs for a neural network, the amount of RAM, or the amount of data used for training.
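A minimal Python sketch of successive halving, assuming a toy evaluate(config, resource) stand-in for real training; the learning-rate search space and the halving factor of 2 are illustrative choices, not from the paper:

    import random

    def evaluate(config, resource):
        # Toy objective: pretend lr near 0.1 is best, and that more
        # resource (e.g. epochs) reduces evaluation noise.
        noise = random.gauss(0, 1.0 / resource)
        return abs(config["lr"] - 0.1) + noise

    def successive_halving(configs, min_resource=1, factor=2):
        """Evaluate all survivors, keep the best half, double the resource."""
        resource = min_resource
        while len(configs) > 1:
            scored = [(evaluate(c, resource), c) for c in configs]
            scored.sort(key=lambda pair: pair[0])
            configs = [c for _, c in scored[: max(1, len(configs) // factor)]]
            resource *= factor  # survivors get more resource next round
        return configs[0]

    candidates = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(16)]
    print("best candidate:", successive_halving(candidates))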
Algorithm
    s_max = floor(log_nu(R))
    B = (s_max + 1) * R                  # total budget per bracket
    for s in {s_max, s_max - 1, ..., 0}:
        n = ceil((B / R) * nu^s / (s + 1))
        r = R * nu^(-s)
        T = get_hyperparameter_configuration(n)
        for i in {0, ..., s}:
            n_i = floor(n * nu^(-i))
            r_i = r * nu^i
            L = {run_then_return_val_loss(t, r_i) : t in T}
            T = top_k(T, L, floor(n_i / nu))
- R: Maximum amount of resource to allocate to a single configuration
- nu: input that controls the proportion of configurations discarded each round (default is 3, i.e. only the top 1/3 survive a round)
- get_hyperparameter_configuration(n): samples n configurations for the network to be trained. The paper samples from a uniform distribution, but smarter distributions are allowed and may work better.
- run_then_return_val_loss(t, r_i): trains a given configuration under a given resource cap and returns its validation loss
- s_max is set to floor(log_nu(R)); s_max + 1 brackets are run, each a successive-halving run with a different tradeoff between the number of configurations and the resource per configuration
- top_k(T, L, k): returns the k best configurations, given the configurations and their loss scores
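Worked numbers: with R = 81 and nu = 3, s_max = floor(log_3(81)) = 4 and B = 5 * 81 = 405, so five brackets run, from (n, r) = (81, 1) at s = 4 down to (n, r) = (5, 81) at s = 0. Below is a minimal Python sketch of the loop above; get_hyperparameter_configuration and run_then_return_val_loss are toy stand-ins you would replace with a real sampler and training routine:

    import math
    import random

    def get_hyperparameter_configuration(n):
        # Toy sampler: uniform over log learning rate (the paper samples uniformly).
        return [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(n)]

    def run_then_return_val_loss(config, resource):
        # Toy trainer: noisy loss whose noise shrinks as the resource grows.
        return abs(config["lr"] - 0.1) + random.gauss(0, 1.0 / resource)

    def top_k(configs, losses, k):
        order = sorted(range(len(configs)), key=lambda i: losses[i])
        return [configs[i] for i in order[:k]]

    def hyperband(R=81, nu=3):
        s_max = int(math.log(R, nu) + 1e-9)  # epsilon guards against float error
        B = (s_max + 1) * R                  # total budget per bracket
        best, best_loss = None, float("inf")
        for s in range(s_max, -1, -1):       # one bracket per s, most to least aggressive
            n = int(math.ceil((B / R) * nu ** s / (s + 1)))
            r = R * nu ** (-s)
            T = get_hyperparameter_configuration(n)
            for i in range(s + 1):           # successive halving within the bracket
                n_i = int(math.floor(n * nu ** (-i)))
                r_i = r * nu ** i
                L = [run_then_return_val_loss(t, r_i) for t in T]
                for loss, t in zip(L, T):
                    if loss < best_loss:
                        best, best_loss = t, loss
                T = top_k(T, L, int(math.floor(n_i / nu)))
        return best, best_loss

    print(hyperband())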
Envelopes
The envelope is formed from the standard deviation of a model's score during training.
If the envelopes do not overlap at termination, the models can be considered meaningfully different (see the sketch below).
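A minimal sketch of the overlap check, assuming each model's recent validation scores are collected in a list and that the envelope width is one standard deviation around the mean (the note does not pin the width down):

    import statistics

    def envelope(scores):
        """Return the (low, high) band: mean +/- one standard deviation."""
        mu = statistics.mean(scores)
        sd = statistics.stdev(scores)
        return mu - sd, mu + sd

    def meaningfully_different(scores_a, scores_b):
        """Models differ meaningfully if their envelopes do not overlap."""
        lo_a, hi_a = envelope(scores_a)
        lo_b, hi_b = envelope(scores_b)
        return hi_a < lo_b or hi_b < lo_a

    model_a = [0.81, 0.82, 0.83, 0.82]  # hypothetical validation accuracies
    model_b = [0.70, 0.72, 0.71, 0.73]
    print(meaningfully_different(model_a, model_b))  # True: the envelopes are disjoint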