Double descent


Figure: Double descent in a two-layer neural network (Figure 3a from Rocks et al. 2022). As the ratio of parameters to data points increases, the test error first falls, then rises, then falls again.[1] The vertical line marks the "interpolation threshold" between the underparameterized region (more data points than parameters) and the overparameterized region (more parameters than data points).

Double descent in statistics and machine learning is the phenomenon in which a model with a small number of parameters and a model with an extremely large number of parameters can both achieve low test error, but a model whose number of parameters is roughly equal to the number of training data points has a much greater test error than one with a much larger number of parameters.[2] This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.[3]

History

Observations of what would later be called double descent were made in specific models as early as 1989.[4][5]

The term "double descent" was coined by Belkin et al.[6] in 2019,[3] when the phenomenon gained attention as a broader pattern exhibited by many models.[7][8] This development was prompted by a perceived contradiction between the conventional wisdom that too many parameters cause significant overfitting error (an extrapolation of the bias–variance tradeoff),[9] and the empirical observation in the 2010s that some modern machine learning techniques tend to perform better with larger models.[6][10]

Theoretical models

Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.[11]
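A minimal numerical sketch of this kind of setting (illustrative, not taken from the cited analysis): fit ordinary least squares on the first d of D isotropic Gaussian features, using the minimum-norm solution when the system is underdetermined, and track the test error as d sweeps past the number of training points n. The peak near d = n is the interpolation threshold; all names and parameter values below are arbitrary choices for the demonstration.

# Illustrative sketch of double descent in misspecified least squares on isotropic
# Gaussian data: the fitted model uses only the first d of D features, so the test
# error typically falls, spikes near the interpolation threshold d = n_train, and
# then falls again in the overparameterized regime. Parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, D = 40, 2000, 400        # training points, test points, total features
beta = rng.normal(size=D) / np.sqrt(D)    # true coefficients (unit-scale signal)
sigma = 0.5                               # noise standard deviation

X_train = rng.normal(size=(n_train, D))   # isotropic Gaussian covariates
X_test = rng.normal(size=(n_test, D))
y_train = X_train @ beta + sigma * rng.normal(size=n_train)
y_test = X_test @ beta + sigma * rng.normal(size=n_test)

for d in [5, 10, 20, 35, 40, 45, 60, 100, 200, 400]:
    # Least squares on the first d features; np.linalg.pinv returns the
    # minimum-norm solution when d > n_train (the overparameterized regime).
    beta_hat = np.linalg.pinv(X_train[:, :d]) @ y_train
    test_mse = np.mean((X_test[:, :d] @ beta_hat - y_test) ** 2)
    print(f"d = {d:4d}  test MSE = {test_mse:.3f}")

Run as a script, the printed errors typically decrease, spike around d = 40, and decrease again, reproducing the qualitative double-descent shape.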

A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically.[12]

Empirical examples

The scaling behavior of double descent has been found to follow a broken neural scaling law[13] functional form.
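A broken neural scaling law is, roughly, a power law whose exponent changes smoothly at one or more break points, which lets a single fitted curve pass through non-monotonic stretches such as the double-descent bump. The sketch below shows a generic smoothly broken power law with a single break; the parameterization and symbol names are illustrative and may not match the cited paper exactly.

# Sketch of a smoothly broken power law with a single break, the general shape used
# by broken neural scaling laws; treat the exact parameterization as illustrative.
import numpy as np

def broken_power_law(x, a, b, c0, c1, d1, f1):
    """Smoothly transitions from log-log slope -c0 to -(c0 + c1) around the break d1.

    a  : asymptotic floor (e.g. irreducible error)
    b  : overall scale
    c0 : power-law exponent before the break
    c1 : additional exponent after the break (negative values give a rising segment)
    d1 : location of the break on the x-axis
    f1 : sharpness of the transition (larger = smoother)
    """
    return a + b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

x = np.logspace(0, 6, 7)                      # e.g. model size, data size, or compute
print(broken_power_law(x, a=0.1, b=1.0, c0=0.2, c1=0.3, d1=1e3, f1=0.5))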

References




  1. Rocks, Jason W.; Mehta, Pankaj (2022). "Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models". Physical Review Research 4 (1): 013201.
  2. Citation not preserved in this copy.
  3. Citation not preserved in this copy.
  4. Citation not preserved in this copy.
  5. Citation not preserved in this copy.
  6. Belkin, Mikhail; Hsu, Daniel; Ma, Siyuan; Mandal, Soumik (2019). "Reconciling modern machine-learning practice and the classical bias–variance trade-off". Proceedings of the National Academy of Sciences 116 (32): 15849–15854.
  7. Citation not preserved in this copy.
  8. Citation not preserved in this copy.
  9. Citation not preserved in this copy.
  10. Citation not preserved in this copy.
  11. Citation not preserved in this copy.
  12. Citation not preserved in this copy.
  13. Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.