One of the major challenges in modern artificial intelligence is understanding
how deep neural networks learn features during training. Recently,
our team at Unipr obtained analytical results, based on a statistical-physics
approach, in the so-called proportional limit, where the size of the training set
and the size of the hidden layers are taken to infinity simultaneously while
keeping their ratio finite. In this regime feature learning occurs non-perturbatively,
as a renormalization of the infinite-width Neural Network Gaussian
Process (NNGP) kernel, which depends on the topology and weight-sharing
properties of the architecture under consideration. Closely related to these
results, the most urgent question to address is whether our effective
theory for Bayesian learning (which is equivalent to the standard equilibrium
canonical ensemble in statistical physics) delivers at least qualitative insight
into the out-of-equilibrium training algorithms routinely employed by
practitioners, such as stochastic gradient descent (SGD).
The goal of the project is to provide a preliminary answer to this challenging
question, through a calibrated comparison of feature-learning effects in DNNs
in and out of equilibrium.
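As a purely illustrative sketch (not taken from the cited works), the snippet below shows one way such a comparison could be set up numerically: a one-hidden-layer ReLU network is initialized so that its hidden-layer feature kernel matches the infinite-width NNGP kernel, then trained by gradient descent, and the departure of the trained kernel from the NNGP kernel is measured at finite ratio alpha = P/N. The data, the network sizes P, N, D, the full-batch optimizer (a stand-in for SGD), and all hyperparameters are hypothetical choices made only for this sketch.

```python
# Minimal, illustrative sketch: how much does the hidden-layer kernel of a
# one-hidden-layer ReLU network move away from the infinite-width NNGP kernel
# after gradient-descent training, at finite ratio alpha = P / N?
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

P, N, D = 200, 400, 10            # training-set size, hidden width, input dimension (hypothetical)
alpha = P / N                     # kept finite in the proportional limit

# Synthetic regression data from a random linear teacher (hypothetical task).
X = torch.randn(P, D)
w_star = torch.randn(D, 1) / D ** 0.5
y = X @ w_star + 0.1 * torch.randn(P, 1)

def nngp_kernel_relu(X):
    """Infinite-width NNGP kernel of one hidden ReLU layer (arc-cosine kernel of order 1)."""
    Xn = X.numpy()
    G = Xn @ Xn.T / Xn.shape[1]                   # pre-activation covariance for weights ~ N(0, 1/D)
    norms = np.sqrt(np.diag(G))
    cos = np.clip(G / np.outer(norms, norms), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(norms, norms) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def feature_kernel(model, X):
    """Empirical kernel built from the hidden-layer features of a finite-width network."""
    with torch.no_grad():
        phi = torch.relu(model[0](X))             # hidden activations, shape (P, N)
        return (phi @ phi.T / phi.shape[1]).numpy()

model = nn.Sequential(nn.Linear(D, N, bias=False), nn.ReLU(), nn.Linear(N, 1, bias=False))
with torch.no_grad():
    model[0].weight.normal_(0.0, D ** -0.5)       # match the Gaussian prior assumed by the NNGP kernel
    model[2].weight.normal_(0.0, N ** -0.5)

K_init = feature_kernel(model, X)                 # at initialization this estimates the NNGP kernel

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for step in range(5000):                          # full-batch gradient descent, used here as a stand-in for SGD
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

K_nngp = nngp_kernel_relu(X)
K_trained = feature_kernel(model, X)

def rel_dist(A, B):
    """Relative Frobenius distance between two kernels."""
    return np.linalg.norm(A - B) / np.linalg.norm(B)

print(f"alpha = P/N = {alpha:.2f}")
print(f"init kernel vs NNGP:    {rel_dist(K_init, K_nngp):.3f}   (finite-width fluctuation)")
print(f"trained kernel vs NNGP: {rel_dist(K_trained, K_nngp):.3f}   (feature-learning signal)")
```

A calibrated study along the lines of the project would replace the full-batch optimizer above with genuine SGD (and, on the equilibrium side, with Langevin-type sampling of the Bayesian posterior), and compare the resulting kernel renormalization with the analytical predictions of the effective theory.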
Supervisors
Alessandro Vezzani alessandro.vezzani@unipr.it
Pietro Rotondo pietro.rotondo@unipr.it
Bibliography
“Predictive power of a Bayesian effective action for fully-connected one hidden layer neural networks in the proportional limit”, Baglioni P, Pacelli R, Aiudi R, Di Renzo F, Vezzani A, Burioni R, Rotondo P, arXiv:2401.11004 (2024), to appear in Physical Review Letters.
“Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks”, Aiudi R, Pacelli R, Vezzani A, Burioni R, Rotondo P, arXiv:2307.11807 (2023).