New Preprint: Low-Rank Structure Is Sufficient for Global Convergence (Mean-Field)

We study low-rank neural networks with frozen random features in the mean-field regime. Main message: whenever the mean-field dynamics converges, it converges only to global minimizers of the population loss, for any depth $L \ge 2$ and under standard i.i.d. initialization, with no ad-hoc initialization scheme required.
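For readers newer to the setting, the mean-field dynamics referred to here is, schematically, a Wasserstein gradient flow of the population loss $\mathcal{L}$ over the law of the trainable weights. Writing $\mu_t$ for that law at training time $t$ (my notation, and a single-layer schematic; the paper's multi-layer form may differ):

$$\partial_t \mu_t = \nabla_w \cdot \Big( \mu_t \, \nabla_w \frac{\delta \mathcal{L}}{\delta \mu}[\mu_t](w) \Big), \qquad \mu_0 = \text{law of the i.i.d. initialization of } w.$$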

Architecture. RF-LR: the random feature maps and mixing matrices are frozen at initialization; only the per-layer channel weights $w_\ell$ are trained. Universal approximation is preserved, and the rank-$r$ bottleneck reduces the trainable parameter count per layer from $O(N^2)$ to $O(rN)$.
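A minimal sketch of the parameter layout this implies, assuming a PyTorch-style module; the class and attribute names (`RFLowRankLayer`, `feature_map`, `mixing`, `channel_weights`) and the specific shapes are illustrative guesses, not the paper's code.

```python
import torch
import torch.nn as nn

class RFLowRankLayer(nn.Module):
    """Sketch of one RF-LR layer: the random feature map and the rank-r
    mixing matrix are frozen buffers; only the channel weights are trained,
    giving r*N trainable parameters instead of N^2."""

    def __init__(self, width_N: int, rank_r: int):
        super().__init__()
        # Frozen random feature map and mixing matrix (registered as buffers,
        # so the optimizer never updates them).
        self.register_buffer("feature_map", torch.randn(width_N, width_N) / width_N**0.5)
        self.register_buffer("mixing", torch.randn(rank_r, width_N) / width_N**0.5)
        # Trainable channel weights: the only learned parameters of the layer.
        self.channel_weights = nn.Parameter(torch.randn(width_N, rank_r) / rank_r**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(x @ self.feature_map.T)    # frozen random features
        channels = h @ self.mixing.T              # rank-r bottleneck (frozen)
        return channels @ self.channel_weights.T  # trainable channel weights
```

Only `channel_weights` is an `nn.Parameter`, so the optimizer sees $rN$ weights per layer, matching the $O(rN)$ count above.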

Main results.

  1. Global convergence: If the mean-field dynamics converges, the limit is a global minimizer of the population loss. In contrast to full-rank analyses, no special initialization is required; the frozen random features keep $\text{supp}(L^0) = \mathbb{R}^d$ throughout training.

  2. Rank-channel feature learning: Each low-rank channel specializes to a distinct spatial location, and channels separate by frequency (lower first, then higher), so the $r$ channels jointly exhibit spatial localization and frequency separation; a diagnostic sketch of what this means in practice follows the list.

  3. Quantitative approximation: Finite-width error $O(1/\sqrt{n_{\min}} + \sqrt{\epsilon})$ with a $(1+rK)$ factor in the Grönwall constant.
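On item 2, here is a hedged sketch of how one could make "frequency separation" concrete: evaluate a channel's response on a uniform 1-D grid and read off its dominant Fourier frequency. This is a reading aid only; `dominant_frequency` and the synthetic channel responses below are illustrative, not the paper's experimental code.

```python
import torch

def dominant_frequency(channel_values: torch.Tensor, grid_spacing: float) -> float:
    """Return the frequency (cycles per unit length) with the largest
    Fourier magnitude, ignoring the zero-frequency (mean) component."""
    spectrum = torch.fft.rfft(channel_values - channel_values.mean())
    freqs = torch.fft.rfftfreq(channel_values.numel(), d=grid_spacing)
    return freqs[spectrum.abs().argmax()].item()

# Synthetic stand-ins for two channel responses on a uniform grid over [0, 1].
x = torch.linspace(0.0, 1.0, 1024)
low_channel = torch.sin(2 * torch.pi * 3 * x)    # "low-frequency" channel
high_channel = torch.sin(2 * torch.pi * 40 * x)  # "high-frequency" channel
dx = (x[1] - x[0]).item()
print(dominant_frequency(low_channel, dx))   # ~3
print(dominant_frequency(high_channel, dx))  # ~40
```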

Experiments. On MNIST, RF-LR reaches ~97% test accuracy with 13k–31k trainable parameters, vs. ~98% for an MLP baseline with 669k. On highly oscillatory target functions, it converges substantially faster, reaching MSE $\sim 10^{-6}$ vs. $\sim 10^{-5}$ for the full-rank baseline while using 95%–99% fewer trainable parameters.
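To see where savings of that order come from, a quick back-of-the-envelope count with purely hypothetical sizes (the width, rank, and depth below are not the paper's configurations):

```python
# Hypothetical sizes for illustration only; not the paper's configurations.
N, r, depth = 512, 16, 4

dense_per_layer = N * N   # full-rank layer: N^2 trainable weights
rf_lr_per_layer = r * N   # RF-LR layer: only the rank-r channel weights train
reduction = 1 - rf_lr_per_layer / dense_per_layer

print(f"dense total:  {depth * dense_per_layer:,}")  # 1,048,576
print(f"RF-LR total:  {depth * rf_lr_per_layer:,}")  # 32,768
print(f"reduction: {reduction:.1%}")                 # 96.9%
```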

Paper: Low-Rank Neural Network Structure Is Sufficient for Global Convergence: A Mean-Field Perspective. Joint work with Haizhao Yang and Shijun Zhang.