Artificial neural variability for deep learning: on overfitting, noise memorization, and catastrophic forgetting

Xie, Zeke; He, Fengxiang; Fu, Shaopeng; Sato, Issei; Tao, Dacheng; Sugiyama, Masashi

doi:10.1162/neco_a_01403

Journal Article

Citation	Xie Z, He F, Fu S, Sato I, Tao D, Sugiyama M. Neural Comput. 2021; 33(8): 2163-2192.
Copyright	(Copyright © 2021, MIT Press)
DOI	10.1162/neco_a_01403
PMID	unavailable
Abstract	Deep learning is often criticized for two serious issues that rarely exist in natural nervous systems: overfitting and catastrophic forgetting. A deep network can even memorize randomly labeled data, in which the instance-label pairs carry little knowledge. When a deep network learns continually over time by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. It is well known in neuroscience that human brain reactions exhibit substantial variability even in response to the same stimulus, a phenomenon referred to as neural variability. This mechanism balances accuracy and plasticity/flexibility in the motor learning of natural nervous systems. It motivates us to design a similar mechanism, named artificial neural variability (ANV), that helps artificial neural networks learn some advantages from "natural" neural networks. We rigorously prove that ANV acts as an implicit regularizer of the mutual information between the training data and the learned model. This result theoretically guarantees that ANV strictly improves generalizability, robustness to label noise, and robustness to catastrophic forgetting. We then devise a neural variable risk minimization (NVRM) framework and neural variable optimizers to achieve ANV for conventional network architectures in practice. The empirical studies demonstrate that NVRM can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible cost.
Language	en
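The abstract does not spell out how NVRM or the neural variable optimizers are implemented. As a rough illustration of the general idea only, the sketch below assumes that variability is modeled as zero-mean Gaussian noise added to the weights before each gradient evaluation. The function name `noisy_weight_sgd`, the parameters `sigma`, `lr`, and `steps`, and the linear least-squares setting are all hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def noisy_weight_sgd(X, y, sigma=0.05, lr=0.1, steps=500, seed=0):
    """Gradient descent on mean squared error with perturbed weights.

    Assumption (not from the source): "variability" is zero-mean Gaussian
    noise of scale `sigma` added to the weights before each gradient
    evaluation. The stored weights themselves are never overwritten by
    the noisy copy.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Draw a fresh perturbation at every step.
        noise = rng.normal(0.0, sigma, size=d)
        w_perturbed = w + noise
        # Gradient of 0.5 * mean((X w - y)^2), evaluated at the noisy weights.
        residual = X @ w_perturbed - y
        grad = X.T @ residual / n
        # Apply the update to the clean (unperturbed) weights.
        w = w - lr * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true
    print(noisy_weight_sgd(X, y))
```

With a small `sigma`, the learned weights stay close to the least-squares solution while each gradient is computed at a slightly different point; setting `sigma=0` recovers plain gradient descent.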