Condensed Matter > Disordered Systems and Neural Networks
Title: Optimal inference of a generalised Potts model by single-layer transformers with factored attention
(Submitted on 14 Apr 2023 (this version), latest version 4 Apr 2024 (v4))
Abstract: Transformers are the type of neural network that has revolutionised natural language processing and protein science. Their key building block is a mechanism called self-attention, which is trained to predict missing words in sentences. Despite the practical success of transformers in applications, it remains unclear what self-attention learns from data, and how. Here, we give a precise analytical and numerical characterisation of transformers trained on data drawn from a generalised Potts model with interactions between sites and Potts colours. While an off-the-shelf transformer requires several layers to learn this distribution, we show analytically that a single layer of self-attention with a small modification can learn the Potts model exactly in the limit of infinite sampling. We show that this modified self-attention, which we call ``factored'', has the same functional form as the conditional probability of a Potts spin given the other spins; we compute its generalisation error using the replica method from statistical physics, and we derive an exact mapping to pseudo-likelihood methods for solving the inverse Ising and Potts problems.
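To make the link between factored attention and the Potts conditional concrete, here is a minimal NumPy sketch, not the authors' code: we assume "factored" means the attention weights J depend only on the site positions (learned parameters, independent of the input tokens), while a value matrix W acts on the one-hot Potts colours, so the masked-site logits take the generalised-Potts form logit(s_i = a) = sum_{j != i} J_ij (W v_j)_a. The names J, W and factored_attention_logits are hypothetical, chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

L, q = 10, 3                          # number of sites, number of Potts colours
J = rng.normal(size=(L, L)) * 0.1     # positional attention weights (learned)
W = rng.normal(size=(q, q)) * 0.1     # value matrix acting on colours (learned)

def factored_attention_logits(seq_onehot, i):
    """Logits for the masked site i given the other sites.

    seq_onehot: (L, q) one-hot encoding of a Potts configuration.
    Returns a length-q vector of unnormalised log-probabilities,
    matching the functional form of p(s_i = a | s_{-i}) in a
    generalised Potts model.
    """
    values = seq_onehot @ W.T          # (L, q): colour embeddings of each site
    mask = np.ones(L, dtype=bool)
    mask[i] = False                    # the masked site attends to all others
    return J[i, mask] @ values[mask]   # (q,) logits for site i

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Usage: predict the colour distribution of a masked site.
seq = np.eye(q)[rng.integers(q, size=L)]   # random Potts configuration, one-hot
p = softmax(factored_attention_logits(seq, i=4))
print("p(s_4 = a | rest):", p)
```

Because the attention weights are position-only parameters, training this layer with a masked-token cross-entropy objective coincides, site by site, with maximising the Potts pseudo-likelihood, which is the mapping the abstract refers to.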
Submission history
From: Riccardo Rende
[v1] Fri, 14 Apr 2023 16:32:56 GMT (563kb,D)
[v2] Thu, 14 Dec 2023 12:08:44 GMT (569kb,D)
[v3] Wed, 7 Feb 2024 09:48:07 GMT (568kb,D)
[v4] Thu, 4 Apr 2024 13:24:36 GMT (569kb,D)