Condensed Matter > Disordered Systems and Neural Networks
Title: Optimal inference of a generalised Potts model by single-layer transformers with factored attention
(Submitted on 14 Apr 2023 (this version), latest version 4 Apr 2024 (v4))
Abstract: Transformers are the type of neural network that has revolutionised natural language processing and protein science. Their key building block is a mechanism called self-attention, which is trained to predict missing words in sentences. Despite the practical success of transformers in applications, it remains unclear what self-attention learns from data, and how. Here, we give a precise analytical and numerical characterisation of transformers trained on data drawn from a generalised Potts model with interactions between sites and Potts colours. While an off-the-shelf transformer requires several layers to learn this distribution, we show analytically that a single layer of self-attention with a small modification can learn the Potts model exactly in the limit of infinite sampling. We show that this modified self-attention, which we call ``factored'', has the same functional form as the conditional probability of a Potts spin given the other spins; we compute its generalisation error using the replica method from statistical physics, and we derive an exact mapping to pseudo-likelihood methods for solving the inverse Ising and Potts problems.
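To make the link between factored attention and the Potts conditional concrete, here is a minimal NumPy sketch, not the authors' code: we assume "factored" means the attention weights J depend only on the site positions (learned parameters, independent of the input tokens), while a value matrix W acts on the one-hot Potts colours, so the masked-site logits take the generalised-Potts form logit(s_i = a) = sum_{j != i} J_ij (W v_j)_a. The names J, W and factored_attention_logits are hypothetical, chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

L, q = 10, 3                          # number of sites, number of Potts colours
J = rng.normal(size=(L, L)) * 0.1     # positional attention weights (learned)
W = rng.normal(size=(q, q)) * 0.1     # value matrix acting on colours (learned)

def factored_attention_logits(seq_onehot, i):
    """Logits for the masked site i given the other sites.

    seq_onehot: (L, q) one-hot encoding of a Potts configuration.
    Returns a length-q vector of unnormalised log-probabilities,
    matching the functional form of p(s_i = a | s_{-i}) in a
    generalised Potts model.
    """
    values = seq_onehot @ W.T          # (L, q): colour embeddings of each site
    mask = np.ones(L, dtype=bool)
    mask[i] = False                    # the masked site attends to all others
    return J[i, mask] @ values[mask]   # (q,) logits for site i

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Usage: predict the colour distribution of a masked site.
seq = np.eye(q)[rng.integers(q, size=L)]   # random Potts configuration, one-hot
p = softmax(factored_attention_logits(seq, i=4))
print("p(s_4 = a | rest):", p)
```

Because the attention weights are position-only parameters, training this layer with a masked-token cross-entropy objective coincides, site by site, with maximising the Potts pseudo-likelihood, which is the mapping the abstract refers to.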
Submission history
From: Riccardo Rende
[v1] Fri, 14 Apr 2023 16:32:56 GMT (563kb,D)
[v2] Thu, 14 Dec 2023 12:08:44 GMT (569kb,D)
[v3] Wed, 7 Feb 2024 09:48:07 GMT (568kb,D)
[v4] Thu, 4 Apr 2024 13:24:36 GMT (569kb,D)