PROJECTS · № 003
NLP · Political Text · 2023

Party labels
from speech alone.

Abstract
A classifier trained on Spanish parliamentary speech to predict party affiliation from the text of a single intervention. The model generalises: when tested on minor parties whose speech it has never seen, it still places them on a coherent left–right axis.
Method
Corpus collection (Cortes transcripts) → cleaning → fine-tuned transformer → out-of-distribution evaluation on secondary parties.
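The project itself fine-tunes a transformer; as a minimal self-contained sketch of the same shape of pipeline (intervention text in, party probability out), here is a toy bag-of-words Naive Bayes classifier. The corpus, party labels, and test sentence are invented for illustration, not drawn from the real data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())

def train(speeches):
    # speeches: dict mapping party -> list of interventions.
    counts = {party: Counter() for party in speeches}
    for party, texts in speeches.items():
        for t in texts:
            counts[party].update(tokenize(t))
    vocab = set().union(*counts.values())
    return counts, vocab

def predict_proba(counts, vocab, text, party):
    # Laplace-smoothed unigram log-likelihood under each party's model,
    # normalised into a posterior probability for `party`.
    logps = {}
    for p, c in counts.items():
        total = sum(c.values()) + len(vocab)
        logps[p] = sum(math.log((c[w] + 1) / total)
                       for w in tokenize(text) if w in vocab)
    m = max(logps.values())
    z = sum(math.exp(v - m) for v in logps.values())
    return math.exp(logps[party] - m) / z

# Toy corpus: two invented "parties" with distinct vocabularies.
corpus = {
    "left": ["public healthcare and workers rights",
             "tax the rich fund public services"],
    "right": ["lower taxes free markets",
              "cut spending free enterprise markets"],
}
counts, vocab = train(corpus)
p_left = predict_proba(counts, vocab, "public services for workers", "left")
```

The real model replaces the unigram likelihoods with contextual representations, but the evaluation logic is identical: score a held-out intervention and read off a party probability.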
Fig. 1 — Stylised courtroom debate. The corpus is messy; the signal is real.

Results

The per-class predicted-probability distributions below are diagnostic. For the two pairs the model was trained on (PSOE/PP and Podemos/Vox), the densities separate cleanly toward 0 and 1 — the model knows what each party sounds like. For ERC/JxC — two Catalan pro-independence parties not seen at training time — the densities overlap heavily, correctly reflecting that their rhetoric is less separable.

Predicted probabilities — Podemos vs Vox
Fig. 2 — Podemos vs. Vox. Strong separation — the extremes are easy.
Predicted probabilities — PSOE vs PP
Fig. 3 — PSOE vs. PP. Centre-left and centre-right still separate, albeit with a thicker overlap band.
Predicted probabilities — ERC vs JxC
Fig. 4 — ERC vs. JxC. Near-uniform densities — the model, correctly, cannot tell them apart.
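The visual separation in Figs. 2–4 can be reduced to a single number, e.g. the ROC AUC of the predicted probabilities: near 1.0 for cleanly separated pairs, near 0.5 for indistinguishable ones. A sketch with invented probabilities (the values below are illustrative, not the model's actual outputs):

```python
def auc(pos, neg):
    # Probability that a randomly chosen score from the positive class
    # exceeds one from the negative class (ties count half): the ROC AUC.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented predicted probabilities for illustration only.
separated = auc([0.90, 0.85, 0.95], [0.10, 0.20, 0.05])   # Podemos/Vox-like
overlapping = auc([0.55, 0.40, 0.60], [0.45, 0.50, 0.65]) # ERC/JxC-like
```

Under this reading, an AUC close to 0.5 on an unseen pair is not a failure mode but the measurement itself.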

Lessons

Classifiers trained on ideologically distant parties transfer partially to out-of-distribution parties, and the amount of transfer is itself a measurement: weak class separation in the predicted probabilities for unseen pairs tells us something substantive about their rhetorical proximity.