PROJECTS · № 003
NLP · Political Text · 2023

Party labels
from speech alone.

Abstract
A classifier trained on Spanish parliamentary speech to predict party affiliation from the text of a single intervention. The model generalises: when tested on minor parties whose speech it has never seen, it still places them on a coherent left–right axis.
Method
Corpus collection (Cortes transcripts) → cleaning → fine-tuned transformer → out-of-distribution evaluation on secondary parties.
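The project itself fine-tunes a transformer; as a minimal self-contained sketch of the same shape of pipeline (intervention text in, party probability out), here is a toy bag-of-words Naive Bayes classifier. The corpus, party labels, and test sentence are invented for illustration, not drawn from the real data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())

def train(speeches):
    # speeches: dict mapping party -> list of interventions.
    counts = {party: Counter() for party in speeches}
    for party, texts in speeches.items():
        for t in texts:
            counts[party].update(tokenize(t))
    vocab = set().union(*counts.values())
    return counts, vocab

def predict_proba(counts, vocab, text, party):
    # Laplace-smoothed unigram log-likelihood under each party's model,
    # normalised into a posterior probability for `party`.
    logps = {}
    for p, c in counts.items():
        total = sum(c.values()) + len(vocab)
        logps[p] = sum(math.log((c[w] + 1) / total)
                       for w in tokenize(text) if w in vocab)
    m = max(logps.values())
    z = sum(math.exp(v - m) for v in logps.values())
    return math.exp(logps[party] - m) / z

# Toy corpus: two invented "parties" with distinct vocabularies.
corpus = {
    "left": ["public healthcare and workers rights",
             "tax the rich fund public services"],
    "right": ["lower taxes free markets",
              "cut spending free enterprise markets"],
}
counts, vocab = train(corpus)
p_left = predict_proba(counts, vocab, "public services for workers", "left")
```

The real model replaces the unigram likelihoods with contextual representations, but the evaluation logic is identical: score a held-out intervention and read off a party probability.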
Fig. 1 — Stylised courtroom debate. The corpus is messy; the signal is real.

Results

The per-class predicted-probability distributions below are diagnostic. For the two pairs the model was trained on (PSOE/PP and Podemos/Vox), the densities separate cleanly toward 0 and 1 — the model knows what each party sounds like. For ERC/JxC — two Catalan pro-independence parties not seen at training time — the densities overlap heavily, correctly reflecting that their rhetoric is less separable.

Predicted probabilities — Podemos vs Vox
Fig. 2 — Podemos vs. Vox. Strong separation — the extremes are easy.
Predicted probabilities — PSOE vs PP
Fig. 3 — PSOE vs. PP. Centre-left and centre-right still separate, albeit with a thicker overlap band.
Predicted probabilities — ERC vs JxC
Fig. 4 — ERC vs. JxC. Near-uniform densities — the model, correctly, cannot tell them apart.
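The visual separation in Figs. 2–4 can be reduced to a single number, e.g. the ROC AUC of the predicted probabilities: near 1.0 for cleanly separated pairs, near 0.5 for indistinguishable ones. A sketch with invented probabilities (the values below are illustrative, not the model's actual outputs):

```python
def auc(pos, neg):
    # Probability that a randomly chosen score from the positive class
    # exceeds one from the negative class (ties count half): the ROC AUC.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented predicted probabilities for illustration only.
separated = auc([0.90, 0.85, 0.95], [0.10, 0.20, 0.05])   # Podemos/Vox-like
overlapping = auc([0.55, 0.40, 0.60], [0.45, 0.50, 0.65]) # ERC/JxC-like
```

Under this reading, an AUC close to 0.5 on an unseen pair is not a failure mode but the measurement itself.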

Lessons

Classifiers trained on ideologically distant parties transfer partially to out-of-distribution parties, and the amount of transfer is itself a measurement: weak class separation in the predicted probabilities for unseen pairs tells us something substantive about their rhetorical proximity.