PADO: Personality-induced multi-Agents for Detecting OCEAN in human-generated texts

Haein Yeo¹, Taehyung Noh¹, Seungwan Jin², Kyungsik Han¹,²,*

¹Department of Artificial Intelligence, Hanyang University

²Department of Data Science, Hanyang University

*Corresponding Author

Abstract

Personality inferred from text can help systems understand a person's underlying context and deliver personalized services, so modeling personality from text has long been a research focus. Personality detection is hard, however: traits are latent and relative, vocabulary cues are context-dependent, and high-quality annotated data is scarce. We introduce PADO (Personality-induced multi-Agent framework for Detecting OCEAN Big Five traits), the first LLM-based multi-agent personality detection framework. PADO replaces a single-perspective judgment with a comparative judgment between agents induced toward contrasting personality expressions, then aggregates their reasoning along psycholinguistic axes (emotion, cognition, sociality). Across two benchmarks (Essays, MyPersonality) and LLMs ranging from GPT-4o down to LLaMA3-8B-Instruct, PADO is consistently more accurate and generalizable than zero-shot, one-shot, and Chain-of-Thought prompting baselines, with particularly large gains on smaller models and on traits with subtle linguistic signatures.

Why a Multi-Agent, Comparative Approach?

Single-prompt LLM personality detection inherits the model's own biases and tends to over-rely on surface keywords. Naively adding more agents does not help much either: similarly prompted agents converge on similar verdicts, producing a multi-agent analogue of confirmation bias. PADO instead structures disagreement. It instantiates agents induced toward opposing endpoints of a trait, has them produce evidence-grounded explanations, and uses a separate judge agent to compare the two analyses. This mirrors LLM-as-Judge protocols (Zheng et al., 2024) but applies them to a task where the right answer depends on relative rather than absolute evidence, which makes the comparative setup a natural fit for personality.

Three-Phase Pipeline

Phase 1

Personality Induction

Builds two reasoner agents per trait, one induced toward the high pole and one toward the low pole, using the Personality Prompting (P²) method. The induced agents are validated on the MPI Evaluation Dataset for internal consistency. Inducing both poles directly creates the contrast the later judgment phase needs.
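A minimal sketch of how the paired reasoners might be represented, assuming induction is done purely through a system prompt. The `InducedAgent` type, the `make_reasoner_pair` helper, and the prompt wording are hypothetical illustrations; the actual P² prompts and the MPI validation step are not reproduced here.

```python
from dataclasses import dataclass

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


@dataclass(frozen=True)
class InducedAgent:
    """A reasoner agent induced toward one pole of one trait."""
    trait: str
    pole: str           # "high" or "low"
    system_prompt: str  # placeholder wording, NOT the P² prompt


def make_reasoner_pair(trait: str) -> tuple[InducedAgent, InducedAgent]:
    """Return (high, low) agents for a trait, with contrasting personas."""
    if trait not in TRAITS:
        raise ValueError(f"unknown trait: {trait!r}")
    agents = []
    for pole in ("high", "low"):
        prompt = (
            f"You are a person who is {pole} in {trait}. "
            "Stay in this persona when analyzing the text."
        )
        agents.append(InducedAgent(trait, pole, prompt))
    return agents[0], agents[1]


high, low = make_reasoner_pair("extraversion")
print(high.system_prompt)
print(low.system_prompt)
```

Keeping the two agents identical except for the pole makes any difference in their later explanations attributable to the induced persona rather than to prompt drift.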

Phase 2

Psycholinguistic Explanation

Each induced reasoner analyzes the text along three psycholinguistic axes drawn from Pennebaker's framework: emotional (affective tone), cognitive (reasoning complexity), and social (interpersonal orientation). Outputs are evidence-grounded explanations rather than direct labels.
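One way to keep each reasoner's output structured along the three axes is to request one labeled line per axis and parse the reply. The sketch below assumes that reply format; `build_explanation_prompt`, `parse_explanation`, and the prompt text are hypothetical and not taken from the paper.

```python
AXES = ("emotional", "cognitive", "social")


def build_explanation_prompt(text: str) -> str:
    """Ask for one evidence-grounded line per psycholinguistic axis."""
    lines = [f"{axis.capitalize()}: <evidence from the text>" for axis in AXES]
    return (
        "Analyze the text below. Do not output a label. "
        "Answer in exactly this format:\n"
        + "\n".join(lines)
        + f"\n\nText:\n{text}"
    )


def parse_explanation(reply: str) -> dict[str, str]:
    """Parse 'Axis: explanation' lines; raise if any axis is missing."""
    parsed = {}
    for line in reply.splitlines():
        head, sep, body = line.partition(":")
        key = head.strip().lower()
        if sep and key in AXES:
            parsed[key] = body.strip()
    missing = [axis for axis in AXES if axis not in parsed]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return parsed


# Hand-written reply standing in for a model completion.
reply = (
    "Emotional: frequent worry words such as 'nervous'\n"
    "Cognitive: long causal chains ('because', 'therefore')\n"
    "Social: few references to other people"
)
explanation = parse_explanation(reply)
print(explanation["social"])  # few references to other people
```

Failing loudly on a missing axis keeps incomplete explanations from reaching the judge unnoticed.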

Phase 3

Comparative Assessment

A judge agent runs three steps: comparative analysis (where do the two reasoners agree or disagree, and how well does each match the text?), overall evaluation (which explanation better fits the user), and final judgment (high vs. low for the trait).
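The three judge steps can be sketched as chained calls in which each step sees the output of the previous one. This assumes one model call per step, which the description above does not specify; the `LLM` callable type, the prompt wording, and `fake_llm` are hypothetical stand-ins.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def judge(text: str, high_expl: str, low_expl: str, llm: LLM) -> str:
    """Run the three judge steps and return 'high' or 'low'."""
    context = (
        f"Text:\n{text}\n\n"
        f"Explanation A (high-pole reasoner):\n{high_expl}\n\n"
        f"Explanation B (low-pole reasoner):\n{low_expl}\n\n"
    )
    # Step 1: comparative analysis of the two explanations.
    comparison = llm(context + "Where do A and B agree or disagree, "
                     "and how well does each match the text?")
    # Step 2: overall evaluation, conditioned on step 1.
    evaluation = llm(context + f"Comparison:\n{comparison}\n\n"
                     "Which explanation better fits the author?")
    # Step 3: final judgment, constrained to a single word.
    verdict = llm(context + f"Evaluation:\n{evaluation}\n\n"
                  "Answer with one word: high or low.")
    label = verdict.strip().lower()
    if label not in ("high", "low"):
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return label


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    return "High" if "one word" in prompt else "A cites more direct evidence."


print(judge("I love meeting new people.", "seeks company",
            "avoids company", fake_llm))  # high
```

Swapping `fake_llm` for a real client only requires a function with the same prompt-to-string shape.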

Five OCEAN Axes

The pipeline above is run independently for each of the Big Five trait dimensions. Each dimension gets its own pair of contrasting induced agents and its own comparative judgment pass.
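The per-trait independence can be sketched as a plain loop that shares no state between traits. `detect_ocean` and the `TraitDetector` signature are hypothetical, and `toy_detector` is a keyword placeholder used only to make the loop runnable; it is not how PADO decides a trait.

```python
from typing import Callable

TRAITS = ("O", "C", "E", "A", "N")

# Hypothetical signature: (trait, text) -> "high" or "low".
TraitDetector = Callable[[str, str], str]


def detect_ocean(text: str, detect_trait: TraitDetector) -> dict[str, str]:
    """Run one independent pass per trait; no state is shared between passes."""
    return {trait: detect_trait(trait, text) for trait in TRAITS}


def toy_detector(trait: str, text: str) -> str:
    """Keyword placeholder for the full three-phase pipeline."""
    cues = {"O": "curious", "C": "plan", "E": "party",
            "A": "help", "N": "worried"}
    return "high" if cues[trait] in text.lower() else "low"


profile = detect_ocean("I plan every week but I am worried about exams.",
                       toy_detector)
print(profile)
# {'O': 'low', 'C': 'high', 'E': 'low', 'A': 'low', 'N': 'high'}
```

Because each trait is handled separately, the five passes could also run in parallel without changing the result.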

O · Openness

Openness to Experience

Curiosity, abstract thinking, novelty-seeking and aesthetic engagement. The contrasting agents debate whether vocabulary signals exploratory cognition or conventional descriptive language.

C · Conscientiousness

Conscientiousness

Planning, reliability, and self-discipline cues. The agents weigh whether structure and goal orientation come through, against signals of impulsivity.

E · Extraversion

Extraversion

Outward-facing energy, assertiveness, and social orientation. The judge resolves whether narratives are socially engaged or introspective.

A · Agreeableness

Agreeableness

Warmth, cooperation, and prosocial reasoning. The contrast helps separate genuine empathy from formulaic politeness.

N · Neuroticism

Neuroticism

Emotional reactivity, anxiety markers, and vulnerability disclosures. The judgment is calibrated against context so that situational venting isn't mistaken for trait-level neuroticism.

Experiments

Datasets

Essays (Pennebaker & King, 1999) — 2,468 stream-of-consciousness essays labeled high/low on each OCEAN trait via standardized self-report. ~50 sentences per essay; 10% sampled for testing.
MyPersonality (Celli et al., 2013) — 9,913 Facebook status messages from 250 users, each labeled high/low on OCEAN traits.

Baselines & Models Tested

Encoder fine-tuning: BERT, RoBERTa (8:1:1 train/val/test split).
Decoder LLMs with zero-shot, one-shot, Chain-of-Thought, and PADO: GPT-4o, GPT-3.5-turbo, Solar-10.7B-Instruct, LLaMA3-8B-Instruct.
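For the encoder baselines, an 8:1:1 split could look like the sketch below. The shuffling seed, the rounding rule (remainder goes to train), and the `split_811` helper are assumptions; the exact partition used in the paper is not given here, so the printed counts are illustrative only.

```python
import random


def split_811(items: list, seed: int = 0) -> tuple[list, list, list]:
    """Shuffle and split into train/val/test at roughly 8:1:1."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_val = n // 10
    n_test = n // 10
    n_train = n - n_val - n_test  # remainder goes to train
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


essay_ids = list(range(2468))  # stand-in IDs, one per Essays document
train, val, test = split_811(essay_ids)
print(len(train), len(val), len(test))  # 1976 246 246
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.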

Headline Results (F1)

On the Essays benchmark with GPT-4o, PADO lifts the average F1 to 0.66, up from 0.53 (zero-shot), 0.51 (one-shot), and 0.54 (CoT). With the same backbone on MyPersonality, PADO reaches 0.83 F1 on Openness, well above all baselines. Smaller models (LLaMA3-8B-Instruct, Solar-10.7B-Instruct) show similar directional improvements, which suggests that the gains stem from the comparative-judgment structure rather than raw model scale.

Backbone	Method	O	C	E	A	N	Avg.
GPT-4o (Essays)	Zero-shot	0.62	0.38	0.41	0.59	0.64	0.53
GPT-4o (Essays)	One-shot	0.39	0.43	0.51	0.58	0.66	0.51
GPT-4o (Essays)	CoT	0.58	0.45	0.48	0.57	0.61	0.54
GPT-4o (Essays)	PADO (ours)	0.70	0.70	0.63	0.65	0.61	0.66

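F1 scores on the Essays benchmark with GPT-4o as backbone, excerpted from the paper. PADO raises the average and shows its largest gains over the best baseline on Conscientiousness (0.45 to 0.70) and Extraversion (0.51 to 0.63). On Neuroticism it scores 0.61, below zero-shot (0.64) and one-shot (0.66).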
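As a quick arithmetic check, the Avg. column is consistent with an unweighted mean of the five per-trait F1 scores. The snippet below uses only numbers copied from the table; treating Avg. as an unweighted mean is an assumption that the check happens to support.

```python
# Per-trait F1 (O, C, E, A, N) and the reported average, copied from the table.
rows = {
    "Zero-shot": ([0.62, 0.38, 0.41, 0.59, 0.64], 0.53),
    "One-shot": ([0.39, 0.43, 0.51, 0.58, 0.66], 0.51),
    "CoT": ([0.58, 0.45, 0.48, 0.57, 0.61], 0.54),
    "PADO": ([0.70, 0.70, 0.63, 0.65, 0.61], 0.66),
}

means = {}
for method, (scores, reported) in rows.items():
    means[method] = sum(scores) / len(scores)
    print(f"{method}: mean={means[method]:.3f} reported={reported:.2f}")
    # Each unweighted mean rounds to the reported two-decimal average.
    assert round(means[method], 2) == reported
```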

Contributions

A novel personality-induced multi-agent framework for OCEAN detection that leverages contrasting agent perspectives, the first such framework using LLM agents.
Demonstration that PADO is backbone-agnostic across model scales (GPT-4o through LLaMA3-8B), with particularly strong improvements on smaller models.
A psycholinguistically-grounded reasoning scheme that decomposes each agent's analysis into emotional, cognitive, and social axes — capturing implicit and relative trait signals without additional training.

BibTeX

@inproceedings{yeo2025pado,
  author    = {Yeo, Haein and Noh, Taehyung and Jin, Seungwan and Han, Kyungsik},
  title     = {{PADO}: Personality-induced multi-Agents for Detecting {OCEAN} in human-generated texts},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025)},
  year      = {2025},
  pages     = {5719--5736},
  url       = {https://aclanthology.org/2025.coling-main.382/}
}