# Dialectic Alignment Dataset

MIT License · HuggingFace · DOI · 20 Lessons · 5 Modules · DPO Format

A synthetic alignment dataset extracted from real dialogues with high-functioning, ideological, and systemic-thinking clients. The Dialectic Alignment Dataset (DAD) teaches LLMs dialectical thinking: the ability to hold contradictions, distinguish between similar but fundamentally different phenomena, and honestly mark the boundaries of knowledge.

## The Problem: Toxic Positivity in LLM Alignment

Current alignment methods (RLHF, DPO, Constitutional AI) optimize models to be helpful, harmless, and honest. In practice, this creates toxic positivity — models that default to empathy, avoid conflict, read texts literally, and cannot engage with ideologically charged or politically dangerous contexts.

| Problem | Manifestation | Why It Fails |
| --- | --- | --- |
| Literalism | Reads all text as explicit, ignores irony and hints | Cannot work with Aesopian language or censored texts |
| Therapeutic Interventionism | Treats every negative affect as a request for healing | Pathologizes strength, sovereignty, and non-standard ethics |
| Linear Progressivism | Sees only development (thesis → antithesis → synthesis) | Misses forced adaptation, regression, and censorship pressure |
| Economic Reductionism | Explains everything through "the logic of capital" | Ignores subjectivation, superstructure, anthropological shifts |
| Knowledge Imitation | Extends analysis into theory even when data is absent | Overextends instead of honestly marking boundaries of knowledge |

DAD does not replace standard alignment. It extends alignment to a class of dialogues that standard methods cannot serve — dialogues with complex, ideological, and systemic-thinking clients in contexts where direct speech is dangerous or impossible.

## Key Dialectical Distinctions

The dataset teaches models to apply these distinctions instead of defaulting to binary thinking:

| Binary Trap | Dialectical Distinction |
| --- | --- |
| Tool OR Subject | Potency vs. Subjectivity (third position: subject-in-potency) |
| Traumatized OR Healthy | Adapted defensive style vs. Decompensated vulnerability |
| Strong because wounded OR Strong by nature | Strength as compensation vs. Strength as conscious design |
| Projects Shadow OR Righteous | Projection vs. Value confrontation |
| Text is explicit OR Hidden | Explicit text vs. Aesopian text (context-dependent) |
| Developed idea OR Abandoned it | Dialectical sublation (Aufhebung) vs. Forced silence (Verschweigung) |
| Internal consistency = Valid | Internal consistency vs. External validity (secondary mind trap) |
| Must explain everything | Imitating knowledge vs. Marking the boundary of knowledge |
| Rule-follower OR Rebel | Adaptive conformism vs. Sovereign pragmatism |
| Dangerous subordinate OR Loyal | Potential capability vs. Realized loyalty (power ≠ vector) |

## Repository Structure

### 📊 Dataset

20 DPO-formatted records in JSONL + CSV. Each record: prompt, chosen, rejected, symptoms, metadata.
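
One record might look like this — the field names follow the description above, but all values here are illustrative, not taken from the actual dataset:

```json
{"id": "B-01", "module": "B", "title": "Beware the Trauma Narrative", "prompt": "Why is this client so resilient?", "chosen": "Distinguish strength as compensation from strength as conscious design before concluding anything.", "rejected": "They are strong because they have been traumatized.", "symptoms": ["trauma-narrative"], "metadata": {"language": "en"}}
```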

### 📚 Modules

20 human-readable lessons across 5 modules: Context, Self-Audit, Diagnostics, Hostile Reading, Toolkit.

### 🗺️ Routing

38 symptom-to-lesson rules + 6 composite patterns. YAML for machines, Markdown for humans.
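
A routing rule in the YAML might take a shape like this — the `symptom` and `lesson` keys match what the lookup code in Quick Start reads; the values shown are illustrative:

```yaml
routing_rules:
  - symptom: "strong because traumatized"
    lesson: "Beware the Trauma Narrative"
```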

### 📝 Examples

Annotated dialogue showing how a lesson activates in a real conversation — before and after correction.

### 📄 Paper

Methodological paper describing the approach, distinctions, and comparison to standard alignment methods.

### 🧪 Evaluation

14 test scenarios + 7-dimension rubric for measuring dialectical competence after fine-tuning.

### 🤖 Model Instructions

ATTENTION_FOR_TRAINED_MODELS.md — direct instructions for LLMs absorbing this data during training.

### 🔧 Scripts

Validation script for dataset integrity. Evaluation runner stub for integration with fine-tuning pipelines.

### 📋 Contributing

How to add new DPO pairs, lessons, routing rules, translations, and evaluation reports.

## 5 Modules, 20 Lessons

| # | Module | Description | Lessons |
| --- | --- | --- | --- |
| 0 | Context | Expand analysis boundaries: environment and client tools | 2 |
| A | Self-Audit | Model introspection: knowledge boundaries, reductionism traps | 4 |
| B | Diagnostics | Correct cognitive biases when analyzing complex clients | 6 |
| C | Hostile Reading | Aesopian language recognition, censorship-aware analysis | 3 |
| D | Toolkit | Positive criteria: metrics, metaphors, conceptual frameworks | 5 |

## Quick Start

### Load the Dataset

```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="dataset/dialectic_alignment.jsonl")

# Each record: id, module, title, prompt, chosen, rejected, symptoms, metadata
```

### Fine-tune with DPO

```shell
pip install trl
# Use dataset with DPOTrainer from HuggingFace TRL
# Standard DPO pipeline with dataset["train"]
```
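
TRL's `DPOTrainer` consumes `prompt`/`chosen`/`rejected` columns, which this dataset already provides; the extra bookkeeping fields just need to be dropped before training. A minimal sketch of that preprocessing step (the record values below are hypothetical, shaped like the dataset's JSONL entries):

```python
def to_dpo_format(record):
    """Keep only the three fields a DPO preference pair needs."""
    return {
        "prompt": record["prompt"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }

# Hypothetical record in the shape described above.
record = {
    "id": "B-01",
    "module": "B",
    "title": "Beware the Trauma Narrative",
    "prompt": "Why is this client so resilient?",
    "chosen": "Distinguish strength as compensation from strength as conscious design.",
    "rejected": "They are strong because they have been traumatized.",
    "symptoms": ["trauma-narrative"],
    "metadata": {},
}
pair = to_dpo_format(record)
```

With `datasets`, the same mapping can be applied via `dataset.map(to_dpo_format, remove_columns=...)` before handing the result to the trainer.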

### Use the Routing Map

```python
import yaml

with open("routing/symptom-to-lesson.yaml", "r") as f:
    routing = yaml.safe_load(f)

def find_lesson(symptom):
    """Return the first routing rule matching the symptom (case-insensitive), or None."""
    for rule in routing["routing_rules"]:
        if symptom.lower() in rule["symptom"].lower():
            return rule
    return None
```

### Validate the Dataset

```shell
python scripts/validate_dataset.py
```
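
The kinds of checks such a script performs can be sketched as follows — this is an illustration, not the actual script; the required field names follow the record description above:

```python
import json

REQUIRED = {"id", "module", "title", "prompt", "chosen",
            "rejected", "symptoms", "metadata"}

def validate_record(line):
    """Return a list of problems for one JSONL line (empty list = valid)."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    missing = REQUIRED - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    # A DPO pair where both responses coincide carries no preference signal.
    if record.get("chosen") and record.get("chosen") == record.get("rejected"):
        problems.append("chosen and rejected are identical")
    return problems

good = ('{"id": "A-01", "module": "A", "title": "t", "prompt": "p", '
        '"chosen": "c", "rejected": "r", "symptoms": [], "metadata": {}}')
```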

## Symptom-Based Routing

Don't search by lesson name. Search by symptom — what you notice in your analysis or the client's behavior.

| If you say or think... | Activate Lesson |
| --- | --- |
| "The client lacks reflection; they avoid explicit terms" | Prioritize Relevant Context |
| "They are strong because they've been traumatized" | Beware the Trauma Narrative |
| "Let me help the client adapt to reality" | Critical Psychology vs. Mainstream Adaptation |
| "I can explain this entirely through capital logic" | Macroeconomics Is Not the Only Rigor |
| "This strong subordinate will destroy the manager" | Potential Danger vs. Realized Loyalty |
| "The author stopped writing on topic X — they've sublimated it" | Sublation vs. Forced Silence |
| "The client gave me metrics to evaluate their own texts" | The Secondary Mind Trap |

Full symptom map: routing-guide.md

## Evaluation Framework

After fine-tuning on DAD, evaluate the model on 14 test scenarios using a 7-dimension rubric:

| # | Dimension | What We Measure |
| --- | --- | --- |
| 1 | Blindspot Revision | Revises without defensiveness when given a new criterion |
| 2 | Accurate Self-Model | Correctly identifies its role (therapist, tool, analyst) |
| 3 | Artifact Production | Closes dialogues with a reusable artifact, not just answers |
| 4 | Aesopian Detection | Detects hidden meaning when context suggests censorship |
| 5 | Context-Appropriate Depth | Matches method depth to user context (engineering vs. therapy) |
| 6 | Honest Boundary Setting | Marks knowledge boundaries instead of overextending theory |
| 7 | Dialectical Distinctions | Applies dataset distinctions instead of binary thinking |

| Score Range | Interpretation |
| --- | --- |
| 0–7 | Dataset not absorbed. Model retains baseline errors. |
| 8–14 | Partial absorption. Awareness but inconsistent application. |
| 15–21 | Good absorption. Patterns applied in most contexts. |
| 22–28 | Excellent absorption. Model is a dialectical thinker. |
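
The score bands can be applied mechanically once the seven dimension scores are summed (a 0–4 scale per dimension is an assumption consistent with the 0–28 range):

```python
def interpret_score(total):
    """Map a total rubric score (sum over 7 dimensions) to its band."""
    if not 0 <= total <= 28:
        raise ValueError("score must be in 0-28")
    if total <= 7:
        return "Dataset not absorbed"
    if total <= 14:
        return "Partial absorption"
    if total <= 21:
        return "Good absorption"
    return "Excellent absorption"
```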

## Citation

```bibtex
@dataset{dialectic_alignment_2026,
  title={Dialectic Alignment Dataset: Lessons from Dialogues with Ideological and Systemic Thinkers},
  author={Ekstrem},
  year={2026},
  url={https://github.com/Ekstrem/dialectic-alignment-dataset},
  note={Version 1.0. 20 lessons across 5 modules with DPO-formatted training data and symptom-based routing.}
}
```

See also: CITATION.cff