Adversarial Stress Test

Adversarial Dataset + Kill List

The adversarial set is PRM’s stress test. It checks whether the metric stack stays coherent when comparison material is dense, difficult, and easy to over-read.

/prm/adversarial

Role

Pressure test

Uses hard comparison material to separate sturdy signals from scores that only look strong in easier conditions.

Protects

False confidence

Flags places where topic, repetition, transcription, or handling artifacts could inflate a reading.

Public View

Aggregate evidence

Keeps the stress-test results readable without exposing protected text, identities, titles, or source maps.

Evidence Frame

What the adversarial dataset is

The adversarial dataset is PRM’s pressure chamber for interpretation.

The Solo Dataset asks what the human corpus can sustain. The adversarial layer asks what happens when model systems try to interpret, compress, attribute, or respond to that kind of language under pressure.

The public page stays at the framework level. It describes the stress environment and diagnostic categories without publishing protected interaction logs, private transcripts, prompts, or source-reconstructable material.

Why it exists

It exists because model behavior is part of the evidence environment.

The point is not to mock failure. The point is to make failure modes measurable and reviewable. When a system misses layered meaning, over-compresses a passage, misattributes novelty, or drifts away from context, that event becomes a diagnostic signal.

That makes the adversarial layer useful for both human interpretation and model-evaluation work: it shows where the material is difficult, and where a model response stops tracking the structure in front of it.

The Kill List

The Kill List is a diagnostic framework for recurring model-failure modes.

Examples include:

- misattribution
- context drift
- missed layered meaning
- false compression
- repetition blindness
- authorship confusion
- overgeneralization
- tone failure
- weak novelty detection

How the stress test works

The adversarial layer turns interpretation into a test environment. It looks for places where a model response loses structure, ignores constraints, collapses distinct ideas together, or explains away behavior that should be inspected.

Those failures are grouped into diagnostic categories. The categories are not a public transcript archive. They are a public-safe way to explain what kind of weakness the project is tracking.
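
As a minimal sketch of that grouping, the Python below reduces tagged failure events to per-category counts. It assumes a hypothetical private review record that carries a category tag. The `FailureMode` values mirror the Kill List examples on this page; the `ReviewEvent` fields and the `public_summary` function are illustrative names, not PRM's actual tooling.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    """Kill List categories, copied from the examples listed on this page."""
    MISATTRIBUTION = "misattribution"
    CONTEXT_DRIFT = "context drift"
    MISSED_LAYERED_MEANING = "missed layered meaning"
    FALSE_COMPRESSION = "false compression"
    REPETITION_BLINDNESS = "repetition blindness"
    AUTHORSHIP_CONFUSION = "authorship confusion"
    OVERGENERALIZATION = "overgeneralization"
    TONE_FAILURE = "tone failure"
    WEAK_NOVELTY_DETECTION = "weak novelty detection"


@dataclass(frozen=True)
class ReviewEvent:
    """Hypothetical private record; both field names are assumptions."""
    private_note: str   # reviewer note, never included in public output
    mode: FailureMode   # diagnostic category assigned during review


def public_summary(events):
    """Return category counts only, dropping every private field."""
    counts = Counter(event.mode for event in events)
    # Report every category, including zeros, so the output shape is stable.
    return {mode.value: counts.get(mode, 0) for mode in FailureMode}
```

Only the category label and a count leave the function, which matches the idea that the categories explain a kind of weakness without acting as a transcript archive.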

Public-safe evidence

The public evidence uses aggregate gauntlet views. These charts show how measured PRM segments behave under comparison pressure without exposing private adversarial logs.

Pairwise dominance, first-place rate, top-20 heatmaps, and exception counts give different views of stress behavior: direct matchups, repeated wins, broad leaderboard position, and visible exceptions.

Why it matters

The adversarial layer turns PRM from a static corpus into a stress environment. It shows not only what the human corpus contains, but where model systems struggle to interpret it.

For the site narrative, this page keeps the model-facing claim separate from the corpus claim. The corpus pages explain what was measured. The adversarial page explains why interpretation itself became part of the test.

How to read the charts

Start with the gauntlet dashboard to understand the broad comparison environment. Then use the top-20 heatmap to see how tested segments behave across slice sizes.

Pairwise dominance asks whether The Sequel holds up in direct matchups, not only in an aggregate average. First-place rate asks how often it reaches the top across shared views. Exception count is the honesty check: it counts the views where the result is not first, so losses stay visible alongside wins.
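
The three readings can be sketched in Python under some stated assumptions: each "view" is a mapping from segment name to a score where higher is better, ties do not count as first place, and only views that contain the target segment are considered. The function names, the data layout, and the toy numbers are all hypothetical; this is not the computation behind the published charts.

```python
def _shared_views(views, target):
    """Views that actually contain the target segment."""
    shared = [scores for scores in views.values() if target in scores]
    if not shared:
        raise ValueError(f"no view contains {target!r}")
    return shared


def is_first(scores, target):
    """True when the target strictly beats every other segment in one view."""
    return all(scores[target] > s for name, s in scores.items() if name != target)


def first_place_rate(views, target):
    """Fraction of shared views in which the target ranks first."""
    shared = _shared_views(views, target)
    return sum(is_first(scores, target) for scores in shared) / len(shared)


def exception_count(views, target):
    """Number of shared views in which the target is not first."""
    shared = _shared_views(views, target)
    return sum(not is_first(scores, target) for scores in shared)


def pairwise_dominance(views, target):
    """Per rival: fraction of shared views where the target scores higher."""
    wins, total = {}, {}
    for scores in _shared_views(views, target):
        for rival, rival_score in scores.items():
            if rival == target:
                continue
            total[rival] = total.get(rival, 0) + 1
            wins[rival] = wins.get(rival, 0) + (scores[target] > rival_score)
    return {rival: wins[rival] / total[rival] for rival in total}


# Invented numbers for illustration only; not PRM results.
toy_views = {
    "view_1": {"target": 0.9, "rival_b": 0.5, "rival_c": 0.7},
    "view_2": {"target": 0.6, "rival_b": 0.8},
    "view_3": {"target": 0.7, "rival_c": 0.4},
}
```

With the toy data, the target is first in two of three views, so `exception_count(toy_views, "target")` is 1, and `pairwise_dominance` reports a split record against `rival_b` but a clean one against `rival_c`. That is the reason to read the measures together: a strong first-place rate can coexist with a rival that wins half its direct matchups.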

Public-safe limits

This page does not publish raw model transcripts, protected writing, third-party source text, private prompts, source-level mappings, artist names, album titles, or song titles.

Public-safe boundary

Public pages show aggregate evidence, metric behavior, method provenance, and corpus structure. Protected text, identities, source titles, and reconstructable mappings stay private.