Project Rocket Man

Dataset Map

Dataset Overview

The dataset map explains the public corpus families: what each set is for, how it is used, and what remains private.

- Role (Family map): keeps solo, adversarial, reference, and public-domain comparison sets distinct.
- Public layer (Aggregate outputs): shows chartable behavior and measurement structure without publishing protected source material.
- Private layer (Reserved review): source-level manifests remain reserved for verification outside the public site.

Evidence Frame

Dataset family

The public family is easiest to read in layers:

- Primary Corpus: the central body of Project Rocket Man, built from original first-draft writing by Seekwill.
- Adversarial Dataset: a separate corpus family built from direct lyrical exchanges between Seekwill and AI systems.
- Reference Corpora: anonymized comparison families shown publicly with neutral labels.
- Public-Domain Corpora: external comparison material that can be discussed with lower source-risk.
- Transform Outputs: RMR and TQT views that separate readable review from metric-clean measurement.
- Analysis Outputs: metric workbooks, summary tables, chart exports, and Framer-ready payloads.

The point of the map is separation. Readers can see how the evidence is organized without being handed a source-reconstructable manifest.
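The separation described above can be sketched as a tagging rule: every segment carries exactly one family label, and measurement code groups by that label before computing anything, so layers are never pooled. This is a minimal illustration under assumed names. `Family`, `Segment`, and `group_by_family` are hypothetical and do not describe PRM's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Family(Enum):
    # Hypothetical identifiers for the six layers listed above.
    PRIMARY = "Primary Corpus"
    ADVERSARIAL = "Adversarial Dataset"
    REFERENCE = "Reference Corpora"
    PUBLIC_DOMAIN = "Public-Domain Corpora"
    TRANSFORM = "Transform Outputs"
    ANALYSIS = "Analysis Outputs"


@dataclass(frozen=True)
class Segment:
    family: Family
    segment_id: str
    tokens: tuple[str, ...]


def group_by_family(segments):
    """Bucket segments by family so metrics are computed per layer, never pooled.

    Raises TypeError if a segment is not tagged with a valid Family member.
    """
    buckets = defaultdict(list)
    for seg in segments:
        if not isinstance(seg.family, Family):
            raise TypeError(f"segment {seg.segment_id} has no valid family tag")
        buckets[seg.family].append(seg)
    return dict(buckets)
```

Downstream metrics would then run once per bucket, which keeps solo, adversarial, and comparison results methodologically distinct.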

Primary Corpus

The Primary Corpus is the central body of Project Rocket Man: a large-scale, solo-authored lyrical dataset built from original first-draft writing by Seekwill. It represents the project’s core human baseline, the main body of work used to measure entropy behavior, lexical range, phonetic structure, rhyme architecture, density, continuity, and long-form non-repetition under sustained creative pressure.

This corpus is treated as the foundation layer of PRM. It is separate from adversarial exchanges, AI-side response material, public reference corpora, and transformed analysis outputs, so its statistical behavior can be measured cleanly and compared under consistent conditions.

Public pages describe the Primary Corpus through aggregate metrics, chronology, corpus-scale findings, transform definitions, and public-safe excerpts where appropriate. Protected source material, full raw text, and reconstruction-level dataset contents remain controlled for formal review.

Adversarial Dataset

The Adversarial Dataset is a separate corpus family built from direct lyrical exchanges between Seekwill and AI systems. It is used to compare human-authored battle writing against model-generated response writing under adversarial pressure, while keeping the solo corpus, AI-side material, reference corpora, and transform outputs methodologically distinct.

This layer helps document how human and model-generated writing behave under comparable pressure conditions, including differences in originality, density, continuity, rebuttal behavior, structural control, and collapse resistance.

Public pages describe this dataset through aggregate metrics, role labels, corpus-family structure, and public-safe findings. They do not publish protected transcripts, private source material, or prompt-level details that could reconstruct the review environment.

Reference and public-domain comparisons

Reference corpora are public-facing comparison families, not public artist rankings. The site uses labels such as Reference Corpus A and Reference Corpus B so readers can inspect relative metric behavior without turning the page into a source map.

Public-domain comparison texts provide additional external scale where they fit the question and carry lower protected-source risk.

Analysis outputs

The public site is built from outputs, not raw corpora:

- tokenized and segmented measurement passes
- metric workbooks
- index leaderboards
- slice-size trends
- profile heatmaps
- chart-ready public-safe summaries
- Framer-ready CMS payloads

These outputs let the site show behavior, not source text. They are the layer where evidence becomes visible and the protected material stays protected.
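As an illustration of how an output can show behavior without source text, the sketch below reduces tokenized segments to a few aggregate numbers. Shannon entropy and type-token ratio are generic stand-ins chosen for this example; they are not PRM's own metric definitions, and `public_summary` and its output keys are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token frequency distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens, a simple lexical-range proxy."""
    return len(set(tokens)) / len(tokens)


def public_summary(label, segments):
    """Return chart-ready aggregates for one family; no token text is included.

    `segments` is a list of token lists. Empty segments are skipped.
    """
    rows = [s for s in segments if s]
    if not rows:
        raise ValueError("no non-empty segments to summarize")
    entropies = [shannon_entropy(s) for s in rows]
    ratios = [type_token_ratio(s) for s in rows]
    return {
        "label": label,
        "segments": len(rows),
        "tokens": sum(len(s) for s in rows),
        "mean_entropy_bits": round(sum(entropies) / len(rows), 4),
        "mean_type_token_ratio": round(sum(ratios) / len(rows), 4),
    }
```

Only the neutral label and numeric aggregates leave the function, so the returned dictionary can feed a chart or CMS payload without carrying any token text.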

What this page proves / does not prove

This page supports a structure claim: PRM is organized as a rights-controlled dataset family with separate roles for solo material, adversarial review, reference baselines, transforms, analysis workbooks, and chart-ready public summaries.

It does not publish a downloadable corpus, expose protected text, reveal source titles, or provide a source-reconstructable manifest. Dataset structure is visible here; source-level verification belongs in controlled review.

Real-world size comparisons

Exact counts should come from the current private manifest before publication.

Public copy can use reviewed scale descriptions, but it should avoid exposing a source-reconstructable manifest. Useful comparisons can include:

- combined PRM sets compared to major novel-length works
- solo dataset compared to familiar book lengths
- adversarial dataset as a separate review layer

Until counts are reviewed, structural descriptions are more reliable than unverified totals.

How to read the charts

The dataset charts are not source documents. They are aggregate views of what the measurement system produced.

Start with the dataset or coverage dashboard to understand the analysis surface. Then use heatmaps and profile charts to see how behavior changes across segments, tokenizer passes, and comparison lenses.

The useful question is not “which private source is this?” The useful question is whether the same public-safe pattern remains visible across the dataset family.
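One way to make that question concrete, assuming each measurement pass (for example, a tokenizer pass) yields one metric value per public label, is to check whether the labels keep the same rank order in every pass. The function names and the rank-order criterion are illustrative assumptions, not the site's documented test.

```python
def rank_order(values):
    """Labels sorted from highest to lowest metric value (ties keep input order)."""
    return [label for label, _ in sorted(values.items(), key=lambda kv: kv[1], reverse=True)]


def pattern_is_stable(passes):
    """True if every pass ranks the labels in the same order.

    `passes` maps a pass name to {public label: metric value}.
    """
    orders = [rank_order(v) for v in passes.values()]
    return all(order == orders[0] for order in orders[1:])
```

A rank check ignores absolute values, which can shift between tokenizers, and asks only whether the relative pattern survives.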

Public vs private

Public materials show aggregate results, anonymized labels, chart assets, and method explanations.

Private review may include source mapping, manifests, provenance details, and controlled technical documentation.

The split is intentional. The website should make the measurement logic legible without turning protected material into a public dataset.
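The split can be enforced with an allow-list when records are exported for the public site. In this sketch the field names are hypothetical placeholders, not the private manifest schema. Unknown fields raise an error instead of passing through, so a new field must be classified before it can reach a public payload.

```python
# Hypothetical field names; the real manifest schema is private and not shown here.
PUBLIC_FIELDS = frozenset({"label", "family", "segments", "tokens", "mean_entropy_bits"})
PRIVATE_FIELDS = frozenset({"source_title", "source_path", "provenance", "author_identity"})


def to_public_record(record):
    """Keep only allow-listed public fields.

    Raises ValueError if the record contains a field that is neither
    public nor private, forcing explicit classification.
    """
    unknown = set(record) - PUBLIC_FIELDS - PRIVATE_FIELDS
    if unknown:
        raise ValueError(f"unclassified fields: {sorted(unknown)}")
    return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
```

Defaulting to rejection rather than publication keeps the boundary conservative: forgetting to classify a field blocks the export instead of leaking it.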

Technical review path

Dataset licensing or technical diligence should start from the controlled review boundary. The public page explains the dataset families and output layers; deeper review can cover provenance records, source manifests, versioning notes, and reproducibility materials under appropriate terms.

Public-safe boundary

Public pages show aggregate evidence, metric behavior, method provenance, and corpus structure. Protected text, identities, source titles, and reconstructable mappings stay private.

Related pages

- Review access: Technical / Licensing Review
- Disclosure page: Disclosure Boundary
- Metrics hub: Methods & Metrics
- Pipeline workflow: Pipeline
- Integrity boundary: Integrity & Provenance
- PRM context: Solo Dataset