PRM Glossary

Short public-safe definitions for the terms used across Project Rocket Man.

This page gives reviewers enough vocabulary to follow the project without exposing source text, private maps, protected titles, or reconstructable manifests.

Project Rocket Man / PRM

A rights-controlled, human-authored lyrical language corpus and benchmark environment documented through public-safe aggregate evidence.

Protected Corpus

The private source writing behind PRM. Public pages describe its behavior through aggregate outputs, not raw text.

Public-Safe Evidence

Charts, metric summaries, method notes, and boundaries that make the project legible without making protected material reconstructable.

Controlled Review

A non-public review path for technical, research, or licensing diligence under appropriate terms.

Aggregate Metrics

Measurements summarized across slices or corpus groups, so readers see behavior without source-level text.

Source Mapping

Private structure connecting source material to analysis inputs. Public pages do not expose it.

Reconstructable Manifest

Any listing, map, or label set that could help rebuild protected source structure. PRM keeps this out of the public site.

Entropy

A measure of how evenly tokens are distributed in a text and, as a result, how uncertain the next token is. In PRM, entropy helps show how information is spread across a measured slice.

Inverse Entropy Behavior

A way of reading how predictability, repetition, or compression pressure changes across a text set.

Shannon Entropy

The primary entropy measure for information dispersion. Higher Shannon entropy generally means the text is less predictable and its tokens are more evenly distributed.
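
As a rough illustration, Shannon entropy over an empirical token distribution can be sketched as follows. The function name `shannon_entropy` and the sample token lists are hypothetical; this is the textbook formula in bits, not necessarily the exact configuration PRM uses.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over every distinct token
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sample data, invented for illustration only.
even = ["a", "b", "c", "d"]     # four tokens, each used once
skewed = ["a", "a", "a", "b"]   # one token dominates

print(shannon_entropy(even))    # 2.0
print(shannon_entropy(skewed))  # lower, because the distribution is concentrated
```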

Rényi Entropy

An entropy-family measure that can be more sensitive to concentration effects than Shannon alone. PRM uses it as a second view of distribution behavior.
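
A minimal sketch of Rényi entropy, assuming order alpha = 2 (collision entropy). The order PRM actually uses is not stated here, so `alpha` and the function name `renyi_entropy` are assumptions for illustration.

```python
import math
from collections import Counter

def renyi_entropy(tokens, alpha=2.0):
    """Renyi entropy of order alpha, in bits. alpha=2.0 is an assumed default."""
    if alpha <= 0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    counts = Counter(tokens)
    total = sum(counts.values())
    # H_alpha = log2(sum(p ** alpha)) / (1 - alpha)
    power_sum = sum((c / total) ** alpha for c in counts.values())
    return math.log2(power_sum) / (1.0 - alpha)

# Hypothetical sample data, invented for illustration only.
even = ["a", "b", "c", "d"]
skewed = ["a", "a", "a", "b"]

print(renyi_entropy(even))    # equals Shannon entropy on a uniform distribution
print(renyi_entropy(skewed))  # penalizes the dominant token more than Shannon does
```

For alpha greater than 1, frequent tokens weigh more heavily in the sum, which is why this family tends to react more strongly to concentration.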

Tsallis Entropy

An entropy-family measure that reads diversity and concentration through a different mathematical lens, helping test whether entropy strength is broad or metric-specific.
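
A minimal sketch of Tsallis entropy, assuming entropic index q = 2. The value of `q` and the function name `tsallis_entropy` are assumptions, since the setting PRM uses is not given here.

```python
from collections import Counter

def tsallis_entropy(tokens, q=2.0):
    """Tsallis entropy with entropic index q. q=2.0 is an assumed default."""
    if q == 1.0:
        raise ValueError("q must differ from 1 (the limit q -> 1 is Shannon entropy)")
    counts = Counter(tokens)
    total = sum(counts.values())
    # S_q = (1 - sum(p ** q)) / (q - 1)
    power_sum = sum((c / total) ** q for c in counts.values())
    return (1.0 - power_sum) / (q - 1.0)

# Hypothetical sample data, invented for illustration only.
even = ["a", "b", "c", "d"]
skewed = ["a", "a", "a", "b"]

print(tsallis_entropy(even))    # 0.75
print(tsallis_entropy(skewed))  # 0.375
```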

Lexical Diversity

A vocabulary-range signal. It asks how much variety appears in the measured text relative to the amount of language being measured.

Unique Tokens

The count of distinct measured tokens in a slice or corpus group. It helps separate vocabulary reach from repeated reuse of the same terms.
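
As a small illustration, the unique-token count and the plain type-token ratio can be computed like this. The sample sentence and the function names are invented for the example and have no connection to the protected corpus.

```python
def unique_token_count(tokens):
    """Number of distinct tokens in the slice."""
    return len(set(tokens))

def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens."""
    return unique_token_count(tokens) / len(tokens)

# Hypothetical sample, whitespace-tokenized for simplicity.
tokens = "the cat sat on the mat the end".split()

print(len(tokens))                 # 8 tokens in total
print(unique_token_count(tokens))  # 6 distinct tokens
print(type_token_ratio(tokens))    # 0.75
```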

MTLD

Measure of Textual Lexical Diversity. It estimates how long a text can sustain vocabulary variety before repetition lowers the diversity signal.

MSTTR50

Mean Segmental Type-Token Ratio over 50-token windows. It averages vocabulary variety across fixed-size chunks so one long passage does not dominate the read.
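
A minimal sketch of the calculation, assuming non-overlapping 50-token windows and that any trailing partial window is dropped. The function name `msttr` and the handling of the remainder are assumptions.

```python
def msttr(tokens, window=50):
    """Mean type-token ratio over non-overlapping windows of fixed size."""
    segments = [
        tokens[i:i + window]
        for i in range(0, len(tokens) - window + 1, window)
    ]
    if not segments:
        raise ValueError("text is shorter than one window")
    # Trailing tokens that do not fill a whole window are ignored.
    return sum(len(set(seg)) / window for seg in segments) / len(segments)

# Hypothetical token stream: 120 tokens cycling through 25 distinct values.
tokens = [str(i % 25) for i in range(120)]

print(msttr(tokens))  # 0.5: each full window holds 25 distinct tokens out of 50
```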

HDD42

Hypergeometric Distribution Diversity using PRM’s comparison setting. It estimates lexical variety through probability rather than a simple unique-word count.
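
One common formulation of HD-D can be sketched as follows, assuming a sample size of 42 tokens (suggested by the name, but not confirmed here as PRM's setting). For each distinct token it takes the probability that the token appears at least once in a random draw of 42 tokens without replacement, then sums those probabilities scaled by the sample size. The function name `hdd` is hypothetical.

```python
from collections import Counter
from math import comb

def hdd(tokens, sample_size=42):
    """HD-D: expected type-token ratio of a random sample drawn without replacement."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    all_draws = comb(n, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        # Hypergeometric probability that this type is absent from the sample.
        p_absent = comb(n - freq, sample_size) / all_draws
        score += (1.0 - p_absent) / sample_size
    return score

# Hypothetical token streams, invented for illustration only.
all_distinct = [str(i) for i in range(42)]
one_word = ["a"] * 42

print(hdd(all_distinct))  # close to 1.0: every sampled token is a new type
print(hdd(one_word))      # close to 1/42: only one type can ever appear
```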

Herdan’s C

A vocabulary-growth metric that compares distinct vocabulary against total length. It helps show whether vocabulary keeps expanding as text volume grows.
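
In its usual form, Herdan's C is the ratio of the logarithm of the distinct-token count to the logarithm of the total token count. A minimal sketch, with a hypothetical function name and invented sample text:

```python
import math

def herdan_c(tokens):
    """Herdan's C = log(V) / log(N), with V distinct tokens and N total tokens."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    v = len(set(tokens))
    return math.log(v) / math.log(n)

# Hypothetical samples, invented for illustration only.
mixed = "the cat sat on the mat the end".split()   # V = 6, N = 8
all_distinct = [str(i) for i in range(10)]         # V = N

print(herdan_c(mixed))         # below 1.0 because some tokens repeat
print(herdan_c(all_distinct))  # 1.0
```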

Yule’s K

A repetition and lexical-concentration metric. Lower raw Yule’s K usually means less repetition pressure, so PRM inverts it before index scoring.
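
The raw metric, before any inversion, can be sketched with the standard formula K = 10^4 * (sum of squared token frequencies - N) / N^2. The function name `yules_k` is hypothetical, and the inversion step PRM applies is not shown because its form is not described here.

```python
from collections import Counter

def yules_k(tokens):
    """Raw Yule's K: 10^4 * (sum(f^2) - N) / N^2, where f is each type's frequency."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    sum_sq = sum(f * f for f in Counter(tokens).values())
    return 1e4 * (sum_sq - n) / (n * n)

# Hypothetical samples, invented for illustration only.
all_distinct = [str(i) for i in range(10)]
one_word = ["a"] * 10

print(yules_k(all_distinct))  # 0.0: no repetition at all
print(yules_k(one_word))      # 9000.0: maximal repetition for ten tokens
```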

Maas

A lexical narrowness metric. Lower raw Maas generally indicates broader vocabulary behavior, so PRM inverts it before index scoring.
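
A minimal sketch of the raw Maas value, using the usual form a^2 = (log N - log V) / (log N)^2. The function name `maas_a2` is hypothetical, and PRM's inversion step is left out because its form is not described here.

```python
import math

def maas_a2(tokens):
    """Raw Maas a^2 = (log N - log V) / (log N)^2; lower means broader vocabulary."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    v = len(set(tokens))
    log_n = math.log(n)
    return (log_n - math.log(v)) / (log_n ** 2)

# Hypothetical samples, invented for illustration only.
all_distinct = [str(i) for i in range(10)]
narrow = ["a", "b"] * 5

print(maas_a2(all_distinct))  # 0.0: vocabulary is as broad as the text allows
print(maas_a2(narrow))        # larger: two types carry all ten tokens
```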

Tokenizer

A system that breaks text into units for analysis. Tokenizer behavior can change how language patterns appear.
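
To make the effect concrete, the sketch below runs two simple tokenizers over the same invented sentence. Both tokenizers are illustrative stand-ins, not the tokenizer PRM uses.

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace only; punctuation and hyphens stay attached."""
    return text.split()

def word_tokenize(text):
    """Lowercase the text and keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", text.lower())

# Invented sample sentence.
text = "Well-worn words, well worn."

ws = whitespace_tokenize(text)
words = word_tokenize(text)

print(ws)     # ['Well-worn', 'words,', 'well', 'worn.']
print(words)  # ['well', 'worn', 'words', 'well', 'worn']
print(len(set(ws)), len(set(words)))  # 4 distinct tokens versus 3
```

The same sentence yields four distinct tokens under one scheme and three under the other, so every diversity and entropy figure depends on which tokenizer produced the counts.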

Slice Size

The amount of text measured at once. PRM uses controlled slices so comparisons face similar pressure.

Slice Trend

A stability check across different measurement windows. It asks whether a signal survives changing slice sizes or depends on one favorable window.
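
One way to picture this check is to recompute a metric at several slice sizes and compare the averages. In the sketch below, the function names, the choice of type-token ratio as the metric, and the slice sizes are all assumptions made for illustration.

```python
def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)

def slice_trend(tokens, slice_sizes, metric=type_token_ratio):
    """Mean metric value over non-overlapping slices, for each slice size."""
    trend = {}
    for size in slice_sizes:
        slices = [
            tokens[i:i + size]
            for i in range(0, len(tokens) - size + 1, size)
        ]
        if slices:  # skip sizes larger than the text
            trend[size] = sum(metric(s) for s in slices) / len(slices)
    return trend

# Hypothetical token stream: 200 tokens cycling through 10 distinct values.
tokens = [str(i % 10) for i in range(200)]

trend = slice_trend(tokens, [10, 20, 50])
for size, value in trend.items():
    print(size, round(value, 3))
```

In this toy stream the ratio falls as the window grows, which is the kind of window dependence a slice-trend check is meant to surface.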

Transform

A repeatable preparation step that converts protected source material into analysis-ready form without publishing the source.

RMR

A review-facing layer used for human-readable checking, provenance work, and source-facing quality control.

TQT

A measurement-clean layer used for statistical analysis after formatting and handling choices are controlled.

EURE

Efficient Complexity. A PRM index combining entropy strength, unique-token behavior, and repetition control into a public-safe efficient-complexity read.

LDI

Lexical Discipline Index. A PRM index for vocabulary control and sustained word-choice behavior outside the entropy formula.

RACS

Repetition-Adjusted Complexity Score. A PRM index for complexity that remains after repetition pressure and lexical narrowness are discounted.

Public-Safe Gauntlet

A set of aggregate checks that stress the claim without exposing source text, titles, or private mappings.

Reference Baseline

A comparison coordinate used to interpret metric behavior. It is not an artistic ranking.

Adversarial Writing Set

A difficult comparison set used to test whether the metric stack stays coherent under pressure.

Provenance

The record of where inputs, transforms, versions, and outputs came from, kept public-safe on the open site.

Reproducibility Package

Controlled materials that may support independent review without turning the protected corpus into a public download.
