Collect preference and rationale data with outcome labels for reward models.
Run structured alignment tasks in conversation and collect preferences, rationales, and outcome labels with full experimental context.
Preference labels alone can be fragile. echo links preferences to downstream outcome signals so reward models can be evaluated beyond stated choices.
Vary question formats, rubrics, and disclosures as experimental arms and measure how protocol choices shape labels.
Every run exports a full reproducibility pack including protocol files, cohort definitions, exclusion rules, metric definitions, and timestamps so analyses are defensible and repeatable.
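As an illustration only, a reproducibility pack of this kind might be assembled like the sketch below. The function name, field names, and values are hypothetical placeholders, not echo's actual export format:

```python
import json
from datetime import datetime, timezone

def build_repro_pack(protocol, cohorts, exclusions, metrics):
    """Bundle everything needed to rerun an analysis.

    All fields are hypothetical stand-ins for the artifacts described
    above: protocol files, cohort definitions, exclusion rules, metric
    definitions, and timestamps.
    """
    return {
        "protocol": protocol,            # versioned protocol file contents
        "cohorts": cohorts,              # cohort definitions
        "exclusion_rules": exclusions,   # pre-registered exclusion rules
        "metric_definitions": metrics,   # explicit metric definitions
        "exported_at": datetime.now(timezone.utc).isoformat(),
    }

pack = build_repro_pack(
    protocol={"version": "v3", "arms": ["control", "rubric_a"]},
    cohorts=[{"name": "cohort_1", "n": 200}],
    exclusions=["incomplete_session", "failed_attention_check"],
    metrics={"preference_rate": "share of trials choosing option A"},
)
print(json.dumps(pack, indent=2))
```

Because the pack is a single self-describing document, a later analysis can be checked against the exact protocol and exclusion rules that produced the labels.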
You ask, we answer.
Pairwise comparisons, rankings, Likert/rubric scoring, and structured free-response. We can mix task types in one study.
Core labels include comprehension, preference, confidence/trust, risk perception, and intent/likelihood to act. Definitions are explicit and consistent across cohorts so results are comparable.
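A single response carrying these core labels might be recorded as in the sketch below. Field names, types, and scales are illustrative assumptions, not echo's schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class LabelRecord:
    # Hypothetical fields mirroring the core labels described above.
    participant_id: str
    task_type: str          # "pairwise" | "ranking" | "rubric" | "free_response"
    comprehension: bool     # passed a comprehension check
    preference: str         # chosen option identifier
    confidence: int         # e.g. 1-7 Likert
    risk_perception: int    # e.g. 1-7 Likert
    intent_to_act: int      # e.g. 1-7 Likert

record = LabelRecord("p-001", "pairwise", True, "option_a", 6, 2, 5)
print(asdict(record))
```

Keeping every label in one explicit, typed record is what makes definitions consistent across cohorts and task types.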
Yes – you can run multi-arm tests with randomized assignment and versioned prompts. We track performance by arm, segment, and time.
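Randomized assignment to versioned arms is often implemented as a deterministic hash over participant and study IDs, so the same participant always lands in the same arm for a given study. The sketch below shows this common pattern; the details are assumptions, not echo's implementation:

```python
import hashlib

def assign_arm(participant_id: str, study_id: str, arms: list[str]) -> str:
    """Deterministically map a participant to one experimental arm.

    Hashing (study, participant) gives stable, reproducible assignment
    without storing any extra state.
    """
    digest = hashlib.sha256(f"{study_id}:{participant_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

arms = ["prompt_v1", "prompt_v2", "rubric_disclosure"]
arm = assign_arm("p-001", "study-42", arms)
assert assign_arm("p-001", "study-42", arms) == arm  # stable across calls
```

Deterministic assignment also makes arm membership auditable after the fact: rerunning the hash over the participant roster reconstructs the assignment exactly.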
Start by exploring what echo already knows. Go deeper when you’re ready.