oh-my-copilot — Copilot CLI-first benchmark evidence

canonical public root

Truthful Copilot benchmark evidence, not fake parity.

This presentation layer keeps Copilot and Cursor in their own repo-native score models and labels the shared compare view explicitly as reporting-comparable. Current shared rows are reporting-comparable observations, not mechanism-equivalent totals.

Review methodology · Trace history · Open docs & proof · Sibling context: oh-my-cursor

Current pairing

full/enhanced vs backbone/enhanced

Latest harvested pair stays reporting-comparable.

Capture skew

1.68 minutes

101 seconds between the latest Copilot and Cursor records.
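The capture skew figure is simple timestamp arithmetic. The sketch below shows one way it could be derived. The two timestamps are hypothetical values chosen to be 101 seconds apart, since this page shows only minute-level capture times, and `capture_skew` is an illustrative name rather than a function from the repo.

```python
from datetime import datetime, timezone


def capture_skew(a: datetime, b: datetime) -> tuple[int, float]:
    """Return the absolute gap between two captures as (seconds, minutes)."""
    seconds = abs((a - b).total_seconds())
    return round(seconds), round(seconds / 60, 2)


# Hypothetical capture times (assumed UTC); only the 101-second gap comes from the page.
cursor_captured = datetime(2026, 4, 21, 4, 52, 10, tzinfo=timezone.utc)
copilot_captured = datetime(2026, 4, 21, 4, 53, 51, tzinfo=timezone.utc)

skew_seconds, skew_minutes = capture_skew(copilot_captured, cursor_captured)
print(f"{skew_minutes} minutes ({skew_seconds} seconds)")
```

Taking the absolute difference keeps the result independent of which host was captured first.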

Export shell

Next.js 16 static export

The proven Pages workflow and out/ build contract stay intact.

landing evidence rails

Visible routes and proof entry points stay above the fold.

Methodology and History remain first-class routes, while GitHub-hosted docs and benchmark notes stay one click away from the landing surface.

Route

Methodology

Inspect allowed comparability classes, repo-level provenance, and why reporting-comparable rows do not claim mechanism equivalence.

Open methodology →

Route

History

Review benchmark runs, validation artifacts, and the generated manifest event in a single chronology.

Open history →

Proof surface

Docs & repo proof

README: Public repo overview, Copilot CLI-first scope, and current evidence framing.
State contract: Boundary proof for repo-owned surfaces, install state, and validation expectations.
References: GitHub-source-backed citations for Copilot host-product and comparison-scoped wording.
Benchmark notes: Repo-native benchmark contract, harness notes, and release-gate context.

repo-native comparative readout

Sibling coherence without collapsing the scoring models.

Each card explains advantage inside its own harness first, then labels cross-host context as reporting-comparable rather than mechanism-equivalent parity.

oh-my-copilot

Repo-native baseline

Full enhanced records 100/100 in the repo-native scoring model.

reporting-comparable

9/9 required checks are passing with named proof links and timestamps. oh-my-copilot full/enhanced stays an observed repo-native benchmark row. It is safe for cross-host reporting, but it is not a mechanism-equivalent harness match.

oh-my-cursor

Repo-native baseline

Backbone baseline records 100/120 in the repo-native scoring model.

reporting-comparable

6/6 required checks are passing with named proof links and timestamps. oh-my-cursor backbone/baseline stays an observed repo-native benchmark row. It is safe for cross-host reporting, but it is not a mechanism-equivalent harness match.
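The distinction between reporting-comparable and mechanism-equivalent rows can be modelled as a small type. This is a minimal sketch under stated assumptions: `Comparability`, `BenchmarkRow`, `side_by_side`, and `combined_total` are hypothetical names, not the site's actual data model. Only the two scores and the comparability labels come from the cards above.

```python
from dataclasses import dataclass
from enum import Enum


class Comparability(Enum):
    REPORTING_COMPARABLE = "reporting-comparable"
    MECHANISM_EQUIVALENT = "mechanism-equivalent"


@dataclass(frozen=True)
class BenchmarkRow:
    repo: str
    score: int
    max_score: int
    comparability: Comparability


def side_by_side(rows):
    """Report each row on its own scale; scores are never merged."""
    return [f"{r.repo}: {r.score}/{r.max_score} ({r.comparability.value})" for r in rows]


def combined_total(rows):
    """Sum scores only when every row is mechanism-equivalent."""
    if any(r.comparability is not Comparability.MECHANISM_EQUIVALENT for r in rows):
        raise ValueError("reporting-comparable rows cannot be summed into one total")
    return sum(r.score for r in rows)


# Scores as shown on the two cards; both rows are reporting-comparable.
rows = [
    BenchmarkRow("oh-my-copilot", 100, 100, Comparability.REPORTING_COMPARABLE),
    BenchmarkRow("oh-my-cursor", 100, 120, Comparability.REPORTING_COMPARABLE),
]
report = side_by_side(rows)
print("\n".join(report))
```

Raising an error in `combined_total` makes the "no fake parity" rule something the code enforces, not just something the copy promises.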

state confidence

Named proof links remain visible alongside score summaries.

The flagship surface highlights required checks, timestamps, and proof links instead of turning benchmark output into marketing-only copy.

oh-my-copilot

State confidence for install, contract, and validation checks

oh-my-copilot starts by establishing a repo-native baseline before making any comparison claim. Cross-host reporting remains reporting-comparable and does not claim mechanism parity. 9/9 required checks are passing with named proof links and timestamps.

9/9 required checks passing

Docs Validation — ok: research/omc-analysis.md has an Evidence section
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
Power Validation — ok: VS Code settings enable AGENTS.md loading
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
Root Validation — ok: post-tool hook logs root-workspace source
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
Smoke Cli — GitHub Copilot CLI 1.0.34.
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
Bootstrap — ok: cross-host methodology route explains comparability classes
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
Install State — ok: installed source path is canonical: /home/zeyufu/Desktop/oh-my-copilot/packages/copilot-cli-plugin
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
Standalone Hook Proof — ok: standalone workspace hook proof succeeded
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
ROOT AGENT OK — GitHub Copilot CLI 1.0.34.
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
PLUGIN AGENT OK — GitHub Copilot CLI 1.0.34.
pass · Apr 21, 2026, 4:53 AM · Proof: benchmark/results/current-full-enhanced
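A summary line such as "9/9 required checks passing" can be derived from the individual check records rather than typed by hand. The sketch below assumes a hypothetical `Check` record and `summarize` helper. The check names and proof path are copied from the list above, but the record shape is an illustration, not the repo's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    status: str  # "pass" or any other value for a non-passing check
    proof: str   # path to the proof artifact


def summarize(checks):
    """Count passing checks and format the headline shown on the card."""
    passing = sum(1 for c in checks if c.status == "pass")
    return f"{passing}/{len(checks)} required checks passing"


PROOF = "benchmark/results/current-full-enhanced"
NAMES = [
    "Docs Validation", "Power Validation", "Root Validation",
    "Smoke Cli", "Bootstrap", "Install State",
    "Standalone Hook Proof", "ROOT AGENT OK", "PLUGIN AGENT OK",
]
checks = [Check(name, "pass", PROOF) for name in NAMES]

summary = summarize(checks)
print(summary)
```

Computing the headline from the records keeps it from drifting away from the proof list when a check is added or starts failing.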

oh-my-cursor

State confidence for auth, visibility, contract, and smoke checks

oh-my-cursor starts by establishing a repo-native baseline before making any comparison claim. Cross-host reporting remains reporting-comparable and does not claim mechanism parity. 6/6 required checks are passing with named proof links and timestamps.

6/6 required checks passing

Default Auth — ✓ Logged in as fuzeyu09@gmail.com
pass · Apr 21, 2026, 4:52 AM · Proof: benchmark/results/current-enhanced
CURSOR MODEL AUTO OK — ✓ Logged in as fuzeyu09@gmail.com
pass · Apr 21, 2026, 4:52 AM · Proof: benchmark/results/current-enhanced
Surface Visibility — ok: docs/references.md
pass · Apr 21, 2026, 4:52 AM · Proof: benchmark/results/current-enhanced
State Contract — ok: .gitignore blocks speculative Cursor state files
pass · Apr 21, 2026, 4:52 AM · Proof: benchmark/results/current-enhanced
Backbone Verify — ok: AGENTS.md
pass · Apr 21, 2026, 4:52 AM · Proof: benchmark/results/current-enhanced
CURSOR AGENT OK — ok: reusing upstream default auth/model proof (environment-gated)
pass · Apr 21, 2026, 4:52 AM · Proof: benchmark/results/current-enhanced

recent evidence

Generated history stays part of the public story.

These recent entries keep benchmark runs and validation artifacts visible from the homepage before a reader ever leaves the landing surface.

Apr 21, 2026, 3:38 PM · history
Cross-host harvest snapshot
3 observed rows were harvested with reporting-comparable semantics.
Proof: apps/cross-host-benchmark-site/generated/copilot-snapshots.json
Apr 21, 2026, 3:38 PM · history
Cross-host harvest snapshot
2 observed rows were harvested with reporting-comparable semantics.
Proof: apps/cross-host-benchmark-site/generated/cursor-snapshots.json
Apr 21, 2026, 4:53 AM · benchmark
Full enhanced benchmark
100/100 with 100/100 release gate
Proof: benchmark/results/current-full-enhanced
Apr 21, 2026, 4:53 AM · validation
Full enhanced benchmark report
100/100 with reporting-comparable comparability metadata.
Proof: benchmark/results/current-full-enhanced
Apr 21, 2026, 4:52 AM · benchmark
Quick enhanced benchmark
100/100 with 100/100 release gate
Proof: benchmark/results/current-quick-enhanced
Apr 21, 2026, 4:52 AM · benchmark
Backbone enhanced benchmark
120/120 with 120/120 release gate
Proof: benchmark/results/current-enhanced
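Benchmark runs, validation artifacts, and harvest events could be merged into one newest-first chronology along the lines sketched below. `Event` and `chronology` are hypothetical names, and the naive datetimes assume all entries share one time zone. The sample entries reuse titles and times from the list above.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    when: datetime
    kind: str   # "benchmark", "validation", or "history"
    title: str


def chronology(*sources):
    """Merge several event lists into a single newest-first timeline."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.when, reverse=True)


benchmarks = [Event(datetime(2026, 4, 21, 4, 52), "benchmark", "Backbone enhanced benchmark")]
validations = [Event(datetime(2026, 4, 21, 4, 53), "validation", "Full enhanced benchmark report")]
harvests = [Event(datetime(2026, 4, 21, 15, 38), "history", "Cross-host harvest snapshot")]

timeline = chronology(benchmarks, validations, harvests)
for event in timeline:
    print(event.when.isoformat(), event.kind, event.title)
```

Because Python's sort is stable, entries with identical timestamps keep the order in which their sources were passed in.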
