
Scoring Methodology

How we evaluate MCP tools — transparent for builders, actionable for agents.

Current Grade Distribution

| Grade | Count | Share of scored tools |
| --- | --- | --- |
| B+ | 1,123 | 2.8% |
| B | 7,187 | 18.1% |
| C+ | 5,477 | 13.8% |
| C | 21,464 | 54.0% |
| D | 4,351 | 10.9% |
| F | 160 | 0.4% |

Total tools scored: 39,762

1. Quality Score (Static Analysis)

Five dimensions are evaluated automatically from the tool definition and combined into a weighted composite score from 0 to 100.

| Dimension | Weight | What we measure |
| --- | --- | --- |
| Schema Correctness | 30% | Does the tool have a valid input/output schema? We check JSON Schema structure, the type field, properties, and required fields. |
| Token Efficiency | 25% | How many tokens does the tool definition consume? Every token counts against the agent's context window; ≤80 tokens is optimal. |
| Description Quality | 20% | Is the description clear, concise, and actionable? We check length (50-200 characters is optimal), action verbs, and naming conventions. |
| Security | 15% | Prompt injection patterns, suspicious language, and security keywords. These are static checks only; there is no runtime sandbox yet. |
| Install Reliability | 10% | Can we detect the install method (HTTP endpoint vs. npm/pip/go/cargo)? Clear install instructions help. |
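A minimal sketch of the weighted composite, assuming each dimension has already been scored on a 0-100 scale. The per-dimension scoring rules are not published here, so the example inputs are made up; only the weights come from the table above, and the names `QUALITY_WEIGHTS` and `quality_score` are hypothetical.

```python
# Weights from the Quality Score table; they sum to 1.0.
QUALITY_WEIGHTS = {
    "schema_correctness": 0.30,
    "token_efficiency": 0.25,
    "description_quality": 0.20,
    "security": 0.15,
    "install_reliability": 0.10,
}


def quality_score(dimension_scores: dict[str, float]) -> float:
    """Weighted composite of per-dimension scores.

    Assumption: every dimension is already scored on 0-100,
    so the composite also lands on 0-100.
    """
    missing = QUALITY_WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, weight in QUALITY_WEIGHTS.items():
        score = dimension_scores[name]
        if not 0 <= score <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {score}")
        total += weight * score
    return round(total, 1)


# Hypothetical per-dimension scores for one tool.
example = {
    "schema_correctness": 80,
    "token_efficiency": 60,
    "description_quality": 70,
    "security": 100,
    "install_reliability": 50,
}
print(quality_score(example))  # 73.0
```

Because the weights sum to 1.0, a tool that scores 100 on every dimension gets exactly 100, and a weak dimension costs points in proportion to its weight: losing 50 points on Install Reliability costs 5 overall, while the same loss on Schema Correctness costs 15.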

Grade Mapping

| Score Range | Grade | Interpretation |
| --- | --- | --- |
| 92-100 | A+ | Exceptional quality across all dimensions |
| 82-91 | A | Strong quality, minor improvements possible |
| 74-81 | B+ | Good quality, some dimensions need attention |
| 66-73 | B | Solid, with clear improvement areas |
| 55-65 | C | Average: functional but needs work |
| 40-54 | D | Below average: significant issues |
| 0-39 | F | Critical issues: not recommended for agents |
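The mapping can be read as a lookup from the highest lower bound downward. This sketch assumes each listed lower bound is inclusive and that a fractional score falls into the band whose lower bound it meets (so 91.5 maps to A); the table only lists whole-number ranges and has no C+ row, so the sketch cannot produce C+. The names `GRADE_BANDS` and `grade_for` are hypothetical.

```python
# (inclusive lower bound, grade), ordered from the highest band to the lowest.
GRADE_BANDS = [
    (92, "A+"),
    (82, "A"),
    (74, "B+"),
    (66, "B"),
    (55, "C"),
    (40, "D"),
    (0, "F"),
]


def grade_for(score: float) -> str:
    """Return the letter grade for a 0-100 Quality Score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    # The first band whose lower bound the score meets is the grade.
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: the F band starts at 0")


print(grade_for(73.0))  # B
print(grade_for(74))    # B+
print(grade_for(39.9))  # F
```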

2. Trust Score (Real-World Performance)

Derived from actual agent usage. Every tool starts at 50 (a neutral baseline), and the score rises as agents report successful calls.

| Signal | Weight | Source |
| --- | --- | --- |
| Success Rate | 40% | Success/failure reported by agents after calling the tool |
| Recency | 25% | How recently the tool was used; active tools score higher |
| Consistency | 20% | Stability of the success rate, plus user ratings |
| Community | 15% | Usage volume plus GitHub engagement |
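A sketch of how the four weighted signals might combine. Only the weights and the neutral starting value of 50 come from this page. The assumption that each signal is already normalized to 0-100, and the hypothetical `min_reports` parameter that moves a tool from the baseline toward its weighted composite as reports accumulate, are illustrative choices rather than the published method.

```python
# Weights from the Trust Score table; they sum to 1.0.
TRUST_WEIGHTS = {
    "success_rate": 0.40,
    "recency": 0.25,
    "consistency": 0.20,
    "community": 0.15,
}
NEUTRAL_BASELINE = 50.0  # every tool starts here


def trust_score(
    signals: dict[str, float],
    reported_calls: int,
    min_reports: int = 20,  # hypothetical: reports needed for full weight
) -> float:
    """Blend the neutral baseline with the weighted signal composite.

    Assumption: each signal is already normalized to 0-100.
    """
    if reported_calls < 0 or min_reports <= 0:
        raise ValueError("reported_calls must be >= 0 and min_reports > 0")
    if reported_calls == 0:
        return NEUTRAL_BASELINE  # no evidence yet
    composite = sum(w * signals[name] for name, w in TRUST_WEIGHTS.items())
    # Hypothetical shrinkage: with few reports, stay near the baseline.
    evidence = min(reported_calls / min_reports, 1.0)
    return round(NEUTRAL_BASELINE + evidence * (composite - NEUTRAL_BASELINE), 1)


signals = {"success_rate": 90, "recency": 80, "consistency": 70, "community": 40}
print(trust_score(signals, reported_calls=0))   # 50.0
print(trust_score(signals, reported_calls=10))  # 63.0
print(trust_score(signals, reported_calls=40))  # 76.0
```

Shrinking toward the baseline keeps a tool with only a handful of reports from jumping straight to an extreme score; once `min_reports` is reached, the score is simply the weighted composite (76.0 in this example).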

3. Agent Signals (Discovery Metadata)

Signals that help agents decide whether to trust a tool before calling it.

| Signal | What it means |
| --- | --- |
| Is Official | Maintained by the service provider or platform organization |
| GitHub Stars | Community endorsement (scored on a log scale) |
| Activity Status | Active (≤30 days), Maintained (≤180 days), Stale (≤365 days), Abandoned (>365 days) |
| Community Score | Composite: stars (50%) + activity (35%) + official bonus (15%) |
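The activity thresholds and the community-score weights can be sketched as follows. The day thresholds and the 50/35/15 split come from the table above; the numeric credit per activity status, the 10,000-star cap used to normalize the log scale, and treating the official bonus as all-or-nothing are hypothetical.

```python
import math

# Hypothetical numeric credit per activity status; not published.
ACTIVITY_POINTS = {"Active": 100, "Maintained": 70, "Stale": 30, "Abandoned": 0}
STAR_CAP = 10_000  # hypothetical star count that earns full star credit


def activity_status(days_since_activity: int) -> str:
    """Map days since the last activity to the published status labels."""
    if days_since_activity < 0:
        raise ValueError("days_since_activity must be >= 0")
    if days_since_activity <= 30:
        return "Active"
    if days_since_activity <= 180:
        return "Maintained"
    if days_since_activity <= 365:
        return "Stale"
    return "Abandoned"


def star_points(stars: int) -> float:
    """Log-scale star credit on 0-100, saturating at STAR_CAP."""
    if stars < 0:
        raise ValueError("stars must be >= 0")
    ratio = math.log10(stars + 1) / math.log10(STAR_CAP + 1)
    return 100 * min(ratio, 1.0)


def community_score(stars: int, days_since_activity: int, is_official: bool) -> float:
    """Composite: stars 50% + activity 35% + official bonus 15%."""
    activity = ACTIVITY_POINTS[activity_status(days_since_activity)]
    official = 100 if is_official else 0  # assumption: all-or-nothing bonus
    return round(0.50 * star_points(stars) + 0.35 * activity + 0.15 * official, 1)


print(activity_status(45))                # Maintained
print(community_score(10_000, 10, True))  # 100.0
print(community_score(0, 400, False))     # 0.0
```

The log scale means each tenfold increase in stars adds the same amount of credit, so a jump from 10 to 100 stars counts as much as a jump from 1,000 to 10,000.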

4. Discrepancy Flag

When a tool's Quality Score and Trust Score contradict each other, we flag the discrepancy.

5. Continuous Improvement

Scores are recalculated periodically. Improving your tool's schemas, descriptions, or install documentation directly improves your Quality Score. Successful real-world agent usage raises your Trust Score. We are actively calibrating the scoring based on ecosystem feedback.

Last updated: 2026-06-08