How we evaluate MCP tools — transparent for builders, actionable for agents.
Grade distribution across all scored tools:
| Grade | Count | % of tools |
|---|---|---|
| B+ | 1,123 | 2.8% |
| B | 7,187 | 18.1% |
| C+ | 5,477 | 13.8% |
| C | 21,464 | 54.0% |
| D | 4,351 | 10.9% |
| F | 160 | 0.4% |
Total tools scored: 39,762
The Quality Score rates five dimensions, each evaluated automatically from the tool definition, and combines them into a weighted composite from 0 to 100.
| Dimension | Weight | What we measure |
|---|---|---|
| Schema Correctness | 30% | Does the tool have a valid input/output schema? JSON Schema structure, type field, properties, required fields. |
| Token Efficiency | 25% | How many tokens does the tool definition consume? Every token counts against the agent's context window. ≤80 tokens = optimal. |
| Description Quality | 20% | Is the description clear, concise, and actionable? Length (50-200 chars optimal), action verbs, naming conventions. |
| Security | 15% | Scans the definition for prompt-injection patterns, suspicious language, and security keywords. Tools are not yet executed in a runtime sandbox. |
| Install Reliability | 10% | Can we detect the install method (HTTP endpoint vs. npm/pip/go/cargo)? Clear install instructions raise the score. |
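A minimal Python sketch of the weighted composite, assuming each dimension has already been scored on its own 0-100 scale (how each dimension is scored internally is not described here). The dictionary keys and the function name `quality_score` are hypothetical; only the weights come from the table above.

```python
from typing import Mapping

# Weights from the Quality Score table; they sum to 1.0.
# Key names are hypothetical labels for the five dimensions.
QUALITY_WEIGHTS = {
    "schema_correctness": 0.30,
    "token_efficiency": 0.25,
    "description_quality": 0.20,
    "security": 0.15,
    "install_reliability": 0.10,
}


def quality_score(dimensions: Mapping[str, float]) -> float:
    """Weighted composite on a 0-100 scale.

    Assumes every dimension is already scored 0-100.
    """
    missing = QUALITY_WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in QUALITY_WEIGHTS:
        if not 0.0 <= dimensions[name] <= 100.0:
            raise ValueError(f"{name} out of range: {dimensions[name]}")
    total = sum(weight * dimensions[name] for name, weight in QUALITY_WEIGHTS.items())
    # Round to one decimal to hide floating-point noise from the weights.
    return round(total, 1)
```

For example, a tool scoring 100 on schema, 80 on tokens, 60 on description, 100 on security, and 50 on install lands at 30 + 20 + 12 + 15 + 5 = 82.0.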
Composite scores map to letter grades:
| Score Range | Grade | Interpretation |
|---|---|---|
| 92-100 | A+ | Exceptional quality across all dimensions |
| 82-91 | A | Strong quality, minor improvements possible |
| 74-81 | B+ | Good quality, some dimensions need attention |
| 66-73 | B | Solid, with clear improvement areas |
| 55-65 | C | Average — functional but needs work |
| 40-54 | D | Below average — significant issues |
| 0-39 | F | Critical issues — not recommended for agents |
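The band lookup can be sketched as a descending scan over each grade's lower bound. The table lists integer ranges; treating fractional scores by their lower bound (so 91.5 counts as A) is an assumption, as is the function name `letter_grade`. Only the seven bands listed in the table are encoded.

```python
# Lower bound of each grade band from the table, highest band first.
GRADE_BANDS = [
    (92, "A+"),
    (82, "A"),
    (74, "B+"),
    (66, "B"),
    (55, "C"),
    (40, "D"),
    (0, "F"),
]


def letter_grade(score: float) -> str:
    """Map a 0-100 composite score to its letter grade."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # The first band whose lower bound the score reaches is the grade.
    for lower, grade in GRADE_BANDS:
        if score >= lower:
            return grade
    raise AssertionError("unreachable: the F band starts at 0")
```

Scanning from the top keeps each band defined by a single number, so adjusting a threshold means editing one entry rather than two range endpoints.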
The Trust Score is derived from actual agent usage. Every tool starts at 50 (a neutral baseline), and the score rises as agents report successful calls.
| Signal | Weight | Source |
|---|---|---|
| Success Rate | 40% | Reported success/failure from agent calls |
| Recency | 25% | How recently was the tool used? Active tools score higher. |
| Consistency | 20% | Stability of success rate + user ratings |
| Community | 15% | Usage volume + GitHub engagement |
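A sketch of the Trust Score as a weighted sum of the four signals, each assumed to be normalized to 0-100. How the 50-point baseline is actually applied is not specified; here a signal with no data yet is assumed to sit at the neutral value of 50, so a brand-new tool scores exactly 50. The key names and the function name `trust_score` are hypothetical.

```python
from typing import Mapping, Optional

# Weights from the Trust Score table; they sum to 1.0.
TRUST_WEIGHTS = {
    "success_rate": 0.40,
    "recency": 0.25,
    "consistency": 0.20,
    "community": 0.15,
}

# Neutral baseline every tool starts from.
NEUTRAL = 50.0


def trust_score(signals: Mapping[str, Optional[float]]) -> float:
    """Weighted Trust Score on a 0-100 scale.

    Assumption: a signal that is absent or None has no data yet and is
    treated as neutral (50).
    """
    total = 0.0
    for name, weight in TRUST_WEIGHTS.items():
        value = signals.get(name)
        if value is None:
            value = NEUTRAL
        elif not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} out of range: {value}")
        total += weight * value
    return round(total, 1)
```

Under this assumption, `trust_score({})` returns 50.0, and a tool with a 90 success rate and 80 recency but no other data reaches 36 + 20 + 10 + 7.5 = 73.5.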
These signals help agents decide whether to trust a tool before calling it:
| Signal | What it means |
|---|---|
| Is Official | Maintained by the service provider or platform organization |
| GitHub Stars | Community endorsement (log-scale scoring) |
| Activity Status | Based on time since last activity: Active (≤30 days), Maintained (≤180 days), Stale (≤365 days), Abandoned (>365 days) |
| Community Score | Composite: stars (50%) + activity (35%) + official bonus (15%) |
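A sketch of the activity buckets and the community composite. The day thresholds and the 50/35/15 weights come from the table; the log-scale star normalization (capped at a hypothetical `STAR_CAP` of 10,000 stars), the points assigned to each activity status, and treating the official bonus as all-or-nothing are assumptions for illustration only.

```python
import math

# Hypothetical normalization constant: star counts at or above this map to 100.
STAR_CAP = 10_000

# Hypothetical points per activity status; the real mapping is not published here.
ACTIVITY_POINTS = {"Active": 100.0, "Maintained": 70.0, "Stale": 30.0, "Abandoned": 0.0}


def activity_status(days_since_activity: int) -> str:
    """Bucket a tool by days since its last activity (thresholds from the table)."""
    if days_since_activity < 0:
        raise ValueError("days_since_activity must be non-negative")
    if days_since_activity <= 30:
        return "Active"
    if days_since_activity <= 180:
        return "Maintained"
    if days_since_activity <= 365:
        return "Stale"
    return "Abandoned"


def star_points(stars: int) -> float:
    """Log-scale star score in [0, 100]; log10(stars + 1) keeps 0 stars at 0."""
    if stars < 0:
        raise ValueError("stars must be non-negative")
    return min(100.0, 100.0 * math.log10(stars + 1) / math.log10(STAR_CAP + 1))


def community_score(stars: int, days_since_activity: int, is_official: bool) -> float:
    """Composite: stars 50% + activity 35% + official bonus 15%."""
    activity = ACTIVITY_POINTS[activity_status(days_since_activity)]
    official = 100.0 if is_official else 0.0
    return round(0.50 * star_points(stars) + 0.35 * activity + 0.15 * official, 1)
```

The log scale means going from 10 to 100 stars moves the star component as much as going from 100 to 1,000, so a handful of very popular repositories does not dominate the ranking.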
When a tool's Quality Score and Trust Score contradict each other, we flag the discrepancy.
Scores are recalculated periodically. Improving your tool's schemas, descriptions, or install documentation directly improves your Quality Score. Real-world agent usage improves your Trust Score. We're actively calibrating based on ecosystem feedback.
Last updated: 2026-06-08