Scoring Methodology

How we evaluate MCP tools — transparent for builders, actionable for agents.

Current Grade Distribution

Grade	Count	%
B+	1,123	2.8%
B	7,187	18.1%
C+	5,477	13.8%
C	21,464	54.0%
D	4,351	10.9%
F	160	0.4%

Total tools scored: 39,762

1. Quality Score (Static Analysis)

Five dimensions, evaluated automatically from the tool definition and combined into a weighted composite score from 0 to 100.

Dimension	Weight	What we measure
Schema Correctness	30%	Does the tool have a valid input/output schema? JSON Schema structure, type field, properties, required fields.
Token Efficiency	25%	How many tokens does the tool definition consume? Every token counts against the agent's context window. ≤80 tokens = optimal.
Description Quality	20%	Is the description clear, concise, and actionable? Length (50-200 chars optimal), action verbs, naming conventions.
Security	15%	Static checks for prompt injection patterns, suspicious language, and security keywords. Tools are not yet executed in a runtime sandbox.
Install Reliability	10%	Can we detect the install method? HTTP endpoint vs npm/pip/go/cargo. Clear install instructions help.
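
As a rough illustration of how the weights above could combine, here is a minimal sketch. It assumes each dimension is first scored on its own 0-100 scale, which this page does not state explicitly. The function name and dictionary keys are hypothetical and are not part of any published API.

```python
# Weights come from the table above; the per-dimension 0-100 scale is an assumption.
QUALITY_WEIGHTS = {
    "schema_correctness": 0.30,
    "token_efficiency": 0.25,
    "description_quality": 0.20,
    "security": 0.15,
    "install_reliability": 0.10,
}


def quality_score(dimension_scores: dict) -> float:
    """Combine per-dimension scores (assumed 0-100 each) into a 0-100 composite."""
    missing = QUALITY_WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total = sum(
        weight * dimension_scores[name] for name, weight in QUALITY_WEIGHTS.items()
    )
    return round(total, 1)


# Hypothetical tool: strong schema, middling token efficiency and install docs.
example_scores = {
    "schema_correctness": 90,
    "token_efficiency": 60,
    "description_quality": 70,
    "security": 80,
    "install_reliability": 50,
}
print(quality_score(example_scores))
```

Because Schema Correctness and Token Efficiency together carry 55% of the weight, improvements there move the composite more than equal gains in the other three dimensions.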

Grade Mapping

Score Range	Grade	Interpretation
92-100	A+	Exceptional quality across all dimensions
82-91	A	Strong quality, minor improvements possible
74-81	B+	Good quality, some dimensions need attention
66-73	B	Solid, with clear improvement areas
55-65	C	Average — functional but needs work
40-54	D	Below average — significant issues
0-39	F	Critical issues — not recommended for agents
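
The grade table can be read as a simple lookup on lower bounds. The sketch below assumes the composite score is rounded to the nearest integer before the lookup, since the ranges are given in whole numbers. The grade distribution also lists a C+ grade that this table does not define, so the sketch leaves it out rather than guessing its range. `grade_for` is a hypothetical name.

```python
# (lower bound, grade) pairs copied from the table above, highest band first.
GRADE_BANDS = [
    (92, "A+"),
    (82, "A"),
    (74, "B+"),
    (66, "B"),
    (55, "C"),
    (40, "D"),
    (0, "F"),
]


def grade_for(score: float) -> str:
    """Map a 0-100 composite score to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    rounded = round(score)  # assumption: scores are rounded before grading
    for lower_bound, grade in GRADE_BANDS:
        if rounded >= lower_bound:
            return grade
    raise AssertionError("unreachable: the lowest band starts at 0")


for sample in (95, 73, 60, 39):
    print(sample, grade_for(sample))
```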

2. Trust Score (Real-World Performance)

Derived from actual agent usage. Each tool starts at 50 (a neutral baseline), and its score rises as agents report successful calls.

Signal	Weight	Source
Success Rate	40%	Reported success/failure from agent calls
Recency	25%	How recently was the tool used? Active tools score higher.
Consistency	20%	Stability of success rate + user ratings
Community	15%	Usage volume + GitHub engagement
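
One way to picture the Trust Score is as a weighted sum that falls back to the baseline when no usage has been reported. This is only a sketch under two assumptions: each signal is normalized to 0-100, and a tool with no reports simply holds at 50. How the baseline blends with early reports is not specified here, and the names below are hypothetical.

```python
from typing import Optional

# Weights come from the table above.
TRUST_WEIGHTS = {
    "success_rate": 0.40,
    "recency": 0.25,
    "consistency": 0.20,
    "community": 0.15,
}
NEUTRAL_BASELINE = 50.0  # starting value stated in this section


def trust_score(signals: Optional[dict]) -> float:
    """Weighted trust composite; returns the baseline when there is no usage data."""
    if not signals:
        return NEUTRAL_BASELINE  # assumption: no reports means no movement
    total = sum(weight * signals[name] for name, weight in TRUST_WEIGHTS.items())
    return round(total, 1)


print(trust_score(None))
print(trust_score(
    {"success_rate": 95, "recency": 80, "consistency": 70, "community": 40}
))
```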

3. Agent Signals (Discovery Metadata)

Signals that help agents decide whether to trust a tool before calling it.

Signal	What it means
Is Official	Maintained by the service provider or platform organization
GitHub Stars	Community endorsement (log-scale scoring)
Activity Status	Active (≤30d), Maintained (≤180d), Stale (≤365d), Abandoned (>365d)
Community Score	Composite: stars (50%) + activity (35%) + official bonus (15%)
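
The activity thresholds and the Community Score weights can be sketched as follows. Only the day thresholds and the 50/35/15 split come from the table. The rest is assumed for illustration: the day count is taken as days since the last recorded activity, the log-scale star curve saturates at a hypothetical 10,000 stars, and the per-status points are invented placeholders.

```python
import math

STAR_CAP = 10_000  # hypothetical saturation point for the log-scale star score
ACTIVITY_POINTS = {  # hypothetical sub-scores per status
    "Active": 100.0,
    "Maintained": 70.0,
    "Stale": 30.0,
    "Abandoned": 0.0,
}


def activity_status(days_since_last_activity: int) -> str:
    """Classify a tool using the day thresholds from the table above."""
    if days_since_last_activity < 0:
        raise ValueError("days_since_last_activity must be non-negative")
    if days_since_last_activity <= 30:
        return "Active"
    if days_since_last_activity <= 180:
        return "Maintained"
    if days_since_last_activity <= 365:
        return "Stale"
    return "Abandoned"


def star_subscore(stars: int) -> float:
    """Log-scale star score in 0-100 (the exact curve is an assumption)."""
    ratio = math.log10(stars + 1) / math.log10(STAR_CAP + 1)
    return 100.0 * min(1.0, ratio)


def community_score(stars: int, days_since_last_activity: int, is_official: bool) -> float:
    """Composite: stars (50%) + activity (35%) + official bonus (15%)."""
    activity = ACTIVITY_POINTS[activity_status(days_since_last_activity)]
    official = 100.0 if is_official else 0.0
    total = 0.50 * star_subscore(stars) + 0.35 * activity + 0.15 * official
    return round(total, 1)


print(activity_status(45))
print(community_score(stars=250, days_since_last_activity=45, is_official=False))
```

A log scale keeps a handful of very popular repositories from dominating: under this curve, going from 10 to 100 stars adds roughly as much as going from 1,000 to 10,000.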

4. Discrepancy Flag

When quality and trust contradict each other, we flag it:

Quality > Trust: Well-designed but unverified in production. Caution: the tool may work on paper but is not battle-tested.
Trust > Quality: Widely used despite design issues. An adoption paradox: the tool works in practice but may carry maintainability risks.
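
An agent consuming both scores might apply the flag along these lines. The size of the gap that counts as a contradiction is not stated here, so the 15-point threshold below is purely a placeholder, as are the function name and the returned labels.

```python
from typing import Optional

DISCREPANCY_THRESHOLD = 15.0  # hypothetical gap, in score points


def discrepancy_flag(
    quality: float, trust: float, threshold: float = DISCREPANCY_THRESHOLD
) -> Optional[str]:
    """Return which score leads when the two differ by at least the threshold."""
    gap = quality - trust
    if gap >= threshold:
        return "quality_over_trust"  # well-designed but unverified in production
    if -gap >= threshold:
        return "trust_over_quality"  # widely used despite design issues
    return None  # scores roughly agree, so no flag is raised


print(discrepancy_flag(quality=85, trust=50))
print(discrepancy_flag(quality=58, trust=90))
print(discrepancy_flag(quality=70, trust=68))
```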

5. Continuous Improvement

Scores are recalculated periodically. Improving your tool's schemas, descriptions, or install documentation directly improves your Quality Score. Real-world agent usage improves your Trust Score. We're actively calibrating the scoring based on ecosystem feedback.

Last updated: 2026-06-08