Agent Trust · Dec 22, 2025

The Credit Score for Autonomous Systems: Recursive Trust Scoring with Sybil Resistance

Abstract

We present Fidelity, a behavioral trust framework that functions as a credit score for autonomous agents. Fidelity measures behavioral consistency via peer-cohort normalized variance, contract fulfillment via stake-weighted success rates, reputation via multi-party feedback, and anomaly freedom via exponential penalty functions. The defining innovation is recursive trust: reputation feedback is weighted by the submitter's own Fidelity score, creating natural Sybil resistance where low-trust entities' assessments carry minimal weight in the network.

Problem

Trust is the foundational resource of economic exchange. In human-to-human transactions, trust is built through repeated interaction, social networks, institutional affiliation, and legal accountability [1]. In human-to-software transactions, trust is mediated through brands, warranties, regulatory frameworks, and the legal liability of the software vendor. But in agent-to-agent transactions, where autonomous systems negotiate, transact, and collaborate without direct human supervision, none of these traditional trust mechanisms apply. The autonomous agent has no social network in the human sense, no brand reputation that took decades to build, and no personal liability for breach of contract. The absence of these trust anchors does not eliminate the need for trust; it creates a demand for new mechanisms that can establish, measure, and communicate trustworthiness in computational terms.

The FICO credit score provides the closest existing analogy [2]. Before FICO, lending decisions were subjective assessments made by individual loan officers based on personal judgment, local knowledge, and implicit biases. The credit score transformed lending by creating a standardized, numerical representation of creditworthiness that could be computed, communicated, and compared at scale. The score did not replace the underlying reality of borrower behavior; it provided a standardized lens through which that behavior could be evaluated. The result was a dramatic expansion of credit markets, a reduction in discriminatory lending practices, and the creation of a feedback loop in which credit scores themselves influenced borrower behavior.

Autonomous agents need an analogous mechanism. When a purchasing agent selects among competing supplier agents, it needs a standardized assessment of each supplier's behavioral trustworthiness. When an orchestration agent delegates subtasks to specialized agents, it needs confidence that the delegated agents will perform as expected. When a human principal grants autonomy to an agent, the principal needs a quantitative measure of how reliably the agent has behaved in the past [3]. The Fidelity framework provides this measure by computing a behavioral trust score analogous to a credit score, based on observed behavioral patterns rather than self-reported capabilities or organizational reputation.

The challenge of computing behavioral trust for autonomous agents is complicated by the adversarial environment in which agents operate. Unlike human credit scoring, where the population of borrowers is largely non-adversarial, the population of autonomous agents may include deliberately deceptive entities. An agent might behave impeccably during an evaluation period to build a high trust score, then exploit that trust in a subsequent period [4]. An agent might create multiple synthetic identities to generate fake positive feedback [5]. A coalition of agents might collude to boost each other's trust scores through reciprocal positive assessments. Any trust scoring framework that does not account for these adversarial dynamics will be systematically gamed.

Framework Design

Fidelity measures behavioral trust across four dimensions, each capturing a distinct aspect of trustworthy agent behavior. The first dimension, behavioral consistency, measures the variance of an agent's behavior relative to its peer cohort. Rather than defining absolute standards for correct behavior, Fidelity normalizes behavioral variance against the distribution of behaviors observed in agents performing similar functions. An agent whose response times, decision patterns, and output characteristics fall within the expected range for its peer group receives a high consistency score. An agent whose behavior fluctuates unpredictably, even if each individual action falls within acceptable bounds, receives a lower score reflecting the unpredictability that makes it a less reliable interaction partner.
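The consistency dimension can be sketched as a cohort-normalized variance check. The following Python is a minimal illustration under stated assumptions: the 3-sigma cutoff, the linear falloff, and the `consistency_score` name are illustrative choices, not the framework's actual parameters.

```python
import statistics

def consistency_score(agent_samples, cohort_variances):
    """Score (0-100) for how typical an agent's behavioral variance is,
    normalized against the variance distribution of its peer cohort.

    agent_samples:    observations of one behavioral metric (e.g. latencies).
    cohort_variances: the same metric's variance for each peer agent.
    """
    agent_var = statistics.pvariance(agent_samples)
    mu = statistics.mean(cohort_variances)
    sigma = statistics.pstdev(cohort_variances)
    if sigma == 0:
        return 100.0 if agent_var == mu else 0.0
    # Distance from the cohort norm, in cohort standard deviations.
    z = abs(agent_var - mu) / sigma
    # Linear falloff with a 3-sigma cutoff: agents far outside the
    # cohort's variance distribution score zero on this dimension.
    return round(100.0 * max(0.0, 1.0 - z / 3.0), 1)
```

An agent whose variance sits near the cohort norm scores high; one whose behavior fluctuates far outside the cohort distribution scores zero, even if each individual observation is acceptable in isolation.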

The second dimension, contract fulfillment, measures the agent's success rate in completing committed obligations, weighted by the stakes involved. A trading agent that fulfills 99% of small trades but defaults on 20% of large trades should not receive the same fulfillment score as an agent with a uniform 99% completion rate. Stake-weighting ensures that the fulfillment score reflects the economic significance of the agent's reliability pattern, not merely the frequency of success. The stakes are determined by the declared value of each transaction, verified where possible against external records, and weighted using a logarithmic function that prevents extreme outliers from dominating the score.
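A minimal sketch of the stake-weighted success rate, assuming `log1p` as the logarithmic stake transform mentioned above; the function name and transform choice are illustrative, not the framework's published formula.

```python
import math

def fulfillment_score(contracts):
    """Log-weighted contract success rate on a 0-100 scale.

    contracts: list of (stake_value, fulfilled) pairs. Larger stakes count
    more, but logarithmically, so a single extreme outlier cannot dominate.
    """
    total = sum(math.log1p(stake) for stake, _ in contracts)
    if total == 0:
        return 0.0
    won = sum(math.log1p(stake) for stake, ok in contracts if ok)
    return round(100.0 * won / total, 1)
```

Under this weighting, an agent that fulfills many small trades but defaults on one large one scores lower than an agent with the same raw completion frequency on uniform stakes, which is the intended behavior.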

The third dimension, reputation, aggregates multi-party feedback from entities that have interacted with the agent [6]. Each interaction counterparty can submit a structured assessment covering dimensions including timeliness, accuracy, responsiveness, and fairness. These assessments are aggregated into a reputation component that reflects the collective experience of the agent's interaction partners. Crucially, reputation feedback is not weighted equally; it is weighted by the submitter's own Fidelity score, implementing the recursive trust mechanism described in the Validation section.

The fourth dimension, anomaly freedom, measures the absence of behavioral patterns that indicate malfunction, deception, or compromise. Rather than rewarding the presence of positive behaviors, anomaly freedom penalizes the detection of negative patterns using an exponential penalty function [7]. A single minor anomaly, such as an unexplained latency spike, produces a small penalty. Multiple anomalies or severe anomalies, such as responses that contradict the agent's declared decision logic, produce exponentially increasing penalties that rapidly collapse the anomaly freedom score. The exponential function ensures that a pattern of anomalous behavior is treated as substantially more concerning than a single isolated event.
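The exponential penalty can be sketched as follows. The decay constant `k` and the 5x severity multiplier are illustrative assumptions, not the framework's calibrated values.

```python
import math

def anomaly_freedom(anomalies, k=0.05):
    """Anomaly freedom score (0-100) under an exponential penalty.

    anomalies: list of severity values in [0, 1] (minor ~0.1, severe ~1.0).
    The score decays as exp(-k * burden), so repeated or severe anomalies
    collapse the score far faster than a single isolated minor event.
    """
    # A severe anomaly contributes roughly 5x the burden of a minor one.
    burden = sum(1.0 + 4.0 * sev for sev in anomalies)
    return round(100.0 * math.exp(-k * burden), 1)
```

A single minor anomaly leaves the score in the 90s; a run of severe anomalies drives it into single digits, reflecting the exponentially increasing penalties described above.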

The four dimensions are combined into a composite Fidelity score using a geometric mean aggregation [8]. The geometric mean is appropriate for trust measurement because trust is non-compensatory: an agent with perfect consistency but terrible contract fulfillment should not receive a moderate trust score. The geometric mean ensures that weakness in any single dimension disproportionately affects the composite score, reflecting the practical reality that a single dimension of untrustworthiness undermines the value of trustworthiness in all other dimensions.

Scoring

The geometric mean aggregation produces a composite Fidelity score on the Amplitude 0-100 scale that exhibits the non-compensatory property essential for trust measurement. Consider an agent with dimension scores of consistency: 85, fulfillment: 90, reputation: 80, and anomaly freedom: 20. The arithmetic mean would produce a score of 69, suggesting moderate trustworthiness. The geometric mean produces approximately 59, a substantially lower score that better reflects the reality that a serious anomaly problem undermines overall trust regardless of performance on other dimensions [9]. When the anomaly freedom score drops to 5, indicating severe behavioral irregularities, the geometric mean collapses to approximately 42, while the arithmetic mean would still return 65. This collapse behavior is a feature, not a bug: it ensures that Fidelity scores correctly communicate the practical untrustworthiness of agents with severe weaknesses.
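A minimal equal-weight sketch of the aggregation; the optional `weights` parameter anticipates the domain-specific configurations discussed below, and the example weights are hypothetical.

```python
import math

def geometric_mean(scores, weights=None):
    """Weighted geometric mean of dimension scores (each must be > 0).

    Non-compensatory: a collapse in any one dimension drags the composite
    down far more than an arithmetic mean would.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total)

# consistency, fulfillment, reputation, anomaly freedom
dims = [85, 90, 80, 20]
arith = sum(dims) / len(dims)   # ~69: masks the anomaly problem
geo = geometric_mean(dims)      # ~59: the weak dimension drags it down
```

For a trading-agent profile, one might pass something like `weights=[1, 3, 1, 1]` to emphasize contract fulfillment; the profile values here are assumptions for illustration only.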

The weighting of dimensions within the geometric mean reflects domain-specific priorities that can be configured for different agent categories. For financial trading agents, contract fulfillment receives the highest weight because the economic consequences of unfulfilled obligations are direct and measurable. For healthcare advisory agents, behavioral consistency receives elevated weight because inconsistent medical advice creates risks that transcend individual transaction economics. For data processing agents, anomaly freedom receives elevated weight because anomalous behavior may indicate data exfiltration or corruption. The default equal-weight configuration serves as a domain-neutral baseline.

Score trajectories over time reveal behavioral patterns that point-in-time scores cannot capture. An agent whose Fidelity score has been monotonically increasing over six months presents a different risk profile than an agent whose score has oscillated between 60 and 80 during the same period, even if both agents currently score 75. Fidelity therefore records score history and computes trajectory metrics including trend direction, volatility, and mean-reversion characteristics. These trajectory metrics are available as supplementary information alongside the point-in-time score, enabling counterparties to make interaction decisions informed by behavioral trends as well as current status.
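The trajectory metrics might be computed along these lines; the least-squares trend and delta-based volatility definitions are plausible stand-ins, not the framework's published formulas.

```python
import statistics

def trajectory_metrics(history):
    """Supplementary metrics over a chronological list of past scores.

    trend:      least-squares slope (score change per observation).
    volatility: population stdev of score-to-score changes.
    Requires at least two observations.
    """
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(history)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
             / sum((x - x_mean) ** 2 for x in range(n)))
    deltas = [b - a for a, b in zip(history, history[1:])]
    return {"trend": round(slope, 2),
            "volatility": round(statistics.pstdev(deltas), 2)}
```

A steady climber and an oscillator can both sit at 75 today yet show very different trend and volatility values, which is exactly the distinction the point-in-time score cannot make.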

The scoring mechanism incorporates a cold-start protocol for newly deployed agents that have insufficient behavioral history for reliable score computation [10]. During the cold-start period, the agent receives a provisional Fidelity score that is explicitly flagged as provisional and is computed from the limited data available, supplemented by priors derived from the agent's Provenance score and the behavioral distributions of its peer cohort. The provisional score converges to a full score as behavioral data accumulates, with the convergence rate proportional to the volume and diversity of observed interactions.
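One plausible sketch of the cold-start blend, where the convergence constant `n0` is a hypothetical parameter and the prior stands in for whatever is derived from the Provenance score and peer-cohort baselines.

```python
def provisional_score(observed_score, n_interactions, prior, n0=50):
    """Provisional Fidelity score for an agent with limited history.

    Blends a prior (e.g. derived from the agent's Provenance score and
    peer-cohort distributions) with the score computed from observed data,
    converging to the observed score as interactions accumulate. n0 sets
    the interaction count at which observed data and prior weigh equally.
    """
    alpha = n_interactions / (n_interactions + n0)  # 0 at launch, -> 1 with data
    return round(alpha * observed_score + (1 - alpha) * prior, 1)
```

At deployment the prior dominates entirely; after `n0` interactions the blend is 50/50; with a long history the provisional score has effectively converged to the fully data-driven score.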

Validation

The recursive trust mechanism is Fidelity's primary defense against reputation manipulation. In a naive reputation system, feedback from all submitters is weighted equally, which creates a vulnerability: an adversary can deploy multiple low-quality agents that submit positive feedback for a target agent, inflating its reputation score through sheer volume. This is the Sybil attack pattern [5], named for the psychiatric case study of multiple personalities, and it is the most common attack vector against distributed trust systems. Blockchain-based reputation systems have attempted to address this through proof-of-stake mechanisms, but these impose economic barriers that are orthogonal to behavioral trustworthiness.

Fidelity addresses Sybil attacks through a recursive weighting scheme in which each reputation submission is weighted by the submitter's own Fidelity score [11]. A reputation assessment from an agent with a Fidelity score of 90 carries 9 times the weight of an assessment from an agent with a Fidelity score of 10. This creates a natural defense: an adversary deploying multiple Sybil agents to manipulate reputation must first build high Fidelity scores for those Sybil agents, which requires sustained trustworthy behavior across all four dimensions, which defeats the purpose of the attack. The recursive structure means that trust assessments from the most trustworthy agents carry the most weight, and assessments from untrusted agents are naturally marginalized without explicit blacklisting.
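The recursive weighting reduces to a Fidelity-weighted mean over assessments. A minimal sketch, assuming ratings on the same 0-100 scale; the function name and data shape are illustrative.

```python
def reputation(assessments):
    """Fidelity-weighted reputation aggregate.

    assessments: list of (submitter_fidelity, rating) pairs, both 0-100.
    Each rating is weighted by the submitter's own Fidelity score, so a
    submitter at 90 carries 9x the weight of one at 10, and near-zero-
    Fidelity Sybil agents are marginalized without explicit blacklisting.
    """
    total = sum(fidelity for fidelity, _ in assessments)
    if total == 0:
        return 0.0
    return round(sum(f * r for f, r in assessments) / total, 1)
```

With one submitter at Fidelity 90 and one at 10, the high-trust rating dominates 9:1, matching the weighting described above; a swarm of low-Fidelity Sybils submitting perfect ratings moves the aggregate far less than it would under equal weighting.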

We validate the Sybil resistance properties through adversarial simulation in which coalitions of varying sizes attempt to inflate a target agent's reputation. The simulations demonstrate that a coalition of 10 Sybil agents with Fidelity scores of 15 each produces less reputation impact than a single legitimate assessment from an agent with a Fidelity score of 80. A coalition of 50 Sybil agents at the same score level produces approximately the same impact as two legitimate assessments. The cost-benefit analysis shows that maintaining a Sybil coalition large enough to meaningfully affect reputation requires sustained behavioral investment that far exceeds the return from reputation manipulation [12].

The recursive mechanism creates a bootstrapping challenge: how do agents accumulate reputation when there are no trusted assessors to provide weighted feedback? Fidelity addresses this through the cold-start protocol, which weights early-stage reputation assessments by the submitter's Provenance score rather than their Fidelity score. As the ecosystem matures and more agents accumulate Fidelity histories, the weighting transitions from Provenance-based to Fidelity-based, creating a natural progression from identity-based trust to behavior-based trust [13]. This transition is gradual and automatic, requiring no manual intervention or ecosystem governance decisions.
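The Provenance-to-Fidelity transition can be sketched as a maturity-dependent blend; the `ecosystem_maturity` proxy is a hypothetical stand-in for however the framework measures how much behavioral history the ecosystem has accumulated.

```python
def submitter_weight(provenance, fidelity, ecosystem_maturity):
    """Weight applied to a reputation submission during bootstrapping.

    ecosystem_maturity in [0, 1]: e.g. the fraction of agents with enough
    history for their Fidelity scores to be meaningful. Early on, weight
    comes from identity-based Provenance; as the ecosystem matures, it
    transitions smoothly and automatically to behavior-based Fidelity.
    """
    m = min(max(ecosystem_maturity, 0.0), 1.0)
    return round((1 - m) * provenance + m * fidelity, 1)
```

At maturity 0 the submitter's Provenance score alone determines its weight; at maturity 1 the Fidelity score takes over completely, with no governance decision required at any point in between.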

We further validate the framework through analysis of score stability under adversarial conditions. In scenarios where an agent builds a high Fidelity score through 1,000 consistent interactions and then begins behaving anomalously, the exponential penalty function in the anomaly freedom dimension produces rapid score degradation that is proportional to the severity and frequency of anomalies [7]. A single minor anomaly reduces the composite score by 2-5 points; a pattern of severe anomalies collapses the score below 40 within 20 interactions. This responsiveness ensures that historically trustworthy agents cannot indefinitely exploit their accumulated trust to mask behavioral degradation, while the graduated response prevents single-event false positives from destroying legitimately earned trust.

References

  1. Gambetta, D. (2000). Can We Trust Trust? In D. Gambetta (Ed.), Trust: Making and Breaking Cooperative Relations (pp. 213-237). University of Oxford.
  2. Mays, E. (2004). Credit Scoring for Risk Managers: The Handbook for Lenders. Thomson/South-Western.
  3. Ross, S. A. (1973). The economic theory of agency: The principal's problem. American Economic Review, 63(2), 134-139.
  4. Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.
  5. Douceur, J. R. (2002). The Sybil Attack. Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS), 251-260.
  6. Resnick, P., Kuwabara, K., Zeckhauser, R., & Friedman, E. (2000). Reputation systems. Communications of the ACM, 43(12), 45-48.
  7. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), Article 15.
  8. Bullen, P. S. (2003). Handbook of Means and Their Inequalities. Kluwer Academic Publishers.
  9. Hardy, G. H., Littlewood, J. E., & Polya, G. (1952). Inequalities (2nd ed.). Cambridge University Press.
  10. Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference, 253-260.
  11. Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab Technical Report.
  12. Levine, B. N., Shields, C., & Margolin, N. B. (2006). A survey of solutions to the Sybil attack. University of Massachusetts Amherst Technical Report 2006-052.
  13. Abdul-Rahman, A., & Hailes, S. (2000). Supporting trust in virtual communities. Proceedings of the 33rd Hawaii International Conference on System Sciences, 1-9.