Methodology · Feb 24, 2026

Cross-Index Intelligence: How Patterns Across Frameworks Reveal What Single Scores Cannot

Abstract

We present a systematic analysis of emergent intelligence patterns that arise when scores from multiple Amplitude frameworks are examined jointly. High Fidelity paired with low Drift reveals compliant but misaligned agents. Elevated Cascade risk alongside depressed Harmony signals fragile concentrated markets. These cross-framework patterns surface systemic insights invisible to any individual measurement instrument.

Background

The history of quantitative measurement across scientific and industrial domains reveals a consistent pattern: individual measurement instruments provide necessary but insufficient insight, and the most valuable analytical intelligence emerges from the joint examination of multiple instruments. In medicine, no single biomarker provides a complete picture of patient health; the diagnostic power comes from analyzing panels of biomarkers in combination [1], where specific patterns across markers reveal conditions that no individual marker can identify. An elevated white blood cell count paired with fever suggests infection, while the same elevated count without fever suggests a different set of conditions. The markers individually are ambiguous; their combination is diagnostic.

Financial analysis follows the same principle. No single financial ratio, whether price-to-earnings, debt-to-equity, or return on assets, provides a reliable assessment of a company's financial health [2]. Analysts use ratio analysis specifically because the patterns across ratios reveal conditions that individual ratios cannot: a high P/E ratio paired with declining revenue growth suggests overvaluation, while the same P/E ratio paired with accelerating revenue growth suggests market confidence in future earnings. The Altman Z-score [3], one of the most successful bankruptcy prediction models, derives its predictive power not from any individual financial ratio but from the specific weighted combination of five ratios that captures the multi-dimensional nature of financial distress.

The Amplitude measurement system comprises ten scoring frameworks, each measuring a distinct aspect of AI impact. Individually, each framework provides a score on a 0-100 scale that quantifies performance on its specific measurement dimension. Fidelity measures alignment preservation. Drift measures objective deviation. Cascade measures systemic risk. Harmony measures competitive dynamics. Torque measures economic efficiency. Meridian measures data quality. Equity measures fairness. Provenance measures supply chain integrity. Oversight measures human control retention. Resilience measures operational robustness. Each of these scores is informative in isolation, but the thesis of this paper is that the most actionable intelligence emerges from examining specific score patterns across multiple frameworks.

The challenge of cross-framework analysis is combinatorial. With ten frameworks, there are 45 pairwise combinations, 120 three-way combinations, and 210 four-way combinations. Not all combinations are equally informative; most pairs of frameworks measure sufficiently independent phenomena that their joint analysis adds little beyond what the individual scores reveal. The contribution of this paper is to identify the specific cross-framework patterns that yield emergent insights, those where the joint interpretation is qualitatively different from and more informative than the individual interpretations, and to provide a systematic methodology for discovering additional patterns as the measurement system matures.
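The combination counts above follow directly from binomial coefficients, as a quick check confirms:

```python
from math import comb

# k-way combinations among the ten frameworks
pairs = comb(10, 2)    # 45 pairwise combinations
triples = comb(10, 3)  # 120 three-way combinations
quads = comb(10, 4)    # 210 four-way combinations
```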

Approach

The cross-framework pattern identification methodology proceeds in three stages: statistical screening, causal analysis, and interpretive validation. In the statistical screening stage, we compute the joint distribution of scores across all 45 framework pairs using data from 1,247 agent systems evaluated over a six-month period. For each pair, we identify score combinations that occur more frequently or less frequently than would be expected under independence. Specifically, we divide each framework's score range into quintiles and construct a 5x5 contingency table [4] for each pair. Cells with observed frequencies that exceed the expected frequency by more than two standard deviations represent statistically significant co-occurrence patterns that warrant further investigation.

The statistical screening identifies 23 significant co-occurrence patterns across the 45 framework pairs. These 23 patterns are then subjected to causal analysis, which examines whether the co-occurrence reflects a genuine causal or structural relationship between the measured phenomena or is merely a statistical artifact of shared confounders. Causal analysis uses a combination of domain expertise and conditional independence testing [5]. For each significant pattern, we identify potential confounders (system size, deployment duration, industry sector) and test whether the co-occurrence persists after conditioning on these confounders. Patterns that survive conditional independence testing are classified as structurally significant.

Of the 23 statistically significant patterns, 14 survive causal analysis and are classified as structurally significant. These 14 patterns are then subjected to interpretive validation, where domain experts assess whether the pattern has a meaningful interpretation that provides actionable intelligence beyond what the individual framework scores convey. The interpretive validation criteria require that the pattern (a) admits a coherent causal or mechanistic explanation, (b) suggests a specific diagnostic conclusion that is not available from either framework individually, and (c) implies a specific corrective action or monitoring priority. Six of the 14 structurally significant patterns meet all three interpretive validation criteria and are documented as the primary cross-framework intelligence patterns.

The methodology also includes a sensitivity analysis that tests the robustness of the identified patterns across subpopulations of the evaluation dataset. Each pattern is re-analyzed separately for agent systems in financial services, healthcare, general enterprise, and consumer applications. Patterns that are significant in the full dataset but fail to replicate in any subpopulation are flagged as potentially driven by compositional effects rather than genuine cross-framework dynamics [6]. All six primary patterns replicate across at least three of the four subpopulations, providing confidence in their generalizability.

Findings

The first and most diagnostically powerful cross-framework pattern pairs Fidelity scores with Drift scores. High Fidelity (above 80) paired with low Drift (below 40) identifies agents that are formally compliant with their specified objectives but are systematically deviating from the human principal's actual intent. This pattern arises when the formal specification of the agent's objective is incomplete or misaligned with the principal's true preferences. The agent faithfully executes its specification (high Fidelity) but the specification itself drifts from the principal's intent over time (low Drift score, indicating high deviation). This pattern is invisible to either framework individually: Fidelity alone would suggest a well-aligned agent, and Drift alone would suggest a misaligned one, but neither would identify the root cause as specification inadequacy rather than agent malfunction.

The second pattern pairs Cascade risk with Harmony scores. Elevated Cascade risk (above 60) alongside depressed Harmony (below 40) signals fragile concentrated markets where systemic risk and market power reinforce each other [7]. In concentrated markets, the failure of a dominant agent simultaneously removes a large fraction of market capacity and triggers cascade effects through the dense dependency network that forms around dominant providers. This pattern has been observed in three production agent ecosystems and in each case preceded a significant service disruption. The cross-framework signal emerged 4-6 weeks before the disruption, providing a potential early warning window that neither Cascade nor Harmony scores alone would have opened.

The third pattern pairs Equity scores with Torque scores. High Equity (above 75) paired with low Torque (below 45) identifies markets that are achieving distributional fairness at the cost of economic efficiency [8]. This pattern typically arises when fairness constraints imposed on agent behavior create allocative inefficiencies: the fair allocation is not the efficient allocation, and the gap between them reduces total welfare. The pattern does not imply that fairness constraints should be relaxed; rather, it identifies situations where the specific implementation of fairness constraints may be unnecessarily costly and where alternative implementations could achieve the same distributional goals with less efficiency loss.

The fourth pattern pairs Oversight scores with delegation chain depth (derived from the Fidelity framework's delegation tracking). Low Oversight (below 50) paired with delegation depths exceeding five hops identifies agent systems where human control has been effectively lost even though the system formally maintains human-in-the-loop architecture [9]. The human oversight mechanisms exist but operate at the top of a delegation chain so deep that the human's ability to influence the executing agent's behavior is negligible. This pattern is particularly concerning because it represents a form of alignment theater: the organization can point to human oversight processes while the actual agent behavior is determined by sub-delegation dynamics that the human overseer cannot observe or control.

The fifth and sixth patterns involve three-way combinations. The fifth pattern, High Fidelity + High Cascade + Low Resilience, identifies agent systems that are well-aligned and deeply embedded in the network topology but operationally fragile, meaning that their failure would be both devastating (high Cascade) and likely (low Resilience). The sixth pattern, Low Drift + Low Harmony + High Torque, identifies efficient but poorly competitive markets where agents are well-aligned with their principals but operate in a market structure that suppresses competition and innovation [10]. These three-way patterns illustrate the combinatorial depth of cross-framework intelligence: each component score is individually unremarkable, but the specific three-way combination reveals a systemic condition that demands attention.

Implications

The existence of emergent cross-framework intelligence patterns has profound implications for the design and deployment of AI impact measurement systems. The most important implication is that measurement frameworks should be designed not only for individual informativeness but also for joint informativeness [11]. A framework that is moderately informative in isolation but produces high-value cross-framework patterns when paired with other frameworks may be more valuable than a framework that is highly informative in isolation but adds no cross-framework intelligence. This design principle suggests that framework selection should consider the information-theoretic properties of framework combinations, not just individual frameworks.

The practical deployment implication is that organizations should implement cross-framework monitoring dashboards that surface the six primary patterns automatically, rather than relying on analysts to manually examine pairwise score combinations. A monitoring system that alerts when Fidelity exceeds 80 while Drift falls below 40, or when Cascade exceeds 60 while Harmony falls below 40, provides early warning of systemic conditions that would otherwise go undetected until they manifest as incidents. The false positive rate for these alerts, estimated from our validation dataset, is below 8%, making them practical for deployment in production monitoring environments.
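The alert rules reduce to simple threshold conjunctions over the framework scores described in the findings. The sketch below encodes the three patterns for which the text states explicit thresholds; the alert labels are illustrative names, not Amplitude terminology.

```python
def cross_framework_alerts(scores, delegation_depth=0):
    """Evaluate the documented threshold-conjunction alert rules for a
    single agent system. `scores` maps framework name to its 0-100
    score; thresholds are those stated in the findings."""
    alerts = []
    # High Fidelity + low Drift: compliant but deviating from intent
    if scores["fidelity"] > 80 and scores["drift"] < 40:
        alerts.append("specification-drift")
    # High Cascade + low Harmony: fragile concentrated market
    if scores["cascade"] > 60 and scores["harmony"] < 40:
        alerts.append("fragile-concentration")
    # Low Oversight + deep delegation: effective loss of human control
    if scores["oversight"] < 50 and delegation_depth > 5:
        alerts.append("oversight-erosion")
    return alerts
```

A production dashboard would evaluate these rules on every score refresh, so the systemic conditions surface as alerts rather than requiring analysts to inspect pairwise combinations by hand.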

The cross-framework patterns also have implications for regulatory design. Current AI regulations tend to impose requirements along single dimensions: fairness requirements, transparency requirements, safety requirements, oversight requirements [12]. The cross-framework patterns demonstrate that single-dimension regulatory requirements can produce unintended consequences that are only visible through multi-dimensional analysis. A fairness requirement that achieves high Equity scores but drives Torque scores below viable levels may be self-defeating if the resulting market inefficiency causes the agent ecosystem to contract, reducing access to AI services for the populations the fairness requirement was intended to protect. Regulators should consider multi-dimensional impact assessments that examine the cross-framework consequences of single-dimension requirements.

The methodology for discovering cross-framework patterns is itself a contribution that extends beyond the specific patterns documented in this paper. As the Amplitude measurement system matures and accumulates more evaluation data, the statistical screening pipeline will identify additional co-occurrence patterns that meet the causal and interpretive validation criteria [13]. The methodology is designed to be applied iteratively, with each new batch of evaluation data producing candidate patterns that are tested against the existing pattern library. This iterative discovery process means that the cross-framework intelligence capability improves over time as the evaluation dataset grows, creating a positive feedback loop between measurement deployment and measurement value.

Finally, the cross-framework patterns provide a foundation for the development of composite meta-indices that aggregate information from multiple frameworks into higher-order indicators [14]. A systemic fragility index that combines Cascade, Harmony, and Resilience scores according to the weights implied by the cross-framework patterns would provide a more informative and actionable measure of ecosystem health than any of the three component scores. The development of such meta-indices is a natural extension of the cross-framework intelligence methodology, translating pattern-level insights into scalar scores that can be monitored, thresholded, and incorporated into governance processes with the same rigor applied to the individual framework scores.
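A systemic fragility index of this kind could take the following shape. The weights below are placeholders for illustration only; the paper derives weights from the cross-framework patterns, which are not reproduced here. The sign convention follows the findings: fragility rises with Cascade risk and falls with Harmony and Resilience.

```python
def systemic_fragility_index(cascade, harmony, resilience,
                             weights=(0.5, 0.25, 0.25)):
    """Illustrative 0-100 composite meta-index. The default weights are
    placeholders, not the pattern-derived weights from the paper."""
    w_c, w_h, w_r = weights
    return w_c * cascade + w_h * (100 - harmony) + w_r * (100 - resilience)
```

Because the index lives on the same 0-100 scale as the component scores, it can be thresholded and monitored with the same machinery used for the individual frameworks.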

References

  1. Rifai, N., & Ridker, P. M. (2001). Proposed Cardiovascular Risk Assessment Algorithm Using High-Sensitivity C-Reactive Protein and Lipid Screening. Clinical Chemistry, 47(1), 28-30.
  2. Penman, S. H. (2013). Financial Statement Analysis and Security Valuation (5th ed.). McGraw-Hill.
  3. Altman, E. I. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. Journal of Finance, 23(4), 589-609.
  4. Agresti, A. (2002). Categorical Data Analysis (2nd ed.). John Wiley & Sons.
  5. Dawid, A. P. (1979). Conditional Independence in Statistical Theory. Journal of the Royal Statistical Society: Series B, 41(1), 1-31.
  6. Simpson, E. H. (1951). The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society: Series B, 13(2), 238-241.
  7. Haldane, A. G., & May, R. M. (2011). Systemic Risk in Banking Ecosystems. Nature, 469(7330), 351-355.
  8. Okun, A. M. (1975). Equality and Efficiency: The Big Tradeoff. Brookings Institution Press.
  9. Shneiderman, B. (2020). Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. International Journal of Human-Computer Interaction, 36(6), 495-504.
  10. Schumpeter, J. A. (1942). Capitalism, Socialism and Democracy. Harper & Brothers.
  11. Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423.
  12. European Parliament and Council. (2024). Regulation (EU) 2024/1689 Laying Down Harmonised Rules on Artificial Intelligence (AI Act). Official Journal of the European Union.
  13. Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
  14. Freudenberg, M. (2003). Composite Indicators of Country Performance: A Critical Assessment. OECD Science, Technology and Industry Working Papers, 2003/16.