Abstract
We present the Meridian scoring framework for evaluating external data sources consumed by AI agents during inference-time operations. Meridian scores each data source across four dimensions, aggregated with a weighted geometric mean: Scarcity via sigmoid normalization of alternative-source availability, Quality via a composite of accuracy, completeness, and freshness with exponential decay, Decision Impact via the KL divergence between agent decisions made with and without the data source, and Defensibility via compliance coverage assessment. The non-compensatory aggregation ensures that strength in one dimension cannot offset critical weakness in another.
Challenge
Traditional data quality frameworks were designed for a world in which humans consume data through reports, dashboards, and analytical workflows [1]. In that world, data quality is primarily a matter of accuracy, completeness, timeliness, and consistency, evaluated against the needs of human analysts who can exercise judgment about data limitations. The agentic era introduces a fundamentally different consumption pattern: autonomous AI agents consume data during inference-time operations, often without human review, and use that data to make or recommend decisions in real time.
This shift in consumption pattern invalidates several assumptions that underpin traditional data quality metrics [2]. First, traditional metrics treat quality as an intrinsic property of the data, independent of how it is used. In the agentic context, quality is inseparable from decision impact: a data source with perfect accuracy but zero influence on agent behavior has zero functional quality, while a noisy source that drives critical decisions has quality implications far exceeding what its accuracy score suggests. Second, traditional metrics do not account for substitutability: the quality significance of a data source depends on whether alternative sources carrying the same information exist.
Third, traditional frameworks ignore the regulatory and legal dimensions that are increasingly relevant when data flows through AI decision-making systems [3]. A data source may be accurate, complete, and timely but lack the licensing terms, provenance documentation, or consent basis required for use in automated decision-making under applicable regulations. This defensibility dimension is absent from traditional data quality frameworks because human-mediated data consumption typically operates within established institutional processes that handle compliance separately from quality assessment.
The Meridian framework addresses these challenges by defining data quality in the agentic era as a four-dimensional construct: Scarcity (how unique is this data source?), Quality (how accurate, complete, and fresh is the data?), Decision Impact (how much does this source affect agent behavior?), and Defensibility (can this data source be used in automated decision-making under applicable regulations?). Each dimension captures an aspect of data value that is essential in the agentic context and that traditional frameworks either miss entirely or treat as a separate concern.
Architecture
The Scarcity dimension measures the uniqueness of a data source relative to available alternatives. For a data source d, let A(d) denote the number of alternative sources that provide substantially equivalent information, as determined by content similarity analysis. The scarcity score is computed using a sigmoid normalization [4]: S(d) = 1 / (1 + e^(k * (A(d) - m))), where k > 0 controls the steepness of the transition and m is the inflection point. This formulation produces a score near 1.0 for data sources with very few alternatives and a score near 0.0 for data sources with many alternatives, with a smooth transition in between. The sigmoid function is preferred over linear normalization because it better captures the nonlinear relationship between alternative availability and marginal scarcity value: the first few alternatives erode uniqueness sharply, while additional alternatives beyond the inflection point matter progressively less.
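The sigmoid normalization above can be sketched directly; the default values of k and m below are illustrative placeholders, not calibrated parameters:

```python
import math

def scarcity_score(num_alternatives: float, k: float = 0.5, m: float = 5.0) -> float:
    """Sigmoid-normalized scarcity S(d) = 1 / (1 + e^(k * (A(d) - m))).
    Near 1.0 when few alternatives exist, near 0.0 when many do;
    k (steepness) and m (inflection point) here are illustrative defaults."""
    return 1.0 / (1.0 + math.exp(k * (num_alternatives - m)))
```

Note that the score is exactly 0.5 at A(d) = m, which is what makes m the natural calibration target for the median alternative count.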
The Quality dimension is a composite of three sub-dimensions: accuracy, completeness, and freshness [1]. Accuracy measures the degree to which the data reflects the true state of the phenomenon it represents, assessed through cross-validation against independent sources or ground-truth datasets. Completeness measures the proportion of expected data elements that are present and non-null, assessed against a schema-defined expectation. Freshness measures the temporal currency of the data relative to the phenomenon it describes, incorporating an exponential decay function [5]: F(t) = e^(-lambda * t), where t is the time since the last update and lambda is a domain-specific decay constant that reflects how quickly the information loses relevance. The composite quality score is the geometric mean of accuracy, completeness, and freshness [6], ensuring that severe deficiency in any sub-dimension is not masked by strength in others.
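The composite quality score can be sketched as follows, taking the three sub-dimension scores (each assumed to lie in [0, 1]) as inputs:

```python
def quality_score(accuracy: float, completeness: float, freshness: float) -> float:
    """Composite quality as the geometric mean of the three sub-dimension
    scores: a severe deficiency in any one drags the composite toward
    zero rather than being averaged away."""
    return (accuracy * completeness * freshness) ** (1.0 / 3.0)
```

For example, a source scoring 0.9 on accuracy and completeness but 0.1 on freshness composes to about 0.43, well below the arithmetic mean of 0.63.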
The Decision Impact dimension measures the degree to which a data source affects the agent's output distribution during inference. This dimension is computed using the counterfactual KL-divergence protocol described in our companion paper: I(d) = C(d) * D_KL(P(y | S) || P(y | S \ {d})), where C(d) is the binary criticality gate and D_KL is the Kullback-Leibler divergence [7] between the agent's output distributions with and without access to source d. Decision Impact is the dimension that most sharply distinguishes Meridian from traditional data quality frameworks, because it grounds quality assessment in the actual behavioral consequences of data consumption.
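A minimal sketch of the gated counterfactual computation, assuming the two output distributions are given as probability vectors over a shared discrete output space:

```python
import math

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """D_KL(P || Q) over a shared discrete output space; epsilon smoothing
    keeps the divergence finite when q assigns zero probability."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def decision_impact(p_with: list[float], p_without: list[float],
                    critical: bool) -> float:
    # C(d) is the binary criticality gate: non-critical sources score zero
    # regardless of how much they shift the output distribution.
    return float(critical) * kl_divergence(p_with, p_without)
```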
The Defensibility dimension assesses the legal and regulatory basis for using a data source in automated decision-making [3]. This dimension evaluates four compliance vectors: licensing terms (does the license permit use in AI inference?), provenance documentation (is the data lineage traceable to its origin?), consent basis (is there a valid legal basis for processing under applicable data protection regulations?), and regulatory alignment (does the data source meet the requirements of sector-specific regulations such as the EU AI Act [8], financial services rules, or healthcare data regulations?). The defensibility score D(d) is the minimum of the four compliance vector scores, reflecting the principle that compliance is determined by the weakest link: a data source that satisfies three of four compliance requirements but fails the fourth is not defensible.
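The weakest-link rule reduces to a minimum over the four vector scores:

```python
def defensibility_score(licensing: float, provenance: float,
                        consent: float, regulatory: float) -> float:
    """Weakest-link aggregation over the four compliance vectors:
    failing any one makes the source indefensible regardless of the rest."""
    return min(licensing, provenance, consent, regulatory)
```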
Implementation
The four dimension scores are aggregated into a single Meridian score using a weighted geometric mean [6]: M(d) = (S(d)^w_s * Q(d)^w_q * I(d)^w_i * D(d)^w_d)^(1 / (w_s + w_q + w_i + w_d)), where w_s, w_q, w_i, and w_d are the dimension weights. The geometric mean is selected over the arithmetic mean for a fundamental mathematical property: it is non-compensatory, meaning that a zero score in any dimension produces a zero composite score regardless of the scores in other dimensions [9]. This property is essential for data quality assessment because certain failures should be disqualifying: a data source with zero defensibility should receive a zero Meridian score regardless of its scarcity, quality, or decision impact.
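The aggregation can be sketched as follows; equal default weights are an illustrative choice, not one the framework prescribes:

```python
def meridian_score(s: float, q: float, i: float, d: float,
                   w_s: float = 1.0, w_q: float = 1.0,
                   w_i: float = 1.0, w_d: float = 1.0) -> float:
    """Weighted geometric mean of the four dimension scores.
    Non-compensatory: a zero in any dimension zeroes the composite."""
    total = w_s + w_q + w_i + w_d
    return (s ** w_s * q ** w_q * i ** w_i * d ** w_d) ** (1.0 / total)
```

The non-compensatory property is visible directly: meridian_score(0.9, 0.9, 0.9, 0.0) is 0.0, whereas an arithmetic mean would report 0.675 and mask the disqualifying failure.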
The sigmoid normalization for the Scarcity dimension requires calibration of two parameters: the steepness k and the inflection point m [4]. We calibrate these parameters empirically using a reference dataset of data sources with known alternative counts, setting m to the median number of alternatives across the reference population and k to produce a score differential of 0.8 between sources at the 10th and 90th percentiles of alternative availability. This calibration procedure ensures that the sigmoid function produces meaningful discrimination across the observed range of scarcity values while remaining robust to outliers.
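One way to realize this calibration is a bisection search for k, assuming the 10th-percentile count lies below the median m and the 90th-percentile count above it, so the score differential grows monotonically with k:

```python
import math

def sigmoid_scarcity(a: float, k: float, m: float) -> float:
    return 1.0 / (1.0 + math.exp(k * (a - m)))

def calibrate_k(a_p10: float, a_p90: float, m: float,
                target_diff: float = 0.8) -> float:
    """Bisect for the steepness k that yields the target score differential
    between sources at the 10th and 90th percentiles of alternative
    availability. Assumes a_p10 < m < a_p90."""
    lo, hi = 1e-6, 100.0
    for _ in range(100):
        k = (lo + hi) / 2.0
        diff = sigmoid_scarcity(a_p10, k, m) - sigmoid_scarcity(a_p90, k, m)
        if diff < target_diff:
            lo = k   # transition too shallow: steepen
        else:
            hi = k   # transition too steep: flatten
    return (lo + hi) / 2.0
```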
The exponential decay function for freshness requires calibration of the decay constant lambda for each data domain [5]. We define lambda in terms of the half-life h of the data: lambda = ln(2) / h, where h is the time period after which the data retains half its freshness value. Half-life is set based on domain-specific analysis of information velocity. Real-time market data may have a half-life of minutes, company financial data may have a half-life of weeks, and demographic data may have a half-life of months. This parameterization makes the freshness model transparent and interpretable while accommodating the wide variation in temporal sensitivity across data domains.
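The half-life parameterization can be sketched as follows; the per-domain half-lives are assumed values chosen to match the orders of magnitude described above:

```python
import math

# Illustrative half-lives in days (assumed values for the example).
HALF_LIVES = {
    "market_data": 5.0 / (24 * 60),  # ~5 minutes
    "financials": 14.0,              # ~2 weeks
    "demographics": 90.0,            # ~3 months
}

def decay_constant(half_life: float) -> float:
    # lambda = ln(2) / h, so freshness is exactly 0.5 after one half-life.
    return math.log(2) / half_life

def freshness(t_days: float, domain: str) -> float:
    """F(t) = e^(-lambda * t), with lambda derived from the domain half-life."""
    return math.exp(-decay_constant(HALF_LIVES[domain]) * t_days)
```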
The implementation architecture is designed for three operational modes. Transaction-grade scoring operates at under 100 milliseconds, providing Meridian scores during agent inference-time tool calls. In this mode, Decision Impact is computed using the real-time KL-divergence protocol [7], while Scarcity, Quality, and Defensibility are retrieved from cached assessments that are updated asynchronously. Monitoring-grade scoring operates at seconds to minutes, providing comprehensive Meridian assessments that include fresh computation of all four dimensions. Assessment-grade scoring operates at hours to days, providing deep evaluations that include manual compliance review, extended cross-validation against multiple ground-truth sources, and sensitivity analysis of the scoring parameters.
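The transaction-grade path can be sketched as below. The cache contents, the equal-weight aggregation, and the mapping of the unbounded raw KL value onto (0, 1) are all assumptions made for the sketch, not choices the framework prescribes:

```python
import math

# Hypothetical cache of asynchronously refreshed dimension scores.
DIM_CACHE = {"src-42": {"scarcity": 0.8, "quality": 0.9, "defensibility": 1.0}}

def kl(p, q, eps=1e-12):
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def transaction_grade_score(source_id, p_with, p_without, critical):
    """Latency-sensitive path: only Decision Impact is computed live from
    the agent's output distributions; the other three dimensions are read
    from cached assessments updated asynchronously."""
    cached = DIM_CACHE[source_id]
    impact = float(critical) * kl(p_with, p_without)
    impact_norm = 1.0 - math.exp(-impact)  # squash raw KL into [0, 1)
    dims = [cached["scarcity"], cached["quality"], impact_norm,
            cached["defensibility"]]
    return math.prod(dims) ** 0.25  # equal-weight geometric mean
```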
We validate the Meridian framework using a controlled evaluation across 200 data sources consumed by financial analysis agents. The validation protocol compares Meridian scores against expert assessments of data source value, measured by the degree to which expert analysts agree with the relative ranking produced by the Meridian scores [10]. The rank correlation between Meridian scores and expert rankings is 0.87, indicating strong agreement. Ablation analysis confirms the importance of each dimension: removing any single dimension from the composite score reduces the rank correlation to below 0.75, demonstrating that all four dimensions contribute essential information to the overall assessment.
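The rank agreement in the validation can be measured with Kendall's tau [10]; a simple O(n^2) version over two equal-length score lists:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation between two equal-length score lists.
    Pairs tied in either list count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs
```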
Applications
The Meridian framework enables several applications that are not possible with traditional data quality metrics. The most immediate application is quality-aware agent infrastructure: AI agents that can evaluate the quality of their own data inputs in real time and adjust their behavior accordingly [11]. An agent with access to Meridian scores can implement quality thresholds that prevent it from acting on low-quality data, can disclose the quality profile of the data underlying its recommendations, and can flag situations where critical data sources have degraded below acceptable levels.
A second application is data marketplace governance. As data exchanges and marketplace platforms emerge to serve the growing demand for AI-consumable data [12], Meridian scores provide a standardized quality signal that enables efficient market operation. Buyers can filter and rank data sources by their Meridian profiles, sellers can differentiate their offerings based on verified quality characteristics, and marketplace operators can establish minimum quality thresholds for listed sources. The four-dimensional structure of Meridian is particularly valuable in this context because it enables buyers to weight dimensions according to their specific needs: a buyer in a highly regulated industry may prioritize Defensibility, while a buyer seeking competitive advantage may prioritize Scarcity.
A third application is supply chain risk management for data-dependent AI operations. Organizations that rely on AI agents for critical business processes can use Meridian scores to monitor the quality of the data sources feeding those agents over time. Degradation in any dimension triggers alerts that enable proactive intervention before quality issues affect business outcomes. The temporal tracking of Meridian scores across the four dimensions also enables trend analysis that can identify systemic quality issues, such as a gradual decline in freshness indicating that a data provider has reduced its update frequency.
Finally, Meridian scores provide the quantitative foundation for regulatory compliance in the agentic era. Regulations such as the EU AI Act [8] require organizations to ensure the quality of data used in AI systems, but they do not specify how quality should be measured. Meridian provides a defensible, transparent, and reproducible methodology that organizations can use to demonstrate compliance with data quality requirements. The inclusion of the Defensibility dimension within the scoring framework itself ensures that compliance is not treated as a separate concern but is integrated into the core quality assessment.
References
- Wang, R. Y., & Strong, D. M. (1996). Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 12(4), 5-33.
- Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data Quality Assessment. Communications of the ACM, 45(4), 211-218.
- European Parliament and Council. (2024). Regulation (EU) 2024/1689 Laying Down Harmonised Rules on Artificial Intelligence (AI Act). Official Journal of the European Union.
- Han, J., & Moraga, C. (1995). The Influence of the Sigmoid Function Parameters on the Speed of Backpropagation Learning. In J. Mira & F. Sandoval (Eds.), From Natural to Artificial Neural Computation, Lecture Notes in Computer Science, 930, 195-201. Springer.
- Barlow, R. E., & Proschan, F. (1975). Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston.
- Hardy, G. H., Littlewood, J. E., & Pólya, G. (1952). Inequalities (2nd ed.). Cambridge University Press.
- Kullback, S., & Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.
- European Commission. (2021). Proposal for a Regulation Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). COM(2021) 206 final.
- Fleming, P. J., & Wallace, J. J. (1986). How Not to Lie with Statistics: The Correct Way to Summarize Benchmark Results. Communications of the ACM, 29(3), 218-221.
- Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81-93.
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
- Agarwal, A., Dahleh, M., & Sarkar, T. (2019). A Marketplace for Data: An Algorithmic Solution. Proceedings of the 2019 ACM Conference on Economics and Computation, 701-726.