ESG Ratings and Their Disagreements

ESG Rating Disagreements: Why Raters Disagree Up to 50% of the Time

Pomegra Learn

Why Do ESG Rating Agencies Disagree So Much?

The most counterintuitive finding in ESG data research is that major ESG rating agencies frequently assign substantially different ratings to the same company. Credit ratings from Moody's and S&P correlate at about 0.99, while ESG ratings from MSCI, Sustainalytics, Refinitiv, and others correlate at approximately 0.54 to 0.61 on average, about the correlation between two people's subjective rankings of the same movie. This is not a marginal disagreement about borderline companies; it affects how companies of all sizes and sectors are assessed. Understanding why ESG raters disagree, and what that means for investors, is one of the most important practical questions in ESG investing.

Quick definition: ESG rating divergence is the documented phenomenon that different ESG rating agencies assign materially different scores to the same company, with inter-provider correlations of approximately 0.54–0.61. The divergence arises from three sources: different measurement scopes (what is assessed), different measurement choices (how factors are measured), and different factor weights (how much each factor counts).

Key takeaways

The landmark academic study on ESG rating divergence, "Aggregate Confusion: The Divergence of ESG Ratings" by Berg, Kölbel, and Rigobon (2022, Review of Finance), decomposed divergence into three sources: scope divergence (38%), measurement divergence (56%), and weight divergence (6%).
Measurement divergence — how individual ESG factors are measured — is the largest single source of disagreement, not scope (what is covered) or weights (how much each factor counts).
A "rater effect" has been documented: a rater's overall view of a company colors its scores on individual categories, and some raters also score companies systematically higher or lower than their peers depending on characteristics unrelated to ESG performance (such as company size or geographic region).
ESG rating divergence has practical financial consequences: fund performance can differ significantly depending on which ESG rating system was used to construct the portfolio.
The solution is not to find the "correct" ESG rating but to understand what each rating measures and use multiple raters where feasible, focusing on factor-level data over aggregate scores.

Three Sources of Disagreement

The Berg, Kölbel, and Rigobon decomposition identified three sources of ESG rating divergence:

Scope divergence: Different raters include or exclude different ESG categories in their assessments. One provider may assess a company's lobbying activities and political contributions as governance factors; another may not. One may include biodiversity and water as environmental factors; another may assess only carbon and energy. When raters disagree about which factors to measure, they necessarily produce different outputs — they are, in part, measuring different things.
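Scope divergence can be pictured as a set comparison. The sketch below uses two hypothetical category lists built from the examples in the paragraph above; real provider taxonomies are much larger and are not reproduced here.

```python
# Hypothetical category sets; names are illustrative, not any provider's taxonomy.
provider_a = {"carbon", "energy", "biodiversity", "water", "board_independence"}
provider_b = {"carbon", "energy", "board_independence", "lobbying", "political_contributions"}

common = provider_a & provider_b   # assessed by both providers
only_a = provider_a - provider_b   # in scope for A only
only_b = provider_b - provider_a   # in scope for B only

# Share of all categories that both providers cover (Jaccard overlap).
overlap = len(common) / len(provider_a | provider_b)

print("common:", sorted(common))
print("only A:", sorted(only_a))
print("only B:", sorted(only_b))
print(f"overlap: {overlap:.2f}")
```

Any category in only_a or only_b contributes to one rating and not the other, so the two ratings differ even if the providers agree perfectly on everything they both measure.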

Measurement divergence: Even when raters measure the same ESG factor (say, board independence), they may measure it differently, using different definitions, different data sources, or different assessment methods. Board independence computed as independent directors over total board size produces different scores than the same ratio computed with the chairman excluded. Carbon emissions measured per unit of revenue and per unit of production produce different scores for the same emissions level. Measurement divergence is the dominant source, at 56% of total divergence, indicating that most disagreement comes from how factors are measured, not what is covered.
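The carbon intensity example can be sketched in a few lines. The two companies and all figures below are hypothetical; the point is only that the same emissions total can rank companies in opposite orders depending on the denominator chosen.

```python
from dataclasses import dataclass

@dataclass
class Company:
    name: str
    emissions_t: float    # tonnes CO2e (hypothetical)
    revenue_musd: float   # revenue in USD millions (hypothetical)
    output_units: float   # physical units produced (hypothetical)

companies = [
    Company("PremiumCo", 100_000, 2_000, 50_000),
    Company("VolumeCo", 100_000, 1_000, 80_000),
]

def rank_by(metric):
    """Rank company names from lowest to highest intensity (lower is better)."""
    return [c.name for c in sorted(companies, key=metric)]

# Same emissions, two different intensity definitions.
by_revenue = rank_by(lambda c: c.emissions_t / c.revenue_musd)
by_output = rank_by(lambda c: c.emissions_t / c.output_units)

print("per unit of revenue:   ", by_revenue)
print("per unit of production:", by_output)
```

Here the higher-priced producer looks cleaner per dollar of revenue, while the higher-volume producer looks cleaner per unit made. Neither ranking is wrong; they answer different questions.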

Weight divergence: Different raters assign different importance to the same ESG factors. If Provider A weights governance at 40% and Provider B weights it at 20%, a company with excellent governance and poor environmental performance will score very differently between the two providers. Weight divergence is the smallest contributor (6%) — which is counterintuitive, since weighting is often assumed to be the main source of disagreement. Most of the action is in measurement, not weighting.
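A short worked example of the weighting scenario above. The governance weights (40% and 20%) come from the text; the environmental and social weights and the pillar scores are assumptions added to complete the arithmetic.

```python
# Pillar scores for one hypothetical company (0-100, higher is better):
# excellent governance, poor environmental performance.
pillar_scores = {"environmental": 30.0, "social": 60.0, "governance": 90.0}

# Governance weights follow the text; the E and S weights are assumed.
weights_a = {"environmental": 0.30, "social": 0.30, "governance": 0.40}
weights_b = {"environmental": 0.50, "social": 0.30, "governance": 0.20}

def weighted_score(scores, weights):
    """Aggregate pillar scores into a single rating using the given weights."""
    return sum(weights[pillar] * scores[pillar] for pillar in scores)

score_a = weighted_score(pillar_scores, weights_a)
score_b = weighted_score(pillar_scores, weights_b)

print(f"Provider A: {score_a:.1f}")
print(f"Provider B: {score_b:.1f}")
```

With identical pillar scores, the two aggregate ratings differ by 12 points purely because of weighting. In this sketch the providers agree on every underlying measurement, which is rarely true in practice.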

Figure: Sources of ESG rating divergence
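The three sources can be illustrated with a toy decomposition of the gap between two ratings for one company. This is a simplified sketch, not the estimation procedure used in the Berg, Kölbel, and Rigobon paper: it assumes each rating is a plain weighted sum of category scores, and every category, weight, and score below is invented.

```python
# Toy data: category -> (weight, score). All numbers are invented.
rater_a = {"carbon": (0.5, 80.0), "board": (0.3, 60.0), "water": (0.2, 40.0)}
rater_b = {"carbon": (0.4, 60.0), "board": (0.4, 70.0), "lobbying": (0.2, 30.0)}

def rating(rater):
    """Aggregate rating as a weighted sum of category scores."""
    return sum(w * s for w, s in rater.values())

def decompose(a, b):
    """Split rating(a) - rating(b) into scope, measurement, and weight parts."""
    common = a.keys() & b.keys()
    # Scope: contributions from categories that only one rater covers.
    scope = (sum(w * s for k, (w, s) in a.items() if k not in common)
             - sum(w * s for k, (w, s) in b.items() if k not in common))
    # Measurement: score gaps on shared categories, valued at the average weight.
    measurement = sum((a[k][0] + b[k][0]) / 2 * (a[k][1] - b[k][1]) for k in common)
    # Weight: weight gaps on shared categories, valued at the average score.
    weight = sum((a[k][0] - b[k][0]) * (a[k][1] + b[k][1]) / 2 for k in common)
    return {"scope": scope, "measurement": measurement, "weight": weight}

parts = decompose(rater_a, rater_b)
gap = rating(rater_a) - rating(rater_b)

for source, value in parts.items():
    print(f"{source:12s} {value:+.2f}")
print(f"{'total gap':12s} {gap:+.2f}")
```

The three parts add up exactly to the rating gap because of how the average weight and average score are used. The 38%, 56%, and 6% shares reported in the paper come from a statistical decomposition across many companies and raters, not from a single-company calculation like this one.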

The "Rater Effect"

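Beyond the decomposition of divergence sources, researchers have identified a rater effect. In Berg, Kölbel, and Rigobon's analysis, a rater's overall view of a company influences how it scores that company's individual categories. Related patterns are systematic biases in certain raters' assessments that correlate with company characteristics rather than actual ESG performance: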

Some raters systematically rate larger companies higher (because large companies disclose more, and disclosure-based scoring rewards disclosure quality regardless of performance).
Some raters systematically rate companies in certain geographies higher (reflecting cultural familiarity with reporting conventions).
Some raters show sector biases (systematically favorable or unfavorable to specific industries).

The rater effect means that a company's ESG score partly reflects the rating agency's methodology biases, not just the company's actual ESG performance. This is analogous to question-wording effects in survey research: how the question is asked affects the answer.

Financial Consequences of Divergence

ESG rating divergence has documented financial consequences for investors:

Portfolio return divergence: Two ESG funds with identical sector allocations but different ESG rating systems in their construction methodology can produce meaningfully different returns, because the underlying ESG screening selects different companies. Studies have shown annualized return differences of 1 to 2 percentage points between portfolios built on different ESG rating systems from the same benchmark, which is not trivial for institutional investors.
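The selection mechanism behind this effect can be sketched with a simple screen. The company labels and scores below are hypothetical, and the "top four by score" rule is an assumed stand-in for whatever screening rule a real fund applies.

```python
# Hypothetical normalized scores (higher is better) for the same eight companies.
rater_a = {"A": 90, "B": 85, "C": 70, "D": 65, "E": 60, "F": 40, "G": 35, "H": 20}
rater_b = {"A": 88, "B": 50, "C": 45, "D": 80, "E": 30, "F": 75, "G": 72, "H": 25}

def top_n(scores, n):
    """Select the n highest-scoring companies (a simple best-in-class screen)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])

portfolio_a = top_n(rater_a, 4)
portfolio_b = top_n(rater_b, 4)

shared = portfolio_a & portfolio_b
# Jaccard overlap: shared holdings as a fraction of all holdings in either portfolio.
jaccard = len(shared) / len(portfolio_a | portfolio_b)

print("built on rater A:", sorted(portfolio_a))
print("built on rater B:", sorted(portfolio_b))
print("shared holdings: ", sorted(shared))
print(f"overlap: {jaccard:.2f}")
```

In this toy universe the two screens share only half of their holdings, so any return difference between the excluded and included names flows straight into fund performance.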

Factor exposure differences: Because ESG scores correlate with certain financial factors (size, quality, momentum), different ESG ratings that weight these correlating factors differently will produce portfolios with different factor exposures. An ESG portfolio built on Sustainalytics may have different size and quality factor tilts than one built on MSCI, even holding sector allocation constant.

Uncertainty about "true" ESG quality: Divergence creates genuine uncertainty about which companies actually have better ESG management. A company rated AA by MSCI (leader) and Medium Risk by Sustainalytics may have good ESG practices by one measure and mediocre practices by another. Investors cannot be certain which rating better reflects actual ESG quality.

Real-world examples

Tesla's rating divergence (2022 exemplar): In 2022 Tesla held an overall MSCI ESG rating of A, with strong environmental scores offset by governance concerns, while S&P Global removed it from the S&P 500 ESG Index because of a low S&P DJI ESG score driven by governance controversies and social issues. The same company, in the same year, received treatment that ranged from a solid MSCI rating to index exclusion (S&P 500 ESG). This is not a marginal case.

Amazon's divergence: Amazon typically receives mid-range ratings from most providers — but the specific pillar scores diverge substantially. Strong supply chain integration and logistics efficiency support some environmental metrics; labor practices and worker safety issues depress social scores; governance concerns around founder control affect G scores. Which dimension receives the most weight determines whether Amazon scores in the "leader" or "laggard" half of the distribution.

Agricultural commodities companies: Companies in agricultural commodity sectors (soy, palm oil, beef) receive dramatically different ESG scores across providers depending on whether the provider includes supply chain deforestation risk in its environmental assessment. Providers that do not include this factor rate these companies more favorably; providers that weight deforestation heavily rate them less favorably. Scope divergence in action.

Practical Implications for Investors

Don't rely on a single ESG rating: The academic evidence is clear that no single ESG rating captures comprehensive ESG quality. Using multiple ratings — particularly comparing MSCI's relative performance approach with Sustainalytics' risk exposure approach — provides a fuller picture.

Focus on specific material factors rather than aggregate scores: Aggregate ESG scores embed methodological choices that may or may not align with the investor's objectives. A pension fund concerned specifically about climate transition risk is better served by looking at scope 1 and 2 emissions intensity and SBTi commitment status than by using an aggregate ESG score that mixes climate with governance and social factors.

Use divergence as an information signal: When two credible ESG raters disagree substantially about a company, the disagreement itself is informative. It signals that the company's ESG profile is contested or complex — warranting additional analysis rather than reliance on either rating alone.
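One way to operationalize this idea is to flag companies whose normalized ratings differ by more than a chosen gap. The sketch below assumes both raters' scores are already on a common 0 to 1 scale; the company names, scores, and the 0.3 threshold are all hypothetical.

```python
def flag_contested(scores_a, scores_b, threshold=0.3):
    """Return companies whose two normalized scores differ by more than threshold,
    ordered from largest gap to smallest. The threshold is an assumed cut-off."""
    shared = scores_a.keys() & scores_b.keys()
    gaps = {name: abs(scores_a[name] - scores_b[name]) for name in shared}
    flagged = [name for name, gap in gaps.items() if gap > threshold]
    return sorted(flagged, key=lambda name: -gaps[name])

# Hypothetical normalized scores (0 = worst, 1 = best).
scores_a = {"Alpha": 0.85, "Beta": 0.60, "Gamma": 0.30}
scores_b = {"Alpha": 0.40, "Beta": 0.55, "Gamma": 0.70}

contested = flag_contested(scores_a, scores_b)
print("needs deeper review:", contested)
```

The flagged list is a starting point for analyst attention rather than a verdict. It says the raters disagree, not which one is right.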

Be explicit about methodology in reporting: Institutional investors reporting portfolio ESG quality should specify which ESG rating system they used. Claiming a "high ESG portfolio quality" without specifying the rating system is epistemically questionable given the divergence in what different systems measure.

Common mistakes

Averaging across ESG raters without normalization: Simply averaging two providers' scores is problematic because the scales differ (MSCI uses letter grades from AAA to CCC; Sustainalytics uses a numeric risk score on which lower is better; Refinitiv uses letter grades from A+ to D-). Averaging requires normalization to a common scale, and even then, it conflates different conceptual frameworks.
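A minimal normalization sketch, under stated assumptions: MSCI letter grades are spaced evenly between 0 and 1, and a Sustainalytics-style risk score is inverted and clipped at an assumed ceiling of 50. Neither mapping is published by the providers; both are illustrative choices that an analyst would need to justify.

```python
# MSCI letter grades from worst to best; even spacing is an assumption.
MSCI_GRADES = ["CCC", "B", "BB", "BBB", "A", "AA", "AAA"]

def normalize_msci(grade):
    """Map an MSCI letter grade to 0..1, where 1 is best."""
    return MSCI_GRADES.index(grade) / (len(MSCI_GRADES) - 1)

def normalize_risk_score(score, cap=50.0):
    """Map a lower-is-better risk score to 0..1, where 1 is best.
    `cap` is an assumed ceiling used for clipping, not a provider constant."""
    clipped = min(max(score, 0.0), cap)
    return 1.0 - clipped / cap

# Hypothetical company: AA from MSCI, risk score of 25.
msci_norm = normalize_msci("AA")
risk_norm = normalize_risk_score(25.0)
blended = (msci_norm + risk_norm) / 2

print(f"MSCI normalized: {msci_norm:.2f}")
print(f"Risk normalized: {risk_norm:.2f}")
print(f"Blended:         {blended:.2f}")
```

Even after this step, the blended number mixes a sector-relative rating with an absolute risk measure, which is the conceptual conflation the paragraph above warns about.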

Assuming higher ESG scores are unambiguously better: A company with a high MSCI ESG score and a high Sustainalytics risk score (worse) may have good relative performance in its sector but significant unmanaged ESG risk. Neither score is wrong; they are measuring different dimensions.

Concluding that ESG ratings are useless because they disagree: ESG rating divergence does not mean ESG data is uninformative — it means aggregate scores are too simple to capture the complexity of ESG performance. Factor-level data (specific emissions, specific governance metrics, specific incident records) is more reliable and more actionable than aggregate scores.

FAQ

Is ESG rating divergence getting better or worse?

Evidence from longitudinal studies suggests divergence has persisted through the mid-2020s despite increased corporate disclosure and growing ESG industry investment. The primary driver, measurement divergence, may actually increase as more providers develop sophisticated but differently designed measurement methodologies. Regulatory convergence through CSRD and ISSB may reduce scope divergence by making more consistent data available, while measurement divergence among providers that process the same data differently may persist.

Do ESG ratings converge for the best and worst performers?

Somewhat — studies find that divergence is greatest for mid-range companies and somewhat smaller for the clearest leaders and laggards. Companies with exceptional ESG management across all dimensions tend to receive high scores from most providers; companies with egregious ESG failures (major environmental disasters, significant corruption convictions) tend to receive low scores from most providers. The most ambiguous cases — typical companies with mixed profiles — show the most rating divergence.

Should investors pay more for ESG data if divergence means it's unreliable?

ESG data is not useless because it diverges — it provides information about specific ESG dimensions that is valuable for risk analysis and company comparison, even if aggregate scores are imprecise. The investment case for ESG data is not "it tells you exactly how good a company is on ESG" but "it provides systematic, comparable information about ESG dimensions that would otherwise require bespoke company-level research."

Could a single global ESG standard solve the divergence problem?

A single global ESG disclosure standard (like ISSB S1/S2) would reduce scope divergence by ensuring more consistent data availability. It would not eliminate measurement divergence, which arises from the different analytical choices providers make when processing the same raw data. Some divergence is inherent in any system where multiple analysts process complex multi-dimensional information about thousands of companies.

What is the "aggregate confusion" paper and why is it important?

"Aggregate Confusion: The Divergence of ESG Ratings" by Florian Berg, Julian Kölbel, and Roberto Rigobon was published in the Review of Finance in 2022. It is the definitive academic analysis of ESG rating divergence, quantifying the correlation between major providers, decomposing divergence into scope/measurement/weight components, and documenting the rater effect. It has become the standard reference for anyone discussing ESG rating reliability.

Summary

ESG rating agencies disagree substantially — with inter-provider correlations around 0.54–0.61 — because they make different choices about what to measure (scope), how to measure it (measurement methodology), and how much to weight each factor. The most important finding from academic research is that measurement divergence — how individual factors are quantified — is the largest source of disagreement, not the factor weights that most practitioners assume drive divergence. A documented rater effect adds systematic biases related to company characteristics. The practical response is not to abandon ESG ratings but to use multiple providers, focus on factor-level data over aggregate scores, and treat divergence between credible raters as a signal that a company's ESG profile warrants deeper investigation.
