
ESG Ratings: Problems, Divergence, and Limitations

Pomegra Learn

What Are the Substantive Problems with ESG Ratings?

ESG ratings have become the dominant mechanism through which ESG analysis is operationalized: they determine fund eligibility, index composition, portfolio screening, and engagement priorities. Yet ESG ratings have fundamental problems that are widely acknowledged by researchers but insufficiently understood by the investors who rely on them. The most critical problem is that ratings from different providers frequently disagree. When two major providers assign the same company substantially different scores, one rating it a sustainability leader and the other an average performer, they cannot both be accurate measurements of the same underlying quality. Given how routinely this disagreement occurs, the validity of ESG ratings as objective assessments of corporate sustainability performance is genuinely in doubt. This article examines the evidence on ESG ratings divergence, its causes, its implications for investment strategy, and what investors should do about it.

ESG ratings divergence, the finding that major ESG rating providers disagree substantially on the same companies' scores, is the most significant empirical problem in ESG investing. Berg, Koelbel, and Rigobon (2022) documented pairwise correlations between major providers ranging from 0.38 to 0.71, compared with about 0.99 between Moody's and S&P credit ratings, which indicates that ESG providers are measuring fundamentally different things.

Key Takeaways

  • Berg, Koelbel, and Rigobon (2022) documented pairwise ESG rating correlations ranging from 0.38 to 0.71 (average about 0.54), compared with about 0.99 for credit ratings. This is not noise: it indicates ESG providers are measuring different things.
  • Three sources of divergence: scope (what is measured), measurement (how it's measured), and aggregation (how indicators are weighted). Scope and measurement divergence account for most of the correlation gap.
  • "Rater effect" problem: companies' reputations and the industries they're in influence ESG scores beyond what the underlying indicators warrant — large tech companies systematically receive higher scores than their individual indicator performance justifies.
  • ESG ratings are substantially based on disclosure quality rather than actual performance — companies that disclose more receive higher scores even if their absolute performance is average.
  • Practical implication: ESG ratings are opinions, not measurements. They should be used as screening inputs, not as definitive assessments of sustainability quality.

The Berg-Koelbel-Rigobon Finding

The most rigorous academic analysis of ESG ratings divergence was published by Berg, Koelbel, and Rigobon in 2022 in the Review of Finance:

Key finding: Pairwise correlations between six major ESG ratings (KLD, Sustainalytics, Moody's ESG (formerly Vigeo Eiris), S&P Global (formerly RobecoSAM), Refinitiv (formerly Asset4), and MSCI) ranged from 0.38 to 0.71. The average across all pairs was approximately 0.54.

The comparison: Credit ratings from Moody's and S&P have a correlation of approximately 0.99. Financial statement ratios calculated from the same data have correlations near 1.0. The gap suggests that ESG raters disagree about the underlying information itself, not that their scores merely contain random noise.
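To make the comparison concrete, here is a minimal sketch of how pairwise correlations between providers can be computed. The provider names and scores are invented for illustration; they are not data from the study, and the published analysis is considerably more involved.

```python
from itertools import combinations
from math import sqrt

# Hypothetical scores (0-100) for the same five companies from three
# made-up providers. These numbers are illustrative only.
scores = {
    "provider_a": [72, 55, 40, 63, 81],
    "provider_b": [60, 58, 45, 70, 52],
    "provider_c": [68, 50, 38, 75, 77],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# One correlation per pair of providers.
pairwise = {
    (p, q): pearson(scores[p], scores[q])
    for p, q in combinations(scores, 2)
}

for (p, q), r in pairwise.items():
    print(f"{p} vs {q}: {r:.2f}")

average = sum(pairwise.values()) / len(pairwise)
print(f"average pairwise correlation: {average:.2f}")
```

A value near 1.0 for every pair would mean the providers rank companies almost identically, which is the situation the text describes for credit ratings.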

Decomposition: The researchers decomposed the divergence into three components:

  • Scope divergence (38% of total): Different providers measure different sets of indicators
  • Measurement divergence (56%): Same indicators measured differently
  • Aggregation divergence (6%): Different weights applied to indicators

The dominant sources are scope and measurement — meaning providers are not just weighing the same information differently, but fundamentally measuring different things and measuring them differently.

The rater effect: Berg et al. also identified a "rater effect": a provider that scores a company highly in one category tends to score it highly in other categories too, so each provider's overall impression of the company shapes its scores beyond what the individual indicators warrant. This introduces systematic bias. Companies perceived as responsible (for example, large tech or healthcare) may receive score boosts, while companies in controversial industries may receive discounts.


Scope Divergence: Different Providers Measure Different Things

ESG is a broad concept — "environmental, social, and governance" encompasses hundreds of potential indicators. Providers choose different subsets:

Environmental scope examples:

  • Provider A may measure: GHG emissions, energy intensity, water withdrawal, biodiversity impact, waste generation
  • Provider B may measure: GHG emissions, energy mix, climate change policy quality, environmental controversies
  • A company can perform very differently on these different sets of indicators

Social scope examples:

  • Provider A: Labor rights, supply chain standards, community relations, product safety
  • Provider B: Employee satisfaction, turnover rates, diversity metrics, human rights policy quality
  • A company with strong labor relations but weak diversity disclosure looks different to these two providers

Governance scope:

  • Provider A: Board independence, executive pay, shareholder rights, anti-corruption
  • Provider B: Ownership structure, board diversity, audit quality, ethics programs
  • Different emphasis on independence vs. diversity changes company rankings

Industry adjustments: Some providers adjust weights by industry (environmental factors weighted higher for heavy industry; social factors higher for consumer-facing industries). Others use universal weights. This choice alone creates significant divergence.
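A minimal sketch of how the weighting choice alone can reorder two companies. The company names, pillar scores, and both weighting schemes are hypothetical and do not correspond to any provider's methodology.

```python
# Hypothetical pillar scores (0-100) for two made-up companies.
companies = {
    "steel_co": {"industry": "heavy_industry",
                 "pillars": {"E": 40, "S": 70, "G": 80}},
    "retail_co": {"industry": "consumer",
                  "pillars": {"E": 75, "S": 50, "G": 60}},
}

# Scheme 1: the same weights for every company.
universal_weights = {"E": 1 / 3, "S": 1 / 3, "G": 1 / 3}

# Scheme 2: weights depend on the company's industry (assumed values).
industry_weights = {
    "heavy_industry": {"E": 0.6, "S": 0.2, "G": 0.2},
    "consumer": {"E": 0.25, "S": 0.5, "G": 0.25},
}


def aggregate(pillars, weights):
    """Weighted average of pillar scores; weights are assumed to sum to 1."""
    return sum(pillars[k] * weights[k] for k in weights)


universal_scores = {
    name: aggregate(c["pillars"], universal_weights)
    for name, c in companies.items()
}
adjusted_scores = {
    name: aggregate(c["pillars"], industry_weights[c["industry"]])
    for name, c in companies.items()
}

print("universal:", universal_scores)
print("industry-adjusted:", adjusted_scores)
```

With these invented inputs, steel_co ranks ahead under universal weights and behind under industry-adjusted weights, even though no underlying pillar score changed.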


Measurement Divergence: Different Approaches to the Same Indicator

Even where providers measure the same indicator, they measure it differently:

Greenhouse gas emissions example:

  • Some providers use absolute emissions (total tons CO2e)
  • Others use emissions intensity (tons per million revenue)
  • Others use emissions relative to industry peers
  • A company with high total emissions but below-average intensity receives high environmental scores from one provider and low from another
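The effect of the metric choice can be sketched with a few made-up companies. All names and figures below are hypothetical, and the peer-relative measure (intensity divided by the peer average) is just one simple way such a comparison could be defined.

```python
# Hypothetical companies: emissions in tonnes CO2e, revenue in millions.
companies = {
    "big_efficient": {"emissions": 900_000, "revenue_m": 60_000},
    "small_dirty": {"emissions": 100_000, "revenue_m": 2_000},
    "mid_typical": {"emissions": 300_000, "revenue_m": 10_000},
}

# Metric 1: absolute emissions.
absolute = {name: c["emissions"] for name, c in companies.items()}

# Metric 2: emissions intensity (tonnes per million of revenue).
intensity = {
    name: c["emissions"] / c["revenue_m"] for name, c in companies.items()
}

# Metric 3: intensity relative to the peer average (below 1.0 is better
# than the average peer).
peer_average = sum(intensity.values()) / len(intensity)
peer_relative = {name: value / peer_average for name, value in intensity.items()}


def rank_best_first(metric):
    """Order companies from lowest (best) to highest (worst) value."""
    return sorted(metric, key=metric.get)


print("by absolute emissions:", rank_best_first(absolute))
print("by intensity:", rank_best_first(intensity))
print("by peer-relative intensity:", rank_best_first(peer_relative))
```

Here the largest emitter in absolute terms is also the least emissions-intensive, so it lands last on one ranking and first on the other.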

Board independence:

  • Some count formally independent directors as classified in proxy statements
  • Others apply their own independence criteria (discounting directors with prior business ties, long tenure, etc.)
  • The same board composition results in different independence scores

Pay equity:

  • Some measure unadjusted gender pay gap (raw difference in median salaries)
  • Others measure adjusted gap (controlling for role, seniority, location)
  • The same company can have a large unadjusted gap but small adjusted gap
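A small sketch of the two pay gap calculations on invented staff records. The within-role, headcount-weighted average used for the adjusted figure is a simplified stand-in (an assumption of this example) for the regression-style controls that a real analysis would apply.

```python
from statistics import median

# Hypothetical staff records: (gender, role, salary in thousands).
employees = [
    ("M", "engineer", 120), ("M", "engineer", 118),
    ("M", "engineer", 122), ("F", "engineer", 119),
    ("M", "support", 60), ("F", "support", 60),
    ("F", "support", 59), ("F", "support", 61),
]


def median_gap(rows):
    """Male minus female median salary, as a share of the male median.

    Assumes rows contain at least one man and one woman.
    """
    men = [salary for gender, _, salary in rows if gender == "M"]
    women = [salary for gender, _, salary in rows if gender == "F"]
    return (median(men) - median(women)) / median(men)


# Unadjusted: compare medians across the whole workforce.
unadjusted_gap = median_gap(employees)

# Adjusted (simplified): compute the gap within each role, then average
# the role-level gaps weighted by headcount.
adjusted_gap = 0.0
for role in {role for _, role, _ in employees}:
    in_role = [row for row in employees if row[1] == role]
    adjusted_gap += median_gap(in_role) * len(in_role) / len(employees)

print(f"unadjusted gap: {unadjusted_gap:.1%}")
print(f"adjusted gap:   {adjusted_gap:.1%}")
```

In this toy workforce the unadjusted gap is large because most women hold the lower-paid role, while the within-role gap is close to zero.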

Data source:

  • Some providers use company-reported data
  • Others supplement with controversy databases, news feeds, NGO reports
  • A company with no reported controversies but significant unreported issues may score high with a provider that relies on company-reported data and lower with a provider that uses controversy monitoring

The Disclosure Problem: Scoring Disclosure Rather Than Performance

A fundamental methodological problem: when underlying performance data is unavailable, providers often score disclosure quality as a proxy.

The logic: Companies that disclose more are assumed to perform better, because poor performers have an incentive to hide results and good performers have an incentive to disclose them.

The problem: This assumption is often violated:

  • A company with poor environmental performance can score well by disclosing everything clearly (full disclosure of bad performance)
  • A company with strong environmental performance but minimal disclosure practices will score poorly
  • Disclosure quality is a governance indicator, not necessarily a performance indicator
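The mechanics can be sketched with a hypothetical scoring rule in which an undisclosed indicator is scored as zero. That rule is an assumption made for illustration, not a description of any provider's actual treatment of missing data, and the companies and values are invented.

```python
# Indicator scores on a 0-100 scale; None means "not disclosed".
indicators = {
    "full_discloser": [50, 50, 50, 50],       # average performance, all disclosed
    "quiet_performer": [90, None, None, None],  # strong where disclosed
}


def disclosure_rate(values):
    """Share of indicators the company actually disclosed."""
    return sum(v is not None for v in values) / len(values)


def performance_on_disclosed(values):
    """Average score over disclosed indicators only."""
    disclosed = [v for v in values if v is not None]
    return sum(disclosed) / len(disclosed)


def score_missing_as_zero(values):
    """Hypothetical rule: undisclosed indicators contribute zero."""
    return sum(v if v is not None else 0 for v in values) / len(values)


for name, values in indicators.items():
    print(
        name,
        f"disclosure={disclosure_rate(values):.0%}",
        f"performance={performance_on_disclosed(values):.1f}",
        f"score={score_missing_as_zero(values):.1f}",
    )
```

Under this rule the company with average but fully disclosed performance outscores the company that performs well on the one indicator it reports.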

The scale effect: Large companies have dedicated sustainability reporting teams producing comprehensive disclosures. Small and mid-cap companies have fewer resources for disclosure — systematically receiving lower scores not because their performance is worse but because their disclosure is less comprehensive.

Emerging market bias: Companies in markets with less developed sustainability reporting cultures disclose less — receiving systematically lower ESG scores independent of actual performance. This creates a geographic score bias that penalizes emerging market investments.


The Rater Effect: Reputation Contaminates Ratings

Berg et al. found that a provider's overall view of a company influences its scores beyond what the underlying indicators warrant. Reputation is one plausible channel for this effect:

The mechanism: Rating analysts form impressions of companies from their overall reputation, media coverage, and industry positioning. These impressions then influence how individual indicator scores are interpreted and aggregated — a form of confirmation bias at the ratings level.

Tech company example: Large technology companies consistently receive ESG scores above what their individual indicators would suggest. They benefit from associations with innovation, a progressive corporate culture, and a young workforce, even when their data center energy consumption, supply chain labor practices, and founder-dominated governance structures look problematic on individual metrics.

Carbon-intensive industry penalty: Coal, oil, and mining companies receive ESG score penalties beyond what their individual indicator performance warrants — the industry itself triggers a downward rater effect.

Investment implication: The rater effect means that ESG scores cannot be used as simple screens — they contain systematic bias by industry and company reputation that conflates the company's reputation with its actual ESG performance.


Business Model Conflicts: Rater-Rated Interactions

ESG rating providers face potential conflicts of interest in their business models:

Data collection conflict: Some providers sell data services and consulting to the companies they rate. If companies can improve their scores by buying data services, the rating is not independent.

Assessment conflict: Companies know they are rated and can see their ratings. Some providers allow companies to contest specific data points before publication — creating a process that can favor well-resourced companies who actively manage their ratings.

SEC attention: The SEC has examined conflicts of interest in ESG rating methodologies — drawing parallels to credit rating agency conflicts that contributed to the 2008 financial crisis.

Regulatory response: The European Commission proposed a regulation on ESG rating providers in 2023, with ESMA as the supervisor, addressing conflicts of interest, transparency of methodology, and independence requirements. This represents regulatory acknowledgment that current ESG ratings lack the governance standards that credit ratings are required to meet.


What Investors Should Do

ESG ratings are not worthless — they provide systematized sustainability information. But their limitations require specific practices:

Multi-provider cross-validation: For significant portfolio holdings, check scores across multiple providers. When providers disagree substantially, investigate why — the disagreement often reveals a genuine analytical question about the company.
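One way to operationalize this check is sketched below. Because providers use different scales, each provider's scores are first converted to rank positions between 0 and 1, and holdings are flagged when the positions differ by more than a threshold. The provider data, the company names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
# Hypothetical scores; the two providers use different scales.
scores = {
    "provider_a": {"alpha": 80, "beta": 55, "gamma": 30, "delta": 65},     # 0-100
    "provider_b": {"alpha": 7.5, "beta": 6.0, "gamma": 8.0, "delta": 6.5},  # 0-10
}


def rank_positions(provider_scores):
    """Map each company to its rank position in [0, 1] (1 = highest score).

    Ties are not handled specially in this sketch.
    """
    ordered = sorted(provider_scores, key=provider_scores.get)
    n = len(ordered)
    return {company: i / (n - 1) for i, company in enumerate(ordered)}


ranks = {provider: rank_positions(s) for provider, s in scores.items()}
companies = scores["provider_a"].keys()

# Spread = gap between the most and least favorable rank position.
spread = {
    c: max(r[c] for r in ranks.values()) - min(r[c] for r in ranks.values())
    for c in companies
}

THRESHOLD = 0.5  # assumed cut-off for "substantial" disagreement
flagged = sorted(c for c, s in spread.items() if s > THRESHOLD)
print("investigate:", flagged)
```

A flagged holding is a prompt to read both methodologies for that company, not a verdict on which provider is right.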

Understand methodology: Know whether your primary ESG data provider emphasizes disclosure quality or actual performance, uses industry-adjusted or universal weights, and covers emerging markets robustly.

Sector-relative scoring: ESG scores are more useful as within-sector comparison tools than as absolute assessments. A materials company rated 60/100 may be a better ESG performer than a tech company rated 70/100 once each score is compared against its own sector's peers and industry-specific challenges.
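A sector-relative comparison can be sketched as a z-score within each sector. The holdings and peer scores below are hypothetical, chosen to mirror the 60 versus 70 example, and a z-score is only one of several reasonable ways to normalize.

```python
from statistics import mean, pstdev

# Hypothetical holdings: (company, sector, raw score out of 100).
holdings = [
    ("materials_co", "materials", 60),
    ("materials_peer_1", "materials", 45),
    ("materials_peer_2", "materials", 50),
    ("tech_co", "tech", 70),
    ("tech_peer_1", "tech", 80),
    ("tech_peer_2", "tech", 75),
]

# Group raw scores by sector.
by_sector = {}
for _, sector, score in holdings:
    by_sector.setdefault(sector, []).append(score)

# z-score: how many standard deviations a company sits above or below
# its own sector's average.
sector_z = {
    company: (score - mean(by_sector[sector])) / pstdev(by_sector[sector])
    for company, sector, score in holdings
}

for company, z in sector_z.items():
    print(f"{company}: {z:+.2f}")
```

With these inputs the materials company scores above its sector average and the tech company below its own, despite the tech company's higher raw score.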

Controversy cross-checking: Supplement ESG scores with controversy monitoring — incidents and violations that scores may not yet reflect.

Direct data sources: For high-conviction ESG positions, access underlying data (CDP disclosures, company sustainability reports, proxy statements) rather than relying solely on aggregated scores.


Common Mistakes

Using a single ESG rating provider and treating the score as objective. ESG ratings are opinions, not measurements. Treating a single provider's score as definitive is equivalent to consulting one analyst on a company's credit quality and treating that opinion as fact.

Assuming higher ESG scores mean better companies. ESG scores measure a mix of disclosure quality, risk management, and actual performance — in proportions that vary by provider. A higher score may reflect better disclosure of average performance rather than genuinely superior sustainability.

Not tracking methodology changes. ESG rating methodologies change frequently; MSCI, for example, has made significant methodology revisions that caused score changes unrelated to company behavior. Before treating a score movement as a signal about company performance, check whether it stems from a change in the company or a change in the methodology.



Summary

ESG ratings diverge substantially across major providers: Berg, Koelbel, and Rigobon (2022) found pairwise correlations ranging from 0.38 to 0.71, versus about 0.99 for credit ratings. This divergence stems primarily from measurement differences (same indicators measured differently) and scope differences (different indicators measured), not just aggregation weighting. Additional problems: disclosure quality is systematically scored as a proxy for performance (favoring large-cap, developed market companies); a rater effect biases scores by company reputation beyond what indicators warrant; and business model conflicts create potential rating inflation for companies that actively manage their ESG ratings. The practical implication is that ESG ratings are analytical inputs to be critically evaluated, not objective measurements to be accepted at face value. Investors should cross-validate across providers, understand methodology, use sector-relative comparisons, and supplement scores with direct data and controversy monitoring.
