The Performance Debate

ESG Ratings and Investment Returns: The Evidence

Pomegra Learn

Do High ESG Ratings Predict Better Investment Returns?

The empirical relationship between ESG ratings and investment returns is one of the most studied and most confounded questions in sustainable finance. Hundreds of studies have examined whether companies with higher ESG scores (from MSCI, Sustainalytics, Refinitiv, or other providers) generate better stock returns than lower-rated peers. The findings are systematically complicated by three problems: ESG rating divergence (different providers score the same company very differently), factor contamination (ESG scores correlate with quality, growth, and low-volatility factors that have independent return effects), and time-period sensitivity (ESG rating-return correlations vary dramatically across market regimes). A clean, robust, well-controlled study finding a persistent ESG rating premium in returns does not exist — but neither does definitive evidence of an ESG rating return penalty.

The relationship between ESG ratings and investment returns is empirically contested. Results depend on the ESG rating provider used, the time period analyzed, the benchmark and factor controls applied, and the specific return metric. The mixed findings reflect both genuine uncertainty about the ESG-return relationship and systematic methodological challenges that complicate clean identification.

Key Takeaways

ESG rating divergence across providers (MSCI, Sustainalytics, Refinitiv) creates results that vary by which rating is used — studies showing positive ESG-return relationships using one provider may show neutral or negative results using another.
After controlling for quality, low-volatility, and profitability factors, the ESG premium in returns often shrinks significantly — suggesting much of the apparent ESG effect reflects correlated factor exposures.
ESG momentum (improvement in ESG scores over time, not level) shows stronger and more consistent return evidence than ESG level — companies improving ESG ratings show positive abnormal returns, consistent with information update effects.
The green premium hypothesis — that ESG stocks trade at higher multiples, implying lower future returns — suggests that very high ESG ratings may already be overpriced, reversing the positive rating-return relationship for the highest-rated companies.
The most practically useful finding: ESG ratings add value as risk screening tools (identifying high-ESG-risk companies to avoid) rather than as direct alpha sources (overweighting high-ESG companies).

The ESG Rating Divergence Problem

Before examining return evidence, a foundational problem must be addressed: ESG ratings from different providers disagree substantially.

Berg, Koelbel, and Rigobon (2022), "Aggregate Confusion: The Divergence of ESG Ratings": This foundational study found that ESG ratings from six major providers (KLD, Sustainalytics, Moody's ESG, S&P Global, Refinitiv, and MSCI) have pairwise correlations of only 0.38–0.71, far lower than the correlation between credit ratings from the major agencies (about 0.99). The same company may be rated excellent by one provider and below average by another.

Sources of divergence:

Scope disagreement: Providers include different ESG dimensions and metrics
Measurement disagreement: Different methodologies for measuring the same dimension
Weight disagreement: Different importance weights across pillars (E, S, G)

Return evidence implication: A study finding "high ESG score predicts positive returns" using MSCI ratings may find neutral or negative results using Sustainalytics ratings for the same companies over the same period. The empirical result depends on which rater's definition of ESG quality is used — making it impossible to identify a single "ESG premium."
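To make the divergence problem concrete, the following minimal sketch computes pairwise rank correlations between providers. The provider names and all scores are invented for illustration; they are not taken from Berg, Koelbel, and Rigobon or from any real rating dataset.

```python
from itertools import combinations

# Hypothetical ESG scores (0-100) for six companies from three providers.
# Provider names and all numbers are invented for illustration only.
scores = {
    "provider_a": [82, 45, 67, 30, 91, 58],
    "provider_b": [60, 52, 75, 41, 70, 39],
    "provider_c": [77, 38, 49, 55, 88, 62],
}


def ranks(values):
    """Rank values from 1 (lowest) to n (highest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# One correlation per provider pair, keyed by (provider, provider).
pairwise = {
    (a, b): spearman(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

for (a, b), rho in pairwise.items():
    print(f"{a} vs {b}: rank correlation = {rho:.2f}")
```

In this toy data, two provider pairs agree moderately while the third pair barely agrees at all, so a portfolio sorted on one provider's scores would differ materially from one sorted on another's.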

Academic Evidence on ESG Ratings and Returns

Positive evidence:

Khan, Serafeim, Yoon (2016): High ratings on material ESG issues (as defined by SASB) predict positive abnormal returns; immaterial ESG issues do not predict returns. This is the cleanest positive finding — specific to financially material ESG, not broad ESG scores.
Friede, Busch, Bassen (2015) meta-analysis: Roughly 90% of about 2,200 studies find a neutral-to-positive relationship between ESG and corporate financial performance, but this aggregate conceals extreme heterogeneity in methods, periods, and ESG measures.

Neutral evidence:

Halbritter and Dorfleitner (2015): After controlling for standard risk factors, ESG-sorted portfolios show no significant alpha across multiple rating providers.
Lins, Servaes, Tamayo (2017): High social capital (proxied by corporate social responsibility intensity, which is closely related to ESG) predicted positive abnormal returns during the 2008–2009 financial crisis, but not in normal periods.

Critical evidence:

Pedersen, Fitzgibbons, Pomorski (2021): ESG is correlated with quality factors. After accounting for quality factor exposure, much of the ESG premium disappears. ESG adds some information beyond factors but far less than raw ESG-return correlations suggest.
Avramov, Cheng, Lioui, Tarelli (2022): Finds that ESG constraints create long-run performance drag — challenging the neutral finding.

The ESG Momentum Signal

ESG momentum — change in ESG ratings over time — shows stronger return evidence than ESG level:

The mechanism: When a company improves its ESG profile (better disclosure, improved practices, reduced controversy), this often reflects operational improvements and management quality improvements that predict better future performance. The market may not immediately incorporate ESG improvements into prices.

Evidence: Nagy, Kassam, and Lee (2016) — MSCI researchers — found that ESG momentum (improving ESG ratings) predicted positive abnormal returns over 6-month horizons — distinct from ESG level effects.

Practical implication: Rather than overweighting already-high-ESG companies (which may already be priced for quality), overweighting companies showing ESG improvement may capture a more robust return signal — consistent with momentum research in other contexts.

Controversy score as short signal: Companies experiencing ESG controversy score increases (new adverse ESG events) show negative subsequent returns — consistent with controversy events containing information about operational problems not yet fully priced.
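As a rough illustration of the momentum signal described above, the sketch below computes the change in score between two dates and splits companies into improvers and decliners. The tickers, scores, 12-month gap, and the three-group split are all assumptions chosen for the example, not the methodology of Nagy, Kassam, and Lee.

```python
# Hypothetical ESG scores at two dates, 12 months apart (all values invented).
scores_prior = {"AAA": 70, "BBB": 40, "CCC": 55, "DDD": 85, "EEE": 30, "FFF": 62}
scores_now = {"AAA": 68, "BBB": 52, "CCC": 56, "DDD": 86, "EEE": 41, "FFF": 50}


def esg_momentum(prior, current):
    """Change in score for every company present at both dates."""
    return {name: current[name] - prior[name] for name in prior if name in current}


def split_by_momentum(momentum, n_groups=3):
    """Return (improvers, decliners): the top and bottom group by score change."""
    ordered = sorted(momentum, key=momentum.get, reverse=True)
    k = max(1, len(ordered) // n_groups)
    return ordered[:k], ordered[-k:]


momentum = esg_momentum(scores_prior, scores_now)
improvers, decliners = split_by_momentum(momentum)

print("Score changes:", momentum)
print("Improvers (candidate overweights):", improvers)
print("Decliners (candidate underweights):", decliners)
```

Note that DDD, the highest-rated company in the toy data, lands in neither group: the signal ranks the change in score, not its level.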

The Green Premium and Its Implications

Green premium existence: Multiple studies find evidence that high-ESG companies trade at valuation premiums relative to lower-ESG peers:

Higher P/E and P/B ratios at high-ESG companies
Lower cost of capital (lower required returns from investors)
ESG ETF and fund inflows creating demand-driven valuation uplift

Return implication: If high-ESG stocks already trade at a premium, then for the same expected cash flows they offer lower expected future returns than lower-ESG peers. Investors buying at premium multiples accept a lower expected return in exchange for ESG quality.

The Pastor, Stambaugh, Taylor (2021) model: These researchers developed a theoretical model showing that as ESG investor demand grows, ESG stock prices rise, which produces realized ESG outperformance during the transition. Once ESG preferences are fully priced, however, ESG stocks earn lower expected returns going forward, because investors willingly pay a premium for ESG quality. The historical ESG performance advantage reflects the transition to an equilibrium where ESG quality is priced, not persistent alpha.

This model elegantly explains both the apparent historical ESG outperformance AND why it should not be expected to persist.
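The link between a valuation premium and lower expected returns can be illustrated with a stylized constant-growth valuation. This is a simplification assumed here for arithmetic clarity, not the Pastor, Stambaugh, and Taylor model itself, and the dividend, growth rate, and prices are invented numbers.

```latex
% Constant-growth valuation: price P, next-year dividend D,
% dividend growth rate g, expected return r.
\[
P = \frac{D}{r - g}
\quad\Longrightarrow\quad
r = \frac{D}{P} + g
\]

% Two firms with identical cash flows (D = 4, g = 2\%) but different prices.
\[
r_{\text{low-ESG}} = \frac{4}{100} + 0.02 = 6.0\%,
\qquad
r_{\text{high-ESG}} = \frac{4}{133.33} + 0.02 \approx 5.0\%
\]
```

In this example the high-ESG firm trades at a one-third price premium for the same cash flows, and its expected return is about one percentage point lower as a result. The repricing from 100 to 133.33 would show up as a one-off gain for existing holders, which is the transition effect described above.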

Factor Contamination: The Critical Methodological Issue

The most important methodological challenge in ESG-return research:

ESG-Quality correlation: High-ESG companies tend to be high-quality companies — they have higher profitability, lower debt, better management, and more stable earnings. These are exactly the characteristics that define the "quality" factor in multi-factor models. A study finding ESG-return premium without controlling for quality is likely capturing the quality factor premium, not an ESG-specific effect.

ESG-Low Volatility correlation: ESG controversies create volatility. High-ESG companies, with fewer controversies and more stable operations, tend to have lower volatility, so ESG scores correlate with the "low-volatility" factor. Without a low-volatility control, ESG-return correlations overstate the ESG-specific return.

ESG-Growth correlation: ESG strategies frequently underweight energy and overweight technology — sectors with different growth characteristics. Without sector and growth factor controls, ESG-return studies conflate sector rotation with ESG effects.

Correct methodology requires: Fama-French 5-factor model (market, size, value, profitability, investment) + momentum + low-volatility factor controls, plus sector neutralization or sector controls. Studies without these controls should be interpreted with significant caution.
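The effect of an omitted factor control can be sketched with simulated data. In the example below, returns are constructed to depend only on a quality score, while ESG scores are merely correlated with quality. Every parameter is an illustrative assumption, and a single quality control stands in for the full set of factor and sector controls listed above.

```python
import random

random.seed(0)
N = 5000

# Simulated cross-section (all parameters are illustrative assumptions):
# returns depend on quality only; ESG scores are merely correlated with quality.
quality = [random.gauss(0, 1) for _ in range(N)]
esg = [0.7 * q + 0.7 * random.gauss(0, 1) for q in quality]
ret = [0.5 * q + random.gauss(0, 1) for q in quality]


def demean(xs):
    """Subtract the mean, which absorbs the regression intercept."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def ols_one(y, x):
    """Slope from regressing y on x (with intercept)."""
    y, x = demean(y), demean(x)
    return dot(x, y) / dot(x, x)


def ols_two(y, x1, x2):
    """Slopes from regressing y on x1 and x2 (with intercept),
    solved from the 2x2 normal equations."""
    y, x1, x2 = demean(y), demean(x1), demean(x2)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


naive_esg = ols_one(ret, esg)
controlled_esg, quality_beta = ols_two(ret, esg, quality)

print(f"ESG slope without quality control: {naive_esg:.3f}")
print(f"ESG slope with quality control:    {controlled_esg:.3f}")
print(f"Quality slope:                     {quality_beta:.3f}")
```

The uncontrolled regression attributes a clearly positive slope to ESG even though ESG has no effect on returns by construction. Once quality enters the regression, the ESG slope falls to roughly zero and the quality slope picks up the effect.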

Practical Implications for ESG Investors

Given the mixed evidence:

Use ESG ratings as risk filters, not return predictors: The most consistent finding is that very low ESG scores predict elevated downside risk (controversy, fraud, operational failure). Using ESG as a negative screen (avoid bottom-quintile ESG) is more evidentially supported than using it as a positive signal (overweight top-quintile ESG).

Incorporate ESG momentum: The improving-ESG signal (ESG momentum) shows more consistent evidence than ESG level. Overweighting companies showing ESG improvement — as opposed to already-high-ESG companies — may capture a more robust signal.

Avoid overpaying for ESG quality: If the green premium implies lower expected future returns for high-ESG stocks, buying the highest-ESG companies at premium multiples is not supported by expected return evidence. ESG integration should influence portfolio construction, not override valuation discipline.

Use material ESG: Khan-Serafeim-Yoon's finding that material ESG predicts returns while immaterial ESG does not is the strongest practical guidance: focus ESG analysis on issues identified as financially material for the specific industry, not on comprehensive ESG scores.
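The first recommendation above, using ratings as a negative screen, might look like the following minimal sketch. The tickers and scores are invented, and the equal weighting of the remaining names is an assumption made only to keep the example short.

```python
def bottom_quintile_screen(scores):
    """Split tickers into (kept, excluded), excluding the lowest-scoring fifth."""
    ordered = sorted(scores, key=scores.get)  # lowest score first
    n_excluded = len(ordered) // 5
    excluded = set(ordered[:n_excluded])
    kept = [ticker for ticker in scores if ticker not in excluded]
    return kept, sorted(excluded)


# Hypothetical universe of ten companies with ESG scores on a 0-100 scale.
universe = {
    "T01": 72, "T02": 18, "T03": 55, "T04": 64, "T05": 23,
    "T06": 81, "T07": 47, "T08": 90, "T09": 35, "T10": 60,
}

kept, excluded = bottom_quintile_screen(universe)

# Equal-weight the survivors; the screen removes names but does not tilt
# the portfolio toward the highest-rated companies.
weights = {ticker: 1 / len(kept) for ticker in kept}

print("Excluded:", excluded)
print("Weights:", weights)
```

The screen treats a top-rated company and a mid-rated company identically, which matches the idea of using ratings to avoid risk rather than to chase returns.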

Common Mistakes

Citing a single positive ESG-return study as definitive. The evidence is genuinely mixed. Citing favorable studies while ignoring unfavorable ones is cherry-picking that reinforces confirmation bias. Honest ESG return communication requires acknowledging the full range of findings.

Ignoring factor controls. ESG-return findings without quality, low-volatility, and sector factor controls significantly overstate ESG-specific return effects. Factor contamination is the primary methodological challenge in ESG-return research.

Treating ESG rating provider choice as methodological detail. Given the low correlations between ESG providers, the choice of rating provider fundamentally changes results. Studies should specify which provider's ratings were used and test robustness across multiple providers.

Summary

The evidence on ESG ratings and investment returns is genuinely mixed — no robust, well-controlled study shows a persistent ESG level premium in returns. ESG rating divergence (correlations of 0.38–0.71 across providers) means results vary by which rating is used. Factor contamination (ESG correlates with quality and low-volatility) means ESG-return findings without factor controls substantially overstate the ESG-specific effect. ESG momentum (improving ratings) shows more consistent positive return evidence than ESG level. The green premium suggests that very high ESG stocks may already be overpriced — implying lower future returns at premium multiples. The most evidentially supported practical use of ESG ratings is as risk filters (avoid bottom-quintile ESG) rather than return generators (overweight top-quintile ESG), focusing on material ESG issues per SASB guidance and using ESG momentum as a supplementary signal.
