How ESG Ratings Work: Inputs, Weights, and Aggregation
How Do ESG Rating Agencies Produce ESG Scores?
An ESG rating is the output of a data collection, weighting, and aggregation process that attempts to summarize a company's performance across dozens of environmental, social, and governance dimensions in a single score. Understanding how these scores are constructed (what data goes in, how that data is transformed and weighted, and how the final number is produced) is essential for anyone using ESG ratings in investment analysis. ESG ratings are not objective measurements; they are modeled opinions based on methodological choices that differ significantly across providers. The same company, with the same underlying ESG performance, can receive very different scores from different raters, not because one rater is wrong, but because each has made different choices about what to measure and how much weight to give each dimension.
Quick definition: ESG ratings are scored assessments of a company's environmental, social, and governance practices and performance, produced by specialized data providers using structured data collection, factor selection, weighting, and aggregation methodologies. They are opinions based on models, not objective facts.
Key takeaways
- ESG ratings start with raw data: primarily corporate disclosures (sustainability reports, regulatory filings, annual reports), supplemented by government databases, NGO research, media monitoring, and direct company surveys.
- Data quality is a fundamental constraint: most ESG metrics rely on self-reported data with no mandatory external verification. Coverage and completeness vary enormously across company size, geography, and sector.
- Rating agencies apply materiality frameworks — most prominently SASB's industry-specific standards — to weight ESG factors differently by industry, reflecting the view that different ESG factors are financially relevant in different sectors.
- Aggregation from hundreds of individual data points to a single score involves multiple layers of judgment: which factors to include, how to weight them, how to handle missing data, and whether to apply controversy overlays.
- The same company can receive materially different ratings from different providers — a divergence documented in multiple academic studies — because each step of the methodology involves choices that different providers make differently.
Step 1: Data Collection
The foundation of any ESG rating is the data collected about the rated company. ESG data sources fall into several categories:
Company disclosures: The primary source for most ESG ratings is what companies report about themselves in sustainability reports, CSR reports, integrated annual reports, regulatory filings (10-K, 20-F, proxy statements), and responses to rating agency questionnaires. Corporate disclosures have grown dramatically since 2015, driven by investor pressure and, in Europe, by regulatory requirements, first under the NFRD and now under its successor, the CSRD. However, self-reported data has inherent limitations: companies choose what to disclose, how to measure it, and how to present it.
Regulatory and government databases: Environmental violation records (EPA enforcement actions, EU environmental agency data), labor violation records (OSHA inspections, NLRB filings), and governance regulatory filings provide third-party verified data points that supplement company self-reporting.
Media and controversy monitoring: Providers including Sustainalytics (a Morningstar company), MSCI, and the specialist firm RepRisk monitor global media, NGO publications, and court filings for ESG controversies, meaning incidents that may not appear in company disclosures but are relevant to ESG assessment. Controversy monitoring adds real-time responsiveness that annual disclosure cycles cannot provide.
NGO and civil society research: Reports from environmental organizations, human rights groups, and labor research bodies provide information about company practices in areas where company self-reporting is most unreliable — supply chain labor conditions, indigenous community impacts, and environmental violations in operations where companies have minimal disclosure obligations.
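Direct company surveys: Some rating agencies send annual questionnaires to companies requesting specific data points not always available in public disclosures; S&P Global's Corporate Sustainability Assessment (CSA) is the most prominent example. Others, such as MSCI, rely mainly on public information and invite issuers to review and correct the data collected about them rather than complete a survey. Company response rates and response quality vary significantly.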
Step 2: Factor Selection and Materiality
Raw data must be organized into ESG factors — the specific issues that are assessed and scored. Factor selection is a major source of differentiation between rating providers.
Breadth of coverage: Different providers cover different numbers of ESG issues. MSCI assesses approximately 35 key issues across ESG pillars; Sustainalytics covers over 70 indicators; S&P's CSA uses over 130 questions across 24 criteria. Broader coverage captures more dimensions of ESG performance but creates greater data collection burden and more opportunities for different data to produce divergent scores.
Materiality frameworks: Most rating providers apply some form of industry-specific materiality weighting — assigning higher importance to ESG factors that are financially relevant for companies in specific industries. A mining company's tailings management and water use matter more than its workforce gender diversity for financial risk assessment; a technology company's data privacy and cybersecurity practices matter more than its water use.
SASB's industry-specific materiality standards are the most widely adopted framework for this purpose. SASB's standards, now maintained by the IFRS Foundation, identify the specific ESG disclosure topics most likely to be financially material for companies in each of 77 industries. Most major ESG rating agencies have incorporated SASB-aligned materiality into their weighting frameworks to varying degrees.
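As a rough illustration of industry-specific materiality weighting, the sketch below scores the same set of factor scores under two different industry weight tables. The factor names, weights, and the function `materiality_weighted_score` are hypothetical and chosen only for illustration; they are not taken from SASB or from any rating provider's methodology.

```python
from typing import Dict

# Hypothetical materiality weights per industry (illustrative values only).
MATERIALITY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "mining": {
        "tailings_management": 0.45,
        "water_use": 0.35,
        "data_privacy": 0.05,
        "workforce_diversity": 0.15,
    },
    "technology": {
        "tailings_management": 0.00,
        "water_use": 0.10,
        "data_privacy": 0.60,
        "workforce_diversity": 0.30,
    },
}


def materiality_weighted_score(industry: str, factor_scores: Dict[str, float]) -> float:
    """Weighted average of factor scores using the industry's weight table."""
    weights = MATERIALITY_WEIGHTS[industry]
    total_weight = sum(weights[factor] for factor in factor_scores)
    if total_weight == 0:
        raise ValueError("no material factors were scored for this industry")
    weighted_sum = sum(weights[factor] * score for factor, score in factor_scores.items())
    return weighted_sum / total_weight


# The same factor scores (0-100, higher is better) under two industry lenses.
scores = {
    "tailings_management": 50.0,
    "water_use": 40.0,
    "data_privacy": 90.0,
    "workforce_diversity": 70.0,
}
mining_score = materiality_weighted_score("mining", scores)
tech_score = materiality_weighted_score("technology", scores)
print(f"mining lens: {mining_score:.1f}, technology lens: {tech_score:.1f}")
```

Identical factor scores produce a much lower result under the mining weights than under the technology weights, which is the effect materiality weighting is designed to have.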
Step 3: Measurement and Scoring
For each ESG factor, rating agencies must convert raw data into a standardized score that can be compared across companies and aggregated. This conversion involves multiple decisions:
Measurement approach: Some ESG factors are quantitative (total CO₂ emissions, female board representation percentage, recordable injury rate). Others are qualitative (quality of environmental management systems, robustness of ethics governance). Qualitative factors require rating agency judgment about what "good" looks like.
Normalization: To compare companies of different sizes, many metrics are normalized — carbon emissions per unit of revenue, injury rates per 100,000 employees, board independence as a percentage rather than absolute number. Normalization choices significantly affect relative company scores.
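A minimal sketch of how the choice of denominator changes relative standing. The two companies and all figures are invented for illustration, and the helper `intensity` is a hypothetical name.

```python
def intensity(value: float, denominator: float) -> float:
    """Normalize a raw metric by a size measure (revenue, headcount, ...)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return value / denominator


# Invented figures: emissions in tonnes CO2e, revenue in USD millions.
companies = {
    "LargeCo": {"emissions": 1_000_000, "revenue": 50_000, "employees": 100_000},
    "SmallCo": {"emissions": 50_000, "revenue": 1_000, "employees": 5_000},
}

per_revenue = {
    name: intensity(c["emissions"], c["revenue"]) for name, c in companies.items()
}
per_employee = {
    name: intensity(c["emissions"], c["employees"]) for name, c in companies.items()
}

for name in companies:
    print(
        f"{name}: {companies[name]['emissions']:,} t absolute, "
        f"{per_revenue[name]:.1f} t per USD million, "
        f"{per_employee[name]:.1f} t per employee"
    )
```

On absolute emissions SmallCo looks far better, per unit of revenue LargeCo looks better, and per employee the two are tied, so the normalization basis alone can reverse a ranking.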
Peer comparison vs. absolute standards: Some rating agencies score companies relative to their industry peers (MSCI's approach emphasizes relative performance within sector). Others score against absolute standards (does the company meet specific criteria). Relative scoring means a company can score highly if it is the best in a poor-performing sector; absolute scoring holds all companies to the same bar regardless of sector.
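The difference between relative and absolute scoring can be sketched as follows. The peer values and the pass threshold are invented, and both functions are simplified stand-ins rather than any provider's actual formula.

```python
from typing import List


def percentile_rank(value: float, peers: List[float]) -> float:
    """Relative score: share of the peer group at or below this value (0-100)."""
    if not peers:
        raise ValueError("peer group is empty")
    return 100.0 * sum(1 for p in peers if p <= value) / len(peers)


def absolute_score(value: float, threshold: float) -> float:
    """Absolute score: full marks only if a fixed criterion is met."""
    return 100.0 if value >= threshold else 0.0


# Raw performance (higher is better) for a weak sector, including the company itself.
weak_sector = [10.0, 15.0, 20.0, 25.0, 30.0]
company_value = 30.0

relative = percentile_rank(company_value, weak_sector)
absolute = absolute_score(company_value, threshold=60.0)
print(f"relative score: {relative:.0f}, absolute score: {absolute:.0f}")
```

The best company in a weak sector tops the relative ranking while failing the absolute bar, which is the trade-off described above.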
Handling missing data: When companies do not disclose a specific data point, rating agencies must decide: score zero (penalizing non-disclosure), impute from sector averages, or leave blank and reweight other factors. These choices systematically affect scores for smaller companies and companies in jurisdictions with lower disclosure norms.
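The three treatments of a missing data point can be compared directly. This is a sketch under stated assumptions: the factor names, weights, sector averages, and the function `aggregate_with_missing` are all hypothetical.

```python
from typing import Dict, Optional


def aggregate_with_missing(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
    strategy: str,
    sector_avg: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of factor scores, treating None as undisclosed."""
    if strategy == "zero":
        # Penalize non-disclosure by scoring the gap as zero.
        filled = {k: 0.0 if v is None else v for k, v in scores.items()}
    elif strategy == "impute":
        # Fill the gap with the sector average for that factor.
        if sector_avg is None:
            raise ValueError("sector_avg is required for imputation")
        filled = {k: sector_avg[k] if v is None else v for k, v in scores.items()}
    elif strategy == "reweight":
        # Drop the gap and spread its weight across the disclosed factors.
        filled = {k: v for k, v in scores.items() if v is not None}
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    total_weight = sum(weights[k] for k in filled)
    if total_weight == 0:
        raise ValueError("no scored factors to aggregate")
    return sum(weights[k] * v for k, v in filled.items()) / total_weight


scores = {"emissions": 80.0, "water": None, "safety": 60.0}
weights = {"emissions": 0.5, "water": 0.25, "safety": 0.25}
sector_avg = {"emissions": 55.0, "water": 50.0, "safety": 65.0}

results = {
    s: aggregate_with_missing(scores, weights, s, sector_avg)
    for s in ("zero", "impute", "reweight")
}
for strategy, value in results.items():
    print(f"{strategy}: {value:.1f}")
```

With one undisclosed factor, the same company scores 55.0, 67.5, or about 73.3 depending only on the missing-data rule.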
[Figure: ESG rating construction process]
Step 4: Pillar and Overall Aggregation
Individual factor scores must be aggregated into pillar scores (E, S, and G separately) and then into an overall ESG score or rating. Aggregation weights are a critical methodological choice:
Fixed vs. variable pillar weights: Some providers use fixed weights (e.g., one third each for E, S, and G across all companies). Others vary pillar weights by industry: energy sector ratings may weight E more heavily, while banking sector ratings may weight G more heavily.
Additive vs. minimum-score aggregation: Additive aggregation (weighted average of pillar scores) allows high performance in one pillar to offset weak performance in another. Minimum-score aggregation (the final score cannot exceed the lowest pillar score) prevents a company from achieving a high overall rating while performing poorly on any single dimension.
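The two aggregation rules, and the effect of industry-specific pillar weights, can be sketched in a few lines. The pillar scores and weights are invented, and the function names are hypothetical.

```python
from typing import Dict


def additive_score(pillars: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of pillar scores; strong pillars can offset weak ones."""
    total_weight = sum(weights[p] for p in pillars)
    if total_weight == 0:
        raise ValueError("pillar weights sum to zero")
    return sum(weights[p] * s for p, s in pillars.items()) / total_weight


def minimum_capped_score(pillars: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average capped at the lowest pillar score."""
    return min(additive_score(pillars, weights), min(pillars.values()))


pillars = {"E": 30.0, "S": 60.0, "G": 90.0}
equal_weights = {"E": 1.0, "S": 1.0, "G": 1.0}
energy_weights = {"E": 0.5, "S": 0.25, "G": 0.25}  # hypothetical E-heavy weighting

additive_equal = additive_score(pillars, equal_weights)
additive_energy = additive_score(pillars, energy_weights)
capped = minimum_capped_score(pillars, equal_weights)
print(f"additive (equal): {additive_equal:.1f}")
print(f"additive (E-heavy): {additive_energy:.1f}")
print(f"minimum-capped: {capped:.1f}")
```

Under equal-weight additive aggregation the strong G pillar lifts the overall score to 60.0; the E-heavy weighting lowers it to 52.5, and the minimum-score rule holds it at the weakest pillar, 30.0.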
Controversy overlays: Many rating agencies apply controversy adjustments that can reduce a company's score (or override it entirely) when serious conduct incidents occur, regardless of its underlying policy and disclosure quality. This creates a distinction between "performance" scores (based on policies and practices) and "controversy" assessments (based on incident records).
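One possible shape for a controversy overlay is a deduction that grows with incident severity, plus a hard cap for the most severe cases. The severity scale (0 to 5), the deduction table, and the cap below are assumptions made for this sketch, not any provider's published rule.

```python
# Hypothetical deductions for severity levels 0 (none) through 4 (high).
CONTROVERSY_DEDUCTION = {0: 0.0, 1: 2.0, 2: 5.0, 3: 10.0, 4: 20.0}
# Hypothetical override: a level 5 (severe) incident caps the score outright.
SEVERE_CAP = 30.0


def apply_controversy_overlay(base_score: float, severity: int) -> float:
    """Adjust a 0-100 performance score for the worst active controversy."""
    if severity not in range(0, 6):
        raise ValueError("severity must be an integer from 0 to 5")
    if severity == 5:
        # Override: policies and disclosure cannot lift the score above the cap.
        return min(base_score, SEVERE_CAP)
    return max(0.0, base_score - CONTROVERSY_DEDUCTION[severity])


for level in (0, 3, 5):
    print(f"severity {level}: {apply_controversy_overlay(80.0, level):.1f}")
```

Keeping the base score and the overlay as separate steps preserves the distinction between the performance score and the controversy assessment.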
Step 5: Rating Output Formats
Different providers present ESG assessments in different formats:
- Letter grades (MSCI: AAA to CCC)
- Numerical scores (Sustainalytics: 0–100 risk scale, where lower is better)
- Percentile rankings within peer groups
- Risk-level categories (negligible, low, medium, high, severe)
The different scales and formats make direct comparison across rating providers impossible without normalization — adding another layer of complexity for investors trying to use multiple raters.
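A crude way to put two of these formats on a common footing is to rescale each to a 0 to 1 range where higher is better. The even spacing of letter grades and the linear inversion of the risk score are simplifying assumptions for this sketch; neither provider defines its scale this way, and the function names are hypothetical.

```python
# MSCI's seven letter grades, ordered from worst to best.
LETTER_GRADES = ["CCC", "B", "BB", "BBB", "A", "AA", "AAA"]


def from_letter_grade(grade: str) -> float:
    """Map a letter grade to 0-1, assuming evenly spaced grades."""
    return LETTER_GRADES.index(grade) / (len(LETTER_GRADES) - 1)


def from_risk_score(risk: float, max_risk: float = 100.0) -> float:
    """Map a lower-is-better risk score to 0-1, where higher is better."""
    clipped = min(max(risk, 0.0), max_risk)
    return 1.0 - clipped / max_risk


# Hypothetical company rated by two providers on different scales.
letter_view = from_letter_grade("BBB")
risk_view = from_risk_score(25.0)
print(f"letter-grade view: {letter_view:.2f}, risk-score view: {risk_view:.2f}")
```

Even after rescaling, the two numbers are not measuring the same thing, so a gap between them reflects methodology as well as scale.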
Real-world examples
Tesla's ESG rating divergence (2022): Tesla was removed from the S&P 500 ESG Index in 2022 due to a low S&P DJI ESG score — driven partly by governance concerns and social controversies — while MSCI gave it a higher rating based on different factor weights. The same company, the same year, rated very differently by two major providers. This case crystallized public awareness of ESG rating methodology differences.
ExxonMobil's governance strength: ExxonMobil, a major oil company, scores relatively high on governance metrics across most ESG rating systems, reflecting strong board structure, transparent disclosure, and regulatory compliance, even while scoring low on environmental metrics. This illustrates how aggregation methodology affects overall scores: under additive aggregation, strong governance partially offsets weak environmental performance, whereas a minimum-score approach would cap the overall rating at the weak environmental pillar.
Small-cap ESG coverage gaps: A typical small-cap company — say, a regional US manufacturer — may receive ESG coverage from only one or two rating agencies, versus a major index constituent that receives ratings from five or more providers. Coverage gaps mean small-cap ESG scores are based on less data, less analyst scrutiny, and more missing-data imputation.
Common mistakes
Treating ESG scores as objective facts: An ESG score is a model output — it reflects the assumptions, data choices, and weighting decisions of the rating agency as much as the actual ESG performance of the company. A company that scores poorly on one rating system may score well on another with no actual change in behavior.
Using a single rater's score for investment decisions: The academic literature on ESG rating divergence, particularly the landmark "Aggregate Confusion" paper by Berg, Kölbel, and Rigobon (2022), shows that pairwise correlations between major ESG raters average about 0.54 and range from 0.38 to 0.71. Using a single score creates a false sense of precision. Comparing across multiple providers, or understanding why they disagree, is better practice.
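For a portfolio covered by several raters, the level of agreement can be checked with pairwise Pearson correlations. The rater names and scores below are invented purely to show the calculation; they are not real ratings and do not reproduce any published result.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented scores for the same five companies from three hypothetical raters,
# already rescaled so that higher is better on every scale.
ratings: Dict[str, List[float]] = {
    "rater_a": [72.0, 55.0, 40.0, 88.0, 61.0],
    "rater_b": [65.0, 70.0, 35.0, 80.0, 45.0],
    "rater_c": [50.0, 60.0, 55.0, 75.0, 30.0],
}

pairwise: Dict[Tuple[str, str], float] = {
    (a, b): pearson(ratings[a], ratings[b]) for a, b in combinations(ratings, 2)
}
average_correlation = sum(pairwise.values()) / len(pairwise)

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: {r:.2f}")
print(f"average: {average_correlation:.2f}")
```

A low average, or one pair that stands out, is a prompt to look at which factors and weights drive the disagreement rather than to pick the score that looks most convenient.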
Ignoring the difference between ESG scores and controversy assessments: A company can have a strong long-term ESG management score while being involved in a current serious controversy. Scores and controversy flags are different outputs serving different analytical purposes.
FAQ
How often are ESG scores updated?
Most ESG rating agencies update scores on an annual cycle, coinciding with the publication of annual sustainability reports and corporate disclosure updates. Controversy overlays — the incident-triggered component — are updated more frequently, sometimes in real time for significant events. MSCI updates its controversy scores continuously; Sustainalytics uses a similar approach. Annual score updates mean scores may lag actual corporate ESG developments by months.
Do companies know their ESG scores in advance?
Most major rating agencies provide companies with their draft scores or assessments before publication, allowing companies to correct factual errors. This "issuer feedback" process is controversial: it creates an opportunity for companies to influence their scores before publication, raising questions about rating independence analogous to concerns in credit rating.
Why do ESG scores often favor large companies?
Large companies have more resources to produce comprehensive sustainability disclosures, dedicated sustainability departments, and staff to respond to rating agency questionnaires. This creates a disclosure-quality advantage that many ESG rating systems reward — leading to higher scores for large companies even when their actual ESG management is no better than smaller, less-resourced peers. The divergence is particularly pronounced between developed-market large-caps and emerging-market small-caps.
Can companies dispute their ESG scores?
Most major providers have formal dispute processes where companies can raise concerns about factual errors in data used for ratings. Disputing methodology choices (why does this factor weight so heavily?) is more difficult — raters defend methodology as proprietary and do not typically change it in response to individual company objections.
Are ESG scores the same as credit ratings?
No — ESG scores and credit ratings are structurally different despite some superficial similarities. Credit ratings are narrow (probability of default/capacity to service debt), highly regulated (by the SEC under the NRSRO framework), and have decades of historical performance data. ESG scores are broad (dozens of environmental, social, and governance dimensions), largely unregulated until recently, and have much shorter performance history. The EU has moved to regulate ESG rating agencies; the US has not yet done so comprehensively.
Related concepts
- Rating Disagreements
- MSCI ESG Ratings
- Sustainalytics Ratings
- ESG Rating Conflicts of Interest
- Materiality Concept
- ESG Glossary
Summary
ESG ratings are produced through a multi-step process: data collection (from disclosures, regulatory databases, media monitoring, and company surveys), factor selection with industry-specific materiality weighting, individual factor scoring with normalization choices, pillar aggregation with explicit weights, controversy overlays, and final score output. Every step involves methodological choices that differ across providers — creating the well-documented ESG rating divergence where different raters assign substantially different scores to the same company. Understanding this construction process is the foundation for using ESG ratings critically: treating them as models with explicit assumptions rather than objective measurements, comparing across multiple providers, and distinguishing between ESG management quality and controversy exposure.