Evaluating ESG Scoring Consistency of Large Language Models Using Retrieval Augmented Generation Methods
DOI:
https://doi.org/10.14738/assrj.1212.19750
Keywords:
Environmental, Social, Governance (ESG), Artificial Intelligence (AI), Financial Institutions, Retrieval-Augmented-Generation (RAG)
Abstract
This study evaluates how large language models (LLMs) determine environmental, social, and governance (ESG) scores when using retrieval-augmented generation (RAG) procedures. Three LLMs (Claude-4, ChatGPT-4o, and Gemini-2.5) were used to determine the environmental (E), social (S), and governance (G) scores of nine publicly traded companies in three size categories (small, medium, and large) based on market capitalization. The scores of four companies (Morgan Stanley, Goldman Sachs, Berkshire Hathaway, and East West Bancorp) were determined using one set of prompts, and the scores of the other five (BlackRock, PNC, Bank of America, American Financial Group, and Green Dot) were determined using a separate set of prompts that applied the same criteria and methods with changes in wording. In both trials, one per prompt set, RAG approaches produced more stable scores that were more consistent with the existing scores assigned by established rating agencies (Morningstar Sustainalytics, S&P Global, and JUST Capital). These findings suggest that LLMs measure ESG performance more consistently when they use RAG methods and are provided with structured data and criteria, illustrating the growing capabilities and prospective viability of LLM-determined ESG ratings.
License
Copyright (c) 2025 Gaargi Bora, Sashrika Gupta, Dr. Sudip Gupta, PhD

This work is licensed under a Creative Commons Attribution 4.0 International License.
