\section*{About the project: Causality (SDG/ESG Evaluation Agent)}
\subsection*{What inspired us} We kept seeing ``green'' investment recommendations that sounded convincing but were difficult to verify. Most real-world workflows still rely on opaque ESG scores, ad-hoc prompt tests, or narrative justifications, which makes it hard to tell whether a sustainability claim is \textit{evidence-supported} or drifting into greenwashing. This gap motivated \textbf{Causality}: an evaluation-layer agent that audits sustainability reasoning instead of generating investment advice.
\subsection*{What we learned} Building Causality taught us how to convert a qualitative question (``Is this SDG-aligned?'') into something \textit{testable} and \textit{repeatable}: \begin{itemize} \item how to decompose narratives into atomic, verifiable claims, \item how to align claims to indicators/disclosures with traceable provenance, \item how rubric design affects reliability and false positives/negatives, \item how to reduce hallucinations by forcing evidence-grounded outputs and structured constraints. \end{itemize}
\subsection*{How we built it} Causality is a modular pipeline that turns free-form recommendations into a structured audit: \begin{enumerate} \item \textbf{Input \& normalization:} ingest a recommendation (AI or human) and standardize entities (company, sector, timeframe). \item \textbf{Claim extraction:} split text into testable sustainability claims (e.g., emissions reduction, labor practices, governance policies). \item \textbf{Indicator mapping:} map each claim to relevant SDG targets and ESG indicators/disclosures; attach citations/evidence snippets. \item \textbf{Rubric scoring:} apply SDG-aligned rubrics to score support strength, reasoning quality, and transparency. \item \textbf{Risk flagging:} detect red flags (missing evidence, vague claims, inconsistent logic, cherry-picked metrics). \item \textbf{Report generation:} output a compact scorecard plus an audit-style explanation and evidence links. \end{enumerate}
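The pipeline stages above can be sketched in Python. This is a minimal illustration, not the project's implementation: the \texttt{Claim} dataclass, the sentence-splitting extractor, and the two risk checks are all hypothetical stand-ins for the real NLP components.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """One atomic, testable sustainability claim (hypothetical schema)."""
    text: str
    indicators: list = field(default_factory=list)  # mapped SDG/ESG indicators
    evidence: list = field(default_factory=list)    # citation/evidence snippets

def extract_claims(recommendation: str) -> list:
    # Placeholder extractor: naive sentence split; the real system
    # uses NLP to isolate atomic, verifiable claims.
    return [Claim(s.strip()) for s in recommendation.split(".") if s.strip()]

def flag_risks(claim: Claim) -> list:
    # Two of the red-flag checks described above, as simple predicates.
    flags = []
    if not claim.evidence:
        flags.append("missing evidence")
    if not claim.indicators:
        flags.append("no indicator mapping")
    return flags
```

A claim that arrives with no attached evidence or indicator mapping is flagged rather than scored, which feeds directly into the risk-flagging stage.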
A simple aggregation used for summarizing rubric dimensions is: \[ S_{\text{overall}}=\sum_{k=1}^{K} w_k\, s_k, \qquad \sum_{k=1}^{K} w_k = 1, \] where $s_k$ is the rubric score for dimension $k$ (e.g., evidence support, SDG alignment, transparency) and $w_k$ is its weight.
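The weighted aggregation is a one-liner in code. The dimension names and weights below are illustrative, not the project's calibrated values:

```python
def overall_score(scores: dict, weights: dict) -> float:
    """Compute S_overall = sum_k w_k * s_k, requiring sum(w_k) == 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[k] * scores[k] for k in scores)

# Hypothetical rubric scores and weights:
s = {"evidence_support": 0.8, "sdg_alignment": 0.6, "transparency": 0.7}
w = {"evidence_support": 0.5, "sdg_alignment": 0.3, "transparency": 0.2}
# overall_score(s, w) == 0.5*0.8 + 0.3*0.6 + 0.2*0.7 == 0.72
```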
\subsection*{Challenges we faced (and how we handled them)}
\begin{itemize}
\item \textbf{Ambiguous claims} (e.g., ``sustainable'', ``responsible''): we enforced claim templates and required measurable indicators or disclosures.
\item \textbf{Evidence gaps / inconsistent disclosures:} we treated ``insufficient evidence'' as a first-class outcome rather than forcing a confident score.
\item \textbf{Hallucination risk in LLM reasoning:} we used structured outputs, strict evidence citations, and validation checks to keep the model grounded.
\item \textbf{Rubric calibration:} we iterated on scoring rules using edge cases (strong marketing language, partial data, mixed SDG impacts) to reduce over-penalizing legitimate claims.
\end{itemize}
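Treating ``insufficient evidence'' as a first-class outcome means the scorer can refuse to commit rather than guess. A minimal sketch, assuming a hypothetical minimum-evidence threshold and support ratio (neither is from the project itself):

```python
from enum import Enum
from typing import Optional

class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"  # first-class outcome

def score_claim(evidence_count: int, support_ratio: Optional[float]) -> Verdict:
    # Below a minimum evidence threshold, refuse to score rather than
    # force a confident verdict (hypothetical threshold of 2 sources).
    MIN_EVIDENCE = 2
    if evidence_count < MIN_EVIDENCE or support_ratio is None:
        return Verdict.INSUFFICIENT_EVIDENCE
    return Verdict.SUPPORTED if support_ratio >= 0.5 else Verdict.CONTRADICTED
```

Making the abstain case an explicit enum value also keeps LLM outputs structured: the model must emit one of three verdicts, which is easy to validate.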
\subsection*{Outcome} Causality turns ``trust me'' sustainability narratives into a repeatable evaluation: a scorecard with evidence links and greenwashing risk flags, designed to plug into compliance and portfolio workflows.
\subsection*{Built with}
\begin{itemize}
\item api
\item interactive-data
\item natural-language-processing
\item python
\item webplatform
\end{itemize}