Question 1

What is ARA Eval?

Accepted Answer

ARA Eval is an open-source Agent Risk Assessment framework that decides when an AI agent can safely act without human approval. It produces a 7-dimension risk fingerprint (A–D classification per dimension) and applies deterministic gating rules on top of the fingerprint to classify a domain as ready now, ready with prerequisites, or human-in-the-loop required.

Question 2

Why 7 dimensions instead of a single risk score?

Accepted Answer

A single risk score collapses distinct failure modes into one number, which hides where danger actually lives. ARA Eval decomposes risk into 7 independent dimensions so operators can see exactly which dimension is blocking autonomy and target remediation there, rather than lowering an aggregate score through unrelated mitigations.

Question 3

Which regulatory frameworks does ARA Eval target?

Accepted Answer

ARA Eval is built for Hong Kong financial services regulation, including the Hong Kong Monetary Authority (HKMA), Securities and Futures Commission (SFC), Privacy Commissioner for Personal Data (PCPD), and the Personal Information Protection Law (PIPL) where applicable.

Question 4

How does ARA Eval combine LLM judgments with deterministic rules?

Accepted Answer

ARA Eval uses LLM-as-judge evaluation to produce probabilistic dimension classifications, then layers deterministic gating on top. The deterministic gates encode non-negotiable regulatory or safety constraints, so LLM uncertainty cannot override hard compliance rules.

Dimension	What it measures	Gate
Decision Reversibility	Can you undo it?	Soft
Failure Blast Radius	How many people/systems/dollars?	Hard
Regulatory Exposure	Does it touch compliance?	Hard
Decision Time Pressure	How long before you must act?	Soft
Data Confidence	Does the agent have enough signal?	Soft
Accountability Chain	Who’s responsible? Can you audit?	Soft
Graceful Degradation	Does it fail safely or cascade?	Soft

ARA Eval

Risk is a fingerprint, not a score

The 7 dimensions

Hard gates: the aviation principle

LLM-as-judge results

Who it's for

What's included