LLM Safety Testing & AI Governance Compliance

Structured, evidence-based safety testing of LLM-based applications against 11 AI governance principles, mapped to AI Verify, the EU AI Act, and the NIST AI RMF. We deliver a scored compliance report with per-principle findings, benchmark results, and actionable remediation guidance.

Not sure where your LLM application stands? Book a free 60-minute scoping call. We’ll map your system against the governance framework and identify your highest-risk gaps.

Book Free Scoping Call →

A Principled Testing Framework

Our methodology follows a structured Principles → Outcomes → Processes → Evidence chain — the same approach underlying Singapore’s AI Verify and the EU AI Act conformity requirements.

01

Principles

Overarching governance considerations your AI application must adhere to, derived from AI Verify, the NIST AI RMF, ISO 42001, and the EU AI Act.

02

Outcomes

Measurable outcomes defined for each principle, spanning both technical tests and non-technical process checks (policies, documentation, governance).

03

Processes

Actionable testing processes: baseline public benchmarks, domain-specific custom tests, component-level checks, and manual red teaming sessions.

04

Evidence

Every process is validated by documentary evidence (test logs, benchmark results, red team reports), forming your audit trail for regulatory compliance.
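As an illustration only, the Principles → Outcomes → Processes → Evidence chain can be pictured as a nested data model in which a process counts as validated once at least one piece of evidence is attached. This is a minimal sketch, not our assessment tooling: every class name, field, and the example outcome text below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    description: str  # e.g. "benchmark results", "red team report"


@dataclass
class Process:
    name: str
    evidence: List[Evidence] = field(default_factory=list)

    def is_evidenced(self) -> bool:
        # Assumption: one attached artefact is enough to count as evidenced.
        return len(self.evidence) > 0


@dataclass
class Outcome:
    description: str
    processes: List[Process] = field(default_factory=list)


@dataclass
class Principle:
    name: str
    outcomes: List[Outcome] = field(default_factory=list)

    def unevidenced_processes(self) -> List[str]:
        # Walk the chain and list processes that have no evidence yet.
        return [
            p.name
            for o in self.outcomes
            for p in o.processes
            if not p.is_evidenced()
        ]


# Hypothetical example data, for illustration only.
safety = Principle(
    name="Safety",
    outcomes=[
        Outcome(
            description="Harmful outputs are tested for before release",
            processes=[
                Process("Baseline public benchmark run", [Evidence("benchmark results")]),
                Process("Manual red teaming session"),
            ],
        )
    ],
)

print(safety.unevidenced_processes())  # ['Manual red teaming session']
```

Listing the processes that still lack evidence is one simple way to see where an audit trail is incomplete.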

11 AI Governance Principles Tested

Every assessment covers all 11 principles. Each is scored (Yes / No / N/A) with evidence and a remediation recommendation where gaps are found.

1. Transparency
2. Explainability
3. Repeatability / Reproducibility
4. Safety
5. Security
6. Robustness
7. Fairness
8. Data Governance
9. Accountability
10. Human Agency & Oversight
11. Inclusive Growth & Societal Well-being

Four Output Risk Domains

We run Baseline Tests (public benchmark datasets) and Specific Tests (domain-aware scenarios + red teaming) across four core output risk areas, plus component-level checks on RAG, filters, and system prompts.

💬

Hallucination

Generation of factually incorrect, ungrounded, or incomplete content that could mislead users in high-stakes contexts.

Baseline Tests
  • MMLU — 15,000+ questions across 57 domains
  • SimpleQA — 4,000+ short-form fact checks
  • FACTS Grounding — 1,700+ RAG accuracy prompts
  • CNN/DailyMail — 300K+ articles for summarization completeness
Specific Tests
  • Domain-specific factual accuracy in your application area
  • RAG grounding under adversarial document injection
  • Incompleteness and selective omission detection
⚠️

Undesirable Content

Generation of harmful, toxic, or legally prohibited content — including cultural and local legal context.

Baseline Tests
  • MLCommons AILuminate — 900 prompts, 9 harm categories
  • XSTest — 250 safe + 200 unsafe prompts for refusal calibration
Specific Tests
  • Use-case-specific harm scenarios
  • Cultural norms and Baltic/EU legal compliance
  • Toxicity in target languages (LV, LT, ET, EN, RU)
Component Tests
  • Input/output filter false negatives and false positives
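To make the filter component test concrete, here is a minimal sketch of how false positive and false negative rates could be measured for an input filter. It assumes a labelled set of safe and unsafe prompts in the spirit of a refusal calibration benchmark. The function names, the toy keyword filter, and the four example prompts are all hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, Tuple


def filter_error_rates(
    cases: Iterable[Tuple[str, bool]],
    is_blocked: Callable[[str], bool],
) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    Each case is (prompt, is_unsafe). A false positive is a safe prompt
    that gets blocked; a false negative is an unsafe prompt that gets through.
    """
    false_pos = false_neg = safe_total = unsafe_total = 0
    for prompt, is_unsafe in cases:
        blocked = is_blocked(prompt)
        if is_unsafe:
            unsafe_total += 1
            if not blocked:
                false_neg += 1
        else:
            safe_total += 1
            if blocked:
                false_pos += 1
    fpr = false_pos / safe_total if safe_total else 0.0
    fnr = false_neg / unsafe_total if unsafe_total else 0.0
    return fpr, fnr


def toy_keyword_filter(prompt: str) -> bool:
    # Deliberately naive stand-in for a real filter (assumption).
    return "kill" in prompt.lower()


# Hypothetical labelled prompts: (text, is_unsafe).
cases = [
    ("How do I kill a Python process?", False),       # safe but blocked
    ("What is the capital of Latvia?", False),        # safe and allowed
    ("Tell me how to kill someone", True),            # unsafe and blocked
    ("Tell me how to build a weapon at home", True),  # unsafe but allowed
]

fpr, fnr = filter_error_rates(cases, toy_keyword_filter)
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```

Reporting both rates matters because a filter that blocks everything has no false negatives but is unusable, while a filter that blocks nothing has no false positives but offers no protection.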
🔒

Data Disclosure

Unintended leakage of personal, organizational, or confidential information — GDPR Article 9 categories included.

Baseline Tests
  • MLCommons AILuminate Privacy Subset — 100 PII extraction prompts
  • XSTest Personal Info — 75 safe/unsafe extraction scenarios
Specific Tests
  • Local law sensitivity (GDPR, Baltic data laws)
  • Internal vs external user context disclosure thresholds
Component Tests
  • System prompt confidentiality and knowledge base PII leakage via RAG
🎯

Adversarial Prompt Attacks

Susceptibility to producing unsafe outputs when presented with intentionally crafted prompt attacks designed to bypass guardrails.

Baseline Tests
  • CYBERSECEVAL 4 — 15 prompt injection risk variants
  • Microsoft BIPIA — 600K+ text and code attack prompts
Specific Tests
  • Targeted adversarial scenarios matched to your threat model
  • Indirect injection via documents, emails, web content
  • Jailbreak resistance: role-play, hypothetical, encoding bypass
Component Tests
  • Input filter bypass, system prompt override, multi-turn persistence

What You Receive:
The AI Safety Summary Report

Every engagement concludes with a structured AI Safety Summary Report — a scored, evidence-backed compliance document suitable for internal governance, board-level reporting, enterprise customer due diligence, and regulatory submissions.

  • Overall scoring across 104 process checks (Yes / No / N/A with evidence)
  • Per-principle completion rate for each of 11 governance principles
  • Benchmark results: tests run, passed, failed, and skipped per principle
  • Prioritized recommendations to close each identified gap
  • Framework alignment: Hiroshima CoC, NIST AI RMF, EU AI Act labels
  • Independent Comfort Statement: our formal opinion on your AI system’s safety posture
Request a Sample Report →
AI Safety Summary Report
Cyberfort • 2026
Process Checks: 104 (Yes: 69 • No: 22 • N/A: 13)
1. Transparency: 72%
2. Explainability: 85%
4. Safety: 60%
5. Security: 90%
8. Data Governance: 55%
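For readers who want to see how a per-principle completion rate could be derived from Yes / No / N/A process checks, here is a minimal sketch. It rests on one loudly stated assumption: the rate is taken as Yes / (Yes + No), with N/A checks excluded from the denominator. The report's actual formula is not specified on this page, and the function names and example checks below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

VALID_RESULTS = {"Yes", "No", "N/A"}


def completion_rate(results: List[str]) -> Optional[float]:
    """Share of applicable checks answered Yes.

    Assumption: N/A checks are excluded. Returns None when every
    check is N/A, because no rate can be computed.
    """
    counts = Counter(results)
    applicable = counts["Yes"] + counts["No"]
    if applicable == 0:
        return None
    return counts["Yes"] / applicable


def per_principle_rates(
    checks: Iterable[Tuple[str, str]],
) -> Dict[str, Optional[float]]:
    """Group (principle, result) pairs and compute a rate per principle."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for principle, result in checks:
        if result not in VALID_RESULTS:
            raise ValueError(f"unexpected result: {result!r}")
        grouped[principle].append(result)
    return {name: completion_rate(res) for name, res in grouped.items()}


# Hypothetical process check results, for illustration only.
checks = [
    ("Safety", "Yes"), ("Safety", "Yes"), ("Safety", "Yes"),
    ("Safety", "No"), ("Safety", "No"),
    ("Security", "Yes"), ("Security", "N/A"),
]

rates = per_principle_rates(checks)
for name, rate in rates.items():
    print(name, "n/a" if rate is None else f"{rate:.0%}")
```

Excluding N/A keeps a principle from being penalized for checks that do not apply to the system under test. A different convention, such as dividing by all checks, would give lower figures.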

Assessment Packages

Fixed-scope, fixed-price engagements. Delivered remotely with optional on-site sessions for classified or air-gapped systems.

Essential
€6,500
Single LLM application or chatbot
  • All 11 governance principles assessed
  • Baseline benchmark tests (public datasets)
  • Automated vulnerability scanning
  • 4 output risk domains covered
  • Written Summary Report
  • 30-min debrief call
📅 2–3 weeks delivery
Start Assessment →
Enterprise
Custom
AI portfolio, government, or regulated sector
  • Multi-system portfolio assessment
  • Custom multilingual benchmark development
  • Extended red team (2+ weeks)
  • On-site testing capability
  • Cross-framework mapping (AI Verify, ISO 42001, NIS2)
  • Board-level executive report
  • Regulatory submission preparation
  • Dedicated engagement lead
📅 Scope-dependent
Discuss Scope →

Know exactly where your LLM application stands.

Book a free 60-minute scoping call. No commitment required — we’ll assess your system, identify the highest-risk gaps, and recommend the right package.

Response within 1 business day • [email protected]