GenAI Security & Compliance: OWASP LLM Top 10, NIST AI RMF & EU AI Act
Mapping AI security testing to the frameworks people answer to, OWASP LLM Top 10, NIST AI RMF, and the EU AI Act, and what a pentest does and doesn't cover.
This one isn't a payload walkthrough. It's the conversation I keep having lately. A customer sends a 200-line security questionnaire, or legal mentions the EU AI Act, or there's an audit on the calendar, and suddenly someone has to prove the AI is secure instead of just feeling pretty good about it. So let me walk through how I map an AI pentest onto the frameworks people ask about, what each one is for, and where testing stops and paperwork starts.
The scenario: the questionnaire landed on your desk
Say you shipped a support assistant or an internal agent six months ago. It works, customers like it, nobody got hurt. Then a bigger client wants to buy, and their procurement team sends over a spreadsheet asking about your AI risk controls. Now you need answers that hold up. Here's what's usually on the line:
- A sales deal that's blocked until you fill in the security section.
- An audit (SOC 2, ISO, or internal) that now has an AI scope.
- An EU AI Act obligation, because you have EU users or your use case sits in a higher-risk bucket.
- A board or a customer that wants evidence, not a vibe.
The good news is you don't have to satisfy every framework on earth. You aim at whichever one the people asking actually care about.
Which frameworks actually come up?
Three of them, almost always. They aren't competitors. They answer different questions.
OWASP LLM Top 10
This is the practical list of the worst things that go wrong in LLM apps: prompt injection, leaking the system prompt, the model handing back output that gets executed downstream, an agent doing more than it should, poisoned dependencies, and so on. It's the list I'm quietly testing against anyway, whether or not anyone names it. It's concrete, it's short, and it maps to real bugs you can demonstrate. If someone asks "is your AI secure," this is the list that actually answers it.
NIST AI RMF
This is a US process framework for managing AI risk across the whole life of the system, not just security. It talks in terms of govern, map, measure, and manage. It's broader than a pentest. It covers things like documenting intended use, watching for bias, and having someone accountable when the model drifts. It's voluntary, but a lot of US enterprises and government-adjacent buyers like to see you working against it because it shows there's a process, not just a one-time scan.
EU AI Act
This one is regulation, not a suggestion. If you serve EU users, or your system falls into a higher-risk category, there are real obligations attached: risk management, data governance, technical documentation, human oversight, logging. The penalties are the kind that get a CFO's attention. You can't pentest your way to compliance here, but a good security assessment feeds straight into the technical documentation and risk-management parts.
Pick based on who's asking. A US startup customer waving a questionnaire usually cares about OWASP and maybe NIST. A regulated EU deployment means the Act is non-negotiable. Most projects don't need all three at full depth.
How does a pentest map to OWASP LLM Top 10?
This is the part that makes compliance work go smoothly. The findings from a real assessment line up almost one to one with OWASP categories, so I can tag every issue to a label an auditor already recognises. There's no translation step and no "trust me, this matters."
A few of the categories I test most often, with the writeups where I take each one apart:
- Insecure output handling. The model's response gets trusted and passed into a browser, a shell, or a SQL query without sanitising. That's how an LLM bug turns into XSS or worse. More on that in my insecure output handling writeup.
- Sensitive information disclosure. The system prompt, API keys baked into instructions, or another user's data leaking out through a clever question. I cover the extraction tricks in system prompt and data leakage.
- Supply chain. A poisoned model, a backdoored fine-tune, or a dependency pulled from somewhere you didn't vet. See AI supply chain and model security.
- Excessive agency. The agent has tools (email, refunds, database writes) and not enough guardrails on when it's allowed to use them. Walkthrough in AI agent security testing.
When a finding is tagged "OWASP LLM06: Sensitive Information Disclosure, High," the auditor knows exactly what they're looking at, and you know exactly what to fix. That tagging is half the value of doing the test in a compliance context.
Does testing satisfy NIST or the EU AI Act on its own?
No. I'll be honest about this because it saves everyone a headache later. NIST AI RMF and the EU AI Act are about governance and process: who's accountable, how you document intended use, how you handle data, how you keep records, how a human stays in the loop. A pentest doesn't produce any of that.
What a pentest does produce is concrete evidence. It's one of the strongest inputs you can feed into those frameworks, because it's a real measurement of how the system behaves under attack instead of a paragraph that says "we take security seriously." Under NIST's "measure" function, it's exactly the kind of artifact they want. Under the Act's technical documentation and risk-management requirements, it slots right in.
But it's an input, not the deliverable. You still need the policies, the records, and someone whose name is on the risk decisions. Anyone who tells you a single test makes you "EU AI Act compliant" is selling you something.
What I deliver for a compliance-driven test
When the reason for the engagement is a questionnaire or an audit, I shape the report so it's useful to the people who have to sign things, not just the engineers who have to patch them. Every finding comes with:
- A tag mapping it to the relevant OWASP LLM Top 10 category, so it drops straight into an audit trail.
- A clear impact and severity rating, written so a non-engineer can tell a shrug from a fire.
- Reproduction steps an engineer can follow to confirm and re-test after the fix.
- A plain-language risk note, and where it's relevant, a line connecting it back to the framework the client cares about (NIST function or EU AI Act obligation).
The goal is a document you can hand to a customer's security team, attach to your technical file, or drop into an audit folder without rewriting it first.
Where to start if a deadline is coming
If the calendar is the problem, don't try to boil the ocean across three frameworks at once. Here's the order that works:
- Start with the OWASP LLM Top 10. It's concrete, it's testable, and it covers the things that actually sink AI apps. It's also the list that maps most directly to fixes.
- Run a real assessment against it. Not a checklist someone filled in from memory, an actual test that tries the attacks.
- Fix what comes back, and re-test to confirm the fix held.
- Then build the governance and documentation (NIST, the Act) around a system you've already hardened, instead of writing policy for a system you're hoping is fine.
Doing it in that order means by the time you're filling in the process paperwork, you have real evidence to put behind every claim. That's a much easier audit than the one where the policy exists but nobody ever checked whether the product matched it.
This mapping work sits inside the wider LLM penetration testing assessment I run, and the individual writeups linked above go deep on each category. If you've got a questionnaire, an audit, or an EU AI Act obligation bearing down on you and you need evidence that holds up, tell me what you're building and we'll work out which framework you actually need to aim at.