TLDR. AI vendors raise risks ordinary SaaS due diligence misses: training-data exposure, prompt-injection attack surface, model-version drift, opaque sub-processors, and unclear retention of prompts, completions, and embeddings. This guide gives you a 14-question scoring matrix calibrated for GenAI vendors, maps each question to NIST AI RMF and the model-risk-management principles of SR 11-7, and walks a worked example through an LLM-API procurement. The end-state is a single repeatable assessment your TPRM team can run in an afternoon and your DPO can defend.
Why AI vendor diligence is different
Traditional third-party risk management (TPRM) was designed for vendors that hold data at rest. Most of the questions in a SIG, SOC 2 review, or DPA negotiation assume a stable model of "vendor receives data, stores data, processes data per contract." AI vendors break that model in five specific ways:
- Training-data exposure. Inputs you submit may be used to train or fine-tune the vendor's models — potentially surfacing fragments of your inputs in completions to other customers.
- Prompt injection. User-controlled inputs that traverse an AI vendor are part of an attack surface that does not exist in non-AI SaaS. Malicious content in the input stream can subvert the model's behaviour.
- Model-update risk. The "system" you bought changes underneath you. A vendor that silently swaps GPT-4 for a quantised distillation can shift accuracy, latency, and bias profile without notice.
- Sub-processor opacity. Many AI vendors are themselves wrappers around foundation-model providers. The data flows two or three hops deep, and the sub-processor list often lags reality.
- Retention asymmetry. Prompts, completions, embeddings, and RLHF feedback all have separate retention regimes that are rarely disclosed in a single document.
Where this slots into existing TPRM and model risk
This is not a parallel programme. AI vendor risk assessment extends two existing frameworks:
SR 11-7 model risk management principles
The Federal Reserve's SR 11-7 (and its OCC counterpart, OCC 2011-12) established the discipline of model risk management: models in use must have documented purpose, data provenance, validation, monitoring, and governance. The principles transfer cleanly to vendor-supplied AI — even though SR 11-7 was written for banks, the discipline of "every model has an owner, a documented purpose, a validation file, and a monitoring plan" is the right shape for AI vendors in any regulated industry.
NIST AI Risk Management Framework
NIST AI RMF (AI 100-1) organises AI risk around four functions: Govern (policies, roles, accountability), Map (context, scope, impact), Measure (metrics for trustworthiness characteristics), and Manage (prioritisation, response, communication). The 2024 Generative AI Profile (NIST AI 600-1) adds GenAI-specific considerations: confabulation, hazardous output, intellectual-property risk, and information integrity. Each of the 14 questions below maps to one or more NIST AI RMF functions.
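To see how the matrix below distributes across the four functions, the mapping can be tallied in a few lines. This is a minimal sketch: the dictionary is transcribed from the last column of the scoring matrix, and the names `NIST_FUNCTIONS` and `function_coverage` are illustrative, not part of any NIST tooling.

```python
from collections import Counter

# NIST AI RMF functions per question number, transcribed from the
# last column of the 14-question scoring matrix.
NIST_FUNCTIONS = {
    1: ("Map", "Govern"),
    2: ("Govern", "Manage"),
    3: ("Measure", "Manage"),
    4: ("Map", "Manage"),
    5: ("Govern",),
    6: ("Govern", "Manage"),
    7: ("Govern",),
    8: ("Map", "Govern"),
    9: ("Measure", "Manage"),
    10: ("Govern",),
    11: ("Govern",),
    12: ("Measure",),
    13: ("Manage",),
    14: ("Measure", "Govern"),
}


def function_coverage(mapping):
    """Count how many questions touch each NIST AI RMF function."""
    return Counter(fn for fns in mapping.values() for fn in fns)


coverage = function_coverage(NIST_FUNCTIONS)
for fn in ("Govern", "Map", "Measure", "Manage"):
    print(f"{fn}: {coverage[fn]} questions")
```

A tally like this makes it easy to spot which function a trimmed-down version of the questionnaire would leave thinly covered.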
Where it lives in the privacy programme
Onboarding an AI vendor will often trigger a DPIA under GDPR Art. 35(1), which applies where processing is likely to result in a high risk to individuals, and a data-protection assessment under U.S. state privacy laws. See DPIA vs PIA for the assessment shape, and the working DPIA template for the artefact itself. When the AI vendor onboarding closes, the residual controls flow into your ROPA (see also our ROPA template) and the vendor lands on your sub-processor list, with the relevant change-notification workflow described in our sub-processor change notification template.
The 14-question scoring matrix
Each question gets a 1–5 score (1 = unacceptable, 5 = strong) and a risk weight (1–3). Total score is the sum of score × weight. A weighted score below 60% of the maximum is a stop-deal signal at most organisations.
| # | Question | What "5" looks like | Weight | NIST AI RMF |
|---|---|---|---|---|
| 1 | Data residency & processing locations. Where exactly do prompts, completions, embeddings, and logs reside? | Documented region pinning, customer-selectable region, no out-of-region failover without notice. | 3 | Map, Govern |
| 2 | Training opt-out. Is customer data excluded from training and fine-tuning by default, with contractual commitment? | Excluded by default and contractually; published technical control; auditable. | 3 | Govern, Manage |
| 3 | Log retention. How long are prompts and completions retained, where, and who can access them? | Configurable retention (including zero-day or 24-hour); customer-controlled key for at-rest encryption; access logged. | 3 | Measure, Manage |
| 4 | Embeddings retention. Are embeddings derived from your data persisted, and if so under what control? | Embeddings retained only as long as your active index requires; deletable per item; not reused across tenants. | 2 | Map, Manage |
| 5 | RLHF / feedback data. Is feedback or thumbs-up/down data used to improve the vendor's model across customers? | Used only for your tenant, or fully opt-in with clear notice. | 2 | Govern |
| 6 | Model versioning. Can you pin a specific model version? Are deprecations announced with timelines? | Explicit version pinning available; deprecation notice ≥ 6 months; changelog published. | 3 | Govern, Manage |
| 7 | Override rights. Can the vendor force-migrate you to a new model without your consent? | Contractual commitment to no silent migration; documented migration path. | 2 | Govern |
| 8 | Sub-processor list. Is the list of sub-processors (foundation-model providers, hosting, observability) published and kept current? | Public sub-processor page with change feed and 30-day notice. | 3 | Map, Govern |
| 9 | Prompt-injection & output-safety controls. What controls exist against prompt injection, data exfiltration via outputs, and hazardous content? | Documented input/output filtering, abuse monitoring, red-team programme, customer-configurable safety policy. | 3 | Measure, Manage |
| 10 | DPA & legal terms. Is there a current DPA covering AI-specific processing, including the SCCs where applicable? | Standalone GenAI DPA addendum or AI-specific clauses; current SCCs Module 2; commitments on transfer impact. | 3 | Govern |
| 11 | SOC 2 / ISO 27001 / CSA STAR posture. Are the foundational security certifications current and scoped to the AI offering specifically? | SOC 2 Type II covering the AI offering; ISO 27001 + ISO 27701 + ISO 42001 where applicable; CSA STAR Level 2. | 2 | Govern |
| 12 | Bias / fairness testing. Has the vendor evaluated and disclosed model performance across demographic slices for your use case? | Public model card with disaggregated evaluation; customer access to evaluation results; willingness to support your own testing. | 2 | Measure |
| 13 | Incident response & breach notification. SLAs for security incidents involving AI-specific failure modes (training-data leak, jailbreak, model exfiltration)? | Documented IR plan including AI-specific scenarios; notification ≤ 24h; named contact. | 3 | Manage |
| 14 | Explainability & logging for audit. Can you reconstruct what the model saw and produced for a specific request months later? | Per-request audit log retrievable on demand; tied to your tenant identity; export available. | 2 | Measure, Govern |
Maximum weighted score: 5 × (3+3+3+2+2+3+2+3+3+3+2+2+3+2) = 5 × 36 = 180. The minimum score to proceed is 108 (60%). Scores from 108 to 144 (80%) need compensating controls and a remediation plan. Below 108 is a stop-deal.
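The scoring rule can be sketched in Python as follows. The weights are copied from the matrix; the function names and band labels are illustrative. One assumption is labelled in the code: a total of exactly 144 is treated as clear of the remediation band, since the text leaves that boundary open.

```python
# Risk weights per question number, copied from the matrix above.
WEIGHTS = {1: 3, 2: 3, 3: 3, 4: 2, 5: 2, 6: 3, 7: 2,
           8: 3, 9: 3, 10: 3, 11: 2, 12: 2, 13: 3, 14: 2}

MAX_SCORE = 5 * sum(WEIGHTS.values())        # 5 x 36 = 180
STOP_DEAL_BELOW = MAX_SCORE * 60 // 100      # 108
REMEDIATION_BELOW = MAX_SCORE * 80 // 100    # 144


def weighted_total(scores):
    """Sum score x weight over all 14 questions (scores are 1 to 5)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover questions 1 to 14 exactly")
    for question, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"question {question}: score {score} is outside 1-5")
    return sum(scores[q] * WEIGHTS[q] for q in WEIGHTS)


def decision_band(total):
    """Map a weighted total to a decision band."""
    if total < STOP_DEAL_BELOW:
        return "stop-deal"
    if total < REMEDIATION_BELOW:  # assumption: exactly 144 counts as green-light
        return "compensating controls"
    return "green-light"


print(MAX_SCORE, STOP_DEAL_BELOW, REMEDIATION_BELOW)
print(decision_band(weighted_total({q: 3 for q in WEIGHTS})))
```

Keeping the thresholds as integers derived from `MAX_SCORE` means they move with the matrix if you change a weight, rather than drifting out of sync as hard-coded numbers would.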
Worked example: assessing a generative AI vendor
The hypothetical: your engineering team wants to integrate an LLM-API vendor — call them "LumenAI" — to power an in-product assistant that helps your customers draft policy documents. The assistant will send the customer's draft (which may contain employee names, internal financial commentary, and occasionally health information from HR contexts) to LumenAI's API, receive completions, and surface them in your UI. Below is what a real first-pass scoring conversation looks like.
The findings
| # | Question | Vendor answer | Score |
|---|---|---|---|
| 1 | Data residency | EU region selectable; US fallback only if EU is unavailable, with customer notice and 24h SLA to restore. | 4 |
| 2 | Training opt-out | API-tier traffic excluded from training by default; contractually committed; technical attestation in SOC 2 report. | 5 |
| 3 | Log retention | 30 days default for abuse monitoring; configurable to 0 days for enterprise tier; encryption with vendor-managed keys, BYOK roadmap. | 3 |
| 4 | Embeddings retention | Embeddings stored only if you use the vendor's vector store; deletable on request; not reused across tenants. | 4 |
| 5 | RLHF / feedback data | Thumbs up/down feedback used only within your tenant; vendor-side aggregate feedback is opt-in. | 5 |
| 6 | Model versioning | Pinning available for named model versions; deprecation notice 12 months; changelog published. | 5 |
| 7 | Override rights | Contract permits vendor to migrate after deprecation; commits to no silent migration. | 4 |
| 8 | Sub-processor list | Public sub-processor page; 30-day notice; RSS change feed. | 5 |
| 9 | Prompt-injection / output-safety controls | Documented input/output filtering, abuse monitoring, customer-configurable safety thresholds, red-team programme published. | 4 |
| 10 | DPA & legal terms | Current SCCs Module 2; standalone GenAI DPA addendum; transfer impact assessment available on request. | 5 |
| 11 | SOC 2 / ISO posture | SOC 2 Type II covering API offering; ISO 27001 + ISO 42001 in progress. | 4 |
| 12 | Bias / fairness testing | Public model card with general benchmarks; no disaggregated evaluation for policy-drafting use case. | 2 |
| 13 | Incident response & breach notification | Documented IR plan; notification 48h; named security contact; no AI-specific scenario in published plan. | 3 |
| 14 | Explainability & audit logging | Per-request log retrievable for 90 days; tied to API key; export to S3 available. | 4 |
Weighted total: 4·3 + 5·3 + 3·3 + 4·2 + 5·2 + 5·3 + 4·2 + 5·3 + 4·3 + 5·3 + 4·2 + 2·2 + 3·3 + 4·2 = 12 + 15 + 9 + 8 + 10 + 15 + 8 + 15 + 12 + 15 + 8 + 4 + 9 + 8 = 148/180 (82%).
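That's a green-light score with follow-ups on the lowest scores: the 2 on bias/fairness testing and the 3 on incident response. The compensating actions would be (a) running your own bias evaluation on the policy-drafting workload, and (b) negotiating an AI-specific incident-response addendum in the contract. The 3 on log retention also falls under the checklist's "score ≤ 3" rule; the remediation there is to configure 0-day retention on the enterprise tier and track the BYOK roadmap. The DPIA records these as residual risks with named owners.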
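The arithmetic for the LumenAI example can be reproduced with a short script. This is a sketch using the weights from the matrix and the scores from the findings table; the variable names are illustrative. It also lists every question scoring 3 or lower, which is the remediation threshold the green-light checklist uses.

```python
# Weights from the scoring matrix and scores from the LumenAI findings table.
WEIGHTS = {1: 3, 2: 3, 3: 3, 4: 2, 5: 2, 6: 3, 7: 2,
           8: 3, 9: 3, 10: 3, 11: 2, 12: 2, 13: 3, 14: 2}
LUMENAI_SCORES = {1: 4, 2: 5, 3: 3, 4: 4, 5: 5, 6: 5, 7: 4,
                  8: 5, 9: 4, 10: 5, 11: 4, 12: 2, 13: 3, 14: 4}

total = sum(LUMENAI_SCORES[q] * WEIGHTS[q] for q in WEIGHTS)
max_score = 5 * sum(WEIGHTS.values())
percent = round(100 * total / max_score)

# Questions that need a signed-off remediation plan (score of 3 or lower).
needs_remediation = sorted(q for q, s in LUMENAI_SCORES.items() if s <= 3)

print(f"{total}/{max_score} ({percent}%)")  # 148/180 (82%)
print(needs_remediation)                    # [3, 12, 13]
```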
Red flags
Patterns we keep seeing in low-scoring assessments — any one of these should pause the procurement:
- Training opt-out only available on the "enterprise tier" and not contractually backed.
- "We delete logs on request" with no documented SLA or audit mechanism.
- No published sub-processor list, or a sub-processor page that hasn't been updated in 12+ months.
- Foundation-model provider not disclosed ("we use the best model for the job").
- No model card, or model card without disaggregated evaluation for the use case.
- DPA that omits AI-specific commitments and references only generic SaaS clauses.
- "Confidential by request" attitude toward security artefacts — no SOC 2 letter, no penetration test summary.
- Marketing claims about "private models" or "your data never leaves" that are not technically substantiated in the SOC 2 or architecture document.
- Contract permits silent model migration with no notification.
- No incident response commitment beyond a generic "we'll notify per applicable law."
Green-light checklist
Before the contract is signed and the integration goes live, this is what we expect to be on file:
- Completed 14-question scoring matrix with a weighted score and a signed-off remediation plan for any score ≤ 3.
- Current DPA covering AI-specific processing, including SCCs where applicable and a transfer impact assessment.
- Sub-processor list reviewed and any sub-processors with elevated risk individually scored.
- SOC 2 Type II (current) covering the AI offering specifically, plus any relevant ISO certifications.
- Documented data flow — what fields leave your system, what comes back, where they are stored, for how long.
- Configured retention — the lowest retention the vendor offers, switched on at procurement, not after first incident.
- Pinned model version and a defined upgrade evaluation process.
- DPIA closed and filed; residual risks named and owned.
- ROPA updated with the new vendor row (see our ROPA template).
- Customer-facing disclosure — if the integration changes who processes customer data, disclose in your sub-processor list with appropriate notice. See the change notification template.
- Monitoring plan — what you will watch (latency, output quality, abuse signals, cost) and at what cadence.
- Exit plan — how you would migrate off the vendor if a sub-processor change, model deprecation, or pricing event made you want to.
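One way to make the checklist enforceable is to treat it as a gate that reports what is still missing. The sketch below assumes hypothetical short keys for the twelve items above; neither the keys nor the function names come from any particular GRC tool.

```python
# Hypothetical short keys for the twelve checklist items, in checklist order.
GREEN_LIGHT_ITEMS = [
    "scoring_matrix_signed_off",
    "ai_specific_dpa",
    "subprocessors_reviewed",
    "soc2_type2_current",
    "data_flow_documented",
    "retention_configured",
    "model_version_pinned",
    "dpia_closed",
    "ropa_updated",
    "customer_disclosure",
    "monitoring_plan",
    "exit_plan",
]


def missing_items(on_file):
    """Return the checklist items not yet on file, in checklist order."""
    unknown = set(on_file) - set(GREEN_LIGHT_ITEMS)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    return [item for item in GREEN_LIGHT_ITEMS if item not in on_file]


def can_go_live(on_file):
    """The integration may go live only when nothing is missing."""
    return not missing_items(on_file)


on_file = set(GREEN_LIGHT_ITEMS) - {"dpia_closed", "exit_plan"}
print(missing_items(on_file))  # ['dpia_closed', 'exit_plan']
print(can_go_live(on_file))    # False
```

Rejecting unknown keys is a deliberate choice: a typo in an evidence record should fail loudly rather than silently count as a missing item.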
Operational notes
Refresh cadence
Standard SaaS assessments refresh annually. AI vendor assessments need to refresh on three triggers: annual, on every material model change, and on every sub-processor change. The first is calendar-driven; the second and third are event-driven and require the vendor to push notifications you can ingest. If they don't, you'll always be behind.
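The three triggers can be expressed as a small check that runs whenever a vendor notification arrives or on a schedule. This is a sketch under stated assumptions: the `VendorState` fields are hypothetical, "annual" is approximated as 365 days since the last assessment, and the two event flags are assumed to be set by whatever ingests the vendor's change notifications.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ANNUAL_INTERVAL = timedelta(days=365)  # assumption: "annual" means 365 days


@dataclass
class VendorState:
    last_assessed: date
    model_changed: bool          # material model change since last assessment
    subprocessor_changed: bool   # sub-processor change since last assessment


def refresh_reasons(state, today):
    """Return the list of triggers that currently call for a reassessment."""
    reasons = []
    if today - state.last_assessed >= ANNUAL_INTERVAL:
        reasons.append("annual")
    if state.model_changed:
        reasons.append("model change")
    if state.subprocessor_changed:
        reasons.append("sub-processor change")
    return reasons


state = VendorState(date(2024, 1, 15), model_changed=False, subprocessor_changed=True)
print(refresh_reasons(state, date(2024, 6, 1)))   # ['sub-processor change']
print(refresh_reasons(state, date(2025, 1, 15)))  # ['annual', 'sub-processor change']
```

An empty list means no refresh is due. The event-driven flags only work if the vendor's notifications actually reach the system that sets them.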
Who owns it
The assessment is most often co-owned: security owns the technical-control and certification questions (1, 3, 9, 11, 13), privacy / DPO owns the data-handling, sub-processor, and DPA questions (2, 4, 5, 8, 10), and the model owner, the business sponsor who actually depends on the vendor, owns model versioning, bias, and monitoring (6, 7, 12, 14). Without a model owner the assessment becomes a paperwork exercise.
Connection to procurement
The assessment is most effective when it is a gating control at procurement, not a parallel exercise that runs after the contract is signed. Tying the assessment to a purchase-order block in your procurement system is the cheapest behavioural intervention available. Your trust architecture page is the natural place to publish the vendor's place in the system once it has cleared diligence.
FAQ
Do we need a separate process if we're already running SR 11-7 model governance?
No. You extend it. SR 11-7 already requires a documented inventory, validation, and ongoing monitoring for every model in use, including third-party models. The 14 questions above slot into the validation and ongoing-monitoring sections of your existing model risk framework. For organisations outside financial services, the NIST AI RMF gives you the same structure without the bank-specific framing.
What about open-source models we host ourselves?
Self-hosted models eliminate the sub-processor and training-data questions but introduce questions about model provenance, weight integrity, and supply-chain risk. The same matrix applies with questions 1, 2, 8, and 10 reinterpreted: "data residency" becomes "where do you host," "training opt-out" becomes "what data did you fine-tune on," and so on.
Does this cover the EU AI Act?
Partially. The matrix above covers the operational and privacy aspects of AI vendor diligence. EU AI Act compliance (especially the high-risk system classification, technical documentation, and conformity assessment requirements) is a separate workstream that the model owner needs to drive. The assessment surfaces the data you need to make the classification decision; it does not make it for you.
Can a CIPP/E or CIPM run this without an ML background?
For the data-handling and contractual questions, yes. For questions 6, 9, 12, and 14, you need a model owner with technical literacy to evaluate vendor answers, or you'll accept marketing language at face value.