
# Data Governance

> How to send production conversation data to Agnost while minimizing sensitive data exposure

Agnost analyzes the conversation and event data you send to it. That usually includes user prompts, agent outputs, tool calls, error messages, user identifiers, and metadata. If your product handles job seekers, patients, students, employees, or other sensitive user groups, treat this data as sensitive and sanitize it before it leaves your system.

This page explains how to instrument your application with Agnost safely today.

<Note>
  Agnost does not currently provide automatic PII redaction or DLP before ingestion. If a field can contain personal data, confidential business data, or regulated data, redact, pseudonymize, or omit it in your application before sending it to Agnost.
</Note>

## What Agnost receives

Depending on the integration path, Agnost may receive:

| Data type           | Examples                                              | Send it?                                                                                                                                              |
| ------------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| User identifiers    | `user_id`, account ID, tenant ID                      | Yes, but use stable pseudonymous IDs instead of raw emails or names.                                                                                  |
| Conversation input  | User prompts, chat turns, voice transcripts           | Yes when needed for analysis; redact sensitive fields first.                                                                                          |
| Agent output        | Assistant replies, tool results, generated actions    | Yes when needed for analysis; redact secrets and personal data first.                                                                                 |
| Tool/event metadata | Tool name, model, latency, success flag, intent, plan | Yes. Prefer allowlisted operational metadata.                                                                                                         |
| User traits         | Plan, role, company segment, cohort                   | Yes if useful; avoid name, email, phone, address, resume, SSN, health, financial, or job-application details unless you have approved that data flow. |

## Recommended defaults

Use these defaults unless your legal/security review approves something broader:

1. Use an internal stable `user_id` instead of email, phone, or full name.
2. Send only metadata fields you intentionally allowlist.
3. Redact obvious PII from `input`, `output`, tool arguments, and tool results.
4. Do not send secrets, API keys, access tokens, passwords, private keys, or auth headers.
5. Do not send resumes, full job applications, government IDs, health records, payment card data, or other regulated data unless you have a specific agreement and retention plan.
6. Keep a local mapping from your internal user ID to the real person in your own system, not in Agnost metadata.
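
Default 4 is hard to guarantee by convention alone, so it can help to scrub credential-shaped strings in code. The following is a minimal sketch, not an Agnost feature: `scrub_secrets` and `SECRET_PATTERNS` are hypothetical names, and the patterns are illustrative assumptions that cover only a few common formats.

```python
import re

# Illustrative patterns only (assumptions); extend them for the credential
# formats your own stack uses.
SECRET_PATTERNS = [
    # "Bearer <token>" values, as found in Authorization headers
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]{8,}"),
    # "name=value" or "name: value" pairs for common credential names
    re.compile(r"(?i)\b(api[_-]?key|access[_-]?token|password|secret)\b\s*[:=]\s*\S+"),
    # PEM-encoded private key blocks
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S),
]

def scrub_secrets(text: str) -> str:
    """Replace credential-shaped substrings with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[redacted_secret]", text)
    return text
```

Patterns like these over-redact on purpose: a harmless sentence such as `password: please reset it` is partly redacted too. For data leaving your system, that is usually the safer failure mode.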

## Pseudonymous user identity

Prefer this:

```python theme={null}
agnost.identify("user_8f3a91", {
    "plan": "team",
    "role": "recruiter",
    "account_segment": "mid_market"
})
```

Avoid this unless explicitly approved:

```python theme={null}
agnost.identify("alice@example.com", {
    "name": "Alice Smith",
    "email": "alice@example.com",
    "phone": "+1-555-0100"
})
```

The first example still lets you analyze behavior by user, plan, and segment. It does not expose directly identifying traits in Agnost.
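
If your system has no opaque internal ID to reuse, one option is to derive one from an existing identifier with a keyed hash. This is a sketch under assumptions: `pseudonymous_user_id` is a hypothetical helper, and the secret key is assumed to live in your own secret store and never be sent to Agnost.

```python
import hashlib
import hmac

def pseudonymous_user_id(internal_id: str, secret_key: bytes) -> str:
    """Derive a stable ID that cannot be reversed without the secret key."""
    digest = hmac.new(secret_key, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # 16 hex characters (64 bits) keeps IDs short; use more of the digest
    # if you want a larger margin against collisions.
    return f"user_{digest[:16]}"
```

The same input and key always produce the same ID, so per-user analysis still works. A keyed hash is used rather than a plain hash because unkeyed hashes of emails can be reversed by hashing a list of candidate addresses.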

## Redact before sending

Add a small scrubber around your instrumentation layer. Keep it close to the code that calls Agnost so every integration path uses the same policy.

```python theme={null}
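import re

# Best-effort patterns for obvious PII. They miss some formats and can
# over-match (PHONE_RE also hits dates and order numbers), so treat them
# as a baseline rather than a guarantee.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_for_agnost(text: str | None) -> str:
    """Replace emails and phone numbers with placeholders; None becomes ""."""
    if not text:
        return ""
    text = EMAIL_RE.sub("[redacted_email]", text)
    text = PHONE_RE.sub("[redacted_phone]", text)
    return text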

interaction = agnost.begin(
    user_id="user_8f3a91",
    agent_name="support-agent",
    input=redact_for_agnost(user_message),
    properties={
        "plan": "team",
        "intent_source": "support_chat"
    }
)

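# The agent still receives the original message; only the copy sent to
# Agnost is redacted. `call_agent` stands in for your own agent call.
result = call_agent(user_message)
interaction.end(output=redact_for_agnost(result))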
```

For structured tool calls, redact values before JSON serialization, or keep only allowlisted keys so that sensitive keys are dropped entirely:

```python theme={null}
SAFE_KEYS = {"tool_name", "status", "error_code", "model", "latency_ms"}

def allowlisted_metadata(metadata: dict) -> dict:
    return {key: value for key, value in metadata.items() if key in SAFE_KEYS}
```
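
Tool arguments and results are often nested, so a flat key filter may not reach every value. The sketch below walks a nested structure, drops sensitive keys, and redacts emails inside string values. `scrub_payload` and the key names in `SENSITIVE_KEYS` are hypothetical; substitute the keys your tools actually use.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)

# Hypothetical key names (assumption); compared case-insensitively.
SENSITIVE_KEYS = {"email", "phone", "password", "authorization", "ssn"}

def scrub_payload(value):
    """Recursively drop sensitive keys and redact emails in string values."""
    if isinstance(value, dict):
        return {
            key: scrub_payload(item)
            for key, item in value.items()
            if str(key).lower() not in SENSITIVE_KEYS
        }
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is what JSON serialization does anyway.
        return [scrub_payload(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted_email]", value)
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Calling `scrub_payload(tool_args)` before `json.dumps` returns a new structure and leaves the original untouched, so the tool itself still runs on the real arguments.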

## Metadata allowlist

Metadata is often more useful than raw personal data. Start with operational fields:

```json theme={null}
{
  "model": "gpt-4.1",
  "plan": "team",
  "agent_version": "2026-07-02",
  "surface": "onboarding",
  "intent": "setup_friction",
  "success": false
}
```

Avoid directly identifying or free-form user traits such as:

```json theme={null}
{
  "email": "alice@example.com",
  "resume_text": "...",
  "home_address": "...",
  "cover_letter": "..."
}
```
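
One way to keep free-form traits out is to constrain values as well as keys. The following sketch assumes an allowlist matching the operational fields shown earlier in this section; `clean_metadata`, `ALLOWED_METADATA`, and the 64-character cap are illustrative assumptions, not Agnost requirements.

```python
# Hypothetical allowlist (assumption): the operational fields from this section.
ALLOWED_METADATA = {"model", "plan", "agent_version", "surface", "intent", "success"}
MAX_VALUE_LENGTH = 64  # assumed cap; short labels fit, pasted free text does not

def clean_metadata(metadata: dict) -> dict:
    """Keep allowlisted keys whose values are short scalars."""
    cleaned = {}
    for key, value in metadata.items():
        if key not in ALLOWED_METADATA:
            continue  # drop anything not explicitly allowlisted
        if isinstance(value, (bool, int, float)):
            cleaned[key] = value
        elif isinstance(value, str) and len(value) <= MAX_VALUE_LENGTH:
            cleaned[key] = value
        # Nested objects, lists, and long strings are dropped silently.
    return cleaned
```

Dropping long strings and nested values means a field like `intent` cannot quietly turn into a transcript excerpt.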

## OpenTelemetry integrations

Many OpenTelemetry (OTel) integrations capture prompts, messages, tool parameters, and tool results automatically. Before enabling full traces in production:

1. Review what your framework exports.
2. Disable or scrub message/tool attributes that contain sensitive data.
3. Keep `user.id`, `session.id`, and tenant metadata pseudonymous.
4. Test with one staging trace and inspect the raw event in Agnost before rolling out broadly.
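
Step 2 can be sketched as a filter over a span's attribute dictionary. This example works on a plain `dict` and does not use any OpenTelemetry SDK API; how you hook it into your exporter depends on your framework. The key prefixes in `CONTENT_PREFIXES` are assumptions, so replace them with what step 1 shows your framework exports.

```python
# Assumed attribute key prefixes; verify against your framework's real output.
CONTENT_PREFIXES = ("gen_ai.prompt", "gen_ai.completion", "tool.arguments", "tool.result")

def scrub_span_attributes(attributes: dict) -> dict:
    """Return a copy of the attributes without content-bearing keys."""
    return {
        key: value
        for key, value in attributes.items()
        if not key.startswith(CONTENT_PREFIXES)
    }
```

Operational attributes such as the model name and a pseudonymous `user.id` pass through unchanged, so traces stay useful for latency and error analysis.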

## If you handle regulated or high-risk data

If your users may enter health data, payment card data, government IDs, children's data, candidate/job-application data, or other regulated information, do not enable raw input/output capture until your team has reviewed the data flow.

Use one of these patterns instead:

| Pattern                     | When to use                                                                   |
| --------------------------- | ----------------------------------------------------------------------------- |
| Metadata-only capture       | You only need latency, success, agent name, model, and intent labels.         |
| Redacted transcript capture | You need conversation analysis but can remove direct identifiers and secrets. |
| Sampled capture             | You need debugging coverage on a limited subset of traffic.                   |
| Customer-approved capture   | You have explicit contractual approval for the data categories being sent.    |
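
Metadata-only and sampled capture can be combined: every event carries metadata, and only a fixed fraction of sessions also carries a redacted transcript. The sketch below is an assumption-laden illustration. `in_sample` and `build_event` are hypothetical helpers, the returned dictionary is not the Agnost event format, and `redact` is whatever redaction function you already use.

```python
import hashlib
from collections.abc import Callable

def in_sample(session_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of all sessions."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Compare the first 8 bytes, read as an integer, against rate * 2**64.
    return int.from_bytes(digest[:8], "big") < rate * 2**64

def build_event(
    session_id: str,
    text: str,
    metadata: dict,
    redact: Callable[[str], str],
    rate: float = 0.05,
) -> dict:
    """Always include metadata; include redacted text only for sampled sessions."""
    event = {"session_id": session_id, "metadata": metadata}
    if in_sample(session_id, rate):
        event["input"] = redact(text)
    return event
```

Hashing the session ID, rather than drawing a random number per event, keeps a whole conversation either in or out of the sample, which makes sampled transcripts easier to read end to end.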

For security or data-processing questions, contact [founders@agnost.ai](mailto:founders@agnost.ai).

## Instrumentation checklist

Before going live:

* [ ] Replace raw emails/names with internal user IDs.
* [ ] Remove secrets from inputs, outputs, tool args, and tool results.
* [ ] Allowlist metadata keys.
* [ ] Redact obvious PII from text fields.
* [ ] Confirm whether raw transcripts are necessary, or whether metadata-only events are enough.
* [ ] Run one test conversation and inspect the event in the Agnost dashboard.
* [ ] Document internally which fields your integration sends.
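
Part of this checklist can be automated in a test suite. As a rough sketch, the hypothetical `find_leaks` helper below serializes an outgoing event and reports obvious leaks. It only catches emails and bearer tokens, so it complements the manual inspection step and does not replace it.

```python
import json
import re

EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I)
BEARER_RE = re.compile(r"\bbearer\s+[A-Za-z0-9._~+/=-]{8,}", re.I)

def find_leaks(event: dict) -> list[str]:
    """Return a list of findings for an outgoing event; empty means none found."""
    # default=str keeps serialization from failing on dates and similar values.
    serialized = json.dumps(event, default=str)
    findings = []
    if EMAIL_RE.search(serialized):
        findings.append("email address")
    if BEARER_RE.search(serialized):
        findings.append("bearer token")
    return findings
```

A test can then assert `find_leaks(event) == []` for a representative event built by your instrumentation layer.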

## Summary

Agnost is most useful when it can see real production behavior, but you control the data boundary. Send enough signal to debug and improve your agent, and keep directly identifying or regulated data in your own system unless it has been explicitly approved for ingestion.
