Centralized audit log: append-only + chained signature

Modules: platform/audit-svc (Ktor + append-only Postgres) and shared/audit-client-jvm (library integrated in one line per service, local outbox + retry).

ADR: ADR-036.

Dependencies: ADR-002 (Postgres RLS), ADR-005 (S3 storage), ADR-035 (observability).

This page sets out the complete engineering spec for the VitaKYC audit log. Any developer enabling a new module should be able, with no open questions, to:

Emit an audit event in one line from any Ktor route
Understand when to emit (action catalog) and in which format (severity, outcome, details)
Guarantee tamper evidence through the HMAC chain and know how to verify it
Serve a DSAR request in under 1 h
Test their integration without Postgres (in-memory AuditClientFake)

1. Overview

Nominal flow of an event:

The application service calls auditClient.log(action, actor, ...), which is non-blocking
The library pushes the event into a local outbox (bounded in-memory queue of 10 K events, with optional spill to H2 when it is full or after a crash)
An async worker drains the outbox: POST /v1/audit/events to audit-svc with exponential-backoff retry (1 s, 2 s, 4 s, …, capped at 5 min)
audit-svc receives the event, computes chain_hash = HMAC(secret, prev_chain_hash || canonical(event)), and INSERTs it append-only under tenant RLS
ACK 201: the outbox marks the event as drained
Every night, a job archives the [D-180, D-90] window to S3 Object Lock and verifies chain integrity

2. Event model

2.1 Kotlin schema (shared library)

@Serializable
data class AuditEvent(
    val eventId: String,                    // UUID
    val ts: Instant,                         // ISO-8601 UTC
    val tenantId: String?,                   // null for platform events
    val actor: AuditActor,
    val service: String,                     // "auth-svc"
    val serviceVersion: String,              // "0.1.0"
    val action: String,                      // "LOGIN_OK"
    val resource: String? = null,            // "form:FORM_KYC_INDIVIDUAL@v2.7.0"
    val severity: AuditSeverity,             // INFO, NOTICE, WARN, ALERT
    val outcome: AuditOutcome,               // success, failure, denied
    val ipAddress: String? = null,
    val userAgent: String? = null,
    val traceId: String? = null,             // from OTel
    val requestId: String? = null,
    val details: Map<String, JsonElement> = emptyMap()  // application-specific, ≤ 16 KB
)

@Serializable
data class AuditActor(
    val userId: String,                      // "user-amine" or "system" or "service:tenant-svc"
    val kind: ActorKind                      // human, system, service
)

enum class ActorKind { human, system, service }
enum class AuditSeverity { INFO, NOTICE, WARN, ALERT }
enum class AuditOutcome { success, failure, denied }

2.2 Validation (library side)

Rule	Effect
`action` matches `^[A-Z][A-Z0-9_]{2,127}$`	otherwise `AuditValidationError`
`details` ≤ 16 KB once serialized	otherwise truncation + warning
`details` contains no forbidden key (`password`, `secret`, `token`, `cvv`, `pan`, `cvc`, `cvv2`, `pin`, `private_key`)	otherwise `PiiViolation` (fail-fast exception in dev, runtime warning + redaction in prod)
`actor.userId` is not blank	otherwise `AuditValidationError`
`service` is not blank	injected automatically from the config
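
The rules in the table can be sketched as a single validation function. This is an illustration only: the function name `validateAuditInput`, the shape of the two exception classes, and the choice to compare keys case-insensitively at the top level of `details` are assumptions, not the library's actual code.

```kotlin
// Illustrative sketch: names mirror the table above, the real library code may differ.
class AuditValidationError(message: String) : IllegalArgumentException(message)
class PiiViolation(val keys: Set<String>) : IllegalArgumentException("forbidden keys in details: $keys")

val ACTION_PATTERN = Regex("^[A-Z][A-Z0-9_]{2,127}$")
val FORBIDDEN_KEYS = setOf(
    "password", "secret", "token", "cvv", "pan", "cvc", "cvv2", "pin", "private_key"
)

/** Checks the action name, the actor id and the keys of `details`; throws on the first violation. */
fun validateAuditInput(action: String, actorUserId: String, detailKeys: Set<String>) {
    if (!ACTION_PATTERN.matches(action)) {
        throw AuditValidationError("invalid action name: '$action'")
    }
    if (actorUserId.isBlank()) {
        throw AuditValidationError("actor.userId must not be blank")
    }
    // Assumption: keys are compared case-insensitively and only at the top level.
    val hits = detailKeys.filter { it.lowercase() in FORBIDDEN_KEYS }.toSet()
    if (hits.isNotEmpty()) throw PiiViolation(hits)
}
```

In this sketch `validateAuditInput("LOGIN_OK", "user-amine", setOf("auth_method"))` passes, while `"login_ok"` as an action or a `details` key such as `"Password"` is rejected. The size rule and the prod-mode redaction are left out.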

3. Library `shared/audit-client-jvm`

3.1 Public API

interface AuditClient {
    /**
     * Emits an audit event. Non-blocking by default (fire-and-forget via the outbox).
     *
     * @param awaitConfirmation if true, suspends until audit-svc ACKs the event (critical events)
     */
    suspend fun log(
        action: String,
        actor: AuditActor,
        tenantId: String? = null,
        resource: String? = null,
        severity: AuditSeverity = AuditSeverity.INFO,
        outcome: AuditOutcome = AuditOutcome.success,
        details: Map<String, JsonElement> = emptyMap(),
        ipAddress: String? = null,           // passed explicitly by the caller in the section 3.3 example
        awaitConfirmation: Boolean = false
    ): AuditAck
}

data class AuditAck(val eventId: String, val confirmed: Boolean)

3.2 Configuration

data class AuditClientConfig(
    val auditSvcUrl: String = System.getenv("AUDIT_SVC_URL") ?: "http://localhost:8084",
    val service: String,
    val serviceVersion: String,
    val outboxMaxMemory: Int = 10_000,
    val outboxSpillDir: java.nio.file.Path? = null,    // if null, no disk spill
    val httpTimeoutMs: Long = 5_000,
    val retryInitialDelayMs: Long = 1_000,
    val retryMaxDelayMs: Long = 300_000,               // 5 min
    val retryMaxAttempts: Int = 8,                     // ≈ 2 min of cumulative backoff with these defaults (1+2+…+64 s)
    val maxDetailsBytes: Int = 16_384,
    val ipFromHeader: String = "X-Forwarded-For"
)
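
To see how `retryInitialDelayMs`, `retryMaxDelayMs` and `retryMaxAttempts` interact, here is a minimal sketch of the backoff schedule. It assumes the delay doubles on each retry with no jitter and that `retryMaxAttempts` includes the first attempt; both function names are hypothetical and the library's real schedule may differ.

```kotlin
/** Delay before retry number `retry` (1-based): initial * 2^(retry - 1), capped at maxDelayMs. */
fun backoffDelayMs(retry: Int, initialDelayMs: Long = 1_000, maxDelayMs: Long = 300_000): Long {
    require(retry >= 1) { "retry is 1-based" }
    // Bound the shift so the doubling cannot overflow for large retry counts.
    val shift = minOf(retry - 1, 30)
    return minOf(initialDelayMs shl shift, maxDelayMs)
}

/** Total time spent waiting between `maxAttempts` attempts (there are maxAttempts - 1 waits). */
fun totalBackoffMs(maxAttempts: Int, initialDelayMs: Long = 1_000, maxDelayMs: Long = 300_000): Long =
    (1 until maxAttempts).sumOf { backoffDelayMs(it, initialDelayMs, maxDelayMs) }
```

Under these assumptions the 5 min cap is first reached at the tenth retry (512 s uncapped), and 8 attempts add up to 127 s of waiting, excluding the HTTP timeout of each attempt.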

3.3 Wiring into a service

fun Application.module() {
    configureObservability("auth-svc", "0.1.0")
    val audit = AuditClientFactory.build(AuditClientConfig(
        service = "auth-svc",
        serviceVersion = "0.1.0",
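    )).also { client ->
        // clean shutdown of the outbox worker on stop
        // (named parameter: inside subscribe { }, `it` would be the Application, not the client)
        environment.monitor.subscribe(ApplicationStopping) { client.shutdown() }
    }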
    attributes.put(AuditClientKey, audit)
    // ... routes
}

post("/v1/sessions") {
    val req = call.receive<LoginRequest>()
    val result = authService.authenticate(req)
    if (result.success) {
        application.audit.log(
            action = "LOGIN_OK",
            actor = AuditActor(result.userId, ActorKind.human),
            tenantId = result.tenantId,
            ipAddress = call.request.origin.remoteHost,
            details = mapOf("auth_method" to JsonPrimitive(result.authMethod))
        )
    } else {
        application.audit.log(
            action = "LOGIN_FAIL",
            actor = AuditActor(req.username, ActorKind.human),
            severity = AuditSeverity.NOTICE,
            outcome = AuditOutcome.failure,
            details = mapOf("reason" to JsonPrimitive(result.failureReason))
        )
    }
    // ...
}

3.4 Internal outbox

[ AuditClient.log() ]
       ↓
[ Validate (regex action, size details, PII guard) ]
       ↓
[ Enrich (eventId, ts, traceId/requestId from OTel) ]
       ↓
[ Outbox in-memory (BlockingDeque max 10K) ]
       ↓                                     ↘
       ↓ awaitConfirmation=false              ↓ overflow / crash recovery
       ↓ → immediate return                   ↓ spill to H2 file (optional)
       ↓
[ Worker coroutine async ]
       ↓
[ HTTP POST /v1/audit/events ]
       ↓ 201 Created → ACK
       ↓ 5xx / timeout → retry expo (1s, 2s, 4s, ..., max 5min)
       ↓ after retryMaxAttempts → on-disk DLQ + ops alert
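
The in-memory stage of the diagram can be sketched with a bounded deque whose `offer` never blocks the caller. `BoundedOutbox`, `Enqueued` and the `spill` callback are hypothetical names for this illustration; the callback stands in for the optional H2 file.

```kotlin
import java.util.concurrent.LinkedBlockingDeque

/** Outcome of enqueueing one event. */
enum class Enqueued { MEMORY, SPILLED, DROPPED }

/**
 * Hypothetical bounded outbox: `spill` stands in for the optional H2 file and
 * returns false when no spill storage is configured.
 */
class BoundedOutbox<T : Any>(capacity: Int, private val spill: (T) -> Boolean = { false }) {
    private val queue = LinkedBlockingDeque<T>(capacity)

    /** Never blocks the caller: offer() returns false immediately when the deque is full. */
    fun enqueue(event: T): Enqueued = when {
        queue.offer(event) -> Enqueued.MEMORY
        spill(event) -> Enqueued.SPILLED
        else -> Enqueued.DROPPED
    }

    /** Called by the drain worker; returns null when nothing is waiting. */
    fun poll(): T? = queue.poll()

    val size: Int get() = queue.size
}
```

A `DROPPED` result is the case a counter such as `vitakyc_audit_outbox_drops_total` with reason `overflow` would record, and `size` is what a gauge like `vitakyc_audit_outbox_size` would expose.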

3.5 Tests: `AuditClientFake`

For service integration tests, the library provides:

class AuditClientFake : AuditClient {
    val events: List<AuditEvent>      // recorded calls
    suspend fun log(...): AuditAck    // capture in-memory
}

This makes it possible to assert in a test:

@Test
fun `LOGIN_OK is audited on successful login`() = testApplication {
    val fakeAudit = AuditClientFake()
    application {
        // precondition: module() must not overwrite a client already registered under AuditClientKey
        attributes.put(AuditClientKey, fakeAudit)
        module()
    }
    client.post("/v1/sessions") { setBody(...) }
    assertThat(fakeAudit.events).anySatisfy { event ->
        assertThat(event.action).isEqualTo("LOGIN_OK")
        assertThat(event.actor.userId).isEqualTo("user-amine")
    }
}
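
The fake is only outlined above. As a rough idea of what an in-memory recorder involves, here is a self-contained sketch. It uses reduced stand-in types (`RecordedEvent`, `Ack`, `InMemoryAuditFake` are hypothetical names), and it drops `suspend` so that it runs without kotlinx.coroutines; the real `AuditClientFake` implements `AuditClient` and records full `AuditEvent` values.

```kotlin
import java.util.UUID
import java.util.concurrent.CopyOnWriteArrayList

// Stand-in types, reduced to what the sketch needs (the real ones live in the shared library).
data class RecordedEvent(val eventId: String, val action: String, val actorUserId: String, val outcome: String)
data class Ack(val eventId: String, val confirmed: Boolean)

/** Records every call in memory so that tests can assert on it; no network, no Postgres. */
class InMemoryAuditFake {
    private val recorded = CopyOnWriteArrayList<RecordedEvent>()

    /** Read-only snapshot of the calls recorded so far. */
    val events: List<RecordedEvent> get() = recorded.toList()

    // `suspend` is left out so the sketch runs without kotlinx.coroutines.
    fun log(action: String, actorUserId: String, outcome: String = "success"): Ack {
        val event = RecordedEvent(UUID.randomUUID().toString(), action, actorUserId, outcome)
        recorded.add(event)
        return Ack(event.eventId, confirmed = true)
    }
}
```

Returning a snapshot from `events` keeps assertions stable even if the code under test logs from several threads.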

4. Service `audit-svc`

4.1 Endpoints

Method	Path	Description	Auth
`POST`	`/v1/audit/events`	Ingest one event (idempotent by `eventId`)	service token (mTLS in prod)
`POST`	`/v1/audit/events:batch`	Batch ingestion (up to 1,000 events)	same
`GET`	`/v1/audit/events`	Query events (filters: tenant, actor, action, ts range, severity)	bearer admin
`GET`	`/v1/audit/events/:id`	Fetch one event by `eventId`	bearer admin
`GET`	`/v1/audit/dsar/:userId`	DSAR fulfillment: all events linked to a userId (cross-service)	bearer admin + step-up
`GET`	`/v1/audit/chain/verify`	Chain verification (optional `from`/`to` range), async batch	bearer admin + step-up
`GET`	`/v1/audit/health`	Health check	public

4.2 Postgres schema (Flyway migration excerpt)

-- V1__audit_event.sql
CREATE TABLE audit_event (
  event_id        UUID PRIMARY KEY,
  ts              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  tenant_id       VARCHAR(64),
  actor_user_id   VARCHAR(128) NOT NULL,
  actor_kind      VARCHAR(16) NOT NULL CHECK (actor_kind IN ('human','system','service')),
  service         VARCHAR(64) NOT NULL,
  service_version VARCHAR(32) NOT NULL,
  action          VARCHAR(128) NOT NULL,
  resource        VARCHAR(256),
  severity        VARCHAR(16) NOT NULL CHECK (severity IN ('INFO','NOTICE','WARN','ALERT')),
  outcome         VARCHAR(32) NOT NULL CHECK (outcome IN ('success','failure','denied')),
  ip_address      INET,
  user_agent      TEXT,
  trace_id        VARCHAR(32),
  request_id      VARCHAR(64),
  details         JSONB NOT NULL DEFAULT '{}'::jsonb,
  prev_chain_hash CHAR(64) NOT NULL,
  chain_hash      CHAR(64) NOT NULL,
  hmac_key_id     VARCHAR(32) NOT NULL,
  CONSTRAINT details_size CHECK (octet_length(details::text) <= 16384)
);

CREATE INDEX idx_audit_tenant_ts ON audit_event (tenant_id, ts DESC);
CREATE INDEX idx_audit_actor_ts  ON audit_event (actor_user_id, ts DESC);
CREATE INDEX idx_audit_action_ts ON audit_event (action, ts DESC);
CREATE INDEX idx_audit_severity_ts ON audit_event (severity, ts DESC) WHERE severity IN ('WARN','ALERT');
CREATE INDEX idx_audit_details_gin ON audit_event USING gin (details);

-- Append-only enforced
REVOKE UPDATE, DELETE ON audit_event FROM app_role;

-- RLS multi-tenant
ALTER TABLE audit_event ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON audit_event
  USING (tenant_id IS NULL OR tenant_id = current_setting('app.tenant_id', true));

-- For cross-tenant admin jobs: dedicated bypass-RLS role
CREATE ROLE audit_admin BYPASSRLS;

4.3 Chained signature: implementation

import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

data class ChainComputation(
    val prevChainHash: String,
    val chainHash: String,
    val hmacKeyId: String
)

class AuditChain(
    private val secret: ByteArray,
    private val keyId: String,
    private val genesisHash: String = "0".repeat(64)
) {
    fun next(prevChainHash: String, event: AuditEvent): ChainComputation {
        val canonical = canonicalJson(event)
        val mac = Mac.getInstance("HmacSHA256")
        mac.init(SecretKeySpec(secret, "HmacSHA256"))
        mac.update(prevChainHash.toByteArray())
        mac.update(canonical.toByteArray())
        val digest = mac.doFinal()
        return ChainComputation(
            prevChainHash = prevChainHash,
            chainHash = digest.toHex(),
            hmacKeyId = keyId
        )
    }

    /**
     * Re-verifies the chain over a time window.
     * Returns the list of corrupted `event_id` values (empty if OK).
     */
    fun verify(events: List<AuditEvent>, startPrev: String): List<String> { /* ... */ }
}
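
The body of `verify` is not shown in this page. One plausible way to walk the chain is sketched below, under the assumption that each stored row carries its canonical JSON, its `prev_chain_hash` and its `chain_hash`. `ChainedRow`, `hmacSha256Hex` and `verifyChain` are hypothetical names, and the policy of continuing from the stored hash after a mismatch is a choice made for this sketch.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

/** One stored row, reduced to what verification needs; `canonical` is the canonical JSON of the event. */
data class ChainedRow(val eventId: String, val canonical: String, val prevChainHash: String, val chainHash: String)

/** HMAC-SHA256 over prevChainHash || canonical, as lowercase hex. */
fun hmacSha256Hex(secret: ByteArray, prevChainHash: String, canonical: String): String {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(secret, "HmacSHA256"))
    mac.update(prevChainHash.toByteArray(Charsets.UTF_8))
    mac.update(canonical.toByteArray(Charsets.UTF_8))
    return mac.doFinal().joinToString("") { "%02x".format(it) }
}

/** Returns the ids of rows whose link to the previous row or whose HMAC does not match. */
fun verifyChain(secret: ByteArray, rows: List<ChainedRow>, startPrev: String): List<String> {
    val corrupted = mutableListOf<String>()
    var expectedPrev = startPrev
    for (row in rows) {
        val recomputed = hmacSha256Hex(secret, row.prevChainHash, row.canonical)
        if (row.prevChainHash != expectedPrev || row.chainHash != recomputed) {
            corrupted += row.eventId
        }
        // Continue from the stored hash so that one bad row does not flag every later row.
        expectedPrev = row.chainHash
    }
    return corrupted
}
```

With this policy an edited row is reported once, and a deleted row shows up as a broken link on the row that followed it.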

JSON canonicalization: keys sorted lexically, UTF-8 NFC encoding, no whitespace, numbers without superfluous zeros (see ADR-027 §3.1 for details).
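
As an approximation of those four rules, the sketch below canonicalizes a JSON-like tree made of Map, List, String, Number, Boolean and null. ADR-027 §3.1 remains the authority; the function names, the use of `BigDecimal` for numbers and the UTF-16 code-unit ordering of keys are assumptions of this sketch.

```kotlin
import java.math.BigDecimal
import java.text.Normalizer

/** Canonical form of a JSON-like tree built from Map, List, String, Number, Boolean and null. */
fun canonicalize(value: Any?): String = when (value) {
    null -> "null"
    is Boolean -> value.toString()
    // stripTrailingZeros + toPlainString drops superfluous zeros: 1.50 -> 1.5, 0.0 -> 0
    is Number -> BigDecimal(value.toString()).stripTrailingZeros().toPlainString()
    is String -> quote(value)
    is Map<*, *> -> value.entries
        .map { Normalizer.normalize(it.key.toString(), Normalizer.Form.NFC) to it.value }
        .sortedBy { it.first }   // String order, i.e. by UTF-16 code unit
        .joinToString(",", "{", "}") { (k, v) -> quote(k) + ":" + canonicalize(v) }
    is List<*> -> value.joinToString(",", "[", "]") { canonicalize(it) }
    else -> throw IllegalArgumentException("unsupported value: $value")
}

/** NFC-normalizes a string and escapes it as a JSON string literal. */
fun quote(s: String): String = buildString {
    append('"')
    for (c in Normalizer.normalize(s, Normalizer.Form.NFC)) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c < ' ' -> append("\\u%04x".format(c.code))
            else -> append(c)
        }
    }
    append('"')
}
```

For example, `canonicalize(mapOf("b" to 1.50, "a" to "x"))` yields `{"a":"x","b":1.5}`. The string returned here would then be encoded as UTF-8 before being fed to the HMAC.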

4.4 Daily verification

Temporal job (see ADR-001) AuditChainVerifyWorkflow:

Daily 02:00 UTC:
  1. Read the events in the window [now-25h, now-1h]
  2. For each event, recompute chainHash using the secret identified by hmac_key_id
  3. On mismatch → INSERT into audit_anomaly + ALERT severity=page
  4. Emit the metric vitakyc_audit_chain_verify_anomalies_total

Prometheus metric: vitakyc_audit_chain_verify_anomalies_total must always be 0. Any anomaly triggers a page alert.

4.5 DSAR fulfillment

GET /v1/audit/dsar/:userId?from=2016-01-01&to=2026-04-30
Authorization: Bearer <admin token + step-up>
X-Justification: "DSAR-2026-0042"

→ 200 OK
Content-Type: application/x-ndjson  (streaming)
Content-Disposition: attachment; filename="dsar-userId-2026-04-30.ndjson"

{"event_id":"...", "ts":"...", "action":"LOGIN_OK", ...}
{"event_id":"...", "ts":"...", "action":"FORM_PUBLISHED", ...}
...

Target: < 1 h for 10 years of events (typically ~10,000 events per user). Load test: extraction of 100 K events in 30 min on the hot Postgres tier.
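
A quick back-of-the-envelope check of those figures, assuming the load-test rate is sustained and scales linearly with the number of events (an assumption of this sketch, not a measured result):

```kotlin
/** Seconds needed to extract `events` at a sustained rate of `eventsPerSecond`. */
fun extractionSeconds(events: Long, eventsPerSecond: Double): Double = events / eventsPerSecond

// Rate implied by the load test: 100 K events in 30 min.
val loadTestRate = 100_000 / (30 * 60.0)                           // ≈ 55.6 events/s

// Typical user: ~10,000 events over 10 years.
val typicalUserSeconds = extractionSeconds(10_000, loadTestRate)   // ≈ 180 s, i.e. 3 min
```

At that rate the < 1 h target leaves room for roughly 200,000 events per user. The figure only covers events still in Postgres; restitution from the cold tier adds its own delay.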

5. Retention and archiving

5.1 Tiering

Tier	Duration	Storage	Cost	Retrieval
Hot	0–90 d	Main Postgres	high	< 100 ms
Warm	90 d–1 yr	Postgres archive partition	medium	< 1 s
Cold	1 yr–10 yr	S3 Object Lock COMPLIANCE / MinIO WORM	low	1–10 min via Athena or import

5.2 Nightly archive

Temporal job AuditArchiveWorkflow:

Daily 03:00 UTC:
  1. Select the events in the window [D-180, D-90] (hot → warm)
  2. Compress to NDJSON.gz per day + tenant + service
  3. Upload to S3 Object Lock in COMPLIANCE mode (immutable for 10 years)
  4. Verify the SHA-256 hash of the uploaded blob
  5. INSERT a row into audit_archive (key, blob_hash, event_count, ts_range)
  6. (following month) DELETE job removes the archived events from the hot Postgres tier;
     allowed exclusively for the audit_admin BYPASSRLS role, after checking that the S3 archive exists.

Note: deleting from the hot tier after the S3 archive is the only exception to the Postgres append-only principle. It is logged in a dedicated audit_retention_log table, which is itself append-only and outside the retention policy.
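
Steps 2, 4 and 5 of the archive job can be sketched with the JDK alone. The object key layout, `ArchiveBlob` and `buildArchiveBlob` are assumptions made for this illustration; the upload itself is not shown.

```kotlin
import java.io.ByteArrayOutputStream
import java.security.MessageDigest
import java.util.zip.GZIPOutputStream

/** A compressed archive blob plus the metadata recorded in audit_archive. */
class ArchiveBlob(val key: String, val bytes: ByteArray, val sha256Hex: String, val eventCount: Int)

/**
 * Gzips one NDJSON line per event and hashes the compressed bytes.
 * The key layout (day/tenant/service) is an assumption made for this sketch.
 */
fun buildArchiveBlob(day: String, tenantId: String, service: String, ndjsonLines: List<String>): ArchiveBlob {
    val buffer = ByteArrayOutputStream()
    // use { } closes the writer, which also finishes the gzip stream.
    GZIPOutputStream(buffer).bufferedWriter(Charsets.UTF_8).use { writer ->
        ndjsonLines.forEach { line -> writer.write(line); writer.write("\n") }
    }
    val bytes = buffer.toByteArray()
    val hash = MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }
    return ArchiveBlob("audit/$day/$tenantId/$service.ndjson.gz", bytes, hash, ndjsonLines.size)
}
```

Step 4 then amounts to hashing the bytes read back from the bucket and comparing the result with `sha256Hex` before the `audit_archive` row is inserted.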

6. Security and privacy

Anti-PII guard in the library: forbidden keys are rejected (password, secret, token, cvv, pan, cvc, cvv2, pin, private_key). In prod mode, the value is silently redacted and the Prometheus counter vitakyc_audit_pii_redacted_total is incremented.
Per-tenant RLS: a tenant can NEVER read another tenant's events. The audit_admin BYPASSRLS role is used exclusively for cross-tenant jobs (chain verification, archive) and is itself audited.
Encryption at rest: Postgres TDE + S3 SSE-KMS with a per-tenant KEK held in Vault (see ADR-002).
Encryption in transit: mTLS service↔audit-svc in prod; Bearer token + TLS in dev.
Step-up MFA is mandatory for /v1/audit/dsar/:userId and /v1/audit/chain/verify (see ADR-033).
The X-Justification header is mandatory for cross-tenant queries and is captured in audit_event itself (meta-audit).
DSAR retention: deleting a userId must produce a RETENTION_PURGED event with actor=system and details.user_id_hashed = sha256(userId); the event itself is kept, but without the PII.
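
The `user_id_hashed` field of the last item can be computed as below. This sketch assumes a plain SHA-256 over the UTF-8 bytes of the userId, rendered as lowercase hex, which is how the text reads; `hashUserId` and `retentionPurgedDetails` are hypothetical names.

```kotlin
import java.security.MessageDigest

/** Lowercase hex SHA-256 of the UTF-8 bytes of `userId`. */
fun hashUserId(userId: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(userId.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

/** Details payload of a RETENTION_PURGED event: the hash replaces the raw identifier. */
fun retentionPurgedDetails(userId: String): Map<String, String> =
    mapOf("user_id_hashed" to hashUserId(userId))
```

The hash is deterministic, so a later DSAR for the same userId can still be matched against the purge event without storing the identifier itself.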

7. Performance and capacity

Metric	MVP target	V2 target
audit-svc ingestion throughput	200 events/s	1,000 events/s
Ingestion p99 latency	< 50 ms	< 20 ms
Simple query p99 latency (tenant + ts range)	< 500 ms (1 M events)	< 200 ms (100 M events)
Postgres volume per 1 M events	~5 GB	~5 GB
Daily chain verification	< 30 min over a 24 h window	< 10 min
DSAR fulfillment (10 years, 10 K events per user)	< 1 h	< 15 min
Client outbox max memory	10 K events (2 MB heap)	configurable

8. Prometheus metrics

Metric	Type	Labels
`vitakyc_audit_events_ingested_total`	counter	service, action, severity, outcome
`vitakyc_audit_ingest_duration_seconds`	histogram	service
`vitakyc_audit_chain_verify_anomalies_total`	counter	hmac_key_id
`vitakyc_audit_pii_redacted_total`	counter	service
`vitakyc_audit_outbox_size`	gauge	service
`vitakyc_audit_outbox_drops_total`	counter	service, reason (overflow, retry_exhausted)
`vitakyc_audit_archive_events_archived_total`	counter	day

Alerts (see observability §6):

AuditChainAnomaly — vitakyc_audit_chain_verify_anomalies_total > 0 for 1m → page
AuditOutboxOverflow — rate(vitakyc_audit_outbox_drops_total[5m]) > 0 → ticket
AuditIngestSlow — histogram_quantile(0.99, ...) > 0.1 for 5m → ticket

9. Mandatory tests per integrating service

@Test
fun `critical action emits audit event`() = testApplication {
    val fake = AuditClientFake()
    application {
        attributes.put(AuditClientKey, fake)
        module()
    }
    // run the business action
    val resp = client.post("/v1/sensitive/action") { ... }
    assertThat(resp.status).isEqualTo(HttpStatusCode.OK)
    // check the audit event
    assertThat(fake.events).anySatisfy { e ->
        assertThat(e.action).isEqualTo("EXPECTED_ACTION_NAME")
        assertThat(e.severity).isEqualTo(AuditSeverity.NOTICE)
        assertThat(e.outcome).isEqualTo(AuditOutcome.success)
        assertThat(e.actor.userId).isEqualTo("expected-user")
    }
}

Convention: every integration test of a sensitive action must include an assertion that the audit event was emitted.

10. Per-module adoption checklist

build.gradle.kts: add implementation(project(":shared:audit-client-jvm"))
Application.kt: create the AuditClient via AuditClientFactory.build(...) and attach it to attributes
Identify the service's sensitive actions (LOGIN_OK, FORM_PUBLISHED, …)
Add auditClient.log(...) to each sensitive action
Integration tests: use AuditClientFake and assert on the events
Document the actions in docs-site/.../audit-actions-<service>.md
Check vitakyc_audit_events_ingested_total in Grafana after the staging deployment
No password|secret|token|cvv|pan|cvc|cvv2|pin|private_key key in details (automatic guard test)

11. Migration MVP → V2

Item	MVP (V0)	V2 (S+12)
Transport	HTTP POST + client outbox	+ Kafka topic `vitakyc.audit.v1` (opt-in via tenant flag)
Hot storage	Single Postgres	Postgres partitioned by month
Cold storage	S3/MinIO Object Lock	+ Athena index for direct queries
Verification	nightly batch	continuous (streaming) with ksqlDB
Advanced search	Postgres GIN	+ OpenSearch index per tenant (optional)
Analytics	Prometheus + Grafana	+ datalake exports (Parquet on S3)

12. References

ADR-036
VitaKYC monorepo: where audit-svc and audit-client-jvm fit in
Observability: correlation between trace_id and audit events
Standards: BCT Circulaire 2017-08 (LCB-FT, i.e. AML/CFT), GDPR art. 30 and 32, SOC 2 CC4.1, ISO 27001 A.12.4, NIST SP 800-92

Centralized audit log spec, version 1.0 (2026-04-30). Breaking updates require an ADR.