
Moderation Pipeline (Code Level)

This page gives the C4 Level 4 (code-level) view of the PSI moderation system. It covers the internal structure of the moderation module, its two-pass AI flow, and the standalone moderation service.

Module Structure

The moderation system spans two codebases: the built-in moderation module in the PSI server, and the standalone moderation service (moderation-service/).

classDiagram
    class ModerationModule {
        +publicFunctions
        +adminFunctions
        checkCommentWithGpt(store, params)
        checkCommentWithGptHeavy(store, params)
        recordJudgement(store, params)
        getJudgementHistory(store, params)
    }

    class JigsawModule {
        +publicFunctions
        measureToxic(store, params)
    }

    class PremodReviewModule {
        +publicFunctions
        +adminFunctions
        recordPremodData(store, params)
        getPremodReviewData(store, params)
        annotatePremodData(store, params)
    }

    class LLMAdapter {
        <<interface>>
        +chatCompletion(messages, options) Promise~Response~
    }

    class OpenAIAdapter {
        +chatCompletion(messages, options)
    }

    class AzureOpenAIAdapter {
        +chatCompletion(messages, options)
    }

    class ZDFOpenAIAdapter {
        +chatCompletion(messages, options)
    }

    class JigsawAdapter {
        <<interface>>
        +analyze(text, languages) Promise~ToxicityScores~
    }

    class PerspectiveAPIAdapter {
        +analyze(text, languages)
    }

    class ZDFJigsawAdapter {
        +analyze(text, languages)
    }

    ModerationModule --> LLMAdapter : uses for AI checks
    ModerationModule --> JigsawAdapter : uses for toxicity
    ModerationModule --> PremodReviewModule : stores flagged data
    JigsawModule --> JigsawAdapter : direct toxicity measurement
    LLMAdapter <|.. OpenAIAdapter
    LLMAdapter <|.. AzureOpenAIAdapter
    LLMAdapter <|.. ZDFOpenAIAdapter
    JigsawAdapter <|.. PerspectiveAPIAdapter
    JigsawAdapter <|.. ZDFJigsawAdapter
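
The two interfaces in the diagram can be sketched in TypeScript as follows. This is a minimal illustration, not the actual source: the `ChatMessage`, `ChatResponse`, and options shapes are assumptions, and `StubLLMAdapter` and `StubJigsawAdapter` are hypothetical stand-ins for the real adapters (`OpenAIAdapter`, `PerspectiveAPIAdapter`, and so on).

```typescript
// Hypothetical message and response shapes; the real types are not shown in the diagram.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatResponse {
  content: string;
}

type ToxicityScores = Record<string, number>;

// The two adapter interfaces from the class diagram.
interface LLMAdapter {
  chatCompletion(messages: ChatMessage[], options?: { model?: string }): Promise<ChatResponse>;
}

interface JigsawAdapter {
  analyze(text: string, languages: string[]): Promise<ToxicityScores>;
}

// Stub standing in for a real LLM adapter: always returns a canned reply.
class StubLLMAdapter implements LLMAdapter {
  private readonly cannedReply: string;

  constructor(cannedReply: string) {
    this.cannedReply = cannedReply;
  }

  async chatCompletion(_messages: ChatMessage[]): Promise<ChatResponse> {
    return { content: this.cannedReply };
  }
}

// Stub standing in for a real toxicity adapter.
class StubJigsawAdapter implements JigsawAdapter {
  async analyze(text: string, _languages: string[]): Promise<ToxicityScores> {
    // Toy heuristic for demonstration only: all-caps text scores higher.
    const shouting = text.length > 0 && text === text.toUpperCase();
    return { toxicity: shouting ? 0.8 : 0.1 };
  }
}

// Callers depend only on the interface, so any implementation can be swapped in.
async function measureToxic(jigsaw: JigsawAdapter, text: string, language: string): Promise<number> {
  const scores = await jigsaw.analyze(text, [language]);
  return scores["toxicity"] ?? 0;
}
```

Because `measureToxic` accepts any `JigsawAdapter`, the same call site would work with a Perspective-backed or ZDF-backed implementation.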

Two-Pass Moderation Flow (Detailed)

sequenceDiagram
    participant Client as PSI Frontend
    participant Backend as PSI Backend (Hono)
    participant Prompts as server/prompts/
    participant Light as Light Model (gpt-4o-mini)
    participant Heavy as Heavy Model (gpt-4o)
    participant Jigsaw as Perspective API
    participant Store as ServerStore
    participant PremodReview as PremodReview Module
    participant Email as Email Adapter

    Client->>Backend: POST /api/moderation/checkCommentWithGpt

    Backend->>Prompts: Load moderate.txt (system prompt)
    Prompts-->>Backend: System prompt with community guidelines

    par Parallel AI checks
        Backend->>Light: chatCompletion(system + user prompt)
        Note over Light: User prompt includes:<br/>comment text, parent context,<br/>community guidelines, response format
        Light-->>Backend: JSON: {decision, explanation, confidence}

        Backend->>Jigsaw: analyze(commentText, [language])
        Jigsaw-->>Backend: {toxicity, severeToxicity, insult, threat, ...}
    end

    Backend->>Store: Record light model result + toxicity scores

    alt decision = "flag" (needs heavy review)
        Backend->>Prompts: Load heavy/moderate.txt
        Backend->>Heavy: chatCompletion(system + user prompt + light result)
        Heavy-->>Backend: JSON: {decision, explanation, confidence}
        Backend->>Store: Record heavy model result
    end

    alt PREMOD_REVIEW_ENABLED = true AND comment rejected
        Backend->>PremodReview: recordPremodData(comment, decision)
        Note over PremodReview: Stores comment without user ID<br/>30-day TTL for GDPR compliance
    end

    alt Comment rejected
        Backend->>Email: Send rejection notification to user
    end

    Backend-->>Client: {decision, explanation, toxicityScores}
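
The core of the sequence above (parallel first pass, conditional escalation) might look roughly like the sketch below. The `TwoPassDeps` interface, the function name `moderateComment`, and the decision values `"approve"` and `"reject"` are assumptions made for illustration; only `"flag"` and the `{decision, explanation, confidence}` shape appear in the diagram. Store writes, pre-mod recording, and email notification are omitted.

```typescript
type Decision = "approve" | "reject" | "flag";

interface ModelVerdict {
  decision: Decision;
  explanation: string;
  confidence: number;
}

type ToxicityScores = Record<string, number>;

// Hypothetical dependencies, injected so the flow can run without network access.
interface TwoPassDeps {
  lightCheck(comment: string): Promise<ModelVerdict>;
  heavyCheck(comment: string, light: ModelVerdict): Promise<ModelVerdict>;
  analyzeToxicity(comment: string, languages: string[]): Promise<ToxicityScores>;
}

interface ModerationResult {
  decision: Decision;
  explanation: string;
  toxicityScores: ToxicityScores;
  escalated: boolean;
}

async function moderateComment(
  deps: TwoPassDeps,
  comment: string,
  language: string,
): Promise<ModerationResult> {
  // First pass: the light model and toxicity scoring run in parallel.
  const [light, toxicityScores] = await Promise.all([
    deps.lightCheck(comment),
    deps.analyzeToxicity(comment, [language]),
  ]);

  // A clear approve or reject from the light model is final.
  if (light.decision !== "flag") {
    return {
      decision: light.decision,
      explanation: light.explanation,
      toxicityScores,
      escalated: false,
    };
  }

  // Second pass: flagged comments go to the heavy model together with the light result.
  const heavy = await deps.heavyCheck(comment, light);
  return {
    decision: heavy.decision,
    explanation: heavy.explanation,
    toxicityScores,
    escalated: true,
  };
}
```

Injecting the three checks keeps the control flow testable with stubs; in the real system they would presumably be backed by the LLM and Jigsaw adapters.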

Comment State Machine

stateDiagram-v2
    [*] --> Submitted: User posts comment

    Submitted --> LightModelCheck: Sent to gpt-4o-mini

    LightModelCheck --> Approved: Light model approves
    LightModelCheck --> Rejected: Light model rejects
    LightModelCheck --> HeavyModelCheck: Light model flags

    HeavyModelCheck --> Approved: Heavy model approves
    HeavyModelCheck --> Rejected: Heavy model rejects
    HeavyModelCheck --> PendingReview: Heavy model uncertain

    PendingReview --> Approved: Moderator approves
    PendingReview --> Rejected: Moderator rejects

    Approved --> Published: Visible to all users
    Rejected --> Hidden: Not visible, user notified

    Published --> [*]
    Hidden --> [*]
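
One way to enforce the state machine above in code is a transition table. The sketch below encodes exactly the edges drawn in the diagram; the names `TRANSITIONS`, `canTransition`, and `transition` are hypothetical and not taken from the PSI codebase.

```typescript
type CommentState =
  | "Submitted"
  | "LightModelCheck"
  | "HeavyModelCheck"
  | "PendingReview"
  | "Approved"
  | "Rejected"
  | "Published"
  | "Hidden";

// Allowed next states for each state, mirroring the diagram's arrows.
const TRANSITIONS: Record<CommentState, readonly CommentState[]> = {
  Submitted: ["LightModelCheck"],
  LightModelCheck: ["Approved", "Rejected", "HeavyModelCheck"],
  HeavyModelCheck: ["Approved", "Rejected", "PendingReview"],
  PendingReview: ["Approved", "Rejected"],
  Approved: ["Published"],
  Rejected: ["Hidden"],
  Published: [], // terminal
  Hidden: [], // terminal
};

function canTransition(from: CommentState, to: CommentState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the diagram has no such edge.
function transition(from: CommentState, to: CommentState): CommentState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

For example, `transition("LightModelCheck", "HeavyModelCheck")` succeeds, while `transition("Submitted", "Published")` throws because a comment cannot skip moderation.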

Pre-Moderation Review Data Flow

graph TD
    subgraph "Comment Submission"
        A[User submits comment] --> B{AI moderation}
        B -->|Approved| C[Published]
        B -->|Rejected/Flagged| D[Comment blocked]
    end

    subgraph "Pre-Mod Review Pipeline"
        D --> E[Strip PII: remove user ID]
        E --> F[Store with 30-day TTL]
        F --> G[Available in Moderation Dashboard]
    end

    subgraph "Human Review"
        G --> H[Moderator reviews AI decision]
        H --> I{Correct?}
        I -->|Yes| J[Mark as true positive/negative]
        I -->|No| K[Mark as false positive/negative]
    end

    subgraph "Feedback Loop"
        J --> L[AI Evaluation Dataset]
        K --> L
        L --> M[Model accuracy metrics]
        L --> N[psi-ai-eval analysis]
    end

Prompt Architecture

The moderation system uses structured prompts stored as text files:

| Prompt File | Purpose | Used By |
|---|---|---|
| moderate.txt | Core moderation: evaluate against community guidelines | Light model (gpt-4o-mini) |
| heavy/moderate.txt | Detailed re-evaluation with additional context | Heavy model (gpt-4o) |
| moderatePerspective.txt | Perspective-aware moderation variant | Alternative flow |
| comment_quality.txt | Score comment quality (constructiveness) | Ranking module |
| comment_bridging.txt | Score bridging potential (cross-perspective) | Ranking module |
| name_check.txt | Check if display name violates guidelines | Profile module |
| name_looks_real.txt | Assess if a name appears realistic | Profile module |
| tag_article.txt | Auto-tag articles for categorization | Article module |
| translate.txt | Translate user content between languages | Translation module |
| conversationhelper_*.txt (5 files) | AI-assisted conversation guidance | ZDF conversation helper |
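
To show how a prompt file could feed a model call, the sketch below combines a system prompt (the contents of a file such as moderate.txt) with a user prompt, then validates the model's JSON reply. The function names, the wording of the user prompt, and the decision values `"approve"` and `"reject"` are assumptions; the source only specifies the `{decision, explanation, confidence}` shape and the `"flag"` value.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ModelVerdict {
  decision: "approve" | "reject" | "flag";
  explanation: string;
  confidence: number;
}

// systemPrompt would be the text loaded from a prompt file.
function buildMessages(systemPrompt: string, commentText: string, parentText?: string): ChatMessage[] {
  const userParts: string[] = [];
  if (parentText) {
    userParts.push(`Parent comment:\n${parentText}`);
  }
  userParts.push(`Comment to evaluate:\n${commentText}`);
  userParts.push('Respond with JSON containing "decision", "explanation", and "confidence".');
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userParts.join("\n\n") },
  ];
}

// Returns null instead of throwing when the reply is not a well-formed verdict.
function parseVerdict(raw: string): ModelVerdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const decision = fields["decision"];
  const explanation = fields["explanation"];
  const confidence = fields["confidence"];
  if (decision !== "approve" && decision !== "reject" && decision !== "flag") return null;
  if (typeof explanation !== "string" || typeof confidence !== "number") return null;
  return { decision, explanation, confidence };
}
```

Validating the reply before acting on it matters because a model can return malformed or off-schema JSON; how the real pipeline handles that case is not documented here.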

Standalone Moderation Service

The standalone service (moderation-service/) uses the same adapter pattern as the built-in module:

graph TD
    subgraph "Moderation Service API (Express 4)"
        Routes[Express Routes]
        ModerationService[ModerationService class]
        BlocklistService[BlocklistService class]
    end

    subgraph "Adapter Layer"
        DBAdapter[Database Adapter]
        LLMAdapter2[LLM Adapter]
        JigsawAdapter2[Jigsaw Adapter]
        EmailAdapter[Email Adapter]
    end

    subgraph "External Services"
        Firestore[(Firebase Firestore)]
        MongoDB[(MongoDB 7)]
        OpenAI2[OpenAI API]
        Perspective[Perspective API]
        Sentry[Sentry]
    end

    Routes --> ModerationService
    Routes --> BlocklistService
    ModerationService --> LLMAdapter2
    ModerationService --> JigsawAdapter2
    ModerationService --> DBAdapter
    ModerationService --> EmailAdapter
    BlocklistService --> DBAdapter

    DBAdapter --> Firestore
    DBAdapter -.-> MongoDB
    LLMAdapter2 --> OpenAI2
    JigsawAdapter2 --> Perspective
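
The value of the adapter layer is that services such as `BlocklistService` do not need to know which database sits behind `DBAdapter`. The sketch below illustrates the idea with an in-memory adapter; the `DBAdapter` method names, the `"blocklist"` collection name, and the `block`/`isBlocked` methods are all hypothetical, since the diagram does not show the real signatures.

```typescript
// Hypothetical storage interface; the real DBAdapter API is not shown in the diagram.
interface DBAdapter {
  get(collection: string, id: string): Promise<string | undefined>;
  set(collection: string, id: string, value: string): Promise<void>;
}

// In-memory stand-in for a Firestore- or MongoDB-backed adapter.
class InMemoryDBAdapter implements DBAdapter {
  private readonly data = new Map<string, string>();

  async get(collection: string, id: string): Promise<string | undefined> {
    return this.data.get(`${collection}/${id}`);
  }

  async set(collection: string, id: string, value: string): Promise<void> {
    this.data.set(`${collection}/${id}`, value);
  }
}

// The service depends only on the interface, not on a concrete database.
class BlocklistService {
  private readonly db: DBAdapter;

  constructor(db: DBAdapter) {
    this.db = db;
  }

  async block(term: string): Promise<void> {
    await this.db.set("blocklist", term.toLowerCase(), "blocked");
  }

  async isBlocked(term: string): Promise<boolean> {
    const entry = await this.db.get("blocklist", term.toLowerCase());
    return entry !== undefined;
  }
}
```

Swapping the backing store would then only require constructing the service with a different `DBAdapter` implementation.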

Further Reading