Moderation Pipeline

PSI uses a multi-layered AI moderation system to screen user-generated content. The system is designed to be cost-effective (using a cheap model for initial screening) while maintaining high accuracy (using a more capable model for flagged content).

Two-Pass Architecture

```mermaid
sequenceDiagram
    participant User
    participant Frontend as PSI Frontend
    participant Backend as PSI Backend
    participant Light as Light Model<br/>(gpt-4o-mini)
    participant Heavy as Heavy Model<br/>(gpt-4o)
    participant Perspective as Perspective API

    User->>Frontend: Submit comment
    Frontend->>Backend: POST /api/moderation/checkComment

    par Parallel checks
        Backend->>Light: Check comment (fast, cheap)
        Backend->>Perspective: Score toxicity
    end

    Light-->>Backend: Result (pass / flag / reject)
    Perspective-->>Backend: Toxicity scores

    alt Comment flagged by light model
        Backend->>Heavy: Re-check comment (detailed, accurate)
        Heavy-->>Backend: Final decision (approve / reject)
    end

    Backend-->>Frontend: Moderation result

    alt Comment approved
        Frontend->>User: Comment published
    else Comment rejected
        Frontend->>User: Comment rejected with explanation
    end
```

Models

| Model | Default | Role | Cost |
| --- | --- | --- | --- |
| Light model | gpt-4o-mini | Initial screening of all comments | Low |
| Heavy model | gpt-4o | Detailed analysis of flagged comments | Higher |
| Perspective API | Google Jigsaw | Toxicity, threat, insult scoring | Free (with API key) |

Both OpenAI models are configurable via environment variables (OPENAI_MODEL_LIGHT, OPENAI_MODEL_HEAVY). Azure OpenAI is also supported as an alternative provider.
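A minimal sketch of resolving these settings from the environment. The variable names and defaults match the configuration table on this page; the helper function itself is illustrative, not PSI's actual code.

```typescript
// Resolve model configuration from environment variables, falling back to
// the documented defaults.
type Env = Record<string, string | undefined>;

function resolveModelConfig(env: Env) {
  return {
    provider: env.OPENAI_PROVIDER ?? "openai", // or "azure_openai"
    lightModel: env.OPENAI_MODEL_LIGHT ?? "gpt-4o-mini",
    heavyModel: env.OPENAI_MODEL_HEAVY ?? "gpt-4o",
    endpoint: env.OPENAI_ENDPOINT ?? "https://api.openai.com/v1/",
  };
}
```

In a Node.js process this would typically be called as `resolveModelConfig(process.env)` once at startup.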

Moderation Prompts

AI moderation uses configurable prompts stored in server/prompts/. These prompts instruct the LLM to evaluate comments against community guidelines, checking for:

  • Toxicity and hate speech
  • Personal attacks
  • Misinformation
  • Off-topic content
  • Spam
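A hypothetical sketch of how a prompt might be assembled from these guideline categories. PSI's real prompts live in `server/prompts/` and are likely more elaborate; this only illustrates the structure.

```typescript
// Guideline categories from the list above.
const CATEGORIES = [
  "toxicity and hate speech",
  "personal attacks",
  "misinformation",
  "off-topic content",
  "spam",
];

// Build a simple moderation prompt for the LLM (illustrative only).
function buildModerationPrompt(comment: string): string {
  return [
    "Evaluate the following comment against the community guidelines.",
    `Check for: ${CATEGORIES.join(", ")}.`,
    'Respond with exactly one of: "pass", "flag", or "reject".',
    "",
    `Comment: ${comment}`,
  ].join("\n");
}
```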

Pre-Moderation Review

When PREMOD_REVIEW_ENABLED=true, AI-flagged comments are stored for human review in the Moderation Dashboard. This enables:

  • Moderator annotation -- moderators mark whether AI decisions were correct
  • False positive identification -- helps improve moderation accuracy over time
  • Evaluation datasets -- builds training data for model improvements
  • GDPR compliance -- review data is stored for 30 days; user IDs are not stored
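The retention rules above can be sketched as a review record shape. The field names are assumptions for illustration; the two constraints they encode come from the text: no user ID is stored, and entries expire after 30 days.

```typescript
// A review record honoring the stated retention rules: the comment and the
// AI decision are kept for moderator annotation, but no user ID is stored.
interface ReviewRecord {
  commentText: string;
  aiDecision: "flag" | "reject";
  moderatorVerdict?: "correct" | "false_positive"; // set during human review
  expiresAt: Date; // drives TTL-based deletion after 30 days
}

const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function makeReviewRecord(
  text: string,
  aiDecision: "flag" | "reject",
  now: Date = new Date()
): ReviewRecord {
  return {
    commentText: text,
    aiDecision,
    expiresAt: new Date(now.getTime() + RETENTION_DAYS * MS_PER_DAY),
  };
}
```

In Firestore, `expiresAt` would typically back a TTL policy so expired records are deleted automatically.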

Standalone Moderation Service

In addition to the built-in moderation in the PSI backend, there is a standalone moderation microservice (moderation-service/):

| Property | Value |
| --- | --- |
| Technology | Express 4, TypeScript, Node.js |
| Database | Firebase Firestore (primary), MongoDB 7 (alternative via adapter) |
| AI Services | OpenAI, Perspective API |
| Integration | AT Protocol (Bluesky) |
| Monitoring | Sentry |

This service can operate independently of the main PSI backend. It has its own adapter layer (database, email, LLM) mirroring the main server's pattern, enabling moderation for content from external sources (including Bluesky via the AT Protocol).

Moderation Service Endpoints

| Endpoint | Purpose |
| --- | --- |
| POST /moderation/check-comment | GPT comment moderation |
| POST /moderation/check-perspective | GPT perspective moderation |
| POST /moderation/check-jigsaw | Jigsaw/Perspective toxicity check |
| POST /blocklist/check | Check text against blocklist |
| GET /blocklist | Get full blocklist |
| POST /blocklist/add | Add term to blocklist |
| DELETE /blocklist/remove | Remove term from blocklist |
| GET /health | Health check |
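The kind of matching `POST /blocklist/check` might perform can be sketched as below. This is an assumption for illustration: case-insensitive whole-word matching against the stored terms. The real service may use different matching rules.

```typescript
// Return the blocklist terms that appear as whole words in the text,
// matched case-insensitively.
function checkBlocklist(text: string, blocklist: string[]): string[] {
  const lower = text.toLowerCase();
  return blocklist.filter((term) => {
    // Escape regex metacharacters so terms are matched literally.
    const escaped = term.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(lower);
  });
}
```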

Configuration

| Environment Variable | Description |
| --- | --- |
| OPENAI_KEY | OpenAI API key |
| OPENAI_MODEL_LIGHT | Light model name (default: gpt-4o-mini) |
| OPENAI_MODEL_HEAVY | Heavy model name (default: gpt-4o) |
| OPENAI_ENDPOINT | Custom OpenAI endpoint (default: https://api.openai.com/v1/) |
| OPENAI_PROVIDER | Provider: openai or azure_openai |
| AZURE_OPENAI_KEY | Azure OpenAI key (if using Azure) |
| PERSPECTIVE_API_KEY | Google Perspective API key |
| PREMOD_REVIEW_ENABLED | Enable pre-moderation human review (default: false) |

Further Reading