Moderation Pipeline

PSI uses a multi-layered AI moderation system to screen user-generated content. The system is designed to be cost-effective (using a cheap model for initial screening) while maintaining high accuracy (using a more capable model for flagged content).

Two-Pass Architecture

```mermaid
sequenceDiagram
    participant User
    participant Frontend as PSI Frontend
    participant Backend as PSI Backend
    participant Light as Light Model<br/>(gpt-4o-mini)
    participant Heavy as Heavy Model<br/>(gpt-4o)
    participant Perspective as Perspective API

    User->>Frontend: Submit comment
    Frontend->>Backend: POST /api/moderation/checkComment

    par Parallel checks
        Backend->>Light: Check comment (fast, cheap)
        Backend->>Perspective: Score toxicity
    end

    Light-->>Backend: Result (pass / flag / reject)
    Perspective-->>Backend: Toxicity scores

    alt Comment flagged by light model
        Backend->>Heavy: Re-check comment (detailed, accurate)
        Heavy-->>Backend: Final decision (approve / reject)
    end

    Backend-->>Frontend: Moderation result

    alt Comment approved
        Frontend->>User: Comment published
    else Comment rejected
        Frontend->>User: Comment rejected with explanation
    end
```

Models

| Model | Default | Role | Cost |
| --- | --- | --- | --- |
| Light model | gpt-4o-mini | Initial screening of all comments | Low |
| Heavy model | gpt-4o | Detailed analysis of flagged comments | Higher |
| Perspective API | Google Jigsaw | Toxicity, threat, insult scoring | Free (with API key) |

Both OpenAI models are configurable via environment variables (OPENAI_MODEL_LIGHT, OPENAI_MODEL_HEAVY). Azure OpenAI is also supported as an alternative provider.
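A minimal sketch of resolving these settings from the environment. The variable names and defaults match the configuration table on this page; the helper function itself is illustrative, not PSI's actual code.

```typescript
// Resolve model configuration from environment variables, falling back to
// the documented defaults.
type Env = Record<string, string | undefined>;

function resolveModelConfig(env: Env) {
  return {
    provider: env.OPENAI_PROVIDER ?? "openai", // or "azure_openai"
    lightModel: env.OPENAI_MODEL_LIGHT ?? "gpt-4o-mini",
    heavyModel: env.OPENAI_MODEL_HEAVY ?? "gpt-4o",
    endpoint: env.OPENAI_ENDPOINT ?? "https://api.openai.com/v1/",
  };
}
```

In a Node.js process this would typically be called as `resolveModelConfig(process.env)` once at startup.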

Moderation Prompts

AI moderation uses configurable prompts stored in server/prompts/. These prompts instruct the LLM to evaluate comments against community guidelines, checking for:

  • Toxicity and hate speech
  • Personal attacks
  • Misinformation
  • Off-topic content
  • Spam
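A hypothetical sketch of how a prompt might be assembled from these guideline categories. PSI's real prompts live in `server/prompts/` and are likely more elaborate; this only illustrates the structure.

```typescript
// Guideline categories from the list above.
const CATEGORIES = [
  "toxicity and hate speech",
  "personal attacks",
  "misinformation",
  "off-topic content",
  "spam",
];

// Build a simple moderation prompt for the LLM (illustrative only).
function buildModerationPrompt(comment: string): string {
  return [
    "Evaluate the following comment against the community guidelines.",
    `Check for: ${CATEGORIES.join(", ")}.`,
    'Respond with exactly one of: "pass", "flag", or "reject".',
    "",
    `Comment: ${comment}`,
  ].join("\n");
}
```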

Pre-Moderation Review

When PREMOD_REVIEW_ENABLED=true, AI-flagged comments are stored for human review in the Moderation Dashboard. This enables:

  • Moderator annotation -- moderators mark whether AI decisions were correct
  • False positive identification -- helps improve moderation accuracy over time
  • Evaluation datasets -- builds training data for model improvements
  • GDPR compliance -- review data is stored for 30 days; user IDs are not stored
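The retention rules above can be sketched as a review record shape. The field names are assumptions for illustration; the two constraints they encode come from the text: no user ID is stored, and entries expire after 30 days.

```typescript
// A review record honoring the stated retention rules: the comment and the
// AI decision are kept for moderator annotation, but no user ID is stored.
interface ReviewRecord {
  commentText: string;
  aiDecision: "flag" | "reject";
  moderatorVerdict?: "correct" | "false_positive"; // set during human review
  expiresAt: Date; // drives TTL-based deletion after 30 days
}

const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function makeReviewRecord(
  text: string,
  aiDecision: "flag" | "reject",
  now: Date = new Date()
): ReviewRecord {
  return {
    commentText: text,
    aiDecision,
    expiresAt: new Date(now.getTime() + RETENTION_DAYS * MS_PER_DAY),
  };
}
```

In Firestore, `expiresAt` would typically back a TTL policy so expired records are deleted automatically.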

Standalone Moderation Service

In addition to the built-in moderation in the PSI backend, there is a standalone moderation microservice (moderation-service/):

| Property | Value |
| --- | --- |
| Technology | Express 4, TypeScript, Node.js |
| Database | Firebase Firestore (primary), MongoDB 7 (alternative via adapter) |
| AI Services | OpenAI, Perspective API |
| Integration | AT Protocol (Bluesky) |
| Monitoring | Sentry |

This service can operate independently of the main PSI backend. It has its own adapter layer (database, email, LLM) mirroring the main server's pattern, enabling moderation for content from external sources (including Bluesky via the AT Protocol).

Moderation Service Endpoints

| Endpoint | Purpose |
| --- | --- |
| POST /moderation/check-comment | GPT comment moderation |
| POST /moderation/check-perspective | GPT perspective moderation |
| POST /moderation/check-jigsaw | Jigsaw/Perspective toxicity check |
| POST /blocklist/check | Check text against blocklist |
| GET /blocklist | Get full blocklist |
| POST /blocklist/add | Add term to blocklist |
| DELETE /blocklist/remove | Remove term from blocklist |
| GET /health | Health check |
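The kind of matching `POST /blocklist/check` might perform can be sketched as below. This is an assumption for illustration: case-insensitive whole-word matching against the stored terms. The real service may use different matching rules.

```typescript
// Return the blocklist terms that appear as whole words in the text,
// matched case-insensitively.
function checkBlocklist(text: string, blocklist: string[]): string[] {
  const lower = text.toLowerCase();
  return blocklist.filter((term) => {
    // Escape regex metacharacters so terms are matched literally.
    const escaped = term.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(lower);
  });
}
```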

Configuration

| Environment Variable | Description |
| --- | --- |
| OPENAI_KEY | OpenAI API key |
| OPENAI_MODEL_LIGHT | Light model name (default: gpt-4o-mini) |
| OPENAI_MODEL_HEAVY | Heavy model name (default: gpt-4o) |
| OPENAI_ENDPOINT | Custom OpenAI endpoint (default: https://api.openai.com/v1/) |
| OPENAI_PROVIDER | Provider: openai or azure_openai |
| AZURE_OPENAI_KEY | Azure OpenAI key (if using Azure) |
| PERSPECTIVE_API_KEY | Google Perspective API key |
| PREMOD_REVIEW_ENABLED | Enable pre-moderation human review (default: false) |

Further Reading