Moderation Pipeline
PSI screens user-generated content with a multi-layered AI moderation system. A low-cost model screens every comment, and a more capable model re-checks only the comments that the first pass flags. This keeps costs down while preserving accuracy on borderline content.
Two-Pass Architecture
```mermaid
sequenceDiagram
    participant User
    participant Frontend as PSI Frontend
    participant Backend as PSI Backend
    participant Light as Light Model<br/>(gpt-4o-mini)
    participant Heavy as Heavy Model<br/>(gpt-4o)
    participant Perspective as Perspective API
    User->>Frontend: Submit comment
    Frontend->>Backend: POST /api/moderation/checkComment
    par Parallel checks
        Backend->>Light: Check comment (fast, cheap)
        Backend->>Perspective: Score toxicity
    end
    Light-->>Backend: Result (pass / flag / reject)
    Perspective-->>Backend: Toxicity scores
    alt Comment flagged by light model
        Backend->>Heavy: Re-check comment (detailed, accurate)
        Heavy-->>Backend: Final decision (approve / reject)
    end
    Backend-->>Frontend: Moderation result
    alt Comment approved
        Frontend->>User: Comment published
    else Comment rejected
        Frontend->>User: Comment rejected with explanation
    end
```
Models
| Model | Default | Role | Cost |
|---|---|---|---|
| Light model | gpt-4o-mini | Initial screening of all comments | Low |
| Heavy model | gpt-4o | Detailed analysis of flagged comments | Higher |
| Perspective API | Google Jigsaw | Toxicity, threat, insult scoring | Free (with API key) |
Both OpenAI models are configurable via environment variables (OPENAI_MODEL_LIGHT, OPENAI_MODEL_HEAVY). Azure OpenAI is also supported as an alternative provider and is selected with OPENAI_PROVIDER=azure_openai.
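The two-pass flow can be illustrated with a minimal sketch. It is not the PSI backend's actual code: the type names, the `ModerationDeps` interface, and the `moderateComment` function are all hypothetical, and the model calls are injected as plain functions so the control flow stays visible. The source does not say how the Perspective score influences the final decision, so the sketch only returns it alongside the decision.

```typescript
// Hypothetical result types; the real PSI backend may use different shapes.
type LightVerdict = "pass" | "flag" | "reject";
type FinalDecision = "approve" | "reject";

// Injected model calls (assumed signatures, not a real PSI or OpenAI API).
interface ModerationDeps {
  checkLight: (comment: string) => Promise<LightVerdict>;  // e.g. gpt-4o-mini
  checkHeavy: (comment: string) => Promise<FinalDecision>; // e.g. gpt-4o
  scoreToxicity: (comment: string) => Promise<number>;     // Perspective score
}

interface ModerationResult {
  decision: FinalDecision;
  toxicity: number;
  usedHeavyModel: boolean;
}

async function moderateComment(
  comment: string,
  deps: ModerationDeps,
): Promise<ModerationResult> {
  // First pass: the light model and Perspective run in parallel.
  const [light, toxicity] = await Promise.all([
    deps.checkLight(comment),
    deps.scoreToxicity(comment),
  ]);

  // Second pass: only flagged comments are sent to the heavy model.
  if (light === "flag") {
    const decision = await deps.checkHeavy(comment);
    return { decision, toxicity, usedHeavyModel: true };
  }

  // Clear pass or clear reject: the light model's verdict is final.
  const decision: FinalDecision = light === "pass" ? "approve" : "reject";
  return { decision, toxicity, usedHeavyModel: false };
}
```

Because the heavy model is called only on the `flag` branch, its cost scales with the share of borderline comments rather than with total comment volume.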
Moderation Prompts
AI moderation uses configurable prompts stored in server/prompts/. These prompts instruct the LLM to evaluate comments against community guidelines, checking for:
- Toxicity and hate speech
- Personal attacks
- Misinformation
- Off-topic content
- Spam
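As a rough illustration of how such a prompt might be assembled, the sketch below builds a pair of chat messages from the five checks listed above. The actual prompt files in server/prompts/ are not reproduced here; the helper name `buildModerationMessages`, the prompt wording, and the one-word answer format are assumptions made for the example.

```typescript
// The five checks listed above. The wording of the real prompt files is not known here.
const GUIDELINE_CHECKS = [
  "Toxicity and hate speech",
  "Personal attacks",
  "Misinformation",
  "Off-topic content",
  "Spam",
];

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical helper: builds the chat messages for one moderation request.
function buildModerationMessages(comment: string): ChatMessage[] {
  // Number the checks so the model can refer to them in its answer.
  const checklist = GUIDELINE_CHECKS
    .map((check, i) => `${i + 1}. ${check}`)
    .join("\n");

  return [
    {
      role: "system",
      content:
        "You are a comment moderator. Evaluate the comment against these " +
        `community guidelines:\n${checklist}\n` +
        "Answer with one word: pass, flag, or reject.",
    },
    // The comment goes in its own message, separate from the instructions.
    { role: "user", content: comment },
  ];
}
```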
Pre-Moderation Review
When PREMOD_REVIEW_ENABLED=true, AI-flagged comments are stored for human review in the Moderation Dashboard. This enables:
- Moderator annotation -- moderators mark whether AI decisions were correct
- False positive identification -- helps improve moderation accuracy over time
- Evaluation datasets -- builds training data for model improvements
- GDPR compliance -- review data is stored for 30 days; user IDs are not stored
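The storage rules above (the feature flag, the 30-day retention, and the absence of user IDs) can be sketched as a small record builder. The `FlaggedComment` and `ReviewRecord` shapes and the `buildReviewRecord` function are hypothetical; the real dashboard schema may look different.

```typescript
const RETENTION_DAYS = 30; // retention period stated above
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical input shape: what the backend knows about a flagged comment.
interface FlaggedComment {
  userId: string;
  text: string;
  aiDecision: string;
}

// Hypothetical stored shape: note that there is no userId field.
interface ReviewRecord {
  text: string;
  aiDecision: string;
  createdAt: Date;
  expiresAt: Date;
  aiWasCorrect: boolean | null; // null until a moderator annotates the record
}

function buildReviewRecord(
  comment: FlaggedComment,
  premodReviewEnabled: boolean,
  now: Date,
): ReviewRecord | null {
  // Nothing is stored unless PREMOD_REVIEW_ENABLED=true.
  if (!premodReviewEnabled) return null;

  // Copy fields explicitly so the user ID never reaches storage.
  return {
    text: comment.text,
    aiDecision: comment.aiDecision,
    createdAt: now,
    expiresAt: new Date(now.getTime() + RETENTION_DAYS * MS_PER_DAY),
    aiWasCorrect: null,
  };
}
```

Copying fields one by one, rather than spreading the input object, makes it harder for a newly added input field to leak into stored review data by accident.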
Standalone Moderation Service
In addition to the built-in moderation in the PSI backend, there is a standalone moderation microservice (moderation-service/):
| Property | Value |
|---|---|
| Technology | Express 4, TypeScript, Node.js |
| Database | Firebase Firestore (primary), MongoDB 7 (alternative via adapter) |
| AI Services | OpenAI, Perspective API |
| Integration | AT Protocol (Bluesky) |
| Monitoring | Sentry |
This service can operate independently of the main PSI backend. It has its own adapter layer (database, email, LLM) mirroring the main server's pattern, enabling moderation for content from external sources (including Bluesky via the AT Protocol).
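The adapter idea can be illustrated with a minimal sketch, here applied to blocklist storage. The `BlocklistStore` interface and both implementations below are hypothetical and do not reflect the actual interfaces in moderation-service/. An in-memory class stands in for a Firestore-backed or MongoDB-backed adapter so the example runs without a database.

```typescript
// Hypothetical adapter contract; the real interfaces in moderation-service/ may differ.
interface BlocklistStore {
  add(term: string): Promise<void>;
  remove(term: string): Promise<void>;
  list(): Promise<string[]>;
}

// In-memory stand-in for a Firestore-backed or MongoDB-backed adapter.
class InMemoryBlocklistStore implements BlocklistStore {
  private terms = new Set<string>();

  async add(term: string): Promise<void> {
    this.terms.add(term.toLowerCase());
  }

  async remove(term: string): Promise<void> {
    this.terms.delete(term.toLowerCase());
  }

  async list(): Promise<string[]> {
    return Array.from(this.terms).sort();
  }
}

// Service code depends only on the interface, never on a concrete database client.
async function containsBlockedTerm(
  text: string,
  store: BlocklistStore,
): Promise<boolean> {
  const lower = text.toLowerCase();
  const terms = await store.list();
  return terms.some((term) => lower.indexOf(term) !== -1);
}
```

Swapping databases then means providing another class that implements the same interface, with no change to `containsBlockedTerm`.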
Moderation Service Endpoints
| Endpoint | Purpose |
|---|---|
| POST /moderation/check-comment | GPT comment moderation |
| POST /moderation/check-perspective | GPT perspective moderation |
| POST /moderation/check-jigsaw | Jigsaw/Perspective toxicity check |
| POST /blocklist/check | Check text against blocklist |
| GET /blocklist | Get full blocklist |
| POST /blocklist/add | Add term to blocklist |
| DELETE /blocklist/remove | Remove term from blocklist |
| GET /health | Health check |
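
A client call to one of these endpoints might be prepared as in the sketch below. Only the path and HTTP method come from the table above. The JSON body shape (`{ text }`), the base URL, and the helper name `buildCheckCommentRequest` are assumptions, since the request schema is not documented here.

```typescript
// Plain description of an HTTP request, kept free of runtime-specific types.
interface HttpRequestSpec {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: prepares a request for POST /moderation/check-comment.
function buildCheckCommentRequest(baseUrl: string, text: string): HttpRequestSpec {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/moderation/check-comment`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Assumed body shape; check the service code for the real field names.
      body: JSON.stringify({ text }),
    },
  };
}
```

In a runtime with a global `fetch`, the result can be sent with `fetch(spec.url, spec.init)`, where `spec` is the value returned by the helper.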
Configuration
| Environment Variable | Description |
|---|---|
| OPENAI_KEY | OpenAI API key |
| OPENAI_MODEL_LIGHT | Light model name (default: gpt-4o-mini) |
| OPENAI_MODEL_HEAVY | Heavy model name (default: gpt-4o) |
| OPENAI_ENDPOINT | Custom OpenAI endpoint (default: https://api.openai.com/v1/) |
| OPENAI_PROVIDER | Provider: openai or azure_openai |
| AZURE_OPENAI_KEY | Azure OpenAI key (if using Azure) |
| PERSPECTIVE_API_KEY | Google Perspective API key |
| PREMOD_REVIEW_ENABLED | Enable pre-moderation human review (default: false) |
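
The variables in this table could be read into a typed configuration object roughly as follows. This is a sketch, not the server's actual loader: the `ModerationConfig` shape and `loadModerationConfig` function are hypothetical, and falling back to the `openai` provider when OPENAI_PROVIDER is unset is an assumption, since the table does not state a default. The other defaults are taken from the table.

```typescript
type Provider = "openai" | "azure_openai";

// Hypothetical config shape built from the variables in the table above.
interface ModerationConfig {
  provider: Provider;
  apiKey: string;
  lightModel: string;
  heavyModel: string;
  endpoint: string;
  perspectiveApiKey: string | undefined;
  premodReviewEnabled: boolean;
}

function loadModerationConfig(
  env: Record<string, string | undefined>,
): ModerationConfig {
  // Assumption: default to "openai" when OPENAI_PROVIDER is unset.
  const provider = env["OPENAI_PROVIDER"] ?? "openai";
  if (provider !== "openai" && provider !== "azure_openai") {
    throw new Error(`Unsupported OPENAI_PROVIDER: ${provider}`);
  }

  // Pick the key that matches the selected provider.
  const apiKey =
    provider === "azure_openai" ? env["AZURE_OPENAI_KEY"] : env["OPENAI_KEY"];
  if (!apiKey) {
    throw new Error(`Missing API key for provider ${provider}`);
  }

  return {
    provider,
    apiKey,
    lightModel: env["OPENAI_MODEL_LIGHT"] ?? "gpt-4o-mini",
    heavyModel: env["OPENAI_MODEL_HEAVY"] ?? "gpt-4o",
    endpoint: env["OPENAI_ENDPOINT"] ?? "https://api.openai.com/v1/",
    perspectiveApiKey: env["PERSPECTIVE_API_KEY"],
    // Only the exact string "true" enables human review.
    premodReviewEnabled: env["PREMOD_REVIEW_ENABLED"] === "true",
  };
}
```

In a Node.js process the function would typically be called once at startup as `loadModerationConfig(process.env)`, so a missing key fails fast instead of surfacing on the first moderation request.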
Further Reading
- Configuration reference (psi-product) -- all server environment variables
- Pre-Moderation Review (psi-product) -- detailed review workflow