Open source · java · ml · api · hackathon

Detox

A web service that classifies and filters toxic content across languages, built around a fine-tuned XLM-RoBERTa model and deployed as an easy-to-integrate API.

Model: XLM-RoBERTa
Train acc.: 96.55%
Val acc.: 87.78%
Deploy: AWS EC2
detox · multilingual profanity API

Detox is a web service that filters toxic content in messages and webpages. It is multilingual by design, so it works across the languages people actually use online. We built it with Nishka, Swathi, and Arumugam during a hackathon and kept iterating on it afterward.

How it works

The classifier is a fine-tuned XLM-RoBERTa model trained to label toxic content across multiple languages. We exposed it through a single REST endpoint, deployed on an AWS EC2 instance behind a load balancer to absorb traffic spikes. The fine-tuned model reaches 96.55% training accuracy and 87.78% validation accuracy on the toxicity benchmark we trained against.
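To make the single-endpoint design concrete, here is a minimal sketch using only the JDK's built-in `com.sun.net.httpserver.HttpServer`. It is not the project's actual server. The `/classify` path, the plain-text request body, the `{"toxic": ..., "score": ...}` response shape, the `ToxicityClassifier` interface, and the threshold are all hypothetical. `KeywordStub` is a placeholder standing in for the fine-tuned XLM-RoBERTa model.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

public class DetoxEndpoint {

    /** Hypothetical seam: a real deployment would wrap the fine-tuned XLM-RoBERTa model here. */
    public interface ToxicityClassifier {
        /** Returns a toxicity probability in [0, 1]. */
        double score(String text);
    }

    /** Placeholder for local testing only: flags a tiny hard-coded English word list. */
    public static final class KeywordStub implements ToxicityClassifier {
        private final Set<String> blocked = Set.of("idiot", "stupid");

        @Override
        public double score(String text) {
            // \W+ splits on non-[A-Za-z0-9_] characters, so this stub only handles Latin-script words.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (blocked.contains(token)) {
                    return 0.99;
                }
            }
            return 0.01;
        }
    }

    /** Starts a server with one POST /classify route; pass port 0 to let the OS pick a free port. */
    public static HttpServer start(int port, ToxicityClassifier model, double threshold) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/classify", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // Method Not Allowed, empty body
                exchange.close();
                return;
            }
            String text;
            try (InputStream in = exchange.getRequestBody()) {
                text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            double score = model.score(text);
            // Hand-built JSON keeps the sketch dependency-free.
            String json = String.format(Locale.ROOT,
                    "{\"toxic\":%b,\"score\":%.2f}", score >= threshold, score);
            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body); // closing the body stream also completes the exchange
            }
        });
        server.start();
        return server;
    }
}
```

If you start it with `DetoxEndpoint.start(8080, new DetoxEndpoint.KeywordStub(), 0.5)`, then `curl -X POST --data 'you idiot' http://127.0.0.1:8080/classify` returns `{"toxic":true,"score":0.99}`. Because the model sits behind an interface, the stub can be swapped for a real inference backend without touching the HTTP layer.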

Sample integrations

To prove the API was actually drop-in, we shipped two demo clients:

  • Discord bot — listens to channel messages, deletes toxic ones, warns the sender.
  • Android accessibility service — overlays a warning screen when the device displays offensive content in any app.
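The Discord bot's flow (score each message, then delete it and warn the sender if it crosses a threshold) can be sketched independently of any chat library. Everything named here is hypothetical: the `ChatPlatform` and `DetoxClient` interfaces, the `Message` record, the threshold, and the warning text. A real bot would implement `ChatPlatform` with a Discord library and `DetoxClient` with an HTTP call to the Detox endpoint.

```java
public class DiscordModerationSketch {

    /** Hypothetical seam over the chat platform; a real bot would back it with a Discord library. */
    public interface ChatPlatform {
        void deleteMessage(String messageId);
        void warnUser(String userId, String reason);
    }

    /** Hypothetical client for the Detox API: returns a toxicity score in [0, 1]. */
    @FunctionalInterface
    public interface DetoxClient {
        double score(String text);
    }

    /** Minimal view of an incoming channel message (hypothetical shape). */
    public record Message(String id, String authorId, String content) {}

    private final DetoxClient detox;
    private final ChatPlatform platform;
    private final double threshold;

    public DiscordModerationSketch(DetoxClient detox, ChatPlatform platform, double threshold) {
        this.detox = detox;
        this.platform = platform;
        this.threshold = threshold;
    }

    /** Called for every channel message; returns true if the message was removed. */
    public boolean onMessage(Message message) {
        if (detox.score(message.content()) < threshold) {
            return false; // below threshold: leave the message alone
        }
        platform.deleteMessage(message.id());
        platform.warnUser(message.authorId(), "Your message was removed because it was flagged as toxic.");
        return true;
    }
}
```

Keeping the moderation rule separate from the platform calls makes it testable with a fake `ChatPlatform`. The same class could also back other chat integrations.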

What’s next

The roadmap includes a Chrome extension that censors hateful content inline, plus a feedback loop for users to report false positives and false negatives so we can keep improving the model.
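The feedback loop is still on the roadmap, so the following is only one possible shape for it, not a description of existing code. Each hypothetical `Report` pairs the model's prediction with the user's correction. Disagreements are then split into false positives (flagged but benign) and false negatives (missed toxicity).

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class FeedbackLoopSketch {

    public enum Outcome { FALSE_POSITIVE, FALSE_NEGATIVE, AGREED }

    /** Hypothetical user report: the model's prediction vs. what the user says is correct. */
    public record Report(String text, boolean predictedToxic, boolean userSaysToxic) {
        public Outcome outcome() {
            if (predictedToxic == userSaysToxic) {
                return Outcome.AGREED;
            }
            // Model said toxic but user disagrees -> false positive; the reverse -> false negative.
            return predictedToxic ? Outcome.FALSE_POSITIVE : Outcome.FALSE_NEGATIVE;
        }
    }

    /** Counts reports per outcome, including zero counts. */
    public static Map<Outcome, Integer> tally(List<Report> reports) {
        Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
        for (Outcome o : Outcome.values()) {
            counts.put(o, 0);
        }
        for (Report r : reports) {
            counts.merge(r.outcome(), 1, Integer::sum);
        }
        return counts;
    }

    /** Reports where the user disagreed; their userSaysToxic field is the candidate new label. */
    public static List<Report> relabeled(List<Report> reports) {
        return reports.stream().filter(r -> r.outcome() != Outcome.AGREED).toList();
    }
}
```

User reports can themselves be wrong or malicious. In practice, the `relabeled` examples would likely need review before they are added to the training data.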