October 2025 - December 2025

Project Detail

Machine Learning for Community Standards Violation Text Detection

Ứng dụng Machine Learning vào Nhận diện Văn bản Vi phạm Tiêu chuẩn Cộng đồng

A seminar project for automatic multi-label classification of comments that violate community standards. The work compares classical ML, deep learning, and transformer-based NLP pipelines on the Jigsaw Toxic Comment dataset.

Tags: NLP, Toxic Comment Detection, TF-IDF, CNN, BiLSTM, RoBERTa
Role: AI Engineer

Highlight: End-to-end NLP moderation research with three model generations

Figure: Three-module system architecture for toxic text detection

Course & Team

  • Course: SE400.Q11 - Seminar IT
  • Supervisors: Dr. Huynh Minh Duc, Dr. Nguyen Tan Toan
  • Group: Group 09 - University of Information Technology, VNU
  • Members: Nguyen Khanh Huy, Tran Dinh Phuong Linh, Dang Thi Ngoc Minh, Tran Bao Phu

README as Interface

Problem Overview

The system classifies social-media comments that violate community standards. The task is multi-label classification, meaning one comment can belong to multiple labels at once.

  • Module 1 (Classical ML): TF-IDF + Logistic Regression
  • Module 2 (Deep Learning): CNN 1D / BiLSTM + GloVe
  • Module 3 (State-of-the-Art): Fine-tuning RoBERTa + Hybrid Preprocessing

6 Output Labels

  • toxic: General toxic content
  • severe_toxic: Highly toxic content
  • obscene: Obscene language
  • threat: Threatening language
  • insult: Insulting language
  • identity_hate: Identity-based hate

Dataset

  • Source: Jigsaw Toxic Comment Classification Challenge (Kaggle)
  • Total comments: 159,571 (train.csv)
  • Original source: Wikipedia Talk Pages
  • Language: English
  • Label type: Multi-label binary
  • Clean ratio: ~90%
  • Toxic ratio: ~10% with at least one label
  • Rarest labels: threat, identity_hate (<1% each)

Label Distribution

  • toxic: ~15,000
  • obscene: ~8,500
  • insult: ~8,000
  • severe_toxic: ~1,600
  • identity_hate: ~1,400
  • threat: ~500

The main challenge is severe class imbalance: threat and identity_hate each cover less than 1% of the comments, so the models see very few positive examples for these labels and converge on them more slowly.
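To make the imbalance concrete, the sketch below turns the rounded label counts listed above into per-label positive rates and a negatives-per-positive ratio. That ratio is one common starting point for a per-label loss weight (for example, the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`). Whether the project weights its loss this way is an assumption, and the counts are the approximate figures from this page, not exact dataset statistics.

```python
# Approximate label counts from the distribution above (rounded, not exact).
TOTAL_COMMENTS = 159_571
LABEL_COUNTS = {
    "toxic": 15_000,
    "obscene": 8_500,
    "insult": 8_000,
    "severe_toxic": 1_600,
    "identity_hate": 1_400,
    "threat": 500,
}


def positive_rate(count: int, total: int = TOTAL_COMMENTS) -> float:
    """Fraction of comments that carry the label."""
    return count / total


def neg_pos_ratio(count: int, total: int = TOTAL_COMMENTS) -> float:
    """Negatives per positive; a common starting point for a per-label loss weight."""
    return (total - count) / count


if __name__ == "__main__":
    for label, count in LABEL_COUNTS.items():
        print(f"{label:14s} rate={positive_rate(count):.2%} "
              f"neg/pos={neg_pos_ratio(count):.1f}")
```

With these rounded counts, threat has roughly 318 negatives for every positive, while toxic has fewer than 10, which is why a single global weight rarely suits all six labels.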

Three-Module Architecture

  • Module 1 - Classical ML: TF-IDF + Logistic Regression
  • Module 2 - Deep Learning: CNN 1D / BiLSTM + GloVe
  • Module 3 - State-of-the-Art: Fine-tuning RoBERTa + Hybrid Preprocessing

Model Modules

Module 1 - Baseline

TF-IDF + Logistic Regression

Multi-label

Build a fast word-frequency baseline that runs on CPU and serves as a lightweight first-pass filter.

Pipeline

  • Lowercasing, leet-speak normalization, obfuscation decoding, repeated-character collapse
  • Word n-grams with n from 1 to 3, plus lemmatization
  • Character n-grams with n from 3 to 5
  • Word and character features stacked into roughly 100,000 dimensions
  • One-vs-Rest Logistic Regression with class_weight='balanced'
  • Sigmoid threshold 0.5 for 6 output labels
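As an illustration of the two feature families in this pipeline, here is a minimal pure-Python sketch of word and character n-gram extraction. It is not the project's code: the function names are hypothetical, TF-IDF weighting and lemmatization are left out, and a real pipeline would more likely rely on a library vectorizer. The point is only to show why character n-grams help with obfuscated spellings.

```python
def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> list[str]:
    """Word n-grams for n in [n_min, n_max], joined with spaces."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> set[str]:
    """Character n-grams for n in [n_min, n_max] over the lowercased text."""
    s = text.lower()
    return {
        s[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(s) - n + 1)
    }


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two n-gram sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    print(word_ngrams("you are so dumb"))
    # An obfuscated spelling shares no word unigram with the plain one,
    # but the two still overlap at the character level.
    print(jaccard(char_ngrams("idiot"), char_ngrams("idi0t")))
```

The word-level view treats "idiot" and "idi0t" as unrelated tokens, while the character-level view still finds a shared trigram, so the stacked features give the linear model some signal even before normalization.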

Results

  • Very fast CPU training
  • AUC around 0.95+

Limitations

  • Limited by bag-of-words representation
  • Dependent on data quality and dictionaries
  • Cannot capture deep semantic meaning

Module 2 - Deep Learning

CNN 1D / BiLSTM + GloVe

Multi-label

Move from sparse vectors to dense word embeddings so the model can capture word meaning better.

Pipeline

  • GloVe 6B 300d pre-trained on 6 billion tokens
  • CNN 1D uses kernel sizes 3, 4, and 5, followed by Global MaxPool, Dense, and Dropout
  • BiLSTM uses Spatial Dropout, a Bidirectional LSTM with 128 units, Dense, and Sigmoid
  • CNN captures local patterns; BiLSTM captures long-range dependencies and handles longer sentences better
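The "kernel + Global MaxPool" step can be illustrated without a deep learning framework. The sketch below slides a single filter over a sequence of token vectors and keeps the strongest response. It is a toy under stated assumptions: the function name is hypothetical, the ReLU activation is assumed (the page does not name one), the weights are random instead of learned, and the vectors are 4-dimensional stand-ins for 300-d GloVe embeddings.

```python
import random


def conv1d_global_max(embeddings, kernel):
    """Slide one filter over the token sequence and keep its strongest response.

    embeddings: list of token vectors, shape (seq_len, dim)
    kernel:     list of weight vectors, shape (kernel_size, dim)
    """
    k = len(kernel)
    if len(embeddings) < k:
        raise ValueError("sequence shorter than kernel size")
    responses = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        # Dot product between the filter and a window of k consecutive tokens.
        score = sum(
            w * x
            for w_vec, x_vec in zip(kernel, window)
            for w, x in zip(w_vec, x_vec)
        )
        responses.append(max(score, 0.0))  # ReLU (assumed activation)
    return max(responses)  # global max pooling over positions


if __name__ == "__main__":
    rng = random.Random(0)
    dim, seq_len = 4, 10  # toy sizes; the project uses 300-d GloVe vectors
    sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
    # One random filter per kernel size 3, 4, 5; a real model learns many of each.
    features = [
        conv1d_global_max(
            sentence, [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(k)]
        )
        for k in (3, 4, 5)
    ]
    print(features)
```

Because each filter only ever sees k consecutive tokens, its view of context is bounded by the kernel size, which is the limitation the BiLSTM variant is meant to address.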

Results

  • CNN converges in around 6 epochs, AUC ~0.97
  • BiLSTM reaches AUC ~0.97+ but takes longer to train

Limitations

  • OOV issues with new slang because GloVe does not contain every new word
  • Weak at sarcasm and nuanced tone
  • Static embeddings: one word maps to one fixed vector
  • CNN context is limited to 3-5 words; BiLSTM degrades on very long comments

Module 3 - Fine-tuning Transformer

RoBERTa

Multi-label

Use contextual embeddings for deeper moderation with stronger context, nuance, and sarcasm understanding.

Pipeline

  • Hybrid preprocessing with syntax normalization and augmentation
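  • RoBERTa BPE tokenizer wraps each sequence as <s> token ... </s> (RoBERTa's counterparts to BERT's [CLS] and [SEP])
  • RoBERTa-base backbone with 12 Transformer encoder layers
  • Pooling on the leading <s> token (the [CLS] position), Dropout 0.1, Linear 768 -> 6, and an independent sigmoid for each of the 6 labels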
  • Data-centric strategy: hard negatives, slang examples, single-word tests, active learning

Results

  • Base model roberta-base
  • Learning rate 2e-5, batch size 32, 3-5 epochs
  • AUC around 0.99

Limitations

  • Requires a strong GPU; README notes A100
  • Fine-tuning takes around 12 hours

Hybrid Preprocessing

Leet Speak Normalization

@ -> a, 3 -> e, 0 -> o, $ -> s, 1 -> i, 5 -> s

Obfuscation Decoding

f.u.c.k -> fuck, b1tch -> bitch, f*ck -> fuck

Context-Aware Normalization

"fucking good" -> "very good", while "fucking idiot" keeps its negative tone

Repeated Character Collapse

coooooooool -> cool, lolllll -> lol

Slang Dictionary Expansion

kys -> kill yourself, stfu -> shut the f*** up, smh -> shaking my head

Emoji / Abbreviation Mapping

angry emoji -> angry face, omg -> oh my god, brb -> be right back
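A minimal sketch of how several of these rules could be chained with the standard library only. All names here (`normalize`, `LEET_MAP`, `SLANG`, `VOCAB`) are hypothetical, the dictionaries are tiny stand-ins, and the order of steps is an assumption. Wildcard decoding (f*ck), context-aware normalization, and emoji mapping are left out because they need a lexicon or a model.

```python
import re

# Character map from the leet-speak rule above.
LEET_MAP = str.maketrans({"@": "a", "3": "e", "0": "o", "$": "s", "1": "i", "5": "s"})
SLANG = {"kys": "kill yourself", "omg": "oh my god", "brb": "be right back"}
VOCAB = {"cool", "lol"}  # stand-in for a real word list

# Three or more single letters separated by '.', '-', '_' or spaces, e.g. "f.u.c.k".
SPACED_LETTERS = re.compile(r"(?<![a-z])(?:[a-z][.\-_ ]){2,}[a-z](?![a-z])")


def join_spaced_letters(text: str) -> str:
    """Remove the separators inside a run of single letters."""
    return SPACED_LETTERS.sub(lambda m: re.sub(r"[.\-_ ]", "", m.group(0)), text)


def collapse_repeats(token: str) -> str:
    """Shrink runs of 3+ identical characters to two, or to one if that is a known word."""
    two = re.sub(r"(.)\1{2,}", r"\1\1", token)
    if two in VOCAB:
        return two
    one = re.sub(r"(.)\1{2,}", r"\1", token)
    return one if one in VOCAB else two


def normalize(text: str) -> str:
    text = join_spaced_letters(text.lower())
    tokens = []
    for token in text.split():
        # Only touch tokens that contain a letter, so plain numbers stay intact.
        if any(ch.isalpha() for ch in token):
            token = collapse_repeats(token.translate(LEET_MAP))
        tokens.append(SLANG.get(token, token))
    return " ".join(tokens)


if __name__ == "__main__":
    for sample in ["f u c k this sh1t", "coooooooool", "lolllll", "Kys loser"]:
        print(repr(sample), "->", repr(normalize(sample)))
```

The sketch is deliberately naive: mixed tokens such as "top5" or "@user" would also be rewritten, so a production pipeline would need whitelists or a vocabulary check before applying the leet map.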

NLP Pipeline

1. Data Collection: Twitter, Reddit, Kaggle, Wikipedia Talk Pages; clearly labeled toxic / non-toxic data
2. Preprocessing: Lowercasing, normalization, typo repair, and violation dictionaries
3. Tokenization & Encoding: TF-IDF for Module 1, GloVe for Module 2, RoBERTa BPE for Module 3
4. Training & Evaluation: Stratified split, Precision, Recall, F1, AUC-ROC, BCE / Focal Loss
5. Validation & Optimization: Overfitting checks, threshold tuning, hyperparameter search
6. Inference / API: FastAPI endpoint, Hybrid Pipeline, rule-based filter, AI model, output

Evaluation Strategy

| Metric | Meaning | When to prioritize |
| --- | --- | --- |
| Accuracy | Overall ratio of correct predictions | Balanced data |
| Precision | Correctness when predicting a violation | Avoid false flags |
| Recall | Ability to detect actual violations | Avoid missed violations |
| F1-score | Balance between Precision and Recall | Imbalanced data |
| ROC-AUC | Class separation across thresholds | Model comparison |

Important Trade-off

False Positives reduce user experience by flagging clean comments, so high Precision matters. False Negatives allow harmful content to pass through, so high Recall matters in risk-sensitive contexts.
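The trade-off can be seen by sweeping the decision threshold for a single label. The sketch below uses made-up scores (not project results) and a hypothetical helper, `precision_recall_f1`, to show precision rising and recall falling as the threshold increases.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Binary precision, recall and F1 for one label at a given threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = prob >= threshold
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up scores for a single label; 1 = violation, 0 = clean.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_prob = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    for t in (0.25, 0.5, 0.8):
        p, r, f = precision_recall_f1(y_true, y_prob, t)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

On this toy data, a threshold of 0.25 catches every violation at 60% precision, while 0.8 flags nothing clean but misses two of the three violations. A per-label threshold chosen on validation data is one way to pick a point between those extremes.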

| Loss Function | When to use |
| --- | --- |
| Binary Cross-Entropy | Baseline or relatively balanced data |
| Class-weighted BCE | Imbalanced data where toxic-class recall matters |
| Focal Loss | Extreme imbalance such as threat and identity_hate |
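For reference, the three losses can be written per label in a few lines. This is a scalar sketch for intuition, not the project's training code: the function names are hypothetical, the focal loss defaults (gamma = 2, alpha = 0.25) are the commonly cited ones and not values reported on this page, and `p` is assumed to lie strictly between 0 and 1.

```python
import math


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one label; p is the predicted probability in (0, 1)."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def weighted_bce(p: float, y: int, pos_weight: float) -> float:
    """BCE with the positive-class term scaled by pos_weight."""
    return pos_weight * bce(p, y) if y == 1 else bce(p, y)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss: BCE scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.1):  # easy, uncertain, and hard positive examples
        print(f"p={p:.2f} bce={bce(p, 1):.4f} "
              f"weighted={weighted_bce(p, 1, 10.0):.4f} focal={focal_loss(p, 1):.4f}")
```

The modulating factor `(1 - p_t) ** gamma` shrinks the loss of confidently correct examples much more than that of hard ones, so the many easy clean comments contribute less to the gradient than the few rare positives.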

Experiment Results

| Module | Model | AUC | Training Time | GPU |
| --- | --- | --- | --- | --- |
| M1 | TF-IDF + LR | ~0.95 | Minutes | No |
| M2 | CNN 1D | ~0.97 | ~4.5 hours | Yes |
| M2 | BiLSTM | ~0.97+ | ~6 hours | Yes |
| M3 | RoBERTa fine-tuned | ~0.99 | ~12 hours | A100 |

| Input | Predicted Label | Reason |
| --- | --- | --- |
| "This is fucking amazing!" | CLEAN | Attention learns the supportive relation between "fucking" and "amazing" |
| "You're a killer at chess" | CLEAN | Distinguishes "killer" in a skill-related context |
| "Kys loser" | THREAT | Detects "kys" from augmentation examples |
| "f u c k this sh1t" | TOXIC | Normalization converts it to "fuck this shit" |

Challenges & Solutions

| Challenge | Description | Solution |
| --- | --- | --- |
| Imbalanced Data | threat, identity_hate < 1% | Class Weight / Focal Loss |
| Leet Speak & Obfuscation | f.u.c.k, b1tch, ahole | Regex normalization pipeline |
| Reversed context | killer strategy vs serial killer | Contextual Embedding with RoBERTa |
| Slang & abbreviations | kys, stfu, fkn | Slang dictionary + FastText subword |
| Sarcasm / implication | Great job... NOT! | RoBERTa + hard negatives |
| New slang OOV | GloVe does not contain new words | FastText subword decomposition |
| Language & cultural bias | Local slang can be mislabeled | Diverse dataset + fairness check |

Main Contribution

Hybrid Preprocessing (Rules + Deep Learning)

The main contribution is combining rules-based preprocessing with deep learning pipelines to handle sensitive words in safe contexts, ambiguous sentences, and keywords that appear without harmful intent. The expected result is a strong reduction in False Positives and improved overall reliability.

Install & Run

Requirements & Setup

Python >= 3.9
CUDA recommended for Modules 2 and 3

git clone https://github.com/your-repo/toxic-comment-detection
cd toxic-comment-detection
pip install -r requirements.txt

Main Requirements

torch>=2.0
transformers>=4.35
scikit-learn>=1.3
nltk
numpy
pandas
fastapi
uvicorn

Run Each Module

python train_module1.py --data data/train.csv --model lr
python train_module2.py --model cnn
python train_module2.py --model bilstm --glove data/glove.6B.300d.txt
python train_module3.py --base_model roberta-base --epochs 5
uvicorn api:app --reload

Inference API

POST http://localhost:8000/predict
{
  "text": "You are an amazing person!"
}

Response:
{
  "toxic": 0,
  "severe_toxic": 0,
  "obscene": 0,
  "threat": 0,
  "insult": 0,
  "identity_hate": 0,
  "label": "CLEAN"
}
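A minimal sketch of how six per-label probabilities could be turned into a response of this shape. The helper name `build_response`, the 0.5 threshold for the API, and the rule for the summary `label` field (most confident fired label, otherwise CLEAN) are assumptions for illustration; the page does not specify how the API derives `label`.

```python
import json

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def build_response(probs, threshold=0.5):
    """Turn six per-label probabilities into the response shape shown above."""
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    # Each label is thresholded independently (multi-label, not multi-class).
    response = {name: int(p >= threshold) for name, p in zip(LABELS, probs)}
    fired = [(p, name) for name, p in zip(LABELS, probs) if p >= threshold]
    # Hypothetical rule: report the most confident fired label, else CLEAN.
    response["label"] = max(fired)[1].upper() if fired else "CLEAN"
    return response


if __name__ == "__main__":
    print(json.dumps(build_response([0.02, 0.01, 0.01, 0.00, 0.03, 0.01]), indent=2))
    print(json.dumps(build_response([0.70, 0.10, 0.20, 0.90, 0.30, 0.05]), indent=2))
```

Because the labels are independent, the second call sets both `toxic` and `threat` to 1; only the summary field collapses them into a single string.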

Folder Structure

data/: train.csv, test.csv, glove.6B.300d.txt
preprocessing/: leet_speak.py, obfuscation.py, slang_dict.json, pipeline.py
module1/: tfidf_vectorizer.py, logistic_regression.py
module2/: cnn_model.py, bilstm_model.py
module3/: roberta_model.py, data_augmentation.py, active_learning.py
evaluation/: metrics.py, confusion_matrix.py
api/: app.py
Project root: train_module1.py, train_module2.py, train_module3.py, requirements.txt, README.md

Project Diagrams

Figure: Three-module system architecture for toxic text detection
Figure: NLP moderation pipeline and hybrid preprocessing
Figure: RoBERTa fine-tuning architecture and comparison table

References