October 2025 - December 2025

Project Detail

Machine Learning for Community Standards Violation Text Detection

Applying Machine Learning to Detect Text that Violates Community Standards

A seminar project for automatic multi-label classification of comments that violate community standards. The work compares classical ML, deep learning, and transformer-based NLP pipelines on the Jigsaw Toxic Comment dataset.

NLP, Toxic Comment Detection, TF-IDF, CNN, BiLSTM, RoBERTa

Role

AI Engineer

Highlight

End-to-end NLP moderation research with three model generations

Figure: Three-module system architecture for toxic text detection

Course & Team

Course

SE400.Q11 - Seminar IT

Supervisors

Dr. Huynh Minh Duc, Dr. Nguyen Tan Toan

Group

Group 09 - University of Information Technology, VNU

Members

Nguyen Khanh Huy, Tran Dinh Phuong Linh, Dang Thi Ngoc Minh, Tran Bao Phu

README as Interface

Problem Overview

The system classifies social-media comments that violate community standards. The task is multi-label classification, meaning one comment can belong to multiple labels at once.

Classical ML

Module 1

TF-IDF + Logistic Regression

Deep Learning

Module 2

CNN 1D / BiLSTM + GloVe

State-of-the-Art

Module 3

Fine-tuning RoBERTa + Hybrid Preprocessing

6 Output Labels

toxic: General toxic content

severe_toxic: Highly toxic content

obscene: Obscene language

threat: Threatening language

insult: Insulting language

identity_hate: Identity-based hate
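
Because the task is multi-label, each comment's target can be written as a 6-element binary vector with one position per label. The sketch below assumes the label order used in the list above; the helper names `encode_labels` and `decode_labels` are illustrative and are not taken from the project code.

```python
# Label order follows the list above; helper names are illustrative.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def encode_labels(active):
    """Turn a set of label names into a 6-element 0/1 vector."""
    unknown = set(active) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]

def decode_labels(vector):
    """Inverse of encode_labels: list the names whose position is 1."""
    return [name for name, flag in zip(LABELS, vector) if flag == 1]

# One comment can carry several labels at once.
print(encode_labels({"toxic", "insult"}))  # [1, 0, 0, 0, 1, 0]
# An all-zero vector is a clean comment.
print(decode_labels([0, 0, 0, 0, 0, 0]))   # []
```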

Dataset

Source

Jigsaw Toxic Comment Classification Challenge - Kaggle

Total comments

159,571 (Kaggle training set)

Original source

Wikipedia Talk Pages

Language

English

Label type

Multi-label binary

Clean ratio

~90%

Toxic ratio

~10% with at least one label

Rarest labels

threat, identity_hate (<1%)

Label Distribution

toxic: ~15,000

obscene: ~8,500

insult: ~8,000

severe_toxic: ~1,600

identity_hate: ~1,400

threat: ~500

The main challenge is severe class imbalance: threat and identity_hate each appear in fewer than 1% of comments, so the models see very few positive examples for them and are slower to learn these labels.
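
To make the imbalance concrete, the sketch below turns the approximate counts above into a per-label positive-class weight, using the common negatives-to-positives ratio. This is one possible heuristic, not necessarily the weighting the project uses, and the function name `positive_weights` is hypothetical.

```python
# Approximate counts copied from the Label Distribution list above.
TOTAL_COMMENTS = 159_571
LABEL_COUNTS = {
    "toxic": 15_000,
    "obscene": 8_500,
    "insult": 8_000,
    "severe_toxic": 1_600,
    "identity_hate": 1_400,
    "threat": 500,
}

def positive_weights(counts, total):
    """Weight for the positive class of each label: negatives / positives."""
    return {name: (total - n) / n for name, n in counts.items()}

for name, weight in positive_weights(LABEL_COUNTS, TOTAL_COMMENTS).items():
    share = LABEL_COUNTS[name] / TOTAL_COMMENTS
    print(f"{name:14s} share={share:6.2%}  pos_weight={weight:7.1f}")
```

With these counts a positive threat example would be weighted roughly 30 times more heavily than a positive toxic example, which is why the rare labels need special handling in the loss.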

Model Modules

Module 1 - Baseline

TF-IDF + Logistic Regression

Multi-label

Build a fast word-frequency baseline for lightweight rough filtering on CPU.

Pipeline

Lowercasing, leet-speak normalization, obfuscation decoding, repeated-character collapse
Word n-grams (1,3) + lemmatization
Character n-grams (3,5)
Feature stacking into around 100,000 dimensions
One-vs-Rest Logistic Regression with class_weight='balanced'
Sigmoid threshold 0.5 for 6 output labels
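
The word and character n-gram steps in the pipeline above can be illustrated without any library. This is a minimal sketch of the n-gram extraction only (no TF-IDF weighting and no lemmatization); the function names are illustrative, and the ranges (1,3) and (3,5) are the ones listed above.

```python
def word_ngrams(text, n_min=1, n_max=3):
    """All contiguous word sequences of length n_min..n_max."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(words) - n + 1)
    ]

def char_ngrams(text, n_min=3, n_max=5):
    """All contiguous character sequences of length n_min..n_max."""
    text = text.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]

print(word_ngrams("you are stupid"))
# ['you', 'are', 'stupid', 'you are', 'are stupid', 'you are stupid']
print(char_ngrams("stupid", 3, 4))
# ['stu', 'tup', 'upi', 'pid', 'stup', 'tupi', 'upid']

# Character n-grams still overlap when a word is obfuscated.
shared = set(char_ngrams("stupid", 3, 3)) & set(char_ngrams("stup1d", 3, 3))
print(sorted(shared))  # ['stu', 'tup']
```

In practice scikit-learn's `TfidfVectorizer` covers both cases through its `ngram_range` and `analyzer` parameters; the pure-Python version is only meant to show what the features look like.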

Results

Very fast CPU training
AUC around 0.95+

Limitations

Limited by bag-of-words representation
Dependent on data quality and dictionaries
Cannot capture deep semantic meaning

Module 2 - Deep Learning

CNN 1D / BiLSTM + GloVe

Multi-label

Move from sparse vectors to dense word embeddings so the model can capture word meaning better.

Pipeline

GloVe 6B 300d pre-trained on 6 billion tokens
CNN 1D uses kernel sizes 3, 4, and 5 + Global MaxPool + Dense + Dropout
BiLSTM uses Spatial Dropout, Bidirectional LSTM 128 units, Dense, and Sigmoid
CNN captures local patterns; BiLSTM models long-range dependencies and handles longer sentences better
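
As a toy illustration of the CNN branch above, the sketch below slides kernels of width 3, 4, and 5 over a sequence and keeps only the strongest response per kernel (global max pooling). It assumes a single scalar per token instead of a 300-d GloVe vector and fixed kernel weights instead of learned ones, so it shows the mechanics only.

```python
def conv1d_global_max(sequence, kernel):
    """Slide `kernel` over `sequence` (no padding), then keep the max response.

    As in deep learning frameworks, this is a cross-correlation:
    the kernel is not flipped.
    """
    k = len(kernel)
    if len(sequence) < k:
        raise ValueError("sequence shorter than kernel")
    responses = [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    ]
    return max(responses)

# Toy input: one scalar per token; real inputs would be embedding vectors.
tokens = [0.1, 0.0, 0.9, 0.8, 0.1, 0.0]

# One all-ones kernel per width; a trained model learns many kernels per width.
features = [conv1d_global_max(tokens, [1.0] * k) for k in (3, 4, 5)]
print(features)
```

Each kernel width contributes one pooled feature regardless of comment length, which is what lets the following Dense layer work on a fixed-size vector.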

Results

CNN converges in around 6 epochs, AUC ~0.97
BiLSTM reaches AUC ~0.97+ but takes longer to train

Limitations

OOV issues with new slang because GloVe does not contain every new word
Weak at sarcasm and nuanced tone
Static embeddings: one word maps to one fixed vector
CNN context is limited to 3-5 words; BiLSTM degrades on very long comments

Module 3 - Fine-tuning Transformer

RoBERTa

Multi-label

Use contextual embeddings for deeper moderation with stronger context, nuance, and sarcasm understanding.

Pipeline

Hybrid preprocessing with syntax normalization and augmentation
RoBERTa BPE tokenizer wraps each input as <s> tokens ... </s> (RoBERTa's counterparts to BERT's [CLS] and [SEP])
RoBERTa-base backbone with 12 Transformer Encoder layers
First-token (<s>, the [CLS] position) pooling, Dropout 0.1, Linear 768 -> 6, Sigmoid on 6 heads
Data-centric strategy: hard negatives, slang examples, single-word tests, active learning
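
The output stage described above (6 logits, one sigmoid each) can be sketched as follows. The logits are made up for illustration, and the 0.5 threshold is borrowed from Module 1 as an assumption; the project may tune a different threshold per label.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def predict_labels(logits, threshold=0.5):
    """Apply an independent sigmoid to each logit and threshold it."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {name: int(sigmoid(z) >= threshold) for name, z in zip(LABELS, logits)}

# Hypothetical logits, as if produced by the Linear 768 -> 6 layer.
print(predict_labels([2.0, -3.0, 1.0, -4.0, 0.5, -2.5]))
# {'toxic': 1, 'severe_toxic': 0, 'obscene': 1, 'threat': 0, 'insult': 1, 'identity_hate': 0}
```

Unlike a softmax, the six probabilities are independent and do not sum to 1, which is what allows several labels to fire for the same comment.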

Results

Base model roberta-base
Learning rate 2e-5, batch size 32, 3-5 epochs
AUC around 0.99

Limitations

Requires a strong GPU; README notes A100
Fine-tuning takes around 12 hours

Hybrid Preprocessing

Leet Speak Normalization

@ -> a, 3 -> e, 0 -> o, $ -> s, 1 -> i, 5 -> s
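
A minimal sketch of this mapping, using exactly the six substitutions listed above. The guard that skips tokens without any letters is an added assumption, so that plain numbers and prices are left alone; the function names are illustrative.

```python
# The six substitutions listed above.
LEET_MAP = str.maketrans({"@": "a", "3": "e", "0": "o", "$": "s", "1": "i", "5": "s"})

def normalize_leet(token):
    """Replace leet characters, but only in tokens that also contain letters."""
    if not any(ch.isalpha() for ch in token):
        return token  # leave plain numbers such as "2015" or "$100" untouched
    return token.translate(LEET_MAP)

def normalize_text(text):
    """Normalize each whitespace-separated token independently."""
    return " ".join(normalize_leet(tok) for tok in text.split())

print(normalize_text("b1tch paid $100 in 2015"))  # bitch paid $100 in 2015
print(normalize_text("h3ll0 @ss"))                # hello ass
```

The guard is deliberately crude: a token such as "1st" would still become "ist", so a production rule set would need a whitelist or a dictionary check.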

Obfuscation Decoding

f.u.c.k -> fuck, b1tch -> bitch, f*ck -> fuck
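
Two of these cases can be sketched with regular expressions: separators between single letters (f.u.c.k) and masked letters (f*ck). The b1tch case is a leet substitution and is not handled here. The mini-lexicon and function names below are hypothetical; unmasking needs some word list to decide which letter was hidden.

```python
import re

# Hypothetical mini-lexicon; a real system would load a curated word list.
LEXICON = {"fuck", "shit", "bitch"}

# A single letter followed by at least two "separator + single letter" pairs.
SPACED = re.compile(r"\b[a-z](?:[._-][a-z]){2,}\b", re.IGNORECASE)

def join_spaced_letters(text):
    """f.u.c.k -> fuck: drop separators between single letters."""
    return SPACED.sub(lambda m: re.sub(r"[._-]", "", m.group()), text)

def unmask(token, lexicon=LEXICON):
    """f*ck -> fuck: treat each '*' as one unknown letter, accept a unique match."""
    if "*" not in token:
        return token
    pattern = re.compile(
        "".join("[a-z]" if ch == "*" else re.escape(ch) for ch in token.lower())
    )
    matches = [word for word in lexicon if pattern.fullmatch(word)]
    return matches[0] if len(matches) == 1 else token

def decode_obfuscation(text):
    return " ".join(unmask(tok) for tok in join_spaced_letters(text).split())

print(decode_obfuscation("f.u.c.k this f*ck"))  # fuck this fuck
```

Requiring at least three letters keeps abbreviations like "e.g" intact, and requiring a unique lexicon match avoids guessing when a mask is ambiguous.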

Context-Aware Normalization

"fucking good" -> "very good", while "fucking idiot" keeps its negative tone

Repeated Character Collapse

coooooooool -> cool, lolllll -> lol
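
The two examples need different rules: "coooooooool" keeps a double letter while "lolllll" keeps a single one. One way to reconcile them, sketched below as an assumption rather than the project's actual rule, is to collapse long runs to two characters first and fall back to one character only if that yields a known word. The vocabulary here is a hypothetical stand-in for a real dictionary.

```python
import re

# Hypothetical vocabulary; a real system would use a full dictionary.
VOCAB = {"cool", "lol"}

def collapse_repeats(word, vocab=VOCAB):
    """Collapse runs of 3+ identical characters, preferring forms found in vocab."""
    two = re.sub(r"(.)\1{2,}", r"\1\1", word)  # "coooool" -> "cool"
    if two in vocab:
        return two
    one = re.sub(r"(.)\1+", r"\1", two)        # "loll" -> "lol"
    return one if one in vocab else two

print(collapse_repeats("coooooooool"))  # cool
print(collapse_repeats("lolllll"))      # lol
print(collapse_repeats("hello"))        # hello
```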

Slang Dictionary Expansion

kys -> kill yourself, stfu -> shut the f*** up, smh -> shaking my head
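
A minimal sketch of dictionary expansion using the three entries above. The project stores its dictionary in `slang_dict.json`; the inline dict and the function name `expand_slang` are assumptions made to keep the example self-contained.

```python
import re

# The three entries listed above.
SLANG = {
    "kys": "kill yourself",
    "stfu": "shut the f*** up",
    "smh": "shaking my head",
}

# Word boundaries prevent matches inside longer words.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SLANG)) + r")\b", re.IGNORECASE
)

def expand_slang(text):
    """Replace each whole-word slang term with its expansion."""
    return _PATTERN.sub(lambda m: SLANG[m.group(1).lower()], text)

print(expand_slang("Kys loser"))  # kill yourself loser
print(expand_slang("smh, stfu"))  # shaking my head, shut the f*** up
```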

Emoji / Abbreviation Mapping

angry emoji -> angry face, omg -> oh my god, brb -> be right back

NLP Pipeline

Data Collection

Twitter, Reddit, Kaggle, Wikipedia Talk Pages; clearly labeled toxic / non-toxic data

Preprocessing

Lowercase, normalization, typo repair, and violation dictionaries

Tokenization & Encoding

TF-IDF for Module 1, GloVe for Module 2, RoBERTa BPE for Module 3

Training & Evaluation

Stratified split, Precision, Recall, F1, AUC-ROC, BCE / Focal Loss

Validation & Optimization

Overfitting checks, threshold tuning, hyperparameter search
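
Threshold tuning can be sketched as a small grid search that picks, for one label, the cut-off with the best F1 on validation data. The probabilities and targets below are invented for illustration, and the function names are hypothetical; choosing F1 as the objective is also an assumption.

```python
def f1_at_threshold(probs, targets, threshold):
    """F1 for one label when probabilities >= threshold count as positive."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, yh in zip(targets, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(targets, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(targets, preds) if y == 1 and yh == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(probs, targets, candidates=None):
    """Return the candidate threshold with the highest F1 (lowest wins ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]
    return max(candidates, key=lambda t: f1_at_threshold(probs, targets, t))

# Invented validation scores for a rare label: positives score low overall.
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60]
targets = [0, 0, 0, 1, 1, 1]

print(f1_at_threshold(probs, targets, 0.5))  # 0.5
print(best_threshold(probs, targets))        # 0.25
```

On rare labels the default 0.5 cut-off often misses positives whose scores are systematically low, so a per-label threshold can recover recall without retraining.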

Inference / API

FastAPI endpoint, Hybrid Pipeline, rule-based filter, AI model, output

Evaluation Strategy

Metric	Meaning	When to prioritize
Accuracy	Overall ratio of correct predictions	Balanced data
Precision	Correctness when predicting a violation	Avoid false flags
Recall	Ability to detect actual violations	Avoid missed violations
F1-score	Balance between Precision and Recall	Imbalanced data
ROC-AUC	Class separation across thresholds	Model comparison
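
The worked example below shows why Accuracy is only trustworthy on balanced data. It computes the first four metrics in the table for a classifier that always predicts "clean" on a label with a 0.3% positive rate, which is in the range the dataset section gives for threat. The function name and the sample sizes are illustrative.

```python
def binary_metrics(targets, preds):
    """Accuracy, precision, recall, and F1 for one binary label."""
    if not targets or len(targets) != len(preds):
        raise ValueError("targets and preds must be non-empty and equal length")
    pairs = list(zip(targets, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# 1,000 comments, 3 of them positive; the classifier flags nothing.
targets = [1] * 3 + [0] * 997
always_clean = [0] * 1000
print(binary_metrics(targets, always_clean))
# {'accuracy': 0.997, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The classifier scores 99.7% accuracy while detecting no violations at all, which is why Recall and F1 are the more informative numbers for the rare labels.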

Important Trade-off

False Positives flag clean comments and degrade user experience, so high Precision matters. False Negatives let harmful content through, so high Recall matters in risk-sensitive contexts.

Loss Function	When to use
Binary Cross-Entropy	Baseline or relatively balanced data
Class-weighted BCE	Imbalanced data where toxic-class recall matters
Focal Loss	Extreme imbalance such as threat and identity_hate
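
To show how Focal Loss differs from Binary Cross-Entropy, the sketch below implements the per-example binary form, loss = -(1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the true class. The default gamma of 2 is the value commonly used with this loss; the optional `alpha` class weight and the function names are illustrative, not taken from the project code.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p and target y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -math.log(p) if y == 1 else -math.log(1 - p)

def focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-7):
    """Binary focal loss; gamma=0 and alpha=None reduce it to plain BCE."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p           # probability of the true class
    loss = -((1 - p_t) ** gamma) * math.log(p_t)
    if alpha is not None:                  # optional class weighting
        loss *= alpha if y == 1 else 1 - alpha
    return loss

# Easy example: a clean comment scored 0.05 (confident and correct).
print(bce(0.05, 0), focal_loss(0.05, 0))
# Hard example: a real threat scored 0.05 (confident and wrong).
print(bce(0.05, 1), focal_loss(0.05, 1))
```

The easy negative's loss shrinks by a factor of (0.05)^2 = 0.0025, while the missed positive keeps about 90% of its BCE value, so the many easy clean comments no longer dominate the gradient.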

Experiment Results

Module	Model	AUC	Training Time	GPU
M1	TF-IDF + LR	~0.95	Minutes	No
M2	CNN 1D	~0.97	~4.5 hours	Yes
M2	BiLSTM	~0.97+	~6 hours	Yes
M3	RoBERTa fine-tuned	~0.99	~12 hours	A100

Input	Predicted Label	Reason
"This is fucking amazing!"	CLEAN	Attention learns the supportive relation between "fucking" and "amazing"
"You're a killer at chess"	CLEAN	Distinguishes "killer" in a skill-related context
"Kys loser"	THREAT	Detects "kys" from augmentation examples
"f u c k this sh1t"	TOXIC	Normalization converts it to "fuck this shit"

Challenges & Solutions

Challenge	Description	Solution
Imbalanced Data	threat, identity_hate < 1%	Class Weight / Focal Loss
Leet Speak & Obfuscation	f.u.c.k, b1tch, ahole	Regex normalization pipeline
Reversed context	killer strategy vs serial killer	Contextual Embedding with RoBERTa
Slang & abbreviations	kys, stfu, fkn	Slang dictionary + FastText subword
Sarcasm / implication	Great job... NOT!	RoBERTa + hard negatives
New slang OOV	GloVe does not contain new words	FastText subword decomposition
Language & cultural bias	Local slang can be mislabeled	Diverse dataset + fairness check

Main Contribution

Hybrid Preprocessing (Rules + Deep Learning)

The main contribution is combining rule-based preprocessing with deep learning pipelines to handle sensitive words in safe contexts, ambiguous sentences, and keywords that appear without harmful intent. The expected result is a strong reduction in False Positives and improved overall reliability.

Install & Run

Requirements & Setup

Python >= 3.9
CUDA recommended for Module 2 & 3

git clone https://github.com/your-repo/toxic-comment-detection
cd toxic-comment-detection
pip install -r requirements.txt

Main Requirements

torch>=2.0
transformers>=4.35
scikit-learn>=1.3
nltk
numpy
pandas
fastapi
uvicorn

Run Each Module

python train_module1.py --data data/train.csv --model lr
python train_module2.py --model cnn
python train_module2.py --model bilstm --glove data/glove.6B.300d.txt
python train_module3.py --base_model roberta-base --epochs 5
uvicorn api:app --reload

Inference API

POST http://localhost:8000/predict
{
  "text": "You are an amazing person!"
}

Response:
{
  "toxic": 0,
  "severe_toxic": 0,
  "obscene": 0,
  "threat": 0,
  "insult": 0,
  "identity_hate": 0,
  "label": "CLEAN"
}
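
A client for this endpoint can be sketched with the standard library alone. The URL, request body, and response fields come from the example above; the function names, the JSON content type header, and the timeout are assumptions about how the server is set up.

```python
import json
import urllib.request

def build_predict_request(text, base_url="http://localhost:8000"):
    """Build the POST /predict request shown above without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and decode the JSON response."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def flagged_labels(response):
    """Names of the per-label fields set to 1, ignoring the summary 'label'."""
    return [name for name, value in response.items() if name != "label" and value == 1]

# Requires the API server to be running locally:
# result = predict("You are an amazing person!")
# print(result["label"], flagged_labels(result))
```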

Folder Structure

data/: train.csv, test.csv, glove.6B.300d.txt

preprocessing/: leet_speak.py, obfuscation.py, slang_dict.json, pipeline.py

module1/: tfidf_vectorizer.py, logistic_regression.py

module2/: cnn_model.py, bilstm_model.py

module3/: roberta_model.py, data_augmentation.py, active_learning.py

evaluation/: metrics.py, confusion_matrix.py

api/: app.py

Top level: train_module1.py, train_module2.py, train_module3.py, requirements.txt, README.md

Project Diagrams

Figure: NLP moderation pipeline and hybrid preprocessing

Figure: RoBERTa fine-tuning architecture and comparison table

References

Jigsaw Toxic Comment Classification Challenge
RoBERTa: A Robustly Optimized BERT Pretraining Approach
GloVe: Global Vectors for Word Representation
Convolutional Neural Networks for Sentence Classification
Focal Loss for Dense Object Detection