SVM vs BERT: Sentiment Analysis
A comparison of SVM and BERT on noisy review text to see which one holds up better.
Tech Stack
BERT, SVM, Pandas, NumPy, Scikit-learn, PyTorch, Python
Overview
This project asks a simple question: what happens when sentiment models leave clean benchmark data and hit text that looks more like the real internet?
Method
- Start with the IMDB dataset of 50k reviews.
- Train both SVM and BERT on clean text.
- Add noise like typos, word swaps, and missing punctuation.
- Measure how each model's accuracy changes from clean to noisy text.
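The noise step above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual noise code: the function names (`add_typos`, `swap_words`, `drop_punctuation`, `add_noise`), the single shared `rate` parameter, and the choice of adjacent-letter transposition as the "typo" are all assumptions made for the example.

```python
import random
import string


def add_typos(text, rate, rng):
    """Transpose adjacent letters with probability `rate` per position (assumed typo model)."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)


def swap_words(text, rate, rng):
    """Swap neighbouring words with probability `rate` per position.

    Note: splitting and re-joining collapses runs of whitespace to single spaces.
    """
    words = text.split()
    i = 0
    while i < len(words) - 1:
        if rng.random() < rate:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(words)


def drop_punctuation(text, rate, rng):
    """Delete each ASCII punctuation character with probability `rate`."""
    return "".join(
        c for c in text
        if not (c in string.punctuation and rng.random() < rate)
    )


def add_noise(text, rate=0.1, seed=None):
    """Apply all three hypothetical noise types in sequence."""
    rng = random.Random(seed)  # a fixed seed makes the corruption reproducible
    text = add_typos(text, rate, rng)
    text = swap_words(text, rate, rng)
    return drop_punctuation(text, rate, rng)
```

Calling `add_noise(review, rate=0.2, seed=0)` on each test review would give a corrupted copy of the test set, and sweeping `rate` is one way to see how quickly each model degrades. Using one rate for all three noise types keeps the sketch short; separate rates per noise type may be more informative in practice.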
Result
On clean data, both models performed well. Once the text got noisy, BERT held up much better while SVM dropped sharply.
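One way to put a number on "held up" versus "dropped" is to compare accuracy on the clean and noisy versions of the same test set. The sketch below assumes a hypothetical `predict` callable that maps one review string to a label; it stands in for either model and is not the project's evaluation code. The tiny keyword classifier and four example strings are toy placeholders, not project data or results.

```python
def accuracy(predict, texts, labels):
    """Fraction of texts for which predict(text) equals the gold label."""
    if not texts:
        raise ValueError("texts must not be empty")
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def robustness_report(predict, clean_texts, noisy_texts, labels):
    """Accuracy on clean and noisy inputs, plus the absolute drop between them."""
    clean = accuracy(predict, clean_texts, labels)
    noisy = accuracy(predict, noisy_texts, labels)
    return {"clean": clean, "noisy": noisy, "drop": clean - noisy}


# Toy stand-in for a trained model: predicts positive (1) if "good" appears.
def toy_predict(text):
    return 1 if "good" in text else 0


# Toy inputs for illustration only.
report = robustness_report(
    toy_predict,
    clean_texts=["good film", "bad film"],
    noisy_texts=["godo film", "bad flim"],
    labels=[1, 0],
)
print(report)  # {'clean': 1.0, 'noisy': 0.5, 'drop': 0.5}
```

Running the same report for both models at the same noise level gives directly comparable `drop` values, which is the comparison the result above summarizes in words.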
Takeaway
If the input text is user-generated and messy, the extra cost of BERT is usually worth it.