EMOTION DETECTION AND CLASSIFICATION ON TIGRIGNA SOCIAL MEDIA TEXTS USING TRANSFORMER MODELS

Date

2025-09-23

Publisher

Mekelle University

Abstract

The rapid growth of social media has reshaped emotional expression, producing large-scale digital data for social, cultural, and political analysis and thereby highlighting the importance of reliable automated emotion detection tools. Despite advances in Natural Language Processing (NLP), Tigrigna remains underrepresented, with existing multilingual models often underperforming due to limited annotated data, a lack of tailored resources, and linguistic complexity. To address this gap, this study introduces transformer-based models tailored for emotion detection and classification in Tigrigna social media texts, focusing on four emotion categories: happiness, sadness, neutral, and disgust. A total of 4,000 Tigrigna sentences were collected from Facebook and YouTube and manually annotated with high inter-annotator agreement. To expand and balance the corpus, 6,000 additional sentences were generated using data augmentation techniques, including back-translation and synonym replacement, resulting in a final dataset of 10,000 sentences. After preprocessing, including normalization, tokenization, and cleaning, the data was split into training (8,000), validation (1,000), and testing (1,000) subsets. Three transformer-based models, namely XLM-RoBERTa, tiBERT, and the Tigrigna-specific tiRoBERTa, were fine-tuned and evaluated using Macro-F1, precision, and recall to account for class imbalance. The results showed progressive improvements across models: XLM-RoBERTa achieved an F1-score of 81%, tiBERT 84.4%, and tiRoBERTa 88%, with tiRoBERTa outperforming the others across all emotion categories, particularly in distinguishing subtle differences between sadness and happiness. Misclassifications between neutral and disgust persisted, reflecting data-related issues, model-specific challenges, and the low-resource nature of Tigrigna. Data augmentation improved F1-scores by 2–10% across models, underscoring its crucial role in enhancing performance in low-resource NLP tasks.
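As a minimal sketch of how the macro-averaged precision, recall, and F1 metrics described above can be computed over the study's four emotion labels (the toy predictions here are illustrative only, not outputs from the thesis models):

```python
# Hypothetical label set matching the study's four emotion categories.
LABELS = ["happiness", "sadness", "neutral", "disgust"]

def macro_scores(y_true, y_pred, labels=LABELS):
    """Macro-averaged precision, recall, and F1: per-label scores are
    computed independently, then averaged with equal weight per label,
    which is why the macro variants are preferred under class imbalance."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Because each label contributes equally to the average regardless of its frequency, a model that ignores a minority class such as disgust is penalized, which matches the abstract's rationale for reporting Macro-F1.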
The study concludes that transformer models, when culturally and linguistically adapted, are highly effective for Tigrigna emotion detection. Future research should expand Tigrigna-specific pretraining corpora, explore advanced augmentation, investigate hybrid architectures, and integrate multimodal data (e.g., combining text with images or videos). Applying these findings via APIs and dashboards can support researchers, policymakers, and organizations in leveraging Tigrigna social media for informed decision-making.

Keywords

Data augmentation, Emotion detection, Low-resource NLP, Tigrigna, Social media, Transformer models, tiRoBERTa
