EMOTION DETECTION AND CLASSIFICATION ON TIGRIGNA SOCIAL MEDIA TEXTS USING TRANSFORMER MODELS

dc.contributor.author: Rahel Gebru
dc.date.accessioned: 2025-09-26T07:15:37Z
dc.date.issued: 2025-09-23
dc.description.abstract: The rapid growth of social media has reshaped emotional expression, producing large-scale digital data for social, cultural, and political analysis, thereby highlighting the importance of reliable automated emotion detection tools. Despite advances in Natural Language Processing (NLP), Tigrigna remains underrepresented, with existing multilingual models often underperforming due to limited annotated data, a lack of tailored resources, and linguistic complexity. To address this gap, this study introduces transformer-based models tailored for emotion detection and classification in Tigrigna social media texts, focusing on four emotion categories: happiness, sadness, neutral, and disgust. A total of 4,000 Tigrigna sentences were collected from Facebook and YouTube and manually annotated with a high Inter-Annotator Agreement. To expand and balance the corpus, 6,000 additional sentences were generated using data augmentation techniques, including back-translation and synonym replacement, resulting in a final dataset of 10,000 sentences. Following preprocessing, including normalization, tokenization, and cleaning, the data were split into training (8,000), validation (1,000), and testing (1,000) subsets. Three transformer-based models, namely XLM-RoBERTa (XLM-R), tiBERT, and the Tigrigna-specific tiRoBERTa, were fine-tuned and evaluated using Macro-F1, precision, and recall metrics to address class imbalance. The results demonstrated progressive improvements across models: XLM-R achieved a Macro-F1 of 81%, tiBERT 84.4%, and tiRoBERTa 88%, with tiRoBERTa outperforming the others across all emotion categories, particularly in capturing subtle distinctions between sadness and happiness. Misclassifications between neutral and disgust persisted, reflecting data-related issues, model-specific challenges, and the low-resource nature of Tigrigna.
Data augmentation improved Macro-F1 scores by 2–10% across models, underscoring its crucial role in enhancing performance in low-resource NLP tasks. The study concludes that transformer models, when culturally and linguistically adapted, are highly effective for Tigrigna emotion detection. Future research should expand Tigrigna-specific pretraining corpora, explore advanced augmentation, investigate hybrid architectures, and integrate multimodal data (e.g., combining text with images or videos). Applying these findings via APIs and dashboards can support researchers, policymakers, and organizations in leveraging Tigrigna social media for informed decision-making.
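The abstract's evaluation protocol rests on macro-averaged F1, which averages per-class F1 so that minority emotion classes weigh as much as frequent ones under class imbalance. A minimal sketch of that metric follows; the four label indices mirror the study's emotion categories, but the toy predictions are illustrative placeholders, not the thesis data.

```python
LABELS = ["happiness", "sadness", "neutral", "disgust"]  # classes from the study

def macro_f1(y_true, y_pred, n_classes=4):
    """Average per-class F1 so each emotion class contributes equally."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / n_classes

# Toy gold labels and predictions for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.7333
```

Unlike accuracy or micro-averaged F1, this score does not reward a model for simply favoring the majority class, which is why the study reports it alongside precision and recall.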
dc.identifier.uri: https://repository.mu.edu.et/handle/123456789/931
dc.identifier.uri: https://doi.org/10.82589/muir-829
dc.language.iso: en
dc.publisher: Mekelle University
dc.subject: Data augmentation
dc.subject: Emotion detection
dc.subject: Low-resource NLP
dc.subject: Tigrigna
dc.subject: Social media
dc.subject: Transformer models
dc.subject: tiRoBERTa
dc.title: EMOTION DETECTION AND CLASSIFICATION ON TIGRIGNA SOCIAL MEDIA TEXTS USING TRANSFORMER MODELS
dc.type: Thesis

Files

Original bundle
Name: Rahel Gebru.pdf
Size: 5.33 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed to upon submission