A Machine Learning Framework for Amharic Sentiment Analysis in Social Media Images Using OCR and NLP Techniques
| dc.contributor.author | Halefom Desta Fitsum | |
| dc.date.accessioned | 2026-04-21T09:25:11Z | |
| dc.date.issued | 2025-11-28 | |
| dc.description.abstract | In Ethiopia, social media platforms are increasingly used for public communication, and much of the opinion-rich content is embedded in images containing Amharic text. Conventional sentiment analysis methods are designed for plain text and therefore fail to capture this significant portion of online discourse. The complexity of the Amharic script, the scarcity of language processing tools, and limited computational resources further restrict automatic analysis of image-based text. This study therefore develops an integrated framework that combines Optical Character Recognition (OCR) and Natural Language Processing (NLP) techniques to extract Amharic text from social media images and classify it as Positive, Negative, or Neutral using machine learning classifiers. A balanced dataset of 600 annotated images was compiled; the images were enhanced with OpenCV and the text was extracted with Tesseract OCR. The extracted text then passed through several preprocessing stages, including normalization, character unification, and stop-word removal, and was vectorized using Term Frequency–Inverse Document Frequency (TF-IDF). Four machine learning classifiers, namely Support Vector Machine (SVM), Logistic Regression, Naive Bayes, and Random Forest, were implemented, and each was evaluated using accuracy, precision, recall, F1-score, and confusion matrices. SVM achieved the highest accuracy (86%), followed by Logistic Regression (83%) and Naive Bayes (82%), while Random Forest had the lowest accuracy at 75%. These findings highlight that linear classifiers are suitable for Amharic sentiment analysis under resource-constrained conditions. 
The study demonstrates the feasibility of integrating OCR and NLP techniques for sentiment analysis of Amharic social media images and provides a solid baseline for future work in morphologically rich language processing. | |
| dc.identifier.uri | https://repository.mu.edu.et/handle/123456789/1363 | |
| dc.language.iso | en | |
| dc.publisher | Mekelle University | |
| dc.subject | Amharic | |
| dc.subject | sentiment analysis | |
| dc.subject | social media images | |
| dc.subject | OCR | |
| dc.subject | NLP | |
| dc.subject | TF-IDF | |
| dc.subject | machine learning | |
| dc.subject | SVM | |
| dc.title | A Machine Learning Framework for Amharic Sentiment Analysis in Social Media Images Using OCR and NLP Techniques | |
| dc.type | Thesis |
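
The abstract names character unification, stop-word removal, and TF-IDF vectorization as text-processing stages but does not give their details. The following is a minimal sketch of what such stages might look like, not the thesis's implementation. The homophone series mapping, the two-word stop list, the rule of discarding every non-Ethiopic character, and the smoothed IDF formula are all assumptions made for illustration, and the function names (`unify_characters`, `preprocess`, `tfidf`) are hypothetical.

```python
import math
import re
from collections import Counter

# ASSUMPTION: homophone series to merge, given as
# (variant base code point -> canonical base code point).
# The thesis does not list its unification rules.
HOMOPHONE_SERIES = {
    0x1210: 0x1200,  # HHA series -> HA series
    0x1280: 0x1200,  # XA series -> HA series
    0x1220: 0x1230,  # SZA series -> SA series
    0x12D0: 0x12A0,  # pharyngeal A series -> glottal A series
    0x1340: 0x1338,  # TZA series -> TSA series
}

# Map only the seven basic vowel orders of each series.
UNIFY_TABLE = {
    variant + order: canonical + order
    for variant, canonical in HOMOPHONE_SERIES.items()
    for order in range(7)
}

# ASSUMPTION: a tiny illustrative stop list ("and", "is"); a real list is longer.
STOP_WORDS = {"\u12A5\u1293", "\u1290\u12CD"}

# ASSUMPTION: anything that is not an Ethiopic letter or whitespace is noise
# (OCR debris, punctuation, Latin text) and is replaced by a space.
NON_ETHIOPIC = re.compile(r"[^\u1200-\u135A\s]+")


def unify_characters(text: str) -> str:
    """Replace homophone variants with one canonical character."""
    return text.translate(UNIFY_TABLE)


def preprocess(text: str) -> list[str]:
    """Unify characters, drop non-Ethiopic symbols, and remove stop words."""
    cleaned = NON_ETHIOPIC.sub(" ", unify_characters(text))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights per document using a smoothed IDF variant."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Hypothetical OCR outputs, written with escapes to keep code points explicit.
    raw_texts = ["\u1220\u120B\u121D \u12A5\u1293 hello!", "\u1230\u120B\u121D"]
    tokenized = [preprocess(t) for t in raw_texts]
    print(tokenized)
    print(tfidf(tokenized))
```

In a full pipeline the resulting weight dictionaries would be converted to fixed-length vectors and passed to a classifier such as a linear SVM; that step is omitted here because the abstract does not specify the classifier settings.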