Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/4868
Title: Automatic POS tagging of Arabic words using the YAMCHA machine learning tool
Authors: Elnily, A.; Abdelghany, A.
Keywords: machine learning; POS tagging; support vector machine
Issue Date: 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Conference: Proceedings of the 20th Conference on Language Engineering, ESOLEC 2022
Abstract: Automatic POS tagging is the process of automatically assigning the proper part-of-speech (POS) tag to each word in a text based on its context. Most NLP applications require this process as a crucial preprocessing step. This study proposes a machine learning-based Arabic POS tagger built with the YAMCHA toolkit, which uses Support Vector Machines (SVMs) as its learning algorithm. SVMs classify data with high accuracy because they rely on only a subset of the training data (the support vectors); training the system therefore requires a substantial amount of data annotated at the POS level. A corpus of 100,039 words is used in this study, divided into a training part of 64,608 words and a testing part of 35,431 words. A tag set of 48 morphological tags was used in training and testing. To reach the best result in automatic POS tagging, the system was trained multiple times while varying the range of linguistic information used in the training process, and new texts were then tagged and evaluated. The lowest error rate achieved was 11.4%, reached when the word preceding the target word was considered in training without its POS tag (feature range F:-1..0:0).
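The abstract describes the YAMCHA training setup only at a high level. As a rough illustration, the Python sketch below shows how such an experiment is typically driven from YAMCHA's documented command-line interface: a corpus in the tool's token-per-line column format, training through the Makefile YAMCHA ships, and a FEATURE window matching the best-performing setting reported above. The file names (train.data, test.data, pos.model), the toy sentences, and the tiny tag set are illustrative assumptions, not the paper's data; only the corpus format and FEATURE syntax follow YAMCHA's documented usage.

```python
import subprocess

# Toy training data in YamCha's column format: one token per line,
# tab-separated columns (here: word, POS tag), blank line between
# sentences. These sentences and tags are illustrative stand-ins; the
# paper's 100,039-word corpus and 48-tag set are not reproduced here.
TRAIN = [
    [("ذهب", "VERB"), ("الولد", "NOUN"), ("إلى", "PREP"), ("المدرسة", "NOUN")],
    [("هذا", "DEM"), ("كتاب", "NOUN"), ("مفيد", "ADJ")],
]

def write_corpus(path: str, sentences) -> None:
    """Serialize sentences into YamCha's token-per-line format."""
    with open(path, "w", encoding="utf-8") as f:
        for sent in sentences:
            for word, tag in sent:
                f.write(f"{word}\t{tag}\n")
            f.write("\n")  # a blank line terminates a sentence

write_corpus("train.data", TRAIN)

# Train via the Makefile shipped with YamCha (assumes YamCha and its SVM
# backend are installed). FEATURE="F:-1..0:0" restricts the context to the
# surface form (column 0) of the previous word and the target word, with no
# dynamic T: features, i.e., the preceding POS tag is NOT used -- the
# setting the abstract reports as giving the lowest error rate (11.4%).
libexecdir = subprocess.check_output(
    ["yamcha-config", "--libexecdir"], text=True
).strip()
subprocess.run(
    ["make", "-f", f"{libexecdir}/Makefile",
     "CORPUS=train.data", "MODEL=pos", "FEATURE=F:-1..0:0", "train"],
    check=True,
)

# Tag held-out text (test.data, assumed to exist in the same format but
# without the gold tag column required only for scoring). YamCha appends
# the predicted tag as a new column, from which a token-level error rate
# can be computed against the gold annotation.
with open("test.data", encoding="utf-8") as test_in:
    subprocess.run(["yamcha", "-m", "pos.model"], stdin=test_in, check=True)
```

Varying the rows and columns in the FEATURE string (e.g., widening the window or adding T: features for previously predicted tags) reproduces the kind of sweep over "ranges of linguistic information" that the study reports.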
Description: Scopus
URI: http://hdl.handle.net/123456789/4868
DOI: 10.1109/ESOLEC54569.2022.10009473
Appears in Collections: Faculty of Language Studies and Human Development - Proceedings