SIRAT: a Real-Time Indexing Arabic Text Editor Based on the Extraction of Keywords
Date
2021-05-25
Publisher
University of Oum El Bouaghi
Abstract
The indexing stage of the information retrieval process is of great importance, as it directly affects recall and precision. Although many studies on indexing have been conducted over the last few decades, to our knowledge none has investigated whether real-time indexing based on keyword extraction is effective in improving recall and precision. Moreover, relatively few studies on Arabic text indexing are currently available, despite the considerable efforts made to meet the needs of the growing number of Arabic-speaking internet users. This paper proposes a method for Arabic text indexing based on keyword extraction.
The proposed method consists of two stages.
The first stage performs real-time indexing.
The second stage extracts keywords and updates the initial index according to the output of the keyword extraction process. We illustrate the application and performance of this indexing method using an Arabic text editor, SIRAT, designed and developed for this purpose.
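The abstract describes the two stages only at a high level. As a rough illustration, the Python sketch below assumes an inverted index that is updated as soon as the user commits a word (stage 1), and then uses plain term frequency as a stand-in keyword extractor whose output boosts the weights of the matching index entries (stage 2). The class `RealTimeIndex`, its methods, the tiny stopword list and the boost factor are all hypothetical; the abstract does not specify SIRAT's actual extraction technique, and a practical Arabic indexer would also need normalization and light stemming, which this sketch omits apart from stripping short-vowel diacritics.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny Arabic stopword list (illustration only).
STOPWORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه", "التي", "الذي"}
# Arabic short-vowel diacritics (U+064B FATHATAN .. U+0652 SUKUN), stripped before tokenizing.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652]")
# Python's Unicode-aware \w matches Arabic letters.
TOKEN_RE = re.compile(r"\w+")


class RealTimeIndex:
    """Hypothetical inverted index updated incrementally as text is committed in an editor."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> token positions in the document
        self.weights = {}                  # term -> weight, adjusted in stage 2
        self._next_pos = 0                 # running token position across fragments

    def on_text_committed(self, fragment: str) -> None:
        """Stage 1: index each token of a newly committed fragment immediately."""
        for token in TOKEN_RE.findall(DIACRITICS_RE.sub("", fragment)):
            if token not in STOPWORDS:
                self.postings[token].append(self._next_pos)
                self.weights.setdefault(token, 1.0)
            self._next_pos += 1  # stopwords still occupy a position

    def extract_keywords(self, k: int = 3) -> list:
        """Stage 2 stand-in: rank indexed terms by frequency (not necessarily SIRAT's method)."""
        freq = Counter({term: len(pos) for term, pos in self.postings.items()})
        return [term for term, _ in freq.most_common(k)]

    def update_with_keywords(self, keywords, boost: float = 2.0) -> None:
        """Stage 2: update the initial index by boosting the weights of extracted keywords."""
        for term in keywords:
            if term in self.weights:
                self.weights[term] *= boost


# Usage: simulate three fragments committed while typing.
idx = RealTimeIndex()
for chunk in ["البحث عن المعلومات ", "في النصوص العربية ", "البحث في المعلومات"]:
    idx.on_text_committed(chunk)
keywords = idx.extract_keywords(k=2)
print(keywords)  # ['البحث', 'المعلومات']
idx.update_with_keywords(keywords)
```

In this sketch, stage 1 does only amortized constant work per token (a list append and a dictionary lookup), which is what makes indexing while typing plausible; the costlier extraction step could, as an assumption, run less often, for example when the document is saved.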
We also describe how we built a new form of Arabic corpus suited to conducting the necessary experiments.
Our findings show that SIRAT successfully identifies the keywords most relevant to the document. The main contribution of this experiment is to demonstrate the effectiveness of this method compared with other methods. In addition, the paper proposes solutions to issues and deficiencies in Arabic language processing, especially regarding corpus building and systems for evaluating keyword extraction.
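Since the abstract frames evaluation in terms of recall and precision, one common way to score extracted keywords is to compare them against a reference (gold) keyword set. The sketch below assumes simple exact string matching on sets and adds F1 as the harmonic mean of the two; the function name `keyword_scores` is hypothetical, and the abstract does not state which evaluation protocol the paper actually uses.

```python
def keyword_scores(extracted, reference):
    """Hypothetical exact-match precision, recall and F1 of extracted keywords vs. a reference set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # keywords present in both sets
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    # When hits > 0, both precision and recall are positive, so the denominator is nonzero.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


p, r, f = keyword_scores(
    ["البحث", "المعلومات", "النصوص"],
    ["البحث", "المعلومات", "الفهرسة", "العربية"],
)
print(round(p, 3), r, round(f, 3))  # 0.667 0.5 0.571
```

Exact matching is strict for Arabic, where the same keyword can appear with different prefixes or inflections; a stemmed or normalized comparison would be a natural variation on this sketch.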
Keywords
NLP; Arabic text indexing; real-time indexing; Arabic keyword extraction; Arabic information retrieval system.