|Title||On the Usage of a Classical Arabic Corpus as a Language Resource: Related Research and Key Challenges|
|Publication Type||Journal Article|
|Year of Publication||In Press|
|Journal||Transactions on Asian and Low-Resource Language Information Processing|
This paper presents a literature review of computer science research applied to hadith, a kind of Arabic narration that appeared in the 7th century. We study and compare existing works in several fields: Natural Language Processing (NLP), Information Retrieval (IR) and Knowledge Extraction (KE). We then elicit their main drawbacks and identify some perspectives that the research community may consider. We also study the characteristics of this type of document by enumerating the advantages and limits of using hadith as a language resource. Moreover, our study shows that previous studies used different collections of hadiths, making it hard to compare their results objectively. Besides, many preprocessing steps are repeated across these applications, which wastes a lot of time. Consequently, we discuss the key issues for building generic language resources from hadiths, taking into account the relevance of the related literature and the wide community of researchers interested in them. The ultimate goal is to structure hadith books for multiple usages, thereby building common collections that can be exploited in future applications.