Arabic IR

Information retrieval (IR) research was traditionally carried out mainly on English, driven by the annual Text Retrieval Conferences (TREC) sponsored by NIST (the National Institute of Standards and Technology). NIST has accumulated large amounts of standard data (text collections, queries, and relevance judgments) so that IR researchers can compare their techniques on common data sets. More recently, IR researchers have taken a real interest in languages other than English. TREC now includes multilingual data, and other organizations sponsor similar annual evaluations for European languages (CLEF) and for Asian languages, namely Chinese, Japanese, and Korean (NTCIR). Arabic has been included in the TREC cross-lingual track and in the TDT (Topic Detection and Tracking) evaluations. The availability of standard Arabic data sets from NIST and the Linguistic Data Consortium (LDC) has in turn spurred rapid progress in information retrieval and other natural language processing for Arabic. Arabic is an interesting case for IR because it is a highly inflected language. In this sub-topic, we study several problems related to IR systems (lemmatization, morphological analysis, indexing), using the Hadith corpora as a knowledge base.

Hadith NER

Description

This resource includes a set of gazetteer lists useful for NER (Named Entity Recognition) and other Arabic text processing applications. It is composed of seven files, one for each of seven Arabic Named Entity (NE) types extracted from hadith books.

Al-Bukhari NER

Description

This resource includes a set of gazetteer lists useful for NER (Named Entity Recognition) and other Arabic text processing applications. It is composed of seven files, one for each of seven Arabic Named Entity (NE) types. Each file is tab-separated and provides the frequency of each NE extracted from the Sahih Al-Bukhari book.
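
As a rough illustration of how such a tab-separated gazetteer might be read, here is a minimal Python sketch. The column order (entity first, frequency second), the function name load_gazetteer, and the sample rows with their counts are assumptions made for this example only; they are not taken from the resource itself, so check the actual files before relying on this layout.

```python
import csv
import io
from collections import Counter


def load_gazetteer(lines):
    """Parse tab-separated rows into a Counter mapping entity -> frequency.

    Assumed layout (not confirmed by the resource): entity<TAB>frequency.
    Rows that do not match this layout are skipped.
    """
    counts = Counter()
    # QUOTE_NONE keeps any quote characters in the Arabic text untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line
        entity, freq = row[0].strip(), row[1].strip()
        if not entity or not freq.isdigit():
            continue  # header line or non-numeric frequency
        counts[entity] += int(freq)
    return counts


# Invented sample rows standing in for one gazetteer file.
sample = io.StringIO("مكة\t12\nالمدينة\t30\n")
gazetteer = load_gazetteer(sample)

# Entities ordered from most to least frequent.
ranked = [entity for entity, _ in gazetteer.most_common()]
```

To read a real file, pass an open file handle instead of the in-memory sample, for example open(path, encoding="utf-8", newline=""), where path points to one of the seven files.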

Selected references (Information retrieval)


This is a bibliographic list collected by Ibrahim Bounhas, covering information retrieval and related fields.