Information Retrieval (IR)
Information retrieval (IR) is the science of searching for relevant information, including documents, images, video, and other forms of media, across databases, unstructured data, and the World Wide Web. Today, information retrieval is best known through online web search engines, but it also has many other application domains, such as biomedicine, social networks, social science, genomics, and geography. In this topic, we focus our research on Contextual IR, Social IR, Temporal IR, Arabic IR, and testbeds for IR systems.
Information retrieval can benefit from contextual information to adapt the results to a user’s current situation and personal preferences (profile). For example, performing the same query in different contexts often leads to different result rankings. Semantics-based information retrieval is therefore especially challenging, because a change in context has an impact on the knowledge base content. In information retrieval, the relevance of a contextual aspect is determined by its impact on the query results. In this sub-topic, we study how to identify relevant contextual information and how it affects the performance of an information retrieval system. Our goal is to achieve high recall and precision for a given user in a specific situation.
Recently, the fields of information retrieval and social network analysis have together given rise to a new category of information retrieval systems, namely Social Information Retrieval (SIR) systems. These systems extend conventional IR to incorporate the social context of search and recommendation. A particular aspect of SIR is the modeling of social interactions between people (social ties), which is used to enhance recommendation systems. Our goal in this sub-topic is to see how information retrieval systems can be enriched by analyzing these social ties, particularly the distinction between strong and weak ties, which reflects the strength of the relationship between people.
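As a rough illustration of how tie strength could feed into ranking, the sketch below blends a topical score with a social score. Everything in it is an assumption made for the example: the function names `tie_strength` and `social_rerank`, the use of interaction counts as a proxy for tie strength, and the blending weight `alpha` are hypothetical and do not describe a specific SIR system.

```python
from typing import Dict, List, Tuple


def tie_strength(interactions: Dict[str, int]) -> Dict[str, float]:
    """Scale raw interaction counts to [0, 1]; the most frequent contact gets 1.0.

    Assumption: interaction frequency is used as a simple proxy for tie strength.
    """
    top = max(interactions.values(), default=0)
    if top == 0:
        return {person: 0.0 for person in interactions}
    return {person: count / top for person, count in interactions.items()}


def social_rerank(
    topical: Dict[str, float],
    endorsers: Dict[str, List[str]],
    ties: Dict[str, float],
    alpha: float = 0.3,
) -> List[Tuple[str, float]]:
    """Blend each document's topical score with its strongest endorsing tie.

    alpha is a hypothetical weight: 0 ignores social ties, 1 ignores topic.
    """
    ranked = []
    for doc, score in topical.items():
        people = endorsers.get(doc, [])
        # Social score: the strongest tie among the people who endorsed the document.
        social = max((ties.get(p, 0.0) for p in people), default=0.0)
        ranked.append((doc, (1 - alpha) * score + alpha * social))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With these assumptions, a document endorsed by a strong tie can overtake a slightly more topical document endorsed only by a weak tie, which is the kind of effect this sub-topic investigates.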
Traditional information retrieval research was carried out essentially in English and fueled by the annual Text Retrieval Conferences (TREC) sponsored by NIST (the National Institute of Standards and Technology). NIST has accumulated large amounts of standard data (text collections, queries, and relevance judgments) so that IR researchers can compare their techniques on common data sets. More recently, IR researchers have taken a real interest in languages other than English. TREC now includes multilingual data, and other organizations sponsor similar annual evaluations for European languages (CLEF) and Asian languages (NTCIR: Chinese, Japanese, and Korean). Arabic began to be included in the TREC cross-lingual track and in the TDT (Topic Detection and Tracking) evaluations. The availability of standard Arabic data sets from NIST and the Linguistic Data Consortium (LDC) has in turn spurred a rapid acceleration of progress in information retrieval and other natural language processing involving the Arabic language. Arabic is an interesting case to study in IR because it is a highly inflected language. In this sub-topic, we study several problems related to Arabic IR systems (lemmatization, morphological analysis, indexing), and we use the Hadith corpora as a knowledge base.
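To make the indexing problem more concrete, here is a minimal sketch of orthographic normalization, a preprocessing step often applied to Arabic text before indexing. It is not the lemmatization or morphological analysis studied in this sub-topic, and the function name `normalize_arabic` and the exact set of mappings are assumptions chosen for illustration.

```python
import re

# Diacritics (tashkeel, U+064B to U+0652) and the tatweel elongation mark (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Letter variants folded to one form so that spelling variants share an index term.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, then fold common letter variants."""
    return _DIACRITICS.sub("", text).translate(_LETTER_MAP)
```

Applying the same function to both documents and queries lets a vocalized word and its unvocalized spelling match at retrieval time. The mappings trade some precision for recall, so whether to fold a given letter pair is a design choice to validate on a test collection.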
Temporal Information Retrieval is an emerging research area within Information Retrieval. An enormous volume of information is now created, exchanged, and stored electronically on the Web, and much of this content is strongly time-dependent. Hence, an electronic resource can be identified not only by its content but also by temporal features, such as its creation or update date. These temporal features can be used to increase the precision of search in an information retrieval system. Classical IR techniques based on topic similarity alone are not sufficient for searching temporal document collections. To overcome this limitation, the temporal dimension available in electronic resources (such as documents) should be incorporated into document ranking. Our objective in this sub-topic is twofold: (i) identifying the temporal characteristics of documents; (ii) incorporating these characteristics into information retrieval techniques in order to improve the retrieval effectiveness of an information retrieval system.
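One simple way to incorporate a temporal feature into ranking is to blend the topical score with a recency score that decays with document age. The sketch below is an illustration under stated assumptions only: the names `recency` and `temporal_score`, the exponential half-life decay, and the default `weight` and `half_life_days` values are hypothetical choices, not the method used in this research.

```python
from datetime import date


def recency(doc_date: date, query_date: date, half_life_days: float = 365.0) -> float:
    """Return a score in (0, 1] that halves every half_life_days of document age."""
    # Documents dated after the query are treated as having age zero.
    age_days = max((query_date - doc_date).days, 0)
    return 0.5 ** (age_days / half_life_days)


def temporal_score(
    topical: float,
    doc_date: date,
    query_date: date,
    weight: float = 0.4,
    half_life_days: float = 365.0,
) -> float:
    """Linearly combine topical similarity with recency.

    weight is a hypothetical parameter: 0 ranks by topic alone, 1 by recency alone.
    """
    return (1 - weight) * topical + weight * recency(doc_date, query_date, half_life_days)
```

Under this scheme, two documents with the same topical score are ordered by date, while a much more topical document can still outrank a newer one. A recency preference is only one kind of temporal intent; queries about a specific past period would need a different temporal feature.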
Testbed for IR
A major difficulty in information retrieval (IR) is finding suitable test collections. The existing data sets (like those provided by TREC) have played a key role in promoting progress in the IR domain. However, given the significant growth and diversity of online content over the past few years, and the increasing rate of search queries, the current testbeds are either too small or not representative of real applications of IR systems. A testbed for evaluating information retrieval systems requires three parts: a document collection, a list of query topics, and a set of relevance judgments. Evaluating the performance of information retrieval algorithms in different contexts (distributed IR, P2P networks, language-oriented IR, IR for specific domains) is a challenging task because of the lack of realistic testbeds. In this sub-topic, we focus our research on testbeds for distributed systems (including P2P networks) and for the Arabic context.
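The three parts of a testbed map naturally onto a small data structure, and the relevance judgments are what make precision and recall computable. The following is a minimal sketch; the class name `IRTestbed`, its field names, and the `evaluate` helper are assumptions introduced for this example and do not correspond to any specific TREC tool or file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class IRTestbed:
    documents: Dict[str, str]   # document id -> document text
    topics: Dict[str, str]      # topic id -> query text
    qrels: Dict[str, Set[str]]  # topic id -> ids of documents judged relevant


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one topic."""
    unique = set(retrieved)  # ignore duplicate ids in the result list
    hits = len(unique & relevant)
    precision = hits / len(unique) if unique else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(testbed: IRTestbed, run: Dict[str, List[str]]) -> Dict[str, Tuple[float, float]]:
    """Score a system run (topic id -> retrieved doc ids) against the testbed."""
    return {
        topic: precision_recall(run.get(topic, []), testbed.qrels.get(topic, set()))
        for topic in testbed.topics
    }
```

Because the scores depend entirely on the relevance judgments, a collection that is too small or unrepresentative limits what such an evaluation can say about real deployments.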