Proposals

Hybrid indexing tool for Arabic information retrieval system

Research laboratory: 
Research Profile: Master
Supervisor(s)
Co-supervisor(s)

The goal of this project is to enhance an existing hybrid indexing tool so that it delivers better efficiency (speed, resource usage) and effectiveness (recall, precision, etc.). The JARIR group has been developing an Arabic information retrieval system (IRS) based on a hybrid index (Ben Guirat et al. 2016). The proposed approach builds a multilevel index whose hierarchical structure represents the semantic relations between the different word forms (root, verbal pattern and stem). Starting from the existing tool, this project aims to:

  1. Develop an efficient hybrid indexing tool that enhances the capacity of the IRS. A tool developed by our group will be the starting point.
  2. Integrate the proposed tool into the Terrier[1] IR platform using the BM25 model.
  3. Perform experiments on a large-scale corpus (Arabic newswire LDC test collection).
  4. Use the MADAMIRA[2] and Alkhalil[3] tools to add the lemma unit to the hybrid index.
  5. Interpret the results using performance and significance tests with TANAGRA[4].
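
To make the multilevel index idea concrete, here is a minimal sketch of a root -> pattern -> stem hierarchy with BM25 scoring at a chosen level. It is not the JARIR tool or Terrier code: the class `HybridIndex`, the transliterated analyses in `ANALYSES`, the three toy documents and the parameter values `k1=1.2`, `b=0.75` are all assumptions made for illustration, and the BM25 formula is a common textbook variant rather than Terrier's exact implementation.

```python
import math
from collections import defaultdict

# Hypothetical analyses (stem -> (root, pattern)), hand-written for this sketch.
# A real system would obtain them from a morphological analyzer.
ANALYSES = {
    "kAtib":   ("ktb", "fAEil"),    # writer
    "maktuwb": ("ktb", "mafEuwl"),  # written
    "kitAb":   ("ktb", "fiEAl"),    # book
    "dAris":   ("drs", "fAEil"),    # student
}


class HybridIndex:
    """Three-level index: root -> pattern -> stem -> {doc_id: term frequency}."""

    def __init__(self):
        self.tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
        self.doc_len = {}

    def add_document(self, doc_id, stems):
        self.doc_len[doc_id] = len(stems)
        for stem in stems:
            root, pattern = ANALYSES[stem]
            postings = self.tree[root][pattern][stem]
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def postings(self, root, pattern=None, stem=None):
        """Merge postings below a node; fewer arguments give a broader match."""
        merged = defaultdict(int)
        for pat, stems in self.tree.get(root, {}).items():
            if pattern is not None and pat != pattern:
                continue
            for st, docs in stems.items():
                if stem is not None and st != stem:
                    continue
                for doc_id, tf in docs.items():
                    merged[doc_id] += tf
        return dict(merged)

    def bm25(self, root, pattern=None, stem=None, k1=1.2, b=0.75):
        """Score documents for one query unit with a textbook BM25 variant."""
        postings = self.postings(root, pattern, stem)
        if not postings:
            return {}
        n_docs = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n_docs
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        scores = {}
        for doc_id, tf in postings.items():
            norm = k1 * (1 - b + b * self.doc_len[doc_id] / avg_len)
            scores[doc_id] = idf * tf * (k1 + 1) / (tf + norm)
        return scores


# Toy collection (hypothetical documents, already reduced to stems).
index = HybridIndex()
index.add_document("d1", ["kAtib", "kitAb"])
index.add_document("d2", ["maktuwb"])
index.add_document("d3", ["dAris", "kitAb"])

root_level = index.postings("ktb")                 # every form of the root
stem_level = index.postings("ktb", stem="kitAb")   # one stem only
root_scores = index.bm25("ktb")
```

Querying at the root level trades precision for recall, since it matches every stem derived from that root; querying at the stem level does the opposite. A lemma level, as planned in the aims above, could be added as one more layer of nesting.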

To apply, see the attached description.

[1] http://terrier.org/

[2] https://camel.abudhabi.nyu.edu/madamira/

[3] https://sourceforge.net/projects/alkhalil/

[4] http://eric.univ-lyon2.fr/~ricco/tanagra/fr/tanagra.html

File Attachment: 

Topics:

Tweet Credibility Assessment

Research laboratory: 
Research Profile: Master
Supervisor(s)
Co-supervisor(s)

Twitter has evolved from a basic social networking platform for personal chat into a news medium (Kwak et al. 2010). Through its trending topics feature, Twitter gives its users instant insight into events around the globe. However, since tweets are written by ordinary users and are not audited, their credibility is open to question. Because credibility is a major criterion of information quality, numerous solutions have been proposed to evaluate tweet credibility. Analyzing these solutions reveals two main shortcomings: the omission of quote tweets and the immature use of the retweet tree. In this project we intend to enhance existing propagation-based approaches and derive a property graph from the CredBank corpus (Mitra et al. 2015). In particular, this project aims to:

  1. Enhance propagation-based solutions for tweet credibility assessment by analyzing user credibility through both topical expertise and social affiliations.
  2. Use CredBank, an annotated corpus containing 60 million tweets along with their credibility labels.
  3. Apply measures such as eigenvector centrality and betweenness centrality to the property graph to determine their impact on, and relevance to, credibility assessment.
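
As an illustration of the two graph measures named above, the following sketch computes eigenvector centrality (by power iteration) and betweenness centrality (with Brandes' algorithm) on a tiny undirected interaction graph. The user names and edges are hypothetical and do not come from CredBank, and treating retweet or quote links as undirected, unweighted edges is a simplifying assumption; the project's actual property graph schema is not specified here.

```python
import math
from collections import defaultdict, deque

# Hypothetical interactions between users (for example, retweets or quotes).
EDGES = [
    ("alice", "bob"), ("bob", "carol"), ("bob", "dave"),
    ("carol", "dave"), ("dave", "erin"),
]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (user, user) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def eigenvector_centrality(adj, iterations=100):
    """Power iteration: a node is central if its neighbours are central."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs (unnormalized scores)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair of endpoints is visited twice.
    return {v: c / 2 for v, c in score.items()}


graph = build_adjacency(EDGES)
eig = eigenvector_centrality(graph)
btw = betweenness_centrality(graph)
```

In this toy graph, `bob` and `dave` lie on every shortest path between the other users, so they receive the highest betweenness, while `carol` has zero betweenness but a higher eigenvector score than `alice` because both of her neighbours are well connected. Whether such scores actually correlate with credibility labels is the question the project sets out to study.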

To apply, see the attached description.

File Attachment: