Description
This resource provides a set of gazetteer lists for Named Entity Recognition (NER) and other Arabic text processing applications. It consists of seven files, one for each of seven Arabic named entity (NE) types. Each file is tab-separated and gives the frequency of each NE extracted from the book Sahih Al-Bukhari.
Named Entity | Count |
Adjectives (نعوت) | 44 |
Famous names (أسماء الشهرة) | 1625 |
Full names (أسماء) | 1623 |
Nicknames (كنى) | 343 |
Origins (أنساب) | 412 |
Places (أماكن) | 220 |
Surnames (ألقاب) | 440 |
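As an illustration of how one of these tab-separated files might be read, here is a minimal Python sketch. It rests on several assumptions that the description does not confirm: that each row holds the entity in the first column and its frequency in the second, that the files are UTF-8 encoded, and that a file might be named something like `places.tsv` (a hypothetical name). The function `load_gazetteer` and the sample rows are invented for the example.

```python
import csv
import io
from typing import Dict, TextIO


def load_gazetteer(handle: TextIO) -> Dict[str, int]:
    """Read tab-separated rows into a {entity: frequency} dictionary.

    Assumed layout (not confirmed by the resource description):
    column 1 = named entity, column 2 = frequency.
    """
    frequencies: Dict[str, int] = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or short rows instead of guessing at their meaning.
        if len(row) < 2:
            continue
        entity, count = row[0].strip(), row[1].strip()
        # Skip rows whose second column is not a whole number (e.g. a header).
        if not entity or not count.isdecimal():
            continue
        # Sum the counts if the same entity appears on more than one row.
        frequencies[entity] = frequencies.get(entity, 0) + int(count)
    return frequencies


# Invented sample rows standing in for a real file; the counts are made up.
sample = io.StringIO("entity\tfrequency\nمكة\t12\nالمدينة\t30\n\nمكة\t3\n")
places = load_gazetteer(sample)
```

For a real file, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`. If the actual column order or encoding differs, the indices and the `encoding` argument would need to be adjusted.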
Licensing
The resource is freely available to the research community. It is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citing Al-Bukhari NER
If you use this resource, you are encouraged to cite:
I. Bounhas and Y. Slimani, “A social approach for semi-structured document modeling and analysis”, in Proceedings of the International Conference on Knowledge Management and Information Sharing (KMIS), Madeira, Portugal, October 6-8, 2009, pp. 95–102.
I. Bounhas, B. Elayeb, F. Evrard, and Y. Slimani, “Toward a computer study of the reliability of Arabic stories”, Journal of the American Society for Information Science and Technology (JASIST), vol. 61, no. 8, pp. 1686–1705, 2010.
I. Bounhas, “Construction et intégration d'ontologies pour la cartographie socio-sémantique de fonds documentaires arabes guidée par la fiabilité de l'information”, doctoral thesis, Université Tunis El Manar, Tunis, Tunisia, 2012.
I. Bounhas, B. Elayeb, F. Evrard, and Y. Slimani, “Information reliability evaluation: from Arabic storytelling to computer sciences”, ACM Journal on Computing and Cultural Heritage (JOCCH), vol. 8, no. 3, Article 14, 33 pages, 2015.
Feedback
For any inquiries or comments, contact Ibrahim Bounhas.
Access conditions:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.