Al-Bukhari NER


Data & Knowledge Bases


This resource provides a set of gazetteer lists for Named Entity Recognition (NER) and other Arabic text processing applications. It consists of seven files, one for each of seven Arabic named entity (NE) types. Each file is tab-separated and gives the frequency of each NE extracted from the book Sahih Al-Bukhari.

Named Entity	Count
Adjectives (نعوت)	44
Famous names (أسماء الشهرة)	1625
Full names (أسماء)	1623
Nicknames (كنى)	343
Origins (أنساب)	412
Places (أماكن)	220
Surnames (ألقاب)	440
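As an illustration of how one of these tab-separated files might be used, the sketch below parses a gazetteer into a dictionary and applies it as a simple lookup tagger. It rests on assumptions not stated by the resource: that each line has the form entity, tab, frequency, and that files are UTF-8 encoded. The function names, the file path argument, the label PLACE, and the sample rows are all hypothetical and serve only as a demonstration.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_gazetteer(lines: Iterable[str]) -> Dict[str, int]:
    """Parse lines of the ASSUMED form '<entity><TAB><frequency>'."""
    entries: Dict[str, int] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        entity, freq = row[0].strip(), row[1].strip()
        if not entity or not freq.isdigit():
            continue  # skip a header row or a non-numeric frequency
        # Sum frequencies if the same entity appears more than once.
        entries[entity] = entries.get(entity, 0) + int(freq)
    return entries


def load_gazetteer(path: str) -> Dict[str, int]:
    """Read a gazetteer file from disk (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_gazetteer(handle)


def tag_tokens(
    tokens: List[str], gazetteer: Dict[str, int], label: str
) -> List[Tuple[str, str]]:
    """Label each token found in the gazetteer; all others get 'O'."""
    return [(tok, label if tok in gazetteer else "O") for tok in tokens]


if __name__ == "__main__":
    # Made-up rows for demonstration only, not taken from the resource.
    sample = ["مكة\t12", "المدينة\t30"]
    places = parse_gazetteer(sample)
    print(tag_tokens(["خرج", "إلى", "مكة"], places, "PLACE"))
```

Exact-match lookup of single tokens is the simplest possible use; multi-word entries such as full names would need phrase matching, and Arabic text usually benefits from normalization (for example, of diacritics) before lookup.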


The resource is freely available to the research community. It is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citing Al-Bukhari NER

If you use this resource, you are encouraged to cite:

I. Bounhas and Y. Slimani, "A Social Approach for Semi-Structured Document Modeling and Analysis," in Proceedings of the International Conference on Knowledge Management and Information Sharing (KMIS), Madeira, Portugal, October 6-8, 2009, pp. 95–102.

I. Bounhas, B. Elayeb, F. Evrard, and Y. Slimani, "Toward a computer study of the reliability of Arabic stories," Journal of the American Society for Information Science and Technology (JASIST), vol. 61, no. 8, pp. 1686–1705, 2010.

I. Bounhas, "Construction et intégration d'ontologies pour la cartographie socio-sémantique de fonds documentaires arabes guidée par la fiabilité de l'information," doctoral thesis, Université Tunis El Manar, Tunis, Tunisia, 2012.

I. Bounhas, B. Elayeb, F. Evrard, and Y. Slimani, "Information reliability evaluation: from Arabic storytelling to computer sciences," ACM Journal on Computing and Cultural Heritage (JOCCH), vol. 8, no. 3, Article 14, 33 pages, 2015.


For any inquiries or comments, contact Ibrahim Bounhas.


Access conditions: