Dr. Zaghouani is an Assistant Professor in Digital Humanities within the Middle Eastern Studies department of the College of Humanities and Social Sciences (CHSS) at Hamad Bin Khalifa University (HBKU).
He received a Ph.D. in Natural Language Processing from the University of Paris Nanterre and an M.A. in Linguistics from the University of Montreal. His research interests span several areas of computational linguistics: Arabic Data Analytics, Linguistic Annotation, Language Resources and Evaluation, Fake News Detection, Sentiment Analysis, Lexical Semantics and Computational Morphology. Over the years, he has participated in multiple large-scale human language technology projects, such as the Penn Arabic TreeBank and PropBank, the Multi-Arabic Dialect Applications and Resources project (MADAR), and the Qatar Arabic Language Bank (QALB), at several universities and research institutions, including the University of Colorado Boulder, the Joint Research Center of the European Commission, the University of Montreal, Carnegie Mellon University and the University of Pennsylvania. Dr. Zaghouani has worked as a consultant for various international companies specializing in Big Data and Information Management, such as Nuance, OpenText, Nstein Technologies, Temis France and Lionbridge. He has co-organized several international conferences and workshops, including the Arabic Natural Language Processing workshops, the Social Media Analysis in the Arab World SocInfo 2019 workshop, the QICC Fake News Detection Contest, the CheckThat! Fact Checking CLEF Lab, the Arabic Author Profiling and Deception Detection FIRE 2019 Task, and the Association for Computational Linguistics (ACL) Conference. He has published more than 50 peer-reviewed journal and conference papers, which have been cited 912 times, and has an h-index of 17.
2015-2018. Information Systems Program, Carnegie Mellon University (Qatar)
2012-2015. Computer Science Program, Carnegie Mellon University (Qatar)
2006-2010. University of Pennsylvania, Philadelphia, USA
2009-2010. University of Colorado Boulder, Colorado, USA
2005-2006. The Language Technology Group of the Joint Research Center (JRC), Ispra, Italy
2002-2004. OpenText Inc., Montreal, Canada

2015. University of Paris Nanterre La Defense (Paris, France)
2009. University of Montreal (Montreal, Canada)
2002. (Minor in Computer Science, Major in Linguistics), University of Quebec in Montreal (Montreal, Canada)
1999. Language and Civilization, Université de Kairouan (Kairouan, Tunisia)

2018. ArapTweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification (LREC 2018, Miyazaki, Japan, 7-12 May 2018)
2018. A survey on author profiling, deception, and irony detection for the Arabic language. Language and Linguistics Compass, Vol. 12 (Issue 4). First published: 11 April 2018. https://doi.org/1
2018. Guidelines and Annotation Framework for Arabic Author Profiling (OSACT3 Workshop, LREC 2018, Miyazaki, Japan, 7-12 May 2018)
2018. The MADAR Arabic Dialect Corpus and Lexicon (LREC 2018, Miyazaki, Japan, 7-12 May 2018)
2018. Unified Guidelines and Resources for Arabic Dialect Orthography (LREC 2018, Miyazaki, Japan, 7-12 May 2018)
2018. MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction (LREC 2018, Miyazaki, Japan, 7-12 May 2018)
2018. ARLEX: A Large Scale Comprehensive Lexical Inventory for Modern Standard Arabic (OSACT3 Workshop, LREC 2018, Miyazaki, Japan, 7-12 May 2018)
2017. Language Technologies for Social Media. INFuture 2017, Zagreb, Croatia
2016. Using Ambiguity Detection to Streamline Linguistic Annotation. In Proceedings of the COLING Workshop "Computational Linguistics for Linguistic Complexity" (CL4LC), Osaka, Japan, December 2016
2016. Filtering Dialectal Arabic Text in Two Large Scale Annotation Projects. The 2nd Workshop on Noisy User-generated Text (W-NUT), December 11, 2016, Osaka, Japan
2016. AMPN: A Lexical Semantic Resource for Arabic Morphological Patterns. International Journal of Speech Technology (IJST).
2016. Normalizing Mathematical Expressions to Improve the Translation of Educational Content. In Proceedings of the AMTA 2016 Workshop on Semitic Machine Translation (SeMaT), collocated with the EMNLP 2016 Workshops, November 1, 2016, Austin, Texas, USA
2016. Toward an Arabic Punctuated Corpus: Annotation Guidelines and Evaluation. In Proceedings of the 2nd Workshop on Arabic Corpora and Processing Tools 2016, Theme: Social Media.
2016. Guidelines and Framework for a Large Scale Arabic Diacritized Corpus. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'2016).
2016. Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'2016).
2016. MANDIAC: A Web-based Annotation System For Manual Arabic Diacritization. In Proceedings of the 2nd Workshop on Arabic Corpora and Processing Tools 2016, Theme: Social Media
2015. Generating Acceptable Arabic Core Vocabularies and Symbols for AAC Users. In Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, Interspeech 2015, Dresden, Germany, 6-10 September 2015.
2015. Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus. In Proceedings of the 9th Linguistic Annotation Workshop, co-located with NAACL in Denver, Colorado, USA, 2015
2015. A Pilot Study on Arabic Multi-Genre Corpus Diacritization. In Proceedings of the ACL 2015 Workshop on Arabic Natural Language Processing (ANLP), Beijing, China, July 2015.
2015. SAHSOH@QALB-2015 Shared Task: A Rule-Based Correction Method of Common Arabic Native and Non-Native Speakers' Errors. The Second QALB Shared Task on Automatic Text Correction for Arabic. In Proceedings of the ACL 2015 Workshop on Arabic Natural Language Processing.
2015. The Second QALB Shared Task on Automatic Text Correction for Arabic. In Proceedings of the ACL 2015 Workshop on Arabic Natural Language Processing (ANLP), Beijing, China, July 2015
2014. The First QALB Shared Task on Automatic Text Correction for Arabic. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), Doha, Qatar, October 2014.
2014. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), Doha, Qatar, October 2014.
2014. Large-scale Arabic Error Annotation: Guidelines and Framework. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), Reykjavik, Iceland, 26-31 May 2014
2014. Can Crowdsourcing be used for Effective Annotation of Arabic? In Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), Reykjavik, Iceland, 26-31 May 2014
2014. Critical Survey of the Freely Available Arabic Corpora. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), OSACT Workshop, Reykjavik, Iceland, 26-31 May 2014
2013. A Web-based Annotation Framework For Large-Scale Text Correction. In Proceedings of IJCNLP'2013, Nagoya, Japan.
2013. A Lexical Semantic Resource for Quranic Morphological Patterns. The International Conference for the Development of Quranic Studies. http://www.quranicconferences.com/. Riyadh, KSA, 16-20 February 2013
2013. In Proceedings of CECTAL'13, Montreal, Canada, September 26, 2013
2013. Building a Lexical Semantic Resource for Arabic Morphological Patterns. Communications, Signal Processing, and their Applications (ICCSPA), 2013, pp. 1-6, 12-14 February 2013.
2012. RENAR: A Rule-Based Arabic Named Entity Recognition System. ACM Trans. Asian Lang. Inf. Process. 11(1): 2 (2012).
2012. A Pilot PropBank Annotation for Quranic Arabic. In Proceedings of the First Workshop on Computational Linguistics for Literature, NAACL-HLT 2012, Montreal, Canada
2012. Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement. In Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL-HLT 2012, Montreal, Canada
2011. Étude sur la composition des noms de personnes dans la langue arabe. In Proceedings of the 25th Journées de linguistique de Laval, 9-11 March 2011, Laval University, Québec, Canada.
2010. The Revised Arabic PropBank. In Proceedings of the 4th Linguistic Annotation Workshop, ACL, Uppsala, July 15-16, 2010
2010. Understanding the Quran: A New Grand Challenge for Computer Science and Artificial Intelligence. In Grand Challenges in Computing Research for 2010 and Beyond, part of the ACM-BCS Visions of Computer Science conference, 13-16 April 2010, Edinburgh University
2010. From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News. In Proceedings of LREC 2010, Valletta, Malta, May 17-23, 2010.
2010. Adapting a Resource-Light Highly Multilingual Named Entity Recognition System to Arabic. In Proceedings of LREC 2010, Valletta, Malta, May 17-23, 2010.
2010. L'intégration d'un outil de repérage d'entités nommées pour la langue arabe dans un système de veille. Session Démo, TALN 2010, Montréal, 19-23 juillet 2010
2008. A Pilot Arabic Propbank. LREC 2008, Marrakech, Morocco, May 28-30, 2008
2008. La campagne d'évaluation ARCADE II. In Chaudiron, S. & Choukri, K. (Eds.), L'évaluation des technologies de traitement de la langue (pp. 47-69). Paris: Hermes Science Publications, IC2 Cognition Collection. ISBN 978-2-7462-1992-2.
2006. Geocoding Multilingual Texts: Recognition, Disambiguation and Visualisation. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006), pp. 53-58. Genoa, Italy, 24-26 May 2006
2006. Evaluation of Multilingual Text Alignment Systems: The ARCADE II Project. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006). Genoa, Italy, 24-26 May 2006
2005. Multilingual Person Name Recognition and Transliteration. Journal CORELA - Cognition, Représentation, Langage. Numéros spéciaux, Le traitement lexicographique des noms propres. Available online at: http://edel.univ-poitiers.fr/corela/document.php?id=490.