Data Scientist at Middle Eastern Studies Center (ORSAM), Ankara, Turkey
Dr. Hala Mulki is a data scientist interested in computational linguistics within the socio-political domain. She obtained her PhD in Computer Engineering (NLP) from Selçuk University, Turkey in 2019.
Previously, she worked as a research and teaching assistant at Aleppo University, Syria. Her research interests include social media analysis, opinion mining, stance detection, abusive language and hate speech detection, and user behavior analysis and prediction.
Let-Mi: A Levantine Twitter Dataset for Misogynistic Language
Women in the Arab region, especially those who work in the digital media sector, are subjected to several types of online misogynistic speech, through which gender inequality, violence, mistreatment, and underestimation of women are reinforced and justified. Automatic detection of misogynistic language can help curb toxic anti-women content, enabling female journalists to express their opinions freely and to do their jobs normally and safely. The development of Arabic misogyny detection systems, however, has been hindered by the lack of Arabic resources annotated for misogynistic speech. In this study, we introduce the first Levantine Twitter dataset for Misogynistic language (LeT-Mi), built from misogynistic content targeting 8 Lebanese female journalists during the October 2019 protests in Lebanon. The dataset was annotated for seven misogyny categories, introducing a new category inspired by Arabic culture and, in particular, the Lebanese social and political context. The annotation guidelines, which were provided to 3 annotators (male and female native speakers of Levantine Arabic), took into account the abusive and misogynistic key terms and phrases that emerged and were used during the protests. Exploratory analysis of the annotated data indicated that female journalists are usually attacked merely for their gender identity, regardless of the news or opinions they post or the TV channel they represent. The annotations were also analyzed for reliability using inter-annotator agreement metrics, and conflicts among annotators were investigated with respect to annotator gender. With Krippendorff’s alpha (α) and Cohen’s Kappa (k) both found equal to 82.9%, the annotations can be regarded as consistent, and the proposed dataset can therefore be considered reliable for misogyny detection in Levantine textual content.
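For readers unfamiliar with the agreement metrics mentioned in the abstract, the following is a minimal sketch of how Cohen's Kappa can be computed for one pair of annotators. It is not the authors' code: the function name `cohen_kappa`, the label names, and the toy annotations are all hypothetical and chosen only for illustration. Cohen's Kappa is defined for two raters, so with three annotators one would presumably compute it per pair or use a multi-rater statistic; the abstract does not specify which was done.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy, made-up annotations (placeholder label names, not the dataset's labels).
annotator_1 = ["none", "category_a", "none", "category_b", "none", "category_a"]
annotator_2 = ["none", "category_a", "none", "none", "none", "category_a"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's Kappa: {kappa:.3f}")
```

In this toy example the two annotators agree on 5 of 6 items, and the chance agreement is 16/36, which yields a Kappa of 0.7. Values closer to 1 indicate stronger agreement beyond chance.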