As of July 26, 2020, Qatar had the second-highest number of cases of novel coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), in the Middle East after Saudi Arabia, with more than 109,000 people affected. The first case of the novel coronavirus in Qatar was reported on February 27, and the Qatari government’s first measure was to evacuate all Qatari nationals from Iran. Then, as an extension of its precautionary steps, Qatar introduced temporary travel restrictions on several international destinations in mid-March 2020. In the meantime, considering the severity of COVID-19, the World Health Organization (WHO) declared the COVID-19 outbreak a pandemic on March 11, 2020.
COVID-19 Viral Strains
Currently, no antiviral drug has shown confirmed clinical efficacy against COVID-19, and no vaccine is available for its prevention (though multiple vaccines are at different stages of clinical trials). These efforts are hindered by our limited knowledge of the molecular details of the strains of the novel coronavirus and of the evolutionary relationships among them. To fill this gap, laboratories across the world have produced a huge number of sequences from viral strains. As of July 2020, more than 75,000 high-quality complete sequences of the novel coronavirus (SARS-CoV-2) have been identified, encompassing more than two billion sequence bases across multiple strains. To understand the evolution and spread of the virus across different countries from this huge data source, our research leveraged machine learning techniques to identify representative novel coronavirus strains.
For this purpose, we analyzed SARS-CoV-2 strains from nearly 150 locations across the world, including multiple locations in China. Traditionally, alignment-based methods are used to understand the phylogenetic relationships of viral strains (that is, to infer the evolutionary history and relationships among or within groups of organisms). However, calculating phylogenetic relationships this way from genomic sequences is computationally expensive and memory-intensive. Consequently, alignment-free methods are gaining attention in the scientific community, both for comparing viral sequences and for constructing phylogenetic relationships. As part of this study, we carried out an alignment-free phylogenetic analysis to uncover the evolutionary relationships among the SARS-CoV-2 strains from these locations. We used machine learning techniques to identify the most representative strain at each location. Based on our computational workflow, we identified a single representative strain from each location and built a phylogenetic tree to discover their evolutionary relationships (Figure 1).
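The article does not specify which alignment-free method or machine learning technique was used, so the following is only a minimal sketch of one common approach, not the study's actual workflow. It assumes each strain is summarized by its normalized k-mer frequencies, and that the "representative" strain of a location is the medoid, meaning the strain with the smallest total distance to all the others. The function names (`kmer_profile`, `euclidean`, `representative`) and the choice of `k=3` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies for a nucleotide sequence.

    No alignment is needed: each sequence becomes a fixed-length vector
    with one entry per possible k-mer over the alphabet A, C, G, T.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid  # skip k-mers with ambiguous bases (e.g. N)
    )
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def representative(strains, k=3):
    """Pick the medoid of a {name: sequence} mapping (an assumed criterion)."""
    profiles = {name: kmer_profile(seq, k) for name, seq in strains.items()}
    return min(
        profiles,
        key=lambda n: sum(euclidean(profiles[n], profiles[m]) for m in profiles),
    )
```

The same pairwise distances could then be passed to a distance-based tree-building method to relate the representatives of different locations. Real genomes of roughly 30,000 bases would typically call for a larger `k` than the toy value used here.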
Findings in Qatar
Our analysis revealed that the representative strain in Qatar is very similar to strains found in Guangdong (China), the Philippines, and India. Although these countries do not share a geographical border with Qatar, many expatriates from China, the Philippines, and India reside in Qatar. Though Qatar’s initial case was reported from Iran, subsequent cases might have originated from these countries. Our estimated phylogenetic tree also places the representative strain of Qatar very close to those of England, Hong Kong, and Wales. Interestingly, strains from other Gulf countries (Saudi Arabia, United Arab Emirates, Kuwait, Oman) fell under a single clade of the phylogenetic tree, but that clade was quite far from the one containing the representative strain of Qatar. This is understandable: these Gulf countries share borders and their residents interact frequently in day-to-day activities, whereas travel between Qatar and many of these countries has been restricted because of the diplomatic rift.
In summary, we combined machine learning techniques with alignment-free methods to identify the most representative strain from multiple locations across the world, including Qatar. The phylogenetic analysis reveals intriguing relationships among strains across the world, including those from the Gulf countries. Further analysis of the selected strains is warranted in the near future.