Visit the Data Analytics site for more information about the team's work. Some of our more recent projects and demos include:

NADEEF

NADEEF (which means "clean" in Arabic) is an extensible, general-purpose data cleaning system. Released as open source, NADEEF lets users implement their own data repair algorithms to replace its default repair implementation.
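
To illustrate the idea of a replaceable repair step, here is a minimal sketch in Python. It is not NADEEF's actual API: the names `clean`, `majority_repair`, and `longest_repair`, the sample rows, and the choice of a majority vote as the default are all assumptions made for this example. The cleaner enforces a simple rule (one column determines another) and delegates the choice of the repaired value to whichever function the caller passes in.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Row = Dict[str, str]
# A repair strategy receives the conflicting values of one group
# and returns the single value that should replace them.
RepairFn = Callable[[List[str]], str]


def majority_repair(values: List[str]) -> str:
    """Hypothetical default strategy: keep the most frequent value."""
    return Counter(values).most_common(1)[0][0]


def longest_repair(values: List[str]) -> str:
    """Hypothetical user-supplied strategy: keep the longest value."""
    return max(values, key=len)


def clean(rows: List[Row], key: str, dependent: str,
          repair: RepairFn = majority_repair) -> List[Row]:
    """Enforce the rule key -> dependent, repairing conflicting groups."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[dependent])
    # Only groups holding more than one distinct value violate the rule.
    fixed = {k: repair(v) for k, v in groups.items() if len(set(v)) > 1}
    return [{**row, dependent: fixed.get(row[key], row[dependent])}
            for row in rows]


rows = [
    {"zip": "10001", "city": "NY"},
    {"zip": "10001", "city": "NY"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
]

default_result = clean(rows, "zip", "city")
custom_result = clean(rows, "zip", "city", repair=longest_repair)
```

With the default strategy, the three rows for zip 10001 are all set to "NY"; with the user-supplied strategy they are set to "New York". The detection logic is unchanged in both cases, and only the repair function is swapped.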

Rayyan

The Systematic Reviews Web App. Rayyan aims to build tools to support the process of creating, analyzing, and maintaining systematic reviews, in terms of data extraction, cleaning, integration, and mining of published clinical trials and journal articles. A production system is available here.

KATARA

KATARA aims to perform trusted data cleaning by using reliable knowledge bases, augmented with crowdsourcing for validation.

Analytics on Data Anomalies

Oftentimes users face errors in the results of a query. We introduce DBRx, a system for discovering concise explanations of data anomalies.

Web Data Integration

Web data is a great opportunity, but using it in analytics requires new solutions to overcome its variety and volatility. In this project we exploit web data for data integration tasks.

World Bank's Auto Geotagger

The World Bank's Auto Geotagger is a prototype that automatically identifies locations in documents from the World Bank Projects Data API using the Stanford Named Entity Recognizer (NER) and Alchemy, geocodes them with the Google Geocoder, Yahoo! Placefinder, and Geonames, and visualizes them on a map.

Tamr

Tamr, a start-up founded in 2013, is based on technology developed at QCRI that allows for scalable data curation and integration by deduplicating the resulting composite dataset.