Visit the Data Analytics site for more information about the team's work. Some of our more recent projects and demos include:
- NADEEF (which means "clean" in Arabic) is an extensible, generalized data cleaning system. Released as open source, NADEEF lets users implement their own data repair algorithms to replace the default NADEEF repair implementation.
- Research paper (SIGMOD)
- Dashboard (via GitHub)
- Rayyan: The Systematic Reviews Web App. Rayyan aims to build tools to support the process of creating, analyzing, and maintaining systematic reviews, in terms of data extraction, cleaning, integration, and mining of published clinical trials and journal articles. A production system is available here.
- KATARA aims to perform trusted data cleaning by using reliable knowledge bases, augmented with crowdsourcing for validation.
- Analytics on Data Anomalies. Oftentimes users face errors in the results of a query. We introduce DBRx, a system for discovering concise explanations of data anomalies.
- Web Data Integration. Web data is a great opportunity, but using it in analytics requires new solutions to overcome its variety and volatility. In this project we exploit web data for data integration tasks.
- The World Bank's Auto Geotagger is a prototype that automatically identifies locations in documents from the World Bank Projects Data API using the Stanford Named Entity Recognizer (NER) and Alchemy, geocodes them with the Google Geocoder, Yahoo! Placefinder, and Geonames, and visualizes them on a map.
- Tamr, a start-up founded in 2013, is based on technology developed at QCRI that allows for scalable data curation and integration by deduplicating the resulting composite dataset.