In the last 20 years, geospatial data (extracted from GPS traces, geo-tagged social media, weather maps, natural disasters, satellite imagery, and epidemic situations) has become ubiquitous. This has led to the rise of spatial data science, a field concerned with extracting meaningful information from geospatial data. However, the lack of scalability and interactivity in state-of-the-art spatial data systems makes it extremely difficult for a data scientist to store, retrieve, explore, analyze, visualize, and learn from large-scale geospatial data.
This webinar will introduce GeoSpark, an open-source data system that builds on the core engine of Apache Spark to efficiently process large-scale geospatial data in a cluster computing environment.
Internally, GeoSpark represents geospatial data as a SpatialRDD, which is tailored to Apache Spark's in-memory data processing paradigm. GeoSpark allows users to write their spatial data processing tasks in Spatial SQL, compiles the input SQL into a set of optimized SpatialRDD operations, and finally executes those operations in the cluster.
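To make the idea of a partitioned spatial operation more concrete, here is a minimal pure-Python sketch of a rectangular range query over grid-partitioned points. It does not use GeoSpark or Apache Spark and is not GeoSpark's actual implementation; the function names `grid_partition` and `range_query`, the uniform grid, and the cell size are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import floor


def grid_partition(points, cell_size):
    """Group (x, y) points by the uniform grid cell that contains them."""
    partitions = defaultdict(list)
    for x, y in points:
        cell = (floor(x / cell_size), floor(y / cell_size))
        partitions[cell].append((x, y))
    return partitions


def range_query(partitions, cell_size, min_x, min_y, max_x, max_y):
    """Return points inside the closed query rectangle.

    Only the grid cells that overlap the rectangle are scanned, which is
    the pruning step a partitioned spatial system relies on.
    """
    results = []
    for cx in range(floor(min_x / cell_size), floor(max_x / cell_size) + 1):
        for cy in range(floor(min_y / cell_size), floor(max_y / cell_size) + 1):
            for x, y in partitions.get((cx, cy), []):
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    results.append((x, y))
    return results


# Hypothetical sample data: four points partitioned into 5x5 cells.
points = [(1.0, 1.0), (2.5, 3.5), (7.0, 8.0), (9.5, 0.5)]
partitions = grid_partition(points, cell_size=5.0)
hits = range_query(partitions, 5.0, 0.0, 0.0, 5.0, 5.0)
```

In a cluster setting, each grid cell would roughly correspond to a partition processed by a different worker, so a query touches only the partitions whose cells overlap the query window.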
Mohamed Sarwat, assistant professor at Arizona State University, will give an overview of Hippo, a lightweight indexing scheme that outperforms de facto database indexes such as the B-tree and R-tree in terms of storage and maintenance overhead, while still executing range queries with performance comparable to those indexes.
Furthermore, a data scientist may sometimes accept a slight trade-off between the accuracy and the scalability of the analysis. To support such a trade-off, Sarwat will present a sampling middleware system called Tabula, which sits between the data system and the data science tool to make the inherently iterative, human-in-the-loop analysis process more seamless and interactive.
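The accuracy versus scalability trade-off can be illustrated with a minimal Python sketch of uniform random sampling. This is a generic illustration only and is not Tabula's algorithm; the function name `sample_mean`, the synthetic data, and the 1% sampling fraction are assumptions made for the example.

```python
import random
import statistics


def sample_mean(values, fraction, seed=0):
    """Estimate the mean from a uniform random sample of the given fraction."""
    rng = random.Random(seed)
    k = max(1, int(len(values) * fraction))
    return statistics.fmean(rng.sample(values, k))


# Hypothetical data: 100,000 synthetic measurements between 0 and 100.
data_rng = random.Random(42)
values = [data_rng.uniform(0.0, 100.0) for _ in range(100_000)]

exact = statistics.fmean(values)             # scans every value
approx = sample_mean(values, fraction=0.01)  # scans about 1% of the values
relative_error = abs(approx - exact) / exact
```

The sampled estimate reads far fewer values than the exact computation, at the cost of a small error that shrinks as the sampling fraction grows.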
Mohamed Sarwat, Assistant Professor of Computer Science, Arizona State University
Mohamed Sarwat is an assistant professor of computer science at Arizona State University. His general research interest lies in developing robust and scalable data systems. In addition to his scientific publications, Sarwat is a co-architect of several software artifacts.