Researchers Identify Coverage Disparities in Spatial Data

Nov 16, 2022 | By Thomas Hipchen

Location-based games, mapping and navigation, autonomous vehicles, and other spatially based intelligent technologies are used by millions of people each day. However, the crowdsourced data on which these systems rely may have gaps that could affect their safety.

As part of the CASMI research project “Towards Contextualized Road Safety Conditions,” principal investigator Jacob Thebault-Spieker leads a team that is developing an intelligent system capable of predicting the safety conditions of roads in real time.

“We are finding, somewhat expectedly, that some of the underlying data we were intending to rely on has incomplete coverage, and that data about some types of contextual and environmental factors that relate to road safety is unavailable in many places,” said Thebault-Spieker.

An assistant professor in the Information School at the University of Wisconsin – Madison, Thebault-Spieker focuses his research on human-computer interaction (HCI) and social computing. The road conditions project is well suited to exploring how humans interact with these systems, as well as the benefits and drawbacks of crowdsourced data.

While crowdsourcing can provide scale for data gathering, the researchers found disparities in the data available for urban and rural localities. Both overrepresentation and underrepresentation have significant drawbacks, and these issues are compounded by how quickly heavily localized data, such as road or sign visibility, becomes outdated.

Yaxuan Yin, a PhD student and researcher on the project, spoke to the consequences of disparate representation in data. “When certain perspectives are overrepresented in data, this can serve as a form of ‘data stereotyping’. Algorithms might rely on the kinds of safety information that happens in urban areas, and ignore the kinds of things that could happen in rural areas (e.g. deer don’t often jump out in front of cars in Manhattan).”

These findings have wide-reaching implications for the safety of all intelligent systems, especially those built on spatial and crowdsourced data, including emergency response and content moderation. “Bad” or incomplete data in location and navigation systems is a critical issue because its consequences are immediate and physical. Guiding individuals through physical space with faulty or inaccurate information could put them directly in harm’s way in the most literal sense.

Learn more about the “Towards Contextualized Road Safety Conditions” project and our additional ongoing research and findings on our research page.