The volume of data that is created, captured, copied, and consumed continues to grow exponentially, from 2 zettabytes in 2010 to an expected 181 zettabytes in 2025, according to some estimates. Other features, such as the velocity of generation and the variety of the data and its sources, are growing along with the volume, as is the sophistication of the techniques used to generate, store, manage, and analyze this data. The result is what has been termed “Big Data” (Gandomi and Haider, 2015).
The data phenomenon has an eminently urban nature. First, a large part of economic and social activity happens in cities, so to the extent that this activity is recorded by companies, organizations, and governments through their usual data collection processes, the resulting data will refer to urban events. Second, the combined development of four technologies (smartphones, sensors, geolocation, and internet connectivity) has opened avenues for generating novel data about urban phenomena. For example, Sentilo, Barcelona’s sensor platform, generates 3 million records a day measuring energy, noise, rubbish, weather, parking areas, air quality, water levels, and flows of bicycles, people, and vehicles (Barcelona Balance del Plan Digital 2015-2019), things that went unrecorded only a few years ago. Likewise, phone usage, mobility, and digital and physical activity can now be mapped to specific locations, generating vast amounts of data about human activities and their connection to urban space.
This has led some authors to state that Big Data is, by its intrinsic nature, an urban phenomenon (Bannister and O’Sullivan, 2021). That belief, accurate or not, can explain the interest of urban scholars in exploring two questions: (1) how can data advance our understanding of the city, and (2) how can data improve the way in which cities are run?
The Urban Studies Journal recently dedicated a special issue to “Big Data in the City” to explore the first question. I would like to highlight five examples referenced in that special issue that touch upon some of the key urban dynamics (urbanization patterns, mobility, built environment, housing, and policing):
· Combining quantitative and qualitative data on people’s everyday movements and the decision-making behind them, derived from volunteered geographic information from smartphones, Howe (2021) demonstrates the role that everyday movements play in driving urbanization processes.
· Ellison et al. (2021) combine data on police deployment patterns and unstructured textual incident narratives with traditional administrative data from emergency calls, and then apply artificial intelligence and multilevel modeling techniques to determine the effectiveness, efficiency, and fairness of urban policing.
· Candipan et al. (2021) use geotagged tweets to build a segregated mobility index, which captures the extent of connection between neighborhoods of different racial composition, and from it derive a measure of segregated urban neighborhood networks.
· Wang and Vermeulen (2021) process images captured by Google Street View with machine learning and computer vision algorithms to analyze the built environment (specifically walking-related and mixed-use land infrastructures) and its influence on neighborhood vitality, measured through the survival rate of neighborhood-based social organizations in Amsterdam.
· Finally, Harten et al. (2021) use classified advertisements scraped from the internet to analyze Shanghai’s informal housing market.
This last paper shows how the data used by the researchers misrepresented the market, illustrating both the possibilities and the limitations of new sources of data. This is particularly relevant for governments, which have become increasingly obsessed with emerging technologies and sophisticated data applications, often before investing in the most basic building blocks and uses. Governments’ uses of data will be the topic of the second part of this post.