Better data, better cities
22 February, 2016
This is part of an interview series with Antti Jogi Poikola – an expert in digital collaboration and urban development.
Joakim Breitenstein (JB): In the first post of this series we discussed how to create a more dynamic feedback system. The second post was about how to improve the outcomes of dialogue by changing how we discuss the subject in the first place. This final post will focus on the possibilities and challenges of personal data. Antti, how do we generate personal data and how can it be utilized for urban planning purposes?
Antti Jogi Poikola (AJP): We leave a lot of digital footprints behind us. Records of our behavior and location are stored when we shop for groceries, travel with public transportation, visit the doctor or use more or less any personalized web service on our computers or app on our smartphones. Location and communication data is stored even when we use an old mobile phone without GPS. A lot of data is collected for different purposes but the most relevant information for urban planners is where a person is and for what reason.
JB: Is identity and its relation to data relevant from the point of view of urban planning?
AJP: In urban planning, it is often more than enough to produce anonymous mass analyses where the identity of the analyzed individuals is not known or relevant. For example, public transport can be optimized based on big data about passengers who leave digital traces about when and where they use their travel card. But even with these anonymous mass analyses there are challenges with privacy because it is possible to identify individuals by cross-analyzing the data. An example is from New York where the city’s taxi data was opened to the public. This produced a lot of interesting analyses but also controversy. Even if you don’t know the identity of the person behind the data, you can draw conclusions about it by combining the seemingly anonymous data with some other information. In New York this resulted in celebrities being stalked, which was obviously not the desired outcome of the project.
It is possible to aggregate data by, for instance, grouping all residents in one area together. This makes it difficult to recognize individuals and helps in solving the privacy issue, but on the other hand the data becomes less valuable from the point of view of analytics. Let’s take the public transport example again. It is more useful if we understand in detail how, when and why individuals move from one place to another than if we just know the aggregated numbers of how many people use public transport at which station. Precise data helps to optimize service and traffic networks and ultimately helps to create better cities for all of us. When it comes to urban planning, precise data is better than imprecise data, but then again precise data is a bigger risk for privacy than imprecise data.
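The trade-off described above can be illustrated with a minimal Python sketch. It assumes hypothetical trip records (the station names, traveller ids and the `min_group_size` threshold are all invented for illustration) and shows one common way to aggregate: count travellers per station and suppress counts for groups too small to hide an individual.

```python
# Hypothetical trip records: (traveller_id, boarding_station). Not real data.
TRIPS = [
    ("a", "Central"), ("b", "Central"), ("c", "Central"),
    ("d", "Harbour"), ("e", "Harbour"),
    ("f", "Hilltop"),
]


def boardings_per_station(trips, min_group_size=3):
    """Count distinct travellers per station and suppress small groups.

    Stations with fewer than `min_group_size` travellers are reported as
    None, because a very small group can point to individuals.
    """
    travellers = {}
    for traveller_id, station in trips:
        travellers.setdefault(station, set()).add(traveller_id)
    return {
        station: (len(ids) if len(ids) >= min_group_size else None)
        for station, ids in travellers.items()
    }


result = boardings_per_station(TRIPS)
```

Raising the threshold protects privacy better but hides more stations from the analysis, which is the loss of analytical value mentioned above.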
JB: Regarding privacy and the question of how to get value from our data, you are working on a project called MyData. The aim is to give people practical means to benefit from their own data, but what exactly is the MyData project?
AJP: We have laws that regulate how to use private data and, regardless of how valuable the data is from the point of view of analytics, these laws restrict distribution of this data. MyData aims to give people the power to decide how their own data is utilized, whether it is used by someone else or by themselves. Let’s take an example from our everyday lives. In Finland, it is common that people have bonus cards for specific supermarket chains. The idea is to collect bonus points by swiping the card every time you shop in that specific supermarket. The system collects a lot of information about your shopping habits. MyData aims to give you the power to decide how this data is used. You could, for instance, allow a personal finance application to have access to this data, and by combining data about your shopping habits with your financial data, it would be possible to generate analyses that give concrete suggestions on how to improve your spending habits. My own research is focused on urban development where, as in all industries, there is potential for a variety of new services that would make our lives easier. The creation of these services is easier if we as individuals can decide how our data is used.
JB: What challenges are there in implementing this new kind of access to our data?
AJP: If we see the data collector as trustworthy, we are usually willing to give our data for improvement purposes. In Finland, public organizations are often seen as more trustworthy than private corporations and a doctor may seem more trustworthy than an advertising agency. If we trust the data collector, the next step is to make it easy for people to share their data. It should not be more complicated than pushing a button for people to really participate in these kinds of data collection efforts. Finally, there is always a trade-off and we want something in return for our data. Examples of this are when we give our data to medical research or when a group of cyclists give their data for urban planning purposes. In return we get better medicine or better cycling routes. The return may be cold cash or a discount on a service, but it does not have to be money and it is often enough if the purpose is something that people relate to.
JB: How does MyData compare to services that are already based on collecting data from their users, Facebook for instance?
AJP: The basic principle of MyData is that different chunks of data are created for different specific purposes. Supermarkets collect data about your shopping habits, banks collect data about your finances, bus companies collect data about your commuting habits and so on. As a starting point, each chunk of data is always accessible separately, as its own entity. The idea is that we ourselves would have the power to combine and distribute these different chunks of our personal data. Facebook is not MyData because we do not have the power to decide how our Facebook data is used and we cannot export our own data from the service.
JB: Does MyData aim to become a standard for how to collect and distribute personal data?
AJP: The EU’s newly reformed data protection rules will come into force by 2018. The rules will set a minimum standard on how our personal data will become accessible to ourselves. In practice this means that if we request our data from someone who has collected it, Facebook for instance, then Facebook is obliged to provide us with the data. The challenge is how to make it work smoothly, how to create a system that makes it easy for people to request and process their data and also easy for corporations to give out the data. MyData aims to make this exchange of data easy for all parties. Companies have to decide whether or not they see the data exchange as valuable for themselves. Many large corporations in Finland are involved in the MyData project and see it as an opportunity to create even more value for their customers. Therefore these corporations are willing to provide more than just the minimum standard set by the EU data protection rules. But there are also companies whose business is based on only themselves owning the data, and these companies will likely just provide the minimum standard for data exchange.
JB: Why should we not use personal data in urban planning?
AJP: There are certainly challenges in utilizing personal data for planning purposes. Firstly, planners need to have skills in data analytics in order to find valid correlations that inform the planning process. This is not easy and there are many ways to go wrong. In addition, planners must know how to evaluate whether the available data is comprehensive enough for the planning issue in question. Here the important thing is to know how to validate data. This means that a planner must go through certain check routines before he or she knows if the available data is good for the purpose to start with. Bad data produces bad conclusions, and without validation the risk of bad conclusions increases. This is also known as “garbage in, garbage out”.
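As a rough illustration of what such check routines might look like, here is a minimal Python sketch. The travel-card records and their field names (`card_id`, `station`, `time`) are hypothetical, and the three checks shown (missing fields, unparseable timestamps, duplicates) are only examples of the kind of validation a planner might run, not a complete routine.

```python
from datetime import datetime

# Hypothetical travel-card records; the field names are assumptions.
RECORDS = [
    {"card_id": "c1", "station": "Central", "time": "2016-02-01T08:05:00"},
    {"card_id": "c2", "station": "", "time": "2016-02-01T08:07:00"},
    {"card_id": "c3", "station": "Harbour", "time": "not-a-time"},
    {"card_id": "c1", "station": "Central", "time": "2016-02-01T08:05:00"},
]


def validate(records, required=("card_id", "station", "time")):
    """Return a list of (row_index, problem) pairs found in the records."""
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        # Check 1: every required field must be present and non-empty.
        for field in required:
            if not rec.get(field):
                problems.append((i, f"missing {field}"))
        # Check 2: a timestamp, if present, must parse in the expected format.
        time_value = rec.get("time")
        if time_value:
            try:
                datetime.strptime(time_value, "%Y-%m-%dT%H:%M:%S")
            except ValueError:
                problems.append((i, "unparseable time"))
        # Check 3: the same record must not appear twice.
        key = tuple(rec.get(field) for field in required)
        if key in seen:
            problems.append((i, "duplicate record"))
        seen.add(key)
    return problems


problems = validate(RECORDS)
```

Running checks like these before any analysis makes it visible how much of the data is unusable, which is the first step in judging whether the remainder is comprehensive enough.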
Another thing is the privacy issue that we already discussed on some level. In planning, the idea behind open data is to enable everyone to have access to the same planning data. Let’s take an imaginary example where a planning firm predicts, with the help of data analysis, that the traffic in an area will increase in the future. The planning firm makes the analysis and publishes the results in a PDF report which states “Based on our analysis, we recommend building a bridge here”. Then the city’s urban planning council decides that a bridge will be built in the area. The problem is that data analysis is always sensitive to assumptions. The conclusion that we should build a bridge is based on the planning firm’s assumptions. We, as active citizens, cannot check up on the issue because we don’t have access to the planning firm’s data and therefore cannot examine the assumptions behind the analysis. The answer to this would be to make all data open and accessible for anyone. However, the problem with completely open data is privacy, as we already pointed out. If personal data is involved, the data cannot be completely open and therefore completely open processes are not always possible.
JB: What can we expect from MyData in the context of urban planning and architecture?
AJP: In urban planning, MyData is closely linked to traffic-related issues and an example is a new concept called Mobility as a Service or MaaS. The idea is to offer a holistic traffic service where the customer pays a monthly fee and is then allowed to use various transport services such as bicycles, public transport, taxis and rental cars as a bundled package. The customer wouldn’t need to own anything like a public transportation card, carsharing membership or a private car. Instead the customer would pay a fixed fee and could then use any means of transport as they wish. Corporations like Google and Uber are making efforts to own this space, but there are currently several initiatives in Finland that aim to compete with them from a slightly different angle.
JB: What has this got to do with MyData?
AJP: These new mobility operators need personal data to make their services more attractive by means of personalization and, on the other hand, to optimize their capacity by predicting users’ needs. MyData is a sustainable way to direct personal data to future transport and mobility services. If we know our location history and current location, it is possible to predict our next destination with 75–90% accuracy. This means that personal location data makes it possible to optimize services like MaaS so that a taxi drops you off just in time for your train and then you have a bicycle waiting when you arrive. In addition, it helps the service providers to optimize their own resources and therefore offer better, more affordable services. An example is from Istanbul, Turkey, where IBM has helped to build an optimized public transport network by analyzing mobile data from millions of users. Another thing that is closely linked to urban planning is the possibility to optimize locations for other public service networks like libraries, healthcare centers etc.
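To give a feel for how a next destination could be predicted from location history, here is a minimal Python sketch. It uses a very simple technique, counting which place has most often followed the current one, and is not the method behind the accuracy figure quoted above. The place names and the `HISTORY` list are invented for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(history):
    """Count how often each place was followed by each other place."""
    transitions = defaultdict(Counter)
    for here, there in zip(history, history[1:]):
        transitions[here][there] += 1
    return transitions


def predict_next(transitions, current):
    """Return the place that most often followed `current`, or None."""
    options = transitions.get(current)
    if not options:
        return None
    return options.most_common(1)[0][0]


# Hypothetical location history of one person, in visiting order.
HISTORY = ["home", "work", "gym", "home", "work", "home", "work", "gym", "home"]

transitions = build_transitions(HISTORY)
next_from_work = predict_next(transitions, "work")
```

A mobility operator could use a prediction like this to have a vehicle available where it is likely to be needed, while a place that has never been visited simply yields no prediction.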
JB: How could personal data be utilized in building design?
AJP: Indoor positioning is an interesting opportunity for building designers. Sandy Pentland from MIT is a pioneer in related research and one of his projects was based on analyzing interaction between people in a workplace by monitoring their encounters. This generated interesting data to support workplace design. With this kind of data, designers can predict and optimize interactions in the space that they are designing. In addition to informing the design of physical environments, this kind of data collection can support designing different modes of operation.
JB: Where do you see the biggest opportunities for personal data utilization?
AJP: The virtue of personal data is personalization. In other words, personal data management makes it possible for us to get the services we want more easily and in a more automated way. For planning and design, it has the potential to make participation and user-centered approaches easier because the useful data is collected automatically.