Social context study and data collection utilizing smart phones


Jeffrey NEWMAN


Mobile phone, a monitoring tool

In recent years, scientists have shown an ever-increasing desire to understand human behavior using real-time or near-real-time monitoring tools. The mobile phone has become one of the most widely used tools in these studies, and its popularity among scientists who study human behavior is easy to explain. At least three good reasons give mobile devices a unique position in monitoring human behavior.
Firstly, mobile phones are essential companions to most people; we carry them with us all the time. This provides an excellent opportunity to monitor people in a nearly continuous manner. The possibility of monitoring people non-intrusively in real time, without forcing them to carry extra measurement units, is invaluable.
Secondly, modern mobile phones come with a large number of different sensors. These sensors provide information about location, acceleration, ongoing activities, the acoustic environment around you, and, importantly for us, the social interactions you have. The continuous information provided by the diverse sensors, fused with other data in the phone, is the real enabler for social monitoring. It provides rich information about people's daily activities.
Thirdly, mobile devices are the central points of our social networks today. We use them not only to make calls and send SMS messages, but also to stay connected to our virtual networks, organize our daily lives, and search for, create, and consume information. The huge amount of social information we carry in our mobile phones is a goldmine for researchers seeking to increase our understanding of people's social interactions, context, and behavior.

Social context study and data collection

It is obvious that mobile phones provide an excellent platform for monitoring people's everyday lives. We have been investigating the possibility of collecting a large amount of data from selected participants over a substantially long time period to address diverse research questions related to human behavior. After several rounds of software development to build a reliable software client and back-end solutions capable of collecting the desired data, we are now ready to launch the study. We aim to provide Nokia multimedia smartphones to 120-150 carefully selected participants and to collect data from them continuously over 9 to 12 months.
The collected data consists of accurate location information, cellular network information, phone calls, SMS messages, acceleration values, media information (e.g. which songs are played and where images were taken), calendar information, active processes running on the device, Bluetooth and WiFi nodes detected, and acoustic environment information (background noise without speech content). We believe that all of this data will be useful in answering our research questions.

Protecting the privacy

Above all, we will ensure the privacy of our data collection participants. The data we collect is used solely for research purposes. All participants in the campaign have volunteered to provide their valuable contributions without receiving monetary payment beyond the expenses incurred due to participation, and all the information is gathered to gain an understanding of the participants' social interactions and daily behavior. The collected data set has been selected with the privacy of the participants in mind. For each collected data item, privacy has been the first priority, which forced us to think carefully about how we use the data, what risks could be involved, and whether it is really necessary to collect the information. If we could not justify the need for a planned data item with our research questions, or were not sure how to protect privacy adequately, that data item was not included in the campaign. All personally identifiable information is anonymized before the data is used in later research. In addition, all participants are able to see all the data we collect from them, can delete any data they wish at any time (including after the data collection has been completed), and can pause the recording whenever they wish.
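As an illustration of what replacing identifiers before analysis can look like, here is a minimal sketch using a keyed hash. It is not the procedure used in this campaign, which the article does not describe; the function name, the key, and the phone number are hypothetical. Note also that a keyed hash yields pseudonyms rather than full anonymity, since whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier (e.g. a phone number) to a stable pseudonym.

    Hypothetical helper: the same identifier always yields the same
    pseudonym under a given key, so call patterns stay analyzable
    without exposing the number itself.
    """
    # Normalize formatting so "+41 21 555 01 23" and "+41215550123" match.
    normalized = "".join(ch for ch in identifier if ch.isdigit() or ch == "+")
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:16]


key = b"hypothetical-study-key"  # assumption: kept secret by the study team
a = pseudonymize("+41 21 555 01 23", key)
b = pseudonymize("+41215550123", key)
```

Here `a` and `b` are identical, so two records involving the same contact can still be linked, while the raw number never appears in the research data.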

What do we want to find out from the data?

We want to use the collected data to model the linkage between social interaction, time, and location. What is the importance of a specific location in maintaining, creating, and perhaps losing social connections? Do we tend to change our communication patterns or modality based on context, location, and time? Which contexts and locations are the most prominent for social interactions? All questions related to improving the modeling of the interplay of time, location, and social interactions are of interest to us, and we sincerely believe that the collected data will provide answers.

Behavioral models

Daniel Gatica-Perez and his group at Idiap in Martigny will investigate probabilistic methods to discover personal and social behavioral patterns from the rich, large-scale data we will collect. The research has two main goals. The first one is the development of algorithms to robustly represent human behavior at the personal and group level from raw sensor data, based on the integration of heterogeneous observation sources (location, motion, proximity, and communication). These behavioral descriptors would in principle provide short-term snapshots of the physical and social pace of people’s lives. The second goal is the development of machine learning methods to automatically discover personal routines (regularities in people’s lives over multiple time scales) and to discover and characterize groups from communication patterns, mutual proximity, and similar routines. The research aims at designing algorithms capable of responding to questions like: What are the common daily or weekly habits of a given phone user? Was today an unusual day for a certain person? How are the existing communities in the sensed population related to each other? The availability of real-life data for a large population over an extended period of time is key for this investigation.
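To make the question "Was today an unusual day?" concrete, here is a deliberately simple sketch. It is not the probabilistic method the Idiap group will develop; it only assumes that each day has already been reduced to a sequence of location labels per time slot (the labels and the four-slot days are hypothetical) and compares a day with the most frequent label in each slot.

```python
from collections import Counter


def typical_day(history):
    """Most frequent location label for each time slot across past days."""
    return [Counter(slot_labels).most_common(1)[0][0]
            for slot_labels in zip(*history)]


def unusualness(day, typical):
    """Fraction of time slots whose label differs from the typical one."""
    return sum(a != b for a, b in zip(day, typical)) / len(typical)


# Hypothetical data: four time slots per day instead of 24 hours.
history = [
    ["home", "work", "work", "home"],
    ["home", "work", "work", "home"],
    ["home", "work", "gym", "home"],
]
typical = typical_day(history)
today = ["home", "other", "other", "home"]
score = unusualness(today, typical)
```

A score of 0 means the day matches the usual routine in every slot, and a score of 1 means it differs everywhere. A real system would also have to handle multiple time scales and noisy labels.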

Mobility Models

Once collected, the rich location and acceleration data from these mobile phones also represents a unique opportunity to build discrete choice models to predict the travel behavior of individuals. Normally, transportation survey data is self-reported by individuals through written diaries, which suffer from systemic biases, rounding, and perceptual errors. The electronic data being collected through the phones, while not perfect, corrects many of these problems. In the TRANSP-OR lab at EPFL, we are working with the location information from the GPS logs to predict how people travel, comparing our models' predictions with prompted recall questionnaires completed by the participants on the same days that they carry the experimental phone.
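For readers unfamiliar with discrete choice models, the sketch below shows the simplest member of the family, the multinomial logit, in which the probability of choosing an alternative grows with its utility relative to the others. The coefficients, travel times, and costs are invented for illustration and are not estimates from this study.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtract the maximum utility for numerical stability.
    top = max(utilities.values())
    exps = {name: math.exp(v - top) for name, v in utilities.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical coefficients: disutility per minute and per CHF.
BETA_TIME = -0.05
BETA_COST = -0.20

# Hypothetical alternatives: (travel time in minutes, cost in CHF).
alternatives = {"car": (20, 4.0), "bus": (35, 2.5), "walk": (60, 0.0)}
utilities = {
    mode: BETA_TIME * minutes + BETA_COST * cost
    for mode, (minutes, cost) in alternatives.items()
}
probs = logit_probabilities(utilities)
```

With these made-up numbers the car has the highest utility and therefore the highest choice probability; in practice the coefficients are estimated from observed choices.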
However, data quality from GPS receivers is still subject to errors from many sources, including the number of satellites in view, horizontal dilution of precision (HDOP), satellite geometry, clock or receiver issues, atmospheric and ionospheric effects, multi-path signal reflection, and signal blocking. In previous studies, location observations with low precision were discarded, and only those location points believed to be reasonably accurate were included and mapped to the nearest location on the transportation network, which may or may not be the correct location. Instead of discarding weak but potentially important data for route choice analysis, we retain this information (both the location and the amount of imprecision) to probabilistically generate a set of likelihoods for different (generally similar) routes.
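The idea of keeping imprecise points and weighting them by their imprecision can be sketched as follows. This is an assumption-laden illustration, not the lab's actual model: each observation is treated as having an isotropic Gaussian error with a known standard deviation, routes are plain polylines in a flat coordinate system, and all names and numbers are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def route_log_likelihood(route, observations):
    """Sum of Gaussian log-likelihood terms over (point, sigma) pairs.

    The normalizing constant of each Gaussian depends only on sigma,
    not on the route, so it cancels when routes are compared.
    """
    total = 0.0
    for point, sigma in observations:
        d = min(point_segment_distance(point, a, b)
                for a, b in zip(route, route[1:]))
        total += -0.5 * (d / sigma) ** 2
    return total


def route_probabilities(routes, observations):
    """Normalize the route likelihoods so that they sum to one."""
    logs = {n: route_log_likelihood(r, observations) for n, r in routes.items()}
    top = max(logs.values())
    weights = {n: math.exp(v - top) for n, v in logs.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Two hypothetical parallel routes, 30 m apart.
routes = {"A": [(0, 0), (100, 0)], "B": [(0, 30), (100, 30)]}
# The same position reported with poor and with good precision (sigma in m).
vague = route_probabilities(routes, [((50, 14), 100.0)])
sharp = route_probabilities(routes, [((50, 14), 2.0)])
```

The imprecise point leaves the two routes almost equally likely, while the precise one strongly favors route A. No observation is discarded; a poor one simply carries less weight.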
We analyze these points not only through space, but also through time. This allows us to link the spatio-temporal movements of individuals to the space and time projection of the transportation network. For example, in the figure below there is a sample network made up of links, each having a travel time of one minute. There are three routes through the sample network, and three observed GPS points. Using a traditional map-matching algorithm, we might conclude that the traveler was using Route B, as point G1 is slightly closer to the vertical arc. A more general probabilistic representation might rule out Route C because of point G3, but would not differentiate much between the likelihoods of Routes A and B. By incorporating the time information, which will show a gap of either about 1 minute or about 3 minutes between the recording of points G1 and G2, we can more conclusively identify the route actually used.

In this manner, we can discriminate between, for example, a traveler on a bus making regular stops at bus stops and a traveler in a car, which will also pause in traffic or at signals, but not regularly at the prescribed bus stop locations. In addition to allowing for a more precise discrimination of transportation routes, this method will mitigate potential biases that can be introduced in data post-processing, including those from map-matching.
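The role of the time gap in the example above can be sketched as a simple Bayesian update. Which route corresponds to the 1-minute gap and which to the 3-minute gap is not stated in the text, so the assignment below is an assumption, as are the Gaussian tolerance and the function name.

```python
import math


def time_gap_posterior(prior, expected_gap_min, observed_gap_min, sigma_min=0.5):
    """Reweight route probabilities by how well each route explains the
    observed time gap between two GPS points (Gaussian tolerance)."""
    weights = {}
    for route, p in prior.items():
        z = (observed_gap_min - expected_gap_min[route]) / sigma_min
        weights[route] = p * math.exp(-0.5 * z * z)
    total = sum(weights.values())
    return {route: w / total for route, w in weights.items()}


# Spatial evidence alone cannot separate Routes A and B
# (Route C is taken as already ruled out by point G3).
prior = {"A": 0.5, "B": 0.5}
# Assumed travel times between G1 and G2 along each route, in minutes.
expected = {"A": 1.0, "B": 3.0}
posterior = time_gap_posterior(prior, expected, observed_gap_min=2.9)
```

With an observed gap of 2.9 minutes, nearly all of the probability moves to the route that takes three minutes, even though the spatial evidence was evenly split.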

Open innovation

The data set we will have after one year of monitoring the participants' phones will be full of interesting information supporting diverse fields of scientific research. We hope that the effort we undertake today will in the future help many scientists who have previously lacked this type of rich data. To foster open innovation in various fields of science, we intend to provide the data to research groups that submit a valid research plan. In addition to submitting the plan, the research group must of course comply with the data handling and privacy rules.


We are collecting a rich set of data with a strong social flavor from 120-150 volunteer participants during the second half of 2009 and the first half of 2010. This data is intended to be used to solve multi-disciplinary research problems ranging from social interaction models to mobility patterns. All of this is enabled by modern mobile phones, which provide an excellent platform with multiple sensors to record human behavior. If you are interested in our data collection campaign, or if you have research questions for which the data we collect could be valuable, feel free to contact us!

This page is an article from an EPFL publication.
Its content and some links may no longer be up to date.

