Naive solution proposal for synthetic mobility data generation
Mobility data is a record of a device's geographic locations, produced passively through normal activity. It has important applications ranging from transportation planning to migration forecasting. Because real mobility data is scarce and hard to collect, researchers have begun exploring ways to generate it synthetically.
In this article, I will discuss a naive solution for generating synthetic mobility data. This synthetic data can be used for research purposes and for training / fine-tuning algorithms. For example, one can synthetically generate tagged mobility data, and train a model to forecast urban traffic congestion. Then, the trained model can be applied to real-life data.
The code can be found here and you can use this colab notebook to try it yourself.
The data to be synthetically generated will represent location data records that were collected from cell phone devices. Normally, such data contain the following attributes:
- phone_id — unique identifier of the cell phone
- phone_type — cell phone operating system (iOS / Android)
- timestamp (in epoch time)
- accuracy (in meters)
Pick a location in the USA and create a bbox (bounding box) of x meters. Next, get public data sets:
Create a bounding box
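As a minimal sketch of this step, the function below builds a square bbox of a given side length around a center point. The function name, the (min_lng, min_lat, max_lng, max_lat) ordering, and the example coordinates are my own assumptions for illustration, and the spherical-Earth approximation is only reasonable for small boxes.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def make_bbox(lat, lng, size_m):
    """Return (min_lng, min_lat, max_lng, max_lat) for a square of side
    size_m meters centered on (lat, lng). Hypothetical helper."""
    half = size_m / 2
    # Convert half the side length from meters to degrees of latitude.
    dlat = math.degrees(half / EARTH_RADIUS_M)
    # A degree of longitude shrinks with the cosine of the latitude.
    dlng = math.degrees(half / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lng - dlng, lat - dlat, lng + dlng, lat + dlat)


# Example: a 2,000 m box around an arbitrary point in the USA.
bbox = make_bbox(40.7128, -74.0060, 2000)
```

The same tuple ordering is reused in the other sketches in this article, so the bbox can be passed from one step to the next.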
Get ArcGIS Residence Locations
Use the arcgis_rest_url to get buildings’ polygons within your bbox.
Note: the result is limited to a sample of 2,000 polygons.
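A request like this can be sketched as follows. The helper name and the example layer URL are placeholders of mine, not the service used in the article. The query parameters are standard ArcGIS REST "query" parameters, but whether a given layer supports `f=geojson` and `resultRecordCount` depends on the service, so treat this as an assumption to verify.

```python
from urllib.parse import urlencode


def build_buildings_query(arcgis_rest_url, bbox, max_records=2000):
    """Build a query URL for features intersecting the bbox.
    bbox is (min_lng, min_lat, max_lng, max_lat) in WGS84."""
    min_lng, min_lat, max_lng, max_lat = bbox
    params = {
        "where": "1=1",                              # no attribute filter
        "geometry": f"{min_lng},{min_lat},{max_lng},{max_lat}",
        "geometryType": "esriGeometryEnvelope",      # the geometry is a bbox
        "inSR": 4326,                                # bbox is given in WGS84
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "*",
        "outSR": 4326,
        "resultRecordCount": max_records,            # cap the sample size
        "f": "geojson",
    }
    return f"{arcgis_rest_url.rstrip('/')}/query?{urlencode(params)}"


# Placeholder layer URL, for illustration only.
url = build_buildings_query(
    "https://example.com/arcgis/rest/services/Buildings/FeatureServer/0",
    (-74.02, 40.70, -73.99, 40.72),
)
```

The resulting URL can be fetched with any HTTP client, and the GeoJSON response loaded into geopandas.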
Get Kaggle POIs data sets
Use the Kaggle API to download the POI data sets. Then parse them, load them into geopandas, and keep only the points that fall within the bbox.
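The filtering step itself is simple. The sketch below shows it with the standard library only, instead of geopandas, so it runs anywhere. The column names (`name`, `latitude`, `longitude`) and the sample rows are invented for illustration; the real Kaggle files may use different headers.

```python
import csv
import io


def filter_pois(rows, bbox):
    """Keep only the POIs whose coordinates fall inside the bbox.
    bbox is (min_lng, min_lat, max_lng, max_lat)."""
    min_lng, min_lat, max_lng, max_lat = bbox
    kept = []
    for row in rows:
        lat, lng = float(row["latitude"]), float(row["longitude"])
        if min_lat <= lat <= max_lat and min_lng <= lng <= max_lng:
            kept.append({"name": row["name"], "lat": lat, "lng": lng})
    return kept


# Made-up sample standing in for a downloaded POI file.
sample_csv = io.StringIO(
    "name,latitude,longitude\n"
    "Coffee Shop,40.7130,-74.0050\n"
    "Far Away Mall,41.5000,-73.0000\n"
)
pois = filter_pois(csv.DictReader(sample_csv), (-74.02, 40.70, -73.99, 40.72))
```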
Get OSM roads from Overpass API
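One way to express this request is an Overpass QL query for all ways tagged `highway` inside the bbox. This is a sketch of the query only: the helper name is mine, and the choice to fetch every `highway` way (rather than only drivable road types) is an assumption. Note that Overpass expects the bbox as (south, west, north, east).

```python
from urllib.parse import urlencode

# Public Overpass API endpoint; queries are sent as a "data" form field.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_roads_query(bbox):
    """Return an Overpass QL query for roads inside the bbox.
    bbox is (min_lng, min_lat, max_lng, max_lat)."""
    min_lng, min_lat, max_lng, max_lat = bbox
    return (
        "[out:json][timeout:60];"
        # Overpass bbox order is south, west, north, east.
        f'way["highway"]({min_lat},{min_lng},{max_lat},{max_lng});'
        "out geom;"  # include each way's node coordinates in the output
    )


query = build_roads_query((-74.02, 40.70, -73.99, 40.72))
request_body = urlencode({"data": query})  # body for a POST to OVERPASS_URL
```

The JSON response contains one element per way with its geometry, which can then be turned into a graph of road segments.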
Now we have everything we need to create a phone timeline: residence locations (used for stays at home and at the homes of family and friends), POI locations (used for store visits), and roads (used for drives between stays). Before generating the actual mobility data, we will generate a synthetic timeline that holds the phone's stays and their time frames.
Synthetic Timeline Logic
The synthetic timeline logic iterates over all days between the start date and the end date and randomizes stays at the workplace, residence locations, and POIs. To keep the behavior realistic, the logic produces work stays on weekdays only and ensures the user returns home for the night.
Before running the logic, make sure to:
- Set random home & work locations
- Set a timeframe (start date & end date)
- Set max POIs and max residence locations to be visited on a given day
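The logic and settings above can be sketched roughly as follows. This is not the original implementation: the fixed stay durations (8 hours at home and work, 1 hour per visit), the 30-minute gap left for each drive, and all names are assumptions I chose to keep the example short.

```python
import random
from datetime import date, datetime, time, timedelta


def generate_timeline(start_date, end_date, home, work, pois, residences,
                      max_pois=2, max_residences=1, seed=None):
    """Return a list of stays, each a dict with place, start, and end.
    Assumes few enough visits per day that the last one ends before midnight."""
    rng = random.Random(seed)
    gap = timedelta(minutes=30)  # assumed time reserved for each drive
    stays = []
    day = start_date
    while day <= end_date:
        midnight = datetime.combine(day, time.min)
        # Every day starts with a stay at home, 00:00 -> 08:00.
        wake = midnight + timedelta(hours=8)
        stays.append({"place": home, "start": midnight, "end": wake})
        cursor = wake + gap
        if day.weekday() < 5:  # Monday to Friday: add a work stay
            end = cursor + timedelta(hours=8)
            stays.append({"place": work, "start": cursor, "end": end})
            cursor = end + gap
        # Randomize how many POIs and residences are visited today.
        visits = rng.sample(pois, rng.randint(0, min(max_pois, len(pois))))
        visits += rng.sample(
            residences, rng.randint(0, min(max_residences, len(residences))))
        rng.shuffle(visits)
        for place in visits:
            end = cursor + timedelta(hours=1)
            stays.append({"place": place, "start": cursor, "end": end})
            cursor = end + gap
        # Always return home for the night, until the next midnight.
        stays.append({"place": home, "start": cursor,
                      "end": midnight + timedelta(days=1)})
        day += timedelta(days=1)
    return stays


# One week, Monday 2022-01-03 through Sunday 2022-01-09.
timeline = generate_timeline(
    date(2022, 1, 3), date(2022, 1, 9), home="Home", work="Work",
    pois=["POI 1", "POI 2", "POI 3"],
    residences=["Residence 1290", "Residence 7"], seed=1,
)
```

Passing a `seed` makes the randomized timeline reproducible, which helps when debugging the later steps.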
The GIF below shows the first day in our synthetic timeline.
Our synthetic timeline is ready, and we now need additional logic to translate it into synthetic signals. The first event in our timeline is a stay at home (00:00 -> 08:00), so let’s start by generating signals for this stay.
Static Mode Signals
The following script will produce a data frame of signals between the stay start and the stay end. The sampling rate (the time interval between adjacent signals) is a configurable parameter. I’ve set it to 600 seconds (10 minutes). Each signal’s lat,lng will be perturbed by a random “noise factor”.
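A minimal sketch of this step, using only the standard library and returning a list of dicts rather than a data frame, might look like this. The function name, the uniform noise model, and the noise magnitude in degrees are my assumptions; the original script may noise the coordinates differently.

```python
import random


def static_signals(lat, lng, start_ts, end_ts,
                   sampling_rate_s=600, noise_deg=0.0001, seed=None):
    """Generate noised signals around (lat, lng) from start_ts to end_ts
    (epoch seconds, inclusive), one every sampling_rate_s seconds."""
    rng = random.Random(seed)
    signals = []
    for ts in range(start_ts, end_ts + 1, sampling_rate_s):
        signals.append({
            "timestamp": ts,
            # Shift each coordinate by a small random offset so the phone
            # does not report exactly the same point for the whole stay.
            "lat": lat + rng.uniform(-noise_deg, noise_deg),
            "lng": lng + rng.uniform(-noise_deg, noise_deg),
        })
    return signals


# An 8-hour stay starting at 2022-01-01 00:00:00 UTC.
start = 1_640_995_200
signals = static_signals(40.7128, -74.0060, start, start + 8 * 3600, seed=42)
```

With `pandas` installed, `pandas.DataFrame(signals)` turns the result into a data frame.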
Applying the logic on the first stay will result in the following output:
Drive Mode Signals
The next event on our timeline is a stay at “Residence 1290”, but before generating signals for this stay, we need to generate signals for the drive that brings our phone from its origin (home) to its destination (“Residence 1290”).
To do that, we will use the roads graph to find the shortest path from origin to destination. Then, we will generate signals along the ordered road segments with a sampling rate of 60 seconds.
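The two parts of this step, path finding and sampling along the path, can be sketched as below. I use a hand-rolled Dijkstra search over a tiny made-up graph instead of a graph library, and I place signals at a constant assumed speed rather than randomly, so this illustrates the idea rather than reproducing the original code. All node names and coordinates are invented, and the sketch assumes the destination is reachable.

```python
import heapq
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def shortest_path(nodes, edges, origin, destination):
    """Dijkstra over an undirected graph weighted by segment length."""
    graph = {n: [] for n in nodes}
    for u, v in edges:
        d = haversine_m(nodes[u], nodes[v])
        graph[u].append((v, d))
        graph[v].append((u, d))
    best, prev, queue = {origin: 0.0}, {}, [(0.0, origin)]
    while queue:
        dist, u = heapq.heappop(queue)
        if u == destination:
            break
        if dist > best.get(u, math.inf):
            continue  # stale queue entry
        for v, d in graph[u]:
            if dist + d < best.get(v, math.inf):
                best[v], prev[v] = dist + d, u
                heapq.heappush(queue, (dist + d, v))
    path = [destination]
    while path[-1] != origin:  # walk back from destination to origin
        path.append(prev[path[-1]])
    return path[::-1]


def drive_signals(nodes, path, start_ts, speed_mps=10.0, sampling_rate_s=60):
    """Place one signal every sampling_rate_s seconds along the path,
    assuming a constant speed and linear interpolation within a segment."""
    signals, ts = [], start_ts
    step = speed_mps * sampling_rate_s  # meters covered between signals
    next_at, travelled = 0.0, 0.0
    for u, v in zip(path, path[1:]):
        a, b = nodes[u], nodes[v]
        seg = haversine_m(a, b)
        while next_at <= travelled + seg:
            f = (next_at - travelled) / seg if seg else 0.0
            signals.append({"timestamp": ts,
                            "lat": a[0] + f * (b[0] - a[0]),
                            "lng": a[1] + f * (b[1] - a[1])})
            ts += sampling_rate_s
            next_at += step
        travelled += seg
    return signals


# Tiny made-up road graph: A-B-C is shorter than A-D-C.
nodes = {"A": (40.71, -74.01), "B": (40.71, -74.00),
         "C": (40.72, -74.00), "D": (40.73, -74.02)}
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
path = shortest_path(nodes, edges, "A", "C")
signals = drive_signals(nodes, path, start_ts=1_640_995_200 + 8 * 3600)
```

In practice a graph library would replace the hand-written search, and the speed could vary per road type instead of being constant.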
That’s how the synthetic drive signals look on a map:
Full Synthetic Mobility Data Generation
In our final step, we will iterate over the entire synthetic timeline. For each stay, we will generate static mode signals, and between every two consecutive stays, we will generate drive mode signals.
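The orchestration can be sketched as a single loop. To keep the example self-contained, the static and drive generators are passed in as functions, and the two stand-ins below only emit timestamps and a mode label; they are placeholders of mine, not the real generators, and the timeline uses epoch seconds for simplicity.

```python
def generate_mobility_data(timeline, static_fn, drive_fn):
    """Walk the timeline: drive signals between consecutive stays,
    static signals during each stay. Returns signals sorted by time."""
    signals = []
    for i, stay in enumerate(timeline):
        if i > 0:
            prev = timeline[i - 1]
            # The drive fills the gap between the previous stay and this one.
            signals.extend(drive_fn(prev["place"], stay["place"],
                                    prev["end"], stay["start"]))
        signals.extend(static_fn(stay["place"], stay["start"], stay["end"]))
    return sorted(signals, key=lambda s: s["timestamp"])


# Placeholder generators (hypothetical): one signal per sampling interval.
def static_fn(place, start_ts, end_ts):
    return [{"timestamp": ts, "mode": "static", "place": place}
            for ts in range(start_ts, end_ts + 1, 600)]


def drive_fn(origin, destination, start_ts, end_ts):
    return [{"timestamp": ts, "mode": "drive",
             "place": f"{origin}->{destination}"}
            for ts in range(start_ts + 60, end_ts, 60)]


# Two stays, in seconds since the start of the day.
timeline = [
    {"place": "Home", "start": 0, "end": 8 * 3600},
    {"place": "Residence 1290", "start": 8 * 3600 + 1800, "end": 9 * 3600 + 1800},
]
data = generate_mobility_data(timeline, static_fn, drive_fn)
```

Keeping the generators pluggable makes it easy to swap in other modes of movement later, such as walking, without touching the loop.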
Boom! We now have full synthetic mobility data, produced with open-source packages and free data.
Generate Synthetic Mobility Data Republished from Source https://towardsdatascience.com/generate-synthetic-mobility-data-a32894f1a253?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed