Data Specification
Definition and preprocessing of available and necessary data to create a synthetic population.
Available data (as of 7. 12. 2021):
SLDB 2011 - Full census (Praha + stredni Cechy)
Česko v Pohybu - travel diary (2017-2019)
SHP files (2021)
Population address point - residences with average number of residents in Prague
Amenities - commercial, civic, recreational POI - [x, y] coordinates
Zones in Prague - ZSJ (základní sídelní jednotka) - 687 zones
Additional files
Mapping of ZSJ to districts (Praha 1-22, ...) - 57 districts
Mapping between districts (census (57) <-> travel diary (cca 30) districts)
Probability of commute between zones to work or education (from SLDB 2011)
Pipeline overview
Clean census -> one census person = individual of the synthetic population
Clean travel diary -> travelers with travel day plan and district of residence
Statistical match of a census person with a traveler (based on age, sex, employment, and district of residence) -> individual has an activity chain (travelling plan for a day) copied from the matched traveler
Assign home coordinates to each individual based on zone of residence -> individual has a home location based on which all other coordinates for other destinations are determined
Assign coordinates to destinations (work, education, shop, other, leisure) for each individual based:
Amenities
Travel mode
Trip duration
Beeline distance
Probability of commute between zones to work or education
Home coordinates
Convert to MATSim XML population file with travel demands
Data Preprocessing
SLDB
entries: 2551962 Prague entries: 1268463
Steps:
Filter by region 3018
Drop NA in ['employment', 'sex', 'zone_id']
Fill age median grouped by ['sex', 'employment']
Fill most frequent employment grouped by ['sex', 'age']
Česko v Pohybu
Households: # entries 1509
Travelers: # 3418
Trips: # 8219
Households
Steps: Sum # of cars (private, company, utility, other)) in car_number
Travelers
Steps:
Drop NA in ['sex', 'employment']
Fill age median grouped by ['sex', 'employment']
Fill NA using sociodemographics
If age > 18 and driving_license is NA -> driving_license = True
If age < 18 and driving_license is NA -> driving_license = False
If age < 16 or age > 64 -> pt_avail = True
Trips
Steps:
Drop NA in [traveler_id]
Drop NA in [departure_h, departure_m, arrival_h, arrival_m, origin_purpose, destination_purpose, traveling_mode]
Delete travelers who do not have any destination ‘home’
Keep only travelers where ‘_purpose’ == ‘home’ and ‘_code’ == 1000
Fill NA in ‘*_code’ for missing commute home-work, home-education (and vice versa)
Impute driving mode "other" by mode of groupby by traveling speed.
Last updated