Case Studies

Research studies used as a basis for the AMOS project.

Sao Paulo

Census Data

Sociodemographic information of people and households (HH) - scaled to 10%

Cleaning

a. Original data

df_census.columns = ["federationCode", "areaCode", "householdWeight", "metropolitanRegion", "personNumber", "gender", "age", "goingToSchool", "employment", "onLeave", "helpsInWork", "farmWork", "householdIncome", "motorcycleAvailability", "carAvailability", "numberOfMembers"]

b. Spatial editing

df_census["zone_id"] = df_census["areaCode"]

c. Cleaned data

df = df[["person_id", "household_id","weight","zone_id","residence_area_index","age", "sex", "employment", "binary_car_availability","household_size", "household_income"]]

Each household entry HHi is replicated wi times, where the weight wi indicates how many households the entry represents (i.e. each household is "copied" wi times). Use stochastic rounding for non-integer weights, and mind the sampling rate (direct or not).
Income information in the census is divided into bins.
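
The replication step with stochastic rounding can be sketched as follows. This is a minimal illustration, not the project's actual code: the record layout and the helper names `stochastic_round` and `replicate_households` are assumptions made for the example.

```python
import math
import random


def stochastic_round(weight, rng):
    """Round down, then add 1 with probability equal to the fractional part."""
    base = math.floor(weight)
    return base + (1 if rng.random() < weight - base else 0)


def replicate_households(households, rng):
    """Copy each household record w_i times (w_i stochastically rounded)."""
    expanded = []
    for hh in households:
        copies = stochastic_round(hh["weight"], rng)
        for k in range(copies):
            clone = dict(hh)
            # Hypothetical id scheme: original id plus copy index.
            clone["household_id"] = f'{hh["household_id"]}_{k}'
            clone["weight"] = 1.0
            expanded.append(clone)
    return expanded


rng = random.Random(0)
households = [
    {"household_id": 1, "weight": 2.0, "zone_id": 10},
    {"household_id": 2, "weight": 3.4, "zone_id": 11},
]
synthetic = replicate_households(households, rng)
```

With stochastic rounding, a weight of 3.4 yields 3 copies with probability 0.6 and 4 copies with probability 0.4, so the expected number of copies equals the weight.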

Zones

Data: OSM, zones, schools (shapefiles, CSV)

Load the zones from the shapefiles (use geopandas: set the current coordinate reference system, then transform to the target coordinate reference system) and set the zone id

df_zones_census_dissolved = df_zones_census_dissolved[['geometry', 'AP_2010_CH']]
df_zones_census_dissolved.columns = [ "geometry", "zone_id"]

Extract roads from OSM: x, y, purpose
Create “opportunities” - offer work, offer houses - transform geometries to a new coordinate reference system (geopandas)
Add schools to opportunities - transform geometries to a new coordinate reference system (geopandas)
a.
```
df_facilities_education["offers_work"] = True
```
b.
```
df_facilities_education["offers_other"] = True
```
From shapefile of zones - transform geometries to a new coordinate reference system (geopandas): geometry, zone_id

Household Travel Survey

The household travel survey contains 84 889 weighted samples. The weights sum to 20 508 979, roughly the number of inhabitants in the area in 2017.

Cleaning (removing NaN, duplicates), remapping categories
Divide into two dataframes - person, trips
Remapping categories (work - employed, not employed, student; trip purpose - home, leisure, shop, work; mode - pt, car, car-passenger...)

Create a point from the home coordinates of each person and set the correct coordinate system

df_persons["geometry"] = [geo.Point(*xy) for xy in zip(df_persons["homeCoordX"], df_persons["homeCoordY"])]

df_geo = gpd.GeoDataFrame(df_persons, crs="EPSG:29183")

Map home coords (points) of each person to zones (poly) -> home_zones

home_zones = gpd.sjoin(df_geo[["person_id", "geometry"]], df_zones[["zone_id", "geometry"]], predicate="within", how="left")

Generating areas
a.

sp_area = [3 * (z in center) + 2 * (z in city and z not in center) + 1 * (z in region and z not in city) for z in zone_id]

df_persons["residence_area_index"] = sp_area
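
As a small worked example of the area index, assume that `center`, `city` and `region` are nested sets of zone ids (center inside city, city inside region). The zone ids below are made up for illustration.

```python
# Hypothetical zone sets; in the study they would come from the zoning data.
center = {1, 2}
city = {1, 2, 3, 4}           # contains the center
region = {1, 2, 3, 4, 5, 6}   # contains the city

zone_id = [1, 3, 5, 9]
sp_area = [
    3 * (z in center)
    + 2 * (z in city and z not in center)
    + 1 * (z in region and z not in city)
    for z in zone_id
]
print(sp_area)  # [3, 2, 1, 0]
```

A zone in the center gets index 3, a zone in the rest of the city 2, a zone in the rest of the region 1, and a zone outside the region 0.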

Generating trips
1. Collect the trip attributes: origin and destination purpose (shop, work, ...), mode, zones, …
2. Remove trips from a place to the same place
3. Remove trips not starting at home, remove trips not ending at home
4. Calculate activity duration
5. Spatial join origin & destination coords with zones
Output
- persons.csv
- trips.csv
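
Steps 2 to 4 above could look roughly like the sketch below. It rests on two assumptions that the notes do not spell out: "same place" is read as identical origin and destination purpose, and step 3 is read as dropping persons whose trip chain does not start and end at home. The field names and the function `clean_trips` are hypothetical, and the spatial join of step 5 is not covered.

```python
from itertools import groupby
from operator import itemgetter


def clean_trips(trips):
    """Apply steps 2-4 to a list of trip dicts (times in seconds after midnight)."""
    # Step 2: drop trips whose origin and destination purpose coincide (assumed reading).
    trips = [t for t in trips if t["origin_purpose"] != t["destination_purpose"]]
    trips = sorted(trips, key=itemgetter("person_id", "departure_time"))

    cleaned = []
    for _, chain in groupby(trips, key=itemgetter("person_id")):
        chain = list(chain)
        # Step 3: keep only chains that start and end at home (assumed reading).
        if chain[0]["origin_purpose"] != "home" or chain[-1]["destination_purpose"] != "home":
            continue
        # Step 4: duration of the activity that follows each trip;
        # the last trip of a chain has no following departure, so it gets None.
        for current, following in zip(chain, chain[1:] + [None]):
            trip = dict(current)
            trip["activity_duration"] = (
                following["departure_time"] - current["arrival_time"] if following else None
            )
            cleaned.append(trip)
    return cleaned


trips = [
    {"person_id": 1, "origin_purpose": "home", "destination_purpose": "work",
     "departure_time": 28800, "arrival_time": 30600},
    {"person_id": 1, "origin_purpose": "work", "destination_purpose": "home",
     "departure_time": 61200, "arrival_time": 63000},
    {"person_id": 2, "origin_purpose": "work", "destination_purpose": "shop",
     "departure_time": 30000, "arrival_time": 31000},
    {"person_id": 2, "origin_purpose": "shop", "destination_purpose": "home",
     "departure_time": 40000, "arrival_time": 41000},
]
cleaned = clean_trips(trips)
```

In this toy input only person 1 survives, because the chain of person 2 does not start at home.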

OD matrices

An origin–destination matrix is a matrix in which each cell represents the number of trips from an origin zone (given by the corresponding row of the matrix) to a destination zone (column), or the percentage of trips starting in the origin zone that reach the destination zone. Those matrices can be created from the household travel survey. In this study, one weighted origin–destination matrix was generated for work trips.
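
A weighted origin–destination matrix for work trips can be sketched as below, both as weighted trip counts and as row shares. The trip fields (`origin_zone`, `destination_zone`, `destination_purpose`, `weight`) and the function names are assumptions for the example, not the project's schema.

```python
from collections import defaultdict


def weighted_od_matrix(trips, purpose="work"):
    """Sum survey weights per (origin zone, destination zone) for one trip purpose."""
    flows = defaultdict(float)
    for t in trips:
        if t["destination_purpose"] == purpose:
            flows[(t["origin_zone"], t["destination_zone"])] += t["weight"]
    return dict(flows)


def row_shares(od):
    """Convert weighted counts to the share of each origin's trips per destination."""
    totals = defaultdict(float)
    for (origin, _), w in od.items():
        totals[origin] += w
    return {(o, d): w / totals[o] for (o, d), w in od.items()}


trips = [
    {"origin_zone": "A", "destination_zone": "B", "destination_purpose": "work", "weight": 100.0},
    {"origin_zone": "A", "destination_zone": "C", "destination_purpose": "work", "weight": 300.0},
    {"origin_zone": "A", "destination_zone": "B", "destination_purpose": "shop", "weight": 50.0},
]
od = weighted_od_matrix(trips)
shares = row_shares(od)
```

The matrix is stored sparsely as a dictionary keyed by zone pair, which avoids allocating cells for zone pairs without observed trips.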

Paris and Ile-de-France

Article GitHub

Data

Spatial zoning system
Census data
- Large microsample of the population - 30% of households in France
- Commuting relations between municipalities (flow matrix of trips for work and education purposes)
- Aggregated zonal information
Household income
Household travel survey
- Detailed activity chain for one reference person (what activities, when, how the person moved between activities)
Locations at which activities can take place - address, coordinates
- Work - number of employees
- Location of educational facilities

Sociodemographic information and residence locations of households and persons are generated - a municipality or [area] is defined for each synthetic person
Income information is added to each household
Activity chains are attached to the synthetic persons - statistical matching based on the correlation between daily activity patterns and sociodemographic attributes
Places of work and education are assigned - primary location assignment, using the commuting matrix
1. The correct number of people should commute from one municipality to another in the population
2. The commute distance should fit the assigned activity chain
The locations of all other activities in the persons' activity chains are chosen
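
The statistical matching step, which attaches activity chains to synthetic persons, can be illustrated with a simple hot-deck style sketch. This is an assumption-laden simplification rather than the published method: the attribute names, the relaxation rule (dropping the last attribute when no donor is found) and the function `match_donors` are all hypothetical.

```python
import random


def match_donors(persons, respondents, attributes, rng):
    """Assign each synthetic person the id of a survey respondent (donor)."""
    assignment = {}
    for person in persons:
        keys = list(attributes)
        while True:
            # Donor pool: respondents that agree with the person on all current keys.
            pool = [r for r in respondents if all(r[a] == person[a] for a in keys)]
            if pool or not keys:
                break
            keys.pop()  # relax the last (least important) attribute and retry
        # Draw one donor with probability proportional to the survey weight.
        donor = rng.choices(pool, weights=[r["weight"] for r in pool], k=1)[0]
        assignment[person["person_id"]] = donor["respondent_id"]
    return assignment


respondents = [
    {"respondent_id": "r1", "sex": "f", "employment": "employed", "weight": 120.0},
    {"respondent_id": "r2", "sex": "m", "employment": "employed", "weight": 80.0},
    {"respondent_id": "r3", "sex": "f", "employment": "student", "weight": 60.0},
]
persons = [
    {"person_id": "p1", "sex": "f", "employment": "employed"},
    {"person_id": "p2", "sex": "m", "employment": "student"},
]
donors = match_donors(persons, respondents, ["sex", "employment"], random.Random(0))
```

The synthetic person then inherits the activity chain of the assigned donor. Person `p2` has no exact match, so the employment attribute is dropped and the donor is drawn from respondents of the same sex.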

Ústí nad Labem

Paper

Input Data

Full anonymous census 2011 with household information
CSU natality and mortality data between 2011 and 2016
National HTS 2016
City HTS 2016
Additional facility data - Registr sčítacích obvodů a budov (Register of Census Districts and Buildings)

Processing Steps

Zoning data
Stochastic simulation of several demographic transition processes (based on natality and mortality rates and residential mobility) that update the 2011 census data
Clean raw travel survey data
Trips going in/out of the catchment area will start/end at the "city gates"
Merge census and HTS data - two different HTS datasets are used (two sets of mandatory/preferred columns when matching with hot-deck)
Get facility/building data - building purpose, activity sector (households, industry, agriculture, forestry, transportation, utilities, hospitality, administrative, public services, ...)
Assign facilities to zones and classify them according to trip purposes
Assign home and primary locations (work, education) - based on OD pairs from the merged census and HTS and on facility data
Impute secondary locations
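
One possible way to pick a primary location zone from OD pairs is to sample the destination in proportion to the flows leaving the home zone. The sketch below only illustrates that idea. The flow dictionary and the function `sample_primary_zone` are hypothetical, and the choice of a concrete facility inside the zone is left out.

```python
import random


def sample_primary_zone(home_zone, od_flows, rng):
    """Draw a destination zone with probability proportional to the flow from home_zone."""
    row = {d: w for (o, d), w in od_flows.items() if o == home_zone and w > 0}
    if not row:
        return None  # no observed flow from this zone
    zones = sorted(row)
    return rng.choices(zones, weights=[row[z] for z in zones], k=1)[0]


od_flows = {("A", "B"): 30.0, ("A", "C"): 70.0, ("B", "A"): 10.0}
rng = random.Random(42)
work_zone = sample_primary_zone("A", od_flows, rng)
```

Over many draws from zone A, about 70% of the sampled destinations would be zone C, which reproduces the row shares of the OD matrix.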

