Model Validation
Intro
This document presents various model validation statistics. These compare model results with observed data not used in estimation, including counts of both roadway volumes and transit ridership. While sensitivity testing focuses on the model’s response to changes in inputs, validation measures the ability of an appropriately sensitive model to accurately predict known conditions in the base year.
Roadway
The primary outputs of a travel demand model are roadway volume predictions. To help establish model validity, a prediction is made on a base year scenario for which traffic count data are collected. Tube counts and other methods for collecting roadway volumes have well-established error rates, so matching counts exactly is not the goal. Instead, target error thresholds have evolved over time for metrics like percent difference and percent root-mean-square error (%RMSE) between counts and volumes.
Importantly, target error rates fall as roadway volumes increase. Count error rates are lower for larger facilities, and regional travel models are also best suited to measuring flows on large facilities. Consequently, they are expected to perform best on large freeways and worst on local streets. (In fact, regional travel models should not be used to predict local street volumes.)
The table below contains two measures of error:
- Percent Difference
- Percent Root-Mean-Square-Error
Percent difference is a straightforward measure:
\(PctDiff = \frac{(\sum \hat{Y}_i - \sum Y_i)}{\sum Y_i} * 100\)
where:
- \(\hat{Y}_i\) is the estimated volume on link \(i\) (model)
- \(Y_i\) is the observed volume on link \(i\) (traffic count)
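As a minimal sketch of the percent difference formula above, the Python function below sums model volumes and counts over a set of links and compares the totals. The function name `pct_diff` and the sample link values are hypothetical and are not taken from the TRMG2 count database.

```python
def pct_diff(volumes, counts):
    """Percent difference between total model volume and total count."""
    total_volume = sum(volumes)  # sum of estimated volumes (model)
    total_count = sum(counts)    # sum of observed volumes (traffic counts)
    return (total_volume - total_count) / total_count * 100


# Hypothetical values for three links (illustration only)
counts = [1000, 2000, 3000]
volumes = [1100, 1900, 3300]

print(round(pct_diff(volumes, counts), 1))  # 5.0
```

Because the totals are formed before the comparison, over- and under-predictions on individual links offset each other in this metric.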
%RMSE is calculated as follows:
\(PRMSE = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^N(\hat{Y}_i-Y_i)^2}}{\frac{1}{N}\sum_{i=1}^NY_i} * 100\)
where \(N\) is the number of links with counts.
The errors on each link between the model and traffic count are
- Squared
- Then averaged (mean)
- Then the root is taken
This provides the RMSE, which is then divided by the average (mean) of all counts to give %RMSE. The table below shows these two metrics by volume group and for the model overall.
Different models have a different mix of counts by volume group. For example, model regions with limited counts will have a higher percentage of counts on high-volume roads like freeways. The Triangle boasts excellent count coverage, which means many more observations are collected on smaller streets. These differences are one of many reasons why overall model %RMSE can’t be used as a single measure of model quality. %RMSE by volume group is better, but it still contains no information about model sensitivity.
With these caveats in place, the table below shows that the TRMG2 model is matching observed counts well. The overall %RMSE of 34.6 is particularly impressive given that a large share of the counts (2,546 of 4,193) are less than or equal to 10,000 in volume.
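The %RMSE steps described above (square, average, take the root, divide by the mean count) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `pct_rmse` and the two sample links are hypothetical, not TRMG2 data.

```python
import math


def pct_rmse(volumes, counts):
    """Percent root-mean-square error of model volumes against counts."""
    n = len(counts)
    # Step 1: square the error on each link
    squared_errors = [(v - c) ** 2 for v, c in zip(volumes, counts)]
    # Step 2: average the squared errors
    mean_squared_error = sum(squared_errors) / n
    # Step 3: take the root to get the RMSE
    rmse = math.sqrt(mean_squared_error)
    # Divide by the mean count and express as a percentage
    mean_count = sum(counts) / n
    return rmse / mean_count * 100


# Hypothetical values for two links (illustration only)
counts = [1000, 3000]
volumes = [1100, 2900]

print(round(pct_rmse(volumes, counts), 1))  # 5.0
```

In this made-up example the total volume equals the total count, so the percent difference would be zero, yet the %RMSE is 5.0 because squaring keeps errors of opposite sign from cancelling.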
Volume Group | N | Total Count | Total Volume | % Difference | % RMSE |
---|---|---|---|---|---|
10000 | 2,546 | 10,230,705 | 9,619,155 | -6.0 | 56.9 |
25000 | 1,065 | 16,944,684 | 16,158,240 | -4.6 | 34.6 |
50000 | 406 | 13,741,466 | 13,935,428 | 1.4 | 24.5 |
100000 | 116 | 7,980,100 | 8,272,277 | 3.7 | 14.6 |
100000+ | 60 | 7,986,500 | 8,023,426 | 0.5 | 8.0 |
All | 4,193 | 56,883,455 | 56,008,526 | -1.5 | 34.6 |
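As a quick arithmetic check, the % Difference column can be reproduced from the Total Count and Total Volume columns of the table above. The sketch below copies those totals from the table; the variable and function names are hypothetical. %RMSE cannot be recomputed this way because it requires link-level errors, which are not listed here.

```python
# (N, total count, total volume) by volume group, copied from the table above
rows = {
    "10000": (2_546, 10_230_705, 9_619_155),
    "25000": (1_065, 16_944_684, 16_158_240),
    "50000": (406, 13_741_466, 13_935_428),
    "100000": (116, 7_980_100, 8_272_277),
    "100000+": (60, 7_986_500, 8_023_426),
    "All": (4_193, 56_883_455, 56_008_526),
}


def pct_diff_from_totals(total_count, total_volume):
    """Percent difference computed from pre-summed totals."""
    return (total_volume - total_count) / total_count * 100


pct_diffs = {
    group: round(pct_diff_from_totals(count, volume), 1)
    for group, (n, count, volume) in rows.items()
}

# Share of all count locations that fall in the lowest volume group
share_lowest = rows["10000"][0] / rows["All"][0] * 100

print(pct_diffs)
print(round(share_lowest, 1))  # 60.7
```

The computed values match the table (-6.0, -4.6, 1.4, 3.7, 0.5, and -1.5 overall), and about 61% of the count locations are in the lowest volume group.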
In addition to evaluation by volume group, the model is also evaluated by facility type. All links with the same facility type share important characteristics like volume-delay function (VDF) parameters, free-flow speed adjustments, and other attributes. Problems in this table would indicate the model’s facility type parameters (like VDF coefficients) may be biased.
HCMType | N | Total Count | Total Volume | % Difference | % RMSE |
---|---|---|---|---|---|
Freeway | 183 | 14,669,248 | 14,794,872 | 0.9 | 10.4 |
MLHighway | 81 | 2,402,062 | 2,367,057 | -1.5 | 17.1 |
TLHighway | 81 | 752,820 | 746,628 | -0.8 | 23.2 |
MajorArterial | 768 | 19,552,270 | 19,113,638 | -2.2 | 30.0 |
Arterial | 1,556 | 13,999,227 | 13,659,930 | -2.4 | 44.4 |
MajorCollector | 272 | 1,460,400 | 1,398,020 | -4.3 | 48.8 |
Collector | 796 | 2,919,228 | 2,851,581 | -2.3 | 60.4 |
Local | 456 | 1,128,200 | 1,076,800 | -4.6 | 75.9 |
All | 4,193 | 56,883,455 | 56,008,526 | -1.5 | 34.6 |
The following table shows the same statistics by facility type and area type. It is shown for completeness, but many of the combinations have few observations.
HCMType | AreaType | N | Total Count | Total Volume | % Difference | % RMSE |
---|---|---|---|---|---|---|
Freeway | Downtown | 10 | 986,800 | 1,025,165 | 3.9 | 9.4 |
Freeway | Urban | 42 | 4,274,500 | 4,160,486 | -2.7 | 7.7 |
Freeway | Suburban | 85 | 7,254,334 | 7,412,256 | 2.2 | 11.5 |
Freeway | Rural | 46 | 2,153,614 | 2,196,965 | 2.0 | 11.1 |
MLHighway | Urban | 2 | 80,200 | 79,319 | -1.1 | 16.9 |
MLHighway | Suburban | 35 | 1,373,100 | 1,391,764 | 1.4 | 16.5 |
MLHighway | Rural | 44 | 948,762 | 895,973 | -5.6 | 16.0 |
TLHighway | Suburban | 13 | 164,800 | 159,425 | -3.3 | 17.2 |
TLHighway | Rural | 68 | 588,020 | 587,203 | -0.1 | 24.9 |
MajorArterial | Downtown | 142 | 3,048,600 | 2,888,822 | -5.2 | 37.3 |
MajorArterial | Urban | 263 | 6,777,316 | 6,726,061 | -0.8 | 31.0 |
MajorArterial | Suburban | 340 | 9,350,500 | 9,149,683 | -2.1 | 26.6 |
MajorArterial | Rural | 23 | 375,854 | 349,072 | -7.1 | 31.9 |
Arterial | Downtown | 61 | 749,498 | 764,466 | 2.0 | 43.5 |
Arterial | Urban | 222 | 2,805,220 | 2,856,839 | 1.8 | 46.9 |
Arterial | Suburban | 732 | 7,824,400 | 7,486,585 | -4.3 | 39.4 |
Arterial | Rural | 541 | 2,620,109 | 2,552,039 | -2.6 | 42.9 |
MajorCollector | Downtown | 6 | 28,900 | 10,310 | -64.3 | 74.1 |
MajorCollector | Urban | 27 | 218,700 | 231,010 | 5.6 | 43.6 |
MajorCollector | Suburban | 110 | 783,600 | 721,332 | -8.0 | 45.0 |
MajorCollector | Rural | 129 | 429,200 | 435,368 | 1.4 | 47.6 |
Collector | Downtown | 35 | 293,900 | 272,695 | -7.2 | 54.6 |
Collector | Urban | 66 | 457,100 | 461,338 | 0.9 | 43.7 |
Collector | Suburban | 270 | 1,401,910 | 1,326,291 | -5.4 | 52.8 |
Collector | Rural | 425 | 766,318 | 791,256 | 3.2 | 62.8 |
Local | Downtown | 61 | 279,200 | 257,784 | -7.7 | 64.5 |
Local | Urban | 52 | 251,500 | 244,263 | -2.9 | 56.3 |
Local | Suburban | 96 | 313,200 | 286,461 | -8.5 | 65.0 |
Local | Rural | 247 | 284,300 | 288,292 | 1.4 | 89.2 |
All | NA | 4,193 | 56,883,455 | 56,008,526 | -1.5 | 34.6 |
Another important check for the model is that aggregate regional flows are correct. These are checked using screen lines and cut lines, which aggregate counts based on geography. The map below shows the geographic locations of the screen lines used for TRMG2 validation. The lines are irregularly shaped so that, to the extent possible, they cross only links that have counts. In this way, all flow across the line can be captured and compared with matching count data.
The table below shows the comparison between model volumes and counts.
Screenline | N | Total Count | Total Volume | % Difference | % RMSE |
---|---|---|---|---|---|
3 | 40 | 443,550 | 375,174 | -15.4 | 40.1 |
6 | 69 | 1,419,300 | 1,451,974 | 2.3 | 31.2 |
10 | 30 | 694,100 | 689,186 | -0.7 | 32.0 |
Model volume on screen line 3 is 15.4% below the counts, which is lower than desired, but given the relatively low total volume (for a screen line), it is still acceptable.
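The aggregation behind a screen line comparison can be sketched as a simple group-and-sum over the links that cross each line. The example below is a rough illustration under assumed inputs: the line IDs, link values, and the helper `summarize_by_line` are all hypothetical and do not reflect the TRMG2 data structures.

```python
from collections import defaultdict

# Hypothetical link records: (line_id, count, model_volume)
links = [
    ("A", 10_000, 9_000),
    ("A", 20_000, 21_000),
    ("B", 5_000, 6_000),
    ("B", 15_000, 15_000),
]


def summarize_by_line(link_records):
    """Sum counts and volumes per line, then compute percent difference."""
    totals = defaultdict(lambda: {"n": 0, "count": 0, "volume": 0})
    for line_id, count, volume in link_records:
        totals[line_id]["n"] += 1
        totals[line_id]["count"] += count
        totals[line_id]["volume"] += volume

    summary = {}
    for line_id, t in totals.items():
        pct = (t["volume"] - t["count"]) / t["count"] * 100
        summary[line_id] = {**t, "pct_diff": round(pct, 1)}
    return summary


summary = summarize_by_line(links)
for line_id, stats in summary.items():
    print(line_id, stats)
```

On hypothetical line A, one link is 1,000 under and the other 1,000 over, so the total flow across the line matches the counts even though neither link does. This is the sense in which screen lines test aggregate flows rather than individual links.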
The map below shows the cut lines used to validate TRMG2.
The table below shows count validation aggregated by cut line. Only cut line 18 shows any cause for concern. This is the cut line between Orange and Alamance counties. The model contains only a small piece of Alamance County and instead relies heavily on the external models for flow in this region. It is possible that improved external flow data could improve model performance in this area, but to truly get it right, the model would need to be expanded westward. (Caliper is not recommending this action.)
Cut Line | N | Total Count | Total Volume | % Difference | % RMSE |
---|---|---|---|---|---|
1 | 8 | 219,400 | 222,078 | 1.2 | 15.2 |
2 | 38 | 568,300 | 518,321 | -8.8 | 31.8 |
4 | 26 | 317,700 | 324,423 | 2.1 | 23.8 |
5 | 4 | 44,300 | 52,779 | 19.1 | 24.0 |
7 | 31 | 646,800 | 664,273 | 2.7 | 36.8 |
8 | 29 | 500,100 | 433,006 | -13.4 | 27.0 |
9 | 12 | 231,400 | 250,544 | 8.3 | 21.8 |
11 | 19 | 236,500 | 253,801 | 7.3 | 31.6 |
12 | 7 | 242,000 | 238,071 | -1.6 | 17.1 |
13 | 9 | 429,800 | 399,494 | -7.0 | 13.6 |
14 | 9 | 63,300 | 62,501 | -1.3 | 21.1 |
15 | 12 | 55,800 | 60,433 | 8.3 | 18.9 |
16 | 7 | 36,300 | 40,841 | 12.5 | 37.7 |
17 | 19 | 336,400 | 374,368 | 11.3 | 36.5 |
18 | 11 | 129,400 | 161,185 | 24.6 | 62.1 |
Transit
Observed transit boarding data were incomplete for 2020. Instead, Caliper validated transit ridership using the 2016 scenario and the corresponding observed data. The comparison is shown by agency in the table below.
Agency | Observed Boardings | Model Boardings | Difference | % Difference |
---|---|---|---|---|
Chapel Hill Transit | 26,444 | 24,425 | -2,019 | -8% |
GoRaleigh | 23,489 | 26,826 | 3,337 | 14% |
GoDurham | 21,602 | 23,383 | 1,781 | 8% |
NCSU Wolfline | 16,699 | 13,084 | -3,615 | -22% |
Duke | 13,602 | 7,729 | -5,873 | -43% |
GoTriangle | 9,691 | 13,680 | 3,989 | 41% |
GoCary | 1,003 | 2,137 | 1,134 | 113% |
Total | 112,530 | 111,264 | -1,266 | -1% |
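Overall, modeled ridership is within 1% of observed. By agency, the three largest systems (Chapel Hill Transit, GoRaleigh, and GoDurham) are each within 15% of observed boardings. The percentage differences are larger for the remaining agencies, most notably Duke (-43%), GoTriangle (41%), and GoCary (113%).
For reference, the model predicts 119,000 riders in the 2020 base year scenario. Compared to 2016, the 2020 scenario has a higher population and slightly better transit service, which makes the 119,000 estimate reasonable.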
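The Difference and % Difference columns in the transit table above follow directly from the observed and model boardings. The sketch below recomputes them, including the totals, from the values in the table; the variable names and the helper `compare` are hypothetical.

```python
# (observed boardings, model boardings) by agency, copied from the table above
boardings = {
    "Chapel Hill Transit": (26_444, 24_425),
    "GoRaleigh": (23_489, 26_826),
    "GoDurham": (21_602, 23_383),
    "NCSU Wolfline": (16_699, 13_084),
    "Duke": (13_602, 7_729),
    "GoTriangle": (9_691, 13_680),
    "GoCary": (1_003, 2_137),
}


def compare(observed, modeled):
    """Return (difference, percent difference rounded to a whole percent)."""
    diff = modeled - observed
    return diff, round(diff / observed * 100)


results = {agency: compare(obs, mod) for agency, (obs, mod) in boardings.items()}

# Regional totals across all agencies
total_observed = sum(obs for obs, _ in boardings.values())
total_modeled = sum(mod for _, mod in boardings.values())
results["Total"] = compare(total_observed, total_modeled)

for agency, (diff, pct) in results.items():
    print(f"{agency}: {diff:+,} ({pct:+d}%)")
```

The recomputed totals are 112,530 observed and 111,264 modeled boardings, a difference of -1,266 (about -1%), which matches the last row of the table.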
TransCAD GIS Software, 2022