Model Validation

Intro

This document presents various model validation statistics. These compare model results with observed data not used in estimation, including roadway traffic counts and transit boardings. While sensitivity testing focuses on the model’s response to changes in inputs, validation measures the ability of an appropriately sensitive model to accurately predict known conditions in the base year.

Roadway

The primary outputs of a travel demand model are roadway volume predictions. To help establish model validity, a prediction is made on a base year scenario for which traffic count data are collected. Tube counts and other methods for collecting roadway volumes have well-established error rates, so matching counts exactly is not the goal. Instead, target error thresholds have evolved over time for metrics like percentage difference and percentage root-mean-square-error (RMSE) between counts and volumes.

Importantly, target error rates fall as roadway volumes increase. Count error rates are lower for larger facilities, and regional travel models are also best suited to estimating flows on large facilities. Consequently, they are expected to perform best on large freeways and worst on local streets. In fact, regional travel models should not be used to predict local street volumes.

The table below contains two measures of error:

  1. Percent Difference
  2. Percent Root-Mean-Square-Error

Percent difference is a straightforward measure:

\(PctDiff = \frac{(\sum \hat{Y}_i - \sum Y_i)}{\sum Y_i} * 100\)

where:

\(\hat{Y}_i\): Estimated volume on link i (model)
\(Y_i\): Observed volume on link i (traffic count)
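
The percent difference calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the model code: the function name `pct_diff` and the three-link sample values are assumptions made up for this example.

```python
def pct_diff(model_volumes, counts):
    """Percent difference between summed model volumes and summed counts."""
    total_count = sum(counts)
    if total_count == 0:
        raise ValueError("total count must be nonzero")
    return (sum(model_volumes) - total_count) / total_count * 100


# Hypothetical three-link example (values are invented for illustration).
example_counts = [1000, 5000, 20000]
example_volumes = [1200, 4500, 19000]
example_pct_diff = pct_diff(example_volumes, example_counts)
print(f"Percent difference: {example_pct_diff:.1f}")
```

Because the measure is built from sums, over-predictions and under-predictions on individual links offset each other, which is one reason it is reported alongside %RMSE.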

%RMSE is calculated as follows:

\(PRMSE = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^N(\hat{Y}_i-Y_i)^2}}{\frac{1}{N}\sum_{i=1}^NY_i} * 100\)

where N is the number of links with counts.
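
The formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `pct_rmse` and the three-link sample values are invented for this example and do not come from the model.

```python
import math


def pct_rmse(model_volumes, counts):
    """Percent RMSE: RMSE of link errors divided by the mean count, times 100."""
    n = len(counts)
    if n == 0 or n != len(model_volumes):
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each link error, then average the squared errors.
    mean_sq_err = sum((v - c) ** 2 for v, c in zip(model_volumes, counts)) / n
    # Take the root to get the RMSE.
    rmse = math.sqrt(mean_sq_err)
    # Normalize by the mean count.
    mean_count = sum(counts) / n
    return rmse / mean_count * 100


# Hypothetical three-link example (values are invented for illustration).
sample_counts = [1000, 5000, 20000]
sample_volumes = [1200, 4500, 19000]
sample_pct_rmse = pct_rmse(sample_volumes, sample_counts)
print(f"%RMSE: {sample_pct_rmse:.1f}")
```

Unlike percent difference, squared errors cannot cancel, so %RMSE reflects link-level error even when the totals match closely.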

The errors on each link between the model and traffic count are

  • Squared
  • Then averaged (mean)
  • Then the root is taken

This provides the RMSE, which is then divided by the average (mean) of all counts.

The table below shows these two metrics by volume group and for the model overall. Different models have a different mix of counts by volume group. For example, model regions with limited counts will have a higher percentage of counts on high-volume roads like freeways. The Triangle boasts excellent count coverage, so many more observations are collected on smaller streets. These differences are one of many reasons why overall model %RMSE cannot be used as a single measure of model quality. %RMSE by volume group is better, but still contains no information about model sensitivity.

With these caveats in place, the table below shows that the TRMG2 model matches observed counts well. The overall %RMSE of 34.6 is particularly impressive given that a large share of the counts (2,546 of 4,193, or about 61%) are less than or equal to 10,000 in volume.

Volume Group N Total Count Total Volume % Difference % RMSE
10000 2,546 10,230,705 9,619,155 -6.0 56.9
25000 1,065 16,944,684 16,158,240 -4.6 34.6
50000 406 13,741,466 13,935,428 1.4 24.5
100000 116 7,980,100 8,272,277 3.7 14.6
100000+ 60 7,986,500 8,023,426 0.5 8.0
All 4,193 56,883,455 56,008,526 -1.5 34.6
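
As a consistency check, the % Difference column can be recomputed from the Total Count and Total Volume columns. The Python sketch below does this using the values in the table above; the variable names are illustrative only.

```python
# (volume group, N, total count, total model volume, reported % difference)
rows = [
    ("10000", 2546, 10_230_705, 9_619_155, -6.0),
    ("25000", 1065, 16_944_684, 16_158_240, -4.6),
    ("50000", 406, 13_741_466, 13_935_428, 1.4),
    ("100000", 116, 7_980_100, 8_272_277, 3.7),
    ("100000+", 60, 7_986_500, 8_023_426, 0.5),
]

# Recompute percent difference from the group totals.
recomputed = {
    group: (volume - count) / count * 100
    for group, _n, count, volume, _reported in rows
}

for group, _n, _count, _volume, reported in rows:
    print(f"{group:>8}  recomputed {recomputed[group]:+5.1f}  reported {reported:+5.1f}")

# Share of all counts that fall in the lowest volume group.
total_n = sum(n for _g, n, _c, _v, _r in rows)
low_volume_share = rows[0][1] / total_n * 100
print(f"Counts in lowest volume group: {low_volume_share:.0f}% of {total_n:,}")
```

%RMSE cannot be recomputed this way because it depends on link-level errors, which are not shown in the table.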

In addition to evaluation by volume group, the model is also evaluated by facility type. All links with the same facility type share important characteristics like volume-delay function (VDF) parameters, free-flow speed adjustments, and other attributes. Problems in this table would indicate the model’s facility type parameters (like VDF coefficients) may be biased.

HCMType N Total Count Total Volume % Difference % RMSE
Freeway 183 14,669,248 14,794,872 0.9 10.4
MLHighway 81 2,402,062 2,367,057 -1.5 17.1
TLHighway 81 752,820 746,628 -0.8 23.2
MajorArterial 768 19,552,270 19,113,638 -2.2 30.0
Arterial 1,556 13,999,227 13,659,930 -2.4 44.4
MajorCollector 272 1,460,400 1,398,020 -4.3 48.8
Collector 796 2,919,228 2,851,581 -2.3 60.4
Local 456 1,128,200 1,076,800 -4.6 75.9
All 4,193 56,883,455 56,008,526 -1.5 34.6

The following table shows the same statistics by facility type and area type. It is shown for completeness, but many of the combinations have few observations.

HCMType AreaType N Total Count Total Volume % Difference % RMSE
Freeway Downtown 10 986,800 1,025,165 3.9 9.4
Freeway Urban 42 4,274,500 4,160,486 -2.7 7.7
Freeway Suburban 85 7,254,334 7,412,256 2.2 11.5
Freeway Rural 46 2,153,614 2,196,965 2.0 11.1
MLHighway Urban 2 80,200 79,319 -1.1 16.9
MLHighway Suburban 35 1,373,100 1,391,764 1.4 16.5
MLHighway Rural 44 948,762 895,973 -5.6 16.0
TLHighway Suburban 13 164,800 159,425 -3.3 17.2
TLHighway Rural 68 588,020 587,203 -0.1 24.9
MajorArterial Downtown 142 3,048,600 2,888,822 -5.2 37.3
MajorArterial Urban 263 6,777,316 6,726,061 -0.8 31.0
MajorArterial Suburban 340 9,350,500 9,149,683 -2.1 26.6
MajorArterial Rural 23 375,854 349,072 -7.1 31.9
Arterial Downtown 61 749,498 764,466 2.0 43.5
Arterial Urban 222 2,805,220 2,856,839 1.8 46.9
Arterial Suburban 732 7,824,400 7,486,585 -4.3 39.4
Arterial Rural 541 2,620,109 2,552,039 -2.6 42.9
MajorCollector Downtown 6 28,900 10,310 -64.3 74.1
MajorCollector Urban 27 218,700 231,010 5.6 43.6
MajorCollector Suburban 110 783,600 721,332 -8.0 45.0
MajorCollector Rural 129 429,200 435,368 1.4 47.6
Collector Downtown 35 293,900 272,695 -7.2 54.6
Collector Urban 66 457,100 461,338 0.9 43.7
Collector Suburban 270 1,401,910 1,326,291 -5.4 52.8
Collector Rural 425 766,318 791,256 3.2 62.8
Local Downtown 61 279,200 257,784 -7.7 64.5
Local Urban 52 251,500 244,263 -2.9 56.3
Local Suburban 96 313,200 286,461 -8.5 65.0
Local Rural 247 284,300 288,292 1.4 89.2
All NA 4,193 56,883,455 56,008,526 -1.5 34.6

Another important check for the model is that aggregate regional flows are correct. These are checked using screen and cut lines, which aggregate counts based on geography. The map below shows the geographic locations of the screen lines used for TRMG2 validation. The lines are irregularly shaped so that, to the extent possible, they only cross links with counts on them. In this way, we can capture all flow across the line and compare it with matching count data.

The table below shows the comparison between model volumes and counts.

Screenline N Total Count Total Volume % Difference % RMSE
3 40 443,550 375,174 -15.4 40.1
6 69 1,419,300 1,451,974 2.3 31.2
10 30 694,100 689,186 -0.7 32.0
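
The aggregation behind a table like this can be sketched in Python: link-level counts and volumes are summed for each line before the percent difference is computed. The record layout, the function name `summarize_by_line`, and the link values below are hypothetical and are not taken from the TRMG2 data or from any TransCAD interface.

```python
from collections import defaultdict

# Hypothetical link records: (screen line ID, traffic count, model volume).
links = [
    (3, 12000, 10500),
    (3, 8000, 7100),
    (6, 30000, 31200),
    (6, 15000, 14900),
]


def summarize_by_line(link_records):
    """Sum counts and volumes per line, then compute percent difference."""
    totals = defaultdict(lambda: {"n": 0, "count": 0, "volume": 0})
    for line_id, count, volume in link_records:
        entry = totals[line_id]
        entry["n"] += 1
        entry["count"] += count
        entry["volume"] += volume
    return {
        line_id: {
            **entry,
            "pct_diff": (entry["volume"] - entry["count"]) / entry["count"] * 100,
        }
        for line_id, entry in totals.items()
    }


summary = summarize_by_line(links)
for line_id, stats in sorted(summary.items()):
    print(line_id, stats["n"], stats["count"], stats["volume"],
          f"{stats['pct_diff']:+.1f}")
```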

Modeled volume on screen line 3 is 15.4% below the counts, which is lower than desired, but given the relatively low total volume (for a screen line), it is still acceptable.

The map below shows the cut lines used to validate TRMG2.

The table below shows count validation aggregated by cut line. Only cut line 18 shows any cause for concern. This is the cut line between Orange and Alamance counties. The model only contains a small piece of Alamance County and instead relies heavily on the external models for flow in this region. It is possible that improved external flow data could improve model performance in this area, but to truly get it right, the model would need to be expanded westward. (Caliper is not recommending this action.)

Cut Line N Total Count Total Volume % Difference % RMSE
1 8 219,400 222,078 1.2 15.2
2 38 568,300 518,321 -8.8 31.8
4 26 317,700 324,423 2.1 23.8
5 4 44,300 52,779 19.1 24.0
7 31 646,800 664,273 2.7 36.8
8 29 500,100 433,006 -13.4 27.0
9 12 231,400 250,544 8.3 21.8
11 19 236,500 253,801 7.3 31.6
12 7 242,000 238,071 -1.6 17.1
13 9 429,800 399,494 -7.0 13.6
14 9 63,300 62,501 -1.3 21.1
15 12 55,800 60,433 8.3 18.9
16 7 36,300 40,841 12.5 37.7
17 19 336,400 374,368 11.3 36.5
18 11 129,400 161,185 24.6 62.1

Transit

Observed transit boarding data were incomplete for 2020. Instead, Caliper validated transit ridership using the 2016 scenario and the corresponding observed data. The comparison is shown in the table below by agency.

Agency Observed Boardings Model Boardings Difference % Difference
Chapel Hill Transit 26,444 24,425 -2,019 -8%
GoRaleigh 23,489 26,826 3,337 14%
GoDurham 21,602 23,383 1,781 8%
NCSU Wolfline 16,699 13,084 -3,615 -22%
Duke 13,602 7,729 -5,873 -43%
GoTriangle 9,691 13,680 3,989 41%
GoCary 1,003 2,137 1,134 113%
Total 112,530 111,264 -1,266 -1%
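
The Difference and % Difference columns follow directly from the boardings. The Python sketch below recomputes them from the observed and modeled values in the table above; the variable names are illustrative only.

```python
# Agency: (observed boardings, model boardings), taken from the table above.
boardings = {
    "Chapel Hill Transit": (26_444, 24_425),
    "GoRaleigh": (23_489, 26_826),
    "GoDurham": (21_602, 23_383),
    "NCSU Wolfline": (16_699, 13_084),
    "Duke": (13_602, 7_729),
    "GoTriangle": (9_691, 13_680),
    "GoCary": (1_003, 2_137),
}

# Difference is model minus observed; percent difference is relative to observed.
pct_by_agency = {}
for agency, (observed, modeled) in boardings.items():
    difference = modeled - observed
    pct_by_agency[agency] = difference / observed * 100
    print(f"{agency:<20} {difference:>+7,} {pct_by_agency[agency]:>+5.0f}%")

total_observed = sum(obs for obs, _mod in boardings.values())
total_modeled = sum(mod for _obs, mod in boardings.values())
total_pct = (total_modeled - total_observed) / total_observed * 100
print(f"{'Total':<20} {total_modeled - total_observed:>+7,} {total_pct:>+5.0f}%")
```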

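Overall, modeled ridership is within about 1% of observed. By agency, the three largest systems (Chapel Hill Transit, GoRaleigh, and GoDurham) are within 14% of observed boardings, while percentage differences are larger for the smaller systems.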

For reference, the model predicts 119,000 riders in the 2020 base year scenario. Compared to 2016, the 2020 scenario has a higher population and slightly better transit service, which makes the 119,000 estimate reasonable.





TransCAD GIS Software, 2022