Model Validation

Introduction

This document presents various model validation statistics. These compare model results with observed data not used in estimation, including counts of both roadway volumes and transit ridership. While sensitivity testing focuses on the model’s response to changes in inputs, validation measures the ability of an appropriately sensitive model to accurately predict known conditions in the base year.

Roadway

The primary outputs of a travel demand model are roadway volume predictions. To help establish model validity, a prediction is made on a base year scenario for which traffic count data are collected. Tube counts and other methods for collecting roadway volumes have well-established error rates, so matching counts exactly is not the goal. Instead, target error thresholds have evolved over time for metrics like percentage difference and percentage root-mean-square-error (RMSE) between counts and volumes.

Importantly, target error rates fall as roadway volumes increase. Count error rates are lower for larger facilities, and regional travel models are also best suited to measuring flows on large facilities. Consequently, they are expected to perform best on large freeways and worst on local streets. In fact, regional travel models should not be used to predict local street volumes.

The table below contains two measures of error:

Percent Difference
Percent Root-Mean-Square-Error

Percent difference is a straightforward measure:

\(PctDiff = \frac{(\sum \hat{Y}_i - \sum Y_i)}{\sum Y_i} * 100\)

where:

\(\hat{Y}_i\): estimated volume on link i (model)
\(Y_i\): observed volume on link i (traffic count)

%RMSE is calculated as follows, where N is the number of links with counts:

\(PRMSE = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^N(\hat{Y}_i-Y_i)^2}}{\frac{1}{N}\sum_{i=1}^NY_i} * 100\)

The errors between model volume and traffic count on each link are:

Squared
Then averaged (mean)
Then the root is taken

This provides the RMSE, which is then divided by the average (mean) of all counts. The table below shows these two metrics by volume group and for the model overall. Different models have a different mix of counts by volume group. For example, model regions with limited counts will have a higher percentage of counts on high-volume roads like freeways. The Triangle boasts excellent count coverage, which means many more observations on smaller streets are collected. These differences are one of many reasons why overall model %RMSE can’t be used as a single measure of model quality. %RMSE by volume group is better, but still contains no information about model sensitivity.
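As an illustration, the two formulas above can be sketched in a few lines of Python. This is a minimal sketch rather than the code used to produce the tables in this document: the function names and the three-link data set are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def percent_difference(modeled, observed):
    """Percent difference between total modeled volume and total count."""
    total_observed = sum(observed)
    return (sum(modeled) - total_observed) / total_observed * 100


def percent_rmse(modeled, observed):
    """RMSE of link-level errors, expressed as a percentage of the mean count."""
    n = len(observed)
    # Square each link error, then average
    mean_squared_error = sum((m - o) ** 2 for m, o in zip(modeled, observed)) / n
    mean_count = sum(observed) / n
    # Take the root, then divide by the mean count
    return sqrt(mean_squared_error) / mean_count * 100


# Hypothetical three-link example (not TRMG2 data)
counts = [100, 200, 300]
volumes = [110, 190, 330]

pct_diff = percent_difference(volumes, counts)
pct_rmse = percent_rmse(volumes, counts)
print(round(pct_diff, 1), round(pct_rmse, 1))
```

Note that the link errors of +10 and -10 partly cancel in the percent difference but not in the %RMSE, which is why the two measures are reported together.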

With these caveats in place, the table below shows that the TRMG2 model is matching observed counts well. The overall %RMSE of 34.6 is particularly impressive given that a large share of the counts (2,546 of 4,193) are less than or equal to 10,000 in volume.

Volume Group (upper bound)	N	Total Count	Total Volume	% Difference	% RMSE
10000	2,546	10,230,705	9,619,155	-6.0	56.9
25000	1,065	16,944,684	16,158,240	-4.6	34.6
50000	406	13,741,466	13,935,428	1.4	24.5
100000	116	7,980,100	8,272,277	3.7	14.6
100000+	60	7,986,500	8,023,426	0.5	8.0
All	4,193	56,883,455	56,008,526	-1.5	34.6
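The assignment of links to volume groups can be sketched as follows. This is only an illustration under two assumptions not stated in this document: that links are grouped by their observed count (rather than model volume), and that every group includes its upper bound, as the 10,000 group does. The names `UPPER_BOUNDS`, `LABELS`, and `volume_group`, and the sample counts, are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the volume groups shown in the table above
UPPER_BOUNDS = [10_000, 25_000, 50_000, 100_000]
LABELS = ["10000", "25000", "50000", "100000", "100000+"]


def volume_group(count):
    """Return the label of the volume group that contains a count.

    bisect_left makes each bound inclusive: a count of exactly 10,000
    falls in the "10000" group (assumed for all groups).
    """
    return LABELS[bisect_left(UPPER_BOUNDS, count)]


# Hypothetical counts (not TRMG2 data)
sample_counts = [850, 10_000, 10_001, 42_000, 100_000, 135_000]
groups = [volume_group(c) for c in sample_counts]

# Number of links per group, corresponding to the N column
links_per_group = Counter(groups)
print(groups)
```

Once each link carries a group label, percent difference and %RMSE are computed over the links within each group.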

In addition to evaluation by volume group, the model is also evaluated by facility type. All links with the same facility type share important characteristics like volume-delay function (VDF) parameters, free-flow speed adjustments, and other attributes. Problems in this table would indicate the model’s facility type parameters (like VDF coefficients) may be biased.

HCMType	N	Total Count	Total Volume	% Difference	% RMSE
Freeway	183	14,669,248	14,794,872	0.9	10.4
MLHighway	81	2,402,062	2,367,057	-1.5	17.1
TLHighway	81	752,820	746,628	-0.8	23.2
MajorArterial	768	19,552,270	19,113,638	-2.2	30.0
Arterial	1,556	13,999,227	13,659,930	-2.4	44.4
MajorCollector	272	1,460,400	1,398,020	-4.3	48.8
Collector	796	2,919,228	2,851,581	-2.3	60.4
Local	456	1,128,200	1,076,800	-4.6	75.9
All	4,193	56,883,455	56,008,526	-1.5	34.6

The following table shows the same statistics by facility type and area type. It is shown for completeness, but many of the combinations have few observations.

HCMType	AreaType	N	Total Count	Total Volume	% Difference	% RMSE
Freeway	Downtown	10	986,800	1,025,165	3.9	9.4
Freeway	Urban	42	4,274,500	4,160,486	-2.7	7.7
Freeway	Suburban	85	7,254,334	7,412,256	2.2	11.5
Freeway	Rural	46	2,153,614	2,196,965	2.0	11.1
MLHighway	Urban	2	80,200	79,319	-1.1	16.9
MLHighway	Suburban	35	1,373,100	1,391,764	1.4	16.5
MLHighway	Rural	44	948,762	895,973	-5.6	16.0
TLHighway	Suburban	13	164,800	159,425	-3.3	17.2
TLHighway	Rural	68	588,020	587,203	-0.1	24.9
MajorArterial	Downtown	142	3,048,600	2,888,822	-5.2	37.3
MajorArterial	Urban	263	6,777,316	6,726,061	-0.8	31.0
MajorArterial	Suburban	340	9,350,500	9,149,683	-2.1	26.6
MajorArterial	Rural	23	375,854	349,072	-7.1	31.9
Arterial	Downtown	61	749,498	764,466	2.0	43.5
Arterial	Urban	222	2,805,220	2,856,839	1.8	46.9
Arterial	Suburban	732	7,824,400	7,486,585	-4.3	39.4
Arterial	Rural	541	2,620,109	2,552,039	-2.6	42.9
MajorCollector	Downtown	6	28,900	10,310	-64.3	74.1
MajorCollector	Urban	27	218,700	231,010	5.6	43.6
MajorCollector	Suburban	110	783,600	721,332	-8.0	45.0
MajorCollector	Rural	129	429,200	435,368	1.4	47.6
Collector	Downtown	35	293,900	272,695	-7.2	54.6
Collector	Urban	66	457,100	461,338	0.9	43.7
Collector	Suburban	270	1,401,910	1,326,291	-5.4	52.8
Collector	Rural	425	766,318	791,256	3.2	62.8
Local	Downtown	61	279,200	257,784	-7.7	64.5
Local	Urban	52	251,500	244,263	-2.9	56.3
Local	Suburban	96	313,200	286,461	-8.5	65.0
Local	Rural	247	284,300	288,292	1.4	89.2
All	NA	4,193	56,883,455	56,008,526	-1.5	34.6

Another important check for the model is that aggregate regional flows are correct. These are checked using screen and cut lines, which aggregate counts based on geography. The map below shows the geographic locations of the screen lines used for TRMG2 validation. The lines are irregularly shaped so that, to the extent possible, they only cross links with counts on them. In this way, we can capture all flow across the line and compare it with the matching count data.

The table below shows the comparison between model volumes and counts.

Screenline	N	Total Count	Total Volume	% Difference	% RMSE
3	40	443,550	375,174	-15.4	40.1
6	69	1,419,300	1,451,974	2.3	31.2
10	30	694,100	689,186	-0.7	32.0

Model volume across screen line 3 is 15.4% below the counts, which is lower than desired, but given the relatively low total volume (for a screen line), it is still acceptable.
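The screen line aggregation described above amounts to summing counts and model volumes over the links each line crosses and comparing the totals. A minimal sketch is shown here, assuming each link record carries the ID of the screen line it crosses (or `None` if it crosses none). The record layout and the values are hypothetical and are not TRMG2 data.

```python
from collections import defaultdict

# Hypothetical link records: (screen line ID or None, count, model volume)
links = [
    (3, 12_000, 10_500),
    (3, 8_000, 7_500),
    (6, 40_000, 41_000),
    (None, 5_000, 5_200),  # link not on any screen line; skipped below
    (6, 20_000, 20_200),
]

# Accumulate the number of links, total count, and total volume per line
totals = defaultdict(lambda: {"n": 0, "count": 0, "volume": 0})
for line_id, count, volume in links:
    if line_id is None:
        continue
    line = totals[line_id]
    line["n"] += 1
    line["count"] += count
    line["volume"] += volume

# Percent difference of total volume against total count for each line
pct_diff = {
    line_id: (line["volume"] - line["count"]) / line["count"] * 100
    for line_id, line in totals.items()
}
print(pct_diff)
```

Because the comparison is made on totals, over- and under-predictions on individual links crossing the same line offset one another, which is what makes this a check on aggregate flow rather than on individual links.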

The map below shows the cut lines used to validate TRMG2.

The table below shows count validation aggregated by cut line. Only cut line 18 shows any cause for concern. This is the cut line between Orange and Alamance counties. The model only contains a small piece of Alamance County and instead relies heavily on the external models for flow in this region. It is possible that improved external flow data could improve model performance in this area, but to truly get it right, the model would need to be expanded westward. (Caliper is not recommending this action.)

Cut Line	N	Total Count	Total Volume	% Difference	% RMSE
1	8	219,400	222,078	1.2	15.2
2	38	568,300	518,321	-8.8	31.8
4	26	317,700	324,423	2.1	23.8
5	4	44,300	52,779	19.1	24.0
7	31	646,800	664,273	2.7	36.8
8	29	500,100	433,006	-13.4	27.0
9	12	231,400	250,544	8.3	21.8
11	19	236,500	253,801	7.3	31.6
12	7	242,000	238,071	-1.6	17.1
13	9	429,800	399,494	-7.0	13.6
14	9	63,300	62,501	-1.3	21.1
15	12	55,800	60,433	8.3	18.9
16	7	36,300	40,841	12.5	37.7
17	19	336,400	374,368	11.3	36.5
18	11	129,400	161,185	24.6	62.1

Transit

Observed transit boarding data were incomplete for 2020. Instead, Caliper validated transit ridership using the 2016 scenario and the observed data for that year. The comparison is shown by agency in the table below.

Agency	Observed Boardings	Model Boardings	Difference	% Difference
Chapel Hill Transit	26,444	24,425	-2,019	-8%
GoRaleigh	23,489	26,826	3,337	14%
GoDurham	21,602	23,383	1,781	8%
NCSU Wolfline	16,699	13,084	-3,615	-22%
Duke	13,602	7,729	-5,873	-43%
GoTriangle	9,691	13,680	3,989	41%
GoCary	1,003	2,137	1,134	113%
Total	112,530	111,264	-1,266	-1%

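Overall, modeled ridership is within 1% of observed. By agency, the three largest systems are within 15% of observed boardings, while percentage differences are larger for the smaller systems.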
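The Difference and % Difference columns can be reproduced directly from the observed and model boardings in the table above. The sketch below does this, assuming the percentages were rounded to the nearest whole number. The variable names are illustrative.

```python
# (observed boardings, model boardings) by agency, taken from the table above
boardings = {
    "Chapel Hill Transit": (26_444, 24_425),
    "GoRaleigh": (23_489, 26_826),
    "GoDurham": (21_602, 23_383),
    "NCSU Wolfline": (16_699, 13_084),
    "Duke": (13_602, 7_729),
    "GoTriangle": (9_691, 13_680),
    "GoCary": (1_003, 2_137),
}

# Difference and whole-number percent difference for each agency
rows = {}
for agency, (observed, modeled) in boardings.items():
    difference = modeled - observed
    rows[agency] = (difference, round(difference / observed * 100))

# Regional totals
total_observed = sum(observed for observed, _ in boardings.values())
total_modeled = sum(modeled for _, modeled in boardings.values())
total_difference = total_modeled - total_observed
total_pct = round(total_difference / total_observed * 100)

for agency, (difference, pct) in rows.items():
    print(f"{agency}: {difference:+,} ({pct:+d}%)")
print(f"Total: {total_difference:+,} ({total_pct:+d}%)")
```

The small regional total error reflects offsetting differences across agencies: over-predictions for GoRaleigh, GoDurham, GoTriangle, and GoCary largely cancel under-predictions for Chapel Hill Transit, NCSU Wolfline, and Duke.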

For reference, the model predicts 119,000 riders in the 2020 base year scenario. Compared to 2016, the 2020 scenario has a higher population and slightly better transit service, which makes the 119,000 estimate reasonable.