with huge thanks to Logan Brooks, Xueda Shen, and also to Nat DeFries, Dmitry Shemetov, and David Weber
12 December – Morning
Some Words About Forecasting
Linear Regression for Time Series
Evaluation Methods
ARX Models
Overfitting and Regularization
Prediction Intervals
Forecasting with Versioned Data
Traditional Approaches to Time Series
Assume we observe a predictor \(x_i\) and an outcome \(y_i\) for \(i = 1, \dots, n\).
Linear regression seeks coefficients \(\beta_0\) and \(\beta_1\) such that
\[y_i \approx \beta_0 + \beta_1 x_i\]
is a good approximation for every \(i = 1, \dots, n\).
In R, this can be fitted with `lm(y ~ x)`, where y is the vector of responses and x the vector of predictors.
More generally, with \(p\) predictors, linear regression seeks coefficients \(\beta_0, \beta_1, \dots, \beta_p\) such that
\[y_i \approx \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\]
is a good approximation for every \(i = 1, \dots, n\).
\[\hat y_t = \hat \beta + \hat \beta_0 x_{t-k}\]
i.e. regress the outcome \(y\) at time \(t\) on the predictor \(x\) at time \(t-k\).
\[\hat y_{t+k} = \hat \beta + \hat \beta_0 x_t\]
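A minimal sketch of this lagged regression in R (assuming a data frame `df` with columns `y` and `x` indexed by time; the names and the lag value are illustrative, not from the slides):

k <- 7                                       # lag, in time steps (illustrative)
df$lagged_x <- dplyr::lag(df$x, n = k)       # x at time t-k, aligned with y at time t
fit <- lm(y ~ lagged_x, data = df)           # regress y_t on x_{t-k}
predict(fit, newdata = data.frame(lagged_x = tail(df$x, 1)))  # forecast of y_{t+k} from today's x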
During the pandemic, there was interest in predicting COVID deaths 7, 14, 21, and 28 days ahead.
Can we reasonably predict COVID deaths 28 days ahead by just using cases today?
\[y_{t+28} = \text{deaths at time } t+28 \quad\quad x_{t} = \text{cases at time } t\]
Is the following a good model?
\[\hat y_{t+28} = \hat\beta_0 + \hat\beta_1 x_{t}\]
An `epi_df` object, 6 x 4 with metadata:
* geo_type = state
* time_type = day
* as_of = 2024-11-05 23:50:44.00687
# A tibble: 6 × 4
# Groups: geo_value [1]
geo_value time_value cases deaths
* <chr> <date> <dbl> <dbl>
1 ca 2020-04-01 3.17 0.0734
2 ca 2020-04-02 3.48 0.0835
3 ca 2020-04-03 3.44 0.0894
4 ca 2020-04-04 3.05 0.0778
5 ca 2020-04-05 3.28 0.0876
6 ca 2020-04-06 3.37 0.0848
Note
Cases seem highly correlated with deaths several weeks later (but the relation between cases and deaths changes over time).
Let’s split the data into a training and a test set (before/after 2021-04-01).
On training set: large correlation between cases and deaths 28 days ahead (> 0.95).
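A minimal sketch of this split and the correlation check (assuming the `ca` data frame shown above, with a `lagged_cases` column holding cases lagged by 28 days, as used in the later code):

library(dplyr)
# split before/after 2021-04-01
train <- ca |> filter(time_value < as.Date("2021-04-01"))
test  <- ca |> filter(time_value >= as.Date("2021-04-01"))
# correlation between cases and deaths 28 days later, on the training set
cor(train$lagged_cases, train$deaths, use = "complete.obs")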
\[\hat y_{t+28} = \hat\beta + \hat\beta_0 x_{t}\]
Check if `deaths` is approximately linear in `lagged_cases`:
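One way to check this visually, assuming `ggplot2` is available and using the training set from the split above:

library(ggplot2)
ggplot(train, aes(x = lagged_cases, y = deaths)) +
  geom_point(alpha = 0.5) +                      # one point per day
  geom_smooth(method = "lm", se = FALSE) +       # fitted regression line
  labs(x = "cases 28 days earlier", y = "deaths")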
Assume we have predictions \(\hat y_{new, t}\) for the unseen observations \(y_{new,t}\) over times \(t = 1, \dots, N\).
Four commonly used error metrics are:
mean squared error (MSE)
mean absolute error (MAE)
mean absolute percentage error (MAPE)
mean absolute scaled error (MASE)
\[MSE = \frac{1}{N} \sum_{t=1}^N (y_{new, t}- \hat y_{new, t})^2\] \[MAE = \frac{1}{N} \sum_{t=1}^N |y_{new, t}- \hat y_{new, t}|\]
MAE gives less importance to extreme errors than MSE.
Drawback: both metrics are scale-dependent, so they are not universally interpretable. (For example, if \(y\) captures height, MSE and MAE will vary depending on whether we measure in feet or meters.)
\[MAPE = 100 \times \frac{1}{N} \sum_{t=1}^N \left|\frac{y_{new, t}- \hat y_{new, t}}{y_{new, t}}\right|\]
Drawbacks:
Erratic behavior when \(y_{new, t}\) is close to zero
It assumes the unit of measurement has a meaningful zero (e.g. using Fahrenheit or Celsius to measure temperature will lead to different MAPE)
Important
There are situations when MAPE is problematic!
|       | MAE   | MAPE   |
|-------|-------|--------|
| yhat1 | 2.873 | 43.140 |
| yhat2 | 5.382 | 36.083 |
\[MASE = 100 \times \frac{\frac{1}{N} \sum_{t=1}^N |y_{new, t}- \hat y_{new, t}|} {\frac{1}{N-1} \sum_{t=2}^N |y_{new, t}- y_{new, t-1}|}\]
Advantages:
is universally interpretable (not scale dependent)
avoids the zero-pitfall
MASE in words: we normalize the error of our forecasts by that of a naive method which always predicts the last observation.
|       | MAE   | MAPE   | MASE    |
|-------|-------|--------|---------|
| yhat1 | 2.873 | 43.140 | 66.100  |
| yhat2 | 5.382 | 36.083 | 123.817 |
MSE <- function(truth, prediction) {   # mean squared error
  mean((truth - prediction)^2)
}
MAE <- function(truth, prediction) {   # mean absolute error
  mean(abs(truth - prediction))
}
MAPE <- function(truth, prediction) {  # mean absolute percentage error (in %)
  100 * mean(abs(truth - prediction) / truth)
}
MASE <- function(truth, prediction) {  # mean absolute scaled error (in %)
  100 * MAE(truth, prediction) / mean(abs(diff(truth)))
}
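As a quick illustration with made-up numbers (the vectors below are not from the COVID data):

y_test <- c(10, 12, 9, 14)   # hypothetical truth
yhat   <- c(11, 10, 9, 15)   # hypothetical predictions
MSE(y_test, yhat)   # 1.5
MAE(y_test, yhat)   # 1
MAPE(y_test, yhat)  # about 8.5
MASE(y_test, yhat)  # 30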
Given an error metric, we want to estimate the prediction error under that metric.
This can be accomplished in different ways, using the
Training error
Split-sample error
Time series cross-validation error (using all past data or a trailing window)
The easiest but worst approach to estimate the prediction error is to use the training error, i.e. the average error on the training set that was used to fit the model.
The training error is
generally too optimistic as an estimate of prediction error
more optimistic the more complex the model!
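A minimal sketch of the training-error computation (using the training set from the earlier split; `MAE()` and `MASE()` are the functions defined above):

reg_train <- lm(deaths ~ lagged_cases, data = train)
y_fit <- model.frame(reg_train)$deaths            # responses actually used in the fit (NAs dropped)
MAE(truth = y_fit, prediction = fitted(reg_train))
MASE(truth = y_fit, prediction = fitted(reg_train))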
MAE MASE
training 0.0740177 380.9996
To compute the split-sample error:
Split data into training (up to time \(t_0\)), and test set (after \(t_0\))
Fit the model to the training data only
Make predictions for the test set
Compute the selected error metric on the test set only
Note
Split-sample estimates of prediction error don’t mimic a situation where we would refit the model in the future. They are pessimistic if the relation between outcome and predictors changes over time.
Assume we want to make \(h\)-step ahead predictions, i.e. at time \(t\) we want to make a forecast for \(t+h\). Then, the split-sample MSE is
\[\text{SplitMSE} = \frac{1}{n-h-t_0} \sum_{t = t_0}^{n-h} (\hat y_{t+h|t_0} - y_{t+h})^2\]
where \(\hat y_{t+h|t_0}\) indicates a prediction for \(y\) at time \(t+h\) that was made with a model that was fit on data up to time \(t_0\).
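A minimal sketch of the split-sample computation (assuming `n`, `h`, `t0`, and the `lagged_cases` column as defined in the surrounding code, with the lag equal to `h`):

# fit once, to data up to time t0 only
reg_split <- lm(deaths ~ lagged_cases, data = ca, subset = (1:n) <= t0)
# predict the h-step-ahead targets t0+h, ..., n using that single fit
pred_split <- rep(NA, length = n)
pred_split[(t0 + h):n] <- predict(reg_split, newdata = data.frame(ca[(t0 + h):n, ]))
# evaluate only where predictions exist
ok <- !is.na(pred_split)
MAE(ca$deaths[ok], pred_split[ok])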
MAE MASE
training 0.0740177 380.9996
split-sample 0.3116854 2914.4575
Predictions are overshooting the target, especially in early 2022 (Omicron phase).
This is because we are predicting deaths using lagged cases, but the relation between the two changes over time.
If we refit in the future once new data are available, a more appropriate way to estimate the prediction error is time series cross-validation.
To get \(h\)-step ahead predictions, for each time \(t = t_0, t_0+1, \dots\),
Fit the model using data up to time \(t\)
Make a prediction for \(t+h\)
Record the prediction error
The cross-validation MSE is then
\[CVMSE = \frac{1}{n-h-t_0} \sum_{t = t_0}^{n-h} (\hat y_{t+h|t} - y_{t+h})^2\]
where \(\hat y_{t+h|t}\) indicates a prediction for \(y\) at time \(t+h\) that was made with data available up to time \(t\).
n <- nrow(ca) #length of time series
h <- k #number of days ahead for which prediction is wanted
pred_all_past <- rep(NA, length = n) #initialize vector of predictions
for (t in t0:(n-h)) {
# fit to all past data and make h-step ahead prediction
reg_all_past = lm(deaths ~ lagged_cases, data = ca, subset = (1:n) <= t)
pred_all_past[t+h] = predict(reg_all_past, newdata = data.frame(ca[t+h, ]))
}
Note
With the current model, we can only predict \(k\) days ahead (where \(k\) = number of days by which predictor is lagged)!
MAE MASE
training 0.0740177 380.9996
split-sample 0.3116854 2914.4575
time series CV 0.2374931 2212.5992
Predictions are still overshooting the target, but error is smaller than split-sample.
Why?
👍 Forecaster is partially learning the change in cases-deaths relation (especially in late 2022)
👎 We refit on all past data, so predictions are still influenced by old cases-deaths relation
Idea 💡
Ignore old data when refitting?
Fit the model on a window of data of length \(w\), starting at \(t-w\) and ending at \(t\).
Advantage: if the predictors-outcome relation changes over time, training the forecaster on a window of recent data can better capture the recent relation which might be more relevant to predict the outcome in the near future.
Window length \(w\) considerations:
if \(w\) is too big, the model can’t adapt to the recent predictors-outcome relation
if \(w\) is too small, the fitted model may be too volatile (trained on too little data)
# Getting the predictions through CV with trailing window
w <- 120 #trailing window size
h <- k #number of days ahead for which prediction is wanted
pred_trailing <- rep(NA, length = n) #initialize vector of predictions
for (t in t0:(n-h)) {
# fit to a trailing window of size w and make h-step ahead prediction
reg_trailing = lm(deaths ~ lagged_cases, data = ca,
subset = (1:n) <= t & (1:n) > (t-w))
pred_trailing[t+h] = predict(reg_trailing, newdata = data.frame(ca[t+h, ]))
}
MAE MASE
training 0.07401770 380.9996
split-sample 0.31168536 2914.4575
time series CV 0.23749306 2212.5992
time series CV + trailing 0.09932651 925.3734
Idea: predicting the outcome via a linear combination of its lags and a set of exogenous (i.e. external) input variables
Example:
\[\hat y_{t+h} = \hat\phi + \sum_{i=0}^p \hat\phi_i y_{t-i} + \sum_{j=0}^q \hat\beta_j x_{t-j}\]
\[\hat y_{t+h} = \hat \phi + \hat\phi_0 y_{t} + \hat\phi_1 y_{t-7} + \hat\phi_2 y_{t-14} + \hat\beta_0 x_{t} + \hat\beta_1 x_{t-7} + \hat\beta_2 x_{t-14}\]
\[\hat y_{t+28} = \hat\phi + \hat\phi_0 y_{t} + \hat\beta_0 x_{t}\]
Note
From now on, we will only consider regression on a trailing window, since regression on all past data leads to overshooting during Omicron.
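A minimal sketch of the ARX(1) fit on a trailing window (assuming `ca` has `lagged_deaths` and `lagged_cases` columns, both lagged by `h` days, and `n`, `h`, `t0`, `w` as in the surrounding code):

pred_arx1 <- rep(NA, length = n)
for (t in t0:(n - h)) {
  # fit y_{t+h} ~ phi + phi_0 y_t + beta_0 x_t on the last w days
  arx1_fit <- lm(deaths ~ lagged_deaths + lagged_cases, data = ca,
                 subset = (1:n) <= t & (1:n) > (t - w))
  pred_arx1[t + h] <- max(0, predict(arx1_fit, newdata = data.frame(ca[t + h, ])))
}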
                   MAE        MASE
ARX(1)             0.07852942 731.6178
lm on lagged cases 0.09932651 925.3734
Regression on a trailing window can be quite sensitive to data issues.
(Intercept) lagged_deaths lagged_cases
0.067259206 0.304075294 -0.004285251
The downward dip is explained by the negative coefficient on `lagged_cases`, and by the fact that at the forecast date:
observed deaths are exactly equal to 0 (data issue)
observed cases are increasing
So far we only focused on COVID death predictions 28 days ahead.
We will now compare the model with lagged cases as predictor
\[\hat y_{t+h} = \hat\beta + \hat\beta_0 x_t\]
to the ARX(1) model
\[\hat y_{t+h} = \hat\phi + \hat\phi_0 y_t + \hat\beta_0 x_t\]
for horizons \(h = 7, 14, 21, 28\).
h_vals <- c(7, 14, 21, 28) #horizons
pred_m1 = pred_m2 <- data.frame(matrix(NA, nrow = 0, ncol = 3)) #initialize df for predictions
colnames(pred_m1) = colnames(pred_m2) = c("forecast_date", "target_date", "prediction")
w <- 120 #trailing window size
ca_lags <- ca |> select(!c(lagged_cases, lagged_deaths))
# Create lagged predictors
for (i in seq_along(h_vals)) {
ca_lags[[paste0("lagged_deaths_", h_vals[i])]] <- dplyr::lag(ca_lags$deaths, n = h_vals[i])
ca_lags[[paste0("lagged_cases_", h_vals[i])]] <- dplyr::lag(ca_lags$cases, n = h_vals[i])
}
# Only forecast on 1st day of the months
forecast_time <- which(ca_lags$time_value >= t0_date &
ca_lags$time_value < ca_lags$time_value[n-max(h_vals)] &
day(ca_lags$time_value) == 1)
for (t in forecast_time) {
for (i in seq_along(h_vals)) {
h = h_vals[i]
# formulas including h-lagged variables
m1_formula = as.formula(paste0("deaths ~ lagged_cases_", h))
m2_formula = as.formula(paste0("deaths ~ lagged_cases_", h, " + lagged_deaths_", h))
# fit to trailing window of data
m1_fit = lm(m1_formula, data = ca_lags, subset = (1:n) <= t & (1:n) > (t-w))
m2_fit = lm(m2_formula, data = ca_lags, subset = (1:n) <= t & (1:n) > (t-w))
# make h-step ahead predictions
pred_m1 = rbind(pred_m1,
data.frame(forecast_date = ca_lags$time_value[t],
target_date = ca_lags$time_value[t+h],
prediction = predict(m1_fit, newdata = data.frame(ca_lags[t+h, ]))))
pred_m2 = rbind(pred_m2,
data.frame(forecast_date = ca_lags$time_value[t],
target_date = ca_lags$time_value[t+h],
prediction = predict(m2_fit, newdata = data.frame(ca_lags[t+h, ]))))
}
}
                   MAE       MASE
lm on lagged cases 0.1049742 304.007
MAE MASE
ARX(1) 0.04463132 129.2531
Different ways to visualize predictions for multiple \(h\)
Last slides: group by forecast date, and show prediction “trajectories”
Other approach: one line and color per horizon \(h\)
The ARX(1) model \(\hat y_{t+h} = \hat \phi + \hat\phi_0 y_{t} + \hat\beta_0 x_{t}\) has good predictive performance
We will now try to improve the ARX(1) model by including more lags in the set of predictors
Let’s consider two extensions: the ARX(2) model
\[\hat y_{t+h} = \hat \phi + \hat\phi_0 y_{t} + \hat\phi_1 y_{t-7} + \hat\beta_0 x_{t} + \hat\beta_1 x_{t-7}\]
and the ARX(3) model
\[\hat y_{t+h} = \hat \phi + \hat\phi_0 y_{t} + \hat\phi_1 y_{t-7} + \hat\phi_2 y_{t-14} + \hat\beta_0 x_{t} + \hat\beta_1 x_{t-7} + \hat\beta_2 x_{t-14}\]
and fit them using a trailing window with \(w = 120\).
pred_arx2 = pred_arx3 <- rep(NA, length = n)
w <- 120 #trailing window size
h <- 28 #number of days ahead
# create lagged predictors
ca_lags$lagged_deaths_35 <- dplyr::lag(ca_lags$deaths, n = 35)
ca_lags$lagged_deaths_42 <- dplyr::lag(ca_lags$deaths, n = 42)
ca_lags$lagged_cases_35 <- dplyr::lag(ca_lags$cases, n = 35)
ca_lags$lagged_cases_42 <- dplyr::lag(ca_lags$cases, n = 42)
for (t in t0:(n-h)) {
arx2_formula = as.formula(paste0("deaths ~ lagged_cases_", h, " + lagged_deaths_", h,
" + lagged_cases_", h+7, " + lagged_deaths_", h+7))
arx3_formula = as.formula(paste0("deaths ~ lagged_cases_", h, " + lagged_deaths_", h,
" + lagged_cases_", h+7, " + lagged_deaths_", h+7,
" + lagged_cases_", h+14, " + lagged_deaths_", h+14))
# fit to trailing window of data
arx2_fit = lm(arx2_formula, data = ca_lags, subset = (1:n) <= t & (1:n) > (t-w-7))
arx3_fit = lm(arx3_formula, data = ca_lags, subset = (1:n) <= t & (1:n) > (t-w-14))
# make h-step ahead predictions
pred_arx2[t+h] <- max(0, predict(arx2_fit, newdata = data.frame(ca_lags[t+h, ])))
pred_arx3[t+h] <- max(0, predict(arx3_fit, newdata = data.frame(ca_lags[t+h, ])))
}
MAE MASE
ARX(1) 0.07852942 731.6178
ARX(2) 0.08716160 812.0393
ARX(3) 0.12487694 1163.4135
As we add more predictors, forecasts seem more volatile and errors increase.
Overfitting
When we introduce too many predictors in the model
The estimated coefficients will be chosen to mimic the observed data very closely on the training set, leading to small training error
The predictive performance on the test set might be very poor, producing large split-sample and CV error
What happens if we increase the number of predictors to 120?
Let’s fit
\[\hat y_{t+28} = \hat\phi + \hat\phi_0 y_{t} + \hat\phi_1 y_{t-1} + \dots + \hat\phi_{59} y_{t-59} + \hat\beta_0 x_{t} + \dots + \hat\beta_{59} x_{t-59}\]
and compare training vs split-sample errors
MAE MASE
split-sample 0.3978198 3706.28
Note
Some predictions were negative, which doesn’t make sense for count data, so we truncated them at 0.
How can we
select the “right” number of lags to include?
avoid overfitting, while considering a large number of predictors?
Idea: introduce a regularization parameter \(\lambda\) that shrinks or sets some of the estimated coefficients to zero, i.e. some predictors are estimated to have limited or no predictive power
Most common regularization methods
Ridge: shrinks all coefficients toward zero
Lasso: sets some coefficients exactly to zero
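Concretely, both methods add a penalty (with weight \(\lambda \ge 0\)) to the least-squares criterion; the larger \(\lambda\), the stronger the shrinkage (glmnet uses a slightly different scaling of the first term, but the idea is the same):
\[\hat\beta^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^p \beta_j^2\]
\[\hat\beta^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^p |\beta_j|\]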
Let’s predict \(h=28\) days ahead by regularizing:
library(glmnet) # Implements ridge and lasso
h <- 28
X <- as_tibble(ca_lags) |> select(ends_with("_28"), ends_with("_35"), ends_with("_42"))
y <- ca_lags$deaths
# We'll need to omit NA values explicitly, as otherwise glmnet will complain
X_train <- X[43:t0, ]
y_train <- y[43:t0]
# Ridge regression: set alpha = 0, lambda sequence will be chosen automatically
ridge <- glmnet(X_train, y_train, alpha = 0)
beta_ridge <- coef(ridge) # matrix of estimated coefficients
lambda_ridge <- ridge$lambda # sequence of lambdas used to fit ridge
# Lasso regression: set alpha = 1, lambda sequence will be chosen automatically
lasso <- glmnet(X_train, y_train, alpha = 1)
beta_lasso <- coef(lasso) # matrix of estimated coefficients
lambda_lasso <- lasso$lambda # sequence of lambdas used to fit lasso
dim(beta_lasso) # One row per coefficient, one column per lambda value
[1] 8 85
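For example, the fitted coefficients at a single value of \(\lambda\) can be extracted with `coef()` (the 50th value of the sequence is chosen here purely for illustration):

coef(lasso, s = lambda_lasso[50])  # sparse: several entries are exactly 0
coef(ridge, s = lambda_ridge[50])  # all entries nonzero, but shrunk toward 0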
h <- 28 # number of days ahead
w <- 120 # window length
# Initialize matrices for predictions (one column per lambda value)
yhat_ridge_mat <- matrix(NA, nrow = n, ncol = length(lambda_ridge))
yhat_lasso_mat <- matrix(NA, nrow = n, ncol = length(lambda_lasso))
yhat_ridge = yhat_lasso <- rep(NA, length = n)
# Select index of best lambda value on training set
ridge_index = cv.glmnet(as.matrix(X_train), y_train, alpha = 0, lambda = lambda_ridge)$index[1]
lasso_index = cv.glmnet(as.matrix(X_train), y_train, alpha = 1, lambda = lambda_lasso)$index[1]
for (t in t0:(n-h)) {
# Indices of data within window
inds = t-w < 1:n & 1:n <= t
# Fit ARX + ridge/lasso for each lambda value
ridge_trail = glmnet(X[inds, ], y[inds], alpha = 0, lambda = lambda_ridge)
lasso_trail = glmnet(X[inds, ], y[inds], alpha = 1, lambda = lambda_lasso)
# Predict for each lambda value
yhat_ridge_mat[t+h, ] = predict(ridge_trail, newx = as.matrix(X[(t+h), ]))
yhat_lasso_mat[t+h, ] = predict(lasso_trail, newx = as.matrix(X[(t+h), ]))
# Save prediction corresponding to best lambda so far
yhat_ridge[t+h] = max(0, yhat_ridge_mat[t+h, ridge_index])
yhat_lasso[t+h] = max(0, yhat_lasso_mat[t+h, lasso_index])
if (t >= t0+h) {
# Prediction error
mae_ridge = colMeans(abs(yhat_ridge_mat[1:(t+1), ] - y[1:(t+1)]), na.rm = T)
mae_lasso = colMeans(abs(yhat_lasso_mat[1:(t+1), ] - y[1:(t+1)]), na.rm = T)
# Select index of lambda vector which gives lowest MAE so far
ridge_index <- which.min(mae_ridge)
lasso_index <- which.min(mae_lasso)
}
}
MAE MASE
ARX(1) 0.07852942 731.6178
ARX(1) + ridge 0.07004585 652.5808
ARX(1) + lasso 0.07887651 734.8514
MAE MASE
ARX(2) 0.08716160 812.0393
ARX(2) + ridge 0.08143228 758.6622
ARX(2) + lasso 0.07807801 727.4122
MAE MASE
ARX(3) 0.12487694 1163.4135
ARX(3) + ridge 0.08555066 797.0310
ARX(3) + lasso 0.07633053 711.1318
Best model: ARX(1) + ridge
Second best: ARX(3) + lasso
Ridge worsens as more predictors are included
Lasso improves as more predictors are included
Important
What if we want to provide a measure of uncertainty around the point prediction or a likely range of values for the outcome at time \(t+h\)?
Prediction intervals from `lm` fits
To get prediction intervals for the models we previously fitted, we only need to tweak our call to `predict` by adding as an input:
interval = "prediction", level = p
where \(p \in (0, 1)\) is the desired coverage.
The output from `predict` will then be a matrix with:
first column a point estimate
second column the lower limit of the interval
third column the upper limit of the interval
# Initialize matrices to store predictions
# 3 columns: point estimate, lower limit, and upper limit
pred_interval_lm <- matrix(NA, nrow = n, ncol = 3)
colnames(pred_interval_lm) <- c('prediction', 'lower', 'upper')
for (t in t0:(n-h)) {
# Fit ARX and predict
arx_trailing = lm(deaths ~ lagged_deaths + lagged_cases, data = ca,
subset = (1:n) <= t & (1:n) > (t-w))
pred_interval_lm[t+h, ] = pmax(0,
predict(arx_trailing, newdata = data.frame(ca[t+h, ]),
interval = "prediction", level = 0.8))
}
MAE MASE
lm.trailing 0.08932857 832.2278
We would expect the ARX model to cover the truth about 80% of the times. Is this actually true in practice?
The actual coverage of the predictive intervals is lower:
Actual Expected
Coverage 0.6 0.8
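A minimal sketch of how this empirical coverage can be computed from the intervals stored above (assuming the `pred_interval_lm` matrix and the `ca` data):

ok <- !is.na(pred_interval_lm[, "lower"])
mean(pred_interval_lm[ok, "lower"] <= ca$deaths[ok] &
       ca$deaths[ok] <= pred_interval_lm[ok, "upper"])  # compare with the 0.6 reported above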
So far: data were never revised (or we simply ignored revisions, taking the data `as_of` today).
Important
How can we train forecasters when dealing with versioned data?
→ An `epi_archive` object, with metadata:
ℹ Min/max time values: 2020-04-01 / 2023-03-09
ℹ First/last version with update: 2020-04-02 / 2023-03-10
ℹ Versions end: 2023-03-10
ℹ A preview of the table (24953 rows x 5 columns):
Key: <geo_value, time_value, version>
geo_value time_value version case_rate death_rate
<char> <Date> <Date> <num> <num>
1: ca 2020-04-01 2020-04-02 3.009195 0.06580240
2: ca 2020-04-01 2020-05-07 3.009195 0.06327156
3: ca 2020-04-01 2020-06-21 3.009195 0.06580242
4: ca 2020-04-01 2020-07-02 2.978825 0.06580242
5: ca 2020-04-01 2020-07-03 2.978825 0.06580242
---
24949: ca 2023-03-07 2023-03-08 0.000000 0.00000000
24950: ca 2023-03-07 2023-03-10 27.397832 0.00000000
24951: ca 2023-03-08 2023-03-09 21.083071 0.00000000
24952: ca 2023-03-08 2023-03-10 0.000000 0.00000000
24953: ca 2023-03-09 2023-03-10 22.185487 0.52072650
Important: when fitting and predicting, only use data in the latest version available at the forecast date!
# initialize dataframe for predictions
# 5 columns: forecast date, target date, 10%, 50%, and 90% quantiles
pred_aware <- data.frame(matrix(NA, ncol = 5, nrow = 0))
w <- 120 #trailing window size
h <- 28 #number of days ahead
# forecast once a week
fc_time_values <- seq(from = t0_date, to = as.Date("2023-02-09"), by = "1 week")
for (fc_date in fc_time_values) {
# get data version as_of forecast date
data <- epix_as_of(ca_archive, version = as.Date(fc_date))
# create lagged predictors
data$lagged_deaths <- dplyr::lag(data$deaths, h)
data$lagged_cases <- dplyr::lag(data$cases, h)
# perform regression
lm_weekly <- lm(deaths ~ lagged_deaths + lagged_cases,
# only consider window of data
data = data |> filter(time_value > (max(time_value) - w)))
# construct data.frame with the right predictors for the target date
predictors <- data.frame(lagged_deaths = tail(data$deaths, 1),
lagged_cases = tail(data$cases, 1))
# make predictions for target date and add them to dataframe of predictions
pred_aware <- rbind(pred_aware,
data.frame('forecast_date' = max(data$time_value),
'target_date' = max(data$time_value) + h,
t(pmax(0, predict(lm_weekly, newdata = predictors,
interval = "prediction", level = 0.8)))))
}
MAE MASE
version-aware 0.08001814 224.2782
MAE MASE
version-unaware 0.07934554 200.4657
The first two here are classic and standard; the second two are more recent. None are particularly well-suited for epi forecasting out of the box; ask us about them if you’re curious. We’ll discuss ARIMA as it’s closest to what we’ve seen.
Worth comparing and discussing ARIMA versus the kind of autoregressive models you’ve just seen:
`fable()` includes exogenous predictors in a different manner; they are effectively subject to the same lags as the AR and MA terms.