R Squared

R² measures the proportion of variance in the target variable explained by the model. It typically ranges from 0 to 1, with higher values indicating better fit, though it can be negative when a model performs worse than simply predicting the mean.

The R² metric, or Coefficient of Determination, is a key indicator of how well a regression model explains the variance in the target variable. It quantifies the proportion of the total variation in the observed data that is accounted for by the model's predictions. A value closer to 1 signifies a strong fit, meaning the model captures most of the underlying patterns in the data; a value of 0 means the model does no better than predicting the mean of the target, and negative values indicate it does worse. This metric highlights the model's explanatory power, making it useful for assessing overall fit during model evaluation and comparison.

The formula for R² is:

$$R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}$$

Here, $y_i$ represents the actual observed values, $\hat{y}_i$ the predicted values, $\bar{y}$ the mean of the actual values, and $n$ the number of observations. This equation compares the residual sum of squares (numerator) to the total sum of squares (denominator), providing a normalized measure of fit.
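The formula translates directly into code. Below is a minimal sketch in plain Python (the function name `r_squared` and the sample values are illustrative, not from the source):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination per the formula above."""
    y_mean = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the model fails to explain
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred))
    # Total sum of squares: variation of the data around its mean
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and predictions that track them closely
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.8, 5.1, 7.2, 8.9]
print(r_squared(y, y_hat))  # → 0.995
```

Because the predictions sit near the actual values, the residual sum of squares (0.10) is tiny relative to the total sum of squares (20), yielding an R² close to 1.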

When interpreting R², consider it alongside other metrics for a balanced view: a high R² alone doesn't rule out overfitting, especially in complex models. For instance, in predictive modeling, a high R² might confirm that input features effectively explain target variance, aiding data-driven insights. However, in datasets with high noise or multicollinearity, R² can be misleading, so cross-reference it with validation metrics on unseen data.
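One concrete way R² can mislead: on unseen data, a poorly fitted model can score below zero, because the residual sum of squares exceeds the total sum of squares. A small self-contained sketch with hypothetical values:

```python
# A constant prediction far from the data does worse than
# predicting the mean, so R^2 goes negative.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [10.0, 10.0, 10.0, 10.0]  # hypothetical bad model

y_mean = sum(y_true) / len(y_true)  # 2.5
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # 230.0
ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # 5.0
print(1 - ss_res / ss_tot)  # → -45.0
```

This is why reporting R² on a held-out validation set, rather than only on training data, gives a more honest picture of fit.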

R²'s strength lies in its interpretability, making it accessible for evaluating model performance. That said, for probabilistic models, consider complementing it with variants like Bayesian R² to account for uncertainty.