Systematic comparison of ARIMA vs Linear Regression for sales forecasting. Plot twist: the simple model won by 41%
This was my second ML project in 2024, built while I was still learning time series analysis.
Expectation: ARIMA (time series specialist) would crush Linear Regression (simple baseline).
Reality: Linear Regression won by 41%.
This taught me an important lesson: model complexity doesn't always correlate with performance.
Retail businesses need accurate sales forecasting for inventory planning, staffing, and cash-flow management.
Bad forecasting = expensive. Overstocking ties up capital, understocking loses revenue.
Kaggle dataset with 1,000 transactions from a supermarket, spanning 3 months.
Features: branch, date, time, and transaction total, among others.
Target: Predict future sales.
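All snippets below assume the Kaggle CSV is already loaded into a pandas DataFrame named `data`. A minimal loading step, with the file name being an assumption rather than the exact path from my notebook:

```python
import pandas as pd

# Load the supermarket sales transactions (file name assumed)
data = pd.read_csv('supermarket_sales.csv')
data['Date'] = pd.to_datetime(data['Date'])
```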
```python
daily_sales = data.groupby('Date')['Total'].sum()

# Statistics
print(f"Mean: ${daily_sales.mean():.2f}")  # $5,537
print(f"Std: ${daily_sales.std():.2f}")    # $1,842
print(f"Min: ${daily_sales.min():.2f}")    # $2,150
print(f"Max: ${daily_sales.max():.2f}")    # $9,876
```
Findings:
All branches perform similarly (~33% each). No need for branch-specific models.
Peak hours: 13:00-15:00 (lunch) and 19:00-20:00 (dinner). Useful for staffing optimization.
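Both findings fall out of simple groupby aggregations. A minimal sketch, assuming the dataset's Time column is in HH:MM format (not the exact notebook code):

```python
# Share of transactions per branch (~33% each)
print(data['Branch'].value_counts(normalize=True))

# Total sales per hour of day; peaks show up around lunch and dinner
data['Hour'] = pd.to_datetime(data['Time'], format='%H:%M').dt.hour
hourly_sales = data.groupby('Hour')['Total'].sum()
print(hourly_sales.sort_values(ascending=False).head())
```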
Daily data was too noisy. I resampled to weekly:
```python
# Make sure the index is datetime, then resample daily totals to weekly
daily_sales.index = pd.to_datetime(daily_sales.index)
weekly_sales = daily_sales.resample('W').sum()

print(f"Total weeks: {len(weekly_sales)}")  # 13 weeks
```
Why weekly? Aggregating smooths out day-to-day noise and leaves a clearer trend to model, at the cost of far fewer observations (13 points).
ARIMA = AutoRegressive Integrated Moving Average.
Components: AR(p) regresses on the series' own past values, I(d) applies differencing to remove trend, and MA(q) models past forecast errors. Together they form the order (p, d, q).
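Both models are evaluated on the same chronological hold-out. The exact split wasn't shown in my original notebook; here's a minimal sketch assuming the last three weeks are held back for testing:

```python
# Chronological hold-out: train on the earlier weeks, test on the last few.
# The 3-week test horizon is an assumption; adjust to your own split.
split = len(weekly_sales) - 3
train = weekly_sales.iloc[:split]
test = weekly_sales.iloc[split:]
```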
```python
from statsmodels.tsa.arima.model import ARIMA

# Test multiple orders and keep the one with the lowest AIC
best_aic = float('inf')
best_order = None

for p in range(6):
    for d in range(2):
        for q in range(3):
            try:
                model = ARIMA(train, order=(p, d, q))
                model_fit = model.fit()
                if model_fit.aic < best_aic:
                    best_aic = model_fit.aic
                    best_order = (p, d, q)
            except Exception:
                continue

print(f"Best order: {best_order}")  # (5, 1, 0)
```
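The grid search only selects the order; refitting with it and forecasting the held-out weeks produces the test metrics reported next. A sketch of that step, not the exact notebook code:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Refit with the selected order and forecast over the test horizon
arima_fit = ARIMA(train, order=best_order).fit()
arima_pred = arima_fit.forecast(steps=len(test))

arima_mae = mean_absolute_error(test, arima_pred)
arima_rmse = np.sqrt(mean_squared_error(test, arima_pred))
print(f"ARIMA MAE: ${arima_mae:.2f}")
print(f"ARIMA RMSE: ${arima_rmse:.2f}")
```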
ARIMA (5,1,0): MAE $5,800, RMSE $8,178 on the held-out weeks.
Not great. But this is the baseline.
Simplest possible approach: treat time as a feature.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Create time index features
X_train = np.arange(len(train)).reshape(-1, 1)
y_train = train.values
X_test = np.arange(len(test)).reshape(-1, 1) + len(train)
y_test = test.values

# Fit
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Predict
y_pred = lr_model.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"LR MAE: ${mae:.2f}")    # $3,726
print(f"LR RMSE: ${rmse:.2f}")  # $4,792
```
Results: MAE $3,726, RMSE $4,792.
Wait, what? Linear Regression beat ARIMA by 41%. The figure comes from the RMSE drop: (8,178 − 4,792) / 8,178 ≈ 41% (the MAE gap is about 36%).
This is counterintuitive: ARIMA is a time series specialist, while Linear Regression is just a straight line fitted to a time index.
Reasons:
ARIMA excels at capturing seasonal patterns, but this data has no clear seasonality; weekly sales look close to random (a quick autocorrelation check, sketched after this list, is one way to verify that).
ARIMA needs sufficient data to learn patterns (typically 50+ observations). We only have 13 weeks. Not enough.
Sales have a slight upward trend. Linear Regression is perfect for capturing this. ARIMA might overfit.
With limited data, simple models often generalize better.
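One quick way to sanity-check the seasonality point from the list above is to look at short-lag autocorrelation in the weekly series. With only 13 observations this is a rough check at best; a minimal sketch, not from the original notebook:

```python
# Short-lag autocorrelation of weekly sales; with 13 data points,
# treat these numbers as indicative only
for lag in range(1, 5):
    print(f"lag {lag}: {weekly_sales.autocorr(lag=lag):.2f}")
```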
Linear Regression works, but we can improve with regularization.
```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Parameter grid
param_grid = {'alpha': [0.1, 1.0, 10.0, 100.0]}

# Grid search
ridge = Ridge()
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Best model
best_model = grid_search.best_estimator_
print(f"Best alpha: {grid_search.best_params_['alpha']}")  # 0.1

# Evaluate
y_pred = best_model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Ridge MAE: ${mae:.2f}")    # $3,676
print(f"Ridge RMSE: ${rmse:.2f}")  # $5,049
```
Results: MAE $3,676 (slight improvement), RMSE $5,049.
Ridge trades slightly higher RMSE for a small MAE improvement, and the regularization should generalize better on new data (less risk of overfitting such a small training set).
| Model | MAE | RMSE | vs ARIMA |
|-------|-----|------|----------|
| ARIMA (5,1,0) | $5,800 | $8,178 | Baseline |
| Linear Regression | $3,726 | $4,792 | 41% better |
| Ridge (α=0.1) | $3,676 | $5,049 | Best |
Winner: Ridge Regression for production.
Built an interactive forecasting interface:
```python
import numpy as np
import pandas as pd
import streamlit as st

# Assumes weekly_sales and the trained Ridge model (best_model)
# are loaded or rebuilt at the top of the script
st.title('Sales Forecasting Dashboard')

# Historical data
st.subheader('Historical Weekly Sales')
st.line_chart(weekly_sales)

# Forecast input
weeks_ahead = st.number_input('Forecast horizon (weeks):', min_value=1, max_value=52, value=12)

if st.button('Generate Forecast'):
    # Predict future weeks from the time index
    last_week = len(weekly_sales)
    future_weeks = np.arange(last_week, last_week + weeks_ahead).reshape(-1, 1)
    predictions = best_model.predict(future_weeks)

    # Build a forecast frame indexed by future week-ending dates
    future_dates = pd.date_range(weekly_sales.index[-1], periods=weeks_ahead + 1, freq='W')[1:]
    forecast_df = pd.DataFrame({'Date': future_dates, 'Forecast': predictions})

    # Display
    st.subheader(f'Forecast for Next {weeks_ahead} Weeks')
    st.line_chart(forecast_df.set_index('Date'))

    # Summary
    st.metric("Total Forecast", f"${predictions.sum():,.0f}")
    st.metric("Avg Weekly", f"${predictions.mean():,.0f}")
    st.metric("Growth Rate", f"{((predictions[-1] / predictions[0]) - 1) * 100:.1f}%")
```
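Locally, the dashboard starts with `streamlit run app.py` (assuming the script is saved as app.py).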
Deployed to Streamlit Cloud.
Forecast: 8% growth over next 12 weeks.
Action: Increase inventory by 10% (with buffer for uncertainty).
Expected Impact: Reduce stockouts by 30%.
Finding: Peak hours 13:00-15:00 and 19:00-20:00.
Action: Schedule more staff during peak hours, reduce during slow hours.
Expected Impact: 20% improvement in labor efficiency.
Don't assume ARIMA is always best for time series. Test simple baselines first.
ARIMA needs sufficient data (50+ observations). Our 13 weeks wasn't enough.
No seasonality = ARIMA's strength wasted. Linear trend = LR's sweet spot.
Systematic comparison revealed a surprising winner. Don't rely on assumptions.
Ridge regression improved generalization with minimal complexity increase.
If starting again, I would collect more historical data before reaching for ARIMA, and I would test the simple baselines on day one instead of last.
This project taught me to question assumptions, compare models systematically, and let the evidence pick the winner.
And most importantly: simple solutions often work best.
Live Demo: https://sales-forecasting-fauza.streamlit.app/
For other projects, see Customer Churn Prediction, Food Recommendation Chatbot, and Sentinel Predictive Maintenance.