Forecasting S&P 500 using technical analysis, macro indicators and machine learning. A hybrid approach
Προβλέψεις για τον S&P 500, χρησιμοποιώντας τεχνική ανάλυση, μακροοικονομικούς δείκτες και μηχανική μάθηση. Μία υβριδική προσέγγιση

Master Thesis
Author
Papavasileiou, Nikolaos
Παπαβασιλείου, Νικόλαος
Date
2025-06
Keywords
SPY prediction ; Machine learning ; Technical analysis ; Macroeconomic indicators ; Principal component analysis ; k-nearest neighbors ; Stock forecasting ; Sharpe ratio ; Hybrid models ; Financial time series
Abstract
This thesis investigates the application of machine learning techniques to predict directional
movements in the SPY ETF, an exchange-traded fund tracking the S&P 500 index. The primary
objective is to evaluate whether combining macroeconomic indicators with technical analysis
features can improve the predictive performance and financial profitability of classification-based trading models. Traditional models often rely on either technical or fundamental
indicators in isolation, but recent research suggests that hybrid approaches may offer better
robustness and generalization in volatile financial environments.
A comprehensive dataset was compiled covering the period from February 2003 to June 2025,
incorporating over 230 features, including macroeconomic metrics such as interest rates,
unemployment figures, inflation data, and monetary aggregates, alongside technical indicators
like Bollinger Bands, MACD, RSI, and TSI. Principal Component Analysis (PCA) was used to reduce
dimensionality, while various machine learning algorithms — including K-Nearest Neighbors
(KNN), Random Forests, Gradient Boosting, and Logistic Regression — were tested for their
classification accuracy.
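
As an illustration of the kind of technical features named above, the following is a minimal sketch of Bollinger Bands and RSI in plain Python. It is not the thesis's feature code: the function names, the window lengths (20 and 14), the band width of two standard deviations, and the choice of a simple-average RSI (rather than Wilder's smoothed variant) are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) lists aligned with `closes`.

    Entries are None until a full window of prices is available.
    Window and band width are illustrative defaults, not thesis settings.
    """
    lower, middle, upper = [], [], []
    for i in range(len(closes)):
        if i + 1 < window:
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        chunk = closes[i + 1 - window : i + 1]
        mid = fmean(chunk)   # rolling mean
        sd = pstdev(chunk)   # rolling population standard deviation
        lower.append(mid - num_std * sd)
        middle.append(mid)
        upper.append(mid + num_std * sd)
    return lower, middle, upper


def rsi(closes, window=14):
    """Simple-average RSI (assumed variant); None until `window` changes exist."""
    out = [None] * len(closes)
    for i in range(window, len(closes)):
        changes = [closes[j] - closes[j - 1] for j in range(i - window + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = -sum(c for c in changes if c < 0)
        if losses == 0:
            # No down moves in the window; a flat window is also mapped
            # to 100 by the convention chosen for this sketch.
            out[i] = 100.0
        else:
            rs = gains / losses
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out
```

Both functions use only prices up to and including the current bar, so the resulting features do not leak future information into a forecasting model.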
Labels were generated based on future SPY returns over multiple time horizons (e.g., 3, 7, 30,
and 90 days), and categorized into three trading signals: BUY, NEUTRAL, and SELL. In addition to
evaluating classification accuracy, the thesis places significant emphasis on backtesting strategy
performance using key metrics such as cumulative return and Sharpe ratio. The findings reveal
that models using only macroeconomic or only technical indicators tend to underperform, while
hybrid models substantially improve both prediction quality and trading outcomes.
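
The labeling and evaluation steps described above can be sketched as follows. This is a hedged illustration only: the abstract does not state the return thresholds that separate BUY, NEUTRAL, and SELL, so the ±2% cut-offs here are hypothetical, the horizon is counted in rows of the price series (an assumption about how "days" are measured), and the Sharpe ratio is annualized with an assumed 252 periods per year and a zero risk-free rate.

```python
from math import sqrt
from statistics import fmean, stdev


def label_forward_returns(closes, horizon, buy_threshold=0.02, sell_threshold=-0.02):
    """Label each bar by its return over the next `horizon` bars.

    Thresholds are hypothetical placeholders. Bars whose future price is
    not yet known get None and should be excluded from training.
    """
    labels = []
    for i in range(len(closes)):
        if i + horizon >= len(closes):
            labels.append(None)
            continue
        forward_return = closes[i + horizon] / closes[i] - 1.0
        if forward_return > buy_threshold:
            labels.append("BUY")
        elif forward_return < sell_threshold:
            labels.append("SELL")
        else:
            labels.append("NEUTRAL")
    return labels


def cumulative_return(period_returns):
    """Compound a sequence of simple per-period returns."""
    total = 1.0
    for r in period_returns:
        total *= 1.0 + r
    return total - 1.0


def sharpe_ratio(period_returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio; needs at least two returns."""
    excess = [r - risk_free_per_period for r in period_returns]
    sd = stdev(excess)  # sample standard deviation
    if sd == 0:
        return float("nan")  # undefined when returns do not vary
    return fmean(excess) / sd * sqrt(periods_per_year)
```

In a backtest, `period_returns` would be the strategy's realized returns after applying the predicted signals, not the raw SPY returns.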
The best-performing configuration used the KNN classifier with 20 selected PCA
components and a 90-day prediction horizon, yielding a classification accuracy of approximately
85% and a Sharpe ratio exceeding 1.2. These results support the hypothesis that integrated
feature sets combined with proper model selection and threshold tuning can enhance financial
forecasting in complex market conditions.
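
To make the PCA-plus-KNN combination concrete, here is a minimal NumPy sketch of projecting standardized features onto principal components and classifying by majority vote among the nearest neighbors. It is an assumption-laden illustration, not the thesis's implementation: the function names are hypothetical, the abstract does not report the value of k (5 is used as a placeholder), and standardizing before PCA is a common choice that the abstract does not confirm.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Fit standardization and PCA on training rows only.

    Returns (mean, std, components) where components has shape
    (n_features, n_components). Requires n_components <= min(X_train.shape).
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X_train - mean) / std
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, vt[:n_components].T


def transform(X, mean, std, components):
    """Project rows of X using statistics learned from the training set."""
    return ((X - mean) / std) @ components


def knn_predict(train_Z, train_labels, query_Z, k=5):
    """Majority vote among the k nearest training points (Euclidean distance).

    Ties are broken in favor of the label that sorts first.
    """
    predictions = []
    for q in query_Z:
        distances = np.linalg.norm(train_Z - q, axis=1)
        nearest = np.argsort(distances)[:k]
        values, counts = np.unique(train_labels[nearest], return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)
```

For time-series data, the training set should contain only observations that precede the test period, and `fit_pca` should never see test rows; otherwise accuracy and Sharpe figures can be inflated by look-ahead.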