Predicting drawdowns in NIFTY 50

Decision trees that use features based on momentum, CAPE, and USD/INR exchange rate to predict large monthly drawdowns in NIFTY

Raj Mehta

Mar 07, 2025

I previously wrote about using momentum to predict drawdowns in the NIFTY 50 in an earlier post, "Momentum in the Indian Market" (Feb 23).

I have since extended the project by adding new features, training different models, and comparing their results. Eventually, I settled on a random forest model, which gave me the best results on the test set. This article walks through the exercise of building the model and links to the code and web app at the end.

Defining the problem

I have defined predicting drawdowns as a classification problem where I consider any month with more than a 2% drop in NIFTY 50 as a month of large drawdown. The model is trained on feature data up to the last available month end and it predicts whether the current month will end with a drawdown of more than 2%.

The -2% threshold is somewhat arbitrary, but it looks like the point where substantially negative returns become distinguishable from months that end close to 0%. We don’t want to avoid months with very small drawdowns, say between 0 and 0.5%, and we also want enough drawdown months for the model to have sufficient data to learn from.
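The labelling rule described above can be sketched as follows. This is a minimal illustration on made-up month-end closes, not the code from the repository; the names `monthly_close`, `DRAWDOWN_THRESHOLD`, and `large_drawdown` are assumptions introduced for this example.

```python
import pandas as pd

# Assumed name: a month counts as a large drawdown if it returns less than -2%
DRAWDOWN_THRESHOLD = -0.02

# Made-up month-end closes, for illustration only (not real NIFTY 50 data)
monthly_close = pd.Series(
    [100.0, 103.0, 100.5, 100.2, 97.0],
    index=pd.to_datetime(
        ["2024-01-31", "2024-02-29", "2024-03-31", "2024-04-30", "2024-05-31"]
    ),
    name="close",
)

# Simple month-over-month return; the first month has no prior close (NaN)
monthly_return = monthly_close.pct_change()

# 1 = large drawdown month, 0 = everything else (NaN compares as False, so 0)
large_drawdown = (monthly_return < DRAWDOWN_THRESHOLD).astype(int)

labels = pd.DataFrame({"return": monthly_return, "large_drawdown": large_drawdown})
print(labels)
```

In this toy series, the -0.3% month is labelled 0 while the months that fall by roughly 2.4% and 3.2% are labelled 1, which is the separation the threshold is meant to achieve. In practice the first row, which has no return, would be dropped.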

Getting historical data and creating meaningful features

I use NIFTY 50 historical data from 1992 onward. I combined older data from the NSE website for the initial years with more recent data from Yahoo Finance, which can be fetched programmatically (here through the yfinance package).

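import pandas as pd
import yfinance as yf
from datetime import datetime

# Older NIFTY 50 history from the NSE website, converted to month-end closes
nifty_historical = pd.read_csv("./data/NIFTY 50_Historical_PRICE.csv")
nifty_historical['Date'] = pd.to_datetime(nifty_historical['Date'])
nifty_historical = nifty_historical.set_index('Date').resample('ME').ffill()
nifty_historical = nifty_historical[:'2007-10-01']
nifty_historical = nifty_historical[['Close']].rename(columns={'Close': '^NSEI'})

# More recent daily data from Yahoo Finance
start_date = "1980-01-01"
end_date = datetime.today().strftime('%Y-%m-%d')
ticker = '^NSEI'
data = yf.download(ticker, start=start_date, end=end_date, interval="1d")

# Month-end closes in a single '^NSEI' column, matching nifty_historical
yf_market_data = data['Close'].squeeze().resample('ME').last().to_frame('^NSEI')

# Merging the two datasets, keeping the first row for any month present in both
market_data = pd.concat([nifty_historical, yf_market_data], axis=0)
market_data = market_data[~market_data.index.duplicated(keep='first')]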

I have used three groups of features:

Price Momentum: Momentum has historically been a strong and consistent predictor of returns. I’ve created 1-, 3-, 6-, 9-, and 12-month momentum features from the price data.
CAPE: Unlike the traditional PE ratio, which uses the most recent one year of earnings, the standard CAPE ratio averages inflation-adjusted earnings over a ten-year period. This longer horizon smooths out short-term fluctuations caused by business cycles, providing a clearer signal of long-term market valuation.
Jacob, J. and Raju, R. (2024) show that CAPE is effective in predicting market downturns. They also provide a regularly updated CAPE dataset for the Indian market. I use the 5-year SENSEX CAPE.

USD/INR Exchange Rate: I obtained the exchange rate for the initial years from FRED and combined it with more recent data from Yahoo Finance. The dollar’s performance against the rupee is an important factor when foreign institutional investors (FIIs) decide whether to pull out of the Indian market, and FII selling can lead to large drawdowns. I use the USD/INR exchange rate to create momentum features.

windows = [1, 3, 6, 9, 12]

# Momentum = current month-end close divided by the close w months earlier
for w in windows:
    market_data[f'momentum_{w}_1'] = market_data['^NSEI'] / market_data['^NSEI'].shift(w)

cape = pd.read_csv("./data/india_cape.csv")
cape['Date'] = pd.to_datetime(cape['Date'])
cape = cape.set_index('Date').resample('ME').last()
cape.head()

ticker = 'USDINR=X'
# Download daily data
usd_inr = yf.download(ticker, start=start_date, end=end_date, interval="1d")

# Keep only the closing prices, resampled to month end
usd_inr = usd_inr[['Close']]
usd_inr = usd_inr.resample('ME').last()

# fred_data holds the FRED exchange-rate series for the initial years (loading not shown here)
usd_inr = pd.concat([fred_data, usd_inr['Close']], axis=0)
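The article says the USD/INR series is turned into momentum features and that the model only sees data up to the last available month end, but the code for those two steps is not shown here. A minimal sketch of how they could look is below. It assumes the same ratio definition used for the NIFTY momentum features and a one-month lag on the features; the helper `momentum_features`, the column prefix, and the toy exchange-rate series are hypothetical and not taken from the repository.

```python
import pandas as pd

windows = [1, 3, 6, 9, 12]


def momentum_features(series: pd.Series, prefix: str) -> pd.DataFrame:
    """Ratio of each month's value to the value w months earlier (hypothetical helper)."""
    return pd.DataFrame(
        {f"{prefix}_momentum_{w}": series / series.shift(w) for w in windows}
    )


# Made-up monthly exchange-rate series, for illustration only
idx = pd.period_range("2022-01", periods=16, freq="M")
usd_inr_close = pd.Series([80.0 + 0.25 * i for i in range(16)], index=idx)

features = momentum_features(usd_inr_close, "usd_inr")

# Assumption: lag features by one month so that the row for month t only
# contains information available at the end of month t-1
lagged = features.shift(1).dropna()

print(lagged)
```

With 16 months of data and a 12-month window, only the last three rows have a complete, lagged feature set, which is why the usable sample starts later than the raw price history.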

Training the model

Our final dataset runs from 1995 to 2025, giving us 30 years of monthly data to work with. I use 75% of the data for training and 25% for testing. Within the training data, I use 5-fold cross-validation to evaluate the model. The training data runs from April 1995 to Jan 2019.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Use the first 75% of rows (in date order) for training
split_of_date_to_use = round(0.75*len(training_data_final))
# Features and target variable
X = training_data_final[training_columns].iloc[:split_of_date_to_use]
y = training_data_final['positive_returns'].iloc[:split_of_date_to_use]

# Initialize the Random Forest Classifier
clf = RandomForestClassifier(n_estimators=10, random_state=42)

# Define k-fold cross-validation (5-fold)
kfold = StratifiedKFold(n_splits=5, shuffle=False)

# Perform cross-validation
cv_scores = cross_val_score(clf, X, y, cv=kfold, scoring='accuracy')

# Train the model on the full training set
clf.fit(X, y)

# Output results
print(f"Cross-Validation Scores: {cv_scores}")
print(f"Mean Accuracy: {np.mean(cv_scores):.4f}")
print(f"Standard Deviation: {np.std(cv_scores):.4f}")

Cross-Validation Scores: [0.68, 0.55, 0.64, 0.66, 0.81]

Mean Accuracy: 0.67

Standard Deviation: 0.08

Performance on test data

The test data runs from Feb 2019 to Jan 2025.

from sklearn.metrics import accuracy_score, confusion_matrix

# Copy the held-out slice so that adding a column does not modify a view
test_data_final = training_data_final.iloc[split_of_date_to_use:].copy()
test_data_final['market_regime'] = clf.predict(test_data_final[training_columns])

# Compute Confusion Matrix
cm = confusion_matrix(test_data_final["positive_returns"], test_data_final["market_regime"])

# Convert to DataFrame for better visualization
cm_df = pd.DataFrame(cm, index=["Actual 0", "Actual 1"], columns=["Predicted 0", "Predicted 1"])

print(cm_df)

acc_score = accuracy_score(test_data_final["positive_returns"], test_data_final["market_regime"])
print(f"Accuracy: {acc_score*100:.2f}%")
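Accuracy is a single number, and the confusion matrix shows what sits behind it. As a rough sketch of how to read one, the snippet below derives accuracy, precision, and recall from a 2x2 matrix in scikit-learn's layout (rows are actual classes, columns are predicted classes). The counts are invented for illustration and are not the results of the model in this article.

```python
import numpy as np

# Toy confusion matrix: rows = actual class, columns = predicted class.
# These counts are made up for illustration; they are NOT the article's results.
cm = np.array([
    [20, 5],   # actual 0: 20 predicted 0, 5 predicted 1
    [10, 15],  # actual 1: 10 predicted 0, 15 predicted 1
])

# For a binary problem, ravel() yields (tn, fp, fn, tp) in this layout
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all months classified correctly
precision = tp / (tp + fp)        # of the months predicted as 1, how many were 1
recall = tp / (tp + fn)           # of the months that were 1, how many were caught

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
```

Looking at precision and recall alongside accuracy matters here because drawdown and non-drawdown months are unlikely to be evenly balanced.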

I also tried logistic regression (with PCA) and XGBoost. Here is their accuracy on the test data:

Logistic Regression: 64.44%
XGBoost: 72.22%
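For readers who want to try the logistic regression variant, one possible setup is sketched below. It is not the code behind the numbers above: the scaling step, the number of principal components, and the synthetic data are all assumptions made for the sake of a runnable example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and labels, for illustration only (not market data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Chronological-style split: first 75% of rows for training, the rest for testing
split = round(0.75 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Assumed pipeline: standardize, reduce to 3 components, then fit logistic regression
pipe = make_pipeline(StandardScaler(), PCA(n_components=3), LogisticRegression())
pipe.fit(X_train, y_train)

test_accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy * 100:.2f}%")
```

Wrapping the scaler and PCA in a pipeline means both are fitted on the training rows only, so nothing from the test period leaks into the transformation.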

I have also hosted the model on Streamlit so anyone can see the prediction for the current month. Spoiler alert: it’s “SELL” for March 2025.

Link to web app: https://nifty-predictor.streamlit.app/

Link to GitHub repo: https://github.com/rajmehta982/asset-allocation-research-india

References

Jacob, J. and Raju, R. (2024), Forecast or Fallacy? Shiller's CAPE: Market and Style Factor Forward Returns in Indian Equities. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4911989
