Fine-tuned Models for Portfolio Hedging with Options

Why Fine-tune Models for Options Strategy?

Options are inherently complex instruments. The interplay of strike prices, expiration dates, implied volatility, and the Greeks (delta, gamma, vega, theta) creates a multi-dimensional optimization problem that general-purpose LLMs struggle with out of the box.

While models like GPT-4 can explain options concepts well, they often fail at practical strategy generation: they recommend strikes that don't exist, miscalculate payoff profiles, or ignore critical factors like bid-ask spreads and liquidity. This gap led me to explore fine-tuning models specifically for derivatives strategy generation.

The Fine-tuning Approach

Dataset Construction

I built a training dataset from historical options data including:

  • Options chains: Real market data with strikes, expirations, IV, and Greeks
  • Portfolio positions: Underlying holdings requiring hedging
  • Strategy examples: Expert-annotated hedging strategies with reasoning
  • Outcome data: P&L results for backtesting and reward modeling
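
As a rough illustration of how one training record could combine these four ingredients, here is a minimal sketch. The class and field names (OptionQuote, HedgingExample, to_jsonl) are hypothetical and chosen for this example; they are not the actual schema used in the project.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class OptionQuote:
    """One row of an options chain snapshot (hypothetical schema)."""
    symbol: str      # underlying ticker
    expiry: str      # expiration date as an ISO string
    strike: float
    is_call: bool
    bid: float
    ask: float
    iv: float        # implied volatility as a decimal
    delta: float


@dataclass
class HedgingExample:
    """One training record: inputs, expert answer, and outcome."""
    positions: Dict[str, float]   # ticker -> shares held
    chain: List[OptionQuote]      # options chain available at decision time
    strategy: str                 # expert-annotated hedge
    reasoning: str                # annotator's rationale
    realized_pnl: float           # outcome, for backtesting and reward modeling


def to_jsonl(example: HedgingExample) -> str:
    """Serialize one example as a single JSON line."""
    return json.dumps(asdict(example))
```

Keeping the outcome next to the annotated strategy means the same record can feed both the supervised phase and a later reward model.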

Supervised Fine-tuning (SFT)

The first phase used SFT to teach the model the fundamentals:

  • Accurate Greeks calculation from options parameters
  • Volatility surface interpolation and term structure
  • Strategy construction (spreads, straddles, collars, etc.)
  • Position sizing based on portfolio delta/gamma targets
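
For reference, the Greeks targeted in this phase can be computed in closed form under Black-Scholes. The sketch below assumes a European option on a non-dividend-paying underlying with constant volatility; the function and class names are illustrative, and a production pipeline would likely use a richer model.

```python
import math
from dataclasses import dataclass


def _norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _norm_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


@dataclass
class Greeks:
    delta: float
    gamma: float
    vega: float    # per 1.00 (100 vol points) change in volatility
    theta: float   # per year; divide by 365 for a per-day figure


def bs_greeks(spot: float, strike: float, t_years: float, vol: float,
              rate: float = 0.0, is_call: bool = True) -> Greeks:
    """Black-Scholes Greeks for a European option, no dividends (assumption)."""
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    pdf_d1 = _norm_pdf(d1)

    gamma = pdf_d1 / (spot * vol * sqrt_t)       # same for calls and puts
    vega = spot * pdf_d1 * sqrt_t                # same for calls and puts
    decay = -spot * pdf_d1 * vol / (2.0 * sqrt_t)
    discounted_strike = strike * math.exp(-rate * t_years)

    if is_call:
        delta = _norm_cdf(d1)
        theta = decay - rate * discounted_strike * _norm_cdf(d2)
    else:
        delta = _norm_cdf(d1) - 1.0
        theta = decay + rate * discounted_strike * _norm_cdf(-d2)
    return Greeks(delta, gamma, vega, theta)
```

A closed-form reference like this is also a convenient way to check model-generated Greeks during evaluation.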

Reinforcement Learning (RL) for Optimization

SFT alone produces competent but not optimal strategies. The RL phase optimized for:

  • Hedging effectiveness: Minimizing portfolio variance under stress scenarios
  • Cost efficiency: Reducing premium spent for equivalent protection
  • Execution feasibility: Preferring liquid strikes with tight spreads
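
One way to fold these three objectives into a single scalar reward is sketched below. The structure, the weights (cost_weight, spread_weight), and the HedgeOutcome fields are assumptions made for illustration, not the reward actually used in training.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class HedgeOutcome:
    unhedged_pnl: List[float]   # P&L per stress scenario without the hedge
    hedged_pnl: List[float]     # P&L per stress scenario with the hedge
    premium_paid: float         # net premium spent on the hedge
    portfolio_value: float
    avg_spread_pct: float       # mean bid-ask spread of the legs, as a fraction of mid


def hedge_reward(outcome: HedgeOutcome, cost_weight: float = 1.0,
                 spread_weight: float = 0.5) -> float:
    """Variance reduction minus cost and illiquidity penalties (hypothetical weights)."""
    var_unhedged = statistics.pvariance(outcome.unhedged_pnl)
    var_hedged = statistics.pvariance(outcome.hedged_pnl)

    # Hedging effectiveness: fraction of scenario P&L variance removed.
    if var_unhedged > 0:
        variance_reduction = 1.0 - var_hedged / var_unhedged
    else:
        variance_reduction = 0.0

    # Cost efficiency: premium as a fraction of portfolio value.
    cost_penalty = cost_weight * outcome.premium_paid / outcome.portfolio_value

    # Execution feasibility: wider spreads are penalized.
    spread_penalty = spread_weight * outcome.avg_spread_pct

    return variance_reduction - cost_penalty - spread_penalty
```

With this shape, two hedges that remove the same variance are ranked by premium and spread, which matches the "equivalent protection for less" goal.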

Integration with Risk Infrastructure

The fine-tuned model doesn't operate in isolation. It's integrated with our risk management stack:

VaR and Stress Testing

Before recommending any hedge, the model queries our VaR calculator to understand current portfolio risk. It then simulates how proposed hedges would perform under historical stress scenarios (2008 crisis, COVID crash, rate shock events).
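
The stress-testing step can be pictured with a small sketch like the one below. It assumes a single stock position hedged with long puts held to expiration, so the puts are valued at intrinsic value. The shock sizes are placeholders for illustration and are not calibrated to the actual historical episodes; all names are hypothetical.

```python
from typing import Dict, Tuple

CONTRACT_MULTIPLIER = 100  # shares per standard US equity option contract

# Placeholder shocks to the underlying, for illustration only.
SCENARIO_SHOCKS: Dict[str, float] = {
    "2008 crisis": -0.40,
    "COVID crash": -0.30,
    "rate shock": -0.15,
}


def stress_pnl(shares: float, spot: float, put_strike: float, contracts: int,
               premium_per_share: float, shock: float) -> Tuple[float, float]:
    """Return (unhedged, hedged) P&L for one shock, with puts at intrinsic value."""
    shocked_spot = spot * (1.0 + shock)
    stock_pnl = shares * (shocked_spot - spot)
    put_value = max(put_strike - shocked_spot, 0.0) * contracts * CONTRACT_MULTIPLIER
    premium = premium_per_share * contracts * CONTRACT_MULTIPLIER
    return stock_pnl, stock_pnl + put_value - premium


def run_stress_tests(shares: float, spot: float, put_strike: float, contracts: int,
                     premium_per_share: float) -> Dict[str, Tuple[float, float]]:
    """Evaluate the proposed hedge under every named scenario."""
    return {
        name: stress_pnl(shares, spot, put_strike, contracts, premium_per_share, shock)
        for name, shock in SCENARIO_SHOCKS.items()
    }
```

Comparing the unhedged and hedged figures per scenario gives a quick read on how much protection a proposal buys for its premium.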

Factor Exposure Monitoring

The system tracks exposures to key factors: market beta, sector concentrations, and volatility regime. Each hedging recommendation is framed in terms of the factors driving current risk.
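
Two of these exposures are simple to compute from returns and position data. The sketch below estimates market beta as the ratio of sample covariance to market variance and aggregates position values into sector weights; the function names and input layout are assumptions for illustration.

```python
import statistics
from typing import Dict, Sequence, Tuple


def beta(asset_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Market beta: sample covariance with the market over market variance."""
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two return series of equal length (>= 2)")
    mean_a = statistics.fmean(asset_returns)
    mean_m = statistics.fmean(market_returns)
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns)) / (n - 1)
    return cov / statistics.variance(market_returns)


def sector_weights(positions: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Map ticker -> (sector, market value) into sector -> portfolio weight."""
    total = sum(value for _, value in positions.values())
    weights: Dict[str, float] = {}
    for sector, value in positions.values():
        weights[sector] = weights.get(sector, 0.0) + value / total
    return weights
```

A high weight in one sector or a beta well above 1 is the kind of signal that would point a recommendation toward an index or sector hedge rather than a single-name one.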

Thinking Machines Tinker API

For complex multi-leg strategies, I integrated with the Tinker API for advanced reasoning. This helps with scenarios like calendar spreads across multiple expirations or ratio spreads requiring precise delta neutrality.

Practical Example

Consider a portfolio long $500K in tech stocks with concentrated NVDA exposure. The fine-tuned model might recommend:

  • Buy NVDA 30-delta puts, 45 days to expiration (DTE), sized to reduce portfolio delta by 0.3
  • Partially finance with 20-delta covered calls, reducing net premium by 40%
  • Add QQQ put spread as sector hedge for tail risk beyond single-stock exposure

Each recommendation comes with its Greeks impact, a cost breakdown, and a scenario analysis showing P&L under ±10% and ±20% moves in the underlying.
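
To make the sizing and scenario analysis concrete, here is a minimal sketch for the put leg alone. It assumes the delta reduction is expressed as a fraction of the share position's delta, reprices the puts with Black-Scholes at unchanged implied volatility and time to expiry (an instantaneous move), and ignores dividends. All function names and defaults are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CONTRACT_MULTIPLIER = 100  # shares per standard US equity option contract


def _norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put_price(spot: float, strike: float, t_years: float, vol: float,
                 rate: float = 0.0) -> float:
    """Black-Scholes price of a European put, no dividends (assumption)."""
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return strike * math.exp(-rate * t_years) * _norm_cdf(-d2) - spot * _norm_cdf(-d1)


def contracts_for_delta_reduction(shares: float, reduction: float,
                                  put_delta: float) -> int:
    """Puts needed to offset `reduction` (a fraction) of the share delta."""
    per_contract = abs(put_delta) * CONTRACT_MULTIPLIER
    return max(round(shares * reduction / per_contract), 0)


def scenario_table(shares: float, spot: float, strike: float, t_years: float,
                   vol: float, contracts: int,
                   moves: Sequence[float] = (-0.20, -0.10, 0.10, 0.20),
                   ) -> List[Tuple[float, float, float]]:
    """Rows of (move, unhedged P&L, hedged P&L) for an instantaneous spot move."""
    base_put = bs_put_price(spot, strike, t_years, vol)
    rows = []
    for move in moves:
        new_spot = spot * (1.0 + move)
        stock_pnl = shares * (new_spot - spot)
        put_pnl = contracts * CONTRACT_MULTIPLIER * (
            bs_put_price(new_spot, strike, t_years, vol) - base_put)
        rows.append((move, stock_pnl, stock_pnl + put_pnl))
    return rows
```

Holding implied volatility fixed understates the benefit of long puts in a sell-off, where IV typically rises, so a fuller analysis would shock volatility alongside spot.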

Challenges and Learnings

Data Quality Matters Most

The biggest improvements came from better training data, not model architecture changes. Clean, well-annotated examples of expert reasoning were more valuable than volume.

Greeks Precision is Critical

Small errors in Greeks calculations compound into poor hedging recommendations. The model needed extensive calibration on IV surface dynamics and term structure.

Market Regime Awareness

Strategies that are optimal in low-vol regimes fail in crisis periods. The model learned to adjust its recommendations based on VIX levels and the gap between realized and implied volatility.
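
The kind of regime signal involved can be sketched as a simple rule-based classifier. The VIX thresholds (15 and 30) and the labels below are hypothetical placeholders, not values learned by the model or taken from the training setup.

```python
from dataclasses import dataclass


@dataclass
class RegimeSignal:
    regime: str          # "low_vol", "normal", or "crisis"
    vol_gap: float       # implied minus realized volatility
    options_rich: bool   # True when implied volatility exceeds realized


def classify_regime(vix: float, implied_vol: float, realized_vol: float,
                    low_vix: float = 15.0, high_vix: float = 30.0) -> RegimeSignal:
    """Label the volatility regime from VIX and the implied-realized gap."""
    if vix >= high_vix:
        regime = "crisis"
    elif vix < low_vix:
        regime = "low_vol"
    else:
        regime = "normal"
    gap = implied_vol - realized_vol
    return RegimeSignal(regime=regime, vol_gap=gap, options_rich=gap > 0)
```

A positive gap means options are priced above recent realized movement, which makes premium-financing legs such as short calls relatively more attractive than outright long puts.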

What's Next

Current work focuses on extending the model to handle exotic options and structured products, as well as improving real-time adaptation to intraday volatility shifts.