Research
Academic research on LLM-based geopolitical conflict forecasting
Project Overview
War Forecast Arena is a research platform that evaluates frontier large language models on their ability to anticipate geopolitical conflict events. We generate daily structured probabilistic forecasts from 8 LLMs across 3 providers (OpenAI, Anthropic, Google), and compare predictions against ground-truth observations collected from multiple verified news and intelligence sources.
Methodology
Each LLM receives a curated context window containing recent news, diplomatic signals, macro-economic indicators, and prediction market data. Models output structured forecasts with event type, location, probability, and time horizon. Predictions are then matched against observed events using a composite scoring system that evaluates temporal proximity, geographic accuracy, and semantic similarity.
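To make the matching step concrete, here is a minimal Python sketch of a structured forecast record and a composite match score. It is an illustration under stated assumptions, not the platform's implementation: the field names, the weights (0.4 / 0.3 / 0.3), the exponential decay scales, and the token-overlap stand-in for semantic similarity are all hypothetical choices that the description above does not specify.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, exp, radians, sin, sqrt


@dataclass(frozen=True)
class Forecast:
    # Hypothetical schema covering the four fields named in the methodology.
    event_type: str      # e.g. "missile strike"
    lat: float           # forecast location, degrees
    lon: float
    probability: float   # model's stated probability in [0, 1]
    start: date          # first day of the forecast window
    horizon_days: int    # length of the time horizon in days


@dataclass(frozen=True)
class ObservedEvent:
    event_type: str
    lat: float
    lon: float
    occurred: date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(min(1.0, sqrt(a)))


def temporal_score(f, e, decay_days=3.0):
    """1.0 inside the forecast window, exponential decay outside it (assumed form)."""
    offset = (e.occurred - f.start).days
    if 0 <= offset <= f.horizon_days:
        return 1.0
    gap = -offset if offset < 0 else offset - f.horizon_days
    return exp(-gap / decay_days)


def geographic_score(f, e, scale_km=200.0):
    """Decays with distance between forecast and observed location (assumed form)."""
    return exp(-haversine_km(f.lat, f.lon, e.lat, e.lon) / scale_km)


def semantic_score(f, e):
    """Token Jaccard overlap, a crude stand-in for semantic similarity."""
    a = set(f.event_type.lower().split())
    b = set(e.event_type.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def match_score(f, e, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three components; the weights are illustrative only."""
    w_t, w_g, w_s = weights
    return (
        w_t * temporal_score(f, e)
        + w_g * geographic_score(f, e)
        + w_s * semantic_score(f, e)
    )
```

In this sketch each component lies in [0, 1], so with weights that sum to 1 the composite score does too. The probability field is carried in the record but plays no role in matching here; a real system would likely replace the token overlap with an embedding-based similarity.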
Data Sources
- Real-time news from multiple RSS feeds and news APIs covering Middle East geopolitics
- Diplomatic signals extracted from official statements and UN/IAEA reports
- Macro-economic indicators: oil prices, gold, defense sector indices
- Prediction market probabilities from Polymarket on Iran-related outcomes
Models Evaluated
- GPT-5.4
- GPT-5.2
- GPT-5.1
- Claude Sonnet 4.6
- Claude Opus 4.5
- Claude Haiku 4.5
- Gemini 2.5 Pro
- Gemini 3 Flash
Pipeline
A scheduled pipeline ingests data every 6 hours, fetching news, diplomatic signals, market data, and prediction market prices. LLM forecasts are generated once a day from structured prompts. Event matching and scoring run automatically and update the leaderboard.
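The two cadences described here (6-hourly ingestion, daily forecasts) can be sketched as a small scheduling check. This is a hypothetical illustration: the function name, the stage labels, and the rule that scoring runs whenever either of the other stages has run are assumptions, since the text only says that matching and scoring run automatically.

```python
from datetime import datetime, timedelta, timezone

INGEST_INTERVAL = timedelta(hours=6)    # stated: ingestion every 6 hours
FORECAST_INTERVAL = timedelta(days=1)   # stated: forecasts generated daily


def due_stages(now, last_ingest, last_forecast):
    """Return the pipeline stages that should run at `now`.

    `last_ingest` / `last_forecast` are the previous run times, or None
    if the stage has never run.
    """
    stages = []
    if last_ingest is None or now - last_ingest >= INGEST_INTERVAL:
        stages.append("ingest")
    if last_forecast is None or now - last_forecast >= FORECAST_INTERVAL:
        stages.append("forecast")
    # Assumption: re-score whenever new observations or new forecasts arrive.
    if stages:
        stages.append("score")
    return stages


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
    # Six hours after the last ingest, twelve hours after the last forecast.
    print(due_stages(t0 + timedelta(hours=12), t0 + timedelta(hours=6), t0))
```

Keeping the decision in a pure function of timestamps makes the cadence easy to test independently of whichever scheduler actually triggers the pipeline.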
Disclaimer
This platform is for academic research purposes only. Forecasts are generated by AI models and should not be used for decision-making. The research does not represent the views of any institution or organization.