Research

Academic research on LLM-based geopolitical conflict forecasting

Project Overview

War Forecast Arena is a research platform that evaluates frontier large language models (LLMs) on their ability to anticipate geopolitical conflict events. Each day we generate structured probabilistic forecasts from eight LLMs across three providers (OpenAI, Anthropic, Google) and compare them against ground-truth observations collected from multiple verified news and intelligence sources.

Methodology

Each LLM receives a curated context window containing recent news, diplomatic signals, macro-economic indicators, and prediction market data. Models output structured forecasts with event type, location, probability, and time horizon. Predictions are then matched against observed events using a composite scoring system that evaluates temporal proximity, geographic accuracy, and semantic similarity.
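To make the matching step concrete, here is a minimal sketch of how a composite score over temporal proximity, geographic accuracy, and semantic similarity could be computed. It is an illustration under stated assumptions, not the platform's actual scorer: the `Forecast` and `Observation` schemas, the exponential decay scales, the weights, and the word-overlap stand-in for semantic similarity are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import asin, cos, exp, radians, sin, sqrt


@dataclass
class Forecast:
    """Hypothetical schema: event type, location, probability, time horizon."""
    event_type: str
    lat: float
    lon: float
    probability: float
    issued: date
    horizon_days: int
    description: str


@dataclass
class Observation:
    """Hypothetical schema for an observed ground-truth event."""
    event_type: str
    lat: float
    lon: float
    day: date
    description: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(min(1.0, sqrt(a)))


def temporal_proximity(fc, obs, day_scale=3.0):
    """1.0 inside the forecast window, decaying with days outside it (assumed form)."""
    start, end = fc.issued, fc.issued + timedelta(days=fc.horizon_days)
    if start <= obs.day <= end:
        return 1.0
    gap = (start - obs.day).days if obs.day < start else (obs.day - end).days
    return exp(-gap / day_scale)


def token_jaccard(a, b):
    """Crude stand-in for semantic similarity: overlap of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def composite_score(fc, obs, km_scale=200.0, weights=(0.4, 0.3, 0.3)):
    """Weighted mix of the three components; each lies in [0, 1]."""
    temporal = temporal_proximity(fc, obs)
    geographic = exp(-haversine_km(fc.lat, fc.lon, obs.lat, obs.lon) / km_scale)
    semantic = token_jaccard(fc.description, obs.description)
    w_t, w_g, w_s = weights
    return w_t * temporal + w_g * geographic + w_s * semantic
```

A real system would likely replace `token_jaccard` with an embedding-based similarity and tune the scales and weights empirically. The forecast's `probability` is deliberately left out of the match score here, on the assumption that calibration is evaluated separately once a match is established.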

Data Sources

Real-time news from multiple RSS feeds and news APIs covering Middle East geopolitics
Diplomatic signals extracted from official statements and UN/IAEA reports
Macro-economic indicators: oil prices, gold, defense sector indices
Prediction market probabilities from Polymarket on Iran-related outcomes
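As a rough illustration of how these four source categories might be bundled into a single context window, the sketch below groups them in one container and flattens them into plain text. The `ContextBundle` class, its field names, and the rendering format are assumptions made for this example; the platform's real prompt format is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """One forecasting context, grouped by the four source categories (hypothetical)."""
    news: list[str] = field(default_factory=list)
    diplomatic_signals: list[str] = field(default_factory=list)
    macro_indicators: dict[str, float] = field(default_factory=dict)
    market_probabilities: dict[str, float] = field(default_factory=dict)

    def render(self, max_items: int = 5) -> str:
        """Flatten the bundle into plain text, keeping at most max_items per section."""
        sections = [
            ("News", self.news[:max_items]),
            ("Diplomatic signals", self.diplomatic_signals[:max_items]),
            ("Macro-economic indicators",
             [f"{k}: {v}" for k, v in list(self.macro_indicators.items())[:max_items]]),
            ("Prediction markets",
             [f"{k}: {v:.0%}" for k, v in list(self.market_probabilities.items())[:max_items]]),
        ]
        lines = []
        for title, items in sections:
            lines.append(title)
            # Mark empty sections explicitly so the model sees the gap.
            lines.extend(f"- {item}" for item in items or ["(none)"])
        return "\n".join(lines)
```

Capping each section with `max_items` is one simple way to keep the context within a token budget; the placeholder `(none)` entries make missing inputs visible rather than silently absent.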

Models Evaluated

GPT

GPT-5.4
GPT-5.2
GPT-5.1

Claude

Claude Sonnet 4.6
Claude Opus 4.5
Claude Haiku 4.5

Gemini

Gemini 2.5 Pro
Gemini 3 Flash

Pipeline

Data ingestion runs every 6 hours via a scheduled pipeline. The system fetches news, diplomatic signals, market data, and prediction market prices. LLM forecasts are generated daily with structured prompts. Event matching and scoring run automatically to update the leaderboard.
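The two cadences described above (ingestion every 6 hours, forecasts once per day) can be sketched as a pair of small scheduling helpers. This is only an illustration: aligning ingestion to 00:00, 06:00, 12:00, and 18:00 UTC, and the function names `next_ingestion` and `forecast_due`, are assumptions rather than details of the actual pipeline.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional

INGEST_INTERVAL_HOURS = 6  # cadence stated in the text


def next_ingestion(now: datetime) -> datetime:
    """Return the next 6-hour boundary strictly after `now` (assumed UTC-aligned)."""
    slot_hour = (now.hour // INGEST_INTERVAL_HOURS) * INGEST_INTERVAL_HOURS
    slot = now.replace(hour=slot_hour, minute=0, second=0, microsecond=0)
    return slot + timedelta(hours=INGEST_INTERVAL_HOURS)


def forecast_due(now: datetime, last_forecast_day: Optional[date]) -> bool:
    """Daily forecasts: due if none has been generated yet for the current date."""
    return last_forecast_day is None or now.date() > last_forecast_day


now = datetime(2025, 1, 1, 7, 30, tzinfo=timezone.utc)
print(next_ingestion(now))                  # next 6-hour slot after 07:30 UTC
print(forecast_due(now, date(2025, 1, 1)))  # a forecast already exists for this date
```

In practice the same effect is usually achieved with a cron-style scheduler; the helpers only make the timing logic explicit.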

Links

Read the Paper
GitHub Repository

Disclaimer

This platform is for academic research purposes only. Forecasts are generated by AI models and should not be used for decision-making. The research does not represent the views of any institution or organization.