Research

Academic research on LLM-based geopolitical conflict forecasting

Project Overview

War Forecast Arena is a research platform that evaluates frontier large language models on their ability to anticipate geopolitical conflict events. Each day, we generate structured probabilistic forecasts from 8 LLMs across 3 providers (OpenAI, Anthropic, Google) and compare them against ground-truth observations collected from multiple verified news and intelligence sources.

Methodology

Each LLM receives a curated context window containing recent news, diplomatic signals, macro-economic indicators, and prediction market data. Models output structured forecasts with event type, location, probability, and time horizon. Predictions are then matched against observed events using a composite scoring system that evaluates temporal proximity, geographic accuracy, and semantic similarity.
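
As a rough illustration of the matching step, the sketch below combines the three components named above (temporal proximity, geographic accuracy, semantic similarity) into one score. It is not the platform's actual scoring code. The field names, the latitude/longitude encoding, the free-text `description` field, the decay constants, the weights, and the token-overlap stand-in for semantic similarity are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, exp, radians, sin, sqrt


@dataclass
class Forecast:
    event_type: str
    location: tuple[float, float]  # (lat, lon) in degrees; hypothetical encoding
    probability: float             # forecast probability in [0, 1]
    horizon_start: date            # time horizon as an inclusive date window
    horizon_end: date
    description: str               # hypothetical free-text field used for similarity


@dataclass
class ObservedEvent:
    event_type: str
    location: tuple[float, float]
    occurred_on: date
    description: str


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(min(1.0, sqrt(h)))


def temporal_score(f, e, decay_days=3.0):
    """1.0 inside the horizon; exponential decay in days outside it (assumed form)."""
    if f.horizon_start <= e.occurred_on <= f.horizon_end:
        return 1.0
    gap = min(abs((e.occurred_on - f.horizon_start).days),
              abs((e.occurred_on - f.horizon_end).days))
    return exp(-gap / decay_days)


def geographic_score(f, e, scale_km=200.0):
    """Exponential decay with distance (assumed form and scale)."""
    return exp(-haversine_km(f.location, e.location) / scale_km)


def semantic_score(f, e):
    """Jaccard overlap of lowercase tokens, a simple stand-in for embedding similarity."""
    a = set(f.description.lower().split())
    b = set(e.description.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def match_score(f, e, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three components; the weights are illustrative only."""
    w_time, w_geo, w_sem = weights
    return (w_time * temporal_score(f, e)
            + w_geo * geographic_score(f, e)
            + w_sem * semantic_score(f, e))
```

Under these assumptions, an event that falls inside the forecast window, at the same coordinates, with the same description scores approximately 1.0, and the score falls toward 0 as the event drifts in time, distance, or wording.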

Data Sources

  • Real-time news from multiple RSS feeds and news APIs covering Middle East geopolitics
  • Diplomatic signals extracted from official statements and UN/IAEA reports
  • Macro-economic indicators: oil prices, gold, defense sector indices
  • Prediction market probabilities from Polymarket on Iran-related outcomes
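
One way to picture how these four source types could be combined into a single model context is sketched below. This is a minimal example, not the platform's ingestion code: the `ContextItem` type, the section names, the line format, and the equal per-section character budget are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical labels for the four source types listed above.
SECTIONS = ("news", "diplomatic", "macro", "prediction_market")


@dataclass
class ContextItem:
    kind: str          # one of SECTIONS
    timestamp: datetime
    text: str


def build_context(items, max_chars=4000):
    """Render items grouped by source type, newest first within each group.

    Each source type gets an equal share of the character budget so that a
    busy news feed cannot crowd out the other signals (an assumed policy).
    """
    per_section = max_chars // len(SECTIONS)
    parts = []
    for kind in SECTIONS:
        remaining = per_section
        group = sorted((i for i in items if i.kind == kind),
                       key=lambda i: i.timestamp, reverse=True)
        for item in group:
            line = f"[{kind}] {item.timestamp:%Y-%m-%d %H:%M} {item.text}"
            if len(line) + 1 > remaining:  # +1 accounts for the newline
                break
            parts.append(line)
            remaining -= len(line) + 1
    return "\n".join(parts)
```

Giving each source type its own budget is one possible design; a real system might instead rank items by relevance across all sources.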

Models Evaluated

GPT
  • GPT-5.4
  • GPT-5.2
  • GPT-5.1
Claude
  • Claude Sonnet 4.6
  • Claude Opus 4.5
  • Claude Haiku 4.5
Gemini
  • Gemini 2.5 Pro
  • Gemini 3 Flash

Pipeline

A scheduled ingestion job runs every 6 hours and fetches news, diplomatic signals, market data, and prediction market prices. LLM forecasts are generated once a day from structured prompts. Event matching and scoring then run automatically to update the leaderboard.
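
The two cadences described above (ingestion every 6 hours, forecasts daily) can be sketched as a small function that decides which stages are due. The stage names and the rule that matching and scoring run after any stage that produced new data are assumptions for illustration; the source does not specify when scoring is triggered.

```python
from datetime import datetime, timedelta

INGEST_INTERVAL = timedelta(hours=6)   # from the text: ingestion every 6 hours
FORECAST_INTERVAL = timedelta(days=1)  # from the text: forecasts generated daily


def due_stages(now, last_ingest, last_forecast):
    """Return the ordered list of pipeline stages that should run at `now`.

    `last_ingest` and `last_forecast` are the previous run times, or None if
    the stage has never run. Stage names are hypothetical.
    """
    stages = []
    if last_ingest is None or now - last_ingest >= INGEST_INTERVAL:
        stages.append("ingest")
    if last_forecast is None or now - last_forecast >= FORECAST_INTERVAL:
        stages.append("forecast")
    if stages:
        # Assumed policy: re-run matching whenever new observations or new
        # forecasts exist, since either can change the leaderboard.
        stages.append("match_and_score")
    return stages
```

For example, six hours after the last ingestion and twelve hours after the last forecast, this returns `["ingest", "match_and_score"]`.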

Disclaimer

This platform is for academic research purposes only. Forecasts are generated by AI models and should not be used for decision-making. The research does not represent the views of any institution or organization.
