CIG-Bench: Survey & Benchmark for AI-Driven Subsurface Imaging Understanding

Abstract

Subsurface imaging understanding bridges observed geophysical data and quantitative geological models. It underpins engineering applications such as hydrocarbon exploration and development, site assessment and long-term monitoring for geological CO₂ storage, geothermal and mineral resource exploration, and geohazard early warning. It also provides quantitative evidence for fundamental geoscience research on basin evolution, sedimentary processes, and deep tectonic evolution. To take systematic stock of a decade of deep-learning progress, we analyze the literature published between 2015 and 2025 and organize it into four major tasks: structural interpretation, geobody identification, seismic facies analysis, and property estimation. Building on this review, we distill three interrelated challenges that define the frontier of the field: reliable interpretation under complex geological conditions, cross-survey semantic generalization under low information density, and the absence of a unified benchmark. To address these challenges, we propose a community-oriented benchmark that spans four tasks (fault segmentation, relative geologic time estimation, geobody segmentation, and property modeling). It integrates unified evaluation protocols, pretrained models, and datasets that combine synthetic data for quantitative evaluation with real-survey data for qualitative assessment.

Graphical abstract. CIG-Bench provides a unified pipeline, from multi-task datasets and unified training to pretrained models and standardized evaluation, that spans fault, horizon/RGT, multi-geobody, and property interpretation tasks.

An AI Framework for Subsurface Imaging

A structured overview of AI for geophysics-based subsurface imaging understanding, and why subsurface data make machine learning harder than in natural or medical imaging.

AI framework for subsurface imaging understanding

An AI framework for geophysics-based subsurface imaging understanding: core challenges and AI-driven solutions.

Data characteristics across natural, medical, and subsurface imaging

Figure 2. Why subsurface imaging is harder for machine learning than natural and medical imaging.

Key Contributions

1. Decade-spanning survey of 652 publications across four subsurface interpretation directions, with a citation-network analysis that reveals cross-domain knowledge flow and the methodological evolution from seismic attributes to domain foundation models.

2. Systematic task-wise review of deep-learning methods for structure, geobody, facies, and property tasks, tracing each task's evolutionary trajectory and identifying the remaining bottlenecks for cross-survey generalization.

3. CIG-Bench open-source library: pretrained baselines with cross-survey transferability for fault, RGT, channel, karst, and property tasks, plus inference APIs installed with a single pip command that load weights automatically from ModelScope, and one-line dataset downloads.

4. Long-term, community-maintained ecosystem: standardized splits and metrics, continuous model updates, and a roadmap for integrating community-contributed baselines and new tasks as the field evolves.

Interpretation Tasks

Subsurface interpretation is decomposed into four categories that overlap conceptually but differ methodologically.

🏗️

Structure

Faults, horizons, unconformities, and relative geologic time (RGT), used to construct structural frameworks and stratigraphic models.

🪨

Geobody

3D segmentation of targets with relatively self-contained geometries, such as salt bodies, channels, karst cavities, and igneous intrusions.

📊

Facies

Classification of seismic units by amplitude, frequency, continuity, and stratification patterns, which reflect their depositional environments.

🔬

Property

Inversion of impedance, porosity, density, and velocity (including P-wave velocity, Vp) from seismic data calibrated with sparse well logs.
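Property inversion is usually posed against a forward model that links impedance to recorded seismic. As a minimal sketch (the textbook 1D normal-incidence convolutional model, not CIG-Bench's method), the reflectivity series r[i] = (Z[i+1] - Z[i]) / (Z[i+1] + Z[i]) is convolved with a wavelet. The Ricker wavelet, 25 Hz peak frequency, and 2 ms sampling are illustrative assumptions. Because the wavelet is band-limited, seismic alone cannot pin down absolute impedance, which is one reason sparse well logs are used for calibration.

```python
import numpy as np


def reflectivity(impedance):
    """Normal-incidence reflection coefficients between adjacent samples:
    r[i] = (Z[i+1] - Z[i]) / (Z[i+1] + Z[i])."""
    z = np.asarray(impedance, dtype=float)
    return (z[1:] - z[:-1]) / (z[1:] + z[:-1])


def ricker(freq_hz, dt, length_s=0.128):
    """Zero-phase Ricker wavelet sampled every dt seconds; value 1 at t = 0."""
    half = int(round(length_s / 2 / dt))
    t = np.arange(-half, half + 1) * dt
    a = (np.pi * freq_hz * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def synthetic_trace(impedance, freq_hz=25.0, dt=0.002):
    """1D convolutional model: band-limited seismic = wavelet convolved with reflectivity."""
    r = reflectivity(impedance)
    w = ricker(freq_hz, dt)
    if len(w) > len(r):
        # mode="same" would otherwise return len(w) samples instead of len(r)
        raise ValueError("trace must be at least as long as the wavelet")
    return np.convolve(r, w, mode="same")
```

For a single step from Z = 2 to Z = 3, the trace contains one reflection of 0.2 at the interface, smeared by the wavelet.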

Evolution of Research Paradigms

The decade-long evolution of each task category is summarized below.

Structure: Seismic attributes (2015–16) → early ML adoption with 2D segmentation and patch-based classification (2017–18) → synthetic-data era and 3D deep segmentation (FaultSeg3D, 2019–20) → semi-/weak supervision (2021–23) → domain foundation models (GEM/SAM, 2024–25).

Geobody: Multiscale attribute enhancement → TGS Salt Challenge and supervised 2D segmentation → 3D end-to-end volumetric segmentation with synthetic and semi-supervised training → SAM-style prompting and unified foundation frameworks (GEM).

Facies: Attribute-based clustering and self-organizing maps (SOM) → deep feature clustering and contrastive learning → self-supervised pretraining and cross-survey generalization with horizon-based and sectional viewing paradigms.

Property: Physics-driven inversion (Bayesian / geostatistical) → end-to-end deep inversion → foundation models (GEM) → cross-survey generalization with unified seismic + well-log conditioning.
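Attribute-based clustering, where the facies trajectory starts, can be illustrated with a toy sketch. It computes two per-sample attributes on a 2D (time, trace) section: windowed RMS amplitude, and a zero-crossing rate as a crude stand-in for frequency. It then z-scores the attributes and groups samples with plain k-means. The attribute choice, window length, and use of k-means instead of a SOM are illustrative assumptions, not the methods surveyed.

```python
import numpy as np


def moving_average(a, win):
    """Moving average along axis 0 (time) of a 2D (time, trace) array."""
    kernel = np.ones(win) / win
    return np.apply_along_axis(lambda tr: np.convolve(tr, kernel, mode="same"), 0, a)


def facies_attributes(section, win=16):
    """Per-sample attributes: windowed RMS amplitude and zero-crossing rate."""
    rms = np.sqrt(moving_average(section ** 2, win))
    crossings = np.zeros(section.shape)
    # 1.0 wherever the sign flips between consecutive time samples
    crossings[1:] = np.signbit(section[1:]) != np.signbit(section[:-1])
    zcr = moving_average(crossings, win)
    return np.stack([rms, zcr], axis=-1)  # (time, trace, 2)


def kmeans(x, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def cluster_facies(section, k=2, win=16):
    """Label every sample of a (time, trace) section with one of k clusters."""
    feats = facies_attributes(section, win).reshape(-1, 2)
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)  # z-score
    return kmeans(feats, k).reshape(section.shape)
```

On a section whose upper half is strong, low-frequency reflections and whose lower half is weak, high-frequency reflections, the two halves fall into different clusters away from the boundary.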

Quick Start

One-line install. One-line dataset download. Five predictors. Weights are downloaded automatically on first call. Each task is paired with an example result on real field data.

pip install cig_bench

📦 Dataset download (being updated)

from cig_bench.dataset import cig_structureData, cig_geobodyData

# One line each — downloads the subset into a directory you choose
# and returns its local path. Two subsets: Structure and Geobody.
structure_dir = cig_structureData("./CIG-Bench-Dataset")  # -> ./CIG-Bench-Dataset/Structure
geobody_dir   = cig_geobodyData("./CIG-Bench-Dataset")    # -> ./CIG-Bench-Dataset/Geobody

🏗️ Fault segmentation

from cig_bench.predictor.fault import FaultPredictor

# seis: 3D seismic amplitude volume with axes (tline,iline,xline)
fault_predictor = FaultPredictor(device="cuda")
prob, used = fault_predictor.predict(
    seis,                        # (tline,iline,xline)
    rank=4, chunk_size=64,       # memory-bounded chunked inference
    threshold=0.5,
    scale_t=0.5, scale_h=0.85, scale_w=0.85,
    resize_back=True,            # return result at the original (T, H, W)
)
fault_predictor.visualize(used, prob)

Example result · Fault. Fault segmentation on four field surveys (a–d): the input seismic (columns *-1) and the predicted faults rendered in red over the seismic (columns *-2), shown on crossline-style cubes (a, b) and inline sections (c, d). The model resolves subtle polygonal faults, extremely dense intersecting networks, and deep-rooted dipping planes with associated secondary faults.
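`FaultPredictor.predict` exposes `chunk_size` for memory-bounded chunked inference. The sketch below shows the generic idea under stated assumptions: it splits the volume into overlapping slabs along the first axis, runs a model on each slab, and averages predictions where slabs overlap. `chunked_apply` and its `overlap` parameter are hypothetical; how CIG-Bench actually partitions the volume, and what `rank` controls, is not documented here.

```python
import numpy as np


def chunked_apply(volume, fn, chunk_size=64, overlap=16):
    """Apply fn to overlapping slabs of at most chunk_size samples along axis 0
    and average the outputs where slabs overlap. fn must preserve slab shape."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    n = volume.shape[0]
    out = np.zeros(volume.shape, dtype=np.float64)
    weight = np.zeros(n, dtype=np.float64)  # how many slabs cover each index
    step = chunk_size - overlap
    start = 0
    while True:
        stop = min(start + chunk_size, n)
        out[start:stop] += fn(volume[start:stop])
        weight[start:stop] += 1.0
        if stop == n:
            break
        start += step
    # broadcast the per-index weights over the remaining axes
    return out / weight.reshape((-1,) + (1,) * (volume.ndim - 1))
```

Uniform averaging is the simplest blend; tapered weights in the overlap zone are a common refinement for suppressing seams between slabs.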

⏱️ Relative Geologic Time (RGT) estimation

from cig_bench.predictor.rgt import RGTPredictor

rgt_predictor = RGTPredictor(device="cuda")
rgt_vol, used = rgt_predictor.predict(seis)  # (tline,iline,xline)
horizons      = rgt_predictor.extract_horizons(rgt_vol, n_horizons=100)
rgt_predictor.visualize(used, rgt_vol, horizons)
# visualize() also auto-traces horizons when none are passed:
# rgt_predictor.visualize(used, rgt_vol)

Example result · RGT. Relative-geologic-time estimation on four field surveys (a–d): input seismic (*-1), the regressed RGT volume (*-2), and horizons extracted from the RGT volume and overlaid on the seismic (*-3). Predictions stay smooth and stratigraphically consistent across slope bodies, unconformities, densely faulted zones, and multi-package stratigraphy.
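Assuming RGT increases monotonically along each trace, every horizon is an iso-surface of the RGT volume. The sketch below illustrates that idea for a (T, H, W) NumPy array: it inverts each trace by linear interpolation to find the sample where a chosen RGT value is reached. `horizon_from_rgt` and `horizons_from_rgt` are hypothetical helpers, not the implementation behind `RGTPredictor.extract_horizons`.

```python
import numpy as np


def horizon_from_rgt(rgt, value):
    """Trace one horizon as the iso-surface rgt == value.

    rgt: (T, H, W) volume, assumed to increase monotonically along axis 0.
    Returns an (H, W) map of fractional sample indices, NaN where the value
    lies outside a trace's RGT range."""
    n_t, n_h, n_w = rgt.shape
    samples = np.arange(n_t, dtype=float)
    horizon = np.full((n_h, n_w), np.nan)
    for i in range(n_h):
        for j in range(n_w):
            trace = rgt[:, i, j]
            if trace[0] <= value <= trace[-1]:
                # invert t -> RGT(t) by linear interpolation
                horizon[i, j] = np.interp(value, trace, samples)
    return horizon


def horizons_from_rgt(rgt, n_horizons):
    """Extract horizons at evenly spaced RGT values strictly inside [min, max]."""
    values = np.linspace(rgt.min(), rgt.max(), n_horizons + 2)[1:-1]
    return [horizon_from_rgt(rgt, v) for v in values]
```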

🪨 Geobody segmentation (channel / karst)

from cig_bench.predictor.channel import ChannelPredictor

channel_predictor = ChannelPredictor(device="cuda")
scores, used = channel_predictor.predict(
    seis,                                 # (tline,iline,xline)
    scales=[0.5, 0.75, 1.0, 1.25, 1.5],   # custom scale set
    accumulate="sum",
)
mask = channel_predictor.postprocess(scores, threshold=0.75, min_size=50000)
channel_predictor.visualize(used, scores, mask)

# The karst predictor is used identically — only the checkpoint changes:
from cig_bench.predictor.karst import KarstPredictor

karst_predictor = KarstPredictor(device="cuda")
scores, used = karst_predictor.predict(seis)  # (tline,iline,xline)
mask = karst_predictor.postprocess(scores, threshold=0.75)

Example result · Channel / Karst. Geobody segmentation on three field examples (rows a–c): input seismic (*-1), the predicted probability overlaid on the seismic (*-2), and the extracted 3D geobody surface (*-3) after thresholding and connected-component cleanup. Rows (a, b) show channel systems and row (c) shows karst-cave systems.
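The channel example combines multi-scale accumulation (`scales`, `accumulate="sum"`) with thresholding and connected-component cleanup (`threshold`, `min_size`). The sketch below illustrates both steps under simple assumptions: nearest-neighbour resampling, an arbitrary `predict_fn`, and 6-connected components from `scipy.ndimage.label`. The helper names are hypothetical, and how CIG-Bench resamples and normalizes its scores is not documented here.

```python
import numpy as np
from scipy import ndimage


def resize_nearest(vol, shape):
    """Nearest-neighbour resampling of a 3D array to `shape`."""
    idx = [np.minimum((np.arange(n_out) * n_in / n_out).astype(int), n_in - 1)
           for n_out, n_in in zip(shape, vol.shape)]
    return vol[np.ix_(*idx)]


def multiscale_sum(vol, predict_fn, scales):
    """Run predict_fn on resampled copies of vol and sum the results
    after mapping each back to the original grid."""
    acc = np.zeros(vol.shape, dtype=float)
    for s in scales:
        shape = tuple(max(1, int(round(n * s))) for n in vol.shape)
        acc += resize_nearest(predict_fn(resize_nearest(vol, shape)), vol.shape)
    return acc


def threshold_and_clean(scores, threshold, min_size):
    """Binarize scores, then drop 6-connected components smaller than min_size voxels."""
    mask = scores >= threshold
    labels, _ = ndimage.label(mask)  # default 3D structure is face connectivity
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # label 0 is background
    return keep[labels]
```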

🔬 Property modeling (GEM-style conditional)

import numpy as np
from cig_bench.predictor.property import PropertyPredictor

prop_predictor = PropertyPredictor(device="cuda")
vp_vol, used, wells = prop_predictor.predict(
    seis, vp_log,                              # seis: (tline,iline,xline); vp_log: sparse Vp well logs
    infer_shape=(640, 512, 512),
)
prop_predictor.visualize(used, vp_vol, wells)

Example result · Property. Property modeling under the promptable conditional paradigm, with seismic and sparse well-log inputs. (a) Input seismic; (b–f) dense 3D property volumes predicted from the seismic conditioned on sparse well logs (the thin vertical strips are the conditioning wells), e.g., acoustic impedance, gamma ray, lithology, sonic, and Vp. The predictions remain coherent with the seismic reflection patterns despite the spatially sparse well constraints.
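Sparse well constraints can be passed to a conditional model in several ways. One common encoding rasterizes each log into a value volume plus a binary mask that marks where measurements exist. The sketch below shows that encoding as an assumption; it is not the format `PropertyPredictor` expects for `vp_log`. The `wells` dictionary layout and `rasterize_wells` are hypothetical.

```python
import numpy as np


def rasterize_wells(shape, wells):
    """Place sparse well logs into a dense conditioning volume.

    shape: (T, H, W) of the seismic volume.
    wells: dict mapping (iline_idx, xline_idx) -> 1D log of length T (NaN = missing).
    Returns (values, mask): values holds log samples at well columns and 0
    elsewhere; mask is 1.0 where a valid log sample exists."""
    n_t, n_h, n_w = shape
    values = np.zeros(shape, dtype=np.float32)
    mask = np.zeros(shape, dtype=np.float32)
    for (i, j), log in wells.items():
        log = np.asarray(log, dtype=np.float32)
        if log.shape != (n_t,):
            raise ValueError(f"log at {(i, j)} has shape {log.shape}, expected {(n_t,)}")
        valid = np.isfinite(log)  # skip NaN gaps in the log
        values[valid, i, j] = log[valid]
        mask[valid, i, j] = 1.0
    return values, mask
```

Keeping a separate mask lets a model distinguish "no measurement" from a measured value of zero.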

Weights are downloaded automatically from ModelScope on first use.
Datasets download with one call each, using cig_structureData and cig_geobodyData from cig_bench.dataset.

Citation

If you find this survey or benchmark useful, please cite our work:

@article{dou2026cigbench,
  title         = {CIG-Bench: A Comprehensive Survey and Benchmark for AI-Driven Subsurface Imaging Understanding},
  author        = {Yimin Dou and Xinming Wu and Hui Gao and Mingliang Liu and
                   Tao Zhao and Zhi Zhong and Haibin Di and Min Jun Park and
                   Robert G. Clapp and Zhixiang Guo and Long Han and Sergey Fomel},
  journal       = {arXiv preprint arXiv:2606.09094},
  year          = {2026},
  eprint        = {2606.09094},
  archivePrefix = {arXiv},
  doi           = {10.48550/arXiv.2606.09094},
  url           = {https://arxiv.org/abs/2606.09094}
}