Surface wave dispersion curve inversion plays a critical role in both shallow resource exploration and deep geological studies, yet it remains hindered by low computational efficiency, sensitivity to initial models, and susceptibility to local minima. Recently, data-driven deep learning methods, inspired by their success in computer vision and natural language processing, have shown promising potential to overcome these challenges. However, the lack of large-scale and diverse benchmark datasets remains a major obstacle to the development and evaluation of such methods. To address this gap, we introduce OpenSWI, a comprehensive benchmark dataset generated through our dataset construction pipeline, SWIDP. OpenSWI comprises two synthetic datasets tailored to different research scales and application scenarios, namely OpenSWI-shallow and OpenSWI-deep, as well as an AI-ready real-world dataset for generalization evaluation, OpenSWI-real. OpenSWI-shallow is constructed based on the 2-D geological model dataset OpenFWI and includes over 22 million 1-D velocity profiles paired with their fundamental-mode phase and group velocity dispersion curves, covering a wide range of shallow subsurface structures (e.g., flat layers, faults, folds, and realistic stratigraphy). OpenSWI-deep is generated from 14 global and regional 3-D geological models and consists of approximately 1.26 million high-fidelity 1-D dispersion data pairs for deep Earth studies. OpenSWI-real, compiled from open-source projects, contains two sets of observed dispersion curves and their corresponding 1-D reference models, providing a foundation for evaluating the generalization ability of deep learning models. To demonstrate the utility of OpenSWI, we trained deep learning models on OpenSWI-shallow and OpenSWI-deep, and evaluated them on OpenSWI-real. The results show strong agreement between the predicted and reference velocity models, confirming the diversity and representativeness of the OpenSWI dataset. 
To facilitate the advancement of intelligent surface wave dispersion curve inversion techniques, we release the SWIDP toolbox, the OpenSWI datasets, the trained deep learning models, and accompanying examples as open resources for the research community.
@article{liu2025openswi,
title = {OpenSWI: A Massive-Scale Benchmark Dataset for Surface Wave Dispersion Curve Inversion},
author = {Liu, Feng and Zhao, Sijie and Gu, Xinyu and Ling, Fenghua and Li, Yaxing and Su, Rui and Zhuang, Peiqin and Fang, Lihua and Huang, Jianping and Bai, Lei},
year = {2025},
journal = {arXiv preprint arXiv:XXXX.XXXXX},
url = {https://openswi.org},
note = {Dataset and code available at OpenSWI.org}
}