UConn-DSIS/Multi-modal-Time-Series-Analysis

[KDD 2025] Multi-modal-Time-Series-Analysis

🎉 News: This survey has been ACCEPTED to the Lecture Style Tutorials Track of KDD 2025 as a HALF-DAY tutorial! 🎉

This is the official repository for "Multi-modal Time Series Analysis: A Tutorial and Survey". [Paper]

This repository is maintained by Yushan Jiang and Kanghui Ning from UConn DSIS.

Please consider citing our survey paper if you find it helpful :), and feel free to share this repository with others!

Motivation and Contribution:

This survey aims to provide a unique and systematic perspective on effectively leveraging cross-modal interactions from relevant real-world contexts to advance multi-modal time series analysis, addressing both foundational principles and practical solutions. Our assessment is threefold:

  • Reviewing multi-modal time series data
  • Analyzing cross-modal interactions between time series and other modalities (Fusion, Alignment, Transference)
  • Revealing the impact of multi-modal time series analysis in applications across diverse domains

Figure 1: The Framework of Our Survey
Figure 2: Categorization of cross-modal interaction methods and representative examples

Representative Open-Source Multi-Modal Time Series Datasets

| Domain | Dataset | Modalities |
| --- | --- | --- |
| Healthcare | MIMIC-III[1], MIMIC-IV[2] | TS, Text, Table |
| | ICBHI[3], Coswara[4], KAUH[5], PTB-XL[6], ZuCo[7] | TS, Text |
| | Image-EEG[8] | TS, Image |
| Finance | FNSPID[9], ACL18[10], CIKM18[11], DOW30[12] | TS, Text |
| Multi-domain | MTBench[13], Time-MMD[14], TimeCAP[15], NewsForecast[16], TTC[17], CiK[18], TSQA[19] | TS, Text |
| Retail | VISUELLE[20] | TS, Image, Text |
| IoT | LEMMA-RCA[21] | TS, Text |
| Speech | LRS3[22], VoxCeleb2[23] | TS (Audio), Image |
| Traffic | NYC-taxi, NYC-bike[24] | ST, Text |
| Environment | Terra[25] | ST, Text |

Taxonomy of Representative Multi-Modal Time Series Methods

We define three fundamental types of interaction between time series and other modalities: Fusion, Alignment, and Transference. These interactions occur at different stages within a framework: Input, Intermediate (i.e., representations or intermediate outputs), and Output.

  • Fusion refers to the process of integrating heterogeneous modalities in a way that captures complementary information across diverse sources to improve time series modeling.
  • Alignment ensures that the relationships between different modalities are preserved and semantically coherent when integrated into a unified learning framework.
  • Transference refers to the process of mapping between different modalities, which allows one modality to be inferred, translated, or synthesized from another.
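
The three interaction types above can be illustrated on toy vectors. The sketch below is only an illustration under simplifying assumptions: the embeddings are hand-written lists rather than learned representations, and the function names (`fuse_concat`, `fuse_add`, `alignment_score`, `describe_series`) are hypothetical and do not come from the survey or from any listed method.

```python
import math


def fuse_concat(ts_emb, text_emb):
    """Fusion by concatenation: keep both embeddings side by side."""
    return list(ts_emb) + list(text_emb)


def fuse_add(ts_emb, text_emb):
    """Fusion by addition: element-wise sum of equal-length embeddings."""
    if len(ts_emb) != len(text_emb):
        raise ValueError("addition requires embeddings of equal length")
    return [a + b for a, b in zip(ts_emb, text_emb)]


def alignment_score(ts_emb, text_emb):
    """Alignment signal: cosine similarity between the two embeddings."""
    dot = sum(a * b for a, b in zip(ts_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in ts_emb)) * math.sqrt(sum(b * b for b in text_emb))
    return dot / norm if norm else 0.0


def describe_series(series):
    """Transference (time series -> text): a template-based summary."""
    if series[-1] > series[0]:
        trend = "rising"
    elif series[-1] < series[0]:
        trend = "falling"
    else:
        trend = "flat"
    return f"{len(series)} points, min {min(series)}, max {max(series)}, trend {trend}"


# Toy inputs (assumed values, not taken from any dataset)
ts_emb, text_emb = [0.2, 0.8], [0.1, 0.9]
fused = fuse_concat(ts_emb, text_emb)        # 4-dimensional joint vector
score = alignment_score(ts_emb, text_emb)    # close to 1 for similar vectors
summary = describe_series([3, 5, 4, 7])      # "4 points, min 3, max 7, trend rising"
```

In practice the embeddings would come from trained encoders, and an alignment objective (for example a contrastive loss) would push the score of matching pairs above that of mismatched pairs; the cosine score here only shows the quantity such an objective acts on.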

Note:

  • F: Fusion; A: Alignment; T: Transference
| Method | Modality | Domain | Task | Stage | F | A | T | Interaction Method | Large Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Time-MMD (NeurIPS 2024) [Code] | TS, Text | General | Forecasting | Output | | | | Addition | Multiple |
| Wang et al. (NeurIPS 2024) [Code] | TS, Text | General | Forecasting | Input | | | | Prompt | LLaMa2, GPT-4 Turbo |
| | | | | Intermediate | | | | Prompt; LLM reasoning | |
| GPT4MTS (AAAI 2024) | TS, Text | General | Forecasting | Intermediate | | | | Addition; Self-attention | GPT-2 |
| TimeCMA (AAAI 2025) [Code] | TS, Text | General | Forecasting | Input | | | | Meta-description | GPT-2 |
| | | | | Intermediate | | | | Addition; Cross-attention | |
| MOAT (2024) | TS, Text | General | Forecasting | Intermediate | | | | Concat.; Self-attention | S-Bert |
| | | | | Output | | | | Offline synthesis | |
| TimeCAP (AAAI 2025) | TS, Text | General | Classification | Input | | | | LLM Generation | Bert, GPT-4 |
| | | | | Intermediate | | | | Concat.; Self-attention, Retrieval | |
| | | | | Output | | | | Addition | |
| TimeXL (NeurIPS 2025) | TS, Text | General | Classification | Intermediate | | | | Concat., Prompt; LLM Reasoning | Bert, S-Bert, GPT-4o |
| | | | Forecasting | Output | | | | Addition | |
| Hybrid-MMF (2024) [Code] | TS, Text | General | Forecasting | Intermediate | | | | Concat. | GPT-4o |
| Time-LLM (ICLR 2024) [Code] | TS, Text | General | Forecasting | Input | | | | Meta-description | LLaMA, GPT-2 |
| | | | | Intermediate | | | | Concat.; Self-attention | |
| Time-VLM (2025) | TS, Text, Image | General | Forecasting | Input | | | | Feat. Imaging, Meta-description | ViLT, CLIP, BLIP-2 |
| | | | | Intermediate | | | | Addition; Gating, Cross-attention | |
| Unitime (WWW 2024) | TS, Text | General | Forecasting | Input | | | | Meta-description | GPT-2 |
| | | | | Intermediate | | | | Concat.; Self-attention | |
| TESSA (2024) | TS, Text | General | Annotation | Intermediate | | | | Prompt; RL; LLM Generation | GPT-4o |
| InstrucTime (WSDM 2025) [Code] | TS, Text | General | Classification | Intermediate | | | | Concat.; Self-attention | GPT-2 |
| MATMCD (2024) | TS, Text, Graph | General | Causal Discovery | Intermediate | | | | Prompt; LLM Reasoning; Supervision | Multiple |
| STG-LLM (2024) | ST, Text | General | Forecasting | Intermediate | | | | Concat.; Self-attention | GPT-2 |
| TableTime (2024) [Code] | TS, Text | General | Classification | Input | | | | Prompt; Reformulate | Multiple |
| ContextFormer (2024) | TS, Table | General | Forecasting | Intermediate | | | | Addition; Cross-attention | No |
| Time-MQA (2025) [Code] | TS, Text | General | Multiple | Input | | | | Prompt | Multiple |
| MAN-SF (EMNLP 2020) | TS, Text, Graph | Finance | Classification | Intermediate | | | | Bilinear; Graph Convolution | USE |
| Bamford et al. (ICAIF 2023) | TS, Text | Finance | Retrieval | Intermediate | | | | Supervision | S-bert |
| | TS, Image | | | Output | | | | | |
| Chen et al. (2023) | TS, Text, Graph | Finance | Classification | Input | | | | LLM Generation | ChatGPT |
| | | | | Intermediate | | | | Concat.; Graph Convolution | |
| Xie et al. (2023) | TS, Text | Finance | Classification | Input | | | | Prompt | ChatGPT |
| Yu et al. (EMNLP 2023) | TS, Text | Finance | Forecasting | Input | | | | Prompt | GPT-4, Open LLaMA |
| MedTsLLM (2024) [Code] | TS, Text, Table | Healthcare | Multiple | Intermediate | | | | Concat.; Self-attention | Llama2 |
| RespLLM (2024) [Code] | TS (Audio), Text | Healthcare | Classification | Intermediate | | | | Addition, Self-attention | OpenBioLLM-8B |
| METS (2023) | TS, Text | Healthcare | Classification | Output | | | | Contrastive | ClinicalBert |
| Wang et al. (AAAI 2022) | TS, Text | Healthcare | Classification | Intermediate | | | | Supervision | Bart, Bert, RoBerta |
| EEG2TEXT (BigData 2024) | TS, Text | Healthcare | Generation | Output | | | | Self-supervision, Supervision | Bart |
| MEDHMP (EMNLP 2023) [Code] | TS, Text | Healthcare | Classification | Intermediate | | | | Concat.; Self-attention, Contrastive | ClinicalT5 |
| Deznabi et al. (ACL 2021) [Code] | TS, Text | Healthcare | Classification | Intermediate | | | | Concat. | Bio+Clinical Bert |
| Niu et al. (2023) | TS, Text | Healthcare | Classification | Intermediate | | | | Concat.; Cross-attention | BioBERT |
| Yang et al. (EMNLP 2021) [Code] | TS, Text | Healthcare | Classification | Intermediate | | | | Concat., Addition; Gating | ClinicalBERT |
| Liu et al. (2023) [Code] | TS, Text | Healthcare | Classification, Regression | Input | | | | Prompt | PaLM |
| xTP-LLM (2024) [Code] | ST, Text | Traffic | Forecasting | Input | | | | Prompt; Meta-description | Llama2-7B-chat |
| UrbanGPT (2024) [Code] | ST, Text | Traffic | Forecasting | Input | | | | Prompt; Meta-description | Vicuna-7B |
| CityGPT (2024) [Code] | ST, Text | Mobility | Multiple | Input | | | | Prompt | Multiple |
| MULAN (WWW 2024) | TS, Text, Graph | IoT | Causal Discovery | Intermediate | | | | Addition; Contrastive; Supervision | No |
| MIA (2023) | TS, Image | IoT | Anomaly Detection | Intermediate | | | | Addition; Cross-attention, Gating | No |
| Ekambaram et al. (KDD 2020) [Code] | TS, Image, Text | Retail | Forecasting | Intermediate | | | | Concat.; Self & Cross-attention | No |
| Skenderi et al. (2024) [Code] | TS, Image, Text | Retail | Forecasting | Intermediate | | | | Concat.; Cross-attention | No |
| VIMTS (BigData 2022) | ST, Image | Environment | Imputation | Intermediate | | | | Concat.; Supervision | No |
| LITE (2024) [Code] | ST, Text, Image | Environment | Forecasting | Intermediate | | | | Concat.; Self-attention | LLaMA-2-7b |
| AV-HuBERT (ICLR 2022) [Code] | TS (Audio), Image | Speech | Classification | Intermediate | | | | Concat.; Self-attention | HuBert |
| SpeechGPT (EMNLP 2023) [Code] | TS (Audio), Text | Speech | Generation | Intermediate | | | | Concat.; Self-attention | LLaMA-13B |
| LA-GCN (2023) [Code] | ST, Text | Vision | Classification | Intermediate | | | | Supervision | Bert |
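
Several entries in the table pair a stage with a technique, such as Meta-description at the Input stage, Cross-attention with Gating at the Intermediate stage, and Addition at the Output stage. The following is a minimal sketch of what each of these could look like on toy data. It is not the implementation of any listed method: all function names are hypothetical, and the scalar `gate` and fixed `weight` stand in for parameters that a real model would learn.

```python
import math


def build_prompt(series, description):
    """Input stage: prepend a meta-description to the serialized values."""
    values = ", ".join(f"{x:.1f}" for x in series)
    return f"{description}\nValues: {values}\nForecast the next value."


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Intermediate stage: each time series token (query) attends over text tokens."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def gated_add(ts_tokens, context, gate):
    """Intermediate stage: add the attended text context, scaled by a gate in [0, 1]."""
    return [[t + gate * c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(ts_tokens, context)]


def combine_outputs(ts_forecast, text_forecast, weight=0.5):
    """Output stage: weighted addition of per-modality forecasts."""
    return [(1 - weight) * a + weight * b
            for a, b in zip(ts_forecast, text_forecast)]


# Toy usage with assumed values
prompt = build_prompt([1.0, 2.5], "Daily sales of one store")
ts_tokens = [[1.0, 1.0]]
text_tokens = [[0.0, 0.0], [0.0, 0.0]]   # identical keys give uniform attention
text_values = [[2.0, 0.0], [4.0, 2.0]]
context = cross_attention(ts_tokens, text_tokens, text_values)
mixed = gated_add(ts_tokens, context, gate=0.5)
final = combine_outputs([10.0, 12.0], [14.0, 12.0])
```

The gate makes the design choice explicit: with `gate=0` the time series representation passes through unchanged, so the text modality can only help when the model decides to let it in.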

Citation

@misc{jiang2025multimodaltimeseriesanalysis,
      title={Multi-modal Time Series Analysis: A Tutorial and Survey},
      author={Yushan Jiang and Kanghui Ning and Zijie Pan and Xuyang Shen and Jingchao Ni and Wenchao Yu and Anderson Schneider and Haifeng Chen and Yuriy Nevmyvaka and Dongjin Song},
      year={2025},
      eprint={2503.13709},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.13709},
}
