Authors: Kilic, Ugur; Karadag, Ozge Oztimur; Ozyer, Gulsah Tumuklu
Date accessioned/available: 2026-01-24
Issue date: 2026
ISSN: 1568-4946; eISSN: 1872-9681
DOI: https://doi.org/10.1016/j.asoc.2025.114268
Handle: https://hdl.handle.net/20.500.12868/5645

Abstract: Skeleton data has become an important modality in action recognition due to its robustness to environmental changes, computational efficiency, compact structure, and privacy-oriented nature. With the rise of deep learning, many methods for action recognition using skeleton data have been developed. Among these methods, spatial-temporal graph convolutional networks (ST-GCNs) have seen growing popularity due to the suitability of skeleton data for graph-based modeling. However, ST-GCN models use fixed graph topologies and fixed-size spatial-temporal convolution kernels, which limits their ability to model coordinated movements of joints in different body regions and long-term spatial-temporal dependencies. To address these limitations, we propose a fine-to-coarse self-attention graph convolutional network (FCSA-GCN). Our approach employs a fine-to-coarse scaling strategy for multi-scale feature extraction. This strategy effectively models both local and global spatial-temporal relationships and better represents the interactions among joint groups in different body regions. By integrating a temporal self-attention mechanism (TSA) into the multi-scale feature extraction process, we enhance the model's ability to capture long-term temporal dependencies. Additionally, during training, we employ the dynamic weight averaging (DWA) approach to ensure balanced optimization across the multi-scale feature extraction stages. Comprehensive experiments conducted on the NTU-60, NTU-120, and NW-UCLA datasets demonstrate that FCSA-GCN outperforms state-of-the-art methods.
These results highlight that the proposed approach effectively addresses the current challenges in skeleton-based action recognition (SBAR).

Language: en
Access: info:eu-repo/semantics/closedAccess
Keywords: Fine-to-coarse approach; Graph convolutional networks; Multi-scale; Skeleton-based action recognition; Skeletal data; Temporal self-attention
Title: Fine-to-coarse self-attention graph convolutional network for skeleton-based action recognition
Type: Article
Scopus ID: 2-s2.0-105022187562 (Q1)
WoS ID: WOS:001624435100009 (Q1)
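The dynamic weight averaging (DWA) scheme the abstract mentions for balancing the multi-scale training stages is, in its standard formulation (Liu et al., 2019), a softmax over each loss term's recent rate of descent. The sketch below is a minimal illustration of that standard formula, not the authors' implementation; the function name `dwa_weights` and the history format are assumptions.

```python
import math

def dwa_weights(loss_history, temperature=2.0):
    """Dynamic weight averaging: weight each task/stage loss by the ratio of
    its two most recent loss values, softened by a temperature and normalized
    so the weights sum to the number of tasks.

    loss_history: one list of recorded loss values per task/stage,
    e.g. [[L1(t-2), L1(t-1)], [L2(t-2), L2(t-1)], ...] (an assumed format).
    """
    num_tasks = len(loss_history)
    # Descent ratio r_i = L_i(t-1) / L_i(t-2); default to 1.0 until two
    # epochs of history exist, which yields uniform weights.
    ratios = []
    for hist in loss_history:
        if len(hist) >= 2 and hist[-2] > 0:
            ratios.append(hist[-1] / hist[-2])
        else:
            ratios.append(1.0)
    # Softmax over r_i / T, rescaled by the task count:
    # w_i = K * exp(r_i / T) / sum_j exp(r_j / T)
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [num_tasks * e / total for e in exps]
```

A stage whose loss is falling slowly (ratio near 1) receives a larger weight than one whose loss is dropping quickly, which is what keeps optimization balanced across the feature-extraction stages.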