Browse by Author "Karadag, Ozge Oztimur"
Now showing 1 - 5 of 5

Item: AGMS-GCN: Attention-guided multi-scale graph convolutional networks for skeleton-based action recognition (Elsevier, 2025)
Authors: Kilic, Ugur; Karadag, Ozge Oztimur; Ozyer, Gulsah Tumuklu
Graph Convolutional Networks have the capability to model non-Euclidean data with high effectiveness. Due to this capability, they perform well on standard benchmarks for skeleton-based action recognition (SBAR). Specifically, spatial-temporal graph convolutional networks (ST-GCNs) function effectively in learning spatial-temporal relationships on skeletal graph patterns. In ST-GCN models, a fixed skeletal graph pattern is used across all layers. ST-GCN models obtain spatial-temporal features by performing standard convolution on this fixed graph topology within a local neighborhood limited by the size of the convolution kernel. This convolution kernel dimension can only model dependencies between joints at short distances and short-range temporal dependencies. However, it fails to model long-range temporal information and long-distance joint dependencies. Effectively capturing these dependencies is key to improving the performance of ST-GCN models. In this study, we propose AGMS-GCN, an attention-guided multi-scale graph convolutional network structure that dynamically determines the weights of the dependencies between joints. In the proposed AGMS-GCN architecture, new adjacency matrices that represent action-specific joint relationships are generated by obtaining spatial-temporal dependencies with the attention mechanism on the feature maps extracted using spatial-temporal graph convolutions. This enables the extraction of features that take into account both the short- and long-range spatial-temporal relationships between action-specific joints. This data-driven graph construction method provides a more robust graph representation for capturing subtle differences between different actions. In addition, actions occur through the coordinated movement of multiple body joints. However, most existing SBAR approaches overlook this coordination, considering the skeletal graph from a single-scale perspective. Consequently, these methods miss high-level contextual features necessary for distinguishing actions. The AGMS-GCN architecture addresses this shortcoming with its multi-scale structure. Comprehensive experiments demonstrate that our proposed method attains state-of-the-art (SOTA) performance on the NTU RGB+D 60 and Northwestern-UCLA datasets. It also achieves performance competitive with the SOTA on the NTU RGB+D 120 dataset. The source code of the proposed AGMS-GCN model is available at: https://github.com/ugrkilc/AGMS-GCN.
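
The core mechanism described in this abstract, deriving a data-dependent adjacency matrix from attention over spatial-temporal feature maps and combining it with the fixed skeleton graph, can be illustrated with a minimal PyTorch sketch. All module names, tensor shapes, and the additive mixing scheme below are illustrative assumptions rather than the released implementation; the authors' actual code is at the GitHub link above.

```python
# Minimal PyTorch sketch of an attention-derived adjacency for skeleton graphs.
# Shapes follow the common ST-GCN convention: x is (N, C, T, V) for
# (batch, channels, frames, joints). Layer names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAdjacency(nn.Module):
    """Builds a data-dependent joint-joint adjacency from feature maps."""

    def __init__(self, in_channels: int, embed_channels: int = 64):
        super().__init__()
        # 1x1 convolutions project features into query/key spaces.
        self.query = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.key = nn.Conv2d(in_channels, embed_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -> pool over time so attention acts on joints.
        q = self.query(x).mean(dim=2)                      # (N, E, V)
        k = self.key(x).mean(dim=2)                        # (N, E, V)
        attn = torch.einsum('nev,new->nvw', q, k)          # (N, V, V) joint affinities
        return F.softmax(attn / q.size(1) ** 0.5, dim=-1)


class AttentionGuidedGraphConv(nn.Module):
    """Graph convolution that mixes a fixed skeletal adjacency with the
    attention-derived one, so long-distance joint pairs can interact."""

    def __init__(self, in_channels: int, out_channels: int, adjacency: torch.Tensor):
        super().__init__()
        self.register_buffer('fixed_adjacency', adjacency)    # (V, V) skeleton graph
        self.attention = AttentionAdjacency(in_channels)
        self.transform = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))              # learnable mixing weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        adaptive = self.attention(x)                                   # (N, V, V)
        adjacency = self.fixed_adjacency.unsqueeze(0) + self.alpha * adaptive
        x = torch.einsum('nctv,nvw->nctw', x, adjacency)               # graph aggregation
        return self.transform(x)


if __name__ == '__main__':
    joints = 25                                    # NTU RGB+D skeletons have 25 joints
    skeleton = torch.eye(joints)                   # placeholder fixed adjacency
    layer = AttentionGuidedGraphConv(3, 64, skeleton)
    out = layer(torch.randn(2, 3, 30, joints))     # 2 clips, 3 coords, 30 frames
    print(out.shape)                               # torch.Size([2, 64, 30, 25])
```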

Item: Comparative Analysis of Vision Transformers and Morphological Approaches for Ki-67 Index Estimation on Histopathologic Images: An Experimental Evaluation (IEEE, 2024)
Authors: Akdeniz, Ahmet Sezer; Ozgur, Berkan; Sahin, Emre; Karadag, Ozge Oztimur; Gunizi, Ozlem Ceren
This research comprehensively compares two different methodologies for predicting the Ki-67 index: morphology-based analysis and Vision Transformers (ViT). The morphological method focuses on the shape and structural features of tissues and cell structures. Vision Transformers, on the other hand, represent an innovative approach developed through the use of attention mechanisms and transformer architectures. ViT offers a different perspective by modeling global context information in recognizing image patterns. This analysis provides a deeper understanding of the accuracy, efficiency, and applicability of current techniques in histopathological image processing, highlighting the potential to advance existing methodologies used in cancer diagnosis. This comparative study aims to evaluate the performance differences between morphological analyses and transformer-based models, identifying the most effective and reliable methods for predicting the Ki-67 index. Experimental analysis revealed that, due to the limited amount of labeled data in this domain, traditional morphological approaches are currently more promising than vision transformers.

Item: Fine-to-coarse self-attention graph convolutional network for skeleton-based action recognition (Elsevier, 2026)
Authors: Kilic, Ugur; Karadag, Ozge Oztimur; Ozyer, Gulsah Tumuklu
Skeleton data has become an important modality in action recognition due to its robustness to environmental changes, computational efficiency, compact structure, and privacy-oriented nature. With the rise of deep learning, many methods for action recognition using skeleton data have been developed. Among these methods, spatial-temporal graph convolutional networks (ST-GCNs) have seen growing popularity due to the suitability of skeleton data for graph-based modeling. However, ST-GCN models use fixed graph topologies and fixed-size spatial-temporal convolution kernels. This limits their ability to model coordinated movements of joints in different body regions and long-term spatial-temporal dependencies. To address these limitations, we propose a fine-to-coarse self-attention graph convolutional network (FCSA-GCN). Our approach employs a fine-to-coarse scaling strategy for multi-scale feature extraction. This strategy effectively models both local and global spatial-temporal relationships and better represents the interactions among joint groups in different body regions. By integrating a temporal self-attention mechanism (TSA) into the multi-scale feature extraction process, we enhance the model's ability to capture long-term temporal dependencies effectively. Additionally, during training, we employ the dynamic weight averaging (DWA) approach to ensure balanced optimization across the multi-scale feature extraction stages. Comprehensive experiments conducted on the NTU-60, NTU-120, and NW-UCLA datasets demonstrate that FCSA-GCN outperforms state-of-the-art methods. These results highlight that the proposed approach effectively addresses the current challenges in skeleton-based action recognition (SBAR).
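
Two ideas in the FCSA-GCN abstract, frame-level temporal self-attention and fine-to-coarse grouping of joints into body parts, can be sketched as follows. This is one possible reading, not the published configuration: the joint-to-part grouping, pooling choices, and hyperparameters are assumptions made for illustration.

```python
# Hedged sketch of a temporal self-attention block over per-frame skeleton
# features, plus a fine-to-coarse joint-grouping step. All groupings and
# hyperparameters are illustrative, not the published FCSA-GCN settings.
import torch
import torch.nn as nn


class TemporalSelfAttention(nn.Module):
    """Lets every frame attend to every other frame, capturing long-range
    temporal dependencies that a fixed-size temporal kernel cannot reach."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -> pool joints so tokens are frames.
        frames = x.mean(dim=3).transpose(1, 2)           # (N, T, C)
        attended, _ = self.attn(frames, frames, frames)  # all-pairs frame attention
        frames = self.norm(frames + attended)            # residual + layer norm
        # Broadcast the temporally refined context back over the joints.
        return x + frames.transpose(1, 2).unsqueeze(-1)


def coarsen_joints(x: torch.Tensor, groups: list[list[int]]) -> torch.Tensor:
    """Fine-to-coarse step: average joints within each body-part group,
    producing a smaller graph whose nodes are body parts."""
    # x: (N, C, T, V) -> (N, C, T, len(groups))
    return torch.stack([x[..., g].mean(dim=-1) for g in groups], dim=-1)


if __name__ == '__main__':
    x = torch.randn(2, 64, 30, 25)                    # batch, channels, frames, joints
    x = TemporalSelfAttention(64)(x)
    # Example grouping of 25 joints into five coarse body parts
    # (torso, arms, legs); the index assignment is illustrative.
    parts = [[0, 1, 2, 3, 20], [4, 5, 6, 7, 21, 22],
             [8, 9, 10, 11, 23, 24], [12, 13, 14, 15], [16, 17, 18, 19]]
    coarse = coarsen_joints(x, parts)
    print(coarse.shape)                               # torch.Size([2, 64, 30, 5])
```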

Item: SkelResNet: Transfer Learning Approach for Skeleton-Based Action Recognition (IEEE, 2024)
Authors: Kilic, Ugur; Karadag, Ozge Oztimur; Ozyer, Gulsah Tumuklu
Skeleton-based action recognition is an increasingly popular research area in computer vision that analyzes the spatial configuration and temporal dynamics of human actions. Learning distinctive spatial and temporal features for skeleton-based action recognition is one of the main challenges in this field. For this purpose, various deep learning methods such as CNNs, RNNs, GCNs, and Transformers have been used in the literature. Although these methods can achieve high performance, they require high computational costs and large datasets due to their complexity. Transfer learning is an approach that can be used to overcome this problem. In transfer learning, a pre-trained model is fine-tuned for a new task. In this way, the computational cost can be reduced and high performance can be achieved with less data. In this study, the SkelResNet architecture is designed based on the pre-trained ResNet101 model. Four different image representations were created from skeleton data to meet the input requirements of the SkelResNet architecture. Experimental studies have shown that SkelResNet outperforms CNN-based methods in the existing literature for action recognition.
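
The transfer-learning setup described above (a pretrained ResNet101 whose classifier head is replaced and fine-tuned on image representations of skeleton sequences) could look roughly like the sketch below. The pseudo-image encoding and the frozen-backbone choice are assumptions for illustration; the abstract does not specify the four representations used in the paper.

```python
# Minimal transfer-learning sketch in the spirit of SkelResNet: a pretrained
# ResNet101 backbone with its final layer replaced for the action classes.
# The pseudo-image encoding below (coordinates as channels, joints x frames
# as the image plane) is a placeholder, not one of the paper's representations.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


def skeleton_to_image(skeleton: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Map a (T, V, 3) sequence of 3D joint coordinates to a 3 x size x size
    image: joints along height, frames along width, coordinates as channels."""
    img = skeleton.permute(2, 1, 0).unsqueeze(0)            # (1, 3, V, T)
    img = F.interpolate(img, size=(size, size),
                        mode='bilinear', align_corners=False)
    return img.squeeze(0)                                    # (3, size, size)


def build_model(num_classes: int, freeze_backbone: bool = True) -> nn.Module:
    """Load an ImageNet-pretrained ResNet101 (downloads weights on first use)
    and swap the classifier head for the target action classes."""
    model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
    if freeze_backbone:
        for param in model.parameters():
            param.requires_grad = False                       # fine-tune head only
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # new head is trainable
    return model


if __name__ == '__main__':
    skeleton = torch.randn(30, 25, 3)                  # 30 frames, 25 joints, xyz
    image = skeleton_to_image(skeleton)
    model = build_model(num_classes=60)                # e.g. NTU RGB+D 60 classes
    logits = model(image.unsqueeze(0))
    print(logits.shape)                                # torch.Size([1, 60])
```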

Item: SkelVIT: consensus of vision transformers for a lightweight skeleton-based action recognition system (Springer London Ltd, 2024)
Authors: Karadag, Ozge Oztimur
Skeleton-based action recognition systems receive the attention of many researchers due to their robustness to viewpoint and illumination changes, along with their computational efficiency compared to systems based on video frames. The advent of deep learning models has prompted researchers to explore the utility of deep architectures in addressing the challenge of skeleton-based action recognition. A predominant trend in the existing literature involves applying these architectures either to the vectorial representation of skeleton data or to its graphical depictions. However, deep architectures have demonstrated their efficacy primarily in vision tasks that involve image data. Consequently, researchers have proposed representing the skeleton data in pseudo-image formats and then utilizing Convolutional Neural Networks (CNNs) for action recognition. Subsequent research efforts have focused on devising effective methodologies for constructing pseudo-images from skeleton data. More recently, attention has shifted towards attention networks, particularly transformers, which have shown promising performance across various vision-related tasks. In this study, the effectiveness of vision transformers (VIT) for skeleton-based action recognition is examined and their robustness to the pseudo-image representation scheme is investigated. To this end, a three-level architecture, called SkelVit, is proposed. In the first level of SkelVit, a set of pseudo-images is generated from the skeleton data. In the second level, a classifier is trained on each pseudo-image representation. In the third level, the posterior probabilities of the classifiers in the ensemble are aggregated and fed to a meta-classifier to estimate the final action class. The performance of SkelVit is examined via a set of experiments. First, the sensitivity of the system to the representation is investigated by comparing it with two state-of-the-art pseudo-image representation methods. Then, the classifiers of SkelVit are realized in two experimental setups, with CNNs and with VITs, and their performances are compared. In the final experimental setup, the contribution of combining the classifiers is examined by applying the model with different numbers of classifiers. Experimental studies reveal that the proposed system, with its lightweight representation scheme, achieves better results than state-of-the-art skeleton-based action recognition systems that employ pseudo-image representations. It is also observed that the vision transformer is less sensitive to the initial pseudo-image representation compared to CNNs. Nevertheless, experimental analysis revealed that even with the vision transformer, the recognition performance can be further improved by the consensus of classifiers.
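
The third-level consensus step (aggregating the base classifiers' posterior probabilities and feeding them to a meta-classifier) can be sketched as follows. The tiny stand-in base models and the linear meta-classifier are assumptions made to keep the example self-contained; they are not the vision transformers or the meta-classifier used in the paper.

```python
# Hedged sketch of the consensus idea in SkelVit: several base classifiers,
# each trained on a different pseudo-image representation of the same clip,
# have their class posteriors concatenated and passed to a meta-classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConsensusClassifier(nn.Module):
    def __init__(self, base_models: list[nn.Module], num_classes: int):
        super().__init__()
        self.base_models = nn.ModuleList(base_models)
        # Meta-classifier sees one posterior vector per base model.
        self.meta = nn.Linear(num_classes * len(base_models), num_classes)

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        # views[i] is the i-th pseudo-image representation of the same clip.
        posteriors = [F.softmax(m(v), dim=-1)
                      for m, v in zip(self.base_models, views)]
        return self.meta(torch.cat(posteriors, dim=-1))       # final class scores


if __name__ == '__main__':
    num_classes, num_views = 60, 3
    # Stand-in base classifiers: flatten a small 3x32x32 pseudo-image to logits.
    bases = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
             for _ in range(num_views)]
    model = ConsensusClassifier(bases, num_classes)
    views = [torch.randn(4, 3, 32, 32) for _ in range(num_views)]  # 4 clips, 3 views
    print(model(views).shape)                                      # torch.Size([4, 60])
```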