The Effect of Cluster Size for Model Performance in High-Dimensional Longitudinal Studies: A Simulation Study

dc.contributor.authorŞengül, Merve Türkegün
dc.contributor.authorTasdelen, Bahar
dc.contributor.authorYologlu, Saim
dc.date.accessioned2026-01-24T12:01:34Z
dc.date.available2026-01-24T12:01:34Z
dc.date.issued2023
dc.departmentAlanya Alaaddin Keykubat Üniversitesi
dc.description.abstractObjective: To prevent model estimation errors and deviations in high-dimensional longitudinal studies, risk models are established through penalized methods. The aim of this study is to examine the effect of small cluster sizes on the performance of generalized estimating equations (GEE) and penalized GEE (PGEE) models in high-dimensional longitudinal data. Material and Methods: A simulation study was designed to compare GEE and PGEE model performance, Type I error rates, and power in two-period longitudinal data structures with different cluster sizes (n=20, 30, 50, 100, 200), different numbers of predictors (p=10, 20, 50), and different correlation levels between predictors (r=0.20, 0.50, 0.80). Results: With insufficient cluster sizes and high correlations between predictors, the GEE coefficient estimates were misleading and inconsistent, the Type I error rates were high, and the power of the test was weak. Even when the number of predictors and the cluster size were balanced (p=10; n=100, 200), Type I error rates for GEE were high. Increasing the cluster size was not enough to reduce the Type I error rate of GEE. The PGEE produced more successful results than GEE in all conditions. The power of PGEE increased to over 80% in all scenarios. Conclusion: The PGEE yielded more consistent results by controlling the relationships both within the cluster and between the predictors. In high-dimensional longitudinal studies, the use of PGEE was observed to be more effective than GEE.
dc.identifier.doi10.5336/biostatic.2023-98699
dc.identifier.endpage170
dc.identifier.issn1308-7894
dc.identifier.issn2146-8877
dc.identifier.issue3
dc.identifier.startpage161
dc.identifier.trdizinid1258730
dc.identifier.urihttps://search.trdizin.gov.tr/tr/yayin/detay/1258730
dc.identifier.urihttps://doi.org/10.5336/biostatic.2023-98699
dc.identifier.urihttps://hdl.handle.net/20.500.12868/4424
dc.identifier.volume15
dc.indekslendigikaynakTR-Dizin
dc.language.isoen
dc.relation.ispartofTürkiye Klinikleri Biyoistatistik Dergisi
dc.relation.publicationcategoryArticle - National Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_TR-Dizin_20260121
dc.subjectModel selection
dc.subjectGeneralized estimating equations
dc.subjectpenalized generalized estimating equations
dc.subjectpenalized methods
dc.subjecthigh dimensional longitudinal data
dc.titleThe Effect of Cluster Size for Model Performance in High-Dimensional Longitudinal Studies: A Simulation Study
dc.typeArticle
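
The simulation design summarized in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The design factors (n, p, r) come from the abstract. Everything else is an assumption made for the sketch: reading n as the number of subjects, each observed at two periods; the Gaussian outcome; the random-intercept mechanism for within-subject correlation; the equicorrelated predictor structure; and the hypothetical parameters `n_active`, `effect`, and `subject_sd`.

```python
import itertools
import numpy as np

# Design factors taken from the abstract.
CLUSTER_SIZES = (20, 30, 50, 100, 200)   # n
N_PREDICTORS = (10, 20, 50)              # p
PREDICTOR_CORRS = (0.20, 0.50, 0.80)     # r


def scenario_grid():
    """Return every (n, p, r) combination of the design factors."""
    return list(itertools.product(CLUSTER_SIZES, N_PREDICTORS, PREDICTOR_CORRS))


def simulate_two_period(n, p, r, n_active=3, effect=1.0, subject_sd=1.0, seed=0):
    """Generate one two-period data set (all modelling choices are assumptions).

    Predictors are multivariate normal with pairwise correlation r.
    A subject-level random intercept induces within-subject correlation.
    Only the first n_active predictors have a nonzero coefficient.
    """
    rng = np.random.default_rng(seed)

    # Equicorrelated predictor covariance: 1 on the diagonal, r elsewhere.
    cov = np.full((p, p), r)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=2 * n)

    subject = np.repeat(np.arange(n), 2)   # 0, 0, 1, 1, ...
    period = np.tile(np.array([0, 1]), n)  # 0, 1, 0, 1, ...

    beta = np.zeros(p)
    beta[:n_active] = effect

    intercepts = rng.normal(0.0, subject_sd, size=n)
    noise = rng.normal(0.0, 1.0, size=2 * n)
    y = X @ beta + intercepts[subject] + noise
    return X, y, subject, period, beta


if __name__ == "__main__":
    grid = scenario_grid()
    print(len(grid), "scenarios")
    X, y, subject, period, beta = simulate_two_period(*grid[0])
    print(X.shape, y.shape)
```

Each generated data set could then be passed to a GEE or PGEE fitting routine, and the rejection rates for the zero and nonzero coefficients tallied across replications to estimate Type I error and power.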
