The in silico analysis of Isthmin-1 missense variants in the Turkish population: allele frequencies and functional implications

ISM1 variants in Türkiye: in silico & frequencies

Original Research doi:10.4328/ACAM.23003

Authors

Affiliations

1Department of Medical Biology, Faculty of Medicine, Canakkale Onsekiz Mart University, Çanakkale, Türkiye.

2Department of Medical Systems Biology, Graduate School of Sciences, Canakkale Onsekiz Mart University, Çanakkale, Türkiye.

3Department of Urology, Faculty of Medicine, Canakkale Onsekiz Mart University, Çanakkale, Türkiye.

Corresponding Author

Meliha Merve Cicekliyurt

mervemeliha@comu.edu.tr

+90 532 722 09 03

Abstract

Aim. Isthmin-1 (ISM1) is a secreted matricellular protein implicated in angiogenesis, vascular permeability, metabolic regulation, and innate immunity. This study aimed to characterize the distribution and potential functional impact of two candidate ISM1 variants in the Turkish population by combining population-genetic modelling, in silico annotation, and targeted genotyping.
Methods. Using 1000 Genomes Project reference data, we first derived modelled Turkish minor allele frequency (MAF) ranges for the missense variant rs77255807 (p.Ser102Pro) and the intronic variant rs117461286 under three predefined ancestry mixtures based on European, South Asian, and African super-populations. We then genotyped 120 healthy Turkish individuals and compared observed MAFs with the predicted ranges, and mapped p.Ser102Pro onto the AlphaFold ISM1 structure to contextualize possible structural effects.
Results. The observed MAF of rs77255807 in the Turkish cohort was higher than predicted by all mixture models, consistent with regional allele enrichment, whereas rs117461286 closely matched the modelled interval. Structural mapping placed Ser102 in a low-confidence flexible loop, suggesting that the Ser-to-Pro substitution could influence local backbone geometry but requires experimental validation.
Conclusion. Together, these findings provide baseline frequency estimates and structural–functional hypotheses for ISM1 variation in Turkish individuals and offer a framework for future association and mechanistic studies.

Keywords

allele frequency Isthmin-1 in silico analysis missense variant

Introduction

Isthmin-1 (ISM1) is a secreted protein that plays a role in numerous processes such as angiogenesis, endothelial permeability, tissue homeostasis, metabolic balance, immune response, and tumor biology.1,2 In recent years, evidence has increased that ISM1 participates in inflammatory regulation.2,3 In LPS-induced acute lung injury models, ISM1 deficiency increases inflammation, whereas recombinant ISM1 (rISM1) suppresses it. rISM1 reduces NF-κB activation and the expression of pro-inflammatory cytokines in a dose-dependent manner.4 Despite this biological framework, the population-specific frequencies and functional significance of genetic variation at the ISM1 locus (particularly coding/non-coding SNPs) are not sufficiently defined. This gap poses a fundamental obstacle in both candidate marker design and the interpretation of association studies. Türkiye occupies a geographic crossroads between Europe and Asia, and previous population-genetic studies indicate that Turkish genomes reflect both regional admixture and continuity with pre-existing Anatolian ancestry.5,6,7
In this study, focusing on the ISM1 locus (GRCh37), we (i) perform systematic variant discovery with stringent consequence and frequency filters, (ii) derive a priori Turkish MAFs using three pre-specified models—pooled TSI+IBS, TSI-weighted (w = 0.7), and a continental mixture (60% EUR, 30% SAS, 10% AFR), and (iii) prioritize two candidates for in-depth analysis: a coding missense variant rs77255807 (p.Ser102Pro) and an intronic variant rs117461286. We integrate functional in-silico annotation (VEP/CADD) and structural mapping of p.Ser102Pro to the AlphaFold model, then compare the model-based frequency expectations with observed Turkish cohort data.

Materials and Methods

ISM1 Variant Selection. We interrogated ISM1 (ENSG00000101230) on GRCh37.p13 using the Ensembl GRCh37 archive (accessed October 2025). The analysed interval was chr20: 13,202,418–13,281,298 (forward strand); the canonical transcript was ISM1-001 (CCDS46579.1; UniProtKB B1AKI9). We implemented a predetermined, two-stage prioritisation process to select ISM1 variants for analysis.
Population-specific frequency check. Because Global MAF aggregates super-populations, for each retained rsID, we inspected the Ensembl Population genetics panel and recorded allele frequencies in EUR and the Mediterranean subpopulations TSI and IBS. Variants with TSI/IBS MAF ≥ 0.03–0.05 were prioritised as a practical proxy for the Turkish population.
Frequency Thresholds and Turkish-Proxy Rule. We used Global MAF (1000 Genomes gMAF) with a primary threshold of ≥ 0.05 and a pre-specified sensitivity window of ≥ 0.011 to retain low-to-moderate frequency markers with potential assay feasibility.
To approximate Turkish frequencies, we relied on Mediterranean European subpopulations—TSI (Toscani in Italia) and IBS (Iberian in Spain)—as pragmatic geographic/ancestral proxies, and we complemented these with EUR, SAS, and AFR superpopulations for mixture models. In population-level analyses, rs77255807 showed low global allele frequencies, ranging from 0.0038 in AFR to 0.0467 in TSI, whereas rs117461286 exhibited slightly higher variability, with MAF values spanning 0.0023 in AFR to 0.0675 in SAS. Mixture-model estimates similarly indicated modest allele frequencies, with rs77255807 ranging from approximately 2.34% to 3.84% and rs117461286 from 3.04% to 3.94% across different European- and multi-population weighting schemes.
For each candidate rsID, allele and genotype counts were recorded from EUR, SAS, AFR, TSI, and IBS populations, and variants with TSI or IBS MAF ≥ 3–5% were flagged as proxy-supported for the Turkish population.
Three complementary models were then used to generate a priori Turkish allele-frequency estimates: (i) a pooled Mediterranean proxy (TSI + IBS), obtained from aggregated allele counts, p̂_TR,1 = (k_TSI + k_IBS) / (n_TSI + n_IBS), with Wilson score 95% confidence intervals (CI) computed on pooled counts; (ii) a TSI-weighted mixture (w = 0.7 for TSI, 0.3 for IBS); and (iii) a continental mixture (60% EUR, 30% SAS, 10% AFR).
For the weighted models, population-specific allele frequencies were treated as Beta posteriors, p_i ~ Beta(k_i + 1, n_i − k_i + 1), with posterior mean μ_i and variance σ_i²; the mixture mean was computed as μ = Σ w_i μ_i, and the mixture variance as σ² = Σ w_i² σ_i². Normal-approximation 95% CIs were defined as μ ± 1.96σ and truncated to [0, 1].
All thresholds, mixture weights, and population panels were pre-specified. Model-based estimates and CIs are provided in the Supplementary Tables. The Turkish cohort MAFs were evaluated against the in-silico expectations by determining whether the observed MAF fell within the model-derived 95% CI and by performing two-sided two-proportion z-tests (or exact tests when expected counts were small), with p-values reported in Supplementary Table 4.
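The interval calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the allele counts (10 of 214 chromosomes for TSI, 1 of 214 for IBS) are assumed values chosen to be consistent with the TSI frequency of 0.0467 quoted for rs77255807; they are not taken from the Supplementary Tables.

```python
import math

Z = 1.96  # normal quantile used for the 95% intervals in the text


def wilson_ci(k, n, z=Z):
    """Wilson score interval for k minor alleles out of n chromosomes."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def beta_posterior(k, n):
    """Mean and variance of the Beta(k + 1, n - k + 1) posterior."""
    a, b = k + 1, n - k + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def mixture_estimate(counts, weights, z=Z):
    """Mixture mean with a normal-approximation interval truncated to [0, 1]."""
    moments = [beta_posterior(k, n) for k, n in counts]
    mu = sum(w * m for w, (m, _) in zip(weights, moments))
    var = sum(w**2 * v for w, (_, v) in zip(weights, moments))
    sd = math.sqrt(var)
    return mu, max(0.0, mu - z * sd), min(1.0, mu + z * sd)


# ASSUMED allele counts (k, n); illustrative only, not from the supplement.
tsi = (10, 214)
ibs = (1, 214)

pooled_k, pooled_n = tsi[0] + ibs[0], tsi[1] + ibs[1]
pooled_maf = pooled_k / pooled_n
pooled_lo, pooled_hi = wilson_ci(pooled_k, pooled_n)
mix_mu, mix_lo, mix_hi = mixture_estimate([tsi, ibs], [0.7, 0.3])

print(f"pooled TSI+IBS: {pooled_maf:.4f} ({pooled_lo:.4f}-{pooled_hi:.4f})")
print(f"TSI-weighted:   {mix_mu:.4f} ({mix_lo:.4f}-{mix_hi:.4f})")
```

Under these assumed counts the sketch should land close to the pooled and TSI-weighted estimates reported for rs77255807 in the Results. The continental mixture follows the same pattern with three (k, n) pairs and weights 0.6, 0.3, and 0.1.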
Downstream Analyses. In-silico annotation used Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping v2 (PolyPhen-2), Combined Annotation Dependent Depletion (CADD), Deleterious Annotation of genetic variants using Neural Networks (DANN), Functional Analysis Through Hidden Markov Models (FATHMM), Functional Sequencing 2 (FunSeq2), and Genome-Wide Annotation of Variants (GWAVA). Variants with SIFT scores ≤ 0.05 were designated as "deleterious". CADD scores (on a PHRED scale) were used to reflect the predicted pathogenic impact of variants, with scores ≥ 20 indicating high impact.
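The two cut-offs stated above can be written as a small rule set. The sketch below is illustrative only: the function names and the "below threshold" label are hypothetical, and the example scores are the SIFT and CADD values reported for rs77255807 in the Results.

```python
SIFT_DELETERIOUS_MAX = 0.05  # SIFT <= 0.05 is called "deleterious" in the text
CADD_HIGH_IMPACT_MIN = 20.0  # CADD PHRED >= 20 is called high impact in the text


def classify_sift(score):
    """Label a SIFT score using the cut-off stated in the Methods."""
    return "deleterious" if score <= SIFT_DELETERIOUS_MAX else "tolerated"


def classify_cadd(phred):
    """Label a PHRED-scaled CADD score; the second label is a placeholder."""
    return "high impact" if phred >= CADD_HIGH_IMPACT_MIN else "below threshold"


# Scores reported for rs77255807 (p.Ser102Pro) in the Results.
sift_label = classify_sift(0.47)
cadd_label = classify_cadd(23)
print(sift_label, "|", cadd_label)
```

The two predictors disagree for this variant, which is why the text treats the functional evidence as mixed rather than conclusive.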
Protein Modelling. The AlphaFold2-predicted ISM1 structure was obtained from the AlphaFold Protein Structure Database (UniProt B1AKI9, model AF-B1AKI9-F1, GRCh37-aligned sequence). The pLDDT and PAE metrics were used directly from the model. The p.Ser102Pro mutation was applied as a single point mutation using PyMOL (Mutagenesis Wizard)/ChimeraX; the most probable sidechain rotamer was selected, and no energy minimization was performed. Structural figures used the following colour code: pLDDT 0–50 (red–orange), 50–70 (yellow), 70–90 (light blue), > 90 (dark blue). Structural interpretations in regions with low pLDDT or high PAE were reported as limited.
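The colour code above amounts to a simple binning of per-residue pLDDT. A minimal sketch follows; the function name is hypothetical, and because the stated ranges share their endpoints, the handling of exactly 50, 70, and 90 is an assumption (50 and 70 go to the higher band, 90 stays light blue because dark blue is given as > 90).

```python
def plddt_band(plddt):
    """Map a per-residue pLDDT score (0-100) to the figure colour band."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if plddt > 90:
        return "dark blue"
    if plddt >= 70:   # assumed boundary handling
        return "light blue"
    if plddt >= 50:   # assumed boundary handling
        return "yellow"
    return "red-orange"


# Local pLDDT of about 32 is the value reported for Ser102 in the Results.
ser102_band = plddt_band(32)
print(ser102_band)
```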
PCR-Based SNP Analyses. Genotyping was performed by real-time PCR (qPCR) using the Bio-Rad CFX96™ system. The reaction mixture contained a low ROX-probe master mix (Ampliqon, Denmark), target-specific primer pairs (Oligomer, Turkey), and FAM-labeled hydrolysis probes (Probesyntesis, Turkey). Separate wells were used to detect wild-type and mutant alleles. All reactions were performed in triplicate. Positive control DNA samples (known genotype) and a no-template control (NTC) were included in each qPCR run. Reaction validity was assessed by amplification curves, control reactions, and melt curve analysis. Only reproducible results with low Ct variation were included in the analysis.
Target-specific primers and hydrolysis probes were designed for the two ISM1 loci, ISM1_g.13270669 T > C (rs77255807) and ISM1_g.13326522 C > T (rs117461286). For rs77255807, the forward primer was 5′-GCAAGAGATTTCCCCAGAT-3′, and the reverse primer was 5′-ATTTGGATTCTGCCCATTGA-3′, with a 5′-FAM-labelled hydrolysis probe (5′-CCAAACTTTCCAGATCTTTCCAAAGCTGA-3′) carrying a 3′ quencher. For rs117461286, the forward primer was 5′-GCAAAGAGATTTCCCCAGAC-3′, whereas the reverse primer and probe were identical to those used for rs77255807.
The reaction mixture was prepared to a total volume of 25 µL containing 12.5 µL master mix, 1 µL forward primer, 1 µL reverse primer, 0.25 µL FAM-labeled probe, 5 µL genomic DNA, and 5.25 µL distilled water. The cycling conditions were 15 min at 95 °C, followed by 40 cycles of 15 s at 95 °C, 30 s at 61 °C, and 30 s at 72 °C, with a final 2-min extension at 72 °C. Alleles were discriminated by probe fluorescence during amplification, and amplification performance was assessed from the amplification curves.
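As a quick arithmetic check, the per-reaction volumes can be summed and scaled to a bulk mix. The sketch below uses the component volumes listed above; the helper name, the 10% pipetting overage, and the choice to add genomic DNA per well rather than to the bulk mix are assumptions made for illustration.

```python
# Per-reaction volumes in microlitres, as listed in the text.
PER_REACTION_UL = {
    "master mix": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "FAM-labeled probe": 0.25,
    "genomic DNA": 5.0,
    "distilled water": 5.25,
}

total_ul = sum(PER_REACTION_UL.values())


def bulk_mix(n_reactions, overage=0.10):
    """Scale template-free components for n reactions plus an assumed overage."""
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "genomic DNA"  # assumed to be added to each well separately
    }


mix = bulk_mix(12)
print(f"total per reaction: {total_ul} uL")
for name, volume in mix.items():
    print(f"{name}: {volume} uL")
```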
Ethical Approval. This study was approved by the Ethics Committee of Canakkale Onsekiz Mart University (Date: 13.04.2024, Decision No: 2024-13/13-04).
Statistical Analysis. Statistical analyses were performed using IBM SPSS Statistics for Windows (v 20.0; IBM Corp., Chicago, IL, USA). Continuous variables are presented as median (interquartile range [IQR], minimum–maximum), and categorical variables as number (n) and percentage (%). Minor allele frequencies (MAFs) were calculated as minor allele counts/total alleles (2N). Observed allele frequencies were compared with in silico model-derived estimates using two-sided two-proportion z-tests. A p-value of < 0.05 was considered statistically significant.
Reporting Guidelines. This study is reported in accordance with the STROBE guidelines.
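A two-sided two-proportion z-test of the kind described above can be sketched with the standard library alone. The function name is hypothetical and the counts are assumptions: 7 minor alleles out of 240 is consistent with the 2.9% reported for rs117461286, and 13 of 428 is an assumed reference count consistent with the pooled TSI + IBS estimate of 3.04%. Treating a model-derived estimate as a sample of 428 chromosomes is itself an assumption; the text does not state which reference sample size was used.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMED counts: observed cohort (7 of 240) vs. pooled reference (13 of 428).
z, p_value = two_proportion_z(7, 240, 13, 428)
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

With expected counts this small, an exact test (as the Methods allow) may be the safer choice; the z-test is shown only because it is the primary test named in the text.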

Results

Variant Discovery and Stepwise Reduction Within ISM1. The initial dataset covered all recorded variant classes. Restricting the search to germline SNVs identified 27,677 records, corresponding to 24,356 unique rsIDs. Applying the consequence filter (missense_variant ± splice_region_variant/stop_gained) reduced the set to 471 records (393 unique rsIDs).
Under the pre-specified frequency thresholds, the primary cut-off (Global MAF ≥ 0.05) retained no variants, whereas the sensitivity window (Global MAF ≥ 0.011) identified rs77255807 as the only missense candidate at this locus. For downstream analyses, rs77255807 was retained together with the intronic SNP rs117461286.
Population-Specific Allele Frequencies (EUR → TSI/IBS as Turkish Proxy)
In-Silico Functional Annotation
rs77255807 is a missense variant in ISM1. The variant changes codon TCC to CCC, resulting in p.Ser102Pro (S102P). VEP predictors: SIFT 0.47 (tolerated), PolyPhen-2 0.543 (possibly damaging). CADD PHRED = 23 (above the commonly used deleteriousness threshold of 20). Additional scores: REVEL 0.142, MetaLR 0.02, MutationAssessor 0.25.
Single-nucleotide polymorphism (SNP) rs117461286 is located in the protein-coding region of the ISM1 gene and is characterized as a missense variant with potential functional effects. Located at position 13,326,522 on chromosome 20 according to the GRCh38 reference genome, this SNP is characterized by a C > T allele change. Although it appears to be an intergenic variant at first glance, annotation analyses based on the XM_017027680.2 reference transcript have determined that this change is located within the coding region and may cause the amino acid substitutions L (CTT) → F (TTT) and L (CTT) → V (GTT). These two changes, Leu355Val and Leu355Phe, suggest that the variant may affect the structural and functional properties of the protein. A series of in silico analyses was performed to assess the potential biological effects of the variant. In multi-algorithm evaluations conducted using the PredictSNP2 platform, the effect of the rs117461286 variant was predicted to be neutral (non-deleterious) at an 88% rate. These findings were supported by scoring systems such as CADD (83–88), DANN (85), FATHMM (85), FunSeq2 (68), and GWAVA (62). Furthermore, the CADD raw score values (G: 0.277, T: 0.333) and GERP score (-1.65) for the genomic region where the variant is located indicate that the variant is in a region that is not highly evolutionarily conserved.
Overall interpretation. The coding change rs77255807 shows a protein-altering effect with CADD ≥ 20 and TSI-level frequency support and was prioritised for downstream validation. The intronic rs117461286 exhibits predominantly neutral/benign predictions and was retained conditionally based on its TSI frequency.
Structural Context. We examined the AlphaFold2 model of human ISM1 (UniProt B1AKI9, model AF-B1AKI9-F1) to place the missense substitution in a 3D framework. The p.Ser102 residue lies in an N-terminal loop/coil region with very low confidence (local pLDDT ≈ 32). The detailed structural visualization is provided in Supplementary Figure 1.
The predicted aligned error (PAE) heatmap shows high uncertainty for residue–residue relationships in this segment, consistent with intrinsic flexibility/disorder (Supplementary Figure 1-A). In the full-length model, the structured core (pLDDT > 70; blue) is separated from the low-confidence peripheral segments (pLDDT < 50; orange–red), and Ser102 is located within the latter (Supplementary Figure 1-B).
A rotamer swap to Proline at position 102 suggests a local backbone rigidification and potential disruption of nearby main-chain H-bonding typical of Ser→Pro changes (Supplementary Figure 1-C). However, because the region is poorly constrained in the model (low pLDDT and elevated PAE), any structural inference should be considered hypothesis-generating rather than definitive. Overall, the 3D context is consistent with the mixed in-silico scores (SIFT tolerated; PolyPhen possibly damaging; CADD high), indicating a possible but uncertain structural impact confined to a flexible loop.
In Silico Allele Frequency Estimation. To define a priori allele-frequency benchmarks for the Turkish cohort, we estimated Turkish MAFs from 1000 Genomes population panels using three pre-specified mixtures: (i) a pooled Mediterranean proxy (TSI + IBS; aggregated counts, Wilson score 95% CIs), (ii) a TSI-weighted proxy (w = 0.7 for TSI, 0.3 for IBS; normal-approximation 95% intervals from a Beta-posterior mixture), and (iii) a continental mixture (60% EUR, 30% SAS, 10% AFR; normal-approximation 95% intervals from a Beta-posterior mixture). All weights were defined a priori. For descriptive purposes, variants were categorized as rare (MAF < 1%), low-frequency (1–5%), and common (≥ 5%).
For rs77255807, the estimated Turkish MAFs were 2.57% (95% CI 1.44–4.54) under the pooled TSI + IBS model, 3.84% (1.76–5.93) under the TSI-weighted model, and ≈ 2.34% (1.65–3.04) under the continental mixture. For rs117461286, estimates were 3.04% (1.78–5.13) for pooled TSI + IBS, 3.94% (1.89–5.98) for the TSI-weighted model, and ≈ 3.93% (3.13–4.72) for the continental mixture.
In Vitro SNP Analyses. Table 1 summarizes the observed allele distributions, and Supplementary Table 4 provides the statistical comparisons. Among 240 alleles from 120 individuals, the minor allele frequency was 7.0% for rs77255807 and 2.9% for rs117461286. As summarized in Table 1 and detailed in Supplementary Table 4, the laboratory-derived allele frequencies from the Çanakkale control cohort were compared with three computational mixture models. Statistical significance was determined using two-proportion z-tests. No significant difference was found between laboratory data and in silico predictions for the rs117461286 (C > T) variant (p > 0.6).
For rs77255807, the observed MAF in the Turkish cohort was significantly higher than the values predicted by the in silico mixture models, supporting regional allele enrichment.
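The interval-coverage criterion from the Methods can be applied directly to the rounded figures reported in this section. The sketch below only re-uses the published percentages; the function and variable names are hypothetical, and working from rounded values is an approximation.

```python
# Model-derived 95% intervals (percent), as reported in the Results.
MODEL_INTERVALS = {
    "rs77255807": {
        "pooled TSI+IBS": (1.44, 4.54),
        "TSI-weighted": (1.76, 5.93),
        "continental": (1.65, 3.04),
    },
    "rs117461286": {
        "pooled TSI+IBS": (1.78, 5.13),
        "TSI-weighted": (1.89, 5.98),
        "continental": (3.13, 4.72),
    },
}

# Observed cohort MAFs (percent), as reported in the Results.
OBSERVED_MAF = {"rs77255807": 7.0, "rs117461286": 2.9}


def frequency_class(maf_pct):
    """Descriptive categories used in the text: rare, low-frequency, common."""
    if maf_pct < 1:
        return "rare"
    if maf_pct < 5:
        return "low-frequency"
    return "common"


def coverage(rsid):
    """For each model, report whether the observed MAF lies inside the interval."""
    observed = OBSERVED_MAF[rsid]
    return {
        model: lo <= observed <= hi
        for model, (lo, hi) in MODEL_INTERVALS[rsid].items()
    }


for rsid in OBSERVED_MAF:
    print(rsid, frequency_class(OBSERVED_MAF[rsid]), coverage(rsid))
```

On these rounded values, the observed rs77255807 frequency lies above all three intervals, while the observed rs117461286 frequency lies inside the pooled and TSI-weighted intervals but slightly below the lower bound of the narrower continental interval, even though the z-tests reported no significant difference.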

Discussion

Our findings indicate that the observed minor allele frequency (MAF) of rs77255807 in the Turkish control cohort was higher than the model-based estimates, whereas no statistically significant difference was observed for rs117461286. In contrast to rs77255807, the lack of divergence for rs117461286 suggests that this variant follows expected population genetic patterns, though its rarity in public databases similarly necessitates population-specific empirical data.8
For low-frequency variants, small changes in allele counts can substantially shift estimated MAF values, particularly in modest sample sizes where standard errors remain large.9 Therefore, validation studies are needed in larger and ethnically diverse cohorts to clarify whether the increase observed in this variant reflects a true population trend or sampling variance. The elevated MAF detected in our control-only cohort should be interpreted within the context of Anatolia's unique demographic history.
Anatolia's geographical location has made it the scene of large population movements. Nevertheless, research indicates that the Turkish population largely preserves the genetic structure of pre-existing Anatolian populations.5,6 Consistent with Türkiye's historical position as an intersection between Europe and Asia, a large-scale genomic study identified diverse admixture of the Turkish population among Balkan, Caucasian, Middle Eastern, and European groups, revealing a closer genetic affinity to Europeans than previously assumed.7 Additionally, since all samples belong to the control group, disease-related enrichment can also be excluded. Taken together, these observations support the view that the elevated MAF of rs77255807 in our cohort may reflect regional variation within the Turkish population, rather than a methodological artefact. These findings underscore the critical importance of population-specific studies in addressing the current ascertainment bias toward European ancestry populations in genomic databases and highlight the necessity of expanding reference panels to include underrepresented groups such as Anatolian populations.10

Limitations

This study has several limitations, including a small sample size, analysis of only two ISM1 variants, single-center genotyping, and possible minor annotation discrepancies related to datasets used.

Conclusion

In this study, allele frequencies of two ISM1 variants were determined in a Turkish cohort and compared with publicly available in silico datasets. The absence of a significant difference for rs117461286 is consistent with its reported variability across global populations. These findings highlight the importance of combining population-specific genotyping with large-scale reference datasets to better characterize genetic variation in understudied populations. Further studies with larger, multi-regional Turkish cohorts and a broader set of gene variants are warranted to refine these observations and support their potential implications for future functional or clinical investigations.

Declarations

Ethics Declarations

This study was conducted in accordance with institutional and national ethical standards and the Declaration of Helsinki.

Animal and Human Rights Statement

All procedures performed in this study were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed Consent

Written informed consent was obtained from all participants.

Data Availability

Data are available from the corresponding author upon reasonable request.

Conflict of Interest

The authors declare that there is no conflict of interest.

Funding

None.

Author Contributions (CRediT Taxonomy)

Conceptualization: A.Y.
Methodology: A.Y., M.P.
Investigation: A.Y., M.P.
Data curation: A.Y.
Formal analysis: A.Y.
Writing – original draft: A.Y.
Writing – review & editing: A.Y., M.P.
Supervision: M.P.

Scientific Responsibility Statement

The authors declare that they are responsible for the article’s scientific content, including study design, data collection, analysis and interpretation, writing, and some of the main line, or all of the preparation and scientific review of the contents, and approval of the final version of the article.

Abbreviations

AFR: African
CADD: Combined Annotation Dependent Depletion
CI: confidence interval
EUR: European
FATHMM: Functional Analysis Through Hidden Markov Models
GWAVA: Genome-Wide Annotation of Variants
IBS: Iberian population in Spain
IQR: interquartile range
ISM1: Isthmin-1
MAF: minor allele frequency
NTC: no-template control
PAE: predicted aligned error
PolyPhen-2: Polymorphism Phenotyping v2
qPCR: quantitative polymerase chain reaction
SAS: South Asian
SIFT: Sorting Intolerant From Tolerant
SNP: single-nucleotide polymorphism
TSI: Toscani in Italia

References

  1. Shakhawat HM, Hazrat Z, Zhou Z. Isthmin—a multifaceted protein family. Cells. 2022;12(1):17
  2. Menghuan L, Yang Y, Qianhe M, et al. Advances in research of biological functions of Isthmin-1. J Cell Commun Signal. 2023;17(3):507-521
  3. Oz B, Gunduz I, Yamancan G, et al. The association between serum Isthmin-1 and disease activity, inflammation, and autoantibody status in rheumatoid arthritis. Diagnostics (Basel). 2025;15(11):1316
  4. Nguyen N, Xu S, Lam TYW, et al. ISM1 suppresses LPS-induced acute lung injury and post-injury lung fibrosis in mice. Mol Med. 2022;28(1):72
  5. Hodoğlugil U, Mahley RW. Turkish population structure and genetic ancestry reveal relatedness among Eurasian populations. Ann Hum Genet. 2012;76(2):128-141
  6. Arnaiz-Villena A, Karin M, Bendikuze N, et al. HLA alleles and haplotypes in the Turkish population: relatedness to Kurds, Armenians and other Mediterraneans. Tissue Antigens. 2001;57(4):308-317
  7. Kars ME, Başak AN, Onat OE, et al. The genetic structure of the Turkish population reveals high levels of variation and admixture. Proc Natl Acad Sci U S A. 2021;118(36):e2026076118
  8. Koenig Z, Yohannes MT, Nkambule LL, et al. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res. 2024;34(5):796-809
  9. Pathan N, Deng WQ, Di Scipio M, et al. A method to estimate the contribution of rare coding variants to complex trait heritability. Nat Commun. 2024;15(1):1245
  10. Kore P, Wilson MW, Tiao G, et al. Improved allele frequencies in gnomAD through local ancestry inference. Nat Commun. 2025;16:8734

Additional Information

Publisher’s Note
Bayrakol MP remains neutral with regard to jurisdictional and institutional claims.

Rights and Permissions

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/

About This Article

Received:
November 29, 2025
Accepted:
January 12, 2026
Published Online:
April 17, 2026