Draft:CERNO test

Field: Bioinformatics, Statistics, Data Science


CERNO (Coincident Extreme Ranks in Numerical Observations) is a non-parametric, rank-based statistical test that evaluates how the ranks of a labeled subset of items are distributed within a ranking of all items. The method has been used in gene set and pathway analysis. In this applied context, it assesses whether a predefined set of genes, proteins, or other features shows coincident enrichment for high or low ranks within a globally ranked list.

Publication of the Method

The CERNO statistic was first published in a 2008 study on interferon-beta-regulated gene expression in relapsing–remitting multiple sclerosis.[1] It was subsequently used in transcriptomic and proteomic studies.[2] The test was fully described in the supplementary materials of a 2013 pharmacogenomics study.[3]

The first independent, comprehensive evaluation of the algorithm was published by Zyla et al. in 2019.[4]

Methodology

The CERNO test evaluates whether the ranks of a set of genes or features within a genome-wide ranking (ordered from most to least significant by any metric) are collectively more extreme than would be expected by chance. This makes it sensitive to sets with even a few strongly ranked members; it does not require every gene in the set to be uniformly significant or to pass a significance threshold.

The test statistic for a gene set of size k in a ranked list of N genes is:

$$S = -2 \sum_{i=1}^{k} \ln\left(\frac{r_i}{N}\right)$$

where r_i is the rank of the i-th gene in the set. Under the null hypothesis of random rank distribution, S follows a chi-square distribution with 2k degrees of freedom.
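The calculation can be illustrated with a minimal Python sketch. It assumes the statistic takes the Fisher-type form S = −2 Σ ln(r_i / N) with 1-based ranks, and it uses the closed-form upper tail of the chi-square distribution, which is exact when the degrees of freedom are even. The function names `cerno_statistic`, `chi2_sf_even_df`, and `cerno_test` are hypothetical and are not taken from any published implementation.

```python
import math


def cerno_statistic(ranks, n_total):
    """Return S = -2 * sum(ln(r_i / N)) for 1-based ranks in a list of N features."""
    if not ranks:
        raise ValueError("the gene set must contain at least one rank")
    for r in ranks:
        if not 1 <= r <= n_total:
            raise ValueError(f"rank {r} is outside 1..{n_total}")
    return -2.0 * sum(math.log(r / n_total) for r in ranks)


def chi2_sf_even_df(x, k):
    """Upper-tail probability of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term = 1.0   # j = 0 term of the series
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_test(ranks, n_total):
    """Return (S, p-value) for a gene set given the ranks of its members."""
    s = cerno_statistic(ranks, n_total)
    return s, chi2_sf_even_df(s, len(ranks))


if __name__ == "__main__":
    # Illustrative input: three set members ranked 1st, 2nd and 3rd of 1000 features.
    s, p = cerno_test([1, 2, 3], 1000)
    print(f"S = {s:.3f}, p = {p:.3g}")
```

Under these assumptions, a set containing a single gene yields a p-value equal to its rank fraction r/N, which is a convenient sanity check.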

Applications

CERNO is referenced in over 100 publications on genes and proteins in PubMed Central as of May 2025.

Comparison with Other Methods

Zyla et al. reported that CERNO showed the highest reproducibility among the methods they compared, along with good sensitivity, good prioritization, and low computational time. The study also notes that the non-parametric method is robust to the choice of ranking metric, to sample size, and to gene set size.[4]

The CERNO test is mathematically related to Fisher's method of combining p-values from independent statistical tests. Fisher’s method is known for its favorable asymptotic properties, especially as measured by Bahadur efficiency,[5] which describes the rate at which the observed significance of a test statistic converges to zero as the sample size increases. Tests with higher Bahadur efficiency converge more rapidly.
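The relationship can be sketched as follows. This is an illustrative derivation under two assumptions that are not stated above: that the rank fraction r_i/N of each set member plays the role of a p-value, and that under the null hypothesis these fractions behave like k independent draws from a uniform distribution on (0, 1).

```latex
% Fisher's combined statistic for k independent p-values p_1, ..., p_k
X^2 = -2 \sum_{i=1}^{k} \ln p_i

% Substituting the rank fraction p_i = r_i / N (assumption) gives the CERNO form
S = -2 \sum_{i=1}^{k} \ln \frac{r_i}{N}

% If U ~ Uniform(0,1), then -2 ln U ~ chi^2 with 2 degrees of freedom,
% and a sum of k independent such terms is chi^2 with 2k degrees of freedom
-2 \ln U \sim \chi^2_{2}
\quad\Longrightarrow\quad
S \sim \chi^2_{2k}
```

Because ranks are discrete and drawn without replacement, the uniform and independence assumptions hold only approximately, and the approximation is closest when k is small relative to N.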

Littell and Folks (1971) demonstrated the asymptotic optimality of Fisher’s method of combining tests, showing that for independent tests, the negative logarithm of the significance level (−2log(significance)) diverges to infinity at the fastest possible rate among combination tests.[6]

In contrast, the Kolmogorov–Smirnov test, which is the basis for several gene set analysis methods, was shown by Hwang (1982) to have much lower Bahadur efficiency than the chi-square test.[7] The Kolmogorov–Smirnov test is "always well worse" than the chi-square test by this measure. This is relevant because, under the null hypothesis, the CERNO statistic S follows a chi-square distribution with 2k degrees of freedom.

Because the Kolmogorov–Smirnov test is the basis of many commonly used gene set enrichment analysis methods, CERNO, which shares the properties of Fisher’s combined test, may offer advantages in statistical power or efficiency in this context.

Software

The CERNO method is straightforward to implement because of its simple mathematical form. It has been implemented in the tmod R package.[8]

See also

  • Gene set enrichment analysis
  • Fisher’s method
  • Order statistics
  • Pathway analysis

References

  1. ^ Yamaguchi, KD; Ruderman, DL; Croze, E; Wagner, TC; Velichko, S; Reder, AT; Salamon, H (2008). "IFN-beta-regulated genes show abnormal expression in therapy-naive relapsing-remitting MS mononuclear cells: Gene expression analysis employing all reported protein-protein interactions". Journal of Neuroimmunology. 195 (1–2): 116–120. doi:10.1016/j.jneuroim.2007.12.007. PMID 18280692.
  2. ^ Kunnath-Velayudhan, S; Salamon, H; Wang, HY; Davidow, AL; Molina, DM; Huynh, VT; Cirillo, DM; Michel, G; Talbot, EA; Perkins, MD; Felgner, PL; Liang, X; Gennaro, ML (2010-08-17). "Dynamic antibody responses to the Mycobacterium tuberculosis proteome". Proceedings of the National Academy of Sciences. 107 (33): 14703–8. Bibcode:2010PNAS..10714703K. doi:10.1073/pnas.1009080107. PMC 2930474. PMID 20668240.
  3. ^ Croze, E; Yamaguchi, KD; Knappertz, V; Reder, AT; Salamon, H (October 2013). "Interferon-beta-1b-induced short- and long-term signatures of treatment activity in multiple sclerosis". The Pharmacogenomics Journal. 13 (5): 443–451. doi:10.1038/tpj.2012.27. PMC 3793239. PMID 22711062.
  4. ^ a b Zyla, J; Marczyk, M; Domaszewska, T; Kaufmann, SHE; Polanska, J; Weiner, J (2019-12-15). "Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms". Bioinformatics. 35 (24): 5146–5154. doi:10.1093/bioinformatics/btz447. PMC 6954644. PMID 31165139.
  5. ^ "Bahadur efficiency". Encyclopedia of Mathematics.
  6. ^ Littell, RC; Folks, JL (1971). "Asymptotic Optimality of Fisher's Method of Combining Tests". Journal of the American Statistical Association. 66 (336): 802–806. doi:10.1080/01621459.1971.10482347. JSTOR 2284251.
  7. ^ Hwang, TY (1982). "Bahadur Efficiency of the One Sample Kolmogorov-Smirnov Test for Normal Alternatives". Sankhyā: The Indian Journal of Statistics, Series A. 44 (2): 233–241. JSTOR 25050525.
  8. ^ "tmod: General and Multivariate Enrichment Analysis". CRAN. 31 March 2023.