Generalized Hypergeometric Distributions Generated by Birth-Death Process in Bioinformatics
2022, v.28, Issue 2, 303-327
V. A. Kuznetsov, D. Farbod, A. Grageda
Modern high-throughput detection methods for biological systems generate empirical frequency distributions (EFDs) that exhibit complex forms and long right-side tails. Such EFDs are often observed in normal and pathological processes, whose probabilistic properties are essential, but the underlying probability mechanisms are poorly understood. To better understand the probability mechanisms driving biological complexity and the pathological role of extreme values, we propose that the observed skewed discrete distributions are generated by non-linear transition rates of birth and death processes (BDPs). We introduce a (3d+1)-parameter Generalized Gaussian Hypergeometric Probability ((3d+1)-GHP) model with the probabilities defined by a stationary solution of a generalized BDP (g-BDP) and represented by generalized hypergeometric series with regularly varying function properties. For the Regularly Varying 3d-Parameter Generalized Gaussian Hypergeometric Probability (3d-RGHP) function, we study its regular variation properties, asymptotically constant slowly varying component, unimodality, and upward/downward convexity, which allows us to specify a family of 3d-RGHP models and study their analytical and numerical characteristics. The frequency distribution of unique mutations occurring in the human genome of patients with melanoma has been analyzed as an example application of our theory in bioinformatics. The results show that the parameterized model fits not only the 'heavy tail' well, but also the entire EFD taken on the complete experimental outcome space. Our model provides a rigorous and flexible mathematical framework for the analysis and application of skewed distributions generated by BDPs, which often occur in bioinformatics and big data science.
Keywords: Birth-Death Process, generalized hypergeometric distributions, stationary process, limiting process, regular variation, bioinformatics data