Order of the Fluctuation for the LCS and Optimal Alignment Score of Binary Sequences When the Rescaled Expectation Is Not Constant

#### S. Amsalu, H. Matzinger

2018, v.24, №1, 5-24

ABSTRACT

We consider two independent binary random i.i.d.\

strings $X=X_1X_2\ldots X_n$ and

$Y=Y_1Y_2Y_3\ldots Y_n$ both of length $n$.

Let $p$ denote the probability of the symbol $1$ in the strings:

$p=P(X_1=1)=P(Y_1=1)$. Assume that for two

values $p_1,p_2\in(0,1)$ the limit of the

rescaled optimal alignment score is not the same.

For that situation we show that the order of the variance of the

optimal alignment score of $X$ and $Y$ is $\Omega(n)$

for all $p$ in a set of strictly positive measure. This disproves the

conjecture of Chvatal\tire Sankoff that the variance

of the LCS should be $o(n^{2/3})$. If we assume

that a unique order $\Theta(n^\alpha)$ exists for the variance of

the optimal alignment score,

then our result implies that the order is $\Theta(n)$.

For Longest Common Subsequences (LCS) of random strings

the same theorem holds.

Keywords: random strings, sequence alignment, large deviations

COMMENTS

Please log in or register to leave a comment