Order of the Fluctuation for the LCS and Optimal Alignment Score of Binary Sequences When the Rescaled Expectation Is Not Constant

S. Amsalu, H. Matzinger

2018, v.24, №1, 5-24


We consider two independent binary random i.i.d.\
strings $X=X_1X_2\ldots X_n$ and
$Y=Y_1Y_2Y_3\ldots Y_n$ both of length $n$.
Let $p$ denote the probability of the symbol $1$ in the strings:
$p=P(X_1=1)=P(Y_1=1)$. Assume that for two
values $p_1,p_2\in(0,1)$ the limit of the
rescaled optimal alignment score is not the same.
For that situation we show that the order of the variance of the
optimal alignment score of $X$ and $Y$ is $\Omega(n)$
for all $p$ in a set of strictly positive measure. This disproves the
conjecture of Chvatal\tire Sankoff that the variance
of the LCS should be $o(n^{2/3})$. If we assume
that a unique order $\Theta(n^\alpha)$ exists for the variance of
the optimal alignment score,
then our result implies that the order is $\Theta(n)$.
For Longest Common Subsequences (LCS) of random strings
the same theorem holds.

Keywords: random strings, sequence alignment, large deviations


Please log in or register to leave a comment

There are no comments yet