Family Trees: Languages and Genetics
2009, v.15, Issue 3, 417-440
We consider a large population which evolves according to neutral haploid reproduction. The genealogical tree is very complex, and genealogical distances are distributed according to a probability density which remains random in the limit of a large population. This density, which varies between populations and, for the same population, over time, has a distribution that we determine. The evolution of languages closely resembles the evolution of haploid organisms or mtDNA. This similarity allows for the construction of language trees. The key point is the definition of a distance between pairs of languages. Here we use a renormalized Levenshtein distance between words with the same meaning, and we average over all the words contained in a list. Assuming a constant rate of mutation, these lexical distances are logarithmically proportional, on average, to genealogical distances. The relation between lexical and genealogical distances is then further investigated in order to take into account the intrinsic randomness associated with lexical evolution. We test our method by constructing the trees of the Indo-European and Austronesian groups.
Keywords: random processes, fluctuation phenomena, dynamics of social systems, dynamics of evolution, networks and genealogical trees
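
The lexical distance described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the Levenshtein distance is renormalized; the sketch assumes division by the length of the longer word, so that each word pair contributes a value between 0 and 1. The function names `levenshtein`, `normalized_distance`, and `lexical_distance` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def lexical_distance(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Average normalized distance over two word lists aligned by meaning."""
    if len(words_a) != len(words_b) or len(words_a) == 0:
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)
```

Calling `lexical_distance` on two lists in which position `i` holds the word for the same meaning in each language returns a single number per language pair. A matrix of such numbers over all pairs is the kind of input a tree-building method would take.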