The evolution of DNA sequences has long been modeled as the evolution of independently mutating nucleotides: the mutation of any single nucleotide is assumed to be independent of its neighbors, and each nucleotide forms a Markov chain in time. As early as 1976-1977, biochemical studies showed that the set of dinucleotide frequencies (AA, AC, …, TT) remains similar across different parts of a genome and that the corresponding frequencies of related organisms are also similar, whereas the set differs markedly between distantly related organisms. Later statistical studies confirmed this. It has also been shown that neighboring nucleotides in DNA sequences can influence both the type and the intensity of mutation. Although this influence has been understood for a long time, so-called context-dependent models appeared only recently. They are usually constructed so that the entire nucleotide sequence is a single state of the chain, the stationary distribution of the sequence is Markov, and the dependence between nucleotides decays exponentially in time and space. Recent statistical studies show that most nucleotide sequences, even non-coding ones, are not Markov chains (at least not first-order ones), which gives serious grounds for doubting the known models of nucleotide sequences, the assumptions under which they exist, and the adequacy of their application. This means, for example, that comparing real sequences with sequences generated by such models is not a reliable tool for detecting a biological signal or the biological function of a specific sequence of nucleotides or amino acids. This, in turn, points to a clearer direction for future research.
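As a rough illustration of the dinucleotide statistics mentioned above, the following sketch compares observed dinucleotide frequencies with the frequencies expected if neighboring nucleotides were independent. It is not part of the project itself: the function name `dinucleotide_odds_ratios` is hypothetical, and the sketch assumes a sequence made up only of the letters A, C, G and T. Ratios far from 1 would suggest dependence between neighbors.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(seq):
    """Return observed/expected frequency ratios for all 16 dinucleotides.

    Hypothetical helper for illustration; assumes `seq` contains only
    A, C, G, T and has at least two characters.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)                                # single-nucleotide counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))   # overlapping pair counts

    ratios = {}
    for a, b in product("ACGT", repeat=2):
        observed = di[a + b] / (n - 1)
        # Frequency expected if adjacent nucleotides were independent.
        expected = (mono[a] / n) * (mono[b] / n)
        ratios[a + b] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    example = "ACGT" * 25  # toy periodic sequence, not real data
    for pair, ratio in sorted(dinucleotide_odds_ratios(example).items()):
        print(pair, round(ratio, 3))
```

In the toy sequence every nucleotide is fully determined by its predecessor, so the ratios are either 0 or well above 1; for real sequences a formal test would be needed before drawing conclusions.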
Period of project implementation: 2023-10-04 - 2024-04-30
Project coordinator: Kaunas University of Technology