Models of nucleotide substitution have enabled accurate reconstruction of many evolutionary relationships. The majority of commonly used models are special cases of the General Time Reversible (GTR) model with rate variation across sites described by a Γ-distribution (+Γ) and by a proportion of invariant sites (+I). Unfortunately, models in the GTR+Γ+I family rarely describe all the variation present in real sequence data. Many cases of complex phenomena, such as changes in GC content and heterotachy, have been identified, and several studies have shown that analyses based on such simple models can lead to erroneous conclusions.
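To make the GTR family concrete, the following is a minimal sketch of how a GTR rate matrix can be assembled from symmetric exchangeabilities and equilibrium nucleotide frequencies, then scaled to one expected substitution per unit time. The function name `gtr_rate_matrix` and the parameter values are illustrative assumptions, not taken from the paper. The example exchangeabilities give an HKY-like special case in which transitions occur at twice the rate of transversions.

```python
NUCS = "ACGT"

def gtr_rate_matrix(exch, freqs):
    """Build a normalised GTR rate matrix (illustrative helper, hypothetical name).

    exch  : dict mapping unordered nucleotide pairs such as "AG" to a
            symmetric exchangeability.
    freqs : dict mapping each nucleotide to its equilibrium frequency.
    """
    n = len(NUCS)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i != j:
                key = a + b if a + b in exch else b + a
                # Rate of a -> b is exchangeability times target frequency.
                q[i][j] = exch[key] * freqs[b]
        # Diagonal makes each row sum to zero (q[i][i] is still 0.0 here).
        q[i][i] = -sum(q[i])
    # Scale so that the expected substitution rate at equilibrium is 1.
    mean_rate = -sum(freqs[a] * q[i][i] for i, a in enumerate(NUCS))
    return [[x / mean_rate for x in row] for row in q]

# Assumed example values: transitions (A<->G, C<->T) twice as fast as transversions.
hky_exch = {"AC": 1.0, "AG": 2.0, "AT": 1.0, "CG": 1.0, "CT": 2.0, "GT": 1.0}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = gtr_rate_matrix(hky_exch, freqs)
```

Under +Γ and +I, each site would additionally be assigned a rate multiplier for this matrix, drawn from a Γ-distribution or fixed at zero for invariant sites. That step is omitted here.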
I introduce a simple, general hidden Markov model of nucleotide substitution that allows many evolutionarily important factors to vary both spatially between sites in the alignment and temporally through the tree. The new model may be considered a generalisation of the covarion model of Tuffley and Steel: it allows the rate of evolution, nucleotide frequencies, and the transition/transversion rate ratio to vary according to a hidden process. I describe the new model’s relationship to existing models and use it to investigate spatial and temporal variation in real data. The results of these analyses show that all of the factors examined have the potential to vary both temporally and spatially. Furthermore, the new model provides a substantially better fit to sequence data than GTR+Γ+I models.
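One way to picture a hidden process of this kind is as a Markov chain on an expanded state space of (hidden class, nucleotide) pairs. The sketch below is an assumption-laden illustration, not the paper's parameterisation: each hidden class has its own rate, nucleotide frequencies, and transition/transversion ratio, and a site switches class at a single rate while keeping its nucleotide. The names `hky_matrix`, `hidden_state_matrix`, and `switch_rate`, and all numeric values, are hypothetical.

```python
NUCS = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def hky_matrix(kappa, freqs, rate):
    """HKY-style 4x4 rate matrix for one hidden class, scaled to mean rate `rate`."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i != j:
                weight = kappa if (a, b) in TRANSITIONS else 1.0
                q[i][j] = weight * freqs[b]
        q[i][i] = -sum(q[i])
    mean = -sum(freqs[a] * q[i][i] for i, a in enumerate(NUCS))
    scale = rate / mean
    return [[x * scale for x in row] for row in q]

def hidden_state_matrix(classes, switch_rate):
    """Generator on (class, nucleotide) states.

    classes     : list of (kappa, freqs, rate) tuples, one per hidden class.
    switch_rate : assumed common rate of moving between any two classes.
    """
    k = len(classes)
    size = 4 * k
    big = [[0.0] * size for _ in range(size)]
    # Within-class substitutions use that class's own matrix.
    for c, (kappa, freqs, rate) in enumerate(classes):
        q = hky_matrix(kappa, freqs, rate)
        for i in range(4):
            for j in range(4):
                if i != j:
                    big[4 * c + i][4 * c + j] = q[i][j]
    # Class switches keep the nucleotide fixed.
    for c in range(k):
        for d in range(k):
            if c != d:
                for i in range(4):
                    big[4 * c + i][4 * d + i] = switch_rate
    # Diagonal entries (still 0.0) are set so that every row sums to zero.
    for r in range(size):
        big[r][r] = -sum(big[r])
    return big

uniform = {n: 0.25 for n in NUCS}
gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}
# Two assumed classes: a fast, unbiased class and a slow, GC-rich class.
classes = [(2.0, uniform, 1.0), (4.0, gc_rich, 0.2)]
Q_hidden = hidden_state_matrix(classes, switch_rate=0.1)
```

In this sketch, setting one class's rate to zero while keeping the other parameters equal gives an on/off process in the spirit of the covarion model. Letting the classes differ in frequencies or transition/transversion ratio is what lets those factors vary along the tree.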