In contrast to the situation not even 20 years ago, molecular sequence data are now plentiful (if still patchily distributed), and phylogenomic studies of hundreds of taxa on a broad taxonomic scale are becoming increasingly common. Whereas the accuracy of phylogenetic analysis was limited until recently by a shortage of data (for both taxa and characters), large and comprehensive phylogenomic studies in which data are not limiting have problems of their own. Analyses including large numbers of taxa run up against the superexponential increase in the number of possible solutions, requiring any or all of more time, faster computers in conjunction with parallel processing, and cleverer heuristics to find a solution that is, one hopes, near-optimal. Perhaps less appreciated, however, is that the increasing taxonomic scope of our analyses demands the use of large amounts of molecular sequence data with significant rate heterogeneity across the data set (whether between or within partitions) to achieve full resolution throughout the tree. In this talk, I examine how the performance of phylogenetic analysis is affected when analyzing a large number of taxa or a large multigene data set that incorporates the degree of rate heterogeneity that is to be found, if not needed, in typical phylogenomic data sets.
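To give a sense of the scale behind the "superexponential increase" mentioned above, the following is a minimal sketch, assuming that "possible solutions" means unrooted, fully bifurcating tree topologies on n labeled taxa, of which there are (2n - 5)!! for n >= 3. The function name `num_unrooted_trees` is hypothetical and is not taken from the talk.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating topologies for n labeled taxa.

    Assumption: uses the standard double-factorial count (2n - 5)!!,
    i.e. the product 3 * 5 * ... * (2n - 5), valid for n >= 3.
    """
    if n_taxa < 3:
        # With fewer than three taxa there is only one possible topology.
        return 1
    count = 1
    # Multiply the odd numbers from 3 up to and including 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Show how quickly the search space grows with the number of taxa.
    for n in (4, 5, 10, 20, 50, 100):
        trees = num_unrooted_trees(n)
        print(f"{n:>4} taxa: about 10^{len(str(trees)) - 1} topologies")
```

Even at 10 taxa the count already exceeds two million, which is why exhaustive search gives way to heuristics long before data sets reach hundreds of taxa.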