It is becoming easier to obtain genetic sequences, and many of these sequences will be used for phylogenetic analyses. Such analyses, however, have traditionally been hampered by the size of the data (usually, the number of sequences) that can be analysed satisfactorily. In this talk, I consider three problems that have arisen as a consequence of having "too much" data. First, supertree reconstruction may provide some relief with large datasets, but what is the best way of reconstructing supertrees? I explore the possibility of constructing maximum-likelihood estimates of supertrees, and discuss some of the features that such trees may possess. Second, I consider how we can break up large datasets into manageable pieces while still making satisfactory inferences. Finally, I look at the new generation of automatic sequencing machines and the promise of large amounts of data, and outline some computational approaches that may be useful in analysing data of this type. This talk provides very little in the way of concrete results but a great deal of informed speculation. It is not my intention to instruct, but to stimulate discussion.