The ingroup sample included representatives of all seven subfamilies and all ten genera incertae sedis recognized by Harley et al.16 and all 14 tribes recognized by Olmstead18. Nomenclature of Lamiaceae and Viticoideae s. str. followed Olmstead18 and Bramley et al.47, respectively. Initially, we downloaded data for all taxa of Lamiaceae with sequence information for any of the five gene regions deposited in GenBank as of August 2015. In the five subfamilies whose monophyly is well supported (viz., Ajugoideae, Lamioideae, Nepetoideae, Prostantheroideae, and Scutellarioideae), sampling was designed to cover genus-level diversity. Generally, genera with at least two sequenced regions were selected, and each selected genus was represented by one or two species. Particular emphasis was placed on sampling Symphorematoideae, Viticoideae s. str., all genera incertae sedis, and three genera formerly assigned to Viticoideae (Cornutia, Gmelina, and Premna). In three large genera (Callicarpa, Premna, and Vitex), sampling was designed to cover their morphological and geographic breadth. In total, 288 species from 191 genera were included, covering approximately 78% of the genera of Lamiaceae. Five outgroup species were selected to represent the closest relatives of Lamiaceae in Lamiales12,13,14,15: Lindenbergia philippensis (Cham. & Schltdl.) Benth. and Pedicularis groenlandica Retz. from Orobanchaceae, Paulownia tomentosa (Thunb.) Steud. from Paulowniaceae, Mazus reptans N. E. Br. from Mazaceae, and Phryma leptostachya L. from Phrymaceae. Information on sampled taxa and GenBank accession numbers is assembled in
The five single-marker datasets (matK, ndhF, rbcL, rps16, and trnL-F) contained 202, 160, 170, 181, and 259 sequences, of which 54, 83, 59, 57, and 88, respectively, were newly reported. The dataset combining the five markers included 270 taxa (D270), with 39.65% missing data. According to Wiens113 and Wiens and Moen114, the proportion of missing data should not affect the accuracy of the phylogenetic analysis; nevertheless, as a check, a reduced dataset of 155 taxa (D155) was assembled, in which each terminal taxon had at least three of the five regions or 50% of the total aligned sequence length available. The total amount of missing data in D155 was 23.51%. For most species in the combined datasets, data were available for all five regions, but in some genera of Ajugoideae, Lamioideae, Nepetoideae, Prostantheroideae, and Scutellarioideae, different species were used for different gene regions. When data were pooled in this way, generic names, rather than species names, were used to represent the combined sequences in the phylogenetic trees.
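The inclusion rule for the reduced dataset can be illustrated with a minimal Python sketch. It is not the procedure used to build D155: the region lengths, taxon names, and function names below are hypothetical, and missing data are counted only at the level of whole absent regions (a taxon lacking a region contributes that region's full aligned length as missing), which is a simplifying assumption.

```python
# Hypothetical aligned lengths (bp) for the five regions; not the real values.
REGION_LENGTHS = {"matK": 1200, "ndhF": 2100, "rbcL": 1400, "rps16": 900, "trnL-F": 1000}


def coverage(present, lengths=REGION_LENGTHS):
    """Fraction of the total aligned length covered by the regions present."""
    return sum(lengths[r] for r in present) / sum(lengths.values())


def keep_taxon(present, lengths=REGION_LENGTHS, min_regions=3, min_coverage=0.5):
    """Keep a taxon with >= 3 of the 5 regions OR >= 50% of the aligned length."""
    return len(present) >= min_regions or coverage(present, lengths) >= min_coverage


def reduce_dataset(matrix, lengths=REGION_LENGTHS):
    """Return only the taxa that satisfy the inclusion rule."""
    return {t: regs for t, regs in matrix.items() if keep_taxon(regs, lengths)}


def missing_fraction(matrix, lengths=REGION_LENGTHS):
    """Share of the taxon-by-site matrix left empty by absent regions."""
    total_cells = sum(lengths.values()) * len(matrix)
    covered = sum(sum(lengths[r] for r in regs) for regs in matrix.values())
    return 1 - covered / total_cells


# Toy presence/absence matrix: taxon -> set of sequenced regions (all invented).
toy = {
    "Taxon_A": {"matK", "ndhF", "rbcL", "rps16", "trnL-F"},  # all five regions
    "Taxon_B": {"matK", "ndhF"},                # two regions, but 50% of length
    "Taxon_C": {"rps16", "trnL-F"},             # two short regions: excluded
    "Taxon_D": {"matK", "rps16", "trnL-F"},     # three regions, under 50% of length
}

reduced = reduce_dataset(toy)
print(sorted(reduced))
print(f"missing: full = {missing_fraction(toy):.2%}, reduced = {missing_fraction(reduced):.2%}")
```

Because the two criteria are joined by "or", a taxon with only two long regions can be retained while a taxon with two short regions is dropped; removing sparsely sampled taxa lowers the overall proportion of missing data, as in the step from D270 to D155.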