Cladistics Early on line: An update on DNA barcoding: low species coverage and numerous unidentified sequences
DNA barcoding was proposed in 2003, the Consortium for the Barcode of Life was established in 2004, and the movement has since attracted more than $80 million in funding. Here we investigate how many species of multicellular animals have been barcoded. We compare the numbers in a public database (GenBank as of January 2012) with those in the Barcode of Life Database (BOLD) and find that GenBank contains COI (cytochrome c oxidase subunit 1) sequences for ca. 60 000 species, while BOLD reports barcodes for ca. 150 000 species. The discrepancy is likely due to a large amount of unpublished data in BOLD. Overall, the species coverage remains sparse, growth rates are low, and the barcode accumulation curve for Metazoa is linear, with only 4788 species having been added in 2011. In addition, the vast majority of species in the public database (73%) were barcoded by projects that are unlikely to be related to the DNA barcoding movement. Particularly surprising was the large number of DNA barcodes in GenBank that were not identified to species (74% as of January 2012), with insect barcodes often being identified only to order. Of these, several hundred thousand have since been suppressed by NCBI because they did not satisfy the iBOL/GenBank early release agreement. Species coverage is considerably better for target taxa of DNA barcoding campaigns (e.g. birds, fishes, Lepidoptera), although it also falls short of published campaign targets.
© The Willi Hennig Society 2012