Species databases
Databases come in two flavours: references only, or all genomes.
Typically, the references-only database is sufficient for the main use case: assigning new samples to PopPUNK clusters, and updating the database with any new clusters that are found. The references-only databases are usually significantly smaller.
For more detailed analyses, you may wish to download the all-genomes database. Running poppunk_visualise, or any subclustering within strains, requires the full database.
In either case, only the reference genomes are actually used for query assignment. This does not change the results, but it noticeably reduces program runtime.
See the distributing models doc page for more details.
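As a rough illustration of the assignment use case described above, the sketch below builds a poppunk_assign command line for a downloaded database without running it. The directory name GPS_refs, the query list qfile.txt and the output name assigned are hypothetical placeholders, and the helper function is not part of PopPUNK. The flags shown (--db, --query, --output, --threads, --update-db) are assumed from the poppunk_assign interface; check poppunk_assign --help for your installed version.

```python
import shlex


def build_assign_command(db_dir, query_list, output_dir, threads=1, update_db=False):
    """Build (but do not run) a poppunk_assign command line.

    db_dir:     directory of the unpacked database (hypothetical name below)
    query_list: file listing the query samples and their sequence files
    output_dir: where the assignment results should be written
    """
    cmd = [
        "poppunk_assign",
        "--db", db_dir,
        "--query", query_list,
        "--output", output_dir,
        "--threads", str(threads),
    ]
    if update_db:
        # Assumed flag: asks PopPUNK to add the queries to the database.
        cmd.append("--update-db")
    return cmd


# Hypothetical paths; replace with your own download location and files.
cmd = build_assign_command("GPS_refs", "qfile.txt", "assigned", threads=4)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list passed to subprocess.run once PopPUNK is installed. Building the command as a list keeps paths with spaces intact.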
Database list:
Streptococcus pneumoniae
42,157 genomes
From the Global Pneumococcal Surveillance project, and other sequence collections. Used to assign global pneumococcal sequence clusters (GPSCs).
Available as: references only, all genomes
Streptococcus pyogenes (group A Streptococcus)
2,084 genomes
From Davies et al.
Available as: references only, all genomes
Streptococcus mitis
323 genomes
Contributed by Akuzike Kalizang'oma, based on publicly available data and carriage data.
Available as: all genomes
Species not listed?
We are looking to expand our set of databases. If you have a dataset that would be suitable for creating a PopPUNK reference database for any bacterial species, or if you could help evaluate the quality of clusters for a species, please get in touch.
Found a novel GPSC?
If one of your query isolates was assigned to a new cluster using the GPSC database above, contact us at pneumo@poppunk.net and we'll add it to the database.