Databases come in two flavours: reference only, or all genomes.
Typically, the reference-only database is sufficient for the main use case: assigning new samples to PopPUNK clusters and updating the database with any new clusters found. Reference-only databases are usually significantly smaller.
For more detailed analyses, you may wish to download the all-genomes database. Running either poppunk-visualise or any subclustering within strains requires the full database.
In either case, only the reference genomes are used for query assignment. This does not change the results, but it gives a good speed-up in runtime.
See the distributing models doc page for more details.
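As a rough illustration of the query assignment use case described above, the sketch below prepares a tab-separated query list (sample name, then path to its assembly) and builds a command line for it. It assumes the `poppunk_assign` entry point with its `--db`, `--query`, `--output` and `--threads` options. The helper names, the directory names and the `.fa` suffix are hypothetical and chosen only for this example. The command is built but not run.

```python
from pathlib import Path


def write_query_file(assembly_dir, query_path, suffix=".fa"):
    """Write a tab-separated query list: sample name, then path to its assembly.

    Returns the number of assemblies listed. The ".fa" suffix is an assumption;
    change it to match how your files are named.
    """
    assemblies = sorted(Path(assembly_dir).glob(f"*{suffix}"))
    with open(query_path, "w") as handle:
        for assembly in assemblies:
            # The file stem (name without suffix) is used as the sample name.
            handle.write(f"{assembly.stem}\t{assembly}\n")
    return len(assemblies)


def assign_command(db_dir, query_path, output_dir, threads=1):
    """Build (but do not run) a poppunk_assign command as an argument list."""
    return [
        "poppunk_assign",
        "--db", str(db_dir),          # unpacked database directory (hypothetical name)
        "--query", str(query_path),   # query list written by write_query_file
        "--output", str(output_dir),  # where assignment results are written
        "--threads", str(threads),
    ]
```

With PopPUNK installed and a database downloaded and unpacked, the returned list could be passed to `subprocess.run(cmd, check=True)`.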
From the Global Pneumococcal Surveillance project, and other sequence collections. Used to assign global pneumococcal sequence clusters (GPSCs). References only | All genomes
Streptococcus pyogenes (group A Streptococcus)
From Davies et al. References only | All genomes
From Horesh et al. References only | All genomes
Contributed by Akuzike Kalizang'oma, based on publicly available data and carriage data. All genomes
Species not listed?
We are looking to expand our pre-existing databases. If you have a dataset that would be suitable for creating a PopPUNK reference database for any bacterial species, or if you could help evaluate the quality of clusters for a species, please get in touch.
Found a novel GPSC cluster?
If one of your query isolates was assigned to a new cluster using the GPSC database above, contact us at email@example.com and we'll add it to the database.