This Dentinger_readme20220405.txt file was generated on 20220405 by Bryn T.M. Dentinger

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset

   Large Kraken2 database for Fungi

2. Author Information

   Principal Investigator Contact Information
      Name: Bryn T.M. Dentinger
      Institution: Natural History Museum of Utah & School of Biological Sciences, University of Utah
      Address: 301 Wakara Way, Salt Lake City, UT 84108
      Email: bdentinger@nhmu.utah.edu

   Associate or Co-investigator Contact Information
      Name:
      Institution:
      Address:
      Email:

   Alternate Contact Information
      Name:
      Institution:
      Address:
      Email:

3. Date of data collection (single date, range, approximate date)

   20200329

4. Geographic location of data collection (where was data collected?):

   n/a

5. Information about funding sources that supported the collection of the data:

   n/a

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data:

   CC BY NC (allows others to use and share your data non-commercially and with attribution)

2. Links to publications that cite or use the data:

   Weinstein, S.B., W.Z. Stephens, R. Greenhalgh, J.L. Round, M.D. Dearing. Wild herbivorous mammals (genus Neotoma) host a diverse but transient assemblage of fungi. Symbiosis. In Revision.

3. Links to other publicly accessible locations of the data:

   n/a

4. Links/relationships to ancillary data sets:

   n/a

5. Was data derived from another source?

   yes

   If yes, list source(s):
      The NCBI database is here: https://www.ncbi.nlm.nih.gov/genome/
      The JGI database is here: https://mycocosm.jgi.doe.gov/mycocosm/home

6. Recommended citation for the data:

   Dentinger, B.T.M. (2022). Large Kraken2 database for Fungi. The Hive: University of Utah Research Data Repository. https://doi.org/10.7278/S50d-154b-fppf

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A.
      Filename: bigDB.tar.gz
      Short description: kmer database formatted for Kraken2

   B.
      Filename: AddTaxID_list.py
      Short description: Python script that looks up NCBI TaxIDs from scientific names and uses them to modify the FASTA headers of JGI genomes so the genomes can be added to the Kraken2 database; courtesy of Heath O'Brien

   C.
      Filename:
      Short description:

2. Relationship between files:

   AddTaxID_list.py was used to modify FASTA headers in JGI genomes to make them compatible with Kraken2

3. Additional related data collected that was not included in the current data package:

   n/a

4. Are there multiple versions of the dataset?

   no

   If yes, list versions:
      Name of file that was updated:
         i. Why was the file updated?
         ii. When was the file updated?
      Name of file that was updated:
         i. Why was the file updated?
         ii. When was the file updated?

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

   The database was built using the following commands:

      kraken2-build --download-taxonomy --db bigDB && \
      kraken2-build --download-library archaea --db bigDB && \
      kraken2-build --download-library bacteria --db bigDB && \
      kraken2-build --download-library viral --db bigDB && \
      kraken2-build --download-library plasmid --db bigDB && \
      kraken2-build --download-library human --db bigDB && \
      kraken2-build --download-library fungi --db bigDB && \
      kraken2-build --download-library plant --db bigDB && \
      kraken2-build --download-library protozoa --db bigDB && \
      kraken2-build --download-library UniVec_Core --db bigDB && \
      kraken2-build --download-library nt --db bigDB

      # Add the JGI fungal genomes, with FASTA headers modified to include NCBI TaxIDs.
      # NPROC is the number of parallel processes; the value originally passed to -P was not recorded.
      # The original command also passed -n1, which conflicts with -I{} in GNU xargs and is omitted here.
      find jgi_genomes -name '*.fasta' -print0 | xargs -0 -P "$NPROC" -I{} kraken2-build --add-to-library {} --db bigDB

2. Methods for processing the data:

3. Instrument- or software-specific information needed to interpret the data:

   Wood, D.E., Lu, J. & Langmead, B.
   Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0

   The tar.gz file is a database formatted for use with the software Kraken2 (https://ccb.jhu.edu/software/kraken2/). The citation refers to the Kraken2 software.

4. Standards and calibration information, if appropriate:

5. Environmental/experimental conditions:

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission:

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: [FILENAME]
-----------------------------------------

1. Number of variables:

2. Number of cases/rows:

3. Variable List

   A. Name: [variable name]
      Description: [description of the variable]
         Value labels if appropriate

   B. Name: [variable name]
      Description: [description of the variable]
         Value labels if appropriate

4. Missing data codes:

   Code/symbol    Definition
   Code/symbol    Definition

5. Specialized formats or other abbreviations used
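
----------------------------------------
ILLUSTRATIVE EXAMPLE: FASTA HEADER FORMAT
----------------------------------------

The file overview states that AddTaxID_list.py modifies FASTA headers in JGI genomes so that Kraken2 can assign each sequence to an NCBI TaxID. The sketch below is NOT that script. It is a minimal illustration of the header convention Kraken2 accepts for sequences added with --add-to-library, in which the sequence ID carries a "kraken:taxid|<TaxID>" tag. The function names, the example header, and the TaxID 12345 are made up for illustration, and the sketch assumes the TaxID is already known rather than looking it up from a scientific name.

```python
def add_taxid_to_header(header, taxid):
    """Return a FASTA header whose sequence ID carries a kraken:taxid tag.

    Hypothetical helper for illustration; not taken from AddTaxID_list.py.
    """
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    parts = header[1:].split(None, 1)
    if not parts:
        raise ValueError("FASTA header has no sequence ID")
    seq_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Leave headers that are already tagged untouched.
    if "kraken:taxid|" in seq_id:
        return header
    tagged = ">%s|kraken:taxid|%s" % (seq_id, taxid)
    # Keep any free-text description after the sequence ID.
    return tagged + (" " + rest if rest else "")


def tag_fasta_lines(lines, taxid):
    """Yield FASTA lines with every header tagged; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield add_taxid_to_header(line, taxid)
        else:
            yield line
```

For example, passing the lines ">scaffold_1 example description" and "ACGT" with TaxID 12345 to tag_fasta_lines would turn the header into ">scaffold_1|kraken:taxid|12345 example description" and leave the sequence line unchanged. In practice, each genome would need the TaxID of its own species.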