This is a subset of 105 demultiplexed Illumina MiSeq raw sequencing samples run at the James Hutton Institute in March 2022, consisting of two 96-well plates labelled with the Illumina A and D multiplexing kits, with 96 and 9 samples respectively.
There are 36 pairs of raw gzip-compressed FASTQ files (72 files), provided as a 336 MB gzip-compressed tarball.
All 6 controls and 30 samples of interest are from the A multiplexing kit.
Each filename starts with the sample name, followed by the plate well, the MiSeq sample number and the read direction, for example CZ-N001-1910-S1Z1A-AD02_S38_L001_R1_001.fastq.gz. Here CZ-N001-1910-S1Z1A is the sample name, AD02 indicates well D02 on the 96-well plate labelled with the A multiplexing set, S38 is the MiSeq sample number (from 1 to 192), and R1 (or R2) indicates the Illumina forward (or reverse) paired read file.
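The naming convention above can be unpacked with a regular expression. The following is a minimal Python sketch, not part of the dataset or of any pipeline: the names MiSeqName, FILENAME_PATTERN and parse_filename are hypothetical, the kit letters are assumed to be A to D, the wells are assumed to run from A01 to H12, and L001 is treated as the lane field of the usual Illumina naming scheme.

```python
import re
from typing import NamedTuple


class MiSeqName(NamedTuple):
    """Fields recovered from one FASTQ filename (hypothetical helper type)."""

    sample: str         # e.g. "CZ-N001-1910-S1Z1A"
    kit: str            # multiplexing kit letter, e.g. "A"
    well: str           # position on the 96-well plate, e.g. "D02"
    sample_number: int  # MiSeq sample number, e.g. 38
    lane: str           # lane field, e.g. "L001"
    read: str           # "R1" (forward) or "R2" (reverse)


# Assumptions: kit letters A-D, well rows A-H, well columns 01-12.
FILENAME_PATTERN = re.compile(
    r"^(?P<sample>.+)-(?P<kit>[A-D])(?P<well>[A-H](?:0[1-9]|1[0-2]))"
    r"_S(?P<number>\d+)_(?P<lane>L\d{3})_(?P<read>R[12])_001\.fastq\.gz$"
)


def parse_filename(filename: str) -> MiSeqName:
    """Split a filename following the convention above into its fields."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename format: {filename}")
    return MiSeqName(
        sample=match["sample"],
        kit=match["kit"],
        well=match["well"],
        sample_number=int(match["number"]),
        lane=match["lane"],
        read=match["read"],
    )


if __name__ == "__main__":
    print(parse_filename("CZ-N001-1910-S1Z1A-AD02_S38_L001_R1_001.fastq.gz"))
```

Because the sample name itself may contain hyphens, the pattern anchors on the final hyphen followed by the kit letter and well, rather than splitting on the first hyphen.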
The 6 controls have filenames starting GL3A-0x, shorthand for an undiluted synthetic sequence mix. Control GL3A-0x-AB06_S18_L001 had the highest level of non-synthetic sequence, with 422 copies of a biological sequence. This count of 422 was therefore used by the THAPBI PICT pipeline as the minimum abundance threshold.