Additional file 2: Table S1. Metadata of 1,198 E. coli isolates. Table S2. Genes and genetic variants associated with different hosts that were significant in both the k-mer-based and the pan-genome-based GWAS. Table S3. Prevalence of ompP, arlC and ompT in different sequence types (STs) in the RefSeq collection (n=17,994) and in our collection of E. coli isolates (n=1,198). Table S4. Prevalence of the iroBCDEN gene cluster in sequence types and associated hosts. Details are shown for sequence types (STs) with at least ten isolates harboring iroBCDEN. Table S5. Genomes from our dataset used for annotating host-associated k-mers.