AMnrGC : Amazon river basin non-redundant microbial gene catalogue
RELEASE 2018/01
--------------------------------------

1. INTRODUCTION
AMnrGC is a collection of genes and proteins built from openly available
Amazon river basin metagenomes from the sequencing projects SRP044326,
PRJEB25171 and SRP039390. Briefly, metagenomes were co-assembled in groups
defined by their geographical location with Megahit v.1.0, and genes were
predicted on the resulting contigs with Prodigal v.2.6.3. Gene sequences
were length filtered (> 150 bp) and clustered with CD-HIT-EST (version 4.6)
at 95% nucleotide identity and 90% overlap of the shorter gene. Theoretical
protein products were annotated against the most complete databases
available at the time of release, and their complete information
is available here.

2. LOCATION
AMnrGC versions will be available on the web only under the current ZENODO
repository: 10.5281/zenodo.1484504

3. FORMAT
Gene entries are named ">AM_AGSSY_XXX", where XXX represents a unique numerical
identifier. Genes were deposited in their coding frame, so any of them can be
translated into its protein sequence with transeq at frame 1 (ORF+1).
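
As an illustration of the frame 1 translation described above, the following
Python sketch translates a gene entry and builds a protein header carrying the
"_1" suffix. It is only a minimal stand-in for transeq, not the tool itself:
it assumes the standard genetic code, writes stop codons as "*" and
unrecognised codons as "X", and the function names and the short example
sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame1(dna):
    """Translate a nucleotide sequence in frame 1; trailing partial codons are dropped."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    # Codons containing ambiguous bases (e.g. N) are reported as "X".
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def gene_to_protein_record(header, dna):
    """Return (protein_header, protein_sequence) for one gene entry."""
    gene_id = header.lstrip(">").split()[0]
    return f">{gene_id}_1", translate_frame1(dna)


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real catalogue entry.
    protein_header, protein = gene_to_protein_record(">AM_AGSSY_151515", "ATGGCCTAA")
    print(protein_header)
    print(protein)
```

For real use, running transeq itself on the gene FASTA file is preferable,
since it handles line wrapping, ambiguity codes and alternative genetic codes.
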
The protein entries are the artificial translations of the genes that were
used for the annotations. They are also available and are named in the same
way, but with the suffix "_1" at the end of the header.

Example:

Gene:
>AM_AGSSY_151515

Protein:
>AM_AGSSY_151515_1

Annotations are provided as separate tables, one for each database used to
annotate the sequences. The header of each table indicates the meaning of
each value.

4. FUTURE FORMAT CHANGES
No major changes are expected to the general format of the database.
New versions may include updated annotations or additional sequences,
numbered as subsequent entries.

5. ACKNOWLEDGEMENTS
This work is a joint effort of the Laboratory of Molecular Biology of the Federal
University of São Carlos, São Paulo, Brazil (LBM/UFSCAR) and the Protists group
of the Institut de Ciències del Mar, Barcelona, Spain (ICM). We are grateful to the
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), as well as the Spanish funding body Conse...