This project aims to begin work on the nf-core/proteinannotator pipeline.

Vision

Build the best protein annotator in the world.

Protein fasta -> ??? -> Profit!

  • the ??? is nf-core/proteinannotator
  • We want to build the pipeline of choice for people who sequence the genomes of new creatures and need to annotate protein FASTA files with function
  • Future options include using synteny of genes, but that is beyond the 1.0.0 release
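
To make the intended input and output concrete, here is a minimal Python sketch of what "annotating a protein FASTA file with function" could look like at the data level. It is an illustration only, not pipeline code: the pipeline itself would be written in Nextflow, and the names `read_fasta`, `annotate`, and the `hits` dictionary are hypothetical. In practice the function labels would come from the annotation tools the pipeline runs, not from a hand-written lookup.

```python
from typing import Dict, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            identifier = tokens[0] if tokens else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def annotate(text: str, hits: Dict[str, str]) -> List[Dict[str, object]]:
    """Attach a function label to every protein (hypothetical helper).

    `hits` stands in for the output of real annotation tools: it maps a
    protein ID to a function description. Proteins without a hit are
    labelled "unannotated".
    """
    rows = []
    for identifier, sequence in read_fasta(text):
        rows.append(
            {
                "id": identifier,
                "length": len(sequence),
                "function": hits.get(identifier, "unannotated"),
            }
        )
    return rows


if __name__ == "__main__":
    example = ">p1 sample protein\nMKT\nAYI\n>p2\nMSS\n"
    for row in annotate(example, {"p1": "hypothetical kinase"}):
        print(row)
```

The sketch keeps one row per input protein, including those with no hit, so that unannotated sequences stay visible in the output instead of being silently dropped.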

BEFORE WRITING ANY CODE, we will first draw the metro map for the pipeline.

Similar pipelines

Below are pipelines that also process protein FASTA files and add either functional or structural information to them, but don’t have exactly the same purpose as proteinannotator. We will likely reuse their modules.

  • funcscan to search (meta)genomic nucleotide data for functional protein sequences, e.g. for biosynthetic gene clusters, antimicrobial peptide genes, and antimicrobial resistance genes
  • reportho to compare ortholog predictions across methods
  • proteinfamilies to cluster protein sequences into families and update existing families with new sequences
  • proteinfold to fold protein sequences with tools such as ESMFold and AlphaFold2

Annotation Tools to Include

Please contribute more tools! This is just a starting point.

We welcome contributors of all experience levels.

nf-core proteinannotator hack
category
pipelines
group leader