SeqFindR

A tool to easily create informative genomic feature plots


This is an early release of SeqFindR, and the tool is still under rapid development. We have only tested SeqFindR on Linux systems.


Requirements

You'll need the following installed:

We also use the following Python libraries:

These should be installed automatically if you follow the instructions below.


Installation

You'll need to have git installed. Git can be really useful for scientists; see here for some discussion.

Option 1 (with root/admin):

cd ~/
git clone https://github.com/mscook/SeqFindR.git
cd SeqFindR
sudo python setup.py install

Option 2 (standard user), replacing INSTALL/HERE with an appropriate directory and python2.7 with the version of Python you install with:

cd ~/
git clone https://github.com/mscook/SeqFindR.git
cd SeqFindR
echo 'export PYTHONPATH=$PYTHONPATH:~/INSTALL/HERE/lib/python2.7/site-packages' >> ~/.bashrc
echo 'export PATH=$PATH:~/INSTALL/HERE/bin' >> ~/.bashrc
source ~/.bashrc
python setup.py install --prefix=~/INSTALL/HERE
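To find the exact site-packages directory for the interpreter that will run setup.py, a minimal sketch (Python 3, standard library only) can ask sysconfig where pure-Python packages go for a given prefix under the standard Unix posix_prefix layout. The helper name site_packages_for_prefix is hypothetical, and Debian-patched interpreters may install elsewhere, so treat the result as a guide rather than a guarantee.

```python
import os
import sysconfig


def site_packages_for_prefix(prefix: str) -> str:
    """Return the pure-Python library directory for `prefix`.

    Assumes the standard 'posix_prefix' install scheme; some
    distribution-patched Pythons use a different layout.
    """
    return sysconfig.get_path(
        "purelib",
        scheme="posix_prefix",
        vars={"base": prefix, "platbase": prefix},
    )


if __name__ == "__main__":
    # Hypothetical prefix: substitute your own INSTALL/HERE directory.
    prefix = os.path.expanduser("~/INSTALL/HERE")
    print(site_packages_for_prefix(prefix))
```

Run it with the same python you use for setup.py and put the printed path in the PYTHONPATH line of ~/.bashrc.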

If the install went correctly:

user@host:~/> which SeqFindR
/INSTALL/HERE/bin/SeqFindR
user@host:~/> SeqFindR -h

Please check back regularly, or run git pull followed by python setup.py install, to make sure you're running the most recent version of SeqFindR.


Example figure produced by SeqFindR

SeqFindR image of CU fimbriae genes across 110 E. coli strains. Strains are ordered according to a phylogenetic analysis.

CU fimbriae


SeqFindR database files

The SeqFindR database is a multi-FASTA file. Each header must contain 4 comma-separated elements.

The elements are:

The final element also carries a classification, enclosed in square brackets ([]).

An example:

>70-tem8674, bla-TEM, Beta-lactams Antibiotic resistance (ampicillin), Unknown sp. [Beta-lactams]
AAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATAC
>70-shv86, bla-SHV, Beta-lactams Antibiotic resistance (ampicillin), Unknown sp. [Beta-lactams]
CTCAAGCGGCTGCGGGCTGGCGTGTACCGCCAGCGGCAGGGTGGCTAACAGGGAGATAATACACAGGCGA
>70-oxa(1)256, bla-OXA-1, Beta-lactams Antibiotic resistance (ampicillin), Unknown sp. [Beta-lactams]
>70-tetB190, tet(B), Tetracycline Antibiotic resistance (tetracycline), Unknown sp. [Tetracycline]
CAAAGTGGTTAGCGATATCTTCCGAAGCAATAAATTCACGTAATAACGTTGGCAAGACTGGCATGATAAG
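A minimal Python sketch of reading a database in this format, assuming exactly the layout of the example records: four comma-separated elements, with the classification in square brackets at the end of the last one. The field names (identifier, gene, description, species) are guesses from the examples, and parse_header and read_records are hypothetical helpers, not SeqFindR's own parser. Because it splits on commas, it assumes no element contains a comma itself.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class DbHeader(NamedTuple):
    # Hypothetical field names, inferred from the example records.
    identifier: str
    gene: str
    description: str
    species: str
    classification: str


# The final element ends with the classification in square brackets.
_CLASS_RE = re.compile(r"^(.*?)\s*\[([^\]]+)\]\s*$")


def parse_header(line: str) -> DbHeader:
    """Split one '>' header line into its 4 elements plus the classification."""
    if not line.startswith(">"):
        raise ValueError("header lines must start with '>'")
    parts = [p.strip() for p in line[1:].split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated elements, got {len(parts)}")
    match = _CLASS_RE.match(parts[3])
    if match is None:
        raise ValueError("final element must end with [classification]")
    species, classification = match.groups()
    return DbHeader(parts[0], parts[1], parts[2], species, classification)


def read_records(lines: Iterable[str]) -> Iterator[Tuple[DbHeader, str]]:
    """Yield (header, sequence) pairs from multi-FASTA lines."""
    header: Optional[DbHeader] = None
    seq: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = parse_header(line), []
        else:
            seq.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(seq)
```

A record whose sequence comes back as an empty string has a header with no sequence lines beneath it, which is worth fixing before a run.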

Tutorial

Navigate to the SeqFindR/example directory (created by the git clone). The following files should be present:

The toy assemblies and consensuses were generated such that:


Run 1 - Looking only at assemblies

Command:

SeqFindR -o run1 -d Antibiotic_markers.fa -a assemblies/ -l

Run1


Run 2 - Combining assembly and mapping consensus data

Command:

SeqFindR -o run2 -d Antibiotic_markers.fa -a assemblies/ -m consensus/ -l

Run2


Run 3 - Combining assembly and mapping consensus data with differentiation between hits

Command:

SeqFindR -o run3 -d Antibiotic_markers.fa -a assemblies/ -m consensus/ -l -r

Run3


The clustering dendrogram looks like this:

Run3 dendrogram


Run 4 - Combining assembly and mapping consensus data with defined ordering

Command:

SeqFindR -o run4 -d Antibiotic_markers.fa -a assemblies/ -m consensus/ -l -i dummy.order -r

Run4

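To rerun all four tutorial runs from a script (for example after updating SeqFindR), a minimal Python sketch can build the same argument lists and pass them to subprocess. The names build_command, RUNS, and run_all are hypothetical; the sketch uses only the flags shown in the tutorial commands and assumes SeqFindR is on your PATH and that you call run_all from inside SeqFindR/example.

```python
import subprocess
from typing import List, Optional


def build_command(output: str, db: str, assemblies: str,
                  consensus: Optional[str] = None,
                  index: Optional[str] = None,
                  label: bool = True,
                  reshape: bool = False) -> List[str]:
    """Assemble a SeqFindR argument list from the tutorial's flags."""
    cmd = ["SeqFindR", "-o", output, "-d", db, "-a", assemblies]
    if consensus is not None:
        cmd += ["-m", consensus]  # add mapping consensus data
    if label:
        cmd.append("-l")          # label the x axis
    if index is not None:
        cmd += ["-i", index]      # keep a defined order instead of clustering
    if reshape:
        cmd.append("-r")          # differentiate mapping and assembly hits
    return cmd


DB, ASS, CONS = "Antibiotic_markers.fa", "assemblies/", "consensus/"

# The four tutorial runs, in the same order and with the same flags.
RUNS = [
    build_command("run1", DB, ASS),
    build_command("run2", DB, ASS, consensus=CONS),
    build_command("run3", DB, ASS, consensus=CONS, reshape=True),
    build_command("run4", DB, ASS, consensus=CONS, index="dummy.order",
                  reshape=True),
]


def run_all() -> None:
    """Run each tutorial command, stopping at the first failure."""
    for cmd in RUNS:
        subprocess.run(cmd, check=True)
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems if your paths contain spaces.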


SeqFindR usage options

Help listing:

Usage: SeqFindR.py -o OUTPUT -d DB -a ASS [-h] [-v] [-t TOL] [-m CONS]
                   [-i INDEX] [-l] [-c COLOR] [-r]

optional arguments:
  -h, --help                 show this help message and exit
  -v, --verbose              verbose output
  -o OUTPUT, --output OUTPUT [Required] output prefix
  -d DB, --db DB             [Required] full path database fasta file
  -a ASS, --ass ASS          [Required] full path to dir containing assemblies
  -t TOL, --tol TOL          Similarity cutoff (default = 0.95)
  -m CONS, --cons CONS       full path to dir containing consensuses (default = None)
  -i INDEX, --index INDEX    maintain order of index (no cluster) (default = None)
  -l, --label_genes          label the x axis (default = False)
  -c COLOR, --color COLOR    color index (default = None)
  -r, --reshape              Differentiate between mapping and assembly hits

Licence: ECL by Mitchell Stanton-Cook <m.stantoncook@gmail.com>

Future