SeqFindR - easily create informative genomic feature plots.
This is an early release of SeqFindR. The tool is still undergoing rapid development. We have only tested SeqFindR on Linux systems.
You'll need to install/have installed:
We also use the following python libraries:
These should be installed automatically if you follow the instructions below.
You'll need to have git installed. Git can be really useful for scientists. See here for some discussion.
Option 1 (with root/admin):
cd ~/
git clone git://github.com/mscook/SeqFindR.git
cd SeqFindR
sudo python setup.py install
Option 2 (standard user), replacing INSTALL/HERE with an appropriate path:
cd ~/
git clone git://github.com/mscook/SeqFindR.git
cd SeqFindR
echo 'export PYTHONPATH=$PYTHONPATH:~/INSTALL/HERE/lib/python2.7/site-packages' >> ~/.bashrc
echo 'export PATH=$PATH:~/INSTALL/HERE/bin' >> ~/.bashrc
source ~/.bashrc
python setup.py install --prefix=~/INSTALL/HERE
If the install went correctly:
user@host:~/> which SeqFindR
/INSTALL/HERE/bin/SeqFindR
user@host:~/> SeqFindR -h
Please check back regularly, or run git pull followed by python setup.py install, to make sure you're running the most recent version of SeqFindR.
SeqFindR CU fimbriae genes image. 110 E. coli strains were investigated. Order is according to phylogenetic analysis.
The SeqFindR database is in multi-FASTA format. Each header needs to be formatted with four comma-separated elements.
The elements are:
The final element ends with a classification enclosed in square brackets ([]).
An example:
>70-tem8674, bla-TEM, Beta-lactams Antibiotic resistance (ampicillin), Unknown sp. [Beta-lactams]
AAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATAC
>70-shv86, bla-SHV, Beta-lactams Antibiotic resistance (ampicillin), Unknown sp. [Beta-lactams]
CTCAAGCGGCTGCGGGCTGGCGTGTACCGCCAGCGGCAGGGTGGCTAACAGGGAGATAATACACAGGCGA
>70-oxa(1)256, bla-OXA-1, Beta-lactams Antibiotic resistance (ampicillin), Unknown sp. [Beta-lactams]
>70-tetB190, tet(B), Tetracycline Antibiotic resistance (tetracycline), Unknown sp. [Tetracycline]
CAAAGTGGTTAGCGATATCTTCCGAAGCAATAAATTCACGTAATAACGTTGGCAAGACTGGCATGATAAG
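Before running SeqFindR it can help to check that every header in a database follows the layout above. The following is a minimal sketch in plain standard-library Python, not SeqFindR's own parser. The function names and the field names (identifier, gene, description, species, classification) are hypothetical labels chosen for illustration, and the sketch assumes that no element contains a comma of its own.

```python
import re

# Matches "<species> [<classification>]" in the fourth header element.
_CLASSIFICATION = re.compile(r"^(?P<species>.*?)\s*\[(?P<cls>[^\]]+)\]\s*$")


def parse_header(header):
    """Split one FASTA header into four comma-separated elements.

    Field names are hypothetical; assumes no element contains a comma.
    """
    parts = [p.strip() for p in header.lstrip(">").split(",")]
    if len(parts) != 4:
        raise ValueError(
            "expected 4 comma-separated elements, got %d: %r"
            % (len(parts), header)
        )
    match = _CLASSIFICATION.match(parts[3])
    if match is None:
        raise ValueError("missing trailing [classification]: %r" % header)
    return {
        "identifier": parts[0],
        "gene": parts[1],
        "description": parts[2],
        "species": match.group("species"),
        "classification": match.group("cls"),
    }


def read_records(text):
    """Return a list of (fields, sequence) pairs from multi-FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((parse_header(header), "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((parse_header(header), "".join(chunks)))
    return records


def empty_records(records):
    """Identifiers of records whose header is not followed by a sequence."""
    return [fields["identifier"] for fields, seq in records if not seq]
```

Calling read_records on the contents of a database file and then empty_records on the result lists any header that is not followed by sequence lines. A malformed header raises ValueError with the offending line.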
Navigate to the SeqFindR/example directory (from git clone). The following files should be present:
The toy assemblies and consensuses were generated such that:
Command:
SeqFindR -o run1 -d Antibiotic_markers.fa -a assemblies/ -l
Command:
SeqFindR -o run2 -d Antibiotic_markers.fa -a assemblies/ -m consensus/ -l
Command:
SeqFindR -o run3 -d Antibiotic_markers.fa -a assemblies/ -m consensus/ -l -r
The clustering dendrogram looks like this:
Command:
SeqFindR -o run4 -d Antibiotic_markers.fa -a assemblies/ -m consensus/ -l -i dummy.order -r
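The four example commands differ only in which optional flags are switched on, so they are easy to script. The sketch below is one way to assemble them from Python. The helper names build_command and run_all and their parameters are hypothetical. Only the flags (-o, -d, -a, -m, -l, -i, -r) and file names come from the commands above. It assumes SeqFindR is on your PATH and that you run it from the example directory.

```python
import subprocess


def build_command(output, database, assemblies, consensus=None,
                  index=None, label_genes=False, reshape=False):
    """Return the SeqFindR argument list for one run (hypothetical helper)."""
    cmd = ["SeqFindR", "-o", output, "-d", database, "-a", assemblies]
    if consensus is not None:
        cmd += ["-m", consensus]   # directory of mapping consensuses
    if label_genes:
        cmd.append("-l")           # label the x axis
    if index is not None:
        cmd += ["-i", index]       # keep the order given in the index file
    if reshape:
        cmd.append("-r")           # distinguish mapping and assembly hits
    return cmd


def example_commands():
    """The four example runs shown above, as argument lists."""
    db, ass, cons = "Antibiotic_markers.fa", "assemblies/", "consensus/"
    return [
        build_command("run1", db, ass, label_genes=True),
        build_command("run2", db, ass, consensus=cons, label_genes=True),
        build_command("run3", db, ass, consensus=cons, label_genes=True,
                      reshape=True),
        build_command("run4", db, ass, consensus=cons, index="dummy.order",
                      label_genes=True, reshape=True),
    ]


def run_all():
    """Run each example in turn; stops at the first non-zero exit status."""
    for cmd in example_commands():
        subprocess.check_call(cmd)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.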
Help listing:
Usage: SeqFindR.py -o OUTPUT -d DB -a ASS [-h] [-v] [-t TOL] [-m CONS]
                   [-i INDEX] [-l] [-c COLOR] [-r]

optional arguments:
  -h, --help                  show this help message and exit
  -v, --verbose               verbose output
  -o OUTPUT, --output OUTPUT  [Required] output prefix
  -d DB, --db DB              [Required] full path database fasta file
  -a ASS, --ass ASS           [Required] full path to dir containing assemblies
  -t TOL, --tol TOL           Similarity cutoff (default = 0.95)
  -m CONS, --cons CONS        full path to dir containing consensuses (default = None)
  -i INDEX, --index INDEX     maintain order of index (no cluster) (default = None)
  -l, --label_genes           label the x axis (default = False)
  -c COLOR, --color COLOR     color index (default = None)
  -r, --reshape               Differentiate between mapping and assembly hits
Licence: ECL by Mitchell Stanton-Cook <m.stantoncook@gmail.com>