Genomics, Evolution and Medicine
This program is useful if you would like to select a subset of sequences from a FASTA-formatted file. I wrote it while working on the paper "Mitochondrial data are not suitable for resolving placental mammal phylogeny", to extract FASTA sequences for different clades within the mammal phylogeny.
To run this program, copy the text below and save it in a text editor as ExtractSequences.py. Then type:
>python ExtractSequences.py SubList FastaFile Outfile
#SubList is the list of species/sequence names that you would like to select. The species/sequence names should have no spaces and each species/sequence should be on a new line.
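import sys
from collections import defaultdict
# print the usage message and stop if the three arguments are not given
if len(sys.argv) != 4:
    print("python ExtractSequences.py SubList FastaFile Outfile")
    sys.exit(1)
# this reads species names and creates a list
ListofNames = open(sys.argv[1], 'r')
list1 = []
for name in ListofNames:
    if name.strip():
        list1.append(name.strip())
ListofNames.close()
# this reads the fasta file and saves the sequence lines under each title
savedSpecies = defaultdict( list )
title = None
for each_line in open(sys.argv[2], 'r'):
    if len( each_line.strip() ) == 0:
        continue
    if each_line.startswith(">"):
        title = each_line[1:].strip()
    elif title is not None:
        savedSpecies[ title ].append( each_line.strip() )
# this writes the selected sequences to the output file
Outfile = open(sys.argv[3], 'w')
for stuff in list1:
    if stuff in savedSpecies:
        foundRecords = savedSpecies[stuff]
        Outfile.write(">" + stuff + "\n" )
        for line in foundRecords:
            Outfile.write( line + "\n" )
Outfile.close()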
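To see what the selection step does without creating any files, the same logic can be tried on a small in-memory example. This is only a minimal sketch and not part of the original program: the function name extract_sequences is hypothetical, and the species names and sequences are made up for illustration. It assumes that each name in the list matches the whole title line after the ">" exactly.

```python
import io
from collections import defaultdict


def extract_sequences(names, fasta_handle):
    """Return FASTA text for the records whose titles appear in names.

    Hypothetical helper for illustration; it mirrors the script's steps.
    """
    saved = defaultdict(list)
    title = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            title = line[1:]  # the title is the header text after ">"
        elif title is not None:
            saved[title].append(line)  # sequence line for the current title
    out = []
    for name in names:
        if name in saved:  # names missing from the FASTA data are skipped
            out.append(">" + name)
            out.extend(saved[name])
    return "\n".join(out) + "\n" if out else ""


# Made-up example data: three records, one requested name is absent
fasta = io.StringIO(
    ">Homo_sapiens\nACGT\nTTGA\n>Mus_musculus\nGGCC\n>Bos_taurus\nATAT\n"
)
wanted = ["Bos_taurus", "Homo_sapiens", "Pan_troglodytes"]
result = extract_sequences(wanted, fasta)
print(result)
```

In this sketch the records come out in the order of the name list rather than the order of the FASTA file, and a name that is not found is skipped silently.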