This program is useful if you have a directory of several hundred files and you want to identify which one (or more) of them contain a sequence of interest.


I originally wrote it to quickly query protein families and identify which files contain a sequence of interest. At present it searches every file ending in '.prot' for the query. You can change this to *.txt, *.fa or whatever file ending is suitable for your question.
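To see how the file ending decides which files get searched, here is a minimal sketch. The helper name files_with_ending and the file names are made up for illustration; only glob.glob itself comes from the script.

```python
import glob
import os
import tempfile

# Build a throwaway directory with a few empty files (names are made up).
demo_dir = tempfile.mkdtemp()
for name in ["family1.prot", "family2.prot", "notes.txt", "seqs.fa"]:
    open(os.path.join(demo_dir, name), "w").close()

def files_with_ending(directory, ending):
    """Hypothetical helper: sorted base names of files in `directory`
    whose names end with `ending`."""
    pattern = os.path.join(directory, "*" + ending)
    return sorted(os.path.basename(path) for path in glob.glob(pattern))

print(files_with_ending(demo_dir, ".prot"))  # the two .prot files
print(files_with_ending(demo_dir, ".fa"))    # only seqs.fa
```

Swapping ".prot" for another ending is the only change needed to point the search at a different set of files.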


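To run it, save the code below in a text editor as a Python script, then run it from the directory that holds your files, giving the query as the only argument. If you want to find which file ending in ".prot" contains the sequence name 'ENSG00000012048', then you would type the following, replacing <script name> with the name you saved the script under


> python <script name> ENSG00000012048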


import sys,glob

Query = sys.argv[1]
#the query is the first argument on the command line

OpenFiles= glob.glob("*.prot")
#list all files ending in .prot in the current directory

for fileName in OpenFiles:
    for line in open(fileName):
        if Query in line:
            #if your query is in any of these files
            print("%s %s" % (Query, fileName))
            #print the query and the fileName
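If you would rather get each matching file back once, instead of one printed line per matching line, the same search can be wrapped in a function. This is only a sketch of one way to do it: the function name find_files_containing, its parameters, and the demo files and sequences are all made up for illustration.

```python
import glob
import os
import tempfile

def find_files_containing(query, directory=".", ending=".prot"):
    """Hypothetical helper: sorted paths of files in `directory` with the
    given ending that contain `query` on at least one line."""
    hits = []
    for path in sorted(glob.glob(os.path.join(directory, "*" + ending))):
        with open(path) as handle:
            # any() stops reading the file at the first matching line
            if any(query in line for line in handle):
                hits.append(path)
    return hits

# Small demo on made-up files in a temporary directory
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "famA.prot"), "w") as out:
    out.write(">ENSG00000012048\nMDLSALRVEE\n")
with open(os.path.join(demo_dir, "famB.prot"), "w") as out:
    out.write(">OTHER_ID_0001\nMPIGSKERPT\n")

hits = find_files_containing("ENSG00000012048", demo_dir)
print([os.path.basename(path) for path in hits])  # ['famA.prot']
```

Returning a list also makes it easy to count the hits or pass the file names on to another step.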

