FindFile

Genomics, Evolution and Medicine

FindFile.py

 

This program is useful if you have a directory of several hundred files and want to identify which one or more of them contain a sequence of interest.

 

I originally wrote it to quickly query protein families and identify which files contain a sequence of interest. As written, it searches every file whose name ends in '.prot' for the query. You can change this to *.txt, *.fa or whatever file ending suits your question.
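Rather than editing the glob pattern in the script each time, the pattern could be passed as an optional second argument. The sketch below keeps the post's sys.argv style. The helper name find_files and the two-argument usage are assumptions for illustration, not part of the original script. It also stops reading a file at its first matching line, so each file is listed once.

```python
import glob
import sys


def find_files(query, pattern="*.prot"):
    """Return the names of files matching `pattern` that contain `query`."""
    matches = []
    for file_name in sorted(glob.glob(pattern)):
        with open(file_name) as handle:
            # any() stops reading this file at the first line containing the query
            if any(query in line for line in handle):
                matches.append(file_name)
    return matches


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('usage: python FindFile.py QUERY ["PATTERN"]')
    else:
        query = sys.argv[1]
        # Hypothetical optional second argument overrides the default "*.prot"
        pattern = sys.argv[2] if len(sys.argv) > 2 else "*.prot"
        for file_name in find_files(query, pattern):
            print(query, file_name)
```

For example, `python FindFile.py ENSG00000012048 "*.fa"` would search FASTA files instead. Quote the pattern so your shell passes it through unchanged rather than expanding it into a list of file names.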

 

To run it, save the code below in a text editor as FindFile.py and run it from the directory that holds your files, since it only searches the current working directory. To find which file ending in ".prot" contains the sequence name 'ENSG00000012048', you would type:

 

> python FindFile.py ENSG00000012048

 

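import sys
import glob

# Change "*.prot" to "*.txt", "*.fa", etc. to search other file types
OpenFiles = glob.glob("*.prot")

Query = sys.argv[1]

# Open each .prot file in the current directory in turn
for fileName in OpenFiles:
    with open(fileName) as handle:
        for line in handle:
            # If your query is in this file, print the query and the file name
            if Query in line:
                print(Query, fileName)
                break  # report each file once, however many lines match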
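The script compares one line at a time, which works for sequence names such as ENSG00000012048 that sit on a single header line. If the query is instead a stretch of residues, and the files are FASTA-formatted with wrapped sequence lines (an assumption; the post does not state the layout of the .prot files), a match split across a line break would be missed. A minimal sketch that rejoins each record's sequence before searching follows; read_fasta and find_motif are hypothetical helper names.

```python
import glob


def read_fasta(path):
    """Yield (header, sequence) pairs, joining wrapped sequence lines (assumes FASTA)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(motif, pattern="*.prot"):
    """Return (file name, header) for every record whose sequence contains `motif`."""
    hits = []
    for file_name in sorted(glob.glob(pattern)):
        for header, sequence in read_fasta(file_name):
            # Case-insensitive match on the full, unwrapped sequence
            if motif.upper() in sequence.upper():
                hits.append((file_name, header))
    return hits
```

For a record whose sequence is wrapped as "MKTAY" and "IAKQR", find_motif("AYIAK") reports it, whereas a line-by-line search would not.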
