Genomics, Evolution and Medicine

Traditionally, I have used this program to change the species codes in a NEXUS-formatted tree file. The Python code is flexible enough to change sequences, IDs, or names in many of the standard genomics and phylogenetics file formats.


You will need a key_file: a tab-delimited file in which the first column contains the names/sequences that appear in your infile and the second column contains the names/sequences you would like to substitute for them in your outfile.


Example key_file


Canis_familiaris	Dog
Rattus_norvegicus	Rat
Gorilla_gorilla	Gorilla
Homo_sapiens	Human
Mus_musculus	Mouse
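Blank or incomplete lines are easy to introduce into a key_file by accident. The following is a minimal sketch, not part of the original program, of one way the key_file lines could be read defensively. The helper name `parse_key_lines` is hypothetical, and the sketch assumes each line is split at the first run of whitespace (tabs included) and that keys are matched case-insensitively.

```python
def parse_key_lines(lines):
    """Build a {lowercased old name: new name} mapping from key_file lines.

    Hypothetical helper for illustration only.
    """
    mapping = {}
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines instead of failing on them
        fields = stripped.split(None, 1)  # split at the first run of whitespace
        if len(fields) != 2:
            raise ValueError(
                "key_file line %d has no replacement value: %r" % (number, stripped)
            )
        old_name, new_name = fields
        mapping[old_name.lower()] = new_name
    return mapping


# In-memory stand-in for a key_file, including a stray blank line.
example_lines = [
    "Canis_familiaris\tDog\n",
    "\n",
    "Homo_sapiens\tHuman\n",
]
print(parse_key_lines(example_lines))
```

With an actual file, the same function could be called as `parse_key_lines(open("key_file"))`, since a file object yields its lines one at a time.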

To use this program copy and paste the code below into a text editor and save the file as


To run this program on a file of interest, type

> python key_file infile outfile

print('To use: python key_file infile outfile')


import re,sys


oldname = re.compile(r'\b\w+\b')

#compiles regular expression: matches each run of word characters (letters, digits, underscores) bounded by word boundaries



d = {}


def repl_func(mo):
    ms = mo.group()
    #links ms to the matched string
    return d.get(ms.lower(), ms)
    #looks up the lowercased match in the dictionary: returns the new name if found, otherwise the original match unchanged



with open(sys.argv[1], "r") as keyFile:
    for line in keyFile:
        key, value = line.strip().split(None,1)
        #surrounding white space removed, then the line is split into key and value at the first run of white space
        d[key.lower()] = value
        #lowercased key and its value are added to the dictionary

with open(sys.argv[3], "w") as resultFile:
    with open(sys.argv[2], "r") as treeFile:
        for line in treeFile:
            NewSpecies = re.sub(oldname, repl_func, line)
            #substitute the old name with the new name
            resultFile.write(NewSpecies)
            #write the converted line to the outfile
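To see what the substitution does to a single line, here is a small self-contained sketch that applies the same `\b\w+\b` pattern to an in-memory string instead of files. The function name `rename_tokens`, the sample mapping, and the sample tree line are made up for illustration.

```python
import re

WORD = re.compile(r"\b\w+\b")  # same pattern as the script's oldname


def rename_tokens(text, mapping):
    """Replace every word token found in mapping (lowercased keys).

    Tokens without an entry are returned unchanged.
    """
    def repl(match):
        token = match.group()
        return mapping.get(token.lower(), token)

    return WORD.sub(repl, text)


# Illustrative data: keys are stored in lower case, as in the script.
mapping = {"homo_sapiens": "Human", "mus_musculus": "Mouse"}
tree = "tree t1 = ((Homo_sapiens:0.1,MUS_MUSCULUS:0.2):0.05,Gorilla_gorilla:0.3);"

print(rename_tokens(tree, mapping))
# prints: tree t1 = ((Human:0.1,Mouse:0.2):0.05,Gorilla_gorilla:0.3);
```

Because `\w` also covers digits, a branch length such as 0.1 is seen as the two tokens 0 and 1. A key_file entry that is purely numeric could therefore alter branch lengths, so it is safest to keep keys to names and IDs.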

