Then run extractModel.pl on each PDB file.
The arguments to the script are:
extractModel.pl inputPDBFileName outputPDBFileName
This reads the PDB file inputPDBFileName and extracts *only* chain A into a file named outputPDBFileName.
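To show what the chain-A extraction amounts to, here is a hypothetical Java sketch of the same filtering step. It is not the Perl script itself. It assumes that only ATOM and HETATM records whose chain identifier (column 22 of the fixed-column PDB format) is 'A' need to be kept. It does not treat MODEL/ENDMDL records specially, so for a multi-model file it keeps chain A from every model. The class name ExtractChainA is made up, and unlike the original programs it needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical Java sketch of the chain-A extraction (not extractModel.pl).
 * Assumption: keep only ATOM/HETATM records whose chain identifier,
 * column 22 in the PDB fixed-column format, equals the requested chain.
 */
public class ExtractChainA {

    /** Returns the ATOM/HETATM lines that belong to the given chain. */
    static List<String> filterChain(List<String> pdbLines, char chainId) {
        List<String> kept = new ArrayList<>();
        for (String line : pdbLines) {
            boolean isAtom = line.startsWith("ATOM  ") || line.startsWith("HETATM");
            // Column 22 (1-based) is index 21 (0-based).
            if (isAtom && line.length() > 21 && line.charAt(21) == chainId) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ExtractChainA inputPDBFileName outputPDBFileName");
            System.exit(1);
        }
        // ISO-8859-1 maps every byte to a char, so odd bytes in a PDB file never cause a decode error.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        Files.write(Paths.get(args[1]), filterChain(lines, 'A'), StandardCharsets.ISO_8859_1);
    }
}
```

Usage: `java ExtractChainA input.pdb chainA.pdb`. Checking column 22 directly, rather than splitting on whitespace, follows the fixed-column layout of PDB records, where adjacent fields can touch.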
To build the contact map, run:
java -mx1600000000 BuildContactMapFromPDB PDBFileName threshold boolean
The -mx1600000000 option sets the maximum Java heap size to 1,600,000,000 bytes (about 1.6 GB).
PDBFileName is the name of a model extracted in the previous step.
threshold is the distance cutoff, in Angstroms, used to build the contact map.
If boolean = true, the contact map is also displayed on the screen; otherwise it is only saved.
The output file (containing the contact map) is named PDBFileName.cm.
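The contact-map idea can be illustrated with a short Java sketch. It is not BuildContactMapFromPDB: the class ContactMapSketch and both of its methods are hypothetical. The sketch assumes that each residue is represented by its C-alpha atom and that two residues are in contact when their C-alpha atoms are at most threshold Angstroms apart. The program's actual atom choice, its plotting, and its .cm output format are not reproduced. Java 8 or later is needed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical contact-map sketch (not BuildContactMapFromPDB).
 * Assumptions: one C-alpha atom per residue; residues i and j are in
 * contact when their C-alpha atoms are at most `threshold` Angstroms apart.
 */
public class ContactMapSketch {

    /** Extracts C-alpha coordinates from ATOM records (PDB fixed columns 31-54). */
    static List<double[]> caCoordinates(List<String> pdbLines) {
        List<double[]> coords = new ArrayList<>();
        for (String line : pdbLines) {
            // Atom name occupies columns 13-16, i.e. indices 12..15.
            if (line.startsWith("ATOM  ") && line.length() >= 54
                    && line.substring(12, 16).trim().equals("CA")) {
                double x = Double.parseDouble(line.substring(30, 38).trim());
                double y = Double.parseDouble(line.substring(38, 46).trim());
                double z = Double.parseDouble(line.substring(46, 54).trim());
                coords.add(new double[] {x, y, z});
            }
        }
        return coords;
    }

    /** contact[i][j] is true when residues i and j lie within the threshold. */
    static boolean[][] contactMap(List<double[]> coords, double threshold) {
        int n = coords.size();
        boolean[][] contact = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double[] a = coords.get(i);
                double[] b = coords.get(j);
                double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
                contact[i][j] = Math.sqrt(dx * dx + dy * dy + dz * dz) <= threshold;
            }
        }
        return contact;
    }
}
```

For example, with threshold 4.0, residues at (0,0,0) and (3,0,0) are in contact, while a residue at (10,0,0) is in contact with neither. The matrix is symmetric and its diagonal is always true.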
The Java source and class files are in the tar file contactmapsources.tar; you can recompile them with compileBuildContactMapFromPDB.
The USM is computed with:
USM c_1-sizes c_1-c_2-sizes distancesFile
c_1-sizes is the name of the file with the estimated Kolmogorov complexity of each protein in the data set. The format for this file should be:
number_1 protId_1
number_2 protId_2
...
number_n protId_n
where number_i is the complexity of protein protId_i and n is the size of the data set.
c_1-c_2-sizes is the name of the file with the estimated Kolmogorov complexity of each *concatenated* protein pair in the data set. The format for this file should be:
number_1 protId_1-protId_1
number_2 protId_1-protId_2
...
number_n protId_1-protId_n
number_n+1 protId_2-protId_1
number_n+2 protId_2-protId_2
number_n+3 protId_2-protId_3
...
number_n*n protId_n-protId_n
distancesFile is the name of the output file to which the USM values, computed according to Eq. 4 in the paper, are written. The format is self-evident, i.e. a square n x n matrix holding all n*n similarity values.
To compile USM.java, simply execute compileUSM.
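The following hypothetical Java sketch (UsmSketch, not the distributed USM.java) shows how the two size files fit together. It reads both files as whitespace-separated "number name" pairs and fills the n x n matrix with an assumed compression-based distance, d(i,j) = max(C(i-j) - C(i), C(j-i) - C(j)) / max(C(i), C(j)), where C(x) is the estimated complexity read from the files. This formula is an assumption on our part; check it against Eq. 4 in the paper before relying on the numbers. Java 8 or later is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical USM sketch (not the distributed USM.java).
 * Assumed distance, to be checked against Eq. 4 in the paper:
 *   d(i,j) = max(C(i-j) - C(i), C(j-i) - C(j)) / max(C(i), C(j))
 */
public class UsmSketch {

    /** Parses whitespace-separated "number name" pairs, preserving file order. */
    static Map<String, Double> parseSizes(String text) {
        String[] tok = text.trim().split("\\s+");
        Map<String, Double> sizes = new LinkedHashMap<>();
        for (int k = 0; k + 1 < tok.length; k += 2) {
            sizes.put(tok[k + 1], Double.parseDouble(tok[k]));
        }
        return sizes;
    }

    /** Fails loudly when a protein or protein pair is missing from a file. */
    private static double lookup(Map<String, Double> m, String key) {
        Double v = m.get(key);
        if (v == null) {
            throw new IllegalArgumentException("missing entry: " + key);
        }
        return v;
    }

    /** Builds the n x n matrix in the order the proteins appear in c_1-sizes. */
    static double[][] usm(Map<String, Double> single, Map<String, Double> pairs) {
        List<String> ids = new ArrayList<>(single.keySet());
        int n = ids.size();
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                String a = ids.get(i);
                String b = ids.get(j);
                double ca = lookup(single, a);
                double cb = lookup(single, b);
                double cab = lookup(pairs, a + "-" + b);
                double cba = lookup(pairs, b + "-" + a);
                d[i][j] = Math.max(cab - ca, cba - cb) / Math.max(ca, cb);
            }
        }
        return d;
    }

    private static String read(String file) throws IOException {
        return new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java UsmSketch c_1-sizes c_1-c_2-sizes distancesFile");
            System.exit(1);
        }
        double[][] d = usm(parseSizes(read(args[0])), parseSizes(read(args[1])));
        // One matrix row per output line, values separated by single spaces.
        List<String> rows = new ArrayList<>();
        for (double[] row : d) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                sb.append(row[j]);
            }
            rows.add(sb.toString());
        }
        Files.write(Paths.get(args[2]), rows);
    }
}
```

As a worked example under this assumed formula: if C(p1) = 10, C(p2) = 20, C(p1-p2) = 25 and C(p2-p1) = 26, then d(p1,p2) = max(25 - 10, 26 - 20) / max(10, 20) = 15 / 20 = 0.75. Taking the maximum over both concatenation orders makes the matrix symmetric, which is one reason both protId_i-protId_j and protId_j-protId_i entries appear in c_1-c_2-sizes.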
NOTE1: All the Java programs were developed for JDK 1.1.8.
NOTE2: If you are using a mixed Unix/DOS environment, you may need to run dos2unix on these files to remove unwanted control characters.
NOTE3: You must set the $JAVA_HOME environment variable.