                           CONTACTS documentation



CONTENTS

   1.0 SUMMARY
   2.0 INPUTS & OUTPUTS
   3.0 INPUT FILE FORMAT
   4.0 OUTPUT FILE FORMAT
   5.0 DATA FILES
   6.0 USAGE
   7.0 KNOWN BUGS & WARNINGS
   8.0 NOTES
   9.0 DESCRIPTION
   10.0 ALGORITHM
   11.0 RELATED APPLICATIONS
   12.0 DIAGNOSTIC ERROR MESSAGES
   13.0 AUTHORS
   14.0 REFERENCES

1.0 SUMMARY

   Reads CCF files (clean coordinate files) and writes CON files (contact
   files) of intra-chain residue-residue contact data.

2.0 INPUTS & OUTPUTS

   CONTACTS reads a directory of domain or protein CCF files (clean
   coordinate files) and writes a CON file (contacts file) of intra-chain
   residue-residue contact data for each file in the input directory. Each
   CON file contains residue contact data for every chain of every model
   in a protein coordinate file, or for a single domain where a domain CCF
   file is read. The user is prompted for the paths for the CCF (input)
   and CON (output) files and the file extensions are specified in the ACD
   file. The domain identifier code or pdb identifier code is used as
   appropriate to name the output files. A log file is also written.

3.0 INPUT FILE FORMAT

   The format of the clean coordinate file is described in the PDBPARSE
   documentation.

4.0 OUTPUT FILE FORMAT

   The CON format used for the contact files (Figure 1) is similar to EMBL
   format and uses the following records:
     * XX - used for spacing and comments. The first line is bibliographic
       information and contains the text "XX Intra-chain residue-residue
       contact data".
     * TY - type of contact. For CON files generated by CONTACTS, 'INTRA'
       is always given.
     * EX - experimental information. The value of the threshold contact
       distance is given as a floating point number after 'THRESH'. For
       CON files generated by CONTACTS, the threshold ignore distance is
       given after 'IGNORE'. The number of models and number of
       polypeptide chains are given after 'NMOD' and 'NCHA' respectively.
       For domain CCF files a 1 is always given.
     * NE - number of entries (chains) in the file. For CON files
       generated by CONTACTS this is the number of chains, equal to the
       number of models multiplied by the number of unique chains.
       Following the NE record, the file has a section for each entry
       containing records for entry number (EN), identifier codes (ID),
       polypeptide chain-specific data (CN), chain sequence information
       (S1) and number of contacts (NC), that together define the chain
       and its contacts.
     * EN - entry number. The number in brackets indicates the start of an
       entry (chain).
     * CN - polypeptide chain-specific data. Tokens delimiting data items
       are as follows. (1) MO: The model number (from the PDB file). (2)
       CN1: Chain number. (3) CN2: Not used by CONTACTS, a '.' is given.
       (4) ID1: PDB chain identifier (a '.' is given in cases where a chain
       identifier was not specified in the original PDB file or, for
       domain CCF files, the domain is comprised of more than one chain).
       (5) ID2: Not used by CONTACTS, a '.' is given. (6) NRES1: number of
       amino acid residues in the chain or domain. (7) NRES2: Not used by
       CONTACTS, a '.' is given.
     * NC - number of contacts: (1) SM: Number of residue-residue
       contacts, i.e. contacts between the side-chain or main-chain atoms
       of different amino acid residues in the same chain. (2) LI: Not
       used by CONTACTS, a '.' is given.
     * ID - identifier codes: (1) PDB: 4-character PDB identifier code.
       (2) DOM: 7-character domain identifier code from SCOP or CATH
       (where domain CCF files were read). (3) LIG: Not used by CONTACTS,
       a '.' is given.
     * S1 - polypeptide chain sequence for domain or protein. The number
       of residues is given before AA on the first line. The sequence is
       given on subsequent lines.
     * SM - Line of residue contact data. Pairs of amino acid identifiers
       and residue numbers are delimited by a ';'. Residue numbers are
       taken from the CCF file and give a correct index into the sequence
       (i.e. they are not necessarily the same as the original PDB file).
       This sequence is given in the CON file itself (S1 record).
     * // - delimiter for individual entries in the file and also given on
       the last line of the file.
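
   The record layout above can be read back with a short script. The
   following is a minimal sketch in Python, not part of EMBOSS: the names
   ConEntry, parse_con and SAMPLE are hypothetical, and the parser assumes
   that every record starts with a two-character code and that sequence
   lines after S1 start with blanks, as in the example files shown in this
   section. SAMPLE is an excerpt of the 1cs4.con example, cut to two SM
   lines.

```python
from dataclasses import dataclass, field

# Excerpt of the 1cs4.con example from this document, cut to two SM lines
# (so the NC count of 163 no longer matches the number of SM lines here).
SAMPLE = """\
XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1cs4; DOM .; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM 163; LI .
XX
SM   ASP 2 ; ILE 3
SM   ASP 2 ; GLU 4
//
"""


@dataclass
class ConEntry:
    """One entry (chain or domain) of a CON file (hypothetical container)."""
    number: int = 0
    ids: dict = field(default_factory=dict)       # tokens of the ID record
    chain: dict = field(default_factory=dict)     # tokens of the CN record
    counts: dict = field(default_factory=dict)    # tokens of the NC record
    sequence: str = ""                            # residues listed under S1
    contacts: list = field(default_factory=list)  # (res1, pos1, res2, pos2)


def _tokens(body):
    """Split 'KEY value; KEY value' into a dict of strings."""
    pairs = (item.split() for item in body.split(";"))
    return {p[0]: p[1] for p in pairs if len(p) == 2}


def parse_con(text):
    """Return (header, entries) parsed from the text of a CON file."""
    header, entries, entry, in_seq = {}, [], None, False
    for line in text.splitlines():
        code, body = line[:2], line[2:].strip()
        if in_seq and code == "  " and entry is not None:
            entry.sequence += body.replace(" ", "")  # S1 continuation line
            continue
        in_seq = False
        if code == "TY":
            header["TY"] = body
        elif code == "EX":
            header.update(_tokens(body))
        elif code == "NE":
            header["NE"] = int(body)
        elif code == "EN":
            entry = ConEntry(number=int(body.strip("[]")))
        elif entry is None:
            continue                                 # XX or text outside an entry
        elif code == "ID":
            entry.ids = _tokens(body)
        elif code == "CN":
            entry.chain = _tokens(body)
        elif code == "NC":
            entry.counts = _tokens(body)
        elif code == "S1":
            in_seq = True                            # sequence follows on next lines
        elif code == "SM":
            left, right = body.split(";")
            (res1, pos1), (res2, pos2) = left.split(), right.split()
            entry.contacts.append((res1, int(pos1), res2, int(pos2)))
        elif code == "//":
            entries.append(entry)
            entry = None
    return header, entries


if __name__ == "__main__":
    header, entries = parse_con(SAMPLE)
    print(header["NE"], entries[0].ids["PDB"], len(entries[0].contacts))
```

   Because the residue numbers in SM records index into the S1 sequence,
   position 2 of the parsed sequence above is 'D', which agrees with the
   'ASP 2' contact lines. All token values are kept as strings, so unused
   fields remain '.'.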

  Output files for usage example

  File: 1cs4.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1cs4; DOM .; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM 163; LI .
XX
SM   ASP 2 ; ILE 3
SM   ASP 2 ; GLU 4
SM   ASP 2 ; ASP 46
SM   ASP 2 ; CYS 47
SM   ILE 3 ; GLU 4
SM   ILE 3 ; GLY 5
SM   ILE 3 ; PHE 6
SM   ILE 3 ; LEU 9
SM   ILE 3 ; LEU 25
SM   ILE 3 ; ASP 46
SM   GLU 4 ; GLY 5
SM   GLU 4 ; PHE 6
SM   GLY 5 ; PHE 6
SM   GLY 5 ; THR 7
SM   GLY 5 ; SER 8
SM   GLY 5 ; LEU 9
SM   PHE 6 ; THR 7
SM   PHE 6 ; SER 8
SM   PHE 6 ; LEU 9
SM   PHE 6 ; ALA 10
SM   PHE 6 ; LEU 18
SM   PHE 6 ; LEU 22
SM   PHE 6 ; GLY 45
SM   PHE 6 ; ASP 46
SM   THR 7 ; SER 8
SM   THR 7 ; LEU 9
SM   THR 7 ; ALA 10
SM   THR 7 ; SER 11
SM   SER 8 ; LEU 9
SM   SER 8 ; ALA 10
SM   SER 8 ; SER 11


  [Part of this file has been deleted for brevity]

SM   PHE 29 ; LYS 31
SM   PHE 29 ; LEU 32
SM   PHE 29 ; ALA 33
SM   ASP 30 ; LYS 31
SM   ASP 30 ; LEU 32
SM   ASP 30 ; ALA 33
SM   ASP 30 ; ALA 34
SM   ASP 30 ; ARG 40
SM   LYS 31 ; LEU 32
SM   LYS 31 ; ALA 33
SM   LYS 31 ; ALA 34
SM   LYS 31 ; GLU 35
SM   LEU 32 ; ALA 33
SM   LEU 32 ; ALA 34
SM   LEU 32 ; GLU 35
SM   LEU 32 ; ASN 36
SM   ALA 33 ; ALA 34
SM   ALA 33 ; GLU 35
SM   ALA 33 ; ASN 36
SM   ALA 33 ; HIS 37
SM   ALA 33 ; CYS 38
SM   ALA 34 ; GLU 35
SM   ALA 34 ; ASN 36
SM   ALA 34 ; HIS 37
SM   GLU 35 ; ASN 36
SM   GLU 35 ; HIS 37
SM   ASN 36 ; HIS 37
SM   ASN 36 ; CYS 38
SM   HIS 37 ; CYS 38
SM   HIS 37 ; LEU 39
SM   CYS 38 ; LEU 39
SM   CYS 38 ; ARG 40
SM   LEU 39 ; ARG 40
SM   LEU 39 ; ILE 41
SM   ARG 40 ; ILE 41
SM   ARG 40 ; LYS 42
SM   ARG 40 ; ILE 43
SM   ILE 41 ; LYS 42
SM   LYS 42 ; ILE 43
SM   LYS 42 ; LEU 44
SM   LYS 42 ; CYS 47
SM   ILE 43 ; LEU 44
SM   ILE 43 ; GLY 45
SM   ILE 43 ; CYS 47
SM   LEU 44 ; GLY 45
SM   LEU 44 ; ASP 46
SM   LEU 44 ; CYS 47
SM   GLY 45 ; ASP 46
SM   GLY 45 ; CYS 47
SM   ASP 46 ; CYS 47
//

  File: 1ii7.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1ii7; DOM .; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 65; NRES2 .
XX
S1   SEQUENCE    65 AA;   7395 MW;  75FBE75B22FD3678 CRC64;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
NC   SM 151; LI .
XX
SM   ASP 8 ; ILE 9
SM   ASP 8 ; HIS 10
SM   ASP 8 ; GLY 48
SM   ASP 8 ; ASP 49
SM   ILE 9 ; HIS 10
SM   ILE 9 ; LEU 11
SM   ILE 9 ; PHE 25
SM   ILE 9 ; PHE 29
SM   ILE 9 ; ILE 46
SM   ILE 9 ; ASP 49
SM   ILE 9 ; LEU 50
SM   HIS 10 ; LEU 11
SM   HIS 10 ; GLY 12
SM   HIS 10 ; TYR 13
SM   HIS 10 ; PHE 25
SM   HIS 10 ; ASP 49
SM   HIS 10 ; LEU 50
SM   LEU 11 ; GLY 12
SM   LEU 11 ; TYR 13
SM   LEU 11 ; ALA 26
SM   LEU 11 ; PHE 29
SM   LEU 11 ; LEU 50
SM   GLY 12 ; TYR 13
SM   GLY 12 ; GLU 14
SM   GLY 12 ; GLU 22
SM   TYR 13 ; GLU 14
SM   TYR 13 ; GLN 15
SM   TYR 13 ; GLU 22
SM   TYR 13 ; PHE 25
SM   GLU 14 ; GLN 15


  [Part of this file has been deleted for brevity]

SM   ASN 31 ; ILE 35
SM   ALA 32 ; LEU 33
SM   ALA 32 ; GLU 34
SM   ALA 32 ; ILE 35
SM   ALA 32 ; ALA 36
SM   LEU 33 ; GLU 34
SM   LEU 33 ; ILE 35
SM   LEU 33 ; ALA 36
SM   LEU 33 ; VAL 37
SM   LEU 33 ; ILE 44
SM   GLU 34 ; ILE 35
SM   GLU 34 ; ALA 36
SM   GLU 34 ; VAL 37
SM   GLU 34 ; GLN 38
SM   ILE 35 ; ALA 36
SM   ILE 35 ; VAL 37
SM   ILE 35 ; GLN 38
SM   ILE 35 ; GLU 39
SM   ALA 36 ; VAL 37
SM   ALA 36 ; GLN 38
SM   ALA 36 ; GLU 39
SM   ALA 36 ; ASN 40
SM   ALA 36 ; VAL 41
SM   ALA 36 ; ILE 44
SM   VAL 37 ; GLN 38
SM   VAL 37 ; GLU 39
SM   VAL 37 ; ASN 40
SM   GLN 38 ; GLU 39
SM   GLN 38 ; ASN 40
SM   GLU 39 ; ASN 40
SM   GLU 39 ; VAL 41
SM   ASN 40 ; VAL 41
SM   ASN 40 ; ASP 42
SM   VAL 41 ; ASP 42
SM   VAL 41 ; PHE 43
SM   VAL 41 ; ILE 44
SM   ASP 42 ; PHE 43
SM   PHE 43 ; ILE 44
SM   PHE 43 ; LEU 45
SM   ILE 44 ; LEU 45
SM   ILE 44 ; ILE 46
SM   LEU 45 ; ILE 46
SM   LEU 45 ; ALA 47
SM   ILE 46 ; ALA 47
SM   ILE 46 ; GLY 48
SM   ILE 46 ; LEU 50
SM   ALA 47 ; GLY 48
SM   GLY 48 ; ASP 49
SM   GLY 48 ; LEU 50
SM   ASP 49 ; LEU 50
//

  File: 2hhb.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 4
XX
NE   4
XX
EN   [1]
XX
ID   PDB 2hhb; DOM .; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 141; NRES2 .
XX
S1   SEQUENCE   141 AA;  15126 MW;  34D13618E62A33C1 CRC64;
     VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
     KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
     VHASLDKFLA SVSTVLTSKY R
XX
NC   SM 643; LI .
XX
SM   VAL 1 ; LEU 2
SM   VAL 1 ; SER 3
SM   VAL 1 ; LYS 127
SM   LEU 2 ; SER 3
SM   LEU 2 ; PRO 4
SM   LEU 2 ; ASP 6
SM   LEU 2 ; LYS 7
SM   LEU 2 ; VAL 73
SM   LEU 2 ; MET 76
SM   LEU 2 ; LYS 127
SM   LEU 2 ; PHE 128
SM   LEU 2 ; SER 131
SM   SER 3 ; PRO 4
SM   SER 3 ; ALA 5
SM   SER 3 ; ASP 6
SM   SER 3 ; LYS 7
SM   SER 3 ; LYS 127
SM   PRO 4 ; ALA 5
SM   PRO 4 ; ASP 6
SM   PRO 4 ; LYS 7
SM   PRO 4 ; THR 8
SM   ALA 5 ; ASP 6
SM   ALA 5 ; LYS 7
SM   ALA 5 ; THR 8
SM   ALA 5 ; ASN 9
SM   ASP 6 ; LYS 7
SM   ASP 6 ; THR 8
SM   ASP 6 ; ASN 9
SM   ASP 6 ; VAL 10


  [Part of this file has been deleted for brevity]

SM   GLN 131 ; LYS 132
SM   GLN 131 ; VAL 133
SM   GLN 131 ; VAL 134
SM   GLN 131 ; ALA 135
SM   LYS 132 ; VAL 133
SM   LYS 132 ; VAL 134
SM   LYS 132 ; ALA 135
SM   LYS 132 ; GLY 136
SM   VAL 133 ; VAL 134
SM   VAL 133 ; ALA 135
SM   VAL 133 ; GLY 136
SM   VAL 133 ; VAL 137
SM   VAL 134 ; ALA 135
SM   VAL 134 ; GLY 136
SM   VAL 134 ; VAL 137
SM   VAL 134 ; ALA 138
SM   ALA 135 ; GLY 136
SM   ALA 135 ; VAL 137
SM   ALA 135 ; ALA 138
SM   ALA 135 ; ASN 139
SM   GLY 136 ; VAL 137
SM   GLY 136 ; ALA 138
SM   GLY 136 ; ASN 139
SM   GLY 136 ; ALA 140
SM   VAL 137 ; ALA 138
SM   VAL 137 ; ASN 139
SM   VAL 137 ; ALA 140
SM   VAL 137 ; LEU 141
SM   ALA 138 ; ASN 139
SM   ALA 138 ; ALA 140
SM   ALA 138 ; LEU 141
SM   ALA 138 ; ALA 142
SM   ASN 139 ; ALA 140
SM   ASN 139 ; LEU 141
SM   ASN 139 ; ALA 142
SM   ASN 139 ; HIS 143
SM   ALA 140 ; LEU 141
SM   ALA 140 ; ALA 142
SM   ALA 140 ; HIS 143
SM   LEU 141 ; ALA 142
SM   LEU 141 ; HIS 143
SM   LEU 141 ; TYR 145
SM   ALA 142 ; HIS 143
SM   ALA 142 ; LYS 144
SM   ALA 142 ; TYR 145
SM   HIS 143 ; LYS 144
SM   HIS 143 ; TYR 145
SM   LYS 144 ; TYR 145
SM   LYS 144 ; HIS 146
SM   TYR 145 ; HIS 146
//

  File: contacts.log

1cs4
1ii7
2hhb

5.0 DATA FILES

   CONTACTS uses a data file containing van der Waals radii for atoms in
   proteins (below). The file Evdw.dat is such a data file and is part of
   the EMBOSS distribution.

6.0 USAGE

Generate intra-chain CON files from CCF files.
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-cpdbdir]           dirlist    [./] This option specifies the location of
                                  CCF files (clean coordinate files) (input).
                                  A 'clean coordinate file' contains protein
                                  coordinate and derived data for a single PDB
                                  file ('protein clean coordinate file') or a
                                  single domain from SCOP or CATH ('domain
                                  clean coordinate file'), in CCF format
                                  (EMBL-like). The files, generated by using
                                  PDBPARSE (PDB files) or DOMAINER (domains),
                                  contain 'cleaned-up' data that is
                                  self-consistent and error-corrected. Records
                                  for residue solvent accessibility and
                                  secondary structure are added to the file by
                                  using PDBPLUS.
   -vdwfile            datafile   [Evdw.dat] This option specifies the name of
                                  the data file with van der Waals radii of
                                  atoms for different amino acid residues.
   -threshold          float      [1.0] Contact between two residues is
                                  defined as when the van der Waals surface of
                                  any atom of the first residue comes within
                                  the threshold contact distance of the van
                                  der Waals surface of any atom of the second
                                  residue. The threshold contact distance is a
                                  user-defined distance with a default value
                                  of 1 Angstrom. (Any numeric value)
  [-conoutdir]         outdir     [./] This option specifies the location of
                                  CON files (contact files) (output). A
                                  'contact file' contains contact data for a
                                  protein or a domain from SCOP or CATH, in
                                  the CON format (EMBL-like). The contacts may
                                  be intra-chain residue-residue, inter-chain
                                  residue-residue or residue-ligand. The
                                  files are generated by using CONTACTS,
                                  INTERFACE and SITES.
   -conlogfile         outfile    [contacts.log] The log file contains
                                  messages about any errors arising while
                                  contacts ran.

   Additional (Optional) qualifiers:
   -[no]ccfnaming      boolean    [Y] This option specifies whether to use
                                  pdbid code to name the output files. If set,
                                  the PDB identifier code (from the PDB file)
                                  is used to name the file. Otherwise, the
                                  output files have the same names as the
                                  input files.
   -skip               boolean    [N] Whether to calculate contacts between
                                  residues adjacent in sequence.
   -ignore             float      [20.0] If any two atoms from two different
                                  residues are at least this distance apart
                                  then no further inter-atomic contacts will be
                                  checked for that residue pair. This speeds
                                  the calculation up considerably. (Any
                                  numeric value)

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-cpdbdir" associated qualifiers
   -extension1         string     Default file extension

   "-conoutdir" associated qualifiers
   -extension2         string     Default file extension

   "-conlogfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
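
   The contact definition given for -threshold and the shortcut given for
   -ignore can be illustrated with a short sketch. This is Python written
   for this document, not the CONTACTS source code: the function name
   residues_in_contact, the atom tuple layout and the radii in the usage
   lines are hypothetical, the radii are not taken from Evdw.dat, and
   "within" is assumed to mean "less than or equal to". The order of the
   two checks is also an assumption.

```python
import math


def residues_in_contact(res_a, res_b, threshold=1.0, ignore=20.0):
    """Return True if two residues are in contact.

    res_a and res_b are lists of (x, y, z, vdw_radius) tuples, one per
    atom, with all values in Angstroms.
    """
    for xa, ya, za, ra in res_a:
        for xb, yb, zb, rb in res_b:
            centre_dist = math.dist((xa, ya, za), (xb, yb, zb))
            if centre_dist >= ignore:
                # Shortcut described for -ignore: one atom pair this far
                # apart means the rest of the residue pair is not checked.
                return False
            # Separation of the two van der Waals surfaces.
            surface_gap = centre_dist - ra - rb
            if surface_gap <= threshold:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical single-atom residues, radii chosen for illustration.
    atom_a = [(0.0, 0.0, 0.0, 1.7)]
    atom_b = [(4.0, 0.0, 0.0, 1.5)]
    print(residues_in_contact(atom_a, atom_b))  # surface gap of about 0.8
```

   With the default values, two atoms whose centres are 4.0 Angstroms
   apart and whose radii are 1.7 and 1.5 have surfaces about 0.8 Angstroms
   apart, so the residues count as being in contact. The -ignore shortcut
   is a heuristic: it trades a complete check of every atom pair for
   speed.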


  6.1 COMMAND LINE ARGUMENTS

