File Formats & Support
RasMol v2.6 can now load and write most of the common coordinate file formats. The original description of PDB files from the v2.5 Manual has been updated for v2.6, with five additional file formats and a section on machine-specific support.
Portions of these topics are also integrated into various sections of the Manual.
PDB File Formats
   Brookhaven Protein Data Bank Files
   RasMol Interpretation of PDB Fields
   PDB Color Scheme Specification
   Multiple NMR Models in PDB Files
MOPAC File Formats
Alchemy File Format
IRIS RGB Image File Format
MDL Mol File Output
Machine-Specific Support
   Monochrome X Windows Support
   Tcl/Tk 3.x and 4.x IPC support
   UNIX sockets based IPC
   Compiling RasWin with Borland

Brookhaven Protein Data Bank Files
   If you do not have the Brookhaven documentation, you may find the following summary of the PDB file format useful. Additional information can be found at the PDB WWW Home Page. The Protein Data Bank is a computer-based archival database for macromolecular structures. The database was established in 1971 by the Brookhaven National Laboratory, New York, as a public domain repository for resolved crystallographic structures [20]. The Bank uses a uniform format to store atomic coordinates and partial bond connectivities as derived from crystallographic studies.
   PDB file entries consist of records of 80 characters each. Using the punched card analogy, columns 1 to 6 contain a record-type identifier and columns 7 to 70 contain data. Columns 71 to 80 are normally blank, but may contain sequence information added by library management programs. The first four characters of the record identifier are sufficient to identify the type of record uniquely, and the syntax of each record is independent of the order of records within any entry for a particular macromolecule.
   The only record types that are of major interest to the RasMol program are the ATOM and HETATM records, which describe the position of each atom. ATOM/HETATM records contain standard atom names and residue abbreviations, along with sequence identifiers, coordinates in Angstrom units, occupancies and thermal motion factors. The exact details are given below as a FORTRAN format statement:
 FORMAT(6A1,I5,1X,A4,A1,A3,1X,A1,I4,A1,3X,3F8.3,2F6.2,1X,I3)

   Column   Content
    1-6     'ATOM' or 'HETATM'
    7-11    Atom serial number (may have gaps)
   13-16    Atom name, in IUPAC standard format
   17       Alternate location indicator, indicated by A, B or C
   18-20    Residue name, in IUPAC standard format
   22       Chain identifier
   23-26    Residue sequence number (ordered as below)
   27       Code for insertions of residues (e.g. 66A and 66B)
   31-38    X coordinate
   39-46    Y coordinate
   47-54    Z coordinate
   55-60    Occupancy
   61-66    Temperature factor
   68-70    Footnote number

0   5    1    5    2    5    3    5    4    5    5    5    6    5    7    5    8
ATOM  AtomN     A     Sequ    Xcoordin        Zcoordin      Tfacto
HETATM      Name Res      C           Ycoordin        Occupa       Num
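
   The column layout above can be read with plain string slicing. The following Python sketch is illustrative only and is not RasMol's own reader: the names AtomRecord and parse_atom_line are hypothetical, the sample line is made up rather than taken from a real entry, and column 22 (the single character that the FORMAT statement reads between the residue name and the sequence number) is treated as a chain identifier, as in the standard PDB layout. Blank numeric fields are not handled and would raise ValueError.

```python
from typing import NamedTuple, Optional


class AtomRecord(NamedTuple):
    """Hypothetical container for the fields of one ATOM/HETATM record."""
    record: str
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    ins_code: str
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom_line(line: str) -> Optional[AtomRecord]:
    """Return an AtomRecord for ATOM/HETATM lines, or None for other records."""
    line = line.rstrip("\n").ljust(80)        # pad short lines to 80 columns
    if line[:4] not in ("ATOM", "HETA"):      # first four characters identify the record
        return None
    return AtomRecord(
        record=line[0:6].strip(),             # columns 1-6
        serial=int(line[6:11]),               # columns 7-11
        name=line[12:16].strip(),             # columns 13-16
        alt_loc=line[16].strip(),             # column 17
        res_name=line[17:20].strip(),         # columns 18-20
        chain=line[21].strip(),               # column 22 (assumed chain identifier)
        res_seq=int(line[22:26]),             # columns 23-26
        ins_code=line[26].strip(),            # column 27
        x=float(line[30:38]),                 # columns 31-38
        y=float(line[38:46]),                 # columns 39-46
        z=float(line[46:54]),                 # columns 47-54
        occupancy=float(line[54:60]),         # columns 55-60
        temp_factor=float(line[60:66]),       # columns 61-66
    )


# Build a made-up sample line with the same field widths as the FORMAT statement.
sample = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    7, " CA", "", "GLY", "B", 42, "A", 12.345, -6.789, 0.5, 1.0, 20.0)
atom = parse_atom_line(sample)
print(atom.name, atom.res_name, atom.res_seq, atom.ins_code, atom.x)
```

   Python slices are zero-based and exclude the end index, so columns 31-38 become line[30:38].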

   Residues occur in order of their sequence numbers, which always increase starting from the N-terminal residue for proteins and 5'-terminus for nucleic acids. If the residue sequence is known, certain atom serial numbers may be omitted to allow for future insertion of any missing atoms. Within each residue, atoms are ordered in a standard manner, starting with the backbone (N-Ca-C-O for proteins) and proceeding in increasing remoteness from the alpha carbon, along the side chain.
   HETATM records are used to define post-translational modifications and cofactors associated with the main molecule. Optional TER records are interpreted as breaks in the main molecule's backbone.
   If present, RasMol also inspects HEADER, COMPND, HELIX, SHEET, TURN, CONECT, CRYST1, MODEL, ENDM and END records. Information such as the name, Brookhaven code, revision date and classification of the molecule is extracted from HEADER and COMPND records; initial secondary structure assignments are taken from HELIX, SHEET and TURN records; and the end of the file may be indicated by an END record.
An annotated example of a PDB file for the protein crambin (1crn.pdb) is shown on a separate page.
RasMol Interpretation of PDB Fields
   Atoms located at 9999.000, 9999.000, 9999.000 are assumed to be Insight pseudo atoms and are ignored by RasMol. Atom names beginning ' Q' are also assumed to be pseudo atoms or position markers.
   When a data file contains an NMR structure, multiple conformations may be placed in a single PDB file, delimited by MODEL and ENDM records. In this case, RasMol only displays the first NMR model in the file.
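
   As a rough illustration of the first-model rule, the Python sketch below drops every record that belongs to a second or later model while keeping records outside any model, such as a trailing END. The function name keep_first_model is hypothetical and the logic is an assumption about one reasonable way to apply the rule, not a description of RasMol's internals.

```python
def keep_first_model(lines):
    """Return the records of a PDB file with all but the first NMR model removed."""
    models_seen = 0
    in_model = False
    kept = []
    for line in lines:
        tag = line[:4]                 # first four characters identify the record
        if tag == "MODE":              # MODEL record opens a conformation
            models_seen += 1
            in_model = True
        if not (in_model and models_seen > 1):
            kept.append(line)          # keep model 1 and anything outside models
        if tag == "ENDM":              # ENDM record closes the conformation
            in_model = False
    return kept


records = ["MODEL 1", "ATOM a", "ENDMDL", "MODEL 2", "ATOM b", "ENDMDL", "END"]
print(keep_first_model(records))
```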
   Residue names "CSH", "CYH" and "CSM" are considered pseudonyms for cysteine "CYS". Residue names "WAT", "H20", "SOL" and "TIP" are considered pseudonyms for water "HOH". The residue name "D20" is considered heavy water "DOD". The residue name "SUL" is considered a sulfate ion "SO4". The residue name "CPR" is considered to be cis-proline and is translated as "PRO". The residue name "TRY" is considered a pseudonym for tryptophan "TRP".
   RasMol uses the HETATM fields to define the sets hetero, water, solvent, and ligand. Any group with the name "HOH", "DOD", "SO4" or "PO4" (or aliased to one of these names by the preceding rules) is treated as solvent and as if it had been defined by a HETATM record.
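
   The pseudonym and solvent rules amount to a lookup table followed by a set membership test. The Python sketch below spells the residue names exactly as they appear in the text above; the names RESIDUE_ALIASES, SOLVENT_NAMES, canonical_residue and is_solvent are hypothetical and chosen only for illustration.

```python
# Pseudonyms listed in the text, mapped to the name RasMol treats them as.
RESIDUE_ALIASES = {
    "CSH": "CYS", "CYH": "CYS", "CSM": "CYS",                 # cysteine
    "WAT": "HOH", "H20": "HOH", "SOL": "HOH", "TIP": "HOH",   # water
    "D20": "DOD",                                             # heavy water
    "SUL": "SO4",                                             # sulfate ion
    "CPR": "PRO",                                             # cis-proline
    "TRY": "TRP",                                             # tryptophan
}

# Group names treated as solvent after aliasing.
SOLVENT_NAMES = {"HOH", "DOD", "SO4", "PO4"}


def canonical_residue(name: str) -> str:
    """Translate a residue name through the alias table, if it is listed."""
    key = name.strip().upper()
    return RESIDUE_ALIASES.get(key, key)


def is_solvent(name: str) -> bool:
    """True if the (possibly aliased) residue name counts as solvent."""
    return canonical_residue(name) in SOLVENT_NAMES


for residue in ("WAT", "SUL", "CPR", "ALA"):
    print(residue, canonical_residue(residue), is_solvent(residue))
```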
   RasMol only respects CONECT connectivity records in PDB files containing fewer than 256 atoms. This is explained in more detail in the set bonds section on determining molecule connectivity. CONECT records that define a bond more than once are interpreted as specifying the bond order of that bond; that is, a bond specified twice is a double bond and a bond specified three (or more) times is a triple bond.
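
   The bond-order rule can be sketched as a counting exercise. The Python example below makes several assumptions that the text does not state: CONECT serial numbers are read as five-character fields starting at column 7 (the standard PDB layout), the first serial is the source atom and the rest are its partners, and because files often list a bond once from each end, the order is taken as the larger of the two directional counts. The function name conect_bond_orders is hypothetical.

```python
from collections import Counter


def conect_bond_orders(lines, atom_count):
    """Map (low_serial, high_serial) -> bond order from CONECT records.

    Returns an empty dict for files with 256 or more atoms, where the
    CONECT records are not respected.
    """
    if atom_count >= 256:
        return {}
    directed = Counter()
    for line in lines:
        if line[:4] != "CONE":
            continue
        # Assumed layout: five-character serial fields in columns 7-31.
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [int(f) for f in fields if f]
        if len(serials) < 2:
            continue
        source, partners = serials[0], serials[1:]
        for partner in partners:
            directed[(source, partner)] += 1   # repeats raise the count
    orders = {}
    for (a, b), count in directed.items():
        key = (min(a, b), max(a, b))
        # Three or more repeats all mean a triple bond.
        orders[key] = min(3, max(orders.get(key, 0), count))
    return orders


def conect(*serials):
    """Format a CONECT record from serial numbers (helper for the example)."""
    return "CONECT" + "".join("%5d" % s for s in serials)


records = [conect(1, 2, 2), conect(2, 1, 1, 3), conect(3, 2)]
print(conect_bond_orders(records, atom_count=3))
```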


PDB Color Scheme Specification
   RasMol also accepts the supplementary COLO record type in PDB files. This record format was introduced by David Bacon's Raster3D program [4] for specifying the color scheme to be used when rendering the molecule. This extension is not currently supported by Brookhaven. The COLO record has the same basic record layout as the ATOM and HETATM records described above.
   Colors are assigned to atoms using a matching process. The Mask field is used in the matching process as follows. First RasMol reads in and remembers all the ATOM, HETATM and COLO records in input order. When the user-defined (User) color scheme is selected, RasMol goes through each remembered ATOM/HETATM record in turn, and searches for a COLO record that matches in all of columns 7 through 30. The first such COLO record to be found determines the color and radius of the atom.

   Column   Content
    1-6     'COLOR' or 'COLOUR'
    7-30    Mask (described below)
   31-38    Red component
   39-46    Green component
   47-54    Blue component
   55-60    Sphere radius in Angstroms
   61-70    Comments

0   5    1    5    2    5    3    5    4    5    5    5    6    5    7    5    8
COLOUR                        Red     Green   Blue    SphereComments
      Mask Mask Mask Mask Mask

   Note that the Red, Green and Blue components are in the same positions as the X, Y, and Z components of an ATOM or HETATM record, and the van der Waals radius goes in the place of the Occupancy. The Red, Green and Blue components must all be in the range 0 to 1.
   In order that one COLO record can provide color and radius specifications for more than one atom (e.g. based on residue, atom type, or any other criterion for which labels can be given somewhere in columns 7 through 30), a 'don't-care' character, the hash mark "#" (number or sharp sign), is used. This character, when found in a COLO record, matches any character in the corresponding column in an ATOM/HETATM record. All other characters must match identically to count as a match. As an extension to the specification, any atom that fails to match a COLO record is displayed in white.
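
   The matching process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RasMol's implementation: the names mask_matches and user_color are hypothetical, the sample records are made up, and the radius returned for an unmatched atom (None) is a placeholder, since the text only specifies that such atoms are shown in white.

```python
WHITE = (1.0, 1.0, 1.0)


def mask_matches(mask: str, field: str) -> bool:
    """True if every mask character is '#' or equals the atom's character."""
    return len(mask) == len(field) and all(
        m == "#" or m == c for m, c in zip(mask, field))


def user_color(atom_line: str, colo_lines):
    """Return ((red, green, blue), radius) from the first matching COLO record."""
    field = atom_line.ljust(30)[6:30]            # columns 7-30 of the atom record
    for colo in colo_lines:
        padded = colo.ljust(60)
        if padded[:4] != "COLO":                 # accepts COLOR and COLOUR
            continue
        if mask_matches(padded[6:30], field):
            rgb = (float(padded[30:38]),         # columns 31-38, red
                   float(padded[38:46]),         # columns 39-46, green
                   float(padded[46:54]))         # columns 47-54, blue
            return rgb, float(padded[54:60])     # columns 55-60, sphere radius
    return WHITE, None                           # unmatched atoms are white


# A mask that matches any atom named CA: 24 characters for columns 7-30.
ca_mask = "#####" + " " + " CA " + "#" * 14
colo = "COLOUR" + ca_mask + "%8.3f%8.3f%8.3f%6.2f" % (1.0, 0.0, 0.0, 1.5)

prefix = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   "     # columns 1-30 of an atom record
ca_atom = prefix % (2, " CA", "", "THR", "A", 1, "")
n_atom = prefix % (1, " N", "", "THR", "A", 1, "")

print(user_color(ca_atom, [colo]))
print(user_color(n_atom, [colo]))
```

   Because the first matching record wins, more specific COLO records should appear before general ones in the file.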
Multiple NMR Models
   RasMol may now load all of the NMR models from a Brookhaven PDB file using the new command load nmrpdb <filename>. The "nmrpdb" format specifier instructs the PDB reader to load all the models from the PDB file, instead of just the first one, which is the behavior of the (default) "pdb" format specifier. If the specified PDB file does not contain an NMR structure, the behavior of "nmrpdb" is identical to that of "pdb". Once multiple NMR conformations have been loaded, they may be manipulated with the atom expression extensions described in Primitive Expressions.
MOPAC File Formats
   RasMol can now read MOPAC format files. The new load mopac <filename> command automatically distinguishes between MOPAC input and output file types, and can read input files in both Cartesian and internal (Z-matrix) formats. RasMol will also read the charge information in MOPAC output files; however, it cannot read the output files of MOPAC jobs specifying the NOXYZ keyword.
Alchemy File Format
   The Alchemy file format reader has been enhanced to allow hydrogen bonds to be explicitly represented in a file using the keyword HYDROGEN, instead of the typical SINGLE, DOUBLE, TRIPLE or AROMATIC.
IRIS RGB Image File Format
   RasMol on all platforms now supports the generation of images in IRIS RGB format files. This file format is often used when running on Silicon Graphics workstations. The appropriate form of RGB file is used by both 8-bit and 32-bit versions of RasMol. These files may be created using the write iris <filename> command.
MDL Mol File Output
   RasMol version 2.6 may now be used to generate MDL Mol files. The new command save mdl <filename> saves the currently selected set of atoms to the specified file in MDL file format.
Machine-Specific Support
   Monochrome X Windows Support. RasMol v2.6 now supports the many monochrome UNIX workstations typically found in academia, such as low-end SUN workstations and NCD X-terminals. The X11 version of RasMol (when compiled in 8-bit mode) now detects black & white X Windows displays and enables dithering automatically. The use of run-time error diffusion dithering means that all display modes of RasMol are available in monochrome mode. For best results, users should experiment with the set ambient command to ensure the maximum contrast in resulting images.
   Tcl/Tk 3.x and 4.x IPC support. The recently announced version 4 of the Tk graphics library has changed the protocol used to communicate between Tk applications. RasMol version 2.6 has been modified so that it can now communicate using both this new protocol and the previous version 3 protocol supported by RasMol v2.5. Although Tcl/Tk 3.x applications may only communicate with other 3.x applications and Tcl/Tk 4.x applications with other 4.x applications, these changes allow RasMol v2.6 to communicate between processes with both protocols (potentially concurrently).
   UNIX sockets based IPC. The UNIX implementation of RasMol v2.6 now supports BSD-style socket communication. An identical socket mechanism is also being developed for VMS, Apple Macintosh and Microsoft Windows systems. This should allow RasMol to interactively display results of a computation on a remote host. The current protocol acts as a TCP/IP server on port 21069 that executes command lines until either the command "exit" or the command "quit" is typed. The command exit disconnects the current session from the RasMol server; the command quit both disconnects the current session and terminates RasMol. This functionality may be tested using the UNIX command "telnet <hostname> 21069".
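
   A scripted alternative to the telnet test might look like the Python sketch below. It is an illustration under stated assumptions: the newline line terminator, the ASCII encoding, the function names encode_commands and send_commands, and the example commands are all assumptions not confirmed by the text, which only specifies the port number and the exit and quit commands.

```python
import socket

RASMOL_PORT = 21069   # port given in the text


def encode_commands(commands, terminator="\n"):
    """Join command lines into one byte string (terminator is an assumption)."""
    return "".join(cmd + terminator for cmd in commands).encode("ascii")


def send_commands(host, commands, port=RASMOL_PORT, timeout=5.0):
    """Send commands to a running RasMol server, then end the session with exit.

    "exit" only disconnects this session; sending "quit" instead would also
    terminate RasMol.
    """
    payload = encode_commands(list(commands) + ["exit"])
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


# Example payload only; no connection is made here.
print(encode_commands(["load pdb 1crn.pdb", "set ambient 60"]))
```

   Calling send_commands("localhost", ["load pdb 1crn.pdb"]) would require a RasMol v2.6 process listening on the local machine.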
    Compiling RasWin with Borland. A number of changes have been made to the source code to allow the Microsoft Windows version of RasMol to compile using the Borland C/C++ compiler. These fixes include name changes for the standard library and special code to avoid a bug in _fmemset.