Photos Site

Principle Componant Analysis with Amber Instructions



==== Creation of Initial Files ====
The first step needed to start the PCA analysis is to create an Amber forcefield 
for the protein. If there is a ligand present, this will probably not be recognised 
by default so a forcefield file will have to be created for it (these steps can 
be skipped if this is not an issue).

Firstly, a pdb for the ligand with Hydrogen is needed, if you don't have one, take the 
ligand without Hydrogen and pass this through babel
   babel -h -i pdb ligand.pdb -o pdb ligandH.pdb

Check this using a viewer to make sure the connections are all right (e.g. gmolden as 
this uses  CONECT  entries if they exist). Now remove any CONECT lines in the pdb. 
This can be done with:
   sed '/CONECT/d' ligandH.pdb > ligandHalt.pdb
   mv ligandHalt.pdb ligandH.pdb
if you are not wanting to do this by hand.

Next a mol2 file needs be be produced which generates the bond lengths, angles, and 
atom charges.
   antechamber -i ligandH.pdb -fi pdb -o ligandH.mol2 -fo mol2 -c bcc -s 2
 -c bcc  selects what charge model is required and  -s 2  makes the terminal output 
more verbose.

It is possible that antechamber has missed some of the angles or dihedrals in the 
ligand, therefore this needed to be checked with parmchk:
   parmchk -i ligandH.mol2 -f mol2 -o ligandH.frcmod

Now all the parameters should be ready to input to generate the amber forcefield.

To do this the Amber programs  xleap  or  tleap  are needed.  xleap  is a more 
graphical method, but  tleap  is just as effective, and can be scripted if needed. 
This will generate the generate the forcefield parameters and initial position file 
for both the protein and ligand together.  tleap  is launched by: 
   tleap -f leaprc.ff99SB
where  -f leaprc.ff99SB  tells the program to use the 99SB forcefield parameters. 
Commands prefixed with  >  are ran in the  tleap  shell.

Firstly, load the general forcefield, followed by the missing values from  parmchk :
   > source leaprc.gaff
   > loadAmberParams ligandH.frcmod

Next load the mol2 file with your ligand parameters in:
   > lig = loadmol2 ligandH.mol2

The lig should be the same as the residue name in your full protein and ligand pdb 
file.

Now load the full protein and ligand pdb file with no Hydrogens, This is the first step 
needed if you had no ligand (apart from launching  tleap  that is):
   > protein = loadpdb protein.pdb

This will automatically add Hydrogens to the protein (and ligand if present). The Amber 
parameter files can now be created for the protein, if it wants to be ran in a vacuum:
   > saveAmberParm protein protein_vacuum.prmtop protein_vacuum.inpcrd

It is also good to make a backup of the protein in  tleap  in case a mistake is made in 
adding ions and water to the system:
   > protein2 = copy protein

 protein  can then be regenerated with:
   > protein = copy protein2

Now if the protein has a charge then counter ions need to be added. The charge can be 

checked with:
   > charge protein

If this is negative  Na+  will be needed, if positive  Cl-  ions will be needed. These can 
be added with:
   > addions protein Na+ 0

The 0 means that the protein will be neutralised.

With the option 0 the number of ions added will be the absolute value rounded down, so 
if the charge is -5.99999 only 5 Na+ ions will be added, where it is clear that 6 should 
be added. If this is the case the 0 should be replaced with the number of ions needed.

The protein and ions now need to be solvated with water, this can be done with:
   > solvateoct protein TIP3PBOX 8 iso

solvateoct  will add an octahedral box of water, a cubic box can be added by using  
solvatebox  instead. The 8 means that a spacing of 8 Angstroms will be left between 
the periodic images.

The full Amber parameter files now need to be saved:
   > saveAmberParm protein protein_solvated.prmtop protein_solvated.inpcrd

Also a pdb can be saved with all the atoms in:
   > savepdb protein protein_solvated.pdb

 tleap  can be exited with:
   > quit

==== MD simulations ====
Now that we have the parameter files and a file for the initial protein position, we need
 to carry out a few minimization and equilibrium steps before the full MD run. Firstly a 
position restrained protein minimisation is performed. For this the input file is needed 
({{:wat_min1.in|wat_min1.in}}), where the RES line needs to be changed to the range of 
residue numbers in your protein:

   sander -O -i wat_min1.in -o protein-min1.out -p protein_solvated.prmtop 
             -c protein_solvated.inpcrd -r protein-min1.rst 
             -ref protein_solvated.inpcrd

After this restrained minimisation run, and unrestrained run needs to be carried out 
({{:wat_min2.in|wat_min2.in}}):

   sander -O -i wat_min2.in -o protein-min2.out -p protein_solvated.prmtop 
                     -c protein-min1.rst -r protein-min2.rst

After these initial minimisations, temperature coupling needs to be added to the system. 
These run with a restrained protein, but only with loose springs, again the RES line 
needs to be changed to the range of residue numbers in your protein ({{:heat.in|heat.in}}):

   sander -O -i heat.in -o protein-heat.out -p protein_solvated.prmtop -c protein-min2.rst 
                  -r protein-heat.rst -ref protein-min2.rst -x protein-heat.mdcrd

Now the protein system also needs to be coupled to the pressure. Again with a weakly 
restrained protein ({{:density.in|density.in}}):

   sander -O -i density.in -o protein-density.out -p protein_solvated.prmtop 
                  -c protein-heat.rst -r protein-density.rst -ref protein-heat.rst 
                  -x protein-density.mdcrd

The system can now be run for an equilibration md run ({{:wat_md.in|wat_md.in}}):

   sander -O -i wat_md.in -o protein-md.out -p protein_solvated.prmtop 
                -c protein-density.rst -r protein-md.rst -x protein-md.mdcrd

Once this is finished, check that the system is at equilibrium. If not, re-run. If it 
is then the full md simulation can be carried out ({{:mm-pbsa-prod.in|mm-pbsa-prod.in}}):

   sander -O -i mm-pbsa-prod.in -o protein-full.out -p protein_solvated.prmtop 
                 -c protein-md.rst -r protein-full.rst -x protein-full.mdcrd

If you want the run in parallel then the sander command needs to be replaced by 
mpirun -np $NSLOTS pmemd.MPI

==== Principle Component Analysis ====
Now that you have your md simulation, you can analyse the results to generate the PCA. 
To do this it is fist better to centre the protein in the box. This can be done using 
the ptraj program (which is part of AMBER). The commands needed are:
    trajin protein-full.mdcrd
    reference protein_solvated.pdb
    center :1-401 mass origin
    image origin center familiar
    trajout centered.mdcrd
    go

The centring works best if the original pdb is used as a reference as this was created 
in the centre of the box. The numbers used are the residues of the protein and will need 
to be changed if the protein residues are numbered differently.

The best way to run this is to copy the above lines into a file e.g. centre.ptraj and 
then use the command:
    ptraj protein_solvated.prmtop < centre.ptraj > centre.log

After centring the file, the PCA can be carried out also using ptraj. The commands 
for this are:
    trajin centered.mdcrd
    matrix mwcovar name com-var :1-401
    analyze matrix com-var vecs 50 name evecs-com out evecs.out thermo
    analyze modes fluct out rmsfluct.dat stack evecs-com
    analyze modes displ out resdispl.dat stack evecs-com
    average average_struct.pdb pdb :1-401
    go

Again the numbers will need to be changed based on the residue numbers of the protein. 
The 50 corresponds to the fact that the first 50 eigenvectors will be solved and given 
out in the file evecs.out.

The best way to run this is to copy the above lines into a file e.g. pca.ptraj and 
then use the command:
    ptraj protein_solvated.prmtop < pca.ptraj > pca.log

The pca.log file will contain the values of the first 50 eigenfrequencies corresponding 
to the vectors in evecs.out.