Durhamlogo                                                                                                                                        BSIlogo

 

Principle Componant Analysis with Amber Instructions


==== Creation of Initial Files ==== The first step needed to start the PCA analysis is to create an Amber forcefield for the protein. If there is a ligand present, this will probably not be recognised by default so a forcefield file will have to be created for it (these steps can be skipped if this is not an issue). Firstly, a pdb for the ligand with Hydrogen is needed, if you don't have one, take the ligand without Hydrogen and pass this through babel babel -h -i pdb ligand.pdb -o pdb ligandH.pdb Check this using a viewer to make sure the connections are all right (e.g. gmolden as this uses CONECT entries if they exist). Now remove any CONECT lines in the pdb. This can be done with: sed '/CONECT/d' ligandH.pdb > ligandHalt.pdb mv ligandHalt.pdb ligandH.pdb if you are not wanting to do this by hand. Next a mol2 file needs be be produced which generates the bond lengths, angles, and atom charges. antechamber -i ligandH.pdb -fi pdb -o ligandH.mol2 -fo mol2 -c bcc -s 2 -c bcc selects what charge model is required and -s 2 makes the terminal output more verbose. It is possible that antechamber has missed some of the angles or dihedrals in the ligand, therefore this needed to be checked with parmchk: parmchk -i ligandH.mol2 -f mol2 -o ligandH.frcmod Now all the parameters should be ready to input to generate the amber forcefield. To do this the Amber programs xleap or tleap are needed. xleap is a more graphical method, but tleap is just as effective, and can be scripted if needed. This will generate the generate the forcefield parameters and initial position file for both the protein and ligand together. tleap is launched by: tleap -f leaprc.ff99SB where -f leaprc.ff99SB tells the program to use the 99SB forcefield parameters. Commands prefixed with > are ran in the tleap shell. Firstly, load the general forcefield, followed by the missing values from parmchk : > source leaprc.gaff > loadAmberParams ligandH.frcmod Next load the mol2 file with your ligand parameters in: > lig = loadmol2 ligandH.mol2 The lig should be the same as the residue name in your full protein and ligand pdb file. Now load the full protein and ligand pdb file with no Hydrogens, This is the first step needed if you had no ligand (apart from launching tleap that is): > protein = loadpdb protein.pdb This will automatically add Hydrogens to the protein (and ligand if present). The Amber parameter files can now be created for the protein, if it wants to be ran in a vacuum: > saveAmberParm protein protein_vacuum.prmtop protein_vacuum.inpcrd It is also good to make a backup of the protein in tleap in case a mistake is made in adding ions and water to the system: > protein2 = copy protein protein can then be regenerated with: > protein = copy protein2 Now if the protein has a charge then counter ions need to be added. The charge can be checked with: > charge protein If this is negative Na+ will be needed, if positive Cl- ions will be needed. These can be added with: > addions protein Na+ 0 The 0 means that the protein will be neutralised. With the option 0 the number of ions added will be the absolute value rounded down, so if the charge is -5.99999 only 5 Na+ ions will be added, where it is clear that 6 should be added. If this is the case the 0 should be replaced with the number of ions needed. The protein and ions now need to be solvated with water, this can be done with: > solvateoct protein TIP3PBOX 8 iso solvateoct will add an octahedral box of water, a cubic box can be added by using solvatebox instead. The 8 means that a spacing of 8 Angstroms will be left between the periodic images. The full Amber parameter files now need to be saved: > saveAmberParm protein protein_solvated.prmtop protein_solvated.inpcrd Also a pdb can be saved with all the atoms in: > savepdb protein protein_solvated.pdb tleap can be exited with: > quit ==== MD simulations ==== Now that we have the parameter files and a file for the initial protein position, we need to carry out a few minimization and equilibrium steps before the full MD run. Firstly a position restrained protein minimisation is performed. For this the input file is needed ({{:wat_min1.in|wat_min1.in}}), where the RES line needs to be changed to the range of residue numbers in your protein: sander -O -i wat_min1.in -o protein-min1.out -p protein_solvated.prmtop -c protein_solvated.inpcrd -r protein-min1.rst -ref protein_solvated.inpcrd After this restrained minimisation run, and unrestrained run needs to be carried out ({{:wat_min2.in|wat_min2.in}}): sander -O -i wat_min2.in -o protein-min2.out -p protein_solvated.prmtop -c protein-min1.rst -r protein-min2.rst After these initial minimisations, temperature coupling needs to be added to the system. These run with a restrained protein, but only with loose springs, again the RES line needs to be changed to the range of residue numbers in your protein ({{:heat.in|heat.in}}): sander -O -i heat.in -o protein-heat.out -p protein_solvated.prmtop -c protein-min2.rst -r protein-heat.rst -ref protein-min2.rst -x protein-heat.mdcrd Now the protein system also needs to be coupled to the pressure. Again with a weakly restrained protein ({{:density.in|density.in}}): sander -O -i density.in -o protein-density.out -p protein_solvated.prmtop -c protein-heat.rst -r protein-density.rst -ref protein-heat.rst -x protein-density.mdcrd The system can now be run for an equilibration md run ({{:wat_md.in|wat_md.in}}): sander -O -i wat_md.in -o protein-md.out -p protein_solvated.prmtop -c protein-density.rst -r protein-md.rst -x protein-md.mdcrd Once this is finished, check that the system is at equilibrium. If not, re-run. If it is then the full md simulation can be carried out ({{:mm-pbsa-prod.in|mm-pbsa-prod.in}}): sander -O -i mm-pbsa-prod.in -o protein-full.out -p protein_solvated.prmtop -c protein-md.rst -r protein-full.rst -x protein-full.mdcrd If you want the run in parallel then the sander command needs to be replaced by mpirun -np $NSLOTS pmemd.MPI ==== Principle Component Analysis ==== Now that you have your md simulation, you can analyse the results to generate the PCA. To do this it is fist better to centre the protein in the box. This can be done using the ptraj program (which is part of AMBER). The commands needed are: trajin protein-full.mdcrd reference protein_solvated.pdb center :1-401 mass origin image origin center familiar trajout centered.mdcrd go The centring works best if the original pdb is used as a reference as this was created in the centre of the box. The numbers used are the residues of the protein and will need to be changed if the protein residues are numbered differently. The best way to run this is to copy the above lines into a file e.g. centre.ptraj and then use the command: ptraj protein_solvated.prmtop < centre.ptraj > centre.log After centring the file, the PCA can be carried out also using ptraj. The commands for this are: trajin centered.mdcrd matrix mwcovar name com-var :1-401 analyze matrix com-var vecs 50 name evecs-com out evecs.out thermo analyze modes fluct out rmsfluct.dat stack evecs-com analyze modes displ out resdispl.dat stack evecs-com average average_struct.pdb pdb :1-401 go Again the numbers will need to be changed based on the residue numbers of the protein. The 50 corresponds to the fact that the first 50 eigenvectors will be solved and given out in the file evecs.out. The best way to run this is to copy the above lines into a file e.g. pca.ptraj and then use the command: ptraj protein_solvated.prmtop < pca.ptraj > pca.log The pca.log file will contain the values of the first 50 eigenfrequencies corresponding to the vectors in evecs.out.