franklyn
International Hazard
Posts: 3026
Registered: 30-5-2006
Location: Da Big Apple
Mood: No Mood
Anyone doing protein folding?
Contribute to the advancement of biological science by
donating your computer's unused CPU cycles. Join
the distributed computing effort to resolve the protein
folding dilemma.
"In 1969 Cyrus Levinthal introduced the concept, later to be known
as Levinthal Paradox, that a protein cannot find its native state
by a random search through all its possible conformations, because
such an exhaustive enumeration in theory would require eons of
years, while proteins fold in a fraction of a second."
Determining how proteins fold and misfold picks up where the
Human Genome Project left off. Some would say the real
work begins now.
"We have about a billion DNA bases in our genome. That's less than
1GB of hard drive storage! The genetic differences between humans
and chimps are less than a million DNA bases --
small enough to easily fit on a 1.4MB floppy!"
See the project home page for more info here:
http://folding.stanford.edu
http://folding.stanford.edu/about.html
An excellent "how to" resource for installing and managing
the software:
http://www.maximumpc.com/forums/viewtopic.php?t=8036
.
|
|
Ramiel
Vicious like a ferret
Posts: 484
Registered: 19-8-2002
Location: Room at the Back, Australia
Mood: Semi-demented
Yeah, I've been using BOINC (http://boinc.berkeley.edu/) to devote all of my idle time to predictors and [shamefully perhaps]
SETI for about 6 months now. It's a good idea of course, no matter how you look at it.
p.s. Apologies, people, for not posting much in the way of science; I've been down in Denmark.au for a couple of months too, away from my lovely
chemicals.
Caveat Orator
|
|
Polverone
Now celebrating 21 years of madness
Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Mood: Waiting for spring
Yes, I let my PC do some folding when idle. It's my little way of saying "thank you" to the Pande group for porting the Amber force fields to Gromacs,
which I use daily for molecular dynamics simulations.
PGP Key and corresponding e-mail address
|
|
chemoleo
Biochemicus Energeticus
Posts: 3005
Registered: 23-7-2003
Location: England Germany
Mood: crystalline
Protein folding... yeah.
It sounds so interesting, and so much like the final answer to biology, but in reality predicting structure from sequence is almost
impossible.
The Levinthal paradox is always taught at uni, but it is inherently flawed: rather than excluding conformations in which the chain overlaps with
itself, it counts all possible conformations (with three possible conformations per amino acid, a 100-residue chain has 3^100 possibilities). There
are of course far fewer, as many involve internal clashes, and many more are energetically unfavorable. That's why the 'folding funnel' picture was
invoked, in which low-energy structures fold further and further until the lowest-energy one is found. Incrementality, or iteration, is the key
to this.
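The numbers behind the Levinthal argument are easy to reproduce. A back-of-envelope sketch, assuming three states per residue (as in the 3^100 figure above) and an assumed sampling rate of 10^13 conformations per second (an illustrative figure, roughly one conformation per 0.1 ps):

```python
# Back-of-envelope Levinthal estimate (illustrative assumptions only).
STATES_PER_RESIDUE = 3       # assumed discrete states per amino acid
RESIDUES = 100               # chain length used in the post
SAMPLES_PER_SECOND = 1e13    # assumed rate of trying conformations
SECONDS_PER_YEAR = 3.156e7

conformations = STATES_PER_RESIDUE ** RESIDUES   # exact Python integer
seconds = conformations / SAMPLES_PER_SECOND      # int / float -> float
years = seconds / SECONDS_PER_YEAR

print(f"conformations:     {conformations:.2e}")
print(f"exhaustive search: {years:.1e} years")
```

That is roughly 5 × 10^47 conformations and on the order of 10^27 years, the "eons" of the quote; the funnel picture escapes this because the search is biased downhill rather than exhaustive.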
Anyway... I really wonder what algorithm they use to calculate structures from sequence. To my knowledge no such algorithm exists; instead, force
field approximations are made (based on classical Newtonian mechanics, not quantum mechanics). They sometimes work, but mostly they don't, and
structures are often determined *with* knowledge of experimentally determined structures, which biases the predictions.
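The force field approximations mentioned here are, at heart, sums of simple classical terms over bonds, angles, dihedrals and non-bonded atom pairs. A minimal sketch of four such terms in the general functional form used by AMBER/CHARMM-style force fields, in GROMACS-style units (kJ/mol, nm, e); the parameters are placeholders rather than values from any real force field, and angle terms, exclusions and cutoffs are left out:

```python
import math

# Coulomb constant in kJ mol^-1 nm e^-2 (GROMACS-style units).
COULOMB_K = 138.935

def bond_energy(r, r0, k):
    """Harmonic bond stretch: 0.5 * k * (r - r0)**2."""
    return 0.5 * k * (r - r0) ** 2

def torsion_energy(phi, k, n, delta):
    """Periodic dihedral term: k * (1 + cos(n*phi - delta)), angles in radians."""
    return k * (1.0 + math.cos(n * phi - delta))

def lj_energy(r, sigma, epsilon):
    """12-6 Lennard-Jones (van der Waals) term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb_energy(r, qi, qj):
    """Point-charge electrostatics with no dielectric screening."""
    return COULOMB_K * qi * qj / r

# The Lennard-Jones minimum sits at r = 2**(1/6) * sigma with depth -epsilon.
r_min = 2 ** (1 / 6) * 0.3
print(lj_energy(r_min, sigma=0.3, epsilon=0.5))
```

Everything the simulation "knows" about chemistry lives in parameters like these, which is why their quality matters as much as the computing power.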
All the computing power is great, but if the algorithm is incorrect, then it is useless. Get the right algorithm, and the Nobel and eternal fame will
be yours, truly.
Never Stop to Begin, and Never Begin to Stop...
Tolerance is good. But not with the intolerant! (Wilhelm Busch)
|
|
nitro-genes
International Hazard
Posts: 1048
Registered: 5-4-2005
I wonder if the perfect algorithm is even possible. Some proteins depend heavily on the action of chaperones during folding. These chaperones
may actually give the protein a different folding energy distribution from the one predicted by whatever computer algorithm there is. Completely
ab initio calculation of protein structure may not be possible in the near future, but approaches like ROSETTA and Monte Carlo methods are quite
promising, avoiding the "local minimum problem" that plagued some of the first algorithms designed. They can assist in "fine-tuning" NMR- and
crystallography-derived structures to higher resolution and may give a good indication of the structure of an unknown protein. I'm sure that
this way of looking at protein structures will become more and more important as the technique matures...
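To make the "local minimum problem" concrete, here is a toy comparison on a made-up one-dimensional energy landscape (the function is hypothetical, not a protein energy). A purely downhill search stalls in the first dip it finds, while Metropolis Monte Carlo with simulated annealing, which occasionally accepts uphill moves, can cross barriers. This is a generic illustration of the Monte Carlo idea, not ROSETTA's actual move set or scoring function:

```python
import math
import random

def energy(x):
    # Hypothetical rugged 1-D landscape: a broad funnel (0.1*x^2) with
    # ripples (sin) that create many local minima.
    return 0.1 * x * x + math.sin(5.0 * x) + 1.0

def greedy(x0, steps=20000, step=0.05, seed=0):
    """Accept only downhill moves: gets stuck in the nearest local minimum."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    for _ in range(steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if e_new < e:
            x, e = x_new, e_new
    return x, e

def anneal(x0, steps=20000, t_start=2.0, t_end=0.01, step=0.5, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Metropolis criterion: always go downhill, sometimes go uphill.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

print("greedy:", greedy(4.0))
print("anneal:", anneal(4.0))
```

The greedy search starting at x = 4 ends in the well near x ≈ 3.4, while annealing reaches the much lower wells near the bottom of the funnel.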
Reasons enough for me to join the "Dutch Power Cows" in their epic battle to become champion of the peptides!
A good, understandable book, btw, that explains protein structure prediction from sequence was "Introduction to Bioinformatics" by Lesk, at least for me...
[Edited on 26-7-2006 by nitro-genes]
|
|
Polverone
Now celebrating 21 years of madness
Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Mood: Waiting for spring
The folding@home project is as much about testing methods as actually trying to fold proteins. I don't think there's any reason, in principle, why
Newtonian methods can't handle most folding problems; excited states and the making/breaking of covalent bonds aren't typically involved, are they?
It's conformational changes and weak interactions within the molecule and with solvent and ions, right?
There are 3 big obstacles to protein folding with force field based methods, as I see them:
1) The force fields may be too simple to reproduce the path that leads to experimental structures. For example, maybe it turns out that explicit
polarization is necessary and the old AMBER and CHARMM workhorses would never, ever give rise to correct folding for most proteins.
2) The force fields may be theoretically adequate to represent the process, but not appropriately parameterized to accomplish it in practice.
Water models, ion models, and biopolymer models all get parameterized based on different criteria. Parameterizations take substantial shortcuts
because of the limitations of the available electronic structure calculations and experimental data. Force field parameters explicitly dedicated
to protein folding might be necessary for good results.
3) If I'm to believe Wikipedia, proteins of a few hundred amino acids fold on a time scale of milliseconds. That's an enormous length of time for a
molecular dynamics simulation. Nanosecond simulations are routine, but microseconds are still quite exotic. Milliseconds are staggering, and the
slowness with which millisecond-length simulations can be produced will likely impede the iterative improvement of points 1 and 2.
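Point 3 is easy to quantify. MD integrators advance the system in femtosecond time steps, so simulated time converts directly into step counts. A rough sketch, assuming a 2 fs step and an invented throughput of 10 ns of simulated time per day (real figures depend entirely on system size and hardware):

```python
# Assumptions for illustration: a 2 fs integration time step and a
# hypothetical throughput of 10 ns of simulated time per day of computing.
TIMESTEP_FS = 2.0
ASSUMED_NS_PER_DAY = 10.0

FS_PER_UNIT = {"nanosecond": 1e6, "microsecond": 1e9, "millisecond": 1e12}

for unit, fs in FS_PER_UNIT.items():
    steps = fs / TIMESTEP_FS
    days = (fs / 1e6) / ASSUMED_NS_PER_DAY   # simulated ns / (ns per day)
    print(f"1 {unit}: {steps:.0e} steps, about {days:,.1f} days")
```

Under these assumptions a single millisecond trajectory is about 5 × 10^11 steps, or roughly 270 years of continuous computing on one such machine.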
For all we know, the algorithm(s) needed to properly fold proteins are already here: we just can't see them because our computers are too slow and/or
the parameters chosen so far have been poor.
|
|
chemoleo
Biochemicus Energeticus
Posts: 3005
Registered: 23-7-2003
Location: England Germany
Mood: crystalline
From what I understand, the biggest problem resides in 2): the presence of molecules *other* than the protein molecule itself. To my knowledge, most
simulations model folding in vacuo, that is, in the absence of water, ions, cofactors, or simply other proteins without which the protein in question
cannot adopt the correct fold. How are we ever going to account for the water molecules around a given protein molecule? We cannot even calculate the
dynamic hydration cage around a simple small amphipathic molecule, for instance. How can this then work with large molecules such as proteins?
Furthermore, many proteins simply do not fold under low-salt conditions. Physiological ionic strength within cells is around 150 mM, with all sorts
of counterions, such as K, Na, Ca and Cl, SO4, PO4 etc. How can you calculate the essentially infinite number of highly dynamic 'conformations' of
the solvent surrounding the protein and keeping it stably in solution?
This is why proteomics-based approaches are essentially doomed to fail. Unlike with DNA, no one can purify proteins and quantify them in their folded
states in simple chip assays. Each protein is essentially a new chemical in its own right. General purification protocols are only applicable to simple,
small, soluble proteins, and fail as soon as complexes or larger proteins are involved.
Are you aware of programs that, apart from empirical parameterisation, rigorously and physically account for the presence of solvent molecules?
We can't even do it for simple molecules, so what makes you think it will be possible to fold large proteins in silico? Regarding problems that are
non-Newtonian in nature: from NMR I know that the environment has huge effects on things such as the electron clouds, which is seen through the
observable effect on the spin resonances (chemical shifts). No one can even attempt to predict how solvents will affect the resonances, so how can
you think we can predict the folding of a whole protein? Then you have the 3-body problem. Here you have thousands of 'bodies', making an exact
analytical solution impossible. This, I suppose, is why such large computational power is required: to approach solutions iteratively...
|
|
Polverone
Now celebrating 21 years of madness
Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Mood: Waiting for spring
MD simulations routinely use explicit water and counterions. Vacuum simulations of DNA fail horribly after a short time, so I always include water.
Are you sure you can't see hydration cages around small molecules? I've written little programs to analyze hydrogen bonding networks among and
residence times of water molecules around altered DNAs, and you can definitely see enhanced hydration effects around newly introduced polar groups
(this is all from analysis of MD simulations I've run).
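The analysis programs mentioned here aren't shown, but the usual ingredients are simple. A minimal sketch, assuming coordinates in nm: a geometric hydrogen-bond test (distance and angle cutoffs whose values here are assumptions, not universal constants) and a residence-time estimate from a per-frame occupancy series. All function names are hypothetical:

```python
import math

def angle_deg(vertex, p, q):
    """Angle p-vertex-q in degrees."""
    v1 = [a - b for a, b in zip(p, vertex)]
    v2 = [a - b for a, b in zip(q, vertex)]
    cos_t = sum(a * b for a, b in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def is_hbond(donor, hydrogen, acceptor, max_da_nm=0.35, max_angle_deg=30.0):
    """Geometric criterion: donor-acceptor distance and hydrogen-donor-acceptor
    angle. The default cutoffs are assumed, commonly used values."""
    if math.dist(donor, acceptor) > max_da_nm:
        return False
    return angle_deg(donor, hydrogen, acceptor) <= max_angle_deg

def mean_residence(occupied, dt_ps):
    """Mean length (ps) of uninterrupted runs in a per-frame True/False series,
    e.g. 'is a water molecule within the hydration shell of this group?'."""
    runs, current = [], 0
    for flag in occupied:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return dt_ps * sum(runs) / len(runs) if runs else 0.0

print(is_hbond((0, 0, 0), (0.1, 0, 0), (0.28, 0, 0)))   # True: linear, 0.28 nm
print(mean_residence([True, True, False, True, False, False, True, True, True], dt_ps=1.0))  # 2.0
```

Comparing such residence times around a modified site with those around the unmodified molecule is one way to see the "enhanced hydration" effect described above.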
We don't need exact or analytical solutions to the folding problem; iterative and/or stochastic solutions are fine as long as the later stages are
close enough to real protein behavior. Being able to get arbitrarily close to the solution, as with other many-body problems, should be fine.
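The "iterative" approach to a many-body problem is exactly what an MD integrator does: no closed-form answer, just repeated small time steps whose error stays controlled. A minimal sketch of velocity Verlet (a common MD integrator, chosen here for illustration) for three Lennard-Jones particles in 2-D, in reduced units with unit masses; the starting geometry is arbitrary:

```python
def lj_forces(pos):
    """Forces and potential energy for 12-6 Lennard-Jones pairs (sigma = epsilon = 1)."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            r2 = dx * dx + dy * dy
            inv6 = 1.0 / r2 ** 3                            # r^-6
            potential += 4.0 * (inv6 * inv6 - inv6)
            scale = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2  # |F| / r
            forces[i][0] += scale * dx
            forces[i][1] += scale * dy
            forces[j][0] -= scale * dx
            forces[j][1] -= scale * dy
    return forces, potential

def velocity_verlet(pos, vel, dt, steps):
    """Integrate unit-mass particles in place; return total energy after each step."""
    forces, potential = lj_forces(pos)
    energies = []
    for _ in range(steps):
        for p, v, f in zip(pos, vel, forces):
            v[0] += 0.5 * dt * f[0]          # half kick
            v[1] += 0.5 * dt * f[1]
            p[0] += dt * v[0]                # drift
            p[1] += dt * v[1]
        forces, potential = lj_forces(pos)   # forces at the new positions
        for v, f in zip(vel, forces):
            v[0] += 0.5 * dt * f[0]          # second half kick
            v[1] += 0.5 * dt * f[1]
        kinetic = 0.5 * sum(v[0] ** 2 + v[1] ** 2 for v in vel)
        energies.append(kinetic + potential)
    return energies

pos = [[0.0, 0.0], [1.2, 0.0], [0.6, 1.0]]   # arbitrary, slightly strained triangle
vel = [[0.0, 0.0] for _ in pos]
energies = velocity_verlet(pos, vel, dt=0.002, steps=5000)
print(f"E start {energies[0]:.6f}, drift {max(energies) - min(energies):.2e}")
```

The total energy stays essentially constant over thousands of steps, which is the practical sense in which such a stepwise solution can be made arbitrarily close to the true dynamics by shrinking the time step.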
As I said before, I don't know if current force field models are capable of handling folding well, but I wouldn't dismiss the entire approach so
quickly. At the same time I would remain cautious because it will take decades of Moore's law, or at least many years of development of dedicated
hardware, to make millisecond-length MD simulations routine, and that's regardless of whether or not current force fields are good enough. Fancier
force fields are generally slower as well, though over time that's offset by improved implementation techniques.
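A rough check on "decades of Moore's law", assuming routine simulations must grow from nanoseconds to milliseconds (a factor of 10^6) and that usable MD throughput doubles every 18 or 24 months (both assumptions):

```python
import math

speedup_needed = 1e6                    # 1 ms / 1 ns
doublings = math.log2(speedup_needed)   # about 19.9 doublings

for months_per_doubling in (18, 24):
    years = doublings * months_per_doubling / 12
    print(f"doubling every {months_per_doubling} months: ~{years:.0f} years")
```

That gives roughly three to four decades under these assumptions; dedicated hardware or algorithmic gains would shorten it.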
|
|
chemoleo
Biochemicus Energeticus
Posts: 3005
Registered: 23-7-2003
Location: England Germany
Member Is Offline
Mood: crystalline
|
|
OK, I confess I am not up to date on the inclusion of solvent in MD simulations. At the time of my undergrad, we were certainly taught that this was an
insurmountable problem. I wonder how it is accounted for. I shall ask around in the lab later.
Also, how does DNA fail so horribly in vacuo? Does it simply not form the double helix? Are the positions of the water molecules realistic; do they
overlap with the H2O positions in, e.g., crystal structures?
By the way, what I am talking about is the precise locations of the water molecules, which are of course extremely dynamic, but presumably 'average out' over
time.
Anyway, experimentally observed folding of proteins is still far from clearly matching simulated folding, although there are a couple of reports
that claim this (and of which I am doubtful; see Valerie Daggett's papers).
|
|
Polverone
Now celebrating 21 years of madness
Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Mood: Waiting for spring
Explicit solvent has been available in popular MD codes for decades. Proteins and small molecules don't do such awful things as nucleic acids do in
vacuo, though, so maybe the simplified for-undergraduates story you heard was "we can't do solvated simulations" when they meant "we don't want to
spend the computer time to do solvated simulations."
Some popular water models are/were SPC, SPC/E, and TIP3P. Since water molecules are very numerous in solvated systems, and people generally care more
about solute behavior than solvent behavior, they've generally been computationally simple models parameterized on (for example) reproducing the
density of liquid water. I wouldn't be surprised at all if they fail to match many experimental hydration effects. Anecdotally, I have simulated one
DNA starting from an experimental structure that had a single water molecule explicitly included (in the "hole" left by an abasic site), and water
molecules diffused in and out of that location over the course of the whole simulation.
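To show how simple these water models are: SPC/E and TIP3P are rigid three-site models, with a point charge on each atom and a single Lennard-Jones site on the oxygen. A sketch computing each model's permanent dipole from its geometry and charges; the parameters below are the commonly tabulated published values, quoted for illustration and worth checking against the original papers before real use:

```python
import math

E_ANGSTROM_IN_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye

# Rigid three-site models: H charge (e), O-H length (Angstrom), H-O-H angle (degrees).
# The oxygen carries -2 * q_h, so each molecule is neutral.
WATER_MODELS = {
    "SPC/E": {"q_h": 0.4238, "r_oh": 1.0, "hoh_deg": 109.47},
    "TIP3P": {"q_h": 0.417, "r_oh": 0.9572, "hoh_deg": 104.52},
}

def dipole_debye(q_h, r_oh, hoh_deg):
    """Dipole of a point-charge water: each H contributes q_h times the projection
    of its O-H bond on the H-O-H bisector (O charge placed at the origin)."""
    half_angle = math.radians(hoh_deg / 2.0)
    return 2.0 * q_h * r_oh * math.cos(half_angle) * E_ANGSTROM_IN_DEBYE

for name, params in WATER_MODELS.items():
    print(f"{name}: {dipole_debye(**params):.2f} D")
```

Both come out at about 2.35 D, well above the gas-phase water dipole of about 1.85 D, because these effective models fold the average polarization of the liquid into fixed charges.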
Sometimes you get lucky: for example, with the right choices the SPC water model does an excellent job of reproducing the relative free energies of
hydration of chloride and bromide ions, and it's about as simple as water models get. But this is coincidence, not design. Michael Shirts has written
some very interesting papers on highly accurate, precise, reproducible determination of relative and absolute solvation free energies using molecular
dynamics (the folding@home project produced at least some of the data that he used for this). He found that current water models are all pretty bad at
reproducing solvation free energies, but now that simulation on the necessary grand scale is available, it's possible to reparameterize models with
the explicit goal of correcting these free energies.
It's possible in many packages to use non-water solvents also, but water is by far the most common and its performance is specially optimized in e.g.
Gromacs and probably other packages too; I could solvate my systems in methanol or THF, but it'd be considerably slower.
In vacuum, electrostatic repulsion between the backbones will rapidly disintegrate the double helix. You can fake it for a while (a few hundred
picoseconds) if you use a short cutoff for coulombic interactions, so the charges are rarely "seen" by the paired strands, but I think even those
simulations eventually fall apart. For stability we add explicit water, explicit Na+ counterions, and model long-range coulombic interactions with
particle mesh Ewald or fast multipole method (this isn't a way to "cheat" the charges out of the system, just a way to speed up simulation and keep
the computational expense scaling under control).
|
|
unionised
International Hazard
Posts: 5130
Registered: 1-11-2003
Location: UK
Mood: No Mood
I think it's a safe bet we are all doing protein folding. Otherwise we would be dead.
|
|
franklyn
International Hazard
Posts: 3026
Registered: 30-5-2006
Location: Da Big Apple
Mood: No Mood
Quote: | Originally posted by unionised
I think it's a safe bet we are all doing protein folding. Otherwise we would be dead. |
http://www.sciencemadness.org/talk/viewthread.php?tid=4442&a...
.
|
|
Nerro
National Hazard
Posts: 596
Registered: 29-9-2004
Location: Netherlands
Mood: Whatever...
Wouldn't it be much easier to just save up a lot of models of proteins yet to be folded and let a supercomputer do them in a few hours? Or is the
combined power of the people on the web (those who want to participate, that is, not everyone on the web) already equal to a supercomputer?
Also, if we have 1 billion nucleobases, that would require almost 2 GB, because there are four "bits" in DNA rather than the two bits used on a
hard drive. Every base would have to be encoded using 2 "hard drive bits": 00, 01, 10 or 11.
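The two-bits-per-base encoding is right, but converting bits to bytes needs a further factor of 8. A quick check, taking the one-billion figure from the quote at face value and 1.44 MB as a nominal floppy capacity:

```python
BITS_PER_BASE = 2    # A, C, G, T -> 00, 01, 10, 11
BITS_PER_BYTE = 8

def bases_to_megabytes(n_bases):
    return n_bases * BITS_PER_BASE / BITS_PER_BYTE / 1e6

def bases_per_megabytes(megabytes):
    return megabytes * 1e6 * BITS_PER_BYTE / BITS_PER_BASE

print(bases_to_megabytes(1e9))     # 250.0 (MB for one billion bases)
print(bases_per_megabytes(1.44))   # bases that fit on a nominal 1.44 MB floppy
```

So one billion bases at 2 bits each is about 250 MB rather than 2 GB, and a nominal 1.44 MB floppy holds roughly 5.8 million bases.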
#261501 +(11351)- [X]
the "bishop" came to our church today
he was a fucken impostor
never once moved diagonally
courtesy of bash
|
|
franklyn
International Hazard
Posts: 3026
Registered: 30-5-2006
Location: Da Big Apple
Mood: No Mood
Quote: | Originally posted by Nerro
Wouldn't it be much easier to just save up a lot of models of proteins yet to be folded and to let a supercomputer do it in a few hours?
|
The problem is to determine the spatial configuration of a given protein chain.
It's called folding in a similar sense to origami, in which a sheet of paper can
be made into countless different shapes.
A very good analogy is to dangle a chain of the kind used for jewelry and, as it
is set down, observe the particular pattern it forms as it bundles into a heap.
This can be repeated indefinitely and the pattern will never repeat.
This illustrates the problem: each protein has a definite, characteristic shape.
How does it find it, out of an almost literal infinity of possibilities?
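The chain analogy can be made quantitative with a lattice toy model: treat the chain as a walk on a square grid that may not revisit a site (the no-overlap constraint raised earlier in the thread). A brute-force count of such self-avoiding walks shows that the number of allowed shapes is finite for a given length but grows exponentially. This is a textbook toy model, not how Folding@home represents proteins:

```python
def count_saws(n_steps):
    """Count self-avoiding walks of n_steps on the 2-D square lattice by brute force."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(site, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (site[0] + dx, site[1] + dy)
            if nxt not in visited:          # self-avoidance: no site used twice
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)         # backtrack
        return total

    return extend((0, 0), {(0, 0)}, n_steps)

for n in range(1, 11):
    print(n, count_saws(n))
```

The counts run 4, 12, 36, 100, 284, ... and for long chains each extra step multiplies the total by roughly 2.64, so the search space explodes even on a coarse grid.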
.
|
|
franklyn
International Hazard
Posts: 3026
Registered: 30-5-2006
Location: Da Big Apple
Mood: No Mood
http://www.whatsnextnetwork.com/technology/index.php/2006/09...
.
|
|
Nerro
National Hazard
Posts: 596
Registered: 29-9-2004
Location: Netherlands
Mood: Whatever...
Well, it's not exactly an infinite supply of possibilities. After all, the energy varies with every way the protein is folded, and certain sequences
always form helices or beta-sheets. Proteins may have similar structures, or partially similar sequences.
Are you really telling me it takes so long to do a large number of rough approximations? Once those are done, the most probable ones could be
calculated properly later...
Sorry for the critical look; it just seems like a lot of people are making a mountain out of a molehill.
|
|
unionised
International Hazard
Posts: 5130
Registered: 1-11-2003
Location: UK
Mood: No Mood
"well, it's not exactly an infinate supply of possibilities. "
Are you sure?
The angle between 2 bits of the protein might be 180 degrees, it might be 90, and it might be any of an infinite variety of angles in between.
Anyway, supercomputers are expensive, whereas getting people to donate computer time via screensavers a la SETI is cheap.
Presumably, if it were a computationally simple problem, then as soon as someone downloaded the software and gave it a few minutes it would all be over
and done with.
|
|
neutrino
International Hazard
Posts: 1583
Registered: 20-8-2004
Location: USA
Mood: oscillating
Supercomputers really aren’t as powerful as people think.
A supercomputer is nothing more than a large number of smaller computers in an array. Put together a bunch of desktops and you can rival the most
powerful supercomputers in existence. In fact, Virginia Tech did this a few years ago:
Quote: | from the Wikipedia
System X is a supercomputer assembled by Virginia Tech in the summer of 2003, comprising 1,100 Apple PowerMac G5 computers….On November 16, 2003, it
was ranked by the TOP500 list as the third-fastest supercomputer in the world.
|
A large network of ordinary desktops really does possess an awe-inspiring amount of computing power.
|
|
Polverone
Now celebrating 21 years of madness
Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Mood: Waiting for spring
Quote: | Originally posted by neutrino
Supercomputers really aren’t as powerful as people think.
A supercomputer is nothing more than a large number of smaller computers in an array. Put together a bunch of desktops and you can rival the most
powerful supercomputers in existence.
A large network of ordinary desktops really does possess an awe-inspiring amount of computing power. |
This is only true for some problems. The F@H and all other @Home projects work because they can break up their research into many small sub-problems
that can execute independently. These are "embarrassingly parallel" problems that are trivial to scale up. Many problems need rapid and frequent
communication between processors to get any parallel speedup, and that's where multiprocessors, clusters, and "true" supercomputers shine a lot
brighter than a big pile of PCs with special screensavers.
The line between cluster and "true" supercomputer is blurred, but both of them are almost sure to have very high speed connections between the nodes
of the computer. Gigabit ethernet is entry-level nowadays, but faster and more specialized setups like Infiniband, Myrinet, or Quadrics are needed to
get decent scaling on many problems. When inter-node communication gets too slow, the processors waste their time waiting for information from their
neighbors. The larger the number of processors, the more likely (in general) that there will be significant slowdown due to communication bottlenecks.
Eventually you reach a point where adding new processors does no good at all. Making communication-heavy problems take advantage of a large number of
processors requires a careful mix of software and hardware capabilities, and is very difficult.
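The communication bottleneck can be illustrated with a deliberately crude runtime model (all constants are invented): a fixed serial part, a perfectly divisible part, and a communication cost assumed to grow linearly with the number of processors. With no communication cost, roughly the embarrassingly parallel case, speedup keeps rising; with even a small per-processor cost it peaks and then falls:

```python
def runtime(p, work=1000.0, serial=1.0, comm=0.5):
    """Toy model: serial time + parallel work split p ways + communication
    overhead assumed to grow linearly with the number of extra processors."""
    return serial + work / p + comm * (p - 1)

def speedup(p, **params):
    return runtime(1, **params) / runtime(p, **params)

print("procs  with-comm  no-comm")
for p in (1, 8, 45, 200, 1000):
    print(p, round(speedup(p), 1), round(speedup(p, comm=0.0), 1))
```

In this model the best processor count is near sqrt(work / comm), about 45 here; beyond that, extra processors mostly wait on communication.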
|
|
Wolfram
Hazard to Others
Posts: 133
Registered: 13-10-2003
Mood: No Mood
Are you sure the known protein structures are right? Couldn't the protein structure be modified when proteins form crystals?
|
|
nitro-genes
International Hazard
Posts: 1048
Registered: 5-4-2005
No, because both crystallography and NMR can be used to determine protein structure. The advantage of NMR is that the protein can be studied in
solution, so there is no need to grow perfect crystals of the protein you want to study, which is not always easy. Crystallography does, however,
give higher resolution, but generally the structures determined by NMR and crystallography are not far apart...
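Agreement between NMR and crystal structures is usually quantified as a root-mean-square deviation (RMSD) of atomic positions after optimal superposition. A minimal sketch using the Kabsch algorithm (NumPy assumed; the coordinates are synthetic, not real structures):

```python
import numpy as np

def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)                  # remove translation
    b = b - b.mean(axis=0)
    h = a.T @ b                             # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
theta = np.radians(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = coords @ rz.T + np.array([5.0, -2.0, 1.0])   # rotated and shifted copy

print(kabsch_rmsd(coords, moved))   # essentially zero: same shape
print(kabsch_rmsd(coords, moved + rng.normal(scale=0.5, size=moved.shape)))
```

Rotating and translating a structure leaves the RMSD at zero; only genuine differences in shape, such as the added noise, raise it.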
|
|
Pyridinium
Hazard to Others
Posts: 258
Registered: 18-5-2005
Location: USA
Mood: cupric
Re: anyone doing protein folding?
This might be a huge oversimplification, but I always figured proteins "know" how to fold in the same way any other molecules "know" to form certain
compounds or stacked complexes in solution. It's just a difference of scale.
I'm looking in Proteins by T.E. Creighton (great book). It says a typical globular protein has maybe 30% of its amino acids at turns where they don't
participate in folding bonds. I guess some passable folding algorithms for a given protein could be made by putting together tidbits like these...
although computational chemistry was never my favorite subject.
Creighton's book hints that intracellular proteins are actually folding and unfolding all the time in a cell. I think I remember the profs telling us
the same thing. Strange, but I guess it makes sense if they are only H-bonded and not pegged together with disulfides.
They'd still have to spend more time folded than they would unfolded in order to do anything useful, correct?
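On that last question: in a simple two-state picture, the fraction of time a protein spends folded follows from its free energy of unfolding via a Boltzmann factor. A sketch with illustrative ΔG values (not measurements for any particular protein):

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def fraction_folded(dg_unfold_kj, temp_k=298.15):
    """Two-state model: dg_unfold_kj = G(unfolded) - G(folded) in kJ/mol.
    K = [U]/[F] = exp(-dG/RT); fraction folded = 1 / (1 + K)."""
    k_unfold = math.exp(-dg_unfold_kj / (R_KJ * temp_k))
    return 1.0 / (1.0 + k_unfold)

for dg in (0.0, 5.0, 20.0, 40.0):
    print(f"dG_unfold = {dg:4.1f} kJ/mol -> {fraction_folded(dg):.5f} folded")
```

Even a modest positive ΔG keeps the protein folded nearly all of the time while still allowing occasional unfolding.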
EDIT: I should have mentioned that it's mainly the structural / extracellular proteins that have disulfide bonds. Intracellular proteins are more
often strictly H-bonded. Yes, I'm sure you could find some exceptions to this.
[Edited on 18-5-2007 by Pyridinium]
|
|
Taaie-Neuskoek
Hazard to Others
Posts: 222
Registered: 14-5-2004
Location: Zermany
Mood: Botanical!
Quote: | This is why proteomics-based approaches are essentially doomed to fail. No one can purify proteins, quantify them in their folded states, in simple
Chip assays, unlike DNA. Each protein is essentially a new chemical on its own. General purification protocols are only applicable to soluble simple
small proteins, but fail as soon as complexes, or larger proteins are involved. |
As someone working in a group where people do a lot of proteomics, I tend to disagree. OK, the proteome is still a really big black box, but there are
some really interesting developments... now there's ChIP-chip, so you can map transcription factor binding more easily, and there are people who have made
protein arrays onto which you can dump your favourite MAPK and see which proteins get phosphorylated, etc. Proteomics people themselves freely admit
that proteomics is still in its very early days and that a hell of a lot still needs to be done, but I believe it will give scientists some great
tools... and at least they're honest about it, unlike metabolomics people, who promise a hell of a lot more than they can actually deliver.
In our group we do some very basic proteomics work, mainly 2D gels, trying to find differences between genotypes (plants, in our case), which
seems to work well. Sometimes even in-gel assays for native proteins are done successfully. This technique has also been automated, and has been
delivering quite good results...
Never argue with idiots, they drag you down to their level and beat you with experience.
|
|
pantone159
National Hazard
Posts: 591
Registered: 27-6-2006
Location: Austin, TX, USA
Mood: desperate for shade
My BOINC project
I am about to start doing some of this stuff. I will be working on a distributed computing project, also based on BOINC; in my case the goal is to
study cell adhesion and migration by molecular modeling of membrane proteins.
I don't understand the science much yet, at first I will be focused on the programming tasks of getting a placeholder computing system working, which
will take some time, and after that will start to look at distributing our existing codes. (I don't know much about the specific models/codes that
have been used so far. For example, I don't know how they compare to the Folding@home and Rosetta@home models.)
However, I see that some of the people here are really familiar with these kinds of models. I'm interested in pointers on where I can learn how these
models work.
Regarding distributed computing systems... It seems like there are two main distributed computing frameworks that people use: one is BOINC (Berkeley
Open Infrastructure for Network Computing), which is used by Seti@home, Rosetta@home, and will be used by our stuff, and the other is Cosm (perhaps also
named Mithril), which is used by Folding@home.
We picked BOINC as it seems to be more heavily used, so under more active development, more feature-filled, and more general. (I think that the exact
same code that you downloaded to run Seti@home could be used to run our stuff, just by pointing it at a different website to get instructions from.)
At some point (not yet), we will be interested in volunteers, mainly for basic testing purposes at first.
So, my main questions are:
1 - Where should I go to read about the models used? I know little specifically about molecular dynamics models, although I do generally understand
the underlying physics.
2 - Any advice regarding distributed computing?
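On distributed computing: volunteer hosts sometimes return wrong results (unstable overclocked hardware, buggy builds, occasional cheating), and BOINC supports sending the same work unit to several hosts and validating the returned results against each other. A minimal, hypothetical sketch of that quorum idea for a single numeric result; it is not BOINC's validator API, and real validators typically compare whole output files using project-specific rules:

```python
def validate(results, quorum=2, tol=1e-6):
    """Return a canonical result once at least `quorum` returned results agree
    within `tol` (absolute difference); return None if no quorum is reached yet.
    Hypothetical helper for illustration only."""
    for candidate in results:
        agreeing = [r for r in results if abs(r - candidate) <= tol]
        if len(agreeing) >= quorum:
            return candidate
    return None

print(validate([0.731, 0.7310000004, 12.5]))  # two hosts agree -> 0.731
print(validate([0.731, 12.5]))                # no agreement yet -> None
```

Tolerance-based comparison matters because different CPUs and compilers can produce slightly different floating-point results for the same work unit.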
|
|