The Link Between Google and Chemistry
Google PageRank is perhaps not the first network modeling tool to which you would expect chemists to turn in their quest to understand complex systems, such as the hydrogen-bonding network of water molecules and how it influences solvation. However, Aurora Clark, Washington State University, USA, has done just that. She has adapted the original formula devised and patented by Google/Stanford University (Eq. (1), [1]), which orders the web pages in search engine results. The adaptation allows her not only to reduce the chemist’s overall workload, but also to gain new insights into the way molecules interact, either in the bulk or by zooming in on molecular clusters.

$$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \qquad (1)$$

Here T_1 … T_n are the pages that link to page A, C(T) is the number of links going out of page T, and d is a damping factor between 0 and 1, which Brin and Page usually set to 0.85 [1].
Put simply, Google’s original PageRank (PR) formula scores each page on the web according to the hyperlinks that point to it. It calculates the “relevance” of a page from how many pages link to it and how many pages link to those pages in turn, with each linking page sharing its own rank among all the pages it links to. A page that nothing links to receives no support from other pages, whereas a page with many incoming links ranks higher, especially if those links come from pages that are themselves widely linked. A link to your chemistry blog from the American Chemical Society website, which many other pages link to, would thus raise your PR more than a link from a little-known chemistry blog.
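The original formula can be sketched in a few lines of Python. This is a minimal illustration, not Google's production algorithm: it repeatedly applies the rule that a page's rank is (1 − d) plus d times the sum of PR(T)/C(T) over the pages T that link to it, with C(T) the number of outgoing links of T and d = 0.85, until the ranks stop changing. The page names in the example (a hypothetical well-linked "acs" page and two blogs) are invented to mirror the blog example above.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until it converges.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # incoming[p] holds the pages T that link to p; C(T) is T's out-degree.
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in set(targets):
            incoming[dst].append(src)
    out_degree = {p: len(set(links.get(p, []))) for p in pages}

    pr = {p: 1.0 for p in pages}  # start every page at rank 1
    for _ in range(max_iter):
        new = {
            p: (1 - d) + d * sum(pr[t] / out_degree[t] for t in incoming[p])
            for p in pages
        }
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


# Hypothetical web: four pages link to a well-linked "acs" page, which links
# to blog_a; blog_b gets its single link from a small blog nobody links to.
links = {
    "p1": ["acs"], "p2": ["acs"], "p3": ["acs"], "p4": ["acs"],
    "acs": ["blog_a"],
    "small_blog": ["blog_b"],
}
ranks = pagerank(links)
print(round(ranks["blog_a"], 3), round(ranks["blog_b"], 3))
```

Both blogs receive exactly one link, yet blog_a ends up at about 0.71 and blog_b at about 0.28, because the link to blog_a comes from a page that is itself well linked.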
“PageRank is not about just the number of links a page has, but the number of links that the pages connecting a page have”, Clark says. What she and her colleagues have done is to draw an analogy between PR and water’s hydrogen-bond network. “We replace the ‘web page’ with ‘water molecule’ and the ‘hyperlink’ with ‘H-bond’”, she told ChemViews magazine. Just as with web pages and their links, a water molecule in a given arrangement has a certain number of H-bonds, the water molecules bonded to it have their own numbers of H-bonds, and so on. “Bulk water thus has a certain fingerprint if you plot the page rank versus the frequency of occurrence”, Clark adds. This fingerprint provides a map of the entire water network. “That fingerprint changes in a statistically significant way when you have a solute in the water”, she says.
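The water analogy can be illustrated in the same way. The sketch below is not the moleculaRnetworks implementation; it assumes that each H-bond is a directed link from donor to acceptor, uses the normalized form of PageRank (ranks sum to 1, and molecules that donate no H-bond spread their rank evenly), and builds the "fingerprint" as the fraction of molecules falling in each PageRank bin. The function names, the bin width, and the two toy networks are hypothetical.

```python
import math
from collections import Counter


def water_pagerank(n_molecules, hbonds, d=0.85, tol=1e-12, max_iter=1000):
    """Normalized PageRank on a hypothetical H-bond network.

    hbonds lists (donor, acceptor) index pairs; each H-bond is treated as a
    directed link from donor to acceptor (an assumption for this sketch).
    """
    n = n_molecules
    out = [[] for _ in range(n)]
    for donor, acceptor in hbonds:
        out[donor].append(acceptor)

    pr = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by molecules that donate no H-bond is spread evenly.
        dangling = sum(pr[i] for i in range(n) if not out[i])
        new = [(1 - d) / n + d * dangling / n] * n
        for i in range(n):
            for j in out[i]:
                new[j] += d * pr[i] / len(out[i])
        if sum(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr


def fingerprint(pr, bin_width):
    """Fraction of molecules whose PageRank falls in each bin (keyed by bin start)."""
    counts = Counter(math.floor(v / bin_width) for v in pr)
    return {k * bin_width: c / len(pr) for k, c in sorted(counts.items())}


# Toy networks: a 4-molecule ring (0->1->2->3->0) and a star in which
# molecules 1, 2 and 3 each donate an H-bond to molecule 0.
ring = water_pagerank(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
star = water_pagerank(4, [(1, 0), (2, 0), (3, 0)])
print(fingerprint(ring, 0.05))
print(fingerprint(star, 0.05))
```

In the ring every molecule is equivalent, so the fingerprint is a single bin; in the star the central acceptor stands out, giving two bins. In a real analysis the H-bond list would come from each simulation snapshot via some geometric H-bond criterion, a step that lies outside this sketch.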
Water Organization about a Solute
In biology, as well as in chemistry in general, water can perform key chemical and biochemical functions, such as facilitating protein folding or organizing around a solute to dissolve it. The processes involved are not only dazzlingly complex but also fleeting, changing within tiny fractions of a second. The PageRank algorithm is very efficient: it can assess an enormous number of pages, or molecules, and rapidly characterize the connections between them.
“If you want to zoom in from the macroscopic to the molecular scale, and look at the PageRank of the solute, the organization of water about the solute gives the PageRank of the solute a certain value. If water is arranged in one way (shape) as opposed to another, then the PageRank of the solute is different”, explains Clark. “PageRank can thus be used as a structural characterization tool to elucidate water organization about a solute”, she adds.
Clark and her colleagues have run computer simulations of solutes in water, collecting statistics on the solute’s PageRank while watching for chemical reactions, so that they can plot PageRank against chemical reactivity. “What we found is that certain PageRanks, certain shapes/organizations of water about the solute, lead to the solute undergoing a chemical reaction, or not as the case may be.” Chemists are well aware that geometric organization often leads to chemical reactivity, so there is no reason to think that PageRank could not be correlated with other types of reactivity as well.
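Plotting PageRank against reactivity comes down to a conditional frequency: for each range of solute PageRank, what fraction of simulation snapshots was followed by a reaction? The sketch below assumes each snapshot has already been reduced to a (solute PageRank, reacted) pair; the function name, the binning scheme, and the toy data are hypothetical and are not results from Clark's simulations.

```python
import math
from collections import defaultdict


def reactivity_by_pagerank(frames, bin_width):
    """Fraction of snapshots followed by a reaction, per solute-PageRank bin.

    frames is an iterable of (solute_pagerank, reacted) pairs, one per
    simulation snapshot; reacted is True if a reaction followed.
    """
    totals = defaultdict(int)
    reactions = defaultdict(int)
    for pr, reacted in frames:
        k = math.floor(pr / bin_width)
        totals[k] += 1
        reactions[k] += int(reacted)
    # Keys are bin start values, in increasing order.
    return {k * bin_width: reactions[k] / totals[k] for k in sorted(totals)}


# Invented snapshots for illustration only (not simulation results).
frames = [(0.12, False), (0.13, False), (0.31, True), (0.33, True), (0.34, False)]
for start, fraction in reactivity_by_pagerank(frames, 0.1).items():
    print(f"PageRank {start:.1f}-{start + 0.1:.1f}: {fraction:.2f} reacted")
```

Binning keeps the sketch simple; with real trajectories one would also want enough snapshots per bin for the fractions to be statistically meaningful.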
Ultimately, the team hopes to develop their analog of the Google PageRank algorithm to help in drug discovery and to investigate protein misfolding, which is implicated in a wide range of diseases, from Alzheimer’s and Parkinson’s diseases to prion diseases such as Creutzfeldt-Jakob disease, BSE (so-called mad cow disease), and scrapie in sheep. They also suggest that the same technology might be applied to tracking and monitoring the impact of environmental pollutants, including radioactive isotopes, in waterways.
Figure 1. Aurora Clark, Associate Professor of Chemistry at Washington State University, USA, has adapted Google’s PageRank algorithm to determine the way molecules are shaped and organized.
Credit: Washington State University
More information on how Clark et al. have adapted Google PageRank in what they call the “moleculaRnetworks” software, along with download links, can be found on the Aurora Clark Group website. The research is funded by the US Department of Energy (DOE) Office of Basic Energy Sciences.
- MoleculaRnetworks: an Integrated Graph Theoretic and Data Mining Tool to Explore Solvent Organization in Molecular Simulation,
B. L. Mooney, L. R. Corrales, A. E. Clark,
J. Comput. Chem. 2012.
- Determining Polyhedral Arrangements of Atoms Using PageRank,
M. Hudelson, B. L. Mooney, A. E. Clark,
J. Math. Chem. 2012, in press.
- Novel Analysis of Cation Solvation Using Graph Theoretic Approaches,
B. L. Mooney, L. R. Corrales, A. E. Clark,
J. Phys. Chem. B 2012, in press.
[1] S. Brin, L. Page, in Proc. 7th Int. Conf. World Wide Web (WWW) (Eds. P. H. Enslow, A. Ellis), Elsevier, Amsterdam, 1998.