Pseudo amino acid paper pseudo new — and now retracted
The Journal of Computational Chemistry is retracting a 2011 paper by a group of Chinese researchers for duplication.
The article was titled “Predicting Protein Folding Rates Using the Concept of Chou’s Pseudo Amino Acid Composition.” According to the notice:
The following article from the Journal of Computational Chemistry, “Predicting Protein Folding Rates Using the Concept of Chou’s Pseudo Amino Acid Composition,” by Jianxiu Guo, Nini Rao, Guangxiong Liu, Yong Yang, and Gang Wang, published online on 15 February 2011 in Wiley Online Library (wileyonlinelibrary.com), has been retracted by agreement between the authors, the journal’s editors, and Wiley Periodicals, Inc. The retraction has been agreed due to significant overlap with respect to another article, “Predicting Protein Folding Rate from Amino Acid Sequence,” published in Progress in Biochemistry and Biophysics (2010, 37, 1331) and authored by a subset of the present authors.
The paper has been cited 14 times, according to Thomson Scientific’s Web of Knowledge. Here’s the abstract of the article from Medline:
One of the most important challenges in computational and molecular biology is to understand the relationship between amino acid sequences and the folding rates of proteins. Recent works suggest that topological parameters, amino acid properties, chain length and the composition index relate well with protein folding rates, however, sequence order information has seldom been considered as a property for predicting protein folding rates. In this study, amino acid sequence order was used to derive an effective method, based on an extended version of the pseudo-amino acid composition, for predicting protein folding rates without any explicit structural information. Using the jackknife cross validation test, the method was demonstrated on the largest dataset (99 proteins) reported. The method was found to provide a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.81 (with a highly significant level) and the standard error is 2.46. The reported algorithm was found to perform better than several representative sequence-based approaches using the same dataset. The results indicate that sequence order information is an important determinant of protein folding rates.
The Progress in Biochemistry and Biophysics paper shares three authors (Guo, Rao and Liu) with the retracted article and adds two different names, Jie Li and Yun-He Wang. Rao, the corresponding author, is listed as being with the National Natural Science Foundation of China. Here’s the abstract from the article, which was submitted in July 2010 and published online in February 2011:
Prediction of protein folding rate is one of the most important challenges in contemporary biophysics. Over the past few years, many researchers have devoted great efforts to reveal the major determinants of protein folding rate, and many parameters and methods have been proposed successively. However, the interaction of amino acids and the sequence order information have never been considered as a property for predicting protein folding rates. It was proposed a novel method, which adopted Chou’s pseudo-amino acid composition to extract the sequence order information, used Monte Carlo method to choose the optimal feature factors, and established the linear regression model to predict the protein folding rate. This novel method can predict protein folding rate from amino acid sequence without any knowledge of the tertiary or secondary structure, or structural class information. Using the Jackknife cross validation test, for the largest dataset yet studied including 99 proteins, it was found that the predicted folding rates correlated well with the experimental values; the correlation coefficient is 0.81, and the standard error is 2.54. The prediction quality is excelled with most existing sequence-based methods. The result implies that the sequence order information plays an important role in protein folding.
But it seems the pseudo-scientific publishing doesn’t end there. In our search for the PBB paper, we found another article by Guo and Rao with the same title in a third publication, the Journal of Bioinformatics and Computational Biology. Submitted in March 2010 and published in February 2011 — and cited twice since then — its abstract is very close to that of the PBB paper, although not identical:
Predicting protein folding rate from amino acid sequence is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to reflect the correlation between the folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network — genetic algorithm approach, to predict protein folding rates only from amino acid sequences, without any explicit structural information. The originality of this paper is that, for the first time, it tackles the effect of sequence order. The proposed method provides a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.80 and the standard error is 2.65 for 93 proteins, the largest such databases of proteins yet studied, when evaluated with leave-one-out jackknife test. The comparative results demonstrate that this correlation is better than most of other methods, and suggest the important contribution of sequence order information to the determination of protein folding rates.
Originality. First time. Sequence order. Oh my!
It seems pretty clear that this mess was no accident. Submitting three virtually identical papers to different journals at roughly the same time is a fishing expedition. And it’s hard to see how the editors of each publication could have picked up on the duplication before the papers were published.