Effective Health Care Program

Accuracy of Data Extraction of Non-English Language Trials With Google Translate

Research Report


Structured Abstract


Systematic reviews aim to include all relevant evidence, but study eligibility is often restricted to English language publications for practical reasons. Google Translate, a free Web-based translation resource, has recently become available. However, it is unknown whether its translation accuracy is sufficient for Evidence-based Practice Center (EPC) systematic reviews. We therefore formally evaluated the accuracy of Google Translate for the purpose of data extraction from non-English language articles.


We retrieved 10 randomized controlled trials (RCTs) in each of eight languages (Chinese, French, German, Italian, Japanese, Korean, Portuguese, and Spanish) and eight observational studies in Hebrew. Eligible studies were RCTs that reported results data per treatment group (except for Hebrew language studies, where no RCTs were identified). Each article was translated into English using Google Translate, and the time required to translate each study was tracked. Data from the original language versions of the articles were extracted by one of 10 fluent speakers who were current or former members of our EPC. The English translated versions of the articles were extracted by one of five current EPC researchers who did not speak the given language. These five researchers also performed double data extraction of 10 English language RCTs. Data extracted included eligibility criteria, treatment description, study descriptors, quality issues, outcome description, and results. Extractors were also asked to estimate how much extra time was required for extraction compared with a similar English language article. For each study, the pair of data extractions was compared for agreement on each extracted item. We analyzed the percent agreement within the set of studies in each language, both for each extraction item and for groups of extraction items. We defined “high agreement” as at least 80 percent agreement within an item or article. Using nonparametric tests, we compared the degree of agreement for each language with that of the English language study comparisons.
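The agreement calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the item names, the sample data, and the coding of each comparison as a simple agree/disagree flag are hypothetical and are not taken from the report. Only the 80 percent threshold comes from the text, and the nonparametric comparison with English is not shown.

```python
from typing import Dict, List

HIGH_AGREEMENT = 0.80  # "high agreement" threshold stated in the abstract


def item_agreement(studies: List[Dict[str, bool]]) -> Dict[str, float]:
    """Percent agreement per extraction item within one language.

    Each dict represents one study and maps an item name to True when the
    original-language and translated extractions agreed on that item.
    """
    items = studies[0].keys()
    return {item: sum(s[item] for s in studies) / len(studies) for item in items}


def proportion_high(agreement: Dict[str, float]) -> float:
    """Share of items whose agreement meets the high-agreement threshold."""
    return sum(a >= HIGH_AGREEMENT for a in agreement.values()) / len(agreement)


# Hypothetical data: five studies in one language, three extraction items.
studies = [
    {"eligibility": True, "treatment": True, "results": False},
    {"eligibility": True, "treatment": True, "results": True},
    {"eligibility": True, "treatment": False, "results": False},
    {"eligibility": True, "treatment": True, "results": True},
    {"eligibility": False, "treatment": True, "results": False},
]

agreement = item_agreement(studies)
share_high = proportion_high(agreement)
print(agreement)
print(f"Items with high agreement: {share_high:.0%}")
```

In this made-up example, two of the three items reach the threshold and the results item does not, which mirrors the kind of per-language summary the analysis reports.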


The length of time required to translate articles ranged from seconds (51 articles, 58 percent) to about 1 hour. The English language data extractors reported that “a little” extra time was required for 40 articles (45 percent) and “a lot” for 42 (48 percent). When all extraction items were evaluated together, Portuguese and German articles had the best agreement between original and translated extractions, with high agreement between extractors for about 60 percent of the items, compared with 80 percent for English articles. Spanish, Hebrew, and Chinese had the lowest agreement (30 percent, 24 percent, and 8 percent, respectively). Both the absolute agreement and the proportion of items with high agreement were statistically significantly worse for all languages than for English. Eight of 10 English language articles had high agreement for all items, compared with 7 of 10 Portuguese articles, 6 of 10 German articles, 4 of 10 each of French, Italian, and Korean articles, 3 of 8 Hebrew articles, 3 of 10 each of Japanese and Spanish articles, and no Chinese articles.


Translation was not always possible but generally required few resources. Across all languages, data extraction from translated articles was less accurate than from English language articles. Accurate extraction was possible for some articles in every language except Chinese, with Portuguese and German articles yielding the most accurate extractions. Google Translate has the potential to reduce language bias; however, reviewers may need to be more cautious about using data from translated articles.