ABSTRACT
Background and Aim: Little is known about data integration in public health research and its impact. This study aimed to summarize known collaboration information, the characteristics of the datasets used, the methods of data integration, and knowledge gaps.
Materials and Methods: We reviewed papers on infectious diseases that used two or more datasets and were published during 2009-2018, before the coronavirus disease pandemic. Two independent researchers searched the Medline and Global Health databases using predetermined criteria.
Results: Of the 2375 items retrieved, 2272 titles and abstracts were reviewed. Of these, 164 proceeded to secondary (full-text) review. After excluding 11 papers that did not meet our inclusion criteria, we identified 153 relevant articles. Of the 153 papers, 150 were single-country studies. The largest share of papers came from North America (n=47). Viral diseases were the most commonly researched diseases (n=66), and many studies sought to define infection rates (n=62). Data integration most often employed unique national identifiers (n=37) or address-based identifiers (n=30). Most studies combined two data sources (n=121), and at least one data source typically included routine surveillance information.
Conclusion: We found growing use of data integration in infectious disease research. This finding highlights the advantages of data integration and linkage analysis and reiterates their importance in public health emergency preparedness and response.
Keywords: data integration, infectious disease, national policy, public health, scoping review, surveillance.