Data management in substance use disorder treatment research: Implications from data harmonization of National Institute on Drug Abuse-funded randomized controlled trials
Secondary analysis of data from completed randomized controlled trials is a critical and efficient way to maximize the potential benefits from past research. De-identified primary data from completed randomized controlled trials have become increasingly available in recent years; however, the lack of standardized data products is a major barrier to further use of these valuable data. Pre-statistical harmonization of data structure, variables, and codebooks across randomized controlled trials would facilitate secondary data analysis, including meta-analyses and comparative effectiveness studies. We describe a pre-statistical data harmonization initiative to standardize de-identified primary data from substance use disorder treatment randomized controlled trials that were funded by the National Institute on Drug Abuse and are available on the National Institute on Drug Abuse Data Share website.
Standardized datasets and codebooks with consistent data structures, variable names, labels, and definitions were developed for 36 completed randomized controlled trials. Common data domains were identified to bundle data files from individual randomized controlled trials according to relevant concepts. Variables were harmonized if at least two randomized controlled trials used the same instrument. The structures of the harmonized data were determined based on feedback from clinical trialists and substance use disorder research experts.
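The selection rule for harmonization (an instrument qualifies when at least two trials used it) can be illustrated with a minimal Python sketch. The function name, the `min_trials` parameter, and the trial and instrument identifiers are hypothetical placeholders chosen for illustration; they are not taken from the initiative's actual data or tooling.

```python
from collections import defaultdict


def instruments_to_harmonize(trial_instruments, min_trials=2):
    """Return {instrument: sorted list of trials} for every instrument
    used in at least `min_trials` trials (hypothetical helper)."""
    trials_by_instrument = defaultdict(set)
    # Invert the trial -> instruments mapping into instrument -> trials.
    for trial, instruments in trial_instruments.items():
        for instrument in instruments:
            trials_by_instrument[instrument].add(trial)
    # Keep only instruments shared by enough trials.
    return {
        instrument: sorted(trials)
        for instrument, trials in trials_by_instrument.items()
        if len(trials) >= min_trials
    }


# Hypothetical example data: placeholder names, not real trials or instruments.
example = {
    "trial_A": {"instrument_1", "instrument_2"},
    "trial_B": {"instrument_1"},
    "trial_C": {"instrument_2", "instrument_3"},
}
selected = instruments_to_harmonize(example)
```

In this illustrative input, `instrument_3` appears in only one trial and so would be left out of harmonization, while the other two instruments would be retained along with the trials that used them.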
The harmonized data from randomized controlled trials of substance use disorder treatments can potentially promote secondary analysis of completed trials by allowing data from multiple randomized controlled trials to be combined, and can provide guidance for future randomized controlled trials in substance use disorder treatment research.