Determining genetic variants associated with safety and efficacy in oncology
We performed a meta-analysis of six clinical trials of a licensed biological therapy. Patients had been genotyped for a series of genetic markers in genes of the VEGF pathway, and we fitted Cox proportional hazards models to determine associations with efficacy and safety. Elastic Net analyses and gene-wise tests were run; we created graphical displays such as forest plots and Kaplan–Meier curves, and two manuscripts were published (see ‘News’).
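As an illustration of the survival-curve side of this work, the sketch below implements the Kaplan–Meier product-limit estimator in plain Python. The data are invented for the example; the actual analyses used full Cox proportional hazards models with covariates, which are not shown here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimator for right-censored data.
    times  : observed follow-up times (event or censoring)
    events : 1 if the event occurred at that time, 0 if censored
    Returns (time, survival probability) pairs at each event time."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        d = 0  # events at time t
        m = 0  # subjects (events + censorings) leaving the risk set at t
        while i < len(pairs) and pairs[i][0] == t:
            d += pairs[i][1]
            m += 1
            i += 1
        if d:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= m
    return curve

# Five illustrative subjects: events at t=2 and t=4; censored at t=3, 5 and 6.
curve = kaplan_meier([2, 3, 4, 5, 6], [1, 0, 1, 0, 0])
# S(2) = 1 - 1/5 = 0.8;  S(4) = 0.8 * (1 - 1/3)
```

In practice a plotting library steps this curve horizontally between event times and adds censoring ticks.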
Characterization of a novel diagnostic test
A novel diagnostic kit is in development for over-the-counter use. A risk score had been developed in a training set, and we used data from independent patients to characterize its performance against that of other scores in current standard use. We performed Receiver Operating Characteristic (ROC) curve analysis and estimated the Area Under the Curve (AUC) with 95% confidence intervals. We showed that the AUC of the new method was significantly higher than those of the scores currently in use; we wrote up the results in a draft manuscript.
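A minimal sketch of this kind of evaluation, using the Mann–Whitney formulation of the AUC and a percentile bootstrap for the 95% confidence interval; the scores, labels and bootstrap settings below are illustrative assumptions, not data or methods from the study.

```python
import random

def auc(scores, labels):
    """AUC as P(score of a random case > score of a random control),
    counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=2000, seed=1):
    """Percentile 95% confidence interval for the AUC, resampling patients."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:   # a resample needs both cases and controls
            continue
        stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats))]

# Illustrative risk scores and disease labels for eight patients.
point = auc([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1], [1, 1, 1, 0, 1, 0, 0, 0])
ci = bootstrap_ci([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1], [1, 1, 1, 0, 1, 0, 0, 0])
```

Comparing two correlated AUCs on the same patients would additionally need a paired test (e.g. DeLong's), which this sketch does not implement.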
Genome-wide Association Study (GWAS) for efficacy
We conducted a cumulative meta-analysis of a series of clinical trials of a vaccine, in order to determine genetic variants associated with efficacy. In total, over a million genetic markers were genotyped in almost 1,700 patients, and efficacy was defined in six different ways. Quality checks were performed, ancestry analyses were conducted, and covariates were selected. Non-rare genetic markers were tested singly for association; QQ plots and Manhattan plots were generated, and markers showing suggestive and/or genome-wide significance were highlighted. Combined rare and non-rare markers were analyzed using an agglomerative test. All methods and results were written up in a formal report, and a slide presentation was delivered; a manuscript is planned.
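The single-marker testing step can be sketched as a Cochran–Armitage-style trend test on allele dosage. This is a simplified stand-in for the actual pipeline (covariate adjustment and the agglomerative rare-variant test are not shown), and the genotypes below are invented.

```python
import math

def trend_test(dosages, phenotypes):
    """Cochran-Armitage-style trend test: the chi-square (1 df) statistic
    N * corr(dosage, phenotype)^2, with allele dosage coded 0/1/2."""
    n = len(dosages)
    mg = sum(dosages) / n
    my = sum(phenotypes) / n
    sxy = sum((g - mg) * (y - my) for g, y in zip(dosages, phenotypes))
    sxx = sum((g - mg) ** 2 for g in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:   # monomorphic marker or constant phenotype
        return 0.0
    return n * (sxy * sxy) / (sxx * syy)

def pvalue(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

# Six illustrative patients: dosage of the minor allele vs. binary response.
chi2 = trend_test([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1])
p = pvalue(chi2)
```

A QQ plot is then simply the sorted −log10 p-values across all markers plotted against their expected uniform quantiles.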
Development of a Clinical Decision Tool
A series of computationally intensive data mining methods was applied to determine whether clinical variables available during the first weeks of treatment are predictive of efficacy at the end of a 24-week trial. The performance of each method was assessed in independent clinical trial data by calculating sensitivity, specificity, and positive and negative predictive values. A manuscript is in preparation to describe the methodology and its potential in the development of companion diagnostics.
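The four performance measures named above are all functions of a binary confusion matrix, as in the following sketch; the predictions and outcomes are illustrative only.

```python
def diagnostic_metrics(predicted, actual):
    """Sensitivity, specificity, PPV and NPV from paired binary
    predictions and true outcomes (1 = responder, 0 = non-responder)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    return {
        "sensitivity": tp / (tp + fn),  # fraction of responders detected
        "specificity": tn / (tn + fp),  # fraction of non-responders detected
        "ppv": tp / (tp + fp),          # P(responder | predicted responder)
        "npv": tn / (tn + fn),          # P(non-responder | predicted non-responder)
    }

# Five illustrative patients.
metrics = diagnostic_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of response in the validation cohort, which is why estimating them in independent trial data matters.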
Analysis of Multi-omic Data
We were presented with a cohort of patients comprehensively profiled for lipidomics, metabolomics, proteomics, DNA, mRNA, miRNA and methylation. We performed quality checks and investigated biomarkers, platform by platform, for association with disease, adjusting for covariates. The next stage (in progress) is to conduct cross-platform analyses in order to build mechanistic hypotheses that, it is hoped, will fuel novel drug development.
Data Quality: checking an electronic data capture system
We were presented with a series of database tables arising from an electronic data capture (EDC) system that had been implemented at fifteen clinical sites across Europe. We performed comprehensive quality checks on the data using statistical and graphical techniques. We created scatter plots to show how the data were distributed site by site, and showed, for example, that some sites appeared to be using different measurement units for key hematology measures. We reported a list of queries back to the sites for resolution, and also automated our process to allow the checks to be run repeatedly.
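A site-by-site screen for unit mix-ups can be sketched as a simple median-ratio check; the fold-change threshold and the haemoglobin-like example values below are assumptions for illustration, not the project's actual rules or data.

```python
from statistics import median

def flag_unit_outliers(values_by_site, ratio=5.0):
    """Flag sites whose median differs from the overall median by more than
    `ratio`-fold -- a crude screen for unit mix-ups (e.g. g/dL vs g/L)."""
    overall = median(v for vals in values_by_site.values() for v in vals)
    flagged = []
    for site, vals in values_by_site.items():
        m = median(vals)
        if m / overall > ratio or overall / m > ratio:
            flagged.append(site)
    return flagged

# Illustrative haemoglobin values: sites A and B report in g/dL, site C in g/L.
data = {"A": [13.5, 14.2, 15.0], "B": [12.8, 14.9], "C": [135, 142, 150]}
suspect_sites = flag_unit_outliers(data)   # ["C"]
```

In the real process a flagged site becomes a query back to the site rather than an automatic correction, since the check cannot tell which unit is the intended one.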
Data Management: sorting, cataloguing and filing
We received 2617 health questionnaires scanned into 746 PDF files. Most of the electronic files contained multiple questionnaires; often the orientation was upside-down; sometimes multiple sheets were scanned at 50% size; pages were mixed up; and some files did not pertain to the project in hand. We organized the documents into a logical directory tree, ensuring that each file contained a single questionnaire for a single patient, with all files named accordingly. The final, organized set of electronic files was checked against an inventory from an EDC system, allowing missing documents to be pursued.
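The final reconciliation step amounts to a set difference between the EDC inventory and the organized files; the `<patient_id>.pdf` naming convention in this sketch is an assumption for the example.

```python
def missing_documents(inventory_ids, filenames):
    """Return inventory IDs with no matching file in the organized tree.
    Assumes one questionnaire per file, named <patient_id>.pdf
    (a hypothetical convention for this example)."""
    present = {name.rsplit(".", 1)[0] for name in filenames}
    return sorted(set(inventory_ids) - present)

# Illustrative inventory vs. organized files: P002's questionnaire is missing.
gaps = missing_documents(["P001", "P002", "P003"], ["P001.pdf", "P003.pdf"])
```

The same comparison run in reverse (files with no inventory entry) catches documents that do not pertain to the project.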
Data Management: double data entry
Double data entry has been performed for thousands of scanned health questionnaires. The same process is always followed. Two data managers independently input questionnaire responses into a common template. Upon completion of the template, an adjudicator (statistician) generates an inconsistency list. Next, both data managers independently check all rows that have been flagged and make fixes as appropriate. A second inconsistency list is generated and the final few inconsistencies are manually resolved by the adjudicator, to arrive at an accurate data set for return.
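The adjudicator's inconsistency list can be generated by a cell-by-cell comparison of the two entry templates. Representing each template as a list of row dictionaries aligned by position is an assumption for this sketch, not the project's actual data format.

```python
def inconsistency_list(entry_a, entry_b):
    """Compare two independently entered templates (lists of row dicts,
    aligned row-for-row) and report every cell where they disagree as
    (row index, field, value from A, value from B)."""
    diffs = []
    for row, (a, b) in enumerate(zip(entry_a, entry_b)):
        for field in a:
            if a[field] != b.get(field):
                diffs.append((row, field, a[field], b.get(field)))
    return diffs

# Illustrative entries: the two data managers disagree on one answer.
a = [{"q1": "yes", "q2": "3"}, {"q1": "no", "q2": "5"}]
b = [{"q1": "yes", "q2": "8"}, {"q1": "no", "q2": "5"}]
flags = inconsistency_list(a, b)   # [(0, "q2", "3", "8")]
```

After both data managers have corrected their flagged rows, re-running the same comparison yields the second, shorter list for the adjudicator to resolve manually.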