Meta-analysis with Python: a practical tutorial

Data

We used the data set provided by Higgins et al. [11], a subset of the data from the Cochrane study entitled “Haloperidol versus placebo for schizophrenia” [12]. The data set comprises 17 clinical trials that compare the effectiveness of haloperidol with that of placebo [12]. Data from the Cochrane study are publicly available [11, 13].

The variables

The following variables and labels (in parentheses) are specified for each trial: author (author), year of publication (year), haloperidol patients who responded (resp.h), placebo patients who responded (resp.p), haloperidol patients who did not respond (fail.h), and placebo patients who did not respond (fail.p). The data set also contains two additional variables, labeled drop.h and drop.p, that record the dropouts in the haloperidol and placebo arms. To perform a meta-analysis, PythonMeta needs four input variables: the number of responders to haloperidol (resp.h), the number of responders to placebo (resp.p), the total number of patients in the haloperidol group (Th), and the total number of patients in the placebo group (Tp). Accordingly, we modified the data set to facilitate its use with Python. The modified data set is available to readers (Additional file 1).
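The modification described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record is hypothetical, the function name add_arm_totals is invented for this example, and it assumes that each arm total counts responders, non-responders and dropouts.

```python
# Hypothetical trial record using the variable labels described above.
# The counts are illustrative and are NOT taken from the Cochrane data.
trial = {
    "author": "Example", "year": 1990,
    "resp.h": 10, "fail.h": 8, "drop.h": 2,
    "resp.p": 4, "fail.p": 14, "drop.p": 2,
}


def add_arm_totals(record):
    """Return a copy of the record with the arm totals Th and Tp added.

    Assumption: each total counts responders, non-responders and dropouts.
    """
    out = dict(record)
    out["Th"] = record["resp.h"] + record["fail.h"] + record["drop.h"]
    out["Tp"] = record["resp.p"] + record["fail.p"] + record["drop.p"]
    return out


prepared = add_arm_totals(trial)
# The four inputs needed for the analysis: resp.h, Th, resp.p, Tp
print(prepared["resp.h"], prepared["Th"], prepared["resp.p"], prepared["Tp"])
```

If the totals should exclude dropouts instead, the drop.h and drop.p terms would simply be omitted from the two sums.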

The outcome of interest is clinical improvement, measured as a risk ratio (RR), which serves as the effect measure for the evidence synthesis in this study. An RR greater than 1 suggests that haloperidol is more effective than placebo [13].
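As a worked illustration of the effect measure, the sketch below computes the RR of a single trial together with a 95% confidence interval based on the usual large-sample standard error of log(RR). The counts are hypothetical and the function name risk_ratio is chosen for this example only.

```python
import math


def risk_ratio(e1, n1, e2, n2, z=1.96):
    """Risk ratio of arm 1 versus arm 2 with a Wald-type confidence interval.

    e1, e2: number of responders; n1, n2: number of patients per arm.
    Zero counts would need a continuity correction, which is not handled here.
    """
    rr = (e1 / n1) / (e2 / n2)
    se_log_rr = math.sqrt(1 / e1 - 1 / n1 + 1 / e2 - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: 20 of 50 respond on haloperidol, 10 of 50 on placebo.
rr, lower, upper = risk_ratio(20, 50, 10, 50)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this hypothetical trial the response risk is 0.4 versus 0.2, so the RR is 2 and favors haloperidol.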

Methods of meta-analysis

Fixed-effects models assume that a single, common effect size underlies all studies. Random-effects models, on the other hand, allow the effect size to vary from one study to another. While understanding the conceptual differences between the two models is crucial to model selection, that discussion goes beyond the scope of this article. For a quick review of the basics of meta-analysis, we highly recommend the paper by Borenstein et al. [14]. It is important to note that the analyst needs an adequate level of familiarity with the statistical methods used to estimate these models [15]. In PythonMeta, the default method for the fixed-effects model is Mantel-Haenszel (MH), which can be changed to “Peto” or to “IV” (inverse variance). For random-effects models, the package estimates the between-study variance (tau2) using the DerSimonian and Laird (DL) method.
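To make the two estimators concrete, the following sketch implements the textbook Mantel-Haenszel pooled RR and the DerSimonian and Laird estimate of tau2 on the log RR scale. It is not PythonMeta's internal code; the function names and the example counts are assumptions made for illustration.

```python
import math


def mantel_haenszel_rr(studies):
    """Fixed-effects pooled RR; each study is a tuple (e1, n1, e2, n2)."""
    num = sum(e1 * n2 / (n1 + n2) for e1, n1, e2, n2 in studies)
    den = sum(e2 * n1 / (n1 + n2) for e1, n1, e2, n2 in studies)
    return num / den


def log_rr_and_variance(e1, n1, e2, n2):
    """Log risk ratio and its large-sample variance (no zero-cell handling)."""
    y = math.log((e1 / n1) / (e2 / n2))
    v = 1 / e1 - 1 / n1 + 1 / e2 - 1 / n2
    return y, v


def dersimonian_laird(studies):
    """Inverse-variance fixed estimate, Cochran's Q, tau2 and random-effects RR."""
    pairs = [log_rr_and_variance(*s) for s in studies]
    y = [yi for yi, _ in pairs]
    w = [1 / vi for _, vi in pairs]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    k = len(studies)
    # Method-of-moments estimate, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for _, vi in pairs]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return {
        "fixed_rr": math.exp(fixed),
        "q": q,
        "tau2": tau2,
        "random_rr": math.exp(pooled),
    }


# Hypothetical studies: (responders, total) for haloperidol, then placebo.
example = [(20, 50, 10, 50), (15, 40, 9, 40), (30, 60, 25, 60)]
print(mantel_haenszel_rr(example))
print(dersimonian_laird(example))
```

When tau2 is estimated as zero, the random-effects weights reduce to the inverse-variance fixed-effects weights and the two pooled estimates coincide.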

Stages of analysis

Step 1: Install the program and read the data

To perform a meta-analysis in Python, install PythonMeta (V.1.23) with “pip install PythonMeta” [16]. After the package has been installed, the Help() function displays the PythonMeta help information. PythonMeta supports evidence-based medicine (EBM) tasks such as: combining the effect measures OR (odds ratio), RR (risk ratio), and RD (risk difference) for count data, and MD (mean difference) and SMD (standardized mean difference) for continuous data; heterogeneity testing (Q/chi-square test); subgroup analysis; and plotting, including forest plots and funnel plots [16]. Pymeta is an online version of the PythonMeta tool [10].

After the data set has been prepared (see the “The variables” section above), it can be loaded directly with readfile(“Haloperidol.text”), provided that the file is located in the same directory as the Python scripts [16]. Note that PythonMeta also offers a web-based application, which facilitates direct data entry and provides some additional analyses [9].
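The sketch below shows one way to write such a data file with the standard library only. It assumes the line layout described in the PythonMeta documentation for binary data (study name, e1, n1, e2, n2, with lines starting with "#" treated as comments); the rows, the file name example_trials.txt and the helper names are hypothetical.

```python
import os
import tempfile

# Hypothetical rows: study name, resp.h, Th, resp.p, Tp
rows = [
    ("Example A 1990", 10, 20, 4, 20),
    ("Example B 1995", 15, 30, 9, 30),
]


def to_data_line(name, e1, n1, e2, n2):
    """Format one study as a comma-separated data line."""
    return f"{name},{e1},{n1},{e2},{n2}"


def parse_data_line(line):
    """Inverse of to_data_line: split a data line back into typed fields."""
    name, *counts = [part.strip() for part in line.split(",")]
    return (name, *map(int, counts))


lines = ["#Syntax: study name, e1, n1, e2, n2"]
lines += [to_data_line(*row) for row in rows]

# Write to a temporary directory so that no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "example_trials.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(lines) + "\n")

# Read the file back, skipping blank lines and comments, as a sanity check.
with open(path, encoding="utf-8") as handle:
    loaded = [
        parse_data_line(line)
        for line in handle.read().splitlines()
        if line.strip() and not line.startswith("#")
    ]
print(loaded)
```

A file written this way is plain text and can be inspected or edited by hand before it is passed to the package.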

Step 2: Generation of the main results

First, we selected the binary outcome type (“CATE” in PythonMeta) and the risk ratio (“RR”) as the desired effect size. The other options are continuous (“CONT”) for the outcome type, and odds ratio (“OR”) and risk difference (“RD”) for the effect size. Second, we chose to run both fixed-effects and random-effects models. This choice was for demonstration purposes; our a priori hypothesis was compatible with the latter. Third, we selected MH (Mantel-Haenszel) to run the fixed-effects model and DL (DerSimonian and Laird) to run the random-effects model. Forest plots and funnel plots are the main outputs of this analysis step. The default Python scripts can be modified to generate cleaner, more informative figures [16].
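The selections above might be passed to PythonMeta roughly as follows. This sketch follows the usage pattern shown in the package README as we understand it (Data, Meta and Fig classes configured through attributes); it has not been checked against V.1.23 specifically, and the helper names build_settings and run_meta, as well as the "Random" with "IV" combination, are assumptions for illustration.

```python
def build_settings(model="Fixed", algorithm="MH", effect="RR"):
    """Collect the analysis options for a binary ("CATE") outcome."""
    return {
        "datatype": "CATE",      # binary data; "CONT" for continuous data
        "models": model,         # "Fixed" or "Random"
        "algorithm": algorithm,  # "MH", "Peto" or "IV"
        "effect": effect,        # "RR", "OR" or "RD"
    }


def run_meta(study_lines, settings, show_plots=False):
    """Run one meta-analysis; requires the third-party PythonMeta package."""
    import PythonMeta as PMA  # imported lazily: pip install PythonMeta

    data = PMA.Data()
    meta = PMA.Meta()
    data.datatype = settings["datatype"]   # set the data type first
    studies = data.getdata(study_lines)    # lines such as "Name,e1,n1,e2,n2"
    meta.datatype = data.datatype
    meta.models = settings["models"]
    meta.algorithm = settings["algorithm"]
    meta.effect = settings["effect"]
    results = meta.meta(studies)
    if show_plots:
        fig = PMA.Fig()
        fig.forest(results).show()         # forest plot
        fig.funnel(results).show()         # funnel plot
    return results


fixed_settings = build_settings("Fixed", "MH", "RR")
random_settings = build_settings("Random", "IV", "RR")  # assumed combination

# Example call (commented out because it needs PythonMeta and real data):
# results = run_meta(["Example A 1990,10,20,4,20"], fixed_settings)
```

Keeping the options in one dictionary makes it easy to rerun the same data under the fixed-effects and random-effects settings and compare the output.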

Step 3: Evaluate the impact of missing data

To understand the impact of the missing data, we cleaned the data set using simple code available in Additional file 2. After preparing the data set, we labeled studies with and without missing (dropout) patients with “<subgroup>name = Missing” and “<subgroup>name = not missing”, respectively, and analyzed them as subgroups. The data set is available in Additional file 3.
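A minimal sketch of this labeling step is shown below. It assumes that a study counts as "Missing" when either arm has at least one dropout, and that a subgroup line follows the studies it names, as in the sample data of the PythonMeta documentation. The records and helper names are hypothetical and this is not the code from Additional file 2.

```python
# Hypothetical prepared records; counts are illustrative only.
trials = [
    {"name": "Example A 1990", "resp.h": 10, "Th": 20, "resp.p": 4, "Tp": 20,
     "drop.h": 2, "drop.p": 2},
    {"name": "Example B 1995", "resp.h": 15, "Th": 30, "resp.p": 9, "Tp": 30,
     "drop.h": 0, "drop.p": 0},
    {"name": "Example C 1998", "resp.h": 7, "Th": 16, "resp.p": 5, "Tp": 15,
     "drop.h": 1, "drop.p": 0},
]


def has_missing(trial):
    """Assumption: a study has missing data if any arm reports dropouts."""
    return trial["drop.h"] + trial["drop.p"] > 0


def subgroup_lines(records):
    """Build data lines grouped into 'Missing' and 'not missing' subgroups."""
    lines = []
    for label, wanted in (("Missing", True), ("not missing", False)):
        members = [t for t in records if has_missing(t) == wanted]
        if not members:
            continue
        for t in members:
            lines.append(
                f'{t["name"]},{t["resp.h"]},{t["Th"]},{t["resp.p"]},{t["Tp"]}'
            )
        # The subgroup marker closes the block of studies listed above it.
        lines.append(f"<subgroup>name={label}")
    return lines


labeled = subgroup_lines(trials)
print("\n".join(labeled))
```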

It is common to impute the data set in several ways to assess the impact of missing data on the results. Unlike R, Python meta-analysis packages do not offer a comprehensive list of standard imputation methods for missing data. Therefore, we added a selection of missing-data imputation methods to the meta-analysis in this article. The methods are available case analysis (ACA), imputed case analysis (ICA), and the best-case and worst-case scenarios. ICA-0 assumes that none of the missing participants experienced the event. ICA-1 assumes that all missing participants experienced the event. In addition, we used ICA-b for the best-case scenario, which assumes that all missing participants in the experimental group and none in the control group experienced the event. ICA-w, used for the worst-case scenario, is the opposite of ICA-b [11]. To create a data set for each method, we used the original Cochrane data set with six variables (resp.h, fail.h, drop.h, resp.p, fail.p, drop.p) (Additional file 4) and wrote code for each method. We then ran a separate random-effects model with the IV method on each data set. We generated the corresponding forest plots with the zEpid package [17].
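The imputation rules above can be sketched as a single function that turns the six original variables into the four inputs used for pooling. This is an illustration under stated assumptions, not the authors' code: the "event" is taken to be a response, available case analysis is taken to exclude dropouts from the totals, and the function name impute and the counts are hypothetical.

```python
def impute(trial, method):
    """Return (e1, n1, e2, n2) for haloperidol and placebo under one method.

    trial holds resp.h, fail.h, drop.h, resp.p, fail.p and drop.p.
    """
    rh, fh, dh = trial["resp.h"], trial["fail.h"], trial["drop.h"]
    rp, fp, dp = trial["resp.p"], trial["fail.p"], trial["drop.p"]
    if method == "ACA":    # available cases only: dropouts are excluded
        return rh, rh + fh, rp, rp + fp
    n1, n2 = rh + fh + dh, rp + fp + dp
    if method == "ICA-0":  # no missing participant had the event
        return rh, n1, rp, n2
    if method == "ICA-1":  # every missing participant had the event
        return rh + dh, n1, rp + dp, n2
    if method == "ICA-b":  # best case: events only in the experimental arm
        return rh + dh, n1, rp, n2
    if method == "ICA-w":  # worst case: events only in the control arm
        return rh, n1, rp + dp, n2
    raise ValueError(f"unknown method: {method}")


# Hypothetical trial; counts are illustrative only.
trial = {"resp.h": 10, "fail.h": 8, "drop.h": 2,
         "resp.p": 4, "fail.p": 14, "drop.p": 2}

for name in ("ACA", "ICA-0", "ICA-1", "ICA-b", "ICA-w"):
    print(name, impute(trial, name))
```

Applying the function to every trial yields one data set per method, each of which can then be pooled separately.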

Step 4: Evaluation of small-study effects

Small-study effects occur when small studies show different, often larger, treatment effects than larger ones. Funnel plots are a standard way to display this effect by assessing their symmetry [15, 18]. When assessing funnel plot asymmetry, several tests, such as the Egger test, indicate whether the association between the estimated effects and study size is greater than would be expected by chance [15, 18]. There are complementary methods to improve the evaluation of small-study effects and to perform sensitivity analyses of the results; however, Python packages do not offer these extended analyses. We performed the Egger test using linear regression in the statsmodels package.
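The regression behind the Egger test can be sketched in plain Python as follows: the standardized effect (effect divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero points to funnel plot asymmetry. This sketch uses closed-form ordinary least squares rather than the statsmodels call used in the study, and the effect sizes are hypothetical.

```python
import math


def egger_regression(effects, std_errors):
    """Regress effect/SE on 1/SE and return the intercept with its t statistic."""
    x = [1 / se for se in std_errors]                   # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    residual_var = rss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "t": t_stat,
        "df": n - 2,
    }


# Hypothetical log risk ratios and their standard errors.
log_rr = [0.9, 0.6, 0.45, 0.4, 0.3]
se = [0.50, 0.35, 0.25, 0.20, 0.10]
print(egger_regression(log_rr, se))
```

The t statistic is compared with a t distribution on n - 2 degrees of freedom. With statsmodels installed, fitting `sm.OLS(z, sm.add_constant(x)).fit()` on the same two variables gives the same intercept together with its p-value.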

Comparison with R and STATA

We used STATA (Version 16; College Station, TX: StataCorp LLC) and R (R Core Team, 2021) to compare results. Balduzzi et al. [13] and Chaimani et al. [19] performed meta-analyses on the same data set that we used in the current study. We used the respective STATA and R scripts provided by these authors to obtain the results for this comparison.