IX. Add I.2. Statistical methods

When the accrual to the database is too large to allow individual scrutiny of all incoming ICSRs, it is useful to calculate summary statistics on (subsets of) the data that can help to focus attention on groups of ICSRs containing an adverse reaction. Generally such statistics are used to look for high proportions of a specific adverse event with a given medicinal product, compared to the reporting of this event for all other medicinal products (disproportionate reporting). Sudden temporal changes in frequency of reporting for a given medicinal product may also indicate a change in quality or use of the product with adverse consequences (which could include a reduction in efficacy).

IX. Add I.2.1. Disproportionate reporting

Disproportionality statistics take the form of a ratio of the proportion of spontaneous ICSRs of a specific adverse event with a specific medicinal product to the proportion that would be expected if no association existed between the product and the event. The calculation of the expected value is based on ICSRs that do not contain the specific product and it is assumed that these ICSRs contain a diverse selection of products most of which will not be associated with the event. Hence the reporting proportions for events in these ICSRs will reflect the background incidence of the event in patients receiving any medicine. There are a number of different ways to calculate such statistics and this choice is the first step involved in designing a statistical signal detection system.
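As a minimal sketch of the calculation described above, the ratio for one product-event pair can be derived from four ICSR counts. The function name and the example counts are hypothetical, and the statistic shown (a simple ratio of reporting proportions) is only one of the several possible choices mentioned in the text, not a prescribed method.

```python
def reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Observed over expected reporting proportion for one product-event pair.

    a: ICSRs containing the product and the event
    b: ICSRs containing the product but not the event
    c: ICSRs not containing the product but containing the event
    d: ICSRs containing neither the product nor the event
    """
    if a + b == 0 or c == 0:
        raise ValueError("ratio is undefined without product ICSRs or background events")
    # Proportion of the product's ICSRs that contain the event.
    observed = a / (a + b)
    # Proportion expected if no association existed, estimated from
    # ICSRs that do not contain the product.
    expected = c / (c + d)
    return observed / expected


# Hypothetical counts, for illustration only.
ratio = reporting_ratio(a=20, b=980, c=500, d=98500)
print(f"reporting ratio: {ratio:.2f}")
```

A value well above one, as in this invented example, is the kind of result that would be passed on to the decision rules applied to each product-event pair.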

When an adverse event is caused by a medicine, it is reasonable to assume that it will be reported more often (above the reporting rate associated with the background incidence), and hence the reporting ratio will tend to be greater than one. Thus high values of the ratio for a given drug-event combination (DEC) suggest further investigation may be appropriate. In practice, a formal set of rules, or signal detection algorithm (SDA), is adopted. This usually takes the form of specified thresholds that the ratio or other statistics must exceed, but more complex conditions may also be used. When these rules are satisfied for a given DEC, it is called a signal of disproportionate reporting (SDR). Then a decision needs to be made regarding whether further investigation is required.

A further decision needs to be taken as to whether the statistics are to be calculated across the whole database or if modifications based on subgrouping variables would be of value. While the decision is motivated by theoretical consideration, the specific choice of whether to use subgroups and, if so, which to use, should be based on empirical assessment of signal detection performance. In particular the impact on the false positive rate should be considered. Whether the database is sufficiently large to avoid very low case counts within subgroups may also be a factor in this decision.

IX. Add I.2.1.1. Considerations related to performance of signal detection systems

The performance of signal detection systems, including the SDA, can be quantified using three parameters that reflect the intended objective of the system. Desirable properties are:

high sensitivity (the proportion of adverse reactions for which the system produces SDRs);
high positive predictive value or precision (the proportion of SDRs that relate to adverse reactions);
short time to generate SDRs (which can be assessed from a chosen time origin, possibly the granting of a marketing authorisation, to the first occurrence of an SDR for an adverse reaction).
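The three parameters listed above can be sketched as follows, assuming a reference set of known adverse reactions is available as a set of (product, event) pairs. All names and data are hypothetical, and treating every SDR outside the reference set as a false positive is a simplifying assumption of this sketch.

```python
from datetime import date


def sensitivity(sdrs: set, reference: set) -> float:
    """Proportion of known adverse reactions for which an SDR was produced."""
    return len(sdrs & reference) / len(reference)


def positive_predictive_value(sdrs: set, reference: set) -> float:
    """Proportion of SDRs that relate to known adverse reactions."""
    return len(sdrs & reference) / len(sdrs)


def days_to_first_sdr(time_origin: date, first_sdr: date) -> int:
    """Days from the chosen time origin to the first SDR for an adverse reaction."""
    return (first_sdr - time_origin).days


# Hypothetical reference set and SDRs, for illustration only.
reference = {
    ("drug_a", "hepatitis"),
    ("drug_a", "rash"),
    ("drug_b", "nausea"),
    ("drug_c", "syncope"),
}
sdrs = {
    ("drug_a", "hepatitis"),
    ("drug_b", "nausea"),
    ("drug_b", "headache"),
}

sens = sensitivity(sdrs, reference)
ppv = positive_predictive_value(sdrs, reference)
delay = days_to_first_sdr(date(2020, 1, 15), date(2020, 7, 15))
```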

Estimates of these performance parameters depend on the particular reference set (4) of known adverse reactions selected for their evaluation and are also not fixed because spontaneous reports accumulate over time. They are thus best used as relative measures for comparing competing methods of signal detection within the same spontaneous reporting system at the same point in time.

The following factors may affect the performance of signal detection systems:

MedDRA hierarchy

A precondition for automated screening of DECs for adverse reactions is the availability of schemes for classifying adverse events and medicinal products. The nature and granularity of these schemes affect the performance of the screening. MedDRA (see GVP Annex IV), used for reporting suspected adverse reactions for regulatory purposes, provides terms for adverse events and classifies them in a multiaxial hierarchical structure. A choice must be made whether to screen at one level of granularity (e.g. SOC, HLT, PT) or several, and whether to include all terms or only a subset. Screening at the second finest level of granularity, i.e. Preferred Term (PT), has been shown to be a good choice in terms of sensitivity and positive predictive value (5).

Finally, focus of statistical signal detection on adverse events considered clinically most important avoids time spent in assessments that are less likely to benefit patient and public health. A subset of MedDRA terms judged to be important medical events (IMEs (6)) is thus considered a useful tool in statistical signal detection when filtering results for medical review.

The remarks above relate to routine signal detection and not to targeted monitoring of potential risks associated with specific products, where ad hoc use of other levels of MedDRA terms may be appropriate. In addition, although no formally defined groupings of MedDRA terms (e.g. HLT, SMQ) have proven better for signal detection than the PTs, some PTs are effectively synonymous. The definition of a synonym in this context is a pragmatic one: two PTs are considered synonyms if it is reasonable to suppose that a primary reporter of a suspected adverse reaction, presented with a single patient and without a specialist evaluation, would not necessarily be able to decide which term to use. It may also be appropriate to combine such terms when they relate to identified areas of interest.

Thresholds

The SDA applied to the summary statistics for each DEC usually takes the form of a set of threshold values such that SDRs occur only if each statistic exceeds its corresponding threshold. Very low thresholds will result in large, and potentially unmanageable, numbers of SDRs to investigate, each with a higher probability of being false. This will also reduce the resources available for assessment of true SDRs. Thresholds that are too high will result in identification of adverse reactions being delayed or even entirely prevented. Thus the appropriate choice of thresholds is fundamental to the success of the statistical signal detection system.

This has also been confirmed by studies comparing different disproportionality methods and different sets of thresholds, which showed that the different methods can achieve similar overall performance when an appropriate SDA is chosen. Therefore, it is the choice of SDA used to define an SDR, rather than the choice of disproportionality statistic, that will strongly influence signal detection performance (4).

Thresholds for disproportionality methods are usually based on two separate indicators, one reflecting the disproportionality statistic itself and another based on the number of ICSRs received. For a reason discussed later, limiting false positives is better handled by raising the threshold for the number of ICSRs than that for the disproportionality statistic. For the disproportionality statistic, a formal lower confidence bound is often used in practice rather than the point estimate. The rationale is that when the statistic is based on few ICSRs, the lower confidence bound falls further below the point estimate, which makes an SDR less likely. Hence, this is an intuitive way of incorporating into the signal detection process the degree of confidence in the reliability of the data. It has also been shown that a threshold based on the lower confidence bound performed better alone than in combination with an additional threshold for the absolute value of the disproportionality statistic itself (4).
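The effect of the number of ICSRs on the lower confidence bound can be illustrated with a short sketch. It assumes a ratio of reporting proportions as the statistic and the usual normal approximation on the log scale for its 95% confidence interval; neither is prescribed by the text. The function names, the default case-count threshold and the counts are hypothetical.

```python
import math


def lower_confidence_bound(a: int, b: int, c: int, d: int, z: float = 1.96) -> float:
    """Approximate lower confidence bound of the ratio of reporting proportions.

    a, b: ICSRs with the product, with and without the event
    c, d: ICSRs without the product, with and without the event
    """
    ratio = (a / (a + b)) / (c / (c + d))
    # Standard error of log(ratio) under the normal approximation (assumed here).
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(math.log(ratio) - z * se_log)


def is_sdr(a: int, b: int, c: int, d: int,
           min_count: int = 3, min_lower_bound: float = 1.0) -> bool:
    """Two-indicator rule: enough ICSRs and a lower bound above the threshold.

    The default min_count is an illustrative value, not a recommendation.
    """
    if a == 0 or c == 0 or a < min_count:
        return False
    return lower_confidence_bound(a, b, c, d) > min_lower_bound


# Same point estimate (3.96) from few and from many ICSRs; hypothetical counts.
few = lower_confidence_bound(3, 147, 500, 98500)
many = lower_confidence_bound(300, 14700, 500, 98500)
print(f"lower bound with 3 ICSRs: {few:.2f}, with 300 ICSRs: {many:.2f}")
```

With identical point estimates, the bound computed from three ICSRs sits much further below the estimate than the one computed from three hundred, which is the behaviour the paragraph above relies on. Raising `min_count` is the sketch's counterpart of limiting false positives through the case-count threshold.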

In addition, it has been shown that a correlation exists between the value of a disproportionality statistic and the relative risk of an adverse reaction in patients exposed to the medicinal product, as estimated in epidemiological studies (7). Setting a threshold above 1 on the lower confidence bound of the disproportionality statistic might therefore lead to missing an adverse reaction for which the risk ratio is not high.

Finally, there appears to be a reduction in positive predictive value with a medicinal product’s time on the market, hence it might be more efficient to vary the amount of effort to invest in signal detection over the life-cycle of the product. This might involve the use of differing thresholds to define an SDR depending on the time of the product on the market (4).

Periodicity of monitoring

A one-month interval between consecutive data summaries has been investigated in validation studies for signal detection methods. More frequent monitoring has also been used, for instance for medicinal products under additional monitoring or during intensive vaccination programmes. The appropriate frequency of monitoring may vary with the accumulation of knowledge of the risk profile of a specific active substance/medicinal product (see GVP Module IX).

Spontaneous ICSR databases

The performance has also been shown to depend on the nature of the spontaneous ICSR database and this appears to be related to the range of medicinal products included in the database (4).

An important inference from these considerations is that organisations doing signal detection should assess the performance of a signal detection system directly on the database to which it will be applied. This will allow both the ability to detect new adverse reactions and the workload involved to be predicted and controlled by appropriate changes to the SDA. As databases evolve in terms of numbers of ICSRs included and their mix of medicinal products, periodic reassessment of performance should be undertaken.

Subgroup analysis and stratification

Spontaneous ICSR databases cover a range of medicinal products that have different indications and are used across a broad range of patient populations. Also, ICSR reporting patterns vary over time and between different geographical regions. Many quantitative signal detection algorithms disregard this diversity, which may result either in an SDR being masked or in an association being incorrectly flagged as a signal.

Stratification and subgroup analysis are generally used in epidemiology to reduce bias due to confounding and may also have advantages in statistical signal detection. Subgroup signal detection here means analyses carried out to detect ADRs within specific ICSR subgroups, e.g. by indication, age group, region or time period. Stratification involves combining results from within different subgroups to obtain an adjusted result for the whole dataset.
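The distinction between the two approaches can be sketched as follows. The subgroup analysis computes one ratio per subgroup, whereas the stratified analysis combines the subgroups into a single adjusted value. The Mantel-Haenszel weighting used for the adjusted value is an assumption of this sketch (the text does not name a method), and the function names, age groups and counts are hypothetical.

```python
def reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Ratio of reporting proportions within one subgroup.

    a, b: ICSRs with the product, with and without the event
    c, d: ICSRs without the product, with and without the event
    """
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_ratio(strata) -> float:
    """Single adjusted ratio combining all strata (Mantel-Haenszel weights)."""
    strata = list(strata)
    numerator = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts (a, b, c, d) per age group, for illustration only.
subgroups = {
    "children": (10, 90, 100, 9800),
    "adults": (10, 890, 900, 89100),
}

# Subgroup analysis: one result per subgroup, each screened separately.
subgroup_ratios = {name: reporting_ratio(*counts) for name, counts in subgroups.items()}

# Stratified analysis: one adjusted result for the whole dataset.
adjusted_ratio = mantel_haenszel_ratio(subgroups.values())

for name, value in subgroup_ratios.items():
    print(f"{name}: {value:.2f}")
print(f"adjusted: {adjusted_ratio:.2f}")
```

In this invented example the adjusted ratio lies between the two subgroup ratios, so a strong disproportion confined to one subgroup is diluted in the combined figure while remaining visible in the subgroup results.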

The comparison of stratified and subgroup analyses has shown that subgroup analysis consistently performed better. Moreover, subgroup analysis has also been shown to provide clear benefits in both sensitivity and precision over crude analyses for large international databases (8). However, such benefits may not be obtained in small databases.

Subgrouping variables that showed the most promising results included age and reporting region/country, but it is likely that choice of variables for subgroup analyses varies according to the database.