VI.Add I.3. Detection of duplicate cases

Databases should be reviewed regularly to identify duplicates. As a general rule, every newly received ICSR referring to an individual case should be considered a potential duplicate and checked thoroughly against the cases already present in the database. Screening for duplicates should therefore be done when a new report arrives in the database, i.e. during data entry or while loading ICSRs that have been received electronically. Some IT systems offer lookup and duplicate detection features, based on automated and semi-automated search criteria, to help identify an identical case during data entry. Similar tools can be used, for example, to flag potential duplicates automatically when importing ICSRs received electronically in ICH-E2B(R2) or ICH-E2B(R3) format (see GVP Annex IV – International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) guidelines for pharmacovigilance).
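As an illustration of screening at the moment of loading, the sketch below flags potential duplicates while each ICSR is imported, rather than rejecting anything. It is a minimal toy, not any particular IT system's feature: the `ICSR` record, the `CaseDatabase` class and the similarity rule in `looks_similar` are all hypothetical assumptions chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class ICSR:
    # Hypothetical minimal record; real ICSRs carry many more data elements.
    case_id: str
    sex: Optional[str]
    age: Optional[int]
    products: FrozenSet[str]
    reactions: FrozenSet[str]


@dataclass
class CaseDatabase:
    """Toy in-memory store that screens each incoming ICSR while loading it."""

    cases: List[ICSR] = field(default_factory=list)
    # (new case ID, existing case ID) pairs queued for manual review.
    flags: List[Tuple[str, str]] = field(default_factory=list)

    @staticmethod
    def looks_similar(a: ICSR, b: ICSR) -> bool:
        # Hypothetical criterion: same sex and age, at least one shared
        # medicinal product and at least one shared reaction.
        return (
            a.sex == b.sex
            and a.age == b.age
            and bool(a.products & b.products)
            and bool(a.reactions & b.reactions)
        )

    def load(self, new: ICSR) -> List[str]:
        """Store the ICSR and return IDs of existing cases flagged as potential duplicates."""
        matches = [old.case_id for old in self.cases if self.looks_similar(new, old)]
        self.flags.extend((new.case_id, m) for m in matches)
        self.cases.append(new)  # flagged cases are still loaded; review is manual
        return matches
```

Loading case "A" (female, 54, drug X, rash) returns an empty list; loading a later case "B" with the same patient data and an extra reaction returns `["A"]` and queues the pair for review. Keeping flagged cases in the database reflects the point that flagging only highlights candidates; confirmation remains a manual step.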

Duplicate searches are generally based on similarities in patient, adverse reaction and medicinal product data. Different search criteria may suit different datasets. For pharmacovigilance systems that do not have to deal with large datasets, a simple table that sorts the reports by age, sex, suspected/interacting medicinal products and adverse reactions can be sufficient to detect similarities. Depending on the dataset, adding ‘country’ to this sort can be valuable. For cases received in ICH-E2B format, screening the case ID numbers and the duplicate fields (see below for the ICH-E2B(R2) and ICH-E2B(R3) field names and codes) may offer a quick start.
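For a small system, the sorted table described above can be sketched as follows. This is an assumption-laden illustration: the report dictionaries, their keys and the sample data are hypothetical, and grouping on the full sort key is only the strictest reading of "similarities".

```python
from itertools import groupby


def sort_key(report, use_country=False):
    # Missing age/sex are mapped to sentinels so that sorting never fails.
    key = (
        report["age"] if report["age"] is not None else -1,
        report["sex"] or "",
        tuple(sorted(report["products"])),
        tuple(sorted(report["reactions"])),
    )
    return (key + (report["country"] or "",)) if use_country else key


def similar_groups(reports, use_country=False):
    """Sort the line listing and return groups of report IDs sharing the same key."""
    def key(r):
        return sort_key(r, use_country)

    groups = []
    for _, grp in groupby(sorted(reports, key=key), key=key):
        ids = [r["id"] for r in grp]
        if len(ids) > 1:
            groups.append(ids)
    return groups


# Hypothetical line listing for a small pharmacovigilance system.
reports = [
    {"id": "R1", "age": 67, "sex": "M", "country": "DE", "products": ["drug A"], "reactions": ["nausea"]},
    {"id": "R2", "age": 45, "sex": "F", "country": "FR", "products": ["drug B"], "reactions": ["rash"]},
    {"id": "R3", "age": 67, "sex": "M", "country": "AT", "products": ["drug A"], "reactions": ["nausea"]},
    {"id": "R4", "age": 45, "sex": "F", "country": "FR", "products": ["drug B"], "reactions": ["rash"]},
]
```

Here `similar_groups(reports)` returns `[["R2", "R4"], ["R1", "R3"]]`, while `similar_groups(reports, use_country=True)` returns only `[["R2", "R4"]]`, showing how adding ‘country’ can separate reports that otherwise look alike. Even without exact grouping, the sort itself places near-matches next to each other for visual review.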

In large databases such as EudraVigilance, there is a strong need to eliminate duplicates. An initial grouping of ICSRs is therefore performed based on the primary source country and the sex and age of the patient. The EudraVigilance algorithm then quantifies, from a statistical point of view, how much the ICSRs within a group differ, taking into account additional parameters related to the patient, the primary source, the reported medicinal product(s)/active substance(s) and adverse reaction(s), as well as the fact that case information may vary, e.g. due to differences in coding practices.
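The two-stage idea (group first, then score pairs only within each group) can be sketched as below. This is NOT the EudraVigilance algorithm: the weighted agreement score, the `WEIGHTS` values, the field names and the threshold are hypothetical stand-ins for a statistical model whose details are not given here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights; a real statistical model would estimate these from data.
WEIGHTS = {"products": 3.0, "reactions": 3.0, "reporter": 1.0, "onset": 2.0}


def jaccard(a, b):
    # Overlap of two sets, tolerant of partially differing case information.
    return len(a & b) / len(a | b) if (a or b) else 0.0


def score(x, y):
    s = WEIGHTS["products"] * jaccard(x["products"], y["products"])
    s += WEIGHTS["reactions"] * jaccard(x["reactions"], y["reactions"])
    if x.get("reporter") and x.get("reporter") == y.get("reporter"):
        s += WEIGHTS["reporter"]
    if x.get("onset") and x.get("onset") == y.get("onset"):
        s += WEIGHTS["onset"]
    return s


def candidate_pairs(cases, threshold=5.0):
    """Group by (country, sex, age), then score pairs only within each group."""
    blocks = defaultdict(list)
    for c in cases:
        blocks[(c["country"], c["sex"], c["age"])].append(c)
    pairs = []
    for block in blocks.values():
        for x, y in combinations(block, 2):
            s = score(x, y)
            if s >= threshold:
                pairs.append((x["id"], y["id"], round(s, 2)))
    return pairs


cases = [
    {"id": "C1", "country": "ES", "sex": "F", "age": 30, "products": {"drug A"},
     "reactions": {"headache", "dizziness"}, "reporter": "physician", "onset": "2024-03-01"},
    {"id": "C2", "country": "ES", "sex": "F", "age": 30, "products": {"drug A"},
     "reactions": {"headache"}, "reporter": "physician", "onset": "2024-03-01"},
    {"id": "C3", "country": "ES", "sex": "M", "age": 30, "products": {"drug A"},
     "reactions": {"headache"}, "reporter": "physician", "onset": "2024-03-01"},
]
```

With this data, `candidate_pairs(cases)` yields `[("C1", "C2", 7.5)]`: C3 is never compared because it falls in a different group, which is what keeps the pairwise comparison affordable in a large database.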

There are many options for using patient, adverse reaction and medicinal product data, and their specific data elements, for duplicate detection. Other data fields (e.g. reaction start/end date) can be used to strengthen the assessment. Whatever algorithm is applied, it should be taken into account that information in the cases may differ, and that the main purpose of this step is to seek similarities between cases, thus highlighting potential duplicates for manual review. If no match is found in the initial search, the search can be broadened, e.g. by expanding the criteria to include null values (for example, a new report concerning a female patient would be checked against other cases concerning a female patient and against cases where the patient’s sex is unknown).
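The broadened search with null values can be sketched as a two-pass lookup: a strict pass first, then a pass in which an unknown value on either side no longer excludes a match. The field names and the case dictionaries are hypothetical.

```python
def field_matches(new_value, old_value, allow_null):
    if new_value == old_value:
        return True
    # Broadened search: an unknown value on either side does not exclude a match.
    return allow_null and (new_value is None or old_value is None)


def search(new, cases, fields=("sex", "age", "country")):
    """Return (matching case IDs, whether the broadened search was needed)."""
    def run(allow_null):
        return [
            c["id"] for c in cases
            if all(field_matches(new[f], c[f], allow_null) for f in fields)
        ]

    strict = run(False)
    return (strict, False) if strict else (run(True), True)
```

For a new report with `{"sex": "F", "age": 50, "country": "IT"}` checked against a case with unknown sex (`None`) and a case with a male patient, the strict pass finds nothing and the broadened pass returns only the unknown-sex case, mirroring the female/unknown example in the text.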

Differences in MedDRA coding practices (see GVP Annex IV) can be addressed by checking that the medical concepts are consistent, rather than searching for an exact match of terms. It is also important to be aware of the natural course of reported reactions, which can become more serious over time (for example, a rash can develop into Stevens-Johnson syndrome). Therefore, a search for duplicates can be based on the MedDRA Preferred Term (PT) level, but moving up to the associated High Level Term (HLT) or even High Level Group Term (HLGT) level might be appropriate.
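Moving up the hierarchy can be sketched as comparing reaction sets at PT level first, then at HLT and HLGT level. The hierarchy below is invented for illustration and is NOT real MedDRA content (real mappings come from a licensed MedDRA release); it also assumes a single path per PT, whereas MedDRA is multiaxial.

```python
# Invented PT -> HLT -> HLGT excerpt; placeholders only, not MedDRA data.
PT_TO_HLT = {
    "rash": "hlt_skin_eruptions",
    "stevens_johnson_syndrome": "hlt_bullous_conditions",
    "headache": "hlt_headaches",
}
HLT_TO_HLGT = {
    "hlt_skin_eruptions": "hlgt_skin_conditions",
    "hlt_bullous_conditions": "hlgt_skin_conditions",
    "hlt_headaches": "hlgt_headaches",
}


def lift(pts, level):
    """Map a set of PTs to the requested hierarchy level."""
    if level == "PT":
        return set(pts)
    hlts = {PT_TO_HLT[pt] for pt in pts}
    if level == "HLT":
        return hlts
    if level == "HLGT":
        return {HLT_TO_HLGT[h] for h in hlts}
    raise ValueError(f"unsupported level: {level}")


def reactions_consistent(a, b):
    """Return the most specific level at which two reaction sets overlap, or None."""
    for level in ("PT", "HLT", "HLGT"):
        if lift(a, level) & lift(b, level):
            return level
    return None
```

With this toy hierarchy, `reactions_consistent({"rash"}, {"stevens_johnson_syndrome"})` returns `"HLGT"`, so a report of rash and a later report of Stevens-Johnson syndrome are still surfaced as consistent concepts, while rash versus headache returns `None`.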

Individual cases originating from clinical trials are usually well documented, and duplicate detection can therefore include other, more reliable criteria, e.g. research centre ID and study details (EudraCT number, protocol number).
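For clinical trial cases, the study identifiers can act as a reliable pre-filter before the usual patient and reaction comparison. The key names and sample values below are hypothetical; a match on them narrows the candidates but does not by itself confirm a duplicate.

```python
# Hypothetical field names for the trial identifiers mentioned in the text.
TRIAL_KEYS = ("eudract_number", "protocol_number", "centre_id")


def same_trial_context(a, b):
    """True only if every trial identifier is present in both cases and equal."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in TRIAL_KEYS)
```

Requiring each identifier to be present avoids treating two cases with missing study details as belonging to the same trial context.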

It is recommended to validate the duplicate detection algorithms of databases carefully and to evaluate over time whether the algorithms need tuning, since the quality and level of detail of ICSRs may change. For example, when specific data fields have been made mandatory, they might be considered for inclusion in the duplicate detection algorithm.
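One common way to validate and monitor such an algorithm, offered here as an assumption rather than a prescribed method, is to compare the pairs it flags with pairs confirmed or rejected by manual review, and track precision and recall over time.

```python
def as_pair_set(pairs):
    # Order within a pair does not matter: (A, B) is the same pair as (B, A).
    return {frozenset(p) for p in pairs}


def evaluate(flagged, confirmed):
    """Precision and recall of flagged pairs against manually confirmed duplicate pairs."""
    f, c = as_pair_set(flagged), as_pair_set(confirmed)
    true_positives = len(f & c)
    precision = true_positives / len(f) if f else 0.0
    recall = true_positives / len(c) if c else 0.0
    return precision, recall
```

Falling recall after a change in reporting practice, for instance, could suggest that newly mandatory fields are worth adding to the algorithm.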

Duplicates may involve more than two individual cases and can then be considered a cluster, e.g. if case A is a potential duplicate of case B and case B is a potential duplicate of case C. Bearing this in mind, throughout this document the term “duplicate cluster” is used to denote two or more cases that have been identified as potential duplicates of each other.
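Because "potential duplicate of" links cases transitively (A with B and B with C places A, B and C together), duplicate clusters can be built from pairwise findings with a union-find (disjoint-set) structure. The sketch below is a generic illustration; the case IDs are placeholders.

```python
def duplicate_clusters(pairs):
    """Merge pairwise potential-duplicate links into clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

For the pairs `[("A", "B"), ("B", "C"), ("D", "E")]` this returns `[["A", "B", "C"], ["D", "E"]]`: the A–B and B–C findings form one cluster even though A and C were never compared directly.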

VI.Add I.3.1. What to do if possible duplicates in EudraVigilance have been detected

If, when reviewing cases obtained from EudraVigilance, there is a suspicion that two or more cases are duplicates of one another, the reviewer should send an email to duplicates@ema.europa.eu identifying the cases suspected to be duplicates. The Agency does not routinely send feedback on whether or not the cases are duplicates; senders who wish to receive such feedback should request it in the email.

The Agency needs either the case numbers (worldwide unique case safety IDs or safety report IDs) or the local report numbers (those starting with EU-EC-) of the suspected duplicates in a cluster.
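When preparing a cluster for reporting, it can help to separate local report numbers (prefix `EU-EC-`, as stated above) from other case numbers and to drop blanks and repeats. The helper below is a hypothetical convenience; it does not reproduce the Agency's table layout, and the sample IDs are placeholders.

```python
def split_identifiers(cluster_ids):
    """Separate EudraVigilance local report numbers (prefix 'EU-EC-') from
    other case numbers, dropping blanks and repeats while keeping order."""
    seen, local, other = set(), [], []
    for raw in cluster_ids:
        case_id = raw.strip()
        if not case_id or case_id in seen:
            continue
        seen.add(case_id)
        (local if case_id.startswith("EU-EC-") else other).append(case_id)
    return local, other
```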

To report suspected duplicates, the Agency encourages senders to submit each suspected cluster of duplicates as a single row in a table similar to the format below:

If the Agency confirms that the cases are duplicates, a master case will be created as described in VI.Add I.4.1.2., with the duplicates merged underneath it and their case numbers listed in the report duplicates section of the master case. The master case will be transmitted to EudraVigilance and, if necessary, rerouted to competent authorities in Member States within the usual rerouting timelines. The master case will be available to marketing authorisation holders for download immediately for Level 1 access and on the following day for Level 2 access, as described in the EudraVigilance Access Policy (footnote 8). For marketing authorisation holders and competent authorities in Member States, the date of awareness of the confirmed duplication will then coincide with day zero of the master case.

VI.Add I.3.2. Confirmation of duplicate cases

Once potential duplicates have been identified, manual confirmation is always necessary. A well-documented case, including a case narrative, is a prerequisite for confirming whether two cases are duplicates, and it is of utmost importance that all stakeholders adhere to the principles set out in GVP Module VI regarding the data quality of individual case safety reports transmitted electronically and duplicate management. This also applies to cases that are reportable in line with Directive 2001/20/EC.

Articles 107a(3) and 107(5) of Directive 2001/83/EC require Member States and marketing authorisation holders, respectively, to collaborate with the Agency, and with each other, in the detection of duplicates of suspected adverse reaction reports. In addition, GVP Module VI emphasises the need for marketing authorisation holders and competent authorities in Member States to ensure that the ICSRs they transmit electronically to the EudraVigilance database are of the highest quality, are submitted within the correct time frames and enable the detection and management of duplicate ICSRs in their systems. Transmitted ICSRs should be complete, entire and undiminished in their structure, format and content. Judgement will always need to be applied, especially for certain types of medicinal products and adverse reactions, such as cases related to vaccines in ‘neonates/infants’ or to medicinal products widely used among ‘elderly’ patients. For example, for vaccine reports in a ‘neonate’ with an adverse reaction of ‘injection site reaction’, one cannot be certain that the reports are true duplicates even if the dates of administration, primary source, medical history and concurrent drug fields match, because this is a common reaction that may be reported for many ‘neonates’ with a similar history from the same clinic.
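A system could record the need for extra judgement alongside each candidate pair, so that a full field match in a low-specificity setting is not mistaken for strong evidence. The sketch below assumes a hypothetical, hand-maintained list of (age group, reaction) combinations; building such a list would require expert input.

```python
# Hypothetical (age group, reaction) combinations common enough that a full
# field match is weak evidence of duplication.
LOW_SPECIFICITY = {("neonate", "injection site reaction")}


def review_note(all_fields_match, age_group, reactions):
    """Attach a review note to a candidate pair; confirmation stays manual."""
    if not all_fields_match:
        return "no match"
    if any((age_group, r) in LOW_SPECIFICITY for r in reactions):
        return "potential duplicate: low-specificity match, apply judgement"
    return "potential duplicate: confirm manually"
```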

Populating the ‘Linked reports’ section (ICH-E2B(R2) data field ‘A.1.12’ / ICH-E2B(R3) data field ‘C.1.10.r’) with the numbers of other cases that are linked by one or more common elements but are distinct from one another is a particularly effective way of enabling confirmation that cases are not duplicates of one another. Conversely, populating the ‘Report duplicates’ section (ICH-E2B(R2) data field ‘A.1.11’ / ICH-E2B(R3) data field ‘C.1.9.1’) with all other reference numbers by which the case is known is a particularly effective way of enabling the detection and confirmation of duplicates.
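The value of these two sections can be illustrated by checking them before any similarity scoring. In the sketch, the dictionary keys `linked` and `duplicates` are hypothetical simplifications standing in for the Linked reports and Report duplicates sections named above; a real implementation would parse the E2B message structure.

```python
def classify_by_reference_fields(a, b):
    """Use cross-references in two ICSRs before any similarity scoring.

    'linked' stands for Linked reports (E2B(R2) A.1.12 / E2B(R3) C.1.10.r);
    'duplicates' stands for Report duplicates (E2B(R2) A.1.11 / E2B(R3) C.1.9.1).
    """
    if b["id"] in set(a.get("linked", ())) or a["id"] in set(b.get("linked", ())):
        return "distinct (linked, not duplicates)"
    # All numbers by which each case is known, including its own ID.
    ids_a = {a["id"]} | set(a.get("duplicates", ()))
    ids_b = {b["id"]} | set(b.get("duplicates", ()))
    if ids_a & ids_b:
        return "potential duplicates (shared identifier)"
    return "undetermined"
```

A case whose Report duplicates section lists `"NCA-77"` is immediately paired with case `"NCA-77"`, while a case that names it in Linked reports is recorded as distinct; anything else falls back to the similarity-based methods.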

If there is conflicting or limited information that, on first review, does not allow the cases to be determined as duplicates, additional information needs to be sought from the reporter or sender. It is recommended to keep track of all duplicate investigations, including those in which the cases are confirmed not to be duplicates.

If the individuality of the cases cannot be confirmed without compromising the legal expedited reporting timelines, it is recommended to enter the potential duplicate into the database as a valid case. However, investigations to confirm or clarify the submitted information should continue. Once the case has been confirmed as a duplicate or otherwise, appropriate steps should be taken to manage the duplicates as described in VI.Add I.4.
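Keeping track of every investigation, whatever its outcome, can be as simple as a record per cluster with a status and a dated history. The `Status` values and the `Investigation` class below are hypothetical illustrations, not a format required by the guidance.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    UNDER_INVESTIGATION = "under investigation"
    CONFIRMED_DUPLICATE = "confirmed duplicate"
    CONFIRMED_DISTINCT = "confirmed not duplicate"


@dataclass
class Investigation:
    cluster: tuple[str, ...]
    status: Status = Status.UNDER_INVESTIGATION
    history: list[tuple[date, str]] = field(default_factory=list)

    def note(self, when: date, text: str) -> None:
        """Record a dated step, e.g. a follow-up request to the reporter."""
        self.history.append((when, text))

    def close(self, when: date, status: Status) -> None:
        """Close with a final outcome; the record is kept either way."""
        if status is Status.UNDER_INVESTIGATION:
            raise ValueError("closing requires a final status")
        self.status = status
        self.note(when, f"closed: {status.value}")
```

A typical use would note that the second case was entered as a valid case to respect reporting timelines, note the follow-up request, and later call `close` with `Status.CONFIRMED_DISTINCT` or `Status.CONFIRMED_DUPLICATE`, so negative outcomes remain documented too.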