Standard for Subject Enrollment/Exclusion
All participants enrolled in the SPORE Head and Neck Neoplasm Project (Epidemiology of Genetic Susceptibility to Head and Neck Cancer) give informed consent, and enrollment is based on the following inclusion/exclusion protocols:
Case definition
The inclusion and exclusion criteria, used to define the case series, appear below.
Inclusion Standards (case series)
I. Age 18-79 years on date of first diagnosis of qualifying head and neck cancer. Biopsy proven primary squamous cell cancer at a head and neck site. For the purposes of this research, head and neck cancer sites include primary tumors coding to Chapter 3 (Lip and Oral Cavity), Chapter 4 (Pharynx, including base of tongue, soft palate, and uvula), Chapter 5 (Larynx), or Chapter 6 (Nasal Cavity and Paranasal Sinuses) of the American Joint Committee on Cancer (AJCC) Staging Manual (6th edition) [8].
II. Study enrollment based on acquisition of informed consent and collection of questionnaire data at the time of diagnosis or after one year from the initial diagnosis of the qualifying head and neck cancer.
Exclusion criteria (case series)
I. Age less than 18 years on date of first diagnosis of qualifying head and neck cancer.
II. Age more than 79 years on date of first diagnosis of qualifying head and neck cancer.
III. Absence of a clinical pathology report documenting invasive cancer involving a head and neck site.
IV. Histopathologic diagnosis other than squamous cell carcinoma.
V. More than one year elapsed time since date of first biopsy diagnosis of most recent primary squamous cell carcinoma at a head and neck cancer site.
Diagnoses are subject to verification by a pathologist. The investigators will remove a participant from this research study if the final pathology report does not confirm the provisional diagnosis used for study enrollment purposes. Specifically, if the final pathology report does not demonstrate primary squamous cell carcinoma of the head and neck, the investigators will remove the participant from the study, and his/her data and blood, urine, and saliva samples will be rendered anonymous and destroyed.
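The case exclusion criteria above can be read as a simple screening check. The following is a minimal sketch, not the project's actual screening tool: the `CaseCandidate` fields and the function name are hypothetical, and "more than one year" is approximated here as more than 365 days.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseCandidate:
    # All field names are hypothetical, chosen only for this illustration.
    age_at_diagnosis: int             # years, on date of first diagnosis
    has_pathology_report: bool        # report documents invasive head and neck cancer
    histology: str                    # e.g. "squamous cell carcinoma"
    days_since_biopsy_diagnosis: int  # elapsed days since first biopsy diagnosis


def case_exclusion_reasons(c: CaseCandidate) -> List[str]:
    """Return the exclusion criteria (I to V) that apply; an empty list means none apply."""
    reasons = []
    if c.age_at_diagnosis < 18:
        reasons.append("I: age less than 18 years at diagnosis")
    if c.age_at_diagnosis > 79:
        reasons.append("II: age more than 79 years at diagnosis")
    if not c.has_pathology_report:
        reasons.append("III: no pathology report documenting invasive cancer")
    if c.histology.strip().lower() != "squamous cell carcinoma":
        reasons.append("IV: histology other than squamous cell carcinoma")
    if c.days_since_biopsy_diagnosis > 365:  # assumption: one year = 365 days
        reasons.append("V: more than one year since biopsy diagnosis")
    return reasons
```

A candidate with no listed reasons would still require informed consent and pathologist verification, as described above.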
Control definition
The inclusion and exclusion criteria, used to define the control series, appear below.
Inclusion criteria (control series)
I. Age 18-80 years on date of enrollment.
II. No personal history of cancer at a head and neck site (based on eligibility screening interview and/or review of ENT or Dental Clinic medical record).
III. (If a clinic control) Clinical examination (by personal ENT physician or dentist), without clinical suspicion of head and neck cancer, based on testimony of clinician or review of primary medical records.
Exclusion standards (control series)
I. Age less than 18 years on date of enrollment.
II. Age more than 80 years on date of enrollment.
III. Subject self-report of personal history of cancer at a head and neck cancer site.
IV. (If a clinic control) Indication in ENT or dental clinic record of personal history of cancer at a head and neck cancer site.
V. (If a clinic control) Physical findings, on head and neck clinical examination, that create a suspicion of cancer at a head and neck cancer site.
SPORE Head and Neck: Tissue Bank
Eligibility standards
I. Participants under the care of doctors in the Department of Otolaryngology at the University of Pittsburgh, as well as healthy controls who are not under the care of our doctors, will be accrued.
II. Participants who are at risk for or have developed primary, recurrent, metastatic or second primary cancer of the head and neck will be enrolled. In addition, people with a non-cancerous diagnosis, who may or may not be having surgery, will be asked to participate as controls.
III. Any age 18 years or older. Children are not included since head and neck cancer is very rare in individuals under the age of 18.
IV. Written and informed consent will be obtained. Mentally-impaired adults or those who are not capable of understanding the consent information will not be included in this research project.
Standards for Data Collection and De-identification
Development of Common Data elements (CDE)
Common data elements (CDEs) were developed to facilitate annotation of cases with epidemiologic, demographic, clinicopathologic and follow-up data. CDEs also characterize the biospecimens collected at the University of Pittsburgh Medical Center (UPMC). At the case level, the CDEs encompass cancer registry data (treatment, vital status, recurrence, etc.). At the specimen level, they provide pathology data elements that describe tumor staging, grade and histological type, along with genotype data elements and block level annotation. The CDEs for standardized data collection were developed from several sources, including the North American Association of Central Cancer Registries (NAACCR) [6] and College of American Pathologists (CAP) [7] cancer protocols and checklists, and the Association of Directors of Anatomic and Surgical Pathology (ADASP) [9] and American Joint Committee on Cancer (AJCC) [8] cancer staging manuals. This was achieved through the combined efforts of domain experts from various specialties using core standards under the supervision of the SPORE Head and Neck Neoplasm Biorepository coordinating committee. The CDEs are compliant with ISO/IEC 11179 (International Organization for Standardization/International Electrotechnical Commission). This standard defines a number of fields and relationships for metadata registries, including a metadata model for defining and registering items, of which the primary component is a data element [10].
The informatics-supported architecture allows the CDEs that are used to annotate the biospecimens to be semantically and syntactically interoperable by describing them in the form of metadata or data descriptors. Each CDE is associated with an attribute of an object or concept, and with one or more valid values. For example, "patient age at diagnosis" is a CDE that consists of "patient" as the object, "age at diagnosis" as the attribute and "years" as the representation value domain. To collect data based on approved CDEs, the data managers need to know the fundamental definition of the data element, standards of data collection, mutually accepted values or codes for the data element, and the acceptable data format for inclusion in the central database. Although the concept of formalized metadata is fairly straightforward, it has rarely been taken into account by clinical and research groups building databases [11].
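The object/attribute/value-domain structure of a CDE can be illustrated with a small data structure. This is a sketch under stated assumptions rather than the registry's actual schema: the class name, its fields, and the 0-120 year range are hypothetical; only the "patient age at diagnosis" example comes from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CommonDataElement:
    object_class: str             # the object or concept, e.g. "patient"
    attribute: str                # the attribute, e.g. "age at diagnosis"
    unit: str                     # representation value domain, e.g. "years"
    valid_range: Tuple[int, int]  # inclusive bounds (assumed for illustration)

    @property
    def name(self) -> str:
        return f"{self.object_class} {self.attribute}"

    def is_valid(self, value: object) -> bool:
        """Accept only whole numbers that fall inside the valid range."""
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        low, high = self.valid_range
        return low <= value <= high


# The CDE used as an example in the text; the 0-120 range is an assumption.
AGE_AT_DIAGNOSIS = CommonDataElement("patient", "age at diagnosis", "years", (0, 120))
```

Keeping the definition, unit and accepted values together in one descriptor is what lets a data manager check an entry before it reaches the central database.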
Data Collection, Development of Data Collection Application and Data Transmission
The research nurse coordinator is responsible for obtaining patient consent at the physician office visit, before surgery or other surgical procedures. At the time of consent, self-administered questionnaires are given to the patient to collect information on demographics, risk factor exposures (including tobacco and alcohol use), sexual behavior and previous medical history. Structured pathology data are retrieved as synoptic reports from the coPath Plus application (Version 3.0.2.74, Cerner DHT, Inc.). The surgical pathology report and all histological sections available to the head and neck pathologist are reviewed to correctly categorize each case. The pathologist then selects representative slides and paraffin blocks according to a standardized study protocol. The selected slides show specific features of the case likely to be of interest to scientific investigators. After the pathological data are reviewed, the certified tumor registrars review and extract clinical and follow-up data. The data are collected through the cancer registry information system (CRIS) or manually from the patient medical charts. Collected data are then entered through common web-based data entry forms that correspond to the CDEs developed within the head and neck neoplasm organ specific database.
De-identification Process and Honest Broker Concept
The SPORE in head and neck cancer virtual biorepository is structured to protect patient privacy and confidentiality according to Institutional Review Board (IRB) regulations and HIPAA (Health Insurance Portability and Accountability Act) approved protocols. The database discloses only de-identified patient information and displays no links to patient identifiers (name, date of birth, procedure date, therapy date, etc.). The only linkage is kept within the institution, and the database generates a de-identified dataset upon query by the end users (the so-called "safe harbor" approach to HIPAA compliance) [12]. The "safe harbor" approach involves exclusion of all 18 identifiers enumerated in section 164.514(b)(2) of the regulations. For example, a participant's age is presented as an age range rather than a date of birth, and therapy dates are provided as months from first positive tissue diagnosis to therapy start date rather than as precise calendar dates. These are some of the measures adopted to protect the identity of patients while still providing sufficient information for research purposes. All data requests are tracked in the secure SPORE Head and Neck Neoplasm Database regardless of whether the purpose is clinical or research related.
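The two transformations mentioned above (age range in place of date of birth, months to therapy in place of a calendar date) can be sketched as follows. This is an illustration only, not the database's implementation: the function names, the record keys and the 10-year bin width are assumptions. Ages of 90 or more are collapsed into one category, as the safe harbor provision requires.

```python
from datetime import date


def age_range(birth: date, diagnosis: date, width: int = 10) -> str:
    """Report age at diagnosis as a range; the bin width is an assumed choice."""
    had_birthday = (diagnosis.month, diagnosis.day) >= (birth.month, birth.day)
    age = diagnosis.year - birth.year - (0 if had_birthday else 1)
    if age >= 90:  # safe harbor aggregates all ages over 89
        return "90+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def months_between(start: date, end: date) -> int:
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def deidentify(record: dict) -> dict:
    """Build a new record that carries no direct identifiers or calendar dates."""
    return {
        "age_range_at_diagnosis": age_range(record["date_of_birth"], record["diagnosis_date"]),
        "months_to_therapy": months_between(record["diagnosis_date"], record["therapy_start_date"]),
        "histology": record["histology"],
    }
```

Building a fresh output record, rather than deleting fields from the input, means any identifier not explicitly carried over is dropped by default.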
The de-identification process is performed by an honest broker, which acts as a filter between completely identified confidential clinical patient information and the completely de-identified data made available to the research community. An honest broker is an individual, organization or system acting on behalf of the covered entity to collect and provide health information to the investigators in such a manner that it would not be reasonably possible for the investigators or others to identify the corresponding patients/subjects directly or indirectly. The honest broker cannot be one of the investigators or researchers. A researcher may use an honest broker service to obtain Protected Health Information in de-identified form. The honest broker service will de-identify medical record information by automated or manual methods. All honest broker services are approved in advance by both the IRB of record and the University of Pittsburgh Medical Center (UPMC). If an honest broker service is not part of the UPMC covered entity, a valid business associate agreement is executed with UPMC in order to access UPMC-held Protected Health Information for de-identification. If an honest broker service is to be used to obtain de-identified Protected Health Information, this fact must be identified in the study's IRB submission. The honest brokers in this case are individuals who have clinical responsibilities as tissue bankers in the Health Sciences Tissue Bank (HSTB), postdoctoral fellows who manage the pathology data, or cancer registry specialists in the UPMC Network Cancer Registry. Their educational backgrounds and experience vary with their clinical job duties. Depending on the nature of the projects, these honest brokers can work autonomously or collaboratively to meet biospecimen and data needs [13].
Accuracy Assessment of Multimodal Datasets
The SPORE Head and Neck Neoplasm Database has collected essential information, such as patient demographic, pathology, treatment, recurrence and risk factor exposure data, from head and neck neoplasm patients at the University of Pittsburgh Medical Center since 1980. After importing the multimodal data into the head and neck organ specific database, accuracy is assessed by trained and certified personnel, using policies, variable constraints, and logical tests established by the resource. Data are collected by the data managers from electronic sources including the Medical Registry System (MRS), pathology reports from coPath Plus and the social security death index for entry into the database. The evaluation of the collected data is done using the following approach.
The first step is to evaluate discrepancies between the database quality check curators. The primary focus of data accuracy assessment is on tumor record, staging, histology, diagnosis, treatment, recurrence and risk factor exposure data. The selected data fields are grouped into primary, secondary and tertiary priority fields, and a separate error rate is calculated for each group. The error rate for each case is the number of discrepant entries divided by the number of fields evaluated for that case. The number of fields evaluated and the number of discrepant entries across the selected cases are then used to find the error rate for each priority level.
The second step evaluates the accuracy of database entries by comparing them to the electronic data source from which the data were collected. Data fields have been divided into high priority, secondary priority, and tertiary priority. For fields listed as either high priority or secondary priority, the number of entries that deviate from the source, divided by the total number of fields assessed in that tier, yields an estimate of the error rate for the tier. For high priority fields, which include patient demographic and clinical data and tumor pathologic data, the estimated error rate should not exceed 2%. For secondary priority fields, which include risk factor data, a slightly higher rate of 5% is allowed. The high priority fields are composed largely of fields extracted from the UPCI tumor registry, whose guidelines require that the error rate not exceed 3%. A threshold of 2% was selected for high priority fields as a compromise between the desire for a research quality database with an error rate below 1% and the resources and effort already required to meet the mandated guideline of less than 3%. An error rate below 5% was judged to provide risk factor exposure data of reasonable quality, given that these data were collected using instruments that varied to some degree over the collection period. An error rate will also be calculated for tertiary fields; however, no threshold has been set at which the tertiary field error rate would initiate a database update. Initially, 1% of the subjects are evaluated. If discrepancies are within error rate guidelines for high priority and secondary priority fields, a further 5% of subjects will be randomly selected using the same strategy, and estimated database error rates will be calculated separately for the high priority and secondary priority fields for the 5% sample.
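The per-tier error rate and its thresholds can be expressed in a few lines. This is a minimal sketch: the function and variable names are hypothetical, while the 2% and 5% limits and the absence of a tertiary threshold come from the text.

```python
from typing import Dict, Optional, Tuple

# Maximum allowed error rate per priority tier; None means no threshold is set.
THRESHOLDS: Dict[str, Optional[float]] = {
    "high": 0.02,
    "secondary": 0.05,
    "tertiary": None,
}


def error_rate(discrepant: int, evaluated: int) -> float:
    """Discrepant entries divided by fields evaluated."""
    if evaluated <= 0:
        raise ValueError("at least one field must be evaluated")
    return discrepant / evaluated


def within_guidelines(counts: Dict[str, Tuple[int, int]]) -> bool:
    """counts maps tier -> (discrepant entries, fields evaluated)."""
    for tier, (discrepant, evaluated) in counts.items():
        limit = THRESHOLDS[tier]
        if limit is not None and error_rate(discrepant, evaluated) > limit:
            return False
    return True
```

For example, 3 discrepancies in 200 high priority fields (1.5%) and 9 in 200 secondary fields (4.5%) would be within guidelines, whatever the tertiary rate.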
If the error rate guidelines are not met for the 1% initial evaluation, a careful analysis of the differences will be performed and discrepancies identified. A quality check of all database subjects will be performed, focusing on discrepant fields. After the quality check has been completed, a second sampling of 1% of subjects will be performed. This sampling will exclude subjects sampled in the prior evaluation. Error rates will be determined, and if error rate guidelines are met, a further 5% of subjects will be evaluated using the same criteria.
The third step involves comparing the data in the database to data in primary sources such as clinical charts and pathology reports. Subject sampling will be performed and data field error rates will be calculated as above. However, due to the amount of time and effort required to review primary records, only 1% of database subjects will be evaluated. In order to best identify differences between electronic and primary data, this 1% of subjects will be a randomly selected subset of the 5% of subjects assessed for consistency with the electronic data sources.
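The sampling scheme in the second and third steps can be sketched with Python's standard `random` module. The function names are hypothetical, and sample sizes are rounded to the nearest whole subject (with a minimum of one), which is an assumption the text does not specify.

```python
import random
from typing import Collection, List, Optional, Sequence


def draw_sample(ids: Sequence[int], fraction: float,
                exclude: Collection[int] = (),
                rng: Optional[random.Random] = None) -> List[int]:
    """Randomly pick a fraction of all subjects, skipping any excluded IDs."""
    rng = rng or random.Random()
    excluded = set(exclude)
    pool = [i for i in ids if i not in excluded]
    size = min(len(pool), max(1, round(fraction * len(ids))))
    return rng.sample(pool, size)


def primary_source_subset(five_percent: Sequence[int], total_subjects: int,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Pick 1% of all subjects from within the 5% sample (third step)."""
    rng = rng or random.Random()
    size = min(len(five_percent), max(1, round(0.01 * total_subjects)))
    return rng.sample(list(five_percent), size)
```

A repeat 1% sample after a failed evaluation would pass the earlier sample as `exclude`, so that previously reviewed subjects are not drawn again.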
Accuracy Assessment of pathological data
The pathology data pertaining to each paraffin-embedded tissue block are entered after complete assessment by trained head and neck pathologists. Accuracy is checked through an independent review process in which a series of randomly selected cases are re-reviewed. The data managers randomly select cases for independent review from those added to the head and neck neoplasm database within certain cut-off dates. The independent review material consists of 2 to 5 pathologic matrix slides, defined as slides selected for annotation, for each case. On receiving the slides, the reviewing pathologist evaluates and annotates the matrix slides and histology CDE data for the case using established processes. The pathologist then scrutinizes the histology CDE data for observer variability and diagnostic error rates. This process occurs biannually and is established to check specimen resource quality. Any discrepancies identified through independent review are communicated to the pathology subcommittee. The pathology subcommittee then presents these findings at the subsequent general meeting of the coordinating committee in a formal report, with recommendations for changes in process as indicated by the independent review findings. Any errors discovered during the independent review process are corrected [14].
SPORE Head & Neck Neoplasm Database
Integrated Informatics Model
The overall system is designed as a multi-tiered application using Oracle 10g Database Server, Oracle 10g Application Server, and MOD_PLSQL, also known as PL/SQL Server, on a Compaq DL360 Server running Win2K with SP. This application utilizes the Oracle HTTP Server and MOD_PLSQL extensions to generate dynamic pages from the database for the users. The database is Oracle 9.2.0.1 Enterprise Edition implemented on a SunFire V880 Server running Solaris 2.8. Approximately four months were spent on the development and deployment of the initial application. Application enhancement and expansion are a continuous process carried out in response to end user requirements.
The informatics model depicts the SPORE Head and Neck Neoplasm Database in the following layers:
Schema layer - consists of concrete data and data relations. All classified data are stored as numbers and keys.
Metadata layer - describes data in terms of data elements and "groups of data elements". Data descriptions such as data attributes, display attributes, valid values, DB Link, validation rules and documentation are supported in metadata. The metadata layer defines the application layer.
Procedures/function layer - a set of dynamic functions (in PL/SQL or Java) that control data transformation at the back end. The procedures accommodate changes in the metadata and immediately reflect the changes in the application layer.
Application layer (Form builder) - presents a set of "applications" including the metadata dictionary builder and manager, user management, data entry/transfer, query, display, etc. The appearance may differ depending on user privileges. These differences are driven by the metadata and user management.
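The relationship between the layers, in which data are stored as coded keys and the metadata supplies the valid values, can be illustrated with a toy example. This is a sketch in Python rather than the PL/SQL or Java used by the system; the dictionary layout, the numeric codes and the function names are all hypothetical, and only the site names come from the case definition.

```python
# Hypothetical metadata layer: each data element lists its coded valid values.
METADATA = {
    "tumor_site": {
        "label": "Primary tumor site",
        "valid_values": {
            1: "Lip and Oral Cavity",
            2: "Pharynx",
            3: "Larynx",
            4: "Nasal Cavity and Paranasal Sinuses",
        },
    },
}


def encode(element: str, display_value: str, metadata: dict = METADATA) -> int:
    """Translate a displayed value into the numeric key stored in the schema layer."""
    wanted = display_value.strip().lower()
    for code, label in metadata[element]["valid_values"].items():
        if label.lower() == wanted:
            return code
    raise ValueError(f"{display_value!r} is not a valid value for {element}")


def decode(element: str, code: int, metadata: dict = METADATA) -> str:
    """Translate a stored numeric key back into its display value."""
    return metadata[element]["valid_values"][code]
```

Because `encode` and `decode` consult the metadata on every call, adding a valid value to the metadata takes effect immediately without changing either function, which is the behavior the procedures/function layer is described as providing.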
SPORE Head and Neck Neoplasm Database Model
The SPORE Head and Neck Neoplasm Database consists of the following integrated application layers that maintain data query/entry, data de-identification and the user management module (Figure 1).
i. Presentation Layer
This contains the following components. Metadata curation is used by data administrators and CDE curators for registering new CDEs or editing the definitions of existing CDEs. The administrator security system is used by the application administrators to grant, revoke or limit privileges for new and existing users. Manual annotation is used by honest brokers or domain experts for collecting information regarding patients registered for the study. Data query is used by the honest brokers and research community to run criteria based queries. The query results show identified or de-identified outputs depending on the individual roles and privileges granted by application administrators. This tool provides two levels of access to researchers who participate in the SPORE Head and Neck Neoplasm study. The first level is an honest broker view of the consented patients in their own study. The second is a de-identified view of all patients in other studies, to which they do not have identified access but whose overall trends they may want to study and analyze. The data import/export component gives users the option to electronically import preformatted data from existing systems or export data for their desktop analysis (Figure 2).
ii. Metadata Engine
The Metadata Engine is based on the Common Data Elements, which hold the application data structure for the data elements/fields defined by the SPORE Head and Neck Neoplasm Project working group. The HELP builder provides, for each data element/field, a detailed definition of its business rules and usage. The business rules engine holds the rules for how multiple elements can be combined, using simple numerical and algorithmic techniques, to report complex values for decision support and time-sensitive statistical outputs. The mapping engine maps the logical and physical layers of the design to facilitate data retrieval and storage (Figure 3).
iii. Security Engine
The security engine secures the application at three levels. The first is registration, in which new user accounts are created and application roles are requested. The second is authentication, by adding/editing user information. The last is authorization, in which user roles and privileges are granted or revoked.
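The registration and authorization levels can be sketched as a small in-memory class. This is an illustration only: the class, its methods and the role name are hypothetical, and authentication (verifying a user's identity) is left out.

```python
from typing import Dict, Set


class SecurityEngine:
    """Toy model of user registration and role-based authorization."""

    def __init__(self) -> None:
        self._roles: Dict[str, Set[str]] = {}  # username -> granted roles

    def register(self, username: str) -> None:
        """Create a new account with no roles granted yet."""
        if username in self._roles:
            raise ValueError(f"user {username!r} is already registered")
        self._roles[username] = set()

    def grant(self, username: str, role: str) -> None:
        """Grant a role to a registered user (KeyError if unregistered)."""
        self._roles[username].add(role)

    def revoke(self, username: str, role: str) -> None:
        """Remove a role; revoking a role that was never granted is a no-op."""
        self._roles[username].discard(role)

    def is_authorized(self, username: str, role: str) -> bool:
        """Unknown users are never authorized."""
        return role in self._roles.get(username, set())
```

In use, an administrator would call `register`, then `grant` a role such as a hypothetical "honest_broker", and query output would depend on what `is_authorized` returns for the requesting user.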
iv. Physical Data
The physical database tables are presented in the data warehouse in three parts. The first is the application database, which holds case data contents in a metadata coded format. The second is the metadata database, which holds metadata definitions and descriptions for all the attributes and values in the system. The third is the security database, which drives the security and authorization definitions and assignments.