CanRisk Pedigree Data v3 File Format Specification


Description

CanRisk Pedigree Data v3 Files contain the target individual's risk factors (optional) in the header followed by pedigree data (mandatory) as a series of pedigree data records, one for each family member. The following sections describe the format of the File Header and the Pedigree Data records.

File Header

The File Header can optionally include the following breast and/or ovarian cancer risk factors for the target individual:

VARIABLE NAME | RISK FACTOR DESCRIPTION | ALLOWED VALUES
Ethnicity †‡ | UK-based ethnic groups. The ethnicity categories are based on those recommended by the Office for National Statistics for the UK; see the table below. | ethnic group;ethnic background
Menarche | Age at menarche | NA=unspecified, integer=age at menarche
Parity †‡ | Parity | NA=unspecified, integer=number of children
First_live_birth | Age at first live birth | NA=unspecified, integer=age at first live birth
OC_use †‡ | Use of oral contraception | NA=unspecified, N=Never, F:Years=Former use, C:Years=Current use; F and C are followed by integer=number of years taken, e.g. F:4 means former use of 4 years
MHT_use †‡ | Use of menopause hormone therapy | NA=unspecified, N=Never used, F=Former use, E=Current E-type use, C=Current other/unknown type (including combined type) use
BMI | Body mass index | NA=unspecified, real number=body mass index
Alcohol | Daily alcohol intake in grams per day | NA=unspecified, real number=daily alcohol intake in grams
Menopause | Age at menopause | NA=unspecified, integer=age at menopause or 'N' if pre-menopause
BIRADS | Mammographic density measured by BI-RADS | NA=unspecified, BI-RADS classification (a, b, c, d or 1, 2, 3, 4)
Stratus | Continuous mammographic density measured by Stratus | NA=unspecified, percentage density
Volpara | Continuous mammographic density measured by Volpara | NA=unspecified, percentage density
Height †‡ | Height in cm | NA=unspecified, real number=height in cm
TL | Tubal ligation procedure | NA=unspecified, N=No, Y=Yes
Endo | Endometriosis | NA=unspecified, N=No, Y=Yes
PRS_BC | Polygenic Risk Score (Breast Cancer) | e.g. alpha=real number, zscore=real number
PRS_OC | Polygenic Risk Score (Ovarian Cancer) | e.g. alpha=real number, zscore=real number

† Breast Cancer Risk Factor ‡ Ovarian Cancer Risk Factor
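The compound values in the table (for example OC_use and Menopause) need a little more than a plain integer conversion. The following is a minimal sketch of how they might be decoded; the function names `parse_oc_use` and `parse_menopause` are hypothetical, and the strict, case-sensitive matching of values is an assumption, since the specification only states that variable names may be in any case.

```python
from typing import Optional, Tuple, Union


def parse_oc_use(value: str) -> Tuple[Optional[str], Optional[int]]:
    """Decode an OC_use value: 'NA', 'N', 'F:<years>' or 'C:<years>'.

    Returns (status, years). Status is None when unspecified; years is
    None unless the status is 'F' (former) or 'C' (current).
    """
    value = value.strip()
    if value == "NA":
        return None, None
    if value == "N":
        return "N", None
    status, sep, years = value.partition(":")
    # Assumption: F and C must always carry an integer number of years.
    if status not in ("F", "C") or not sep or not years.isdigit():
        raise ValueError(f"invalid OC_use value: {value!r}")
    return status, int(years)


def parse_menopause(value: str) -> Union[None, str, int]:
    """Decode a Menopause value: 'NA', 'N' (pre-menopause) or an integer age."""
    value = value.strip()
    if value == "NA":
        return None
    if value == "N":
        return "N"
    return int(value)
```

With these helpers, `parse_oc_use("F:4")` yields the status `"F"` with 4 years of use, matching the "former use of 4 years" example in the table.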

The PRS_BC and PRS_OC values are:

  • alpha - the square root of the proportion of the overall polygenic variance explained by the PRS. A real number between 0 and 1.

  • zscore - the standard normal PRS.
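A PRS value therefore carries two named numbers. As an illustration, the sketch below splits a value such as `alpha=0.45, zscore=1.8` into its parts and checks the stated range for alpha. The function name `parse_prs` is hypothetical, and treating the bounds 0 and 1 as inclusive is an assumption, because the specification does not say whether the endpoints are allowed.

```python
from typing import Dict


def parse_prs(value: str) -> Dict[str, float]:
    """Decode a PRS_BC / PRS_OC value such as 'alpha=0.45, zscore=1.8'."""
    fields: Dict[str, float] = {}
    for part in value.split(","):
        name, sep, number = part.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {part!r}")
        fields[name.strip().lower()] = float(number)
    if set(fields) != {"alpha", "zscore"}:
        raise ValueError(f"expected alpha and zscore, got {sorted(fields)}")
    # Assumption: alpha is valid on the closed interval [0, 1].
    if not 0.0 <= fields["alpha"] <= 1.0:
        raise ValueError(f"alpha out of range: {fields['alpha']}")
    return fields
```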

UK based Ethnic Groups and Background

Ethnic group | Ethnic background
White | English/Welsh/Scottish/Northern Irish/British
White | Irish
White | Gypsy or Irish Traveller
White | Any other White background, please describe
Mixed/Multiple ethnic groups | White and Black Caribbean
Mixed/Multiple ethnic groups | White and Black African
Mixed/Multiple ethnic groups | White and Asian
Mixed/Multiple ethnic groups | Any other Mixed/Multiple ethnic background, please describe
Asian or Asian British | Indian
Asian or Asian British | Pakistani
Asian or Asian British | Bangladeshi
Asian or Asian British | Chinese
Asian or Asian British | Any other Asian background, please describe
Black or Black British | African
Black or Black British | Caribbean
Black or Black British | Any other Black/African/Caribbean background, please describe
Other ethnic group | Arab
Other ethnic group | Any other ethnic group, please describe
Unknown |

All header lines begin with '##'. Any missing risk factor variables are taken as unspecified. Risk factors are given one per line; the variable names can be in any case and can appear in any order in the header. The following example header shows the first mandatory header record, ##CanRisk 3.0, followed by some of the optional risk factors for the target in the pedigree. The last line in the header is the second mandatory header record, the pedigree data column header beginning ##FamID Name….

Example CanRisk File Header

##CanRisk 3.0
##ethnicity=Black or Black British;African
##Menarche=13
##Parity=1
##First_live_birth=24
##OC_use=C:2
##BMI=27.1
##height=170
##alcohol=5.1
##PRS_BC=alpha=0.45, zscore=1.8
##FamID Name Target IndivID FathID MothID Sex MZtwin Dead Age Yob BC1 ….......
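The header rules above can be sketched as a small reader. This is an illustrative outline rather than the reference parser: `parse_header` is a hypothetical name, lower-casing the variable names is simply one way to honour the "any case" rule, and the column header in the sample is shortened to four columns for brevity.

```python
from typing import Dict, List, Tuple


def parse_header(lines: List[str]) -> Tuple[str, Dict[str, str], List[str]]:
    """Split header lines into (version, risk_factors, column_names)."""
    header = [ln.strip() for ln in lines if ln.startswith("##")]
    if not header or not header[0].startswith("##CanRisk"):
        raise ValueError("first header record must be '##CanRisk <version>'")
    if not header[-1].startswith("##FamID"):
        raise ValueError("last header record must begin '##FamID'")
    version = header[0][len("##CanRisk"):].strip()
    columns = header[-1][2:].split()

    risk_factors: Dict[str, str] = {}
    for line in header[1:-1]:
        # Split at the first '=' only, so PRS values keep their inner '='.
        name, sep, value = line[2:].partition("=")
        if not sep:
            raise ValueError(f"expected ##name=value, got {line!r}")
        risk_factors[name.strip().lower()] = value.strip()
    return version, risk_factors, columns


sample = [
    "##CanRisk 3.0",
    "##ethnicity=Black or Black British;African",
    "##OC_use=C:2",
    "##PRS_BC=alpha=0.45, zscore=1.8",
    "##FamID\tName\tTarget\tIndivID",  # shortened column header
]
version, risk_factors, columns = parse_header(sample)
```

Any variable absent from `risk_factors` would then be treated as unspecified, as the specification requires.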

Pedigree Data

The CanRisk Pedigree Data Format is a simple TAB-delimited text format. CanRisk pedigree data files consist of the two mandatory header records followed by a series of pedigree data records, one for each family member. The pedigree data records include 27 parameters (data columns) separated by a single TAB (or whitespace) character.

Parameters 1-27 on the pedigree data records are defined as follows:

  1. FamID Family/pedigree ID, character string (maximum 13 characters)

  2. Name First name/ID of the family member, character string (maximum 8 characters)

  3. Target The family member for whom the CanRisk risk calculation is made, 1 = target for risk calculation, 0 = other family members.

  4. IndivID Unique ID of the family member, character string (maximum 7 characters)

  5. FathID Unique ID of their father, 0 = no father, or character string (maximum 7 characters). Each family member must have either: (1) no parents specified (e.g. see family member '103' in the pedigree 'Example without risk factors' in the accompanying FAQ CanRisk Pedigree Data File Examples), or (2) both parents specified (e.g. see family member '201' in the same pedigree).

  6. MothID Unique ID of their mother, 0 = unspecified, or character string (maximum 7 characters)

  7. Sex M or F

  8. MZtwin Identical twins, 0 = not an identical twin; otherwise use one of these characters to identify MZ twins: 1 2 3 4 5 6 7 8 9 A

  9. Dead The current status of the family member, 0 = alive, 1 = dead

  10. Age Age at last follow up, 0 = unspecified, integer = age at last follow up

  11. Yob Year of birth, 0 = unspecified, or integer (consistent with Age if the person is alive)

  12. BC1 Age at first breast cancer diagnosis, 0 = unaffected, integer = age at diagnosis, AU = unknown age at diagnosis (affected unknown)

  13. BC2 Age at second (contralateral) breast cancer diagnosis, 0 = unaffected, integer = age at diagnosis, AU = unknown age at diagnosis (affected unknown)

  14. OC Age at ovarian cancer diagnosis, 0 = unaffected, integer = age at diagnosis, AU = unknown age at diagnosis (affected unknown)

  15. PRO Age at prostate cancer diagnosis, 0 = unaffected, integer = age at diagnosis, AU = unknown age at diagnosis (affected unknown)

  16. PAN Age at pancreatic cancer diagnosis, 0 = unaffected, integer = age at diagnosis, AU = unknown age at diagnosis (affected unknown)

  17. Ashkn Ashkenazi status, 0 = not Ashkenazi, 1 = Ashkenazi

  18. BRCA1 BRCA1 genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  19. BRCA2 BRCA2 genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  20. PALB2 PALB2 genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  21. ATM ATM genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  22. CHEK2 CHEK2 genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  23. BARD1 BARD1 genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  24. RAD51D RAD51D genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  25. RAD51C RAD51C genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  26. BRIP1 BRIP1 genetic test type:result; type 0=untested, S=mutation search, T=direct gene test; result 0=untested, P=positive, N=negative

  27. ER:PR:HER2:CK14:CK56 Colon-separated Estrogen receptor, Progesterone receptor, Human epidermal growth factor receptor 2, Cytokeratin 14, Cytokeratin 5/6 status, 0 = unspecified, N = negative, P = positive
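The 27 parameters above map naturally onto a keyed record. The sketch below is one possible reading of the rules, not the official implementation: all names (`COLUMNS`, `parse_record`, `parse_diagnosis_age`, `parse_gene_test`) are hypothetical, and splitting on any whitespace assumes that no field value contains a space.

```python
from typing import Dict, Optional, Tuple

COLUMNS = [
    "FamID", "Name", "Target", "IndivID", "FathID", "MothID", "Sex",
    "MZtwin", "Dead", "Age", "Yob", "BC1", "BC2", "OC", "PRO", "PAN",
    "Ashkn", "BRCA1", "BRCA2", "PALB2", "ATM", "CHEK2", "BARD1",
    "RAD51D", "RAD51C", "BRIP1", "ER:PR:HER2:CK14:CK56",
]


def parse_record(line: str) -> Dict[str, str]:
    """Split one pedigree data record into a column-name -> value mapping."""
    fields = line.split()  # assumption: TAB or other whitespace, no spaces in values
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    # Either both parents are given or neither is.
    if (record["FathID"] == "0") != (record["MothID"] == "0"):
        raise ValueError("specify both parents or neither")
    return record


def parse_diagnosis_age(value: str) -> Tuple[bool, Optional[int]]:
    """Decode BC1/BC2/OC/PRO/PAN: returns (affected, age at diagnosis or None)."""
    if value == "AU":  # affected, age at diagnosis unknown
        return True, None
    age = int(value)
    return age != 0, (age if age != 0 else None)


def parse_gene_test(value: str) -> Tuple[str, str]:
    """Decode a 'type:result' genetic test field such as 'S:N' or '0:0'."""
    test_type, sep, result = value.partition(":")
    if not sep or test_type not in ("0", "S", "T") or result not in ("0", "P", "N"):
        raise ValueError(f"invalid genetic test value: {value!r}")
    return test_type, result
```

For instance, a BC1 value of `AU` decodes to an affected individual with no known age at diagnosis, while `0` decodes to unaffected.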
