FILE SYSTEM AND LABELING

Each dataset in the CFE database is provided as a source file containing absolute numbers of respondents by parity, further tabulated by other characteristics. The filename is constructed as COUNTRY_YEAR.csv, where:

–       COUNTRY is the three-letter international country code (ISO 3166-1 alpha-3)

–       YEAR is the year of the census or survey

For example, the 2001 census data for the Czech Republic are stored in the file CZE_2001.csv.
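The naming rule can be sketched in R as follows. The helpers cfe_filename() and parse_cfe_filename() are hypothetical names introduced only for illustration; they are not part of the database, and the pattern assumes a four-digit year.

```r
# Build a file name from a country code and a census/survey year.
# cfe_filename() is a hypothetical helper, not part of the database.
cfe_filename <- function(country, year) {
  stopifnot(grepl("^[A-Z]{3}$", country))   # three upper-case letters
  sprintf("%s_%d.csv", country, as.integer(year))
}

# Split a file name back into its two components.
# parse_cfe_filename() is likewise hypothetical; a four-digit year is assumed.
parse_cfe_filename <- function(filename) {
  m <- regmatches(filename,
                  regexec("^([A-Z]{3})_([0-9]{4})\\.csv$", filename))[[1]]
  if (length(m) == 0) stop("not a COUNTRY_YEAR.csv name: ", filename)
  list(country = m[2], year = as.integer(m[3]))
}

cfe_filename("CZE", 2001)           # "CZE_2001.csv"
parse_cfe_filename("CZE_2001.csv")  # list with country "CZE" and year 2001
```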

The data are stored as comma-separated values (CSV) files in long format with a header row. In the statistical package R, the following command can be used to import the data:

read.csv('CZE_2001.csv', header = TRUE)
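One pitfall may be worth guarding against when importing: R's default type guessing treats a column that contains only the letter F as logical, which could affect the sex column in files that cover women only. The sketch below writes a tiny stand-in file to a temporary directory so that it runs on its own. Every value in it is invented, and the education codes are placeholders rather than real database codes.

```r
# Write a tiny stand-in file so the example is self-contained.
# All values are invented; "A" and the ISCED range are placeholder codes.
path <- file.path(tempdir(), "CZE_2001.csv")
writeLines(c(
  "country,data_source,cohort_from,cohort_to,edu,edu_eurrep,isced_from,isced_to,sex,origin,stat,value",
  "Czech Republic,Census 15-05-2001,1950,1950,A,A,0,2,F,Total,women_total,100",
  "Czech Republic,Census 15-05-2001,1950,1950,A,A,0,2,F,Total,parity_0,10"
), path)

# Default type guessing: a column holding only "F" is read as logical FALSE.
naive <- read.csv(path, header = TRUE)
class(naive$sex)   # "logical"

# Declaring the label columns as character avoids that conversion.
safe <- read.csv(path, header = TRUE,
                 colClasses = c(sex = "character", origin = "character",
                                edu = "character", edu_eurrep = "character"))
class(safe$sex)    # "character"
```

Columns not named in colClasses are still guessed by R, so the count column value remains numeric.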

The first line contains a header listing the following columns:

country,data_source,cohort_from,cohort_to,edu,edu_eurrep,isced_from,isced_to,sex,origin,stat,value

  • country – country name;
  • data_source – simple label of data source including the time of survey/census (e.g. Census 15-05-2001);
  • cohort_from,cohort_to – respondent’s birth cohort; these two columns are equal when one-year birth cohorts are displayed;
  • edu,edu_eurrep,isced_from,isced_to – coding of education, described in the next section;
  • sex – F for women, M for men (when included);
  • origin – Total for all places of birth combined; when available, the data are specified by two broad categories of place of birth/citizenship, further explained in the country documentation (e.g. Born_Austria/Born_outside or Foreign/Swiss);
  • stat – the indicator whose count is given in the following column, labelled value:

o   women_total/men_total – total number of women/men;

o   children_total – the total number of children ever born to all respondents;

o   parity_0 – the number of childless women/men;

o   parity_i (i = 1, 2, …, i_max) – number of respondents with i children; the maximum-parity category i_max differs across surveys and is open-ended, covering all respondents with i_max or more children (i_max+);

o   parity_unknown – number of respondents for whom the number of children is not known;

  • value – number of cases.
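Because each row holds a single stat/value pair, the parity distribution of one group can be recovered by filtering on the stat labels. The following is a minimal sketch on an invented data frame for a single cohort. The numbers are made up for illustration, and the consistency check assumes, as the definitions above suggest, that the parity counts plus parity_unknown add up to women_total.

```r
# Invented example rows for one cohort, sex and origin (not real data).
d <- data.frame(
  cohort_from = 1950, cohort_to = 1950, sex = "F", origin = "Total",
  stat  = c("women_total", "children_total", "parity_0", "parity_1",
            "parity_2", "parity_3", "parity_unknown"),
  value = c(100, 180, 10, 20, 50, 18, 2)
)

# Turn the long stat/value pairs into a named vector for easy lookup.
counts <- setNames(d$value, d$stat)

# Keep only the numbered parity categories (parity_0, parity_1, ...).
is_parity <- grepl("^parity_[0-9]+$", names(counts))
parity <- counts[is_parity]

# Assumed consistency check: known parities plus unknown equal the total.
stopifnot(sum(parity) + counts[["parity_unknown"]] == counts[["women_total"]])

# Distribution among respondents whose number of children is known.
shares <- parity / sum(parity)
round(shares, 3)
```

Here the highest category, parity_3, plays the role of i_max and therefore stands for three or more children.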

The label Unknown is used for the number of persons with unknown or unspecified characteristics (cohort, education, country of origin, etc.). The database does not list totals by cohort, by education, or by sex; the only total given is for place of origin (origin = Total), which is used when no breakdown by origin is available. Any other total can be computed as the sum of all specified cases.
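Summing over a dimension to obtain such a total can be sketched with aggregate(). The data frame below is invented, and the education labels A and B are placeholders. Whether Unknown rows belong in a total depends on the analysis; in this sketch they are included.

```r
# Invented counts for one cohort split by a placeholder education code.
d <- data.frame(
  cohort_from = 1950, cohort_to = 1950, sex = "F", origin = "Total",
  edu   = rep(c("A", "B", "Unknown"), each = 2),
  stat  = rep(c("women_total", "parity_0"), times = 3),
  value = c(60, 5, 35, 4, 5, 1)
)

# Total over education: sum value within every remaining combination.
totals <- aggregate(value ~ cohort_from + cohort_to + sex + origin + stat,
                    data = d, FUN = sum)
totals

# The same total restricted to rows with a specified education level.
known <- aggregate(value ~ cohort_from + cohort_to + sex + origin + stat,
                   data = d[d$edu != "Unknown", ], FUN = sum)
known
```

Grouping by origin keeps rows labelled Total separate from any origin-specific rows, so a file containing both would not be double-counted.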

The source data are extracted directly from the survey or census records, with very few, if any, computations.
