apc.data.list R Documentation

Arrange data as an apc.data.list

Description

This is step 1 of the apc analysis.

The apc package is aimed at a range of data types. The analysis and the labelling of parameters depend on the choice of data type. To keep track of this choice, the data first has to be arranged as an apc.data.list. The purpose of this function is to help the user construct a list with the right information.

Age period cohort analysis is used in two situations: a dose-response situation, where both doses (exposure, risk set, cases) and responses (counts of deaths, outcomes) are available, and a response situation, where only a response is available. If the aim is to model mortality ratios (counts of deaths divided by exposure) directly, this is thought of as a response situation.
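As a rough illustration of the two situations, the sketch below uses the first age group of the Japanese breast cancer figures from the Examples section. It forms a dose as cases divided by rates, as that example does, and checks that response divided by dose gives back the rates. The object names are illustrative only.

```r
# First age group of the Japanese breast cancer example (5 periods)
cases <- c(88, 78, 101, 127, 179)         # responses
rates <- c(0.44, 0.38, 0.46, 0.55, 0.68)  # rates = responses / doses

# Dose-response situation: both responses and doses are available
dose <- cases / rates

# Response situation: only the ratios themselves would be modelled
ratios <- cases / dose
stopifnot(isTRUE(all.equal(ratios, rates)))
```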

The apc.data.list gives sufficient information for the further analysis, so it is sufficient to store this information. It has two obligatory arguments: a response matrix and a character string indicating the data format. It also has further optional arguments with default values. Sometimes it may be convenient to add further arguments to the apc.data.list. This will not affect the apc analysis.

apc.data.list generates default row and column names for the response and dose matrices when these are not provided by the user.

Usage

apc.data.list(response, data.format, dose=NULL,
					age1=NULL, per1=NULL, coh1=NULL, unit=NULL,
					per.zero=NULL, per.max=NULL,
					time.adjust=NULL, label=NULL,
					n.decimal=NULL)

Arguments

response

matrix (or vector). Numbers of responses. It should have a format matching data.format. Time should be increasing with the row/column index of the matrix. For instance, in a 10x5 matrix in "AP" format the row index is for age and should be increasing in age, so higher ages are further down the rows of the matrix. In the same way, the column index is for period.

data.format

character. The following options are implemented:

"AC"

has age/cohort as increasing row/column index.

"AP"

has age/period as increasing row/column index.

"CA"

has cohort/age as increasing row/column index.

"CL"

has cohort/age as increasing row/column index, triangular.

"CP"

has cohort/period as increasing row/column index.

"PA"

has period/age as increasing row/column index.

"PC"

has period/cohort as increasing row/column index.

"trapezoid"

has age/period as increasing row/column index, period-diagonals are NA for period <= per.zero and >per.zero+per.max.

dose

Optional. matrix or NULL. Numbers of doses. It should have same format as response.

age1

Optional. Numeric or NULL. Time label for youngest age group. Used if data.format is "AC", "AP", "CA", "CL", "PA", "trapezoid". If NULL default is unit.

per1

Optional. Numeric or NULL. Time label for the earliest period group. Used if data.format is "AP", "CP", "PA", "PC". If NULL default is unit.

coh1

Optional. Numeric or NULL. Time label for the first cohort group. Used if data.format is "AC", "CA", "CL", "CL.vector.by.row", "CP", "PC", "trapezoid". If NULL default is unit.

unit

Optional. Numeric or NULL. Common time steps for age, period and cohort. For quarterly data use 1/4. For monthly data use 1/12. If NULL default is 1.

per.zero

Optional. Numeric or NULL. Needed if data format is "trapezoid".

per.max

Optional. Numeric or NULL. Needed if data format is "trapezoid".

time.adjust

Optional. Numeric. Time labels are based on two of age1, per1 and coh1. The third time label is computed according to the formula age1+coh1=per1+time.adjust. Default is 0. If age1=coh1=1 it is natural to choose time.adjust=1.

label

Optional. Character. Useful when working with multiple data sets. Some internal functions use the first three characters of the label for identification of the two datasets.

n.decimal

Optional. Numeric or NULL. The labels for parameters involve a date, which is found by converting a number into a character string. If the value is set to a number d, the package uses sprintf. If the value is NULL and unit==1/4 (quarterly data), unit==1/12 (monthly data) or 1/20<=unit && unit<1, the package uses sprintf. If the value is NULL and 1/20>unit || unit>=1, the package uses as.character, which looks nice for integers but can be messy otherwise.
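The difference between the two conversions mentioned for n.decimal can be seen in base R. The sketch below builds four quarterly time labels and converts them both ways. The format string "%.2f" is an assumption chosen for illustration and is not necessarily the one the package uses.

```r
# Quarterly labels starting in 1955 (illustrative values)
unit <- 1/4
labels.num <- 1955 + unit * (0:3)

# as.character drops trailing zeros, so the labels have uneven widths
labels.plain <- as.character(labels.num)
# "1955" "1955.25" "1955.5" "1955.75"

# sprintf with a fixed number of decimals gives labels of equal width
labels.fixed <- sprintf("%.2f", labels.num)
# "1955.00" "1955.25" "1955.50" "1955.75"
```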

Details

If the user does not set a value for any one of age1, per1, coh1, that value is set to unit. If unit itself is not set, it defaults to 1.

The user can set values of age1, per1, coh1 that are incongruent. The function only uses the two of these that are relevant for the chosen data.format. Example: the data.format may be "AC" and the user sets age1 and per1, whereas age1 and coh1 are the relevant ones for this data format. apc.data.list then sets coh1=unit by default and ignores the value of per1. Other commands, such as apc.data.list.subset or apc.fit.table, will by default call the function apc.get.index internally. That function will, in this example, set per1 according to the values of age1 and coh1.
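For the "AC" example, the relation age1+coh1=per1+time.adjust stated for the time.adjust argument can be worked through by hand. The helper below is hypothetical and only rearranges that formula. It is not the code of apc.get.index, and the numeric values are made up.

```r
# Hypothetical helper: solve age1 + coh1 = per1 + time.adjust for per1
per1.from.age.coh <- function(age1, coh1, time.adjust = 0) {
  age1 + coh1 - time.adjust
}

# Real time scale: youngest age 25, first cohort 1930, time.adjust = 0
per1.real <- per1.from.age.coh(age1 = 25, coh1 = 1930)

# Index scale: age1 = coh1 = 1 with time.adjust = 1
per1.index <- per1.from.age.coh(age1 = 1, coh1 = 1, time.adjust = 1)
```

Here per1.real is 1955 and per1.index is 1, so on the index scale the first period also has label 1.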

If the user does not set a value for time.adjust, it is set equal to unit when the user does not specify at least two of age1, per1, coh1. Otherwise it is set to 0. The former choice matches the values in the theory papers, where indices count 1,2,... to follow standard notation for row/column indices of matrices, so that age+coh=per+unit. The latter choice seeks to match a real time scale set by the user, according to age+coh=per.
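The default rule for time.adjust can be sketched as a small function. This is a hypothetical helper written from the description above, not the package's own code, and it assumes unit defaults to 1.

```r
# Hypothetical helper: default for time.adjust as described in Details
default.time.adjust <- function(age1 = NULL, per1 = NULL, coh1 = NULL, unit = 1) {
  # count how many of the three time labels the user has specified
  n.set <- sum(!vapply(list(age1, per1, coh1), is.null, logical(1)))
  # at least two labels set: real time scale, age + coh = per
  # fewer than two set: index scale, age + coh = per + unit
  if (n.set >= 2) 0 else unit
}

adj.index <- default.time.adjust()                                   # no labels set
adj.real  <- default.time.adjust(age1 = 25, per1 = 1955, unit = 5)   # two labels set
```

With no labels set the sketch returns unit (here 1); with age1 and per1 set it returns 0.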

Value

response

matrix (or vector). Numbers of responses.

dose

matrix (or NULL). Numbers of doses.

data.format

character.

age1

Numeric. Default is NULL.

per1

Numeric. Default is NULL.

coh1

Numeric. Default is NULL.

unit

Numeric. Default is NULL. For monthly data, use unit=1/12.

per.zero

Numeric. If data.format is not "trapezoid" the value is NULL. If data.format is "trapezoid" the coordinate system is in age-cohort format and this value counts how many periods are cut off. The default is per.zero=0.

per.max

Numeric. If data.format is not "trapezoid" the value is NULL. If data.format is "trapezoid" the coordinate system is in age-cohort format and this value counts how many periods are included in the data array. The default is per.max=nrow(response)+ncol(response)-1-per.zero.

time.adjust

Numeric. Default is NULL.

label

Character. Default is NULL.

n.decimal

Numeric or NULL.
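The roles of per.zero and per.max in the "trapezoid" format can be illustrated with a small age-cohort array. The sketch assumes 1-based row and column indices, so that the period index of a cell is age index + cohort index - 1, following the index convention in Details. The array size and the chosen per.zero and per.max are made up, and the masking code is not the package's internal code.

```r
# Illustrative 4x5 array: rows are age, columns are cohort
n.age <- 4
n.coh <- 5
per.zero <- 2   # number of period diagonals cut off at the start
per.max  <- 4   # number of period diagonals kept

# period index of each cell under age + coh = per + 1
per.index <- outer(seq_len(n.age), seq_len(n.coh), "+") - 1

# cells outside the retained period diagonals are NA
response <- matrix(seq_len(n.age * n.coh), nrow = n.age, ncol = n.coh)
response[per.index <= per.zero | per.index > per.zero + per.max] <- NA
response

# default per.max keeps all remaining diagonals after the first per.zero
default.per.max <- nrow(response) + ncol(response) - 1 - per.zero
```

In this sketch three cells are cut off in the top-left corner and three in the bottom-right corner, and the default per.max would be 6.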

Author(s)

Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 17 Nov 2016

References

Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a) Identification of the age-period-cohort model and the extended chain ladder model. Biometrika 95, 979-986.

Nielsen, B. (2014) Deviance analysis of age-period-cohort models. Nuffield DP.

Nielsen, B. (2015) apc: An R package for age-period-cohort analysis. R Journal 7, 52-64.

See Also

The example below shows how the data.Japanese.breast.cancer data.list was generated. Other provided data sets include data.asbestos, data.Belgian.lung.cancer and data.Italian.bladder.cancer.

A subset of the data can be selected using apc.data.list.subset.

Examples

###############
#	Artificial data
#	(1) Generate a 5x7 matrix and make arbitrary decisions for rest

response <- matrix(data=1:35,nrow=5,ncol=7)
data.list	<- apc.data.list(response=response,data.format="AP",
					age1=25,per1=1955,coh1=NULL,unit=5,
					per.zero=NULL,per.max=NULL)
data.list

#	(2) Chain Ladder data

k			<- 5
v.response 	<- 1:(k*(k+1)/2)
data.list	<- apc.data.list(response=vector.2.triangle(v.response,k),
							data.format="CL",age1=2001)
data.list

###############
#	Japanese breast cancer
#	This is the code used to generate the data.Japanese.breast.cancer
v.rates		<- c( 0.44, 0.38, 0.46, 0.55, 0.68,
			 	  1.69, 1.69, 1.75, 2.31, 2.52,
				  4.01, 3.90, 4.11, 4.44, 4.80,
				  6.59, 6.57, 6.81, 7.79, 8.27,
				  8.51, 9.61, 9.96,11.68,12.51,
				 10.49,10.80,12.36,14.59,16.56,
				 11.36,11.51,12.98,14.97,17.79,
				 12.03,10.67,12.67,14.46,16.42,
				 12.55,12.03,12.10,13.81,16.46,
				 15.81,13.87,12.65,14.00,15.60,
				 17.97,15.62,15.83,15.71,16.52)
v.cases		<- c(   88,   78,  101,  127,  179,
				   299,  330,  363,  509,  588,
				   596,  680,  798,  923, 1056,
				   874,  962, 1171, 1497, 1716,
				  1022, 1247, 1429, 1987, 2398,
				  1035, 1258, 1560, 2079, 2794,
				   970, 1087, 1446, 1828, 2465,
				   820,  861, 1126, 1549, 1962,
				   678,  738,  878, 1140, 1683,
				   640,  628,  656,  900, 1162,
				   497,  463,  536,  644,  865)				 
#	see also example below for generating labels

rates	<- matrix(data=v.rates,nrow=11, ncol=5,byrow=TRUE)
cases	<- matrix(data=v.cases,nrow=11, ncol=5,byrow=TRUE)

# 	A data list is now constructed as follows
#	note that list entry rates is redundant,
#	but included since it represents original data

data.Japanese.breast.cancer	<- apc.data.list(response=cases,
			dose=cases/rates,data.format="AP",
			age1=25,per1=1955,coh1=NULL,unit=5,
			per.zero=NULL,per.max=NULL,time.adjust=0,
			label="Japanese breast cancer")

#	or when exploiting the default values

data.Japanese.breast.cancer	<- apc.data.list(response=cases,
			dose=cases/rates,data.format="AP",
			age1=25,per1=1955,unit=5,
			label="Japanese breast cancer")

###################################################
# 	Code for generating labels

row.names <- paste(as.character(seq(25,75,by=5)),"-",as.character(seq(29,79,by=5)),sep="")
col.names <- paste(as.character(seq(1955,1975,by=5)),"-",as.character(seq(1959,1979,by=5)),sep="")
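The labels generated above can be attached to a matrix with dimnames before it is passed on. The sketch below is self-contained: it rebuilds the labels under different names (to avoid masking the base function row.names) and uses an empty 11x5 matrix as a stand-in for the cases matrix.

```r
# Age-group and period labels for 11 age groups and 5 periods
row.labels <- paste(seq(25, 75, by = 5), "-", seq(29, 79, by = 5), sep = "")
col.labels <- paste(seq(1955, 1975, by = 5), "-", seq(1959, 1979, by = 5), sep = "")

# Stand-in for the 11x5 cases matrix of the example
m <- matrix(NA_real_, nrow = 11, ncol = 5)

# Attach the labels as row and column names
dimnames(m) <- list(row.labels, col.labels)
```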