Import Upstream version 3.5-2 Dirk Eddelbuettel 5 years ago
3 changed file(s) with 352 addition(s) and 341 deletion(s).
00 Package: Hmisc
1 Version: 3.5-0
1 Version: 3.5-2
22 Date: 2008-12-26
33 Title: Harrell Miscellaneous
44 Author: Frank E Harrell Jr <f.harrell@vanderbilt.edu>, with
1818 License: GPL version 2 or newer
1919 LazyLoad: Yes
2020 URL: http://biostat.mc.vanderbilt.edu/s/Hmisc, http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf, http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatReport/summary.pdf, http://biostat.mc.vanderbilt.edu/trac/Hmisc
21 Packaged: Thu Jan 22 16:58:09 2009; dupontct
21 Packaged: Mon Jan 26 11:38:03 2009; dupontct
88 \alias{timePOSIXt}
99 \title{Convert a SAS Dataset to an S Data Frame}
1010 \description{
11 Converts a \acronym{SAS} dataset into an S data frame.
12 You may choose to extract only a subset of variables
13 or a subset of observations in the \acronym{SAS} dataset.
14 You may have the function automatically convert PROC FORMAT-coded
15 variables to factor objects. The original \acronym{SAS} codes are stored in an
16 attribute called \code{sas.codes} and these may be added back to the
17 \code{levels} of a \code{factor} variable using the \code{code.levels} function.
18 Information about special missing values may be captured in an attribute
19 of each variable having special missing values. This attribute is
20 called \code{special.miss}, and such variables are given class \code{special.miss}.
21 There are \code{print}, \code{[]}, \code{format}, and \code{is.special.miss}
22 methods for such variables.
23 The \code{chron} function is used to set up date, time, and date-time variables.
24 If using S-Plus 5 or 6 or later, the \code{timeDate} function is used
25 instead.
26 Under R, \code{\link{Dates}} is used for dates and \code{\link[chron]{chron}}
27 for date-times. For times without
28 dates, these still need to be stored in date-time format in POSIX.
29 Such \acronym{SAS} time variables are given a major class of \code{timePOSIXt} and a
30 \code{format.timePOSIXt} function so that the date portion (which will
31 always be 1/1/1970) will not print by default.
32 If a date variable represents a partial date (.5 added if
33 month missing, .25 added if day missing, .75 if both), an attribute
34 \code{partial.date} is added to the variable, and the variable also becomes
35 a class \code{imputed} variable.
36 The \code{describe} function uses information about partial dates and
37 special missing values.
38 There is an option to automatically uncompress (or gunzip) compressed
39 \acronym{SAS} datasets.
11 Converts a \acronym{SAS} dataset into an S data frame.
12 You may choose to extract only a subset of variables
13 or a subset of observations in the \acronym{SAS} dataset.
14 You may have the function automatically convert \preformatted{PROC FORMAT}-coded
15 variables to factor objects. The original \acronym{SAS} codes are stored in an
16 attribute called \code{sas.codes} and these may be added back to the
17 \code{levels} of a \code{factor} variable using the \code{code.levels} function.
18 Information about special missing values may be captured in an attribute
19 of each variable having special missing values. This attribute is
20 called \code{special.miss}, and such variables are given class \code{special.miss}.
21 There are \code{print}, \code{[]}, \code{format}, and \code{is.special.miss}
22 methods for such variables.
23 The \code{chron} function is used to set up date, time, and date-time variables.
24 If using S-Plus 5 or 6 or later, the \code{timeDate} function is used
25 instead.
26 Under R, \code{\link{Dates}} is used for dates and \code{\link[chron]{chron}}
27 for date-times. For times without
28 dates, these still need to be stored in date-time format in POSIX.
29 Such \acronym{SAS} time variables are given a major class of \code{timePOSIXt} and a
30 \code{format.timePOSIXt} function so that the date portion (which will
31 always be 1/1/1970) will not print by default.
32 If a date variable represents a partial date (0.5 added if
33 month missing, 0.25 added if day missing, 0.75 if both), an attribute
34 \code{partial.date} is added to the variable, and the variable also becomes
35 a class \code{imputed} variable.
36 The \code{describe} function uses information about partial dates and
37 special missing values.
38 There is an option to automatically uncompress (or \command{gunzip}) compressed
39 \acronym{SAS} datasets.
4040 }
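The partial-date convention described above (0.5 added if the month is missing, 0.25 if the day is missing, 0.75 if both) can be illustrated with a small sketch. This is not part of Hmisc: the helper name `partial_date_flag` and the sample day counts are hypothetical, and the sketch assumes the fraction is carried on an otherwise whole-number date value.

```r
# Hypothetical helper (not part of Hmisc): classify which date parts were
# missing, based on the fractional part of a numeric date value.
partial_date_flag <- function(x) {
  frac <- x - floor(x)                       # expected: 0, 0.25, 0.5 or 0.75
  out <- rep("complete", length(x))
  out[which(frac == 0.5)]  <- "month missing"
  out[which(frac == 0.25)] <- "day missing"
  out[which(frac == 0.75)] <- "month and day missing"
  out[is.na(x)] <- NA_character_             # keep missing dates missing
  out
}

# Made-up day counts, for illustration only
x <- c(14000, 14000.5, 14000.25, 14000.75, NA)
flags <- partial_date_flag(x)
print(flags)
```

The fractions 0.25, 0.5 and 0.75 are exactly representable in binary floating point, so the equality comparisons are safe for values of this form.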
4141 \usage{
4242 sas.get(library, member, variables=character(0), ifs=character(0),
6262 code.levels(object)
6363 }
6464 \arguments{
65 \item{library}{
66 character string naming the directory in which the dataset is kept.
67 }
65 \item{library}{
66 character string naming the directory in which the dataset is kept.
67 }
6868 \item{drop}{
6969 logical. If \code{TRUE} the result is coerced to the
7070 lowest possible dimension.
7171 }
72 \item{member}{
73 character string giving the second part of the two part \acronym{SAS} dataset name.
74 (The first part is irrelevant here - it is mapped to the UNIX directory name.)
75 }
76 \item{x}{
77 a variable that may have been created by \code{sas.get} with
78 \code{special.miss=T} or with \code{recode} in effect.
79 }
80 \item{variables}{
81 vector of character strings naming the variables in the \acronym{SAS} dataset.
82 The S dataset will contain only those variables from the
83 \acronym{SAS} dataset.
84 To get all of the variables (the default), an empty string may be given.
85 It is a fatal error if any one of the variables is not
86 in the \acronym{SAS} dataset. You can use \code{sas.contents} to get
87 the variables in the \acronym{SAS} dataset.
88 If you have retrieved a subset of the variables
89 in the \acronym{SAS} dataset and wish to retrieve the same list of variables
90 from another dataset, you can program the value of \code{variables} - see
91 one of the last examples.
92 }
93 \item{ifs}{
94 a vector of character strings, each containing one \acronym{SAS} \dQuote{subsetting if}
95 statement.
96 These will be used to extract a subset of the observations in the \acronym{SAS} dataset.
97 }
98 \item{format.library}{
99 The UNIX directory containing the file \bold{formats.sct}, which contains
100 the definitions of the user defined formats used in this dataset.
101 By default, we look for the formats in the same directory as the data.
102 The user defined formats must be available (so \acronym{SAS} can read the data).
103 }
104 \item{formats}{
105 Set \code{formats} to \code{FALSE} to keep \code{sas.get} from telling the \acronym{SAS} macro to
106 retrieve value label formats from \code{format.library}. When you do not
107 specify \code{formats} or \code{recode}, \code{sas.get} will set \code{formats} to \code{TRUE} if a
108 \acronym{SAS} format catalog (\code{.sct} or \code{.sc2}) file exists in \code{format.library}.
109 Value label formats if present are stored as the \code{formats} attribute of the returned
110 object (see below). A format is used if it is referred to by one or more
111 variables
112 in the dataset, if it contains no ranges of values (i.e., it identifies
113 value labels for single values), and if it is a character format
114 or a numeric format that is not used just to label missing values.
115 If you set \code{recode} to \code{TRUE}, 1, or 2, \code{formats} defaults to \code{TRUE}.
116 To fetch the values and labels for variable \code{x} in the dataset \code{d} you
117 could type:
118 \cr
119 \code{f <- attr(d$x, "format")}
120 \cr
72 \item{member}{
73 character string giving the second part of the two part \acronym{SAS} dataset name.
74 (The first part is irrelevant here - it is mapped to the UNIX directory name.)
75 }
76 \item{x}{
77 a variable that may have been created by \code{sas.get} with
78 \code{special.miss=T} or with \code{recode} in effect.
79 }
80 \item{variables}{
81 vector of character strings naming the variables in the \acronym{SAS} dataset.
82 The S dataset will contain only those variables from the
83 \acronym{SAS} dataset.
84 To get all of the variables (the default), an empty string may be given.
85 It is a fatal error if any one of the variables is not
86 in the \acronym{SAS} dataset. You can use \code{sas.contents} to get
87 the variables in the \acronym{SAS} dataset.
88 If you have retrieved a subset of the variables
89 in the \acronym{SAS} dataset and wish to retrieve the same list of variables
90 from another dataset, you can program the value of \code{variables} - see
91 one of the last examples.
92 }
93 \item{ifs}{
94 a vector of character strings, each containing one \acronym{SAS} \dQuote{subsetting if}
95 statement.
96 These will be used to extract a subset of the observations in the \acronym{SAS} dataset.
97 }
98 \item{format.library}{
99 The UNIX directory containing the file \file{formats.sct}, which contains
100 the definitions of the user defined formats used in this dataset.
101 By default, we look for the formats in the same directory as the data.
102 The user defined formats must be available (so \acronym{SAS} can read the data).
103 }
104 \item{formats}{
105 Set \code{formats} to \code{FALSE} to keep \code{sas.get} from telling the \acronym{SAS} macro to
106 retrieve value label formats from \code{format.library}. When you do not
107 specify \code{formats} or \code{recode}, \code{sas.get} will set \code{formats} to \code{TRUE} if a
108 \acronym{SAS} format catalog (\file{.sct} or \file{.sc2}) file exists in \code{format.library}.
109 Value label formats if present are stored as the \code{formats} attribute of the returned
110 object (see below). A format is used if it is referred to by one or more
111 variables
112 in the dataset, if it contains no ranges of values (i.e., it identifies
113 value labels for single values), and if it is a character format
114 or a numeric format that is not used just to label missing values.
115 If you set \code{recode} to \code{TRUE}, 1, or 2, \code{formats} defaults to \code{TRUE}.
116 To fetch the values and labels for variable \code{x} in the dataset \code{d} you
117 could type:
118 \cr
119 \code{f <- attr(d\$x, "format")}
120 \cr
121121 \code{formats <- attr(d, "formats")}
122 \cr
123 \code{formats$f$values; formats$f$labels}
124 }
125 \item{recode}{
126 This parameter defaults to \code{TRUE} if \code{formats} is \code{TRUE}. If it is
127 \code{TRUE}, variables that have an appropriate format (see above) are
128 recoded as \code{factor} objects, which map the values
129 to the value labels for the format. Alternatively, set \code{recode} to
130 1 to use labels of the form value:label, e.g. 1:good 2:better 3:best.
131 Set \code{recode} to 2 to use labels such as good(1) better(2) best(3).
132 Since \code{sas.codes} and \code{code.levels} add flexibility, the usual choice
133 for \code{recode} is \code{TRUE}.
134 }
135 \item{special.miss}{
136 For numeric variables, any missing values are stored as NA in S.
137 You can recover special missing values by setting \code{special.miss} to
138 \code{TRUE}. This will cause the \code{special.miss} attribute and the
139 \code{special.miss} class to be added
140 to each variable that has at least one special missing value.
141 Suppose that variable \code{y} was .E in observation 3 and .G
142 in observation 544. The \code{special.miss} attribute for \code{y} then has the
143 value
144 \cr
122 \cr
123 \code{formats\$f\$values; formats\$f\$labels}
124 }
125 \item{recode}{
126 This parameter defaults to \code{TRUE} if \code{formats} is \code{TRUE}. If it is
127 \code{TRUE}, variables that have an appropriate format (see above) are
128 recoded as \code{factor} objects, which map the values
129 to the value labels for the format. Alternatively, set \code{recode} to
130 1 to use labels of the form value:label, e.g. 1:good 2:better 3:best.
131 Set \code{recode} to 2 to use labels such as good(1) better(2) best(3).
132 Since \code{sas.codes} and \code{code.levels} add flexibility, the usual choice
133 for \code{recode} is \code{TRUE}.
134 }
135 \item{special.miss}{
136 For numeric variables, any missing values are stored as NA in S.
137 You can recover special missing values by setting \code{special.miss} to
138 \code{TRUE}. This will cause the \code{special.miss} attribute and the
139 \code{special.miss} class to be added
140 to each variable that has at least one special missing value.
141 Suppose that variable \code{y} was .E in observation 3 and .G
142 in observation 544. The \code{special.miss} attribute for \code{y} then has the
143 value
144 \cr
145145 \code{list(codes=c("E","G"),obs=c(3,544))}
146 \cr
147 To fetch this information for variable \code{y} you would say for example
148 \cr
146 \cr
147 To fetch this information for variable \code{y} you would say for example
148 \cr
149149 \code{s <- attr(y, "special.miss")}
150 \cr
151 \code{s$codes; s$obs}
152 \cr
153 or use \code{is.special.miss(x)} or the \code{print.special.miss} method, which
154 will replace \code{NA} values for the variable with \code{E} or \code{G} if they
155 correspond to special missing values.
156 The describe
157 function uses this information in printing a data summary.
158 }
159 \item{id}{
160 The name of the variable to be used as the row names of the S dataset.
161 The id variable becomes the \code{row.names} attribute of a data frame, but
162 the id variable is still retained as a variable in the data frame.
163 (if \code{data.frame.out} is \code{FALSE}, this will be the attribute \code{"id"} of the S
164 dataset.) You can also specify a vector of variable names as the \code{id}
165 parameter. After fetching the data from \acronym{SAS}, all these variables will be
166 converted to character format and concatenated (with a space as a separator)
167 to form a (hopefully) unique ID variable.
168 }
169 \item{dates.}{specifies the format for storing \acronym{SAS} dates in the
170 resulting data frame}
171 \item{as.is}{
172 If \code{data.frame.out = TRUE}, \acronym{SAS} character variables are converted to S factor
173 objects if \code{as.is = FALSE} or if \code{as.is} is a number between 0 and 1 inclusive and
174 the number of unique values of the variable is less than
175 the number of observations (\code{n}) times \code{as.is}. The default for \code{as.is} is .5,
176 so character variables are converted to factors only if they have fewer
177 than \code{n/2} unique values. The primary purpose of this is to keep unique
178 identification variables as character values in the data frame instead
179 of using more space to store both the integer factor codes and the
180 factor labels.
181 }
182 \item{check.unique.id}{
183 If \code{id} is specified, the row names are checked for
184 uniqueness if \code{check.unique.id = TRUE}. If any are duplicated, a warning
185 is printed. Note that if a data frame is being created with duplicate
186 row names, statements such as \code{my.data.frame["B23",]} will retrieve
187 only the first row with a row name of \code{"B23"}.
188 }
189 \item{force.single}{
190 By default, \acronym{SAS} numeric variables having \code{LENGTH} > 4 are stored as
191 S double precision numerics, which allow for the same precision as
192 a \acronym{SAS} \code{LENGTH} 8 variable. Set \code{force.single = TRUE} to store every
193 numeric variable in single precision (7 digits of precision).
194 This option is useful when the creator of the \acronym{SAS} dataset has
195 failed to use a \code{LENGTH} statement.
196 R does not have single precision, so no attempt is made to convert to
197 single if running R.
198 }
199 \item{dates}{
200 One of the character strings \code{"sas"}, \code{"yearfrac"}, \code{"yearfrac2"}, \code{"yymmdd"}.
201 If a \acronym{SAS} variable has a date format (one of \code{"DATE", "MMDDYY", "YYMMDD",
202 "DDMMYY", "YYQ", "MONYY", "JULIAN"}), it will be converted to the format
203 specified by \code{dates} before being given to S. \code{"sas"} gives
204 days from 1/1/1960 (from 1/1/1970 if using \code{chron}),
205 \code{"yearfrac"} gives days from 1/1/1900 divided by
206 365.25, \code{"yearfrac2"} gives year plus fraction of current year,
207 and \code{"yymmdd"} gives a 6 digit number YYMMDD (year\%\%100, month, day).
208 Note that S will store these as numbers, not as
209 character strings. If dates="sas" and a variable has one of the \acronym{SAS}
210 date formats listed above, the variable will be given a class of "date"
211 to work with Terry Therneau's implementation of the "date" class in S.
212 If the \code{chron} package or \code{timeDate} function is available, these are
213 used instead.
214 }
215 \item{keep.log}{
216 logical flag: if \code{FALSE}, delete the \acronym{SAS} log file upon completion.
217 }
218 \item{log.file}{
219 the name of the \acronym{SAS} log file.
220 }
221 \item{macro}{
222 the name of an S object in the current search path that contains the text of
223 the \acronym{SAS} macro called by S. The S object is a character vector that
224 can be edited using for example sas.get.macro <- editor(sas.get.macro).
225 }
226 \item{data.frame.out}{
227 logical flag: if \code{TRUE}, the return value will be an S data frame,
228 otherwise it will be a list.
229 }
230 \item{clean.up}{
231 logical flag: if \code{TRUE}, remove all temporary files when finished. You
232 may want to keep these while debugging the \acronym{SAS} macro. Not needed for \R.
233 }
234 \item{quiet}{
235 logical flag: if \code{FALSE}, print the contents of the \acronym{SAS} log file if
236 there has been an error.
237 }
238 \item{temp}{
239 the prefix to use for the temporary files. Two characters
240 will be added to this, the resulting name
241 must fit on your file system.
242 }
243 \item{sasprog}{
244 the name of the system command to invoke \acronym{SAS}
245 }
246 \item{uncompress}{
247 set to \code{TRUE} to automatically invoke the UNIX \code{gunzip} command
248 (if \code{member.ssd01.gz} exists) or the \code{uncompress} command
249 (if \code{member.ssd01.Z} exists) to uncompress the \acronym{SAS} dataset before
250 proceeding. This assumes you have the file permissions to allow
251 uncompressing in place. If the file is already uncompressed, this
252 option is ignored.
253 }
254 \item{where}{
255 by default, a list or data frame which contains all the variables is returned.
256 If you specify \code{where}, each individual variable is placed into a
257 separate object (whose name is the name of the variable) using the
258 \code{assign} function with the \code{where} argument. For example, you can
259 put each variable in its own file in a directory, which in some cases
260 may save memory over attaching a data frame.
261 }
262 \item{code}{
263 a special missing value code (A through Z or underscore) to check
264 against. If \code{code} is omitted, \code{is.special.miss} will return
265 a \code{TRUE} for each observation that has any special missing value.
266 }
267 \item{object}{a variable in a data frame created by \code{sas.get}}
268 \item{\dots}{ignored}
150 \cr
151 \code{s\$codes; s\$obs}
152 \cr
153 or use \code{is.special.miss(x)} or the \code{print.special.miss} method, which
154 will replace \code{NA} values for the variable with \samp{E} or \samp{G} if they
155 correspond to special missing values.
156 The describe
157 function uses this information in printing a data summary.
158 }
159 \item{id}{
160 The name of the variable to be used as the row names of the S dataset.
161 The id variable becomes the \code{row.names} attribute of a data frame, but
162 the id variable is still retained as a variable in the data frame.
163 (if \code{data.frame.out} is \code{FALSE}, this will be the attribute \samp{id} of the \R
164 dataset.) You can also specify a vector of variable names as the \code{id}
165 parameter. After fetching the data from \acronym{SAS}, all these variables will be
166 converted to character format and concatenated (with a space as a separator)
167 to form a (hopefully) unique identification variable.
168 }
169 \item{dates.}{
170 specifies the format for storing \acronym{SAS} dates in the
171 resulting data frame
172 }
173 \item{as.is}{
174 If \code{data.frame.out = TRUE}, \acronym{SAS} character variables are converted to S factor
175 objects if \code{as.is = FALSE} or if \code{as.is} is a number between 0 and 1 inclusive and
176 the number of unique values of the variable is less than
177 the number of observations (\code{n}) times \code{as.is}. The default for \code{as.is} is 0.5,
178 so character variables are converted to factors only if they have fewer
179 than \code{n/2} unique values. The primary purpose of this is to keep unique
180 identification variables as character values in the data frame instead
181 of using more space to store both the integer factor codes and the
182 factor labels.
183 }
184 \item{check.unique.id}{
185 If \code{id} is specified, the row names are checked for
186 uniqueness if \code{check.unique.id = TRUE}. If any are duplicated, a warning
187 is printed. Note that if a data frame is being created with duplicate
188 row names, statements such as \code{my.data.frame["B23",]} will retrieve
189 only the first row with a row name of \preformatted{B23}.
190 }
191 \item{force.single}{
192 By default, \acronym{SAS} numeric variables having \eqn{LENGTH > 4} are stored as
193 S double precision numerics, which allow for the same precision as
194 a \acronym{SAS} \preformatted{LENGTH} 8 variable. Set \code{force.single = TRUE} to store every
195 numeric variable in single precision (7 digits of precision).
196 This option is useful when the creator of the \acronym{SAS} dataset has
197 failed to use a \preformatted{LENGTH} statement.
198 R does not have single precision, so no attempt is made to convert to
199 single if running R.
200 }
201 \item{dates}{
202 One of the character strings \code{"sas"}, \code{"yearfrac"}, \code{"yearfrac2"}, \code{"yymmdd"}.
203 If a \acronym{SAS} variable has a date format (one of \code{"DATE"}, \code{"MMDDYY"}, \code{"YYMMDD"},
204 \code{"DDMMYY"}, \code{"YYQ"}, \code{"MONYY"}, \code{"JULIAN"}), it will be converted to the format
205 specified by \code{dates} before being given to S. \code{"sas"} gives
206 days from 1/1/1960 (from 1/1/1970 if using \code{chron}),
207 \code{"yearfrac"} gives days from 1/1/1900 divided by
208 365.25, \code{"yearfrac2"} gives year plus fraction of current year,
209 and \code{"yymmdd"} gives a 6 digit number \preformatted{YYMMDD} (year\%\%100, month, day).
210 Note that \R will store these as numbers, not as
211 character strings. If \code{dates="sas"} and a variable has one of the \acronym{SAS}
212 date formats listed above, the variable will be given a class of \samp{date}
213 to work with Terry Therneau's implementation of the \samp{date} class in S.
214 If the \code{chron} package or \code{timeDate} function is available, these are
215 used instead.
216 }
217 \item{keep.log}{
218 logical flag: if \code{FALSE}, delete the \acronym{SAS} log file upon completion.
219 }
220 \item{log.file}{
221 the name of the \acronym{SAS} log file.
222 }
223 \item{macro}{
224 the name of an S object in the current search path that contains the text of
225 the \acronym{SAS} macro called by \R. The \R object is a character vector that
226 can be edited using for example \code{sas.get.macro <- editor(sas.get.macro)}.
227 }
228 \item{data.frame.out}{
229 logical flag: if \code{TRUE}, the return value will be an S data frame,
230 otherwise it will be a list.
231 }
232 \item{clean.up}{
233 logical flag: if \code{TRUE}, remove all temporary files when finished. You
234 may want to keep these while debugging the \acronym{SAS} macro. Not needed for \R.
235 }
236 \item{quiet}{
237 logical flag: if \code{FALSE}, print the contents of the \acronym{SAS} log file if
238 there has been an error.
239 }
240 \item{temp}{
241 the prefix to use for the temporary files. Two characters
242 will be added to this, the resulting name
243 must fit on your file system.
244 }
245 \item{sasprog}{
246 the name of the system command to invoke \acronym{SAS}
247 }
248 \item{uncompress}{
249 set to \code{TRUE} to automatically invoke the \acronym{UNIX} \command{gunzip} command
250 (if \file{\var{member}.ssd01.gz} exists) or the \command{uncompress} command
251 (if \file{\var{member}.ssd01.Z} exists) to uncompress the \acronym{SAS} dataset before
252 proceeding. This assumes you have the file permissions to allow
253 uncompressing in place. If the file is already uncompressed, this
254 option is ignored.
255 }
256 \item{where}{
257 by default, a list or data frame which contains all the variables is returned.
258 If you specify \code{where}, each individual variable is placed into a
259 separate object (whose name is the name of the variable) using the
260 \code{assign} function with the \code{where} argument. For example, you can
261 put each variable in its own file in a directory, which in some cases
262 may save memory over attaching a data frame.
263 }
264 \item{code}{
265 a special missing value code (\samp{A} through \samp{Z} or \samp{\_}) to check
266 against. If \code{code} is omitted, \code{is.special.miss} will return
267 a \code{TRUE} for each observation that has any special missing value.
268 }
269 \item{object}{
270 a variable in a data frame created by \code{sas.get}
271 }
272 \item{\dots}{ignored}
269273 }
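The \code{as.is} rule documented above (convert a character variable to a factor when its number of unique values is less than \code{n} times \code{as.is}) can be sketched in a few lines. The helper name `would_become_factor` and the sample vectors are hypothetical and only mirror the description; this is not the code \code{sas.get} itself runs, and it covers only \code{as.is = FALSE} or a number between 0 and 1.

```r
# Hypothetical helper (not part of Hmisc): would a character vector be
# converted to a factor under the as.is rule described in the text?
would_become_factor <- function(x, as.is = 0.5) {
  if (identical(as.is, FALSE)) return(TRUE)  # as.is = FALSE: always convert
  n <- length(x)
  length(unique(x)) < n * as.is              # fewer than n * as.is unique values
}

# Made-up data: an identifier column and a low-cardinality column
ids    <- c("B21", "B22", "B23", "B24")                    # 4 unique of 4
status <- c("good", "good", "best", "good", "best", "good")  # 2 unique of 6

print(would_become_factor(ids))     # unique IDs stay character
print(would_become_factor(status))  # 2 < 6 * 0.5, so converted
```

With the default of 0.5, identifier-like columns stay as character values, which is the space-saving purpose the text describes.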
270274 \value{
271 if \code{data.frame.out} is \code{TRUE}, the output will
272 be a data frame resembling the \acronym{SAS} dataset. If \code{id}
273 was specified, that column of the data frame will be used
274 as the row names of the data frame. Each variable in the data frame
275 or vector in the list will have the attributes \code{label} and \code{format}
276 containing \acronym{SAS} labels and formats. Underscores in formats are
277 converted to periods. Formats for character variables have \code{$} placed
278 in front of their names.
279 If \code{formats} is \code{TRUE} and there are any
280 appropriate format definitions in \code{format.library}, the returned
281 object will have attribute \code{formats} containing lists named the
282 same as the format names (with periods substituted for underscores and
283 character formats prefixed by \code{$}).
284 Each of these lists has a vector called \code{values} and one called
285 \code{labels} with the \code{PROC FORMAT; VALUE \dots} definitions.
286
287
288 If \code{data.frame.out} is \code{FALSE}, the output will
289 be a list of vectors, each containing a variable from the \acronym{SAS}
290 dataset. If \code{id} was specified, that element of the list will
291 be used as the \code{id} attribute of the entire list.
275 if \code{data.frame.out} is \code{TRUE}, the output will
276 be a data frame resembling the \acronym{SAS} dataset. If \code{id}
277 was specified, that column of the data frame will be used
278 as the row names of the data frame. Each variable in the data frame
279 or vector in the list will have the attributes \code{label} and \code{format}
280 containing \acronym{SAS} labels and formats. Underscores in formats are
281 converted to periods. Formats for character variables have \code{\$} placed
282 in front of their names.
283 If \code{formats} is \code{TRUE} and there are any
284 appropriate format definitions in \code{format.library}, the returned
285 object will have attribute \code{formats} containing lists named the
286 same as the format names (with periods substituted for underscores and
287 character formats prefixed by \code{\$}).
288 Each of these lists has a vector called \code{values} and one called
289 \code{labels} with the \preformatted{PROC FORMAT; VALUE ...} definitions.
290
291
292 If \code{data.frame.out} is \code{FALSE}, the output will
293 be a list of vectors, each containing a variable from the \acronym{SAS}
294 dataset. If \code{id} was specified, that element of the list will
295 be used as the \code{id} attribute of the entire list.
292296 }
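As a rough illustration of the \code{formats} attribute described above (one list per format, each holding a \code{values} vector and a \code{labels} vector), the sketch below builds a mock of that structure and uses it to translate raw codes into labels. The format name `quality`, its values and labels, and the sample codes are all invented; a real object returned by \code{sas.get} would supply them.

```r
# Mock of the structure described in the text; names and data are invented.
formats <- list(
  quality = list(values = c(1, 2, 3),
                 labels = c("good", "better", "best"))
)

codes <- c(3, 1, 2, 1, NA)   # made-up raw SAS codes for one variable
f <- "quality"               # would normally come from attr(d$x, "format")

# Translate each code to its label by position in the values vector
labs <- formats[[f]]$labels[match(codes, formats[[f]]$values)]
print(labs)

# As a factor, with levels in the order given by the format definition
fac <- factor(labs, levels = formats[[f]]$labels)
print(levels(fac))
```

Codes with no matching entry in \code{values} come out as \code{NA}, which is consistent with the note that undefined levels are treated as \code{NA} when \code{recode} is in effect.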
293297 \section{Side Effects}{
294 if a \acronym{SAS} error occurs and \code{quiet} is \code{FALSE}, then the \acronym{SAS} log file will be
295 printed under the control of the \bold{less} pager.
298 if a \acronym{SAS} error occurs and \code{quiet} is \code{FALSE}, then the \acronym{SAS} log file will be
299 printed under the control of the \command{less} pager.
296300 }
297301 \details{
298 If you specify \code{special.miss = TRUE} and there are no special missing
299 values in the \acronym{SAS} dataset, the \acronym{SAS} step will fail.
300
301 For variables having a \code{PROC FORMAT VALUE}
302 format with some of the levels undefined, \code{sas.get} will interpret those
303 values as \code{NA} if you are using \code{recode}.
304
305 The \acronym{SAS} macro \code{sas_get} uses record lengths of up to 4096 in two
306 places. If you are exporting records that are very long (because of
307 a large number of variables and/or long character variables), you
308 may want to edit these \code{LRECL}s to quadruple them, for example.
302 If you specify \code{special.miss = TRUE} and there are no special missing
303 values in the \acronym{SAS} dataset, the \acronym{SAS} step will fail.
304
305 For variables having a \preformatted{PROC FORMAT VALUE}
306 format with some of the levels undefined, \code{sas.get} will interpret those
307 values as \code{NA} if you are using \code{recode}.
308
309 The \acronym{SAS} macro \file{sas\_get} uses record lengths of up to 4096 in two
310 places. If you are exporting records that are very long (because of
311 a large number of variables and/or long character variables), you
312 may want to edit these \preformatted{LRECL}s to quadruple them, for example.
309313 }
310314 \note{
311 You must be able to run \acronym{SAS} (by typing \bold{sas}) on your system.
312 If the S command \code{!sas} does not start \acronym{SAS}, then this function cannot work.
313
314 If you are reading time or
315 date-time variables, you will need to execute the command \code{library(chron)}
316 to print those variables or the data frame if the \code{timeDate} function
317 is not available.
315 You must be able to run \acronym{SAS} (by typing \command{sas}) on your system.
316 If the S command \code{!sas} does not start \acronym{SAS}, then this function cannot work.
317
318 If you are reading time or
319 date-time variables, you will need to execute the command \code{library(chron)}
320 to print those variables or the data frame if the \code{timeDate} function
321 is not available.
318322 }
319323 \section{BACKGROUND}{
320 The references cited below explain the structure of \acronym{SAS} datasets and how
321 they are stored under UNIX.
322 See \emph{\acronym{SAS} Language}
323 for a discussion of the \dQuote{subsetting if} statement.
324 The references cited below explain the structure of \acronym{SAS} datasets and how
325 they are stored under \acronym{UNIX}.
326 See \emph{\acronym{SAS} Language}
327 for a discussion of the \dQuote{subsetting if} statement.
324328 }
325329 \author{
326 Terry Therneau, Mayo Clinic
327 \cr
328 Frank Harrell, Vanderbilt University
329 \cr
330 Bill Dunlap, University of Washington and Insightful Corporation
331 \cr
332 Michael W. Kattan, Cleveland Clinic Foundation
330 Terry Therneau, Mayo Clinic
331 \cr
332 Frank Harrell, Vanderbilt University
333 \cr
334 Bill Dunlap, University of Washington and Insightful Corporation
335 \cr
336 Michael W. Kattan, Cleveland Clinic Foundation
333337 }
334338 \references{
335 \acronym{SAS} Institute Inc. (1990).
336 \emph{\acronym{SAS} Language: Reference, Version 6.}
337 First Edition.
338 \acronym{SAS} Institute Inc., Cary, North Carolina.
339
340
341 \acronym{SAS} Institute Inc. (1988).
342 \acronym{SAS} Technical Report P-176,
343 \emph{Using the \acronym{SAS} System, Release 6.03, under UNIX Operating Systems and Derivatives. }
344 \acronym{SAS} Institute Inc., Cary, North Carolina.
345
346
347 \acronym{SAS} Institute Inc. (1985).
348 \emph{\acronym{SAS} Introductory Guide.}
349 Third Edition.
350 \acronym{SAS} Institute Inc., Cary, North Carolina.
339 \acronym{SAS} Institute Inc. (1990).
340 \emph{\acronym{SAS} Language: Reference, Version 6.}
341 First Edition.
342 \acronym{SAS} Institute Inc., Cary, North Carolina.
343
344
345 \acronym{SAS} Institute Inc. (1988).
346 \acronym{SAS} Technical Report P-176,
347 \emph{Using the \acronym{SAS} System, Release 6.03, under UNIX Operating Systems and Derivatives. }
348 \acronym{SAS} Institute Inc., Cary, North Carolina.
349
350
351 \acronym{SAS} Institute Inc. (1985).
352 \emph{\acronym{SAS} Introductory Guide.}
353 Third Edition.
354 \acronym{SAS} Institute Inc., Cary, North Carolina.
351355 }
352356 \seealso{
353 \code{\link{data.frame}}, \code{\link[Hmisc]{describe}},
354 \code{\link[Hmisc]{label}},
355 \code{\link[Hmisc]{upData}},
356 \code{\link[Hmisc]{cleanup.import}}
357 \code{\link{data.frame}}, \code{\link[Hmisc]{describe}},
358 \code{\link[Hmisc]{label}},
359 \code{\link[Hmisc]{upData}},
360 \code{\link[Hmisc]{cleanup.import}}
357361 }
358362 \examples{
359363 \dontrun{
1010 Converts a \acronym{SAS} dataset into an S data frame.
1111 You may choose to extract only a subset of variables
1212 or a subset of observations in the \acronym{SAS} dataset.
13 The function will automatically convert PROC FORMAT-coded
13 The function will automatically convert \preformatted{PROC FORMAT}-coded
1414 variables to factor objects. The original \acronym{SAS} codes are stored in an
1515 attribute called \code{sas.codes} and these may be added back to the
1616 \code{levels} of a \code{factor} variable using the \code{code.levels}
2424 \code{\link{Dates}}, \code{\link{DateTimeClasses}}, and
2525 \code{\link[chron]{chron}} variables.
2626 If using S-Plus 5 or 6 or later, the \code{timeDate} function is used instead.
27 If a date variable represents a partial date (.5 added if
28 month missing, .25 added if day missing, .75 if both), an attribute
27 If a date variable represents a partial date (0.5 added if
28 month missing, 0.25 added if day missing, 0.75 if both), an attribute
2929 \code{partial.date} is added to the variable, and the variable also becomes
3030 a class \code{imputed} variable.
3131 The \code{describe} function uses information about partial dates and
3232 special missing values.
33 There is an option to automatically \code{PKUNZIP} compressed
33 There is an option to automatically \command{PKUNZIP} compressed
3434 \acronym{SAS} datasets.
3535
3636 \code{sas.get} works by composing and running a \acronym{SAS} job that
37 creates various ASCII files that are read and analyzed
37 creates various \acronym{ASCII} files that are read and analyzed
3838 by \code{sas.get}. You can also run the \acronym{SAS} \code{sas_get} macro,
39 which writes the ASCII files for downloading, in a separate
39 which writes the \acronym{ASCII} files for downloading, in a separate
4040 step or on another computer, and then tell \code{sas.get} (through the
4141 \code{sasout} argument) to access these files instead of running \acronym{SAS}.
4242 }
8989 one of the last examples.
9090 }
9191 \item{ifs}{
92 a vector of character strings, each containing one \acronym{SAS} \dQuote{\code{subsetting if}}
92 a vector of character strings, each containing one \acronym{SAS} \dQuote{subsetting if}
9393 statement.
9494 These will be used to extract a subset of the observations in the \acronym{SAS} dataset.
9595 }
9696 \item{format.library}{
97 The directory containing the file \bold{formats.sc2}, which contains
97 The directory containing the file \file{formats.sc2}, which contains
9898 the definitions of the user-defined formats used in this dataset.
9999 By default, we look for the formats in the same directory as the data.
100100 The user-defined formats must be available (so \acronym{SAS} can read the data).
103103 Set \code{formats} to \code{FALSE} to keep \code{sas.get} from telling the \acronym{SAS} macro to
104104 retrieve value label formats from \code{format.library}. When you do not
105105 specify \code{formats} or \code{recode}, \code{sas.get} will set \code{formats} to \code{TRUE} if a
106 \acronym{SAS} format catalog (\code{.sct} or \code{.sc2}) file exists in \code{format.library}.
107 \code{sas.get} stores \acronym{SAS} PROC FORMAT VALUE definitions
106 \acronym{SAS} format catalog (\file{.sct} or \file{.sc2}) file exists in \code{format.library}.
107 \code{sas.get} stores \acronym{SAS} \preformatted{PROC FORMAT VALUE} definitions
108108 as the \code{formats} attribute of the returned
109109 object (see below). A format is used if it is referred to by one or more
110110 variables
153153 \code{s\$codes; s\$obs}
154154 \cr
155155 or use \code{is.special.miss(x)} or the \code{print.special.miss} method, which
156 will replace \code{NA} values for the variable with \code{E} or \code{G} if they
156 will replace \code{NA} values for the variable with \samp{E} or \samp{G} if they
157157 correspond to special missing values.
158158 The \code{describe}
159159 function uses this information in printing a data summary.
165165 You can also specify a vector of variable names as the \code{id}
166166 parameter. After fetching the data from \acronym{SAS}, all these variables will be
167167 converted to character format and concatenated (with a space as a separator)
168 to form a (hopefully) unique ID variable.
169 }
170 \item{dates.}{specifies the format for storing \acronym{SAS} dates in the
171 resulting data frame}
168 to form a (hopefully) unique identification variable.
169 }
170 \item{dates.}{
171 specifies the format for storing \acronym{SAS} dates in the
172 resulting data frame.
173 }
172174 \item{as.is}{
173175 \acronym{SAS} character variables are converted to S factor
174176 objects if \code{as.is=FALSE} or if \code{as.is} is a number between 0 and 1 inclusive and
175177 the number of unique values of the variable is less than
176 the number of observations (\code{n}) times \code{as.is}. The default for \code{as.is} is .5,
178 the number of observations (\code{n}) times \code{as.is}. The default for \code{as.is} is 0.5,
177179 so character variables are converted to factors only if they have fewer
178180 than \code{n/2} unique values. The primary purpose of this is to keep unique
179181 identification variables as character values in the data frame instead
185187 uniqueness if \code{check.unique.id = TRUE}. If any are duplicated, a warning
186188 is printed. Note that if a data frame is being created with duplicate
187189 row names, statements such as \code{my.data.frame["B23",]} will retrieve
188 only the first row with a row name of \code{"B23"}.
190 only the first row with a row name of \samp{B23}.
189191 }
190192 \item{force.single}{
191 By default, \acronym{SAS} numeric variables having \code{LENGTH} > 4 are stored as
193 By default, \acronym{SAS} numeric variables having \eqn{LENGTH > 4} are stored as
192194 S double precision numerics, which allow for the same precision as
193 a \acronym{SAS} \code{LENGTH} 8 variable. Set \code{force.single = TRUE} to store every
195 a \acronym{SAS} \preformatted{LENGTH} 8 variable. Set \code{force.single = TRUE} to store every
194196 numeric variable in single precision (7 digits of precision).
195197 This option is useful when the creator of the \acronym{SAS} dataset has
196 failed to use a \code{LENGTH} statement.
198 failed to use a \preformatted{LENGTH} statement.
197199 R does not have single precision,
198200 so no attempt is made to convert to single if running \R.
199201 }
209211 can be edited using, for example, \code{sas.get.macro <- editor(sas.get.macro)}.
210212 }
211213 \item{data.frame.out}{
212 set to \code{FALSE} to make the result a list instead of a data frame}
214 set to \code{FALSE} to make the result a list instead of a data
215 frame
216 }
213217 \item{clean.up}{
214218 logical flag: if \code{TRUE}, remove all temporary files when finished. You
215219 may want to keep these while debugging the \acronym{SAS} macro. Not needed for \R.
216220 }
217 \item{quiet}{logical flag: if \code{FALSE}, print the contents of the
221 \item{quiet}{
222 logical flag: if \code{FALSE}, print the contents of the
218223 \acronym{SAS} log file if there has been an error.
219224 }
220225 \item{temp}{
227232 }
228233 \item{uncompress}{
229234 set to \code{FALSE} by default. Set it
230 to \code{TRUE} to automatically invoke the DOS \code{PKUNZIP} command
231 if \code{member.zip} exists,
235 to \code{TRUE} to automatically invoke the DOS \command{PKUNZIP} command
236 if \file{\var{member}.zip} exists,
232237 to uncompress the \acronym{SAS} dataset before
233238 proceeding. This assumes you have the file permissions to allow
234239 uncompressing in place. If the file is already uncompressed, this
244249 save memory over attaching a data frame.
245250 }
246251 \item{code}{
247 a special missing value code (\samp{A} through \samp{Z} or \samp{_}) to check against.
252 a special missing value code (\samp{A} through \samp{Z} or \samp{\_}) to check against.
248253 If \code{code} is omitted, \code{is.special.miss} will return a \code{TRUE} for each
249254 observation that has any special missing value.
250255 }
251 \item{object}{a variable in a data frame created by \code{sas.get}}
256 \item{object}{
257 a variable in a data frame created by \code{sas.get}
258 }
252259 \item{\dots}{ignored}
253260 }
254261 \value{
257264 as the row names of the data frame. Each variable in the data frame
258265 or vector in the list will have the attributes \code{label} and \code{format}
259266 containing \acronym{SAS} labels and formats. Underscores in formats are
260 converted to periods. Formats for character variables have \code{\$} placed
267 converted to periods. Formats for character variables have \samp{\$} placed
261268 in front of their names.
262269 If \code{formats} is \code{TRUE} and there are any
263270 appropriate format definitions in \code{format.library}, the returned
264271 object will have attribute \code{formats} containing lists named the
265272 same as the format names (with periods substituted for underscores and
266 character formats prefixed by \code{\$}).
273 character formats prefixed by \samp{\$}).
267274 Each of these lists has a vector called \code{values} and one called
268 \code{labels} with the PROC FORMAT; VALUE \dots definitions.
275 \code{labels} with the \preformatted{PROC FORMAT; VALUE} \ldots definitions.
269276 }
270277 \section{Side Effects}{
271278 If a \acronym{SAS} error occurs, the \acronym{SAS} log file will be
275282 If you specify \code{special.miss = TRUE} and there are no special missing
276283 values in the \acronym{SAS} dataset, the \acronym{SAS} step will fail.
277284
278 For variables having a \code{PROC FORMAT VALUE}
285 For variables having a \preformatted{PROC FORMAT VALUE}
279286 format with some of the levels undefined, \code{sas.get} will interpret those
280287 values as \code{NA} if you are using \code{recode}.
281288
282289
283290 If you leave the \code{sasprog} argument at its default value of
284 \code{"sas"}, be sure that the \acronym{SAS} executable is in the \code{PATH}
291 \samp{sas}, be sure that the \acronym{SAS} executable is in the \file{PATH}
285292 specified in your \file{autoexec.bat} file. Also make sure that
286293 you invoke S so that your current project directory is known
287294 to be the current working directory. This is best done by creating
288295 a shortcut in Windows95, for which the command to execute will be
289 something like \code{drive:\\spluswin\\cmd\\splus.exe HOME=.} and the
290 program is flagged to start in \code{drive:\\myproject} for example.
296 something like \command{drive:\\spluswin\\cmd\\splus.exe HOME=.} and the
297 program is flagged to start in \file{drive:\\myproject} for example.
291298 In this way, you will be able to examine the \acronym{SAS} log file easily
292 since it will be placed in \code{drive:\\myproject} by default.
293
294 \acronym{SAS} will create \code{SASWORK} and \code{SASUSER} directories in what it thinks
299 since it will be placed in \file{drive:\\myproject} by default.
300
301 \acronym{SAS} will create \samp{SASWORK} and \samp{SASUSER} directories in what it thinks
295302 are the current working directories. To specify where \acronym{SAS} should
296303 put these instead, edit the \file{config.sas} file or specify a
297304 \code{sasprog} argument of the following form:
299306
300307 When \code{sas.get} needs to run \acronym{SAS}, \acronym{SAS} is run in iconized form.
301308
302 The \acronym{SAS} macro \code{sas_get} uses record lengths of up to 4096 in two
309 The \acronym{SAS} macro \file{sas\_get} uses record lengths of up to 4096 in two
303310 places. If you are exporting records that are very long (because of
304311 a large number of variables and/or long character variables), you
305 may want to edit these \code{LRECL}s to quadruple them, for example.
312 may want to edit these \samp{LRECL}s to quadruple them, for example.
306313 }
307314 \note{
308315 If \code{sasout} is not given, you
311318
312319 If you are reading time or
313320 date-time variables, you will need to execute the command \code{library(chron)}
314 to print those variables or the data frame if the \code{timeDate} function
321 to print those variables or the data frame if the \code{\link{timeDate}} function
315322 is not available.
316323 }
317324 \section{BACKGROUND}{
318325 The references cited below explain the structure of \acronym{SAS} datasets and how
319326 they are stored.
320327 See \emph{\acronym{SAS} Language}
321 for a discussion of the \dQuote{subsetting if} statement.
328 for a discussion of the \preformatted{subsetting if} statement.
322329 }
323330 \author{
324331 Terry Therneau, Mayo Clinic
374381 q1.new[is.special.miss(q1,"D")] <- nl+1
375382 q1.new[is.special.miss(q1,"R")] <- nl+2
376383 q1.new <- factor(q1.new, 1:(nl+2), lev)
377 # Note: would like to use factor() in place of as.integer \dots but
384 # Note: would like to use factor() in place of as.integer ... but
378385 # factor in this case adds "NA" as a category level
379386
380387 d <- sas.get(mem="mydata")