8 | 8 |
\alias{timePOSIXt}
|
9 | 9 |
\title{Convert a SAS Dataset to an S Data Frame}
|
10 | 10 |
\description{
|
11 | |
Converts a \acronym{SAS} dataset into an S data frame.
|
12 | |
You may choose to extract only a subset of variables
|
13 | |
or a subset of observations in the \acronym{SAS} dataset.
|
14 | |
You may have the function automatically convert PROC FORMAT-coded
|
15 | |
variables to factor objects. The original \acronym{SAS} codes are stored in an
|
16 | |
attribute called \code{sas.codes} and these may be added back to the
|
17 | |
\code{levels} of a \code{factor} variable using the \code{code.levels} function.
|
18 | |
Information about special missing values may be captured in an attribute
|
19 | |
of each variable having special missing values. This attribute is
|
20 | |
called \code{special.miss}, and such variables are given class \code{special.miss}.
|
21 | |
There are \code{print}, \code{[]}, \code{format}, and \code{is.special.miss}
|
22 | |
methods for such variables.
|
23 | |
The \code{chron} function is used to set up date, time, and date-time variables.
|
24 | |
If using S-Plus 5 or later, the \code{timeDate} function is used
|
25 | |
instead.
|
26 | |
Under R, \code{\link{Dates}} is used for dates and \code{\link[chron]{chron}}
|
27 | |
for date-times. For times without
|
28 | |
dates, the values still need to be stored in POSIX date-time format.
|
29 | |
Such \acronym{SAS} time variables are given a major class of \code{timePOSIXt} and a
|
30 | |
\code{format.timePOSIXt} function so that the date portion (which will
|
31 | |
always be 1/1/1970) will not print by default.
|
32 | |
If a date variable represents a partial date (.5 added if
|
33 | |
month missing, .25 added if day missing, .75 if both), an attribute
|
34 | |
\code{partial.date} is added to the variable, and the variable also becomes
|
35 | |
a class \code{imputed} variable.
|
36 | |
The \code{describe} function uses information about partial dates and
|
37 | |
special missing values.
|
38 | |
There is an option to automatically uncompress (or gunzip) compressed
|
39 | |
\acronym{SAS} datasets.
|
|
11 |
Converts a \acronym{SAS} dataset into an S data frame.
|
|
12 |
You may choose to extract only a subset of variables
|
|
13 |
or a subset of observations in the \acronym{SAS} dataset.
|
|
14 |
You may have the function automatically convert \preformatted{PROC FORMAT}-coded
|
|
15 |
variables to factor objects. The original \acronym{SAS} codes are stored in an
|
|
16 |
attribute called \code{sas.codes} and these may be added back to the
|
|
17 |
\code{levels} of a \code{factor} variable using the \code{code.levels} function.
|
|
18 |
Information about special missing values may be captured in an attribute
|
|
19 |
of each variable having special missing values. This attribute is
|
|
20 |
called \code{special.miss}, and such variables are given class \code{special.miss}.
|
|
21 |
There are \code{print}, \code{[]}, \code{format}, and \code{is.special.miss}
|
|
22 |
methods for such variables.
|
|
23 |
The \code{chron} function is used to set up date, time, and date-time variables.
|
|
24 |
If using S-Plus 5 or later, the \code{timeDate} function is used
|
|
25 |
instead.
|
|
26 |
Under R, \code{\link{Dates}} is used for dates and \code{\link[chron]{chron}}
|
|
27 |
for date-times. For times without
|
|
28 |
dates, the values still need to be stored in POSIX date-time format.
|
|
29 |
Such \acronym{SAS} time variables are given a major class of \code{timePOSIXt} and a
|
|
30 |
\code{format.timePOSIXt} function so that the date portion (which will
|
|
31 |
always be 1/1/1970) will not print by default.
|
|
32 |
If a date variable represents a partial date (0.5 added if
|
|
33 |
month missing, 0.25 added if day missing, 0.75 if both), an attribute
|
|
34 |
\code{partial.date} is added to the variable, and the variable also becomes
|
|
35 |
a class \code{imputed} variable.
|
|
36 |
The \code{describe} function uses information about partial dates and
|
|
37 |
special missing values.
|
|
38 |
There is an option to automatically uncompress (or \command{gunzip}) compressed
|
|
39 |
\acronym{SAS} datasets.
|
40 | 40 |
}
|
41 | 41 |
\usage{
|
42 | 42 |
sas.get(library, member, variables=character(0), ifs=character(0),
|
|
62 | 62 |
code.levels(object)
|
63 | 63 |
}
|
64 | 64 |
\arguments{
|
65 | |
\item{library}{
|
66 | |
character string naming the directory in which the dataset is kept.
|
67 | |
}
|
|
65 |
\item{library}{
|
|
66 |
character string naming the directory in which the dataset is kept.
|
|
67 |
}
|
68 | 68 |
\item{drop}{
|
69 | 69 |
logical. If \code{TRUE} the result is coerced to the
|
70 | 70 |
lowest possible dimension.
|
71 | 71 |
}
|
72 | |
\item{member}{
|
73 | |
character string giving the second part of the two-part \acronym{SAS} dataset name.
|
74 | |
(The first part is irrelevant here - it is mapped to the UNIX directory name.)
|
75 | |
}
|
76 | |
\item{x}{
|
77 | |
a variable that may have been created by \code{sas.get} with
|
78 | |
\code{special.miss=T} or with \code{recode} in effect.
|
79 | |
}
|
80 | |
\item{variables}{
|
81 | |
vector of character strings naming the variables in the \acronym{SAS} dataset.
|
82 | |
The S dataset will contain only those variables from the
|
83 | |
\acronym{SAS} dataset.
|
84 | |
To get all of the variables (the default), an empty string may be given.
|
85 | |
It is a fatal error if any one of the variables is not
|
86 | |
in the \acronym{SAS} dataset. You can use \code{sas.contents} to get
|
87 | |
the variables in the \acronym{SAS} dataset.
|
88 | |
If you have retrieved a subset of the variables
|
89 | |
in the \acronym{SAS} dataset and wish to retrieve the same list of variables
|
90 | |
from another dataset, you can program the value of \code{variables} - see
|
91 | |
one of the last examples.
|
92 | |
}
|
93 | |
\item{ifs}{
|
94 | |
a vector of character strings, each containing one \acronym{SAS} \dQuote{subsetting if}
|
95 | |
statement.
|
96 | |
These will be used to extract a subset of the observations in the \acronym{SAS} dataset.
|
97 | |
}
|
98 | |
\item{format.library}{
|
99 | |
The UNIX directory containing the file \bold{formats.sct}, which contains
|
100 | |
the definitions of the user defined formats used in this dataset.
|
101 | |
By default, we look for the formats in the same directory as the data.
|
102 | |
The user defined formats must be available (so \acronym{SAS} can read the data).
|
103 | |
}
|
104 | |
\item{formats}{
|
105 | |
Set \code{formats} to \code{FALSE} to keep \code{sas.get} from telling the \acronym{SAS} macro to
|
106 | |
retrieve value label formats from \code{format.library}. When you do not
|
107 | |
specify \code{formats} or \code{recode}, \code{sas.get} will set \code{formats} to \code{TRUE} if a
|
108 | |
\acronym{SAS} format catalog (\code{.sct} or \code{.sc2}) file exists in \code{format.library}.
|
109 | |
Value label formats, if present, are stored as the \code{formats} attribute of the returned
|
110 | |
object (see below). A format is used if it is referred to by one or more
|
111 | |
variables
|
112 | |
in the dataset, if it contains no ranges of values (i.e., it identifies
|
113 | |
value labels for single values), and if it is a character format
|
114 | |
or a numeric format that is not used just to label missing values.
|
115 | |
If you set \code{recode} to \code{TRUE}, 1, or 2, \code{formats} defaults to \code{TRUE}.
|
116 | |
To fetch the values and labels for variable \code{x} in the dataset \code{d} you
|
117 | |
could type:
|
118 | |
\cr
|
119 | |
\code{f <- attr(d$x, "format")}
|
120 | |
\cr
|
|
72 |
\item{member}{
|
|
73 |
character string giving the second part of the two-part \acronym{SAS} dataset name.
|
|
74 |
(The first part is irrelevant here - it is mapped to the UNIX directory name.)
|
|
75 |
}
|
|
76 |
\item{x}{
|
|
77 |
a variable that may have been created by \code{sas.get} with
|
|
78 |
\code{special.miss=T} or with \code{recode} in effect.
|
|
79 |
}
|
|
80 |
\item{variables}{
|
|
81 |
vector of character strings naming the variables in the \acronym{SAS} dataset.
|
|
82 |
The S dataset will contain only those variables from the
|
|
83 |
\acronym{SAS} dataset.
|
|
84 |
To get all of the variables (the default), an empty string may be given.
|
|
85 |
It is a fatal error if any one of the variables is not
|
|
86 |
in the \acronym{SAS} dataset. You can use \code{sas.contents} to get
|
|
87 |
the variables in the \acronym{SAS} dataset.
|
|
88 |
If you have retrieved a subset of the variables
|
|
89 |
in the \acronym{SAS} dataset and wish to retrieve the same list of variables
|
|
90 |
from another dataset, you can program the value of \code{variables} - see
|
|
91 |
one of the last examples.
|
|
92 |
}
|
|
93 |
\item{ifs}{
|
|
94 |
a vector of character strings, each containing one \acronym{SAS} \dQuote{subsetting if}
|
|
95 |
statement.
|
|
96 |
These will be used to extract a subset of the observations in the \acronym{SAS} dataset.
|
|
97 |
}
|
|
98 |
\item{format.library}{
|
|
99 |
The UNIX directory containing the file \file{formats.sct}, which contains
|
|
100 |
the definitions of the user defined formats used in this dataset.
|
|
101 |
By default, we look for the formats in the same directory as the data.
|
|
102 |
The user defined formats must be available (so \acronym{SAS} can read the data).
|
|
103 |
}
|
|
104 |
\item{formats}{
|
|
105 |
Set \code{formats} to \code{FALSE} to keep \code{sas.get} from telling the \acronym{SAS} macro to
|
|
106 |
retrieve value label formats from \code{format.library}. When you do not
|
|
107 |
specify \code{formats} or \code{recode}, \code{sas.get} will set \code{formats} to \code{TRUE} if a
|
|
108 |
\acronym{SAS} format catalog (\file{.sct} or \file{.sc2}) file exists in \code{format.library}.
|
|
109 |
Value label formats, if present, are stored as the \code{formats} attribute of the returned
|
|
110 |
object (see below). A format is used if it is referred to by one or more
|
|
111 |
variables
|
|
112 |
in the dataset, if it contains no ranges of values (i.e., it identifies
|
|
113 |
value labels for single values), and if it is a character format
|
|
114 |
or a numeric format that is not used just to label missing values.
|
|
115 |
If you set \code{recode} to \code{TRUE}, 1, or 2, \code{formats} defaults to \code{TRUE}.
|
|
116 |
To fetch the values and labels for variable \code{x} in the dataset \code{d} you
|
|
117 |
could type:
|
|
118 |
\cr
|
|
119 |
\code{f <- attr(d\$x, "format")}
|
|
120 |
\cr
|
121 | 121 |
\code{formats <- attr(d, "formats")}
|
122 | |
\cr
|
123 | |
\code{formats$f$values; formats$f$labels}
|
124 | |
}
|
125 | |
\item{recode}{
|
126 | |
This parameter defaults to \code{TRUE} if \code{formats} is \code{TRUE}. If it is
|
127 | |
\code{TRUE}, variables that have an appropriate format (see above) are
|
128 | |
recoded as \code{factor} objects, which map the values
|
129 | |
to the value labels for the format. Alternatively, set \code{recode} to
|
130 | |
1 to use labels of the form value:label, e.g. 1:good 2:better 3:best.
|
131 | |
Set \code{recode} to 2 to use labels such as good(1) better(2) best(3).
|
132 | |
Since \code{sas.codes} and \code{code.levels} add flexibility, the usual choice
|
133 | |
for \code{recode} is \code{TRUE}.
|
134 | |
}
|
135 | |
\item{special.miss}{
|
136 | |
For numeric variables, any missing values are stored as NA in S.
|
137 | |
You can recover special missing values by setting \code{special.miss} to
|
138 | |
\code{TRUE}. This will cause the \code{special.miss} attribute and the
|
139 | |
\code{special.miss} class to be added
|
140 | |
to each variable that has at least one special missing value.
|
141 | |
Suppose that variable \code{y} was .E in observation 3 and .G
|
142 | |
in observation 544. The \code{special.miss} attribute for \code{y} then has the
|
143 | |
value
|
144 | |
\cr
|
|
122 |
\cr
|
|
123 |
\code{formats\$f\$values; formats\$f\$labels}
|
|
124 |
}
|
|
125 |
\item{recode}{
|
|
126 |
This parameter defaults to \code{TRUE} if \code{formats} is \code{TRUE}. If it is
|
|
127 |
\code{TRUE}, variables that have an appropriate format (see above) are
|
|
128 |
recoded as \code{factor} objects, which map the values
|
|
129 |
to the value labels for the format. Alternatively, set \code{recode} to
|
|
130 |
1 to use labels of the form value:label, e.g. 1:good 2:better 3:best.
|
|
131 |
Set \code{recode} to 2 to use labels such as good(1) better(2) best(3).
|
|
132 |
Since \code{sas.codes} and \code{code.levels} add flexibility, the usual choice
|
|
133 |
for \code{recode} is \code{TRUE}.
|
|
134 |
}
|
|
135 |
\item{special.miss}{
|
|
136 |
For numeric variables, any missing values are stored as NA in S.
|
|
137 |
You can recover special missing values by setting \code{special.miss} to
|
|
138 |
\code{TRUE}. This will cause the \code{special.miss} attribute and the
|
|
139 |
\code{special.miss} class to be added
|
|
140 |
to each variable that has at least one special missing value.
|
|
141 |
Suppose that variable \code{y} was .E in observation 3 and .G
|
|
142 |
in observation 544. The \code{special.miss} attribute for \code{y} then has the
|
|
143 |
value
|
|
144 |
\cr
|
145 | 145 |
\code{list(codes=c("E","G"),obs=c(3,544))}
|
146 | |
\cr
|
147 | |
To fetch this information for variable \code{y} you would say for example
|
148 | |
\cr
|
|
146 |
\cr
|
|
147 |
To fetch this information for variable \code{y} you would say for example
|
|
148 |
\cr
|
149 | 149 |
\code{s <- attr(y, "special.miss")}
|
150 | |
\cr
|
151 | |
\code{s$codes; s$obs}
|
152 | |
\cr
|
153 | |
or use \code{is.special.miss(x)} or the \code{print.special.miss} method, which
|
154 | |
will replace \code{NA} values for the variable with \code{E} or \code{G} if they
|
155 | |
correspond to special missing values.
|
156 | |
The \code{describe}
|
157 | |
function uses this information in printing a data summary.
|
158 | |
}
|
159 | |
\item{id}{
|
160 | |
The name of the variable to be used as the row names of the S dataset.
|
161 | |
The id variable becomes the \code{row.names} attribute of a data frame, but
|
162 | |
the id variable is still retained as a variable in the data frame.
|
163 | |
(If \code{data.frame.out} is \code{FALSE}, this will be the attribute \code{"id"} of the S
|
164 | |
dataset.) You can also specify a vector of variable names as the \code{id}
|
165 | |
parameter. After fetching the data from \acronym{SAS}, all these variables will be
|
166 | |
converted to character format and concatenated (with a space as a separator)
|
167 | |
to form a (hopefully) unique ID variable.
|
168 | |
}
|
169 | |
\item{dates.}{specifies the format for storing \acronym{SAS} dates in the
|
170 | |
resulting data frame}
|
171 | |
\item{as.is}{
|
172 | |
If \code{data.frame.out = TRUE}, \acronym{SAS} character variables are converted to S factor
|
173 | |
objects if \code{as.is = FALSE} or if \code{as.is} is a number between 0 and 1 inclusive and
|
174 | |
the number of unique values of the variable is less than
|
175 | |
the number of observations (\code{n}) times \code{as.is}. The default for \code{as.is} is .5,
|
176 | |
so character variables are converted to factors only if they have fewer
|
177 | |
than \code{n/2} unique values. The primary purpose of this is to keep unique
|
178 | |
identification variables as character values in the data frame instead
|
179 | |
of using more space to store both the integer factor codes and the
|
180 | |
factor labels.
|
181 | |
}
|
182 | |
\item{check.unique.id}{
|
183 | |
If \code{id} is specified, the row names are checked for
|
184 | |
uniqueness if \code{check.unique.id = TRUE}. If any are duplicated, a warning
|
185 | |
is printed. Note that if a data frame is being created with duplicate
|
186 | |
row names, statements such as \code{my.data.frame["B23",]} will retrieve
|
187 | |
only the first row with a row name of \code{"B23"}.
|
188 | |
}
|
189 | |
\item{force.single}{
|
190 | |
By default, \acronym{SAS} numeric variables having \code{LENGTH} > 4 are stored as
|
191 | |
S double precision numerics, which allow for the same precision as
|
192 | |
a \acronym{SAS} \code{LENGTH} 8 variable. Set \code{force.single = TRUE} to store every
|
193 | |
numeric variable in single precision (7 digits of precision).
|
194 | |
This option is useful when the creator of the \acronym{SAS} dataset has
|
195 | |
failed to use a \code{LENGTH} statement.
|
196 | |
R does not have single precision, so no attempt is made to convert to
|
197 | |
single if running R.
|
198 | |
}
|
199 | |
\item{dates}{
|
200 | |
One of the character strings \code{"sas"}, \code{"yearfrac"}, \code{"yearfrac2"}, \code{"yymmdd"}.
|
201 | |
If a \acronym{SAS} variable has a date format (one of \code{"DATE", "MMDDYY", "YYMMDD",
|
202 | |
"DDMMYY", "YYQ", "MONYY", "JULIAN"}), it will be converted to the format
|
203 | |
specified by \code{dates} before being given to S. \code{"sas"} gives
|
204 | |
days from 1/1/1960 (from 1/1/1970 if using \code{chron}),
|
205 | |
\code{"yearfrac"} gives days from 1/1/1900 divided by
|
206 | |
365.25, \code{"yearfrac2"} gives year plus fraction of current year,
|
207 | |
and \code{"yymmdd"} gives a 6 digit number YYMMDD (year\%\%100, month, day).
|
208 | |
Note that S will store these as numbers, not as
|
209 | |
character strings. If dates="sas" and a variable has one of the \acronym{SAS}
|
210 | |
date formats listed above, the variable will be given a class of "date"
|
211 | |
to work with Terry Therneau's implementation of the "date" class in S.
|
212 | |
If the \code{chron} package or \code{timeDate} function is available, these are
|
213 | |
used instead.
|
214 | |
}
|
215 | |
\item{keep.log}{
|
216 | |
logical flag: if \code{FALSE}, delete the \acronym{SAS} log file upon completion.
|
217 | |
}
|
218 | |
\item{log.file}{
|
219 | |
the name of the \acronym{SAS} log file.
|
220 | |
}
|
221 | |
\item{macro}{
|
222 | |
the name of an S object in the current search path that contains the text of
|
223 | |
the \acronym{SAS} macro called by S. The S object is a character vector that
|
224 | |
can be edited using for example sas.get.macro <- editor(sas.get.macro).
|
225 | |
}
|
226 | |
\item{data.frame.out}{
|
227 | |
logical flag: if \code{TRUE}, the return value will be an S data frame,
|
228 | |
otherwise it will be a list.
|
229 | |
}
|
230 | |
\item{clean.up}{
|
231 | |
logical flag: if \code{TRUE}, remove all temporary files when finished. You
|
232 | |
may want to keep these while debugging the \acronym{SAS} macro. Not needed for \R.
|
233 | |
}
|
234 | |
\item{quiet}{
|
235 | |
logical flag: if \code{FALSE}, print the contents of the \acronym{SAS} log file if
|
236 | |
there has been an error.
|
237 | |
}
|
238 | |
\item{temp}{
|
239 | |
the prefix to use for the temporary files. Two characters
|
240 | |
will be added to this; the resulting name
|
241 | |
must fit on your file system.
|
242 | |
}
|
243 | |
\item{sasprog}{
|
244 | |
the name of the system command to invoke \acronym{SAS}
|
245 | |
}
|
246 | |
\item{uncompress}{
|
247 | |
set to \code{TRUE} to automatically invoke the UNIX \code{gunzip} command
|
248 | |
(if \code{member.ssd01.gz} exists) or the \code{uncompress} command
|
249 | |
(if \code{member.ssd01.Z} exists) to uncompress the \acronym{SAS} dataset before
|
250 | |
proceeding. This assumes you have the file permissions to allow
|
251 | |
uncompressing in place. If the file is already uncompressed, this
|
252 | |
option is ignored.
|
253 | |
}
|
254 | |
\item{where}{
|
255 | |
by default, a list or data frame which contains all the variables is returned.
|
256 | |
If you specify \code{where}, each individual variable is placed into a
|
257 | |
separate object (whose name is the name of the variable) using the
|
258 | |
\code{assign} function with the \code{where} argument. For example, you can
|
259 | |
put each variable in its own file in a directory, which in some cases
|
260 | |
may save memory over attaching a data frame.
|
261 | |
}
|
262 | |
\item{code}{
|
263 | |
a special missing value code (A through Z or underscore) to check
|
264 | |
against. If \code{code} is omitted, \code{is.special.miss} will return
|
265 | |
a \code{TRUE} for each observation that has any special missing value.
|
266 | |
}
|
267 | |
\item{object}{a variable in a data frame created by \code{sas.get}}
|
268 | |
\item{\dots}{ignored}
|
|
150 |
\cr
|
|
151 |
\code{s\$codes; s\$obs}
|
|
152 |
\cr
|
|
153 |
or use \code{is.special.miss(x)} or the \code{print.special.miss} method, which
|
|
154 |
will replace \code{NA} values for the variable with \samp{E} or \samp{G} if they
|
|
155 |
correspond to special missing values.
|
|
156 |
The \code{describe}
|
|
157 |
function uses this information in printing a data summary.
|
|
158 |
}
|
|
159 |
\item{id}{
|
|
160 |
The name of the variable to be used as the row names of the S dataset.
|
|
161 |
The id variable becomes the \code{row.names} attribute of a data frame, but
|
|
162 |
the id variable is still retained as a variable in the data frame.
|
|
163 |
(If \code{data.frame.out} is \code{FALSE}, this will be the attribute \samp{id} of the \R
|
|
164 |
dataset.) You can also specify a vector of variable names as the \code{id}
|
|
165 |
parameter. After fetching the data from \acronym{SAS}, all these variables will be
|
|
166 |
converted to character format and concatenated (with a space as a separator)
|
|
167 |
to form a (hopefully) unique identification variable.
|
|
168 |
}
|
|
169 |
\item{dates.}{
|
|
170 |
specifies the format for storing \acronym{SAS} dates in the
|
|
171 |
resulting data frame
|
|
172 |
}
|
|
173 |
\item{as.is}{
|
|
174 |
If \code{data.frame.out = TRUE}, \acronym{SAS} character variables are converted to S factor
|
|
175 |
objects if \code{as.is = FALSE} or if \code{as.is} is a number between 0 and 1 inclusive and
|
|
176 |
the number of unique values of the variable is less than
|
|
177 |
the number of observations (\code{n}) times \code{as.is}. The default for \code{as.is} is 0.5,
|
|
178 |
so character variables are converted to factors only if they have fewer
|
|
179 |
than \code{n/2} unique values. The primary purpose of this is to keep unique
|
|
180 |
identification variables as character values in the data frame instead
|
|
181 |
of using more space to store both the integer factor codes and the
|
|
182 |
factor labels.
|
|
183 |
}
|
|
184 |
\item{check.unique.id}{
|
|
185 |
If \code{id} is specified, the row names are checked for
|
|
186 |
uniqueness if \code{check.unique.id = TRUE}. If any are duplicated, a warning
|
|
187 |
is printed. Note that if a data frame is being created with duplicate
|
|
188 |
row names, statements such as \code{my.data.frame["B23",]} will retrieve
|
|
189 |
only the first row with a row name of \preformatted{B23}.
|
|
190 |
}
|
|
191 |
\item{force.single}{
|
|
192 |
By default, \acronym{SAS} numeric variables having \eqn{LENGTH > 4} are stored as
|
|
193 |
S double precision numerics, which allow for the same precision as
|
|
194 |
a \acronym{SAS} \preformatted{LENGTH} 8 variable. Set \code{force.single = TRUE} to store every
|
|
195 |
numeric variable in single precision (7 digits of precision).
|
|
196 |
This option is useful when the creator of the \acronym{SAS} dataset has
|
|
197 |
failed to use a \preformatted{LENGTH} statement.
|
|
198 |
R does not have single precision, so no attempt is made to convert to
|
|
199 |
single if running R.
|
|
200 |
}
|
|
201 |
\item{dates}{
|
|
202 |
One of the character strings \code{"sas"}, \code{"yearfrac"}, \code{"yearfrac2"}, \code{"yymmdd"}.
|
|
203 |
If a \acronym{SAS} variable has a date format (one of \code{"DATE"}, \code{"MMDDYY"}, \code{"YYMMDD"},
|
|
204 |
\code{"DDMMYY"}, \code{"YYQ"}, \code{"MONYY"}, \code{"JULIAN"}), it will be converted to the format
|
|
205 |
specified by \code{dates} before being given to S. \code{"sas"} gives
|
|
206 |
days from 1/1/1960 (from 1/1/1970 if using \code{chron}),
|
|
207 |
\code{"yearfrac"} gives days from 1/1/1900 divided by
|
|
208 |
365.25, \code{"yearfrac2"} gives year plus fraction of current year,
|
|
209 |
and \code{"yymmdd"} gives a 6 digit number \preformatted{YYMMDD} (year\%\%100, month, day).
|
|
210 |
Note that \R will store these as numbers, not as
|
|
211 |
character strings. If \code{dates="sas"} and a variable has one of the \acronym{SAS}
|
|
212 |
date formats listed above, the variable will be given a class of \samp{date}
|
|
213 |
to work with Terry Therneau's implementation of the \samp{date} class in S.
|
|
214 |
If the \code{chron} package or \code{timeDate} function is available, these are
|
|
215 |
used instead.
|
|
216 |
}
|
|
217 |
\item{keep.log}{
|
|
218 |
logical flag: if \code{FALSE}, delete the \acronym{SAS} log file upon completion.
|
|
219 |
}
|
|
220 |
\item{log.file}{
|
|
221 |
the name of the \acronym{SAS} log file.
|
|
222 |
}
|
|
223 |
\item{macro}{
|
|
224 |
the name of an S object in the current search path that contains the text of
|
|
225 |
the \acronym{SAS} macro called by \R. The \R object is a character vector that
|
|
226 |
can be edited using for example \code{sas.get.macro <- editor(sas.get.macro)}.
|
|
227 |
}
|
|
228 |
\item{data.frame.out}{
|
|
229 |
logical flag: if \code{TRUE}, the return value will be an S data frame,
|
|
230 |
otherwise it will be a list.
|
|
231 |
}
|
|
232 |
\item{clean.up}{
|
|
233 |
logical flag: if \code{TRUE}, remove all temporary files when finished. You
|
|
234 |
may want to keep these while debugging the \acronym{SAS} macro. Not needed for \R.
|
|
235 |
}
|
|
236 |
\item{quiet}{
|
|
237 |
logical flag: if \code{FALSE}, print the contents of the \acronym{SAS} log file if
|
|
238 |
there has been an error.
|
|
239 |
}
|
|
240 |
\item{temp}{
|
|
241 |
the prefix to use for the temporary files. Two characters
|
|
242 |
will be added to this; the resulting name
|
|
243 |
must fit on your file system.
|
|
244 |
}
|
|
245 |
\item{sasprog}{
|
|
246 |
the name of the system command to invoke \acronym{SAS}
|
|
247 |
}
|
|
248 |
\item{uncompress}{
|
|
249 |
set to \code{TRUE} to automatically invoke the \acronym{UNIX} \command{gunzip} command
|
|
250 |
(if \file{\var{member}.ssd01.gz} exists) or the \command{uncompress} command
|
|
251 |
(if \file{\var{member}.ssd01.Z} exists) to uncompress the \acronym{SAS} dataset before
|
|
252 |
proceeding. This assumes you have the file permissions to allow
|
|
253 |
uncompressing in place. If the file is already uncompressed, this
|
|
254 |
option is ignored.
|
|
255 |
}
|
|
256 |
\item{where}{
|
|
257 |
by default, a list or data frame which contains all the variables is returned.
|
|
258 |
If you specify \code{where}, each individual variable is placed into a
|
|
259 |
separate object (whose name is the name of the variable) using the
|
|
260 |
\code{assign} function with the \code{where} argument. For example, you can
|
|
261 |
put each variable in its own file in a directory, which in some cases
|
|
262 |
may save memory over attaching a data frame.
|
|
263 |
}
|
|
264 |
\item{code}{
|
|
265 |
a special missing value code (\samp{A} through \samp{Z} or \samp{\_}) to check
|
|
266 |
against. If \code{code} is omitted, \code{is.special.miss} will return
|
|
267 |
a \code{TRUE} for each observation that has any special missing value.
|
|
268 |
}
|
|
269 |
\item{object}{
|
|
270 |
a variable in a data frame created by \code{sas.get}
|
|
271 |
}
|
|
272 |
\item{\dots}{ignored}
|
269 | 273 |
}
|
270 | 274 |
\value{
|
271 | |
If \code{data.frame.out} is \code{TRUE}, the output will
|
272 | |
be a data frame resembling the \acronym{SAS} dataset. If \code{id}
|
273 | |
was specified, that column of the data frame will be used
|
274 | |
as the row names of the data frame. Each variable in the data frame
|
275 | |
or vector in the list will have the attributes \code{label} and \code{format}
|
276 | |
containing \acronym{SAS} labels and formats. Underscores in formats are
|
277 | |
converted to periods. Formats for character variables have \code{$} placed
|
278 | |
in front of their names.
|
279 | |
If \code{formats} is \code{TRUE} and there are any
|
280 | |
appropriate format definitions in \code{format.library}, the returned
|
281 | |
object will have attribute \code{formats} containing lists named the
|
282 | |
same as the format names (with periods substituted for underscores and
|
283 | |
character formats prefixed by \code{$}).
|
284 | |
Each of these lists has a vector called \code{values} and one called
|
285 | |
\code{labels} with the \code{PROC FORMAT; VALUE \dots} definitions.
|
286 | |
|
287 | |
|
288 | |
If \code{data.frame.out} is \code{FALSE}, the output will
|
289 | |
be a list of vectors, each containing a variable from the \acronym{SAS}
|
290 | |
dataset. If \code{id} was specified, that element of the list will
|
291 | |
be used as the \code{id} attribute of the entire list.
|
|
275 |
If \code{data.frame.out} is \code{TRUE}, the output will
|
|
276 |
be a data frame resembling the \acronym{SAS} dataset. If \code{id}
|
|
277 |
was specified, that column of the data frame will be used
|
|
278 |
as the row names of the data frame. Each variable in the data frame
|
|
279 |
or vector in the list will have the attributes \code{label} and \code{format}
|
|
280 |
containing \acronym{SAS} labels and formats. Underscores in formats are
|
|
281 |
converted to periods. Formats for character variables have \code{\$} placed
|
|
282 |
in front of their names.
|
|
283 |
If \code{formats} is \code{TRUE} and there are any
|
|
284 |
appropriate format definitions in \code{format.library}, the returned
|
|
285 |
object will have attribute \code{formats} containing lists named the
|
|
286 |
same as the format names (with periods substituted for underscores and
|
|
287 |
character formats prefixed by \code{\$}).
|
|
288 |
Each of these lists has a vector called \code{values} and one called
|
|
289 |
\code{labels} with the \preformatted{PROC FORMAT; VALUE ...} definitions.
|
|
290 |
|
|
291 |
|
|
292 |
If \code{data.frame.out} is \code{FALSE}, the output will
|
|
293 |
be a list of vectors, each containing a variable from the \acronym{SAS}
|
|
294 |
dataset. If \code{id} was specified, that element of the list will
|
|
295 |
be used as the \code{id} attribute of the entire list.
|
292 | 296 |
}
|
293 | 297 |
\section{Side Effects}{
|
294 | |
If a \acronym{SAS} error occurs and \code{quiet} is \code{FALSE}, then the \acronym{SAS} log file will be
|
295 | |
printed under the control of the \bold{less} pager.
|
|
298 |
If a \acronym{SAS} error occurs and \code{quiet} is \code{FALSE}, then the \acronym{SAS} log file will be
|
|
299 |
printed under the control of the \command{less} pager.
|
296 | 300 |
}
|
297 | 301 |
\details{
|
298 | |
If you specify \code{special.miss = TRUE} and there are no special missing
|
299 | |
values in the \acronym{SAS} dataset, the \acronym{SAS} step will fail.
|
300 | |
|
301 | |
For variables having a \code{PROC FORMAT VALUE}
|
302 | |
format with some of the levels undefined, \code{sas.get} will interpret those
|
303 | |
values as \code{NA} if you are using \code{recode}.
|
304 | |
|
305 | |
The \acronym{SAS} macro \code{sas_get} uses record lengths of up to 4096 in two
|
306 | |
places. If you are exporting records that are very long (because of
|
307 | |
a large number of variables and/or long character variables), you
|
308 | |
may want to edit these \code{LRECL}s to quadruple them, for example.
|
|
302 |
If you specify \code{special.miss = TRUE} and there are no special missing
|
|
303 |
values in the \acronym{SAS} dataset, the \acronym{SAS} step will fail.
|
|
304 |
|
|
305 |
For variables having a \preformatted{PROC FORMAT VALUE}
|
|
306 |
format with some of the levels undefined, \code{sas.get} will interpret those
|
|
307 |
values as \code{NA} if you are using \code{recode}.
|
|
308 |
|
|
309 |
The \acronym{SAS} macro \file{sas\_get} uses record lengths of up to 4096 in two
|
|
310 |
places. If you are exporting records that are very long (because of
|
|
311 |
a large number of variables and/or long character variables), you
|
|
312 |
may want to edit these \preformatted{LRECL}s to quadruple them, for example.
|
309 | 313 |
}
|
310 | 314 |
\note{
|
311 | |
You must be able to run \acronym{SAS} (by typing \bold{sas}) on your system.
|
312 | |
If the S command \code{!sas} does not start \acronym{SAS}, then this function cannot work.
|
313 | |
|
314 | |
If you are reading time or
|
315 | |
date-time variables, you will need to execute the command \code{library(chron)}
|
316 | |
to print those variables or the data frame if the \code{timeDate} function
|
317 | |
is not available.
|
|
315 |
You must be able to run \acronym{SAS} (by typing \command{sas}) on your system.
|
|
316 |
If the S command \code{!sas} does not start \acronym{SAS}, then this function cannot work.
|
|
317 |
|
|
318 |
If you are reading time or
|
|
319 |
date-time variables, you will need to execute the command \code{library(chron)}
|
|
320 |
to print those variables or the data frame if the \code{timeDate} function
|
|
321 |
is not available.
|
318 | 322 |
}
|
319 | 323 |
\section{Background}{
|
320 | |
The references cited below explain the structure of \acronym{SAS} datasets and how
|
321 | |
they are stored under UNIX.
|
322 | |
See \emph{\acronym{SAS} Language}
|
323 | |
for a discussion of the \dQuote{subsetting if} statement.
|
|
324 |
The references cited below explain the structure of \acronym{SAS} datasets and how
|
|
325 |
they are stored under \acronym{UNIX}.
|
|
326 |
See \emph{\acronym{SAS} Language}
|
|
327 |
for a discussion of the \dQuote{subsetting if} statement.
|
324 | 328 |
}
|
325 | 329 |
\author{
|
326 | |
Terry Therneau, Mayo Clinic
|
327 | |
\cr
|
328 | |
Frank Harrell, Vanderbilt University
|
329 | |
\cr
|
330 | |
Bill Dunlap, University of Washington and Insightful Corporation
|
331 | |
\cr
|
332 | |
Michael W. Kattan, Cleveland Clinic Foundation
|
|
330 |
Terry Therneau, Mayo Clinic
|
|
331 |
\cr
|
|
332 |
Frank Harrell, Vanderbilt University
|
|
333 |
\cr
|
|
334 |
Bill Dunlap, University of Washington and Insightful Corporation
|
|
335 |
\cr
|
|
336 |
Michael W. Kattan, Cleveland Clinic Foundation
|
333 | 337 |
}
|
334 | 338 |
\references{
|
335 | |
\acronym{SAS} Institute Inc. (1990).
|
336 | |
\emph{\acronym{SAS} Language: Reference, Version 6.}
|
337 | |
First Edition.
|
338 | |
\acronym{SAS} Institute Inc., Cary, North Carolina.
|
339 | |
|
340 | |
|
341 | |
\acronym{SAS} Institute Inc. (1988).
|
342 | |
\acronym{SAS} Technical Report P-176,
|
343 | |
\emph{Using the \acronym{SAS} System, Release 6.03, under UNIX Operating Systems and Derivatives.}
|
344 | |
\acronym{SAS} Institute Inc., Cary, North Carolina.
|
345 | |
|
346 | |
|
347 | |
\acronym{SAS} Institute Inc. (1985).
|
348 | |
\emph{\acronym{SAS} Introductory Guide.}
|
349 | |
Third Edition.
|
350 | |
\acronym{SAS} Institute Inc., Cary, North Carolina.
|
|
339 |
\acronym{SAS} Institute Inc. (1990).
|
|
340 |
\emph{\acronym{SAS} Language: Reference, Version 6.}
|
|
341 |
First Edition.
|
|
342 |
\acronym{SAS} Institute Inc., Cary, North Carolina.
|
|
343 |
|
|
344 |
|
|
345 |
\acronym{SAS} Institute Inc. (1988).
|
|
346 |
\acronym{SAS} Technical Report P-176,
|
|
347 |
\emph{Using the \acronym{SAS} System, Release 6.03, under UNIX Operating Systems and Derivatives.}
|
|
348 |
\acronym{SAS} Institute Inc., Cary, North Carolina.
|
|
349 |
|
|
350 |
|
|
351 |
\acronym{SAS} Institute Inc. (1985).
|
|
352 |
\emph{\acronym{SAS} Introductory Guide.}
|
|
353 |
Third Edition.
|
|
354 |
\acronym{SAS} Institute Inc., Cary, North Carolina.
|
351 | 355 |
}
|
352 | 356 |
\seealso{
|
353 | |
\code{\link{data.frame}}, \code{\link[Hmisc]{describe}},
|
354 | |
\code{\link[Hmisc]{label}},
|
355 | |
\code{\link[Hmisc]{upData}},
|
356 | |
\code{\link[Hmisc]{cleanup.import}}
|
|
357 |
\code{\link{data.frame}}, \code{\link[Hmisc]{describe}},
|
|
358 |
\code{\link[Hmisc]{label}},
|
|
359 |
\code{\link[Hmisc]{upData}},
|
|
360 |
\code{\link[Hmisc]{cleanup.import}}
|
357 | 361 |
}
|
358 | 362 |
\examples{
|
359 | 363 |
\dontrun{
|