@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node Language
@chapter The @pspp{} language
@cindex language, @pspp{}
@cindex @pspp{}, language

This chapter discusses elements common to many @pspp{} commands.
Later chapters will describe individual commands in detail.

@menu
* Tokens::                      Characters combine to form tokens.
* Commands::                    Tokens combine to form commands.
* Syntax Variants::             Batch vs. Interactive mode
* Types of Commands::           Commands come in several flavors.
* Order of Commands::           Commands combine to form syntax files.
* Missing Observations::        Handling missing observations.
* Datasets::                    Data organization.
* Files::                       Files used by @pspp{}.
* File Handles::                How files are named.
* BNF::                         How command syntax is described.
@end menu


@node Tokens
@section Tokens
@cindex language, lexical analysis
@cindex language, tokens
@cindex tokens
@cindex lexical analysis

@pspp{} divides most syntax file lines into a series of short chunks
called @dfn{tokens}.
Tokens are then grouped to form commands, each of which tells
@pspp{} to take some action---read in data, write out data, perform
a statistical procedure, etc.  Each type of token is
described below.

@table @strong
@cindex identifiers
@item Identifiers
Identifiers are names that typically specify variables, commands, or
subcommands.  The first character in an identifier must be a letter,
@samp{#}, or @samp{@@}.  The remaining characters in the identifier
must be letters, digits, or one of the following special characters:

@example
@center @.  _  $  #  @@
@end example

@cindex case-sensitivity
Identifiers may be any length, but only the first 64 bytes are
significant.  Identifiers are not case-sensitive: @code{foobar},
@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
different representations of the same identifier.

@cindex identifiers, reserved
@cindex reserved identifiers
Some identifiers are reserved.  Reserved identifiers may not be used
in any context besides those explicitly described in this manual.  The
reserved identifiers are:

@example
@center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
@end example
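
The identifier rules above can be modeled in a few lines.  The Python
sketch below is illustrative only and is not how @pspp{} implements its
lexer.  The function names are hypothetical.  Treating ``letter'' as
anything Python's @code{str.isalpha} accepts, and measuring the 64
significant bytes in UTF-8, are assumptions.

@example
# Hypothetical model of the identifier rules; not PSPP's own code.
RESERVED = frozenset(("ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
                      "NE", "NOT", "OR", "TO", "WITH"))

def is_identifier(text):
    # First character: a letter, '#', or '@@'.
    if not text or not (text[0].isalpha() or text[0] in "#@@"):
        return False
    # Remaining characters: letters, digits, or one of . _ $ # @@
    return all(c.isalnum() or c in "._$#@@" for c in text[1:])

def canonical(text):
    # Only the first 64 bytes are significant (UTF-8 assumed here),
    # and case does not matter.
    head = text.encode("utf-8")[:64].decode("utf-8", "ignore")
    return head.upper()

def is_reserved(text):
    return canonical(text) in RESERVED
@end example

With these definitions, @code{canonical("FooBar")} and
@code{canonical("FOOBAR")} are equal, and @code{is_reserved("with")} is
true.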

@item Keywords
Keywords are a subclass of identifiers that form a fixed part of
command syntax.  For example, command and subcommand names are
keywords.  Keywords may be abbreviated to their first 3 characters if
this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
characters are also accepted: @samp{FRE}, @samp{FREQ}, and
@samp{FREQUENCIES} are equivalent when the last is a keyword.)

Reserved identifiers are always used as keywords.  Other identifiers
may be used both as keywords and as user-defined identifiers, such as
variable names.
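
As a rough illustration of the abbreviation rule, the hypothetical
Python functions below accept a keyword that is spelled in full or
abbreviated to three or more characters.  They reject an abbreviation
that matches more than one candidate keyword.  Preferring an exact
spelling over a longer keyword that it also abbreviates is an
assumption of this sketch.

@example
def abbreviates(token, keyword):
    # A keyword matches if spelled in full, or if TOKEN is a prefix
    # of it at least 3 characters long.  Case is ignored.
    t, k = token.upper(), keyword.upper()
    return t == k or (len(t) >= 3 and k.startswith(t))

def resolve_keyword(token, keywords):
    # Return the single keyword TOKEN names, or None if nothing
    # matches or the abbreviation is ambiguous.
    hits = [k for k in keywords if abbreviates(token, k)]
    exact = [k for k in hits if k.upper() == token.upper()]
    if exact:  # assumption: an exact spelling wins
        return exact[0]
    return hits[0] if len(hits) == 1 else None
@end example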

@item Numbers
@cindex numbers
@cindex integers
@cindex reals
Numbers are expressed in decimal.  A decimal point is optional.
Numbers may be expressed in scientific notation by adding @samp{e} and
a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
are some more examples of valid numbers:

@example
-5  3.14159265359  1e100  -.707  8945.
@end example

Negative numbers are expressed with a @samp{-} prefix.  However, in
situations where a literal @samp{-} token is expected, what appears to
be a negative number is treated as @samp{-} followed by a positive
number.

No white space is allowed within a number token, except for horizontal
white space between @samp{-} and the rest of the number.

The last example above, @samp{8945.}, is interpreted as two tokens,
@samp{8945} and @samp{.}, if it is the last token on a line.
@xref{Commands, , Forming commands of tokens}.
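
The shape of a number token can be expressed as a regular expression.
The Python sketch below is a hypothetical model.  It accepts an
optional @samp{-} and horizontal white space, digits with an optional
decimal point, and an optional exponent.  Accepting an uppercase
@samp{E} and a signed exponent are assumptions.  The sketch ignores the
end-of-line period rule.

@example
import re

# Optional '-', optional horizontal white space, digits with an
# optional decimal point, then an optional base-10 exponent.
NUMBER = re.compile(r"-?[ \t]*(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def number_value(token):
    if NUMBER.fullmatch(token) is None:
        raise ValueError("not a number token: %r" % token)
    # float() rejects the space after '-', so remove it first.
    return float(token.replace(" ", "").replace("\t", ""))
@end example

For example, @code{number_value("1.234e3")} returns @code{1234.0}, and
@code{number_value("- 5")} returns @code{-5.0}.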

@item Strings
@cindex strings
@cindex @samp{'}
@cindex @samp{"}
@cindex case-sensitivity
Strings are literal sequences of characters enclosed in pairs of
single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
character used for quoting in the string, double it, e.g.@:
@samp{'it''s an apostrophe'}.  White space and case of letters are
significant inside strings.

Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
'c'} is equivalent to @samp{'abc'}.  So that a long string may be
broken across lines, a line break may precede or follow, or both
precede and follow, the @samp{+}.  (However, an entirely blank line
preceding or following the @samp{+} is interpreted as ending the
current command.)
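
The quoting and concatenation rules can be sketched in Python.  The
function names below are hypothetical.  The sketch assumes the string
tokens have already been separated from the @samp{+} tokens, and it
does not detect a lone, undoubled quote inside a string.

@example
def string_value(token):
    # TOKEN is one quoted string such as 'it''s' or "say ""hi""".
    quote = token[:1]
    if quote not in ("'", '"') or len(token) < 2 or token[-1] != quote:
        raise ValueError("not a quoted string: %r" % token)
    # A doubled quote character stands for one literal quote.
    return token[1:-1].replace(quote * 2, quote)

def concatenate(segments):
    # The string segments around each '+' are simply joined.
    return "".join(string_value(s) for s in segments)
@end example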

Strings may also be expressed as hexadecimal character values by
prefixing the initial quote character with @samp{x} or @samp{X}.
Regardless of the encoding of the syntax file or the active dataset,
the hexadecimal digits in the string are interpreted as the bytes of
Unicode text in UTF-8 encoding.

Individual Unicode code points may also be expressed by specifying the
hexadecimal code point number in single or double quotes preceded by
@samp{u} or @samp{U}.  For example, Unicode code point U+1D11E, the
musical G clef character, could be expressed as @code{U'1D11E'}.
Invalid Unicode code points (above U+10FFFF or in between U+D800 and
U+DFFF) are not allowed.

When strings are concatenated with @samp{+}, each segment's prefix is
considered individually.  For example, @code{'The G clef symbol is:' +
u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
plain text string.
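
The prefixed forms can be decoded one segment at a time.  In the
Python sketch below, @code{decode_segment} is a hypothetical name.  The
sketch reads each pair of hex digits as one UTF-8 byte and rejects the
invalid code points listed above.  It then joins segments the way
@samp{+} concatenation does.

@example
def decode_segment(token):
    # An optional x/X (hex) or u/U (code point) prefix precedes the quote.
    prefix = "" if token[:1] in ("'", '"') else token[:1].upper()
    quote = token[len(prefix)]
    body = token[len(prefix) + 1:-1].replace(quote * 2, quote)
    if prefix == "X":
        # Hex digit pairs are bytes of UTF-8 text, whatever the
        # encoding of the syntax file or active dataset.
        return bytes.fromhex(body).decode("utf-8")
    if prefix == "U":
        code = int(body, 16)
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            raise ValueError("invalid Unicode code point: %X" % code)
        return chr(code)
    if prefix:
        raise ValueError("unknown string prefix: %r" % prefix)
    return body

def concatenate(segments):
    # Each segment's prefix is considered individually.
    return "".join(decode_segment(s) for s in segments)
@end example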

@item Punctuators and Operators
@cindex punctuators
@cindex operators
These tokens are the punctuators and operators:

@example
@center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
@end example

Most of these appear within the syntax of commands, but the period
(@samp{.}) punctuator is used only at the end of a command.  It is a
punctuator only as the last character on a line (except white space).
When it is the last non-space character on a line, a period is not
treated as part of another token, even if it would otherwise be part
of, e.g.@:, an identifier or a floating-point number.
@end table
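
The special treatment of a final period can be shown with a tiny
Python helper.  The name @code{split_terminator} is hypothetical.  The
helper only separates the period and does not tokenize the rest of the
line.

@example
def split_terminator(line):
    # Return (text, ends_command).  A period that is the last
    # non-space character on the line is a separate '.' token, even
    # when it would otherwise belong to a number or identifier.
    text = line.rstrip()
    if text.endswith("."):
        return text[:-1], True
    return text, False
@end example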

@node Commands
@section Forming commands of tokens

@cindex @pspp{}, command structure
@cindex language, command structure
@cindex commands, structure

Most @pspp{} commands share a common structure.  A command begins with a
command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
CASES}.  The command name may be abbreviated to its first word, and
each word in the command name may be abbreviated to its first three
or more characters, where these abbreviations are unambiguous.
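
The command-name abbreviation rules can be sketched in Python as
follows.  The function names are hypothetical.  The sketch checks one
candidate command at a time and does not resolve ambiguity among
several commands.

@example
def word_matches(typed, word):
    # A word matches if spelled in full or abbreviated to 3+ letters.
    typed, word = typed.upper(), word.upper()
    return typed == word or (len(typed) >= 3 and word.startswith(typed))

def names_command(typed_words, command_name):
    # TYPED_WORDS either abbreviates every word of COMMAND_NAME or,
    # abbreviating the name to its first word, only that first word.
    name_words = command_name.split()
    if len(typed_words) not in (1, len(name_words)):
        return False
    return all(word_matches(t, w) for t, w in zip(typed_words, name_words))
@end example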

The command name may be followed by one or more @dfn{subcommands}.
Each subcommand begins with a subcommand name, which may be
abbreviated to its first three letters.  Some subcommands accept a
series of one or more specifications, which follow the subcommand
name, optionally separated from it by an equals sign
(@samp{=}). Specifications may be separated from each other
by commas or spaces.  Each subcommand must be separated from the next (if any)
by a forward slash (@samp{/}).

There are multiple ways to mark the end of a command.  The most common
way is to end the last line of the command with a period (@samp{.}) as
described in the previous section (@pxref{Tokens}).  A blank line, or
one that consists only of white space or comments, also ends a command.

@node Syntax Variants
@section Syntax Variants

@cindex Batch syntax
@cindex Interactive syntax

There are three variants of command syntax, which vary only in how
they detect the end of one command and the start of the next.

In @dfn{interactive mode}, which is the default for syntax typed at a
command prompt, a period as the last non-blank character on a line
ends a command.  A blank line also ends a command.

In @dfn{batch mode}, an end-of-line period or a blank line also ends a
command.  Additionally, batch mode treats any line that has a non-blank
character in the leftmost column as beginning a new command.  Thus, in
batch mode the second and subsequent lines in a command must be
indented.

Regardless of the syntax mode, a plus sign, minus sign, or period in
the leftmost column of a line is ignored and causes that line to begin
a new command.  This is most useful in batch mode, in which the first
line of a new command could not otherwise be indented, but it is
accepted regardless of syntax mode.

The default mode for reading commands from a file is @dfn{auto mode}.
It is the same as batch mode, except that a line with a non-blank in
the leftmost column only starts a new command if that line begins with
the name of a @pspp{} command.  This correctly interprets most valid @pspp{}
syntax files regardless of the syntax mode for which they are
intended.
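
The Python sketch below models how the three modes group lines into
commands.  It is a hypothetical simplification, not @pspp{}'s
segmenter.  @code{command_words} stands in for the set of command-name
first words that auto mode consults.  The sketch does not handle
comments or abbreviated command names.

@example
def split_commands(lines, mode, command_words=()):
    # MODE is "interactive", "batch", or "auto".  COMMAND_WORDS holds
    # upper-case first words of command names, used only in auto mode.
    commands, current = [], []

    def finish():
        if current:
            commands.append(" ".join(current))
            current.clear()

    for line in lines:
        if not line.strip():
            finish()                    # a blank line ends a command
            continue
        if line[0] in "+-.":
            finish()                    # ignored; starts a new command
            line = line[1:]
        elif not line[0].isspace() and mode != "interactive":
            word = line.split()[0].rstrip(".").upper()
            if mode == "batch" or word in command_words:
                finish()                # leftmost column starts a command
        text = line.strip()
        if text.endswith("."):
            text = text[:-1].rstrip()
            if text:
                current.append(text)
            finish()                    # an end-of-line period ends it
        elif text:
            current.append(text)
    finish()
    return commands
@end example

With the lines @code{COMPUTE y = x} and @code{FREQUENCIES y}, the
interactive mode of this sketch produces one command.  The batch and
auto modes produce two.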

The @option{--interactive} (or @option{-i}) or @option{--batch} (or
@option{-b}) options set the syntax mode for files listed on the @pspp{}
command line.  @xref{Main Options}, for more details.

@node Types of Commands
@section Types of Commands

Commands in @pspp{} are divided roughly into six categories:

@table @strong
@item Utility commands
@cindex utility commands
Set or display various global options that affect @pspp{} operations.
May appear anywhere in a syntax file.  @xref{Utilities, , Utility
commands}.

@item File definition commands
@cindex file definition commands
Give instructions for reading data from text files or from special
binary ``system files''.  Most of these commands replace any previous
data or variables with new data or
variables.  At least one file definition command must appear before the first command in any of
the categories below.  @xref{Data Input and Output}.

@item Input program commands
@cindex input program commands
Though rarely used, these provide tools for reading data files
in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.

@item Transformations
@cindex transformations
Perform operations on data and write data to output files.  Transformations
are not carried out until a procedure is executed.  

@item Restricted transformations
@cindex restricted transformations
Transformations that cannot appear in certain contexts.  @xref{Order
of Commands}, for details.

@item Procedures
@cindex procedures
Analyze data, writing results of analyses to the listing file.  Cause
transformations specified earlier in the file to be performed.  In a
more general sense, a @dfn{procedure} is any command that causes the
active dataset (the data) to be read.
@end table

@node Order of Commands
@section Order of Commands
@cindex commands, ordering
@cindex order of commands

@pspp{} does not place many restrictions on ordering of commands.  The
main restriction is that variables must be defined before they are otherwise
referenced.  This section describes the details of command ordering,
but most users will have no need to refer to them.

@pspp{} possesses five internal states, called @dfn{initial}, @dfn{input-program},
@dfn{file-type}, @dfn{transformation}, and @dfn{procedure} states.  (Please note the
distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
@emph{commands} and the @dfn{input-program} and @dfn{file-type} @emph{states}.)

@pspp{} starts in the initial state.  Each successful completion
of a command may cause a state transition.  Each type of command has its
own rules for state transitions:

@table @strong
@item Utility commands
@itemize @bullet
@item
Valid in any state.
@item
Do not cause state transitions.  Exception: when @cmd{N OF CASES}
is executed in the procedure state, it causes a transition to the
transformation state.
@end itemize

@item @cmd{DATA LIST}
@itemize @bullet
@item
Valid in any state.
@item
When executed in the initial or procedure state, causes a transition to
the transformation state.  
@item
Clears the active dataset if executed in the procedure or transformation
state.
@end itemize

@item @cmd{INPUT PROGRAM}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the input-program state.
@item
Clears the active dataset.
@end itemize

@item @cmd{FILE TYPE}
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Causes a transition to the file-type state.
@item
Clears the active dataset.
@end itemize

@item Other file definition commands
@itemize @bullet
@item
Invalid in input-program and file-type states.
@item
Cause a transition to the transformation state.
@item
Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
and @cmd{UPDATE}.
@end itemize

@item Transformations
@itemize @bullet
@item
Invalid in initial and file-type states.
@item
Cause a transition to the transformation state.
@end itemize

@item Restricted transformations
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the transformation state.
@end itemize

@item Procedures
@itemize @bullet
@item
Invalid in initial, input-program, and file-type states.
@item
Cause a transition to the procedure state.
@end itemize
@end table
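
The rules above form a small state machine, sketched below in Python.
The command-kind labels are hypothetical.  The function transcribes
only the rules as listed, so commands that end an input program or a
file type, which the list does not describe, are not modeled.

@example
INITIAL, INPUT_PROGRAM, FILE_TYPE = "initial", "input-program", "file-type"
TRANSFORMATION, PROCEDURE = "transformation", "procedure"
KEEPS_DATASET = ("ADD FILES", "MATCH FILES", "UPDATE")

def next_state(state, kind, command=""):
    # Return (new_state, clears_active_dataset) for a successful
    # command of KIND; raise ValueError if KIND is invalid in STATE.
    def reject_in(*states):
        if state in states:
            raise ValueError("%s invalid in %s state" % (command or kind, state))

    if kind == "utility":
        if command == "N OF CASES" and state == PROCEDURE:
            return TRANSFORMATION, False
        return state, False
    if kind == "data list":
        new = TRANSFORMATION if state in (INITIAL, PROCEDURE) else state
        return new, state in (TRANSFORMATION, PROCEDURE)
    if kind in ("input program", "file type"):
        reject_in(INPUT_PROGRAM, FILE_TYPE)
        return (INPUT_PROGRAM if kind == "input program" else FILE_TYPE), True
    if kind == "file definition":
        reject_in(INPUT_PROGRAM, FILE_TYPE)
        return TRANSFORMATION, command not in KEEPS_DATASET
    if kind == "transformation":
        reject_in(INITIAL, FILE_TYPE)
        return TRANSFORMATION, False
    if kind in ("restricted", "procedure"):
        reject_in(INITIAL, INPUT_PROGRAM, FILE_TYPE)
        return (PROCEDURE if kind == "procedure" else TRANSFORMATION), False
    raise ValueError("unknown command kind: %r" % kind)
@end example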

@node Missing Observations
@section Handling missing observations
@cindex missing values
@cindex values, missing

@pspp{} includes special support for unknown numeric data values.
Missing observations are assigned a special value, called the
@dfn{system-missing value}.  This ``value'' actually indicates the
absence of a value; it means that the actual value is unknown.  Procedures
automatically exclude from analyses those observations or cases that
have missing values.  Details of missing value exclusion depend on the
procedure and can often be controlled by the user; refer to
descriptions of individual procedures for details.

The system-missing value exists only for numeric variables.  String
variables always have a defined value, even if it is only a string of
spaces.

Variables, whether numeric or string, can have designated
@dfn{user-missing values}.  Every user-missing value is an actual value
for that variable.  However, most of the time user-missing values are
treated in the same way as the system-missing value.
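
The distinction between the two kinds of missing value can be
illustrated in Python.  In the sketch below, Python's @code{None}
stands in for the system-missing value, which is an assumption.  The
function names are hypothetical.  The mean function excludes missing
observations in the way many procedures do by default.

@example
SYSMIS = None   # stand-in for the system-missing value (assumption)

def is_missing(value, user_missing=()):
    # System-missing marks the absence of a value; user-missing values
    # are real values that are usually treated the same way.
    return value is SYSMIS or value in user_missing

def mean_of_valid(values, user_missing=()):
    # Drop missing observations before computing the statistic.
    valid = [v for v in values if not is_missing(v, user_missing)]
    return sum(valid) / len(valid) if valid else SYSMIS
@end example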

For more information on missing values, see the following sections:
@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
documentation on individual procedures for information on how they
handle missing values.

@node Datasets
@section Datasets
@cindex dataset
@cindex variable
@cindex dictionary

@pspp{} works with data organized into @dfn{datasets}.  A dataset
consists of a set of @dfn{variables}, which taken together are said to
form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
has one value for each variable.

At any given time @pspp{} has exactly one distinguished dataset, called
the @dfn{active dataset}.  Most @pspp{} commands work only with the
active dataset.  In addition to the active dataset, @pspp{} also supports
any number of additional open datasets.  The @cmd{DATASET} commands
can choose a new active dataset from among those that are open, as
well as create and destroy datasets (@pxref{DATASET}).
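
The relationship between a dictionary, its cases, and the active
dataset can be pictured with two small Python classes.  The classes
are hypothetical and do not reflect @pspp{}'s internal data structures
or the syntax of the @cmd{DATASET} commands.

@example
class Dataset:
    # A dictionary (ordered variable names) plus cases, each holding
    # one value per variable.
    def __init__(self, variables):
        self.dictionary = list(variables)
        self.cases = []

    def add_case(self, *values):
        if len(values) != len(self.dictionary):
            raise ValueError("a case needs one value per variable")
        self.cases.append(tuple(values))

class Session:
    # Any number of open datasets, exactly one of which is active.
    def __init__(self, name, dataset):
        self.open = dict()
        self.open[name] = dataset
        self.active_name = name

    def active(self):
        return self.open[self.active_name]

    def activate(self, name):
        if name not in self.open:
            raise KeyError(name)
        self.active_name = name
@end example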

The sections below describe variables in more detail.

@menu
* Attributes::                  Attributes of variables.
* System Variables::            Variables automatically defined by @pspp{}.
* Sets of Variables::           Lists of variable names.
* Input and Output Formats::    Input and output formats.
* Scratch Variables::           Variables deleted by procedures.
@end menu

@node Attributes
@subsection Attributes of Variables
@cindex variables, attributes of
@cindex attributes of variables
Each variable has a number of attributes, including:

@table @strong
@item Name
An identifier, up to 64 bytes long.  Each variable must have a different name.
@xref{Tokens}.

Some system variable names begin with @samp{$}, but user-defined
variables' names may not begin with @samp{$}.

@cindex @samp{.}
@cindex period
@cindex variable names, ending with period
The final character in a variable name should not be @samp{.}, because
such an identifier will be misinterpreted when it is the final token
on a line: @code{FOO.} will be divided into two separate tokens,
@samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.

@cindex @samp{_}
The final character in a variable name should not be @samp{_}, because
some such identifiers are used for special purposes by @pspp{}
procedures.

As with all @pspp{} identifiers, variable names are not case-sensitive.
@pspp{} capitalizes variable names on output the same way they were
capitalized at their point of definition in the input.
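
The naming advice above can be collected into one check.  The Python
function below is hypothetical.  It assumes @var{name} is already a
valid identifier and measures its length in UTF-8 bytes.  It reports
problems rather than enforcing them.

@example
def name_problems(name):
    # Problems with NAME as a user-defined variable name.
    problems = []
    if len(name.encode("utf-8")) > 64:
        problems.append("longer than 64 bytes")
    if name.startswith("$"):
        problems.append("only system variables begin with '$'")
    if name.endswith("."):
        problems.append("'FOO.' at line end splits into FOO and '.'")
    if name.endswith("_"):
        problems.append("final '_' may clash with names used by procedures")
    return problems
@end example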

@cindex variables, type
@cindex type of variables
@item Type
Numeric or string.

@cindex variables, width
@cindex width of variables
@item Width
(string variables only) String variables with a width of 8 characters or
fewer are called @dfn{short string variables}.  Short string variables
may be used in a few contexts where @dfn{long string variables} (those
with widths greater than 8) are not allowed.

@item Position
Variables in the dictionary are arranged in a specific order.
@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.

@item Initialization
Either reinitialized to 0 or spaces for each case, or left at its
existing value.  @xref{LEAVE}.

@cindex missing values
@cindex values, missing
@item Missing values
Optionally, up to three values, or a range of values, or a specific
value plus a range, can be specified as @dfn{user-missing values}.
There is also a @dfn{system-missing value} that is assigned to an
observation when there is no other obvious value for that observation.
Observations with missing values are automatically excluded from
analyses.  User-missing values are actual data values, while the
system-missing value is not a value at all.  @xref{Missing Observations}.
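
The allowed shapes of a user-missing specification can be sketched as
a Python class.  The class name is hypothetical.  The sketch is
numeric-only and treats a range as inclusive at both ends, which is an
assumption.

@example
class UserMissing:
    # Up to three discrete values, or a range, or one discrete value
    # plus a range.  A range is a (low, high) pair.
    def __init__(self, values=(), value_range=None):
        values = tuple(values)
        limit = 3 if value_range is None else 1
        if len(values) > limit:
            raise ValueError("too many discrete user-missing values")
        self.values, self.range = values, value_range

    def contains(self, x):
        if x in self.values:
            return True
        return self.range is not None and self.range[0] <= x <= self.range[1]
@end example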

@cindex variable labels
@cindex labels, variable
@item Variable label
A string that describes the variable.  @xref{VARIABLE LABELS}.

@cindex value labels
@cindex labels, value
@item Value label
Optionally, these associate each possible value of the variable with a
string.  @xref{VALUE LABELS}.

@cindex print format
@item Print format
Display width, format, and (for numeric variables) number of decimal
places.  This attribute does not affect how data are stored, just how
they are displayed.  Example: a width of 8, with 2 decimal places.
@xref{Input and Output Formats}.

@cindex write format
@item Write format
Similar to print format, but used by the @cmd{WRITE} command
(@pxref{WRITE}).

@cindex custom attributes
@item Custom attributes
User-defined associations between names and values.  @xref{VARIABLE
ATTRIBUTE}.

@cindex variable role
@item Role
The intended role of a variable for use in dialog boxes in graphical
user interfaces.  @xref{VARIABLE ROLE}.
@end table

@node System Variables
@subsection Variables Automatically Defined by @pspp{}
@cindex system variables
@cindex variables, system

There are seven system variables.  Unlike ordinary variables, system
variables are not always stored, and they can be used only in
expressions.  Their values and output formats cannot be modified.
They are described below.

@table @code
@cindex @code{$CASENUM}
@item $CASENUM
The number of the current case.  This changes as cases are
reordered.

@cindex @code{$DATE}
@item $DATE
Date the @pspp{} process was started, in format A9, following the
pattern @code{DD MMM YY}.

@cindex @code{$JDATE}
@item $JDATE
Number of days between 15 Oct 1582 and the time the @pspp{} process
was started.

@cindex @code{$LENGTH}
@item $LENGTH
Page length, in lines, in format F11.

@cindex @code{$SYSMIS}
@item $SYSMIS
System missing value, in format F1.

@cindex @code{$TIME}
@item $TIME
Number of seconds between midnight 14 Oct 1582 and the time the active dataset
was read, in format F20.

@cindex @code{$WIDTH}
@item $WIDTH
Page width, in characters, in format F3.
@end table
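
The two date origins above can be made concrete with Python's
@code{datetime} module, which uses the proleptic Gregorian calendar.
The function names are hypothetical.  The descriptions above are taken
literally; in particular, counting 15 Oct 1582 itself as day 0 of
@code{$JDATE} is an assumption.

@example
from datetime import date, datetime

JDATE_ORIGIN = date(1582, 10, 15)      # origin given for $JDATE
TIME_ORIGIN = datetime(1582, 10, 14)   # midnight, origin for $TIME

def jdate(start_day):
    # Days between 15 Oct 1582 and the day PSPP started.
    return (start_day - JDATE_ORIGIN).days

def time_value(moment):
    # Seconds between midnight 14 Oct 1582 and MOMENT.
    return (moment - TIME_ORIGIN).total_seconds()
@end example

With these definitions, midnight on 1 January 1970 corresponds to a
@code{$TIME}-style value of 12219379200 seconds.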

@node Sets of Variables
@subsection Lists of variable names
@cindex @code{TO} convention
@cindex convention, @code{TO}

To refer to a set of variables, list their names one after another.
Optionally, their names may be separated by commas.  To include a
range of variables from the dictionary in the list, write the name of
the first and last variable in the range, separated by @code{TO}.  For
instance, if the dictionary contains six variables with the names
@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
variables @code{X2}, @code{GOAL}, and @code{MET}.
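
A dictionary range can be expanded by position.  In the Python sketch
below, @code{expand_range} is a hypothetical name.  Raising an error
when the first name follows the last is an assumption about how such a
range is treated.

@example
def expand_range(dictionary, first, last):
    # Names from FIRST through LAST in dictionary order; case is
    # ignored when looking the names up.
    upper = [name.upper() for name in dictionary]
    i, j = upper.index(first.upper()), upper.index(last.upper())
    if i > j:
        raise ValueError("%s follows %s in the dictionary" % (first, last))
    return dictionary[i:j + 1]
@end example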

Commands that define variables, such as @cmd{DATA LIST}, give
@code{TO} an alternate meaning.  With these commands, @code{TO} defines
sequences of variables whose names end in consecutive integers.  The
syntax is two identifiers that begin with the same root and end with
numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
@code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.

After a set of variables has been defined with @cmd{DATA LIST} or
another command with this method, the same set can be referenced on
later commands using the same syntax.
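
The variable-defining meaning of @code{TO} can be sketched in Python.
The function name is hypothetical, and it keeps the first name's
zero-padding width.  The manual does not state the exact rule that
makes @code{QUES001 TO QUES9} invalid.  This sketch assumes one
possible rule: it rejects a last name whose numeric suffix has fewer
digits than the first.  @pspp{}'s actual rule may differ.

@example
import re

SUFFIX = re.compile(r"(.*?)(\d+)")   # shortest root, then the digits

def expand_to(first, last):
    # Names defined by FIRST TO LAST on a command such as DATA LIST.
    m1, m2 = SUFFIX.fullmatch(first), SUFFIX.fullmatch(last)
    if not (m1 and m2) or m1.group(1).upper() != m2.group(1).upper():
        raise ValueError("names must share a root and end in a number")
    root, digits1, digits2 = m1.group(1), m1.group(2), m2.group(2)
    low, high = int(digits1), int(digits2)
    # Assumption: a last name with fewer suffix digits than the first
    # is rejected, reproducing the QUES001 TO QUES9 example.
    if low > high or len(digits2) < len(digits1):
        raise ValueError("bad bounds in use of TO convention")
    # New names keep the first name's zero-padding width.
    return [root + str(n).zfill(len(digits1)) for n in range(low, high + 1)]
@end example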

@node Input and Output Formats
@subsection Input and Output Formats

@cindex formats
An @dfn{input format} describes how to interpret the contents of an
input field as a number or a string.  It might specify that the field
contains an ordinary decimal number, a time or date, a number in binary
or hexadecimal notation, or one of several other notations.  Input
formats are used by commands such as @cmd{DATA LIST} that read data or
syntax files into the @pspp{} active dataset.

Every input format corresponds to a default @dfn{output format} that
specifies the formatting used when the value is output later.  It is
always possible to explicitly specify an output format that resembles
the input format.  Usually, this is the default, but in cases where the
input format is unfriendly to human readability, such as binary or
hexadecimal formats, the default output format is an easier-to-read
decimal format.

Every variable has two output formats, called its @dfn{print format} and
@dfn{write format}.  Print formats are used in most output contexts;
write formats are used only by @cmd{WRITE} (@pxref{WRITE}).  Newly
created variables have identical print and write formats, and
@cmd{FORMATS}, the most commonly used command for changing formats
(@pxref{FORMATS}), sets both of them to the same value as well.  Thus,
most of the time, the distinction between print and write formats is
unimportant.

Input and output formats are specified to @pspp{} with a
@dfn{format specification} of the form @code{@var{TYPE}@var{w}} or
@code{@var{TYPE}@var{w}.@var{d}}, where
@var{TYPE} is one of the format types described later, @var{w} is a
field width measured in columns, and @var{d} is an optional number of
decimal places.  If @var{d} is omitted, a value of 0 is assumed.  Some
formats do not allow a nonzero @var{d} to be specified.
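
A format specification can be split into its parts with a regular
expression.  The Python sketch below is hypothetical.  It assumes
@var{TYPE} consists of letters only, and it does not check whether a
given type permits a nonzero @var{d}.

@example
import re

FORMAT_SPEC = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")

def parse_format(spec):
    # Split TYPEw or TYPEw.d into (TYPE, w, d); d defaults to 0.
    m = FORMAT_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("not a format specification: %r" % spec)
    return m.group(1).upper(), int(m.group(2)), int(m.group(3) or 0)
@end example

For example, @code{parse_format("F8.2")} returns @code{("F", 8, 2)},
and @code{parse_format("A9")} returns @code{("A", 9, 0)}.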

The following sections describe the input and output formats supported
by @pspp{}.

@menu
* Basic Numeric Formats::       
* Custom Currency Formats::     
* Legacy Numeric Formats::      
* Binary and Hexadecimal Numeric Formats::  
* Time and Date Formats::       
* Date Component Formats::      
* String Formats::              
@end menu

@node Basic Numeric Formats
@subsubsection Basic Numeric Formats

@cindex numeric formats
The basic numeric formats are used for input and output of real numbers
in standard or scientific notation.  The following table shows an
example of how each format displays positive and negative numbers with
the default decimal point setting:

@float
@multitable {DOLLAR10.2} {@code{@tie{}$3,141.59}} {@code{-$3,141.59}}
@headitem Format @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
@item F8.2       @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
@item COMMA9.2   @tab @code{@tie{}3,141.59}  @tab @code{-3,141.59}
@item DOT9.2     @tab @code{@tie{}3.141,59}  @tab @code{-3.141,59}
@item DOLLAR10.2 @tab @code{@tie{}$3,141.59} @tab @code{-$3,141.59}
@item PCT9.2     @tab @code{@tie{}3141.59%}  @tab @code{-3141.59%}
@item E8.1       @tab @code{@tie{}3.1E+003}  @tab @code{-3.1E+003}
@end multitable
@end float

On output, numbers in F format are expressed in standard decimal
notation with the requested number of decimal places.  The other formats
output some variation on this style:

@itemize @bullet
@item
Numbers in COMMA format are additionally grouped every three digits by
inserting a grouping character.  The grouping character is ordinarily a
comma, but it can be changed to a period (@pxref{SET DECIMAL}).

@item
DOT format is like COMMA format, but it interchanges the role of the
decimal point and grouping characters.  That is, the current grouping
character is used as a decimal point and vice versa.

@item
DOLLAR format is like COMMA format, but it prefixes the number with
@samp{$}.

@item
PCT format is like F format, but adds @samp{%} after the number.

@item
The E format always produces output in scientific notation.
@end itemize
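The following Python sketch shows only the decoration that each of these
styles adds, for the default decimal point setting.  It is a
simplification: the function @code{style_number} is hypothetical, it
ignores the field width and the negative-zero rule described later, it
relies on Python's own rounding rather than @pspp{}'s, and it omits E
format because Python writes two-digit rather than three-digit
exponents.

```python
def style_number(x, fmt_type, d):
    """Decorate x in a basic numeric style, ignoring field width."""
    grouped = format(abs(x), ",." + str(d) + "f")  # e.g. "3,141.59"
    plain = grouped.replace(",", "")                # e.g. "3141.59"
    sign = "-" if x < 0 else ""
    if fmt_type == "F":
        return sign + plain
    if fmt_type == "COMMA":
        return sign + grouped
    if fmt_type == "DOT":
        # Swap the roles of the grouping character and the decimal point.
        return sign + grouped.translate(str.maketrans(".,", ",."))
    if fmt_type == "DOLLAR":
        return sign + "$" + grouped  # the dollar sign follows the minus sign
    if fmt_type == "PCT":
        return sign + plain + "%"
    raise ValueError("unsupported format type: " + fmt_type)
```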

On input, the basic numeric formats accept positive and negative numbers in
standard decimal notation or scientific notation.  Leading and trailing
spaces are allowed.  An empty or all-spaces field, or one that contains
only a single period, is treated as the system missing value.

In scientific notation, the exponent may be introduced by a sign
(@samp{+} or @samp{-}), or by one of the letters @samp{e} or @samp{d}
(in uppercase or lowercase), or by a letter followed by a sign.  A
single space may follow the letter or the sign or both.

On fixed-format @cmd{DATA LIST} (@pxref{DATA LIST FIXED}) and in a few
other contexts, decimals are implied when the field does not contain a
decimal point.  In F6.5 format, for example, the field @code{314159} is
taken as the value 3.14159 with implied decimals.  Decimals are never
implied if an explicit decimal point is present or if scientific
notation is used.

E and F formats accept the basic syntax already described.  The other
formats allow some additional variations:

@itemize @bullet
@item
COMMA, DOLLAR, and DOT formats ignore grouping characters within the
integer part of the input field.  The identity of the grouping
character depends on the format.

@item
DOLLAR format allows a dollar sign to precede the number.  In a negative
number, the dollar sign may precede or follow the minus sign.

@item
PCT format allows a percent sign to follow the number.
@end itemize
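The input rules above can be sketched in Python as follows.  The function
@code{read_basic} and its parameters are hypothetical.  It returns
@code{None} for the system-missing value, treats malformed input as
system-missing (an assumption, since this section does not say how
@pspp{} reports such input), and assumes the default decimal point
setting, so that DOT format uses @samp{.} for grouping and @samp{,} as
the decimal point.

```python
import re

# Sign and mantissa, then an optional exponent introduced by e or d (either
# case), by a sign, or by a letter followed by a sign; a single space may
# follow the letter and/or the sign.
NUMBER = re.compile(
    r"([+-]?)([0-9]*\.?[0-9]*)"
    r"(?:(?:[eEdD] ?([+-]?)|([+-])) ?([0-9]+))?"
)


def read_basic(field, fmt_type="F", d=0, implied=True):
    """Read a field in F, E, COMMA, DOT, DOLLAR, or PCT format."""
    s = field.strip(" ")
    if s in ("", "."):
        return None  # system-missing
    grouping, point = (".", ",") if fmt_type == "DOT" else (",", ".")
    if fmt_type == "PCT" and s.endswith("%"):
        s = s[:-1]
    if fmt_type == "DOLLAR" and s.startswith(("$", "-$")):
        s = s.replace("$", "", 1)  # "$-5" and "-$5" both become "-5"
    if fmt_type in ("COMMA", "DOT", "DOLLAR"):
        head, sep, tail = s.partition(point)
        s = head.replace(grouping, "") + sep + tail  # integer part only
    s = s.replace(point, ".")
    m = NUMBER.fullmatch(s)
    if m is None or not any(c in "0123456789" for c in m.group(2)):
        return None  # assumption: malformed input is system-missing
    sign, mantissa, exp_digits = m.group(1), m.group(2), m.group(5)
    text = sign + mantissa
    if exp_digits is not None:
        text += "e" + (m.group(3) or m.group(4) or "") + exp_digits
    value = float(text)
    # Decimals are implied only without a decimal point or an exponent.
    if implied and "." not in mantissa and exp_digits is None:
        value /= 10 ** d
    return value
```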

All of the basic number formats have a maximum field width of 40 and
accept no more than 16 decimal places, on both input and output.  Some
additional restrictions apply:

@itemize @bullet
@item
As input formats, the basic numeric formats allow no more decimal places
than the field width.  As output formats, the field width must be
greater than the number of decimal places; that is, large enough to
allow for a decimal point and the number of requested decimal places.
DOLLAR and PCT formats must allow an additional column for @samp{$} or
@samp{%}.

@item
The default output format for a given input format increases the field
width enough to make room for optional input characters.  If an input
format calls for decimal places, the width is increased by 1 to make
room for an implied decimal point.  COMMA, DOT, and DOLLAR formats also
increase the output width to make room for grouping characters.  DOLLAR
and PCT further increase the output field width by 1 to make room for
@samp{$} or @samp{%}.  The increased output width is capped at 40, the
maximum field width.

@item
The E format is exceptional.  For output, E format has a minimum width
of 7 plus the number of decimal places.  The default output format for
an E input format is an E format with at least 3 decimal places and
thus a minimum width of 10.
@end itemize
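A rough Python sketch of these width rules follows; the function names
are hypothetical.  Because this section does not spell out how many
columns COMMA, DOT, and DOLLAR add for grouping characters,
@code{default_output} handles only F, PCT, and E input formats, and it
assumes that the default E output width is the larger of the input
width and 7 plus the number of decimal places.

```python
MAX_WIDTH = 40
MAX_DECIMALS = 16


def valid_input(w, d):
    """Basic numeric input formats allow no more decimals than the width."""
    return 1 <= w <= MAX_WIDTH and 0 <= d <= min(w, MAX_DECIMALS)


def valid_output(fmt_type, w, d):
    """Output formats need room for the decimal point and the decimals."""
    if not (1 <= w <= MAX_WIDTH and 0 <= d <= MAX_DECIMALS):
        return False
    if fmt_type == "E":
        return w >= 7 + d
    extra = 1 if fmt_type in ("DOLLAR", "PCT") else 0  # room for "$" or "%"
    return w > d + extra


def default_output(fmt_type, w, d):
    """Default output format for an F, PCT, or E input format (sketch)."""
    if fmt_type == "E":
        d = max(d, 3)  # at least 3 decimal places
        # Assumption: keep the input width if it is already wide enough.
        return "E", min(max(w, 7 + d), MAX_WIDTH), d
    if fmt_type in ("F", "PCT"):
        out_w = w + (1 if d > 0 else 0)  # room for an implied decimal point
        if fmt_type == "PCT":
            out_w += 1  # room for "%"
        return fmt_type, min(out_w, MAX_WIDTH), d
    raise NotImplementedError("grouping width for COMMA/DOT/DOLLAR not modeled")
```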

More details of basic numeric output formatting are given below:

@itemize @bullet
@item
Output rounds to nearest, with ties rounded away from zero.  Thus, 2.5
is output as @code{3} in F1.0 format, and -1.125 as @code{-1.13} in F5.2
format.

@item
The system-missing value is output as a period in a field of spaces,
placed in the decimal point's position, or in the rightmost column if no
decimal places are requested.  A period is used even if the decimal
point character is a comma.

@item
A number that does not fill its field is right-justified within the
field.

@item
A number that is too large for its field causes decimal places to be dropped
to make room.  If dropping decimals does not make enough room,
scientific notation is used if the field is wide enough.  If a number
does not fit in the field, even in scientific notation, the overflow is
indicated by filling the field with asterisks (@samp{*}).

@item
COMMA, DOT, and DOLLAR formats insert grouping characters only if space
is available for all of them.  Grouping characters are never inserted
when all decimal places must be dropped.  Thus, 1234.56 in COMMA5.2
format is output as @samp{@tie{}1235} without a comma, even though there
is room for one, because all decimal places were dropped.

@item
DOLLAR and PCT formats drop the @samp{$} or @samp{%} only if the number
would not fit at all without it.  Scientific notation with @samp{$} or
@samp{%} is preferred to ordinary decimal notation without it.

@item
Except in scientific notation, a decimal point is included only when
it is followed by a digit.  If the integer part of the number being
output is 0, and a decimal point is included, then the zero before the
decimal point is dropped.

In scientific notation, the number always includes a decimal point,
even if it is not followed by a digit.

@item
A negative number includes a minus sign only in the presence of a
nonzero digit: -0.01 is output as @samp{-.01} in F4.2 format but as
@samp{@tie{}@tie{}.0} in F4.1 format.  Thus, a ``negative zero'' never
includes a minus sign.

@item
In negative numbers output in DOLLAR format, the dollar sign follows the
negative sign.  Thus, -9.99 in DOLLAR6.2 format is output as
@code{-$9.99}.

@item
In scientific notation, the exponent is output as @samp{E} followed by
@samp{+} or @samp{-} and exactly three digits.  Numbers with magnitude
less than 10**-999 or larger than 10**999 are not supported by most
computers, but if they are supported then their output is considered
to overflow the field and will be output as asterisks.

@item
On most computers, no more than 15 decimal digits are significant in
output, even if more are printed.  In any case, output precision cannot
be any higher than input precision; few data sets are accurate to 15
digits of precision.  Unavoidable loss of precision in intermediate
calculations may also reduce precision of output.

@item
Special values such as infinities and ``not a number'' values are
usually converted to the system-missing value before printing.  In a few
circumstances, these values are output directly.  In fields of width 3
or greater, special values are output as however many characters will
fit from @code{+Infinity} or @code{-Infinity} for infinities, from
@code{NaN} for ``not a number,'' or from @code{Unknown} for other values
(if any are supported by the system).  In fields under 3 columns wide,
special values are output as asterisks.
@end itemize
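The following Python sketch applies several of these rules to F format:
rounding to nearest with ties away from zero (the @code{decimal}
module's @code{ROUND_HALF_UP} mode), dropping decimal places that do not
fit, dropping the zero before the decimal point, omitting the minus sign
of a negative zero, right-justifying, and filling an overflowing field
with asterisks.  The function @code{format_f} is hypothetical and
simplified: it always uses @samp{.} as the decimal point, it does not
treat special values as described above, and it omits the fallback to
scientific notation, so it prints asterisks in some cases where @pspp{}
would switch to E notation.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def format_f(x, w, d):
    """Simplified Fw.d output; x is a float, or None for system-missing."""
    if x is None:
        # A period in a field of spaces, at the decimal point's position,
        # or in the rightmost column when no decimal places are requested.
        if d == 0:
            return " " * (w - 1) + "."
        return " " * (w - d - 1) + "." + " " * d
    if abs(x) >= 10.0 ** w:
        return "*" * w  # cannot fit even with every decimal place dropped
    with localcontext() as ctx:
        ctx.prec = 100  # enough digits for w <= 40 plus up to 16 decimals
        for places in range(d, -1, -1):  # drop decimal places until it fits
            # ROUND_HALF_UP rounds to nearest with ties away from zero.
            q = Decimal(x).quantize(Decimal(1).scaleb(-places),
                                    rounding=ROUND_HALF_UP)
            text = format(abs(q), "f")
            if text.startswith("0."):
                text = text[1:]  # drop the zero before the decimal point
            if q < 0:
                text = "-" + text  # never true for a rounded negative zero
            if len(text) <= w:
                return text.rjust(w)
    return "*" * w
```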

@node Custom Currency Formats
@subsubsection Custom Currency Formats

@cindex currency formats
The custom currency formats are closely related to the basic numeric
formats, but they allow users to customize the output format.  The
@cmd{SET} command configures custom currency formats, using the syntax
@display
SET CC@var{x}=@t{"}@var{string}@t{"}.
@end display
@noindent 
where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
characters long.

@var{string} must contain exactly three commas or exactly three periods
(but not both), except that a single quote character may be used to
``escape'' a following comma, period, or single quote.  If three commas
are used, commas will be used for grouping in output, and a period will
be used as the decimal point.  Using periods reverses these roles.

The commas or periods divide @var{string} into four fields, called the
@dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
suffix}, respectively.  The prefix and suffix are added to output
whenever space is available.  The negative prefix and negative suffix
are always added to a negative number when the output includes a nonzero
digit.
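A Python sketch of how such a string might be split into its four fields
follows.  The function @code{parse_cc} is hypothetical.  It assumes that
a single quote escapes only a following comma, period, or single quote
(and is otherwise kept literally), that the 16-character limit applies
to the string as written, and that a string containing three commas and
also three periods is rejected.

```python
def parse_cc(spec):
    """Split a SET CCx string into its four fields (hypothetical sketch).

    Returns ((neg_prefix, prefix, suffix, neg_suffix), grouping, point).
    """
    if len(spec) > 16:
        raise ValueError("custom currency strings are limited to 16 characters")
    # Tokenize into (character, escaped) pairs; a single quote escapes a
    # following comma, period, or single quote and is itself dropped.
    tokens = []
    i = 0
    while i < len(spec):
        if spec[i] == "'" and i + 1 < len(spec) and spec[i + 1] in ",.'":
            tokens.append((spec[i + 1], True))
            i += 2
        else:
            tokens.append((spec[i], False))
            i += 1
    commas = sum(1 for c, escaped in tokens if c == "," and not escaped)
    periods = sum(1 for c, escaped in tokens if c == "." and not escaped)
    if (commas == 3) == (periods == 3):
        raise ValueError("need exactly three commas or exactly three periods")
    delimiter = "," if commas == 3 else "."
    fields = [""]
    for c, escaped in tokens:
        if c == delimiter and not escaped:
            fields.append("")  # an unescaped delimiter starts the next field
        else:
            fields[-1] += c
    point = "." if delimiter == "," else ","
    return tuple(fields), delimiter, point
```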

The following syntax shows how custom currency formats could be used to
reproduce basic numeric formats:

@example
@group
SET CCA="-,,,".  /* Same as COMMA.
SET CCB="-...".  /* Same as DOT.
SET CCC="-,$,,". /* Same as DOLLAR.
SET CCD="-,,%,". /* Like PCT, but groups with commas.
@end group
@end example

Here are some more examples of custom currency formats.  The final
example shows how to use a single quote to escape a delimiter:

@example
@group
SET CCA=",EUR,,-".   /* Euro.
SET CCB="(,USD ,,)". /* US dollar.
SET CCC="-.R$..".    /* Brazilian real.
SET CCD="-,, NIS,".  /* Israel shekel.
SET CCE="-.Rp'. ..". /* Indonesia Rupiah.
@end group
@end example

@noindent These formats would yield the following output:

@float
@multitable {CCD13.2} {@code{@tie{}@tie{}USD 3,145.59}} {@code{(USD 3,145.59)}}
@headitem Format @tab @code{@tie{}3145.59}         @tab @code{-3145.59}
@item CCA12.2 @tab @code{@tie{}EUR3,145.59}        @tab @code{EUR3,145.59-}
@item CCB14.2 @tab @code{@tie{}@tie{}USD 3,145.59} @tab @code{(USD 3,145.59)}
@item CCC11.2 @tab @code{@tie{}R$3.145,59}         @tab @code{-R$3.145,59}
@item CCD13.2 @tab @code{@tie{}3,145.59 NIS}       @tab @code{-3,145.59 NIS}
@item CCE10.0 @tab @code{@tie{}Rp. 3.146}          @tab @code{-Rp. 3.146}
@end multitable
@end float
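For values that fit in their fields, the output in this table can be
reproduced by the following Python sketch.  The hypothetical
@code{format_cc} takes the four fields directly, along with the grouping
and decimal point characters, rounds with ties away from zero, and adds
the negative prefix and suffix only when a nonzero digit is output.  It
does not model dropping the prefix or suffix when space is short; it
fills the field with asterisks instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def group_digits(digits, sep):
    """Insert sep every three digits, counting from the right."""
    groups = []
    while len(digits) > 3:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    groups.insert(0, digits)
    return sep.join(groups)


def format_cc(x, w, d, neg_prefix, prefix, suffix, neg_suffix,
              grouping=",", point="."):
    """Format x in a custom currency format of width w with d decimals."""
    q = Decimal(x).quantize(Decimal(1).scaleb(-d), rounding=ROUND_HALF_UP)
    int_part, _, frac = format(abs(q), "f").partition(".")
    body = group_digits(int_part, grouping) + (point + frac if frac else "")
    negative = q < 0  # false for a rounded "negative zero"
    text = ((neg_prefix if negative else "") + prefix + body + suffix
            + (neg_suffix if negative else ""))
    if len(text) > w:
        return "*" * w  # simplification: affixes are never dropped here
    return text.rjust(w)
```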

The default for all the custom currency formats is @samp{-,,,},
equivalent to COMMA format.

@node Legacy Numeric Formats
@subsubsection Legacy Numeric Formats

The N and Z numeric formats provide compatibility with legacy file
formats.  They have much in common:

@itemize @bullet
@item
Output is rounded to the nearest representable value, with ties rounded
away from zero.

@item
Numbers too large to display are output as a field filled with asterisks
(@samp{*}).

@item
The decimal point is always implicitly the specified number of digits
from the right edge of the field, except that Z format input allows an
explicit decimal point.

@item
Scientific notation may not be used.

@item
The system-missing value is output as a period in a field of spaces.
The period is placed just to the right of the implied decimal point in
Z format, or at the right end in N format or in Z format if no decimal
places are requested.  A period is used even if the decimal point
character is a comma.

@item
Field width may range from 1 to 40.  Decimal places may range from 0 up
to the field width, to a maximum of 16.

@item
When a legacy numeric format used for input is converted to an output
format, it is changed into the equivalent F format.  The field width is
increased by 1 if any decimal places are specified, to make room for a
decimal point.  For Z format, the field width is increased by 1 more
column, to make room for a negative sign.  The output field width is
capped at 40 columns.
@end itemize

@subsubheading N Format

The N format supports input and output of fields that contain only
digits.  On input, leading or trailing spaces, a decimal point, or any
other non-digit character causes the field to be read as the
system-missing value.  As a special exception, an N format used on
@cmd{DATA LIST FREE} or @cmd{DATA LIST LIST} is treated as the
equivalent F format.

On output, N pads the field on the left with zeros.  Negative numbers
are output like the system-missing value.
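The N format rules can be sketched in Python as below.  The names
@code{read_n} and @code{write_n} are hypothetical, @code{None} stands
for the system-missing value, and the special treatment of N format on
@cmd{DATA LIST FREE} and @cmd{DATA LIST LIST} is not modeled.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

DIGITS = "0123456789"


def read_n(field, d):
    """Read an N field; any non-digit, including a space, means missing."""
    if field == "" or any(c not in DIGITS for c in field):
        return None  # system-missing
    return int(field) / 10 ** d  # the decimal point is always implied


def write_n(x, w, d):
    """Write x in Nw.d format: zero-padded digits, no sign, no point."""
    if x is None or x < 0:
        return " " * (w - 1) + "."  # negatives are output like system-missing
    if x >= 10.0 ** w:
        return "*" * w
    with localcontext() as ctx:
        ctx.prec = 100
        # Round to nearest, ties away from zero.
        scaled = (Decimal(x) * 10 ** d).quantize(Decimal(1),
                                                 rounding=ROUND_HALF_UP)
    digits = str(int(scaled))
    if len(digits) > w:
        return "*" * w  # too large to display
    return digits.zfill(w)
```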

@subsubheading Z Format

The Z format is a ``zoned decimal'' format used on IBM mainframes.  Z
format encodes the sign as part of the final digit, which must be one of
the following:
@example
0123456789
@{ABCDEFGHI
@}JKLMNOPQR
@end example
@noindent
where the characters in each row represent digits 0 through 9 in order.
Characters in the first two rows indicate a positive sign; those in the
third indicate a negative sign.

On output, Z fields are padded on the left with spaces.  On input,
leading and trailing spaces are ignored.  Any character in an input
field other than spaces, the digit characters above, and @samp{.} causes
the field to be read as system-missing.

The decimal point character for input and output is always @samp{.},
even if the decimal point character is a comma (@pxref{SET DECIMAL}).

Nonzero, negative values output in Z format are marked as negative even
when no nonzero digits are output.  For example, -0.2 is output in Z1.0
format as @samp{@}}.  The ``negative zero'' value supported by most
machines is output as positive.
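A Python sketch of Z format input and output, following the rules above,
appears below.  The names @code{read_z} and @code{write_z} are
hypothetical.  The sketch assumes that the sign character may appear
only in the final position, that a blank input field is system-missing,
and that output uses the plain digits of the first row for positive
values; it does not handle the system-missing value on output.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

# Each string lists the characters for digits 0 through 9.  "\x7b" and
# "\x7d" are the left and right curly brace characters.
PLAIN = "0123456789"
POSITIVE = "\x7bABCDEFGHI"
NEGATIVE = "\x7dJKLMNOPQR"


def read_z(field, d):
    """Read a Z field; returns None for system-missing."""
    text = field.strip(" ")
    if text == "":
        return None  # assumption: a blank field is system-missing
    body, last = text[:-1], text[-1]
    for row, sign in ((PLAIN, 1), (POSITIVE, 1), (NEGATIVE, -1)):
        if last in row:
            digit = row.index(last)
            break
    else:
        return None  # the final character is not a zoned digit
    if any(c not in PLAIN + "." for c in body) or body.count(".") > 1:
        return None
    digits = body + str(digit)
    if "." in digits:
        value = float(digits)  # an explicit decimal point overrides d
    else:
        value = int(digits) / 10 ** d  # implied decimal places
    return sign * value


def write_z(x, w, d):
    """Write a finite number x in Zw.d format, padded with spaces."""
    if abs(x) >= 10.0 ** w:
        return "*" * w
    with localcontext() as ctx:
        ctx.prec = 100
        scaled = (Decimal(x) * 10 ** d).quantize(Decimal(1),
                                                 rounding=ROUND_HALF_UP)
    digits = str(abs(int(scaled)))
    if len(digits) > w:
        return "*" * w
    if x < 0:
        # Nonzero negative values are marked negative even when every digit
        # is 0; negative zero fails this test and is output as positive.
        digits = digits[:-1] + NEGATIVE[int(digits[-1])]
    return digits.rjust(w)
```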

@node Binary and Hexadecimal Numeric Formats
@subsubsection Binary and Hexadecimal Numeric Formats

@cindex binary formats
@cindex hexadecimal formats
The binary and hexadecimal formats are primarily designed for
compatibility with existing machine formats, not for human readability.
All of them therefore have an F format as their default output format.  Some of
these formats are only portable between machines with compatible byte
ordering (endianness) or floating-point format.

Binary formats use byte values that in text files are interpreted as
special control functions, such as carriage return and line feed.  Thus,
data in binary formats should not be included in syntax files or read
from data files with variable-length records, such as ordinary text
files.  They may be read from or written to data files with fixed-length
records.  @xref{FILE HANDLE}, for information on working with
fixed-length records.

@subsubheading P and PK Formats

These are binary-coded decimal formats, in which every byte (except the
last, in P format) represents two decimal digits.  The most-significant
4 bits of the first byte are the most-significant decimal digit, the
least-significant 4 bits of the first byte are the next decimal digit,
and so on.

In P format, the most-significant 4 bits of the last byte are the
least-significant decimal digit.  The least-significant 4 bits represent
the sign: decimal 13 indicates a negative value, decimal 15 indicates a
positive value.

Numbers are rounded downward on output.  The system-missing value and
numbers outside representable range are output as zero.

The maximum field width is 16.  Decimal places may range from 0 up to
the number of decimal digits represented by the field.

The default output format is an F format with twice the input field
width, plus one column for a decimal point (if decimal places were
requested).
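The digit packing can be sketched in Python for PK format, which has no
sign nibble.  The function names are hypothetical, @code{None} stands
for the system-missing value, and the sketch assumes that a nibble
greater than 9 makes an input field invalid.  Following the rules above,
output rounds downward and writes zeros for missing or out-of-range
values.

```python
import math


def write_pk(x, w, d):
    """Encode x as w bytes of PK (unsigned packed decimal), d implied places."""
    n_digits = 2 * w  # every byte holds two decimal digits
    scaled = None if x is None else math.floor(x * 10 ** d)  # round downward
    if scaled is None or scaled < 0 or scaled >= 10 ** n_digits:
        return bytes(w)  # missing or out-of-range values are written as zero
    text = str(scaled).zfill(n_digits)
    # The first digit of each pair goes in the high nibble, the second low.
    return bytes(int(text[i]) << 4 | int(text[i + 1])
                 for i in range(0, n_digits, 2))


def read_pk(data, d):
    """Decode PK bytes; returns None if a nibble is not a decimal digit."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            return None  # assumption: such a field is treated as invalid
        value = value * 100 + high * 10 + low
    return value / 10 ** d
```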

@subsubheading IB and PIB Formats

These are integer binary formats.  IB reads and writes 2's complement
binary integers, and PIB reads and writes unsigned binary integers.  The
byte ordering is by default the host machine's, but SET RIB may be used
to select a specific byte ordering for reading (@pxref{SET RIB}) and
SET WIB, similarly, for writing (@pxref{SET WIB}).

The maximum field width is 8.  Decimal places may range from 0 up to the
number of decimal digits in the largest value representable in the field
width.

The default output format is an F format whose width is the number of
decimal digits in the largest value representable in the field width,
plus 1 if the format has decimal places.
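In Python, @code{int.from_bytes} performs the same conversion as IB and
PIB input, and its @code{byteorder} argument plays roughly the role that
SET RIB plays in @pspp{}.  The helper names below are hypothetical; the
second function restates the default output width rule given above.

```python
import sys


def read_integer_binary(data, d, signed, byteorder=sys.byteorder):
    """Read an IB (signed=True) or PIB (signed=False) field."""
    return int.from_bytes(data, byteorder, signed=signed) / 10 ** d


def default_output_width(w, d, signed):
    """Width of the default F output format for IBw.d or PIBw.d."""
    bits = 8 * w
    largest = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return len(str(largest)) + (1 if d > 0 else 0)
```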

@subsubheading RB Format

This is a binary format for real numbers.  By default it reads and
writes the host machine's floating-point format, but SET RRB may be
used to select an alternate floating-point format for reading
(@pxref{SET RRB}) and SET WRB, similarly, for writing (@pxref{SET
WRB}).

The recommended field width depends on the floating-point format.
NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
a field width of 8.  ISL, ISB, VF, and ZS formats should use a field
width of 4.  Other field widths will not produce useful results.  The
maximum field width is 8.  No decimal places may be specified.

The default output format is F8.2.
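Python's @code{struct} module can decode the IEEE formats, as the
following sketch shows.  The mapping is an assumption that ISL, ISB,
IDL, and IDB denote IEEE single (4-byte) and double (8-byte) precision
in little- and big-endian byte order; the VAX and IBM formats are not
covered, and the function names are hypothetical.

```python
import struct

# Assumed meaning of the SET RRB/WRB names: I = IEEE 754, S/D = single or
# double precision, L/B = little- or big-endian.  NATIVE uses the host's
# byte order.  VAX (VF, VD, VG) and IBM (ZS, ZL) formats are not covered.
RB_CODES = dict(NATIVE="=d", IDL="<d", IDB=">d", ISL="<f", ISB=">f")


def read_rb(data, fmt="NATIVE"):
    """Decode an RB field stored in the given floating-point format."""
    code = RB_CODES[fmt]
    if len(data) != struct.calcsize(code):
        raise ValueError("field width does not match the floating-point format")
    return struct.unpack(code, data)[0]


def write_rb(x, fmt="NATIVE"):
    """Encode x as an RB field in the given floating-point format."""
    return struct.pack(RB_CODES[fmt], x)
```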

@subsubheading PIBHEX and RBHEX Formats

These are hexadecimal formats, for reading and writing binary formats
where each byte has been recoded as a pair of hexadecimal digits.

A hexadecimal field consists solely of hexadecimal digits
@samp{0}@dots{}@samp{9} and @samp{A}@dots{}@samp{F}.  Uppercase and
lowercase are accepted on input; output is in uppercase.

Other than the hexadecimal representation, these formats are equivalent
to PIB and RB formats, respectively.  However, bytes in PIBHEX format
are always ordered with the most-significant byte first (big-endian
order), regardless of the host machine's native byte order or @pspp{}
settings.

Field widths must be even and between 2 and 16.  RBHEX format allows no
decimal places; PIBHEX allows as many decimal places as a PIB format
with half the given width.
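A minimal Python sketch of PIBHEX input and output follows.  The names
are hypothetical, and treating a field that contains a non-hexadecimal
character as system-missing (@code{None}) is an assumption.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def read_pibhex(field, d):
    """Read a PIBHEX field as an unsigned, most-significant-byte-first integer."""
    if len(field) % 2 or not 2 <= len(field) <= 16:
        raise ValueError("PIBHEX width must be even and between 2 and 16")
    if any(c not in HEX_DIGITS for c in field):
        return None  # assumption: invalid characters give system-missing
    return int.from_bytes(bytes.fromhex(field), "big") / 10 ** d


def write_pibhex(n, w):
    """Write the nonnegative integer n as w uppercase hexadecimal digits."""
    return n.to_bytes(w // 2, "big").hex().upper()
```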

@node Time and Date Formats
@subsubsection Time and Date Formats

@cindex time formats
@cindex date formats
In @pspp{}, a @dfn{time} is an interval.  The time formats translate
between human-friendly descriptions of time intervals and @pspp{}'s
internal representation of time intervals, which is simply the number of
seconds in the interval.  @pspp{} has three time formats:

@float
@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 01:31:17.01}}
@headitem Time Format @tab Template                  @tab Example
@item MTIME    @tab @code{MM:SS.ss}             @tab @code{91:17.01}
@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{01:31:17.01}
@item DTIME    @tab @code{DD HH:MM:SS.ss}       @tab @code{00 04:31:17.01}
@end multitable
@end float

A @dfn{date} is a moment in the past or the future.  Internally, @pspp{}
represents a date as the number of seconds since the @dfn{epoch},
midnight, Oct. 14, 1582.  The date formats translate between
human-readable dates and @pspp{}'s numeric representation of dates and
times.  @pspp{} has several date formats:

@float
@multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
@headitem Date Format @tab Template                  @tab Example
@item DATE     @tab @code{dd-mmm-yyyy}          @tab @code{01-OCT-1978}
@item ADATE    @tab @code{mm/dd/yyyy}           @tab @code{10/01/1978}
@item EDATE    @tab @code{dd.mm.yyyy}           @tab @code{01.10.1978}
@item JDATE    @tab @code{yyyyjjj}              @tab @code{1978274}
@item SDATE    @tab @code{yyyy/mm/dd}           @tab @code{1978/10/01}
@item QYR      @tab @code{q Q yyyy}             @tab @code{4 Q 1978}
@item MOYR     @tab @code{mmm yyyy}             @tab @code{OCT 1978}
@item WKYR     @tab @code{ww WK yyyy}           @tab @code{40 WK 1978}
@item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
@item YMDHMS   @tab @code{yyyy-mm-dd HH:MM:SS.ss} @tab @code{1978-10-01 04:31:17.01}
@end multitable
@end float
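Converting between calendar dates and this numeric representation can be
sketched with Python's @code{datetime} module.  The function names are
hypothetical, and the sketch assumes that Python's proleptic Gregorian
calendar agrees with @pspp{} for dates after the epoch.

```python
from datetime import datetime, timedelta

# The epoch: midnight, October 14, 1582.
EPOCH = datetime(1582, 10, 14)


def to_pspp_date(moment):
    """Seconds from the epoch to a naive datetime."""
    return (moment - EPOCH).total_seconds()


def from_pspp_date(seconds):
    """Naive datetime for a count of seconds since the epoch."""
    return EPOCH + timedelta(seconds=seconds)
```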

The templates in the preceding tables describe how the time and date
formats are input and output:

@table @code
@item dd
Day of month, from 1 to 31.  Always output as two digits.

@item mm
@itemx mmm
Month.  In output, @code{mm} is output as two digits, @code{mmm} as the
first three letters of an English month name (January, February,
@dots{}).  In input, both of these formats, plus Roman numerals, are
accepted.

@item yyyy
Year.  In output, DATETIME and YMDHMS always produce 4-digit years;
other formats can produce a 2- or 4-digit year.  The century assumed
for 2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).
In output, a year outside the epoch causes the whole field to be
filled with asterisks (@samp{*}).

@item jjj
Day of year (Julian day), from 1 to 366.  This is exactly three digits
giving the count of days from the start of the year.  January 1 is
considered day 1.

@item q
Quarter of year, from 1 to 4.  Quarters start on January 1, April 1,
July 1, and October 1.

@item ww
Week of year, from 1 to 53.  Output as exactly two digits.  January 1 is
the first day of week 1.

@item DD
Count of days, which may be positive or negative.  Output as at least
two digits.

@item hh
Count of hours, which may be positive or negative.  Output as at least
two digits.

@item HH
Hour of day, from 0 to 23.  Output as exactly two digits.

@item MM
In MTIME, count of minutes, which may be positive or negative.  Output
as at least two digits.

In other formats, minute of hour, from 0 to 59.  Output as exactly two
digits.

@item SS.ss
Seconds within minute, from 0 to 59.  The integer part is output as
exactly two digits.  On output, seconds and fractional seconds may or
may not be included, depending on field width and decimal places.  On
input, seconds and fractional seconds are optional.  The DECIMAL setting
controls the character accepted and displayed as the decimal point
(@pxref{SET DECIMAL}).
@end table
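The @code{jjj}, @code{q}, and @code{ww} fields follow directly from these
definitions, as the following Python sketch shows.  The function names
are hypothetical, and the JDATE, QYR, and WKYR strings assume output
wide enough for 4-digit years.

```python
from datetime import date


def date_components(day):
    """Return (jjj, q, ww) for a date, following the template definitions."""
    jjj = day.timetuple().tm_yday  # January 1 is day 1
    q = (day.month - 1) // 3 + 1   # quarters begin Jan 1, Apr 1, Jul 1, Oct 1
    ww = (jjj - 1) // 7 + 1        # January 1 is the first day of week 1
    return jjj, q, ww


def format_jdate(day):
    return "%04d%03d" % (day.year, date_components(day)[0])


def format_qyr(day):
    return "%d Q %04d" % (date_components(day)[1], day.year)


def format_wkyr(day):
    return "%02d WK %04d" % (date_components(day)[2], day.year)
```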

For output, the date and time formats use the delimiters indicated in
the table.  For input, date components may be separated by spaces or by
one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
time components may be separated by spaces, @samp{:}, or @samp{.}.  On
input, the @samp{Q} separating quarter from year and the @samp{WK}
separating week from year may be uppercase or lowercase, and the spaces
around them are optional.

On input, all time and date formats accept any amount of leading and
trailing white space.

The maximum width for time and date formats is 40 columns.  Minimum
input and output width for each of the time and date formats is shown
below:

@float
@multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option 
@item DATE @tab 8 @tab 9 @tab 4-digit year
@item ADATE @tab 8 @tab 8 @tab 4-digit year
@item EDATE @tab 8 @tab 8 @tab 4-digit year
@item JDATE @tab 5 @tab 5 @tab 4-digit year
@item SDATE @tab 8 @tab 8 @tab 4-digit year
@item QYR @tab 4 @tab 6 @tab 4-digit year
@item MOYR @tab 6 @tab 6 @tab 4-digit year
@item WKYR @tab 6 @tab 8 @tab 4-digit year
@item DATETIME @tab 17 @tab 17 @tab seconds
@item YMDHMS @tab 12 @tab 16 @tab seconds
@item MTIME @tab 4 @tab 5
@item TIME @tab 5 @tab 5 @tab seconds
@item DTIME @tab 8 @tab 8 @tab seconds
@end multitable
@end float
@noindent 
In the table, ``Option'' describes what increased output width enables:

@table @asis
@item 4-digit year
A field 2 columns wider than minimum will include a 4-digit year.
(DATETIME and YMDHMS formats always include a 4-digit year.)

@item seconds
A field 3 columns wider than minimum will include seconds as well as
minutes.  A field 5 columns wider than minimum, or more, can also
include a decimal point and fractional seconds (but no more than allowed
by the format's decimal places).
@end table
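For DATE format, the width rules above can be sketched in Python as
follows.  The function @code{format_date} is hypothetical; it does not
apply the EPOCH check to 2-digit years and does not handle the
system-missing value.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def format_date(day, w):
    """Sketch of DATE output in a field of width w (minimum 9)."""
    if w < 9:
        raise ValueError("DATE output needs at least 9 columns")
    if w >= 11:  # 2 columns beyond the minimum allow a 4-digit year
        year = "%04d" % day.year
    else:
        year = "%02d" % (day.year % 100)
    text = "%02d-%s-%s" % (day.day, MONTHS[day.month - 1], year)
    return text.rjust(w)  # narrower values are right-justified
```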

For the time and date formats, the default output format is the same as
the input format, except that @pspp{} increases the field width, if
necessary, to the minimum allowed for output.

Times or dates narrower than the field width are right-justified within
the field.

When a time or date exceeds the field width, characters are trimmed from
the end until it fits.  This can occur in an unusual situation, e.g.@:
with a year greater than 9999 (which adds an extra digit), or for a
negative value on MTIME, TIME, or DTIME (which adds a leading minus sign).

@c What about out-of-range values?

The system-missing value is output as a period at the right end of the
field.  

@node Date Component Formats
@subsubsection Date Component Formats

The WKDAY and MONTH formats provide input and output for the names of
weekdays and months, respectively.

On output, these formats convert a number between 1 and 7, for WKDAY, or
between 1 and 12, for MONTH, into the English name of a day or month,
respectively.  If the name is longer than the field, it is trimmed to
fit.  If the name is shorter than the field, it is padded on the right
with spaces.  Values outside the valid range, and the system-missing
value, are output as all spaces.

On input, English weekday or month names (in uppercase or lowercase) are
converted back to their corresponding numbers.  Weekday and month names
may be abbreviated to their first 2 or 3 letters, respectively.

The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for
MONTH.  No decimal places are allowed.

The default output format is the same as the input format.
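A Python sketch of both directions follows.  The function names are
hypothetical, and two details are assumptions not stated above: WKDAY
value 1 is Sunday, and names are output in uppercase.  The sketch also
treats non-integer values as outside the valid range.

```python
# Assumptions: WKDAY 1 is Sunday, and names are output in uppercase.
WEEKDAYS = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY"]
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def write_name(value, w, names):
    """Output for WKDAY (names=WEEKDAYS) or MONTH (names=MONTHS)."""
    if value is None or not 1 <= value <= len(names) or value != int(value):
        return " " * w  # out of range or system-missing: all spaces
    return names[int(value) - 1][:w].ljust(w)  # trim or pad on the right


def read_name(field, names, min_len):
    """Input: a name or abbreviation of at least min_len letters, any case.

    min_len is 2 for WKDAY and 3 for MONTH.
    """
    text = field.strip().upper()
    if len(text) < min_len:
        return None
    for number, name in enumerate(names, start=1):
        if name.startswith(text):
            return number
    return None
```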

@node String Formats
@subsubsection String Formats

@cindex string formats
The A and AHEX formats are the only ones that may be assigned to string
variables.  Neither format allows any decimal places.

In A format, the entire field is treated as a string value.  The field
width may range from 1 to 32,767, the maximum string width.  The default
output format is the same as the input format.

In AHEX format, the field is composed of characters in a string encoded
as hex digit pairs.  On output, hex digits are output in uppercase; on
input, uppercase and lowercase are both accepted.  The default output
format is A format with half the input width.
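In Python, AHEX conversion corresponds to @code{bytes.hex} and
@code{bytes.fromhex}, as in the sketch below.  The function names are
hypothetical, and the sketch works on raw bytes, leaving aside the
character encoding of the string.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def read_ahex(field):
    """Decode an AHEX field (pairs of hex digits, either case) into bytes."""
    if len(field) % 2 or any(c not in HEX_DIGITS for c in field):
        raise ValueError("AHEX input must consist of pairs of hex digits")
    return bytes.fromhex(field)


def write_ahex(value):
    """Encode bytes as uppercase hex digit pairs."""
    return value.hex().upper()


def default_output_width(ahex_width):
    """The default output format is A with half the AHEX width."""
    return ahex_width // 2
```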

@node Scratch Variables
@subsection Scratch Variables

@cindex scratch variables
Most of the time, variables don't retain their values between cases.
Instead, either they are read from a data file or the active dataset,
in which case they take the value read, or, if they are created with
@cmd{COMPUTE} or another transformation, they are initialized to the
system-missing value or to blanks, depending on type.

However, sometimes it's useful to have a variable that keeps its value
between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
use a @dfn{scratch variable}.  Scratch variables are variables whose
names begin with an octothorpe (@samp{#}).  

Scratch variables have the same properties as variables left with
@cmd{LEAVE}: they retain their values between cases, and for the first
case they are initialized to 0 or blanks.  They have the additional
property that they are deleted before the execution of any procedure.
For this reason, scratch variables can't be used for analysis.  To use
a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
to copy its value into an ordinary variable, then use that ordinary
variable in the analysis.

@node Files
@section Files Used by @pspp{}

@pspp{} makes use of many files each time it runs.  Some of these it
reads, some it writes, some it creates.  Here is a table listing the
most important of these files:

@table @strong
@cindex file, command
@cindex file, syntax file
@cindex command file
@cindex syntax file
@item command file
@itemx syntax file
These names (synonyms) refer to the file that contains instructions
that tell @pspp{} what to do.  The syntax file's name is specified on
the @pspp{} command line.  Syntax files can also be read with
@cmd{INCLUDE} (@pxref{INCLUDE}).

@cindex file, data
@cindex data file
@item data file
Data files contain raw data in text or binary format.  Data can also
be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.

@cindex file, output
@cindex output file
@item listing file
One or more output files are created by @pspp{} each time it is
run.  The output files receive the tables and charts produced by
statistical procedures.  The output files may be in any number of formats,
depending on how @pspp{} is configured.

@cindex system file
@cindex file, system
@item system file
System files are binary files that store a dictionary and a set of
cases.  @cmd{GET} and @cmd{SAVE} read and write system files.

@cindex portable file
@cindex file, portable
@item portable file
Portable files are files in a text-based format that store a dictionary
and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
portable files.
@end table

@node File Handles
@section File Handles
@cindex file handles

A @dfn{file handle} is a reference to a data file, system file, or 
portable file.  Most often, a file handle is specified as the
name of a file as a string, that is, enclosed within @samp{'} or
@samp{"}.

A file name string that begins or ends with @samp{|} is treated as the
name of a command to pipe data to or from.  You can use this feature
to read data over the network using a program such as @samp{curl}
(e.g.@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
read compressed data from a file using a program such as @samp{zcat}
(e.g.@: @code{GET '|zcat mydata.sav.gz'}), and for many other
purposes.

@pspp{} also supports declaring named file handles with the @cmd{FILE
HANDLE} command.  This command associates an identifier of your choice
(the file handle's name) with a file.  Later, the file handle name can
be substituted for the name of the file.  When @pspp{} syntax accesses a
file multiple times, declaring a named file handle simplifies updating
the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
also required to read data files in binary formats.  @xref{FILE HANDLE},
for more information.

In some circumstances, @pspp{} must distinguish whether a file handle
refers to a system file or a portable file.  When this is necessary to
read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
@pspp{} uses the file's contents to decide.  In the context of writing a
file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, @pspp{}
decides based on the file's name: if it ends in @samp{.por} (with any
capitalization), then @pspp{} writes a portable file; otherwise, @pspp{}
writes a system file.
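The two naming rules above can be sketched in Python as follows.  The
function names are hypothetical, and the sketch only classifies names;
it does not open files or run commands.

```python
def pipe_command(name):
    """Return the command if name begins or ends with "|", else None."""
    if name.startswith("|"):
        return name[1:]
    if name.endswith("|"):
        return name[:-1]
    return None


def output_kind(name):
    """File type written by SAVE, AGGREGATE, etc., chosen from the name."""
    return "portable" if name.lower().endswith(".por") else "system"
```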

INLINE is reserved as a file handle name.  It refers to the ``data
file'' embedded into the syntax file between @cmd{BEGIN DATA} and
@cmd{END DATA}.  @xref{BEGIN DATA}, for more information.

The file to which a file handle refers may be reassigned on a later
@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
HANDLE}.  @xref{CLOSE FILE HANDLE}, for
more information.

@node BNF
@section Backus-Naur Form
@cindex BNF
@cindex Backus-Naur Form
@cindex command syntax, description of
@cindex description of command syntax

The syntax of some parts of the @pspp{} language is presented in this
manual using the formalism known as @dfn{Backus-Naur Form}, or BNF@.  The
following table describes BNF:

@itemize @bullet
@cindex keywords
@cindex terminals
@item
Words in all-uppercase are @pspp{} keyword tokens.  In BNF, these are
often called @dfn{terminals}.  There are some special terminals, which
are written in lowercase for clarity:

@table @asis
@cindex @code{number}
@item @code{number}
A real number.

@cindex @code{integer}
@item @code{integer}
An integer number.

@cindex @code{string}
@item @code{string}
A string.

@cindex @code{var-name}
@item @code{var-name}
A single variable name.

@cindex operators
@cindex punctuators
@item @code{=}, @code{/}, @code{+}, @code{-}, etc.
Operators and punctuators.

@cindex @code{.}
@item @code{.}
The end of the command.  This is not necessarily an actual dot in the
syntax file.  @xref{Commands}, for more details.
@end table

@item
@cindex productions
@cindex nonterminals
Other words in all lowercase refer to BNF definitions, called
@dfn{productions}.  These productions are also known as
@dfn{nonterminals}.  Some nonterminals are very common, so they are
defined here in English for clarity:

@table @code
@cindex @code{var-list}
@item var-list
A list of one or more variable names or the keyword @code{ALL}.

@cindex @code{expression}
@item expression
An expression.  @xref{Expressions}, for details.
@end table

@item
@cindex ``is defined as''
@cindex productions
@samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
the name of the nonterminal being defined.  The right side of @samp{::=}
gives the definition of that nonterminal.  If the right side is empty,
then one possible expansion of that nonterminal is nothing.  A BNF
definition is called a @dfn{production}.

@item
@cindex terminals and nonterminals, differences
So, the key difference between a terminal and a nonterminal is that a
terminal cannot be broken into smaller parts---in fact, every terminal
is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
composed of a (possibly empty) sequence of terminals and nonterminals.
Thus, terminals indicate the deepest level of syntax description.  (In
parsing theory, terminals are the leaves of the parse tree; nonterminals
form the branches.)

@item
@cindex start symbol
@cindex symbol, start
The first nonterminal defined in a set of productions is called the
@dfn{start symbol}.  The start symbol defines the entire syntax for
that command.
@end itemize