'\" t
.\" Title: doclifter
.\" Author: [see the "Author" section]
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\" Date: 01/22/2023
.\" Manual: Documentation Tools
.\" Source: doclifter
.\" Language: English
.\"
.TH "DOCLIFTER" "1" "01/22/2023" "doclifter" "Documentation Tools"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
doclifter \- translate troff requests into DocBook
.SH "SYNOPSIS"
.HP \w'\fBdoclifter\fR\ 'u
\fBdoclifter\fR [\-o\ \fIoutput\-location\fR] [\-e\ \fIoutput\-encoding\fR] [\-i\ \fIinput\-encodings\fR] [\-h\ \fIhintfile\fR] [\-q] [\-x] [\-v] [\-w] [\-V] [\-D\ \fItoken=type\fR] [\-I\ \fIpath\fR] [\-S\ \fIspoofname\fR] \fIfile\fR...
.SH "DESCRIPTION"
.PP
\fBdoclifter\fR
translates documents written in troff macros to DocBook\&. Structural subsets of the requests in
\fBman\fR(7),
\fBmdoc\fR(7),
\fBms\fR(7),
\fBme\fR(7),
\fBmm\fR(7), and
\fBtroff\fR(1)
are supported\&.
.PP
The translation brings over all the structure of the original document at section, subsection, and paragraph level\&. Command and C function synopses are translated into DocBook markup, not just a verbatim display\&. Tables (TBL markup) are translated into DocBook table markup\&. PIC diagrams are translated into SVG\&. Troff\-level information that might have structural implications is preserved in XML comments\&.
.PP
Where possible, font\-change macros are translated into structural markup\&.
\fBdoclifter\fR
recognizes stereotyped patterns of markup and content (such as the use of italics in a FILES section to mark filenames) and lifts them\&. A means to edit, add, and save semantic hints about highlighting is supported\&.
.PP
Some cliches are recognized and lifted to structural markup even without highlighting\&. Patterns recognized include such things as URLs, email addresses, man page references, and C program listings\&.
.PP
The
\fB\&.in\fR
and
\fB\&.ti\fR
requests are passed through with complaints\&. They indicate presentation\-level markup that
\fBdoclifter\fR
cannot translate into structure; the output will require hand\-fixing\&.
.PP
The
\fB\&.ta\fR
request is passed through with a complaint unless the immediately following text lines contain a tab, in which case the following span of lines containing tabs is lifted to a table\&.
.PP
Under some circumstances,
\fBdoclifter\fR
can even lift formatted manual pages and the text output produced by
\fBlynx\fR(1)
from HTML\&. If it finds no macros in the input, but does find a NAME section header, it tries to interpret the plain text as a manual page (skipping boilerplate headers and footers generated by
\fBlynx\fR(1))\&. Translations produced in this way will be prone to miss structural features, but this fallback is good enough for simple man pages\&.
.PP
\fBdoclifter\fR
does not do a perfect job, merely a surprisingly good one\&. Final polish should be applied by a human being capable of recognizing patterns too subtle for a computer\&. But
\fBdoclifter\fR
will almost always produce translations that are good enough to be usable before hand\-hacking\&.
.PP
See the
Troubleshooting
section for discussion of how to solve document conversion problems\&.
.SH "OPTIONS"
.PP
If called without arguments,
\fBdoclifter\fR
acts as a filter, translating troff source input on standard input to DocBook markup on standard output\&. If called with arguments, each argument file is translated separately (but hints are retained, see below); the suffix
\&.xml
is given to the translated output\&.
.PP
\-o
.RS 4
Set the output location where files will be saved\&. Defaults to current working directory\&.
.RE
.PP
\-h
.RS 4
Name a file to which information on semantic hints gathered during analysis should be written\&.
.RE
.PP
\-D
.RS 4
The
\fB\-D\fR
option allows you to post a hint\&. This may be useful, for example, if
\fBdoclifter\fR
is mis\-parsing a synopsis because it doesn\*(Aqt recognize a token as a command\&. This hint is merged after hints in the input source have been read\&.
.RE
.PP
\-I
.RS 4
The
\fB\-I\fR
option adds its argument to the include path used when \fBdoclifter\fR searches for inclusions\&. The include path is initially just the current directory\&.
.RE
.PP
\-S
.RS 4
Set the filename to be used in error and warning messages\&. This is mainly intended for use by test scripts\&.
.RE
.PP
\-e
.RS 4
The
\fB\-e\fR
option allows you to set the output encoding of the XML and the encoding field to be emitted in its header\&. It defaults to UTF\-8\&.
.RE
.PP
\-i
.RS 4
The
\fB\-i\fR
option allows you to set a comma\-separated list of encodings to be looked for in the input\&. The default is "ISO\-8859\-1,UTF\-8", which should cover almost all cases\&.
.RE
.PP
\-q
.RS 4
Normally, requests that
\fBdoclifter\fR
could not interpret (usually because they\*(Aqre presentation\-level) are passed through to XML comments in the output\&. The \-q option suppresses this\&. It also suppresses listing of macros\&. Messages about requests that are unrecognized or cannot be translated go to standard error whatever the state of this option\&. This option is intended to reduce clutter when you believe you have a clean lift of a document and want to lose the troff legacy\&.
.RE
.PP
\-x
.RS 4
The \-x option requests that
\fBdoclifter\fR
generate DocBook version 5 compatible XML content, rather than its default DocBook version 4\&.4 output\&. Inclusions and entities may not be handled correctly with this switch enabled\&.
.RE
.PP
\-v
.RS 4
The \-v option makes
\fBdoclifter\fR
noisier about what it\*(Aqs doing\&. This is mainly useful for debugging\&.
.RE
.PP
\-w
.RS 4
Enable strict portability checking\&. Multiple instances of \-w increase the strictness\&. See
the section called \(lqPORTABILITY CHECKING\(rq\&.
.RE
.PP
\-V
.RS 4
With this option, the program emits a version message and exits\&.
.RE
.SH "TRANSLATION RULES"
.PP
Overall, you can expect that font changes will be turned into
Emphasis
tags with a
Remap
attribute taken from the troff font name\&. The basic font names are R, I, B, U, CW, and SM\&.
.PP
Troff and macro\-package special character escapes are mapped into ISO character entities\&.
.PP
When
\fBdoclifter\fR
encounters a
\fB\&.so\fR
directive, it searches for the file\&. If it can get read access to the file, and open it, and the file consists entirely of command lines and comments, then it is included\&. If any of these conditions fails, an entity reference for it is generated\&.
.PP
\fBdoclifter\fR
performs special parsing when it recognizes a display such as is generated by
\fB\&.DS/\&.DE\fR\&. It repeatedly tries to parse first a function synopsis, and then plain text off what remains in the display\&. Thus, most inline C function prototypes will be lifted to structured markup\&.
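.PP
As an illustration, a display like the following hypothetical fragment (the function name and its arguments are invented for this example) is the kind of input the function\-synopsis parser is meant to recognize:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.DS
int frob_widget(int count, const char *name);
\&.DE
.fi
.if n \{\
.RE
.\}
.PP
How much of such a display is lifted depends on how much of the prototype parses; whatever remains is treated as plain text\&.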
.PP
Some notes on specific translations:
.SS "Man Translation"
.PP
\fBdoclifter\fR
does a good job on most man pages\&. It knows about the extended
\fBUR\fR/\fBUE\fR/\fBUN\fR
and
\fBURL\fR
requests supported under Linux\&. If any
\fB\&.UR\fR
request is present, it will translate these but not wrap URLs outside them with
Ulink
tags\&. It also knows about the extended
\fB\&.L\fR
(literal) font markup from Bell Labs Version 8, and its friends\&.
.PP
The
\fB\&.TH\fR
macro is used to generate a
RefMeta
section\&. If present, the date/source/manual arguments (see
\fBman\fR(7)) are wrapped in
RefMiscInfo
tag pairs with those class attributes\&. Note that
\fBdoclifter\fR
does not change the date\&.
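.PP
As a sketch, a hypothetical title line such as the following (the program name, date, and strings are invented for illustration) supplies all three optional arguments, in the order date, source, manual:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.TH FROBCTL 1 "2023\-01\-22" "frobctl 1\&.0" "Frob Manual"
.fi
.if n \{\
.RE
.\}
.PP
With input like this, each of the three quoted strings would be expected to end up in its own
RefMiscInfo
element, with the date string carried over unchanged\&.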
.PP
\fBdoclifter\fR
performs special parsing when it recognizes a synopsis section\&. It repeatedly tries to parse first a function synopsis, then a command synopsis, and then plain text off what remains in the section\&.
.PP
The following man macros are translated into emphasis tags with a remap attribute:
\fB\&.B\fR,
\fB\&.I\fR,
\fB\&.L\fR,
\fB\&.BI\fR,
\fB\&.BR\fR,
\fB\&.BL\fR,
\fB\&.IB\fR,
\fB\&.IR\fR,
\fB\&.IL\fR,
\fB\&.RB\fR,
\fB\&.RI\fR,
\fB\&.RL\fR,
\fB\&.LB\fR,
\fB\&.LI\fR,
\fB\&.LR\fR,
\fB\&.SB\fR,
\fB\&.SM\fR\&. Some stereotyped patterns involving these macros are recognized and turned into semantic markup\&.
.PP
The following macros are translated into paragraph breaks:
\fB\&.LP\fR,
\fB\&.PP\fR,
\fB\&.P\fR,
\fB\&.HP\fR, and the single\-argument form of
\fB\&.IP\fR\&.
.PP
The two\-argument form of
\fB\&.IP\fR
is translated either as a
VariableList
(usually) or
ItemizedList
(if the tag is the troff bullet or square character)\&.
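.PP
The distinction can be sketched with a hypothetical fragment (the tags and text are invented for illustration)\&. Following the rules above, the first item would be expected to open an
ItemizedList, the second a
VariableList, and the untagged call a plain paragraph break:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.IP \e(bu 4
An item introduced by the troff bullet\&.
\&.IP "\e\-v" 4
An item introduced by an ordinary tag\&.
\&.IP
A paragraph with no tag at all\&.
.fi
.if n \{\
.RE
.\}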
.PP
The following macros are translated semantically:
\fB\&.SH\fR,
\fB\&.SS\fR,
\fB\&.TP\fR,
\fB\&.UR\fR,
\fB\&.UE\fR,
\fB\&.UN\fR,
\fB\&.IX\fR\&. A
\fB\&.UN\fR
call just before
\fB\&.SH\fR
or
\fB\&.SS\fR
sets the ID for the new section\&.
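.PP
For example, in the following sketch (the identifier
options
is an arbitrary choice for illustration), the section opened by
\fB\&.SH\fR
would take its ID from the preceding
\fB\&.UN\fR
call:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.UN options
\&.SH OPTIONS
.fi
.if n \{\
.RE
.\}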
.PP
The
\fB\e*R\fR,
\fB\e*(Tm\fR,
\fB\e*(lq\fR, and
\fB\e*(rq\fR
symbols are translated\&.
.PP
The following (purely presentation\-level) macros are ignored:
\fB\&.PD\fR,
\fB\&.DT\fR\&.
.PP
The
\fB\&.RS\fR/\fB\&.RE\fR
macros are translated differently depending on whether or not they precede list markup\&. When
\fB\&.RS\fR
occurs just before
\fB\&.TP\fR
or
\fB\&.IP\fR
the result is nested lists\&. Otherwise, the
\fB\&.RS\fR/\fB\&.RE\fR
pair is translated into a
Blockquote
tag\-pair\&.
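.PP
A hypothetical fragment (tags and text invented for illustration) showing both cases\&. By the rule above, the first
\fB\&.RS\fR
comes just before
\fB\&.TP\fR
and should yield a nested list, while the second encloses ordinary text and should yield a
Blockquote:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.TP
\e\-a
Outer item\&.
\&.RS
\&.TP
\e\-b
Item nested under the first\&.
\&.RE
\&.RS
An indented remark that is not list markup\&.
\&.RE
.fi
.if n \{\
.RE
.\}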
.PP
\fB\&.DS\fR/\fB\&.DE\fR
is not part of the documented man macro set, but is recognized because it shows up with some frequency on legacy man pages from older Unixes\&.
.PP
Certain extension macros originally defined under Ultrix occasionally show up on the manual pages of Linux and other open\-source Unixes\&. Of these,
\fB\&.EX\fR/\fB\&.EE\fR
(and the synonyms
\fB\&.Ex\fR/\fB\&.Ee\fR),
\fB\&.Ds\fR/\fB\&.De\fR,
\fB\&.NT\fR/\fB\&.NE\fR,
\fB\&.PN\fR, and
\fB\&.MS\fR
are translated structurally\&.
.PP
The following extension macros used by the X distribution are also recognized and translated structurally:
\fB\&.FD\fR,
\fB\&.FN\fR,
\fB\&.IN\fR,
\fB\&.ZN\fR,
\fB\&.hN\fR, and
\fB\&.C{\fR/\fB\&.C}\fR\&.
The
\fB\&.TA\fR
and
\fB\&.IN\fR
requests are ignored\&.
.PP
When the man macros are active, any
\fB\&.Pp\fR
macro definition containing the request
\fB\&.PP\fR
will be ignored, and all instances of
\fB\&.Pp\fR
replaced with
\fB\&.PP\fR\&. Similarly,
\fB\&.Tp\fR
will be replaced with
\fB\&.TP\fR\&. This is the least painful way to deal with some frequently\-encountered stereotyped wrapper definitions that would otherwise cause serious interpretation problems\&.
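.PP
The kind of wrapper definition meant here looks something like the following sketch (a minimal assumed form; real pages often add conditionals or spacing requests around the inner call):
.sp
.if n \{\
.RS 4
.\}
.nf
\&.de Pp
\&.PP
\&.\&.
\&.Pp
Text of the paragraph\&.
.fi
.if n \{\
.RE
.\}
.PP
Under the rule above, the definition is dropped and the
\fB\&.Pp\fR
call is handled as if it had been written
\fB\&.PP\fR\&.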
.PP
Known problem areas with man translation:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Weird uses of
\fB\&.TP\fR\&. These will sometimes generate invalid XML and sometimes result in a FIXME comment in the generated XML (a warning message will also go to standard error)\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
It is debatable how the man macros
\fB\&.HP\fR
and
\fB\&.IP\fR
without a tag should be translated\&. We treat them as ordinary paragraph breaks\&. We could visually simulate a hanging paragraph with list markup, but this would not be a structural translation\&.
.RE
.SS "Pod2man Translation"
.PP
\fBdoclifter\fR
recognizes the extension macros produced by
\fBpod2man\fR
(\fB\&.Sh\fR,
\fB\&.Sp\fR,
\fB\&.Ip\fR,
\fB\&.Vb\fR,
\fB\&.Ve\fR) and translates them structurally\&.
.PP
The results of lifting pages produced by
\fBpod2man\fR
should be checked carefully by eyeball, especially the rendering of command and function synopses\&.
\fBPod2man\fR
generates rather perverse markup;
\fBdoclifter\fR\*(Aqs struggle to untangle it is sometimes in vain\&.
.PP
If possible, generate your DocBook from the POD sources\&. There is a
pod2docbook
module on CPAN that does this\&.
.SS "Tkman Translation"
.PP
\fBdoclifter\fR
recognizes the extension macros used by the Tcl/Tk documentation system:
\fB\&.AP\fR,
\fB\&.AS\fR,
\fB\&.BS\fR,
\fB\&.BE\fR,
\fB\&.CS\fR,
\fB\&.CE\fR,
\fB\&.DS\fR,
\fB\&.DE\fR,
\fB\&.SO\fR,
\fB\&.SE\fR,
\fB\&.UL\fR,
\fB\&.VS\fR,
\fB\&.VE\fR\&. The
\fB\&.AP\fR,
\fB\&.CS\fR,
\fB\&.CE\fR,
\fB\&.SO\fR,
\fB\&.SE\fR,
\fB\&.UL\fR,
\fB\&.QW\fR
and
\fB\&.PQ\fR
macros are translated structurally\&.
.SS "Mandoc Translation"
.PP
\fBdoclifter\fR
should be able to do an excellent job on most
\fBmdoc\fR(7)
pages, because this macro package expresses a lot of semantic structure\&.
.PP
Known problems with mandoc translation: All
\fB\&.Bd\fR/\fB\&.Ed\fR
display blocks are translated as
LiteralLayout
tag pairs\&.
.SS "Ms Translation"
.PP
\fBdoclifter\fR
does a good job on most ms pages\&. One weak spot to watch out for is the generation of Author and Affiliation tags\&. The heuristics used to mine this information out of the
\fB\&.AU\fR
section work for authors who format their names in the way usual for English (e\&.g\&. "M\&. E\&. Lesk", "Eric S\&. Raymond") but are quite brittle\&.
.PP
For a document to be recognized as containing ms markup, it must have the extension
\&.ms\&. This avoids problems with false positives\&.
.PP
The
\fB\&.TL\fR,
\fB\&.AU\fR,
\fB\&.AI\fR, and
\fB\&.AE\fR
macros turn into article metainformation in the expected way\&. The
\fB\&.PP\fR,
\fB\&.LP\fR,
\fB\&.SH\fR, and
\fB\&.NH\fR
macros turn into paragraph and section structure\&. The tagged form of
\fB\&.IP\fR
is translated either as a
VariableList
(usually) or
ItemizedList
(if the tag is the troff bullet or square character); the untagged version is treated as an ordinary paragraph break\&.
.PP
The
\fB\&.DS\fR/\fB\&.DE\fR
pair is translated to a
LiteralLayout
tag pair\&. The
\fB\&.FS\fR/\fB\&.FE\fR
pair is translated to a
Footnote
tag pair\&. The
\fB\&.QP\fR/\fB\&.QS\fR/\fB\&.QE\fR
requests define
BlockQuotes\&.
.PP
The
\fB\&.UL\fR
font change is mapped to U\&.
\fB\&.SM\fR
and
\fB\&.LG\fR
become numeric plus or minus size steps suffixed to the
Remap
attribute\&.
.PP
The
\fB\&.B1\fR
and
\fB\&.B2\fR
box macros are translated to a
Sidebar
tag pair\&.
.PP
All macros relating to page footers, multicolumn mode, and keeps are ignored (\fB\&.ND\fR,
\fB\&.DA\fR,
\fB\&.1C\fR,
\fB\&.2C\fR,
\fB\&.MC\fR,
\fB\&.BX\fR,
\fB\&.KS\fR,
\fB\&.KE\fR,
\fB\&.KF\fR)\&. The
\fB\&.R\fR,
\fB\&.RS\fR, and
\fB\&.RE\fR
macros are ignored as well\&.
.SS "Me Translation"
.PP
Translation of me documents tends to produce crude results that need a lot of hand\-hacking\&. The format has little usable structure, and documents written in it tend to use a lot of low\-level troff macros; both these properties tend to confuse
\fBdoclifter\fR\&.
.PP
For a document to be recognized as containing me markup, it must have the extension
\&.me\&. This avoids problems with false positives\&.
.PP
The following macros are translated into paragraph breaks:
\fB\&.lp\fR,
\fB\&.pp\fR\&. The
\fB\&.ip\fR
macro is translated into a
VariableList\&. The
\fB\&.bp\fR
macro is translated into an
ItemizedList\&. The
\fB\&.np\fR
macro is translated into an
OrderedList\&.
.PP
The b, i, and r fonts are mapped to emphasis tags with B, I, and R
Remap
attributes\&. The
\fB\&.rb\fR
("real bold") font is treated the same as
\fB\&.b\fR\&.
.PP
\fB\&.q(\fR/\fB\&.q)\fR
is translated structurally\&.
.PP
Most other requests are ignored\&.
.SS "Mm Translation"
.PP
Memorandum Macros documents translate well, as these macros carry a lot of structural information\&. The translation rules are tuned for Memorandum or Released Paper styles; information associated with external\-letter style will be preserved in comments\&.
.PP
For a document to be recognized as containing mm markup, it must have the extension
\&.mm\&. This avoids problems with false positives\&.
.PP
The following highlight macros are translated into Emphasis tags:
\fB\&.B\fR,
\fB\&.I\fR,
\fB\&.R\fR,
\fB\&.BI\fR,
\fB\&.BR\fR,
\fB\&.IB\fR,
\fB\&.IR\fR,
\fB\&.RB\fR,
\fB\&.RI\fR\&.
.PP
The following macros are structurally translated:
\fB\&.AE\fR,
\fB\&.AF\fR,
\fB\&.AL\fR,
\fB\&.RL\fR,
\fB\&.APP\fR,
\fB\&.APPSK\fR,
\fB\&.AS\fR,
\fB\&.AT\fR,
\fB\&.AU\fR,
\fB\&.B1\fR,
\fB\&.B2\fR,
\fB\&.BE\fR,
\fB\&.BL\fR,
\fB\&.ML\fR,
\fB\&.BS\fR,
\fB\&.BVL\fR,
\fB\&.VL\fR,
\fB\&.DE\fR,
\fB\&.DL\fR,
\fB\&.DS\fR,
\fB\&.FE\fR,
\fB\&.FS\fR,
\fB\&.H\fR,
\fB\&.HU\fR,
\fB\&.IA\fR,
\fB\&.IE\fR,
\fB\&.IND\fR,
\fB\&.LB\fR,
\fB\&.LC\fR,
\fB\&.LE\fR,
\fB\&.LI\fR,
\fB\&.P\fR,
\fB\&.RF\fR,
\fB\&.SM\fR,
\fB\&.TL\fR,
\fB\&.VERBOFF\fR,
\fB\&.VERBON\fR,
\fB\&.WA\fR,
\fB\&.WE\fR\&.
.PP
The following macros are ignored:
.PP
\ \&\fB\&.)E\fR,
\fB\&.1C\fR,
\fB\&.2C\fR,
\fB\&.AST\fR,
\fB\&.AV\fR,
\fB\&.AVL\fR,
\fB\&.COVER\fR,
\fB\&.COVEND\fR,
\fB\&.EF\fR,
\fB\&.EH\fR,
\fB\&.EDP\fR,
\fB\&.EPIC\fR,
\fB\&.FC\fR,
\fB\&.FD\fR,
\fB\&.HC\fR,
\fB\&.HM\fR,
\fB\&.GETR\fR,
\fB\&.GETST\fR,
\fB\&.INITI\fR,
\fB\&.INITR\fR,
\fB\&.INDP\fR,
\fB\&.ISODATE\fR,
\fB\&.MT\fR,
\fB\&.NS\fR,
\fB\&.ND\fR,
\fB\&.OF\fR,
\fB\&.OH\fR,
\fB\&.OP\fR,
\fB\&.PGFORM\fR,
\fB\&.PGNH\fR,
\fB\&.PE\fR,
\fB\&.PF\fR,
\fB\&.PH\fR,
\fB\&.RP\fR,
\fB\&.S\fR,
\fB\&.SA\fR,
\fB\&.SP\fR,
\fB\&.SG\fR,
\fB\&.SK\fR,
\fB\&.TAB\fR,
\fB\&.TB\fR,
\fB\&.TC\fR,
\fB\&.VM\fR,
\fB\&.WC\fR\&.
.PP
The following macros generate warnings:
\fB\&.EC\fR,
\fB\&.EX\fR,
\fB\&.GETHN\fR,
\fB\&.GETPN\fR,
\fB\&.GETR\fR,
\fB\&.GETST\fR,
\fB\&.LT\fR,
\fB\&.LD\fR,
\fB\&.LO\fR,
\fB\&.MOVE\fR,
\fB\&.MULB\fR,
\fB\&.MULN\fR,
\fB\&.MULE\fR,
\fB\&.NCOL\fR,
\fB\&.nP\fR,
\fB\&.PIC\fR,
\fB\&.RD\fR,
\fB\&.RS\fR,
\fB\&.RE\fR,
\fB\&.SETR\fR\&.
.PP
Pairs of
\fB\&.DS\fR/\fB\&.DE\fR
are interpreted as informal figures\&. If an
\fB\&.FG\fR
is present it becomes a caption element\&.
.PP
\ \&\fB\&.BS\fR/\fB\&.BE\fR
and
\fB\&.IA\fR/\fB\&.IE\fR
pairs are passed through\&. The text inside them may need to be deleted or moved\&.
.PP
The mark argument of
\fB\&.ML\fR
is ignored; the following list is formatted as a normal
ItemizedList\&.
.PP
The contents of
\fB\&.DS\fR/\fB\&.DE\fR
or
\fB\&.DF\fR/\fB\&.DE\fR
get turned into a
Screen
display\&. Arguments controlling presentation\-level formatting are ignored\&.
.SS "Mwww Translation"
.PP
The mwww macros are an extension to the man macros supported by
\fBgroff\fR(1)
for producing web pages\&.
.PP
The
\fBURL\fR,
\fBFTP\fR,
\fBMAILTO\fR,
\fBIMAGE\fR, and
\fBTAG\fR
tags are translated structurally\&. The
\fBHTMLINDEX\fR,
\fBBODYCOLOR\fR,
\fBBACKGROUND\fR,
\fBHTML\fR, and
\fBLINE\fR
tags are ignored\&.
.SS "TBL Translation"
.PP
All structural features of TBL tables are translated, including both horizontal and vertical spanning with \(oqs\(cq and \(oq^\(cq\&. The \(oql\(cq, \(oqr\(cq, and \(oqc\(cq formats are supported; the \(oqn\(cq column format is rendered as \(oqr\(cq\&. Line continuations with
T{
and
T}
are handled correctly\&. So is
\fB\&.TH\fR\&.
.PP
The
\fBexpand\fR,
\fBbox\fR,
\fBdoublebox\fR,
\fBallbox\fR,
\fBcenter\fR,
\fBleft\fR, and
\fBright\fR
options are supported\&. The GNU synonyms
\fBframe\fR
and
\fBdoubleframe\fR
are also recognized\&. But the distinction between single and double rules and boxes is lost\&.
.PP
Table continuations (\&.T&) are not supported\&.
.PP
If the first nonempty line of text immediately before a table is boldfaced, it is interpreted as a title for the table and the table is generated using a
table
and
title\&. Otherwise the table is translated with
informaltable\&.
.PP
Most other presentation\-level TBL commands are ignored\&. The \(oqb\(cq format qualifier is processed, but point size and width qualifiers are not\&.
.SS "Pic Translation"
.PP
PIC sections are translated to SVG\&.
\fBdoclifter\fR
calls out to
\fBpic2plot\fR(1)
to accomplish this; you must have that utility installed for PIC translation to work\&.
.SS "Eqn Translation"
.PP
EQN sections are filtered into embedded MathML with
\fBeqn \-TMathML\fR
if possible, otherwise passed through enclosed in
LiteralLayout
tags\&. After a delim statement has been seen, inline eqn delimiters are translated into an XML processing instruction\&. Exception: inline eqn equations consisting of a single character are translated to an
Emphasis
with a Role attribute of eqn\&.
.SS "Troff Translation"
.PP
The troff translation is meant only to support interpretation of the macro sets\&. It is not useful standalone\&.
.PP
The
\fB\&.nf\fR
and
\fB\&.fi\fR
macros are interpreted as literal\-layout boundaries\&. Calls to the
\fB\&.so\fR
macro either cause inclusion or are translated into XML entity inclusions (see above)\&. Calls to the
\fB\&.ul\fR
and
\fB\&.cu\fR
macros cause following lines to be wrapped in an
Emphasis
tag with a
Remap
attribute of "U"\&. Calls to
\fB\&.ft\fR
generate corresponding start or end emphasis tags\&. Calls to
\fB\&.tr\fR
cause character translation on output\&. Calls to
\fB\&.bp\fR
generate a
BeginPage
tag (in paragraphed text only)\&. Calls to
\fB\&.sp\fR
generate a paragraph break (in paragraphed text only)\&. Calls to
\fB\&.ti\fR
wrap the following line in a
BlockQuote
tag\&. These are the only troff requests we translate to DocBook\&. The rest of the troff emulation exists because macro packages use it internally to expand macros into elements that might be structural\&.
.PP
Requests relating to macro definitions and strings (\fB\&.ds\fR,
\fB\&.as\fR,
\fB\&.de\fR,
\fB\&.am\fR,
\fB\&.rm\fR,
\fB\&.rn\fR,
\fB\&.em\fR) are processed and expanded\&. The
\fB\&.ig\fR
macro is also processed\&.
.PP
Conditional macros (\fB\&.if\fR,
\fB\&.ie\fR,
\fB\&.el\fR) are handled\&. The built\-in conditions o, n, t, e, and c are evaluated as if for
nroff
on page one of a document\&. The m, d, and r troff conditionals are also interpreted\&. String comparisons are evaluated by straight textual comparison\&. All numeric expressions evaluate to true\&.
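.PP
A short sketch of how these rules play out on hypothetical input (the text is invented for illustration)\&. Going by the description above, the first line is kept because conditions are evaluated as for
nroff, the second is dropped, and the third is kept because every numeric expression is taken as true:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.if n This line is kept\&.
\&.if t This line is dropped\&.
\&.if 1>2 This line is kept as well\&.
.fi
.if n \{\
.RE
.\}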
.PP
The extended
groff
requests
\fBcc\fR,
\fBc2\fR,
\fBab\fR,
\fBals\fR,
\fBdo\fR,
\fBnop\fR,
\fBreturn\fR, and
\fBshift\fR
are interpreted\&. Its
\fB\&.PSPIC\fR
extension is translated into a
MediaObject\&.
.PP
The
\fB\&.tm\fR
macro writes its arguments to standard error (with
\fB\-t\fR)\&. The
\fB\&.pm\fR
macro reports on defined macros and strings\&. These facilities may aid in debugging your translation\&.
.PP
Some troff escape sequences are lifted:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
The \ee and \e\e escapes become a bare backslash, \e\&. a period, and \e\- a bare dash\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
The troff escapes \e^, \e`, \e\*(Aq, \e&, \e0, and \e| are lifted to equivalent ISO special spacing characters\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
A \e followed by space is translated to an ISO non\-breaking space entity\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
A \e~ is also translated to an ISO non\-breaking space entity; properly this should be a space that can\*(Aqt be used for a linebreak but stretches like ordinary whitespace during line adjustment, but there is no ISO or Unicode entity for that\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
The \eu and \ed half\-line vertical motion escapes, when paired, become
\fBSuperscript\fR
or
\fBSubscript\fR
tags\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
The \ec escape is handled as a line continuation in circumstances where that matters (e\&.g\&. for token\-pasting)\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 7.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 7." 4.2
.\}
The \ef escape for font changes is translated in various context\-dependent ways\&. First,
\fBdoclifter\fR
looks for cliches involving font changes that have semantic meaning, and lifts to a structural tag\&. If it can\*(Aqt do that, it generates an
Emphasis
tag\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 8.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 8." 4.2
.\}
The \em[] extension is translated into a
phrase
span with a remap attribute carrying the color\&. Note: Stylesheets typically won\*(Aqt render this!
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 9.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 9." 4.2
.\}
Some uses of the \eo escape are translated: pairs with a letter followed by one of the characters ` \*(Aq : ^ o ~ are translated to combining forms with diacriticals grave, acute, umlaut, circumflex, ring, and tilde respectively if the corresponding Latin\-1 or Latin\-2 character exists as an ISO literal\&.
.RE
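.PP
As a sketch of the paired half\-line motions in item 5, consider these two hypothetical text lines\&. The first moves up and then back down around a character, the second down and then back up; assuming the pairing is read in that way, they correspond to a
Superscript
and a
Subscript
respectively:
.sp
.if n \{\
.RS 4
.\}
.nf
E = mc\eu2\ed
H\ed2\euO
.fi
.if n \{\
.RE
.\}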
.PP
Other escapes than these will yield warnings or errors\&.
.PP
All other troff requests are ignored but passed through into XML comments\&. A few (such as
\fB\&.ce\fR) also trigger a warning message\&.
.SH "PORTABILITY CHECKING"
.PP
When portability checking is enabled,
\fBdoclifter\fR
emits portability warnings about markup which it can handle but which will break various other viewers and interpreters\&.
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
At level 1, it will warn about constructions that would break
\fBman2html\fR(1) (the C program distributed with Linux
\fBman\fR(1), not the older and much less capable Perl script)\&. A close derivative of this code is used in GNOME
yelp\&. This should be the minimum level of portability you aim for, and corresponds to what is recommended on the
\fBgroff_man\fR(7)
manual page\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
At level 2, it will warn about constructions that will break portability back to the Unix classic tools (including long macro names and glyph references with \e[])\&.
.RE
.SH "SEMANTIC ANALYSIS"
.PP
\fBdoclifter\fR
keeps two lists of semantic hints that it picks up from analyzing source documents (especially from parsing command and function synopses)\&. The local list includes:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of function formal arguments
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of command options
.RE
.PP
Local hints are used to mark up the individual page from which they are gathered\&. The global list includes:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of functions
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of commands
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Names of function return types
.RE
.PP
If
\fBdoclifter\fR
is applied to multiple files, the global list is retained in memory\&. You can dump a report of global hints at the end of the run with the
\fB\-h\fR
option\&. The format of the hints is as follows:
.sp
.if n \{\
.RS 4
.\}
.nf
\ \&\&.\e" | mark <phrase> as <markup>
.fi
.if n \{\
.RE
.\}
.PP
where
\fB<phrase>\fR
is an item of text and
\fB<markup>\fR
is the DocBook markup text it should be wrapped with whenever it appears either highlighted or as a word surrounded by whitespace in the source text\&.
.PP
Hints derived from earlier files are also applied to later ones\&. This behavior may be useful when lifting collections of documents that apply to a function or command library\&. What should be more useful is the fact that a hints file dumped with
\fB\-h\fR
can be one of the file arguments to
\fBdoclifter\fR; the code detects this special case and does not write XML output for such a file\&. Thus, a good procedure for lifting a large library is to generate a hints file with a first run, inspect it to delete false positives, and use it as the first input to a second run\&.
.PP
It is also possible to include a hints file directly in a troff source file\&. This may be useful if you want to enrich the file by stages before converting to XML\&.
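.PP
A sketch of what embedded hints might look like at the top of a page, using the hint format shown above\&. The phrases
frob_widget
and
frobctl
are invented, and the markup names
function
and
command
are assumptions based on DocBook element names rather than a list taken from this manual:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.\e" | mark frob_widget as function
\&.\e" | mark frobctl as command
\&.TH FROBCTL 1
.fi
.if n \{\
.RE
.\}
.PP
Because each hint is a troff comment, other formatters should simply skip these lines\&.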
.SH "TROUBLESHOOTING"
.PP
\fBdoclifter\fR
tries to warn about problems that it can diagnose but not fix by itself\&. When it says
"look for FIXME", do that in the generated XML; the markup around that token may be wrong\&.
.PP
Occasionally (less than 2% of the time)
\fBdoclifter\fR
will produce invalid DocBook markup even from correct troff markup\&. Usually this results from strange constructions in the source page, or macro calls that are beyond the ability of
\fBdoclifter\fR\*(Aqs macro processor to get right\&. Here are some things to watch for, and how to fix them:
.SS "Malformed command synopses\&."
.PP
If you get a message that says
"command synopsis parse failed", try rewriting the synopsis in your manual page source\&. The most common cause of failure is unbalanced [] groupings, a bug that can be very difficult to notice by eyeball\&. To assist with this, the error message includes a token number in parentheses indicating on which token the parse failed\&.
.PP
For more information, use the \-v option\&. This will trigger a dump telling you what the command synopsis looked like after preprocessing, and indicate on which token the parse failed (both with a token number and a caret sign inserted in the dump of the synopsis tokens)\&. To help locate an unbalanced grouping, the error token dump also tries to insert \(oq$\(cq at the point of the last nesting\-depth increase, but the code that does this is failure\-prone\&.
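.PP
A hypothetical synopsis with the kind of defect described above (the command and its options are invented for illustration)\&. The bracket opened before
\-o
is never closed, which is easy to miss when the grouping is spread over several request lines:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.SH SYNOPSIS
\&.B frobctl
[\e\-v] [\e\-o
\&.I file
[\e\-q]
.fi
.if n \{\
.RE
.\}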
.SS "Confusing macro calls\&."
.PP
Some manual page authors replace standard requests (like
\fB\&.PP\fR,
\fB\&.SH\fR
and
\fB\&.TP\fR) with versions that do different things in
\fBnroff\fR
and
\fBtroff\fR
environments\&. While
\fBdoclifter\fR
tries to cope and usually does a good job, the quirks of [nt]roff are legion and confusing macro calls sometimes lead to bad XML being generated\&. A common symptom of such problems is unclosed
Emphasis
tags\&.
.SS "Malformed list syntax\&."
.PP
The manual\-page parser can be confused by
\fB\&.TP\fR
constructs that have header tags but no following body\&. If the XML produced doesn\*(Aqt validate, and the problem seems to be a misplaced
listitem
tag, try using the verbose (\-v) option\&. This will enable line\-numbered warnings that may help you zero in on the problem\&.
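.PP
The problem construct looks something like this hypothetical fragment (tags and text invented for illustration), in which the first
\fB\&.TP\fR
has a tag line but no body before the next
\fB\&.TP\fR
begins:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.TP
\e\-a
\&.TP
\e\-b
Description intended for both options\&.
.fi
.if n \{\
.RE
.\}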
.SS "Section nesting problems with SS\&."
.PP
The message
"possible section nesting error"
means that the program has seen two adjacent subsection headers\&. In man pages, subsections don\*(Aqt have a depth argument, so
\fBdoclifter\fR
cannot be certain how subsections should be nested\&. Any subsection heading between the indicated line and the beginning of the next top\-level section might be wrong and require correcting by hand\&.
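.PP
A minimal hypothetical fragment of the sort that draws this message (the headings are invented for illustration)\&. Nothing in the markup says whether the second heading is a sibling of the first or nested inside it:
.sp
.if n \{\
.RS 4
.\}
.nf
\&.SS "Input options"
\&.SS "Encoding options"
Text of the second subsection\&.
.fi
.if n \{\
.RE
.\}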
.SS "Bad output with no doclifter error message"
.PP
If you\*(Aqre translating a page that uses user\-defined macros, and doclifter fails to complain about it but you get bad output, the first thing to do is simplify or eliminate the user\-defined macros\&. Replace them with stock requests where possible\&.
.SH "IMPROVING TRANSLATION QUALITY"
.PP
There are a few constructions that are a good idea to check by hand after lifting a page\&.
.PP
Look near the
BlockQuote
tags\&. The troff temporary indent request (\fB\&.ti\fR) is translated into a
BlockQuote
wrapper around the following line\&. Sometimes
LiteralLayout
or
ProgramListing
would be a better translation, but
\fBdoclifter\fR
has no way to know this\&.
.PP
It is not possible to unambiguously detect candidates for wrapping in a DocBook
option
tag in running text\&. If you care, you\*(Aqll have to check for these and fix them by hand\&.
.SH "BUGS AND LIMITATIONS"
.PP
About 3% of man pages will either make this program throw error status 1 or generate invalid XML\&. In almost all such cases the misbehavior is triggered by markup bugs in the source that are too severe to be coped with\&.
.PP
Equation number arguments of EQN calls are ignored\&.
.PP
Semicolon used as a TBL field separator will lead to garbled tables\&. The easiest way to fix this is by patching the source\&.
.PP
The function\-synopsis parser is crude (it\*(Aqs not a compiler) and prone to errors\&. Function\-synopsis markup should be checked carefully by a human\&.
.PP
If a man page has both paragraphed text in a Synopsis section and also a body section before the Synopsis section, bad things will happen\&.
.PP
Running text (e\&.g\&., explanatory notes) at the end of a Synopsis section cannot reliably be distinguished from synopsis\-syntax markup\&. (This problem is AI\-complete\&.)
.PP
Some firewalls put in to cope with common malformations in troff code mean that the tail end of a span between two
\fB\ef{B,I,U,(CW}\fR
or
\fB\&.ft\fR
highlight changes may not be completely covered by corresponding
Emphasis
tags if (for example) the span crosses a boundary between filled and unfilled (\fB\&.nf\fR/\fB\&.fi\fR) text\&.
.PP
The treatment of conditionals relies on the assumption that conditional macros never generate structural or font\-highlight markup that differs between the if and else branches\&. This appears to be true of all the standard macro packages, but if you roll any of your own macros you\*(Aqre on your own\&.
.PP
Macro definitions in a manual page NAME section are not interpreted\&.
.PP
Uses of \ec for line continuation sometimes are not translated, leaving the \ec in the output XML\&. The program will print a warning when this occurs\&.
.PP
The line numbers in
\fBdoclifter\fR
error messages are unreliable in the presence of
\fB\&.EQ/\&.EN\fR,
\fB\&.PS/\&.PE\fR, and quantum fluctuations\&.
.SH "OLD MACRO SETS"
.PP
There is a conflict between Berkeley ms\*(Aqs documented
\fB\&.P1\fR
print\-header\-on\-page request and an undocumented Bell Labs use for displayed program and equation listings\&. The
\fBms\fR
translator uses the Bell Labs interpretation when
\fB\&.P2\fR
is present in the document, and otherwise ignores the request\&.
.SH "RETURN VALUES"
.PP
On successful completion, the program returns status 0\&. It returns 1 if some file or standard input could not be translated\&. It returns 2 if one of the input sources was a
\fB\&.so\fR
inclusion\&. It returns 3 if there is an error in reading or writing files\&. It returns 4 to indicate an internal error\&. It returns 5 when aborted by a keyboard interrupt\&.
.PP
Note that a zero return does not guarantee that the output is valid DocBook\&. It will almost always (as in, more than 98% of cases) be syntactically valid XML, but in some rare cases fixups by hand may be necessary to meet the semantics of the DocBook DTD\&. Validation problems are most likely to occur with complicated list markup\&.
.SH "REQUIREMENTS"
.PP
The
\fBpic2plot\fR(1)
utility must be installed in order to translate PIC diagrams to SVG\&.
.SH "SEE ALSO"
.PP
\fBman\fR(7),
\fBmdoc\fR(7),
\fBms\fR(7),
\fBme\fR(7),
\fBmm\fR(7),
\fBmwww\fR(7),
\fBtroff\fR(1)\&.
.SH "AUTHOR"
.PP
Eric S\&. Raymond
<esr@thyrsus\&.com>
.PP
There is a project web page at
\m[blue]\fBhttp://www\&.catb\&.org/~esr/doclifter/\fR\m[]\&.