Beautiful Documents with Groff (Part I) (stephenramsay.net)
149 points by MrVandemar on May 17, 2023 | 36 comments


> Alas, the site that inspired me doesn’t seem to exist anymore, but basically, it was a grad student who was procrastinating as I do and had decided to write their thesis in groff.

I believe the site referenced was a blog post I wrote and submitted here in July. Unfortunately the post's popularity used up the free trial it was hosted on. The post can now be viewed here: https://yockyrr.gitlab.io/jstutter/ with original comments https://news.ycombinator.com/item?id=32202209


Omg that is like a thesis on thesis production, instead. Wildly impressive, and great job with the actual screenshots for comparing output. Thanks.


Author, here. I was just thinking yesterday that it's time to write part II of that piece. But briefly, for the HN crowd: I ended up switching my toolchain to a markdown-based system relying on a combination of pandoc, Makefiles, and various little command-line utilities. Again, this is mainly for large, book-length projects where compilation speed becomes an issue (though I have been using the same system for written lectures all semester).

Pandoc, of course, gives me HTML, EPUB, docx, LaTeX, and plain text. But what I really wanted (for the speed boost) is groff -mom, so I wrote a custom pandoc Writer to make the translation.

I haven't released it, though I'd like to eventually. Pandoc recently changed the way it does Readers and Writers, and I'm not sure I completely understand the new way (right now, it sort of does an "include" on the old-style code).

Delighted to see that the post that inspired this journey is back online. Here's the link again, in case it gets too buried:

https://yockyrr.gitlab.io/jstutter/
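The author's custom pandoc Writer is unreleased (and pandoc custom writers are written in Lua), so what follows is only a hypothetical Python sketch of the same idea: translating a tiny subset of Markdown (ATX `#` headings and blank-line-separated paragraphs) into mom markup. It assumes mom's `.HEADING <level> "text"` and `.PP` macros; a real mom document would also need a preamble ending in `.START`, and emphasis, lists, links and footnotes are not handled.

```python
import re

# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def escape_roff(text: str) -> str:
    """Make a line of prose safe to emit as roff input."""
    # A literal backslash must be written as \e so troff doesn't treat it as an escape.
    text = text.replace("\\", "\\e")
    # A line starting with '.' or "'" would be read as a request; a leading \&
    # (a zero-width non-printing character) prevents that.
    if text.startswith((".", "'")):
        text = "\\&" + text
    return text


def markdown_to_mom(md: str) -> str:
    """Hypothetical converter: Markdown headings/paragraphs -> mom macros."""
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        # Emit the buffered paragraph (if any) behind a .PP macro.
        if para:
            out.append(".PP")
            out.extend(escape_roff(line) for line in para)
            para.clear()

    for raw in md.splitlines():
        m = HEADING.match(raw)
        if m:
            flush()
            # A bare double quote would end the macro argument early, so use \(dq.
            title = escape_roff(m.group(2)).replace('"', "\\(dq")
            out.append(f'.HEADING {len(m.group(1))} "{title}"')
        elif not raw.strip():
            flush()  # a blank line ends the current paragraph
        else:
            para.append(raw.strip())
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    print(markdown_to_mom("# Intro\n\nHello world.\n.dot starts this line"))
```

The escaping step matters more than it looks: prose that happens to begin with a period would otherwise be swallowed as an unknown request.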


Oh, I'm disappointed now, I wanted to learn more about groff and mom!

I've been through many typesetting systems lately, namely LaTeX, SILE, Pollen, Patoline, Scribble, Typst, even Affinity Publisher. I ended up going back to LaTeX, with Tectonic as the engine/distribution.

My problem with the rest is that they don't output PDF. And my problem with Typst is that it is still pretty immature (it doesn't support footnotes, for example).


The roff family of tools, reaching back to runof(f), predates TeX by more than a decade. Obviously troff was closer in time to TeX, but the foundations were laid in '64.

The model of .dd style directives, macros, and its textual assembly model are good. They work. They suit the edit-compile-preview-print model. Fitting tbl and eqn and pic in was not untenably hard. They're a product of brilliant minds in a different sense to Knuth's.

They just aren't endlessly designed around the harmonic mean, golden proportions, and aesthetically pleasing models of arbitrary ink on paper. You wouldn't reach for groff or ditroff to do calligraphy, as if by hand with a brush, whereas TeX might ask, "can we encompass hand brush movement?" It's made for a world immediately present in Monotype machines and phototypesetters with fonts on film.

TeX aspires to more.

I admit TeX confused me. It may explain why my attempt at writing a thesis three decades ago failed at launch.


Indeed, not only is (g)roff a descendant of runof(f), but so is IBM's SCRIPT/VS, which, in turn, is the origin of SGML and hence HTML ;)

Moreover, the .dd style directives would later be generalized into ... Generalized Markup Language (GML), and groff's original author, James Clark, would later develop SP/OpenSP and become co-author of ISO 8879 (SGML).


When I wrote my MA thesis I opted for Markdown that compiles to an Adobe InDesign .idml file, and did all the styling there.

Everything including figures and footnotes worked fine with only two or three minor manual corrections needed after updating the material.

Nowadays I might go for LaTeX, but I must say I liked the "write and forget" nature of Markdown, and I imagine LaTeX would distract me much more.


I guess a good compromise might be to use Markdown or AsciiDoc and then combine it with a LaTeX layout using Pandoc.

https://comparch-resources.ece.gatech.edu/resources/markdown...


Check out MyST or Quarto for new efforts in this direction!


> In order to assess whether the thing I’m writing is good or not, it’s important for me to see what it would look like as a real book (or letter, or article, or whatever).

This resonates. There is a part of the brain that engages better with content which is properly formatted, even if the content itself is awful. And by "formatted," I mean either typography and spacing, or even the medium. A bad audiobook read by a great narrator sounds way better than a great book read by an incredible TTS (like ElevenLabs'). But any content read by that same TTS somehow feels more real and professional than the same content in a word processor, at least to me.


The only time I tried groff was last year, because I liked the idea and the speed.

But I couldn’t finish a single document, because by default, French accents and Chinese characters cause document compilation issues.

I tried using mom, without much luck.

I wasn’t aware a secondary tool was needed to make it work, as Unicode is not supported by default.

Between that and the poor documentation, I gave up.
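The "secondary tool" is presumably groff's `preconv` preprocessor, which `groff -k` runs for you (for example `groff -k -Tpdf file.groff > file.pdf`). Roughly speaking, it rewrites non-ASCII input into groff's `\[uXXXX]` Unicode escapes. The Python sketch below imitates only that core step, to show what troff actually receives; real preconv also detects the input encoding and handles other details, and Chinese text additionally needs a font that actually contains the glyphs.

```python
def to_groff_escapes(text: str) -> str:
    """Imitate the core of preconv: rewrite non-ASCII characters as \\[uXXXX]."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            # groff names Unicode glyphs with uppercase hex, at least four digits.
            out.append(f"\\[u{cp:04X}]")
    return "".join(out)


if __name__ == "__main__":
    print(to_groff_escapes("café 中文"))  # caf\[u00E9] \[u4E2D]\[u6587]
```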


You might be interested in heirloom troff. It supports UTF-8, so French might be fine, but I'm not sure about Chinese characters.

https://n-t-roff.github.io/heirloom/doctools.html

The page below might be of some use; it seems some work is needed to get groff (GNU troff) to use Chinese characters.

https://kleshwong.com/blog/post/composing-pdf-on-linux-happi...


I tried to build it, but it has some unorthodox ideas about filesystem structure; this gist I found gives a quick summary of how to install it all in a single folder:

https://gist.github.com/baruchel/0653cc7c041d58d21b202c8e94e...


Another good example of groff is this invoice:

https://eseth.com/2020/cli-invoices-groff.html

FWIW, I started a new engine in Go, quite similar to groff, a few years ago. Extremely fast rendering and quite high quality!


One of my all-time favorite books which of course I’ve owned multiple times but never read in its entirety: UNIX Text Processing, which covers the roff family in detail, plus eqn, awk, etc.

https://www.oreilly.com/openbook/utp/


I very much endorse that book. I only have two small shelves of technical books, but that one makes the cut to be on it.


This looks wonderful -- thank you for posting it!


Possibly of interest...

https://github.com/larrykollar/Unix-Text-Processing

A re-compilation of the book by various contributors coordinated by Larry Kollar.

The whole thing compiles in a few seconds on an old Thinkpad (I had to fix a font location warning).


I've started using plain vanilla groff (i.e., no mom or other macro packages) for formatting text files. Just run: groff -Tascii example.groff

I've found it to be perfect for my purposes. LaTeX felt like too much overhead for generating a few nice-looking README files and blog posts. I also have a soft spot for older UNIX / GNU tools.

The best sources of documentation (which did take some time to track down) are:

- gtroff reference: https://www.gnu.org/software/groff/manual/groff.html#gtroff-...

- man 7 groff: https://man7.org/linux/man-pages/man7/groff.7.html

I keep a small, annotated .groff file in a gist. It serves as a reference for the formatting functions I use most frequently: https://gist.github.com/benjamindblock/0926f7346b79b93e739ab...


Thanks for sharing your groff file. I'm on a Debian derivative and running the below command did produce a pdf but the line "This paragraph should be right-justified and indented by th" was chopped off on the right side:

    groff -Tpdf example.groff > example.pdf
Also, the left/right margins are practically non-existent (top/bottom margins look fine).

Did I miss any command-line option?


First point: the groff file in my gist is designed for plain-text files, so not all the functionality will exactly translate to PDF rendering.

Second point: to resolve the issue you're running into, remove these lines from the groff file to allow the default margins for PDF rendering to occur.

    \# Set the line lenth to \nl
    .ll \nl
There can occasionally be some strange behavior with the macros when different output devices are used (some options are ignored, some may have unexpected consequences). I took a look at the PDF output using my file directly, and it looks like the em sizing for gropdf is different from the grotty implementation, causing some overflow. (Maybe the font needs to be set explicitly before setting the line-length? Not sure).

Using a different unit of measurement (like inches or points) may be a better option for PDF files: https://www.gnu.org/software/groff/manual/html_node/Measurem...
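To see why the unit matters: groff's absolute units are fixed (1i = 72p, 1P = 12p, 1c = 1/2.54 inch), while an em (`m`) equals the current font size in points and an en (`n`) is half that. The hypothetical Python helper below converts a measurement to points; with a made-up line length of 65 ems, the result changes with the font size, whereas 6.5i does not. It ignores device-specific details such as how a terminal device rounds to character cells.

```python
# Points per unit for groff's absolute scaling indicators.
POINTS_PER_UNIT = {
    "i": 72.0,          # inch
    "c": 72.0 / 2.54,   # centimeter
    "p": 1.0,           # point (1/72 inch)
    "P": 12.0,          # pica (1/6 inch)
}


def to_points(value: float, unit: str, font_size_pt: float) -> float:
    """Convert a groff measurement to points (hypothetical helper)."""
    if unit in POINTS_PER_UNIT:
        return value * POINTS_PER_UNIT[unit]
    if unit == "m":  # em: equal to the current font size in points
        return value * font_size_pt
    if unit == "n":  # en: half an em
        return value * font_size_pt / 2
    raise ValueError(f"unsupported unit: {unit!r}")


if __name__ == "__main__":
    for size in (10, 12):
        ems = to_points(65, "m", size)
        inches = to_points(6.5, "i", size)
        print(f"{size}pt font: 65m = {ems:.0f}pt, 6.5i = {inches:.0f}pt")
```

For comparison, a US Letter page is 612pt wide, so a line length expressed in ems can silently run off the page once a proportional font size is in play.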

Bonus note: you can confirm the final output device that will be used by a groff command by adding the -V option. Example:

    groff -Tpdf -V example.groff
Outputs:

    troff -Tpdf example.groff | gropdf
Using -Tascii instead will produce:

    troff -Tascii example.groff | grotty
Ref on output devices: https://www.gnu.org/software/groff/manual/html_node/Output-D...

* Edit: See 2b3a51's response for a better explanation.


Try

    groff -ms -Tpdf example.groff > example.pdf
The difference is the use of the ms macro package. That package generates page breaks and makes assumptions about the page size. -Tpdf will typeset your text using proportional fonts, justification and so on.

The previous poster's command generated nicely justified text in the terminal (or redirected to a file).


Ah, this explains it. I had a feeling I was missing something in my less informative response.


Last time I did some longer groff documents (ages ago), I was quite disappointed by the paragraph layout, closer to Word quality than TeX.

I know that heirloom troff[0] boasts some improvements here (by using the TeX algorithm), but did groff get better at it, too?

[0]: http://n-t-roff.github.io/heirloom/doctools.html
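The difference at issue is between filling one line at a time and choosing all of a paragraph's breaks together. The Python sketch below contrasts a greedy filler with a simplified total-fit breaker that minimizes the sum of squared trailing space over every line but the last. It is a stand-in for the idea behind TeX's approach, not Knuth-Plass itself (no stretchable glue, penalties or hyphenation), and not what groff or heirloom troff literally implement. Widths are counted in characters.

```python
def greedy_breaks(words: list[str], width: int) -> list[str]:
    """Fill each line with as many words as fit, then move on (line-at-a-time)."""
    lines: list[str] = []
    current: list[str] = []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(" ".join(current))
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(" ".join(current))
    return lines


def total_fit_breaks(words: list[str], width: int) -> list[str]:
    """Choose all breaks at once, minimizing squared slack (last line is free)."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    best[n] = 0.0
    nxt = [n] * (n + 1)              # nxt[i]: index of the first word on the next line
    for i in range(n - 1, -1, -1):
        length = -1  # adding len(word) + 1 per word then counts only inner spaces
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width and j > i:
                break  # words[i..j] overflow; adding more words only makes it worse
            slack = max(width - length, 0)
            cost = 0 if j == n - 1 else slack ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


if __name__ == "__main__":
    words = "aaa bb cc ddddd".split()
    print(greedy_breaks(words, 6))     # ['aaa bb', 'cc', 'ddddd']
    print(total_fit_breaks(words, 6))  # ['aaa', 'bb cc', 'ddddd']
```

With a width of 6, the greedy version leaves four spaces of slack on the second line (cost 16), while total fit spreads it as 3 and 1 (cost 9 + 1 = 10), which is the kind of evenness that makes TeX paragraphs look better.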


> Actually, unicode support is one of groff’s most mysterious blind spots.

Interesting. What didn't work about -Tutf8 for you? I assumed that I was simply doing something wrong, and left it as a problem to solve later given that I can view the original DocBook XML in UTF-8 without problem. This tells me that there's more to it than I thought.


My introduction to roff and co. as a typesetting system was the 1997 edition of Dale Dougherty and Arnold Robbins's Sed & Awk. Many of the book's examples are geared toward automatically inserting/correcting roff macros. I can't recall a specific example from the book, but it would be something like "insert the roff 'section' macro on matches of `/^CHAPTER/`."

Ultimately, it's probably not practical, but I liked the idea of using UNIX utilities to get around worrying about markup altogether during the writing process. The book isn't really about roff, so I only got a sideways view of this kind of workflow. Was this "a thing" in the 90s?
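The kind of transformation described, adding markup after the fact by pattern matching rather than typing it while writing, fits in a few lines. The book's examples use sed and awk; this Python equivalent is for illustration only, and wrapping the chapter line in `.NH 1` ... `.PP` (a numbered heading in the ms macro package) is an assumed choice of macro, not one taken from the book.

```python
import re

# Lines that open a chapter in the plain manuscript, e.g. "CHAPTER 3: Results".
CHAPTER = re.compile(r"^CHAPTER\b")


def add_chapter_macros(text: str, macro: str = ".NH 1") -> str:
    """Wrap each CHAPTER line in heading markup (sed-style, one line at a time)."""
    out = []
    for line in text.splitlines():
        if CHAPTER.match(line):
            out.append(macro)   # start a numbered heading (ms)
            out.append(line)    # the heading text itself
            out.append(".PP")   # end the heading and begin a paragraph
        else:
            out.append(line)
    return "\n".join(out)
```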


I want to love groff. The most beautifully typeset documents I have ever produced were all made with groff.

However, it is just so hard to remember how it all works. It is an amazing struggle every time.


A document describing how to use Groff and Mom together by Peter Schaffter[0] can be found here[1].

[0] https://schaffter.ca/pdf/curriculum-vitae.pdf

[1] https://www.gnu.org/software/groff/groff-and-mom.pdf


This is a fun read. But I don't find LaTeX to be slow, personally. I'm working on a math text. 449 pages with hundreds of equations, hundreds of graphics, you get the idea, and on reading this article I timed a compile. On my 2016 laptop it took 20 real seconds.

Anyway, gripe aside, interesting read.


I confess I'm intimidated by folks that have slow TeX documents. Especially on modern machines, it is hella fast for all documents I have seen. I assume full book rendering is the slow part?


I believe certain macro packages can be slow, like TikZ (which is otherwise great).


I would assume any high-res asset generation is best not done inline? So that makes sense, but it's kind of unsurprising.

It's also unsatisfying as an explanation, since none of the examples people show when moving off TeX demonstrate that. Rather, they imply it is the typesetting itself that is slow.


My favorite part is: "...Fussing with typography is an excellent way to procrastinate..."


Nice. Looking forward to part 2.


The TXR manual is written in troff. It's designed so that you can install it as a man page, but it has macros which retarget some of the formatting for HTML and PDF. When compiled to PDF, it comes out to over 900 pages.

It may be the only man page you will ever see which auto-numbers its sections (down to three levels deep). I developed the macros for all that.

Pretty much any large or large-ish man page you can think of does not have this feature. E.g. type "man gcc" or "man bash": no section numbers anywhere.

I use Lucifredi's man2html converter from the man1.6 package. I think this is mostly abandonware. The man2html program is essentially a hacky reimplementation of enough of a subset of nroff to crank simple man pages into HTML. It does a better job than any other converter. (There is at least one more thing called man2html, like one unrelated program written in Perl.)

I made some 30 commits to man2html, fixing bugs, adding some features, and making it understand and execute complex nroff macro definitions. One feature added is that man2html defines a variable M2 which the code can use to conditionally switch to code specific to man2html versus regular nroff. All that work is here:

https://www.kylheku.com/cgit/man/log/

Anyone wanting to reproduce the HTML version of the TXR manual needs that version of man2html. A TXR program called genman.txr, in the root of the TXR source tree, analyzes the man2html output, and rearranges it, adds identifier cross-referencing, a collapsible table of contents (via a tiny bit of JS) and such.

I use another TXR program for running checks for common markup mistakes in the manual: checkman.txr in the root of the TXR source tree. It understands the usage of many of my custom macros and validates that they are correctly used, in their intended structures. For instance if a syntax synopsis block is opened with .synb, it will tell you there is a missing .syne closer.
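checkman.txr is written in TXR and covers many macros; as a rough illustration of just the .synb/.syne rule described here, a hypothetical Python equivalent can track whether a synopsis block is open while scanning the lines:

```python
def check_synopsis_blocks(lines: list[str]) -> list[str]:
    """Report unbalanced .synb/.syne pairs (hypothetical stand-in for one checkman rule)."""
    errors: list[str] = []
    open_at = None  # line number of the currently open .synb, if any
    for num, line in enumerate(lines, start=1):
        fields = line.split()
        request = fields[0] if fields else ""
        if request == ".synb":
            if open_at is not None:
                errors.append(f"line {num}: .synb while block from line {open_at} is open")
            open_at = num
        elif request == ".syne":
            if open_at is None:
                errors.append(f"line {num}: .syne without a matching .synb")
            open_at = None
    if open_at is not None:
        errors.append(f"line {open_at}: .synb without a matching .syne")
    return errors
```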

There are "weird" formatting macros; e.g:

  .coNP Operators @ let and @ let*
  .synb
  .mets (let >> ({ sym | >> ( sym << init-form )}*) << body-form *)
  .mets (let* >> ({ sym | >> ( sym << init-form )}*) << body-form *)
  .syne
  .desc
  The
  .code let
  and
  .code let*
  operators introduce  ...
coNP is like NP, introducing a paragraph at a certain subsection level; the co refers to it having code markup. In co markup, a word preceded by @ will be typeset as code, e.g. typewriter font. A word preceded by @, (at comma) will be typeset in a typewriter font, followed immediately by a comma in the original font.
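The @ and @, rules can be modeled as a small token rewriter. In this hypothetical Python sketch, the arguments arrive already split on spaces (as in `.coNP Operators @ let and @ let*`), and HTML `<code>` stands in for whatever font switching each output mode really uses; it is not how the TXR macros are implemented.

```python
def render_co_markup(args: list[str]) -> str:
    """Apply the @ (code) and @, (code + comma) prefixes to a list of macro arguments."""
    out: list[str] = []
    it = iter(args)
    for tok in it:
        if tok == "@":
            out.append(f"<code>{next(it, '')}</code>")   # next word in typewriter
        elif tok == "@,":
            out.append(f"<code>{next(it, '')}</code>,")  # comma back in the regular font
        else:
            out.append(tok)
    return " ".join(out)
```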

.synb starts the syntax synopsis section and .syne ends it.

We use .mets for markup in the syntax synopsis. The "met" refers to "meta": markup for meta-syntactic identifiers, which typically use italics in HTML and PDF, or angle brackets like <sym> in the plain-text man page.

In .mets, everything is typewriter by default. When an argument is preceded by <, it is marked up as a meta identifier, italic or <angle>. When two arguments are preceded by <<, like << body-form *), the left one is meta, and the right one is tacked onto it with no intervening space. There is also <> with three arguments: <> A B C will typeset B as a meta, flanked by A and C with no space; and there is an opposite >< also.
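The <, << and <> rules can be sketched the same way for the plain-text (nroff) case, where meta identifiers become <angle> brackets. This Python version is hypothetical: it assumes well-formed input, joins the resulting pieces with single spaces, and leaves out >< and the other operators (such as the >> in the example above), whose exact behavior isn't spelled out here.

```python
def meta(word: str) -> str:
    """nroff-mode rendering of a meta-syntactic identifier."""
    return f"<{word}>"


def render_mets_nroff(args: list[str]) -> str:
    """Hypothetical nroff-mode rendering of .mets arguments (<, <<, <> only)."""
    out: list[str] = []
    i = 0
    while i < len(args):
        tok = args[i]
        if tok == "<":        # < X: X is meta
            out.append(meta(args[i + 1]))
            i += 2
        elif tok == "<<":     # << X Y: X is meta, Y follows with no space
            out.append(meta(args[i + 1]) + args[i + 2])
            i += 3
        elif tok == "<>":     # <> A B C: B is meta, flanked by A and C
            out.append(args[i + 1] + meta(args[i + 2]) + args[i + 3])
            i += 4
        else:                 # everything else stays as typewriter text
            out.append(tok)
            i += 1
    return " ".join(out)
```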

All these macros are defined near the top of the txr.1 file.

The "meta" markup is implemented by a lower-level macro called .getm, which the higher-level ones like .mets, .meti and .meIP use. That macro is implemented twice, conditionally switched between nroff mode (angle brackets) and regular mode (font switching).


In the documentation sections describing the REPL, I wanted to typeset keystroke sequences nicely with <kbd>...</kbd>. The result is two macros:

  .key foo
foo will be typeset as a keycap, with spaces around it, and:

  .keyn foo bar
foo is typeset as a keycap, immediately followed by bar in the regular font, with no intervening space.
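Per the definitions that follow, man2html emits <kbd> elements and nroff falls back to square brackets, while groff draws a box. The first two behaviors can be summarized in a hypothetical Python function (the mode names are made up, and the PDF box, which troff draws, isn't modeled):

```python
def render_key(mode: str, key: str, suffix: str = "") -> str:
    """What .key (no suffix) or .keyn (with suffix) produces in two output modes."""
    if mode == "man2html":
        return f"<kbd>{key}</kbd>{suffix}"
    if mode == "nroff":
        return f"[{key}]{suffix}"
    raise ValueError(f"mode not modeled here: {mode!r}")
```

For example, `.keyn Ctrl -x` would come out as [Ctrl]-x in a terminal.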

The macros are defined in three ways: one for man2html, one for nroff, and one for groff:

  .\" keystrokes
  .ie \n(M2 \{\
  .de key
  .M2HT <kbd>\\$1</kbd>
  ..
  .de keyn
  .M2HT <kbd>\\$1</kbd>\\$2
  ..
  .\}
That's for man2html, detected by the M2 variable being nonzero. (This elicits a warning from groff, since that variable doesn't exist there.) We use the .M2HT macro for inline HTML. I don't think that existed in man2html; it's another thing I added.

  .el \{\
  .  ie n \{\
  .    de key
  [\\$1]
  .    .
  .    de keyn
  [\\$1]\\$2
  .    .
  .  \}
Otherwise, for the other two: first, nroff for man page viewing. Here the strategy is to use square brackets, so .key Ctrl becomes [Ctrl].

  .  el \{\
  .  \" Box macro from Groff manual with $2 added
  .    de box
  .      nr @wd \w'\\$1'
  \h'.2m'\
  \h'-.2m'\v'(.2m - \\n[rsb]u)'\
  \D'l 0 -(\\n[rst]u - \\n[rsb]u + .4m)'\
  \D'l (\\n[@wd]u + .4m) 0'\
  \D'l 0 (\\n[rst]u - \\n[rsb]u + .4m)'\
  \D'l -(\\n[@wd]u + .4m) 0'\
  \h'.2m'\v'-(.2m - \\n[rsb]u)'\
  \\$1\
  \h'.2m'\\$2
  .    .
  .    de key
  .  box "\\$1" ""
  .    .
  .    de keyn
  .  box "\\$1" "\\$2"
  .    .
  .  \}
  .\}
For groff PDF output we use a .box macro to put a box around the keycap material. The macro is based on something in the manual.

Not sure why the box invocations are indented funny.
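The geometry of the .box macro is easier to follow as plain arithmetic. This hypothetical Python transliteration returns the four `\D'l dx dy'` moves it draws (groff's vertical axis points down, so a negative dy goes up), with `wd`, `rst` and `rsb` standing for the `\w`-measured width and the rst/rsb registers, all in the same unit. The box gets 0.2m of padding on each side, and the four moves sum to zero, so the outline closes.

```python
def box_segments(wd: float, rst: float, rsb: float, em: float) -> list[tuple[float, float]]:
    """Relative line moves drawn by the box macro, in drawing order."""
    pad = 0.2 * em                  # the macro's .2m padding
    height = rst - rsb + 2 * pad    # \n[rst]u - \n[rsb]u + .4m
    width = wd + 2 * pad            # \n[@wd]u + .4m
    return [
        (0.0, -height),  # left edge, drawn upward
        (width, 0.0),    # top edge
        (0.0, height),   # right edge, drawn downward
        (-width, 0.0),   # bottom edge, back to the start
    ]
```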



