evan forry's curation corner

Plagued by Typesetting

I've been working on a religious, nonfiction book for a while. I'm starting to see the light at the end of the tunnel, and that means I need to start thinking about typesetting again.

I love typesetting, but I hate the tools that are available to me. Typesetting throws so much friction into my publishing process. When it comes up, I end up plagued by obsessive thoughts about how to make it smoother. Right now I should be editing the final part of my book. Instead, I'm ranting about work that isn't doable yet and scouring the web, reconsidering solutions I've already rejected a thousand times.

They say writing is thinking. So, I'm hoping that this ranting and raving will either help me think through and solve my problem, or help me realize I can be content with the tools I have.

What are the limitations?

I should probably start by defining my problem with the typesetting process.

To do that, I guess I should describe the whole self-publishing process a bit as well.

Simply put, books end up on shelves by first being drafted, then edited, then typeset, then printed, then distributed, then sold, then read.

There are numerous options, paths, and tools one can use for every step of that process.

Sticking to the tech side of things, my current tool stack looks like this:

That's my publishing process. What about typesetting specifically?

Typesetting is the process of transforming the raw text of a draft into a fully polished file ready to be in the hands of the public.

Right there we have the main constraints that I need to work within.

I draft and edit everything I write in plaintext. Specifically, I write in Markdown. That isn't going to change. I use plaintext because it is future-proof and lightweight, and because text editors are simpler and less distracting than word processors. I use Markdown because it is quick and easy, and because Bearblog uses it.

My plaintext life is built around Markdown specifically because of its ubiquity. Syntax-wise, I would love to switch to Orgmode or, better yet, use my own custom markup language. However, Markdown is everywhere. If I want to copy and paste a finalized draft into the Bear editor, it better be in Markdown. Markdown won, so using anything else is fighting an uphill battle. I'm stuck with Markdown for the same reason I'm stuck with QWERTY. It's the standard. My highest value is simplicity, and standards are usually the simplest option.

So, when it comes to typesetting, I am limited to Markdown being my input format.

The typesetting process also has output requirements. It's a slight oversimplification, but digital books require an EPUB file, and print books require a PDF file. My input requirements of Markdown, I recognize, are essentially a product of ideology. Output requirements, however, are institutional. Amazon and other printing services will only print certain formats. Ereaders (hardware and software) only support a handful of file formats. While there are a few options to pick from, all options basically boil down to PDF and EPUB. EPUB is the universal standard. All hardware and software supports it. Devices that use proprietary DRM formats will convert from EPUB. For digital books, EPUB is simply the standard. Likewise, every print book comes from a PDF. Amazon does accept files besides PDF for printing (DOC, DOCX, RTF, HTML, or TXT), but these get converted to PDF anyway. To have the most control over what your book is going to look like, you need to give them a PDF.

The typesetting process then, for me, is one of transforming a file from Markdown to EPUB and PDF.

For me, that first means using Pandoc to convert the Markdown into an EPUB file. Then I use Md2roff to convert the Markdown into a Groff file that uses the MOM macros. This MOM file needs tweaking and then once it's combined with a template file, I use Groff to generate a print-ready PDF.
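To make those steps concrete, here is a rough Python sketch of the pipeline as a list of commands. It is only an illustration under assumptions: the file names are placeholders, the template is assumed to be passed to Groff ahead of the converted file, and the Md2roff invocation is left to the caller because its options differ between versions (the sketch assumes it writes to standard output). The manual tweaking of the MOM file would happen between the second and third steps.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Step:
    argv: list[str]                # command and its arguments
    stdout: Optional[Path] = None  # file that captures stdout, if any


def plan_build(markdown: str, template: str, md2roff_argv: list[str]) -> list[Step]:
    """Plan one Markdown -> EPUB + PDF build without running anything."""
    src = Path(markdown)
    mom = src.with_suffix(".mom")
    return [
        # Pandoc infers the EPUB writer from the output file extension.
        Step(["pandoc", str(src), "-o", str(src.with_suffix(".epub"))]),
        # Markdown -> MOM. The caller supplies the converter command
        # (assumption: it prints the converted document to stdout).
        Step([*md2roff_argv, str(src)], stdout=mom),
        # Groff with the MOM macros and the PDF output device; the template
        # is read first, then the converted book (assumed layout).
        Step(["groff", "-mom", "-Tpdf", template, str(mom)],
             stdout=src.with_suffix(".pdf")),
    ]


def run_build(steps: list[Step]) -> None:
    """Run each step in order, stopping at the first failure."""
    for step in steps:
        if step.stdout is None:
            subprocess.run(step.argv, check=True)
        else:
            with open(step.stdout, "wb") as out:
                subprocess.run(step.argv, check=True, stdout=out)
```

Calling `plan_build("book.md", "template.mom", ["md2roff"])` returns the three steps without touching any files, which makes it easy to inspect the plan before handing it to `run_build`.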

This process is fine. It works, I like it more than the alternatives, but there are three pain points that create the friction which drives me to find something better.

First, I don't like needing to use Pandoc for EPUB and Groff for PDF. I'd much prefer to use one program to generate both files required for publication. Pandoc does do PDFs, but I prefer to not use it because there is (or feels like there is) less control over the final output. When it comes to PDFs, Pandoc feels like an unnecessary middleman. Behind the scenes, it uses Groff, HTML, LaTeX, or Typst as middle steps. I might as well use those programs directly instead of getting bogged down by Pandoc's abstractions. But, none of those directly produce EPUB output. I do need to do more research here. Groff gets to HTML but not to EPUB. I'm not sure if HTML to PDF tools (like Weasyprint) do EPUB or not. I have no idea if LaTeX does EPUB. Typst doesn't currently export to EPUB, but it is in the works. Regardless, this pain point isn't too bad. It's only really an issue because of the second point of friction.

Second, tooling for converting Markdown to Groff with the MOM macros is basically nonexistent. Md2roff is the only program I've found that supports MOM, and it's not that great. (At least the old version I've been using. There is a fork with more active development (7 months vs 4 years since the last update) that may be better.) Its output needs manual fixes like removing unwanted boilerplate, it has random bugs, and I'm not sure it supports all of Markdown's features. Which leads to the third problem.

Third, because files converted into MOM require so much fiddling, it's a pain to make changes to content after conversion. My options are to edit the original Markdown to update the EPUB and then either convert that Markdown to MOM again and redo all of the manual fixes, or manually correct the MOM files, creating the possibility that a mistake is made and the print copies end up different from the digital books. It's not a huge deal, but it's friction that stresses me out.
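One way to take some sting out of redoing the fixes is to write them down as rules and replay them after every conversion. The Python sketch below shows the idea only. The two rules are hypothetical examples, not a description of what Md2roff actually emits: one drops a made-up generator comment, the other swaps the MOM print style.

```python
import re
from typing import Callable

Fix = Callable[[str], str]


def drop_lines_matching(pattern: str) -> Fix:
    """Build a fix that removes every line matching the regex."""
    regex = re.compile(pattern)

    def fix(text: str) -> str:
        kept = [line for line in text.splitlines() if not regex.search(line)]
        return "\n".join(kept) + "\n"

    return fix


def replace_all(pattern: str, replacement: str) -> Fix:
    """Build a fix that rewrites every match, line-anchored via MULTILINE."""
    regex = re.compile(pattern, flags=re.MULTILINE)
    return lambda text: regex.sub(replacement, text)


def apply_fixes(text: str, fixes: list[Fix]) -> str:
    """Apply each fix in order to the converted MOM source."""
    for fix in fixes:
        text = fix(text)
    return text


# Hypothetical rules; real ones depend on what the converter produces.
FIXES = [
    drop_lines_matching(r'^\.\\" generated by'),  # a roff comment line
    replace_all(r"^\.PRINTSTYLE TYPEWRITE$", ".PRINTSTYLE TYPESET"),
]
```

With the fixes kept in a file like this, the Markdown stays the single source of truth: edit it, reconvert, replay the rules, and the print and digital editions come from the same text.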

Alright. We've gone the long way around to answering what the limitations are and what typesetting is. Simply put, typesetting is the process used to get an edited draft ready for printing and publication. For me, that means turning a Markdown file into PDF and EPUB. We've seen a bit of how I've been doing it (and I don't expect you to understand any of it), so now let's look at what the options are for typesetting software.

What are the options?

Aight, so what programs are there to typeset a book?

There are basically six paths to producing a print-ready typeset document (PDF).

Just Pandoc

I've already talked about just using Pandoc. It's an option I maybe should give more consideration. The only reason to use it over directly using the PDF engines that it uses under the hood is if it makes using those engines easier. Whether or not that is the case, I do not know. I haven't dug deep enough into Pandoc to know for sure, but from what I gather, the options are to use Pandoc as is, relying on generic templates and limited options, or use Pandoc with custom templates. Using generic templates with limited options does make the process much simpler, but your output becomes pretty generic. That isn't always an issue, but for creative products, it's not ideal. Using custom templates should give the full power and customization of the underlying engines being used to produce PDF files, but then I see no benefit to using Pandoc to generate the PDF.

I should dig deeper into Pandoc, but until everything else has been exhausted, it's not the path I want to take.

Remove the Constraints

The industry standard for typesetting is basically a combination of Microsoft Word or Google Docs and Adobe InDesign. Using plaintext and terminal programs is a very unusual thing to do. LaTeX and Typst have considerable professional use, but only in a niche (technical) sense. My typesetting headaches would all go away if I used normal tools that are broadly supported. But, then my experience drafting and editing would suffer. Drafting and editing make up 90% of the writing process. Typesetting and publishing are a small sliver. It doesn't make sense to contort myself around unfriendly proprietary software just to make the last bit smoother. It's not a great analogy, but it'd be like running a marathon in dress shoes and then putting sneakers on at the last moment. That being said, I could write and edit in plaintext then move to traditional tools for typesetting. I'm never going to subscribe to Adobe, but there is Scribus. Complex GUIs aren't going to help though. I want to write a config file for my book, not click options in menus. No, I can't get rid of the plaintext input constraint. Though, I might be able to switch away from Markdown. Writing directly in something that doesn't need conversion may be an option. But, that isn't possible for blogs. I'd still need to write in Markdown to publish on Bear. I'd rather just stick with one language. So, Markdown isn't a constraint I can change.

It might be possible to change my output formats. Amazon does support HTML and TXT. (DOC and DOCX are out for the same reasons as above.) I have no idea what Amazon would do with a TXT file. I love the idea of writing entirely in Markdown (you can rename the .md extension to .txt and no content changes) and just giving Amazon that file and living with what they do with it. There's an extreme simplicity to it. But, I'm sure it's going to look terrible. How terrible exactly might be worth experimentation. (I wonder if there's a public domain book that I'd enjoy writing an introduction for so I could actually try it.) HTML is the more realistic option though. Markdown was designed to convert to HTML, so that step is tried and true. It's trivial. How Amazon converts the HTML to PDF, I don't know. Again, probably terribly. Like with Pandoc, if I'm going to convert to HTML, I might as well use a dedicated HTML to PDF program instead of leaving it up to Amazon's generic templates. Changing my output constraints is not ideal, but worth looking into I guess.

I'll note now, you may have realized I haven't talked much about EPUB conversion. That's because EPUB is essentially packaged HTML and Markdown was built to become HTML. So, Markdown to EPUB is super easy. The PDF part is what causes me pain. That being said, a solution could be to go entirely digital. The points of pain in the typesetting process come entirely from preparing a print-ready PDF. If I move away from publishing physical books, all my problems are gone...

No, that's not an option lol.

Though, maybe digital only to test the success of a book and then create a print copy only when the digital sells enough for the effort to be worth it. No, that thought can't even conform to a readable sentence, much less be a good idea.

HTML

I guess since I've mentioned it, I might as well tackle HTML next. As I said, Markdown was meant for HTML and EPUB is HTML.
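To show what "EPUB is HTML" means in practice, here is a small Python sketch using only the standard library. An EPUB is a zip archive whose first entry is an uncompressed `mimetype` file, with the book's text stored as XHTML files inside. This is a skeleton, not a valid book: a real EPUB also needs a `META-INF/container.xml`, a package document, and a navigation document. The `OEBPS` folder name and the chapter text are just illustrative choices.

```python
import io
import zipfile

# A chapter is an ordinary XHTML page.
CHAPTER = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Hello, reader.</p></body>
</html>
"""


def write_epub_skeleton() -> bytes:
    """Build the outer shape of an EPUB in memory (incomplete on purpose)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # The content itself is XHTML, compressed like any zip member.
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(epub_bytes: bytes) -> list[str]:
    """List the files inside an EPUB, in archive order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()
```

Running `list_entries` on any finished EPUB file's bytes is a quick way to see the HTML pages a converter actually produced.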

Using HTML as the middle language makes a lot of sense. I've always been drawn to the idea of using it for publishing.

HTML to PDF converters (Weasyprint is likely what I'd use) are fast and efficient enough.

The only issue, however, is that HTML was designed for the Web, not print. Modern HTML is robust enough to support print. Though with that not being its intended use, it's hard to find resources on making good print-ready books from an HTML source. I have found a few, but not enough to feel comfortable diving into this option.
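For a sense of what the print side of HTML looks like, here is a minimal Python sketch that writes a book-shaped HTML file with a print stylesheet. The `@page` rule sets the page size and margins, and `break-before: page` starts each chapter on a new page. The 5.5 by 8.5 inch trim size, the margins, and the paragraph styling are placeholder assumptions, not recommendations.

```python
from html import escape


def print_stylesheet(width_in: float, height_in: float, margin_in: float) -> str:
    """CSS for a fixed trim size; every h1 starts a new page."""
    return (
        f"@page {{ size: {width_in}in {height_in}in; margin: {margin_in}in; }}\n"
        "h1 { break-before: page; }\n"
        "p { margin: 0; text-indent: 1.5em; text-align: justify; }\n"
    )


def build_document(title: str, chapters: list[tuple[str, list[str]]], css: str) -> str:
    """Assemble one HTML file from (heading, paragraphs) pairs."""
    body = "\n".join(
        f"<h1>{escape(heading)}</h1>\n"
        + "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)
        for heading, paragraphs in chapters
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="en"><head><meta charset="utf-8"><title>{escape(title)}</title>\n'
        f"<style>\n{css}</style></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

The resulting file could then be handed to an HTML to PDF tool; Weasyprint's command line, for example, takes an input HTML file and an output PDF path.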

I feel like HTML is where I should end up, but I can't bring myself to do it.

I think what I need to do is look into the HTML to PDF engines and see if any support EPUB export as well. If I can consolidate down to one program giving me both outputs, it might be worth the leap. Again, it may be interesting running an experiment where I find a public domain book and publish it using each method and see what I like best. Hmm.

LaTeX

I hate LaTeX.

I don't really have any good reasons why, but I just do. Mainly it just does way more than I need it to, so it feels overwhelming and impossible to use. Also, the syntax is ugly. It's a slow, bloated program that has had so much legacy junk piled on it that I don't want to touch it.

I'm sure there's a world where I don't hate it and am blissfully using Pandoc with a custom LaTeX template, blissfully unaware that Groff even exists.

But that's not this world. I don't know why I decided to do this one before talking about Groff and Typst, but where Groff is the archaic dinosaur, and Typst is the modern machine, LaTeX is the caveman trying to sell me car insurance. It's just the worst of both worlds. Groff is old, a bit confusing, but simple enough. Typst is new, fast, and complex. LaTeX is old, complex, and complicated.

I want nothing to do with LaTeX. I'm not considering it for my stack at all.

Groff

Groff is old, and unused, but I love it.

It's what I use now.

Specifically, what I like about it, and why I use it, is entirely because of the MOM macros.

Most typesetting programs are designed with documents and math as their primary focus. Books are a bit of an afterthought. Groff's main use is creating and typesetting man pages, which is even more niche. However, the MOM macros are designed specifically for print books.

Groff in general is old, ugly, and complicated. Groff-MOM is great though. But no one uses it. There's no support, so conversion is difficult.

The great thing about Groff in general is that it is one of the oldest typesetting programs, if not the oldest. It's small and focused. The only things stopping it from being my ideal tool are how annoying it is getting into it from Markdown and its lack of EPUB export.

One option then would be to find or make a better conversion tool. It wouldn't be too difficult from what I've seen, but quite tedious. (And no, I'm not going to have A.I. do it for me.) Groff does also export to HTML, so I could then go from HTML to EPUB. It would just depend on Groff outputting HTML that gives a comparable or better EPUB than the original Markdown.

Groff is what I use now. I am happy with it. I could improve my workflow with it to remove some pain points instead of abandoning it. Ideally that is my best option for becoming content with my typesetting process.

Typst

Ah, Typst. The fancy new kid.

Like HTML, Typst seems perfect. It's modern. Simple. Popular. Written in Rust. What more could you ask for.

Better syntax, that's what. I despise Typst's use of equal signs for headers. It absolutely breaks my brain.

But. I could just convert into it and use it to make PDFs.

And as I said, EPUB is on the road map. Once Typst can give me both a PDF and EPUB in one go, I might just have to switch to it.
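As a taste of what "just convert into it" could involve, here is a deliberately tiny Python sketch that rewrites Markdown's `#` headings as Typst's `=` headings and skips anything inside a backtick code fence. It handles headings only. Everything else passes through untouched, which is not enough for a real book, since emphasis and other markup also differ between the two languages.

```python
import re

# One to six hashes, whitespace, then the heading text.
ATX_HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def markdown_headings_to_typst(markdown: str) -> str:
    """Turn '# Title' into '= Title', leaving fenced code blocks alone."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            # Toggle on every backtick fence line and keep it as-is.
            in_fence = not in_fence
            out.append(line)
            continue
        match = None if in_fence else ATX_HEADING.match(line)
        if match:
            level = len(match.group(1))
            out.append("=" * level + " " + match.group(2))
        else:
            out.append(line)
    return "\n".join(out)
```

The upside of a conversion step like this is that the equal signs only ever exist in a generated file, never in the draft being written.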

I think the main reason I haven't switched to Typst is simply because I can almost but not fully use it full time. It is inviting to use for absolutely everything. But not being able to write blog posts in it means I'm either converting to it or writing books in a different markup language.

That just feels bad.

My problem with Typst is that it is so close to my ideal markup system that the disappointment in it not being so is just too great.

What I really want is I guess Markdown+. I want a markup language that is basically an advanced Markdown flavor that takes the core of Markdown and adds in everything Typst does.

I want Typst but with Markdown syntax instead of stupid equal sign headers.

Maybe what I want does exist...

HTML Again

I want Markdown with extra typesetting stuff built-in.

But, Markdown is HTML. Out of the box, Markdown supports inline HTML.

Isn't Markdown+HTML exactly what I'm looking for?

I think it might be.

Am I just scared of learning HTML proper?

Well, I guess sort of.

I think the reason I haven't dived down this trail is because, like LaTeX, there's just too much superfluous stuff and I don't know where to begin. HTML is designed for the web, not books. All learning resources are about making web pages and web books, not print books.

I'm just frustrated that there aren't any simple tools for writing simple (no math, figures, or tables) books in plaintext and producing a print-ready PDF.

I want to write Markdown, a metadata file, and a config file, and then enter a command and have my publish-ready files.

Maybe the closest I'll get is HTML and CSS.

Maybe I somehow figure out how to make my dream program myself.

I guess the only thing left to do is experiment.

What are my options?

What am I going to do? What experiments should I try?

I kinda like the idea of grabbing a TXT file of a public domain book and trying the different tools to produce print-ready output.

Here's a list of the different experiments I want to run that I can update as I do them.

The End

Well, I guess now it's time for me to get experimenting.

Though, maybe I should focus on finishing my book first...

That also might give me some time to hear from you.

Let me know if you have any ideas. There are more tools that I know of but didn't mention, but there might still be some I've missed.

I also have a lot of opinions based on never actually using some of these tools. So tell me where I'm wrong.

Also, if you have a suggestion for a public domain book I should use, let me know. I'm just gonna pick something semi random from Project Gutenberg, but if you have a book you want to see printable, I'm gonna share the files for my tests as I do them.

If you've made it this far through my rambling rant, thanks! This is weirdly the most me article I've written in a long time. I very rarely let myself write, much less publish, these raw thought dumps. But this time it feels right. So thanks for sticking with me.

You are Love.

P.S. Oh, and I recognize I skimmed over a lot of detail here. If there's any part that you want more info on, I can probably provide it!



#publishing #typesetting #writing