Friday, July 30, 2021

The darker side of publishing a manuscript


I've just finished three days of purgatory, trying to sort out the bugs and problems besetting a manuscript before running it through a publishing program to produce print- and e-book-ready files.  The manuscript is for my wife's third novel, "Going Ballistic", which was published in e-book format last year.

Due to a number of factors, she wrote it over time in segments and sections, using different software packages to do so, then combined them into a single file.  There were immediate problems with the "final" version:  quotes that were sometimes straight (") and sometimes curly (“), the same with apostrophes, double spaces interspersed with single spaces in the text, and some really baffling pagination errors.  We managed to get the e-book version published, but I just couldn't get the print version (which would be in Adobe's PDF format) to typeset properly.  Under the pressure of COVID-19 and related health issues, that had to be put on the shelf for a while.

Three days ago, I tackled the manuscript file.  I began by trying global search-and-replace fixes for quotes, apostrophes, etc., only to find to my astonishment that they didn't work.  I've never found a manuscript before that actively resisted replacing punctuation like this.  It seemed almost to have a will of its own!  For example, when I tried to replace a straight quote mark with a curly one, the curly mark would be inserted after the straight one, but would not replace it, so that I now had two marks.  Also, if I wanted to insert a comma or a full stop or a space, very often the characters would be inserted before the quote mark, even if the cursor wasn't positioned there, and would insert themselves in reverse order to how I typed them.  Weird!
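
For readers who would rather script this kind of cleanup than fight a word processor's search-and-replace, here is a minimal sketch in Python. It assumes the manuscript text has already been saved as plain text, and it deliberately converts everything to straight quotes, leaving any "curling" to the word processor's autocorrect or the typesetting software. The function name `normalize_punctuation` is made up for illustration.

```python
import re

# Map curly quotes and apostrophes to their straight equivalents.
STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
})


def normalize_punctuation(text: str) -> str:
    """Straighten quotes, then collapse runs of spaces into one space."""
    text = text.translate(STRAIGHT)
    return re.sub(r" {2,}", " ", text)
```

Going to straight quotes first gives one consistent starting point; deciding which way each quote should curl depends on context and is better left to a tool built for it.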

Frustrated, I began on Page 1 and started to do a one-by-one replacement of the offending errors.  Not only was this very slow, but it seemed to upset something in the manuscript and/or program logic, particularly when I altered pagination marks or chapter breaks.  Changes dragged on and on, not taking place immediately, but producing a "working" or "wait" symbol while they were processed.  Routine backups of the file every ten minutes began to take thirty to forty-five seconds to execute, during which time I couldn't do anything else;  and when I manually saved the file, the same delays occurred.  On two occasions, my word processing program (LibreOffice) crashed completely.

It was obvious that hidden code in the manuscript file was messing up the entire process.  I therefore saved the file every few pages.  Every 50-odd pages I copied all its text to memory, then pasted it without formatting into a brand-new file with a new version number.  I saved the old versions as backups, of course.  I'd got to Version 5 by the time I fixed the last error, after many hours of line-by-line editing and replacing.  However, the moment I fixed the last error and copied the text to the final version of the draft, all the weird problems suddenly went away.  Saving became a half-second affair, global search and replace worked as it should, and all the previous complications became merely an unpleasant memory.
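
The "paste without formatting" step can also be approximated outside the word processor. A DOCX file is a ZIP archive whose main text lives in `word/document.xml`, so a short Python sketch can pull out just the paragraph text and leave all formatting behind. This is an illustration under simplifying assumptions, not a full converter: it ignores tabs, manual line breaks, footnotes, headers and footers, and text boxes may be duplicated. The function name `docx_to_plain_text` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_plain_text(path: str) -> str:
    """Return the body text of a DOCX file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph is split into runs; the visible text sits in <w:t>.
        pieces = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

As with pasting into a plain-text editor, italics, bold and chapter styles are lost and have to be reapplied afterward.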

I've no idea what could have caused such issues.  I've never had a file behave like that in my life before!  Can any reader suggest what might have crept into a simple DOCX format file that might cause it to (mis)behave that way?  I'm at a loss to figure it out, except to assume that extraneous code had crept in from the multiple programs used to produce the text.  (The original segments were produced in Windows 10 using packages such as Notepad [TXT format files], WordPad [RTF format] and LibreOffice [ODT format], then imported into and combined in LibreOffice before being exported to an MS Word [DOCX format] file.)
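
One possibility, offered as a guess rather than a diagnosis: text that inserts itself in reverse order, or lands on the wrong side of a quote mark, is reminiscent of invisible Unicode control characters (for example right-to-left marks or zero-width spaces) hiding in the text. Nothing in this post confirms that was the cause here. A minimal Python sketch for checking a plain-text copy of a manuscript follows; the function name `count_invisible_characters` is made up for illustration.

```python
import unicodedata
from collections import Counter


def count_invisible_characters(text: str) -> Counter:
    """Count format (Cf) and control (Cc) characters, ignoring newlines and tabs."""
    counts = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf" or (category == "Cc" and ch not in "\n\r\t"):
            # Control characters have no Unicode name, so fall back to the code point.
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            counts[name] += 1
    return counts
```

An empty result would rule this particular explanation out; a long list of directional marks or zero-width characters would make it worth stripping them and trying the search-and-replace again.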

Fortunately, Miss D. has now adopted a different document processor (Jutoh) to produce her manuscripts (sadly, she doesn't like LibreOffice - which I use - or MS Word, and neither of us likes Scrivener, another popular standard).  Jutoh produces clean files with no complications - at least, so far.  Thank heavens for small mercies!

As soon as we've checked and double-checked the revised manuscript file, we'll use Vellum publishing software to put out a new edition of the e-book (if you've already bought it, yours should be updated seamlessly), and a print edition as well.  Miss D. is almost ready to publish her fourth novel, set in the same world as "Going Ballistic" and involving one of the characters from that book.  Look for a launch announcement here within a few weeks.  She's also working on a fifth, to complete a trilogy in that setting, involving a character from the fourth book.  She's been hard at work writing, and I've been hard at work preparing her files for publication.  I've also got my own works in progress; I'm preparing to publish a number of them at roughly one-month intervals from the fourth quarter onward.

There's no peace for the wicked, it seems - either of us!



D.J. Schreffler said...

I have enjoyed both Scaling the Rim and Going Ballistic. It looks like all three that she has written are in the same general setting, with the main male in the second being a Rus-descended mercenary, and Going Ballistic could well be on the same planet as Scaling the Rim with the Rus having founded the Imperium and Central the other main faction.

Definitely looking forward to her fourth book coming up, and will pick it up.

Winterborn said...

I'm no author but I do work in tech. From your comment on the multiple file types, I'd say you already hit this on the head.

When you mix and match you can get a lot of weird stuff to creep in. A lot of times, it isn't even repeatable stuff, as it is code that has broken in the background of the document/book. Copy/pasting the entire thing from the .pdf to a Notepad (.txt) file can many times clean out all the stuff. I've done that for years. Then close out Adobe/Word/docProgramX, whatever it is, completely. Reopen with a fresh doc and (after making sure your clipboard is clean, as in copy/pasting a single word someplace else) copy the large text doc (in Windows, Ctrl+A selects all) and paste into the writing program. NOT saying it will get all of it. But it sure tends to kill a lot of stuff in the background.

Note that Notepad (the basic version, not Notepad++ or a lot of others that have lots more bells and whistles) will kill bold font, italics etc. So that can be a huge, huge pain to go through again. Something like this can cut a Gordian knot, but that also might just leave you with some unusable rope as well.

I'll have to pick up these books as well :)

Barbarus said...

I use LibreOffice. I found myself exporting my file as HTML then editing the HTML manually in a text editor, mostly to remove tags that shouldn't make any difference but sometimes did. It was surprising how inconsistent the HTML output of the software was - you could see where edits had happened, and sometimes the inconsistencies appeared when viewing the uploaded story even when not visible in the original.

Jonathan H said...

I've noticed that docx files don't play well with other programs. I wonder if MS went to docx because too many programs could use doc files - I wouldn't put it past them to NOT fully document the file type and set it up to create problems for anybody using it with non-MS products.

Bob said...

I've got a macro (I can't recall where I got it) that will convert all quotes in either direction: curly to straight or straight to curly. I'll email it if you like.

jaericho said...

I second what Winterborn said.

Since I write code more than words, I'm using plain-text code editors all the time (vscode/notepad++), so what little writing I do is written in Markdown. I write most everything in plain text then work on formatting later. It works well for me.

BillB said...

Similar to Winterborn, I did a lot of technical writing for a major aerospace program. I had a document that was a compilation of original content and content from other documents and was between 100 and 200 pages long. While it was all done in MS Word, there were many different styles included in it from different versions of MS Word. I ended up having to copy, then paste without formatting, the entire document, and finally use a single set of styles to reformat the whole document. I can empathize with you, Peter, on doing this.

My wife has published 3 novels through Amazon. Her first one was started in LibreOffice but moved to MS Word and thence to Kindle Create. Plus she did some writing on a laptop and some writing on a tablet. There are a lot of formatting issues in it because of moving it around from software to software as she did. She learned her lesson for the second and third by creating them completely in MS Word and then moving to Kindle Create before publishing.

ColoComment said...

...used to work 100% in Word in a law office. Got lots of docs, like proposed contracts, in pdf, etc., without an accompanying Word version for making & marking changes.
MS Word is all I've worked in (well, WordPerfect in the long ago and far away, but..., I'm dating myself!), and I'm very comfortable with it. Even had "corporate" training classes in it (so there!) : )

I don't know if other word processor programs offer the same functions, but I always, always, worked in Word with the paragraph mark turned on -- to reveal the background formatting codes. (much like WordPerfect's "reveal codes" command.) At the very basic level of formatting, it differentiates between the "hard" and the "soft" returns, and illuminates problems with bullet points, by way of example.
I'd find all kinds of weird stuff, especially frames, of which Word is very fond if it can't figure out what else to do with incoming text. If it was too time consuming to make corrections individually, I'd export the whole kit n' caboodle of a doc as an .rtf, and then re-build it totally in Word. -- which was often faster than the former type of repair.
It was fun work for this detail-oriented introvert -- I even enjoyed proofreading SEC prospectuses. Go figure.

Jack Ward said...

I loved Going Ballistic! Have read it twice now. Dorothy needs to expand at least 3 of her novels into series, especially this one. And, as for difficulties with getting the formatting done, well, she is a bush pilot, is she not? Believe I remember that. Ballistic had too much obvious insider info and terminology for it not to have been done by a pilot.
Build a fire under her, if the courage is there. We readers need more of her, and your upcoming releases. You two will be rich yet.

Ray - SoCal said...

In Word, search and replace can replace codes.

Unknown said...

docx is FAR from a simple file format. No matter how simple the result looks visually, there is an absolute ton of hidden stuff.

Toirdhealbheach Beucail said...

You have my admiration BRM. I have edited my own manuscripts in LibreOffice or Word, and that alone was difficult - and that was with the program playing nice. In my case, I believe it was the conversion to the approved KDP template that added to the issue.

Philip Sells said...

Is LibreOffice like OpenOffice? It sounds like a story could be made of the manuscript with a mind of its own.

Unknown said...

The OpenOffice brand was donated to the Apache Foundation, who then took it in
directions that the developers didn't like, so they forked it to the LibreOffice
brand and have put in several years of development, while the OpenOffice branch
has stagnated.

If you are using OpenOffice, you really should migrate to LibreOffice; it has
years of fixes that are missing from OpenOffice.

David Lang

bushpilot said...

The most powerful typesetting program I know is LaTeX (pronounced laa-tek, and yes, the funky capitalization is the way it's spelled). It originated in academia to handle the typesetting of mathematical formulas, which are notoriously difficult to do. It has since blossomed into a full-blown typesetting program that gives total control over the appearance of the printed page. It is open source, and is maintained by programmers at universities all over the world. It has become the de facto standard for producing science doctoral dissertations.

To get started go to ... LaTeX works with markup commands that play a role similar to HTML tags, so everything is in plain-text format with tags. TeXnicCenter is a simple-to-use editor that automates placing the tags for italic, bold, centering, footnotes, etc.

A link from the TeXnicCenter Download page will let you download MiKTeX, which is the compiler engine for turning LaTeX files into PDFs.

I tried it out on a family history document and was blown away by how easy it was to use, and how good the output looked.

Unknown said...

My personal recommendation is to keep everything as simple text until the final publishing stages, and only then format it.

David Lang

lpdbw said...

I have not successfully completed any of my writing projects, but nonetheless feel comfortable saying you might consider yWriter from SpaceJock.

OTOH, now that you've got stuff that works for you, maybe you should proceed on the path you're on.

Stan_qaz said...

One thing that many authors seem to be missing is how the text looks with different background colors selected.

In my case it is a black background on Kindle for Android where the expected white text changes to unreadable dark gray for no reason. Might be for a few letters or a few pages, then it changes back. The other backgrounds don't cause the problem.

Has to be some oddness in the editor chosen, there is no reason the author would choose any function for the mis-formatted blocks of text.