Archive | February, 2012

eText Best Practices for Document Creation

19 Feb

Digital Textbook Format Workflow

For the eText Best Practices project.

Concerns of Format

Too often accessibility is an afterthought. The goal of eText.Illinois was and is to develop a workflow that creates, or starts from, a core document that fully separates semantic markup from presentation styling. This separation provides document curators with the best … Wikipedia does a good job of summing up the idea:

Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in webpages rather than merely to define its presentation (look). Semantic HTML is processed by regular web browsers as well as by many other user agents. CSS is used to suggest its presentation to human users.

(via Semantic HTML – Wikipedia, the free encyclopedia)

This HTML and CSS approach is a well-accepted, solid best practice for web content creation. With a long-format “book,” however, much of the authoring process takes place away from the web professional and solely under the control of the author. All too often, adherence to purely semantic document creation slips during this period of crafting the book.

For nearly all authors in academia, the writing tool of choice is Microsoft Word. This is a double-edged sword. On one side, MS Word’s “Track Changes” feature, especially across multiple users, is an important collaboration tool. Beyond Track Changes, however, the remainder of MS Word’s unique and powerful toolset is primarily tied to the printed document.

The other edge of the sword is the set of tools that create visually beautiful pages that are not necessarily organized semantically for conveying meaning. The main problem is that Word gives the writer too many tools for presentation at a time when the focus should be on clear writing and organization. Arbitrary application of formatting methods such as font size, typeface, indenting, and underlining may visually deliver meaning and emphasis, but does so without any semantically correct markup. While semantic meaning can be created with consistent application of MS Word’s paragraph “Styles,” it is still difficult to impossible to extract clean HTML that moves away from making the web look like the printed page. There is no process in MS Word that puts the screen ahead of the page. In an eText project that’s backwards.

A well-crafted HTML document can result in a beautiful printed page with the appropriate CSS. If advanced features such as an automatic printed table of contents, indices, or footnotes/endnotes are needed, the HTML document can be adapted to print more easily than the process can be performed in the reverse direction.

A (Not the only) Solution

While MS Word is perceived as the ubiquitous format, in reality plain text is even more so. With the sole caveat of the missing “Track Changes” feature, plain text as an authoring format is without peer for long-term use and compatibility. Working with plain text is a bit foreign to many today, but it is the basis of all text creation and consumption on the screen. A rising movement among professional writers is to draft and shape their narrative in plain text files. Plain text keeps the mind on the content and not the look of the page. Additionally, plain text lends itself to working on mobile and tablet devices, which are becoming more and more popular. As an example, this document is being created on an iPad with the “Writing Kit” application.

The goal is to create a document that meets the least amount of friction as it moves through the editorial and production process. Plain text captures the words, but suggestions for meaningful structure, such as headings and emphasis on particular words, must also be included. Once we have words and structure in a single plain text document, the destination of the document is completely flexible.

The Notion of an ÜberFormat

The flexible destination idea is critical to the whole workflow described here. Documents could go to one or many of the following destinations:

  • Printed for distribution
  • Specialized document for reading by text-to-speech devices
  • Printed to Braille type
  • Distributed via digital book formats
  • A website
  • Integrated into a learning management system
  • A blog
  • Master document draft to go to a publisher

This list is not complete because a text document can end up anywhere. The überformat should be thought of like this:

From a single semantically correct document a conversion process can deliver a variety of formats for screen display and print.
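The single-source idea can be illustrated with a minimal Python sketch. It is not the eText.Illinois tooling: the tiny notation (a pound sign for a heading, asterisks for emphasis), the sample text, and the function names `parse`, `to_html`, and `to_plain` are all assumptions made for illustration, and a real project would rely on an established converter.

```python
import html
import re

# Hypothetical sample source: one heading and one paragraph.
SOURCE = """# Chapter One

Plain text keeps the mind on the *content* and not the **look** of the page.
"""

def parse(source):
    """Split the source into (kind, text) blocks: headings and paragraphs."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", source.strip()):
        chunk = " ".join(line.strip() for line in chunk.splitlines())
        match = re.match(r"(#{1,6})\s+(.*)", chunk)
        if match:
            # The number of pound signs gives the heading level.
            blocks.append((f"h{len(match.group(1))}", match.group(2)))
        else:
            blocks.append(("p", chunk))
    return blocks

def inline_to_html(text):
    """Turn **strong** and *emphasis* markers into semantic HTML elements."""
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text

def to_html(blocks):
    """Render the blocks as semantic HTML; styling is left to CSS."""
    return "\n".join(
        f"<{kind}>{inline_to_html(text)}</{kind}>" for kind, text in blocks
    )

def to_plain(blocks):
    """Render the same blocks as unmarked text, with headings in capitals."""
    lines = []
    for kind, text in blocks:
        text = re.sub(r"\*{1,2}(.+?)\*{1,2}", r"\1", text)
        lines.append(text.upper() if kind.startswith("h") else text)
    return "\n\n".join(lines)

if __name__ == "__main__":
    blocks = parse(SOURCE)
    print(to_html(blocks))
    print(to_plain(blocks))
```

Both outputs come from the same parsed structure, so adding another destination means adding another renderer and leaving the source document alone.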

Semantically Correct – Structural Formatting That Carries Meaning

For the smoothest conversion to other formats that keeps all the structural meaning of the document, proper markup should be used to:

  • Create structured, nested sections with headings
  • Place more importance on words or phrases, at two levels: strong and emphasis
  • Create lists that are unordered (bullets), ordered (numbered), or a combination of the two
  • Organize tabular information in row/column tables
  • Designate blockquotes
  • Hyperlink words or phrases
  • Mark blocks of code
  • Insert horizontal rules (separating lines)
  • Include images

Where appropriate, these items should carry “alt text” and a “long description” to give low-vision users options for gathering meaning from the text.
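A check for missing alt text can be automated early in the editorial process. The following Python sketch assumes the source uses the common `![alt text](path)` image notation. The function name `images_missing_alt` and the sample file names are hypothetical, and long descriptions are not covered because that notation has no slot for them.

```python
import re

# Matches ![alt text](path) and ![alt text](path "title").
IMAGE = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)[^)]*\)")

def images_missing_alt(markdown_text):
    """Return the paths of images whose alt text is empty or only whitespace."""
    return [
        match.group("src")
        for match in IMAGE.finditer(markdown_text)
        if not match.group("alt").strip()
    ]

# Hypothetical sample: the second image has no alt text.
SAMPLE = """![Flow chart of the document workflow](workflow.png)
![](equation-1.png)
"""

if __name__ == "__main__":
    print(images_missing_alt(SAMPLE))
```

A check like this only confirms that alt text exists; whether the wording actually conveys the meaning of the image still needs a human editor.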

Optimum Flow – How Can it Work?

The process of creating digital documents that provide the most flexibility with the least amount of labor to output in a variety of needed formats requires a commitment all along the creation path of the document. Any creation path and set of document formats can eventually create an überformat document, but the goal is to make it as painless as possible for everyone involved and to save time and money. Without this process goal, the ability to scale up document production is only a dream.

The basic idea is the seamless transfer of documents between authors, editors, proofreaders and final digital publishers. Editing a text file with appropriate markup, as discussed above, provides for the best process. The solution that has been used for the last year and a half in the eText.Illinois initiative is Markdown formatting. Markdown provides appropriate options for semantic structure and keeps authors and editors focused on the actual content and not the “making it pretty” trap into which full-blown word processors tempt the writer.

Markdown Overview

The creator of Markdown, John Gruber, is a well-known tech pundit and former programmer, most notably with the application BBEdit. As a writer for both print and screen, he devised Markdown to ease his own workflow.

Philosophy (From John Gruber)

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

To this end, Markdown’s syntax is comprised entirely of punctuation characters, which punctuation characters have been carefully chosen so as to look like what they mean. E.g., asterisks around a word actually look like emphasis. Markdown lists look like, well, lists. Even blockquotes look like quoted passages of text, assuming you’ve ever used email.

(via Daring Fireball: Markdown Syntax Documentation)

See the formatting for Markdown from John Gruber’s Markdown Syntax Page

The eText.Illinois Workflow

In a perfect world we would have authors writing, or at least delivering, their works to the editors as Markdown (or another similar lightweight markup language). In the real world, however, experience with three different textbook projects has provided three unique document conversions.

  1. Introduction to Bioenvironmental Engineering was an MS-Word document with well over 100 scientific equations.
  2. ACES 101: Contemporary Issues in ACES was compiled from all sorts of document sources including MS-Word, PDF, webpage text and newly written content in plain text.
  3. Writing @ The University of Illinois was delivered completely in HTML.

Learning from these three experiences, the model eText.Illinois is carrying forward to new projects is based on the following process flow:

Flow chart that traces the process documents take as they come from authors and then are prepared for the überformat. Once prepared the documents can be used to provide all possible publishing endpoints for print, web, reading devices and accessibility hardware and software readers.


The Introduction to Bioenvironmental Engineering book required the conversion of scientific equations created in MS-Word. Through much research and experimentation it was discovered that multiple methods must be employed to have equations that display properly in a browser and in ePub-formatted eBooks, that can be copied into and used in advanced mathematics software such as Mathematica, and that can serve as a source for reading by text-to-speech software and hardware. There isn’t one magic format that allows publishing to all major web browsers without proprietary and limiting software plugins and applications. Specifics on the equation methods used in HTML and digital book formats will follow.

Of concern at this stage is the ability to extract equations from the MS-Word source document. Additional steps are added to the workflow above to convert MS Equation Editor or MathType equations to LaTeX-formatted equations, which become the backbone for screen display as well as for some text-to-speech output.

Flow chart of steps converting MS-Word docs with equations created in MS Equation Editor to LaTeX (MathJax Flavor). Then wrap up by exporting the whole document as plain text.

Additionally, eText.Illinois recommends gathering written-out text scripts of how each equation would be read aloud. This is basically a transcript in words of the symbolic representation of the equation. As a test, a Graduate Assistant provided this transcription for the Introduction to Bioenvironmental Engineering digital textbook.
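As a purely illustrative sketch of such a pairing, the LaTeX source and its spoken script could be kept side by side. The equation here is the ideal gas law, chosen as an assumption and not taken from the textbook, and the transcript wording is hypothetical.

```latex
% Spoken transcript (hypothetical wording):
% "P times V equals n times R times T."
\[ PV = nRT \]
```

Keeping the transcript next to the equation source lets both travel together through each conversion step.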