Getting Published Digitally.

This is fabulous. No company or digital personality gets left out.


eText Best Practices for Document Creation

Digital Textbook Format Workflow

For Etext Best Practices project.

Concerns of Format

Too often accessibility is an afterthought. The goal of eText.Illinois was and is to develop a workflow that creates, or starts from, a core document that fully separates semantic markup from presentation styling. This separation serves document curators best. Wikipedia does a good job of summing up the idea:

Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in webpages rather than merely to define its presentation (look). Semantic HTML is processed by regular web browsers as well as by many other user agents. CSS is used to suggest its presentation to human users.

(via Semantic HTML – Wikipedia, the free encyclopedia)
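To make the difference concrete, here is a minimal sketch in Python (my choice of language, not part of the eText.Illinois toolchain) that pulls a heading outline out of two HTML fragments. The sample fragments and the names `OutlineParser` and `outline_of` are made up for illustration. Both fragments would look about the same in a browser, but only the semantic one gives software anything to work with.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text fragments inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline_of(html):
    """Return the heading outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Semantic: headings are marked up as headings.
semantic = "<h1>Soils</h1><p>Intro.</p><h2>Texture</h2><p>Sand, silt, clay.</p>"
# Presentational: a similar look, but only font styling.
presentational = (
    '<p><font size="6"><b>Soils</b></font></p><p>Intro.</p>'
    '<p><font size="5"><b>Texture</b></font></p><p>Sand, silt, clay.</p>'
)

print(outline_of(semantic))        # [(1, 'Soils'), (2, 'Texture')]
print(outline_of(presentational))  # []
```

A screen reader, a table of contents generator, or an eBook converter is in the same position as this little parser: it can only use structure that is actually in the markup.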

This HTML and CSS approach is a well-accepted, basic, and solid best practice for web content creation. With a long-format “book,” however, much of the authoring process takes place away from the web professional and solely under the control of the author. All too often, adherence to purely semantic document creation slips during this period of crafting the book.

For nearly all authors in academia, the writing tool of choice is Microsoft Word. This is a double-edged sword. On one side, MS Word’s ability to “Track Changes,” especially across multiple users, is an important collaboration tool. But beyond Track Changes, the remainder of MS Word’s unique and powerful toolset is primarily tied to the printed document.

The other edge of the sword is the set of tools that create visually beautiful pages that are not necessarily semantically organized for conveying meaning. The main problem is that Word gives the writer too many tools for presentation at a time when the focus should be on clear writing and organization. Arbitrary application of formatting such as font size, typeface, indenting, and underlining may visually deliver meaning and emphasis, but it does so without any semantically correct markup. While semantic meaning can be created through consistent application of MS Word’s paragraph “Styles,” it is still difficult to impossible to extract clean HTML that moves away from making the web look like the printed page. There is no process in MS Word that puts the screen ahead of the page. In an eText project, that’s backwards.

A well-crafted HTML document, by contrast, can result in a beautiful printed page with the appropriate CSS. If advanced features such as an automatic printed table of contents, indices, or footnotes/endnotes are needed, the HTML document can more easily be adapted to print than the process can be performed in the reverse direction.

A (Not the only) Solution

While MS Word is perceived as the ubiquitous format, in reality plain text is even more so. With the sole caveat of the missing “Track Changes” feature, plain text as an authoring format is without peer for long-term use and compatibility. Working with plain text is a bit foreign to many today, but it is the basis of all text creation and consumption on the screen. A rising movement among professional writers is the move to drafting and shaping their narrative as plain text files. Plain text keeps the mind on the content and not the look of the page. Additionally, plain text lends itself to working on the mobile and tablet devices that are becoming more and more popular. As an example, this document is being created on an iPad with the “Writing Kit” application.

The goal is to create a document with the least amount of friction as it moves through the editorial and production process. Plain text captures the words, but suggestions for meaningful structure such as headings and emphasis on particular words must also be included. Once we have words and structure in a single document as plain text the destination of the document is completely flexible.

The Notion of an ÜberFormat

The flexible destination idea is critical to the whole workflow described here. Documents created could go to one or many of the following destinations:

  • Printed for distribution
  • Specialized document for reading by text to speech reading devices
  • Printed to Braille type
  • Distributed via digital book formats
  • A website
  • Integrated into a learning management system
  • A blog
  • Master document draft to go to a publisher

This list is not complete because a text document can end up anywhere. The überformat should be thought of like this:

From a single semantically correct document a conversion process can deliver a variety of formats for screen display and print.

Semantically Correct – Structural Formatting That Carries Meaning

For the smoothest conversion to other formats that keeps all the structural meaning of the document, proper markup should be used to:

  • Create structured, nested sections with Headings
  • Place more importance on words or phrases. There are two levels: Strong and Emphasis.
  • Create lists that are either unordered (bullets), ordered (numbered), or a combination of the two
  • Organize tabular information in row/column tables
  • Designate blockquotes
  • Hyperlink words or phrases
  • Mark up blocks of code
  • Insert horizontal rules (separating lines)
  • Include images

Where appropriate these items should have the appropriate “alt text” and “long description” to provide low vision users with options to gather meaning from the text.
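Checking for alt text is easy to automate once the document is HTML. Below is a small sketch, again in Python with only the standard library, that lists images with no usable alt text. The class name, function name, and sample file names are hypothetical. Note one assumption baked in: it flags empty alt attributes for review, even though an empty alt can be a legitimate choice for purely decorative images.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Record the src of every <img> whose alt text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Flag images with no alt attribute or a whitespace-only one.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images that need alt text review."""
    audit = AltTextAudit()
    audit.feed(html)
    audit.close()
    return audit.missing


page = (
    '<p><img src="fig1.png" alt="Flow chart of the document workflow">'
    '<img src="fig2.png">'
    '<img src="fig3.png" alt=" "></p>'
)

print(images_missing_alt(page))  # ['fig2.png', 'fig3.png']
```

A check like this only finds missing text. Whether the alt text that is there actually conveys the meaning of the image still takes a human reader.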

Optimum Flow – How Can it Work?

The process of creating digital documents that provide the most flexibility with the least amount of labor to output in a variety of needed formats requires a commitment all along the creation path of the document. Any creation path and set of document formats can eventually create an überformat document, but the goal is to make it as painless as possible for everyone involved and to save time and money. Without this process goal, the ability to scale up document production is only a dream.

The basic idea is the seamless transfer of documents between authors, editors, proofreaders, and final digital publishers. Editing a text file with appropriate markup, as discussed above, provides for the best process. The solution that has been used for the last year and a half in the eText.Illinois initiative is Markdown formatting. Markdown provides appropriate options for semantic structure and keeps authors and editors focused on the actual content and not the “making it pretty” trap that full-blown word processors tempt the writer into.

Markdown Overview

The creator of Markdown, John Gruber, is a well-known tech pundit and former programmer, most notably with the application BBEdit. As a writer for both print and screen, he devised Markdown to ease his own workflow.

Philosophy (From John Gruber)

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

To this end, Markdown’s syntax is comprised entirely of punctuation characters, which punctuation characters have been carefully chosen so as to look like what they mean. E.g., asterisks around a word actually look like emphasis. Markdown lists look like, well, lists. Even blockquotes look like quoted passages of text, assuming you’ve ever used email.

(via Daring Fireball: Markdown Syntax Documentation)

See the formatting for Markdown from John Gruber’s Markdown Syntax Page
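To show how little machinery the punctuation needs, here is a toy converter in Python for a tiny slice of Markdown: `#` headings, `*` bullets, `*emphasis*`, and `**strong**`. This is a sketch for illustration only. It is not Gruber's Markdown.pl and not what eText.Illinois runs, it treats every non-blank line as its own paragraph, and the function names and sample text are made up.

```python
import html
import re


def inline(text):
    """Escape HTML, then convert **strong** and *emphasis* spans."""
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def to_html(markdown):
    """Convert a tiny Markdown subset: # headings, * bullets, paragraphs."""
    out = []
    in_list = False
    for line in markdown.splitlines():
        line = line.strip()
        is_item = line.startswith("* ")
        # Close an open list as soon as a non-item line appears.
        if in_list and not is_item:
            out.append("</ul>")
            in_list = False
        if not line:
            continue
        heading = re.match(r"(#{1,6}) (.+)", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(line[2:])}</li>")
        else:
            # Simplification: each non-blank line becomes one paragraph.
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


sample = """# Soil Texture

Texture is *relative*, not **absolute**.

* Sand
* Silt
"""

print(to_html(sample))
```

The sample reads fine as plain text, and the same characters turn into an h1, a paragraph with em and strong, and a two-item list. For real work, use an established Markdown processor rather than a toy like this.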

The eText.Illinois Workflow

In a perfect world we would have authors writing, or at least delivering, their works to the editors as Markdown (or another similar lightweight markup language). The real-world experience with three different textbook projects, however, has involved three unique document conversions.

  1. Introduction to Bioenvironmental Engineering was an MS-Word document with well over 100 scientific equations.
  2. ACES 101: Contemporary Issues in ACES was compiled from all sorts of document sources including MS-Word, PDF, webpage text, and newly written content in plain text.
  3. Writing @ The University of Illinois was delivered completely in HTML.

Learning from these three experiences, the model eText.Illinois is carrying forward to new projects is based on the following process flow:

Flow chart that traces the process documents take as they come from authors and then are prepared for the überformat. Once prepared the documents can be used to provide all possible publishing endpoints for print, web, reading devices and accessibility hardware and software readers.


The Introduction to Bioenvironmental Engineering book required the conversion of scientific equations created in MS-Word. Through much research and experimentation it was discovered that multiple methods must be employed to have equations that can be properly displayed in a browser and in ePub-formatted eBooks, copied into and used in advanced mathematics software such as Mathematica, and used as a source for reading by text to speech software and hardware. There isn’t one magic format that allows publishing to all major web browsers without proprietary and limiting software plugins and applications. Specifics on the equation methods used in HTML and digital formats will follow.

Of concern at this stage is the ability to extract equations from the MS-Word source document. Additional steps are added to the workflow above to convert MS Equation Editor or MathType equations to LaTeX-formatted equations, which become the backbone for screen display as well as some text to speech.

Flow chart of steps converting MS-Word docs with equations created in MS Equation Editor to LaTeX (MathJax Flavor). Then wrap up by exporting the whole document as plain text.

Additionally, eText.Illinois recommends gathering written-out text scripts of how each equation would be read. This is basically a transcript in words of the symbolic representation of the equation. As a test, a Graduate Assistant provided this transcription for the Introduction to Bioenvironmental Engineering digital textbook.
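As an illustration of pairing a LaTeX equation with its written-out reading, here is a sketch using the quadratic formula. The equation is a stand-in chosen because it is familiar; it is not taken from the textbook, and the wording of the transcript is only my guess at how such a script might read.

```latex
% Equation in LaTeX, using display-math delimiters that MathJax understands
\[
  x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\]
% Written-out transcript for text to speech (illustrative wording):
% "x equals a fraction. The numerator is negative b plus or minus the
%  square root of b squared minus four a c. The denominator is two a."
```

Keeping the transcript next to the LaTeX source means both travel together through the rest of the workflow.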

With All the Digital Textbook Noise I Thought I’d Share Our Solution

The Apple announcements, as expected, have stirred things up in the digital textbook discussion. I’ve been heavily involved with digital textbooks for the last 2 1/2 years and have learned numerous lessons, experienced pitfalls and worked toward fabulous instructional opportunities. I will continue to share those in this blog.

I thought I would share a brief video that discusses eText.Illinois, the digital textbook system we’ve developed in the College of ACES Information Technology and Communication Services.

Video Overview of the eText.Illinois Digital Textbook System

Apple’s Education Announcement

Education Announcement Image

Lots of rumors swirling today – will the announcement be an iTunes-esque selling of textbooks from the major publishers -OR- self-publishing?

I’m hoping for both.

Self-Publishing – Power to the Local Author

eText.Illinois, the digital textbook system my group developed, plays on the need for locally produced, short run (can you say that in the digital world?) textbooks. We use the ePub format at the core, but the main consumption of the book is through our webapp. It is a big production getting the texts formatted and ready. Unfortunately, the books/docs are usually written in MS-Word, and any export, of course, is a mess. We revert everything back to plain text.

I’ve come to love editing in Markdown (actually MultiMarkdown since we need tables). Getting a cleanly marked up book is pretty straightforward; however, it’s hard to tell a professor to write in a text editor. Will there be a slick and enticing Apple authoring tool? The buzz is teasing us with a “GarageBand for e-books.” That would be cool, particularly if it creates clean, ADA accessible docs.

So Much for the Self-Publishing Angle. How About the Big Publishers?

There are so many textbook purchasing models available to students: purchased, purchase – sell back (used), rental, digital textbook access for a specific amount of time, and purchase of digital textbooks with and without digital rights management. The model that the publishers don’t like is the used book’s purchase – sell back cycle, which effectively removes the publisher from those additional sales and profits as books get used and sold over and over again. The publishers see digital sales, without resale, as their chance back into the profit game.

A sales model for textbooks similar to the iTunes music model, with Apple operating the cash register, paying the wholesale publishers, and dealing with consumption devices like the iPad, is a very nice model for all involved. Apple did transform the music industry in this way, like it or not.

The interesting twist is that iTunes is not just about the major record labels, it’s got the indie labels and even the solo artist/act with a couple of records. These lower sales volume releases co-mingle in an online storefront with the big labels. The blending of all ranks of music business is effective and can drive sales for the little guy. (I know, I’ve played on and engineered quite a few records on iTunes.)

Can this mixture of big companies and little guys work with digital textbooks?

The missing piece from our eText.Illinois model is the legal use of published, copyrighted content within a locally authored book without all the hoop jumping of permissions. If our local authors, and we as producers, had a simple means to blend copyrighted content into self-published eBooks, I would be extremely excited. Companies like McGraw-Hill already have an “a la carte” book building process through their own web interfaces and formats. If Apple could provide the a la carte selection of copyrighted content in their authoring tool THAT WOULD BE A GAME CHANGER.

Say you have a course “reader” that contains a number of articles and reprints from books, journals, etc. The local copy shop gets permissions and prints it up for the students to buy and read. The professor/author may want to write introductions, add some glue between articles, and add a postscript. Maybe insert some assignments in between articles. Definitely value-added content.

In the epublishing dream world I am describing, Apple’s authoring tool would provide the ability to “browse” – like Apple’s iLife Media Browser – copyrighted content from the big publishers. Each piece of copyrighted content would have a unit cost that dynamically adds to the final book price. The “meter is running,” depending on what and how much the author finds and inserts into their book.

So the end price of the locally authored book is:

Author’s Price + Autocalculated License Fee to Publisher(s) = Textbook Cost
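Spelled out as a quick Python sketch, the running meter is just a sum. The prices below are invented for illustration, and `textbook_cost` is a hypothetical name, not anything Apple or a publisher offers.

```python
from decimal import Decimal


def textbook_cost(author_price, license_fees):
    """Author's price plus the sum of the per-item license fees."""
    return author_price + sum(license_fees, Decimal("0"))


# Hypothetical figures, for illustration only.
author_price = Decimal("12.00")
license_fees = [Decimal("1.50"), Decimal("0.75"), Decimal("2.25")]

print(textbook_cost(author_price, license_fees))  # 16.50
```

Every article the author drops into the book adds one more entry to `license_fees`, so the author can watch the student’s price change while building the book.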

The student buys the book from the iAcademicBookstore (or whatever Apple calls it) and everyone is happy:

  • Publishers get accurate licensing with no overhead in dealing with numerous small books.
  • Authors can use copyrighted content without worry of infringement. If it’s easy to use, the authors will use copyrighted content more liberally.
  • Locally authored books generate local revenue for author, department, college, etc., whatever is appropriate.
  • If the myth of digital books holds true, student costs should go down. (We’ll see.)

Win, Win?