Tagging, mark-up and XML
From the first days of DTP to tomorrow's XML-based repurposing, Tom Arah explores the central importance of the mark-up tag.
It's established wisdom that Desktop Publishing (DTP) and the whole computer-based design industry that followed came into being thanks to the advent of the Mac and the PageMaker application that ran on it. As a PC/MS-DOS user at the time, I have to admit to many twinges of envy when comparing platforms back in the mid-to-late 1980s, but I never felt the same when comparing applications. To my mind the PC-based design leader, Ventura Publisher, was easily a match for PageMaker.
The wysiwyg PageMaker is credited with inventing the whole concept of DTP.
What made both applications revolutionary was that they were built on the work done at Xerox's Palo Alto Research Centre (PARC) on Graphical User Interfaces (GUIs). Previously, typesetting machines had acted much like glorified typewriters, with the operator inputting both text and formatting codes to produce galleys of typeset text which then needed to be cut and pasted to produce the final design. With PageMaker under Mac OS and Ventura Publisher under an integrated version of GEM (Digital Research Inc's pre-Windows Graphical Environment Manager), the end user was able to work on a wysiwyg version of the actual page, changing the layout, content and formatting onscreen.
However, whereas PageMaker totally embraced this new visual, hands-on approach, Ventura Publisher kept a foot in both camps. In particular, it retained much of the old typesetting system's reliance on marked-up text within its new GUI-based environment. There were two crucial differences. First, rather than keying escape codes into the stream of text, you applied "tags". Second, unlike the typesetter's fixed codes, these tags and their formatting were entirely in the hands of the user. This distinction between customizable, logical tags and fixed formatting codes was - and is - fundamental.
To begin with, tags are far more efficient. Whereas a typesetter operator has to input separate codes to switch font, point-size and leading changes on and off, by creating a single <heading> tag with the same attributes you can apply all of those formatting options with one click. In addition, the use of tags guarantees consistency and accuracy, whereas physically entering codes can easily lead to costly mistakes that only become apparent when the galleys are produced.
What really makes tags powerful is their flexibility. This might seem strange, as the requirement to create a new tag every time you want to apply new formatting seems anything but adaptable. However, the beauty of tags is that, because they are logical constructs rather than physical codes, you can change what they mean at any time. You can instantly change the typeface of all your paragraphs that have had the <heading> tag applied to them, for example, or tweak your master <body> tag's point size to ensure that your copy fits its layout - and this is just as true whether your document is one page long or a thousand.
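To make the logical-tag idea concrete, here's a minimal Python sketch, assuming a stylesheet can be modelled as a simple dictionary that maps tag names to formatting attributes. The tag names, fonts and sizes are purely illustrative. The content only ever refers to tag names, so a single change to the master body tag reformats every body paragraph, however long the document.

```python
# Hypothetical stylesheet: logical tag names mapped to formatting attributes.
stylesheet = {
    "heading": {"font": "Helvetica", "size_pt": 18, "leading_pt": 22},
    "body": {"font": "Times", "size_pt": 10, "leading_pt": 12},
}

# Content refers only to tag names, never to fonts or sizes directly.
document = [
    ("heading", "Price list"),
    ("body", "All prices include delivery."),
    ("body", "Offers end on the last day of the month."),
]

def render(doc, styles):
    """Resolve each paragraph's logical tag against the current stylesheet."""
    lines = []
    for tag, text in doc:
        s = styles[tag]
        lines.append(f"{s['font']} {s['size_pt']}/{s['leading_pt']}pt: {text}")
    return lines

print(render(document, stylesheet)[1])  # Times 10/12pt: All prices include delivery.

# Copyfitting: one edit to the master body tag changes every body paragraph.
stylesheet["body"]["size_pt"] = 9
stylesheet["body"]["leading_pt"] = 11
print(render(document, stylesheet)[1])  # Times 9/11pt: All prices include delivery.
```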
At this stage many readers are probably thinking "what's all the fuss about?" It's not as if Ventura is the only program that provides such efficiency, consistency and flexibility - nowadays every DTP and WP application on the planet offers the benefits of "style"-based handling. It's true that they've imported the general idea but, within their proprietary and binary documents in which content and formatting are inseparably mixed, such style handling is little different from master escape codes that you can easily find and replace. A Ventura Publisher tag is completely different to a word processor style.
To begin with, Ventura Publisher took the idea of what a tag could do and ran with it. On top of the obvious text and paragraph formatting, it offered control over more advanced features such as background colour and ruling lines. It also managed page layout as well as text formatting through features such as forced breaks and the ability to line up paragraphs across the page. Later versions added tag-based control of multi-page tables, frame tags, page tags and even tag-based processing in the form of vertical justification settings that intelligently distributed vertical spacing to ensure aesthetically pleasing layouts. In other words anything and everything was handled by tags. The end result was that, after creating the tags in the first place, the actual design and layout of the publication became almost automatic.
Ventura has always been built upon tags.
Because layout as well as formatting was controlled by tags, the text content in a Ventura Publisher publication could remain as a single continuous stream, unlike PageMaker with its cut-and-paste approach. And within this text stream there was no direct formatting; instead, all paragraph tags were simply marked up with "@TAGNAME = " at the beginning of the line, while local tags were marked up with the now familiar angle brackets <>. With such a simple underlying mark-up system there was no reason that the text shouldn't be stored as plain ASCII (American Standard Code for Information Interchange) - which is exactly what Ventura Publisher did.
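The simplicity of this mark-up is easy to demonstrate. Below is a rough Python sketch of reading such a tagged text stream. It assumes a simplified version of the format: paragraphs separated by blank lines, "@TAGNAME = " at the start of a tagged paragraph, and untagged paragraphs falling back to a default tag (called "BODY TEXT" here as an assumption). The local codes in the sample are purely illustrative and are collected rather than interpreted.

```python
import re

# Assumed, simplified rules for a Ventura-style TXT file:
#  - paragraphs are separated by blank lines
#  - a paragraph tag is written as "@TAGNAME = " at the start of the paragraph
#  - untagged paragraphs fall back to a default tag
#  - local codes sit inside angle brackets and are collected, not interpreted
PARA_TAG = re.compile(r"^@([^=\n]+?) = ")
LOCAL_CODE = re.compile(r"<([^<>]+)>")

def parse_tagged_text(text, default_tag="BODY TEXT"):
    """Split a tagged text stream into tag, text and local-code records."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        match = PARA_TAG.match(block)
        if match:
            tag, body = match.group(1), block[match.end():]
        else:
            tag, body = default_tag, block
        paragraphs.append(
            {"tag": tag, "text": body, "local_codes": LOCAL_CODE.findall(body)}
        )
    return paragraphs

sample = (
    "@HEADING = Spring price list\n"
    "\n"
    "@PRICE LINE = Widgets <B>4.99<D>\n"
    "\n"
    "Prices are subject to change."
)
for para in parse_tagged_text(sample):
    print(para["tag"], "|", para["text"], "|", para["local_codes"])
```

Because the file is plain text, nothing more than a couple of regular expressions is needed to recover its full logical structure.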
Ventura Publisher's tagged ascii text stream has no in-built formatting.
What this meant was that the marked-up text content in a Ventura Publisher publication remained live. Each time the publication was opened, the component TXT file was re-imported and laid out and formatted on the fly - a process that, thanks to the efficiency of tagging, was still almost instantaneous and a lot quicker than PageMaker. And crucially, because the live text was stored in an open standard format, it could be edited with any other application that could read and write ascii text.
In my case this meant that I could not just create, but also edit the content in a laid-out Ventura Publisher document at any time using my word processor of choice, PC Write. More than this, because the markup tags were also there in the TXT file, I could also edit these and so control the final typeset formatting and layout. The end result was that I actually did more design work in PC Write than in Ventura! An even more efficient workflow for a regular price list involved setting up export macros in a dBase file to add tagnames and codes for tickable boxes. Once set up, all I had to do every month was export the new TXT file, copy it over the old and load Ventura Publisher ready for output (the sort of workflow Ventura later addressed directly with its dedicated Database Publisher add-on).
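A database export of this kind amounts to little more than writing tag names in front of each field. The following Python sketch is a hypothetical stand-in for those dBase export macros, not a reconstruction of them: it assumes records with category, name and price fields and uses made-up CATEGORY and ITEM tag names, with blank lines between paragraphs. The tickable-box codes are left out because their exact form isn't given here.

```python
def export_price_list(records):
    """Turn database rows into a Ventura-style tagged text stream.

    Tag names (CATEGORY, ITEM) and field names are illustrative assumptions.
    """
    lines = []
    current_category = None
    for rec in sorted(records, key=lambda r: (r["category"], r["name"])):
        # Emit a category heading whenever the category changes.
        if rec["category"] != current_category:
            current_category = rec["category"]
            lines.append(f"@CATEGORY = {current_category}")
        # A tab separates the item name from its price.
        lines.append(f"@ITEM = {rec['name']}\t{rec['price']:.2f}")
    # Paragraphs are separated by blank lines.
    return "\n\n".join(lines) + "\n"

records = [
    {"category": "Paper", "name": "A4 copier paper", "price": 3.5},
    {"category": "Ink", "name": "Black cartridge", "price": 12},
    {"category": "Paper", "name": "A3 copier paper", "price": 6.25},
]
txt = export_price_list(records)
print(txt)
```

Each month the output would simply be written over the old TXT file, and because the chapter only links to that file, the next load picks up the new prices automatically.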
By keeping content separate from style, it's not just the content handling that becomes much more efficient. In the same way as it kept all content as linked TXT files, Ventura Publisher kept all style information in a separate STY stylesheet file. Disappointingly, this STY file was in binary format and so not easily editable (though a number of third-party utilities soon appeared to do the job). However, it did mean that by swapping stylesheets containing different settings for the same logical tags, you could change the look and feel of an entire document - and do so easily and immediately.
In effect the TXT (and linked image files) acted as the ingredients in a Ventura Publisher publication with the STY stylesheet acting as the cooking instructions. Bringing them together in the final recipe were the CHP chapter files (effectively ascii lists of TXT, IMG and STY files) and the master PUB publication file (an ascii list of CHP files). The beauty was that all elements were open and editable so that you could change the ingredients, instructions and/or recipe at any time to instantly cook up an entirely new production. In other words, Ventura Publisher offered repurposing long before the idea was appreciated.
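The recipe structure can be sketched as a simple tree walk. This Python example assumes a deliberately simplified model in which each PUB or CHP "file" is just a whitespace-separated list of the files it points to (real CHP files held more than that), and an in-memory dictionary stands in for the disk. The file names are hypothetical. Walking the tree gathers the leaf ingredients and reports any linked file that isn't there.

```python
def resolve(name, files, missing=None):
    """Walk a PUB -> CHP -> TXT/IMG/STY recipe.

    Returns the leaf ingredients found and the referenced files that are
    missing. `files` maps file names to contents (a stand-in for the disk).
    """
    if missing is None:
        missing = []
    if name not in files:
        missing.append(name)
        return [], missing
    extension = name.rsplit(".", 1)[-1].upper()
    if extension in ("PUB", "CHP"):
        # Simplified model: PUB and CHP files are plain lists of linked files.
        leaves = []
        for child in files[name].split():
            child_leaves, _ = resolve(child, files, missing)
            leaves.extend(child_leaves)
        return leaves, missing
    return [name], missing

files = {
    "CATALOG.PUB": "INTRO.CHP\nPRICES.CHP",
    "INTRO.CHP": "CATALOG.STY\nINTRO.TXT\nLOGO.IMG",
    "PRICES.CHP": "CATALOG.STY\nPRICES.TXT",
    "CATALOG.STY": "(binary stylesheet)",
    "INTRO.TXT": "@HEADING = Welcome",
    "LOGO.IMG": "(image data)",
    "PRICES.TXT": "@ITEM = Widgets",
}

print(resolve("CATALOG.PUB", files))
# Archive only the chapter file and every ingredient goes missing:
print(resolve("PRICES.CHP", {"PRICES.CHP": files["PRICES.CHP"]}))
```

Swapping in a different STY file name in a CHP list is all it takes to restyle that chapter, which is the repurposing idea in miniature.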
And that was the problem - Ventura Publisher was way ahead of its time. The open workflows and ability to repurpose were largely irrelevant to most users getting into DTP for the first time and wanting to produce, say, an advert. For these users there was no benefit to editing text in an external application, while the need to produce, specify and then apply complex tags for one-off formatting was a huge chore. Worse, Ventura Publisher's multiple file structure was a nightmare for file management - archive just the CHP file to floppy and, when you come to reopen it, all you have is a list of missing ingredients! And finally, what's the point of repurposing? For most jobs you're only likely to go to print once, so why not just get it right and do it in a simple, standalone, hands-on environment like PageMaker?
When PageMaker moved over to the PC's new Windows environment, the writing was on the wall for the GEM-based Ventura Publisher. And things got worse. After a disastrously buggy move to Windows, Ventura was eventually bought by Corel (which incidentally had begun life making Ventura add-ons). The company completely misunderstood the program's strengths, immediately cutting back on the use of tags in favour of direct formatting and bundling all the previously separate and open publication components into a single binary and proprietary VP file. Eventually Corel pulled things together, and the most recent version 8's combination of design-intensive and long document handling is impressive. Sadly it came too late for most users, who had already jumped ship.
Most of the design-oriented users turned to PageMaker and QuarkXPress, but for those long document publishers who had really profited from Ventura Publisher's tag-based approach, there was really only one alternative - FrameMaker. Developed by Frame Technology as a cross-platform long document technical publisher, FrameMaker was a high-end niche program most at home on Unix workstations. When Adobe bought the company, however, FrameMaker looked ready to go mainstream.
The latest FrameMaker 7 includes SGML and XML tag support.
Unlike Ventura Publisher, FrameMaker didn't keep content and style information separate but combined them in its own proprietary and binary FM file format. Apart from this, though, it took the single text flow with mark-up principle to heart and even extended the idea of tagging into new areas, such as conditional tags that enabled the management of multiple versions of the same publication from the same core content. In its dedicated FrameMaker+SGML incarnation it even offered support for SGML (Standard Generalized Markup Language), the original markup language for creating other markup languages - tagging on steroids.
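Conditional tagging is simple to model. This Python sketch is only an illustration of the idea, not FrameMaker's actual mechanism: it assumes each paragraph carries a set of condition names, that unconditional paragraphs always appear, and that a conditional paragraph appears if any of its conditions is selected. The condition names and text are hypothetical. Two manuals come out of the same core content.

```python
# Hypothetical content: each paragraph carries a (possibly empty) set of
# condition tags. Paragraphs with no conditions appear in every version.
content = [
    ("Installing the software", set()),
    ("Double-click SETUP.EXE to start.", {"Windows"}),
    ("Drag the application to your hard disk.", {"Mac"}),
    ("Registration is free.", set()),
]

def build_version(paragraphs, show):
    """Keep unconditional paragraphs plus any tagged with a shown condition."""
    return [text for text, conditions in paragraphs if not conditions or conditions & show]

windows_manual = build_version(content, {"Windows"})
mac_manual = build_version(content, {"Mac"})
print(windows_manual)
print(mac_manual)
```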
What really impressed about FrameMaker was its recognition of the importance of repurposing and its understanding of where this was truly relevant - not for print but for onscreen delivery. With its development of the ascii-based and so platform-independent MIF (Maker Interchange Format) file format and of the FrameViewer browser utility it effectively created its own hypertext-based electronic publishing system.
Like Ventura Publisher, FrameMaker was ahead of its time, but of course all attempts to create a proprietary electronic publishing system were blown out of the water by the arrival of the World Wide Web. What made the Web so successful was the fact that it too was built on the principle of a logically tagged stream of text, this time based on Tim Berners-Lee's HyperText Markup Language (HTML). Using just a handful of tags - <H1>, <H2>, <EM>, <ADDRESS> and so on - embedded in simple non-binary text files, HTML enabled any user on any platform to produce content. And because the tags were logical and so free to interpret, developing a viewing application was relatively trivial, which led to the advent of the free browser.
With universal and near-free authoring and universal and near-free access, the Web exploded to become a publishing medium just as important as print. In doing so, it created a real demand for repurposing - designing for paper alone was no longer enough. Suddenly the limitations of the cut-and-paste, hands-on formatting of PageMaker-style applications were exposed as a dead end: great for design-intensive print but near useless for Web output. By comparison, the tagged text stream approach of Ventura and FrameMaker was tailor-made for Web output. All you had to do was translate from one mark-up based system to another.
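How little that translation involves is easiest to see in code. The Python sketch below assumes the tagged text has already been split into pairs of tag name and paragraph text, and uses a made-up mapping from paragraph tag names to HTML elements. Each paragraph becomes one element, with its text escaped using the standard library's `html.escape`; unknown tags fall back to a plain paragraph.

```python
import html

# Hypothetical mapping from Ventura-style paragraph tag names to HTML elements.
TAG_TO_HTML = {
    "HEADING": "h1",
    "SUBHEAD": "h2",
    "BODY TEXT": "p",
    "ADDRESS": "address",
}

def to_html(paragraphs, fallback="p"):
    """Translate (tag, text) pairs into HTML, one element per paragraph."""
    out = []
    for tag, text in paragraphs:
        element = TAG_TO_HTML.get(tag, fallback)
        out.append(f"<{element}>{html.escape(text)}</{element}>")
    return "\n".join(out)

page = to_html([
    ("HEADING", "Tagging & mark-up"),
    ("BODY TEXT", "Tags are logical."),
    ("PRICE LINE", "Widgets 4.99"),
])
print(page)
```

A fuller converter would also map local codes such as bold and italic to inline elements, but the principle stays the same: one logical tag set translated into another.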
The tagged text stream approach is well suited to HTML output.
Unfortunately, both Adobe and Corel, with their heavy design-for-print investment, were slow to see the importance of the Web and were too busy on other projects to fully capitalize. In any case there was a more fundamental hurdle to overcome in the very nature of HTML. Essentially, the two factors that made HTML such a success also turned out to be its biggest limitations. First, HTML's limited number of built-in tags simply doesn't provide enough scope or flexibility to cover all publishing needs. Second, the way that tags are left open to interpretation by the browser means that it is impossible to pin down your design with any certainty. Compared to print design, designing for HTML is like trying to sculpt jelly with one hand tied behind your back.
Fortunately, the solution is already clear. XML (eXtensible Markup Language) tackles both problems head on. To begin with, the tags in XML are extensible and fully customizable. Like SGML, XML is a markup language for creating other markup languages, so users can create their own tags as needed. Second, the way that these tags can be handled and interpreted is strictly laid down and enforced. This is crucial for XML's role in machine-readable data handling and e-commerce, but also for publishing. In fact an XML file itself has no formatting whatsoever - once again it's plain text, this time based on Unicode, the superior replacement for ascii - but the way its tags are interpreted and appear onscreen and in print can be precisely controlled through a stylesheet.
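Both halves of that argument, custom tags plus strictly enforced rules, can be seen with Python's standard `xml.etree.ElementTree` module. In this sketch the `<article>`, `<heading>` and `<body>` element names are invented on the spot (which is the point of extensibility), and a plain dictionary stands in for a real stylesheet as an assumption. The last part shows that a mismatched end tag is rejected outright rather than guessed at, as a forgiving HTML browser might.

```python
import xml.etree.ElementTree as ET

# Custom, user-defined tags: the XML carries structure but no formatting.
doc = """<article>
  <heading>Tagging, mark-up and XML</heading>
  <body>Tags are logical constructs.</body>
</article>"""

# Hypothetical stylesheet kept outside the content, keyed by element name.
styles = {"heading": "bold 18pt", "body": "regular 10pt"}

root = ET.fromstring(doc)
for element in root:
    print(f"[{styles[element.tag]}] {element.text}")

# XML's rules are enforced: a mismatched end tag raises an error.
try:
    ET.fromstring("<article><heading>Oops</body></article>")
except ET.ParseError as err:
    print("rejected:", err)
```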
A tagged XML file with and without formatting.
This all sounds very familiar, as the parallels with the old Ventura Publisher system are clear. Replacing Ventura's tagged TXT content file is the tagged XML file (in fact, thanks to its strictly defined rigour, the XML file can also be seen as replacing the original dBase file). Replacing Ventura's STY stylesheet is the text-based and open CSS (Cascading Style Sheets) standard, which is in turn being replaced by the XML-compliant XSL (eXtensible Stylesheet Language). And standing in for the master CHP and PUB files to control how the elements of the publication can be brought together is the DTD (Document Type Definition) file - again being rewritten to be made XML-compliant. In fact with SVG (Scalable Vector Graphics), even the graphics can be handled in XML format!
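To show that an SVG graphic really is just more tagged text, here is a small Python sketch that builds one with the standard library's `xml.etree.ElementTree`. The shapes, sizes and colours are arbitrary choices for illustration; only the SVG namespace URI is fixed by the standard. The result is an ordinary XML string that any text tool, or any XML parser, can read back.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace

# A navy box with a white text label: the graphic is built from tags.
svg = ET.Element(f"{{{SVG_NS}}}svg", width="120", height="40")
ET.SubElement(svg, f"{{{SVG_NS}}}rect", x="0", y="0",
              width="120", height="40", fill="navy")
label = ET.SubElement(svg, f"{{{SVG_NS}}}text", x="10", y="25", fill="white")
label.text = "Ventura"

markup = ET.tostring(svg, encoding="unicode")
print(markup)

# Because it is plain XML, the graphic can be parsed and queried like any text.
parsed = ET.fromstring(markup)
print(parsed.find(f"{{{SVG_NS}}}text").text)
```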
In other words XML takes Ventura Publisher's recipe-based approach built on the central idea of extensible logical tags and takes it to its logical conclusion. With XML everything is stored and handled as marked up text - the text content, formatting information, underlying rules and even the graphics. It's a hugely powerful idea that promises to lead to the nirvana of open, integrated and automatic multi-channel publishing workflows.
And now, at last, all the major publishing developers have woken up to the potential. The recent QuarkXPress 5 again demonstrated the limitations of HTML output but, by bundling the XML-based avenue.Quark XTension, it also showed that Quark has its eye on the future. InDesign's HTML output is similarly weak, but it too offers the ability to tag and view XML structure as well as to import and export XML. The latest FrameMaker 7 has gone even further, integrating the previously separate +SGML add-on and offering the ability to export XML and CSS files and even to open XML files that it has previously created - so-called "roundtrip XML".
These are big advances, but even FrameMaker 7's round-tripping isn't a complete solution as, while you work on the publication, the content and style are still inextricably intertwined in the closed and binary FM format. When Tim Bray, one of the co-editors of the XML specification, talked about the inspiration behind the development of XML, he said that "The enemy, explicitly was Microsoft Word, FrameMaker, Quark - all these fragile, proprietary binary file formats that lock up the inventory of human knowledge." It looks like it's a lesson still to be learned. To really get the benefits of XML-based workflows, everything should remain in XML at all times and be brought together for output on the fly, much as Ventura Publisher managed its content 15 years ago.
Who knows, maybe it could still happen. There are even rumours that Corel has finally seen the light and is hoping to bring Ventura back from the brink with a new version majoring on XML. While that would certainly be welcome I won't hold my breath. And in the meantime I can't help wondering how much more advanced the publishing industry - and IT as a whole - would be today if only Ventura Publisher had first appeared on the Mac.
designer-info.com: independent, informed, intelligent, incisive, in-depth...