Publishr

The new face of the publishing industry

Mountains of Data by P. Bradley Robb

Touch a word and gain real-time, global context - that’s the promise of the connected, next-generation eBook. That promise highlights a major difference between print and digital books. The advance isn’t in the file format, but rather in the level of access, the malleability of a story, the interconnectedness of everything. And the means to achieve that future isn’t transmedia, or the iPad, or even a radically new file format. No, the key to that future is metadata. Mountains of metadata.

Metadata, data about data, was a mainstay of publishing long before the term was invented. In a purely theoretical sense, the first story ever composed created the first metadata. Since then, humanity’s been amassing a web of connected stories and a means to sort and classify them. Things like title, author, and publication date are perfect examples of current metadata - these disparate pieces of information don’t necessarily add anything to the story, but they help identify where a book belongs in the greater context of storytelling.

As publishing evolved into a proper industry, the metadata associated with a book likewise evolved into an efficient system. By the twentieth century, sales catalogs had created a largely standardized set of metadata broken down into a half-dozen or so broad categories. The twentieth-century system of metadata is perfect for classifying books in groups, but rather lacking for comparing and expanding on specific books.

The real hangup of twentieth-century metadata is that it operates in a polarized fashion. Certain metadata, like title, author, or ISBN, is highly specific and grants little insight into a piece of writing. Other pieces of metadata, like publisher or BISAC Subject Headings, are broad but often gloss over the uniqueness of a particular work.

The future of metadata, the connected and exceptionally granular variety, holds the hope of finding broad connections through specific fields. The method to that madness involves metadata derived from the inside of a story – the minutiae that can be easily overlooked as pieces of a whole. When those pieces are extracted and connected, the Law of Truly Large Numbers (or, as we’ve taken to calling it in marketing circles, Chris Anderson’s “The Long Tail”) takes over.

Moving from theory into potential, imagine reading a young adult book where the protagonist is a twelve-year-old girl. Under the current system of metadata, finding another book where the lead character is also a twelve-year-old girl involves a great deal of digging – you can narrow the field with the BISAC, but finding a character’s specific age is going to require some elbow grease. However, with interior, specific metadata, a few taps can bring up an assortment of similar books.
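To make the twelve-year-old example concrete, here is a minimal sketch of what interior, specific metadata and a "few taps" lookup might look like. Everything in it is an assumption for illustration: the `Character` and `Book` records, the `find_books` helper, and the placeholder titles are invented here and do not correspond to ONIX, BISAC, or any real catalog.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    """Hypothetical record for one character inside a story."""
    name: str
    age: int
    gender: str
    role: str  # e.g. "protagonist" or "supporting"


@dataclass
class Book:
    """Hypothetical book record: broad metadata plus interior metadata."""
    title: str
    bisac: str  # broad, catalog-level subject label
    characters: List[Character] = field(default_factory=list)


def find_books(catalog, **traits):
    """Return titles of books with at least one character matching every trait."""
    return [
        book.title
        for book in catalog
        if any(
            all(getattr(character, key) == value for key, value in traits.items())
            for character in book.characters
        )
    ]


# Placeholder catalog; the titles and characters are made up for this sketch.
CATALOG = [
    Book("Placeholder Novel A", "YOUNG ADULT FICTION / General",
         [Character("Ana", 12, "female", "protagonist")]),
    Book("Placeholder Novel B", "YOUNG ADULT FICTION / General",
         [Character("Ben", 15, "male", "protagonist"),
          Character("Cleo", 12, "female", "supporting")]),
    Book("Placeholder Novel C", "YOUNG ADULT FICTION / General",
         [Character("Dee", 12, "female", "protagonist")]),
]

matches = find_books(CATALOG, age=12, gender="female", role="protagonist")
print(matches)
```

All three placeholder books share the same broad subject label, so the catalog-level field cannot tell them apart. Only the character-level fields separate the two books with a twelve-year-old female lead from the one where she is a supporting character.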

Interior, specific metadata also allows for comparison and contrast reading. Imagine, if you would, a modern book on the Spanish-American War. Using our future metadata system, finding a book on the same subject written during the war is an easy task. Fully leveraging the system could combine the two books – side-by-side – showing the differences. 

Like the James Arnold Ross character in Upton Sinclair’s Oil! and want to find other books about oil prospectors, or prospectors in general? If the character’s occupation is included in the metadata, you’re just a tap away.

And interior, specific metadata can expand beyond the realm of storytelling and into the greater information sphere. Imagine a self-guided tour of Europe, following in the footsteps of Hemingway’s The Sun Also Rises. Or tracing Jack London’s The Call of the Wild. Or fact-checking a former politician’s memoir.

Interior, specific metadata opens a book up in ways that books have never before experienced. Whether a book is made more topical through interconnectedness or merely exposes a reader to immensely scalable cross-sales, the upshot of specific connected metadata is clear: metadata makes the contents of books more important. Metadata increases the value of eBooks by making them more than books. Metadata can turn an eBook into a discovery portal, a cornerstone for information tangents.

The adoption of interior, specific metadata hinges on one very large step – getting metadata right. As pointed out by a slew of forward-thinkers (Mike Cane, Kassia Krozser, Laura Dawson), though the current state of eBook metadata has been standardized in ONIX, the actual quality of that data has been somewhat lacking. The shift towards interior, specific metadata means an exponential increase not only over the current volume of metadata, but also in the size of an eBook.

Let me be blunt – in the future, books will be more metadata than content. 
Twitter’s recently launched Annotations program allows 512 bytes of data to be attached to a tweet (a 140 byte packet of information) and will expand to 2048 bytes (2 kB) in the near future. That relationship – a metadata-to-data ratio of more than 14:1 at the 2 kB limit – might sound a bit extreme where eBooks are concerned, but a 3:1 or greater ratio is not out of the question.
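The arithmetic behind those ratios is easy to check. The sketch below uses only the byte counts quoted above; the 500,000-byte eBook is an assumed round number chosen for illustration, not a measurement of any real title.

```python
TWEET_BYTES = 140                # size of a tweet, as quoted above
ANNOTATION_BYTES_NOW = 512       # Annotations allowance at launch
ANNOTATION_BYTES_PLANNED = 2048  # planned allowance (2 kB)


def metadata_ratio(metadata_bytes, content_bytes):
    """Return how many bytes of metadata accompany each byte of content."""
    return metadata_bytes / content_bytes


ratio_now = round(metadata_ratio(ANNOTATION_BYTES_NOW, TWEET_BYTES), 1)
ratio_planned = round(metadata_ratio(ANNOTATION_BYTES_PLANNED, TWEET_BYTES), 1)
print(ratio_now)      # 3.7
print(ratio_planned)  # 14.6

# Assumption: an eBook whose text weighs in at 500,000 bytes.
EBOOK_TEXT_BYTES = 500_000
metadata_at_3_to_1 = 3 * EBOOK_TEXT_BYTES
print(metadata_at_3_to_1)  # 1500000
```

So the launch allowance already sits a little above 3:1, and at that same ratio the assumed eBook would carry about 1.5 MB of metadata alongside its half-megabyte of text.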

Crafting that metadata is going to be the great challenge. The question then becomes: who will do the work?

As the engines that turn manuscripts into published books, publishers have a powerful opportunity to turn metadata generation into another value add. Tacking interior, specific metadata onto editing and career fostering can help traditional publishers remain the purveyors of legitimate story. It won’t be glamorous, nor will it be easy, but moving an arguably analog manuscript toward its digital destiny is a herculean task that begs for an established bureaucratic system.

The other path is to crowdsource interior, specific metadata generation. Leaning on the long tail, a book with even limited appeal, made available to those with interest, is going to garner support from those willing to bash out proper metadata markup. This system of fan-supported creation has been widely leveraged, from fan subtitling of Japanese animation to the creation and upkeep of Wikipedia. The system is neither perfect nor certain, but it is cheap.

As we move further into the future of publishing, the generation of interior, specific metadata will become more important. As evident in the iPad, eBooks as a medium are not competing with other eBooks, but instead with other media – audio, interactive, and video – for the attention of the audience. Of the five major sources for infotainment – news, film, music, print and video games – the print storytelling media is primed to move into the first truly digital realm. Unlike film, music, and video games, print storytelling properties are easily compared with one another thanks to a standardized method of delivery – words. And unlike the news media, print storytelling isn’t focused on the immediate, but rather on the lasting.

All we need to do is get metadata right. It won’t be easy, but metadata-enabled eBooks will be the true shift from analog to digital, and represent the greatest change in publishing since Gutenberg.  

P. Bradley Robb is a writer who currently blogs at Fiction Matters.
