Hindawi's Approach to Metadata

This article examines Hindawi’s approach to metadata, and the opportunities and challenges that we and other publishers face.

Published content

By the time an article is published in its final format, it consists of a typeset PDF and an HTML display, both based on XML, and an e-pub.

All our content is open access, and anyone can download the PDF, the XML and/or the e-pub. Hindawi has adhered to the JATS DTD since 2012, and we have retrospectively updated the XML of all content published since 2008. We ensure our content is machine-readable by adhering to JATS4R (Journal Article Tag Suite for Reuse) standards within our XML. Because the content is reusable, readers can exchange, share and store scholarly content from our 250+ journals with limited obstacles, both now and in the future. More information can be found here.
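To illustrate what machine readability means in practice, here is a minimal sketch of reading basic metadata from a JATS-style file with Python's standard library. The sample XML is a simplified, invented fragment rather than a real Hindawi article, and the function name `extract_metadata` is hypothetical. Real JATS files carry far more structure (and often namespaces and DOCTYPE declarations) than this example handles.

```python
import xml.etree.ElementTree as ET

# Simplified, invented JATS-style fragment (an assumption for illustration only).
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
          <name><surname>Carberry</surname><given-names>Josiah</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def extract_metadata(jats_xml: str) -> dict:
    """Return the article title and author list from a JATS-style string."""
    root = ET.fromstring(jats_xml)

    title_el = root.find(".//article-meta/title-group/article-title")
    # itertext() also collects text inside inline markup such as <italic>.
    title = "".join(title_el.itertext()).strip() if title_el is not None else None

    authors = []
    for contrib in root.findall(".//contrib-group/contrib[@contrib-type='author']"):
        surname = contrib.findtext("name/surname", default="")
        given = contrib.findtext("name/given-names", default="")
        # findtext returns None when the author has no ORCID iD tagged.
        orcid = contrib.findtext("contrib-id[@contrib-id-type='orcid']")
        authors.append({"name": f"{given} {surname}".strip(), "orcid": orcid})

    return {"title": title, "authors": authors}


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_JATS))
```

Because the tagging is consistent, the same few lines of code could in principle be run across a whole corpus, which is the practical benefit of adhering to a shared tag suite.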

Hindawi’s entire corpus can be downloaded in a single zip file package here. We also feed content to a variety of indexers to improve discoverability.

New Metadata

Common metadata fields are requested during our single-page submission process; these include title, author names and affiliations. As we focus our efforts on open science, new metadata fields have become worthwhile to collect from our authors. These include data availability statements, funding statements and unique identifiers such as ORCID iDs (for which Hindawi has a mandate). Hindawi balances its exploration of open science solutions with making sure that our submission process is simple and easy to use, and that we provide an excellent service for our authors.

Whilst the new requirements offer more openness to the community, they can be technically difficult to capture. Making it mandatory for authors to provide information beyond the bare essentials can sometimes create delays while an author updates their manuscript.

Hindawi’s approach to this challenge can be seen in our requirement, launched in December 2017, for authors to include a data availability statement with their research. Whilst including a data availability statement is mandatory, providing access to the data itself is optional.
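The distinction between a mandatory statement and optional data access can be sketched as a simple validation rule. This is an illustrative sketch only: the `Submission` class, its field names and the `validation_errors` function are hypothetical and do not describe Hindawi's actual submission system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical submission record; field names are assumptions."""
    title: str
    data_availability_statement: str = ""
    data_url: Optional[str] = None  # link to the underlying data, if shared


def validation_errors(sub: Submission) -> List[str]:
    """Return a list of problems that would block this submission."""
    errors = []
    # The statement itself is mandatory.
    if not sub.data_availability_statement.strip():
        errors.append("A data availability statement is required.")
    # data_url is deliberately not checked: sharing the data is optional.
    return errors
```

A submission with a statement such as "Data are available from the corresponding author on request." and no data link passes this check, while a submission with a blank statement does not.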

Another example is our requirement that every corresponding author has a registered ORCID iD associated with their Hindawi account, so that it can be added to the publication. To reduce delays at the publication stage, this requirement is mentioned to authors on submission, then during the peer review process and finally upon acceptance. However, it is not mandatory at these stages. If the corresponding author has not supplied their ORCID iD by the time the article is ready for publication, the article is held until the author provides it.
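The hold-until-provided rule described above can be sketched as follows. The checksum is the ISO 7064 MOD 11-2 algorithm that ORCID documents for the final character of an iD; the function names and the "ready"/"held" status values are hypothetical and are not taken from Hindawi's systems. A format and checksum test only catches mistyped iDs, and it cannot confirm that an iD is registered or belongs to the author.

```python
import re
from typing import Optional

# Four hyphen-separated blocks of four characters; the last may be a digit or X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and checksum of a bare ORCID iD (without the URL prefix)."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def publication_status(corresponding_orcid: Optional[str]) -> str:
    """Hypothetical gate: hold the article until a well-formed iD is on file."""
    if corresponding_orcid and is_valid_orcid(corresponding_orcid):
        return "ready"
    return "held"
```

For example, `publication_status("0000-0002-1825-0097")` returns "ready", while a missing iD or one with a mistyped final digit returns "held".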

While we have noted some delays, we’ve found authors have been eager and open to adding an ORCID iD for publications. Complaints are very rare.

There is also the question of how the new data is presented to ensure machine readability. When new requirements such as these arise, the JATS4R group convenes a stakeholder group to produce new guidelines, often resulting in new tag requests for JATS. In the meantime, publishers need to select an interim solution until the longer-term proposal is ready.

Updating metadata

Once an article is published, it’s not uncommon for some of the metadata to need updating, for example to correct a misspelled author name. These updates take the form of errata or corrigenda, and indexers have to be informed of the change. However, readers who downloaded the content prior to the update may not realize that the metadata has changed.

Should the metadata remain unchanged, or be updated when necessary over time? An author will publish with their email address attached to their article, but it won’t be updated when they move to a new institution with a new email domain. More dynamic ways of recording and updating metadata could keep it relevant over a longer time period, perhaps by linking to a user’s ORCID account rather than an email address.

Next steps

These discussions are just part of what Metadata 2020 is concentrating on. Various projects have been proposed with the aim of improving metadata in numerous ways for many different stakeholder groups, and Hindawi is thrilled to be a part of this journey.

About the author

Craig has over 15 years of experience in scholarly publishing in various editorial, production and product management roles at Hindawi, BioMed Central, Springer Nature and the BMJ. Craig is currently Director of Operations at Hindawi, managing the peer review, production and customer services processes.

Craig has been involved in various working groups with ORCID and sits as a current member of the CrossRef board.

Get in touch with the Metadata 2020 team by email.