
Help needed in putting Speleology back-issues online

Posted: Tue 04 Mar 2014 11:56
by David Gibson

Summary: We want to put Speleology back-issues online (probably for free access). It would be helpful if someone could first type up a list of contents for each issue, together with each article's standfirst text, for our database. A related task is to re-compile the PDFs to an ebook size and to chop them up into individual articles, naming them in accordance with our database convention. For this task you will need Acrobat, or an equivalent program that can re-distil PDF files.

Details: Database
The database format for our online content is a set of plain text files in a format based closely on EndNote. The specification is online but, as a first step, it is probably simpler for someone to assemble the data in tabular form, either in Excel or as a table in MS Word. I would strongly suggest using a table in Word, because it is far easier to edit, format and view than Excel. For each issue of the magazine, we need a separate table with the following columns:
  • Page range (e.g. 7-12)
  • Article type (Feature or left blank)
  • Title of Article
  • Authors
  • Standfirst Text
This data is not intended to be human-readable on our web site; instead, a set of programs will interpret it to generate formatted web pages. It is therefore essential that it is in exactly the right format for a machine to interpret. Unless you want to work your way through our specification, though, the easiest approach is probably for you to work on one issue of Speleology, after which I will explain how what you've done needs to be tweaked to match the specification.
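For anyone who would like to sanity-check a table before sending it in, here is a rough Python sketch of the five columns as a record, with a few basic checks. It is not our specification: the names ArticleRow and check_row are made up for illustration, and the rules (a single page such as "7" is accepted, the title must not be empty) are my assumptions about what a sensible row looks like.

```python
import re
from dataclasses import dataclass

# Assumed pattern: "7-12", or a single page such as "7".
PAGE_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")


@dataclass
class ArticleRow:
    """One table row; the fields mirror the five columns listed above."""
    page_range: str
    article_type: str  # "Feature" or left blank
    title: str
    authors: str
    standfirst: str


def check_row(row):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    match = PAGE_RANGE.match(row.page_range.strip())
    if not match:
        problems.append("page range should look like 7-12")
    elif match.group(2) and int(match.group(2)) < int(match.group(1)):
        problems.append("page range runs backwards")
    if row.article_type.strip() not in ("Feature", ""):
        problems.append("article type should be Feature or blank")
    if not row.title.strip():
        problems.append("title is empty")
    return problems


# Placeholder values only, not taken from a real issue.
row = ArticleRow("7-12", "Feature", "Example title", "A. N. Author", "Example standfirst.")
print(check_row(row))
```

A row that passes gives an empty list; anything else is a list of plain-English complaints that can be fixed in the Word table.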

The bulk of the work can probably be done by pasting text from the PDFs (which can be made available to you) to get the article titles and the standfirst text (that is, the summary text that comes after the title of the article). The salient point is that the text must be HTML-safe, so you will need to weed out any unsafe characters or replace them with HTML entities. (But this could probably be done as a secondary exercise; the main point is to have the text in a table, in an editable format.)
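As an illustration of the escaping step, here is a minimal Python sketch using only the standard library. The function name html_safe is hypothetical, and treating every non-ASCII character (curly quotes, accented letters and so on) as something to convert to a numeric entity is my assumption about what "safe" means here; the specification may be stricter or looser.

```python
import html


def html_safe(text):
    """Escape &, <, > and quotes, then turn any non-ASCII character
    into a numeric entity such as &#8217; (an assumed policy)."""
    escaped = html.escape(text)  # handles & < > " '
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The curly apostrophe (U+2019) is typical of text pasted from a PDF.
print(html_safe("Tom & Jerry\u2019s <cave>"))
# prints: Tom &amp; Jerry&#8217;s &lt;cave&gt;
```

Text that is already escaped should not be run through this a second time, or each & will be escaped again.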

Details: PDFs
A related task is to re-compile the PDFs to an ebook size and to chop them up into individual articles, naming them in accordance with our database convention. For this task you will need Acrobat, or an equivalent program that can re-distil PDF files. Not really much more to say about that.