Digital Library and Archives received some responses to one of our previous blog posts, Automated Repository Deposit and Messy Metadata. These responses were sent via email because blog comments were not enabled at that time. Here are most of the comments. Please feel free to comment on the blog post if anything is missing.
Looking at the metadata record that's missing an author, and after pulling up the PDF object, I'm wondering what kind of program Mendeley uses to pull citation information out of PDF files, and whether something similar could be used on the BioMed articles. As anyone who uses Mendeley knows, the generated citation isn't always correct, but something like that might be able to pull out likely first and last names, which could then be checked against a list of VT faculty. Subjects are tricky. Wouldn't it be nice if there were some kind of compendium of likely keywords that, if found in an article, could generate a very general subject heading or headings?
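To make the two ideas in this comment a little more concrete, here is a rough Python sketch. It is only an illustration, not anything Mendeley or VTechWorks actually does: the keyword compendium, the faculty list, and the function names `suggest_subjects` and `match_faculty` are all hypothetical placeholders, and the matching is deliberately naive (exact first and last name, whole-word keyword hits).

```python
import re

# Hypothetical keyword compendium: general subject heading -> trigger terms.
SUBJECT_KEYWORDS = {
    "Bioinformatics": ["genome", "sequence alignment", "protein"],
    "Statistics": ["regression", "bayesian", "variance"],
}

# Hypothetical faculty list as (last name, first name) pairs.
FACULTY = [("Smith", "Jane"), ("Nguyen", "Minh")]


def suggest_subjects(text, min_hits=2):
    """Return headings whose keywords occur at least min_hits times in text."""
    lowered = text.lower()
    suggestions = []
    for heading, keywords in SUBJECT_KEYWORDS.items():
        # Count whole-word occurrences of every keyword for this heading.
        hits = sum(
            len(re.findall(r"\b" + re.escape(keyword) + r"\b", lowered))
            for keyword in keywords
        )
        if hits >= min_hits:
            suggestions.append(heading)
    return suggestions


def match_faculty(candidate_names):
    """Keep 'First ... Last' strings whose first and last names are on the list."""
    known = {(last.lower(), first.lower()) for last, first in FACULTY}
    matches = []
    for name in candidate_names:
        parts = name.split()
        if len(parts) < 2:
            continue  # a single token cannot be split into first/last
        first, last = parts[0].lower(), parts[-1].lower()
        if (last, first) in known:
            matches.append(name)
    return matches
```

In practice the candidate names would come from whatever extraction tool is used, and anything the sketch flags would still need a person to confirm it, since initials, middle names, and name variants will not match exactly.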
Hi. The metadata, including the authors, is embedded in the PDF, http://vtechworks.lib.vt.edu/bitstream/handle/10919/18653/1471-2105-13-S5-S4.pdf?sequence=2. We might be able to use XMP, http://www.adobe.com/products/xmp/, to extract the data from the PDF (and also to write metadata to PDFs, if desired). The XML file for this article, /18653/1471-2105-13-S5-S4.xml?sequence=1, is not tagged like the other, better files such as http://vtechworks.lib.vt.edu/bitstream/handle/10919/18724/1471-2105-13-S11-S2.xml?sequence=1, so authors can't be extracted from it.
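As a rough idea of what XMP extraction could look like, here is a minimal Python sketch that uses only the standard library. It assumes the PDF stores its XMP packet uncompressed and that authors are recorded in the Dublin Core `dc:creator` field; neither has been checked against the BioMed files, and the function names `extract_xmp_packet` and `xmp_creators` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace URIs used inside XMP packets.
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_xmp_packet(pdf_bytes):
    """Return the first uncompressed <x:xmpmeta> element in pdf_bytes, or None."""
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    return match.group(0) if match else None


def xmp_creators(packet):
    """Return the dc:creator entries listed in an XMP packet."""
    root = ET.fromstring(packet)
    creators = []
    for creator in root.iter(f"{{{DC}}}creator"):
        # dc:creator normally holds an rdf:Seq of rdf:li items, one per author.
        for item in creator.iter(f"{{{RDF}}}li"):
            if item.text and item.text.strip():
                creators.append(item.text.strip())
    return creators


def pdf_authors(path):
    """Read a PDF from disk and return any authors found in its XMP metadata."""
    with open(path, "rb") as handle:
        packet = extract_xmp_packet(handle.read())
    return xmp_creators(packet) if packet else []
```

If the packet is compressed, or the authors are stored only in the PDF's older Info dictionary, this returns an empty list, and a dedicated PDF or XMP library would be the better tool.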
I’m interested in working on this with you, but I don’t have any immediate suggestions until I learn more about how this works.