Parse Nature PDF RDF #374

Open
simonster opened this issue Jun 3, 2012 · 0 comments

The XMP metadata embedded in a Nature PDF looks like this:

<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>
    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:iX='http://ns.adobe.com/iX/1.0/'>
        <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551' xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='Adobe PDF Library 9.9'>
        </rdf:Description>
        <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551' xmlns:xap='http://ns.adobe.com/xap/1.0/' xap:CreateDate='2011-03-11T14:48:06+05:30' xap:CreatorTool='Adobe InDesign CS5 (7.0.3)' xap:MetadataDate='2011-03-16T16:18:13+05:30' xap:ModifyDate='2011-03-16T16:18:13+05:30'>
            <xap:Label>Nature Reviews Neuroscience 12, 217 (2011). doi:10.1038/nrn3008</xap:Label>
            <xap:Identifier>
                <rdf:Bag>
                    <rdf:li>doi:10.1038/nrn3008</rdf:li>
                </rdf:Bag>
            </xap:Identifier>
        </rdf:Description>
        <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551' xmlns:xapMM='http://ns.adobe.com/xap/1.0/mm/' xapMM:DocumentID='uuid:174bf0e1-72a0-4cc5-a0ad-ecbf5c30049a' xapMM:InstanceID='uuid:8d297de4-0e7f-4fd1-b628-e17b518f08dd'/>
        <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551' xmlns:xapRights='http://ns.adobe.com/xap/1.0/rights/' xapRights:Marked='True'>
        </rdf:Description>
        <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551' xmlns:prism='http://prismstandard.org/namespaces/basic/2.0/' prism:copyright='© 2011 Nature Publishing Group' prism:doi='10.1038/nrn3008' prism:eIssn='1471-0048' prism:endingPage='230' prism:issn='1471-003X' prism:number='4' prism:publicationName='Nature Publishing Group' prism:rightsAgent='permissions@nature.com' prism:startingPage='217' prism:volume='12'>
            <prism:publicationDate>
                <rdf:Bag>
                    <rdf:li>2011-04-01</rdf:li>
                </rdf:Bag>
            </prism:publicationDate>
            <prism:url>
                <rdf:Bag>
                    <rdf:li>http://dx.doi.org/10.1038/nrn3008</rdf:li>
                </rdf:Bag>
            </prism:url>
        </rdf:Description>
        <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551' xmlns:dc='http://purl.org/dc/elements/1.1/' dc:format='application/pdf' dc:identifier='doi:10.1038/nrn3008'>
            <dc:creator>
                <rdf:Seq>
                    <rdf:li>Dwight J. Kravitz</rdf:li>
                    <rdf:li>Kadharbatcha S. Saleem</rdf:li>
                    <rdf:li>Chris I. Baker</rdf:li>
                    <rdf:li>Mortimer Mishkin</rdf:li>
                </rdf:Seq>
            </dc:creator>
            <dc:description>
                <rdf:Alt>
                    <rdf:li xml:lang='x-default'>Nature Reviews Neuroscience 12, 217 (2011). doi:10.1038/nrn3008</rdf:li>
                </rdf:Alt>
            </dc:description>
            <dc:publisher>
                <rdf:Bag>
                    <rdf:li>Nature Publishing Group</rdf:li>
                </rdf:Bag>
            </dc:publisher>
            <dc:rights>
                <rdf:Alt>
                    <rdf:li xml:lang='x-default'>&#xA;        © 2011 Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.</rdf:li>
                </rdf:Alt>
            </dc:rights>
            <dc:title>
                <rdf:Alt>
                    <rdf:li xml:lang='x-default'>A new neural framework for visuospatial processing</rdf:li>
                </rdf:Alt>
            </dc:title>
        </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>

From this, we currently extract:

    'itemType' => "journalArticle"
    'creators' => [
        '0' => {
            'firstName' => "Dwight J."
            'lastName' => "Kravitz"
            'creatorType' => "author"
        }
        '1' => {
            'firstName' => "Kadharbatcha S."
            'lastName' => "Saleem"
            'creatorType' => "author"
        }
        '2' => {
            'firstName' => "Chris I."
            'lastName' => "Baker"
            'creatorType' => "author"
        }
        '3' => {
            'firstName' => "Mortimer"
            'lastName' => "Mishkin"
            'creatorType' => "author"
        }
    ]
    'notes' => []
    'tags' => []
    'seeAlso' => []
    'attachments' => []
    'itemID' => "uuid:31241afd-6bf6-4f20-80e4-499b827be551"
    'title' => "17"
    'publicationTitle' => "Nature Publishing Group"
    'rights' => "© 2011 Nature Publishing Group"
    'volume' => "12"
    'issue' => "4"
    'number' => "4"
    'patentNumber' => "4"
    'pages' => "217-230"
    'date' => "11"
    'ISSN' => "1471-003X"
    'DOI' => "10.1038/nrn3008"
    'url' => "_:n12"
    'extra' => "14"

I'm not sure why the title and URL come wrapped in RDF containers (rdf:Alt for the title, rdf:Bag for the URL), but maybe we should try to handle them. Then again, while this packet contains most of the vital metadata, it also gives prism:publicationName as "Nature Publishing Group", which is clearly wrong (that is the publisher, not the journal), so maybe we'd be better off falling back to a DOI lookup in these cases.
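
To make the two options concrete, here is a minimal Python sketch, not the translator's actual code. It unwraps rdf:Bag, rdf:Seq, and rdf:Alt containers so that wrapped values such as the title and URL come out as plain strings, and it flags records whose prism:publicationName merely repeats dc:publisher as candidates for DOI lookup. `SAMPLE` is an abbreviated copy of the packet above, merged into a single rdf:Description for brevity. The helper names (`property_values`, `extract`, `first`, `needs_doi_lookup`) and the publisher-equals-publication heuristic are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
PRISM = "http://prismstandard.org/namespaces/basic/2.0/"
CONTAINERS = {f"{{{RDF}}}{name}" for name in ("Bag", "Seq", "Alt")}

# Abbreviated copy of the packet above, merged into one rdf:Description.
SAMPLE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:dc='http://purl.org/dc/elements/1.1/'
    xmlns:prism='http://prismstandard.org/namespaces/basic/2.0/'>
  <rdf:Description rdf:about='uuid:31241afd-6bf6-4f20-80e4-499b827be551'
      prism:doi='10.1038/nrn3008'
      prism:publicationName='Nature Publishing Group'>
    <prism:publicationDate><rdf:Bag><rdf:li>2011-04-01</rdf:li></rdf:Bag></prism:publicationDate>
    <prism:url><rdf:Bag><rdf:li>http://dx.doi.org/10.1038/nrn3008</rdf:li></rdf:Bag></prism:url>
    <dc:publisher><rdf:Bag><rdf:li>Nature Publishing Group</rdf:li></rdf:Bag></dc:publisher>
    <dc:title><rdf:Alt><rdf:li xml:lang='x-default'>A new neural framework for visuospatial processing</rdf:li></rdf:Alt></dc:title>
  </rdf:Description>
</rdf:RDF>"""


def property_values(prop):
    """Return a property's literal values, unwrapping rdf:Bag/Seq/Alt if present."""
    for child in prop:
        if child.tag in CONTAINERS:
            return [(li.text or "").strip() for li in child.findall(f"{{{RDF}}}li")]
    text = (prop.text or "").strip()
    return [text] if text else []


def extract(xml_text):
    """Collect values from every rdf:Description, keyed by '{namespace}name'."""
    root = ET.fromstring(xml_text)
    fields = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Properties written as XML attributes (e.g. prism:doi='...').
        for name, value in desc.attrib.items():
            fields.setdefault(name, []).append(value)
        # Properties written as child elements, possibly wrapped in a container.
        for prop in desc:
            fields.setdefault(prop.tag, []).extend(property_values(prop))
    return fields


def first(fields, ns, name):
    """Return the first value recorded for a property, or None if absent."""
    values = fields.get(f"{{{ns}}}{name}", [])
    return values[0] if values else None


def needs_doi_lookup(fields):
    """Hypothetical heuristic: the 'journal' name is just the publisher's name."""
    name = first(fields, PRISM, "publicationName")
    return name is not None and name in fields.get(f"{{{DC}}}publisher", [])


fields = extract(SAMPLE)
print(first(fields, DC, "title"))
print(first(fields, PRISM, "url"))
print(needs_doi_lookup(fields))
```

Taking the first rdf:li is a simplification: for rdf:Alt one would normally prefer the entry marked xml:lang='x-default', and for rdf:Seq (as in dc:creator) every entry matters. When `needs_doi_lookup` returns true, the DOI from prism:doi could be handed to whatever DOI resolver is available rather than trusting the embedded journal name.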

aurimasv added the Minor label and removed the New Translator label on Nov 4, 2014
