I have written a wrapper that exposes openlibrary.org API data as RDF. It is written in Python and deployed on Google App Engine. It is only an illustration of Linked Data publication, put online to gather some first feedback, so do not rely on it in your application: the URL or the content will change in the future.
How to use it :
- The wrapper is deployed at http://olrdf.appspot.com
- There are 2 ways to call it :
- with “/isbn/<an_isbn>”, give the ISBN (ISBN 10 or ISBN 13) of the book you want to get data for. Example : http://olrdf.appspot.com/isbn/2070394433
- with “/key/<openlibrary_key>”, give the Open Library key of the book you want to get data for. Example : http://olrdf.appspot.com/key/b/OL5218098M
- The service uses content negotiation, so if your browser does not accept the “application/rdf+xml” content type, you will be redirected to the corresponding Open Library page for the book. To avoid that :
- either install the Tabulator Firefox extension
- or add “.rdf” at the end of the URL; this turns off content negotiation and always returns raw RDF data. Examples :
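As a rough client-side sketch of the two access patterns above, the snippet below builds the wrapper URLs and prepares a request that explicitly asks for RDF/XML. The helper names `wrapper_url` and `rdf_request` are made up for illustration, the code targets current Python 3 rather than the Python of the deployed wrapper, and the actual network call is left as a comment since the service URL may change.

```python
from urllib.request import Request

BASE = "http://olrdf.appspot.com"  # wrapper root, as given in the post


def wrapper_url(isbn=None, key=None, raw_rdf=False):
    """Build a wrapper URL from an ISBN or an Open Library key (hypothetical helper)."""
    if (isbn is None) == (key is None):
        raise ValueError("give exactly one of isbn or key")
    if isbn is not None:
        url = f"{BASE}/isbn/{isbn}"
    else:
        # Assumption: a leading slash on the key is dropped to avoid "//".
        url = f"{BASE}/key/{key.lstrip('/')}"
    # Appending ".rdf" turns off content negotiation and always returns RDF.
    return url + ".rdf" if raw_rdf else url


def rdf_request(url):
    """Prepare (but do not send) a request whose Accept header asks for RDF/XML."""
    return Request(url, headers={"Accept": "application/rdf+xml"})


req = rdf_request(wrapper_url(isbn="2070394433"))
print(req.full_url)
print(req.get_header("Accept"))
# To actually fetch the data: urllib.request.urlopen(req).read()
```

With the “.rdf” form (`raw_rdf=True`) the Accept header is not needed, which is convenient from a plain browser.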
Technical information :
- It is written in Python;
- It is deployed in Google App Engine;
- It uses and depends on the following libraries :
- The Google App Engine webapp API
- SimpleJSON
- rdflib
- SPARQL Wrapper, modified to work around the limitations of Python on Google App Engine (it uses the URL Fetch API instead of urllib2)
Linked Data Sets :
The following data sets are referenced in the data, with the following heuristics :
- The wrapper references itself, in properties such as “authors”, or “rdf:type”.
- heuristic : the author or type open library key is appended to the root URL of the wrapper : “http://olrdf.appspot.com/key/”
- The lingvoj dataset, for languages
- heuristic : the Open Library language key (“/l/eng” for example) is parsed, the 3-letter language code is extracted, and an attempt is made to find the corresponding 2-letter language code using a hardcoded mapping table. The 2-letter code is then appended to the root URI “http://www.lingvoj.org/lang/”. If no 2-letter code is found, the 3-letter code is appended to the same root URI instead.
- The RDF book mashup, in a “owl:sameAs” property
- heuristic : append the ISBN10 to the URI prefix “http://www4.wiwiss.fu-berlin.de/bookmashup/books/”
- (plus it used to call the lcsh.info SPARQL endpoint to try to match the “subject” property to a corresponding Library of Congress Subject Heading. However, this service has recently been shut down…)
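The three linking heuristics above can be sketched in a few lines of Python. This is not the wrapper's actual code: the function names are hypothetical, the mapping table is a tiny illustrative excerpt (the real hardcoded table is not shown in this post), and stripping the leading slash from keys is an assumption.

```python
WRAPPER_ROOT = "http://olrdf.appspot.com/key/"
LINGVOJ_ROOT = "http://www.lingvoj.org/lang/"
BOOKMASHUP_ROOT = "http://www4.wiwiss.fu-berlin.de/bookmashup/books/"

# Illustrative excerpt only; the wrapper's real mapping table is not published here.
ISO3_TO_ISO2 = {"eng": "en", "fre": "fr", "ger": "de", "spa": "es"}


def self_uri(ol_key):
    """Author or type key -> URI served by the wrapper itself."""
    # Assumption: the leading slash is dropped so the result has no "//".
    return WRAPPER_ROOT + ol_key.lstrip("/")


def language_uri(ol_language_key):
    """'/l/eng' -> lingvoj URI, preferring the 2-letter code when one is known."""
    code3 = ol_language_key.rstrip("/").split("/")[-1]
    # Fall back to the 3-letter code if the table has no entry for it.
    return LINGVOJ_ROOT + ISO3_TO_ISO2.get(code3, code3)


def bookmashup_uri(isbn10):
    """ISBN-10 -> RDF Book Mashup URI, used as the owl:sameAs target."""
    return BOOKMASHUP_ROOT + isbn10


print(language_uri("/l/eng"))
print(language_uri("/l/xyz"))  # unknown code: the 3-letter form is kept
print(bookmashup_uri("2070394433"))
```

Note that none of these functions checks that the target URI actually exists in the remote dataset; the links are built purely by string concatenation, as described above.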
Ontology and properties used :
The following properties are used in the generated RDF :
- Dublin Core (namespace “http://purl.org/dc/elements/1.1/”) : publisher, subject, etc.
- Dublin Core terms (namespace “http://purl.org/dc/terms/”) : alternative, tableOfContents, format, modified, etc.
- Bibliographic ontology (namespace “http://purl.org/ontology/bibo/”) : edition, authorList, oclcnum, isbn10, isbn13, etc.
- Plus of course, rdf, rdfs and owl.
- For all properties not falling in one of those vocabularies, a property in the wrapper namespace (“http://olrdf.appspot.com/key/”) is generated. No formal description of this ontology is provided.
An exhaustive description of how the Open Library data is mapped to RDF properties can be found on this google spreadsheet.
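To make the namespace fallback concrete, here is a minimal sketch of how a property name could be resolved to a full property URI. It lists only the properties named above, assumes the lookup is done on the target property name (the real field-by-field mapping is the one in the spreadsheet), and uses a hypothetical `property_uri` helper and a hypothetical unmapped property, “pagination”.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
WRAPPER = "http://olrdf.appspot.com/key/"  # fallback namespace of the wrapper

# Only the properties named in the post; not the complete mapping.
KNOWN = {
    DC: ["publisher", "subject"],
    DCTERMS: ["alternative", "tableOfContents", "format", "modified"],
    BIBO: ["edition", "authorList", "oclcnum", "isbn10", "isbn13"],
}

# Invert to a flat lookup: property name -> namespace.
PROPERTY_NS = {name: ns for ns, names in KNOWN.items() for name in names}


def property_uri(name):
    """Return the full URI for a property, falling back to the wrapper namespace."""
    return PROPERTY_NS.get(name, WRAPPER) + name


print(property_uri("publisher"))
print(property_uri("isbn10"))
print(property_uri("pagination"))  # hypothetical unmapped property -> wrapper namespace
```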
Feedback on this work is more than welcome. The python code is available on request. Potential improvements are :
- linking subjects to another controlled vocabulary now that lcsh.info is dead;
- linking authors to DBPedia;
- linking countries to geonames;
- defining a clear ontology of properties;
- etc…
January 7, 2009 at 2:04
Well done Thomas! How did you like using google app engine?
January 7, 2009 at 2:11
Thanks. Well, the GAE development environment is kind of buggy, the limitation on URL requests is frustrating when you need to run SPARQL queries all around, and it works only in Python, not Java, but overall it is easy to deploy your app. I haven’t tried the other services, like Google Base, that also add value to the platform.
January 18, 2009 at 6:47
Hello,
I would be very interested to look at your modified SPARQL Wrapper. Would you be willing to share your code with me for educational purposes?
Thanks in advance.
January 19, 2009 at 1:57
Sure, but the email address I got with your comment (dupe-2…@gmail.com) is invalid. Could you post a comment again with another, valid address?
January 19, 2009 at 7:08
@Thomas
Oops! I didn’t know that my profile was out of date. It’s fixed now, but just in case:
david at hadto dot net
February 20, 2009 at 11:31
Hi Thomas,
great work – thanks for blogging that! Did you also find the AppEngine datastore very enticing regarding storage of the ontology triplets?
And I’m very interested in sharing your code. Could you pass me a copy? Did you decide on the license?
Cheers!
Georg
July 1, 2009 at 5:10
Hey Thomas – this is so very interesting! I wonder if you would mind sharing your code with me? Thanks in advance. I’m also interested in the app engine datastore – what do you think of it?
Ken
kirby at k2group dot com
July 1, 2009 at 11:07
Hi, I sent you the code by email. I cannot give any advice on the App Engine datastore: this wrapper converts data on the fly and does not store anything, so it does not use the datastore.
Thomas