> Comment #47362
Project Gutenberg would be a great source for classics; there is even a GitHub user who developed APIs to interact with it.
The problem is that Project Gutenberg's front page explicitly prohibits automated use of the site, and I figure "automated" might also apply to API calls.
They do offer the catalogue as an RDF/XML file though, GNU GPL licensed and updated nightly:
The complete Project Gutenberg catalog is available in RDF/XML Format. It is licensed under the GNU General Public License.
This file is a tar archive that contains one RDF file for each book. The RDF is based on the DCMI recommendation.
It contains a lot of information besides the URL of the book in its several formats: for example, the Wikipedia page about the author and their year of death, alongside information about copyright (e.g. in which countries it has expired).
Doing a test search for Moby Dick, the Project Gutenberg book URL displayed in the results contains the numeric ID of the book in their catalogue.
With that, I figure it would be possible to a) access the actual reading material on the PG website (the various file formats, such as HTML, zipped HTML and EPUB, all have consistent URL patterns) or b) query the aforementioned RDF/XML catalogue to present the user with all the information contained there, including the links to the book.
a) Generating the links to the various file formats would be easy enough, but it's not very future-proof: it will come apart when/if the maintainers at Project Gutenberg change the URL pattern. This doesn't seem a likely event for now, but one never knows :)
b) Querying the catalogue means it needs to be stored somewhere and updated frequently, of course, and I haven't yet found a way to tackle the storage issue for a large-scale scenario such as millions of users searching for a classic book on DDG.
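To make option b) a bit more concrete, here is a rough Python sketch of reading one per-book RDF file and pulling out the title and format links. The sample document is made up and heavily simplified, and the element names and namespaces (`pgterms:ebook`, `dcterms:hasFormat`, `pgterms:file`) are my assumption of what the catalogue files look like, so they should be checked against a real file. `parse_book` is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for the catalogue files; verify against a real RDF file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Made-up, simplified stand-in for one per-book RDF file.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/12345">
    <dcterms:title>Example Title</dcterms:title>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="https://example.org/ebooks/12345.epub"/>
    </dcterms:hasFormat>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="https://example.org/ebooks/12345.html"/>
    </dcterms:hasFormat>
  </pgterms:ebook>
</rdf:RDF>
"""


def parse_book(rdf_text):
    """Return the ID, title and format links found in one RDF document."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        raise ValueError("no pgterms:ebook element found")

    about_attr = "{%s}about" % NS["rdf"]
    book_id = ebook.get(about_attr, "").rsplit("/", 1)[-1]
    title = ebook.findtext("dcterms:title", default="", namespaces=NS)
    links = [
        file_el.get(about_attr)
        for file_el in ebook.findall("dcterms:hasFormat/pgterms:file", NS)
    ]
    return {"id": book_id, "title": title, "links": links}


if __name__ == "__main__":
    print(parse_book(SAMPLE_RDF))
```

This only covers reading a single record; where to keep the parsed data and how to refresh it nightly is still the open question.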
In my head, though, I imagine an Instant Answer presenting the user with the different book formats all at once, so they could click on their preferred one and start reading.
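For option a), building those format links from the numeric ID could look roughly like the sketch below. The `/ebooks/<id>` shape of the book URL and the three templates are assumptions on my part, not something confirmed by Project Gutenberg, and `extract_book_id` and `format_links` are illustrative names. Keeping the templates in a single table at least means only one place needs changing if the URL pattern ever moves.

```python
import re

BASE_URL = "https://www.gutenberg.org"

# Assumed URL templates per format; check them against the live site.
FORMAT_TEMPLATES = {
    "html": "{base}/ebooks/{id}.html.images",
    "epub": "{base}/ebooks/{id}.epub.images",
    "text": "{base}/ebooks/{id}.txt.utf-8",
}


def extract_book_id(url):
    """Return the numeric ID from a book URL, or None if there is none."""
    match = re.search(r"/ebooks/(\d+)", url)
    return int(match.group(1)) if match else None


def format_links(book_id, templates=FORMAT_TEMPLATES, base=BASE_URL):
    """Build one link per format name for the given book ID."""
    return {
        name: template.format(base=base, id=book_id)
        for name, template in templates.items()
    }


if __name__ == "__main__":
    # 2701 is just a sample numeric ID for illustration.
    book_id = extract_book_id("https://www.gutenberg.org/ebooks/2701")
    for name, link in format_links(book_id).items():
        print(name, link)
```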
As for the Instant Answer type, I imagine this could correspond to many types: Spice, Goodie or Fathead, depending on how it's developed... (I'm still wrapping my head around all these types, so please point me in the right direction if I got it wrong :) ).
4 years and 2 months ago