jdorw
How would the API work? Would we just hit it once a week to get a whole new data set?
posted by jdorw Staff 3 years and 4 months ago Link

brendanheywood
Yeah, that's what I had in mind; this seems the easiest. Even monthly could be fine, as this data is very slow moving at this high level. This would work well for all crags, which is a fairly limited data set, currently ~5,000 records worldwide (and it would probably return only a high-quality subset of these). This probably won't work so well for route-level stuff (~300,000 records), but I'm a little dubious about the value of IAs for individual routes. Perhaps we could limit IAs to iconic routes, i.e. popular 2- or 3-star routes.
posted by brendanheywood 3 years and 4 months ago Link
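The weekly (or monthly) pull discussed above comes down to a staleness check on the last download. A minimal sketch, assuming the consumer records when it last fetched the dump; the name `needs_refresh` and the seven-day default are illustrative, not part of any agreed API:

```python
from datetime import datetime, timedelta, timezone

# Assumption: weekly refresh; the thread notes monthly could also be fine.
REFRESH_INTERVAL = timedelta(days=7)


def needs_refresh(last_fetched, now=None, interval=REFRESH_INTERVAL):
    """Return True if there is no cached dump or it is at least `interval` old."""
    if last_fetched is None:
        return True  # nothing cached yet, so fetch the full data set
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_fetched >= interval
```

Passing `interval=timedelta(days=30)` would give the monthly variant.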
floey
jdorw, if theCrag will have a custom endpoint, should the IA be a Spice, or is the Longtail format still applicable?
posted by floey 3 years and 3 months ago Link
floey
I created a scraper/crawler for the site and here is the sample output in JSON format for my local crag, The Rumbling Bald.

{"Rumbling Bald":{"styles":{"Boulder":"32%","Unknown":"20%","Sport":"2%","Trad":"44%"},"breadcrumbs":["Rumbling Bald","North Carolina","USA","North America"],"areas":[{"routes":"9","ticks":"129","height":"130ft","name":"Cereal Buttress"},{"height":"","name":"Comotose Area","routes":"0","ticks":"0"},{"routes":"0","ticks":"0","name":"Flakeview Area","height":""},{"height":"","name":"Lakeview Area","routes":"0","ticks":"0"},{"height":"69ft","name":"Screamweaver Area","routes":"3","ticks":"6"},{"ticks":"0","routes":"0","height":"","name":"Cereal Wall"}],"type":"crag","url":"http://www.thecrag.com/climbing/united-s...","number of routes":"34"}}

The scraper could cover a larger area, but I limited it to one crag so we would have a small sample of data to start with.
posted by floey 3 years and 3 months ago Link
brendanheywood
Is this running from Quebec in Safari 534.34 by any chance, and is it still running right now?

We've seen a massive spike in traffic hitting URLs like this:

http://www.thecrag.com/climbing/a/b/area...
http://www.thecrag.com/climbing/a/b/area...
http://www.thecrag.com/climbing/a/b/area...

This is quite odd, as that URL format, while it works, isn't something you'd ever find by following links; it works almost by accident.

If it is, please let me know so I don't have to keep chasing this traffic, and please ramp it down a *lot*. Ideally, just scrape the page without any assets (i.e. don't load our Google Analytics beacon).
posted by brendanheywood 3 years and 3 months ago Link
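A scraper that follows the request above would fetch only the HTML document and pause between requests. The sketch below is one way to do that with the Python standard library; the 5-second delay, the `Throttle` class, and the User-Agent string are all assumptions, since no crawl rate or client identity was agreed in the thread. Because `urllib` requests only the URL it is given, no scripts, images, or analytics beacons are loaded.

```python
import time
import urllib.request

# Assumption: the thread asks for "a lot" less traffic but names no rate.
MIN_DELAY_SECONDS = 5.0


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_html(url, throttle,
               user_agent="example-crag-scraper (hypothetical contact address)"):
    """Fetch a single page as text; linked assets are never requested."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Sharing one `Throttle` instance across all calls to `fetch_html` keeps the whole crawl under the chosen rate, and a descriptive User-Agent makes the traffic easy for the site owner to identify.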
floey
That is quite odd. I haven't been running anything since I posted, and my traffic was minimal and confined to that single URL. I don't live in Quebec or have Safari installed either. I do hope you figure out what the problem is, though.
posted by floey 3 years and 3 months ago Link
jdorw
Great, thanks! All of the data that you want to show will have to go into a single "paragraph" field. This field will have to be formatted the way you want it to appear on the site. For now, that can only be plain text and newlines.

Feel free to make a pull request at any time. It doesn't have to be completely finished, and it might be easier to go over the output file format on GitHub.
posted by jdorw Staff 3 years and 3 months ago Link
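To illustrate the single "paragraph" field described above, here is a rough sketch that flattens a scraper record shaped like the Rumbling Bald sample into plain text and newlines. The function name `crag_paragraph` and the choice of which fields to show are assumptions; the actual output file format was left to be worked out in the pull request.

```python
import json


def crag_paragraph(record_json):
    """Flatten a scraped crag record into plain text separated by newlines."""
    data = json.loads(record_json)
    lines = []
    for name, crag in data.items():
        # The first breadcrumb repeats the crag name, so skip it.
        location = ", ".join(crag["breadcrumbs"][1:])
        lines.append(f"{name} ({location})")
        lines.append(f"Routes: {crag['number of routes']}")
        styles = ", ".join(
            f"{style} {share}" for style, share in sorted(crag["styles"].items())
        )
        lines.append(f"Styles: {styles}")
        for area in crag["areas"]:
            if area["routes"] != "0":  # hide areas with no routes recorded
                height = f", {area['height']}" if area["height"] else ""
                lines.append(f"{area['name']}: {area['routes']} routes{height}")
    return "\n".join(lines)
```

Fed the sample record, this would yield a header line with the crag name and location, a route count, a style breakdown, and one line per area that has routes.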
floey
Pull request has been made:
https://github.com/duckduckgo/zeroclicki...
posted by floey 3 years and 3 months ago Link