Hiking and climbing spots near a location

7 Votes • 38 Comments
I think that an instant answer for nearby outdoor activities including hiking, biking, and climbing would be useful for the DDG community. If need be, it could be split up into separate answers for each category.
Overall, it would be great to be able to type in "climbing routes near [place]" and get quick, useful answers.

I'd be willing to work on this, but I'd like input on the idea and other sources.
Source:
http://alpinaut.com/

This API was featured on Programmable Web, but it only has a good number of entries for areas in Western Europe.
• posted 3 years and 6 months ago • type: Spice (API calls) Needs a Developer

jdorw
It's a great idea!

That source seems like it's okay for Spain? (never climbed there) Looks like it's missing a lot of other popular European climbing spots though.

I emailed mountainproject a while ago and asked if they could create an api but never heard back. I know they also do bike trails which would be another good source to have. Maybe it's time to bug them again :)

For European climbing maybe it's worth looking for or asking 8a.nu for an api?
posted by jdorw Staff 3 years and 6 months ago Link
floey
I've looked around for a mountain project api, and I would assume that they aren't going to make one. Many people have asked for it on their forums and to my knowledge, none have received any feedback. I have found another website that does have a fairly extensive database, but it's only for climbing: http://www.thecrag.com/ It has an api but it's in beta. Here's the api site: http://www.thecrag.com/article/API
posted by floey 3 years and 6 months ago Link
jdorw
That looks like a really interesting source. I've never heard of them before but it looks like they have pretty good data on the climbing spots I checked. I'll look into getting an api key.
posted by jdorw Staff 3 years and 6 months ago Link
floey
Any luck on getting an api key?
posted by floey 3 years and 6 months ago Link
jdorw
Sorry for the delay. I was looking into getting us access to another api which might still work out. I'll email theCrag today though.
posted by jdorw Staff 3 years and 6 months ago Link
jdorw
Good news! I heard back from theCrag and they even offered to help out with the development. I sent them the link to this post so we can discuss the technical details over here.
posted by jdorw Staff 3 years and 6 months ago Link
Krystle
I was about to suggest something like this! Mine is specific to hiking though, and in reverse of what you're proposing. I'll add my input here but let me know if this belongs in its own thread (I'm new here).

When you search "___ trail" or "___ trailhead", would it be possible to show a map of the trailhead (with it pinned) and maybe a map of the trail itself? fs.usda.gov always has lat/long for trailheads in the location section, usually on the lower right. Example: http://1.usa.gov/1DGrpK1

In my area (Portland, Oregon) there is a wiki field guide (barely wiki though, it's difficult to get editing access) that has a decently standardized format: http://www.portlandhikersfieldguide.org/
posted by Krystle 3 years and 6 months ago Link
floey
This is another great idea! This might be a separate instant answer from the climbing one we were discussing above, but it is relevant on this page since it's close to the same idea. I think the first thing needed is to find a good source with an api. The more information from 1 source, the better. I checked out your first link, and there seemed to be a good api for all sorts of recreation at http://www.usda.gov/wps/portal/usda/usda... and it seems fairly extensive, though it should be looked into a little more.
posted by floey 3 years and 6 months ago Link
jdorw
Is there a JSON api somewhere that has this data?

If there isn't an api this could still work as a longtail instant answer. Basically crawl the site and build a list of trailheads and their coordinates. I don't think anyone has ever tried showing a map from a longtail but I think it could work and I'm happy to help out if you or anyone else wants to try it.
posted by jdorw Staff 3 years and 6 months ago Link
floey
The only api I can find for the USDA and other Federal Recreation sites is at https://ridb.recreation.gov/. The only site I have found that gives the maps and trailhead locations is the USDA, and that api only lists locations not trails. Would each destination be crawled for trail heads and coordinates?

I've looked up a few more sources for trails and hiking and I have found http://www.trailapi.com/ so far. It seems like it might be a good source and was on Programmable Web.
posted by floey 3 years and 6 months ago Link
Krystle
I did this search: http://screencast.com/t/MJMDfTPRX

And came up with a list of almost 2000 trailheads: http://screencast.com/t/dqqYxQ4jHg5u

Not sure if that helps at all - still learning about APIs and how they work!
posted by Krystle 3 years and 6 months ago Link
jdorw
That looks great! I posted a wall of text to Brendan's post. All of my ideas for the climbing data instant answers would apply to the hiking IA too since it's the same type of data.
posted by jdorw Staff 3 years and 6 months ago Link
brendanheywood
hi all,

I'm Brendan, one of the developers behind theCrag.com, and keen to help however I can. I'm completely new to the DDG side of things, but from my quick read through the docs I'm thinking that much of the structured info that you'd want to query from thecrag is already available in the scraped page itself, and using the API isn't really needed. But I'm completely open to you guys using the api, or us even writing a new tailored endpoint if it comes to that. In general I'm more interested in making changes that are generically useful to any consumer / robot etc. I'd say that querying your own index will be a lot faster than live querying our API.

Specifically:
* each page on our site has open graph meta data that says whether it is a 'route' or a 'crag / cliff / boulder etc'
* each page uses openschema markup to show the hierarchy relationship between locations
* each page has lat and long - we also have a boundary polygon for each area so can add that if there is a standard markup way of doing this.

For routes
* each route page has extra markup meta data using og for the popularity, quality, route grade, height, climbing style and potentially a bunch of other stuff like rock type, approach time, sun / shade, wind etc

For areas / crags
* we have an internal concept of 'crag quality' which takes into account lots of factors like how many people tick routes there, and how they rated them.

There is also a lot more info we store and display in html but isn't structured which we could also markup better if you wanted to leverage it.
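
A minimal sketch of how a scraper might pull that Open Graph style markup out of a page, using only Python's standard library. The sample HTML and the specific property names (`og:type`, `og:title`, `place:location:latitude`, `place:location:longitude`) are assumptions for illustration; the properties actually present on thecrag.com pages may differ.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        # Keep only the Open Graph and place namespaces (an assumed filter).
        if prop.startswith("og:") or prop.startswith("place:"):
            self.properties[prop] = attr_map.get("content") or ""


# Hypothetical page head; real pages will carry different values.
SAMPLE_HTML = """
<html><head>
<meta property="og:type" content="crag" />
<meta property="og:title" content="Example Crag" />
<meta property="place:location:latitude" content="35.45" />
<meta property="place:location:longitude" content="-82.21" />
<meta name="description" content="ignored, not an og property" />
</head><body></body></html>
"""


def extract_open_graph(html):
    """Return a dict of property name -> content for one page."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


og = extract_open_graph(SAMPLE_HTML)
print(og["og:type"], og["place:location:latitude"])
```

Parsing the markup rather than matching visible text keeps the scraper independent of layout changes, which seems to be the point of marking the pages up in the first place.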

So from my very limited understanding of the DDG we'd want to write a Fathead or Longtail which parses all of the above after DDG indexes the page, and then you could answer search questions like:

"best climbing crags in queensland"

"best boulder problems near london"

or potentially crazy complex but fairly impractical stuff like:

"the least climbed grade 18 trad routes on dolerite in tasmania"

As a side note, our default license for content is CC Share alike, Attribution, Non-commercial, and we've put a lot of thought into our licensing, markup and API specifically for sharing, which I see as a massive deficiency in most other similar sites like mountainproject and 8a.nu etc. We're always open to discussion on how we can make the content our users share more available and accessible.

So where to from here to get started?
posted by brendanheywood 3 years and 6 months ago Link
jdorw
Hey Brendan, thanks for offering to help out!

Floey, this isn't really a spice IA but hopefully you're still interested in working on it? I'm happy to answer any questions you might have if you want to try a fathead or longtail.

For both IA types you pretty much just create a scraper and grab all the relevant data from each page. Then use that to build an output file. The formats are a little different between the two types.
posted by jdorw Staff 3 years and 6 months ago Link
floey
jdorw, I am still interested in working on this IA. From reading all the information on the page, creating a fathead seems like the best way to start. I am excited about the progress on this IA and am thankful for the support from theCrag.com.

I will be very busy for the next week, so I probably won't be able to do very much during that time. Also, I have never worked on a fathead or longtail, so it may take me some time learning the ropes. If anyone would like to help in development, the process would probably go a lot faster.
posted by floey 3 years and 6 months ago Link
jdorw
Great! I like your comment about starting with the longtail instead of a fathead. Here's the longtail docs. The details are kinda sparse so feel free to ask me any questions.
posted by jdorw Staff 3 years and 6 months ago Link
jdorw
I see two potential instant answers from here. The easiest to get started with would be a fathead.

Fatheads are basically key->value data. So your search term has to match the key with a few optional trigger words we can add on. This might be good for searching for exact route/crag names.

A query like "salathe climb" or "salathe wall" would trigger info that came from this page http://www.thecrag.com/climbing/united-states/yosemite-national-park/route/20684875 It could then show a list of data (pitches, rating, FA .. ) or maybe a longer description.
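
A minimal sketch of that key->value matching, assuming trigger words are simply stripped from the query before an exact key lookup. The names `TRIGGER_WORDS`, `FATHEAD` and `lookup` are made up for illustration and are not part of the actual fathead format.

```python
# Hypothetical trigger words and entries, for illustration only.
TRIGGER_WORDS = {"climb", "climbing", "route", "wall"}

FATHEAD = {
    "salathe": "Salathe Wall, Yosemite National Park (route details would go here)",
}


def lookup(query):
    """Return the entry whose key equals the query once trigger words are removed."""
    words = query.lower().split()
    key = " ".join(w for w in words if w not in TRIGGER_WORDS)
    return FATHEAD.get(key)


print(lookup("salathe climb"))          # matches the "salathe" key
print(lookup("salathe wall yosemite"))  # no exact key, so no answer
```

Because the match is exact, any extra word that is not a trigger word ("yosemite" here) makes the lookup miss.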

Here's an example of a fathead that shows data in a list format like that: https://duckduckgo.com/?q=python+numpy&t=canonical&ia=about

We also have the option of showing a small image on the left side of the text. Similar to the wikipedia IA: https://duckduckgo.com/?q=shawangunks&ia=about

No maps or geo search from a fathead though. I don't really see an easy way for that to work.
posted by jdorw Staff 3 years and 6 months ago Link
jdorw
link got chopped off. This is the fathead showing list data https://duckduckgo.com/?q=python+numpy&t=canonical&ia=about
posted by jdorw Staff 3 years and 6 months ago Link
jdorw
The second type we could do is a longtail. The good thing is that once you have a scraper for a fathead, it's very easy to just change the format of the output file to fit a longtail.

I think this is going to be a more interesting IA to make. We don't currently support geo search or maps but I think with some work on my end I could make that happen.

The interesting part of the longtail is that we can handle more complex searches like the ones Brendan listed:
"best climbing crags in queensland"
"best boulder problems near london"

It will also handle more complicated name searches like "salathe wall yosemite" which wouldn't match an exact key in the fathead. We could also do location searches like "climbing near me".

Other queries like "the least climbed grade 18 trad routes on dolerite in tasmania" will still be much harder though.
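
A rough sketch of what a "near" query could do once each record carries a lat/long: rank records by great-circle (haversine) distance from the searcher. This is only an illustration under that assumption, not how DDG's geo search works; the crag names and coordinates are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical records; names and coordinates are invented for illustration.
CRAGS = [
    {"name": "Crag A", "lat": 51.0, "lon": 0.0},
    {"name": "Crag B", "lat": 53.0, "lon": -1.5},
    {"name": "Crag C", "lat": 51.4, "lon": -0.3},
]


def nearest(crags, lat, lon, limit=2):
    """Return the `limit` crags closest to the given point."""
    return sorted(crags, key=lambda c: haversine_km(lat, lon, c["lat"], c["lon"]))[:limit]


# Searching from roughly central London.
results = nearest(CRAGS, 51.5, -0.13)
print([c["name"] for c in results])
```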


Hope that helps!

TL;DR: This sounds like a really neat IA, I'd start with a fathead for now. Please let me know if you have any questions.
posted by jdorw Staff 3 years and 6 months ago Link
floey
If we know that we want to include the features that would require a longtail, would it be better to just start it as such, or convert after there is a functioning fathead?
posted by floey 3 years and 6 months ago Link
jdorw
It will just take longer to get a longtail working on my end. I'll have to add in support for geoip and maps which will take some time. If you don't mind the wait while I try to get it working then I agree.
posted by jdorw Staff 3 years and 6 months ago Link
brendanheywood
Just a few more thoughts that come to mind:

* Our index is made up of nodes which have a type hierarchy of region > crag > cliff > area > field > boulder. As a general rule you'd probably want to favour results of type 'crag', so if someone searched for "yosemite climbing" you'd probably want to return an answer based on:

Yosemite National Park (a 'crag')
http://www.thecrag.com/climbing/united-s...

rather than its child

Yosemite Valley (an 'area')
http://www.thecrag.com/climbing/united-s...

... but this is only a soft signal not a hard rule

* One thing with climbing is that many of the area names get re-used a lot, eg there are many climbing areas called 'red cliffs' around the world. If there are additional words that give it context, like 'red cliff climbing queensland', then that's fine and the right one can be inferred from the hierarchy, or perhaps from the search context, geoip etc. This problem is compounded when it comes to duplicated route names. If the query is unqualified you could choose the most popular based on our 'classic crag' metric.

* Rock climbing comes in many different styles which would affect which keywords would trigger the instant answer. Some suggestions:

"yosemite climbing" or "yosemite rock climbing"
http://www.thecrag.com/climbing/united-s...

"castle hill bouldering"
http://www.thecrag.com/climbing/new-zeal...

"craftys deep water solo" or "crafty dws"
http://www.thecrag.com/climbing/australi...

"ouray ice climbing"
http://www.thecrag.com/climbing/united-s...

A simple solution is that these are just synonyms of each other, or you could look at the predominant 'climbing style' stats in the page to know which to trigger on. Probably 'climbing' or 'crag' should trigger on all types regardless. Some (most) areas have mixed type so should trigger on whatever is in use.
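
A minimal sketch of the disambiguation idea from the points above: prefer the node whose ancestor hierarchy shares the most words with the query, and fall back to popularity when the query is unqualified. The records, the `popularity` field (standing in for the 'classic crag' metric) and the `STOP_WORDS` set are hypothetical.

```python
# Hypothetical records: two areas sharing a name, each with its ancestor
# hierarchy and an invented popularity score.
NODES = [
    {"name": "Red Cliffs", "ancestors": ["Queensland", "Australia"], "popularity": 40},
    {"name": "Red Cliffs", "ancestors": ["Utah", "USA"], "popularity": 75},
]

STOP_WORDS = {"climbing", "crag", "bouldering"}  # assumed trigger words


def pick_node(query, nodes):
    """Choose the node matching the most context words, breaking ties by popularity."""
    words = {w for w in query.lower().split() if w not in STOP_WORDS}

    def score(node):
        context = {w for part in node["ancestors"] for w in part.lower().split()}
        return (len(words & context), node["popularity"])

    # Only nodes whose full name appears in the query are candidates.
    candidates = [n for n in nodes if set(n["name"].lower().split()) <= words]
    return max(candidates, key=score) if candidates else None


print(pick_node("red cliffs climbing queensland", NODES)["ancestors"])
print(pick_node("red cliffs", NODES)["ancestors"])
```

Scoring on a tuple keeps the hierarchy match as the primary signal and popularity as the tie-breaker, which mirrors the "soft signal, not a hard rule" framing.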
posted by brendanheywood 3 years and 6 months ago Link
jdorw
The re-use of names will be tough but there are plenty of tricks I use on my end to handle those cases. We can work on that once there's a working example to test. I think just including a popularity score in the data would be good enough to at least get something started.

I like the polygons on the maps. I was playing around with editing the map for my local climbing spot. Really neat!

Is there a site map or index to start scraping from?

Floey, I think it might be easier if you just scrape your small local area to get an initial test data set. I can put that on a server to give us something to start testing out.
posted by jdorw Staff 3 years and 6 months ago Link
floey
Which way should I go about scraping my local area? I looked at the few longtails currently made and it didn't seem as if they had any similarities in their scraping.

Do we want to start small or should we pick a massive area like Yosemite as the initial data set?
posted by floey 3 years and 6 months ago Link
jdorw
I'd start small. We probably need a few iterations to figure out what data to get so no point in scraping a big area.

There's not a lot of similarities. That's a good (or bad?) part of these. You can write them in pretty much any language and any way you like.
posted by jdorw Staff 3 years and 6 months ago Link
brendanheywood
We don't implement a sitemap.xml and I'm not sure it's practical to - we have several hundred thousand nodes being updated hourly in an irregular way, it would be constantly stale and take a lot of cpu to make. We do implement an atom feed of updates to the index which is discoverable in markup and I *think* google and other robots use it to detect timely changes to nodes to re-index without resorting to a full re-scrape. Hard to know their internals.
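
A small sketch of how a scraper could use an Atom feed like that to re-fetch only the nodes that changed since its last run. The feed content and URLs below are invented; only the Atom namespace and the `entry`, `id` and `updated` elements come from the Atom format itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content; the real feed's entries and URLs will differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example updates</title>
  <entry>
    <id>https://example.org/node/1</id>
    <title>Node one</title>
    <updated>2015-03-01T10:00:00Z</updated>
  </entry>
  <entry>
    <id>https://example.org/node/2</id>
    <title>Node two</title>
    <updated>2015-03-08T12:30:00Z</updated>
  </entry>
</feed>
"""


def updated_since(feed_xml, cutoff):
    """Return ids of entries whose <updated> timestamp is after `cutoff`."""
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.findall("atom:entry", ATOM_NS):
        stamp = entry.findtext("atom:updated", namespaces=ATOM_NS)
        # Normalise the trailing "Z" so fromisoformat accepts it on older Pythons.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when > cutoff:
            changed.append(entry.findtext("atom:id", namespaces=ATOM_NS))
    return changed


last_run = datetime(2015, 3, 5, tzinfo=timezone.utc)
print(updated_since(SAMPLE_FEED, last_run))
```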

But for our purposes I'd assume there is already lots of DDG scraping and indexing going on and this should just be an extra step after that process and not a separate scrape?

If this needs another scrape, or just for dev purposes you could start at the world node, or any region or crag node and then walk down through the index following links in the left nav:

http://www.thecrag.com/climbing/world

Or pick some smaller region like

http://www.thecrag.com/climbing/australi...

Just to get started I'd probably only walk down as far as the highest crag node; we internally call this the TLC or Top Level Crag. We use this concept of a TLC for lots of reasons, eg what 'crag' does a route belong to? Crags can be nested, ie the Grampians or Yosemite are considered crags but are 100s of km wide and contain smaller well known and named crags, eg Yosemite > El Capitan and Grampians > Hollow Mountain. Stopping at the TLC will avoid the large number of cases of crags which have child nodes with generic names like 'Left side / Right side', 'North / South' or 'Sunny side / Shady side'. Later we can go down to the route level and figure out how to filter out all the duplicates.
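
A minimal sketch of that walk, assuming the index has already been loaded as nested dicts with a `type` and a `children` list (a made-up in-memory shape, not thecrag's actual data model). It yields the first crag found on each branch and does not descend below it.

```python
# Hypothetical slice of the index; in practice each node would come from a
# scraped page or an API response.
INDEX = {
    "name": "World", "type": "region", "children": [
        {"name": "Region A", "type": "region", "children": [
            {"name": "Big Crag", "type": "crag", "children": [
                {"name": "Nested Crag", "type": "crag", "children": []},
                {"name": "Left side", "type": "area", "children": []},
            ]},
        ]},
        {"name": "Lone Crag", "type": "crag", "children": []},
    ],
}


def top_level_crags(node):
    """Walk down from `node`, yielding the highest crag on each branch."""
    if node["type"] == "crag":
        yield node["name"]
        return  # stop here: nested crags and generic child areas are skipped
    for child in node.get("children", []):
        yield from top_level_crags(child)


print(list(top_level_crags(INDEX)))
```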

posted by brendanheywood 3 years and 6 months ago Link
jdorw
This would be a completely separate scraper. All the fatheads and longtails use a source specific scraper. Usually we just figure out what a reasonable update period is and manually run it then. I like the atom feed though. I'll try and think of a way we could use that.
posted by jdorw Staff 3 years and 6 months ago Link
brendanheywood
Given that this would be a separate scraping process I'm starting to think a custom endpoint on our side which is hit once a week or so would be better, but not a live api.

In the meantime you guys can manually piece together a static text file, csv or json, or work directly in the longtail format, with exactly the bits of data that you need, adding fields and test data as you go. When you are close to getting it all working and the format has stabilized a bit we (ie thecrag) will knock up a single api endpoint which replicates that format.

That way you are focusing on the IA logic and not on a bunch of scraping and parsing code, and we are focused on just pumping out the data in the right shape without having to think too much about what that shape is and what field goes where.
posted by brendanheywood 3 years and 6 months ago Link
jdorw
How would the api work? Just hit it once a week to get a whole new data set?
posted by jdorw Staff 3 years and 6 months ago Link
brendanheywood
Yeah that's what I had in mind, this seems the easiest. Even monthly could be fine as this data is very slow moving at this high level. This would work well for all crags, which is a fairly limited data set, currently ~5,000 records worldwide (and we'd probably return only a high quality subset of these). This probably won't work so well for route level stuff, ~300,000 records, but I'm a little dubious about the value of IAs for individual routes. Perhaps we could only IA routes which are iconic, 2 or 3 star popular routes.
posted by brendanheywood 3 years and 6 months ago Link
floey
jdorw, If theCrag will have the custom endpoint, should the IA be a Spice or is the Longtail format still applicable?
posted by floey 3 years and 5 months ago Link
floey
I created a scraper/crawler for the site and here is the sample output in JSON format for my local crag, The Rumbling Bald.

{"Rumbling Bald":{"styles":{"Boulder":"32%","Unknown":"20%","Sport":"2%","Trad":"44%"},"breadcrumbs":["Rumbling Bald","North Carolina","USA","North America"],"areas":[{"routes":"9","ticks":"129","height":"130ft","name":"Cereal Buttress"},{"height":"","name":"Comotose Area","routes":"0","ticks":"0"},{"routes":"0","ticks":"0","name":"Flakeview Area","height":""},{"height":"","name":"Lakeview Area","routes":"0","ticks":"0"},{"height":"69ft","name":"Screamweaver Area","routes":"3","ticks":"6"},{"ticks":"0","routes":"0","height":"","name":"Cereal Wall"}],"type":"crag","url":"http://www.thecrag.com/climbing/united-s...","number of routes":"34"}}

The scraper could scrape a larger area, but I did 1 crag so we would just have a little data first.
posted by floey 3 years and 5 months ago Link
brendanheywood
Is this running from Quebec in Safari 534.34 by any chance and still running right now?

We've seen a massive spike in traffic hitting urls like this:

http://www.thecrag.com/climbing/a/b/area...
http://www.thecrag.com/climbing/a/b/area...
http://www.thecrag.com/climbing/a/b/area...

This is quite odd as that url format, while it works, isn't something you'd ever find by following links, it works almost by accident.

If it is please let me know so I don't have to keep chasing this traffic, and please ramp it down a *lot*. Ideally just scrape the page without any assets (ie don't load our google analytics beacon)
posted by brendanheywood 3 years and 5 months ago Link
floey
That is quite odd. I haven't been running anything since I posted and my traffic was minimal and confined to that single url. I don't live in Quebec or have Safari installed either. I do hope you figure out what the problem is though.
posted by floey 3 years and 5 months ago Link
jdorw
Great, thanks! All of the data that you want to show will have to go into a single "paragraph" field. This field will have to be formatted in the way you want it to show on the site. For now that can only be plain text and newlines.
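
A minimal sketch of flattening one scraped record into plain text with newlines for that "paragraph" field. It uses a trimmed copy of the sample JSON posted above; the line layout and the `to_paragraph` helper are assumptions for illustration, not the required longtail output format.

```python
import json

# A trimmed copy of the sample record posted above.
RECORD_JSON = """
{"Rumbling Bald": {
  "styles": {"Boulder": "32%", "Unknown": "20%", "Sport": "2%", "Trad": "44%"},
  "breadcrumbs": ["Rumbling Bald", "North Carolina", "USA", "North America"],
  "type": "crag",
  "number of routes": "34"
}}
"""


def to_paragraph(name, record):
    """Flatten one record into plain text lines separated by newlines."""
    # The first breadcrumb repeats the crag name, so skip it.
    location = ", ".join(record["breadcrumbs"][1:])
    styles = ", ".join(f"{style} {share}" for style, share in record["styles"].items())
    lines = [
        f"{name} ({record['type']})",
        f"Location: {location}",
        f"Routes: {record['number of routes']}",
        f"Styles: {styles}",
    ]
    return "\n".join(lines)


data = json.loads(RECORD_JSON)
for name, record in data.items():
    print(to_paragraph(name, record))
```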

Feel free to make a pull request at any time. It doesn't have to be completely finished and it might be easier to go over the output file format on GitHub.
posted by jdorw Staff 3 years and 5 months ago Link
floey
Pull request has been made:
https://github.com/duckduckgo/zeroclicki...
posted by floey 3 years and 5 months ago Link
brendanheywood
hey I know this isn't the right forum, but I find it fairly difficult to follow what is new in these forums. In particular, two small low hanging fruit would be for the notification emails and the notifications list on the dashboard to link to the hash anchor of each new comment. And secondly, if it could highlight which comments were new since you last visited a page, that would make a massive difference. It's very easy to lose track and context in these deeply nested threads.
posted by brendanheywood 3 years and 6 months ago Link
jdorw
Agreed. Better notifications are in progress. I'll make an issue for the comment highlighting.
posted by jdorw Staff 3 years and 6 months ago Link