Latest Comments

7 Total
a reply to a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 26 days ago
Is this running from Quebec in Safari 534.34 by any chance and still running right now?

We've seen a massive spike in traffic hitting urls like this:

This is quite odd as that url format, while it works, isn't something you'd ever find by following links; it works almost by accident.

If it is, please let me know so I don't have to keep chasing this traffic, and please ramp it down a *lot*. Ideally just scrape the page without any assets (ie don't load our Google Analytics beacon).
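
A minimal sketch of what "ramped down, no assets" could look like on the scraper side. It fetches only the HTML document (a plain HTTP client never loads scripts or images, so no analytics beacon fires) and waits between requests. The user agent string, the delay and the function names are assumptions for illustration, not anything DDG actually runs.

```python
import time
import urllib.request

# Hypothetical identifier so the site owner can recognise and contact the bot.
USER_AGENT = "example-ia-scraper/0.1 (contact: you@example.com)"


def fetch_html(url, timeout=30):
    """Fetch only the HTML document; no assets are requested."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_politely(urls, fetch=fetch_html, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each url in turn, pausing between requests to keep load low."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # fixed pause between consecutive requests
        pages[url] = fetch(url)
    return pages
```

The `fetch` and `sleep` parameters are only there so the loop can be exercised without touching the network.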
a reply to a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 1 month ago
Yeah that's what I had in mind, this seems the easiest. Even monthly could be fine as this data is very slow moving at this high level. This would work well for all crags, which is a fairly limited data set, currently ~5,000 records worldwide (and you'd probably return only a high quality subset of these). This probably won't work so well for route level stuff (~300,000 records), but I'm a little dubious about the value of IAs for individual routes. Perhaps we could only IA routes which are iconic, ie popular 2 or 3 star routes.
a reply to a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 1 month ago
Given that this would be a separate scraping process I'm starting to think a custom endpoint on our side which is hit once a week or so would be better, but not a live api.

In the meantime you guys can manually piece together a static text file (csv, json, or directly in the longtail format) with exactly the bits of data that you need, adding fields and test data as you go. When you are close to getting it all working and the format has stabilized a bit, we (ie thecrag) will knock up a single api endpoint which replicates that format.

That way you are focusing on the IA logic and not on a bunch of scraping and parsing code, and we are focused on just pumping out the data in the right shape without having to think too much about what that shape is and what field goes where.
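
As a rough sketch of the static-file stage, something like this could load a hand-built json file and complain early when a record is missing a field. The field names in `REQUIRED_FIELDS` are purely hypothetical placeholders; the real list would be whatever the IA ends up needing.

```python
import json

# Hypothetical field list; adjust as the format settles.
REQUIRED_FIELDS = ("name", "type", "lat", "lon")


def load_records(text):
    """Parse a json array of records and check each has the agreed fields."""
    records = json.loads(text)
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"record {i} is missing fields: {', '.join(missing)}")
    return records
```

Once the field list stops changing, the same check could be pointed at the output of the eventual endpoint.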
a reply to a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 1 month ago
Hey, I know this isn't the right forum, but I find it fairly difficult to follow what is new in these forums. In particular there are two small bits of low hanging fruit. First, the notification emails and the notifications list on the dashboard could link to the hash anchor of each new comment. Secondly, if it could highlight which comments were new since you last visited a page, that would make a massive difference. It's very easy to lose track and context in these deeply nested threads.
a reply to a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 1 month ago
We don't implement a sitemap.xml and I'm not sure it's practical to - we have several hundred thousand nodes being updated hourly in an irregular way, it would be constantly stale and take a lot of cpu to make. We do implement an atom feed of updates to the index which is discoverable in markup and I *think* google and other robots use it to detect timely changes to nodes to re-index without resorting to a full re-scrape. Hard to know their internals.
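
For what it's worth, a robot could use an atom feed for incremental re-indexing along these lines. This is only a sketch: it assumes standard Atom `entry`, `link` and `updated` elements, and the actual shape of our feed is not spelled out here, so treat the element handling as an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom XML namespace


def updated_links(feed_xml, since):
    """Return hrefs of feed entries updated after the `since` datetime."""
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.iter(ATOM + "entry"):
        updated_text = entry.findtext(ATOM + "updated")
        link = entry.find(ATOM + "link")
        if updated_text is None or link is None:
            continue  # skip entries that lack the fields we rely on
        # Atom timestamps are RFC 3339; map a trailing Z to an explicit offset.
        updated = datetime.fromisoformat(updated_text.replace("Z", "+00:00"))
        if updated > since:
            changed.append(link.get("href"))
    return changed
```

Pass a timezone-aware `since` (the time of the last crawl) and only re-fetch the returned urls.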

But for our purposes I'd assume there is already lots of DDG scraping and indexing going on and this should just be an extra step after that process and not a separate scrape?

If this needs another scrape, or just for dev purposes you could start at the world node, or any region or crag node and then walk down through the index following links in the left nav:

Or pick some smaller region like

Just to get started I'd probably only walk down as far as the highest crag node; we internally call this the TLC or Top Level Crag. We use this concept of a TLC for lots of reasons, eg what 'crag' does a route belong to? Crags can be nested, ie the Grampians or Yosemite are considered crags but are 100s of km wide and contain smaller well known and named crags, eg Yosemite > El Capitan and Grampians > Hollow Mountain. Stopping at the TLC will avoid the large number of cases of crags which have child nodes with generic names like 'Left side / Right side', 'North / South' or 'Sunny side / Shady side'. Later we can go down to the route level and figure out how to filter out all the duplicates.
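
The walk described above can be sketched as a small recursive function that stops descending as soon as it hits a node of type 'crag'. The dict layout (`type`, `name`, `children`) is an assumed in-memory shape for illustration, not our actual data model.

```python
def top_level_crags(node):
    """Return names of the highest crag nodes at or below `node`.

    Descent stops at the first 'crag' on each branch, so nested crags
    and generically named child areas are never visited.
    """
    if node["type"] == "crag":
        return [node["name"]]
    found = []
    for child in node.get("children", []):
        found.extend(top_level_crags(child))
    return found


# Toy index, heavily simplified; real regions sit between World and the crags.
EXAMPLE_INDEX = {
    "type": "region", "name": "World", "children": [
        {"type": "crag", "name": "Grampians", "children": [
            {"type": "crag", "name": "Hollow Mountain", "children": [
                {"type": "area", "name": "Left side"},
            ]},
        ]},
        {"type": "crag", "name": "Yosemite", "children": [
            {"type": "crag", "name": "El Capitan"},
        ]},
    ],
}
```

With the toy index, `top_level_crags(EXAMPLE_INDEX)` returns only Grampians and Yosemite; Hollow Mountain, El Capitan and 'Left side' are skipped.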

a reply to a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 1 month ago
Just a few more thoughts that come to mind:

* Our index is made up of nodes which have a type hierarchy of region > crag > cliff > area > field > boulder. As a general rule you'd probably want to favour results of type 'crag', so if someone searched for "yosemite climbing" you'd probably want to return an answer based on:

Yosemite National Park (a 'crag')

rather than its child

Yosemite Valley (an 'area')

... but this is only a soft signal not a hard rule

* One thing with climbing is that many of the area names get re-used a lot, eg there are many climbing areas called 'red cliffs' around the world. If there are additional words that give it context, like 'red cliff climbing queensland', then that's fine and the right one can be inferred from the hierarchy. Or perhaps from the search context, geoip etc. This problem is compounded when it comes to duplicated route names. If the query is unqualified you could choose the most popular based on our 'classic crag' metric.

* Rock climbing comes in many different styles which would affect which keywords would trigger the instant answer. Some suggestions:

"yosemite climbing" or "yosemite rock climbing"

"castle hill bouldering"

"craftys deep water solo" or "crafty dws"

"ouray ice climbing"

A simple solution is that these are just synonyms of each other, or you could look at the predominant 'climbing style' stats in the page to know which to trigger on. Probably 'climbing' or 'crag' should trigger on all types regardless. Some (most) areas have mixed type so should trigger on whatever is in use.
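
Pulling the points above together, here is a minimal sketch of trigger matching plus disambiguation. The trigger table, the style labels, the `CRAG_BONUS` weight and the candidate record fields (`type`, `ancestors`, `popularity`) are all assumptions made up for illustration; `popularity` stands in for something like the 'classic crag' metric.

```python
# Trigger phrase -> climbing style; None means "trigger on any style".
TRIGGERS = {
    "rock climbing": None, "climbing": None, "crag": None,
    "bouldering": "boulder",
    "deep water solo": "dws", "dws": "dws",
    "ice climbing": "ice",
}
CRAG_BONUS = 0.2  # soft preference for type 'crag', not a hard rule


def parse_query(query):
    """Split a query into place name, style and trailing context words."""
    words = query.lower().split()
    # Try longer phrases first so "ice climbing" wins over "climbing".
    for phrase in sorted(TRIGGERS, key=len, reverse=True):
        p = phrase.split()
        for i in range(len(words) - len(p) + 1):
            if words[i:i + len(p)] == p:
                return {"name": " ".join(words[:i]),
                        "style": TRIGGERS[phrase],
                        "context": words[i + len(p):]}
    return None  # no trigger phrase, so no instant answer


def pick_best(candidates, context=()):
    """Pick among same-named nodes: hierarchy context first, then popularity."""
    def score(c):
        ancestors = {a.lower() for a in c["ancestors"]}
        # Only single-word ancestor names can match in this simple version.
        context_hits = sum(1 for w in context if w in ancestors)
        bonus = CRAG_BONUS if c["type"] == "crag" else 0.0
        return (context_hits, c["popularity"] + bonus)
    return max(candidates, key=score)
```

For example, `parse_query("red cliffs climbing queensland")` gives the name "red cliffs" with context `["queensland"]`, and `pick_best` then prefers the candidate whose ancestors include Queensland; with no context it falls back to popularity, nudged towards crags.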
a comment on the Instant Answer Idea Hiking and climbing spots near a location 4 years and 1 month ago
hi all,

I'm Brendan, one of the developers behind thecrag, and keen to help however I can. I'm completely new to the DDG side of things, but from my quick read through the docs I'm thinking that much of the structured info that you'd want to query from thecrag is already available in the scraped page itself, and using the API isn't really needed. But I'm completely open to you guys using the api, or us even writing a new tailored endpoint if it comes to that. In general I'm more interested in making changes that are generically useful to any consumer / robot etc. I'd say that querying your own index will be a lot faster than live querying our API.

* each page on our site has open graph meta data that says whether it is a 'route' or a 'crag / cliff / boulder etc'
* each page uses openschema markup to show the hierarchy relationship between locations
* each page has lat and long - we also have a boundary polygon for each area so can add that if there is a standard markup way of doing this.

For routes
* each route page has extra markup meta data using og for the popularity, quality, route grade, height, climbing style and potentially a bunch of other stuff like rock type, approach time, sun / shade, wind etc

For areas / crags
* we have an internal concept of 'crag quality' which takes into account lots of factors like how many people tick routes there, and how they rated them.
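
As a sketch of how a consumer might pull this markup out of an already scraped page, the following collects `<meta>` tags using only the Python standard library. The sample page and its property names and values are hypothetical; check the real pages for the properties actually emitted.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta property=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("property") or a.get("name")
        if key and a.get("content") is not None:
            self.meta[key] = a["content"]


def extract_meta(html):
    """Return a dict of meta key -> content for the given HTML text."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


# Hypothetical page head; property names and values are placeholders.
SAMPLE_PAGE = """
<html><head>
<meta property="og:type" content="crag" />
<meta property="place:location:latitude" content="0.0" />
<meta property="place:location:longitude" content="0.0" />
</head><body></body></html>
"""
```

`extract_meta(SAMPLE_PAGE)["og:type"]` would then tell a Fathead or Longtail builder whether the page is a route or a crag.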

There is also a lot more info we store and display in html but isn't structured which we could also markup better if you wanted to leverage it.

So from my very limited understanding of DDG, we'd want to write a Fathead or Longtail which parses all of the above after DDG indexes the page, and then you could answer search questions like:

"best climbing crags in queensland"

"best boulder problems near london"

or potentially crazy complex but fairly impractical stuff like:

"the least climbed grade 18 trad routes on dolerite in tasmania"

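Once the per-route fields are parsed into records, a query like the last one is just a filter plus a sort. A minimal sketch, assuming a hypothetical `Route` record where `ticks` stands in for "how often it has been climbed":

```python
from dataclasses import dataclass


@dataclass
class Route:
    # Hypothetical record shape; field names are illustrative only.
    name: str
    grade: int
    style: str
    rock: str
    region: str
    ticks: int


def least_climbed(routes, *, grade, style, rock, region, limit=10):
    """Return matching routes ordered from fewest to most ticks."""
    matches = [
        r for r in routes
        if r.grade == grade and r.style == style
        and r.rock == rock and r.region == region
    ]
    return sorted(matches, key=lambda r: r.ticks)[:limit]
```

For instance, `least_climbed(routes, grade=18, style="trad", rock="dolerite", region="tasmania")` would answer the example query against whatever route list has been indexed.
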
As a side note, our default license for content is CC Share alike, Attribution, Non-commercial, and we've put a lot of thought into our licensing, markup and API specifically for sharing, which I see as a massive deficiency in most other similar sites like mountainproject and a8 etc. We're always open to discussion on how we can make the content our users share more available and accessible.

So where to from here to get started?