anonymous
Hi Zac,

Yeah, I just meant to say that the comments seemed to be directed toward two distinct sorts of projects:
1) Development of a multilingual translator capable of parsing at least to the sentential level (like translate.google.com)
2) A much simpler hack that leveraged existing third-party solutions to produce results.

There simply aren't any existing public projects that come anywhere close to meeting all three requirements outlined above. At least, there are none suitable for a wide range of language families and/or for working in arbitrary domains ("domains" in the linguistic sense, not the networking sense). So any solution involves either uncovering an existing, previously non-public project or starting a new one.

I was just saying that #1 is a very ambitious undertaking and, as such, it needs very clear goals. I love the idea of working on such a project (in fact, I've been doing that off and on since about 2008). But even so, I can't come up with a compelling reason not to use translate.google.com to achieve similar results.

I think that, if some compelling counter-cases to translate.google.com could be codified (unintentional alliteration), then it might be feasible to rally the necessary talent to pull it off. I can't overstate, however, what a huge undertaking it is. There are some very real reasons that it has never been done very well.

Many of these obstacles could be overcome with well-directed resources. Unfortunately, though, this sort of work is generally overshadowed by the following observations:

a) There isn't much money in it (compared to the cost of development).
b) It isn't likely to be reliable enough that a human never has to intervene.
c) In the short-term, it is cheaper and more reliable to just have humans do the work.

... all very valid considerations. However, the more far-sighted view is that comprehensive work in this arena has direct implications for a number of other, more taxing problems. Specifically, L1-->L2 translation that preserves meaning requires algorithms that comprehensively extract meaning from natural language. And that is a game changer. No one (and I mean "no one") is doing that yet.

So I would put forward that this is THE most important project for a search engine. But it is perhaps not the most appropriate project for DDG, which is not the first-line parser of the information behind its results. That is, if DDG parsed original web content directly and produced results from that primary data, then a semantic parser would be a great asset. Since DDG is actually a consumer of secondary data and a producer of tertiary data, there might be no net benefit to end users in building a project that doesn't simply get its results from a third party.

I don't know. That's more of a philosophical question for the DDG leadership. It is really a question of whether DDG wants to step out of the role of information middleman. If not, then it probably does not make sense to build a project that doesn't follow the same model of passing on results from a third party. And any project that does follow that model will be quite limited, since no one has done a great job of solving this problem yet (with the possible exception of Google).

Cheers,
-Brent
posted by [UserVoice Brent] • 4 years and 9 months ago