anonymous
Also, since you are hosted on AWS then AWS is storing all my search queries, IP Address and UA, no?
posted by <hidden> • 3 years and 10 months ago Link

x.15a2
No. DDG uses AWS, but the information is not stored by DDG or !A.

This is old, but explains the whole UA thing. http://www.gabrielweinberg.com/blog/2010...
posted by x.15a2 Community Leader • 3 years and 10 months ago Link
anonymous
That still doesn't answer any of the points I've raised:

1. He spends a lot of time there talking about the requests to get the images but glaringly leaves out the obvious: the first request with my search terms IS sent to DDG (and, by definition, AWS). For instance, I enter a search query and it goes to "https://duckduckgo.com/?q=duck+dynasty". That single request hits your app AND it hits AWS (which Gabe admits they log).

2. You say you don't store the user agent, but then how do you determine what a "Bot" is in your traffic stats?
posted by <hidden> • 3 years and 10 months ago Link
anonymous
2. You say you don't store the user agent, but then how do you determine what a "Bot" is in your traffic stats?

Why can't they have a script that goes:

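KNOWN_BOT_IPS = set()   # placeholder: IPs known to belong to bots
KNOWN_BOT_UAS = set()   # placeholder: UA strings known to belong to bots
bot_count = 0
user_count = 0

def handle_query_connection(ip, ua, query):
    connection_id = (ip, ua)           # held in memory only
    fire_traffic_stats(connection_id)
    # get the query, etc.

def fire_traffic_stats(connection_id):
    global bot_count, user_count
    ip, ua = connection_id

    if ip in KNOWN_BOT_IPS or ua in KNOWN_BOT_UAS:
        bot_count += 1
    else:
        user_count += 1

    # (any extra comparisons for API, etc.)

    del ip, ua, connection_id          # only the counters are kept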

In a nutshell: the IP and UA are only stored temporarily in server memory, then destroyed immediately once the statistics are recorded.
posted by <hidden> • 3 years and 10 months ago Link
anonymous
Until we have an official answer from DDG we probably shouldn't speculate. ;)
posted by <hidden> • 3 years and 10 months ago Link
yegg
This general idea (though not the specifics) is how it works. We have automated systems that determine on the fly (in memory) whether something is a bot or not and then mark it as such. The most prevalent bot is actually Googlebot, which is extremely easy to detect in nginx.

As for the AWS question, we use encryption by default, which encrypts HTTP headers and thus the search query. Actually, the end server isn't all that matters. If you don't use HTTPS, then lots of servers between you and us could intercept your search term. Just do a traceroute between you and any Internet site. However, with encryption these headers are encrypted when they travel across the Internet.
posted by yegg Staff • 3 years and 10 months ago Link
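
As a rough illustration of the in-memory marking described above, the sketch below classifies a request by its User-Agent string and keeps only aggregate counters. The names `BOT_MARKERS`, `classify`, `record`, and `stats` are hypothetical, chosen for this example; this is not DDG's actual nginx setup, and a substring check on its own can be spoofed.

```python
from collections import Counter

# Hypothetical marker list; Googlebot's published UA contains "Googlebot".
BOT_MARKERS = ("googlebot",)

stats = Counter()  # only aggregate counts persist


def classify(user_agent: str) -> str:
    """Return "bot" or "user" using a case-insensitive substring match."""
    ua = user_agent.lower()
    return "bot" if any(marker in ua for marker in BOT_MARKERS) else "user"


def record(user_agent: str) -> None:
    """Update the counters; the UA string itself is never stored."""
    stats[classify(user_agent)] += 1


record("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
record("Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0")
print(dict(stats))
```

After the two calls, `stats` holds one "bot" and one "user" count, and neither UA string is retained anywhere.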
anonymous
The search terms are not part of the headers but part of the URL in a GET request. For a POST request those terms are part of the body.

In any event, even before DDG has a chance to encrypt any data, Amazon has already logged my search query and IP address.
posted by <hidden> • 3 years and 10 months ago Link
yegg
This is a common misconception. All headers (including the full URL and search terms embedded in it) are encrypted on a GET request: http://stackoverflow.com/questions/18765...
posted by yegg Staff • 3 years and 10 months ago Link
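
To make the point about GET requests over HTTPS concrete, here is a minimal sketch that splits a search URL into the part needed to route the connection and the part that travels inside the encrypted session. The function name `split_visibility` and the dictionary keys are made up for this example, and it assumes an `https://` URL with the default port 443.

```python
from urllib.parse import urlsplit


def split_visibility(url: str) -> dict:
    """Separate what is needed to route a connection from what TLS encrypts."""
    parts = urlsplit(url)
    request_target = parts.path or "/"
    if parts.query:
        request_target += "?" + parts.query
    return {
        # Needed to set up the connection (DNS lookup, TCP/IP, usually SNI).
        "visible_on_the_wire": {"host": parts.hostname, "port": parts.port or 443},
        # Sent inside the TLS session as part of the HTTP request line.
        "encrypted_in_transit": {"request_target": request_target},
    }


info = split_visibility("https://duckduckgo.com/?q=duck+dynasty")
print(info["visible_on_the_wire"])
print(info["encrypted_in_transit"])
```

The search terms end up in the request target, which is sent only after the TLS handshake. The encryption covers the network path; whichever server terminates the TLS connection necessarily sees the decrypted request.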
Jlg
This is a good time to mention:
posted by Jlg Community Leader • 3 years and 10 months ago Link
anonymous
That single request hits your app AND it hits AWS (which Gabe admits they log).

How do you know the request hits AWS as well?

I tried it out and it only hit ddg in FF.
I'm sure ddg fetches all the data for you beforehand, not in real time.
Even if it did, why would it have to send your IP along with the request?
posted by <hidden> • 3 years and 10 months ago Link
anonymous
By "hitting AWS" I mean it passes through AWS infrastructure. There's no way for it to hit DDG without passing through AWS. DDG doesn't pass your IP...your computer does. Here's an exercise:

With Firebug active in FF, put "https://duckduckgo.com/?q=bob+dylan" in the browser's address bar. In the "Net" tab in Firebug the very first entry will be "GET ?q=bob+dylan". Copy the remote IP, go to "http://itools.com/tool/arin-whois-domain..." and enter that IP address. See? Amazon's name is there (and not DDG's) and Amazon owns that IP address. You just sent a query to Amazon's computers and Amazon logs your IP address, search terms, etc.
posted by <hidden> • 3 years and 10 months ago Link
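
The ownership check in the exercise above boils down to asking which registered address block an IP falls into. The sketch below shows that lookup offline. The table `EXAMPLE_RANGES`, the owner names, and the function `owner_of` are hypothetical, and the ranges are reserved documentation blocks (RFC 5737) standing in for what a real WHOIS query would report.

```python
import ipaddress
from typing import Optional

# Hypothetical ownership table built from reserved documentation ranges;
# a real check would query WHOIS instead.
EXAMPLE_RANGES = {
    "ExampleCloud Inc.": ["203.0.113.0/24"],
    "ExampleSearch LLC": ["198.51.100.0/24"],
}


def owner_of(ip: str, ranges: Optional[dict] = None) -> Optional[str]:
    """Return the owner whose CIDR block contains ip, or None if unknown."""
    table = EXAMPLE_RANGES if ranges is None else ranges
    addr = ipaddress.ip_address(ip)
    for owner, cidrs in table.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
            return owner
    return None


print(owner_of("203.0.113.42"))
print(owner_of("192.0.2.1"))
```

The first address falls inside the block listed for the made-up "ExampleCloud Inc."; the second matches nothing in the table, so the function returns `None`.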
anonymous
I can't believe I missed that.
posted by <hidden> • 3 years and 10 months ago Link
This comment has been removed for violation of our forum rules.
posted by <hidden> • 3 years and 10 months ago
anonymous
Why don't you find DDG a new server host then? One with a nice privacy policy : )
posted by <hidden> • 3 years and 10 months ago Link
x.15a2
Quote:
2. You say you don't store the user agent, but then how do you determine what a "Bot" is in your traffic stats?
It clearly states that FULL UAs are not stored, but enough (again, NON-personal) UA data is retained for stats like the ones you are referring to.
posted by x.15a2 Community Leader • 3 years and 10 months ago Link