Web Search Privacy

Privacy in web search can be protected in different ways. In a system-centric solution, we can design a search engine using private information retrieval, such that users can obtain the results to their searches without revealing their queries or their search activities to the search engine. The benefit of this approach is that no information about users’ search activities is revealed to the service provider or any eavesdropper. This solution, however, cannot protect privacy of users with respect to the existing popular search engines. In a network-centric solution, users can make use of anonymous communications to hide their identities with respect to the search engines, in order to make their queries unlinkable. This technique can prevent an adversary from constructing a profile for each user to some extent. Features extracted from the user’s web browser can be used however to fingerprint the user and link her queries. In a user-centric approach, users can conceal their real queries by issuing interleaving fake queries. The challenge here is to generate fake queries that cannot be distinguished from real queries, as simple randomly generated queries can be easily filtered out from the set of observed queries from a user. Note that there is a possibility of combining these approaches for designing a hybrid protection mechanism.

Quantifying Web-Search Privacy

Web search queries reveal extensive information about users’ personal lives to the search engines and Internet eavesdroppers. Obfuscating search queries through adding dummy queries is a practical and user-centric protection mechanism to hide users’ search intentions and interests. Despite few such obfuscation methods and tools, there is no generic quantitative methodology for evaluating users’ web-search privacy. We provide such a methodology. We formalize adversary’s background knowledge and attacks, the users’ privacy objectives, and the algorithms to evaluate the effectiveness of query obfuscation mechanisms. We build upon machine-learning algorithms to learn the linkability between user queries. This encompasses the adversary’s knowledge about the obfuscation mechanism and the users’ web-search behavior. Then, we quantify privacy of users with respect to linkage attacks. Our generic attack can run against users for which the adversary does not have any background knowledge, as well as for the cases where some prior queries from the target users are already observed. We quantify privacy at the query level (the link between user’s queries) and the semantic level (user’s topics of interest). We design a generic tool that can be used for evaluating generic obfuscation mechanisms, and users with different web search behavior. To illustrate our approach in practice, we analyze and compare privacy of users for two example obfuscation mechanisms on a set of real web-search logs.

Members of the project: Arthur Gervais, Reza Shokri, Adish Singla, Srdjan Capkun and Vincent Lenders

Related publication

Arthur Gervais, Reza Shokri, Adish Singla, Srdjan Capkun and Vincent Lenders
Quantifying Web-Search Privacy
In Proceedings of the ACM Conference on Computer and Communication Security (CCS), 2014 (Acceptance rate: ~19.5%)
[DownloadPDF (PDF, 466 KB) | DownloadBibTeX (BIB, 342 Bytes) | DownloadSlides (PDF, 8.6 MB)]