Claudia Hauff

Assistant Professor, Web Information Systems, TU Delft


Taking a look at health campaigns through the eyes of Twitter: a short paper accepted at EMNLP 2015 and a workshop paper accepted at LOUHI 2015 (co-located with EMNLP 2015).

Two Master theses I supervised have recently been accepted as full papers at ICWE 2015 and ACM HyperText 2015.

About me

I am an Assistant Professor in the Web Information Systems group at Delft University of Technology. Between 2011 and 2012 I worked as a postdoc in the same group, conducting research within the scope of the ImREAL project.
I received my PhD from the University of Twente, where I worked in the Human Media Interaction group. The Otto-von-Guericke University of Magdeburg in Germany was my home during my undergraduate studies in computer science.
Over the years, I have worked on a variety of topics in the fields of information retrieval (IR) and data science, including query performance prediction (the topic of my PhD thesis), social search, computational social science, learning to search and IR for specific user groups (e.g. children).

I have diverse research interests and am always happy to dive into a new area. Together with three PhD students I am supervising (Guanliang Chen, Yue Zhao and Dan Davis), I am currently focusing heavily on large-scale learning analytics and how to incorporate search into the learning process at scale. TU Delft is a very active provider of Massive Open Online Courses (MOOCs) on the edX platform, which provides a rich experimental playground; ultimately, I (and the whole WIS Learning Analytics team) aim to provide insights and develop mechanisms to improve learning at scale.

My research vision for the coming years focuses on two projects (MoSeS and BoOLE), connected through the MOOC use case.


PO Box 5031
2600 GA Delft
The Netherlands

Office: HB 08.100
Email: c.hauff[at] or claudia.hauff[at]


Data Sets

A list of data sets derived from our research that are publicly available:

Teaching

I enjoy teaching. A lot. It is one of the reasons why I chose a career in academia. As a teacher of Bachelor courses, I design my courses to instill three qualities in my students: confidence, independence and problem understanding.

MSc thesis supervision

The students I have supervised and currently supervise work on a range of topics related to information retrieval and learning analytics, including spelling correction, big data architectures, topic modeling, query log analysis and cheating in MOOCs. Several of my students have published their MSc work at high-quality conference venues, including ICWE, ACM SIGIR and ACM HyperText.

If you are interested in a Master thesis research project in information retrieval, have a look at the ongoing benchmark campaigns - often a good starting point for finding a topic in information retrieval:

If you are interested in a thesis in learning analytics, have a look at the proceedings of the first and second editions of the ACM Learning at Scale conference. They contain many interesting contributions in terms of data science and Web engineering.


Publications [DBLP] [Google Scholar]


Jaeyoung Choi, Claudia Hauff, Olivier van Laere and Bart Thomee, The Placing Task at MediaEval 2015, overview paper of the benchmark task we organized at MediaEval 2015

Dong Nguyen, Tijs A. van den Broek, Claudia Hauff and Djoerd Hiemstra, #SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns, accepted as a short paper at EMNLP 2015

Nugroho Dwi Prasetyo, Claudia Hauff, Dong Nguyen, Tijs van den Broek and Djoerd Hiemstra, On the Impact of Twitter-based Health Campaigns: A Cross-Country Analysis of Movember, accepted at the LOUHI 2015 workshop (co-located with EMNLP 2015)

Nugroho Dwi Prasetyo and Claudia Hauff, Twitter-based election prediction in the developing world, accepted as a full paper at ACM HyperText 2015

Morgan Harvey, David Elsweiler and Claudia Hauff, Learning by Example: training users through high-quality query suggestion, accepted as a full paper at SIGIR 2015

Yue Zhao and Claudia Hauff, Sub-document timestamping of Web documents, accepted as a short paper at SIGIR 2015

Claudia Hauff and Georgios Gousios, Matching GitHub developer profiles to job advertisements, accepted as a short paper at the 12th Working Conference on Mining Software Repositories

Dirk Guijt and Claudia Hauff, Using Query-Log based Collective Intelligence to Generate Query Suggestions for Tagged Content Search, accepted as a full paper at ICWE 2015


Martha Larson, Pascal Kelm, Adam Rae, Claudia Hauff, Bart Thomee et al., The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia, book chapter in Multimodal Location Estimation of Videos and Images (Springer Publishing), 2014 [link]

Ke Tao, Claudia Hauff, Geert-Jan Houben, Fabian Abel, and Guido Wachsmuth, Facilitating Twitter Data Analytics: Platform, Language, and Functionality, accepted as a full paper at IEEE BigData 2014

Jie Yang, Claudia Hauff, Alessandro Bozzon and Geert-Jan Houben, Answering the Right Question: on the Editing of Questions in Collaborative Question Answering Systems, accepted as a full paper at ACM Hypertext 2014 [pdf] [slides]

Michiel van Dam and Claudia Hauff, Large-Scale Author Verification: Temporal and Topical Influences, accepted at SIGIR 2014 [pdf] [poster]


Claudia Hauff, Bart Thomee and Michele Trevisiol, Working Notes for the Placing Task at MediaEval, in MediaEval 2013 Workshop, 2013 (task organizers) [pdf]
The Placing Task 2013 data can be downloaded here.

Ke Tao, Claudia Hauff and Geert-Jan Houben, Building a Microblog Corpus for Search Result Diversification, accepted as a full paper at AIRS 2013 [pdf]

Gudrun Wesiak, Adam Moore, Christina M. Steiner, Claudia Hauff, Conor Gaffney, Declan Dagger, Dietrich Albert, Fionn Kelly, Gary Donohoe, Gordon Power and Owen Conlan, Affective Metacognitive Scaffolding and User Model Augmentation for Experiential Training Simulators: A Follow-up Study, EC-TEL 2013, pp. 396-409, 2013

Adam Moore, Gudrun Wesiak, Christina M. Steiner, Claudia Hauff, Declan Dagger, Gary Donohoe and Owen Conlan, Utilizing social networks for user model priming: user attitudes, UMAP Workshops: Late Breaking Results [pdf]

Claudia Hauff, A Study on the Accuracy of Flickr's Geotag Data, SIGIR 2013, pp. 1037-1040, 2013 [pdf preprint] [annotation data] [data visualization]

Christophe Deloo and Claudia Hauff, Exploiting Semantic Relatedness Measures for Multi-label Classifier Evaluation, accepted as a research contribution at the Dutch-Belgian IR Workshop 2013 [pdf] [proceedings link]

Ke Tao, Fabian Abel, Claudia Hauff, Geert-Jan Houben and Ujwal Gadiraju, Groundhog Day: Near-Duplicate Detection on Twitter, WWW 2013, pp. 1273-1284, 2013 [slides]


Claudia Hauff and Gerald Friedland, Brave New Task: User Account Matching, MediaEval: Benchmarking Initiative for Multimedia Evaluation, 2012 [pdf] [slides]

Claudia Hauff, Matthias Hagen, Anna Beyer and Benno Stein, Towards realistic known-item topics for the ClueWeb, IIiX 2012, pp. 274-277, 2012 [pdf]

Claudia Hauff and Geert-Jan Houben, Placing images on the world map: a microblog-based enrichment approach, SIGIR 2012, pp. 691-700, 2012 [pdf] [slides]

Fabian Abel, Claudia Hauff, Geert-Jan Houben and Ke Tao, Leveraging User Modeling on the Social Web with Linked Data, ICWE 2012, pp. 378-385, 2012 [slides (Ke Tao)]

Ke Tao, Fabian Abel, Claudia Hauff and Geert-Jan Houben, Twinder: a search engine for Twitter streams, ICWE 2012, pp. 153-168, 2012 [slides (Ke Tao)]

Djoerd Hiemstra and Claudia Hauff, Brute Force Information Retrieval Experiments using MapReduce, ERCIM News 89, pp. 31-32, 2012 [pdf] [link]

Fabian Abel, Claudia Hauff, Geert-Jan Houben, Ke Tao and Richard Stronkman, Twitcident: Fighting Fire with Information from Social Web Streams, WWW '12 Companion, pp. 305-308, 2012 [link to paper] [link to website]

Ke Tao, Fabian Abel, Claudia Hauff and Geert-Jan Houben, What makes a tweet relevant for a topic?, WWW 2012 workshop "Making Sense of Microposts" [proceedings]

Fabian Abel, Claudia Hauff, Geert-Jan Houben, Ke Tao and Richard Stronkman, Semantics + Filtering + Search = Twitcident: Exploring Information in Social Web Streams, ACM Hypertext, pp. 285-294, 2012 [link]

Claudia Hauff and Geert-Jan Houben, Serendipitous Browsing: Stumbling through Wikipedia, Searching 4Fun! workshop (co-located with ECIR) [pdf]

Claudia Hauff and Geert-Jan Houben, Geo-Location Estimation of Flickr Images: Social Web Based Enrichment, ECIR 2012, pp. 85-96, 2012 [pdf] [slides]


Ke Tao, Fabian Abel and Claudia Hauff, WISTUD at TREC 2011: Microblog Track [pdf]

Claudia Hauff and Geert-Jan Houben, WISTUD at MediaEval 2011: Placing Task, MediaEval 2011 [pdf]

Claudia Hauff and Geert-Jan Houben, Deriving Knowledge Profiles from Twitter, EC-TEL 2011, pp. 139-152, 2011 [pdf] [slides]

Claudia Hauff and Geert-Jan Houben, Cognitive Processes in Query Generation, ICTIR 2011, pp. 176-187, 2011 [pdf] [slides] [link]

Claudia Hauff and Dolf Trieschnigg, Adding Emotions to Pictures, ICTIR 2011, pp. 364-367, 2011 [pdf] [poster] [link]

Claudia Hauff and Geert-Jan Houben, Simulating Memory Recall in Personal Search, EPS 2011 (Evaluating Personal Search Workshop), 2011 [link to proceedings]

Fabian Abel, Ilknur Celik, Claudia Hauff, Laura Hollink and Geert-Jan Houben, U-Sem: Semantic Enrichment, User Modeling and Mining Usage Data on the Social Web, 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011), short paper, 2011 [pdf]

Dolf Trieschnigg and Claudia Hauff, Classic Children's Literature - Difficult to Read?, ECIR 2011, pp. 691-694, 2011 [link]


Djoerd Hiemstra and Claudia Hauff, MapReduce for experimental search, TREC 2010 [link]

Claudia Hauff and Dolf Trieschnigg, Enhancing Access to Classic Children's Literature, BooksOnline 2010 workshop (co-located with CIKM) [pdf] [link]
The proposed project was awarded a seed fund, sponsored by Microsoft Research. [Workshop report]

Claudia Hauff, Leif Azzopardi and Diane Kelly, A Comparison of User and System Performance Predictions, CIKM 2010, pp. 979-988, 2010 [link]

Djoerd Hiemstra and Claudia Hauff, MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data", CLEF 2010, pp. 64-69, 2010 [pdf]

Guido Zuccon, Leif Azzopardi, Claudia Hauff and C.J. van Rijsbergen, Estimating Interference in the QPRP for Subtopic Retrieval, SIGIR 2010, pp. 741-742, 2010 [link]

Claudia Hauff, Diane Kelly, Leif Azzopardi and Franciska de Jong, Query Quality: User Ratings and System Predictions, SIGIR 2010, pp. 743-744, 2010 [link], [poster], [pdf]

Claudia Hauff and Franciska de Jong, Retrieval System Evaluation: Automatic Evaluation versus Incomplete Judgments, SIGIR 2010, pp. 863-864, 2010 [link], [pdf]

Claudia Hauff, Leif Azzopardi, Djoerd Hiemstra and Franciska de Jong, Query Performance Prediction: Evaluation Contrasted with Effectiveness, ECIR 2010, pp. 204-216, 2010 [pdf], [poster], [link]

Claudia Hauff, Djoerd Hiemstra, Leif Azzopardi and Franciska de Jong, A Case for Automatic System Evaluation, ECIR 2010, pp. 153-165, 2010 [pdf], [slides], [link]


Claudia Hauff, Djoerd Hiemstra, Franciska de Jong and Leif Azzopardi, Relying on topic subsets for system ranking estimation, CIKM 2009, pp. 1859-1862 [link]

Ricardo Baeza-Yates, Vanessa Murdock and Claudia Hauff, Efficiency trade-offs in two-tier web search systems, SIGIR 2009, pp. 163-170, 2009 [link]

Claudia Hauff and Leif Azzopardi, When is Query Performance Prediction Effective?, SIGIR 2009, pp. 829-830, 2009 [link], [pdf]

Claudia Hauff, Leif Azzopardi and Djoerd Hiemstra, The Combination and Evaluation of Query Performance Prediction Methods, ECIR 2009, pp. 301-312, 2009 [pdf], [slides]


A. Overwijk, D. Nguyen, C. Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong, On the Evaluation of Snippet Selection for Information Retrieval, LNCS - CLEF 2008 [pdf] [link]

D. Nguyen, A. Overwijk, C. Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong, WikiTranslate: Query Translation for Cross-lingual Information Retrieval using only Wikipedia, LNCS - CLEF 2008 [pdf] [link]

Claudia Hauff, Query Difficulty for Digital Libraries, presented at the ECDL 2008 Doctoral Consortium, published in the Fall 2009 issue of the TCDL Bulletin (Volume 5, Issue 2) [pdf]

Claudia Hauff, Djoerd Hiemstra and Franciska de Jong, A Survey of Pre-Retrieval Query Performance Predictors, CIKM 2008, pp. 1419-1420, 2008 [link]

Claudia Hauff, Vanessa Murdock and Ricardo Baeza-Yates, Improved Query Difficulty Prediction for the Web, CIKM 2008, pp. 439-448, 2008 [link]


R. Aly, C. Hauff, W. Heeren, D. Hiemstra, F. de Jong, R. Ordelman, T. Verschoor and A. de Vries, The Lowlands team at TRECVID 2007, TRECVID 2007 [pdf]

Djoerd Hiemstra, Claudia Hauff, Franciska de Jong and Wessel Kraaij, SIGIR's 30th anniversary: an analysis of trends in IR research and the topology of its community, ACM SIGIR Forum, Vol. 41, No. 2 [link]

Claudia Hauff, Robin Aly and Djoerd Hiemstra, The Effectiveness of Concept Based Search for Video Retrieval, WIR 2007 [pdf]

Claudia Hauff, Dolf Trieschnigg and Henning Rode, University of Twente at GeoCLEF 2006: geofiltered document retrieval, CLEF 2006, LNCS 4730, pp. 958-961, 2007 [link]


Claudia Hauff and Andreas Nürnberger, Utilizing scale-free networks to support the search for scientific publications, Proc. of the Dutch Belgian Workshop in Information Retrieval (DIR'06), 2006 [pdf]


Claudia Hauff and Andreas Nürnberger, On the use of scale-free networks for information network modelling, Proc. of 1st European Symposium on Nature-inspired Smart Information Systems, 2005 [pdf]

Claudia Hauff and Leif Azzopardi, Age dependent document priors in link structure analysis, 27th European Conference on IR Research (ECIR'05), 2005 [pdf]