Taking a look at health campaigns through the eyes of Twitter: a short paper and a workshop paper acceptance at EMNLP 2015.
Two Master's theses I supervised have recently been accepted as full papers at ICWE 2015 and ACM HyperText 2015.
I am an Assistant Professor in the Web Information Systems group at Delft University of Technology. Between 2011 and 2012 I worked as a postdoc in the same group, conducting research within the ImREAL project.
I received my PhD from the University of Twente, where I worked in the Human Media Interaction group. The Otto-von-Guericke University of Magdeburg in Germany was my home during my undergraduate years as a student in computer science.
In the past, I have worked on a variety of topics in the fields of information retrieval (IR) & data science, including query performance prediction (the topic of my PhD thesis), social search, computational social science, learning to search and IR for specific user groups (e.g. children).
I have diverse research interests and am always happy to dive into a new area. Together with three PhD students I am supervising (Guanliang Chen, Yue Zhao and Dan Davis), I am currently focusing heavily on large-scale learning analytics and how to incorporate search into the learning process at scale. TU Delft is a very active provider of Massive Open Online Courses (MOOCs) on the edX platform, which provides a rich experimental playground; ultimately, I (and the whole WIS Learning Analytics team) aim to provide insights and develop mechanisms to improve learning at scale.
My research vision for the coming years focuses on two projects (MoSeS and BoOLE), connected through the MOOC use case.
TU Delft - EWI/ST/WIS
PO Box 5031
2600 GA Delft
The Netherlands
Office: HB 08.100
Email: c.hauff[at]tudelft.nl or claudia.hauff[at]gmail.com
Data Sets
A list of data sets derived from our research that are publicly available:
- MediaEval 2013 Placing Task data
- Author Verification data based on Wikipedia Talkpages (SIGIR 2014)
- Co-organizer of the Delft Data Science Seminar on Online Education (March 9, 2015)
- Co-organizer of the TAIA 2015 workshop (as well as TAIA 2014)
- Co-organizer of the Placing Task @ MediaEval 2015 (as well as in 2013)
- Co-organizer of ECIR 2016 (demo co-chair)
I enjoy teaching. A lot. It is one of the reasons why I chose a career in academia. As a teacher of Bachelor courses, I design my courses to instill three qualities in my students: confidence, independence and problem understanding.
- Lecturer of the 2nd year BSc course Big Data Processing (Q2 2013/14, Q2 2014/15) [lecture slides of the 2014/15 edition]
- Lecturer (50%) of the 1st year BSc course Web- and Database Technology (Q2 2013/14, Q2 2014/15) [lecture slides of the 2014/15 edition]
- Lecturer of the MSc course Information Retrieval (Q3 2011/2012) [lecture slides]
MSc thesis supervision
Students I have supervised and am currently supervising are researching a range of topics related to information retrieval & learning analytics including spelling correction, big data architectures, topic modeling, query log analysis and cheating in MOOCs. Several of my students have published their MSc work at high-quality conference venues, including ICWE, ACM SIGIR and ACM HyperText.
If you are interested in a Master thesis research project in information retrieval, have a look at the ongoing benchmark campaigns - they are often a good starting point for finding a topic in information retrieval.
If you are interested in a thesis in learning analytics, have a look at the proceedings of the first and second editions of the ACM Learning At Scale conference - they contain many interesting contributions in terms of data science and Web engineering.
- Temporal distribution of microblog qrels in a diversity/novelty setup.
- A visualization of the accuracy of Flickr geotags (SIGIR 2013 work)
- A visualization of an automatic retrieval system evaluation technique and its application to more than 20 TREC tasks.
- A visualization that shows the unique contributions of TREC runs to the relevance assessment pool.
- A visualization of query difficulty.
Publications [DBLP] [Google Scholar]
Jaeyoung Choi, Claudia Hauff, Olivier van Laere and Bart Thomee, The Placing Task at MediaEval 2015, overview paper of our benchmark running at MediaEval 2015
Dong Nguyen, Tijs A. van den Broek, Claudia Hauff and Djoerd Hiemstra, #SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns, accepted as short paper at EMNLP 2015
Nugroho Dwi Prasetyo, Claudia Hauff, Dong Nguyen, Tijs van den Broek and Djoerd Hiemstra, On the Impact of Twitter-based Health Campaigns: A Cross-Country Analysis of Movember, accepted at the LOUHI 2015 workshop (co-located with EMNLP 2015)
Nugroho Dwi Prasetyo and Claudia Hauff, Twitter-based election prediction in the developing world, accepted as a full paper at ACM HyperText 2015
Morgan Harvey, David Elsweiler and Claudia Hauff, Learning by Example: training users through high-quality query suggestion, accepted as a full paper at SIGIR 2015
Yue Zhao and Claudia Hauff, Sub-document timestamping of Web documents, accepted as a short paper at SIGIR 2015
Claudia Hauff and Georgios Gousios, Matching GitHub developer profiles to job advertisements, accepted as a short paper at the 12th Working Conference on Mining Software Repositories
Dirk Guijt and Claudia Hauff, Using Query-Log based Collective Intelligence to Generate Query Suggestions for Tagged Content Search, accepted as a full paper at ICWE 2015
Martha Larson, Pascal Kelm, Adam Rae, Claudia Hauff, Bart Thomee et al., The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia, book chapter in Multimodal Location Estimation of Videos and Images (Springer Publishing), 2014 [link]
Ke Tao, Claudia Hauff, Geert-Jan Houben, Fabian Abel, and Guido Wachsmuth, Facilitating Twitter Data Analytics: Platform, Language, and Functionality, accepted as a full paper at IEEE BigData 2014
Jie Yang, Claudia Hauff, Alessandro Bozzon, and Geert-Jan Houben, Answering the Right Question: on the Editing of Questions in Collaborative Question Answering Systems, accepted as full paper at ACM Hypertext 2014. [pdf] [slides]
Claudia Hauff, Bart Thomee, and Michele Trevisiol, Working Notes for the Placing Task at MediaEval, In MediaEval 2013 Workshop, 2013 (task organizers) [pdf]
The Placing Task 2013 data can be downloaded here.
Ke Tao, Claudia Hauff and Geert-Jan Houben, Building a Microblog Corpus for Search Result Diversification, accepted as a full paper at AIRS 2013 [pdf]
Gudrun Wesiak, Adam Moore, Christina M. Steiner, Claudia Hauff, Conor Gaffney, Declan Dagger, Dietrich Albert, Fionn Kelly, Gary Donohoe, Gordon Power and Owen Conlan, Affective Metacognitive Scaffolding and User Model Augmentation for Experiential Training Simulators: A Follow-up Study, EC-TEL 2013, pp. 396-409, 2013
Adam Moore, Gudrun Wesiak, Christina M. Steiner, Claudia Hauff, Declan Dagger, Gary Donohoe and Owen Conlan, Utilizing social networks for user model priming: user attitudes, UMAP Workshops: Late Breaking Results [pdf]
Christophe Deloo and Claudia Hauff, Exploiting Semantic Relatedness Measures for Multi-label Classifier Evaluation, accepted as research contribution at the Dutch-Belgian IR Workshop 2013 [pdf] [proceedings link]
Ke Tao, Fabian Abel, Claudia Hauff, Geert-Jan Houben and Ujwal Gadiraju, Groundhog Day: Near-Duplicate Detection on Twitter, WWW 2013, pp. 1273-1284, 2013 [slides]
Claudia Hauff and Gerald Friedland, Brave New Task: User Account Matching, MediaEval: Benchmarking Initiative for Multimedia Evaluation, 2012 [pdf] [slides]
Fabian Abel, Claudia Hauff, Geert-Jan Houben and Ke Tao, Leveraging User Modeling on the Social Web with Linked Data, ICWE 2012, pp. 378-385, 2012 [slides (Ke Tao)]
Ke Tao, Fabian Abel, Claudia Hauff and Geert-Jan Houben, Twinder: a search engine for Twitter streams, ICWE 2012, pp. 153-168, 2012 [slides (Ke Tao)]
Fabian Abel, Claudia Hauff, Geert-Jan Houben, Ke Tao and Richard Stronkman, Twitcident: Fighting Fire with Information from Social Web Streams, WWW '12 Companion, pp. 305-308, 2012 [link to paper] [link to website]
Ke Tao, Fabian Abel, Claudia Hauff and Geert-Jan Houben, What makes a tweet relevant for a topic?, WWW 2012 workshop "Making Sense of Microposts" [proceedings]
Fabian Abel, Claudia Hauff, Geert-Jan Houben, Ke Tao and Richard Stronkman, Semantics + Filtering + Search = Twitcident: Exploring Information in Social Web Streams, ACM Hypertext, pp. 285-294, 2012 [link]
Ke Tao, Fabian Abel and Claudia Hauff, WISTUD at TREC 2011: Microblog Track [pdf]
Claudia Hauff and Geert-Jan Houben, Simulating Memory Recall in Personal Search, EPS 2011 (Evaluating Personal Search Workshop), 2011 [link to proceedings]
Fabian Abel, Ilknur Celik, Claudia Hauff, Laura Hollink and Geert-Jan Houben, U-Sem: Semantic Enrichment, User Modeling and Mining Usage Data on the Social Web, 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011), short paper, 2011 [pdf]
Dolf Trieschnigg and Claudia Hauff, Classic Children's Literature - Difficult to Read?, ECIR 2011, pp. 691-694, 2011 [link]
Djoerd Hiemstra and Claudia Hauff, MapReduce for experimental search, TREC 2010 [link]
Claudia Hauff and Dolf Trieschnigg, Enhancing Access to Classic Children's Literature, BooksOnline 2010 workshop (co-located with CIKM) [pdf] [link]
The proposed project was awarded a seed fund, sponsored by Microsoft Research. [Workshop report]
Claudia Hauff, Leif Azzopardi and Diane Kelly, A Comparison of User and System Performance Predictions, CIKM 2010, pp. 979-988, 2010 [link]
Djoerd Hiemstra and Claudia Hauff, MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data", CLEF 2010, pp. 64-69, 2010 [pdf]
Guido Zuccon, Leif Azzopardi, Claudia Hauff and C.J. van Rijsbergen, Estimating Interference in the QPRP for Subtopic Retrieval, SIGIR 2010, pp. 741-742, 2010 [link]
Claudia Hauff, Djoerd Hiemstra, Franciska de Jong and Leif Azzopardi, Relying on topic subsets for system ranking estimation, CIKM 2009, pp. 1859-1862 [link]
Ricardo Baeza-Yates, Vanessa Murdock and Claudia Hauff, Efficiency trade-offs in two-tier web search systems, SIGIR 2009, pp. 163-170, 2009 [link]
D. Nguyen, A. Overwijk, C. Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong, WikiTranslate: Query Translation for Cross-lingual Information Retrieval using only Wikipedia, LNCS - CLEF 2008 [pdf] [link]
Claudia Hauff, Query Difficulty for Digital Libraries, presented at the ECDL 2008 Doctoral Consortium, published in the Fall 2009 issue of the TCDL Bulletin (Volume 5, Issue 2) [pdf]
Claudia Hauff, Djoerd Hiemstra and Franciska de Jong, A Survey of Pre-Retrieval Query Performance Predictors, CIKM 2008, pp. 1419-1420, 2008 [link]
Claudia Hauff, Vanessa Murdock and Ricardo Baeza-Yates, Improved Query Difficulty Prediction for the Web, CIKM 2008, pp. 439-448, 2008 [link]
R. Aly, C. Hauff, W. Heeren, D. Hiemstra, F. de Jong, R. Ordelman, T. Verschoor and A. de Vries, The Lowlands team at TRECVID 2007, TRECVID 2007 [pdf]
Djoerd Hiemstra, Claudia Hauff, Franciska de Jong and Wessel Kraaij, SIGIR's 30th anniversary: an analysis of trends in IR research and the topology of its community, ACM SIGIR Forum, Vol. 41, No. 2 [link]
Claudia Hauff, Robin Aly and Djoerd Hiemstra, The Effectiveness of Concept Based Search for Video Retrieval, WIR 2007 [pdf]
Claudia Hauff, Dolf Trieschnigg and Henning Rode, University of Twente at GeoCLEF 2006: geofiltered document retrieval, CLEF 2006, LNCS 4730, pp. 958-961, 2007 [link]
Claudia Hauff and Andreas Nürnberger, Utilizing scale-free networks to support the search for scientific publications, Proc. of the Dutch Belgian Workshop in Information Retrieval (DIR'06), 2006 [pdf]
Claudia Hauff and Andreas Nürnberger, On the use of scale-free networks for information network modelling, Proc. of 1st European Symposium on Nature-inspired Smart Information Systems, 2005 [pdf]
Claudia Hauff and Leif Azzopardi, Age dependent document priors in link structure analysis, 27th European Conference on IR Research (ECIR'05), 2005 [pdf]