CS6550

Introduction to Information Retrieval

Class Hours: Tuesday/Thursday 9:10-10:30am, WEB 1250

Instructor

Qingyao Ai

Office Hours: Thursday 10:40am-11:40am, MEB 2172

Prerequisites

Textbooks (optional):

Resources

Grading

The final grade weights the assessments in the following proportions:

*Bonus:

Students who sign up for and complete a paper presentation in one of the lectures can earn 5% to 10% bonus points. Papers can be selected from the Readings column of the class schedule or proposed by the student. Each presentation includes a 15-minute talk and 5 minutes of Q&A. Bonus points are determined by the quality of the presentation.

P.S. Presenting jointly with classmates is acceptable, but the bonus points will be split equally among all participants.

Tentative Class Schedule

Week, Date Topic Reminder Slides Readings
Week 01, 01/06 - 01/10 ▪ Course Overview
Introduce information retrieval and the course organization.
▪ The life of a query
Modern search engine architecture overview and how information is retrieved for a given search query.
     
Week 02, 01/13 - 01/17 ▪ Evaluation
Introduce the concept of ranking and how to evaluate IR systems with ranking metrics.
▪ Crawls and feeds
How to collect and create information retrieval collections.
    NDCG: Järvelin and Kekäläinen (TOIS 2002)
Week 03, 01/20 - 01/24 ▪ Processing text
Basic text processing techniques such as stemming and stop-word removal.
▪ Processing Web
Link analysis and spam detection.
     
Week 04, 01/27 - 01/31 ▪ Indexing
Introduce basic indexing techniques (e.g., inverted index) for efficient information retrieval.
▪ Compression
Data compression and index compression algorithms.
     
Week 05, 02/03 - 02/07 ▪ Queries
Query processing techniques, including suggestion and reformulation.
▪ Retrieval Model (1)
Vector space models.
     
Week 06, 02/10 - 02/14 ▪ Retrieval Model (2)
Basic language modeling approaches and smoothing.
▪ Retrieval Model (3)
Query likelihood model and KL-divergence.
    Query likelihood model: Ponte and Croft (SIGIR 1998)
KL-divergence model: Lafferty and Zhai (SIGIR 2001)
Language model smoothing: Zhai and Lafferty (SIGIR 2001)
Week 07, 02/17 - 02/21 ▪ Retrieval Model (4)
Enhanced language modeling approaches.
▪ Relevance Feedback
The concept and basic techniques of relevance feedback and pseudo relevance feedback.
    Cluster-based LM: Liu and Croft (SIGIR 2004)
Dependence models: Metzler and Croft (SIGIR 2005); Huston and Croft (CIKM 2014)
Relevance model: Lavrenko and Croft (SIGIR 2001)
Model-based feedback: Zhai and Lafferty (CIKM 2001)
Week 08, 02/24 - 02/28 ▪ Search Result Diversification
The motivation of result diversification and classic models.
▪ Mid-term Exam
     
Week 09, 03/02 - 03/06 ▪ Learning to Rank (1)
The motivation and basic concepts of learning to rank, including query-document pairs, feature vectors, etc.
▪ Learning to Rank (2)
The optimization paradigms for learning-to-rank models, e.g., pointwise, pairwise, and listwise methods.
     
Week 10, 03/09 - 03/13 Spring Break      
Week 11, 03/16 - 03/20 ▪ Course Project Proposal Presentation
Proposal presentations and mutual evaluation.
▪ Classification and Clustering
The concepts and algorithms for text classification and clustering.
     
Week 12, 03/23 - 03/27 ▪ Beyond bag-of-words (1)
Latent space models and distributed representations.
▪ Beyond bag-of-words (2)
Neural Information Retrieval.
    LSI: Deerwester et al. (JASIS 1990)
Week 13, 03/30 - 04/03 ▪ Recommendation systems (1)
The basic concepts and naive algorithms for recommender systems.
▪ Recommendation systems (2)
Collaborative filtering.
     
Week 14, 04/06 - 04/10 ▪ Adv. User Study and Crowdsourcing
The study and applications of user modeling and crowdsourcing in information retrieval.
▪ Adv. Click Models and Unbiased Learning
The idea of biased user behavior and the algorithms for unbiased optimization.
     
Week 15, 04/13 - 04/17 ▪ Adv. Personalization
How to personalize retrieval results according to individual information needs.
▪ Adv. Question Answering
Introduction to classic and state-of-the-art methods for open-domain question answering.
     
Week 16, 04/20 - 04/24 ▪ Course Project Final Presentation
The presentation and evaluation of course projects.
     
Week 17, 04/27 - 05/01 ▪ Course Project Report Due      

Acknowledgements

Special thanks to Dr. Hamed Zamani from Microsoft Research, Prof. Jiepu Jiang from Virginia Tech, and Prof. James Allan from the University of Massachusetts Amherst. Some teaching materials are borrowed from their course sites for CS656: Information Retrieval.