EECS 395, 495: Web Information Retrieval and Extraction

Quarter Offered: None


EECS 311


This course covers the fundamentals of Internet search engines, including Web crawlers, inverted indices, hyperlink analysis, and relevance ranking.  Also covered are advanced topics including information extraction for knowledge base construction, question answering, search marketing and ad targeting, and activity mining for relevance optimization and personalization.

INSTRUCTOR: Prof. Doug Downey

Required Textbook: None (readings will be assigned from online tutorials and research papers).

Course Goals: Search engines play a critical role in helping people utilize the vast and ever-increasing body of data available on the World Wide Web.  The first learning objective of this course is to understand the fundamentals of Web search engines and their enabling technologies.  The second is to explore new frontiers in Web search research through class presentations, discussions, and a substantial project.

Detailed Course Topics:

  • Web crawlers
  • Inverted indices
  • MapReduce/Hadoop
  • Hyperlink analysis and PageRank
  • Text classification
  • Relevance ranking
  • Latent Semantic Analysis
  • Web information extraction
  • Wrapper induction
  • Question answering
  • Search advertising
  • Activity mining
  • Personalization

Course organization: The course will be divided into a roughly half-lecture, half-discussion format.  For the discussion portion, students will be expected to participate in and lead discussions (class participation is 30% of the grade).  A substantial course project comprises the remaining 70% of the grade.