Information retrieval models : foundations and relationships / Thomas Roelleke.
Online Access: | Access E-Book |
---|---|
Access Note: | Access to electronic resources restricted to Simmons University students, faculty and staff. |
Main Author: | Roelleke, Thomas |
Format: | Electronic eBook |
Language: | English |
Published: | San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2013. |
Series: | Synthesis digital library of engineering and computer science. Synthesis lectures on information concepts, retrieval, and services ; # 27. |
Subjects: |
Summary: | Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR). Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works." This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models. The aim is to create a consolidated and balanced view of the main models. A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, average term frequency, is a binding link between several retrieval models and model parameters. |
---|---|
Item Description: | Part of: Synthesis digital library of engineering and computer science. Series from website. |
Physical Description: | 1 electronic text (xxi, 141 p.) : ill., digital file. Also available in print. |
Format: | Mode of access: World Wide Web. System requirements: Adobe Acrobat Reader. |
Bibliography: | Includes bibliographical references (p. 127-134) and index. |
ISBN: | 9781627050791 (electronic bk.) |
ISSN: | 1947-9468 |
Access: | Access to electronic resources restricted to Simmons University students, faculty and staff. |