Online Information Retrieval (Harter) - Chapter 3
Title: Database Structure, Organization, and Search
Summary: This chapter is written from the user's perspective.
- Record - a document surrogate, i.e. a representation of the document for storage and subsequent retrieval.
- Entity - objects about which information will be stored. Entities are considered in terms of their characteristics, called attributes.
- Field - a set of characters that represent the value of an attribute for the entity under consideration.
- Hierarchy of data elements: bit → byte → subfield → field → record → database → library.
- Linear File - a set of index records, each describing one item or entity, arranged in an order based on the values of one or more attributes.
- Inverted Index - consists of records, typically arranged alphabetically, that are created from a linear file; each record lists an attribute value (such as an index term) together with the linear-file records that contain it.
- Document/Term Matrix - rows correspond to documents or records (the linear file), while columns correspond to index terms (the inverted index). Example on page 73.
- Controlled Vocabulary - can be used for searching related terms.
- Boolean Operators - AND, OR, NOT: the order in which the operators are evaluated matters, so a query that mixes them without parentheses can be ambiguous.
- Word Proximity - requires, for example, that two search terms be adjacent; that they appear in a particular field or fields, such as the abstract or title; that they appear together in the same field or sentence; or that they be separated by n or fewer words.
- Truncation - searching on a fragment of a longer word or phrase, usually its leftmost portion, by using a wildcard symbol (e.g. *).
- Stop Words - words that have no value for indexing or retrieval and therefore receive no entries in the inverted index (e.g. a, an, and, by).
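The indexing and searching features listed above can be sketched in a few lines of Python. This is a toy illustration, not Harter's example or the implementation of any real online system: the stop list, the sample records, and the function names (`build_inverted_index`, `term_matrix`, `hits`, `truncated`, `within`) are all hypothetical. A small linear file is inverted into term-to-record postings (with word positions), stop words get no entries, Boolean AND/OR/NOT become set intersection/union/difference, truncation is a prefix match over the index terms, and proximity compares word positions.

```python
import re
from collections import defaultdict

# Hypothetical stop list; real systems define their own.
STOP_WORDS = {"a", "an", "and", "by", "of", "the"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(linear_file):
    """Invert a linear file {record_id: text} into {term: {record_id: [positions]}}.

    Stop words get no entry, but positions still count them, so proximity
    checks see the true distance between words.
    """
    index = defaultdict(dict)
    for rec_id, text in linear_file.items():
        for pos, term in enumerate(tokenize(text)):
            if term not in STOP_WORDS:
                index[term].setdefault(rec_id, []).append(pos)
    return dict(index)


def term_matrix(index, record_ids):
    """Document/term matrix: one row per record, one 0/1 column per index term."""
    terms = sorted(index)
    rows = [[int(rec in index[t]) for t in terms] for rec in record_ids]
    return terms, rows


def hits(index, term):
    """Set of record ids whose text contains the term."""
    return set(index.get(term, {}))


def truncated(index, stem):
    """Right truncation (stem*): union of postings for every term starting with stem."""
    result = set()
    for term, postings in index.items():
        if term.startswith(stem):
            result |= set(postings)
    return result


def within(index, t1, t2, n):
    """Records where t1 and t2 occur, in either order, with at most n words between them."""
    result = set()
    for rec in hits(index, t1) & hits(index, t2):
        if any(p1 != p2 and abs(p1 - p2) - 1 <= n
               for p1 in index[t1][rec] for p2 in index[t2][rec]):
            result.add(rec)
    return result


linear_file = {
    1: "Online retrieval of information by the user",
    2: "Information filtering and user profiles",
    3: "Retrieving records from an online database",
}
index = build_inverted_index(linear_file)

print("and" in index)                                      # False: stop word
print(hits(index, "information") & hits(index, "user"))    # AND: {1, 2}
print(hits(index, "online") - hits(index, "information"))  # NOT: {3}
print(truncated(index, "retriev"))                         # retriev*: {1, 3}
# "online OR information AND filtering" depends on evaluation order:
print((hits(index, "online") | hits(index, "information")) & hits(index, "filtering"))  # {2}
print(hits(index, "online") | (hits(index, "information") & hits(index, "filtering")))  # {1, 2, 3}
print(within(index, "information", "user", 2))             # {1, 2}
print(within(index, "information", "user", 1))             # set()
```

Storing word positions in the postings is what makes proximity searching possible; an index that kept only record ids could still answer the Boolean and truncation queries, but not the `within` one.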

An experiment in building profiles in information filtering (Quiroga, Mostafa)
Reference: Quiroga, Luz M. and Javed Mostafa (2002). "An experiment in building profiles in information filtering: the role of context of user relevance feedback." Information Processing and Management 38, 671-694.
Summary:
- In the first phase of the study, three modes of profile acquisition (explicit, implicit, and a combination of both) were compared, to see how relevance could be used to build and adjust profiles in a way that improves the performance of filtering systems. The independent variable was the mode of profile acquisition, and the dependent variable was filtering effectiveness, measured with the NP metric. The second phase addressed the research questions: what dimensions influence users' feedback judgements, and what role does context play in these judgements?
- Explicit - Allow users to directly specify the profile.
- Implicit - Utilize relevance feedback to create and refine the profile.
- Combination - Allow users to initialize the profile and continuously refine it using relevance feedback.
- Relevance - User decision to accept or reject information retrieved from an information system.
- There is no consensus as to the factors that contribute to human relevance assessments.
- Characteristics of the user that influence feedback
  - Demographic data (age, gender, population, community, marital status)
  - Domain expertise (professional/occupation, education, projects)
  - Lifestyle (hobbies, habits, health disabilities)
  - Health status (illness, intolerance, propensities)
  - Health concerns of friends and relatives
- Characteristics of documents that influence feedback
  - Topical factors
    - Orientation or facet
    - Specificity
    - Combination of topics and facets
  - Non-topical factors
    - Credibility of the information source
    - Comprehensibility and approach
    - Novelty
    - Format
    - Target audience
- Conclusion: Negative feedback is not well supported in many IR systems and recommender systems.
- Conclusion: Another issue that needs investigation is how to design a mechanism for collecting feedback without overwhelming the user.
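The three modes of profile acquisition can be illustrated with a simple term-weight profile. The sketch below is a hypothetical illustration, not the method used by Quiroga and Mostafa: the additive weight update, the rate of 0.5, the scoring function, and the sample health terms are all assumptions. In explicit mode the user names the profile terms; in implicit mode the profile starts empty and is built only from relevance feedback; in the combination mode the user's terms are refined by feedback. A negative judgement lowers weights, which is one simple way of using negative feedback.

```python
from collections import Counter

# Hypothetical rate for feedback updates (an assumption, not from the study).
RATE = 0.5


def explicit_profile(terms, weight=1.0):
    """Explicit mode: the user specifies the profile terms directly."""
    return {term: weight for term in terms}


def apply_feedback(profile, doc_terms, relevant, rate=RATE):
    """Return a new profile moved toward (relevant=True) or away from
    (relevant=False) the terms of a document the user has judged."""
    updated = dict(profile)
    sign = 1.0 if relevant else -1.0
    for term, count in Counter(doc_terms).items():
        updated[term] = updated.get(term, 0.0) + sign * rate * count
    return updated


def score(profile, doc_terms):
    """Rank an incoming document by the summed profile weights of its terms."""
    return sum(profile.get(term, 0.0) for term in doc_terms)


# Implicit mode: start from an empty profile and learn only from feedback.
implicit = apply_feedback({}, ["diabetes", "insulin"], relevant=True)

# Combination mode: user-supplied terms, then refined by feedback.
combined = explicit_profile(["diabetes", "diet"])
combined = apply_feedback(combined, ["diabetes", "insulin"], relevant=True)
combined = apply_feedback(combined, ["diet", "recipes", "recipes"], relevant=False)

print(implicit)   # {'diabetes': 0.5, 'insulin': 0.5}
print(combined)   # {'diabetes': 1.5, 'diet': 0.5, 'insulin': 0.5, 'recipes': -1.0}
print(score(combined, ["insulin", "diabetes"]), score(combined, ["recipes"]))  # 2.0 -1.0
```

Because the profile is a plain term-to-weight mapping, the three modes differ only in how the profile is initialized; the same feedback step serves both the implicit and the combination mode.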