|
Title: Query Languages
Summary: The author begins by making a clear distinction between data retrieval and information retrieval.
- Protocol - language that a higher-level software package uses to query an online database or a CD-ROM archive.
- Document or Retrieval Unit - basic element that can be retrieved as an answer to a query (normally a set of basic elements is retrieved, sometimes ranked by relevance or another criterion).
- Query - formulation of a user information need.
- Term Frequency - number of times a word appears inside a document.
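- Inverse Document Frequency - measure that decreases as the number of documents in which a word appears grows (commonly the logarithm of the total number of documents divided by that number), so rare words weigh more than common ones.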
- Keyword-Based Query
  - Single Word Query
  - Context Query (Phrase, Proximity)
  - Boolean Query - has a syntax composed of atoms (i.e. basic queries) that retrieve documents, and of boolean operators which work on their operands.
  - Natural Language
- Pattern Matching
  - Allows retrieval of pieces of text that have some property.
  - It is more difficult to rank the results of a pattern matching expression than those of a keyword query.
  - Types of pattern: words, prefixes, suffixes, substrings, ranges, allowing errors, regular expressions.
- Structural Query - text collections tend to have some structure built into them (e.g. HTML).
  - Fixed Structure - e.g. mail (To, From, Body).
  - Hypertext - directed graph where the nodes hold some text and the links represent connections between nodes or between positions inside the nodes.
  - Hierarchical Structure
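
The definitions above (term frequency, inverse document frequency, Boolean and proximity queries) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-document collection, the whitespace tokenizer, the function names, and the choice of log(N / df) as the inverse document frequency formula (one common variant among several). It is not the method of any particular system.

```python
import math
from collections import Counter

# Hypothetical toy collection; ids and texts are illustrative only.
DOCS = {
    1: "query languages for information retrieval",
    2: "boolean query syntax and boolean operators",
    3: "pattern matching with regular expressions",
}


def tokenize(text):
    """Naive tokenizer (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def term_frequency(term, doc_id):
    """Number of times `term` appears inside one document."""
    return Counter(tokenize(DOCS[doc_id]))[term]


def document_frequency(term):
    """Number of documents in which `term` appears at least once."""
    return sum(1 for text in DOCS.values() if term in tokenize(text))


def inverse_document_frequency(term):
    """log(N / df); returns 0.0 for a term that appears nowhere."""
    df = document_frequency(term)
    return math.log(len(DOCS) / df) if df else 0.0


def matching(term):
    """Atom of a Boolean query: the set of documents containing `term`."""
    return {doc_id for doc_id, text in DOCS.items() if term in tokenize(text)}


def within(term_a, term_b, doc_id, k):
    """Proximity query: do the two terms occur at most k positions apart?"""
    tokens = tokenize(DOCS[doc_id])
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= k for i in pos_a for j in pos_b)


# Boolean operators work on the document sets returned by the atoms.
query_and_boolean = matching("query") & matching("boolean")      # AND
query_or_pattern = matching("query") | matching("pattern")       # OR
query_but_not_boolean = matching("query") - matching("boolean")  # BUT (AND NOT)
```

Because each atom returns a set of documents, the Boolean operators reduce to set intersection, union, and difference. A real system would answer the atoms from an inverted index rather than rescanning every document.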
|