OUTLINE OF THE INFORMATION RETRIEVAL (IR) PROBLEM
Philip Doty
- Given what we have accumulated in the cultural record, how does one
find information?
- Corollary to the "classic IR problem": How do we distinguish what
we want from the sea of what we do not want, especially the bad, i.e.,
the irrelevant, unreliable, inaccurate, outdated, misleading, etc.?
- The "IR problem" grows more acute as knowledge increases, as the
number of media and platforms increases, as the integration of media
grows, as the interoperability of platforms increases, and as we face
information overload.
CLASSICAL (SIMPLE) MEASURES OF INFORMATION RETRIEVAL (IR)
Precision = (number of relevant documents/records retrieved) / (number of documents/records retrieved)
Recall = (number of relevant documents/records retrieved) / (number of relevant documents/records in the database(s))
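As an illustration only, the two measures above can be sketched in Python. The function name `precision_recall`, the document identifiers, and the zero-denominator conventions are assumptions made for this sketch and do not come from the outline; relevance judgments are simply taken as given, which sidesteps the hard part of the problem.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    retrieved: identifiers of the documents/records the system returned
    relevant:  identifiers of all documents/records in the database(s)
               that were judged relevant (judgments assumed to be given)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved

    # Assumed conventions: an empty result set yields precision 0.0, and
    # a search with no relevant documents in the database yields recall 0.0.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up data: 4 documents retrieved, 5 relevant in the database, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.50, recall = 0.40
```

In this made-up search, half of what was retrieved is relevant (precision 0.50), but only two of the five relevant documents were found (recall 0.40). Note that recall requires knowing every relevant document in the database, which is rarely possible for a large corpus.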
MAJOR DIFFICULTIES WITH RECALL AND PRECISION
- Defining relevance
- Searches are not discrete events
- Problems with real users, real information needs, and large corpora
- Assumed equivalence of information/document retrieval with the satisfaction
of information need
- "Tyranny of topic"
- Measures of system performance address users' satisfaction only obliquely
TRENDS IN IR
- Composite (multimedia) documents
- Documents in electronic formats only, i.e., documents "born digital"
o Dislocation of electronic and print "counterparts"
- Increased end-user computing and disintermediation, including the use
of intelligent agents and meta-search engines on the Web
- Direct marketing to end-users
- Document delivery
- Virtual reality and navigational tools in "information space"
- Use of fuzzy logic
- Parallel processing, e.g., neural nets and genetic algorithms
- Natural Language Processing
- "Undiscovered public knowledge"