Scholarly Commons

An electronic repository for the intellectual products of the Miami University community

Analysis of Multiterm Queries in Partitioned Signature File Environments

dc.contributor.author Aktug, Deniz en_US
dc.date.accessioned 2008-07-22T19:31:12Z en_US
dc.date.accessioned 2013-07-10T15:06:39Z
dc.date.available 2008-07-22T19:31:12Z en_US
dc.date.available 2013-07-10T15:06:39Z
dc.date.issued 1993-04-01 en_US
dc.date.submitted 2008-03-17 en_US
dc.identifier.uri http://hdl.handle.net/2374.MIA/199 en_US
dc.description.abstract This study concerns signature files, which are used for information storage and retrieval in both formatted and unformatted databases. The analysis combines the concerns of signature extraction and signature file organization, which have usually been treated as separate issues. Both the uniform frequency and single term query assumptions are relaxed, and a comprehensive analysis is presented for multiterm query environments where terms can be classified based on their query and database occurrence frequencies. The performance of three superimposed signature generation schemes is explored as they are applied to a dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) lets all terms set the same number of bits regardless of their discriminatory power, whereas the second and third schemes (MMS and MMM) emphasize the terms with high query and low database occurrence frequencies. Of these three schemes, only MMM takes the probability distribution of the number of query terms into account in finding the optimal mapping strategy. The main contribution of the study is the derivation of the performance evaluation formulas, which is provided together with an analysis of various experimental settings. Results indicate that MMM outperforms the other methods as the gap between the discriminatory power of the terms gets larger. The absolute value of the savings provided by MMM reaches a maximum for the high query weight case. However, as the database size increases, the extra savings decline sharply for high weight queries and moderately for low weight queries. The applicability of the derivations to other partitioned signature organizations is discussed, and a detailed analysis of Fixed Prefix Partitioning (FPP) is provided as an example.
An approximate formula that is shown to estimate the performance of both FPP and LHSS within an acceptable margin of error is also modified to account for the multiterm case. en_US
dc.title Analysis of Multiterm Queries in Partitioned Signature File Environments en_US
dc.type Text en_US
dc.type.genre Report en_US
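
The superimposed signature idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the report's method: the hash-based bit selection, the function names (`term_signature`, `superimpose`, `may_match`, `prefixes_to_search`), the signature width, and the per-term weights are all assumptions chosen for illustration, and the weights are arbitrary rather than the optimal values the report derives. Giving one term more bits than the others only mimics the spirit of schemes that emphasize highly discriminatory terms. The prefix pruning step assumes a partition key taken from the leading bits of each record signature.

```python
import hashlib


def term_signature(term: str, width: int, weight: int) -> int:
    """Return a `width`-bit mask with exactly `weight` distinct bits set for `term`."""
    if not 0 < weight <= width:
        raise ValueError("weight must be between 1 and width")
    mask, counter = 0, 0
    # Rehash with an increasing counter until `weight` distinct positions are set.
    while bin(mask).count("1") < weight:
        digest = hashlib.sha256(f"{term}:{counter}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % width)
        counter += 1
    return mask


def superimpose(terms, width, weights, default_weight):
    """OR the term signatures together; `weights` maps a term to its bit count."""
    sig = 0
    for term in terms:
        sig |= term_signature(term, width, weights.get(term, default_weight))
    return sig


def may_match(record_sig: int, query_sig: int) -> bool:
    """A record can qualify only if it has every bit the query sets (false drops are possible)."""
    return (record_sig & query_sig) == query_sig


def prefixes_to_search(query_sig: int, width: int, prefix_bits: int):
    """List partition keys whose leading bits cover the query's leading bits."""
    query_prefix = query_sig >> (width - prefix_bits)
    return [p for p in range(1 << prefix_bits) if (p & query_prefix) == query_prefix]


WIDTH, PREFIX_BITS = 64, 4           # illustrative sizes, not taken from the report
weights = {"hashing": 6}             # hypothetical: one term treated as more discriminatory
record = superimpose(["signature", "hashing", "retrieval"], WIDTH, weights, 3)
query = superimpose(["hashing"], WIDTH, weights, 3)

print(may_match(record, query))      # True: the query term is one of the record's terms
print(len(prefixes_to_search(query, WIDTH, PREFIX_BITS)), "of", 1 << PREFIX_BITS, "partitions searched")
```

Under these assumptions, each query bit that falls inside the prefix halves the number of partitions that must be read, which is why a query with more terms, or with terms that set more bits, tends to touch fewer partitions.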

