
IQ-PDFQuery

Shared by: Nuhman Paramban
posted:
11/21/2011
IQPDF - A better PDF search tool - Ken B Smith, Auckland - June 2011



On the CD supplied you will see a number of files. Copy them all to a subdirectory somewhere,
or onto a stick, or even leave them on the CD (although that can be a little slow). Navigate to
that directory and run the file IQ.EXE or put a shortcut to it on your desktop. The program is
written so that it requires no installation or registry entries, so it can do no harm at all to
your system, and can be run in the most restrictive mode. This copy is valid until 31 Dec 2011.
This is not to be awkward, but it is a Beta and the file structure will not be maintained into
the final version.



OK, you run IQPDF.EXE and you have a selection of PDFs to play with: the King James Bible and
the Complete Shakespeare, plus some B777 technical manuals. The text box is a free text query.
Type anything you like and see what happens when you click GO.



IQPDF sends its output pages to SumatraPDF - an amazingly fast and lightweight PDF reader. The
pages will stack with the best search result on top. Use more tabs / pages to get a deeper
result.



Searching is all about speed and accuracy. The speed of IQPDF has to be tried to be believed.
The accuracy is down to you. The more of your words it can find on a page, the more accurate
the result will be. However, more words can often bring up pages that you do not want, perhaps.
But that is part of the wonder of IQPDF: you find things and relationships in a search that you
never knew were there.



In practice the word order of your search has meaning: the first word carries more weight than
the last. The order in which the words appear on the page, however, is not important. Unlike
conventional PDF searches, a contiguous word spread is not required. If a word appears anywhere
on a page, it will score that page. The whole thing is quite intuitive - give it a try.
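
The scoring idea above can be sketched in a few lines of Python. This is only an illustration, not the actual IQPDF code: the function names and the linear weighting (first query word worth the most, last worth 1) are assumptions, since the text says only that the first word carries more weight than the last.

```python
def score_page(query_words, page_words):
    """Score one page: each query word found anywhere on the page adds its weight."""
    n = len(query_words)
    present = set(page_words)  # position on the page does not matter
    score = 0
    for position, word in enumerate(query_words):
        if word in present:
            # Assumed weighting: first query word scores n, last scores 1.
            score += n - position
    return score


def rank_pages(query_words, pages):
    """Return the page numbers that scored above zero, best first."""
    scored = [(score_page(query_words, words), number) for number, words in pages.items()]
    scored = [(s, number) for s, number in scored if s > 0]
    # Highest score first; ties fall back to the lower page number.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [number for _, number in scored]


# Hypothetical page contents, already split into lower-case words.
pages = {
    1: ["behold", "pale", "horse"],
    2: ["white", "horse"],
    3: ["pale", "moon"],
    4: ["nothing", "here"],
}
print(rank_pages(["pale", "horse"], pages))  # [1, 3, 2]
```

Page 1 holds both words and comes first; page 3 beats page 2 because "pale" was typed first and so weighs more than "horse".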



Because IQPDF is so fast, it is able to search many PDFs at once and rank pages from all the
PDFs for display. On average a 20MB, 25000-word PDF will take around 250ms (a quarter of a
second) to search. Believe it or not, that is all. Try it.
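
Ranking pages from several PDFs together might look something like the sketch below. It assumes each file has already produced its own list of (score, page) hits; the file names, page numbers and the `top_pages` helper are made up for the example and say nothing about how IQPDF really combines its results.

```python
import heapq


def top_pages(results_by_file, limit):
    """Merge per-file (score, page) hits and return the best `limit` overall."""
    combined = (
        (score, filename, page)
        for filename, hits in results_by_file.items()
        for score, page in hits
    )
    # nlargest keeps only the highest-scoring hits, best first.
    return heapq.nlargest(limit, combined, key=lambda hit: hit[0])


# Hypothetical hits from two files.
results = {
    "kjv.pdf": [(3, 1901), (1, 12)],
    "shakespeare.pdf": [(2, 455)],
}
for score, filename, page in top_pages(results, 2):
    print(score, filename, page)
```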





On the King James, try "Pale Horse"; a good start for Shakespeare would be "Scotch", or
better, "band of brothers". Capitalization is ignored throughout, as are hyphens and other
punctuation. Link words and single letters are discarded.
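
As a rough picture of that clean-up step, here is a minimal Python sketch. The list of link words is a guess (the real list is not given here), and treating a hyphen as a word break rather than joining the two halves is also an assumption.

```python
import re

# Assumed stop list; the actual "link words" IQPDF discards are not documented here.
LINK_WORDS = {"and", "of", "the", "to", "in"}


def normalize(query):
    """Lower-case the query, drop punctuation, link words and single letters."""
    # Anything that is not a letter or digit (hyphens included) acts as a separator.
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if len(w) > 1 and w not in LINK_WORDS]


print(normalize("Pale Horse"))          # ['pale', 'horse']
print(normalize("Band-of-Brothers!"))   # ['band', 'brothers']
```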



Whatever you search for, a page containing all your words will be the first page displayed,
and as you move down the page stack the word count may be less. A single occurrence of a word
in a document will get you just one page.



Searching a long and complex document takes a little thought or plain luck to get exactly what
you need first try. The progressive ranking in tabs helps no end. Use more pages / tabs to get
more depth of search.



The King James has 531431 words in 2444 pages and Shakespeare is showing 731211 words in
3066 pages. A search of multiple words in Shakespeare, such as "scotch the snake not killed it",
will take around 2 seconds depending upon your PC speed. The FOXIT tabbed output takes longer
than that to deploy, particularly as I have a 500ms delay on the tab creation for each page to
allow FOXIT to keep up with my program.



If you forget to close the FOXIT reader between searches, it will just create more tabs for
subsequent results. This is deliberate and useful. Likewise the "All Files" option will scan all
files in the current directory and combine results. This is useful for our work, but hardly
likely to produce sensible results on the mix of PDFs provided.



How does it work? Well, searching a conventional PDF is just too slow and awkward. Even trying
to work inside a PDF in real time is problematic. What IQ does is use a type of 3D binary
concordance (created by another module), and you can see this in the IDX files. The supplied
index files are effectively final format, except that in this version I have left the keywords
in plain text to enable you to see some of the structure. I have been using this version to
debug and perform speed tests. Later versions use a full binary index, so the keywords are no
longer visible.
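
For anyone unfamiliar with the term, a concordance is simply a list of every word together with the places it occurs. The Python sketch below builds the simplest possible version, word to page numbers, held in memory. It is not the IDX format and does not attempt the "3D binary" layout, whose details are not described here; the function names and sample pages are invented for the example.

```python
from collections import defaultdict


def build_concordance(pages):
    """Map each word to the sorted list of page numbers it appears on."""
    index = defaultdict(set)
    for number, words in pages.items():
        for word in words:
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def pages_with(index, word):
    """Look a word up without reading any page text again."""
    return index.get(word, [])


# Hypothetical page contents, already split into lower-case words.
sample_pages = {1: ["pale", "horse"], 2: ["white", "horse"]}
concordance = build_concordance(sample_pages)
print(pages_with(concordance, "horse"))  # [1, 2]
print(pages_with(concordance, "snake"))  # []
```

The point of building the index up front is that a search becomes a handful of lookups rather than a scan through the PDF itself.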



The KJ Bible contains 13650 discrete words (that I found) and the Works of Shakespeare run to
24955 individual words, although there are a few double words in there due to typesetting
faults. This is interesting on several levels, not least of which is the sheer depth of Bill's
vocabulary.
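
One way to put a number on that, using only the word counts quoted in this note, is to work out how many times each distinct word is used on average. The helper name below is just for illustration.

```python
def average_uses(total_words, distinct_words):
    """Average number of times each distinct word occurs in a text."""
    return total_words / distinct_words


# Figures as quoted in this note.
king_james = average_uses(531431, 13650)
shakespeare = average_uses(731211, 24955)

print(round(king_james, 1))   # 38.9
print(round(shakespeare, 1))  # 29.3
```

On these figures Shakespeare reuses each word less often than the King James does, which fits the larger vocabulary.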









Enjoy







Ken


