Google XML Reference

Google Search Appliance
Confidential: For Customer Use Only
Revised October, 2005


Google has developed a simple HTTP-based protocol for serving search results.
Search administrators have complete control over how search results are requested
and presented to the end user. This document describes the technical details of the
Google search request and results formats. It assumes that the reader has a basic
understanding of the HTTP protocol and the HTML document format.



Contents

1. Overview

2. Request Format
  2.1 Request Overview
  2.2 Search Parameters
  2.3 Query Terms
  2.4 Filtering
  2.5 Internationalization
  2.6 Sorting
  2.7 Meta Tags
  2.8 Limits

3. Results Format
  3.1 Custom HTML
     3.1.1 Custom HTML Output Overview
     3.1.2 Internationalization
  3.2 XML
     3.2.1 XML Output Overview
     3.2.2 Character Encoding Conventions
     3.2.3 Google XML Results DTD
     3.2.4 Google XML Tag Definitions

Appendices
 Appendix A: Estimated vs. Actual Number of Results
 Appendix B: URL Escaping

Glossary


1. Overview
A Google search request is a simple HTTP request to the Google search engine. The
search request format and options available are detailed in the Request Format
section.

The search results are returned in the output format specified in the search request.
Currently, Google supports output results in XML and HTML format. XML
formatted results give you the power to customize the display of the results through
the implementation of a custom XML parser. The HTML results can be customized
through the application of an XSL stylesheet to the standard XML results.




2. Request Format
This section is broken into the following categories:

      Request Overview
      Search Parameters
      Query Terms
      Filtering
      Internationalization
      Sorting
      Meta Tags
      Limits



2.1 Request Overview


Using the Google search protocol is as simple as requesting a page from a web server.
The Google search request is a standard HTTP GET command, which returns results
in either XML or HTML format as specified in the search request. The search request
is a URL that combines the search engine host name, port, and path with a
collection of name-value pairs (input parameters) separated by & characters. Some
examples are listed below. Explanations of input parameters and output results can be
found in the remainder of this document.

Note: Google recommends performing an HTTP version 1.0 (or later) GET command.

Note: To determine which host name and port to send your search requests to, please
review your specific configuration documentation. The path to send your search
requests to is always "/search".

Examples

The query
GET /search?q=bill+material&output=xml&client=test&site=operations
would return the first 10 results matching the query "bill material" in the "operations"
collection in the Google XML output format.

The query
GET /search?q=bill+material&start=10&num=5&output=xml_no_dtd&proxystylesheet=test&client=test&site=operations
would return results numbering 11-15 matching the query "bill material" in the
"operations" collection in the Google XML output format.

The query
GET /search?q=Star+Wars+Episode+%2BI&output=xml_no_dtd&lr=lang_de&ie=latin1&oe=latin1&client=test&site=movies&proxystylesheet=test
would return the first 10 German results matching the query "Star Wars Episode +I"
in the "movies" collection returned in the Google custom HTML output format by
applying the XSL stylesheet associated with the "test" front end to the standard XML
results.
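
As a rough illustration, requests like the ones above can be assembled programmatically. The following Python sketch builds the request path and query string from name-value pairs using only the standard library. The helper name build_search_path is hypothetical, and the host name and port still have to come from your own configuration.

```python
from urllib.parse import urlencode

SEARCH_PATH = "/search"  # the document states the path is always "/search"


def build_search_path(query, client, site, output="xml", **extra):
    """Return the path and query string for a search request.

    Hypothetical helper: the parameter names follow the Search Parameters
    table; urlencode() escapes each value and turns spaces into "+".
    """
    params = {"q": query, "output": output, "client": client, "site": site}
    params.update(extra)
    return SEARCH_PATH + "?" + urlencode(params)


# First example above: the first 10 results for "bill material" as XML.
print(build_search_path("bill material", client="test", site="operations"))

# Third example above: the literal "+" in the query is escaped as %2B.
print(build_search_path(
    "Star Wars Episode +I", client="test", site="movies",
    output="xml_no_dtd", lr="lang_de", ie="latin1", oe="latin1",
    proxystylesheet="test",
))
```

The order of the name-value pairs in the generated URL may differ from the examples; the parameters are independent name-value pairs, so the order should not matter.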



2.2 Search Parameters


This table lists all the valid name-value pairs that can be used in a search request and
descriptions of how these parameters will modify the search results.

Name             Description                                                 Default Value
access           Defines whether the user is searching public content or     p
                 all content (i.e., public and secure).
                 This parameter takes effect only if the Secured Content
                 Search capability is enabled.
                 The access parameter can have one of these values:
                   p - search public content
                   s - search secure content
                   a - search all content, both public and secure
                 The access parameter defaults to "p" if none is provided.
                 Note: Secured Content Search is automatically enabled for
                 clustered appliances.
as_dt            Modifies the as_sitesearch parameter as follows:            i
                   Value  Modification
                   i      Include only results in the web directory
                          specified by as_sitesearch
                   e      Exclude all results in the web directory
                          specified by as_sitesearch
as_epq           Adds an additional search query term to search for the      Empty string
                 phrase specified.
                 This parameter has the same effect as the phrase special
                 query term.
                 Note: New query terms specified will be combined with q
                 query terms to generate search results.
                 Note: The value specified for this parameter must be
                 URL-escaped.
as_eq            Adds additional search query terms to exclude any of the    Empty string
                 terms specified.
                 This parameter has the same effect as the exclude (-)
                 special query term.
                 Note: New query terms will be combined with q query terms
                 to generate search results.
                 Note: The value specified for this parameter must be
                 URL-escaped.
as_lq            Adds a search query term to show any pages that link to     Empty string
                 the specified URL.
                 This parameter has the same effect as the link special
                 query term.
                 Note: No other query terms can be specified when using
                 this special query term.
                 Note: The value specified for this parameter must be
                 URL-escaped.
as_occt          Additional search query term that specifies where the       Empty string
                 search terms occur on the page: anywhere on the page, in
                 the title, or in the URL.
                   Value  Meaning
                   any    anywhere on the page
                   title  in the title of the page
                   URL    in the URL for the page
                 Note: Query terms specified will be combined with q
                 query terms to generate search results.
                 Note: The value specified for this parameter must be
                 URL-escaped.
as_oq            Adds additional search query terms to find any of the       Empty string
                 terms specified.
                 This parameter has the same effect as the OR special query
                 term.
                 Note: New query terms will be combined with q query terms
                 to generate search results.
                 Note: The value specified for this parameter must be
                 URL-escaped.
as_q             Search query terms as entered by the user.                  Empty string
                 (See Query Terms section for additional query features.)
                 Note: Query terms specified will be combined with q
                 query terms to generate search results.
                 Note: The value specified for this parameter must be
                 URL-escaped.
as_sitesearch    Additional search query term to show links in the           Empty string
                 specified web directory or to exclude those links,
                 depending on the value of as_dt.
                 This parameter has the same effect as the site special
                 query term.
                 When the Google Search Appliance is sent a search request
                 that includes the as_sitesearch parameter, it converts
                 the value of the parameter into an argument to the site
                 special query term and appends it to the value of q in
                 the search results.
                 For example, if your search contains the following
                 parameters:
                   q=mycompany&as_sitesearch=www.mycompany.com
                 the raw XML of your search results will contain the
                 following:
                   <q>mycompany site:www.mycompany.com</q>
                 The default XSLT stylesheet displays the value of the q
                 tag in the search box on the results page. Consequently,
                 using an as_sitesearch parameter will appear to change
                 the user's search query.
                 If the parameter and value as_dt=e are specified, -site:
                 is appended to the end of the query term.
                 Note: The value specified for this parameter must be
                 URL-escaped and contain fewer than 125 characters.
client           A string indicating any valid front end.                    REQUIRED
filter           Activates or deactivates automatic results filtering        1
                 performed by Google search. By default, filtering is
                 applied to all Google results returned to improve results
                 quality.
                 (See Automatic Filtering section for more details.)
getfields        Requests that the names and values of the meta tags         Empty string
                 specified be returned with each search result, when
                 available.
                 (See Meta Tags section for more details.)
                 Note: All meta tag names or values specified must be
                 double URL-escaped.
ie               Input encoding. Sets the character encoding used to         latin1
                 interpret the query string.
                 (See Internationalization section for details.)
lr               Language restrict. Restricts searches to pages in the       Empty string
                 specified language.
                 (See Language Restricts section for more details.)
num              Number of results desired per request. The maximum          10
                 allowable value is 100. (The maximum number of results
                 available for a query is 1,000.) See also start.
                 Note: The actual number of results may be smaller than the
                 requested value.
numgm            Number of KeyMatch results to return with the results.      3
                 A value from 0 to 5 (inclusive) can be specified for this
                 option.
oe               Output encoding. Sets the character encoding used to        UTF8
                 encode the results returned.
                 (See Internationalization section for details.)
output           Selects the format of the search results. Valid formats     REQUIRED
                 are:
                   Value       Output Format
                   xml_no_dtd  XML results or custom HTML
                               (See proxystylesheet parameter for details.)
                   xml         XML results with Google DTD reference. If
                               using this value, proxystylesheet must be
                               omitted from the parameters or must be set
                               to an empty string.
partialfields    Restricts the search results to documents with meta tags    Empty string
                 whose values contain the words or phrases specified.
                 (See Meta Tags section for more details.)
                 Note: All meta tag names or values specified must be
                 double URL-escaped.
proxycustom      Custom XML tags to be included in the XML results.          Empty string
                 The only permitted values for this parameter are <HOME/>,
                 <ADVANCED/>, and <TEST/>.
                 (See the Custom HTML output section for more details.)
                 Note: This parameter is disabled if the search request
                 does not contain the proxystylesheet parameter.
                 Note: If custom XML is specified, search results will not
                 be returned with the search request.
                 Note: Custom XML must be URL-escaped.
proxyreload      A value of 1 indicates that the Google Search Appliance     0
                 should update the XSL stylesheet cache to refresh the
                 stylesheet currently being requested. This parameter is
                 optional. The XSL stylesheet cache is updated
                 approximately every 15 minutes.
                 (See the Custom HTML section for more details.)
proxystylesheet  If the value of the output parameter is xml_no_dtd, then    NA
                 the output format is modified by the proxystylesheet value
                 as follows:
                   Value           Output Format
                   Omitted         XML results
                   Empty           XML results have a content-type of
                                   text/html (rather than text/xml),
                                   because the XML results are not
                                   transformed.
                   Front End Name  Custom HTML results through application
                                   of the XSL stylesheet associated with
                                   the specified front end
                 (See the Custom HTML section for more details.)
                 Note: This parameter may also specify the identifier of
                 a valid collection. The default XSL stylesheet
                 associated with that collection will then be used for
                 custom HTML output.
                 Note: The value specified for this parameter must be
                 URL-escaped.
q                Search query as entered by the user.                        Empty string
                 (See Query Terms section for additional query features.)
                 Note: The value specified for this parameter must be
                 URL-escaped.
requiredfields   Restricts the search results to documents that contain      Empty string
                 exact meta tag names or name-value pairs specified.
                 (See Meta Tags section for more details.)
                 Note: All meta tag names or values specified must be
                 double URL-escaped.
site             The name of a collection. Note that you can search over     REQUIRED
                 multiple collections using the properly escaped OR (pipe
                 character) to separate the collection names.
sitesearch       Additional search query term to show links in the           Empty string
                 specified web directory. Requires that a value for q
                 (query) be submitted as well. (The value of as_dt does
                 not modify the effect of the sitesearch parameter.)
                 This parameter has the same effect as the site special
                 query term.
                 Note: The sitesearch and as_sitesearch parameters differ
                 in how they are returned in the XML results. The
                 sitesearch parameter is not appended to the search query
                 in the results. That is, the original query term will not
                 be modified when you use the sitesearch parameter.
                 Note: The value specified for this parameter must be
                 URL-escaped and contain fewer than 125 characters.
sort             Specifies an alternate sorting method.                      Empty string
                 (See Sorting section for sort parameter format and
                 details.)
                 Note: Only date sort is currently supported.
start            Use this parameter to support result set page navigation.   0
                 The maximum number of results available for a query is
                 1,000, i.e., the value of the start parameter added to the
                 value of the num parameter cannot exceed 1,000.
                 See also num.
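
The start and num limits described in the table can be checked on the client before a request is sent. The following Python sketch is one way to do that; the helper name page_params, the zero-based page index, and the lower bound of 1 for num are assumptions made for illustration and are not part of the Google search protocol.

```python
MAX_NUM = 100        # largest num value allowed per request
MAX_RESULTS = 1000   # start + num may not exceed this value


def page_params(page, per_page=10):
    """Return start/num values for a zero-based page index (hypothetical helper)."""
    if not 1 <= per_page <= MAX_NUM:
        raise ValueError("num must be between 1 and 100")
    start = page * per_page
    if page < 0 or start + per_page > MAX_RESULTS:
        raise ValueError("start + num must not exceed 1,000")
    return {"start": start, "num": per_page}


print(page_params(0))      # first page with the default num of 10
print(page_params(2, 5))   # start=10, num=5: results 11-15
```

Rejecting out-of-range pages locally avoids sending requests for results the search engine cannot return.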

Custom Parameters

If any custom parameters that contain spaces are added to the search URL, the space
characters will be replaced by an underscore (_).
For example:

http://search.customer.com/search?q=customer+query&site=collection&cl
ient=collection&output=xml_no_dtd&newparam=test+this

This URL adds the custom parameter newparam with a value of "test+this". For
security reasons, all space characters (represented as a "+") in the custom parameter
newparam will be replaced by "_" characters, while built-in parameters, such as q, will
not be affected.

The resulting XML will look like this:

<PARAM name="q" value="customer query" original_value="customer+query"/>
<PARAM name="newparam" value="test_this" original_value="test+this"/>

The unmodified value can still be retrieved from the original_value attribute.
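
As a minimal sketch of reading these values back, the following Python code parses PARAM elements and recovers both the sanitized value and the unmodified one. It uses the parameter name newparam from the request URL above. The <results> wrapper element and the helper name read_params are placeholders introduced here so that the fragment parses as one document; they are not taken from the Google XML format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# The PARAM elements from the example, wrapped in a placeholder <results>
# root (an assumption) so the fragment is a well-formed document.
FRAGMENT = """<results>
<PARAM name="q" value="customer query" original_value="customer+query"/>
<PARAM name="newparam" value="test_this" original_value="test+this"/>
</results>"""


def read_params(xml_text):
    """Map each PARAM name to its (value, decoded original_value) pair."""
    root = ET.fromstring(xml_text)
    return {
        p.get("name"): (p.get("value"), unquote_plus(p.get("original_value")))
        for p in root.iter("PARAM")
    }


params = read_params(FRAGMENT)
print(params["newparam"])  # the sanitized value and the decoded original
```

Decoding original_value with unquote_plus turns the "+" back into a space, which gives the value the user actually submitted.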



2.3 Query Terms


Default Search

By default, Google returns only pages that include all of your search terms. There is
no need to include "AND" between terms. Keep in mind that the order in which the
terms are typed will affect the search results. To restrict a search further, just include
more terms.

Google ignores common words and characters such as "where" and "how," as well as
certain single digits and single letters, because they tend to slow down your search
without improving the results. When a common word has been excluded, Google
indicates this with text in the search comments field of the search results returned.

Special Characters

By default, all non-alphanumeric characters that are included in a search query are
treated as query term separators (just like space characters).

The exceptions to this rule are the following characters: double quote mark ("), plus
sign (+), minus sign (hyphen) (-), decimal point (.), and ampersand (&). The
ampersand character (&) is treated as another character in the query term in which it
is included. The decimal point is a query term separator unless it is part of a number
(e.g., 250.01), in which case it counts as part of the query term. The remaining
exception characters correspond to search features listed in the section below.

If your document contains a number, with or without a decimal point, that has letters
immediately before or after it, the letters are treated as a separate word or words. For
example, the string 802.11a is indexed as two separate words, 802.11 and a.
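
The separator rules above can be approximated with a short regular expression. The following Python sketch is only an illustrative model, not the search engine's actual tokenizer: it handles ASCII letters and digits, keeps the ampersand inside a term, keeps a decimal point only inside a number, and does not model the operator characters (double quote, plus, minus). The function name split_terms and the sample strings other than 802.11a and 250.01 are hypothetical.

```python
import re

# A number (optionally with one decimal part) or a run of letters and "&".
# Every other character acts as a separator in this simplified model.
TOKEN = re.compile(r"[0-9]+(?:\.[0-9]+)?|[A-Za-z&]+")


def split_terms(text):
    """Split text into terms following the simplified rules above."""
    return TOKEN.findall(text)


print(split_terms("802.11a"))          # ['802.11', 'a']
print(split_terms("price: 250.01"))    # ['price', '250.01']
print(split_terms("www.example.com"))  # ['www', 'example', 'com']
print(split_terms("AT&T plan"))        # ['AT&T', 'plan']
```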

Special Query Terms

Google supports the use of several special query terms that allow the user or search
administrator to access additional capabilities of the Google search engine. These
special query terms are listed below.

Note: All query terms must be correctly URL-escaped in the search request sent to
Google search.
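
To illustrate how special query terms combine and why escaping matters, the following Python sketch assembles a query string from plain terms, a phrase, OR alternatives, exclusions, and a site restriction, then URL-escapes the result with the standard library. The function build_query and its argument names are hypothetical; only the operator syntax comes from this section.

```python
from urllib.parse import quote_plus


def build_query(terms=(), phrase=None, any_of=(), exclude=(), site=None):
    """Combine plain terms with special query terms into one query string."""
    parts = list(terms)
    if phrase:
        parts.append('"%s"' % phrase)               # phrase search
    if any_of:
        parts.append(" OR ".join(any_of))           # Boolean OR search
    parts.extend("-" + word for word in exclude)    # exclude query terms
    if site:
        parts.append("site:" + site)                # directory restricted search
    return " ".join(parts)


query = build_query(terms=["vacation"], any_of=["london", "paris"])
print(query)              # vacation london OR paris
print(quote_plus(query))  # vacation+london+OR+paris

# Quotation marks, colons, and slashes are percent-escaped.
print(quote_plus(build_query(phrase="yellow pages")))
print(quote_plus(build_query(terms=["bass"], exclude=["music"])))
print(quote_plus(build_query(site="www.google.com/about/")))
```

The escaped string is what would be passed as the value of the q parameter in the search request.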

Each entry below gives the special query capability, a sample usage, and a
description.

Exclude Query Term
    Sample usage: bass -music
    Description: Sometimes what you're searching for has more than one
    meaning. For example, the term "bass" can refer to either fishing or
    music. You can exclude a word from your search by putting a minus sign
    ("-") immediately in front of the term you want to exclude from the
    search results.
    Note: The search request parameter as_eq can also be used to submit
    terms to exclude.

Phrase Search
    Sample usage: "yellow pages"
    Description: Search for complete phrases by enclosing them in quotation
    marks or connecting them with hyphens. Words marked in this way will
    appear together in all results exactly as you have entered them. Phrase
    searches are especially useful when searching for famous sayings or
    proper names.
    Note: The search request parameter as_epq can also be used to submit a
    phrase search.

Boolean OR Search
    Sample usage: vacation london OR paris
    Description: Google search supports the Boolean "OR" operator. To
    retrieve pages that include either word A or word B, use an uppercase
    OR between terms.
    Note: The search request parameter as_oq can also be used to submit a
    search for any term in a set of terms.

Directory Restricted Search
    Domain search examples:
        site:www.google.com
        site:google.com
        site:com
    Directory search examples:
        admission site:www.stanford.edu/group/uga
        site:www.google.com/about/
        site:www.google.com/about
    Description: To search a domain, specify a partial string that matches
    complete name segments from the end of the canonical host name.
    To search a particular directory on a web server (including root), you
    must specify the complete canonical name of the host server followed by
    the path of the directory. The string must have a "/" character after
    the host name to limit searches to a single server/directory. The path
    segments searched must be a complete match, because there is no partial
    path segment matching. Enter the query followed by "site:" followed by
    the host name and path of the web directory. If the "/" character is at
    the end of the web directory path specified, then only files within
    that directory will be searched and files in sub-directories will not
    be considered.
    The URLs for these queries must contain fewer than 119 characters.
    Note: The exclusion operator ("-") can be applied to this query term to
    remove a web directory from consideration in the search.
    Note: Only one "site:" search term per search request is supported at
    this time.
    Note: The search request parameters as_sitesearch and as_dt can also be
    used to submit "site:" and "-site:" search terms.

Title Search (term)
    Sample usage: intitle:Google search
    Description: If you prepend "intitle:" to a query term, Google search
    will restrict the results to documents containing that word in the
    title. The query term must appear in the first 10 words of the title.
    Note that there can be no space between "intitle:" and the following
    word.
    Note: Putting "intitle:" in front of every word in your query is
    equivalent to putting "allintitle:" at the front of your query.

Title Search (all)
    Sample usage: allintitle: Google search
    Description: If you start a query with "allintitle:", Google search
    will restrict the results to those with all of the query words in the
    title. The query terms must appear in the first 10 words of the title.
URL         inurl:Google search            If you prepend "inurl:" to a query
Search                                     term, Google search will restrict
(term)                                     the results to documents containing
                                           that word in the result URL. Note
                                           there can be no space between the
                                           "inurl:" and the following word.

                                           Note: "inurl:" works only on words,
                                           not URL components. In particular,
                                           it ignores punctuation and will only
                                           use the first word following the
                                           "inurl:" operator. To find multiple
                                           words in a result URL, use the
                                           "inurl:" operator for each word.

                                           Note: Putting "inurl:" in front of
                                           every word in your query is
                                           equivalent to putting "allinurl:" at
                                           the front of your query.
URL         allinurl: Google search        If you start a query with the term
Search                                     "allinurl:", Google search will
(all)                                      restrict the results to those with
                                           all of the query words in the result
                                           URL.

                                           Note: "allinurl:" works only on
                                           words, not URL components. In
                                           particular, it ignores punctuation.
                                           Thus, "allinurl: foo/bar" will
                                           restrict the results to pages with
                                           the words "foo" and "bar" in the
                                           URL, but won't require that they be
                                           separated by a slash within that
                                           URL, that they be adjacent, or that
                                           they be in that particular word
                                           order. There is currently no way to
                                           enforce these constraints.
File Type   Google                         The query prefix, "filetype:", will
Filtering   filetype:doc OR filetype:pdf   filter the results returned to only
                                           include documents with the extension
                                           specified immediately after. Note
                                           there can be no space between
                                           "filetype:" and the specified
                                           extension.

                                           Note: Multiple file types can be
                                           included in a filtered search by
                                           adding more "filetype:" terms to the
                                           search query, when used in
                                           conjunction with the Boolean OR.
File Type   Google -filetype:doc           The query prefix, "-filetype:", will
Exclusion   -filetype:pdf                  filter the results to exclude
                                           documents with the extension
                                           specified immediately after. Note
                                           there can be no space between
                                           "-filetype:" and the specified
                                           extension.

                                           Note: Multiple file types can be
                                           excluded in a filtered search by
                                           adding more "-filetype:" terms to
                                           the search query.
Web         info:www.google.com            The query prefix, "info:", will
Document                                   return a single result for the
Info                                       specified URL if it exists in the
                                           index.

                                           Note: No other query terms can be
                                           specified when using this special
                                           query term.
Back        link:www.google.com            The query prefix, "link:", will list
Links                                      web pages that have links to the
                                           specified web page. Note there can
                                           be no space between "link:" and the
                                           web page URL.

                                           Note: No other query terms can be
                                           specified when using this special
                                           query term.

                                           Note: The search request parameter,
                                           as_lq, can also be used to submit a
                                           link: request.
Cached      cache:www.google.com web       The query prefix, "cache:", will
Results                                    return the cached HTML version of
Page                                       the specified web document that the
                                           Google search crawled. Note there
                                           can be no space between "cache:" and
                                           the web page URL.

                                           If you include other words in the
                                           query, Google will highlight those
                                           words within the cached document.

                                           Note: To use Google's default cached
                                           result display, simply omit the
                                           output parameter in the cache
                                           request. To customize the display of
                                           cached results, simply request XML
                                           or Custom HTML output as part of the
                                           cache request and ensure your parser
                                           or stylesheet will handle the
                                           incoming cache data.
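
The special query terms in the table above are ordinary text inside the q
parameter, so a client can assemble them with plain string handling. The
following Python sketch illustrates three of the rules: one "inurl:" prefix per
word, "filetype:" terms joined with the Boolean OR, and one "-filetype:" term
per excluded extension. The function names are hypothetical and are not part of
the Google search protocol.

```python
def inurl_terms(words):
    """Prefix every word with "inurl:" (no space after the colon)."""
    return " ".join("inurl:" + word for word in words)


def filetype_filter(query, extensions):
    """Restrict results to the given extensions using OR-joined terms."""
    if not extensions:
        return query
    clause = " OR ".join("filetype:" + ext for ext in extensions)
    return f"{query} {clause}"


def filetype_exclusion(query, extensions):
    """Exclude the given extensions with one "-filetype:" term each."""
    clause = " ".join("-filetype:" + ext for ext in extensions)
    return f"{query} {clause}".strip()


print(inurl_terms(["foo", "bar"]))
print(filetype_filter("Google", ["doc", "pdf"]))
print(filetype_exclusion("Google", ["doc", "pdf"]))
```

The resulting strings still need to be URL-escaped before they are placed in a
search request.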




2.4 Filtering


Google search provides many ways for you to filter the results that are returned as
part of your query. These filtering options include:

      Automatic Filtering
      Language Filters
          o Automatic Language Filters
          o Combining Language Filters

Other filtering options can be applied through special query parameters, query terms
and meta tags, which are documented in their respective sections. Please review these
sections for more information on other filtering options.
2.4.1 Automatic Filtering

The quality of the results Google returns for searches is extremely important. One
method that makes sure the best results are returned for a query is automatic
"filtering" of the search results to weed out undesirable results.

Currently, Google search uses two techniques for automatic filtering of results:

      Duplicate Snippet Filter - If multiple documents contain the same information
       in their snippets in response to a query, then only the most relevant document
       of that set will be displayed in the results.
      Duplicate Directory Filter - If there are many results in a single web directory,
       then only the two most relevant results for that directory will be returned in the
       results. An output flag indicates that more results are available from that
       directory.

By default, both types of filters are enabled. However, you can disable them with the
filter parameter.

Setting filter=1 enables both Duplicate Directory Filtering and Duplicate Snippet
Filtering. This is the default setting if no value for the filter parameter is provided.

Setting filter=0 will disable both Duplicate Directory Filtering and Duplicate
Snippet Filtering.

Although determining when to use this option is up to each search administrator,
Google recommends against setting filter=0 for typical search requests, since Google
has found that document filtering significantly enhances the quality of most search
results.

Setting filter=p will disable Duplicate Snippet Filtering only.

Setting filter=s will disable Duplicate Directory Filtering only.
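
The four filter values described above can be summarized as a small lookup. The
Python sketch below is only an illustration of that mapping; the function and
argument names are assumptions made for this example.

```python
def filter_value(snippet_filter=True, directory_filter=True):
    """Return the filter parameter value for the desired automatic filters."""
    if snippet_filter and directory_filter:
        return "1"  # both filters enabled (the default)
    if not snippet_filter and not directory_filter:
        return "0"  # both filters disabled
    if not snippet_filter:
        return "p"  # Duplicate Snippet Filtering disabled only
    return "s"      # Duplicate Directory Filtering disabled only


print(filter_value(snippet_filter=False))
```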

When an end user submits a search request in which filtering removes any results, the
removal of the results will be noted in the output generated for the search results. See
the section on Estimated vs. Actual Number of Results for more information on how a
filtered result set is identified and recommendations for results display.

The appliance also will automatically group results from a single directory in the
search results.

If you set filter=0, then the order in which results are ranked can change depending
on the value of the num parameter.

For example, if you set num=10 and filter=0 you may get two results in a particular
directory that are considered in the 10 most relevant results. If one of these results is
the most relevant of all, then directory crowding will cause both to be displayed at the
top of the results.

If you now set num=20, you may get a third result in the same directory that would be
ranked between 11 and 20. However, this result will actually be ranked third
because of directory crowding.



2.4.2 Language Filters

This section covers:

      Automatic Language Filters
      Combining Language Filters

2.4.2.1 Automatic Language Filters

Language filters limit searches to pages in the specified languages. The algorithm for
automatically determining the language of a web document is not customizable. The
language determination algorithm is primarily based on the majority language used in
the web document body text. Automatic language collections may not be appropriate
for all users.

Note: Encoding schemes for input and output of search requests are important when
providing international search. Please review the Internationalization section for more
details.

The automatic language filters generated are:

                  Language                      Automatic Language Filter Name

    Arabic                                  lang_ar
    Chinese (Simplified)                    lang_zh-CN
    Chinese (Traditional)                   lang_zh-TW
    Czech                                   lang_cs
    Danish                                  lang_da
    Dutch                                   lang_nl
    English                                 lang_en
    Estonian                                lang_et
    Finnish                                 lang_fi
    French                                  lang_fr
    German                                  lang_de
    Greek                                   lang_el
    Hebrew                                  lang_iw
    Hungarian                               lang_hu
    Icelandic                               lang_is
    Italian                                 lang_it
    Japanese                                lang_ja
    Korean                                  lang_ko
    Latvian                                 lang_lv
    Lithuanian                              lang_lt
    Norwegian                               lang_no
    Portuguese                              lang_pt
    Polish                                  lang_pl
    Romanian                                lang_ro
    Russian                                 lang_ru
    Spanish                                 lang_es
    Swedish                                 lang_sv
    Turkish                                 lang_tu

2.4.2.2 Combining Language Filters

Search requests that use the lr parameter support the Boolean operators identified in
the table below (in order of precedence).

    Boolean           Sample Usage                   Description
    Operator

    Boolean NOT [-]   -lang_fr                       Removes all results that are defined as part
                                                     of the Language Filter immediately following
                                                     the "-" operator.

                                                     The example lr value would remove all results
                                                     in French.
    Boolean AND [.]   gloves.hats                    Returns results that are in the intersection
                                                     of the results returned by the collection to
                                                     either side of the "." operator.

                                                     The example restrict value would return all
                                                     results which are in both the "hats" and
                                                     "gloves" custom collections.
    Boolean OR [|]    lang_en|lang_fr                Returns results that are in either of the
                                                     results returned by the collection to either
                                                     side of the "|" operator.

                                                     The example lr value would return all results
                                                     matching the query that are in either French
                                                     or English.
    Parentheses [()]  (gloves).(-(lang_hu|lang_cs))  All terms within the innermost set of
                                                     parentheses will be evaluated before terms
                                                     outside the parentheses are evaluated. Use
                                                     parentheses to adjust the order of term
                                                     evaluation.

                                                     The example lr value would return all results
                                                     in the "gloves" custom collection that are not
                                                     in either the Hungarian or Czech collections.

Note: Spaces are not valid characters in the collection string.
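
Because the operators above are single characters with no surrounding spaces,
an lr value can be composed from a few string helpers. The Python sketch below
is one possible way to do this; the helper names are hypothetical, and the
sketch only builds the string without checking that a filter name exists.

```python
def lr_not(expr):
    """Boolean NOT: exclude results matching the filter expression."""
    return "-" + expr


def lr_and(*exprs):
    """Boolean AND: intersection of the given filter expressions."""
    return ".".join(exprs)


def lr_or(*exprs):
    """Boolean OR: union of the given filter expressions."""
    return "|".join(exprs)


def lr_group(expr):
    """Parentheses: evaluate the enclosed expression first."""
    if " " in expr:
        raise ValueError("spaces are not valid in the filter string")
    return "(" + expr + ")"


# English or French results.
print(lr_or("lang_en", "lang_fr"))
# Results that are neither Hungarian nor Czech.
print(lr_and(lr_group(lr_not("lang_hu")), lr_group(lr_not("lang_cs"))))
```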



2.5 Internationalization


In order to support searching documents in multiple languages and character
encodings, Google provides the ie parameter to specify how Google search should
interpret characters in the search request, and the oe parameter to specify how
characters in the search results output should be encoded. To appropriately decode the
search query and correctly encode the search results, specify the correct ie and oe
parameters, respectively, in the search request.

Note: When providing search for multiple languages, Google recommends the usage
of the utf8 encoding value for the ie and oe parameters.

Example

The query
GET /search?q=gloves&client=test&site=test&lr=lang_en|lang_fr&ie=latin1&oe=latin1
would interpret the search query "gloves" using the latin1 encoding scheme, search
for English or French results, and return results in the latin1 encoding scheme.

The query
GET /search?q=gloves&client=test&site=test&lr=(-lang_hu).(-lang_cs)&ie=latin2&oe=latin2
would interpret the search query "gloves" using the latin2 encoding scheme, search
for any results which are not in Hungarian or Czech, and return results in the latin2
encoding scheme.

The query
GET /search?q=gloves&client=test&site=test&lr=lang_zh-CN|lang_zh-TW&ie=utf8&oe=utf8
would interpret the search query "gloves" using the utf8 encoding scheme, search for
any results which are in Simplified or Traditional Chinese, and return results in the
utf8 encoding scheme.

Note: See the Language Filters section for details of language specific searches using
the lr parameter.
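
Requests like the examples above can be assembled with a standard URL library.
The Python sketch below uses urllib.parse.urlencode and is illustrative only.
It assumes that the "|", "(" and ")" characters may be left unescaped in the lr
value, as they appear in the examples, and that the query text should be
escaped using the character set named by ie. The function name and the codec
argument (a Python codec name) are assumptions made for this example.

```python
from urllib.parse import urlencode


def search_url(query, lr, ie="utf8", oe="utf8", codec="utf-8",
               client="test", site="test"):
    """Build the path and query string of a search request."""
    params = [
        ("q", query),
        ("client", client),
        ("site", site),
        ("lr", lr),
        ("ie", ie),
        ("oe", oe),
    ]
    # Keep the lr operators readable; everything else is percent-encoded
    # after being converted to bytes with the chosen codec.
    return "/search?" + urlencode(params, safe="|()", encoding=codec)


print(search_url("gloves", "lang_en|lang_fr",
                 ie="latin1", oe="latin1", codec="latin-1"))
```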

Character Encoding Values

The following table lists all encoding values supported by these parameters:

              Language                Encoding Value      Alternate Encoding Value

    Chinese (Simplified)       gb                         GB2312
    Chinese (Traditional)      big5                       Big5
    Czech                      latin2                     ISO-8859-2
    Danish                     latin1                     ISO-8859-1
    Dutch                      latin1                     ISO-8859-1
    English                    latin1                     ISO-8859-1
    Estonian                   latin4                     ISO-8859-4
    Finnish                    latin1                     ISO-8859-1
    French                     latin1                     ISO-8859-1
    German                     latin1                     ISO-8859-1
    Greek                      greek                      ISO-8859-7
    Hebrew                     hebrew                     ISO-8859-8
    Hungarian                  latin2                     ISO-8859-2
    Icelandic                  latin1                     ISO-8859-1
    Italian                    latin1                     ISO-8859-1
    Japanese                   sjis                       Shift_JIS
    Korean                     euc-kr                     EUC-KR
    Latvian                    latin4                     ISO-8859-4
    Lithuanian                 latin4                     ISO-8859-4
    Norwegian                  latin1                     ISO-8859-1
    Portuguese                 latin1                     ISO-8859-1
    Polish                     latin2                     ISO-8859-2
    Romanian                   latin2                     ISO-8859-2
    Russian                    cyrillic                   ISO-8859-5
    Spanish                    latin1                     ISO-8859-1
    Swedish                    latin1                     ISO-8859-1
                               latin3                     ISO-8859-3
                               latin5                     ISO-8859-9
                               latin6                     ISO-8859-10
                               euc-jp                     EUC-JP
    Unicode (All Languages)    utf8                       UTF-8
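
A client that reads raw result bytes has to decode them with the character set
requested through oe. The Python sketch below copies the two encoding columns
of the table into a dictionary and resolves the alternate name with the
standard codecs module. The function names are hypothetical, and the sketch
assumes that Python's codec of the same name matches the encoding the search
engine produces.

```python
import codecs

# Encoding value -> alternate encoding value, copied from the table above.
ALTERNATE_NAME = {
    "gb": "GB2312", "big5": "Big5", "sjis": "Shift_JIS",
    "euc-kr": "EUC-KR", "euc-jp": "EUC-JP", "utf8": "UTF-8",
    "latin1": "ISO-8859-1", "latin2": "ISO-8859-2", "latin3": "ISO-8859-3",
    "latin4": "ISO-8859-4", "latin5": "ISO-8859-9", "latin6": "ISO-8859-10",
    "cyrillic": "ISO-8859-5", "greek": "ISO-8859-7", "hebrew": "ISO-8859-8",
}


def python_codec(encoding_value):
    """Return the Python codec name for an ie/oe encoding value."""
    return codecs.lookup(ALTERNATE_NAME[encoding_value]).name


def decode_results(raw_bytes, oe="utf8"):
    """Decode response bytes that were requested with the given oe value."""
    return raw_bytes.decode(python_codec(oe))


sample = "kůň".encode("iso-8859-2")
print(decode_results(sample, oe="latin2"))
```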




2.6 Sorting


Google search provides two sorting options for implementing your search solution:

      Sort By Relevance
      Sort By Date



2.6.1 Sort By Relevance (Default)

By default, Google combines hypertext analysis and PageRank technologies to
provide users with highly relevant results. Hypertext analysis uses the design of the
page, examining over 100 factors to determine the best result for your query term.
PageRank considers the link structure of the entire index to understand how each page
links to the other pages in the index.



2.6.2 Sort By Date

Google search also supports the ability to order search results by date. The date of a
web document is defined by parameters configured by the search administrator. When
a search is performed using the sort by date capability, the date associated with each
result document will be included with the results.

When using the Sort by Date feature, the automatic quality filter will sometimes
reorder results when performing result grouping. This can be disabled by adding the
filter=0 parameter to the search request when performing search by date.

Example

The query
GET /search?q=chicken+teriyaki&output=xml&client=test&site=test&sort=date:D:S:d1
would return the top 10 results that match the query "chicken teriyaki" in the "test"
collection, sorted by both date and relevancy.

Details

To sort the results by date, the sort parameter must be formatted as follows:

date:<direction>:<mode>:<format>

where <direction>, <mode> and <format> can have the following values:
     <direction> Value                            Results
             A           Sort results in ascending date order
             D           Sort results in descending date order

      <mode> Value                                Results
             S           Sort relevant results. Google's algorithm will determine a
                         subset of the most relevant results from the set of all
                         results, and then sort that subset by date to return as
                         results for the search request.
             R           Sort all results.
                         Note: Providing sort by date on queries with large result
                         sets may incur performance penalties.
             L           Perform a look-up on the date associated with each
                         document and return the date information for each result
                         returned, but perform no sorting.

      <format> Value                              Results
            d1           The date returned for each search result is formatted as
                         YYYY-MM-DD.
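
The sort value is a fixed, colon-separated string, so it is easy to build and
validate before sending a request. The Python sketch below accepts only the
values listed in the tables above; the function name is an assumption made for
this example.

```python
def sort_by_date(direction="D", mode="S", fmt="d1"):
    """Build a sort parameter value of the form date:<direction>:<mode>:<format>."""
    if direction not in ("A", "D"):
        raise ValueError("direction must be A (ascending) or D (descending)")
    if mode not in ("S", "R", "L"):
        raise ValueError("mode must be S, R or L")
    if fmt != "d1":
        raise ValueError("d1 is the only format listed")
    return f"date:{direction}:{mode}:{fmt}"


print(sort_by_date())              # descending, relevant results only
print(sort_by_date("A", "R"))      # ascending, all results
```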




2.7 Meta Tags
Google search provides two options for leveraging the meta tags that are available in
your content. Unless one of these options is used, meta tag information will
not be considered in your search results, since that information is not visible to the
search user. These options are:

         Requesting Meta Tag Values
         Filtering by Meta Tags



2.7.1 Requesting Meta Tag Values

Through the use of the getfields parameter, the Google search engine allows a
search request to specify meta tag values to return with the search results. The search
engine will only return meta tag information for results which actually contain the
meta tags. The search for meta tags is case-insensitive. Use only whole words in the
getfields parameter, not partial words or word "stems." There is a limit of 320
characters returned for each meta tag when using getfields. This character limit
includes the meta tag name and content.

Usage

GET /search?q=[search term]&output=xml&client=test&site=test&getfields=[meta tag name]

Example

The query
GET /search?q=books&output=xml&client=[test]&site=[test]&getfields=author.title.keywords
would return the first 10 results that match the query "books" in the "test" collection.
If any of the results contain the author, title and/or keywords meta tags, then the
values of those meta tags will be returned with the respective results. For example, the
following tags could be returned with this search request:
<META NAME="author" CONTENT="Jakob Nielsen">
<META NAME="title" CONTENT="Usability Engineering">
<META NAME="keywords" CONTENT="Usability, User Interface, User
Feedback">

Details

To specify multiple meta tag values to be returned, list all meta tag names separated
by a period (".") as in the example above. To request all available meta tags for each
search result, specify an asterisk ("*") as the value for the getfields parameter.

Note: When meta tag values are requested, they are not displayed in results requested
in the default HTML format. Please use the custom HTML or XML output options to
take advantage of this feature.
Note: All meta tag names or values specified must be double URL-escaped. See an
example in the following section.
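
The getfields value is either a period-separated list of meta tag names or a
single asterisk. The Python sketch below builds that value; the function name
is hypothetical. It does not perform the double URL-escaping required by the
note above, so it is only suitable as written for simple alphanumeric names.

```python
def getfields_value(names=None):
    """Return a getfields value: "*" for all tags, else names joined by "."."""
    if not names:
        return "*"  # request every available meta tag
    for name in names:
        if "." in name:
            raise ValueError("a period inside a name would be read as a separator")
    return ".".join(names)


print(getfields_value(["author", "title", "keywords"]))
print(getfields_value())
```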



2.7.2 Filtering by Meta Tags

The Google search engine can filter results by the values of the result meta tags. This
section details how to use the requiredfields and partialfields input parameters
to filter on meta tag values. The term partialfields refers to part of the meta tag
content, rather than part of a word. Other filtering techniques are noted in the Filtering
section.

Usage

GET /search?q=[search term]&output=xml&client=test&site=test&requiredfields=[metatag name]:[metatag content]

Examples

The query
GET /search?q=checks&output=xml&client=test&site=test&requiredfields=department:Human%252BResources|department:Finance
returns the first 10 results which match the query "checks" in the "test" collection
and which also contain either of the following meta tags:
<META NAME="department" CONTENT="Human Resources">
<META NAME="department" CONTENT="Finance">

The query
GET /search?q=books&output=xml&client=test&site=test&partialfields=author:Scott
would return the first 10 results which match the query "books" in the "test" collection
and which also contain the word "Scott" somewhere in the "author" meta tag. Some
example meta tags satisfying this search request are:
<META NAME="author" CONTENT="Sir Walter Scott">
<META NAME="author" CONTENT="F. Scott Fitzgerald">

Details

Multiple meta tag constraints can be specified using the requiredfields and
partialfields parameters. To filter for the presence of a meta tag, indicate the name
of the meta tag to be found. To filter on a specific meta tag value, indicate the name of
the meta tag followed by the colon ":" character and then the specific value. The
partialfields parameter matches complete words, not parts of words. In addition,
the match must be within the first 160 characters of the meta tag. See the examples in
the table below for sample usage.

To combine multiple name-value pairs, use the following operators:
           Boolean Operator   Sample Usage                          Description

           Boolean AND [.]    author:William.keywords               Returns results which satisfy
                                                                    both meta tag constraints.
           Boolean OR [|]     department:Sales|department:Finance   Returns results which satisfy
                                                                    either meta tag constraint.
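
A requiredfields or partialfields value can be assembled from name and value
pairs joined by the operators above. The Python sketch below is a hedged
illustration: its escaping rule is inferred from the "Human%252BResources"
example earlier in this section (spaces become "+", then the text is
percent-encoded twice) and is an assumption, not a rule stated by this
document. All function names are hypothetical.

```python
from urllib.parse import quote


def double_escape(text):
    """Escape text the way the example request appears to (assumed rule)."""
    once = quote(text.replace(" ", "+"), safe="")   # "+" becomes "%2B"
    return quote(once, safe="")                     # "%" becomes "%25"


def constraint(name, value=None):
    """A meta tag constraint: the name alone, or name:value."""
    if value is None:
        return double_escape(name)
    return double_escape(name) + ":" + double_escape(value)


def any_of(*constraints):
    """Boolean OR of constraints."""
    return "|".join(constraints)


def all_of(*constraints):
    """Boolean AND of constraints."""
    return ".".join(constraints)


print(any_of(constraint("department", "Human Resources"),
             constraint("department", "Finance")))
```

The name and value are escaped separately so that the ":", "." and "|"
operators themselves stay unescaped.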

As stated in the "Query Terms" section, all non-alphanumeric characters included in a
search query are treated as query term separators (just like space characters).
Similarly, Google uses these separators to divide meta tag content into single entities,
or word tokens. A word token is a word or a string that may or may not be a real word.
The separators, used in both queries and results, and their values are listed in the
table below. They are not customizable.

    Separator                                                     Value
    ~ ! @ # $ % ^ & * ( ) - + { } | ` [ ] : ; ' < > ? , . / =     space character
    \                                                             92
    "                                                             34
    \t                                                            9
    \r                                                            13
    \n                                                            10
    \v                                                            11
    \f                                                            12
    \177                                                          177
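
To see how these separators divide meta tag content into word tokens, the
Python sketch below replaces every separator from the table with a space and
then splits the text. It is an approximation for illustration: the function
name is hypothetical, the space character itself is assumed to be a separator,
and "\177" is read as the octal escape for the DEL character.

```python
PUNCTUATION = "~!@#$%^&*()-+{}|`[]:;'<>?,./="
OTHER = '\\"\t\r\n\v\f\177'
SEPARATORS = set(PUNCTUATION) | set(OTHER) | {" "}

# Translation table that turns every separator into a plain space.
_TO_SPACE = {ord(ch): " " for ch in SEPARATORS}


def tokenize(content):
    """Split meta tag content or a query into word tokens."""
    return content.translate(_TO_SPACE).split()


print(tokenize("Usability, User Interface, User Feedback"))
print("Scott" in tokenize("F. Scott Fitzgerald"))
```

Matching on tokens in this way reflects the rule that partialfields matches
complete words rather than parts of words.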


    Note: All meta tag names or values specified must be double URL-escaped. See
    example above.

2.8 Limits


    This section lists any limitations on the search requests sent to Google search.

    Component                                          Limit
    Search request length                              2048 bytes
    Query terms (includes terms in parameter q and     50
    any parameters starting with as_)
    site: query terms (includes use of the             1 (per search request)
    as_sitesearch parameter)
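
A client can check a request against these limits before sending it. The Python
sketch below is a rough illustration under stated assumptions: this document
does not define exactly how query terms are counted, so the caller supplies the
list of terms, and the request length is measured on the UTF-8 bytes of the
request string. The function and constant names are hypothetical.

```python
MAX_REQUEST_BYTES = 2048
MAX_QUERY_TERMS = 50
MAX_SITE_TERMS = 1


def check_limits(request, query_terms):
    """Return a list of limit violations (empty when the request is within limits)."""
    problems = []
    if len(request.encode("utf-8")) > MAX_REQUEST_BYTES:
        problems.append("search request longer than 2048 bytes")
    if len(query_terms) > MAX_QUERY_TERMS:
        problems.append("more than 50 query terms")
    site_terms = [term for term in query_terms if term.startswith("site:")]
    if len(site_terms) > MAX_SITE_TERMS:
        problems.append("more than 1 site: query term")
    return problems


print(check_limits("/search?q=gloves&client=test&site=test", ["gloves"]))
```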




3. Results Format
This section is broken into the following categories:

      Custom HTML
      XML




3.1 Custom HTML


The description of the custom HTML results section is broken down into the
following sections:

      Custom HTML Output Overview
      Internationalization


3.1.1 Custom HTML Output Overview

Google search provides the ability to generate custom HTML by incorporating an
XSLT (eXtensible Stylesheet Language Transformation) server into the search engine
infrastructure. Search requests submitted to the Google search engine with the
output parameter set to xml_no_dtd and a valid proxystylesheet parameter
value will automatically be processed by the XSLT server as requests for custom
HTML output.

Using the XSL stylesheet specified by the proxystylesheet parameter, the XSLT
server will apply the transformation rules found in the XSL stylesheet to the standard
Google XML results and return the resulting output. While this document assumes
that the output generated by applying the XSL stylesheet will be HTML, almost any
output format can be generated by the application of the appropriate XSL stylesheet
rules. For any front end, the default XSL stylesheet can be customized or replaced by
the search administrator.

To customize the XSL stylesheet used to generate custom HTML output, please
review Google's XML output format to determine the XML tags that may be
transformed using a customized XSL stylesheet.

Additionally, you can leverage the proxycustom parameter to pass custom XML tags
to the XSLT server. Since the inclusion of custom XML does not generate search
results, this feature is useful for implementing additional static search pages, such as
an advanced search page.

Notes:

        XSL stylesheets used by the XSLT server will be cached for 15 minutes. To
         force the XSLT server to use the latest version of an XSL stylesheet, set the
         proxyreload input parameter to a value of 1 in your search request.
        XSL stylesheets which include other files may not be used with the Google
         search engine. Any XSL stylesheet which contains the following tags /
         functions will generate an error result: <xsl:import>, <xsl:include>,
         xmlns: and document()
        When requesting cached results in custom HTML output, the BLOB XML tag
         and associated value are automatically converted to the original text before the
         XSL stylesheet rules are applied. When using an XSL stylesheet which
         customizes cache results, simply use the values of the CACHE_LEGEND_TEXT,
         CACHE_LEGEND_NOTFOUND and CACHE_LEGEND_HTML XML tags directly
         instead of applying a rule on the BLOB sub-tag.
        If you use input or output encodings other than latin1, please consult the
         Internationalization section for more details.
        More information on XSL and XSLT can be found on the W3C web site.



3.1.2 Internationalization

The Google search engine handles over 20 character encoding schemes. This section
will discuss any special considerations that must be made when using the custom
HTML output format with encoding schemes other than latin1.

In order to support all the encoding schemes supported by Google, the XSLT server
follows a process to ensure that the results are returned in the correct encoding
scheme. When requesting search results through the XSLT server, the server will
translate the results to the UTF8 encoding scheme before applying the selected XSL
stylesheet. Once the XSL stylesheet rules are applied to generate the results, then the
results will be converted to the encoding scheme specified in the output encoding
parameter, oe, of the search request. The one exception to this rule is cached result
pages, which get converted to the encoding scheme of the cached document after
XSLT processing.
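
The conversion steps described above can be sketched in Python as follows. This is only an illustration of the order of operations, not the XSLT server's implementation: the transform callable stands in for the stylesheet step and is hypothetical, and the encoding names are assumed to be ones that Python's codecs recognize.

```python
def render_with_encoding(raw_results, source_encoding, output_encoding, transform):
    """Mimic the documented order: to UTF-8, transform, then to the oe encoding."""
    # Step 1: translate the results to UTF-8 before the stylesheet is applied.
    utf8_input = raw_results.decode(source_encoding).encode("utf-8")
    # Step 2: apply the transformation (hypothetical stand-in for the XSL rules).
    transformed = transform(utf8_input)
    # Step 3: convert the output to the encoding requested with the oe parameter.
    return transformed.decode("utf-8").encode(output_encoding)

raw = "café".encode("latin1")
page = render_with_encoding(raw, "latin1", "latin1",
                            lambda data: b"<p>" + data + b"</p>")
print(page)
```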

Note: XSL stylesheets are associated with a front end. All XSL stylesheets must be in
latin1 or UTF8 formats.



3.2 XML
The description of the XML results format is broken down into the following
sections:

       XML Output Overview
       Character Encoding Conventions
       Google XML Results DTD
       Google XML Tag Definitions



3.2.1 XML Output Overview

For maximum flexibility, Google provides search results in XML format. Using the
Google XML results, you can use your own XML parser to customize the display for
your search users. For developers who want to specify an XSL stylesheet for
transformation of the XML results, instead of developing their own XML parser,
proceed to the Custom HTML section.

Note:

       All element values will be valid HTML suitable for display, unless otherwise
        noted in the XML tag definitions. Some values are URLs which will need to
        be HTML encoded before displaying.
       All XML parsers used to parse Google results should be built to ignore any
        attributes or tags which are not documented. This will allow custom XML
        parsers to continue working without modification when Google adds more
        features to the XML output in the future.
       If a custom parameter value contains spaces, each space is replaced with
        "_" in the value attribute. You can still retrieve the unmodified value
        from the original_value attribute. For example:

        <PARAM name="temp" value="token_ring"
        original_value="token+ring" />
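
A tolerant parser along these lines might look like the sketch below. It uses Python's standard xml.etree.ElementTree module, reads only the documented PARAM attributes, and skips anything it does not recognize. The sample document, including the FUTURE_TAG element, is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment; FUTURE_TAG stands in for an undocumented tag.
SAMPLE = """<GSP VER="3.2">
  <PARAM name="temp" value="token_ring" original_value="token+ring" />
  <FUTURE_TAG>ignored</FUTURE_TAG>
</GSP>"""

def read_params(xml_text):
    """Collect PARAM entries by name, ignoring every other tag and attribute."""
    root = ET.fromstring(xml_text)
    params = {}
    for elem in root.findall("PARAM"):
        params[elem.get("name")] = {
            "value": elem.get("value"),
            "original_value": elem.get("original_value"),
        }
    return params

params = read_params(SAMPLE)
print(params["temp"]["original_value"])
```

Because the code looks up only the tags it needs, new tags or attributes in a later output version do not break it.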




3.2.2 Character Encoding Conventions

The first line of the Google XML results will indicate which character encoding is
used. See the XML Standard for more details.

Additionally, certain characters are required to be escaped when included as values in
XML tags. These characters are documented in the XML standard, and are also
reproduced in the table below. All other characters in the XML results will be
presented without modification.
            Character                          Escaped form
                <                           either &lt; or &#60;
                &                          either &amp; or &#38;
                >                           either &gt; or &#62;
                '                          either &apos; or &#39;
                "                          either &quot; or &#34;

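For code that builds or inspects XML values by hand, the named escapes in the table can be applied with Python's standard xml.sax.saxutils helpers, as in the sketch below. The helper names escape_value and unescape_value are introduced here for illustration. A full XML parser handles both the named and the numeric forms automatically, so manual unescaping is only needed when working on raw text.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; quotes are added explicitly.
EXTRA_ESCAPES = {"'": "&apos;", '"': "&quot;"}
EXTRA_UNESCAPES = {"&apos;": "'", "&quot;": '"'}

def escape_value(text):
    """Replace the five special characters with their named escaped forms."""
    return escape(text, EXTRA_ESCAPES)

def unescape_value(text):
    """Reverse escape_value (named forms only, not numeric ones like &#60;)."""
    return unescape(text, EXTRA_UNESCAPES)

escaped = escape_value('Tom & "Jerry" <3')
print(escaped)
```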


3.2.3 Google XML Results DTD

Google XML results can be returned either with or without a reference to the most
recent DTD (Document Type Definition) describing Google's XML format. The DTD
is a guide to help search administrators and XML parsers understand the XML results
output. Since Google's XML grammar may change from time to time, you should not
configure your parser to use the DTD to validate the XML results.

Additionally, XML parsers should not be configured to fetch the DTD every time a
search request is performed. Since the DTD is updated infrequently, these fetches
create unnecessary delay and bandwidth requirements.

Google recommends that you use the xml_no_dtd output format to get XML results.
If you specify the xml output format in your search request, then the only difference
will be the inclusion of the following line in the XML results.

<!DOCTYPE GSP SYSTEM "google.dtd">

The DTD is available on the Google Search Appliance at

       http://<appliance_hostname>/google.dtd

If there are other features you would like to see in the DTD, please consult your
account representative. Not all features in the DTD may be available or supported at
this time.



3.2.4 Google XML Tag Definitions

This section provides an index and details of Google's XML results.

Sub-Tags Legend
?   =   optional sub-tag
*   =   zero or more instances of the sub-tag
+   =   one or more instances of the sub-tag
|   =   Boolean OR

Index

The XML tags are listed in alphabetical order below.

Details

BLOB
Format       Text (See Definition)
Sub-Tags
Definition   This tag contains HTML data in the encoding format
             specified in the attribute. Additionally, the data has
             been BASE64 encoded to preserve data integrity of
             cached results encoded in a different encoding scheme
             than the results requested.
Attributes   Name       Format                   Description
             encoding   Text (Encoding Scheme)   The encoding scheme of the HTML data
                                                 (See the Internationalization section
                                                 for a list of common encoding values)
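
Recovering the original HTML from a BLOB element is a two-step decode: BASE64 first, then the character encoding named in the attribute. The sketch below assumes the encoding attribute holds a name that Python's codecs recognize (such as latin1); real values may need to be mapped to Python codec names first. The sample element is constructed in the code itself.

```python
import base64
import xml.etree.ElementTree as ET

def decode_blob(blob_elem):
    """Return the text carried by a BLOB element."""
    encoding = blob_elem.get("encoding")       # encoding scheme of the HTML data
    raw = base64.b64decode(blob_elem.text)     # undo the BASE64 layer
    return raw.decode(encoding)                # interpret the bytes as text

# Build a hypothetical BLOB element to exercise the function.
sample_html = "<b>café</b>"
payload = base64.b64encode(sample_html.encode("latin1")).decode("ascii")
blob = ET.fromstring('<BLOB encoding="latin1">' + payload + "</BLOB>")

decoded = decode_blob(blob)
print(decoded)
```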




C
Format
Sub-Tags
Definition   Indicates that the "cache:" special query term is
             supported for this search result URL
Attributes   Name   Format                 Description
             SZ     Text (Integer + "k")   Provides the size of the cached version of
                                           the search result in kilobytes ("k"). This
                                           field is not populated if no cached version
                                           of a document is available, which can be the
                                           case if robots noarchive metatags are used.
             CID    Text                   Identifier of a document in Google's cache.
                                           To fetch the document from the cache, send a
                                           search term built like this: "cache:" + CID
                                           text + ":" + escaped URL. The escaped URL is
                                           available in the UE tag. Send this search
                                           term normally, as one would type it into the
                                           search form.
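
The construction of the cache search term described for the CID attribute can be sketched as below. The CID and escaped URL values are made up, and the q parameter name used to submit the term is an assumption rather than something stated in this section.

```python
from urllib.parse import urlencode

def cache_query_term(cid, escaped_url):
    """Join the pieces as documented: "cache:" + CID text + ":" + escaped URL."""
    return "cache:" + cid + ":" + escaped_url

# Hypothetical values standing in for the CID attribute and the UE tag.
term = cache_query_term("abc123", "www.example.com/page.html")

# Submitting the term like any typed query means escaping it as a parameter value.
query_string = urlencode({"q": term})   # "q" is an assumed parameter name
print(term)
print(query_string)
```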




CACHE
Format
Sub-Tags     CACHE_URL, CACHE_REDIR_URL,
             CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?,
             CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE,
             CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML
Definition   Provides encapsulation for the cached version of a
             search result
Attributes




CACHE_CONTENT_TYPE
Format       Text (MIME type)
Sub-Tags
             MIME type of the cached result as specified in the
Definition   HTTP header returned when the document was
             crawled
Attributes




CACHE_ENCODING
Format       Text
Sub-Tags
             The encoding scheme of the cached result as specified
             in the HTTP header returned when the document was
Definition   crawled
             (See the Internationalization section for a list of
             common values)
Attributes
CACHE_HTML
Format        Text (HTML) (Custom HTML output only)
Sub-Tags      BLOB?   (XML output only)
              The cached version of the search result. All search
Definition    results are stored in HTML format after being
              translated for indexing.
Attributes




CACHE_LANGUAGE
Format        Text (Google language tag)
Sub-Tags
              The language of the cached result as determined by
              Google's automatic language classification algorithm.
Definition    The value of this tag is the same as the values used for
              the automatic language collections without the
              "lang_" prefix.
Attributes




CACHE_LAST_MODIFIED
Format        Text
Sub-Tags
              Date that the document was crawled, as specified in
              the Date HTTP header when the document was
              crawled for this index. The crawler will fetch
              documents from its cache if the web server responds
Definition    with a 304 (not modified) status code to an if-
              modified-since request. In this case, the
              CACHE_LAST_MODIFIED will be the date the
              document was originally crawled and not the date of
              the if-modified-since request.
Attributes




CACHE_LEGEND_FOUND
Format
Sub-Tags      CACHE_LEGEND_TEXT*
Definition    Provides encapsulation for query terms found in the
             visible text of the cached result returned
Attributes




CACHE_LEGEND_NOTFOUND
Format       Text (Custom HTML output only)
Sub-Tags     BLOB?    (XML output only)
Definition   Details of any query terms not visible in the cached
             result returned
Attributes




CACHE_LEGEND_TEXT
Format       Text (Custom HTML output only)
Sub-Tags     BLOB    (XML output only)
Definition   Details of a query term which is visible in the cached
             result. Any query terms found in the cached result will
             automatically be highlighted using the colors
             described in the attributes of this tag.
Attributes   Name      Format            Description
             fgcolor   Color attribute   The foreground color of the query term in
                                         the cached result. This value can be used
                                         directly in a color attribute for HTML tags.
             bgcolor   Color attribute   The background color of the query term in
                                         the cached result. This value can be used
                                         directly in a color attribute for HTML tags.




CACHE_REDIR_URL
Format       Text (Absolute URL)
Sub-Tags
Definition   Final URL of cached result after all redirects are
             resolved
Attributes
CACHE_URL
Format       Text (Absolute URL)
Sub-Tags
Definition   Initial URL of cached result
Attributes




CRAWLDATE
Format       Text
Sub-Tags
             This is an optional element that shows the date that
Definition   the page was crawled. It is shown only for pages
             crawled within the past two days.
Attributes




CT
Format       HTML
Sub-Tags
             Search comments
Definition   Example comment: Sorry, no content found for this
             URL
Attributes




CUSTOM
Format
Sub-Tags     (Any custom XML specified in the search request)
Definition   Provides encapsulation for any custom XML tags
             specified in the proxycustom input parameter
Attributes




FI
Format
Sub-Tags
Definition   Indicates that document filtering was performed
             during this search
             Note: See the section on Automatic Filtering for more
             details
Attributes




FS
Format
Sub-Tags
Definition   Additional search result details
               Name       Format                Description
Attributes   NAME        Text       Name of the result descriptor
             VALUE       Text       Value of the result descriptor




GSP
Format
Sub-Tags     (TM, Q, PARAM*, CUSTOM?, Spelling?,
             Synonyms?, CT?, TT?, GM*, RES?) | CACHE
Definition   GSP = "Google Search Protocol"
             Provides an encapsulation for all data returned in the
             Google XML search results
Attributes   Name   Format   Description
             VER    Text     Indicates version of the search results output.
                             The current output version is "3.2".
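
To show how the tags defined in this section fit together, the sketch below walks a small results document with Python's standard xml.etree.ElementTree module. The sample XML is hand-written to follow the sub-tag lists in this section and is not real search output. Optional tags (RES, T, S) are treated as possibly absent.

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the documented sub-tag structure.
SAMPLE = """<GSP VER="3.2">
  <TM>0.05</TM>
  <Q>token ring</Q>
  <RES SN="1" EN="2">
    <M>2</M>
    <R N="1"><U>http://www.example.com/a</U><T>First</T><RK>9</RK><HAS/></R>
    <R N="2"><U>http://www.example.com/b</U><RK>7</RK><HAS/></R>
  </RES>
</GSP>"""

def parse_results(xml_text):
    """Extract the result range, the estimate and the individual results."""
    root = ET.fromstring(xml_text)
    res = root.find("RES")
    if res is None:  # RES is optional, for example when nothing matched
        return {"start": 0, "end": 0, "estimate": 0, "results": []}
    results = []
    for r in res.findall("R"):
        results.append({
            "index": int(r.get("N")),
            "url": r.findtext("U"),
            "title": r.findtext("T"),     # None when the optional T tag is absent
            "snippet": r.findtext("S"),   # None when the optional S tag is absent
        })
    return {
        "start": int(res.get("SN")),
        "end": int(res.get("EN")),
        "estimate": int(res.findtext("M")),
        "results": results,
    }

parsed = parse_results(SAMPLE)
print(parsed["start"], parsed["end"], len(parsed["results"]))
```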




GD
Format       Text (HTML)
Sub-Tags
Definition   Contains the description of a KeyMatch result
Attributes




GL
Format       Text (URL)
Sub-Tags
Definition   Contains the URL of a KeyMatch result
Attributes




GM
Format
Sub-Tags     GL, GD?
Definition   Provides encapsulation for a single KeyMatch result
Attributes




HAS
Format
Sub-Tags     L?, C?
Definition   Provides encapsulation for any special features
             supported for this search request
Attributes




HN
Format       Text (URL-escaped web directory)
Sub-Tags
Definition   Indicates that directory crowding has occurred and
             that additional results are available from the directory
             where this search result was found. The value of this
             tag is ready to be used with the "site:" query term.
Attributes   Name   Format   Description
             U      Text     HTML version of web directory




L
Format
Sub-Tags
Definition   Indicates that the "link:" special query term is
             supported for this search result URL
Attributes




M
Format       Text (Integer)
Sub-Tags
             The estimated total number of results for the search
             Note: The estimate of the total number of results for a
Definition   search can be too high or too low. Please review the
             appendix entitled, Estimated vs. Actual Number of
             Results.
Attributes




MT
Format
Sub-Tags
Definition   Meta tag name and value pairs pulled from the search
             result
             Note: Only meta tags which are requested in the
             search request will be returned
Attributes   Name   Format   Description
             N      Text     Name of the meta tag
             V      Text     Value of the meta tag




NB
Format
Sub-Tags     PU?, NU?
Definition   Provides encapsulation for result set navigation
             information
             Note: The NB tag will only be present if either
             previous or additional results are available
Attributes




NU
Format       Text (Relative URL)
Sub-Tags
             Contains relative URL to the next results page
Definition   Note: The NU tag will only be present if additional
             results are available
Attributes




OneSynonym
Format       HTML
Sub-Tags
Definition   A synonym suggestion for the submitted query in
             HTML format.
Attributes   Name   Format   Description
             Q      Text     The URL-escaped version of the synonym suggestion




PARAM
Format
Sub-Tags
Definition   The input parameters submitted to the Google search
             engine to generate these results
Attributes   Name             Format   Description
             name             Text     Input parameter name
             value            HTML     HTML formatted version of the input
                                       parameter value
             original_value   Text     Original URL-escaped version of the input
                                       parameter value




PU
Format       Text (Relative URL)
Sub-Tags
             Contains relative URL to the previous results page
Definition   Note: The PU tag will only be present if previous
             results are available
Attributes
Q
Format       HTML
Sub-Tags
Definition   The search query submitted to the Google search
             engine to generate these results
Attributes




R
Format
Sub-Tags     U, T?, RK, FS?, MT*, S?, HAS, HN?
Definition   Provides encapsulation for the details of an individual
             search result
Attributes   Name   Format           Description
             N      Text (Integer)   Indicates the index (1-based) of this search
                                     result
             L      Text (Integer)   Indicates the recommended indentation level of
                                     the results.
                                     Note: Currently this value will always be 1
                                     unless directory crowding occurs. In this case,
                                     the second directory result will have a value
                                     of 2.
             MIME   Text             Indicates the MIME type of the search result




RES
Format
Sub-Tags     M, FI?, XT?, NB?, R*
Definition   Provides encapsulation for the details of the
             individual search results
Attributes   Name   Format           Description
             SN     Text (Integer)   Indicates the index (1-based) of the first
                                     search result returned in this result set
             EN     Text (Integer)   Indicates the index (1-based) of the last
                                     search result returned in this result set

RK
Format       Text (Integer in the range 0-10)
Sub-Tags
Definition   Provides a general rating of the relevance of the
             search result
Attributes




S
Format       Text (HTML)
Sub-Tags
Definition   Search result snippet for the search result
             Note: Query terms will be highlighted in bold in
             the results, and line breaks will be included for proper
             text wrapping.
Attributes




Spelling
Format
Sub-Tags     Suggestion+
             Provides encapsulation for alternate spelling
Definition   suggestions for the submitted query. Only one
             spelling suggestion is returned at this time.
Attributes




Suggestion
Format       HTML
Sub-Tags
Definition   An alternate spelling suggestion for the submitted
             query in HTML format
Attributes   Name   Format   Description
             Q      Text     The URL-escaped version of the spelling suggestion

Synonyms
Format
Sub-Tags     OneSynonym+
             Provides encapsulation for synonym suggestions for
             the submitted query. Up to 20 synonym suggestions
Definition   may be returned depending on the synonym list
             associated with the front end by the search
             administrator.
Attributes




T
Format       Text (HTML)
Sub-Tags
Definition   The title of the search result
Attributes




TM
Format       Text (Floating-point number)
Sub-Tags
Definition   Total server time to return search results, measured in
             seconds.
Attributes




U
Format       Text (Absolute URL)
Sub-Tags
Definition   The URL of the search result.
Attributes




XT
Format
Sub-Tags
Definition   Indicates that the estimated total number of results
             specified in this search result is exact.
             Note: See the section on Automatic Filtering for more
             details.
Attributes




Appendices
This section contains any appendices relevant to Google search:

       Estimated vs. Actual Number of Results
       URL Escaping



Appendix A: Estimated vs. Actual Number of Results

The Google search engine does not guarantee the ability to return a particular number
of results for any given search query. The total number of results provided by Google
in the search results is an estimate of the actual number of results for the query. This
number can be higher or lower than the actual number of results available. This
section covers any issues relating to this topic.

Behavior

When a search request is made to Google, the following behavior occurs:

    1. If Google has results to satisfy the search request, then the requested number
       of results will be returned.
    2. If Google has results and the search request is for results beyond what is
       available, the last page of results will be returned. The last page of results is
       determined by dividing the total number of results into pages based on the
       number of results requested.
    3. If no results are available for the search request, then an empty result set will
       be returned.

In order to determine if a particular results page is the last page of available results,
check for any of the following conditions:

    1. The first result number returned does not match the first result number
       requested.
    2. The number of results returned is less than the number of results requested.
    3. The results returned do not contain a link to the next result set.
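
The three conditions above can be combined into a single check, sketched below in Python. The inputs are assumed to come from the request (the first result number and the number of results asked for) and from the response (the SN and EN attributes of the RES tag and the presence of the NU tag). The function and argument names are chosen for illustration.

```python
def is_last_page(requested_start, requested_count,
                 first_returned, last_returned, has_next_link):
    """Return True if any of the three last-page conditions holds."""
    # Number of results actually on this page (indexes are 1-based, inclusive).
    returned_count = last_returned - first_returned + 1
    return (
        first_returned != requested_start      # condition 1
        or returned_count < requested_count    # condition 2
        or not has_next_link                   # condition 3
    )

print(is_last_page(1, 10, 1, 10, True))
print(is_last_page(91, 10, 41, 45, False))
```

In the second call the user asked for results starting at 91 but received results 41 to 45, so all three conditions signal the last page.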

Automatic Filtering
Typically, the number of results actually returned is significantly reduced by the
automatic filtering that Google performs on all search results to weed out undesirable
results. This feature can be disabled per the instructions in the Automatic Filtering
section.

Any results which have been filtered will be identified in the results returned. For
example, the <FI> XML tag will be present in any XML search results where
automatic document filtering has occurred.

Google recommends that the search results page display a message on the last page of
the search results similar to the following message when automatic filtering occurs:

In order to show you the most relevant results, we have omitted some entries very
similar to the search results already displayed. If you like, you can repeat the search
with the omitted results included.

The underlined text in the message should be a hypertext link to submit the same
search again with the filter parameter set to the value 0. Google has found that this
method of informing users about automatic document filtering works well and is used
on the Google Internet search site.
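
One way to produce the target of that link is to take the current search URL and set the filter parameter to 0, replacing any existing value. The sketch below uses Python's standard urllib.parse module. The example URL, including its host, path and q parameter, is an assumption for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def unfiltered_url(search_url):
    """Return the same search URL with the filter parameter set to 0."""
    parts = urlsplit(search_url)
    # Keep every parameter except an existing filter value.
    params = [(name, value)
              for name, value in parse_qsl(parts.query, keep_blank_values=True)
              if name != "filter"]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))

link = unfiltered_url(
    "http://search.example.com/search?q=token+ring&output=xml_no_dtd")
print(link)
```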

Navigation

When the total number of results returned is an estimate, the navigation structure for
search results can be complicated. Google recommends two approaches for generating
a navigation scheme for your search results:

   1. Only provide the search user with the ability to navigate to the previous results
      page and the next results page. Google provides links to the previous and next
      result set in the results returned when appropriate.
   2. Provide the search user with the ability to jump to any search page in the
      estimated number of results. If the user requests a results page beyond the
      results that are actually available, the last results page will be returned
      and the navigation structure should be updated at that time. Google uses
      this approach on our Internet search site.
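
A rough sketch of the second approach follows: page links are first generated from the estimated total and then trimmed once a response shows where the results actually end. The function names are hypothetical, and the sketch assumes 1-based result indexes like those reported in the SN and EN attributes.

```python
def page_starts(estimated_total, per_page):
    """Return the 1-based index of the first result on each estimated page."""
    return list(range(1, estimated_total + 1, per_page))

def trim_to_actual(starts, last_returned_index):
    """Drop pages that would begin after the last result actually returned."""
    return [start for start in starts if start <= last_returned_index]

pages = page_starts(35, 10)               # the estimate says 35 results
pages_after_jump = trim_to_actual(pages, 18)  # the last page ended at result 18
print(pages)
print(pages_after_jump)
```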



Appendix B: URL Escaping


In order to make a search request to the Google search engine through an HTTP URL
request, there are certain conventions that must be followed in order to allow the
search engine to correctly translate your search request.

The HTTP URL schema defines that only alphanumeric characters, the special
characters $-_.+!*'(), and the reserved characters ;/?:@=& can be used as values
within an HTTP URL request. Since reserved characters are used by the search engine
to decode the URL and some special characters are used to request search features,
all non-alphanumeric characters used as input parameter values should be URL
escaped.

In order to URL escape a string, all space characters should be converted to a "+"
character and all other non-alphanumeric characters should be replaced by a "%"
character followed by two hexadecimal digits representing the value of that character.

Some input parameters require that the values passed to Google search be double
URL escaped. This means that you will need to apply the URL escaping to the
string twice in succession to generate the final value. See the input parameter
descriptions for more information.

Note: Additional information on URL escaping can be found at W3C and IETF web
sites.

Examples

    Original String                          URL Escaped String
    chicken -teriyaki                        chicken+%2Dteriyaki
    admission form site:www.stanford.edu     admission+form+site%3Awww.stanford.edu

    Original String                          Doubly URL Escaped String
    William Shakespeare                      William%2BShakespeare
    admission form site:www.stanford.edu     admission%2Bform%2Bsite%253Awww.stanford.edu
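
Single and double escaping can be reproduced with Python's standard urllib.parse.quote_plus function, as sketched below. One difference from the first example row is worth noting: quote_plus leaves the characters - . _ and ~ unescaped, so "chicken -teriyaki" becomes "chicken+-teriyaki" rather than "chicken+%2Dteriyaki". Both forms decode to the same string.

```python
from urllib.parse import quote_plus

def url_escape(value):
    """Escape once: spaces become "+", most other punctuation becomes %XX."""
    return quote_plus(value)

def double_url_escape(value):
    """Escape twice in succession, as some input parameters require."""
    return quote_plus(quote_plus(value))

single = url_escape("admission form site:www.stanford.edu")
double = double_url_escape("admission form site:www.stanford.edu")
print(single)
print(double)
```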




Glossary
This glossary contains basic descriptions of acronyms and terms found in this
document which may be new to some readers.

Cached result - As part of its core technology, Google indexes all the content on a
page, rather than a portion of the content (percentage or meta tags). Each page that is
indexed is also available to be served in a cached HTML format (up to 4 million bytes
of each document before HTML conversion). When a user views a cached document,
each query term is highlighted in a different color, making it easy for the user to find
the information sought. Because all pages are cached, the user always has access to
content that has been indexed, even if the server where the live content is stored
happens to be refusing connections or is slow to return the page.

Collection - A collection is a subset or a view of the document index. Collections are
specified by URL patterns; some collections are created automatically by the Google
search engine. Collections are useful for allowing refined or advanced searches, for
limiting access to classified information, for group-level security, for
language-specific queries and for many other applications.

DTD - Document Type Definition. The purpose of a DTD is to define the legal
building blocks of an XML document. It defines the XML document structure with a
list of legal elements.

Encoding Scheme - Each language has an official encoding scheme which is used to
represent all of the language's characters in an 8-bit data stream format. These
encoding schemes are used by Google search to determine how to translate incoming
and outgoing search requests.

KeyMatch - Because you occasionally may want to return special results for specific
queries, Google search may be configured with the KeyMatch feature. Using
KeyMatch, the search administrator can designate special results that are returned in
addition to the standard results when specific queries are made. Google recommends
using KeyMatch carefully, as it can drastically decrease the quality of results if
overused.

Meta Tags - HTML tags which can be specified within an HTML document which
are not displayed to the end user, but which may contain document meta-data. Google
search uses meta tags with the NAME attribute to enhance and filter search results
when requested.

MIME - Multipurpose Internet Mail Extensions. The MIME type of a web document
(or search result) identifies the format of the document it is associated with. Some
sample MIME types include "text/html" for HTML documents, and
"application/msword" for Microsoft Word documents.

Query - A string of query terms separated by the space character which is submitted
to Google search. The results returned for a particular query will satisfy all query
terms by default.

Query term - A single term which defines a unit of search for the Google search
engine to find in the index. A single query term can not contain any spaces or
punctuation.

UTF-8 - Unicode Transformation Format (8-bit). UTF-8 is a Unicode based
encoding scheme for describing language data by representing the data using 8-bit
codes. This encoding scheme is used by Google search to support multiple languages
simultaneously.

Web Directory - A subset of files on a web server stored under its own directory
name.

XML - eXtensible Markup Language. XML is a markup language, similar to HTML,
which was designed to describe data. The tags used in XML are not pre-defined, and
are described by a DTD or the data provider.

XSL - eXtensible Stylesheet Language. XSL is a language that is designed to
describe how an XML document should be displayed. XSL contains commands that
can be used to describe the transformation and formatting of an XML document for
display. XSL is used in the Google search environment to transform XML results into
custom HTML output.

XSLT - XSL Transformation. XSLT describes the process of transforming an XML
document into another format. Google search allows search administrators to use our
XSLT server to transform our standard XML results into their own custom HTML
output.

				