DoxyPress  1.6.0
External Searching

DoxyPress provides the ability to search HTML pages using an external search engine.

  • For large projects, an external search engine can have significant performance advantages over the built-in search engine
  • An external search engine can search across multiple projects
  • The external search engine is hosted on a web server, but clients can still browse the web pages locally

Setup

To set up an external search engine, ensure your search engine is available on a web server. Add the URL of your search engine to the SEARCH-ENGINE-URL tag in your project file.

SEARCH-ENGINE-URL = https://yoursite.com/cgi/yourSearchEngine.cgi

To use the external search, make sure the following tags are enabled in your project file.

HTML-SEARCH           = YES
SEARCH-SERVER-BASED   = YES
SEARCH-EXTERNAL       = YES

DoxyPress will generate a file containing the search index information. The default name of this file is searchdata.xml; the search-data-file tag can be used to override it.

The next step is to convert the raw search data from searchdata.xml into an index. Refer to the documentation for your search engine for instructions on how to do this.

Multiple Project Searching

External searching can be used to search multiple projects from within any project. To make this possible, combine the search data for all projects into a single index. For example, given Project A and Project B you will have two 'searchdata.xml' files. You will need to combine the two XML files or follow the instructions for your search engine.
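
As an illustration of combining two search data files, the following Python sketch merges the <doc> nodes of several <add> documents into a single <add> tree. The function name merge_search_data and the tiny sample documents are hypothetical, and your search engine may provide its own way to index several files, in which case that should be preferred.

```python
import xml.etree.ElementTree as ET


def merge_search_data(xml_documents):
    """Combine the <doc> nodes of several <add> documents into one <add> tree."""
    merged = ET.Element("add")
    for text in xml_documents:
        root = ET.fromstring(text)
        if root.tag != "add":
            raise ValueError("expected <add> root, found <%s>" % root.tag)
        # Move every <doc> child into the combined tree, keeping file order.
        merged.extend(root.findall("doc"))
    return merged


# Hypothetical, heavily shortened search data for two projects.
project_a = '<add><doc><field name="name">A::run</field></doc></add>'
project_b = '<add><doc><field name="name">B::stop</field></doc></add>'

combined = merge_search_data([project_a, project_b])
print(ET.tostring(combined, encoding="unicode"))
```

This sketch holds all documents in memory, so it is only suitable for small search data files.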

Projects A and B can share the same search engine database and the search results will link to the correct documentation. The searchdata.xml file does not contain any absolute paths or links. The search-external-id and search-mappings tags are used to resolve the paths.

Set a unique id for each project using search-external-id. To link the search results to the correct project, define a per-project mapping using the search-mappings tag. This tag defines the mapping from a project's id to a relative location.

project_A/doxy_file.json
------------------
SEARCH-EXTERNAL-ID    = A
SEARCH-MAPPINGS       = B=../../project_B/html


project_B/doxy_file.json
------------------
SEARCH-EXTERNAL-ID    = B
SEARCH-MAPPINGS       = A=../../project_A/html
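
To illustrate the idea behind the mapping, the following Python sketch resolves the relative URL of a search result against a mapping table. It is not DoxyPress's actual implementation. The function names, the space-separated "ID=location" list syntax, and the rule that results without a mapping belong to the current project are all assumptions made for this example.

```python
def parse_mappings(value):
    """Parse an 'ID=location' list into a dict (space-separated list is assumed)."""
    mappings = {}
    for entry in value.split():
        project_id, _, location = entry.partition("=")
        mappings[project_id] = location
    return mappings


def resolve_result_url(result_id, url, mappings):
    """Prefix a result URL with the mapped location of the project it belongs to."""
    location = mappings.get(result_id)
    if location is None:
        # Assumption: no mapping means the result is part of the current project.
        return url
    return location.rstrip("/") + "/" + url


# Mapping as configured for project A in the example above.
mappings_a = parse_mappings("B=../../project_B/html")
print(resolve_result_url("B", "d5/d8d/class_q_dir.html", mappings_a))
print(resolve_result_url("A", "index.html", mappings_a))
```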

External Search Engine Requirements

An external search engine has three important interfaces.

  • Format of the input for the index tool
  • Format of the input for the search engine
  • Format of the output of search engine

Index Input Format

The search data produced by DoxyPress follows a specific format. Refer to the following for detailed information:
XML Messages for Updating a Solr Index

The input for the indexer is an XML file which consists of one <add> tag containing multiple <doc> tags, each of which in turn contains multiple <field> tags. The following example shows one doc node, which contains the search data and meta data for one method.

<add>
  ...
  <doc>
    <field name="type">function</field>
    <field name="name">QXmlReader::setDTDHandler</field>
    <field name="args">(QXmlDTDHandler *handler)=0</field>
    <field name="tag">qtools.tag</field>
    <field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
    <field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
    <field name="text">Sets the DTD handler to handler DTDHandler()</field>
  </doc>
  ...
</add>

The following is a list of the field names which are supported.

  • type
    • The type of the search entry
    • Possible values are: source, function, slot, signal, variable, typedef, enum, enumvalue, property,
      event, related, friend, define, file, namespace, group, package, page, dir
  • name
    • The name of the search entry
    • For a method this is the qualified name of the method
    • For a class it is the name of the class, etc.
  • args
    • The parameter list if this is a function or method
  • tag
    • The name of the tag file used for this project
  • url
    • The relative URL to the HTML documentation for this entry
  • keywords
    • Important words which are representative for this entry
    • When searching for such a keyword, this entry should get a higher rank in the search results
  • text
    • The documentation associated with the item. Note that only words are present, no markup.

Note
Due to the potentially large size of the XML file it is recommended to use a SAX based parser to process it.
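
A SAX based reader for this format could look like the following Python sketch, which uses the standard xml.sax module and hands each completed doc node to a callback as a dictionary. The class name SearchDataHandler and the on_doc callback are hypothetical names chosen for this example, and the sample is a shortened version of the doc node shown above.

```python
import xml.sax


class SearchDataHandler(xml.sax.ContentHandler):
    """Collect the <field> values of each <doc> and pass them to a callback."""

    def __init__(self, on_doc):
        super().__init__()
        self.on_doc = on_doc      # called with one dict per <doc>
        self.doc = None           # fields of the <doc> being read
        self.field_name = None    # name attribute of the open <field>
        self.chunks = []          # text pieces of the open <field>

    def startElement(self, name, attrs):
        if name == "doc":
            self.doc = {}
        elif name == "field" and self.doc is not None:
            self.field_name = attrs.get("name")
            self.chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self.field_name is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "field" and self.field_name is not None:
            # If a field name repeats within one <doc>, the last value wins.
            self.doc[self.field_name] = "".join(self.chunks)
            self.field_name = None
        elif name == "doc" and self.doc is not None:
            self.on_doc(self.doc)
            self.doc = None


sample = b"""<add>
  <doc>
    <field name="type">function</field>
    <field name="name">QXmlReader::setDTDHandler</field>
    <field name="args">(QXmlDTDHandler *handler)=0</field>
  </doc>
</add>"""

docs = []
xml.sax.parseString(sample, SearchDataHandler(docs.append))
print(docs)
```

For a real searchdata.xml file, xml.sax.parse can be called with the file name instead of parseString, so the file is streamed rather than loaded as a whole.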

Search URL Format

When an external search engine is invoked from a DoxyPress generated HTML page, a number of parameters are passed in a query string. For additional information about query strings refer to: Wiki: Query String

The following is a list of the fields which are passed. From the complete list of search results, the engine should return the results with indices n*p through n*(p+1)-1.

  • q
    • Query text as entered by the user
  • n
    • Number of search results requested
  • p
    • Zero-based number of the search page for which to return the results. Each page has n values
  • cb
    • Name of the callback function, used for JSON with padding

The following is an example of how a query might look. It searches for the word 'list' (q=list) and requests 20 search results (n=20), starting at result number 20 (p=1) and using the callback 'dummy' (cb=dummy).

https://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy

Note
The values are URL encoded so they have to be decoded before they can be used.
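
Decoding the query string on the server side might look like the following Python sketch, which relies on urllib.parse from the standard library. The function name parse_search_request and the default values used for missing fields are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlsplit


def parse_search_request(url):
    """Extract and URL-decode the q, n, p and cb fields of a search URL."""
    params = parse_qs(urlsplit(url).query)  # parse_qs also decodes the values
    # The defaults below are assumptions, used when a field is missing.
    query = params.get("q", [""])[0]
    n = int(params.get("n", ["20"])[0])
    p = int(params.get("p", ["0"])[0])
    callback = params.get("cb", [""])[0]
    # Indices of the first and last requested result in the complete list.
    result_range = (n * p, n * (p + 1) - 1)
    return {"q": query, "n": n, "p": p, "cb": callback, "range": result_range}


request = parse_search_request(
    "https://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy"
)
print(request)
```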

Search Results Format

When the external search engine is invoked, it should reply with the results. The format of the reply is JSON with padding (JSONP), which is a JSON structure wrapped in a JavaScript function call. The name of the function should be the name of the callback, as passed in the cb field of the query.

With the example query shown above, the structure of the reply should look like this:

dummy({
  "hits":179,
  "first":20,
  "count":20,
  "page":1,
  "pages":9,
  "query": "list",
  "items":[
  ...
 ]})

The fields listed have the following meanings:

  • hits
    • Total number of search results, which can be more than was requested
  • first
    • Index of first result returned
    • \(\min(n*p,\mbox{hits})\)
  • count
    • Actual number of results returned
    • \(\min(n,\mbox{hits}-\mbox{first})\)
  • page
    • Page number of the result
    • \(p\)
  • pages
    • Total number of pages
    • \(\lceil\frac{\mbox{hits}}{n}\rceil\)
  • items
    • An array containing the search data per result
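
The formulas above can be sketched in Python as follows, together with the JSONP wrapping of the reply. The function names page_info and build_reply are hypothetical, and the check on the callback name is an extra precaution added for this example, not something the format requires.

```python
import json
import re


def page_info(n, p, hits):
    """Compute the first, count, page and pages fields for page p of size n."""
    first = min(n * p, hits)
    count = min(n, hits - first)
    pages = (hits + n - 1) // n  # integer form of ceil(hits / n), n must be > 0
    return {"first": first, "count": count, "page": p, "pages": pages}


def build_reply(callback, query, n, p, hits, items):
    """Wrap the search results in a call to the requested callback function."""
    # Precaution (assumption): accept only names that look like identifiers.
    if not re.fullmatch(r"[A-Za-z_$][\w$.]*", callback):
        raise ValueError("invalid callback name")
    payload = {"hits": hits}
    payload.update(page_info(n, p, hits))
    payload["query"] = query
    payload["items"] = items  # the items of the requested page only
    return "%s(%s)" % (callback, json.dumps(payload))


print(build_reply("dummy", "list", 20, 1, 179, []))
```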

Here is an example of how an element of the items array should look:

{
 "type": "function",
 "name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
 "tag": "qtools.tag",
 "url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
 "fragments":
 [
  "Returns a <span class=\"hl\">list</span> of QFileInfo objects for all files and directories...",
  "... pointer to a QFileInfoList The <span class=\"hl\">list</span> is owned by the QDir object...",
  "... to keep the entries of the <span class=\"hl\">list</span> after a subsequent call to this..."
 ]
},

The fields of an item have the following meanings:

  • type
    • Type of the item
    • Found in the field with name "type" in the raw search data
  • name
    • Name of the item, including the parameter list
    • Found in the fields with name "name" and "args" in the raw search data
  • tag
    • Name of the tag file
    • Found in the field with name "tag" in the raw search data
  • url
    • Relative URL to the documentation
    • Found in the field with name "url" in the raw search data
  • fragments
    • An array with zero or more fragments of text containing words that have been searched for
    • These words should be wrapped in <span class="hl"> and </span> tags to highlight them in the output
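
Building one element of the items array from the raw search data might look like the following Python sketch. The function names highlight and make_item are hypothetical, the selection of fragments is left to the search engine, and HTML-escaping the text around the highlighted words is an assumption made for this example.

```python
import html
import re


def highlight(text, words):
    """Wrap every searched word in <span class="hl"> and escape the rest."""
    if not words:
        return html.escape(text, quote=False)
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(%s)\b" % alternatives, re.IGNORECASE)
    pieces = pattern.split(text)  # odd indices hold the matched words
    result = []
    for index, piece in enumerate(pieces):
        escaped = html.escape(piece, quote=False)
        if index % 2:
            escaped = '<span class="hl">%s</span>' % escaped
        result.append(escaped)
    return "".join(result)


def make_item(doc, fragments, words):
    """Build one items entry from the raw fields of a doc node."""
    return {
        "type": doc["type"],
        "name": doc["name"] + doc.get("args", ""),  # name plus parameter list
        "tag": doc["tag"],
        "url": doc["url"],
        "fragments": [highlight(fragment, words) for fragment in fragments],
    }


doc = {
    "type": "function",
    "name": "QXmlReader::setDTDHandler",
    "args": "(QXmlDTDHandler *handler)=0",
    "tag": "qtools.tag",
    "url": "de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba",
}
item = make_item(doc, ["Sets the DTD handler to handler"], ["handler"])
print(item["fragments"][0])
```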