Choosing an Open Source Desktop Search Tool: Part 4

by Rich on March 26, 2010

Evaluation of open source desktop search tools continues from Part 1, Part 2 and Part 3 with a late entry and some updates.  During my work on Strigi, its documentation referred to related projects.  Of the several other search tools mentioned, only one was neither already on my list nor a defunct project:  Pinot.  Another C++ based and GPL2 licensed tool, Pinot uses a Xapian back end for its index and relies on dbus for its interprocess communication.  On its face, it’s very similar to recoll.  In testing, it showed some interesting differences.

Pinot setup and searching

Pinot was installed with apt-get install pinot, keeping it very consistent with the other tools.  Apt added 65 packages for pinot, keeping it at the lower end of additional packages.  It was really in the same ballpark as recoll once recoll had enough packages to actually be functional.  The great bulk of the packages were parsing libraries and support files rather than X oriented cruft.  It does require dbus, and operationally is tightly coupled to it.  Otherwise, it’s a fairly self contained tool.  As a C++ tool, it didn’t have the baggage that beagle did.

Following installation, I found pinot to be very temperamental in my particular (and probably peculiar for pinot) environment.  Pinot’s pinot-dbus-daemon is clearly a dbus oriented tool similar to trackerd.  I needed a similar process to get the daemon running:  dbus-launch bash followed by pinot-dbus-daemon.  While this got the daemon running, no matter what I did or what configuration I tweaked, I wasn’t able to get the daemon to index anything or to pass me any information about why it was less than fully happy.
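
As a rough sketch of that startup step: the helper below (daemon_command is a hypothetical name, not part of Pinot) builds the command line and wraps the daemon in dbus-launch only when no session bus is advertised in the environment.  It assumes dbus-launch is given the program to run directly, which collapses my two manual steps into one.  It only builds and prints the command and does not start anything.

```python
import os


def daemon_command(env=None):
    """Return the argv for starting pinot-dbus-daemon.

    If DBUS_SESSION_BUS_ADDRESS is missing or empty, the daemon is wrapped in
    dbus-launch so that a session bus is started for it (an assumption about
    how the two manual steps can be combined).
    """
    env = os.environ if env is None else env
    cmd = ["pinot-dbus-daemon"]
    if not env.get("DBUS_SESSION_BUS_ADDRESS"):
        # No session bus visible: let dbus-launch create one and run the daemon in it.
        cmd = ["dbus-launch"] + cmd
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it; the daemon would keep running.
    print(" ".join(daemon_command()))
```

On a headless install like mine there is typically no session bus, so the wrapped form is the one that applies.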

I was ultimately able to get some indexing done with pinot, though not with its daemon.  Pinot-index is a once-and-done command line indexer.  It has a few nice options beyond simply indexing.  My index was created with pinot-index --index -b xapian -d ~/.pinot/index.db ~/search.  It appears you can give pinot an arbitrary set of files and index them to a single xapian database.  So, substituting another path for ~/search is no problem.  This can get messy if you haven’t kept track of what’s in the index, but pinot gives you a handy option for that.  pinot-index --check -b xapian -d ~/.pinot/index.db ~/search will tell me whether the specified path (and its subdirectories) is included in the index.  That’s much cleaner than having to search for something and see if a useful result is returned.
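
The two invocations above can be wrapped in a small script.  This is a minimal sketch, not anything Pinot ships: pinot_index_cmd and run_if_available are hypothetical helper names, and the only flags used are the ones from my own command lines (--index, --check, -b, -d) as they behaved on the version I tested.

```python
import os
import shutil
import subprocess


def pinot_index_cmd(action, path, db="~/.pinot/index.db", backend="xapian"):
    """Build the pinot-index argv for indexing a path or checking it."""
    if action not in ("index", "check"):
        raise ValueError("action must be 'index' or 'check'")
    return [
        "pinot-index",
        "--" + action,                 # --index or --check, as used above
        "-b", backend,                 # index back end
        "-d", os.path.expanduser(db),  # database location
        os.path.expanduser(path),      # path to index or check
    ]


def run_if_available(cmd):
    """Run cmd if its program is on PATH; otherwise print it and return None."""
    if shutil.which(cmd[0]) is None:
        print("not installed; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Index ~/search, then confirm that the same path is present in the index.
    for action in ("index", "check"):
        run_if_available(pinot_index_cmd(action, "~/search"))
```

Swapping ~/search for another path adds that tree to the same database, which is exactly where the check step earns its keep.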

Pinot’s search was at the quick end of the tools I evaluated.  It clocked in at 0.42s user, 0.29s system, 0.727s total.  It was by no means the fastest, but it was clearly in line with the higher performing tools.
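
For anyone who wants to collect the same three numbers without relying on the shell, here is a minimal sketch.  time_command is a hypothetical helper, it is Unix-only because it uses the resource module, and it works on any command line.  The demo times a trivial Python process so that it runs even without pinot installed.

```python
import resource
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return its user, system and wall-clock time in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    total = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        # CPU time charged to finished child processes during this call.
        "user": after.ru_utime - before.ru_utime,
        "system": after.ru_stime - before.ru_stime,
        "total": total,
    }


if __name__ == "__main__":
    # Stand-in command; substitute the search command line being measured.
    timings = time_command([sys.executable, "-c", "pass"])
    print("{user:.2f}s user, {system:.2f}s system, {total:.3f}s total".format(**timings))
```

In my numbers, user plus system time (0.71s) accounts for nearly all of the 0.727s total, so the search spent its time computing rather than waiting.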

The quality of the results was mid-pack.  For documents, pinot only had hits in text, html, and csv files.  It completely missed on pdf, word processor documents and spreadsheets.  It did manage to read the id3 tags on an mp3 file just fine, but tags on other audio formats were opaque.  It was able to pull hits out of an archive, though oddly only the tar.bz2.  It had no luck with the zip, gzip or 7zip.  It also failed to pull information out of the plain tar even though it was successful with a bzip compressed tar.  I honestly don’t understand that at all.

A really nice diagnostic aspect, at this point, was the ability to check pinot’s index to see if a file was included in it.  Pinot did not include in its index documents it couldn’t read (such as .doc or .docx), which is fine, but the ability to get at least filename information for these files is desirable.  It did include audio files where it was unable to parse tags (such as .flac), allowing successful queries based on filename.  Archives (including 7zip) were also in the index.  This was nice, but failed to explain why the contents of the tar.bz2 were available where no other archive contents were.

Pinot-search has some other interesting features.  It appears to be a more general purpose search tool.  I used the xapian backend, which was a natural fit for my purposes.  It also supports backends for opensearch, sherlock, and Google (using a Google API key).  From a client perspective, this is really interesting, as it means indices could be maintained in the format or through the indexing system most appropriate to the content being indexed, while searches could still be run from a single query tool.  There are complexities there around how to know which backend to use, but it seems like a nice feature.

Overall, pinot was fair.  It did manage to access some archived content, handled mp3 successfully and was a fast performer.  It really fell down on word processor documents and its behavior with archives as a whole was very confusing.

In part 5, I’ll wrap up the evaluations and lay out the aggregate results.

  • http://pinot.berlios.de/ PinotDev

    Hi Rich.
    Unless I am mistaken, you don’t say what version of Pinot you tried, nor on what OS/distro.
    For your problem with the daemon, the log file in ~/.pinot may have some helpful information.
    Please also note that the indexing of PDFs, office documents and archives requires third-party tools and libraries that may not have been installed on your system depending on your OS/distro.
    I hope this makes sense.

  • http://pinot.berlios.de/ PinotDev

    Oh I see this is on Ubuntu JeOS. I’d be interested in knowing what packages were pulled when you installed pinot. That would help diagnose the problems you are having.
    By the way, 0.95 can actually be built without D-Bus.

  • rich

    Hi PinotDev. Thanks for the comments and sorry for the delay in responding. The Ubuntu JeOS I used is 9.10, with Pinot 0.94. The packages apt added in as part of the Pinot install were:

    consolekit (0.3.1-0ubuntu2 Ubuntu:9.10/karmic)
    dbus (1.2.16-0ubuntu9 Ubuntu:9.10/karmic)
    dbus-x11 (1.2.16-0ubuntu9 Ubuntu:9.10/karmic)
    defoma (0.11.10-0.2ubuntu1 Ubuntu:9.10/karmic)
    fontconfig (2.6.0-1ubuntu12 Ubuntu:9.10/karmic)
    fontconfig-config (2.6.0-1ubuntu12 Ubuntu:9.10/karmic)
    hicolor-icon-theme (0.10-2 Ubuntu:9.10/karmic)
    libatk1.0-0 (1.28.0-0ubuntu1 Ubuntu:9.10/karmic)
    libatk1.0-data (1.28.0-0ubuntu1 Ubuntu:9.10/karmic)
    libcairo2 (1.8.8-2ubuntu1.1 Ubuntu:9.10/karmic-updates)
    libcairomm-1.0-1 (1.8.0-1build1 Ubuntu:9.10/karmic)
    libck-connector0 (0.3.1-0ubuntu2 Ubuntu:9.10/karmic)
    libcurl3 (7.19.5-1ubuntu2 Ubuntu:9.10/karmic)
    libdatrie1 (0.2.2-1 Ubuntu:9.10/karmic)
    libdbus-glib-1-2 (0.80-4ubuntu1 Ubuntu:9.10/karmic)
    libdirectfb-1.2-0 (1.2.7-2ubuntu1 Ubuntu:9.10/karmic)
    libeggdbus-1-0 (0.5-1 Ubuntu:9.10/karmic)
    libexif12 (0.6.17-1 Ubuntu:9.10/karmic)
    libfontconfig1 (2.6.0-1ubuntu12 Ubuntu:9.10/karmic)
    libfontenc1 (1:1.0.4-3 Ubuntu:9.10/karmic)
    libglibmm-2.4-1c2a (2.22.1-2 Ubuntu:9.10/karmic)
    libgmime-2.4-2 (2.4.6-5 Ubuntu:9.10/karmic)
    libgtk2.0-0 (2.18.3-1ubuntu2.2 Ubuntu:9.10/karmic-updates)
    libgtk2.0-bin (2.18.3-1ubuntu2.2 Ubuntu:9.10/karmic-updates)
    libgtk2.0-common (2.18.3-1ubuntu2.2 Ubuntu:9.10/karmic-updates)
    libgtkmm-2.4-1c2a (1:2.18.2-1 Ubuntu:9.10/karmic)
    libjasper1 (1.900.1-6 Ubuntu:9.10/karmic)
    libjpeg62 (6b-14build1 Ubuntu:9.10/karmic)
    libpam-ck-connector (0.3.1-0ubuntu2 Ubuntu:9.10/karmic)
    libpango1.0-0 (1.26.0-1 Ubuntu:9.10/karmic)
    libpango1.0-common (1.26.0-1 Ubuntu:9.10/karmic)
    libpangomm-1.4-1 (2.26.0-0ubuntu2 Ubuntu:9.10/karmic)
    libpixman-1-0 (0.14.0-1 Ubuntu:9.10/karmic)
    libpng12-0 (1.2.37-1 Ubuntu:9.10/karmic)
    libpolkit-gobject-1-0 (0.94-1ubuntu1 Ubuntu:9.10/karmic)
    libsysfs2 (2.1.0-5 Ubuntu:9.10/karmic)
    libtag1c2a (1.6-2ubuntu2 Ubuntu:9.10/karmic)
    libtag1-vanilla (1.6-2ubuntu2 Ubuntu:9.10/karmic)
    libtextcat0 (2.2-2 Ubuntu:9.10/karmic)
    libtextcat-data (2.2-2 Ubuntu:9.10/karmic)
    libthai0 (0.1.12-1ubuntu0.2 Ubuntu:9.10/karmic-updates)
    libthai-data (0.1.12-1ubuntu0.2 Ubuntu:9.10/karmic-updates)
    libtiff4 (3.8.2-13 Ubuntu:9.10/karmic)
    libts-0.0-0 (1.0-7 Ubuntu:9.10/karmic)
    libxcb-render0 (1.4-1 Ubuntu:9.10/karmic)
    libxcb-render-util0 (0.3.6-1 Ubuntu:9.10/karmic)
    libxcomposite1 (1:0.4.0-4 Ubuntu:9.10/karmic)
    libxcursor1 (1:1.1.9-1build1 Ubuntu:9.10/karmic)
    libxdamage1 (1:1.1.1-4 Ubuntu:9.10/karmic)
    libxfixes3 (1:4.0.3-2build1 Ubuntu:9.10/karmic)
    libxfont1 (1:1.4.0-1ubuntu1 Ubuntu:9.10/karmic)
    libxft2 (2.1.13-3ubuntu1 Ubuntu:9.10/karmic)
    libxi6 (2:1.2.1-2ubuntu1 Ubuntu:9.10/karmic)
    libxinerama1 (2:1.0.3-2 Ubuntu:9.10/karmic)
    libxml++2.6-2 (2.26.0-2 Ubuntu:9.10/karmic)
    libxrandr2 (2:1.3.0-2 Ubuntu:9.10/karmic)
    libxrender1 (1:0.9.4-2ubuntu1 Ubuntu:9.10/karmic)
    pinot (0.94-1ubuntu1 Ubuntu:9.10/karmic)
    tsconf (1.0-7 Ubuntu:9.10/karmic)
    ttf-dejavu (2.29-2 Ubuntu:9.10/karmic)
    ttf-dejavu-core (2.29-2 Ubuntu:9.10/karmic)
    ttf-dejavu-extra (2.29-2 Ubuntu:9.10/karmic)
    xfonts-encodings (1:1.0.2-3 Ubuntu:9.10/karmic)
    xfonts-utils (1:7.4+1ubuntu1 Ubuntu:9.10/karmic)
    x-ttcidfont-conf (32 Ubuntu:9.10/karmic)

    The log I could locate under ~/.pinot was pinot-dbus-daemon.log, which did not include what looked like any helpful information. It was unable to find the filters directory (I didn’t create any), had some startup info, and complained about power management a couple of times.

    I’m not at all surprised about the need for additional libraries to index more file types — I didn’t see anything about pinot itself to make me think it couldn’t handle the data if the files had been properly indexed. For my purposes, I was looking to see what could be done without extensive configuration or tweaking, which meant relying on dependencies to pull in the ability to parse common files. Limitations there are obviously not tied to Pinot itself, but rather with the dependencies defined in Ubuntu’s packages for Pinot. While a very fair technical point, it wouldn’t be important for a user (who is rarely interested in resolving dependencies).

    I do have to say, I was really intrigued by Pinot’s general-purpose search client. It’s a great idea and it looks like there are a lot of interesting things you could do with it to collapse local, remote, and public search into a single tool and UI.

  • VoidAndAny

    It seems you didn’t write part 5. Can you at least give me some information about the chosen tool and the reasons?
    Thanks a lot, your 4 articles are totally what I’m looking for.

  • http://richfriedeman.com Rich Friedeman

    Thanks for reading. I’m really glad the articles worked for you. Sorry I never did make it to the end of the project. I guess I ran out of steam. 

    In the end, a lot of it depends on your desktop environment. Beagle and Tracker, at the time of the test, were both highly serviceable and tightly integrated to Gnome and KDE desktops. There was nothing wrong with either of them, though Tracker was much faster than Beagle. That, presumably, was a Mono issue and could be related to the way I was using them rather than Mono in itself.

    The best results came from Recoll. That surprised me, as it wasn’t aligned with major desktop environments like Beagle or Tracker, so presumably it had fewer development resources. It was more work to get going, as I needed to diagnose and add the missing helpers. That was clearly worthwhile effort, though, as it was the tool which could search the most file types. Best office doc, XML, and media tag searching (except for m4a files) was very impressive. 

    They all had strengths, and my info is old. Based on my tests, though, if I had to pick a tool and no other dependency issues were a concern, Recoll would be it.

    Rich

  • Vinvixz

    Rich, thanks a lot for these articles on desktop search tools. 
