Nettalk Newsletter

Current Issue (Issue 121)

Thought of the Week:- "Experience is the name everyone gives to their mistakes."

Zebra:- http://www.indexdata.dk/zebra/ Free (Open Source). 846 KB. Platform Independent.

Zebra is a free, fast, friendly information management system. It is a high-performance, general-purpose structured text indexing and retrieval engine. It reads structured records in a variety of input formats (e.g. email, XML, MARC) and allows access to them through exact Boolean search expressions and relevance-ranked free-text queries. Zebra supports large databases (more than ten gigabytes of data, tens of millions of records). It supports incremental, safe database updates on live systems. Data stored in Zebra can be accessed using a variety of Index Data tools (e.g. YAZ and PHP/YAZ) as well as commercial and freeware Z39.50 clients and toolkits. Search-and-retrieve applications can be written using APIs in a wide variety of languages, communicating with the Zebra server using industry-standard information-retrieval protocols.

Zebra is designed for portability, and should compile on most Unix systems that provide an ANSI C compiler. It also works with the GNU C compiler (gcc). For Windows, one can use MS VC++ 6.0. The features of the software are:

Very large databases: logical files can be automatically partitioned over multiple disks.

Robust updating - records can be added and deleted "on the fly" without rebuilding the index from scratch. Records can be safely updated even while users are accessing the server. The update procedure is tolerant of crashes or hard interrupts during database updating - data can be reconstructed following a crash.

Configurable to understand many input formats. A system of input filters driven by regular expressions allows most ASCII-based data formats to be easily processed. SGML, XML, ISO2709 (MARC), and raw text are also supported.

Searching combines Boolean queries with relevance-ranked (free-text) queries. Truncation, masking, full regular expression matching and "approximate matching" (e.g. tolerating spelling mistakes) are all handled.

Index-only databases: data can be, and usually is, imported into Zebra's own storage, but Zebra can also refer to external files, building and maintaining indexes of "live" collections.

Piggy-backed presents are honored in the search request - that is, a subset of the found records can be returned directly with a search response, enabling search and retrieval to happen in a single round-trip.

Easily configured to support different application profiles, with tables for attribute sets, tag sets, and abstract syntaxes.
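
To make the searching feature above more concrete, here is a minimal sketch of how Boolean, truncated and relevance-ranked queries might be written for a Z39.50 server. It assumes the Prefix Query Format (PQF) used by the YAZ toolkit and the standard Bib-1 attribute numbers; neither is named in this newsletter. The helper names (term, both) and the search words are hypothetical, and whether a given Zebra installation honours each attribute depends on how it is configured.

```python
# Bib-1 attributes written as PQF "type=value" pairs (assumed, not from the newsletter).
USE_TITLE = "1=4"          # Use attribute: title
USE_AUTHOR = "1=1003"      # Use attribute: author
RIGHT_TRUNCATION = "5=1"   # Truncation attribute: right truncation
RELEVANCE = "2=102"        # Relation attribute: relevance ranking


def term(text, *attrs):
    """Build one PQF operand: optional @attr prefixes followed by a quoted term."""
    if '"' in text:
        # Keep the sketch simple: do not attempt to escape embedded quotes.
        raise ValueError("double quotes are not supported in this sketch")
    prefix = "".join(f"@attr {attr} " for attr in attrs)
    return f'{prefix}"{text}"'


def both(left, right):
    """Combine two operands with the PQF Boolean AND operator."""
    return f"@and {left} {right}"


# Exact Boolean search: title contains "zebra" AND author is "smith".
boolean_query = both(term("zebra", USE_TITLE), term("smith", USE_AUTHOR))

# Right truncation: title words that begin with "index".
truncated_query = term("index", USE_TITLE, RIGHT_TRUNCATION)

# Free-text query with relevance ranking requested.
ranked_query = term("text retrieval engine", RELEVANCE)

print(boolean_query)
print(truncated_query)
print(ranked_query)
```

The resulting strings would then be handed to a Z39.50 client or toolkit; the sketch only builds the query text and does not contact a server.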

Zebra has been deployed in numerous applications, in both the academic and commercial worlds. To learn more, see http://www.indexdata.dk/zebra/doc/apps.php.

***********************************************************************

Site of the Week:- http://www.scirus.com/

Scirus is a specialist search engine for scientific, technical and medical information sources. It covers two types of sources. Web sources provide information for which no subscription or online registration is required. Scirus searches the entire Web and excludes sites with no scientific content. Examples of Web sources are university Web sites, learned society pages, scientists' home pages, preprint servers, commercial companies, patents etc. Membership sources are information sources for which either a paid subscription or online registration is required; they often include peer-reviewed scientific information not directly accessible to standard search engines. Through the Advanced Search option, one can select Publication Interval, Information Types, File Formats, Content Sources and Subject Areas.

That's all for this week. See you next week.

Madhuresh Singhal

Archives (To be Available Soon)

2003

2002

2001