-------------------------------
Riak Search 0.13.0 Release Notes
--------------------------------

We are happy to announce the first public release of Riak Search! Riak
Search is a distributed, easily-scalable, failure-tolerant, real-time,
full-text search engine built around Riak Core and tightly integrated
with Riak KV.

Riak Search allows you to find and retrieve your Riak objects using
the objects' values. When a Riak KV bucket has been enabled for Search
integration (by installing the Search pre-commit hook), any objects
stored in that bucket are also indexed seamlessly in Riak Search.

You can then use a Riak Client API (currently supported in the PHP,
Python, and Ruby APIs) to perform queries that return a list of
bucket/key pairs matching the query. Alternatively, you can use a
search query to kick off a Riak map/reduce operation.

Operationally, Riak Search is very similar to Riak KV. An
administrator can add nodes to a cluster on the fly with simple
commands to increase performance or capacity. Index and query
operations can be run from any node. Multiple replicas of data are
stored, allowing the cluster to continue serving full results in the
face of machine failure. Partitions are handed off and replicated
across clusters using the same mechanisms as Riak KV.

At index time, Riak Search tokenizes a document into an inverted index
using standard Lucene Analyzers. (For performance, the team
re-implemented some of these in Erlang to reduce hops between Erlang
and Java.) Custom analyzers can be created in either Java or
Erlang. The system consults a schema (defined per-index) to determine
required fields, the unique key, the default analyzer, and which
analyzer should be used for each field. Field aliases (grouping
multiple fields into one field) and dynamic fields (wildcard field
matching) are supported.

After analyzing a document into an inverted index, the system uses a
consistent hash to divide the inverted index entries--called
postings--by term across the cluster. This is called term-partitioning
and is a key difference from other commonly used distributed
indexes. Term-partitioning was chosen because it provides higher
overall query throughput with large data sets. (This can come at the
expense of higher-latency queries for especially large *result* sets.)

Search queries use the same syntax as Lucene, and support most Lucene
operators including term searches, field searches, boolean operators,
grouping, lexicographical range queries, and wildcards (at the end of
a word only).

Querying has two distinct stages, planning and execution. During query
planning, the system creates a directed graph of the query, grouping
points on the graph in order to maximize data locality and minimize
inter-node traffic. Single term queries can be executed on a single
node, while range queries and fuzzy matches are executed using the
minimal set of nodes that cover the query.

As the query executes, Riak Search uses a series of merge-joins,
merge-intersections, and filters to generate the resulting set of
matching bucket/key pairs.

For a backing store, the Riak Search team developed
merge_index. merge_index takes inspiration from the Lucene file
format, Bitcask (our standard backing store for Riak KV), and SSTables
(from Google's BigTable paper), and was designed to have a simple,
easily-recoverable data structure, to allow simultaneous reads and
writes with no performance degredation, and to be forgiving of write
bursts while taking advantage of low-write periods to perform data
compactions and optimizations.

Note that Riak Search should be considered beta software. Please be
aware that there may be bugs and issues that we have not yet covered
that may require a full data reload with the next version.

Roadmap
---
- Query parsing improvements and fixes.
- Enhancements to the Solr interface.
- Search support in Java and Javascript APIs.

Known Issues
------------
186 - Query parser fails on dates, negative numbers, and some complex queries. (311, 411)
679 - Query parser fails on negated stopwords.
311 - Qilr does not correctly parse negative numbers
346 - Currently no way to globally clear the schema cache.
399 - In certain cases, handoff can potentially lead to extraneous postings pointing
      to a missing or changed document
429 - UTF-8 indexing fails in certain cases.
611 - Error in inclusive/exclusive range building
622 - Query planner always uses the default analyzer, ignore field analyzer settings.
741 - Queries may return fewer results during handoff.
