Tuesday 16 October 2012

SharePoint 2013 - Search Logical Architecture

Search is one of the big bets in SharePoint 2013. It is new and differs from search in all previous versions of SharePoint. The platform has been consolidated: it combines FAST Search and SharePoint Search components. The good news is that it is the same from Foundation to Server, so there are no more different flavors.
Note: this article is mainly based on the SharePoint 2013 Technical Preview. Although the beta version has arrived, functionality can still change before the RTM version is released.
The search architecture has changed in many ways. There are new components, a new topology and new features. The new architecture is designed to provide greater redundancy and better scalability. The following picture shows the logical architecture and its components, each of which is explained briefly in this post.
[Figure: SharePoint 2013 search logical architecture (2012-10-08-Search-Part01-01.png)]

Crawl Component

The crawler is responsible for crawling the content. It uses connectors to retrieve data from the content sources, but it does not parse any text or documents. The result of a crawl is both the actual content and the associated metadata. All crawled items are passed on to the next component, the Content Processing Component.
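
To make that hand-off a bit more concrete, here is a minimal Python sketch. It is not SharePoint code: `CrawledItem`, `crawl` and the in-memory `source` dictionary are hypothetical names used only to show a crawler that fetches raw content plus metadata and leaves parsing to the next stage.

```python
from dataclasses import dataclass, field


@dataclass
class CrawledItem:
    """Hypothetical container for what a crawl produces."""
    url: str
    content: bytes                      # raw, unparsed bytes
    metadata: dict = field(default_factory=dict)


def crawl(source):
    """Yield one CrawledItem per entry in a stand-in content source.

    `source` maps a URL to a (raw_bytes, metadata) pair. A real connector
    would fetch these from a site, file share or database instead.
    """
    for url, (raw, meta) in source.items():
        # The crawler only fetches; parsing is left to content processing.
        yield CrawledItem(url=url, content=raw, metadata=dict(meta))


source = {
    "http://intranet/docs/plan.txt": (b"Project plan for Q4", {"author": "alice"}),
}
items = list(crawl(source))
```

Each item in `items` still carries its content as plain bytes, which is the point: nothing has been interpreted yet.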

Content Processing Component

This component processes crawled items and then feeds them to the Index Component. Unlike the crawler, it does parse the content, by means of format handlers. It detects file formats automatically and no longer relies on the file extension. Out of the box there are high-performance format handlers for HTML, DOCX, PPTX, TXT, image, XML and PDF formats. IFilters are still supported.
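
The idea of detecting a format from the content instead of the extension can be illustrated with a small Python sketch. This is an assumption-laden toy, not how SharePoint's format handlers work internally: the function name `detect_format` and the returned labels are made up, and only a few well-known file signatures are checked.

```python
def detect_format(data: bytes) -> str:
    """Guess a format from the leading bytes rather than from a file name."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        # DOCX and PPTX are ZIP containers; telling them apart would
        # require inspecting the archive contents, which is omitted here.
        return "zip-container"
    # Look at the start of the data, ignoring leading whitespace and case.
    head = data[:256].lstrip().lower()
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "text"


print(detect_format(b"%PDF-1.7 sample"))          # a PDF, whatever its name
print(detect_format(b"  <HTML><body>hi</body>"))  # HTML without an extension
```

A file named report.txt that actually starts with `%PDF-` would still be recognized as a PDF, which is the benefit of not trusting the extension.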

Index Component

The Index Component is used in both the feeding and the query processes. On the one hand it receives processed items from the Content Processing Component and writes those items to the index. On the other hand it receives queries from the Query Processing Component and provides result sets in return.
The Index Component is also responsible for moving the indexed content when the Search Administration Component changes the topology.
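
The two roles of an index (being fed and being queried) can be sketched with a toy inverted index in Python. The class `TinyIndex` and its methods are hypothetical and say nothing about the actual SharePoint index format; the sketch only shows one structure serving both directions.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the ids of items containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, text):
        """Feeding side: write a processed item to the index."""
        for term in text.lower().split():
            self._postings[term].add(item_id)

    def search(self, terms):
        """Query side: return ids of items that contain every term."""
        sets = [self._postings.get(term.lower(), set()) for term in terms]
        if not sets:
            return set()
        return set.intersection(*sets)


index = TinyIndex()
index.add("doc1", "search topology overview")
index.add("doc2", "search query processing")
print(index.search(["search"]))
print(index.search(["search", "query"]))
```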

Query Processing Component

When the Query Processing Component receives a query from the search front-end, it analyzes and processes the query. The processed query is then submitted to the Index Component. The Index Component returns a result set based on the processed query to the Query Processing Component, which in turn processes that result set before sending it back to the search front-end.
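
That round trip (analyze the query, submit it, post-process the result set) can be sketched in a few lines of Python. Everything here is illustrative: `process_query`, `fake_index` and the scores are invented for the example and do not reflect SharePoint's query pipeline or ranking.

```python
def fake_index(terms):
    """Stand-in for the index: returns (document, score) pairs for the terms."""
    postings = {
        "search": {"a.docx": 0.9, "b.pdf": 0.4},
        "topology": {"b.pdf": 0.7},
    }
    scores = {}
    for term in terms:
        for doc, score in postings.get(term, {}).items():
            scores[doc] = scores.get(doc, 0.0) + score
    return list(scores.items())


def process_query(raw_query, index_lookup, limit=10):
    """Analyze a query, submit it to the index and post-process the results."""
    # Analysis step: lowercase, split into terms, drop repeated terms.
    terms = list(dict.fromkeys(raw_query.lower().split()))
    # Submit the processed query to the index.
    hits = index_lookup(terms)
    # Post-processing step: order by score and trim to the requested size.
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return [doc for doc, _score in ranked[:limit]]


print(process_query("Search search Topology", fake_index))
```

Passing the index in as a function keeps the query logic separate from the index, which mirrors the split between the two components.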

Analytics Processing Component

In SharePoint 2010 there was a Web Analytics service application. In SharePoint 2013 this is now part of the search architecture. The new Analytics Processing Component tracks and analyzes crawled items and how users interact with search results, and uses this information to continuously improve search relevance. Its results are returned to the Content Processing Component to be included in the search index.
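
As a rough picture of that feedback loop, the Python sketch below counts clicks per URL and attaches the count to an item before it would be indexed. The functions `summarize_clicks` and `enrich`, the `clicks` field and the sample events are all hypothetical; the real component computes far richer signals.

```python
from collections import Counter


def summarize_clicks(events):
    """Count clicks per URL from (query, clicked_url) pairs."""
    return Counter(url for _query, url in events)


def enrich(item, click_counts):
    """Return a copy of an item with an analytics-derived field added."""
    enriched = dict(item)
    enriched["clicks"] = click_counts.get(item["url"], 0)
    return enriched


events = [("plan", "u1"), ("q4 plan", "u1"), ("budget", "u2")]
counts = summarize_clicks(events)
print(enrich({"url": "u1", "title": "Project plan"}, counts))
```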

Search Administration Component

The Search Administration Component is responsible for the search topology and search provisioning. It coordinates the Content Processing, Query Processing, Index and Analytics Processing components.

Summary

A lot has changed when it comes to SharePoint search. The best of all search types in SharePoint 2010 (including FAST) has been consolidated into one search engine, and that engine is used from Foundation to Server. The new architecture, with all its components, is designed to provide greater redundancy and better scalability.



