Full Text Search in Relational Database

Overview

In business applications, designing a full text search is a custom designing process. There is no general search algorithm, that can fulfil the requirement of optimized and meaningful business domain driven full text search. And there is a reason for that. Search is always around certain type of information. For example, find a company, or a product, or a person. If multiple types of information is included in search, then it is a combination of search of multiple types. In the case of web, search engines are mostly designed around finding a general type of information in one source “web page”. No matter what kind of information is out there on web, it is all in web pages, using same HTML standards, and linking between pages, information paragraphs. So the challenge on web search is all about volume and relevancy of information.

In business applications, using relational databases at backend, if we use a generic search algorithm, it can work but to less optimized extent. Because structure of information and set of information is different from application to application, and requirement to requirement. So a single, pre-designed search inclusive of everything (open search) cannot bring the 100% expected results when it comes to relational databases and scattered files. Each “type of search” require planning and designing including radius of information, weightage of information, retrieval of information etc. Luckily, I guess, same design process can be applied to any type of business full text search. So in business applications, intelligent searching should be type specific or combination of type specifics. We can call it business domain driven full text searching.

Essence of keyword based full-text search in business applications

In the presence of today’s powerful, web search engines, end users want keyword based suggestions and search as a granted user experience. However, majority of business software system fails to deliver that basic user expectation, and ignores this fundamental pillar of a good software system. Proper implemented search experience is not only a must have to find information, but also the quickest method to access any information. In this article I would go through the steps of designing a custom business full text search.

Scope of this article


Type of search or the unit of search is one set of information represented by bunch of business models, and set of tables in database. In this article we will focus on how to design business full text search around one set of information. To expand the scope of information, or to include multiple sets of information, the process of designing can be repeated. Results of multiple types can be combined at higher level. So we would be focusing on one ‘unit of search’, where we can identify one higher piece of information around which our search revolves. We can call it pivot information.

We will see how N number of tables linking to pivot information can be included in search. Also we will explore how to design weightage, and ranking within the information.

I have implemented this design on SQL Server many times, but I would try to avoid technology specifics and I think same design can be applied to any database engine out there with basic full text searching/ indexing capability. So I would be leaving how to execute each step to reader’s knowledge and technology.

Pivot Table

Each unit of search normally revolves around central piece of information, an entity or model. Although the overall information can span to multiple tables, but select a table that is really central to a search. We will call that table Pivot table.

Scoping

  • Identify central table of information unit. We will call it “Pivot Table”.
  • Identify scope of search, by listing down all the related tables of Pivot table. We can include N number of tables in to search. We will call it “radius of search in database”. It is like circling the set of tables on a database diagram. More tables, greater the radius of selection and vice versa.
  • There could be multiple pivots in your requirements. That means, different set of information independent from each other.

Define Full-Text Catalog

Define a new full text (or simply text) catalog in database.

  • Catalog is independent of type of search, and multiple searches, or even multiple projects, can use the same full text-catalog.
  • SQL Server Specific: Microsoft recommendation of defining catalogs are based on same characteristics (such as small number of changes versus large number of changes, or tables that change frequently during a particular time of day) together under the same full-text catalog. By setting up full-text catalog population schedules, full-text indexes stay synchronous with the tables without adversely affecting the resource usage of the database server during periods of high database activity.

Define Index on Pivot and Related Tables

For each type of search:

  • Define indexes on Pivot Table P
  • Define indexes on all the related tables (Set (RT)) that are related with main pivot table with FK or logical relation. 
    E.g. For Companies the related tables could be Contacts, Addresses, Classifications etc
  • Only add columns that contain information you want to include in search.
  • You can index any table.column that contain files embedded like Word, Excel files etc.
  • A unique index would be required for that. Make use of existing Clustered Primary Key Index on tables.

Define weightage for type of information

  • Some information is more important that other. That means, a record with that information if matched with keyword, should be ranked higher. This concept is the process of assigning weightage to information entities.
  • You have to go through the list of tables and assign a weightage to each of it. Weightage would be used to rank records while executing search.
  • Weightage will define,  if the keyword match with this table, how much weightage it would be given comparing to if the same keyword match for some other records from other tables.
  • Normally Pivot table contains the highest weightage, and tables close to Pivot table will get higher weightage.
  • Within same level of tables, you can tweak weightage as per importance of information.
  • That weightage will be multiplied with built-in search rank number (generated by full text engine) to increase or decrease the overall rank of record.

Design Search Strategy around Pivot Table

Define the search strategy that include:

  • Plan for a query where you can retrieve Pivot record and rank associated with that record for each associated table using relationship. That would be implemented in a stored procedure
  • Plan for tables or view that you would like to retrieve as a result of matching record. Here multiple tables can be linked using relationship while retrieving information.

Develop Search Server Code

Develop application code and models, to represent query, transformation, and result.

  • Write classes to represent Query on type T, FullTextSearchQuery<T> where T is data class for pivot table, we can call Pivot class.
  • Types of Queries i.e. inclusive or exclusive (with ‘AND’, or ‘OR’ operator in between them)
  • Transformation from user provided keywords and the transformed query that would be sent to database full text engine. 

Write Database Code (Stored Procedures)

We will write two central stored procedures, for each Pivot Table.

Full_Text_Search_SP

  • This is the stored procedure where we will search the actual tables and will rank the results as per matched keywords.
  • Simplest way would be to collect rank and IDs (of pivot table) from all the related tables. Take SUM of all ranks received from multiple tables, and take UNION of all the IDs while grouping by IDs, this will give set of IDs of records from Pivot table along with their rank.
  • Results then will be sorted by combined Rank in descending order to get results by relevancy

Suggest_Keyword_SP

  • Develop the keyword strategy around each Pivot table, which will be provided as suggestions on user typing.
  • Need to identify all the related tabled along with Pivot table. Number of tables you select for keyword selection could be less or more than actual searching tables, depending on strategy for keywords.
  • You can only choose tables for keywords on which Full-Text index has been defined.

Code Usage

Using all of above components, the search would be using following (please refer to diagram in next part of this article):

  • Create Instance of FullTextSearchQuery<T>
  • On front end key press of search textbox, Use Provide_Suggestions to get number of keyword suggestions (in web through Ajax).
  • Call Execute Search to get search results, in which we will get object of FullTextSearchResponse<T> which will provide list of search results.
  • Display results to front end by order of rank.

Summary

Business Domain Driven Full text search require planning and designing for each type of search. Where each type of search revolves around a pivot information or pivot table in database. We select list of tables form DB, to define scope of searching, called radius of search. Each table form search is assigned a weightage which would be calculated for ranking of results. In core of search procedure, we select records based on ranking and retrieve results, where keywords are transformed in to Inclusive or Exclusive relationships, using AND / OR operators. This design will help to achieve required business domain driven full text searching for application.

Next Part – Design Diagram

In next part, I will share design as diagram, and will describe all the components of design.

About Bajwa

Bajwa Thumbnail Ali Haider is a software developer and technology enthusiast based in Reading, UK. He has over ten years of hands on experience in software design and development using Microsoft technologies on various fronts. He attempts to keep learning and applying cutting edge technologies and best practices, and his passion is to work on complex challenging projects and to develop data crunching frameworks.

Projects

    Tag Cloud