Optimize your sql query for best search result
Search Optimization and query optimization in SQL Server
MS SQL Server
When developing solutions that require querying large amount of data, you may want to consider user Stored Procedures. When accessing multiple "query patterns", you want to program your system to use the most efficient algorithm to return your data to the user. While using dynamic SQL may work for small databases, a preferred method would be to use a Stored Procedure to optimize performance.
We recently implemented this technique for a client who is planning to target a large national audience. His database will grow form a few hundred thousand records to potentially millions in rather short order. We had to create a search engine for this business to deal with both his current query needs but keep in mind the growth potential.
See how the search engine was implemented via Alpha Five Software Development and within SQL database.
The Search Concept
The idea of this search technique is to create a search to accommodate all of the possible scenarios that might be based on some or any of the following fields: FirstName, LastName, Address, City, State, Zip,Phone, Email, SerialNumber, PO and UserID. Some of the fields are from related tables (Address data, PO, SerialNumber, etc.)
We achieve search optimization by passing parameters into a stored procedure.
MS SQL and Alpha Five - a partnership for high performance for large database searches. Using ms sql stored procedure to optimize the performance for quick search results, while using Alpha Five as the front end for the software development. Please see below video:
Caching Query Plans
Stored procedure compile the first time they are run, and the procedure is then cached and re-used by subsequent queries. This caching is one of the primary benefits of using stored procedures. The SQL server does not need to keep recreating an execution plan - it simply re-uses the cached plan with new parameters submitted to the query.
To ensure optimization we also recommended running the Update Statistic process, at least once a week. This process helps the server to make the best decisions about which indexes to use when a query is processed - keeping up to date the key values in indexes and columns. It should be run manually whenever a large amount of data has been modified or added, changed, or deleted (such as batch imports, global price changes, etc.). It should also be run if there are many DML operations.
Note: (DML is abbreviation of Data Manipulation Language. These are command used to retrieve, store, modify, delete, insert and update data in database. For instance, SELECT, UPDATE and INSERT statements.
It may also be advisable to compress the data in order to increase performance. You may want to read this article: Data compression in large tables in MS SQL server.
As mentioned earlier, when you run any query the very first time, SQL Server builds a query plan and stores the plan in the sys table. The next time that specific query is run, the plan is re-used. If the plan is not used for an extended period of time it is automatically cleared from the cache and the plan is recreated when needed.
How Dynamic Query works
Your application may also build a query plan from a dynamic query as shown below:
In the above case MS SQL server may Auto Parameterized the query to improve the performance of the Dynamic SQL. SELECT * FROM tb WHERE col=@1. The @1 is basically a place holder for the parameter values like @1. This way the same query plan could be used. Dynamic SQL could not be complied as static sql. In a more complicated dynamic sql it would be more complicated to trouble shoot.
If you don't parameterized your dynamic sql then each time a new CHECKNO is submitted a new query plan is rebuilt by the SQL server and thus may be slower than a comparable stored procedure.
Dynamic SQL can also potentially introduce security vulnerabilities that you do not want. Hackers may try to inject data that causes the sql script to execute statements that you did not intend.
Note: I had a situation that one day I got more than a 100 emails via my web site contact form while someone tried to inject some values into the sql script. I was fortunate that my data was not exposed since I was not using any dynamic SQL for that contact form, but they always try.
Check our Custom Software Development Services.
Why static SQL is better optimized for searches
See above explanation of Caching Query Plans. I could easily make decisions based on parameters value which query to execute. See this topic about Decision making based on parameters values for more details. The below SQL statements are basically optimized based on any search possibility that may come up. The idea is to keep track of the primary key or each record that is searched and insert that into a table to store temporarily the result set. In this scenario we store the CustomerID into a table named SearchResults as shown below:
In addition, I also created a clustered index on the ResultID field. Since after the first SQL run we need to run another sql statement and JOIN it into the Customer table as follows:
Based on the above we show the actual customer record and all details per customers including multiple addresses, contact points, purchase order and much more. If the search is too broad then a large number of CustomeIDs will be inserted into the SearchResults table.
If the search is done based on the PO or a Serial Number which is unique and also related to one customer than only one record is found. The same idea works for Serial Number as well. The idea is to refine the search based on the parameters that are passed into the specific table and no need to add additional over head when adding an addition JOIN condition.
In each search we provide the approximate capability and if you don't know how the Last Name is spelled properly then using the wild card % on the right field will provide that capability in most cases. See my article as String manipulation on large data using LIKE operator or PATINDEX.
The UserID is also passed per each search so for every new search we delete the last search result for the same user. See below code for the actual search to get the idea.
Decision making based on parameters values
By studying the code below you can see that this routine only runs the simplest sql needed by checking for parameters that are not empty. Based on the values of the parameter the right sql statement is running to providing maximum performance.
If you search only for FirstName and LastName then the sql statement works only the Customer's table. The IF statement will make sure that only one simple is sql statement is executed as shown below:
I was using the same strategy for much more complicated search. The difference is per search scenario I was creating a separate stored procedure while the main stored procedure made the decisions of which one to execute.
When is the time to de-normalize the data
If you scroll down below there is only one select statement that is running on two tables and it is the last one. You can allow yourself to JOIN another table if you have maybe several millions of records and also depends on the server hardware configuration.
If you deal with hundreds Gigabytes or Terabytes of data you will want to combine the address into the First Name and Last Name fields and this way no JOIN condition is required. Working with large scale of data requires a complete different strategy. You may check some of the articles I have written on how to manipulate big data in sql starting with Processing hundreds of millions records got much easier.
Keeping track on each sql search
Your application may also require you to collect usage data - we store all the parameter values and timestamp the query. This data can help provide statistics identifying the most popular searches and when those queries are most often run.
Limit the top 100 records search:
If you have many records, than you may want to limit the search for the first X number of records as follows.
Parameter sniffing allows SQL to compile a plan that is tailored to the type of parameter that is actually passed into the stored procedure. One way to resolve that is to use a query hint as shown above. We basically use the OPTION(RECOMPILE) query hint to solve the parameter sniffing problems. This one is designed to give you the control at the actual query level and no need to recompile the entire stored procedure. This can happen when the stored procedure re-uses the previously cached plan but may not be optimal for the current input parameter values.
Another way to fix this problem is to convert the parameters to local variables and use them at the qeury level.