Data compression in large tables in MS SQL Server

Did you know that you can compress data in MS SQL 2008 or any later version? For now, compression is available only in the Enterprise and Developer editions of SQL Server 2008 and 2012.

This type of compression can reduce the size of a table by 25% or more; the exact savings depends on your data and on how big the table is.

When you deal with terabytes of data and your dedicated server stores it on a RAID array of 5 SCSI hard drives, you want to make sure that you are not running out of disk space. For best performance I still use 15,000 RPM SCSI drives.

Call us for Free Consultation for Remote DBA services at our SQL Consulting Firm at: 732-536-4765

Compression can allow more rows to be stored on a page, but does not change the maximum row size of a table or index.

MS SQL data compression provides another great benefit: because compressed data is stored in fewer pages, queries need to read fewer pages from disk, which improves the performance of I/O-intensive workloads. However, some additional CPU resources are required on the database server to compress and decompress the data as it is exchanged with the application. In my experience that additional overhead is not significant at all; I use 2 CPUs in my server and it works just fine.

In my case, since I work with large amounts of data, I use PAGE compression. Row compression stores data more efficiently, using 4 bits per compressed column to store the length of the data in the column; NULL and 0 values take no additional space.

Page compression works better for me since it performs row compression and, on top of that, optimizes the storage of multiple rows in a page, minimizing data redundancy.

To compress indexes, you must explicitly set the compression property of the indexes. By default, the compression setting for an index is set to NONE when the index is created.
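As a quick check, you can query the catalog views to see which compression setting each index on a table currently has. Here is a minimal sketch, assuming the table is dbo.Users:

    -- Show the current compression setting of each index (and the heap) on dbo.Users
    SELECT i.name AS index_name,
           i.type_desc,
           p.data_compression_desc
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
        ON p.object_id = i.object_id
        AND p.index_id = i.index_id
    WHERE i.object_id = OBJECT_ID('dbo.Users');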

With PAGE compression on an index, whether clustered or non-clustered, the leaf-level pages are PAGE compressed, while the non-leaf pages are only ROW compressed.

The leaf level is the data of the table itself. It is composed of data pages, where all columns are stored for every row of the table.

The non-leaf levels are composed of index pages.

Let's run the stored procedure below to see what the savings will be when compressing the Users table, which is 44,058.422 MB (about 44 GB).

See SQL Server Books Online for this stored procedure: sp_estimate_data_compression_savings (Transact-SQL).
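The estimate was produced with a call along these lines; the schema and table names (dbo.Users) are assumed here, and the parameters are the documented ones for sp_estimate_data_compression_savings:

    -- Estimate the savings of PAGE compression for the dbo.Users table
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo',
        @object_name = 'Users',
        @index_id = NULL,          -- NULL means all indexes, including the heap
        @partition_number = NULL,  -- NULL means all partitions
        @data_compression = 'PAGE';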

The results show one record only, and the column named Sample_size_with_requested_compression_settings(KB) has a value of 27,816, which is a savings of 16,242.422, or 36.87%, calculated as (1 - 27,816 / 44,058.422) x 100 = 36.87%.

Now let's create a compressed clustered index on the Users table and see the actual size of the table. Note that I added DATA_COMPRESSION = PAGE at the end of the index script.
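The script looked roughly like the sketch below. The index name and key column are placeholders for illustration; the DATA_COMPRESSION = PAGE option at the end is the part that enables compression:

    -- Create the clustered index and compress the table at the same time
    CREATE CLUSTERED INDEX IX_Users_UserID
        ON dbo.Users (UserID)
        WITH (DATA_COMPRESSION = PAGE);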

In this case I have no indexes on the Users table, so creating the clustered index will be relatively fast, since no other indexes need to be rebuilt on that table. Creating the compressed clustered index compresses the table itself at the same time.

Creating the index on the 44 GB table with 240 million records took about 23.27 minutes, and the index size was just 54.750 MB. The data space of the Users table was reduced to 29,361.078 MB. The savings was (1 - 29,361.078 / 44,058.422) x 100 = 33.36%. For me that is a major savings in disk space as well.
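A quick way to check the table size before and after the rebuild is sp_spaceused, which reports its sizes in KB (the table name here is assumed):

    -- Report the reserved, data and index sizes of the table
    EXEC sp_spaceused 'dbo.Users';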

We can also use the command below in MS SQL 2008 or above to enable PAGE compression while rebuilding the heap (when the table does not have a clustered index). Rebuilding a heap requires all non-clustered indexes on the table to be rebuilt as well, so that they point to the new row locations in the heap.
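The command is a plain ALTER TABLE ... REBUILD with the compression option; a minimal sketch, again assuming the heap is dbo.Users:

    -- Rebuild the heap with PAGE compression; the non-clustered indexes
    -- are rebuilt as part of this operation so they point to the new row locations
    ALTER TABLE dbo.Users
        REBUILD WITH (DATA_COMPRESSION = PAGE);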

For more info about the MS SQL data compression features, see SQL Server Books Online.

Conclusions:
-----------------

The compression capability of MS SQL is a great feature: it saves disk space, reduces cost, and increases query performance by reducing I/O. In addition, no application changes are required to implement table or index compression.


Usually, if I need to create a compressed clustered index, I first remove all the non-clustered indexes, since they will not be compressed when they are rebuilt as part of creating the compressed clustered index.

Make sure to set DATA_COMPRESSION = PAGE on the non-clustered indexes after you have created the compressed clustered index. This way you take full advantage of the data and index compression capability that was added in MS SQL 2008 and above. In addition, see this article on how to remove duplicate records in a smart way.
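For example, a non-clustered index can be created with the option directly, or an existing one can be rebuilt with it; the index and column names below are placeholders for illustration:

    -- Create a new non-clustered index with PAGE compression
    CREATE NONCLUSTERED INDEX IX_Users_LastName
        ON dbo.Users (LastName)
        WITH (DATA_COMPRESSION = PAGE);

    -- Or rebuild an existing index with compression enabled
    ALTER INDEX IX_Users_LastName ON dbo.Users
        REBUILD WITH (DATA_COMPRESSION = PAGE);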

Doron Farber - The Farber Consulting Group

I started developing custom software in 1985, using dBase III from Ashton-Tate. From there I moved to FoxBase and then to FoxPro, and ended up working with Visual FoxPro until Microsoft stopped supporting that great engine. With Visual FoxPro I developed VisualRep, which is a report and query engine. We are also a .NET development company, and one of our projects is web scraping from different web sites. We are Alpha Anywhere developers, and the Avis Car Rental company trusted us with their contract management software, which we developed with the Alpha Five software engine.
