Do you know that you can compress data while using MS SQL 2008
or any later version? Compression is available for now in SQL
Server 2008 Enterprise and Developer edition.
This type of compression reduces the size of the table by at
least of 25% and more. It also depends on how big your table
is.
When you deal with TB(s) of data and you have a RAID setup in
your dedicated server for your data with 5 SCSI HD you may want to
make sure that you are not running out of disk space. For best
performance I still use SCSI HD with 15000 RPM.
Compression can allow more rows to be stored on a page, but does
not change the maximum row size of a table or index.
MS SQL data compression provides another great benefit, since
compressed data is stored in fewer pages and therefore, queries
need to read fewer pages from the disk and thus, improves the
performance of I/O intensive workloads. However, some additional
CPU resources are required on the database server to compress and
decompress the data, while data is exchanged within the
application. That additional resource based on my experience is not
that significant at all. I am using 2 CPUs in my server and it
works just fine.
In my case since I work with large amount of data and use PAGE
compression. Row Compression helps to store data more efficiently
while using 4 bits per compressed column to store the length of the
data in the column. NULL and 0 take no additional space
Page compression works better for me since it is performing Row
Compression plus optimizes the storage of multiple rows in a page,
and this way minimizes the data redundancy.
To compress indexes, you must explicitly set the compression
property of the indexes. By default, the compression setting for
indexes will set to NONE when the index is created.
On a PAGE compression, a compressed index which could be
clustered or non- clustered, the leaf level pages are compressed as
PAGE, but no leaf pages are ROW compressed.
The leaf level is the data of the table. It is composed of data
pages. All columns of the tables are stored for every row of the
table.
No Leaf level - are composed of index pages.
Let's run the below stored procedure to see what will be the
savings when compressing a 44,058.422 table which is 44GB.
--Using the PAGE compression
USE Users
GO
EXEC sp_Estimate_Data_Compression_Savings NULL, 'Users', NULL, NULL, 'PAGE' ;
GO
See this link: for the this stored procedure named: sp_estimate_data_compression_savings
(Transact-SQL)
The results shows 1 record only and the column named:
Sample_size_with_requested_compression_settings(KB) has the value
of 27816 which was a savings of 16242.422 KB. That is %36.87 and it
was calculated in here: 1 - (27816/44058.422) * 100 = %36.87
Now let's create a compressed clustered index on the Users table
and see the actual size of the table. Pay attention that I added
the the DATA_COMPRESSION = PAGE at the end of the index script.
CREATE UNIQUE CLUSTERED INDEX Users_PK ON dbo.Users
(
Users_PK
)
WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
DATA_COMPRESSION = PAGE)
In this case I have no indexes on the Users table and therefore
creating the clustered index would be relatively faster since no
other indexes should be rebuilt on that table. Creating the
compressed clustered index will compress the table as well, at the
same time.
To create the index on 44GB table and 240 million records took
about 23.27 min and the index size was just 54.750MB. The actual
amount of kb for the Users table was reduced to 29361.078. The
savings was: 1 - (29361.078/44058.422 * 100) = %33.358. For me it
is a major savings also in disk space.
We can also use the below command in MS SQL 2008 or above to
enable PAGE compression while trying to re build the HEAP (When the
table does not have a clustered index) and requires all
non-clustered indexes on the table to be rebuilt so that they have
pointers to the new row locations in the heap.
ALTER TABLE Users
REBUILD WITH (Data_Compression = PAGE)
For more info about the MS SQL data compression features see
this subject SQL Server
Books Online.
Conclusions:
-----------------
Using the compression capability of MS SQL is great and saves HD
space and increases performance and reduces cost. It increases the
query performance by reducing I/O.In addition, no application
changes are required to implement table or index compression.
Usually if you need to create a clustered index I would remove
first all non-clustered indexes since they will not be compressed
when they will be re-built while a compressed clustered index is
created.
Make sure to set up Compression as DATA_COMPRESSION = PAGE on
non-clustered indexes after you created a compressed clustered
index. This way you take full advantage of the compression
capability of data and indexes which was added in MS SQL 2008. In
addition, see this article of how to remove duplicate records in a smart
way.