<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rssdatehelper="urn:rssdatehelper"><channel><title>The Farber Consulting Group Inc.</title><link>http://www.dfarber.com</link><pubDate></pubDate><generator>umbraco</generator><description></description><language>en</language><item><title>Remove duplicate records in sql</title><link>http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx</link><pubDate>Mon, 26 Dec 2011 20:59:12 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx</guid><description><![CDATA[ 
<p>In MS SQL 2005 and later editions, a much better way has been
provided to remove duplicate records smartly, while having full
control of which records to remove. It can be accomplished by using
the ROW_NUMBER() which will return a number of row within a
partition of a result set. With this code:</p>

<pre class="brush: sql;">
ROW_NUMBER() OVER (PARTITION BY Clause ORDER BY Clause)
</pre>

<p>You can have a partition on something like Email or SSN or a
combination of several fields like: First, Last, Street Number,
Street Name and City.</p>

<p>&nbsp;</p>

<p>The idea is to perform data cleansing while removing records
where you can control any aspect of it. What's even better is the
ability to combine different pieces of data from different records
into one record while the rest of the records for that partition
(group) are removed with no data being lost. Duplicate data is a
problem that exists in many databases and the purpose of removing
duplicate records is to improve productivity and increase the
performance of your database. The below article will discuss the
technique of how to remove duplicate records - rows in MSSQL
database.</p>

<p>&nbsp;</p>

<p>Let's create some sample data:</p>

<pre class="brush: sql;">
USE TempDB

-- Check if the table exists and if not then create it. 
IF NOT EXISTS(SELECT * 
    FROM Sys.Tables
    WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[#TableWithDups]'))
    
    -- Create temp table
    BEGIN
    CREATE TABLE #TableWithDups(
        [PK]    TINYINT NOT NULL PRIMARY KEY,
        [EMail] [varchar](40) NULL,
        [First] [varchar](15) NULL,
        [Last] [varchar](20) NULL,
        [Gender] [varchar] (1) NULL, 
        [DOB] [varchar](8) NULL,
        [Phone] [varchar](10) NULL,
        [StreetNumber] [varchar](30) NULL,
        [StreetName] [varchar](30) NULL
    ) 
    END
-----------------------------------------------------------------
-- Check if some temp tables exist in the TempDB database -------
-----------------------------------------------------------------
ELSE
BEGIN
    TRUNCATE TABLE #TableWithDups
END

IF EXISTS(SELECT * 
    FROM Sys.Tables
    WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[#DeDup_Testing]'))
BEGIN
    DROP TABLE #DeDup_Testing   
END

--------------------------------------------------------------------
--- Insert duplicates records into TableWithDups
--------------------------------------------------------------------
INSERT INTO #TableWithDups ([PK],[EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (1,'MSSQL1@RemoveDupRecords.Com','Susan','Smith','F', '19600912','','1100','Main Street')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (2,'MSSQL1@RemoveDupRecords.Com','Susan','Smith','',   '','1113058336','1100','Main Street')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (3,'MSSQL1@RemoveDupRecords.Com','Susan','Smith','F','','1113058336','1100','Main Street')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (4,'MSSQL2@RemoveDupRecords.Com','Joe','Smith','','19640825','2223058336','1100','Main Street')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName])  
VALUES (5,'MSSQL2@RemoveDupRecords.Com','Joe','Smith','M','19640825','2223058336','1100','Main Street')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName])  
VALUES (6,'MSSQL3@RemoveDupRecords.Com','Alex','Walker','M','','5553058888','55','KingFisher Road')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (7,'MSSQL3@RemoveDupRecords.Com','Alex','Walker','F','19580912','','55','KingFisher Road')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName])  
VALUES (8,'MSSQL3@RemoveDupRecords.Com','Alex','Walker','','1952','','55','KingFisher Road')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (9,'MSSQL4@RemoveDupRecords.Com','Uri','Pearl','M','1961','3333054422','110','Robertsville Road')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName])  
VALUES (10,'MSSQL4@RemoveDupRecords.Com','Uri','Pearl','','19600912','3333054422','110','Robertsville Road')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (11,'MSSQL4@RemoveDupRecords.Com','Uri','Pearl','','','','110','Robertsville Road')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (12,'MSSQL5@RemoveDupRecords.Com','Terry', 'Thurber','M','','4443058377','500','Larkin Street')

INSERT INTO #TableWithDups ([PK], [EMail],[First],[Last],[Gender],[DOB],[Phone],[StreetNumber],[StreetName]) 
VALUES (13,'MSSQL5@RemoveDupRecords.Com','Terry', 'Thurber','','19600809','','500','Larkin Street')
</pre>

<p>Below is some sample code to explain this idea of how to remove
duplicates using the MS SQL ROW_NUMBER and partition idea:</p>

<pre class="brush: sql;">
SELECT 
    [First],
    [Last], 
    [Email],
    [DOB],
    ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Email,DOB DESC,Gender) AS Rn, 
    StreetNumber,
    StreetName
FROM 
    #TableWithDups 
</pre>

<p>In the above, the Email works as a partition and has a unique
value to allow us to create a group of records on that field and
also control the order of how the records will show up. In the data
below Susan Smith with the record that has the value of Rn = 2 and
Rn=3 are considered as duplicate records and should be deleted.</p>

<p>&nbsp;</p>

<p>In actuality the only difference compared to the original data
that was created in the temp table named: TableWithDups is the
additional column named: Rn. Every record in the Rn that is greater
than 1 is considered as a duplicate one. Adding the ORDER BY on
different fields can change the way the records will show up as
Rn=1. See Sample 1 below to get the idea:<br />
<br />
 Sample 1:</p>

<p><img src="/media/67222/data-cleansing-remove-duplicate-records-in-sql-database-sample1.png" alt="Data cleansing remove duplicate records In SQL Database" title="Data cleansing remove duplicate records In SQL Database"/></p>

<p>&nbsp;</p>

<p>When running the below code as Common Table Expression (CTE)
flexibility is increased as shown below and a WHERE clause is added
as WHERE Rn = 1. As a side note using the CTE is considered as a
temporary result. The CTE can be self-referenced multiple times in
the same query. It is similar to a derived table but much more
flexible.</p>

<pre class="brush: sql;">
;
WITH DeDupe
AS
(
SELECT 
    [PK],
    [First],
    [Last], 
    [Email],
    [Gender],
    [DOB],
    ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Email,DOB DESC) AS Rn, 
    StreetNumber,
    StreetName
FROM 
    #TableWithDups 
)   
SELECT [PK],[First],[Last],[Email], [Gender],[DOB],Rn,StreetNumber,StreetName
FROM DeDupe WHERE Rn = 1
ORDER BY [First],[Last]     
</pre>

<p>Here is the data as shown below in Sample 2 after running the
above code:<br />
<br />
 Sample 2:<br />
 <img src="/media/67227/find-and-remove-duplicate-records-in-mssql-data-sample2.png" alt="Find and remove duplicate records In MSSQL Data" title="Find and remove duplicate records In MSSQL Data"/></p>

<p>&nbsp;</p>

<p>As you see in the above in Sample 2, all duplicate records that
have Rn &gt; 1 do not show up anymore. But if you look closely we
lost some data that could have been combined in the record with
Rn=1. We can increase the partition by adding more fields to make
sure there are no duplicate addresses, but that would not change
the fact that some data will be lost.</p>

<p>&nbsp;</p>

<p>The real power of the above technique is that we can combine
several values from different duplicate records in the record that
gets the value of Rn=1. Let's examine closely what we have above.
If you look at Sample 2 then Joe Smith and Terry Thurber lost the
data in the Gender column. Looking at Sample 1 under PK= 5, the
Gender is M for Joe and also PK = 13 which is Terry Thurber. The
same problem exists also for the Phone number as well. The
following code will take care of this problem and incorporate all
required pieces of data in one record.</p>

<pre class="brush: sql;">
;
WITH DeDupe
AS
(
SELECT
    PK,
    [First],
    [Last],
    [Email],
    [Gender],
    [DOB],
    MAX(Phone) OVER (PARTITION BY Email) AS Phone_Updated ,
    MAX(CASE WHEN CAST(DOB AS INT) &gt; 0 THEN CAST(DOB AS INT) ELSE DOB END) 
    OVER (PARTITION BY Email) AS DOB_Updated,
    MAX(Gender) OVER (PARTITION BY Email) AS Gender_Updated,  
    ROW_NUMBER() OVER (PARTITION BY Email,First,Last,LTRIM(LEFT(UPPER([StreetNumber]),4)) + LEFT(UPPER([StreetName]),4)  
    ORDER BY DOB DESC,Phone DESC,Gender DESC,Email,First,Last) AS Rn, 
    [Phone],
    [StreetNumber],
    [StreetName]
FROM #TableWithDups 
) 
SELECT 
    PK,
    [First],
    [Last],
    [Email],
    [Gender],
    [Gender_Updated], 
    [DOB],
    CASE WHEN CAST(DOB_Updated AS VARCHAR(10))='0' THEN '' ELSE CAST(DOB_Updated   
    AS VARCHAR(10)) END DOB_Updated,
    [Phone],
    [Phone_Updated],
    [Rn],
    [StreetNumber],
    [StreetName]
INTO #DeDupe_Testing FROM DeDupe WHERE Rn &gt; 0

-- Let's show some data
SELECT * FROM #DeDupe_Testing WHERE Rn=1
</pre>

<p>The below data is the result of running the above code:</p>

<p>&nbsp;</p>

<p>Sample 3:</p>

<p><img src="/media/67232/remove-duplicate-records-in-sql-sample3a.png" alt="Remove duplicate records in sql server and merge records to one" title="Remove duplicate records in sql server in a smart way"/></p>

<p>The below is the continuation of the above data.</p>

<p>&nbsp;</p>

<p><img src="/media/67237/data-cleansing-find-and remove-duplicate-records-sample3b.png" alt="Data cleansing- find and remove duplicate records in ms sql server" title="Data cleansing-find and remove duplicate records in ms sql in a smart way"/></p>

<p>&nbsp;</p>

<p>Using the MAX(Phone) with the partition technique on each field
that we don't want to lose data, will make sure to get the data in
the row that Rn=1. Having this code on the Phone MAX(Phone) OVER
(PARTITION BY Email) AS Phone_Updated and create a new column to
show the new data near the original column helps to analyze the
data to make sure everything works fine.</p>

<p>&nbsp;</p>

<p>In the above code we make sure that no data is lost while we
remove all duplicate records. When all partitions are handled then
we create ROW_NUMBER() OVER (PARTITION…) on all combinations of
fields that create a unique partition. We also ORDER each field
that was part of the MAX() as DESC. Order the field as DESC is
crucial and is part of the technique of merging all data from
different rows into one row of the same partition. In addition, I
added code around the DOB (Date of Birth) since in some cases it
has only 0 value and clearing the DOB to nothing is a better data
for me.</p>

<p>&nbsp;</p>

<p>In sample 3 all fields such as: Gender_Updated, DOB_Updated and
Phone_Updated are filled with data while all other duplicate
records were removed. In Sample 3 Joe Smith and Terry Thurber did
not lose any data compared to Sample 2. In addition, the idea is to
copy all records to a different table while leaving the original
data as UN touched for a backup.</p>

<p>&nbsp;</p>

<p>The DOB is a VARCHAR (10), since when import data it is always a
character based field that comes in many different formats. For
additional info on that matter please see this article at: <a
title="String Manipulation on large data using LIKE operator or PATINDEX"
 href="/computer-consulting-blog/2011/3/28/string-manipulation-on-large-data-using-like-operator-or-patindex.aspx">data cleansing and manipulating date
format</a>.</p>

<p><br />
 Conclusions:<br />
 ==========</p>

<p>Once you have found your duplicate records, you face the problem
of merging multiple records into one without losing any data. Some
of the problems you might face when merging records could be that
it has a phone or another record could have an email address and
you cannot afford to remove these duplicate records when you are
taking the risk of losing some important data. The duplicate record
remover that was discussed above covers all of these points and
provides a solution that safely removes duplicate records from your
database. In addition, to use the code above the Sample 3 data is
when you know for sure that you may have fields with empty data;
otherwise simply use the code above the Sample 2 data.<br />
<br />
</p>
]]></description></item><item><title>ERP Software Selection</title><link>http://www.dfarber.com/computer-consulting-blog/2011/8/22/erp-software-selection.aspx</link><pubDate>Mon, 22 Aug 2011 21:50:57 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/8/22/erp-software-selection.aspx</guid><description><![CDATA[ 
<h3><span style="font-size: medium;">Approach to Selecting ERP
Software for your SMB Company</span></h3>

<p>&nbsp;</p>

<p>You are an owner of a small or medium size company, and you need
to select ERP software to run your business. You are confused
between the large availability of ERP software packages. How do you
select the right one for your business, and how do you find one you
can afford?</p>

<p>&nbsp;</p>

<p>But first, you may be asking yourself what is ERP software and
why do I need it in the first place? ERP stands for Enterprise
Resource Planning. This software is designed to manage end-to-end
processes of an enterprise. A typical ERP software help manage
processes such as sales and order entry, customer invoicing,
procurement, manufacturing, inventory control, shipping, quality
control, and accounting just to name a few. For more information
about ERP, please refer to my blog titled <a
title="Benefits of ERP Software for Small or Medium Size Businesses"
 href="/computer-consulting-blog/2011/7/20/benefits-of-erp-software-for-small-or-medium-size-businesses.aspx">The Benefits of ERP Software for Small or
Medium Size Businesses</a>.</p>

<p>&nbsp;</p>

<p>Like a typical small and medium size business, you may be
running your business by deploying many spreadsheets, Access
databases, and leveraging popular QuickBooks or Peachtree
accounting software. However, your business is growing, and you are
exceeding the number of users and records limits of QuickBooks, and
managing all these spreadsheets is starting to prove inefficient
and increasingly error prone - costing you real money in lost
sales, missed shipments, lost inventory of some items and shortages
of others. Besides, QuickBooks Enterprise seemed to have enough
functionality to get you started, but you heard that real ERP
software can automate many of your business processes.</p>

<p>&nbsp;</p>

<p>You heard of ERP software, and your initial research unveiled
that there are many packages starting from the top tier SAP and
Oracle systems that cost millions and ending with many PC based
packages. So, how do you select the one that fits your business,
and how do you find a package that will not consume your entire
budget?</p>

<p>&nbsp;</p>

<p>It turns out that selecting a complex software package that runs
almost all aspects of your business is both art and science.
Standard ERP selection methodologies were developed by management
consulting firms. They are based on documentation of the existing
business process and careful analysis and documentation of business
requirements. This traditional approach includes the following main
activities:</p>

<p>&nbsp;</p>

<ul>
<li><span>Document the As-Is business processes to understand the
current situation</span></li>

<li><span>Document the To-Be business processes to be deployed in
the new environment</span></li>

<li><span>Document business requirements for the software including
any regulatory requirements</span></li>

<li><span>Document technical requirements - such as performance
criteria, requirement for computing environment (for example, must
run on Windows OS)</span></li>

<li><span>Identify software candidates and document gap analysis
between software capability and above</span></li>

<li><span>Develop a detailed RFP based on above information and
send to selected software ISVs</span></li>

<li><span>Analyze the responses, and invite the first cut of ISVs
to perform demos</span></li>

<li><span>Develop final ranking by criteria that is based on best
functional fit, technology, and cost considerations</span></li>

<li><span>Negotiate with final two candidates and select one to
proceed to contract</span></li>
</ul>

<p>&nbsp;</p>

<p>Two main challenge of this approach to selecting software for a
SMB is that a) it takes considerable effort, time, and expense, and
b) discourages taking advantage of best-practices already built
into an ERP package - by specifying in too much detail what and how
you want the package to work, you may miss some best business
practices that may be built into a candidate package, or end-up
with a package that is overly complex for your needs.</p>

<p>&nbsp;</p>

<p>Based on our extensive experience selecting and implementing
enterprise software, we developed a more efficient approach
specifically for small and medium size businesses (SMBs) -- one
that assures due-diligence in software selection while optimizing
the time and effort of the selection process.</p>

<p>&nbsp;</p>

<p>First of all, as an SMB, you have to recognize that modern Open
Source ERP software offer viable solutions to many companies.
Today's Open Source ERP offers competitive functionality to
traditional ERP software at a very low-cost that you can afford (in
principle, open source is free, but most customers would not be
able to implement the free version).</p>

<p>&nbsp;</p>

<p>Our ERP software selection process is based on the following 10
steps:</p>

<p>&nbsp;</p>

<pre>
    
1.    Identify the top level processes that are needed to be supported by the future ERP solution. For example, order 
       entry, procurement, manufacturing, inventory control, shipping.   
2.    Identify next level process characteristics. For example make-to-order, process/batch manufacturing, cross-dock
       distribution processes, drop shipment support, etc.
3.    Identify industry specific regulatory requirements such as GxP and Computer Validation FDA requirements for Life
       Sciences industry.
4.    Walk through the existing company operations review and note the following:
           a.  Unique business processes that as competitive advantage and critical for your company operations
           b.  Unique software that was developed in-house or purchased that is critical for your company operations
5.    Identify any technical requirement specific to your company (such as must use Apple OS, or prefer Windows, 
       or prefer minimum IT support).
6.    Based on the above information, develop a high level functional fit document.
7.    Match the fit document with a list of best-in-class Open Source ERP vendors and identify one package that can 
       address your needs at a lowest cost.
8.    Document and present expected customization requirements (if any) to support your business.
9.    Bring the ERP software vendor (ISV) to perform demo.
10.  Get final buy-in and negotiate agreements with ISV and implementation partner.
</pre>

<p>&nbsp;</p>

<p>For a typical SMB customer, the selection process may take less
than a week. The traditional management consulting based process
takes months to complete.</p>

<p>Our ERP software selection methodology assumes that an
experienced ERP consultant help with the evaluation and selection
process. The success and efficiency of our approach is largely
dependent on the ability of our experienced consultants to help you
in identifying your core business needs and matching them with ERP
software capability.</p>

<p>&nbsp;</p>

<p>We are able to distinguish between those specific processes that
are truly your competitive advantage and need to be preserved,
versus those that could be changed and adapt best-practices already
embedded in ERP software. Recognizing this important distinction
between your custom business processes is more art than science and
largely dependent on experience of our consultants and principals
and is the key to a successful selection of your future ERP
solution in a minimum amount of time.</p>

<p>&nbsp;</p>

<p>Therefore, if you need ERP software to grow or maintain the
growth of your business, you should know that the selection and
implementation of such software does not have to cost of a fortune
in money and time. There are affordable solutions in the market
place, and partnering with the right services provider, will help
you kickoff a successful CRM and ERP implementation effort.</p>

<p><br />
<br />
</p>
]]></description></item><item><title>Data compression in large tables in MS SQL server</title><link>http://www.dfarber.com/computer-consulting-blog/2011/7/26/data-compression-in-large-tables-in-ms-sql-server.aspx</link><pubDate>Tue, 26 Jul 2011 15:09:22 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/7/26/data-compression-in-large-tables-in-ms-sql-server.aspx</guid><description><![CDATA[ 
<p>Do you know that you can compress data while using MS SQL 2008
or any later version? Compression is available for now in SQL
Server 2008 Enterprise and Developer edition.</p>

<p>&nbsp;</p>

<p>This type of compression reduces the size of the table by at
least of 25% and more. It also depends on how big your table
is.</p>

<p>&nbsp;</p>

<p>When you deal with TB(s) of data and you have a RAID setup in
your dedicated server for your data with 5 SCSI HD you may want to
make sure that you are not running out of disk space. For best
performance I still use SCSI HD with 15000 RPM.</p>

<p>&nbsp;</p>

<p>Compression can allow more rows to be stored on a page, but does
not change the maximum row size of a table or index.</p>

<p>&nbsp;</p>

<p>MS SQL data compression provides another great benefit, since
compressed data is stored in fewer pages and therefore, queries
need to read fewer pages from the disk and thus, improves the
performance of I/O intensive workloads. However, some additional
CPU resources are required on the database server to compress and
decompress the data, while data is exchanged within the
application. That additional resource based on my experience is not
that significant at all. I am using 2 CPUs in my server and it
works just fine.</p>

<p>&nbsp;</p>

<p>In my case since I work with large amount of data and use PAGE
compression. Row Compression helps to store data more efficiently
while using 4 bits per compressed column to store the length of the
data in the column. NULL and 0 take no additional space</p>

<p>&nbsp;</p>

<p>Page compression works better for me since it is performing Row
Compression plus optimizes the storage of multiple rows in a page,
and this way minimizes the data redundancy.</p>

<p>&nbsp;</p>

<p>To compress indexes, you must explicitly set the compression
property of the indexes. By default, the compression setting for
indexes will set to NONE when the index is created.</p>

<p>&nbsp;</p>

<p>On a PAGE compression, a compressed index which could be
clustered or non- clustered, the leaf level pages are compressed as
PAGE, but no leaf pages are ROW compressed.</p>

<p>&nbsp;</p>

<p>The leaf level is the data of the table. It is composed of data
pages. All columns of the tables are stored for every row of the
table.</p>

<p>&nbsp;</p>

<p>No Leaf level - are composed of index pages.</p>

<p>&nbsp;</p>

<p>Let's run the below stored procedure to see what will be the
savings when compressing a 44,058.422 table which is 44GB.</p>

<p>&nbsp;</p>

<pre class="brush: sql;">
--Using the PAGE compression 
USE Users 
GO 
EXEC sp_Estimate_Data_Compression_Savings NULL, 'Users', NULL, NULL, 'PAGE' ; 
GO  
</pre>

<p>&nbsp;</p>

<p>See this link: for the this stored procedure named: <a
href="http://msdn.microsoft.com/en-us/library/cc280574.aspx">sp_estimate_data_compression_savings
(Transact-SQL)</a></p>

<p>&nbsp;</p>

<p>The results shows 1 record only and the column named:
Sample_size_with_requested_compression_settings(KB) has the value
of 27816 which was a savings of 16242.422 KB. That is %36.87 and it
was calculated in here: 1 - (27816/44058.422) * 100 = %36.87</p>

<p>&nbsp;</p>

<p>Now let's create a compressed clustered index on the Users table
and see the actual size of the table. Pay attention that I added
the the DATA_COMPRESSION = PAGE at the end of the index script.</p>

<p>&nbsp;</p>

<pre class="brush: sql;">
CREATE UNIQUE CLUSTERED INDEX Users_PK ON dbo.Users 
(
    Users_PK
)   
WITH (
    PAD_INDEX  = OFF, 
    STATISTICS_NORECOMPUTE  = OFF, 
    SORT_IN_TEMPDB = OFF, 
    IGNORE_DUP_KEY = OFF, 
    DROP_EXISTING = OFF,
    ONLINE = OFF, 
    ALLOW_ROW_LOCKS  = ON, 
    ALLOW_PAGE_LOCKS  = ON,
    DATA_COMPRESSION = PAGE)
</pre>

<p>&nbsp;</p>

<p>In this case I have no indexes on the Users table and therefore
creating the clustered index would be relatively faster since no
other indexes should be rebuilt on that table. Creating the
compressed clustered index will compress the table as well, at the
same time.</p>

<p>&nbsp;</p>

<p>To create the index on 44GB table and 240 million records took
about 23.27 min and the index size was just 54.750MB. The actual
amount of kb for the Users table was reduced to 29361.078. The
savings was: 1 - (29361.078/44058.422 * 100) = %33.358. For me it
is a major savings also in disk space.</p>

<p>&nbsp;</p>

<p>We can also use the below command in MS SQL 2008 or above to
enable PAGE compression while trying to re build the HEAP (When the
table does not have a clustered index) and requires all
non-clustered indexes on the table to be rebuilt so that they have
pointers to the new row locations in the heap.</p>

<p>&nbsp;</p>

<pre class="brush: sql;">
ALTER TABLE Users 
REBUILD WITH (Data_Compression = PAGE)
</pre>

<p>&nbsp;</p>

<p>For more info about the MS SQL data compression features see
this subject SQL <a
href="http://technet.microsoft.com/en-us/library/cc280449.aspx">Server
Books Online.</a></p>

<p>&nbsp;</p>

<p>Conclusions:<br />
 -----------------</p>

<p>Using the compression capability of MS SQL is great and saves HD
space and increases performance and reduces cost. It increases the
query performance by reducing I/O.In addition, no application
changes are required to implement table or index compression.</p>

<p>&nbsp;</p>

<p>Usually if you need to create a clustered index I would remove
first all non-clustered indexes since they will not be compressed
when they will be re-built while a compressed clustered index is
created.</p>

<p>&nbsp;</p>

<p>Make sure to set up Compression as DATA_COMPRESSION = PAGE on
non-clustered indexes after you created a compressed clustered
index. This way you take full advantage of the compression
capability of data and indexes which was added in MS SQL 2008. In
addition, see this article of how to <a
title="Remove duplicate records in mssql while merging records from different fields without losing data"
 href="/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx">remove duplicate records</a> in a smart
way.<br />
<br />
</p>
]]></description></item><item><title>Benefits of ERP Software for Small or Medium Size Businesses</title><link>http://www.dfarber.com/computer-consulting-blog/2011/7/20/benefits-of-erp-software-for-small-or-medium-size-businesses.aspx</link><pubDate>Wed, 20 Jul 2011 10:31:36 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/7/20/benefits-of-erp-software-for-small-or-medium-size-businesses.aspx</guid><description><![CDATA[ 
<p>If you are a typical stakeholder of a small or medium size
business, you probably prefer to invest in assets that contribute
directly to the growth and revenue of your business (for example,
tools for making your product, machines to expand capacity, people
to design and sell your product, etc.). You know that information
technology and business software is important, but you avoid
spending capital in this area, as long as you can manage by using
spreadsheets and other office software, and at most utilize the
popular QuickBooks to manage your accounting, invoicing, and do
very basic inventory management.</p>

<p>&nbsp;</p>

<p>You may have heard of Enterprise Resource Planning (ERP)
software that large companies use to run their businesses, but you
think that you cannot afford this software anyway, and you are not
sure what would be the real benefits of such an investment. This
article is going to dispel some of these myths - ERP software has a
real value for your business and it does not have to be expansive
at all!</p>

<p>&nbsp;</p>

<p>First, I would like to quickly define what ERP software does and
how it works. ERP software has been around for at least 20 years
and it was developed from a previous class of software called MRP
(Material Requirement Planning) that existed at least 20 years
before the emergence of ERP. ERP software was developed to manage
most of the business related processes of a typical enterprise. ERP
can be customized to fit many industries including general
manufacturing, retail, CPG, Life Sciences, Process, distribution,
and service industries.</p>

<p>&nbsp;</p>

<p>The main benefit of ERP software is that it integrates the flow
of information between various business processes and integrates
business transactions with company's financial system in real time.
ERP enables improved business decisions by making information from
various parts of the business available to throughout your business
operations in real-time. It posts relevant business transactions to
company's general ledger and other financial systems allowing
management to see financial implications in real time. ERP
typically covers all processes starting from sales order entry,
shipping/delivery, product pricing, warehousing and inventory,
procurement, production, invoicing (both A/P and A/R), service,
financials, and many others.</p>

<p>&nbsp;</p>

<p>For example, when entering a sales order, customer service
person (or your web based client) can see the available inventory
for this sales order in the various warehouse locations. If no
inventory is available at the time of the order, ERP system can
allocate planned production or planned purchases, or estimate when
a make-to-order could be completed by your manufacturing
operations. ERP can also apply the proper pricing for the
applicable customer, sales tax, and shipping charges, and determine
promised delivery date. Once the order is entered, the system
reserves the inventory to fulfill this order. The system then
automatically plans the required production orders to replenish the
sold inventory and related external purchases of raw material (or
finished goods depending on whether you make the product in-house
or buy it from supplier). Please note that this is just one example
of a typical ERP process - ERP transactions can be adapted to a
large variety of industry and process requirements.</p>

<p>&nbsp;</p>

<p>ERP contains critical master data for customers, vendors,
products (some system refer to items or materials), bill of
materials (BOMs), manufacturing work processes and routings,
assets, and many other master records. Maintaining these master
records would help your company to improve consistency and work
from the same set of data.</p>

<p>&nbsp;</p>

<p>Many small companies may be able to get by using office tools
such as spreadsheets and simple accounting software (e.g.,
QuickBooks or Peachtree), while most of the work is done by company
owners and the number of employees is small (although, some SMBs
have complex business processes that require ERP almost from the
start). However, as your revenue and employee grow, the lack of
proper integrated business software will start impact your business
results in real ways that impact your bottom line such as lost
orders, poor planning, inventory shortages, missed shipments,
missed invoicing, and general lack of control.</p>

<p>&nbsp;</p>

<p>ERP software have many real business benefits and including the
following:</p>

<p>&nbsp;</p>

<ul>
<li>Improved overall management visibility to business
operations</li>

<li>Improved visibility to company's financial performance in
real-time</li>

<li>Improved sales performance</li>

<li>Improved inventory control and planning</li>

<li>Consistent business policies and processes throughout your
company improving customer service and profitability</li>

<li>Improved on-time delivery</li>

<li>Reduced procurement cost</li>

<li>Reduced work-in-process inventory</li>

<li>Reduced labor costs (for both office operations and
manufacturing)</li>

<li>Reduced overall waste</li>
</ul>

<p>….and many more. ERP benefits should be identified and
documented for your specific business.</p>

<p>&nbsp;</p>

<p>The availability of user-friendly QuickBooks software provided a
good stop-gap solution for many small businesses, but with all the
advancement of QuickBooks functionality, it is still far from being
integrated ERP software with proper controls, auditing, and
scalability required to support end-to-end business operations.
Features that make QuickBooks user friendly (such as ability to
reverse transactions), make it a terrible software for controlling
a growing business with growing number of employees. SMBs should
consider upgrading QuickBooks to ERP as soon as they grow beyond
five to ten employees and over $5 million in annual revenue (this
is just a general rule of thumb). Many companies should consider
implementing ERP light software from the start to avoid migration
and re-training efforts.<br />
<br />
 SMBs have as complex business processes and need as large
corporations. Large corporations invested millions in implementing
state-of-the-art ERP software packages, but can SMBs afford ERP? It
is clear that there are many business benefits to justify
investment in this critical software, but as stated in the
beginning of this article, SMBs are sensitive to cash outlays. The
good news is that the emergence of Open Source ERP software
substantially reduced the cost of ERP.</p>

<p>&nbsp;</p>

<p>Open Source software existed for many years. Many have heard of
the Open Source Linux operating system. Open Source software is
developed by a community of developers that have no commercial
interest in the software they develop. Therefore, in its pure form,
Open Source software is free! In the case of Open Source ERP,
however, it is not entirely free; because Open Source ERP ISVs
(independent software vendors) typically have a basic free version
and then offer a more capable version that contains close source
code or charge required support and maintenance fees. However, the
cost of Open Source ERP software license is still about 10 times
lower than the cost of traditional close source ERP software. In
recent years, Open Source ERP considerably advanced from a
functional perspective becoming a viable alternative to traditional
ERP software.</p>

<p>&nbsp;</p>

<p>Leveraging Open Source ERP software, small and medium size
businesses (SMBs) can now afford to implement enterprise class
software that was available up until recently only to large
corporations. They can scale and grow beyond QuickBooks and
Peachtree at a very low investment cost, or implement from the
start scalable ERP light solution. Since ERP is such an important
enabler of business success, SMBs cannot afford to overlook this
opportunity.</p>

<p>&nbsp;</p>

<p><a title="Simon Raban Bio" href="/enterprise-solutions-erp-crm/simon-raban-bio.aspx">Written By
Simon Raban</a></p>
]]></description></item><item><title>Secure MS SQL server when running web application</title><link>http://www.dfarber.com/computer-consulting-blog/2011/5/16/secure-ms-sql-server-when-running-web-application.aspx</link><pubDate>Mon, 16 May 2011 20:05:47 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/5/16/secure-ms-sql-server-when-running-web-application.aspx</guid><description><![CDATA[ 
<p>When people need to connect publically, those individuals are
taking the risk of outside attacks on your server. A year and half
ago I noticed that 2 hackers are trying to get into my MS SQL 2008
server via the SA account.</p>

<p>&nbsp;</p>

<p>The Windows Log account basically provided me with every attempt
to log into the server plus the IP address of the hacker. One was
in Canada and the other was from China. I called the Internet
provider of the individual from Canada, and they shut down his
Internet, the person from China was another story, I sent them an
email with no reply.</p>

<p>&nbsp;</p>

<p>But with some simple steps we can avoid the attacks on the
server. The first step is to change the default port which is 1433
to any new one that are you comfortable with, but try in a
different range then the current default port.</p>

<p>&nbsp;</p>

<p>Secondly, disable the SA account and you can use the below code
for that. Make sure that you have a valid admin account before you
do that.</p>

<pre class="brush: sql;">
ALTER LOGIN sa DISABLE
</pre>

<p>Then create a trigger script to validate an authorized user
based on a table that stores all known credentials of the users.
The trigger should be found under Server Objects-&gt;Triggers.</p>

<p>&nbsp;</p>

<p>Before you create the trigger, let's create the table with the
following fields in a different database then your data. It could
be Utils.mdf</p>

<pre class="brush: sql;">
   
CREATE TABLE [dbo].[Logins](
    [Logins_PK] [Int] IDENTITY(1,1) NOT NULL,
    [Name] [VarChar](40) NULL) 
ON [PRIMARY]
</pre>

<p>Now we can populate this table with all authorized accounts and
the below code will add the Administrator's account into the table.
The Principal_ID is the ID number of the Principals table. It is
unique within that table.</p>

<pre class="brush: sql;">
INSERT INTO Logins(Name) 
SELECT Name FROM Sys.Server_Principals
WHERE Principal_ID IN (273)
</pre>

<p>A Windows Login is an example of an indivisible principal. If
you ran the below code then you could see all accounts available,
and then you need to make a decision which one should be added into
the Logins table.</p>

<pre class="brush: sql;">
SELECT Name FROM Sys.Server_Principals
</pre>

<p>The below were the first accounts that were added by me:</p>

<pre class="brush: sql;">
MyDataBaseName\Administrator 
NT AUTHORITY\SYSTEM
NT SERVICE\MSSQLSERVER
NT SERVICE\SQLSERVERAGENT
</pre>

<p>Then Insert other users' accounts that should be able to log in.
I would add one account at the time using the above INSERT sample
code.</p>

<p><br />
 Here is the Trigger code:<br />
 ----------------------------------</p>

<pre class="brush: sql;">
CREATE TRIGGER [AuditLogin] 
--server means instance level
ON ALL SERVER  
WITH EXECUTE AS SELF
FOR LOGON
AS BEGIN
    IF ORIGINAL_LOGIN() NOT IN (SELECT Name FROM Utils..Logins) 
    ROLLBACK;
END
GO

SET ANSI_NULLS OFF
GO

SET QUOTED_IDENTIFIER OFF
GO

ENABLE TRIGGER [AuditLogin] ON ALL SERVER
GO
</pre>

<p>Conclusions:<br />
 ===========<br />
 With the simple steps I described above I increased the security
of my server tremendously. Changing MS SQL port is a crucial first
step. Then when ever you log into the server the AuditLogin trigger
is called which checks accounts found in the Logins table. If the
account is found in the table then the Log In will be continued by
the server. If not then the trigger will roll back that account.
Using that technique you can basically disable a user's account by
Deleting that record from the Login table without disabling the
actual account in the server.</p>
]]></description></item><item><title>Reasons For Hiring an SEO Company in USA</title><link>http://www.dfarber.com/computer-consulting-blog/2011/3/28/reasons-for-hiring-an-seo-company-in-usa.aspx</link><pubDate>Mon, 28 Mar 2011 18:58:45 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/3/28/reasons-for-hiring-an-seo-company-in-usa.aspx</guid><description><![CDATA[ 
<p>Google is the most democratic nation in the landscape of the
World Wide Web where every website or business is given an equal
opportunity to represent itself and compete for the top slot,
without any prejudice. This analogy may sound a bit bizarre but
this is the reality of the world governed by the Big G. It is like
a ray of hope for small and medium sized companies which want to
compete with the corporate giants on the same footings.</p>

<p>&nbsp;</p>

<p>If you are a start-up company or even an established brand with
billions of dollars in resources you still need to show Google that
you are all about quality if you wish to get the top positions in
search engine rankings. But a very obvious question which strikes
our minds is why a strong Google ranking is so important. It is
important because it is the easiest way of driving traffic to your
site without any cost. In other words it is the best mode of
advertisement as more than 85% of the online world uses search
engines to find information, and if your site ranks in the top
three then you are going to reap the benefits and profit for your
business. The power of Google placement is so strong that market
pundits say that if you are not visible in first few pages of
search results you do not exist.</p>

<p>&nbsp;</p>

<p>In the market of today, due to stiff competition and constantly
changing parameters of search engines it becomes a daunting task to
rank well without the assistance of an SEO Expert in the USA. The
Farber Consulting Group is a leading SEO (Search Engine
Optimization) Company in the USA which has led several online SEO
campaigns for its clients for years.</p>

<p>The team of SEO Experts at the leading SEO Company in the USA,
The Farber Consulting Inc takes care of all the aspects of on-page
and off page optimization of client sites at the guaranteed lowest
prices.</p>

<p>&nbsp;</p>

<p>If you are looking for an SEO Expert Agency who can take care of
all your online marketing needs, contact us and your business will
be known to many more people and generate much more money.</p>
]]></description></item><item><title>String Manipulation on large data using LIKE operator or PATINDEX</title><link>http://www.dfarber.com/computer-consulting-blog/2011/3/28/string-manipulation-on-large-data-using-like-operator-or-patindex.aspx</link><pubDate>Mon, 28 Mar 2011 15:00:09 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/3/28/string-manipulation-on-large-data-using-like-operator-or-patindex.aspx</guid><description><![CDATA[ 
<p>The idea is to perform data cleansing on the Date and Time field
from different formats into a unify format such as: YYYYMMDD. Here
is a sample data to be used for the data cleansing that I will
perform below.</p>

<p>&nbsp;</p>

<pre class="brush: sql;">
  
CREATE TABLE #T 
    (
        D1 [VARCHAR] (20) NULL 
    )
INSERT INTO #T VALUES('5/15/2006 18:11')
INSERT INTO #T VALUES('5/13/2006 17:24')
INSERT INTO #T VALUES('2007-02-07 11:38:07')
INSERT INTO #T VALUES('9/4/2008 6:55 19a')
INSERT INTO #T VALUES('5/11/2006 0:47:52')
INSERT INTO #T VALUES('2007-01-04 12:43:00')
INSERT INTO #T VALUES('')
INSERT INTO #T VALUES('  6/11/2006 0:47:52')
INSERT INTO #T VALUES('9/22/2002 07:20:07')
INSERT INTO #T VALUES('09/22/2002 07:20:07')
INSERT INTO #T VALUES('4/13/2007')
INSERT INTO #T VALUES('2/1/2007 20:23') 
INSERT INTO #T VALUES('04-21-2011 18:47:42')
INSERT INTO #T VALUES('04-21-2008 17:11:00')
INSERT INTO #T VALUES('2/1/2007 11:44') 
INSERT INTO #T VALUES('5/9/2007 9:22')
INSERT INTO #T VALUES('04/13/2007')
INSERT INTO #T VALUES('')
INSERT INTO #T VALUES('2004-04-21 16:00:00')
INSERT INTO #T VALUES('04-21-2008 17:11:00')
INSERT INTO #T VALUES('2004')
INSERT INTO #T VALUES('2010-04-21 16:00:00')
INSERT INTO #T VALUES('0000-04-21 16:00:00')
INSERT INTO #T VALUES('2011')
INSERT INTO #T VALUES('2004')
INSERT INTO #T VALUES('1700')
INSERT INTO #T VALUES('1800')
INSERT INTO #T VALUES('Nov 2 2010')
INSERT INTO #T VALUES('02-22-03 00:00:00')
INSERT INTO #T VALUES('02-22-03 00:00:00')
INSERT INTO #T VALUES('10/5/1941 8/2/2005')
INSERT INTO #T VALUES('30.09.2002')
INSERT INTO #T VALUES('05:05:06' )

ALTER TABLE #T ADD [D2] VARCHAR(20) 
ALTER TABLE #T ADD [Update_Type] TinyInt 

SELECT * FROM #T
</pre>

<p>The reason that I am using string manipulation instead of the
CONVERT() function is: Since the table could have millions of dates
that could not be converted using the CONVERT() function and
therefore string manipulation is required. Even though most records
could use the CONVERT() function, I will still would rather use the
string manipulation strategy since in one transaction I convert all
dates whether they can use the CONVERT() or not. See examples for
these dates:</p>

<pre class="brush: sql;">
SELECT CONVERT(VARCHAR(15),CAST('30.09.2002' AS DATE),112)
</pre>

<p>The ISDATE() function will return 0 but this date could be
converted as 20090930 and I could have 5 millions of these
dates.</p>

<p><br />
 If you ran the above code you will get the below error:<br />
<br />
</p>

<p>Msg 241, Level 16, State 1, Line 1 Conversion failed when
converting date and/or time from character string.</p>

<p><br />
 Here is another example of date that should be using string
manipulation:</p>

<pre class="brush: sql;">
SELECT CONVERT(VARCHAR(15),CAST('1/27/2007 6:55 17a' AS DATE),112)
</pre>

<p><br />
 The actual date could be 20070127 but ISDATE() returns 0 as seems
below.</p>

<pre class="brush: sql;">
SELECT ISDATE('1/27/2007 6:55 17a') 
</pre>

<p><br />
 The following are 4 patterns that I will use for this article:
Each different pattern will be converted to this format as:
YYYYMMDD. So the date that appears as this one 9/19/2009 will be
shown as this one: 20090919. The above is a realistic data that I
need to deal when import data and then perform data cleaning using
MS SQL scripts.</p>

<p><br />
 1. 9/19/2009<br />
 2. 1/1/2009<br />
 3. 09/25/2010<br />
 4. 2009-01-09<br />
<br />
 Extracting this format of date: 9/19/2009</p>

<pre class="brush: sql;">
SELECT 
    D1
FROM
    #T  
WHERE  D1 &lt;&gt; '' AND D1 LIKE '%[0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' 
</pre>

<p>Adding the left wild card will bring un desirable results of
dates as shown below with different formats. You need to use only
the right wild card so the requested pattern will show up as
needed. This way you can perform the string manipulation without
any surprises. As a rule of thumb I also make sure all data is
trimmed before starting working on it. (For the import I use
SSIS).</p>

<pre class="brush: sql;">
--5/15/2006 18:11<br />
--5/13/2006 17:24
--5/11/2006 0:47:52
--  6/11/2006 0:47:52
--9/22/2002 07:20:07
--09/22/2002 07:20:07
--4/13/2007
--04/13/2007
</pre>

<p>1. When running the below code for this format: 9/19/2009 you
get the right data.</p>

<pre class="brush: sql;">
SELECT 
    D1
FROM
    #T  
WHERE  D1 &lt;&gt; '' AND D1 LIKE '[0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' 
</pre>

<p>2. Extracting this format of this kind of date: 1/1/2009</p>

<pre class="brush: sql;">
SELECT 
    D1
FROM
    #T  
WHERE D1 &lt;&gt; '' AND D1 LIKE '[0-9][/][0-9][/][0-9][0-9][0-9][0-9]%' 
</pre>

<p>Running the above code with the right wild card guarantees the
right format as shown below.</p>

<pre class="brush: sql;">
--9/4/2008 00:38:45
--2/1/2007 20:23
--2/1/2007 11:44
--5/9/2007 9:22
</pre>

<p>3. Extracting this format of date: 09/25/2010</p>

<pre class="brush: sql;">
SELECT 
    D1
FROM
    #T  
WHERE D1 &lt;&gt; '' AND D1 LIKE '[0-9][0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' 
</pre>

<p>The below data shows up</p>

<pre class="brush: sql;">
09/22/2002 07:20:07
04/13/2007 
</pre>

<p>4. Handle this format: 2009-01-09</p>

<pre class="brush: sql;">
SELECT
    D1
FROM 
    #T 
WHERE D1 &lt;&gt; '' AND D1 LIKE '[0-9][0-9][0-9][0-9][-][0-9][0-9][-][0-9][0-9]%' 
</pre>

<p>Running the above query will bring the below records:</p>

<pre class="brush: sql;">
--2007-02-07 11:38:07
--2007-01-04 12:43:00
--2004-04-21 16:00:00
</pre>

<p>Every pattern I will test as shown below to make sure I am
getting the right data. In real table I will have a Clustered index
on the PK and will add WHERE clause like this one: User_PK BETWEEN
1 AND 10000 and this way the data comes much quicker.</p>

<pre class="brush: sql;">
SELECT
    D1,
    REPLACE(LEFT(D1,10),'-','') AS ResultDate
FROM 
    #T 
WHERE D1 &lt;&gt; '' AND D1 LIKE '[0-9][0-9][0-9][0-9][-][0-9][0-9][-][0-9][0-9]%' 
</pre>

<p>Running the above query will bring the below records:</p>

<pre class="brush: sql;">
--2007-02-07 11:38:07  20070207
--2007-01-04 12:43:00   20070104
--2004-04-21 16:00:00   20040421
</pre>

<p>For the table that you perform the data cleansing, the idea is
to create another column that all updates will be implemented
there. This way I can compare and see if the conversion was done
correct. The Tiny column that also was added, I like to use it as a
flag for the different type of conversions. In reality there could
be at least 20 types of conversions that could be needed for the
full conversion for the date field. See below code that will
convert all different types of dates into YYYYMMDD in one
script.</p>

<pre class="brush: sql;">
UPDATE #T
    SET D2 = CASE WHEN D1 LIKE '[0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' THEN 
                SUBSTRING(D1,6,4) + '0' + REPLACE(LEFT(D1,5),'/','')            -- 9/19/2009
             WHEN D1 LIKE '[0-9][/][0-9][/][0-9][0-9][0-9][0-9]%' THEN 
                SUBSTRING(D1,5,4) + '0' + LEFT(D1,1) + '0' + SUBSTRING(D1,3,1)  -- 9/9/2009
             WHEN D1 LIKE '[0-9][0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' THEN 
                SUBSTRING(D1,7,4) + REPLACE(LEFT(D1,5),'/','')                  -- 09/19/2009
             WHEN D1 LIKE '[0-9][0-9][0-9][0-9][-][0-9][0-9][-][0-9][0-9]%' THEN
                REPLACE(LEFT(D1,10),'-','')                                     -- 2009-01-09
         ELSE D2 END, 
    Update_Type = CASE WHEN D1 LIKE '[0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' THEN 1 -- 9/19/2009
           WHEN D1 LIKE '[0-9][/][0-9][/][0-9][0-9][0-9][0-9]%' THEN 2                  -- 9/9/2009
           WHEN D1 LIKE '[0-9][0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' THEN 3        -- 09/19/2009
           WHEN D1 LIKE '[0-9][0-9][0-9][0-9][-][0-9][0-9][-][0-9][0-9]%' THEN 4        -- 2009-01-09
        ELSE D2 END
FROM 
    #T  
WHERE D1 &lt;&gt; ''    

SELECT * FROM #T 
</pre>

<p>Running the above SELECT statement shows the updated columns as
shown in the below:</p>

<pre class="brush: sql;">
--D1                    D2          Update_Type
--5/15/2006 18:11       20060515    1
--5/13/2006 17:24       20060513    1
--2007-02-07 11:38:07   20070207    4
--9/4/2008 00:38:45     20080904    2
--5/11/2006 0:47:52     20060511    1
--2007-01-04 12:43:00   20070104    4
--                      NULL        NULL
--  6/11/2006 0:47:52   NULL        NULL
--9/22/2002 07:20:07    20020922    1
--09/22/2002 07:20:07   20020922    3
--4/13/2007             20070413    1
--2/1/2007 20:23        20070201    2
--04-21-2011 18:47:42   NULL        NULL
--04-21-2008 17:11:00   NULL        NULL
--2/1/2007 11:44        20070201    2
--5/9/2007 9:22         20070509    2
--04/13/2007            20070413    3
--                      NULL        NULL
--2004-04-21 16:00:00   20040421    4
--04-21-2008 17:11:00   NULL        NULL
</pre>

<p>The entire update technique should be used within a batch and
please see this article I wrote about it and how to implement
it:</p>

<p>&nbsp;</p>

<p><a
href="/computer-consulting-blog/2011/1/14/processing-hundreds-of-millions-records-got-much-easier.aspx">
/computer-consulting-blog/2011/1/14/processing-hundreds-of-millions-records-got-much-easier.aspx</a></p>

<p>&nbsp;</p>

<p>With this technique it keeps the log file very small</p>

<p>&nbsp;</p>

<p>If you prefer to run each process one at the time then you can
use the code as shown below for this format: 9/19/2009</p>

<pre class="brush: sql;">
UPDATE #T
    SET D2 = SUBSTRING(D1,6,4) + '0' + REPLACE(LEFT(D1,5),'/',''),
    Update_Type = 1
WHERE D1 &lt;&gt; '' AND D1 LIKE '[0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]%' 
</pre>

<p>I also tried to test the PATINDEX() function instead of the LIKE
operator as shown below. In this case I actually compared the
PATINEX() function to the LIKE operator. Running in a 300 million
records table, the LIKE was always faster in most cases at least in
1 second. The LIKE is getting to be by far very slow when there are
only small amount of records such as: 14030 records found for a
specific pattern in big tables as mentioned above.</p>

<p>&nbsp;</p>

<p>Running the same pattern using the PATINDEX() function took only
10 seconds while using the LIKE took 1 min and 9 seconds. Usually I
ran the COUNT() to get the idea how many records to be processed
for each pattern. In most cases I will use the LIKE operator for
this kind of data cleansing. Make sure to create an index on the
field that you process data on it. Remember to create the same type
of field for UPDATE purpose. This one took only 10 seconds to count
14030 records out of 300 million records.</p>

<pre class="brush: sql;">
SELECT  -- 10 seconds
    COUNT(D1)
FROM
    #T  
WHERE PATINDEX('[0-9][0-9][-][0-9][0-9][-][0-9][0-9][0-9][0-9]%',D1)=1 AND D1 &lt;&gt;'' 
</pre>

<p>This one took 1 min and 9 seconds for the same table.</p>

<pre class="brush: sql;">
SELECT 
    COUNT(D1)
FROM
    #T 
WHERE D1 &lt;&gt; '' AND D1 LIKE '[0-9][0-9][-][0-9][0-9][-][0-9][0-9][0-9][0-9]%' 
</pre>

<p>----------------------------------------------------------</p>

<p>------------------------ Conclusion: ------------------</p>

<p>----------------------------------------------------------</p>

<p>I covered some steps I take for different types of the data
conversion and in the process I realized that the PATINDEX() could
be much faster versus the LIKE operator for some conditions. There
could be different types of data that cannot be converted and must
be ignored. That's why a second field is nice to have, since that
will be the final field for the production database.</p>

<p>&nbsp;</p>

<p>There are many dates that have different patterns and could be
legit ones for conversion but will fail the CONVERT() function.
Only string manipulation provides the ultimate control and minimize
any errors of date conversion.</p>
]]></description></item><item><title>Processing hundreds of millions records got much easier</title><link>http://www.dfarber.com/computer-consulting-blog/2011/1/14/processing-hundreds-of-millions-records-got-much-easier.aspx</link><pubDate>Fri, 14 Jan 2011 16:56:33 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/2011/1/14/processing-hundreds-of-millions-records-got-much-easier.aspx</guid><description><![CDATA[ 
<p><span>MS SQL 2008 - Processing hundreds of millions records can
be done in less than an hour.</span></p>

<p>&nbsp;</p>

<p>Processing hundreds of millions of records requires a different
strategy and the implementation should be different compared to
smaller tables with only several millions of records.</p>

<p>&nbsp;</p>

<p>Say you have 800 millions of records in a table and you need to
delete 200 million. Obviously you can use this code:</p>

<pre class="brush: sql;">
DELETE FROM Users WHERE DOB = ''
</pre>

<p>The problem is that you are running one big transaction and the
log file will grow tremendously. The process can take a long time.
It also depends on the speed of your server as well.</p>

<p>&nbsp;</p>

<p>The solution is to use small batches and process 1 to several
millions of records at the time. This way the log file stays small
and whenever a new process starts, the new batch will reuse the
same log file space and it will not grow. Before I used the batch
process technique, in one time I had 250 GB log file when I tried
to split the Address field.</p>

<p>&nbsp;</p>

<p>If I need to move 250 millions of records from one database to
another the batch processing technique is a winner. Under my server
it would take 30 minutes and 41 seconds, and also can track down
the time per each batch. This way I will be able to predict when
the entire process is finished. Another advantage when using small
batch is if you need to Cancel the process from whatever reason
then it takes immediately or several seconds to recover. (Depends
on your server speed).</p>

<p>&nbsp;</p>

<p>I basically use this technique even for a small change that I
may do in a table. Say you need to add an Identity field and you
have a table with 250 millions of records. That simple code:</p>

<pre class="brush: sql;">
 
ALTER TABLE Users ADD Users_PK INT IDENTITY(1,1)
</pre>

<p>will take at least 7 hours to process.</p>

<p>&nbsp;</p>

<p>Copy the whole table to an empty one will be much faster as
demonstrated below. Another example for saving time is if you need
to add a computed field as a Persisted one, it took us more than a
day without using the batch technique for a table of 250 millions
of records. The time it takes also depends of the complexity of the
computed field. A persisted computed field was part of the empty
table where data was inserted and that did not change the speed of
the below process.</p>

<p>&nbsp;</p>

<p>Another advantage for using ms sql batch processing code is when
you have an error. It is very helpful to debug ms sql table. In my
case it could be a truncate error when trying to fix data from one
field to another. By looking at the BatchProcess table you can see
the last processed batch range and you can go right into that range
and inspect the data. You know it is the last batch since the code
will stop working after the error occurred. When process hundreds
of millions records sometimes bad data could cause a truncate
issue.</p>

<p>&nbsp;</p>

<pre class="brush: sql;">
--- Inspect the data based on the last batch.
SELECT
   COUNT(StreetName)
FROM
    Users
WHERE Users_PK BETWEEN 6000007 AND 7000007 AND LEN(StreetName) &gt; 30
</pre>

<p>&nbsp;</p>

<p>The technique below requires that you have a clustered index on
the PK, and this way 1 million records takes to process from 8 to
30 seconds compare to 10 minutes without a clustered index. But
even without the clustered index working with batches reduces the
processing time by far. To split an Address to Street Number and
Street Name without a clustered index took about 8 hours and before
it took days to process. For that process an UPDATE was used. For
the below process even though I used the ORDER BY First and Last,
the clustered index on Users_PK was sufficient for the entire
process and no other indexes were needed. Again in other cases you
may need to have additional indexes.</p>

<p>&nbsp;</p>

<pre class="brush: sql;">
BEGIN TRY

--- Copy the PK from the source table into the target table-------
SET IDENTITY_INSERT Users..Users ON
------------------------------------------------------------------

-- Increment every 1000000 records. You can choose any number that works for you.
DECLARE @Increment INT = 1000000,
        @LastRecord_PK INT = 0,
        @iRowCount INT = 0,
        @BatchRowCount INT = 0,
        @iCurrentRowCount INT = 0,
        @FirstRec_PK INT = 0
        
-- Get the min of the PK minus 1. This way we can start the range from the first record.
-- Make sure to choose the right source table for the MIN(PK)
DECLARE @FirstRecord_PK AS INT=(SELECT
    MIN(Users_PK)
FROM
    Users.dbo.Users) - 1

SELECT @FirstRecord_PK

-- Select the database for the table that records each INSERT
USE Utilities
IF NOT EXISTS(SELECT *
            FROM Sys.Tables
            WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[BatchProcess]'))

BEGIN
CREATE TABLE [Utilities].[dbo].[BatchProcess](
    [Batch_PK] [INT] IDENTITY(1,1) NOT NULL,
    [RowCountTemp]  [INT] NULL,
    [BatchRowCount] [INT] NULL,
    [DateTimeEntered] [DATETIME] NULL,
    [FirstRecord_PK] INT NOT NULL,
    [LastRecord_PK] INT NOT NULL
) ON [PRIMARY]       
END
ELSE
BEGIN
-- Remove all records from the previous process.
TRUNCATE TABLE Utilities.dbo.BatchProcess
END

-- Create an Error handler table if not found.
IF NOT EXISTS(SELECT *
            FROM Sys.Tables
            WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[Errors]'))

BEGIN
CREATE TABLE [Utilities].[dbo].[Errors](
    [Errors_PK] [INT] IDENTITY(1,1) NOT NULL,
    [Error_No] [INT] NULL,
    [ErrorMessage] [VARCHAR] (4000) NULL,
    [ErrorLine] [VARCHAR] (50) NULL,
    [Date_Time] [DATETIME] NULL,
 ) ON [PRIMARY]
END

--- Starting the time and that will be the first record in the BatchProcess table
INSERT INTO Utilities.dbo.BatchProcess
    (RowCountTemp,BatchRowCount,DateTimeEntered,FirstRecord_PK,LastRecord_PK)
VALUES
    (0,0,GETDATE(),@FirstRecord_PK,@LastRecord_PK)

USE Users
WHILE EXISTS (SELECT TOP 1 Users_PK FROM Users.dbo.Users WHERE Users_PK &gt; @LastRecord_PK)
BEGIN

    -- Set the values for the counters, the first and last record.
    SELECT @FirstRecord_PK = @FirstRecord_PK + 1,
            @LastRecord_PK = @FirstRecord_PK + @Increment

    ------------------------------------------------------------------------------
    --- In here you place your code to process your records ----------------------
    --- Make sure to add the range for the PKs as shown below --------------------
    ------------------------------------------------------------------------------           
        --- Insert records into the target table
    INSERT INTO Users..Users(
           [Users_PK],
           [First],
           [Last],
           [StreetNumber],
           [StreetName],
           [City],
           [State],
           [Zip],
           [Phone],
           [Date_Uploaded]
           )
    SELECT
           [Users_PK],  
           [First],
           [Last],
           [StreetNumber],
           [StreetName],
           [City],
           [State],
           [Zip],
           [Phone],
           SUBSTRING([Date_Uploaded],1,8) AS Date_Uploaded
    FROM
        Users.dbo.Users Ue
    WHERE Ue.Users_PK BETWEEN @FirstRecord_PK AND @LastRecord_PK
    ORDER BY [Ue].[Last],[Ue].[First]
    ------------------------------------------------------------------------------

    -- Gets the number of rows that were processed per batch        
    SET @iCurrentRowCount = @@ROWCOUNT
    -- Gets the number of rows for all batches that were processed so far                                           
    SET @iRowCount = @iRowcount + @iCurrentRowCount
    -- Gets the number of rows per batch
    SET @BatchRowCount = @iCurrentRowCount
    -- Gets the first PK that was processed per batch
    SET @FirstRec_PK = @FirstRecord_PK
    -- Gets the next first PK for the next batch
    SET @FirstRecord_PK = @LastRecord_PK
    PRINT @iRowCount
    
    --- Keep track of each process
    INSERT INTO Utilities.dbo.BatchProcess
        (RowCountTemp,BatchRowCount,DateTimeEntered,FirstRecord_PK,LastRecord_PK)
    VALUES
        (@iRowCount,@BatchRowCount,GETDATE(),@FirstRec_PK,@LastRecord_PK)

END
END TRY

-- Let's check for any error and if so let's log the error.
BEGIN CATCH

INSERT INTO [Utilities].[dbo].Errors
    (
        Error_No,ErrorMessage,ErrorLine,Date_Time
    )
VALUES
    (
    ERROR_NUMBER(),ERROR_MESSAGE(),ERROR_LINE(),GETDATE()
    )

SELECT
    ERROR_NUMBER()      AS ErrorNumber ,
    ERROR_SEVERITY()    AS ErrorSeverity ,
    ERROR_STATE()       AS ErrorState ,
    ERROR_PROCEDURE()   AS ErrorProcedure ,
    ERROR_LINE()        AS ErrorLine ,
    ERROR_MESSAGE()     AS ErrorMessage;
END CATCH;

</pre>

<p>&nbsp;</p>

<p>Whenever the above code is running you can run the below code
and see the status of the process:</p>

<pre class="brush: sql;">
SELECT * FROM Utilities.dbo.BatchProcess ORDER BY DateTimeEntered DESC
</pre>

<p><br />
 In the below image the time difference between rows 7 and 8 was 8
seconds, and in rows 1 to 2 it was 7 seconds, and so far 6,957,786
records were processed, and that batch was 994804 records. You can
see the range of PK that was processed as well.</p>

<p>&nbsp;</p>

<p><img src="/media/45803/table1_499x160.jpg"  width="499"  height="160" alt="Table1" title="Process large amount of data in ms sql server"/></p>

<p>&nbsp;</p>

<p>Looking further towards the end of this process, then the
difference between rows 242 to 243 is 8 seconds as well. As you see
you can have a very good estimate of the time for the entire
process.</p>

<p>&nbsp;</p>

<p><img src="/media/45808/table2_489x110.jpg"  width="489"  height="110" alt="Table 2" title="Moving large scale of data in ms sql database"/></p>

<div><span></span></div>
]]></description></item><item><title>Welcome</title><link>http://www.dfarber.com/computer-consulting-blog/welcome.aspx</link><pubDate>Fri, 11 Dec 2009 14:21:47 GMT</pubDate><guid>http://www.dfarber.com/computer-consulting-blog/welcome.aspx</guid><description><![CDATA[ 
<p><span>Welome to our blog. Check our articles when you have a
chance.<br />
 We are a Custom Software Development company and will be glad to
assist you with your office automation needs.</span></p>

<p>&nbsp;</p>

<p>We provide office automation where we convert your manual
procedures into a long term software solution where you can grow
withit without the need to re-write it. We also provide affordable
ERP and CRM solutions.<br />
<br />
 We also provide MS SQL Database Management and optimize your
database performance.</p>
]]></description></item></channel></rss>

