Tuesday, July 3, 2018

Saving disk space by choosing the correct data type

"Database is almost 4 TB in size, user is complaining on long times for DEV data refresh and high storage costs for database and backups. Full backups are taking a long time to complete, even with compression enabled. 
After a quick check, we discover tables that stores data from 15 years ago, without any purge or historical archive process implemented, most of the data stored is only with logging purposes, using incorrect data types and storing a lot of nulls. After changing data types and implementing data purge processes, database size was reduced to almost 90 GB, drastically improving their application performance and reducing storage costs and refresh times."
- true story... or not -


Designing our databases it is something that must be done carefully and by knowing how our application will work and choosing the right data types, specially for databases that will contain high volumes of data, can drastically improve our application response times and storage required.

We will compare storage usage for the most common data types. For displaying the table size I'm using my database dashboard report.

Dates


Always analyze if you really need to store times, or if you need to store time, the precision you need for your application to work correctly.

We create different tables with each datatype we want to test: datetime, smalldatetime, datetime2, datetime2(0), date.


CREATE TABLE dbo.testDatetime
(
    colDate datetime NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testSmallDatetime
(
    colDate smalldatetime NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testDatetime2
(
    colDate datetime2 NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testDatetime20
(
    colDate datetime2(0) NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testDate
(
    colDate date NOT NULL
);




Then we proceed to insert the same info on each table (500,000 rows to have a good sample space)

INSERT INTO testDatetime VALUES(getdate())
INSERT INTO testSmallDatetime VALUES(getdate())
INSERT INTO testDatetime2 VALUES(getdate())
INSERT INTO testDatetime20 VALUES(getdate())
INSERT INTO testDate VALUES(getdate())

GO 500000

After the insert ends, we can see the different storage usages, as well as the data that is stored on each table.

Datetime needs around 20% more space than Date



Strings


We need to check the possible maximum length of the data to be stored, if we require to store or not unicode characters, and if we require a fixed or variable length.

We create different tables with each datatype we want to test: Char, VarChar, NChar, NVarChar.


CREATE TABLE dbo.testChar
(
    colString char(20) NOT NULL
);
-------------------------------
CREATE TABLE dbo.testVarChar
(
    colString varchar(20) NOT NULL
);
-------------------------------
CREATE TABLE dbo.testNChar
(
    colString nchar(20) NOT NULL
);
-------------------------------
CREATE TABLE dbo.testNVarChar
(
    colString nvarchar(20) NOT NULL
);
 

Again we insert same info on each table (500,000 rows also).

DECLARE @i int
SET @i =RAND()*20

INSERT INTO testChar VALUES(REPLICATE ('$' ,@i ) )
INSERT INTO testVarChar VALUES(REPLICATE ('$' ,@i ) )
INSERT INTO testNChar VALUES(REPLICATE ('$' ,@i ) )
INSERT INTO testNVarChar VALUES(REPLICATE ('$' ,@i ) )

GO 500000

We can determine what datatype use more storage space, we can determine that storing unicode vs non-unicode characters make a big difference, so this can be one of the first things to check at design level.

A good datatype for strings could make a big difference!

'X' are too mainstream, besides, everybody loves '$'

Numbers


For this datatype the requirements are more straightforward, so the only thing I do is to show you the different storage usages.

Again we create one table for each data type to check (since for numbers there are a lot of parameters and various subtypes, we will only check the most common with the defaults): Int, Decimal, Numeric, Money (exact types) and Real, Float (approximate types).


CREATE TABLE dbo.testInt
(
    colNumber int NOT NULL
);
-------------------------------
CREATE TABLE dbo.testDecimal
(
    colNumber decimal NOT NULL
);
-------------------------------
CREATE TABLE dbo.testNumeric
(
    colNumber Numeric NOT NULL
);
-------------------------------
CREATE TABLE dbo.testMoney
(
    colNumber money NOT NULL
);
-------------------------------
CREATE TABLE dbo.testFloat
(
    colNumber float NOT NULL
);
-------------------------------
CREATE TABLE dbo.testReal
(
    colNumber real NOT NULL
);
 
 

We insert same info on each table (500,000 rows also).

DECLARE @i int
SET @i =RAND()*1000

INSERT INTO testInt VALUES(@i/3.141592)
INSERT INTO testDecimal VALUES(@i/3.141592)
INSERT INTO testNumeric VALUES(@i/3.141592)
INSERT INTO testMoney VALUES(@i/3.141592)
INSERT INTO testFloat VALUES(@i/3.141592)
INSERT INTO testReal VALUES(@i/3.141592)

GO 500000

And again, with the default parameters for each data type, we can see the storage used by each one, so is always important to know your data, investigate the numbers to be stored in your database so you can optimize space usage.

By default, numeric and decimal datatypes use the most space, this can change based on the parameters specified.

Note that decimal and numeric defaults have no decimal digits
and values were approximated.
Make your own adjustments for specific tests

You can check the different data-types SQL Server offers, with their storage requirements as well their limitations, link is below, so you have a better understanding on how each one works, this will help you to design more efficient applications and reduce your future administration time, or maybe you can optimize databases already in place by just adjusting column types :)

Please note that these are only general guidelines, also I have used test data, always make sure to perform your own tests at development stage and make your own comparisons.

Source:

https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-2017


2 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
    Replies
    1. Hi Charley, Braindumps are not permitted for Microsoft test, hence, i am deleting this comment.

      Delete