Monday, July 23, 2018

Extended events (XE): Create a simple XE

When someone says they still use SQL Profiler.
Image is taken from here
An ancient, dark place, a source of many legends and histories. Although some people have been able to get out alive, others are still stuck there. The locals refer to it as "The nosy one", but we know it by another name: SQL Server Profiler.

SQL Server introduced Extended Events in the 2008 version. They were meant to replace SQL Server Profiler traces (which will be deprecated in the near future) and provide a lightweight, customizable event collection platform, so you should be using them now.

What is an Extended Event?


According to the official Microsoft documentation, Extended Events is a highly scalable and configurable architecture that lets you identify and troubleshoot any performance issue. It consists of:

Packages: Contain the objects required to collect and process data.
Sessions: Processes executed to collect the data.
Events: The information to be collected and later analyzed.
Actions: Additional fields, common to all events, that can be captured alongside an event.
Predicates: Filters applied when capturing data.
Targets: Where the collection results are stored.
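
If you want to explore which events, actions, and targets exist on your instance (and read their descriptions), you can query the Extended Events metadata DMVs. A minimal sketch, filtered here to objects whose name contains "login"; remove the filter to browse everything:

-- List the XE objects available on the instance, with their package and description
SELECT p.name AS package_name,
    o.object_type, -- event, action, target, pred_source, ...
    o.name AS object_name,
    o.description
FROM sys.dm_xe_packages AS p
JOIN sys.dm_xe_objects AS o
    ON o.package_guid = p.guid
WHERE o.object_type IN ('event', 'action', 'target')
  AND o.name LIKE '%login%'
ORDER BY o.object_type, p.name, o.name;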

We will walk through a very simple example so you can see how it works and can start using it with confidence.
We will simply collect all user sessions from our instance, along with the SQL text if it exists.


Using the wizard



Just open SSMS and go to Management > Extended Events > Sessions > New Session Wizard.


Specify the name you want for your session; you can also enable the session to start at server startup if you want.



You can select a template for a predefined set of events to achieve common tasks, but for this example, we will not use a template to keep it simple.



In the event library, search for and select the login event. In the description we can see when this event is fired: "Occurs when a successful connection is made to the server. This event is fired for a new connection or when connections are reused from a connection pool."



Then, in Actions, select the data we want to capture; for this example: database_name, session_id, sql_text, and username.



In the next window, you can add predicates to filter the information you collect, to avoid over-collecting data and ending up with huge logs; for this example we will not apply any filter.



Then you select the target, that is, where you want to save your data. For ad-hoc or quick traces you can use the ring buffer, and for large collections use the save-to-file option.
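
If you go with the file target, you can later read the .xel files directly from T-SQL with sys.fn_xe_file_target_read_file. A minimal sketch; the path and file name pattern are assumptions (by default the files land in the instance's \Log folder, named after the session), so adjust them to your environment:

-- Read the raw events written by an event_file target; event_data is XML
SELECT CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'C:\XELogs\User Sessions*.xel', NULL, NULL, NULL);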



Review the options you have configured; you can also script the session if you want to reuse it.


You can start the session as soon as the wizard ends, and also watch the data being collected.



Using T-SQL


We can achieve the same results by running the following T-SQL:


--Event Session
CREATE EVENT SESSION [User Sessions] ON SERVER


--Events to track (Extended Events object names are case sensitive, so use lowercase)
ADD EVENT sqlserver.login (
ACTION(sqlserver.database_name, 
  sqlserver.session_id, 
  sqlserver.sql_text, 
  sqlserver.username
  )

-- Uncomment the WHERE clause if you want to filter events
--WHERE ([sqlserver].[nt_user] = N'domain\user_to_filter')
)
 
 
-- TARGET to use, 
-- just choose Ring Buffer or Filename and comment the other


-- Ring Buffer
ADD TARGET package0.ring_buffer
WITH (
  MAX_MEMORY = 4096 KB
  ,EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
  ,MAX_DISPATCH_LATENCY = 30 SECONDS
  ,MAX_EVENT_SIZE = 0 KB
  ,MEMORY_PARTITION_MODE = NONE
  ,TRACK_CAUSALITY = ON
  ,STARTUP_STATE = OFF
  )

-- Filename
/*
ADD TARGET package0.event_file(SET filename = N'User Sessions')
WITH (
  MAX_MEMORY = 4096 KB
  ,EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
  ,MAX_DISPATCH_LATENCY = 30 SECONDS
  ,MAX_EVENT_SIZE = 0 KB
  ,MEMORY_PARTITION_MODE = NONE
  ,TRACK_CAUSALITY = ON
  ,STARTUP_STATE = OFF
  )
*/
  
  
-- To start the session
ALTER EVENT SESSION [User Sessions] ON SERVER STATE = START;
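
Later, when you no longer need to collect data, you can stop the session, and drop it if you don't plan to reuse it:

-- To stop the session
ALTER EVENT SESSION [User Sessions] ON SERVER STATE = STOP;

-- To remove the session definition completely
DROP EVENT SESSION [User Sessions] ON SERVER;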

To watch the data


Just right-click the newly created session and select the Watch Live Data option.



You can see some events already being recorded, corresponding to the current session (assuming you didn't apply a filter).



To test it, we just open a new window and connect with a different user.



We can now see the record in our session. If there is an associated query, we will be able to see it as well.
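
If you prefer T-SQL over the Watch Live Data viewer, you can also query the ring buffer target directly. A minimal sketch, assuming the session name and actions defined above:

-- Read the ring_buffer target of the session and shred the XML into columns
DECLARE @target_data xml;

SELECT @target_data = CAST(t.target_data AS xml)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = N'User Sessions'
  AND t.target_name = N'ring_buffer';

SELECT x.ev.value('(@timestamp)[1]', 'datetime2') AS event_time,
    x.ev.value('(action[@name="database_name"]/value)[1]', 'nvarchar(128)') AS database_name,
    x.ev.value('(action[@name="session_id"]/value)[1]', 'int') AS session_id,
    x.ev.value('(action[@name="username"]/value)[1]', 'nvarchar(128)') AS username,
    x.ev.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text
FROM @target_data.nodes('//RingBufferTarget/event') AS x(ev);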



This very basic example sets the stage for more complex examples and the cool things you can do using Extended Events.


Sources:

https://docs.microsoft.com/en-us/sql/relational-databases/extended-events/extended-events?view=sql-server-2017


Tuesday, July 17, 2018

PowerBI: Implement a What If parameter

Our Guinea Pig Dashboard
PowerBI is focused on creating interactive, analytical reports from many different data sources, with the ability to access them via mobile devices.

One of the cool options that make your reports interactive is What If parameters: they allow you to visualize how your data could or should behave under different scenarios.
In this post, I will show you a super simple way to use a What If parameter.

Note: This test was done using PowerBI desktop and the Import option in the data source selection.



We will use a simple dashboard representing sales by region in a specific year, like this one:



Let's say we want to plan next year's sales and add a line representing next year's target by region; for this example, just a value between 0 and 200% of this year's sales.

Adding the What If Parameter


Click on the Modeling Tab and select the New Parameter option.



A new window will open; here you configure the values for your parameter.
For this example, we configure a decimal value between 0 and 2.
Leave the Add slicer option checked to add the value selection box to your dashboard.


The parameter visualization is added, but as a multiple select option.



We change the type of slicer to Single value (you can select another option if you want).




Adding a new measure


We will demonstrate how the newly created parameter works using a measure that multiplies the year's sales by the value we select in the slicer.
Of course, you can use the parameter in more complex formulas that modify data values more deeply and are not as obvious as our example, but for now, we will keep it simple.

We proceed to create a new measure on our data table: select the header of the table and choose the New Measure option.



Then enter the formula you want to use; for our example, the sum of the year's sales multiplied by our parameter, to simulate next year's sales.



You can see how the measure is visualized like any other column in our table.



Modify your visualizations


Now, just modify the dashboard by adding the new column to any visualization you want and save it.
Then you can change your slicer and the column will change its value in real time.

A simple analytical report with a few clicks :)


Sources:

https://docs.microsoft.com/en-us/power-bi/desktop-what-if

https://docs.microsoft.com/en-us/power-bi/desktop-use-directquery

Tuesday, July 10, 2018

Display SELECT query or table as HTML

https://bertwagner.com/2018/07/03/code-youd-hate-to-live-without-t-sql-tuesday-104-invitation/
This post is part of #tsql2sday event :)
In my current job I support a lot of SQL Servers; they come in all flavors and sizes (old, new, patched, unpatched, big, small, dedicated, shared, built by our team, built by others... etc.). In this ocean of different possible configurations, having a centralized monitoring solution is sometimes not possible for 100% of the servers, for different reasons (unsupported features on older versions, servers located in a different domain or datacenter, some weird security restrictions).

Even though we already have a custom inventory and monitoring solution in place (BTW, built using C#, ASP.NET, SQL Server, and PowerShell), a couple of rogue servers need their own special treatment, and in most cases this has to do with some kind of monitoring.

So, for these servers, the solution is to implement local jobs that monitor different kinds of stuff and send the monitoring and usage results via email.
We could perfectly well send the info in plain text, but reading that in an email, especially when your boss is included in the distribution list, is just awful, so we had to format the data first to make it look better...

That's why I created sp_TabletoHTML, a Stored Procedure to convert any table or SELECT query to an HTML table, so you just have to paste the output in your email body.

Before this stored procedure, we had to do the HTML conversion based on each table's columns, and after adapting the same code 3 or 4 times, I got tired and ended up writing this stored procedure.

How does it work?


  • Create the stored procedure and remember where you left it (can be the master database, it really doesn't matter as long as you can access it properly).
  • Then you pass in @stTable (yes, Unicode varchar, in case you have weird characters in your object names) the name of the table or view containing the data, using the schema.name convention; you can also pass a SELECT query, giving you the ability to format or clean up your data before converting it to HTML.
  • You can specify whether to include column names (included by default), as well as which predefined style you want, or no style at all (or leave the parameters at their defaults).
  • Execute the stored procedure and use the output string as your HTML <table> result.
  • Enjoy!

Sample execution:

DECLARE @SQLSentence nvarchar(MAX), @st nvarchar(MAX)

SET @SQLSentence = 'SELECT 
name,
state_desc,
create_date,
collation_name 
FROM sys.databases'

EXEC sp_TabletoHTML @stTable = @SQLSentence,
 @TableStyle = 3,
 @RawTableStyle = @st OUTPUT

SELECT @st -- the HTML <table> string, ready to paste into an email body

And you will obtain a beautiful looking HTML table like this one:

Formats can vary...

What is the logic involved?


Just tweaking the FOR XML functionality a little bit, plus a lot of column reformatting.
This is the core of the formatting: all the columns are aliased with the same name, TD, which translates into <td></td> elements, and each row becomes an element named <tr></tr>:


SELECT [COL1] as TD, [COL2] as TD...
FROM ##rowstablePreHTML 
FOR XML RAW('TR'), ELEMENTS
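
To make the idea concrete, here is a minimal, self-contained sketch of that trick (not the actual procedure code, just the same FOR XML pattern applied to sys.databases): every column shares the alias TD, FOR XML RAW('TR'), ELEMENTS emits one TR element per row with one TD element per column, and you only have to wrap the result in a table tag.

DECLARE @rows nvarchar(max), @html nvarchar(max);

-- Every column shares the alias TD, each row becomes a TR element
SET @rows = (SELECT name AS TD, state_desc AS TD
             FROM sys.databases
             FOR XML RAW('TR'), ELEMENTS);

-- Add the surrounding table tag and a header row
SET @html = N'<table>'
          + N'<tr><th>name</th><th>state_desc</th></tr>'
          + @rows
          + N'</table>';

SELECT @html AS html_table;

sp_TabletoHTML essentially builds this dynamically for whatever table or query you pass in, plus the styling, header handling, and validations described below.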


Since some of the queries are built dynamically, I use sp_executesql, which could be a security risk for untrusted sources.

For internal use, we did not check for malicious code, because... I trust my own code. But before releasing it on GitHub I added a very basic malicious-code check (it just looks for keywords that have nothing to do with table results) and raises an error for any occurrence encountered. As a recommendation, always validate your queries first.

Null values are not converted to an empty string (because of the way XML handles nulls), so remove null values from your data before using it.
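
A simple workaround, sketched below with a hypothetical query, is to remove the nulls in the SELECT you pass in, for example with ISNULL:

-- collation_name can be NULL, so replace it before the HTML conversion
DECLARE @SQLSentence nvarchar(MAX) = N'SELECT name,
       ISNULL(collation_name, '''') AS collation_name
FROM sys.databases';
-- then pass @SQLSentence to sp_TabletoHTML as in the sample execution above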

Some special data types like geography, timestamp, XML, and image are not supported; if you try to use them, an error will be raised, so remove these columns before using it.

Where is the code?

Fortunately for you, it is free!

I offer this stored procedure as open source on GitHub; you can check it here.
The link is also in the Free Tools section of this blog.

The full documentation is also available on the Github page, so you can check the parameter usage and some other examples there!

Also, if any of you can improve this SP, please do; you are more than welcome!

Tuesday, July 3, 2018

Saving disk space by choosing the correct data type

"Database is almost 4 TB in size, user is complaining on long times for DEV data refresh and high storage costs for database and backups. Full backups are taking a long time to complete, even with compression enabled. 
After a quick check, we discover tables that stores data from 15 years ago, without any purge or historical archive process implemented, most of the data stored is only with logging purposes, using incorrect data types and storing a lot of nulls. After changing data types and implementing data purge processes, database size was reduced to almost 90 GB, drastically improving their application performance and reducing storage costs and refresh times."
- true story... or not -


Designing our databases is something that must be done carefully, knowing how our application will work. Choosing the right data types, especially for databases that will contain high volumes of data, can drastically improve our application's response times and reduce the storage required.

We will compare storage usage for the most common data types. For displaying the table size I'm using my database dashboard report.

Dates


Always analyze whether you really need to store the time portion at all and, if you do, the precision your application needs to work correctly.

We create different tables with each datatype we want to test: datetime, smalldatetime, datetime2, datetime2(0), date.


CREATE TABLE dbo.testDatetime
(
    colDate datetime NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testSmallDatetime
(
    colDate smalldatetime NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testDatetime2
(
    colDate datetime2 NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testDatetime20
(
    colDate datetime2(0) NOT NULL
);
----------------------------------------
CREATE TABLE dbo.testDate
(
    colDate date NOT NULL
);




Then we proceed to insert the same info into each table (500,000 rows, to have a good sample space):

INSERT INTO testDatetime VALUES(getdate())
INSERT INTO testSmallDatetime VALUES(getdate())
INSERT INTO testDatetime2 VALUES(getdate())
INSERT INTO testDatetime20 VALUES(getdate())
INSERT INTO testDate VALUES(getdate())

GO 500000

After the insert ends, we can see the different storage usages, as well as the data that is stored on each table.

Datetime needs around 20% more space than Date
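
If you don't have a dashboard report at hand, a quick way to compare the storage used by the test tables is to query sys.dm_db_partition_stats (or run sp_spaceused per table):

-- Rows and KB used by each of the test tables created above
SELECT OBJECT_NAME(object_id) AS table_name,
    SUM(row_count) AS total_rows,
    SUM(used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats
WHERE OBJECT_NAME(object_id) LIKE 'test%'
GROUP BY object_id
ORDER BY used_kb DESC;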



Strings


We need to check the maximum possible length of the data to be stored, whether we need to store Unicode characters or not, and whether we require a fixed or variable length.

We create different tables with each datatype we want to test: Char, VarChar, NChar, NVarChar.


CREATE TABLE dbo.testChar
(
    colString char(20) NOT NULL
);
-------------------------------
CREATE TABLE dbo.testVarChar
(
    colString varchar(20) NOT NULL
);
-------------------------------
CREATE TABLE dbo.testNChar
(
    colString nchar(20) NOT NULL
);
-------------------------------
CREATE TABLE dbo.testNVarChar
(
    colString nvarchar(20) NOT NULL
);
 

Again, we insert the same info into each table (500,000 rows as well).

DECLARE @i int
SET @i =RAND()*20

INSERT INTO testChar VALUES(REPLICATE ('$' ,@i ) )
INSERT INTO testVarChar VALUES(REPLICATE ('$' ,@i ) )
INSERT INTO testNChar VALUES(REPLICATE ('$' ,@i ) )
INSERT INTO testNVarChar VALUES(REPLICATE ('$' ,@i ) )

GO 500000

We can see which data types use more storage space, and that storing Unicode vs. non-Unicode characters makes a big difference, so this can be one of the first things to check at the design level.

A good datatype for strings could make a big difference!

'X' is too mainstream; besides, everybody loves '$'
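
You can also see the per-value difference directly with DATALENGTH: the n types store 2 bytes per character, and the fixed-length types pad to the declared length. A quick sketch:

-- Bytes actually stored for the same 5-character string in each string type
SELECT DATALENGTH(CAST('$$$$$' AS char(20))) AS char20_bytes,       -- 20 (padded)
    DATALENGTH(CAST('$$$$$' AS varchar(20))) AS varchar20_bytes,    -- 5
    DATALENGTH(CAST(N'$$$$$' AS nchar(20))) AS nchar20_bytes,       -- 40 (padded, 2 bytes per char)
    DATALENGTH(CAST(N'$$$$$' AS nvarchar(20))) AS nvarchar20_bytes; -- 10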

Numbers


For these data types the requirements are more straightforward, so the only thing I will do is show you the different storage usages.

Again, we create one table for each data type to check (since numbers have a lot of parameters and various subtypes, we will only check the most common ones with their defaults): int, decimal, numeric, money (exact types) and float, real (approximate types).


CREATE TABLE dbo.testInt
(
    colNumber int NOT NULL
);
-------------------------------
CREATE TABLE dbo.testDecimal
(
    colNumber decimal NOT NULL
);
-------------------------------
CREATE TABLE dbo.testNumeric
(
    colNumber Numeric NOT NULL
);
-------------------------------
CREATE TABLE dbo.testMoney
(
    colNumber money NOT NULL
);
-------------------------------
CREATE TABLE dbo.testFloat
(
    colNumber float NOT NULL
);
-------------------------------
CREATE TABLE dbo.testReal
(
    colNumber real NOT NULL
);
 
 

We insert the same info into each table (500,000 rows as well).

DECLARE @i int
SET @i =RAND()*1000

INSERT INTO testInt VALUES(@i/3.141592)
INSERT INTO testDecimal VALUES(@i/3.141592)
INSERT INTO testNumeric VALUES(@i/3.141592)
INSERT INTO testMoney VALUES(@i/3.141592)
INSERT INTO testFloat VALUES(@i/3.141592)
INSERT INTO testReal VALUES(@i/3.141592)

GO 500000

And again, with the default parameters for each data type, we can see the storage used by each one. It is always important to know your data: investigate the numbers to be stored in your database so you can optimize space usage.

By default, the numeric and decimal data types use the most space; this can change based on the parameters specified.

Note that the decimal and numeric defaults have no decimal digits and values were approximated.
Make your own adjustments for specific tests.
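
You can also check the per-value storage of each numeric type with DATALENGTH; with the defaults used above, decimal and numeric (both default to precision 18, scale 0) take the most bytes per value:

-- Bytes stored per value for each numeric type with its default parameters
SELECT DATALENGTH(CAST(100 AS int)) AS int_bytes,          -- 4
    DATALENGTH(CAST(100 AS decimal)) AS decimal_bytes,     -- 9 (default precision 18)
    DATALENGTH(CAST(100 AS numeric)) AS numeric_bytes,     -- 9
    DATALENGTH(CAST(100 AS money)) AS money_bytes,         -- 8
    DATALENGTH(CAST(100 AS float)) AS float_bytes,         -- 8
    DATALENGTH(CAST(100 AS real)) AS real_bytes;           -- 4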

You can check the different data types SQL Server offers, with their storage requirements as well as their limitations, at the link below, so you have a better understanding of how each one works. This will help you design more efficient applications and reduce your future administration time, or maybe optimize databases already in place just by adjusting column types :)

Please note that these are only general guidelines and I have used test data; always make sure to perform your own tests at the development stage and make your own comparisons.

Source:

https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-2017