Monday, July 15, 2019

Refreshing dependent views after a DDL change

When you are in charge of databases with tons of objects, replicating a small change on one object can be difficult when many other objects depend on it.
In this post, we will focus on how to propagate column changes to dependent views.

For this type of task, there are 2 scenarios:


  • You explicitly defined column names on the dependent views, or defined them WITH SCHEMABINDING: in this case, you need to perform a find/replace task, which can be difficult to achieve since there are no native tools to do it (as far as I know). You can use free third-party tools like RedGate SQL Search or ApexSQL SQL Search; both have their pros and cons, so take a look at each one and find out which suits your needs best.

  • You implicitly defined column names on the dependent views: the famous SELECT *. Even when this is not a best practice, replicating changes for this case is easy, and we will show you how to do it.


Setting up our example

We will use the WideWorldImporters test database for this example. We create two simple views (one depending on the other), as follows:


USE WideWorldImporters;
GO

CREATE VIEW dbo.[vi_invoices_received_by]
AS
SELECT 
    ConfirmedReceivedBy, 
    COUNT(InvoiceID) as [Num],
    CustomerID
FROM Sales.Invoices
GROUP BY ConfirmedReceivedBy, CustomerID;
GO


CREATE VIEW dbo.[vi_additional_fields_received_by]
AS 
SELECT RB.*,
C.CustomerName 
FROM dbo.vi_invoices_received_by RB
INNER JOIN sales.Customers C
ON C.CustomerID = RB.CustomerID;
GO

We proceed to execute the second view; you can see that the column labels coming from the first view are not very descriptive and can be improved:



So we proceed to execute an ALTER VIEW over the first view:


ALTER VIEW dbo.[vi_invoices_received_by]
AS
SELECT 
    ConfirmedReceivedBy as [Received by], 
    COUNT(InvoiceID) as [# of Invoices],
    CustomerID
FROM Sales.Invoices
GROUP BY ConfirmedReceivedBy, CustomerID;
GO

So we should now be able to see the updated column names on the second view, right?
But if you execute the view again, you will obtain the same results as before:


What happened?

Refreshing your dependent views

SQL Server provides the sp_refreshview procedure to update the metadata of non-schema-bound views, so you can use it to pick up schema changes without having to recreate the objects.

For our case, you just have to execute the following code against the second view:


EXEC sp_refreshview 'dbo.[vi_additional_fields_received_by]'; 
GO

If we execute the second view again, we can see that the labels have been updated now:



And that is all, you just have to execute the procedure for each dependent view.
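
If you have many dependent views and don't want to run the procedure one by one, a minimal sketch like the following (an ad-hoc helper I use for illustration, not a built-in feature) loops over every non-schema-bound view in the current database and refreshes each one:

DECLARE @view nvarchar(300);

DECLARE view_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(SCHEMA_NAME(v.schema_id)) + N'.' + QUOTENAME(v.name)
    FROM sys.views v
    WHERE OBJECTPROPERTY(v.object_id, 'IsSchemaBound') = 0;

OPEN view_cursor;
FETCH NEXT FROM view_cursor INTO @view;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- refresh the metadata of the current view
    EXEC sp_refreshview @viewname = @view;
    FETCH NEXT FROM view_cursor INTO @view;
END

CLOSE view_cursor;
DEALLOCATE view_cursor;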

What if I don't know the dependent objects?

There are several ways to determine which objects depend on the one you are modifying; I will show you two of them.


  • Via T-SQL: execute the sp_depends procedure. It returns two result sets: the first one lists objects on which our current object depends; the second one lists objects that depend on the current object. Usage is shown in the example after this list.



  • Via SSMS: just right-click on the object and select View Dependencies.
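
For reference, here is a small usage sketch of sp_depends and of sys.dm_sql_referencing_entities, a more current alternative, using the view from our example:

-- Deprecated but still handy: dependencies of/on our view
EXEC sp_depends @objname = N'dbo.vi_invoices_received_by';

-- Newer alternative: objects that reference (depend on) our view
SELECT referencing_schema_name, referencing_entity_name
FROM sys.dm_sql_referencing_entities(N'dbo.vi_invoices_received_by', N'OBJECT');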




Thursday, July 4, 2019

SQL Engine basics: Caching data

The SQL Server engine does an excellent job of optimizing query performance, and it has different methods to achieve this goal; one of these methods is data caching.

Data caching consists of putting 8KB pages from storage into memory as they are needed, so the database engine can access the required data again much faster.

At a glance, the way SQL Engine caches the data is the following:


When a request is made, the engine checks if the data is already cached in memory (the buffer pool); if the data is already available, it is returned. If the data is not available, it is retrieved from storage and put into memory, until the buffer pool is full.

How long the data stays in the buffer pool depends on various aspects: the memory allocated to SQL Server, the size of your database, the workload, the type of queries executed, In-Memory OLTP, etc.
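
A quick way to get a feel for how long pages are staying in memory is the Page life expectancy counter; this is just a sketch of how to read it from the DMOs (the value is in seconds):

SELECT [object_name], counter_name, cntr_value AS [seconds]
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
  AND [object_name] LIKE N'%Buffer Manager%';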

When the buffer pool does not have enough space to allocate the desired memory, a data spill occurs: additional data is stored on tempdb, usually caused by inaccurate memory grants. Fortunately, SQL Server 2019 introduces memory grant feedback, which can help to reduce this problem; you can read more about it here.
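
If you suspect memory grant problems, a simple sketch to look at the grants of the currently running requests is the sys.dm_exec_query_memory_grants DMO (all sizes are in KB):

SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       ideal_memory_kb,
       wait_time_ms   -- time spent waiting for the grant
FROM sys.dm_exec_query_memory_grants;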

Since SQL Server 2014, you can also configure a buffer pool extension; as the term suggests, it consists of an extension of the memory by using faster storage (usually SSD or flash memory). This is an advanced feature and you can learn more about it in the Microsoft documentation.
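
Just as a sketch (the file path and size below are placeholders you would adjust to your environment), enabling and disabling the buffer pool extension looks like this:

-- Enable a buffer pool extension file on a fast local drive
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = N'F:\SSDCACHE\BufferPoolExt.BPE', SIZE = 16 GB);

-- And to turn it off again
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION OFF;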

Related T-SQL Commands


Now that you have an idea of what data caching is and how it works, let us see it in action.

We will use the WideWorldImporters test database for this example. First we run some queries; it does not matter what you execute for this example, it can be any SELECT statement:

DECLARE  @RawValue decimal(10,2) = 25000

SELECT SUM(UnitPrice * Quantity) as RawValue
FROM Sales.InvoiceLines
GROUP BY InvoiceID
HAVING SUM(UnitPrice * Quantity) > @RawValue;

SET @RawValue = RAND()*1500

SELECT InvoiceID, InvoiceLineID, UnitPrice,Quantity
FROM Sales.InvoiceLines 
WHERE TaxAmount > @RawValue
OPTION(FAST 100);

After that, we use the DMO sys.dm_os_buffer_descriptors to check what data is in memory; after making the query more readable, we can use it like this:

--DETAILS BY DATABASE:
SELECT 
    IIF(database_id = 32767,'ResourceDB',DB_NAME(database_id)) AS [Database],
    file_id, -- This File is at database level, it depends on the database you are executing the query
    page_id,
    page_level,
    allocation_unit_id, -- can be joined with sys.allocation_units (database level DMO)
    page_type,
    row_count,
    read_microsec
FROM sys.dm_os_buffer_descriptors BD
ORDER BY [Database], row_count DESC;

Depending on the workload, we obtain something like this:


If you want something more compact, you can execute the query grouping it by database:

-- TO OBTAIN TOTAL BY DATABASE:
SELECT 
    IIF(database_id = 32767,'ResourceDB',DB_NAME(database_id)) AS [Database],
    COUNT(row_count) AS cached_pages
FROM sys.dm_os_buffer_descriptors 
GROUP BY database_id
ORDER BY cached_pages DESC;

We obtain the results grouped by database:



In order to release the data from the cache, you run the DBCC DROPCLEANBUFFERS command:

DBCC DROPCLEANBUFFERS;

NOTE: Avoid running this command in production environments, since all the data will be released from memory and must be pulled from storage again, incurring additional IO usage.
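
If you do need to clear the cache in a test environment, it is common to run CHECKPOINT first, so dirty pages are written to disk and become clean pages that DBCC DROPCLEANBUFFERS can actually remove:

CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO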


If you have the buffer pool extension enabled and want to check its options, or if you want to determine whether the instance has it, use the sys.dm_os_buffer_pool_extension_configuration DMO as follows:

-- Check if you have a buffer pool extension enabled:
SELECT * FROM sys.dm_os_buffer_pool_extension_configuration;

The results are something like this (I don't have a buffer pool extension enabled on my machine):



If you want the script for the commands we discussed, you can download it from my GitHub repository.

Tuesday, May 7, 2019

Embed Azure Data Studio Notebooks in your website

Notebooks are a functionality available in Azure Data Studio that allows you to create and share documents that may contain text, code, images, and query results. These documents are helpful for sharing database insights and creating runbooks that you can distribute easily.

Are you new to notebooks? Don't know what they are used for? Want to know how to create your first notebook? Then you can get started with ADS notebooks by checking my article for MSSQLTips.com here.

Once you have created your first notebooks and shared them among your team, maybe you want to share them on your website or blog for public access.
Even when you can share the file for download, you can also embed it in the HTML code.

In this post, I will show you how to do it.

What do you need?


We will use an online HTML converter, nbviewer, provided by the Jupyter website. On that homepage, you just have to provide the link to your .ipynb file (my GitHub Notebook repository for this example).

It looks something like this:


After clicking the Go! button, the next window will show you the notebook rendered:


At this point, you could share this link on your site and the user can click on it to see your notebook contents, but what if you want to show the results directly on your website?

Embedding it into your website


You can use the IFrame HTML tag (reference here); with this tag, you can embed an external URL in your HTML code (just be aware of the security risks of embedding external code in your application).
The code should look like this:


<iframe
width="600px" height="800px" 
src="your nbviewer URL" >
</iframe>

The final result is this:



Not the best possible way ¯\_(ツ)_/¯, but it is something.

Going further

If you want to fine-tune the above results or host them on your own website, you can check the nbviewer GitHub repository so you can use the code locally.

Thursday, April 11, 2019

Creating logins and users in Azure Database

Azure SQL Database is the PaaS solution for SQL Server databases; in a previous post we discussed how to create one.

In this post, I want to show you how you can secure your Azure SQL Database by creating users and segregating their permissions.

When you connect to your Azure Database using SSMS (or another tool), you can see the management options are very limited compared to an On-Premises instance.




If you want to create a login and a database user, you must create them via T-SQL; in this post I will show you how to do it.

Types of logins


Azure SQL Database supports two types of logins: SQL Server logins and Azure Active Directory logins.

In order to create Azure AD logins, you must first set up an AD administrator using the Azure portal; you configure it on the server dashboard, then access the Active Directory admin option, as follows:



Once you set up your AD Admin, you can connect to the Azure database using this account and you can then assign proper access to other AD accounts.

Creating logins and users


As we told you before, you must create users and assign permissions using T-SQL; the basic script for creating users is as follows (this is for SQL logins).

Note that, being PaaS, you connect to only one database, so the USE <database> command is not supported on Azure databases; you must run each T-SQL script connected to the required database manually:

Script #1 for creating the login

/******* Run This code on the MASTER Database ******/

-- Create login,

CREATE LOGIN az_read
 WITH PASSWORD = 'YourStrongP@ssW0rd' 
GO


-- Create user in master database (so the user can connect using ssms or ADS)
CREATE USER az_read
 FOR LOGIN az_read
 WITH DEFAULT_SCHEMA = dbo
GO

Script #2 - For the user database you want to provide access:

/******* Run This code on the Database you want to give the access ******/

-- The user database where you want to give the access

CREATE USER az_read
 FOR LOGIN az_read
 WITH DEFAULT_SCHEMA = dbo
GO

-- Add user to the database roles you want
EXEC sp_addrolemember N'db_datareader', N'az_read'
GO

Explaining it:

You first need to create the login, and set up your password, following the Azure strong password requirements.
Then, if the user is planning to connect to the instance using SSMS, ADS, or another tool where specifying a default database is not required, you must create the user in the master database (without roles, unless specific access is required).
The next step is to create the user on the database you want to provide access to.
Finally, you assign the roles you want for that particular user.
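
As a side note, sp_addrolemember is marked as deprecated, so if you prefer, an equivalent sketch using ALTER ROLE (same effect as the call in script #2) would be:

-- Run on the user database
ALTER ROLE db_datareader ADD MEMBER az_read;
GO

-- If the user also needs to write data
-- ALTER ROLE db_datawriter ADD MEMBER az_read;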

After that, you can connect with the user, which now has the respective access:



For creating logins from Azure Active Directory, the script changes a little: you must connect to the database using another AD account (the administrator we configured earlier, or another AD user with enough privileges), and then you specify the AD account followed by FROM EXTERNAL PROVIDER.

Once you are connected, the first script changes as follows:


/******* Run This code on the MASTER Database ******/

-- For Azure AD accounts there is no separate CREATE LOGIN step here;
-- creating the user FROM EXTERNAL PROVIDER in the master database
-- is what lets the account connect using SSMS or ADS
CREATE USER [epivaral@galileo.edu] FROM EXTERNAL PROVIDER;
GO

For the user database, the only change to script #2 is that you create the user with FROM EXTERNAL PROVIDER as well (instead of FOR LOGIN); adding the user to the database roles stays the same.
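
A minimal sketch of that second script for the AD account (using the same example account as above) could be:

/**** Run this code on the database you want to give access to ****/

CREATE USER [epivaral@galileo.edu] FROM EXTERNAL PROVIDER;
GO

ALTER ROLE db_datareader ADD MEMBER [epivaral@galileo.edu];
GO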

Contained User


For more secure environments, you can create contained database users; this approach gives you a more portable database, with no need to worry about logins, and it is the recommended way to grant users access to Azure databases.

In order to create a contained user, just connect to the database you want to grant access to, and run the CREATE USER script as follows:


/**** Run the script on the Azure database you want to grant access *****/

CREATE USER contained_user 
WITH PASSWORD = 'YourStrongP@ssW0rd';

GO

After creating the contained user, you can use it by specifying the database you want to connect to in the connection options (for this example, using Azure Data Studio):


You can see in the object explorer that we only have access to the database we connected to, improving security and portability:



You can read more about this in the Microsoft official documentation here.

Wednesday, March 13, 2019

Quick tip: Zoom in Azure Data Studio

If you have finally given Azure Data Studio a try, and you use it on a regular basis, maybe you want to customize it to suit your needs.

Among its many customization options, you can control the text size in the form of zoom. To change it, just use the following keyboard combinations:

  • (Ctrl + = ) For Zoom in.
  • (Ctrl + - ) For Zoom out.
  • (Ctrl + 0) For Zoom reset.
You can see it in action:


Also, as with everything in this tool, you can access this functionality from the command palette (Ctrl + Shift + P) by typing "zoom" (you can fine-control the editor zoom from here):


If you haven't tried it yet, you can download Azure Data Studio here.

Saturday, February 16, 2019

SQL Saturday 828 - T-SQL Basics: Coding for performance

A great experience!
Thanks to all the attendees of my session about T-SQL; for my first time as a speaker at a SQL Saturday, it went well!

As I promised, the presentation and session material is available at the following links:

SQLSaturday #828 site:
(Please evaluate my session if you attended)

https://www.sqlsaturday.com/828/Sessions/Details.aspx?sid=87912


My personal GitHub:

https://github.com/Epivaral/Scripts/tree/master/T-SQL%20Basics%20coding%20for%20performance

Some pictures from the event:



SQL Server local users group board!




Monday, February 11, 2019

Quick tip: Speeding up deletes from SSIS execution log

If you have SQL Server Integration Services installed on your server and you left the default configuration, a table named sysssislog is created in the msdb database; it contains logging entries for the packages executed on that instance.

If you are not careful enough, this table can grow uncontrollably over time and can make subsequent insertions very slow.
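
If you want to check how big the table has become before cleaning it up, a quick sketch is to run sp_spaceused against it on msdb:

USE msdb;
GO
EXEC sp_spaceused N'dbo.sysssislog';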

A proper deletion process must be put in place, so you don't get into situations like this one in your msdb database:



If you are already in this situation, you can use the following T-SQL script to delete records in batches:


DECLARE @date_del datetime,
  @batch_size int = 1000, -- will delete on batches of 1000 records
  @RowsAffected int =1

-- Time to keep in the history, in our case 1 month
SET @date_del= DATEADD(mm,-1,getdate()); 

SET NOCOUNT ON;

WHILE (@RowsAffected >0)
BEGIN
 DELETE TOP(@batch_size) 
 FROM [dbo].[sysssislog]
 WHERE starttime < @date_del;

 SET @RowsAffected = @@ROWCOUNT;

 -- If you want to know rows affected, uncomment this:
 -- PRINT @RowsAffected;
END

SET NOCOUNT OFF;


After that, you can add the same query to your msdb maintenance job to have everything in one place.

Tuesday, February 5, 2019

I am speaking at SQLSaturday Guatemala 2019




I’m very thrilled to announce that I will be participating as a speaker in this year’s SQL Saturday #828 event in Guatemala City!
This will be my first time as a speaker at a SQLSaturday.

The event will take place on February 16 at Universidad Francisco Marroquin, Calle Manuel F. Ayau (6 Calle final), zona 10, Guatemala.

Here are the details of the session I will be presenting (at 3:15 PM CST in the Dev Room):

T-SQL Basics: Coding for performance


It is very common in the IT field for a developer to switch to being a database developer or administrator; even when the programming concepts are the same, the skillset required to code T-SQL is different.
In this session, we will learn some basic tips to improve our code and database performance, from early application planning stages to already deployed applications.

We will also see some demos about:
  • Compatibility level and deprecated features
  • Filtering basics: SARGABLE arguments
  • Covering indexes
  • Indexed views
  • Implicit conversions
  • Memory Grants
  • Joining records with NULL 
  • DMOs to find top resource intensive queries
  • Collation: considerations when working with multiple databases

I will show you execution plans using an excellent tool called Plan Explorer from SentryOne; the best thing is that it is free.
You can download it from here.


As with any SQL Saturday event organized by SQLPass, you can register for free; it takes less than 5 minutes to get in and sign up:

https://www.sqlsaturday.com/828/registernow.aspx


Hoping to see you there!

Wednesday, January 30, 2019

Understanding and working with NULL in SQL Server

Graphic representation of the difference between 0 and NULL
Image taken from 9gag.com
According to database theory, a good RDBMS must implement a marker to indicate "Missing or inapplicable information".

SQL Server implements the lack of value with NULL, which is datatype independent and indicates a missing value, so the logical validations are different; this is better known as Three-Valued Logic, where any predicate can evaluate to True, False, or Unknown.

I see a common error of referring to null as "Null values", but the correct definition of null is "lack of value", so you must refer to it as null, in singular form.

In this post, we will learn how to work with null in SQL Server.

Declaring and assigning NULL


To work with null, the SQL Server engine uses the reserved word NULL to refer to it.
It is datatype independent, so you just assign it to any variable or field using the equals operator =, as you can see in this example:

DECLARE @NVi as int = NULL;
DECLARE @NVc as nvarchar(30) = NULL;
DECLARE @NVv as sql_variant = NULL;

SELECT @NVi, @NVc, @NVv;

If we run the statement, we will obtain these results, the same for each data type, as expected.



For inserting and updating fields with NULL, we do it like in this example:


-- For inserting values
INSERT INTO test1..Table_1 (column_char,column2)
VALUES(NULL, NULL);

-- For updating values
UPDATE test1..Table_1
SET column2 = NULL;

Be careful when working with null: the equals operator = is only used for assignment.

Operations and comparison against NULL


As we stated earlier, any predicate or comparison can evaluate to TRUE, FALSE, or UNKNOWN; when a value is unknown we don't know if it is true or false, so comparing or operating on any value with unknown is also unknown.

For example, the following operations result is NULL in all cases:

--Arithmetic operations
SELECT NULL+5,NULL-3.47, NULL*3.1416, NULL/0; 

SELECT SQRT(NULL), POWER(NULL,2);

--String operations
SELECT 'HELLO ' +NULL + 'WORLD';

SELECT QUOTENAME(NULL);

SELECT LTRIM(NULL);

--Date operations

SELECT DATEADD(m,1,NULL);

SELECT DATEDIFF(m,GETDATE(),NULL);

When comparing to null, we also obtain null, as in the examples below; as you can see, even comparing null to null is unknown, and when we execute the code below, we obtain NO in all cases:


--comparing to 0
IF(0= NULL) OR (0 <> NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO' 

--Comparing to empty string ''
IF(''= NULL) OR (''<> NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO' 

--Even comparing to another null
IF(NULL= NULL) OR (NULL<> NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO' 

So, if we want to check whether a value or column is null or not, what must we do?
SQL Server implements IS NULL and IS NOT NULL to compare against null; usage is as follows:


-- IS NULL usage

SELECT *
FROM test1..Table_1
WHERE column2 IS NULL;


-- IS NOT NULL usage

SELECT *
FROM test1..Table_1
WHERE column_char IS NOT NULL;


-- Using on IF construct

DECLARE @NVi as int = NULL;

IF(@NVi IS NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO';


-- To replace NULL with a default value
-- you can use the ISNULL() function

SELECT ISNULL(@NVi,0);


With these tools, we are ready to work with null in our databases; now let us review some considerations so you don't impact your database performance.

Special considerations for good performance


As a last point, I would like to give you some tips for dealing with NULL.

Prefer IS NULL over ISNULL()


When possible, write predicates using IS NULL instead of converting NULL to a default value with ISNULL(), because applying the function to the column makes the predicate non-SARGable.

Take these two queries as an example: they are equivalent, but the first one performs better than the second:


-- First query uses an index seek :)
SELECT FirstName
      ,MiddleName
      ,LastName     
FROM Person.Person
WHERE MiddleName = N'' 
 OR MiddleName IS NULL;

-- Second query uses an index scan :(
SELECT FirstName
      ,MiddleName
      ,LastName     
FROM Person.Person
WHERE ISNULL(MiddleName,N'')=N'';

These are the execution plans:

First query execution plan, an index seek is used :)

Second query execution plan, an index scan is used :(

We get a warning on the second execution plan
You can see the differences between the two plans, so for this case we prefer to stick to the first query, even if you must write more code.

Be careful with aggregations over nonexistent data


When you perform aggregations, be extra careful with nonexistent data: even when columns do not allow null, aggregating data that does not exist in the table returns null, contrary to what one could expect (a 0 value), as you can see in this example:


-- Even when the TotalDue field does not allow NULL,
-- the SUM over nonexistent rows is NULL, not 0 as one could think

SELECT SUM(TotalDue) as [Total Due]
FROM Sales.SalesOrderHeader
WHERE DueDate > GETDATE();

And the query results:


For those cases, you should wrap the SUM with the ISNULL() function.
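
A minimal sketch of the same query with the aggregate wrapped, so nonexistent data returns 0 instead of null:

SELECT ISNULL(SUM(TotalDue), 0) AS [Total Due]
FROM Sales.SalesOrderHeader
WHERE DueDate > GETDATE();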

As I always recommend: test everything before going live, and use default values and NOT NULL columns when possible, to make your life easier.


Monday, January 21, 2019

MSSQLTips.com Rookie of the year 2018

You need a lot of hard work for success!
2018 was a good year for me in a lot of areas of my life, both personal and professional (the birth of my daughter, my MCSE certification and another MCSA as well, the start of the local PASS group, the growth and recognition of this blog, etc.).

Also, since August 2018, I have been writing technical posts for other sites, one of them being MSSQLTips.com. This is one of the best SQL Server related sites in the world; it contains a lot of technical articles in a "tip" format where you can learn new things or how to fix an issue in your environment.

The new year 2019 has come, with new opportunities to continue growing, to continue learning, and to improve your career. I have also started this year with good news: last week I was awarded "Rookie of the Year 2018" by MSSQLTips. This award is decided by community, peer, and site leader votes, so I am very honored that so many people voted for me. It is a great honor and a big responsibility at the same time; I cannot put my guard down, so I have to continue the hard work, learning and providing high-quality content, not just on that site, but on all the sites I write for.

I hope to continue writing and be able to be nominated for author of the year 2019. As some of you know, the most difficult part of writing is finding good topics to write about, but I want to focus on things I had difficulties with when I started working with SQL Server, so new people starting with databases can have more handy and easy-to-follow material to get started.

I also want to congratulate my friends Pablo Echeverria and Alejandro Cobar for also being recognized on the site; we have a lot of potential here in Guatemala.

If you are interested, you can see the award here:
https://www.mssqltips.com/sqlservertip/5883/mssqltipscom-author-and-rookie-of-2018/

Wednesday, January 9, 2019

Source Control using Azure Data Studio and GitHub

You can maintain and deploy code easily
Source control is a vital part of IT work, since it is used to track changes to software and documents, with two main benefits: having a centralized, up-to-date copy of all your applications and utilities, and being able to roll back to a previous stable version in case of a failed deployment.

This is perfectly applicable to SQL Server, where you want to have an updated copy of your database objects, ready to deploy elsewhere.

There are a lot of source control applications, each with its pros and cons, but personally I like to use GitHub, since it is free to use, and since it was recently acquired by Microsoft, support for other products (SQL Server in this case) is easier.

In this post, I will show you how to implement source control for a database using GitHub and Azure Data Studio (ADS).

Setting up your machine


First of all, you need to have Azure Data Studio installed and running on your machine, and also a GitHub account to be able to upload and maintain your files.

You will need Git to be able to use the source control features in ADS; you can download it from here. After downloading it, proceed to install it on your local machine (the default features are OK).

To verify Git is installed correctly, just open any command prompt, type git and press Enter:



Configuring Azure Data Studio


To be able to use source control on ADS, we have to perform some simple steps:

Open ADS and click on File > Open Folder, locate the folder you want to work in, and click OK (it can be an empty folder for now).
For this example, I am using C:\Repos\SourceControlDB



You can see that now the selected folder appears on the top bar



Once a folder is selected, click on Source Control, located on the left bar



Click on Initialize Repository



And that is all, you are ready to work with source control.


Working with Git

Adding a file to source control


Now we will add an object to our Git folder to see how source control works.
Open any already created database object or a T-SQL query in the ADS IDE



Save it on your Git Folder with a meaningful name



Once saved, you can see the file is added to the source control as an untracked change



Just select the changes you want to commit (in case you have more open files) and then click on Commit on the upper right corner of the source control area



Since the changes have not been staged, a message box will ask to do it automatically; click on Yes



Provide a meaningful commit message and press Enter



Now the file has been added to source control successfully.

Editing the file


We now proceed to do a simple change in the stored procedure



At the moment you save the file, you can notice that a new change is added to source control and modified or added lines are marked on the file



If you click on the change, you can now see the actual changes made to the file



If you are Ok with the changes, commit them using the same option as earlier, and that is the basic Git usage.



We will cover rollback, merge, and some other features in the next post.

Now we will learn how to upload your local Git files to GitHub.


Linking your repository to GitHub the first time


The easiest way is using your GitHub profile, so let's do it...

Create a new repository on your profile with the options you want, and click on Create Repository



In the next step copy the code for pushing an existing Git repository



Open a new command prompt, navigate to your Git folder and paste the code there, one line at a time



In my case, for the first line, the origin already exists so it throws an error, but the first time you run it, it will execute OK; for the second line, after a few moments it will ask you for your username and password



After a few moments (depending on your folder size) the files will be uploaded, and your repository is now on GitHub (along with all the versions you have)



You have successfully linked your Git folder with GitHub, as you can see the file is uploaded




Your GitHub is now linked


Now that we have done the linking, you can update existing files or add new ones, and they will be uploaded to GitHub when you commit the changes and sync them.

We perform another quick change to our file and commit it



Then we click on Sync to synchronize with GitHub



After the process ends, if you check your GitHub repository you can see the new change



Now you have configured your environment to support source control directly to GitHub. In the next post, we will see some other advanced options, like performing the sync automatically.
We will also learn how to pull those files from another place and apply them to another SQL Server instance.