How to restore encrypted databases (Cannot find server certificate with thumbprint)

This article applies only to SQL Server 2008 and SQL Server 2008 R2, as some features were removed or improved in later versions.

Problem:

When you restore a backup of an encrypted database on another SQL Server instance, the database is either restored fully encrypted, showing only null values, or the restore fails with the following error:

Msg 33111, Level 16, State 3, Line 1
Cannot find server certificate with thumbprint '0xE11A199C1059C6F1E0223B56581CDCF3F043DFE8'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

To restore successfully on a different server, you need to create a master key on the destination server and then transfer the certificate and the backups, in that order.

Workaround:

First, identify all the objects affected: the certificate, the keys, and the databases. The query below lists the encrypted databases. Notice that the thumbprint is the same as the one in the original error. In this case we have only one certificate.

Use master
GO
SELECT    name,DEK.*
FROM      sys.databases D
JOIN      sys.dm_database_encryption_keys DEK
ON        DEK.database_id = D.database_id
ORDER BY  name

Result of encrypted databases

Second, identify the certificate on the source server by navigating to Master –> Security –> Certificates.

Identify the certificate
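If you prefer a query to the Object Explorer tree, the same information can be read from the catalog views. This is a minimal sketch, assuming it is run on the source server; it matches each encrypted database to the certificate that protects its encryption key by joining on the thumbprint.

```sql
-- Sketch: map each encrypted database to the certificate protecting its key.
-- Run on the source server.
USE master;
GO
SELECT  D.name       AS database_name,
        C.name       AS certificate_name,
        C.thumbprint,
        C.expiry_date
FROM    sys.dm_database_encryption_keys AS DEK
JOIN    sys.databases    AS D ON D.database_id = DEK.database_id
JOIN    sys.certificates AS C ON C.thumbprint  = DEK.encryptor_thumbprint
ORDER BY D.name;
```

The thumbprint column should show the same value as the one reported in Msg 33111.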

Next, create a master key on the destination server. By default SQL Server creates one that is valid for system databases only, so you need to create your own with the following syntax:

CREATE MASTER KEY ENCRYPTION BY PASSWORD ='StrongPassword'
Command(s) completed successfully.
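CREATE MASTER KEY fails if the database already has a master key, so it can help to check first. A small sketch, assuming you are checking the master database on the destination server:

```sql
USE master;
GO
-- A database master key, when present, is listed under this fixed name.
SELECT name, create_date
FROM   sys.symmetric_keys
WHERE  name = N'##MS_DatabaseMasterKey##';
```

If the query returns no rows, there is no database master key in master yet.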

Next, back up the certificate on the source server and create a copy of it on the destination server. The backup must include the private key, protected by a password; otherwise you will get the following error on the destination server: “Msg 15507, Level 16, State 1, Line 1 A key required by this operation appears to be corrupted.”

USE Master
go
BACKUP CERTIFICATE DB_Encrypt_Cert
TO FILE = 'Z:\Backup\DB_Encrypt_Cert.cer'
WITH PRIVATE KEY(
FILE = 'Z:\Backup\DB_Encrypt_Cert.prvk',
ENCRYPTION BY PASSWORD = 'StrongPassword'
)

Create the certificate on the destination server from the backup files you took in the previous step. The name must remain the same, and you will need the private key file and its password. Pay attention to the syntax: the clause changes from ENCRYPTION BY PASSWORD to DECRYPTION BY PASSWORD. The warning is OK; it appears because the original certificate was not set with an expiration date.


CREATE CERTIFICATE DB_Encrypt_Cert
FROM FILE = 'E:\MSSQL\DB_Encrypt_Cert.cer'
WITH PRIVATE KEY(
FILE = 'E:\MSSQL\DB_Encrypt_Cert.prvk',
DECRYPTION BY PASSWORD = 'StrongPassword' -- must match the password used in BACKUP CERTIFICATE
)

Warning: The certificate you created is expired.

Finally, you can restore the database with your normal method and it will complete with no issues.

Encrypted DB restored
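For reference, a restore on the destination server might look like the sketch below. The database name, backup path, and logical and physical file names are all hypothetical; substitute your own. The second query checks the result.

```sql
-- Hypothetical database name, backup path, and file names.
RESTORE DATABASE SalesDB
FROM DISK = N'E:\MSSQL\Backup\SalesDB.bak'
WITH MOVE N'SalesDB'     TO N'E:\MSSQL\Data\SalesDB.mdf',
     MOVE N'SalesDB_log' TO N'E:\MSSQL\Log\SalesDB_log.ldf',
     STATS = 10;
GO
-- encryption_state = 3 means the database is encrypted.
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM   sys.dm_database_encryption_keys;
```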

Useful links

https://sqlblogcasts.com/blogs/sqldbatips/archive/2008/06/24/new-in-sql-2008-transparent-data-encryption-part-ii.aspx

http://www.sqlmatters.com/Articles/Setting%20up%20Transparent%20Data%20Encryption%20(TDE).aspx

Cannot fetch a row from OLE DB provider IBMDADB2 for linked server

We had a very peculiar error and spent months working with Microsoft and IBM to figure out what was wrong. Here is the short summary.

I have SQL Server 2008 R2 SP2 with the DB2 drivers installed, v. 11.64.11.00 (64-bit). I have two instances on the same box, both using the same ODBC setup. One instance is fully functional, but the second instance gives me this result when I try to execute a command on a linked server:

The OLE DB provider IBMDADB2 for linked server reported an error. The provider did not give any information about the error. Msg 7330, Level 16, State 2, Line 1 Cannot fetch a row from OLE DB provider IBMDADB2 for linked server

I had already verified that Allow inprocess is set to 1 for the provider, and I had also assigned the users the “Create global objects” user right.

The error affects only Windows accounts; SQL logins work fine. After months of troubleshooting, we found that the error can be fixed by disabling UAC. For details, refer to https://technet.microsoft.com/en-us/library/cc709691(v=ws.10).aspx

  1. Click Start, and then click Control Panel.
  2. In Control Panel, click User Accounts.
  3. In the User Accounts window, click User Accounts.
  4. In the User Accounts tasks window, click Turn User Account Control on or off.
  5. If UAC is currently configured in Admin Approval Mode, the User Account Control message appears. Click Continue.
  6. Clear the Use User Account Control (UAC) to help protect your computer check box, and then click OK.
  7. Click Restart Now to apply the change right away, or click Restart Later, and then close the User Accounts tasks window.
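After the restart, one way to confirm the fix is to retest the linked server while connected with a Windows account, since that was the failing case. A minimal sketch; DB2LINK is a hypothetical linked server name.

```sql
-- DB2LINK is a hypothetical linked server name; use your own.
EXEC sp_testlinkedserver N'DB2LINK';
GO
-- SYSIBM.SYSDUMMY1 is DB2's built-in one-row table.
SELECT * FROM OPENQUERY(DB2LINK, 'SELECT 1 AS ok FROM SYSIBM.SYSDUMMY1');
```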

I sincerely hope this saves somebody months of troubleshooting. If this is your case, leave a comment.

A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: seconds. Working set, committed, memory utilization

Long story short: you need to grant the SQL Server service account the Lock Pages in Memory user right and set a maximum memory limit for SQL Server.

The problem is that SQL Server stops working but doesn’t attempt to fail over. There are a few scenarios where this error applies, but in most cases SQL Server stops accepting new connections for short periods (about 5 seconds). The problem can be identified in the SQL Error Log by the following entry:

A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 157348, committed (KB): 444456, memory utilization: 35%.
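To look for this entry without opening the log viewer, the error log can be searched from a query window. The sketch below uses sp_readerrorlog, which is an undocumented but widely used procedure, so treat its behavior as an assumption on your version.

```sql
-- 0 = current log file, 1 = SQL Server error log, then the search string.
EXEC sp_readerrorlog 0, 1, N'paged out';
```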

Environment

This error is more visible on virtual servers and on servers with more than 128 GB of RAM. So far it has affected SQL Server 2012 and 2014, Standard and Enterprise Editions.

Root Cause

The problem happens because Windows paged out SQL Server memory. KB 918483 discusses the issue and the resolution.

Resolution

You will need to work with your Enterprise Windows or Wintel team to resolve the paging issue. Refer to “How to troubleshoot this problem” in the “More information” section of the following article: How to reduce paging of buffer pool memory in the 64-bit version of SQL Server https://support.microsoft.com/en-us/kb/918483

Workaround

You can grant the SQL Server service account (in this example, ‘DOMAIN\SA_SQLAccount_P’) the Lock pages in memory user right to avoid the issue. Before you grant the privilege, make sure SQL Server maximum memory is set so that enough memory is left for the OS to work smoothly. This needs to be done on each node of the cluster.
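Setting the maximum memory can be done with sp_configure. The value below is only an illustration: it assumes a hypothetical server with 32 GB of RAM and leaves roughly 20% for the OS. Adjust it to your own hardware.

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
-- 26214 MB is about 80% of a hypothetical 32 GB server.
EXEC sp_configure 'max server memory (MB)', 26214;
RECONFIGURE;
GO
```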

To enable the lock pages in memory option

1. On the Start menu, click Run. In the Open box, type gpedit.msc.

2. On the Local Group Policy Editor console, expand Computer Configuration, and then expand Windows Settings.

3. Expand Security Settings, expand Local Policies, select User Rights Assignment, and then double-click Lock pages in memory.

Lock Pages in Memory

4. In the Local Security Setting – Lock pages in memory dialog box, click Add User or Group.

Adding user to Lock pages in memory

5. In the Select Users, Service Accounts, or Groups dialog box, add an account with privileges to run sqlservr.exe.

6. Log out and then log back in for this change to take effect. The SQL Server service also has to be restarted before it starts using locked pages.

SQL Server is now using locked pages in memory, and this is registered in the SQL Error Log:

sql_after_plm_configured
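Besides the error log entry, the process memory DMV gives a quick check. A minimal sketch: a non-zero locked_page_allocations_kb suggests that the instance is using locked pages.

```sql
SELECT physical_memory_in_use_kb,
       locked_page_allocations_kb,
       page_fault_count
FROM   sys.dm_os_process_memory;
```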

Conclusion

SQL Server memory is tricky when running in a virtual machine. Anybody can get the paged-out memory error, and once it appears it tends to become more and more frequent. The problem is that when SQL Server goes down, not even monitoring tools can catch it, as the recovery time is two to five seconds. However, important SSIS packages, jobs, or requests can run during those seconds, and SQL Server will reject all connections. The problem becomes very serious when the application is not able to recover by itself and SQL Server does not register any failover attempt.

Fixing the problem with this relatively easy solution will not only improve performance but also keep database health consistent. Make sure to configure SQL Server max memory with an 80/20 split (roughly 80% of RAM for SQL Server, 20% left for the OS) to avoid any issue.

References

Enable lock pages https://msdn.microsoft.com/en-us/library/ms190730(v=sql.120).aspx

How to reduce paging of buffer pool memory in the 64-bit version of SQL Server https://support.microsoft.com/en-us/kb/918483

SQL Server Set Up Error The RPC Server is Unavailable

This error occurs at installation time, when SQL Server setup is unable to authenticate the service accounts used to start the services. It occurs with accounts from domain A, accounts from domain B, and local system accounts. The error stops the whole SQL Server installation with the message “The RPC server is unavailable.” The screen below shows how the error appears.

SQL Error

I tried to troubleshoot this error by changing the accounts used to start the services, with no luck. I attempted it with a valid domain_A\sa_account, domain_B\valid_account, and NT AUTHORITY\Local Account, but none of them worked.

Solution
Windows Server has some policies enabled by default. To fix this error, HP configured the following:
1. The firewall rules marked below should be enabled from Group Policy.
Enable_Windows_FW_Rules

2. The network bindings should be in the order shown below. Yours may differ, but the production/primary network must be listed first.
Windows_Binding

3. Make sure the DNS suffix entries are in the correct order.
DNS_Suffix_Order

4. Execute the commands below if any time synchronization issue is found.

w32tm.exe /register
net stop w32time
net start w32time

After these changes were made, the DBA team was able to proceed with the installation, and the error “The RPC server is unavailable” disappeared.

Technical Specifications
OS Version: Microsoft Windows Server 2012 R2 Standard x64
SQL Version: SQL Server Standard Edition 2014 x64
Memory: 16 GB
Processor: 4 CPUs
Two nodes Active-Passive Cluster.

SSRS Subscription Failure sending mail: The user or group name ‘domain\user’ is not recognized.Mail will not be resent.

When you get the error above, it is because the user is no longer valid for Reporting Services.
Quick Fix –> Re-create the subscription.
Best Fix –> Reassign the subscriptions to a Service Account.

However, if you have several subscriptions, it is not feasible to re-create all of them. Here is how you can reassign these subscriptions to someone else.

1. Create a Service Account to assign as the owner of these subscriptions. This is needed because the Active Directory account set before is no longer valid (the employee left the company). If you just use another user account and that user also leaves the company, you’ll have to do it over and over again. Account created: SA_REPORTS_P

2. Grant DBO permissions on the ReportServer database to the new Service Account (SA_REPORTS_P).

3. Grant ALL permissions to the Service Account on the Report Server Home directory. Make sure all the folders inherit permissions from the parent; if not, check the permissions individually in each folder.

SSRS Permissions grant

4. Find the OwnerID of the subscriptions that have an invalid user or group name. Run the query below in your ReportServer database and find those with the error.

--Query to get the Last Status of each Subscription and its OwnerID.
SELECT s.[SubscriptionID] -- Subscription ID
,s.[OwnerID] -- Report Owner
--,s.[Report_OID] -- Report ID
, c.Path -- Report Path
--,rs.ScheduleID as SQLJobName -- Name of Job on SQL Server
,s.[LastStatus] -- Status of last subscription execution.
,s.[Description] -- Description of the report subscription
,s.[EventType] -- Subscription type
,s.[LastRunTime] -- Last time subscription executed
--,s.[Parameters] -- Parameters used for subscription
,s.[DeliveryExtension] -- How to deliver the subscription
FROM [ReportServer].[dbo].[Subscriptions] as s left join dbo.Catalog as c
on c.ItemID = s.Report_OID left join dbo.ReportSchedule as rs
on rs.ReportID = s.Report_OID
order by s.LastStatus

OwnerID_LastStatus
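If the list is long, it may be easier to filter directly on the error text. The sketch below assumes the failure message is stored in LastStatus with the wording shown in the title of this post; adjust the LIKE pattern if your message differs.

```sql
USE ReportServer;
GO
-- List only the subscriptions whose last run failed on an unknown user.
SELECT s.SubscriptionID, s.OwnerID, u.UserName, c.Path, s.LastStatus, s.LastRunTime
FROM   dbo.Subscriptions AS s
JOIN   dbo.Users   AS u ON u.UserID = s.OwnerID
JOIN   dbo.Catalog AS c ON c.ItemID = s.Report_OID
WHERE  s.LastStatus LIKE N'%is not recognized%'
ORDER BY u.UserName;
```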

5. Identify OwnerID for the Service Account Created (or whatever account you’re going to use). Run the query below to list the users.

SELECT [UserID],[UserName]
FROM ReportServer.dbo.Users

SSRS_users
6. Replace the invalid OwnerID with a valid OwnerID. I use the Service Account ID.

UPDATE [Subscriptions]
SET [OwnerID] = '96C17DCA-42D6-44AE-B036-9EA1F870AEFD' -- New VALID OwnerID
WHERE [OwnerID] = 'EFBCD353-3745-41E4-B2BF-8C23198014AD' -- Old and invalid OwnerID
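A slightly more defensive variant of the same update looks up both IDs by user name instead of pasting GUIDs, and returns the rows it changed. This is a sketch only; the two account names are hypothetical placeholders.

```sql
USE ReportServer;
GO
DECLARE @OldOwner uniqueidentifier, @NewOwner uniqueidentifier;

-- Hypothetical account names; replace with your own.
SELECT @OldOwner = UserID FROM dbo.Users WHERE UserName = N'DOMAIN\former_employee';
SELECT @NewOwner = UserID FROM dbo.Users WHERE UserName = N'DOMAIN\SA_REPORTS_P';

IF @OldOwner IS NULL OR @NewOwner IS NULL
BEGIN
    RAISERROR(N'Old or new owner not found in dbo.Users; nothing updated.', 16, 1);
    RETURN;
END;

-- OUTPUT returns one row per reassigned subscription.
UPDATE dbo.Subscriptions
SET    OwnerID = @NewOwner
OUTPUT deleted.SubscriptionID,
       deleted.OwnerID  AS OldOwnerID,
       inserted.OwnerID AS NewOwnerID
WHERE  OwnerID = @OldOwner;
```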

7. Verify the subscriptions were updated. Run the query below to see that the new user name is now in place.

--Query to get the Report Name and its Subscription Owner.
SELECT
jobs.name AS JobName,
C.path AS ReportPath,
C.name AS ReportName,
u.username AS SubscriptionOwner
FROM dbo.ReportSchedule RS JOIN msdb.dbo.sysjobs jobs
ON CONVERT(varchar(36), RS.ScheduleID) = jobs.name
INNER JOIN dbo.Subscriptions S
ON RS.SubscriptionID = S.SubscriptionID
INNER JOIN dbo.Catalog C
ON s.report_oid = C.itemid
INNER JOIN dbo.users u
ON s.ownerid = u.userid
order by u.username

SSRS_SubscriptionOwner

The problem should be fixed at this point. Nevertheless, if you want to check if it really works, you can wait for the next subscription to run or invoke it manually. So, here you have a bonus on how to manually call a subscription.

A. In ReportServer, execute the query below and it will give you the SQL Job Name. Two jobs (subscriptions) that were failing due to an invalid OwnerID are 36F500A6-58DA-4ACB-87AF-9947FEBB5797 and 542C8818-914D-4BAF-A9AA-C58FCADC24E0.

--Query to get the SQL Job Name of each subscription in the server
SELECT
jobs.name AS JobName,
C.path AS ReportPath,
C.name AS ReportName,
u.username AS SubscriptionOwner,
s.LastStatus
FROM dbo.ReportSchedule RS JOIN msdb.dbo.sysjobs jobs
ON CONVERT(varchar(36), RS.ScheduleID) = jobs.name
	INNER JOIN dbo.Subscriptions S
ON RS.SubscriptionID = S.SubscriptionID
	INNER JOIN dbo.Catalog C
ON s.report_oid = C.itemid
	INNER JOIN dbo.users u
ON s.ownerid = u.userid
order by s.LastStatus

SSRS_SQLJobName

B. Execute the job manually with the script below.

EXEC msdb.dbo.sp_start_job @job_name = '36F500A6-58DA-4ACB-87AF-9947FEBB5797'

C. Validate the new status by executing the script in step A.
SSRS_SQLJobName_result2.jpg

Veeam Failed to truncate transaction logs for SQL instances: MICROSOFT##WID. Possible reasons: lack of permissions, or transaction log corruption

This error occurs for several reasons, but it can be fixed by applying a hotfix that replaces some libraries. First, follow the steps outlined in this KB article, https://www.veeam.com/kb2027, as it is relevant to this type of error. If none of them works, then you’re in the right place!

NOTE: The hotfix I’ve uploaded has a PDF extension but it is a ZIP file; WordPress doesn’t allow ZIP uploads.

Short solution
– Download this ZIP file (rename it from PDF to ZIP) and replace the files on your affected VM in C:\Program Files\Veeam\Backup and Replication\VSS\VeeamGuestHelpres\.

Detailed solution
In my architecture we are working with a VM with the following software:
– SQL Version: Microsoft SQL Server 2014 Standard Edition (64-bit)
– OS Version: Windows Server 2012 Standard
– Veeam Backup & Replication 8.0.0.2084

The error or warning shows a full text like this:
Unable to truncate SQL server transaction logs. Details: Failed to process ‘TruncateSQLLog’ command.
Failed to truncate transaction logs for SQL instances: MICROSOFT##WID. Possible reasons: lack of permissions, or transaction log corruption.

In your Veeam console you may see something like this:

Console warning

To fix this, follow these steps:
1. Download the new files here (I got them directly from Veeam Support; make sure to rename the file from pdf to zip).
2. Copy the zip file or the extracted files to your VM.
3. Connect to the affected VM and find the path C:\Program Files\Veeam\Backup and Replication\VSS\VeeamGuestHelpres\, where you’re going to see some libraries like VeeamVSSSupport*
4. Take a backup of those files.
5. Replace the files with the new files. The new file sizes should be similar to this:
Files_To_Replace
6. Execute the backup job in your Veeam Console.

The issue will be fixed after you replace the old files with the new files.

AlwaysOn Availability Replica is in Resolving State

AlwaysOn is a “new” feature in SQL Server and there is little understanding of how it really works. Microsoft has put a lot of functions in a black box, and even Microsoft’s help says it is complicated to explain.

I had a problem with an Availability Replica that was shown in Resolving state after the Windows team applied the monthly patches to the server.

As a result, all the databases in the Availability Group were set to Recovery Pending state.

If you try to start the availability replica you will get an error like this:

TITLE: Microsoft SQL Server Management Studio
------------------------------
The local node is not part of quorum and is unable to initiate a failover. This could be caused by one of the following reasons:
•   The local node is not able to communicate with the WSFC cluster.
•   No quorum set exists across the WSFC cluster.
 For more information on recovering from quorum loss, see SQL Server Books Online.
  (Microsoft.SqlServer.Management.HadrTasks)
------------------------------
BUTTONS:
OK
------------------------------
Recovery Pending

WSFC error

Before you go further and spend time digging into WSFC (Windows Server Failover Clustering), here is the solution that worked for me. In my case we have configured a cluster with 3 nodes. Only two nodes, SQL01 and SQL02, have a vote; SQL03 is a replica and doesn’t have a vote.

So the next step is to open the Cluster Administrator.

Cluster Administrator

Next, go to the Nodes section and you’ll see that SQL03 is down. The server whose replica is in Resolving state is shown as down in the node list. It doesn’t have a vote, and that is why it is not visible on the main page of the Cluster Administrator.

Node is down
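The node state and votes can also be read from SQL Server. A minimal sketch: run it on an instance whose node still has quorum (SQL01 or SQL02 in my case), because the view returns no rows on a node that is out of quorum.

```sql
-- One row per cluster member, with its state and quorum votes.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM   sys.dm_hadr_cluster_members;
```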

Finally, bring the node online as shown in the picture below:

Bring node online

Once the node is online, the Availability Group is fixed and synchronization begins for all the databases. If your node was up and running and everything was well configured, then the cause could be something in the ports, endpoints, and so on. Here is a useful link from Microsoft: https://msdn.microsoft.com/en-us/library/ff878308.aspx
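To watch the recovery without the dashboard, the AlwaysOn DMVs can be queried. The sketch below assumes it is run on the primary replica, where the state of all replicas is visible. The first query shows the replica roles and health, and the second shows the synchronization state of each database.

```sql
-- Replica role, connection state, and health per availability group.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM   sys.dm_hadr_availability_replica_states AS ars
JOIN   sys.availability_replicas AS ar ON ar.replica_id = ars.replica_id
JOIN   sys.availability_groups   AS ag ON ag.group_id   = ars.group_id
ORDER BY ag.name, ar.replica_server_name;

-- Synchronization state of each database on each replica.
SELECT DB_NAME(drs.database_id) AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM   sys.dm_hadr_database_replica_states AS drs
JOIN   sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```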

If your node keeps going down for different reasons, you can schedule a script to bring it online automatically. I have posted a solution in my Windows/Scripting section that may help simplify your life as an administrator.