
Support bulk insert into intermediary memory optimized temporary table #136

Closed
jzabroski opened this issue Aug 2, 2018 · 12 comments

@jzabroski

jzabroski commented Aug 2, 2018

CC @mmiller678

Currently, you create a session temp table #ZZZ_projects_GUID

This still has to hit disk, creating WRITELOG and PAGEIOLATCH_EX overhead as well.
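As an aside, one way to see how much of this overhead an instance is accumulating is to query the standard wait-statistics DMV (shown here purely for illustration):

```sql
-- Cumulative log-write and data-page latch waits since the last restart
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('WRITELOG', 'PAGEIOLATCH_EX')
ORDER BY wait_time_ms DESC;
```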

The MERGE statement can use a memory optimized table in the source, but not the target.

See Microsoft's example here: Faster temp table and table variable by using memory optimization
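For context, the staging table in that article is created once at deployment time rather than per session. A sketch along those lines (column names illustrative, adapted from the article's example C):

```sql
-- Created once at deploy time; SCHEMA_ONLY durability keeps rows out of the
-- transaction log entirely, avoiding WRITELOG and PAGEIOLATCH_EX waits.
CREATE TABLE dbo.soSessionC
(
    Session_ID smallint NOT NULL DEFAULT (@@SPID),  -- scopes rows to the current session
    Column1    int      NOT NULL,
    CONSTRAINT PK_soSessionC PRIMARY KEY NONCLUSTERED (Session_ID, Column1)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```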

@JonathanMagnan this is an attempt to troubleshoot the high INSERT BULK time we see. I believe this could be a generally great feature, though.

@JonathanMagnan self-assigned this Aug 2, 2018
@JonathanMagnan
Member

Hello @jzabroski ,

We will look into it tomorrow, consider adding an option, and check the limitations of using memory-optimized tables.

However, in your case, I believe we really need to look at why tempdb is taking so much time. From the statistics I received, something abnormal is happening in this database.

Best Regards,

Jonathan

@jzabroski
Author

You're probably right that this won't solve our problems. It's just a workaround that might sidestep our issues and add value to your product.

I think this trick could give some customers explosive performance gains, which ultimately means more happy paying customers.

@JonathanMagnan
Member

Hello @jzabroski ,

We did some investigation into memory-optimized tables.

Unfortunately, it looks like they are better/faster for SELECT than for inserts with SqlBulkCopy.

The table create/drop alone take over 200 ms each, which is already bad, but we also found that SqlBulkCopy took more than 2x longer in our environment.

It can still be a good idea for people with a very slow disk, but I'm not sure it's a very good long-term solution.

Best Regards,

Jonathan

@jzabroski
Author

jzabroski commented Aug 3, 2018 via email

@JonathanMagnan
Member

Hello @jzabroski ,

Great, we will probably add an option tomorrow to support this scenario, so you will be able to test the performance with a beta version of this feature.

@JonathanMagnan
Member

What do you mean, create/drop

I thought we needed to manage it, but if you say that's something you create on your side before using our library, then we can skip that time.

@jzabroski
Author

@JonathanMagnan

If you look at the example code on MSDN, they show step by step how to create a "memory optimized temp table". The documentation is admittedly confusing, because it's not REALLY a temp table in the sense of a table in tempdb, but in practice it can provide the facade of a temp table. Please see the MSDN article, example C. Scenario: Replace session tempdb #table. Note that this example even states (emphasis mine):

Replace the CREATE TABLE #tempSessionC statements in your code with DELETE FROM dbo.soSessionC, to ensure a session is not exposed to table contents inserted by a previous session with the same session_id. It is important to create the memory-optimized table at deployment time, not at runtime, to avoid the compilation overhead that comes with table creation.

So, in your code, the internal API would have a try-finally, the finally block would have to delete the session data:

DELETE FROM dbo.soSessionC

The end-user would also probably want to truncate the table occasionally to ensure rude-aborts don't cause "garbage" to accumulate in the staging table. Obviously, while this should produce blazing fast MERGE statements, it should be used with some care. I do believe this would allow us to shatter ETL benchmarks, though, and I'm curious to see the results!

@jzabroski
Author

To be clear, the pseudo-code should be:

var deleteSessionData = $"DELETE FROM {memoryOptimizedTableName}";
try {
   // Clear rows left behind by a previous session with the same session_id
   Call SqlCommand with text deleteSessionData
   Bulk insert data into memoryOptimizedTableName
   Do MERGE using memoryOptimizedTableName as source
}
finally {
   // Always clear session data, even if the merge fails
   Call SqlCommand with text deleteSessionData
}

Hope this is clearer.

@JonathanMagnan
Member

Thanks for the additional information, @jzabroski,

Everything is very clear ;)

@JonathanMagnan
Member

Hello @jzabroski ,

Just to let you know it will take a few more days before releasing it.

The code is already completed and working. However, we found out that memory-optimized tables are not very transaction-friendly, so we need to improve our code for that.

Best Regards,

Jonathan

@jzabroski
Author

Interesting. @mmiller678 is on vacation so a few days won't bother us.

However, we found out Memory Table is not very transaction friendly, so we need to improve our code for it.

Please elaborate so that I can learn along with you how "Memory Table is not very transaction friendly". We have SqlBulkCopy logic that doesn't depend on EFCore, so it might help us a great deal to know what you learned.

After your comment, I googled and found MSDN Docs > SQL > Relational databases > In-memory OLTP > Transactions with Memory-Optimized Tables > Transaction phases and lifetime, but nothing there stands out to me as a cause for issues.

@JonathanMagnan
Member

Hello @jzabroski ,

If you want to try it directly with our Bulk Operations library, you can do it starting with the v2.13.26: https://www.nuget.org/packages/Z.BulkOperations/

Please elaborate so that I can learn along with you how "Memory Table is not very transaction friendly"

When using a transaction, we get the following error: Accessing memory optimized tables using the READ COMMITTED isolation level is supported only for autocommit transactions. It is not supported for explicit or implicit transactions. Provide a supported isolation level for the memory optimized table using a table hint, such as WITH (SNAPSHOT).

We have tried several isolation levels, but it seems they all have their limitations. We will probably try the table hint tomorrow and see what happens.
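For illustration, a MERGE with that hint applied to the memory-optimized source table might look like this (a hand-written sketch against the tables from the repro, not the library's generated SQL):

```sql
-- The SNAPSHOT table hint on the memory-optimized source sidesteps the
-- READ COMMITTED restriction inside an explicit transaction.
BEGIN TRANSACTION;

MERGE dbo.TestMemoryTable AS Target
USING dbo.Memory_TestMemoryTable WITH (SNAPSHOT) AS Source
    ON Target.ID = Source.ID
WHEN MATCHED THEN
    UPDATE SET ColumnInt = Source.ColumnInt, ColumnString = Source.ColumnString
WHEN NOT MATCHED THEN
    INSERT (ColumnInt, ColumnString)
    VALUES (Source.ColumnInt, Source.ColumnString);

COMMIT TRANSACTION;
```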

Here is an example:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Windows.Forms;
using Z.BulkOperations;

namespace Lab.BulkOperations.v451
{
    /*
CREATE TABLE [dbo].[TestMemoryTable](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[ColumnInt] [int] NULL,
	[ColumnString] [varchar](50) NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Memory_TestMemoryTable]
(
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[ColumnInt] [int] NULL,
	[ColumnString] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AI NULL,
	[ZZZ_Index] [int] NULL,

 CONSTRAINT [PK_Memory_TestMemoryTable]  PRIMARY KEY NONCLUSTERED 
(
	[ID] ASC
)
)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_ONLY )
     */
    public partial class Form_Request_MemoryTable : Form
    {
        public Form_Request_MemoryTable()
        {
            InitializeComponent();

            var list = new List<TestMemoryTable>();

            for (var i = 0; i < 100000; i++)
            {
                list.Add(new TestMemoryTable {ColumnInt = i, ColumnString = "Z_" + i});
            }

            var connectionString = "Server=localhost;Initial Catalog=Z.BulkOperations.Lab;Integrated Security=true;";
            var clock = new Stopwatch();
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                //var trans = conn.BeginTransaction();

                using (var bulk = new BulkOperation<TestMemoryTable>(conn))
                {
                    //bulk.Transaction = trans;
                    bulk.TemporaryTableName = "Memory_TestMemoryTable";
                    bulk.TemporaryTableIsMemory = true; // Required since some options such as LOCK TABLES are not supported for SqlBulkCopy
                    bulk.TemporaryTablePersist = true; // Optional if you don't want to drop the data (SCHEMA_ONLY durability will do it automatically)
                    bulk.DestinationTableName = "TestMemoryTable";


                    clock.Start();
                    bulk.BulkMerge(list);
                    clock.Stop();
                }

                //trans.Commit();
            }
        }

        public class TestMemoryTable
        {
            public int ID { get; set; }
            public int ColumnInt { get; set; }
            public string ColumnString { get; set; }
        }
    }
}
