
Speed up inserts into SQL Server from pyodbc

In Python, I have a process that selects data from one database (Redshift via psycopg2) and then inserts that data into SQL Server (via pyodbc). I chose a direct read / write rather than a read / flat file / load because the row count is only around 100,000 per day, so it seemed easier to simply connect and insert. However, the insert process is slow, taking several minutes.

Is there a better way to insert data into SQL Server with pyodbc?

select_cursor.execute(output_query)

done = False
rowcount = 0

while not done:
    rows = select_cursor.fetchmany(10000)

    insert_list = []

    if rows == []:
        done = True
        break

    for row in rows:
        rowcount += 1

        insert_params = (
            row[0],
            row[1],
            row[2]
            )

        insert_list.append(insert_params)

    insert_cnxn = pyodbc.connect('''Connection Information''')

    insert_cursor = insert_cnxn.cursor()

    insert_cursor.executemany("""
        INSERT INTO Destination (AccountNumber, OrderDate, Value)
        VALUES (?, ?, ?)
        """, insert_list)

    insert_cursor.commit()
    insert_cursor.close()

    insert_cnxn.close()

select_cursor.close()
select_cnxn.close()

Answer1:

UPDATE: pyodbc 4.0.19 added a Cursor#fast_executemany option that can greatly improve performance by avoiding the behaviour described below. See this answer for details.
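As a rough illustration of that option, here is a minimal sketch. It assumes pyodbc 4.0.19 or later and an ODBC driver that supports parameter arrays; the helper name `insert_fast` is hypothetical, and the table and column names are taken from the question.

```python
def insert_fast(cnxn, rows):
    """Insert rows through a single cursor with fast_executemany enabled.

    Assumption: `cnxn` is an open pyodbc connection (pyodbc >= 4.0.19).
    `insert_fast` is a hypothetical helper, not part of pyodbc.
    """
    sql = (
        "INSERT INTO Destination (AccountNumber, OrderDate, Value) "
        "VALUES (?, ?, ?)"
    )
    crsr = cnxn.cursor()
    try:
        # Ask pyodbc to bind the parameters as an array rather than
        # issuing one call per row.
        crsr.fast_executemany = True
        crsr.executemany(sql, rows)
        cnxn.commit()
    finally:
        crsr.close()
    return len(rows)
```

With the variables from the question, this would be called as `insert_fast(insert_cnxn, insert_list)`. Whether it helps, and by how much, depends on the driver and the column types involved.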


Your code does follow proper form (aside from the few minor tweaks mentioned in the other answer), but be aware that when pyodbc performs an .executemany what it actually does is submit a separate sp_prepexec for each individual row. That is, for the code

sql = "INSERT INTO #Temp (id, txtcol) VALUES (?, ?)"
params = [(1, 'foo'), (2, 'bar'), (3, 'baz')]
crsr.executemany(sql, params)


SQL Server actually executes the following (as confirmed by SQL Profiler):

exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',1,N'foo'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',2,N'bar'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',3,N'baz'


So, for an .executemany "batch" of 10,000 rows you would be

  • performing 10,000 individual inserts,
  • with 10,000 round-trips to the server, and
  • sending the identical SQL command text (INSERT INTO ...) 10,000 times.

It is possible to have pyodbc send an initial sp_prepare and then do an .executemany calling sp_execute, but the nature of .executemany is that you still would do 10,000 sp_prepexec calls, just executing sp_execute instead of INSERT INTO .... That could improve performance if the SQL statement was quite long and complex, but for a short one like the example in your question it probably wouldn't make all that much difference.

One could also get creative and build "table value constructors" as illustrated in this answer, but notice that it is only offered as a "Plan B" when native bulk insert mechanisms are not a feasible solution.
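To make the "table value constructor" idea concrete, here is a minimal sketch that packs several rows into one INSERT statement so that each round-trip carries many rows. The helper name `build_tvc_inserts` and its parameters are hypothetical. The defaults assume SQL Server's limits of 2,100 parameters per request and 1,000 rows per VALUES list, and the table and column names are assumed to be trusted identifiers, since they are formatted directly into the SQL text.

```python
def build_tvc_inserts(rows, table, columns, max_params=2000, max_rows=1000):
    """Yield (sql, flat_params) pairs, each inserting several rows at once.

    Hypothetical helper. max_params stays below SQL Server's
    2,100-parameter limit; max_rows matches the 1,000-row limit
    for a single VALUES list.
    """
    n_cols = len(columns)
    rows_per_stmt = max(1, min(max_rows, max_params // n_cols))
    col_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join("?" * n_cols) + ")"

    for start in range(0, len(rows), rows_per_stmt):
        chunk = rows[start:start + rows_per_stmt]
        # One "(?, ?, ?)" group per row in this chunk.
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, col_list, ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the row tuples into one parameter list.
        flat_params = [value for row in chunk for value in row]
        yield sql, flat_params
```

Each pair would then be run with a plain `crsr.execute(sql, flat_params)`. For the three-column table in the question, the defaults give 666 rows per statement, so a batch of 10,000 rows becomes 16 statements instead of 10,000.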

Answer2:

<s>It's good that you're already using executemany().</s> [Struck out after reading other answer.]

It should speed things up a (very) little bit if you move the connect() and cursor() calls for your insert_cnxn and insert_cursor outside of your while loop. (Of course, if you do this, you should also move the 2 corresponding close() calls outside of the loop as well.) In addition to not having to (re)establish the connection every time, re-using the cursor will prevent having to recompile the SQL each time.

However, you probably won't see a huge speed-up from this, because you're probably only making ~10 passes through that loop anyway (given that you said ~100,000 rows a day and your loop groups together 10,000 at a time).
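The restructuring suggested above might look like the following minimal sketch. The function name `copy_rows` is hypothetical, and the caller is assumed to open both connections once beforehand and close them afterwards. It calls commit() on the connection rather than the cursor so that it only relies on standard DB-API methods.

```python
def copy_rows(select_cursor, insert_cnxn, batch_size=10000):
    """Copy every row from select_cursor, re-using one insert cursor.

    Hypothetical helper: the connection and cursor are created once,
    outside the fetch loop, instead of once per batch.
    """
    sql = (
        "INSERT INTO Destination (AccountNumber, OrderDate, Value) "
        "VALUES (?, ?, ?)"
    )
    insert_cursor = insert_cnxn.cursor()  # created once for all batches
    rowcount = 0
    try:
        while True:
            rows = select_cursor.fetchmany(batch_size)
            if not rows:
                break
            insert_list = [(row[0], row[1], row[2]) for row in rows]
            insert_cursor.executemany(sql, insert_list)
            insert_cnxn.commit()  # commit once per batch
            rowcount += len(rows)
    finally:
        insert_cursor.close()
    return rowcount
```

In the question's terms this would be called as `copy_rows(select_cursor, insert_cnxn)` after `select_cursor.execute(output_query)`.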

One other thing you might look into is whether there are any "behind-the-scenes" conversions being made on your OrderDate parameter. You can go to SQL Server Management Studio and look at the execution plan of the query. (Look for your insert query in the "recent expensive queries" list by right-clicking on the server node and choosing "Activity Monitor"; right-click on the insert query and look at its Execution Plan.)
