Fast data import into MySQL from Java

How long should it take to insert about 500,000 records from a CSV file into a MySQL database from Java code? The database is hosted on localhost.

Table structure: AI (auto-increment) id | varchar(8) | datetime | int | varchar(2). My code takes 40 minutes to insert 70,000 records. Is there any way to do it faster? Here is the main part of my code:

CsvReader products = new CsvReader(path);
products.readHeaders();
stmt = con.createStatement();
String updateString = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
PreparedStatement preparedStatement = con.prepareStatement(updateString);

            while (products.readRecord()) {
                v1= products.get("V1");
                date = format.parse(products.get("Date") + " " + products.get("Hour"));
                java.sql.Date dateDB = new java.sql.Date(date.getTime());
                v2 = products.get("V2");
                v3 = products.get("V3");



                preparedStatement.setString(1, v1);
                preparedStatement.setDate(2,dateDB);
                preparedStatement.setInt(3, Integer.parseInt(v2));
                preparedStatement.setString(4, v3);   
                preparedStatement.executeUpdate();
            }


Following your advice, I moved the creation of the statement out of the loop. Now I get 33 records per second; before, it was 29.

Answer1:

I might opt for using the LOAD DATA statement from MySQL instead of using Java:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' INTO TABLE table;

This would avoid a lot of the overhead you currently have, assuming you are processing each line before inserting it into MySQL.

You can execute a LOAD DATA statement from Java using raw JDBC.
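As a rough sketch of running LOAD DATA through raw JDBC, something like the following could work. The class and method names (`CsvBulkLoader`, `buildLoadDataSql`, `load`) are made up for illustration, and the comma separator, double-quote enclosure, `\r\n` line ending, and single header line are assumptions about the CSV file. With MySQL Connector/J, the JDBC URL usually needs `allowLoadLocalInfile=true`, and the server must have `local_infile` enabled.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of any library.
final class CsvBulkLoader {

    // Builds the LOAD DATA statement. The table name is concatenated as-is,
    // so it must come from trusted code, never from user input.
    static String buildLoadDataSql(String csvPath, String table) {
        // Escape backslashes and single quotes so the path is a valid SQL string literal.
        String escaped = csvPath.replace("\\", "\\\\").replace("'", "\\'");
        return "LOAD DATA LOCAL INFILE '" + escaped + "' INTO TABLE " + table
                + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'" // assumed CSV dialect
                + " LINES TERMINATED BY '\\r\\n'"               // assumed Windows line endings
                + " IGNORE 1 LINES";                            // assumed: one header row
    }

    // Opens a connection, runs the statement, and returns the update count
    // reported by the driver.
    static int load(String jdbcUrl, String user, String password,
                    String csvPath, String table) throws SQLException {
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = con.createStatement()) {
            return stmt.executeUpdate(buildLoadDataSql(csvPath, table));
        }
    }
}
```

Because the question combines separate "Date" and "Hour" CSV fields into one datetime column, the real statement would probably also need a column list with user variables and a SET clause; that part depends on the file layout and is left out here.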

Answer2:

If it's not necessary to insert the data using Java, you can load it directly with SQL.

Run the statement below in your GUI tool (SQLyog, etc.):

LOAD DATA LOCAL INFILE 'D:\\Book1.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (column_name1, column_name2);

Answer3:

Instead of creating the PreparedStatement inside the while loop, create it once outside the loop and only set the values inside it.

Something like:

String updateString = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
PreparedStatement preparedStatement = con.prepareStatement(updateString);

while (products.readRecord()) {
    v1 = products.get("V1");
    date = format.parse(products.get("Date") + " " + products.get("Hour"));
    java.sql.Date dateDB = new java.sql.Date(date.getTime());
    v2 = products.get("V2");
    v3 = products.get("V3");

    preparedStatement.setString(1, v1);
    preparedStatement.setDate(2, dateDB);
    preparedStatement.setInt(3, Integer.parseInt(v2));
    preparedStatement.setString(4, v3);
    preparedStatement.executeUpdate();
}

Additionally, you should commit after every N rows, where N is a number your database engine can handle in memory; otherwise, after a certain number of inserts, the system slows down very quickly.

Note that it should generally be possible to insert more than 70,000 records in 40 minutes. You probably have a bottleneck in your network. Is the database local to the Java application, or is it on a remote server? If it is on a remote server, check the connection speed.

Answer4:

You should go for a batch insert:

PreparedStatement prepStmt = con.prepareStatement("Insert query");
prepStmt.setString(1, parameter1);
prepStmt.addBatch();

// for the next set of parameters
prepStmt.setString(1, parameter2);
prepStmt.addBatch();

int[] numUpdates = prepStmt.executeBatch();

See also:

Which is faster: multiple single INSERTs or one multiple-row INSERT?

How to do a batch insert in MySQL
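Applied to the table from the question, batching combined with a commit every N rows might look roughly like the sketch below. `BatchInserter`, `toTimestamp`, `insertAll`, the batch size of 1000, and the row layout (V1, Date, Hour, V2, V3 as strings) are all assumptions for illustration. It uses `java.sql.Timestamp` instead of `java.sql.Date` because the column is a datetime and `java.sql.Date` carries no time of day, and it parses with `java.time` rather than the question's `SimpleDateFormat`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper, not part of any library.
final class BatchInserter {

    static final int BATCH_SIZE = 1000; // assumed value; tune for your setup

    // Statement text from the question; "table" is a placeholder table name.
    static final String INSERT_SQL =
            "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";

    // Combines the CSV "Date" and "Hour" fields into one timestamp.
    static Timestamp toTimestamp(String date, String hour, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return Timestamp.valueOf(LocalDateTime.parse(date + " " + hour, fmt));
    }

    // Each row is assumed to be {V1, Date, Hour, V2, V3}. Returns the number of rows sent.
    static int insertAll(Connection con, List<String[]> rows, String pattern)
            throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // commit manually, once per batch
        int pending = 0;
        int total = 0;
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setTimestamp(2, toTimestamp(r[1], r[2], pattern));
                ps.setInt(3, Integer.parseInt(r[3]));
                ps.setString(4, r[4]);
                ps.addBatch();
                total++;
                if (++pending == BATCH_SIZE) { // flush a full batch
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the remainder
                ps.executeBatch();
                con.commit();
            }
        } catch (SQLException e) {
            con.rollback(); // discard rows of the batch that was not committed
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
        return total;
    }
}
```

With MySQL Connector/J, adding `rewriteBatchedStatements=true` to the JDBC URL lets the driver rewrite such a batch into multi-row INSERT statements, which is often where most of the speedup comes from; how much it helps here would have to be measured.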

Answer5:

First, you can create the PreparedStatement outside of your loop. You can also refactor your code to use multithreading: your insert statements do not seem to depend on each other, so you can split the data and process the parts in parallel.

But there is no absolute answer to your question "How long...". It depends on the machine where MySQL is hosted and the machine where the Java code runs: number of cores, available memory, etc.
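One way the splitting suggested above could be sketched is shown below. `ParallelImport`, `partition`, and `importInParallel` are illustrative names, and `insertChunk` stands for whatever code inserts one chunk. In a real import each worker should open its own JDBC connection, since sharing one connection across threads is generally not safe. Whether this is actually faster than a single batched insert is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper, not part of any library.
final class ParallelImport {

    // Splits rows into at most `parts` contiguous chunks of nearly equal size.
    // Assumes parts >= 1.
    static <T> List<List<T>> partition(List<T> rows, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (rows.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < rows.size(); i += size) {
            chunks.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return chunks;
    }

    // Runs insertChunk once per chunk on a fixed pool and waits for all of them.
    // A failure in any worker surfaces as an unchecked exception from join().
    static <T> void importInParallel(List<T> rows, int threads,
                                     Consumer<List<T>> insertChunk) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (List<T> chunk : partition(rows, threads)) {
                futures.add(CompletableFuture.runAsync(
                        () -> insertChunk.accept(chunk), pool));
            }
            for (CompletableFuture<Void> f : futures) {
                f.join();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass a lambda that opens a connection and batch-inserts the given chunk, for example `ParallelImport.importInParallel(rows, 4, chunk -> insertWithOwnConnection(chunk))`, where `insertWithOwnConnection` is again a hypothetical method.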
