Improve Insert Performance?

JoshLindenmuth
Posts: 33
Joined: 31-May-2005
# Posted on: 14-Jun-2005 05:32:20   

Hello,

We have been using LLBLGen successfully for the past 2 weeks (and love it), but now we're running into an apparent performance bottleneck. We're migrating from an old legacy database (non-SQL) to SQL Server, and need to continually synchronize the data. The schema is vastly different, and we have over 1000 columns, so LLBLGen seemed like a great way to maintain our sanity, reduce development time, and improve code readability.

However, when I run a short test to INSERT ~8000 rows into our SQL Server database, the code is really slow. I'm wondering if I'm doing something wrong. Here's some of the offending code:

EmployeeEntity employee; //100 columns, we only update 3 in our test
EmpHistoryEntity employeehist; //10 columns, we only update 2 in our test
// ... (legacy data access setup omitted) ...
//Iterate ~2000 times
while (KeepGoing)
{
  employee = new EmployeeEntity();
  employee.Num = emp.e_num.ToString(); //emp is a Struct holding legacy data
  employee.Fmname = emp.e_fmname;
  employee.Lname = emp.e_lname;
  _adapter.SaveEntity(employee, true);

  while (KeepGoing) //iterate ~3 times for each employee
  {
    employeehist = new EmpHistoryEntity();
    employeehist.EmployeeId = employee.Id;
    employeehist.HistoryTypeCode = his.h_type; //his is a Struct holding legacy data
    _adapter.SaveEntity(employeehist);
  }
}

I left out the non-LLBLGen code. This added 8000 rows in 50 seconds, which seems pretty slow (2GHz machine w/ 1GB RAM). To ensure the legacy code wasn't causing the bottleneck, I removed the LLBLGen code and the while loops retrieved the 8000 rows of legacy data in 0.30 seconds.

As another test, I created a simple SQL script with 8000 non-parameterized INSERT statements and it ran in <5 seconds via sqlcmd. I'd think sqlcmd would be the slowest method for bulk INSERTs, which is why I figure I'm doing something wrong.

Thanks, Josh Lindenmuth

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 14-Jun-2005 09:57:43   

JoshLindenmuth wrote:

However, when I run a short test to INSERT ~8000 rows into our SQL Server database, the code is really slow. I'm wondering if I'm doing something wrong.

The main problem is that you are performing a number of INSERTs using different DB connections... I would recommend using the UnitOfWork for this kind of operation. Basically you simply create a UnitOfWork and then call AddForSave for each entity that you wish to save. Finally you call UnitOfWork.Commit, which performs all the inserts using a single DB connection under the covers.
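A minimal sketch of that pattern, assuming the adapter variant of the class (UnitOfWork2) with AddForSave and a Commit(adapter, autoCommit) overload (check the exact signatures against your runtime library version); the loop variables are the ones from the original post:

// Queue every insert in a unit of work, then flush them over one connection / one transaction.
UnitOfWork2 uow = new UnitOfWork2();

while (KeepGoing)
{
    EmployeeEntity employee = new EmployeeEntity();
    employee.Num = emp.e_num.ToString();
    employee.Fmname = emp.e_fmname;
    employee.Lname = emp.e_lname;
    uow.AddForSave(employee);

    // For child rows, relate them via the entity reference (not the raw FK value),
    // so the parent's IDENTITY value is synced into the FK when the parent is saved:
    // employeehist.Employee = employee; uow.AddForSave(employeehist);
}

// All queued saves run on this single adapter/connection, inside one transaction.
uow.Commit(_adapter, true);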

Also take a look at http://llblgen.com/tinyforum/Messages.aspx?ThreadID=3287 for a similar discussion.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39896
Joined: 17-Aug-2003
# Posted on: 14-Jun-2005 10:31:19   

Indeed, or open the connection before the loop, but a UnitOfWork is a better alternative as you then also have a transaction. Keep in mind that transactional inserts can be slower, so if you don't want to run the complete batch in 1 transaction, simply open the connection up front, do the loop and then close it.
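A minimal sketch of the connection-up-front variant, assuming the adapter's KeepConnectionOpen property and OpenConnection()/CloseConnection() methods behave as in current adapter builds (connectionString is your own connection string):

// Open the connection once, run the whole loop, then close it explicitly.
DataAccessAdapter adapter = new DataAccessAdapter(connectionString);
adapter.KeepConnectionOpen = true;
adapter.OpenConnection();
try
{
    while (KeepGoing)
    {
        // ... build the entity and call adapter.SaveEntity(...) as before ...
    }
}
finally
{
    adapter.CloseConnection();
    adapter.Dispose();
}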

Frans Bouma | Lead developer LLBLGen Pro
JoshLindenmuth
Posts: 33
Joined: 31-May-2005
# Posted on: 01-Jul-2005 05:10:58   

I used both of your suggestions (keeping a connection open and unit of work) in my single table sample, and it certainly ran faster (but nowhere near as fast as sqlcmd). However, now that I'm writing my production app, the performance is less than ideal again.

We are loading 1 parent table (employee) and 1 child table (employee history). The employee table has 2000 rows, and the employee history table has 6000 rows. Via the command line using sqlcmd, I can insert the 8000 rows into these 2 tables in a few seconds.

However, via LLBLGen with no transactions and a single open connection (keepConnectionOpen = true) as you suggested, the 8000 rows can take up to 4 minutes. Same performance with a UnitOfWork. Here's a very simple example of inserting 8000 rows ... like my production table, this runs for 4 minutes:


String connectString = "SERVER=Computer\\SqlExpress; DATABASE=TESTDB; Integrated Security=SSPI;";
DataAccessAdapter adapter = new DataAccessAdapter(connectString, true); // true = keep the connection open

EmployeeEntity llblEmployee;
EmpHistoryEntity llblEmpHistory;
for (int i = 0; i < 2000; i++)
{
    llblEmployee = new EmployeeEntity(); //primary key is IDENTITY column
    llblEmployee.Num = i.ToString();
    adapter.SaveEntity(llblEmployee, true);

    for (int j = 0; j < 3; j++)
    {
        llblEmpHistory = new EmpHistoryEntity(); //pk is IDENTITY column
        llblEmpHistory.EmployeeId = llblEmployee.Id;
        llblEmpHistory.Descr = j.ToString();
        adapter.SaveEntity(llblEmpHistory);
    }
}

Any thoughts? What am I doing wrong? What do other people do if they have thousands or millions of rows to insert?

Thank you very much, Josh Lindenmuth

BTW - sorry for the code format above, I couldn't figure out how to add spaces or tabs.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39896
Joined: 17-Aug-2003
# Posted on: 01-Jul-2005 09:36:04   

You should use the overload of SaveEntity() which lets you pass 'false' for recurse. At the moment recurse is true, so it starts a new transaction for each save, commits it, etc. You also don't have to refetch the entity to read back the ID; you can read that without problems, as it's the PK.

I think that should speed up your code.
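Applied to the loop in the previous post, a sketch assuming the three-parameter SaveEntity(entity, refetchAfterSave, recurse) overload (verify against your runtime version):

// refetch = false, recurse = false: no refetch round-trip and no per-save recursion/transaction.
adapter.SaveEntity(llblEmployee, false, false);

// The IDENTITY value is synced back into the PK, so it can be read without a refetch:
llblEmpHistory.EmployeeId = llblEmployee.Id;
adapter.SaveEntity(llblEmpHistory, false, false);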

Frans Bouma | Lead developer LLBLGen Pro
JoshLindenmuth
Posts: 33
Joined: 31-May-2005
# Posted on: 01-Jul-2005 13:42:54   

Down to a minute, which is more reasonable ... I can fool with it from here. I also just realized that since we bought the professional version, we have all the source code, nice!

Thanks for your help,
Josh Lindenmuth

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39896
Joined: 17-Aug-2003
# Posted on: 01-Jul-2005 20:42:48   

:)

Still curious how inserting 8000 rows in raw SQL takes just a few seconds. Or did you use bulk import statements?

Frans Bouma | Lead developer LLBLGen Pro
vikramka
User
Posts: 42
Joined: 08-Sep-2004
# Posted on: 11-Sep-2005 06:55:56   

I have been writing a program to insert a whole lot of rows into a MySQL database. What I have tried so far:

  1. Create a collection of the entity with its associated collections and do a recursive SaveEntityCollection.
  2. Create separate collections for each table and then insert them all non-recursively in batches.
  3. UnitOfWork.

Seems like option 2 performs far better than the others (maybe my observations are wrong).

Is there any way I can speed it up a lot more? I have already set DoNotPerformAddIfPresent to false on all the collections.

Will saving entity by entity save any more time? Will doing adapter.Commit after each SaveEntityCollection save any time?

I have to load something like 200000 records into a table. This table has a lot of other attached tables, which have their own indexes, and I have to load those as well. In total there are 10 tables to load, plus all the business logic in between. I create all the collections using the business logic and then persist the data in batches of 1000 for the main table. At the pace it is going right now, it looks like the entire load will take a whole day.

From the previous discussion, it looked like I'm using the right approach to insert the data, but any thoughts or suggestions are welcome.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39896
Joined: 17-Aug-2003
# Posted on: 12-Sep-2005 10:10:44   

vikramka wrote:

I have been writing a program to insert a whole lot of rows into a MySQL database. What I have tried so far:

  1. Create a collection of the entity with its associated collections and do a recursive SaveEntityCollection.
  2. Create separate collections for each table and then insert them all non-recursively in batches.
  3. UnitOfWork.

Seems like option 2 performs far better than the others (maybe my observations are wrong).

Is there any way I can speed it up a lot more? I have already set DoNotPerformAddIfPresent to false on all the collections.

Will saving entity by entity save any more time? Will doing adapter.Commit after each SaveEntityCollection save any time?

Bulk saves are faster when you:

  - use non-recursive saves, and only collections;
  - open the connection BEFORE the bulk save, and close it afterwards;
  - only use a transaction if it's absolutely necessary.
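A minimal sketch of that checklist, assuming the SaveEntityCollection(collection, refetchAfterSave, recurse) overload and the KeepConnectionOpen/OpenConnection()/CloseConnection() members; employees and employeeHistories are hypothetical, pre-built flat collections:

DataAccessAdapter adapter = new DataAccessAdapter(connectionString);
adapter.KeepConnectionOpen = true;
adapter.OpenConnection();                 // open BEFORE the bulk save
try
{
    // non-recursive saves of flat collections: no refetch, no recursion into related entities
    adapter.SaveEntityCollection(employees, false, false);
    adapter.SaveEntityCollection(employeeHistories, false, false);
}
finally
{
    adapter.CloseConnection();            // close afterwards
    adapter.Dispose();
}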

I have to load like 200000 records into a table. This table has a lot of other attached tables which have their own indexes and all and I have to load them also. So totally there are 10 tables to load and all the business logic in between. I create all the collections using the business logic and then persist the data in batches of 1000 for the main table. At the pace it is going right now, looks like the entire load will take like a whole day.

From the previous discussion, it looked like im using the right way to insert the data, but any thoughts or suggestions are welcome.

200000 rows should be saved non-recursively, with an adapter which already has its connection open; then save all the data and close the connection afterwards. If you have a transaction open, be aware that the database system will build up a lot of temporary data when the transaction is huge. On SQL Server, for example, this results in growth of the transaction log file; if the log file is small, it has to be extended every n rows, which can be slow due to disk thrashing. I'm sure MySQL runs into similar problems.

Frans Bouma | Lead developer LLBLGen Pro
vikramka
User
Posts: 42
Joined: 08-Sep-2004
# Posted on: 13-Sep-2005 15:23:56   

Otis wrote:

Bulk saves are faster when you:

  - use non-recursive saves, and only collections;
  - open the connection BEFORE the bulk save, and close it afterwards;
  - only use a transaction if it's absolutely necessary.

200000 rows should be saved non-recursively, with an adapter which already has its connection open; then save all the data and close the connection afterwards. If you have a transaction open, be aware that the database system will build up a lot of temporary data when the transaction is huge. On SQL Server, for example, this results in growth of the transaction log file; if the log file is small, it has to be extended every n rows, which can be slow due to disk thrashing. I'm sure MySQL runs into similar problems.

Thanks a lot Frans. I wasn't using transactions, and was doing just as you mentioned. I was using the InnoDB engine, which was pretty slow. I didn't test my program on SQL Server because each test run was taking hours.

Anyway, I found a much better way to do this using MySQL's LOAD DATA INFILE. It's 20 times faster, and when I used the MyISAM engine, probably 100 times faster.

Regards,

Ian avatar
Ian
User
Posts: 511
Joined: 01-Apr-2005
# Posted on: 29-Jan-2006 15:42:15   

I think it would be good if LLBLGen had a way of using SQL Server's native bulk insert mechanism.

ADO.NET 2.0 enables this now with the SqlBulkCopy class.

Even if 200000 rows are saved non-recursively, the connection is kept open and there's no transaction, isn't that still issuing one INSERT statement per row?
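For comparison, a minimal SqlBulkCopy sketch (this bypasses LLBLGen entirely; the table and column names are made up for illustration):

using System.Data;
using System.Data.SqlClient;

// Build a DataTable in memory and stream it to the server in one bulk operation.
DataTable rows = new DataTable();
rows.Columns.Add("Num", typeof(string));
// ... add the rows to import here ...

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.Employee";   // hypothetical target table
    bulkCopy.ColumnMappings.Add("Num", "Num");
    bulkCopy.WriteToServer(rows);                     // one bulk insert instead of row-by-row INSERTs
}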

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39896
Joined: 17-Aug-2003
# Posted on: 29-Jan-2006 16:36:50   

Bulk inserts will always be faster using native import methods like DTS or its successor (SSIS). For example, bulk inserts into Access are way faster using a CSV file than any ADO method can manage. If you need to bulk-copy data, use the appropriate tool; these operations are generally done only once.

Frans Bouma | Lead developer LLBLGen Pro
Ian avatar
Ian
User
Posts: 511
Joined: 01-Apr-2005
# Posted on: 29-Jan-2006 17:56:50   

If one uses a CSV, how should commas in the data be encoded, and how do they get turned back into commas once the data is in the table?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39896
Joined: 17-Aug-2003
# Posted on: 29-Jan-2006 19:12:14   

Ian wrote:

If one uses a CSV, how should commas in the data be encoded and then returned to a comma once the data is in the table?

You save a CSV and tell Access to import the data (I forgot the syntax, but I recently saw the code in a newsgroup posting). It was an example from me of bulk imports being faster through special methods than through a data access layer.

Frans Bouma | Lead developer LLBLGen Pro