Iterate through large data from LINQ result

eSPiYa
User
Posts: 10
Joined: 12-Oct-2018
# Posted on: 12-Oct-2018 05:19:36   

I'm trying to iterate through a huge result set from a LINQ query:

LinqMetaData metaData = new LinqMetaData(adapter);
var query = from prod in metaData.Products
            select prod;

I'm expecting more than 30,000 records from this query. To iterate through it, I tried both streaming and using an enumerator; performance is about the same. Is there another way to make it even faster? Also, I have to use LINQ because we've already made a bunch of helpers for it.

Here's my code for streaming

int ctr = 0;
foreach (var record in query)
{
    // do processing here
    ctr++;
}

and here's my enumerator code

using (var enumerator = query.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        var record = enumerator.Current;
        // do processing here
        ctr++;
    }
}

Also, is there a way to fetch the DataReader from LINQ/IQueryable<TEntity>?

daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 12-Oct-2018 08:22:04   

eSPiYa wrote:

I'm expecting more than 30,000 records from this query. To iterate through it, I tried both streaming and using an enumerator; performance is about the same. Is there another way to make it even faster?

What is the slow part? Did you measure it? I.e.: what is slow (the query execution, the 30,000 records traveling over the wire, the memory handling on the app side, the manipulation or processing, etc.)? The possible solution depends on where the issue is. For instance: you could reduce the query complexity, reduce the number of fields, use paging, or fetch a flat list instead of entities (POCOs, datatables, typedViews, typedLists).

eSPiYa wrote:

Also, I have to use LINQ because we've already made a bunch of helpers for it.

IMHO, you could add specific code for specific needs (LLBLGen API, QuerySpec). The thing with Linq is that when it's enumerated, it's executed and materialized. The only thing you could do there in that respect is to use paging.
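
A minimal paging sketch using standard LINQ operators (Skip/Take works against any IQueryable provider; the page size of 5,000 is just an illustration value, tune it for your workload):

const int pageSize = 5000;  // illustration value, not a recommendation
int pageNo = 0;
while (true)
{
    // each page is a separate, bounded query and fetch
    var page = query.Skip(pageNo * pageSize).Take(pageSize).ToList();
    foreach (var record in page)
    {
        // do processing here
    }
    if (page.Count < pageSize)  // last (partial or empty) page reached
    {
        break;
    }
    pageNo++;
}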

eSPiYa wrote:

Also, is there a way to fetch the DataReader from LINQ/IQueryable<TEntity>?

In short: no. But you can do it with QuerySpec or the LLBLGen API. Please read "Fetching DataReaders and Projections" in the documentation.
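
For illustration, a hedged sketch of what that looks like with QuerySpec's FetchAsDataReader (the QueryFactory/Product names are assumptions matching the query above; check the "Fetching DataReaders and Projections" docs for the exact overloads in your runtime version):

var qf = new QueryFactory();
var q = qf.Product;  // hypothetical factory property, analogous to metaData.Products
using (var adapter = new DataAccessAdapter())
using (var reader = adapter.FetchAsDataReader(q, CommandBehavior.CloseConnection))
{
    while (reader.Read())
    {
        // process the raw row here, e.g. reader.GetInt32(0)
    }
}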

Hope this helps.

David Elizondo | LLBLGen Support Team
eSPiYa
User
Posts: 10
Joined: 12-Oct-2018
# Posted on: 12-Oct-2018 08:56:34   

What is the slow part? Did you measure it?

I think it is not really slow, but I'm trying to find other ways to improve the performance. It takes 6.5 - 7.5 minutes just to iterate through the records, without any processing.

The thing with Linq is that when it's enumerated, it's executed and materialized.

Based on my experience implementing LINQ on other things (text files, web APIs, etc.), as long as it is still an IQueryable<T>, it is not yet executed unless you call ToList(), First/FirstOrDefault, or similar methods. I doubt using paging would improve it, because I'm already reading the records one at a time through streaming or enumeration.
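
To illustrate that deferred-execution point (the Where filter and ProductId field here are hypothetical, for illustration only):

var q = metaData.Products.Where(p => p.ProductId > 100);  // builds the query; no SQL runs yet
foreach (var record in q)  // SQL executes here, on enumeration
{
    // rows materialize one by one as the enumerator advances
}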

In short: no.

Won't you support this later? Changing the querying approach just to take advantage of a certain functionality is a big turn-off.

For example: in our case, we already have an existing LINQ query that fetches records to display in our grid. Creating a new function with a different approach, like the other APIs you've mentioned, could hit roadblocks because QuerySpec or the LLBLGen API might not support some functions we need, or could make things more complicated. We actually abandoned the LLBLGen API and used LINQ instead because it made our work a lot easier.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 12-Oct-2018 09:13:02   

eSPiYa wrote:

What is the slow part? Did you measure it?

I think it is not really slow, but I'm trying to find other ways to improve the performance. It takes 6.5 - 7.5 minutes just to iterate through the records, without any processing.

7.5 minutes to read 30,000 rows is way too much. Our benchmarks fetch 32,000 rows in 200ms or less (that's over a LAN, so the DB isn't even local). It really helps if you look at the code with a profiler to see where things are slow. I.e. if the processing code itself is very slow, it's best if you first fetch all the data into a list and then process it, instead of processing it row by row in a loop.
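
A minimal sketch of that fetch-first-then-process pattern:

var allRecords = query.ToList();  // one fetch: the DataReader is open only here
foreach (var record in allRecords)
{
    // slow per-record work now runs against the in-memory list,
    // not while the connection/reader is held open
}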

Is the connection to the database slow? E.g. over a modem or something, or through a firewall that slows things down? The 30,000 rows: are those a fragment of a massively large table with millions of rows, or all the rows in the table?

Additionally, if the data is very large, i.e. you have large fields in the entities with e.g. megabytes of data, and you don't need those in the processing, you can exclude them from the fetch.
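
A hedged sketch of that, assuming the ExcludeFields operator of LLBLGen's LINQ extensions (the Photo field is a hypothetical large column used only for illustration):

var lightQuery = from prod in metaData.Products.ExcludeFields(p => p.Photo)
                 select prod;
// fetched entities now skip the excluded column instead of pulling
// megabytes of blob data over the wire for every row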

Without profiling it's impossible to say what it is that's slow, but one thing is certain: 30,000 rows should take less than a second.

The thing with Linq is that when it's enumerated, it's executed and materialized.

Based on my experience implementing LINQ on other things (text files, web APIs, etc.), as long as it is still an IQueryable<T>, it is not yet executed unless you call ToList(), First/FirstOrDefault, or similar methods. I doubt using paging would improve it, because I'm already reading the records one at a time through streaming or enumeration.

It depends on what you're doing and what's causing the slowdown.

In short: no.

Won't you support this later? Changing the querying approach just to take advantage of a certain functionality is a big turn-off.

With linq there's no datareader fetch; we offer other ways to fetch a datareader though, e.g. through QuerySpec with FetchAsDataReader or the low-level API. Linq queries are fetched by enumerating them, so a datareader doesn't fit the linq API. That aside, the materialization of entities is very fast, so that's not the bottleneck with 30,000 entities. Without profiling, no one can say what's slow, except for the parts which have already been profiled to death, like entity materialization, query generation, query fetching, etc. We do that regularly to stay ahead of the competition, using our own benchmarks: https://github.com/FransBouma/RawDataAccessBencher

For example: in our case, we already have an existing LINQ query that fetches records to display in our grid. Creating a new function with a different approach, like the other APIs you've mentioned, could hit roadblocks because QuerySpec or the LLBLGen API might not support some functions we need, or could make things more complicated. We actually abandoned the LLBLGen API and used LINQ instead because it made our work a lot easier.

The low-level API is indeed quite verbose and it's understandable you moved away from it :) However, QuerySpec is more flexible than linq and has more features. But I don't think your slowdown has anything to do with linq; it's caused by something else, so please try to profile your code to see where the bottleneck really is.

As the normal run already takes 7+ minutes, profiling it will take longer, but after half a minute you can already stop the profiler and look at the data it gathered; it will already give you a clear insight into what's wrong and where things are slow.

Please let us know what you find :)

Frans Bouma | Lead developer LLBLGen Pro
eSPiYa
User
Posts: 10
Joined: 12-Oct-2018
# Posted on: 12-Oct-2018 10:20:06   

Thanks for the suggestions. I tried fetching all of the records using ToListAsync() and was able to iterate through the result set in less than 2 seconds. I'm wondering why reading records directly one at a time is so much slower than fetching them all.

Knowing that fetching all records at once is bad for memory, I tried using pagination and found that the optimal page size for now is 5,000 records at a time.

Anyway, I got confused by the streaming feature. I'm used to EF7's streaming behavior of directly iterating through the records; EF6 and lower have an AsStreaming() method, which I wasn't aware of. Our team lead said that the version of LLBLGen we're using has Entity Framework implemented, so I assumed that directly iterating through the records would produce the same result. Though I've rarely used EF7's streaming functionality, so I'm not sure how it performs when working with more than 10,000 records.

Walaa
Support Team
Posts: 14946
Joined: 21-Aug-2005
# Posted on: 12-Oct-2018 11:23:22   

Fetching the records with paging is the way to go. If you want to iterate one record at a time, you'd be better off using a DataReader, as Otis and David pointed out.

Our team lead said that the version of LLBLGen we're using has Entity Framework implemented, so I assumed that directly iterating through the records would produce the same result.

The LLBLGen Pro Designer supports multiple ORM frameworks, but you decide on one of them per project. So apparently you are using the LLBLGen Pro Runtime Framework, which is fast (check the benchmarks).