StringCacheForFetcher gone in v5.x!

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 20-Feb-2019 19:46:28   

Wouldn't it have been better to just leave it null by default so that those of us who would benefit greatly from that feature can still turn it on where required?

It all seems to be based on the oft-mentioned premise that objects are fetched, used and discarded in the blink of an eye. For some applications - webapps, server services - sure, but my WinForms applications rarely work that way.

As an example, a lot of my reference data is stored for the lifetime of the application and is updated automatically (or manually). And if a user opens a screen with a grid containing thousands of objects, it stays around until the user closes that screen. So there is potentially a lot of memory to be saved, and the timing of garbage collections simply doesn't come into it.
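To illustrate what I mean by uniquing (a minimal sketch of the concept only; StringUniquer is an illustrative name, not LLBLGen's actual UniqueList implementation):

using System.Collections.Generic;

// Replaces duplicate strings with one shared instance, so long-lived
// entities all reference a single string object instead of thousands of copies.
sealed class StringUniquer
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public string GetUniqueInstance(string value)
    {
        if (value == null)
        {
            return null;
        }
        if (_cache.TryGetValue(value, out var existing))
        {
            return existing;    // reuse the instance seen earlier; the duplicate becomes garbage
        }
        _cache.Add(value, value);
        return value;
    }
}

With something like this in the fetch path, a grid holding 10,000 rows whose "Country" column has only 50 distinct values ends up retaining 50 string instances instead of 10,000.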

You also mention using resultset caching, but that has the same problem: the cached datareader values can still contain the same string many, many times.

"So till the next GC.Collect run, this memory was still being allocated, even if string caching was used or not.". The last part of this sentence surely contradicts the whole premise then! The point is that strings from DataReaders will stay at least until a GC no matter what, but the difference is when those strings got used in objects that survive that GC. With string uniquing on, they will disappear completely and you get your memory back; without it, they stay and you will have less overall memory to play with.

I also don't think your description about "...give less memory to be allocated at a given moment..." (because the GC hasn't run recently) is correct. If, at the point of running a query that needs memory, the GC needs to do a collect, then so be it. That is about timing, not memory availability - the amount of available memory will not be less with string uniquing on.

I think this removal came about because of benchmarking which does indeed lend itself to short-lived memory usage. And I can understand you wanted to remove the overhead of string uniquing by default for that reason. But to remove the feature completely was a regressive step IMHO.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 21-Feb-2019 10:19:38   

We profiled the code extensively, and the string cache only adds memory consumption and time taken to complete a request; it doesn't make things more efficient.

The main reason is that the string obtained from the datareader is already allocated. So a string cache won't make that allocation go away: it's already there. It can be replaced with a string that's already in memory elsewhere, but that doesn't make the string I just read from the datareader become unallocated; it's there and has to be cleaned up by the GC.

The time it takes to pull a string from the cache adds to the time taken to fetch a set, and it's not low either. Additionally, the number of strings in memory isn't lowered at all, since the number of strings allocated by a fetch is always the same: the strings allocated by the datareader are the ones allocated during the fetch.
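To illustrate the point (a sketch with illustrative names, not our actual fetch pipeline):

using System.Collections.Generic;
using System.Data;

static class FetchSketch
{
    // GetString() has already materialized a new string from the reader's
    // buffer before any cache lookup can run, so uniquing never avoids that
    // allocation: it only adds a dictionary lookup per string on top of it.
    public static List<string> FetchColumn(IDataReader reader, int ordinal, Dictionary<string, string> cache)
    {
        var values = new List<string>();
        while (reader.Read())
        {
            string fresh = reader.GetString(ordinal);        // allocated here, cache or no cache
            if (cache.TryGetValue(fresh, out var existing))  // pure extra cost, per row
            {
                fresh = existing;                            // the new string still lives until the GC runs
            }
            else
            {
                cache.Add(fresh, fresh);
            }
            values.Add(fresh);
        }
        return values;
    }
}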

Your reasoning relies on the assumption that the GC will save you, but there's no rule for when the GC will clean things up. That might be after 15-20 minutes or more.

The string cache removal isn't a regression: it was a hidden feature that 1) made things slower (a lot) and 2) ate more memory than needed. Removing it won't regress anything, other than in the situation where 1) the GC would clean things up and 2) there's so much data in-memory that it actually makes a difference.

If one runs into this, and you seem to do that, why not lower the amount of entities in memory?

In any case, it's not coming back, sorry.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 21-Feb-2019 12:24:46   

"the string obtained from the datareader is already allocated"

Yes indeed, and there is nothing that can be done about that.

But they then get referenced by entities on a 1:1 basis. So when the GC comes (sooner or later - makes no difference), if those entities are still around then every single one of those duplicate strings will stay in memory. But if they had been uniqued before they went into the entity, then only one of those duplicate strings would stay in memory - the rest would be collected and give you more free memory.

"there's no rule when the GC will clean things up. That might be after 15-20 minutes or more."

That may be true for certain use-cases, but not WinForms/WPF ones - I am certain of that, since there GCs can happen multiple times per second - and it completely misses the point: you *will* get your memory back at some point. And memory allocation isn't just about entities, BTW - there are lots of other requests coming in, and leaving unnecessary duplicate strings around will increase the frequency of gen-0 collections and the likelihood of objects being promoted to gen 1. You need to consider the total amortized cost of allocations/GCs, not just the immediate state after a fetch.
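A rough way to see this (a self-contained sketch; the 100,000 rows and ten distinct values are illustrative numbers, not profiling data from LLBLGen):

using System;
using System.Collections.Generic;

class RetainedMemoryDemo
{
    static void Main()
    {
        Console.WriteLine($"without uniquing: {Measure(unique: false):N0} bytes retained");
        Console.WriteLine($"with uniquing:    {Measure(unique: true):N0} bytes retained");
    }

    // Builds 100,000 long-lived "entities" that each hold one of ten distinct
    // strings, forces a full collection, and reports what actually survives.
    static long Measure(bool unique)
    {
        var cache = new Dictionary<string, string>();
        var entities = new List<string>(100_000);
        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 0; i < 100_000; i++)
        {
            // a fresh string instance per row, as a datareader would produce
            string fromReader = new string('x', 20) + (i % 10);
            if (unique)
            {
                if (cache.TryGetValue(fromReader, out var existing))
                {
                    fromReader = existing;   // the duplicate becomes collectable
                }
                else
                {
                    cache.Add(fromReader, fromReader);
                }
            }
            entities.Add(fromReader);        // the "entity" keeps the reference
        }
        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(entities);
        return after - before;
    }
}

With uniquing on, the duplicates are garbage by the time the second measurement runs; without it, all 100,000 copies survive for as long as the entities do.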

"it was a hidden feature"

It wasn't a completely hidden feature - there was a protected property specifically to give control over it - it was part of the API.

"If one runs into this, and you seem to do that, why not lower the amount of entities in memory?"

Shocking! You are telling me how to write my apps now? Again I say that you are ignoring scenarios like WinForms where data stays around longer than you are assuming.

The bottom line is that a good feature has been removed when all that needed to happen was for this line to be removed:

this._stringCacheForFetcher = new UniqueList<string>(256);

That would have given you your faster benchmarking scores and still allowed the rest of us to use it where appropriate for our real-world scenarios.
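Something along these lines, purely as a hypothetical sketch (FetcherBase and the property shape stand in for the real types; this is not the actual v5.x code):

// Minimal stand-in for the library's UniqueList type.
class UniqueList<T>
{
    public UniqueList(int capacity) { }
}

class FetcherBase
{
    // null by default: no uniquing, no overhead, nothing for benchmarks to pay for
    private readonly UniqueList<string> _stringCacheForFetcher = null;

    protected virtual UniqueList<string> StringCacheForFetcher => _stringCacheForFetcher;
}

// An application whose entities live a long time opts in via the protected property:
class LongLivedDataFetcher : FetcherBase
{
    private readonly UniqueList<string> _cache = new UniqueList<string>(256);

    protected override UniqueList<string> StringCacheForFetcher => _cache;
}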

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 21-Feb-2019 12:51:06   

simmotech wrote:

"the string obtained from the datareader is already allocated"

Yes indeed, and there is nothing that can be done about that.

Yet it's the core of the issue.

But they then get referenced by entities on a 1:1 basis. So when the GC comes (sooner or later - makes no difference), if those entities are still around then every single one of those duplicate strings will stay in memory. But if they had been uniqued before they went into the entity, then only one of those duplicate strings would stay in memory - the rest would be collected and give you more free memory.

In that ideal scenario, sure. But the world isn't an ideal place. You can't control when GC happens and one shouldn't 'assume' it will happen quite soon after you magically allocated memory and freed it.

"there's no rule when the GC will clean things up. That might be after 15-20 minutes or more."

That may be true for certain use-cases, but not WinForms/WPF ones - I am certain of that, since there GCs can happen multiple times per second - and it completely misses the point: you *will* get your memory back at some point.

Please don't try to lecture me on memory usage. I get your point, I just don't agree with it and I do know how things work wrt memory allocation.

And memory allocation isn't just about entities, BTW - there are lots of other requests coming in, and leaving unnecessary duplicate strings around will increase the frequency of gen-0 collections and the likelihood of objects being promoted to gen 1. You need to consider the total amortized cost of allocations/GCs, not just the immediate state after a fetch.

Yes, hence we profile the memory footprint of what entities do, how much memory they allocate, and what's left after a fetch or insert action. Again, I know how this all works, Simon. I also know when someone runs into edge cases where memory is a problem, i.e. when you have a massive number of entities in-memory. The root cause of the memory pressure in that case isn't duplicate strings, but the fact that there is a massive number of entities in memory.

"it was a hidden feature"

It wasn't a completely hidden feature - there was a protected property specifically to give control over it - it was part of the API.

I don't know what you're trying to achieve here, but I've little time for bickering over these kinds of things. The feature was 'hidden' as in: it was automatic, and no user interaction was required to set it up. You could adjust it, but I doubt a lot of people did, considering it's not something you'd think of. You knew it was there, but I doubt many others did, as there was no need to know it: it didn't require any configuration/interaction.

"If one runs into this, and you seem to do that, why not lower the amount of entities in memory?"

Shocking! You are telling me how to write my apps now?

I think I asked a genuine question? I've dealt with users of my work for over 15 years now and I know what the general patterns are. Almost never do I run into a situation where people keep that many entities in memory. I then wonder why one would do that and not fetch things from a DB (even locally) on the fly.

Mind you, you are telling me how to optimize my framework.

Again I say that you are ignoring scenarios like WinForms where data stays around longer than you are assuming.

We deliberately created features like DataScopes for WinForms/WPF situations, so why you conclude we 'ignore' that entirely is a bit of a blur to me, sorry.

The bottom line is that a good feature has been removed when all that needed to happen was for this line to be removed:

this._stringCacheForFetcher = new UniqueList<string>(256);

That would have given you your faster benchmarking scores and still allowed the rest of us to use it where appropriate for our real-world scenarios.

No, it wouldn't have. The presence of the cache still bled through into the code, which had to take it into account. We had serious problems with it, and it blocked optimization work precisely because it was there: the code has to assume it IS there and act accordingly. Removing it solved a problem that stood in the way of optimizing further, so we asked ourselves when it would actually be beneficial to have it. We couldn't think of a situation where it would be (only an edge case with a massive number of entities, and at the same time an eager GC cleaning things up right away, and a low amount of available memory overall), so we removed it, freeing up a way to optimize the framework further. Simpler code, faster code, easier to maintain.

Optimizing these frameworks isn't about running a query once and guessing where things are slow. It's about defining tests, profiling endlessly to get better insight into what's going on and where the bottlenecks are, applying fixes and seeing if they give better results and lower memory footprints. It's always a trade-off, and we hope to make the ones which are beneficial for everyone.

Frans Bouma | Lead developer LLBLGen Pro