SaveEntity never returning when unable to access database

Robert-Jan
User
Posts: 2
Joined: 10-Jan-2020
# Posted on: 10-Jan-2020 11:31:09   

Hi,

I am running a worker service inside a Docker Container. The service saves data in a SQL Server DB that is hosted outside the Docker Container. I noticed that the worker service stops working every time it tries to save an entity in the database.

Debugging this issue I found out that calling SaveEntity(IEntity2 entityToSave, bool refetchAfterSave) on a DataAccessAdapter just blocks indefinitely. It works fine if I run the worker service outside of the Docker Container, so I probably just cannot access the SQL Server database from within the container. I have not looked into the cause of that just yet.

What worries me is that execution appears to block whenever the SQL Server database cannot be reached. I don't want to have to reboot the Docker Container every time the database is unavailable.

I did find a workaround for this issue by changing the recovery strategy, using async and a CancellationToken, but that does not feel right.

The following code does not work:


public bool SaveNetMessages(IEnumerable<NetmessageEntity> netMessages)
{
    var success = true;
    using var adaptor = new DataAccessAdapter();
    foreach (var netMessage in netMessages)
    {
        if (!adaptor.SaveEntity(netMessage, false))
        {
            success = false;
        }
    }
    return success;
}

It blocks indefinitely on the adaptor.SaveEntity(netMessage, false) call.

Changing it to this code solves the blocking problem:


public bool SaveNetMessages2(IEnumerable<NetmessageEntity> netMessages)
{
    var success = true;
    using var adaptor = new DataAccessAdapter();
    adaptor.ActiveRecoveryStrategy = new SimpleRetryRecoveryStrategy(1, new RecoveryDelay(TimeSpan.FromSeconds(1), 1, RecoveryStrategyDelayType.Linear));
    foreach (var netMessage in netMessages)
    {
        var tcs = new CancellationTokenSource(TimeSpan.FromSeconds(5));
        var cancellationToken = tcs.Token;
        var task = adaptor.SaveEntityAsync(netMessage, false, cancellationToken);
        var succeeded = task.Result;
        if (!succeeded)
        {
            success = false;
        }
    }
    return success;
}

This will throw an exception with the message 'Recovery failed: Maximum number of retries of 1 exceeded.' (RuntimeBuild 5.6.1_Netstandard2x), which is what I expect.

As I am planning to use the async option anyway, I could just turn the second piece of code into production code and move on. But it does not feel right that making it work requires specific recovery strategy settings plus a cancellation token whose timeout has to be tuned to match those settings.
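For reference, the timeout part of that workaround can be isolated from the save logic. The following is only a minimal sketch: `TryWithTimeoutAsync` and `TimeoutSketch` are hypothetical names, not part of LLBLGen Pro, and the demo uses `Task.Delay` as a stand-in for a save that never completes. It only helps if the wrapped operation actually honors the cancellation token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical helper (not an LLBLGen Pro API): runs an async operation
    // and requests cancellation through the token once the timeout elapses.
    public static async Task<bool> TryWithTimeoutAsync(
        Func<CancellationToken, Task<bool>> operation, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            return await operation(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // The operation observed the cancellation request.
            return false;
        }
    }

    public static async Task Main()
    {
        // Stand-in for a save that does not complete on its own.
        var slow = await TryWithTimeoutAsync(async ct =>
        {
            await Task.Delay(TimeSpan.FromSeconds(30), ct);
            return true;
        }, TimeSpan.FromSeconds(1));
        Console.WriteLine(slow); // False

        // Stand-in for a save that completes immediately.
        var fast = await TryWithTimeoutAsync(
            ct => Task.FromResult(true), TimeSpan.FromSeconds(1));
        Console.WriteLine(fast); // True
    }
}
```

In the code above, the call would presumably look like `await TryWithTimeoutAsync(ct => adaptor.SaveEntityAsync(netMessage, false, ct), TimeSpan.FromSeconds(5))`. Exceptions other than cancellation, such as the recovery failure, still propagate to the caller.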

Any idea what could be wrong here?

Cheers, Robert-Jan

Otis
LLBLGen Pro Team
Posts: 38110
Joined: 17-Aug-2003
# Posted on: 10-Jan-2020 13:56:27   

Do you have a timeout of 0 defined in the connection string? By default, the timeout for opening a connection on SqlConnection is 15 seconds, so it should time out after 15 seconds of not being able to connect, and you should then get an exception.

(Also, are you sure it's the connection, and not the command waiting for a lock to be lifted?)

Robert-Jan
User
Posts: 2
Joined: 10-Jan-2020
# Posted on: 10-Jan-2020 15:23:07   

There is no timeout defined in my connection string. My connection string looks something like this: "Data Source=db.dev.something.com,1234;Initial Catalog=XyzDevelopment;User Id=ourdb;Password=somepassword;"

I am quite sure that it's the connection. If it were the command waiting for a lock, I would expect the issue to also occur when not running in a Docker Container, which it doesn't. I have also tried changing the CommandTimeout setting on the adapter from the default value of 30 seconds to five seconds, and that didn't make a difference either.

But maybe I should first focus on finding out why it cannot connect from within the Docker Container while it can connect from outside of it. Once the cause of that is known, it might be easier to reason about what is causing the blocking issue.
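One way to check the connection in isolation, without LLBLGen Pro in the picture, is to open a plain SqlConnection from inside the container with an explicit connect timeout. This is only a sketch under assumptions: it assumes the System.Data.SqlClient (or Microsoft.Data.SqlClient) package is referenced, `ConnectionProbe` is a hypothetical name, and the connection string is the placeholder from this post.

```csharp
using System;
using System.Data.SqlClient; // assumption: or Microsoft.Data.SqlClient

public static class ConnectionProbe
{
    public static void Main()
    {
        // Placeholder connection string from the post, with an explicit
        // connect timeout added through the builder.
        var builder = new SqlConnectionStringBuilder(
            "Data Source=db.dev.something.com,1234;Initial Catalog=XyzDevelopment;User Id=ourdb;Password=somepassword;")
        {
            ConnectTimeout = 5 // seconds; 0 would mean "wait indefinitely"
        };

        try
        {
            using var connection = new SqlConnection(builder.ConnectionString);
            connection.Open();
            Console.WriteLine("Connected.");
        }
        catch (SqlException ex)
        {
            // Expected when the server cannot be reached within the timeout.
            Console.WriteLine($"Could not connect: {ex.Message}");
        }
    }
}
```

If this probe also hangs instead of failing after roughly five seconds, the blocking happens below the ORM, in the driver or the container's networking.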

daelmo
Support Team
Posts: 8152
Joined: 28-Nov-2005
# Posted on: 11-Jan-2020 06:57:16   

Yes, it must be something about the connection, open ports, etc. However, it should still raise an exception once the connection timeout is reached. Adding some logging should give you more information on this.

I will close this for now, please re-open the thread when you have more information about the issue.