Monday, July 16, 2012

Connection problems: inbound connection timed out (ORA-3136)

A couple of days ago I ran into some error messages in the alert_<SID>.log which read roughly like this:


Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
  Time: 11-JUL-2012 08:54:24
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192..168..018.13)(PORT=64811))
WARNING: inbound connection timed out (ORA-3136)


The application accessing the database runs on a Red Hat Linux application server and connects via JDBC. The database also resides on a Linux box, but the database side was not the limiting factor, as you can see further down in this post.
The application server was virtual and had just enough resources to run the application.
The application opened several database connections at the same time (within 1 minute).


This error usually comes from connection attempts which open a database connection but do not send the credentials in time. Usually the listener spawns a new server process and hands off the connection to it. But if the credentials do not arrive in time, the connection attempt is aborted with an error to protect against denial-of-service (DoS) attacks. This timeout can be set via the SQLNET.INBOUND_CONNECT_TIMEOUT parameter in sqlnet.ora and the INBOUND_CONNECT_TIMEOUT_<listener_name> parameter in listener.ora; it has a default value of 60 seconds if not set to a different value.

One would think that is enough time for the client to send the username/password.
Usually that is true, but in this case the client really had problems sending the username/password combination in time.

After some research it turned out that the root cause of the problem was the JDBC driver using the java.security.SecureRandom class, which in turn uses the /dev/random device under Linux.
The /dev/random device uses a pool of bytes (entropy) which the Linux kernel gathers from various sources to produce a random sequence of bytes for applications which use cryptographic procedures.
Each read from /dev/random drains the pool by the requested number of bytes.
If the pool is empty, the program accessing /dev/random blocks until the requested number of bytes is available.

One can check this by simply executing "cat /dev/random" a couple of times. If the output stalls because the command blocks, you might face this issue when connecting to an Oracle database.
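The same check can be sketched in Java, which is closer to what the application server actually does. This is only a minimal sketch: the class name RandomDeviceProbe, the 64-byte read size and the 5-second timeout are arbitrary choices made for illustration, not anything taken from the JDBC driver.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: measures how long a read from a random device takes.
class RandomDeviceProbe {

    // Returns the elapsed milliseconds for reading `count` bytes from `device`,
    // or -1 if the read did not finish within `timeoutMillis`.
    static long timeRead(String device, int count, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "random-probe");
            t.setDaemon(true); // a blocked read must not keep the JVM alive
            return t;
        });
        try {
            long start = System.nanoTime();
            Future<Object> read = pool.submit(() -> {
                try (DataInputStream in = new DataInputStream(new FileInputStream(device))) {
                    in.readFully(new byte[count]); // blocks while the device has no data
                }
                return null;
            });
            try {
                read.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return -1; // still waiting: the device is blocking
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String device : new String[] {"/dev/random", "/dev/urandom"}) {
            long ms = timeRead(device, 64, 5_000);
            System.out.println(device + ": "
                    + (ms < 0 ? "blocked for more than 5 s" : ms + " ms"));
        }
    }
}
```

If /dev/random reports a timeout while /dev/urandom returns immediately, the entropy pool on that machine is most likely drained.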

Even if you are not using "advanced encryption", the connect credentials (username, password) are always sent to the database encrypted. Therefore the JDBC client uses encryption, and with it the random number generator, in any case.

There is another Linux device, /dev/urandom, which does not block when the pool is empty. Instead of blocking, this device reuses the bytes until the pool has filled up again.

Unfortunately the JDBC driver uses the SecureRandom class of the Java standard library, which depends on /dev/random.
Fortunately you can change which device SecureRandom reads from by putting the following line in your Java program before opening the connection:

System.setProperty("java.security.egd", "file:///dev/urandom");


Or you can set a Java system property when starting your Java program:


-Djava.security.egd=file:///dev/urandom


Here is an interesting link which explains the two workarounds above.

This makes the JDBC driver use the /dev/urandom device which is not blocking.
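A minimal sketch of how the first workaround could be wired into an application is shown here. The class name EgdSetup and its two methods are hypothetical. The sketch also assumes that the property has to be set before SecureRandom is used for the first time in the JVM, and it times a plain nextBytes() call only as a stand-in, since the exact calls the JDBC driver makes are not covered in this post.

```java
import java.security.SecureRandom;

// Hypothetical helper: call this once at startup, before the first JDBC connection.
class EgdSetup {
    static final String EGD_PROPERTY = "java.security.egd";

    // Sets the entropy source unless -Djava.security.egd was already given
    // on the command line; returns the value that is in effect afterwards.
    static String ensureNonBlockingEntropy() {
        if (System.getProperty(EGD_PROPERTY) == null) {
            System.setProperty(EGD_PROPERTY, "file:///dev/urandom");
        }
        return System.getProperty(EGD_PROPERTY);
    }

    // Rough timing of a fresh SecureRandom producing `count` bytes, in milliseconds.
    static long timeRandomBytesMillis(int count) {
        long start = System.nanoTime();
        new SecureRandom().nextBytes(new byte[count]);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println(EGD_PROPERTY + " = " + ensureNonBlockingEntropy());
        System.out.println("16 random bytes took " + timeRandomBytesMillis(16) + " ms");
        // Open the JDBC connection only after this point.
    }
}
```

Checking for an existing value first means that a setting passed with -D at startup still wins over the hard-coded default.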

Alternatively you could set the listener or sqlnet.ora parameter INBOUND_CONNECT_TIMEOUT to an appropriately higher value for your systems.
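As a rough illustration, the two settings could look like the fragment below. The values 120 and 110 are arbitrary examples and not a recommendation, and LISTENER is assumed to be the name of your listener (it is the default name); the suffix has to match your actual listener name.

```
# sqlnet.ora on the database server (value in seconds)
SQLNET.INBOUND_CONNECT_TIMEOUT = 120

# listener.ora (value in seconds); the suffix is the listener name
INBOUND_CONNECT_TIMEOUT_LISTENER = 110
```

The listener usually has to be reloaded or restarted before a change in listener.ora takes effect.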


Another solution could be approaching the problem from the Linux side:
There is a Linux daemon, "rngd", which can help fill up the /dev/random device with entropy.
This daemon can be used to pull entropy from hardware accelerators or other devices and feed it into the entropy pool on which /dev/random relies.  
In our case we could just pull the data from the /dev/urandom device which is non-blocking. 

Use the following command to run this service on /dev/urandom:


rngd -r /dev/urandom 


The process will start as a daemon, which means it will run in the background.

Please note that all of these possible solutions will decrease the level of security!
For a secure long-term solution, please consider adding a hardware accelerator.






1 comment:

  1. Thanks for this article. We had the exact same problem and we opted for the rngd solution.
