The what?

That's right! Just like with rotating a cow in your head, there's nothing stopping you from tunneling port 10007 to port 10007 and BACK!

That doesn't sound reasonable, you say? That's right, it is not reasonable at all. When it happens, it's a bug in the tunnel setup. But there's nothing stopping you from doing it either, precisely because it is such an absurd scenario. So, how did it all start?

Background

Needless to say, I use SSH tunnels quite a bit. I use an automatic port allocator that provides unique port numbers for my entire infrastructure. It derives them from ever-increasing database IDs, which should guarantee that the numbers never change and never overlap.

That's where things went wrong. The allocator had a bug: it handed out stable, sequential port numbers all the time, except when the last connection between two services was removed. That would silently renumber the remaining allocations.

On its own, that would have been fine. I also have an automatic reconnection script that rebuilds the secure mesh between all services using the new port numbers. Except... I had neglected to remove the previous connections. Do you see where this is going yet?
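A minimal sketch of the two allocation schemes. The service names, IDs, and base port 10000 are all made up for illustration; the real allocator's internals are not shown here.

```shell
#!/bin/sh
# Stable scheme: port = base + immutable database ID.
# Removing the row with ID 2 leaves every other port unchanged.
for id in 1 3 4; do
  echo "service$id -> port $((10000 + id))"
done

# Buggy scheme: port = base + position in the current result set.
# Removing one row shifts the port of everything after it.
i=0
for name in service1 service3 service4; do
  i=$((i + 1))
  echo "$name -> port $((10000 + i))"
done
```

In the stable scheme, service3 keeps port 10003 no matter what gets deleted; in the positional scheme it silently moves to 10002 the moment an earlier row disappears.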

AutoSSH fails...

I had a service forwarding port 10007. I always forward from and to the same port number to keep things simple. So the forwarding service expects a service listening on port 10007 on the remote server and forwards it to local port 10007. All neat and simple.
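For context, such a tunnel looks roughly like this. The user and hostname are placeholders, and this fragment obviously isn't runnable without the actual servers:

```shell
# Listen on local port 10007 and forward connections to port 10007 on the
# remote server's loopback. autossh restarts ssh whenever the tunnel drops;
# -M 0 disables its extra monitor port, -N skips running a remote command.
autossh -M 0 -N -L 10007:localhost:10007 user@remote-server
```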

However, AutoSSH in the service, or rather the SSH process underneath it, failed with a fun-sized message:

channel 1019: open failed: connect failed: Too many open files

The WHAT? What exactly is SSH doing? What have I done to deserve this?

Fortunately Linux has a number of utilities to debug such bizarre behaviour.

The useful one in this case is lsof, short for "list open files". And oh wow, how many open files SSH had: 1024 in total. The same number that happened to be the output of ulimit -n, which reports the maximum number of open files per process.
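The check can be reproduced with something like this (the pgrep pattern is just an example, and the snippet prints nothing for the first part if no ssh process is running):

```shell
#!/bin/sh
# Count the open file descriptors of the first matching ssh process, if any.
pid=$(pgrep -x ssh | head -n 1)
if [ -n "$pid" ]; then
  lsof -p "$pid" | wc -l
fi

# The per-process soft limit on open files. ssh starts refusing new
# forwarded channels with "Too many open files" once it gets this high.
ulimit -n
```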

As it turns out, the AutoSSH service was not lying. Not that the thought would have ever crossed my mind in the first place.

How can there be so many files?

So... I looked at the other server. Just to check everything. And... for some reason there was another service chilling on port 10007, forwarding from port 10007 of the first server to its own port 10007, which in turn forwarded back to port 10007 on the first server.

AND it also had the same error.

channel 1024: open failed: connect failed: Too many open files
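Reconstructed, the accidental loop looked roughly like this (hostnames are placeholders; please don't actually run this). Every connection entering the loop spawns another connection, so both ssh processes keep opening file descriptors until they hit the ulimit -n ceiling, at which point both report the error above:

```shell
# On server-a: local port 10007 tunnels to server-b's port 10007 ...
autossh -M 0 -N -L 10007:localhost:10007 user@server-b

# On server-b: the forgotten service tunnels its port 10007 straight
# back to server-a's port 10007, closing the loop.
autossh -M 0 -N -L 10007:localhost:10007 user@server-a
```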

The fix...

I stopped the service on the second server and restarted the one on the first server... And everything went back to normal. Since the service on the second server was no longer meant to exist anyway, I disabled it and removed it permanently.

The last part is to make sure that the port numbering does not change when something is removed. It's a rare enough event, but fixing it is required to avoid nasty surprises in the future.

Or, if I don't fix it, then let this post be a message to my future self, and perhaps to others who happen to run into such an obscure error. I cannot imagine it being a common issue, but who knows. Anything can happen on the internet.

Have a great day of developing!

  • Heidi (Founder)