Pay4Bugs finds Ubuntu 8.04 Ruby On Rails IPv6 Bug

The launch of Pay4Bugs was unique not only because it is one of the first real products launched from day one with dual stack IPv4/IPv6 access, but also because Pay4Bugs, along with its sister IPv6-enabled OpenID provider, 87id,  is the first site to make possible OpenID sign on using exclusively IPv6 in a production environment.  That is to say, the two make the first known IPv6-enabled commercial OpenId provider and relying party combination.

Knowing that some aspects of the software supporting this configuration may not have had extensive real life testing in production IPv6 environments, I invited friends on The Great IPv6 Experiment mailing list to help us give IPv6 access to Pay4Bugs and 87id a thorough shakedown.

For those of you not familiar with the Pay4Bugs Software Testing Market, Pay4Bugs allows software developers to post their production or pre-production software or website and offer a reward or "bug bounty" to testers who find and report bugs.

Target Acquired

OpenId/IPv6 Pay4Bugs bug report screenshot (click for full image)

Besides finding and reporting several non-IPv6 related Pay4Bugs typo and glitches, one IPv6-enabled tester helped us locate a rather serious problem in our IPv6 stack.

The Pay4Bugs tester reported that during OpenID login,  the relying party contacted his IPv6 enabled OpenID server via IPv4 instead of IPv6. To see the original full bug report, you can click on Pay4Bugs bug report screenshot at the right.

Before approving the bug, I first verified the tester's problem was reproducible by logging into Pay4Bugs with my 87id OpenID while watching the 87id Apache access log. The log confirmed Pay4Bugs's OpenID subsystem was in fact connecting to 87id to conduct discovery via IPv4 instead of IPv6 consistent with the tester's report.

Drilling down to the root cause

Simple things first

From years of experience, I have learned that when trying to solve potentially complex problems, it is best to make sure the simple things are working as intended. Computer not turning on? Is it plugged in? Car not starting? Is there gas in the tank?

In this case, the first step was to double-check on both the 87id and Pay4Bugs servers that IPv6 connectivity between the two worked.  A simple ping from one to the other verified that DNS AAAA record lookup and IPv6 routing was working fine.

Since the only services on the machines to display strange IPv6 behavior to this point were the Pay4Bugs and 87id web apps, this led me to suspect a problem with the Ruby on Rails stack.

Search amongst the gems

Pay4Bugs uses the excellent ruby-openid gem to handle OpenID login. Examining the source code to locate the section responsible for connecting to the OpenID provider for OpenID discovery indicated that it relies on the "Net::HTTP" standard library to create the requisite HTTP connection.  Nothing regarding DNS look up or selecting IPv4 versus IPv6 was handled at this level.

Testing Ruby

With the ruby-openid most likely eliminated from the list of suspects, I set about testing Net::HTTP to see if using it directly to open an HTTP connection to an IPv6-enabled server would result in an IPv6 connection or an IPv4 connection.

{% codeblock %}
irb(main):001:0> require 'net/http'
=> true
irb(main):002:0> Net::HTTP.get_print '87id.com', '/index.html'
{% endcodeblock %}

From the 87id Apache log:

{% codeblock %}
210.21.67.21 - - [07/Mar/2009:17:26:21 +0000] "GET /index.html HTTP/1.1" 404 947 "-" "-"
{% endcodeblock %}

The 87id server logs did in fact indicate that our attempt to download the non-existent file index.html were coming via IPv4. This ruled out Apache, Passenger (mod_rails) and even Rails as possible culprits.

I knew from the success of our earlier ping test that the problem wasn't a problem with the underlying operating system's address resolution code. Therefore, the next logical step was to examine Ruby's address resolution logic.

{% codeblock %}
irb(main):001:0> require "socket"
=> true
irb(main):002:0> IPSocket.getaddress "87id.com"
=> "65.19.178.122"
{% endcodeblock %}

Ah HA! Problem located!  If it was behaving properly, it would have given us the following result:

{% codeblock %}
irb(main):001:0> require "socket"
=> true
irb(main):002:0> IPSocket.getaddress "87id.com"
=> "2001:470:1f04:18a::87"
{% endcodeblock %}

Recompile Ruby

A search of the Ruby source code revealed references to a compile time configuration option "--with-lookup-order-hack (expected INET, INET6 or UNSPEC)". Ubuntu wouldn't have shipped a deliberately IPv6-broken Ruby stack on an IPv6-enabled operating system would they?? Apparently, the answer is "Yes."

Simply compiling a fresh copy of Ruby 1.8.6 and installing fixed the Ruby IPv6 look up problem.

Houston, we have another problem

After updating our staging server with the fresh copy of Ruby, despite the irb test succeeding, OpenID login would STILL occur over IPv4.  Beginning to suspect this might not be just a Ruby problem, I tried to telnet directly to the 87id webserver.

{% codeblock %}
larry@passat:~$ telnet 87id.com 80
Trying 2001:470:1f04:18a::87...
Trying 65.19.178.122...
Connected to 87id.com.
Escape character is '^]'.
^]{% endcodeblock %}

telnet>

Address lookup was working properly, but it just was not connecting via IPv6.  I started to suspect our IPv6 firewall was configured improperly and blocking all outgoing TCP connections.

After much tweaking and re-tweaking our ip6tables scripts, I was convinced the firewall was configured properly just not behaving as the configuration said it should.  Most notably, the kernel seemed to be just ignoring the "state RELATED,ESTABLISHED" settings and simply dropping the TCP "ACK SYN" packet (see below)  sent as a response to a new TCP connection request as if it was a new, unknown connection instead of one that I had initiated myself.

{% codeblock %}
root@sophia:/etc/ip6wall# telnet 87id.com 80
Trying 2001:470:1f04:18a::87...
Mar 7 19:41:12 sophia kernel: DROP INPUT IN=sit1 OUT= MAC=fe:fd:42:a0:8d:76:00:0c:db:fc:8b:59:08:00:45:00:00:64:db:8e:00:00:1c:29:42:4e:48:34:68:4a:42:a0:8d:76 TUNNEL=72.52.104.74->66.160.141.118 SRC=2001:0470:1f04:018a:0000:0000:0000:0087 DST=2001:0470:1f04:0ba4:0000:0000:0000:0002 LEN=80 TC=0 HOPLIMIT=63 FLOWLBL=0 PROTO=TCP SPT=80 DPT=50756 WINDOW=5632 RES=0x00 ACK SYN URGP=0
{% endcodeblock %}

A bit more googling revealed that IPv6 TCP connection tracking did not work on the (relatively old) kernel version that we were using.  Updating to a newer kernel and a quick reboot solved the problem.

Thanks again to the Pay4Bugs tester who helped us find and correct this problem!

Please let us know of anymore IPv6-related bugs (or any bugs for that matter) by leaving a comment here or logging in as a Pay4Bugs Tester and filing a bug report under the appropriate assignment!