Tales from the factory – curl vs wget

open_book_02

 

From time to time I will share some stories based on true events, maybe someone will learn something from them. Then again, maybe not. To protect the innocent, some names and events might be edited. Here comes the first one.

 

Someone raised a ticket that their application cannot access a certain url, let’s say “http://My.url.tld”. You dutifully log in to the system in question and try to access the url. Since the app is using the “libcurl” library, you naturally try to test with the respective utility. You confirm that it does not work:

[user@someserver ~]$ curl http://My.url.tld
Error message.
[user@someserver ~]$

In the same time a colleague also sees the ticket but for some reason he does the testing by the way of “wget”. It’s working for him:

[user@someserver ~]$ wget http://My.url.tld
Correct result.
[user@someserver ~]$

You go back and forth with “it’s working”, “no, it’s not” messages until both of you realize that you test differently. So, it’s working with “wget” but not with “curl”. Baffling. What could be wrong ?

After running both utils in debug mode you spot a minute difference:

[user@someserver ~]$ curl -v http://My.url.tld
* About to connect() to My.url.tld port 80 (#0)
* Trying 1.1.1.1... connected
* Connected to My.url.tld (1.1.1.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl
> Host: My.url.tld:80
> Accept: */*
>
<
< Error message
<
<
* Connection #0 to host My.url.tld left intact
* Closing connection #0
[user@someserver ~]$

[user@someserver ~]$ wget -d http://My.url.tld
DEBUG output created by Wget 1.12 on linux-gnu.
Resolving My.url.tld... 1.1.1.1
Caching My.url.tld => 1.1.1.1
Connecting to My.url.tld|1.1.1.1|:80... connected.
Created socket 3.
Releasing 0x000000000074fb60 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget (linux-gnu)
Accept: */*
Host: my.url.tld:80
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK

Correct answer

---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: 242 [text/xml]
Saving to: “filename”

100%[=========================================================================================================================================================================>] 242 --.-K/s in 0s

“filename” saved [242/242]
[user@someserver ~]$

Have you seen it ?

.
.
.
.
.
(suspense drumroll)
.
.
.
.
.
.
.

Turns out that wget is doing the equivalent of a tolower(“url”) so in the actual http request it’s sending “Host: my.url.tld” and curl it’s just taking what I specified in the command line, namely “Host: My.url.tld”. Taking the test test further it turns out that calling curl with the “only lowercase” url is producing the expected results (i.e. working).

I know what you are thinking, it should not matter how you call an hostname. True. Except that in this story there is a load balancer in the way, who tries (and mostly succeeds) to do smart stuff. Well, it turns out that there was an host-based string match in that load balancer that did not quite matched the mixed-case cases.

But a question remains. What is the correct behavior ? The “curl” or the “wget” one ? I lean on the “curl” approach but maybe I am biased. What do you think ?

5 thoughts on “Tales from the factory – curl vs wget

  1. sin

    RFC2616 doesn’t say how the the text in the Host header must be formatted (lowercase, uppercase or mixed).

    From what you are describing, I tend to believe that the fault lies with the load-blancer as it should not case how the request is formatted as long as it is valid according to the RFC. If it runs regexps to figure out where to route requests, make them case insensitive or just pass the Host header through a to_lower() equivalent function and then process the request.

    Reply
    1. Dumitru Post author

      Of course the fault was at the load balancer, but that was not the question.

      The question is if wget is doing the right thing or not.

      Reply
      1. rpetre

        Let’s see what “the right thing” is. RFC 3986 aka. STD66 (“Uniform Resource Identifier (URI): Generic Syntax”) states in section 3.2.2:


        The host subcomponent is case- insensitive.
        […]
        Although host is case-insensitive, producers and normalizers should use lowercase for registered names and hexadecimal addresses for the sake of uniformity, while only using uppercase letters for percent-encodings.

        As such, the load-balancer is clearly at fault for not doing cas-insensitive matches, and wget is slightly better for obeying the SHOULD in the RFC.

        Reply
        1. Dumitru Post author

          All bow to rpetre, the context-sensitive RFC archive.

          Thank you, this should cover it.

          Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.