ehttp HTTP client

NAME

ehttp - Client-side implementation of the HTTP/1.0 and 1.1 protocols.

SYNOPSIS

package require Humelib

::ehttp::config ?options?

::ehttp::geturl url ?options?

::ehttp::formatQuery key value ?key value ...?

::ehttp::reset token ?why?

::ehttp::wait token

::ehttp::status token

::ehttp::size token

::ehttp::code token

::ehttp::ncode token

::ehttp::data token

::ehttp::error token

::ehttp::cleanup token

::ehttp::register proto port command

::ehttp::unregister proto

DESCRIPTION

The ehttp package provides the client side of the HTTP version 1.0 and 1.1 protocols. The software is based on the standard Tcl http package, with enhancements for using version 1.1 of the HTTP protocol and for tracing the data that is sent or received. The package implements the GET, POST, and HEAD operations of HTTP. It allows configuration of a proxy host to get through firewalls. The package is compatible with the Safesock security policy, so it can be used by untrusted applets to do URL fetching from a restricted set of hosts. The package can be extended to support additional HTTP transport protocols, such as HTTPS, by providing a custom socket command via ::ehttp::register.

The ::ehttp::geturl procedure performs an HTTP transaction. Its options determine whether a GET, POST, or HEAD transaction is performed. The return value of ::ehttp::geturl is a token for the transaction. The value is also the name of an array in the ::ehttp namespace that contains state information about the transaction. The elements of this array are described in the STATE ARRAY section.

If the -command option is specified, then the HTTP operation is done in the background. ::ehttp::geturl returns immediately after generating the HTTP request and the callback is invoked when the transaction completes. For this to work, the Tcl event loop must be active. In Tk applications this is always true. For pure-Tcl applications, the caller can use ::ehttp::wait after calling ::ehttp::geturl to start the event loop.
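The background mode described above can be sketched as follows. This is only an illustration: the procedure names fetchDone and fetchAsync, the global variable ::fetchResult, and the 10 second timeout are assumptions made for the example, not part of the package.

```tcl
# Callback invoked when the transaction completes. It reads the state
# array directly, as the callback templates in this document do.
proc fetchDone {token} {
    upvar #0 $token state
    if {$state(status) eq "ok"} {
        set ::fetchResult "fetched [string length $state(body)] bytes"
    } else {
        set ::fetchResult "transaction ended with status $state(status)"
    }
}

# Hypothetical helper: start a background request, run the event loop
# until it completes, then free the state array.
proc fetchAsync {url} {
    set token [::ehttp::geturl $url -command fetchDone -timeout 10000]
    ::ehttp::wait $token
    set result $::fetchResult
    ::ehttp::cleanup $token
    return $result
}
```

After package require Humelib, a pure-Tcl script could call fetchAsync with a URL of its choice. A Tk application would normally omit the ::ehttp::wait call and let the callback do the work.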

COMMANDS

::ehttp::config ?options?

The ::ehttp::config command is used to set and query the name of the proxy server and port, and the User-Agent name used in the HTTP requests. If no options are specified, then the current configuration is returned. If a single argument is specified, then it should be one of the flags described below. In this case the current value of that setting is returned. Otherwise, the options should be a set of flags and values that define the configuration:

-accept mimetypes: The Accept header of the request. The default is */*, which means that all types of documents are accepted. Otherwise you can supply a comma separated list of mime type patterns that you are willing to receive. For example, "image/gif, image/jpeg, text/*".
-proxyhost hostname: The name of the proxy host, if any. If this value is the empty string, the URL host is contacted directly.
-proxyport number: The proxy port number.
-proxyfilter command: The command is a callback that is made during ::ehttp::geturl to determine if a proxy is required for a given host. One argument, a host name, is added to command when it is invoked. If a proxy is required, the callback should return a two element list containing the proxy server and proxy port. Otherwise the filter should return an empty list. The default filter returns the values of the -proxyhost and -proxyport settings if they are non-empty.
-useragent string: The value of the User-Agent header in the HTTP request. The default is "Hume EDA Tcl http client package 1.0."
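As an illustration of the -proxyfilter contract described above, here is a minimal sketch of a filter. The procedure name myProxyFilter, the proxy host proxy.example.com, port 8080, and the *.example.internal domain are all hypothetical values chosen for the example.

```tcl
# Hypothetical filter: contact localhost and *.example.internal directly,
# and route every other host through proxy.example.com port 8080.
proc myProxyFilter {host} {
    if {$host eq "localhost" || [string match -nocase *.example.internal $host]} {
        # An empty list means that no proxy is required.
        return {}
    }
    # A two element list: proxy server and proxy port.
    return [list proxy.example.com 8080]
}
```

The filter would be installed with ::ehttp::config -proxyfilter myProxyFilter, after which ::ehttp::geturl appends the host name and invokes it for each request.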

::ehttp::geturl url ?options?

The ::ehttp::geturl command is the main procedure in the package. The -query option causes a POST operation and the -validate option causes a HEAD operation; otherwise, a GET operation is performed. The ::ehttp::geturl command returns a token value that can be used to get information about the transaction. See the STATE ARRAY and ERRORS sections for details. The ::ehttp::geturl command blocks until the operation completes, unless the -command option specifies a callback that is invoked when the HTTP transaction completes. ::ehttp::geturl takes several options:

-binary boolean

Specifies whether to force interpreting the url data as binary. Normally this is auto-detected (anything not beginning with a text content type or whose content encoding is gzip or compress is considered binary data).

-blocksize size

The blocksize used when reading the URL. At most size bytes are read at once. After each block, a call to the -progress callback is made (if that option is specified).

-channel name

Copy the URL contents to channel name instead of saving it in state(body).

-command callback

Invoke callback after the HTTP transaction completes. This option causes ::ehttp::geturl to return immediately. The callback gets an additional argument that is the token returned from ::ehttp::geturl. This token is the name of an array that is described in the STATE ARRAY section. Here is a template for the callback:

proc httpCallback {token} {
    upvar #0 $token state
    # Access state as a Tcl array
}

-handler callback

Invoke callback whenever HTTP data is available; if present, nothing else will be done with the HTTP data. This procedure gets two additional arguments: the socket for the HTTP data and the token returned from ::ehttp::geturl. The token is the name of a global array that is described in the STATE ARRAY section. The procedure is expected to return the number of bytes read from the socket. Here is a template for the callback:

proc httpHandlerCallback {socket token} {
    upvar #0 $token state
    # Access socket, and state as a Tcl array.
    # For example, read up to 1000 bytes from the socket:
    set data [read $socket 1000]
    set nbytes [string length $data]
    # ... process $data here ...
    # Report the number of bytes read
    return $nbytes
}

-headers keyvaluelist

This option is used to add extra headers to the HTTP request. The keyvaluelist argument must be a list with an even number of elements that alternate between keys and values. The keys become header field names. Newlines are stripped from the values so the header cannot be corrupted. For example, if keyvaluelist is Pragma no-cache then the following header is included in the HTTP request:

Pragma: no-cache

-progress callback

The callback is made after each transfer of data from the URL. The callback gets three additional arguments: the token from ::ehttp::geturl, the expected total size of the contents from the Content-Length meta-data, and the current number of bytes transferred so far. The expected total size may be unknown, in which case zero is passed to the callback. Here is a template for the progress callback:

proc httpProgress {token total current} {
    upvar #0 $token state
}
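A slightly fuller progress callback might look like the sketch below. The procedure name httpProgressPercent and the message format are assumptions made for illustration; the handling of a zero total follows the description above.

```tcl
# Report progress on stderr. When the total size is unknown (zero),
# only the number of bytes transferred so far is shown.
proc httpProgressPercent {token total current} {
    if {$total > 0} {
        set percent [expr {100 * $current / $total}]
        set msg [format "%d of %d bytes (%d%%)" $current $total $percent]
    } else {
        set msg [format "%d bytes (total unknown)" $current]
    }
    puts stderr $msg
    return $msg
}
```

It would be passed as -progress httpProgressPercent, typically together with -blocksize to control how often it is called.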

-query query

This flag causes ::ehttp::geturl to do a POST request that passes the query to the server. The query must be an x-url-encoding formatted query. The ::ehttp::formatQuery procedure can be used to do the formatting.

-queryblocksize size

The blocksize used when posting query data to the URL. At most size bytes are written at once. After each block, a call to the -queryprogress callback is made (if that option is specified).

-querychannel channelID

This flag causes ::ehttp::geturl to do a POST request that passes the data contained in channelID to the server. The data contained in channelID must be an x-url-encoding formatted query unless the -type option below is used. If a Content-Length header is not specified via the -headers option, ::ehttp::geturl attempts to determine the size of the post data in order to create that header. If it is unable to determine the size, it returns an error.

-queryprogress callback

The callback is made after each transfer of data to the URL (i.e. POST) and acts exactly like the -progress option (the callback format is the same).

-timeout milliseconds

If milliseconds is non-zero, then ::ehttp::geturl sets up a timeout to occur after the specified number of milliseconds. A timeout results in a call to ::ehttp::reset and to the -command callback, if specified. The return value of ::ehttp::status is timeout after a timeout has occurred.

-TRACE tracebits

The -TRACE option is used in conjunction with the -tracecmd option to control the information being passed to the user-defined tracecmd. The tracebits integer value is used as a bitfield. The table below shows the bit values as hexadecimal integers, and the corresponding information provided if the bit is set.

0x0001: status messages
0x0100: Received HTTP header data
0x0800: Received HTTP content data
0x1000: Sent HTTP header data
0x8000: Sent HTTP query data

-tracecmd tracecmd

The ehttp software is instrumented with the capability to execute user-defined code which is passed the data being exchanged or status information. See the -TRACE option above for a description of the bitfield values that control the tracecmd evaluation. The tracecmd is concatenated with three arguments during evaluation, (1) the geturl transaction token (an array name), (2) the pertinent TRACE bit value, and (3) the data that was read or written or a status message. If this option is used, in addition to the bit values specified by the -TRACE option, the tracecmd code is also executed with the bit value of 0 and the value of the geturl transaction token after the transaction array has been initialized.
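The following sketch shows one way a trace command could be written against the -TRACE bit values listed above. The procedure names describeTraceBit and logTrace, the variable headerBits, and the label strings are hypothetical; the sketch also assumes the bit value is passed as an integer that format can read.

```tcl
# Map a single TRACE bit value to a short label. Bit value 0 is the
# extra call made after the transaction array has been initialized.
proc describeTraceBit {bit} {
    switch -- [format 0x%04X $bit] {
        0x0000  {return "transaction initialized"}
        0x0001  {return "status"}
        0x0100  {return "received header"}
        0x0800  {return "received content"}
        0x1000  {return "sent header"}
        0x8000  {return "sent query"}
        default {return "unknown"}
    }
}

# Trace command: receives the token, the TRACE bit, and the data.
proc logTrace {token bit data} {
    puts stderr "[describeTraceBit $bit] ($token): $data"
}

# Trace only the headers, both received (0x0100) and sent (0x1000).
set headerBits [expr {0x0100 | 0x1000}]
```

A request traced this way would add -TRACE $headerBits -tracecmd logTrace to the ::ehttp::geturl options.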

-type mime-type

Use mime-type as the Content-Type value, instead of the default value (application/x-www-form-urlencoded) during a POST operation.

-validate boolean

If boolean is non-zero, then ::ehttp::geturl does an HTTP HEAD request. This request returns meta information about the URL, but the contents are not returned. The meta information is available in the state(meta) variable after the transaction. See the STATE ARRAY section for details.

-version version

The version argument can be specified as 1.0 or 1.1 to specify the desired version of the HTTP protocol. The default is 1.0 which uses a new socket connection for each geturl operation. Version 1.1 re-uses socket connections and is slightly more efficient if more than one operation is being performed with the same HTTP server. There can be problems using version 1.1 since the content length stated by the server is relied upon to know when the contents have been completely received. With version 1.0, the end-of-file condition indicates transmission completion even if the content length is not correct. In particular, a problem is seen when a non-binary document containing CR, LF sequences is fetched. The character count computed by the client is less than that stated by the server when the CR, LF characters are mapped to \n during utf-8 encoding. So the geturl operation does not complete. To avoid this situation, the -binary option should be specified, or version 1.0 of the HTTP protocol should be used.
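As a sketch of the advice above, the helper below fetches several URLs from the same server using version 1.1 and forces binary handling so that the received byte count matches the Content-Length stated by the server. The procedure name fetchSizes is hypothetical, and error handling is omitted for brevity.

```tcl
# Fetch each URL with HTTP 1.1 (connections are re-used) and -binary 1
# (no CR, LF translation), returning the list of received sizes.
proc fetchSizes {urls} {
    set sizes {}
    foreach url $urls {
        set token [::ehttp::geturl $url -version 1.1 -binary 1]
        lappend sizes [::ehttp::size $token]
        ::ehttp::cleanup $token
    }
    return $sizes
}
```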

::ehttp::formatQuery key value ?key value ...?

This procedure does x-url-encoding of query data. It takes an even number of arguments that are the keys and values of the query. It encodes the keys and values, and generates one string that has the proper & and = separators. The result is suitable for the -query value passed to ::ehttp::geturl.
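A POST built from ::ehttp::formatQuery could be wrapped as in the sketch below. The procedure name postForm is hypothetical, the {*} expansion syntax assumes Tcl 8.5 or later, and the sketch does not check the transaction status before returning the body.

```tcl
# Encode the key/value pairs, POST them to url, and return the reply body.
proc postForm {url args} {
    set query [::ehttp::formatQuery {*}$args]
    set token [::ehttp::geturl $url -query $query]
    set body [::ehttp::data $token]
    ::ehttp::cleanup $token
    return $body
}
```

For example, postForm $url name "Jane Doe" city Boston would send both fields in a single encoded query string.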

::ehttp::reset token ?why?

This command resets the HTTP transaction identified by token, if any. This sets the state(status) value to why, which defaults to reset, and then calls the registered -command callback.

::ehttp::wait token

This is a convenience procedure that blocks and waits for the transaction to complete. This only works in trusted code because it uses vwait. Also, it's not useful for the case where ::ehttp::geturl is called without the -command option because in this case the ::ehttp::geturl call doesn't return until the HTTP transaction is complete, and thus there's nothing to wait for.

::ehttp::data token

This is a convenience procedure that returns the body element (i.e., the URL data) of the state array.

::ehttp::error token

This is a convenience procedure that returns the error element of the state array.

::ehttp::status token

This is a convenience procedure that returns the status element of the state array.

::ehttp::code token

This is a convenience procedure that returns the http element of the state array.

::ehttp::ncode token

This is a convenience procedure that returns just the numeric return code (200, 404, etc.) from the http element of the state array.

::ehttp::size token

This is a convenience procedure that returns the currentsize element of the state array, which represents the number of bytes received from the URL in the ::ehttp::geturl call.

::ehttp::cleanup token

This procedure cleans up the state associated with the connection identified by token. After this call, procedures such as ::ehttp::data cannot be used to get information about the operation. It is strongly recommended that you call this procedure when you are done with a given HTTP request. Otherwise the memory used by the state array is never freed, and an application that calls ::ehttp::geturl many times will leak memory, which can degrade performance or eventually exhaust available memory.

::ehttp::register proto port command

This procedure allows one to provide custom HTTP transport types such as HTTPS, by registering a prefix, the default port, and the command to execute to create the Tcl channel. E.g.:

package require Humelib
package require tls

ehttp::register https 443 ::tls::socket

set token [ehttp::geturl https://my.secure.site/]

::ehttp::unregister proto

This procedure unregisters a protocol handler that was previously registered via ehttp::register.

ERRORS

The ehttp::geturl procedure will raise errors in the following cases: invalid command line options, an invalid URL, a URL on a non-existent host, or a URL at a bad port on an existing host. These errors mean that it cannot even start the network transaction. It will also raise an error if it gets an I/O error while writing out the HTTP request header. For synchronous ::ehttp::geturl calls (where -command is not specified), it will raise an error if it gets an I/O error while reading the HTTP reply headers or data. Because ::ehttp::geturl doesn't return a token in these cases, it does all the required cleanup and there's no issue of your app having to call ::ehttp::cleanup.

For asynchronous ::ehttp::geturl calls, all of the above error situations apply, except that if there is any error while reading the HTTP reply headers or data, no exception is thrown. This is because after writing the HTTP headers, ::ehttp::geturl returns, and the rest of the HTTP transaction occurs in the background. The command callback can check whether an error occurred during the read by calling ::ehttp::status and, if the status is error, calling ::ehttp::error to get the error message.

Alternatively, if the main program flow reaches a point where it needs to know the result of the asynchronous HTTP request, it can call ::ehttp::wait and then check status and error, just as the callback does.

In any case, you must still call ::ehttp::cleanup to delete the state array when you're done.

There are other possible results of the HTTP transaction determined by examining the status from ehttp::status. These are described below.

ok

If the HTTP transaction completes entirely, then status will be ok. However, you should still check the ::ehttp::code value to get the HTTP status. The ::ehttp::ncode procedure provides just the numeric code (e.g., 200, 404 or 500) while the ::ehttp::code procedure returns a value like "HTTP/1.0 404 File not found".

eof

If the server closes the socket without replying, then no error is raised, but the status of the transaction will be eof.

error

If an error occurs during the transaction, the status will be error. The error message is also stored in the error element of the state array, accessible via ::ehttp::error.

Another error possibility is that ::ehttp::geturl is unable to write all the post query data to the server before the server responds and closes the socket. The error message is saved in the posterror element of the state array and then ::ehttp::geturl attempts to complete the transaction. If it can read the server's response it will end up with an ok status, otherwise it will have an eof status.
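The checks described in this section can be combined into a single helper for synchronous calls, as in the sketch below. The procedure name fetchChecked, the shape of the returned list, and the 10 second timeout are assumptions made for illustration.

```tcl
# Fetch url synchronously and return a list whose first element is
# failed, ok, error, or another status such as eof, timeout, or reset.
proc fetchChecked {url} {
    # Errors raised before a token exists need no cleanup.
    if {[catch {::ehttp::geturl $url -timeout 10000} token]} {
        return [list failed $token]
    }
    set status [::ehttp::status $token]
    switch -- $status {
        ok {
            # The transaction completed; the HTTP code still matters.
            set result [list ok [::ehttp::ncode $token] [::ehttp::data $token]]
        }
        error {
            set result [list error [::ehttp::error $token]]
        }
        default {
            set result [list $status]
        }
    }
    ::ehttp::cleanup $token
    return $result
}
```

A caller would typically treat an ok result with a numeric code other than 200 as a server-side failure, and could inspect the Location meta-data for codes beginning with 3.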

STATE ARRAY

The ::ehttp::geturl procedure returns a token that can be used to get to the state of the HTTP transaction in the form of a Tcl array. Use this construct to create an easy-to-use array variable:

upvar #0 $token state

Once the data associated with the url is no longer needed, the state array should be unset to free up storage. The ehttp::cleanup procedure is provided for that purpose. The following elements of the array are supported:

body

The contents of the URL. This will be empty if the -channel option has been specified. This value is returned by the ::ehttp::data command.

charset

The value of the charset attribute from the Content-Type meta-data value. If none was specified, this defaults to the RFC standard iso8859-1, or the value of $::ehttp::defaultCharset. Incoming text data will be automatically converted from this charset to utf-8.

coding

A copy of the Content-Encoding meta-data value.

currentsize

The current number of bytes fetched from the URL. This value is returned by the ::ehttp::size command.

error

If defined, this is the error string seen when the HTTP transaction was aborted.

http

The HTTP status reply from the server. This value is returned by the ::ehttp::code command. The format of this value is:

HTTP/1.x code string

The code is a three-digit number defined in the HTTP standard. A code of 200 is OK. Codes beginning with 4 or 5 indicate errors. Codes beginning with 3 are redirection errors. In this case the Location meta-data specifies a new URL that contains the requested information.

meta

The HTTP protocol returns meta-data that describes the URL contents. The meta element of the state array is a list of the keys and values of the meta-data, in a format suitable for initializing an array that just contains the meta-data. Some of the meta-data keys are listed below; servers may return others:

Content-Type: The type of the URL contents. Examples include text/html, image/gif, application/postscript and application/x-tcl.
Content-Length: The advertised size of the contents. The actual size obtained by ::ehttp::geturl is available as state(currentsize).
Location: An alternate URL that contains the requested data.
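Because meta is a flat list of keys and values, it can be loaded into an array with array set or scanned with foreach. The sketch below shows both; the procedure names metaNames and metaValue are hypothetical, and the case-insensitive comparison is an assumption about how servers may spell header names.

```tcl
# Load the meta-data into a local array and return the sorted key names.
proc metaNames {token} {
    upvar #0 $token state
    array set meta $state(meta)
    return [lsort [array names meta]]
}

# Look up one meta-data value, ignoring case in the key name.
# Returns default when the key is absent.
proc metaValue {token key {default ""}} {
    upvar #0 $token state
    foreach {name value} $state(meta) {
        if {[string equal -nocase $name $key]} {
            return $value
        }
    }
    return $default
}
```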

posterror

The error, if any, that occurred while writing the post query data to the server.

status

Either ok for successful completion, reset for a user reset, timeout if a timeout occurred before the transaction could complete, eof if the server closed the socket without replying, or error for an error condition. During the transaction this value is the empty string.

totalsize

A copy of the Content-Length meta-data value.

type

A copy of the Content-Type meta-data value.

url

The requested URL.

EXAMPLE

# Copy a URL to a file and print meta-data
proc ::ehttp::copy { url file {chunk 4096} } {
    set out [open $file w]
    set token [geturl $url -channel $out -progress ::ehttp::Progress \
        -blocksize $chunk]
    close $out
    # This ends the line started by ehttp::Progress
    puts stderr ""
    upvar #0 $token state
    set max 0
    foreach {name value} $state(meta) {
        if {[string length $name] > $max} {
            set max [string length $name]
        }
        if {[regexp -nocase ^location$ $name]} {
            # Handle URL redirects; free this transaction's state first
            puts stderr "Location:$value"
            cleanup $token
            return [copy [string trim $value] $file $chunk]
        }
    }
    incr max
    foreach {name value} $state(meta) {
        puts [format "%-*s %s" $max $name: $value]
    }

    # The caller should pass the returned token to ::ehttp::cleanup
    return $token
}
# Progress callback: print one dot per block transferred
proc ::ehttp::Progress {args} {
    puts -nonewline stderr . ; flush stderr
}

KEYWORDS

HTTP, security policy, socket