Harp Manual, version 1.1


Harp Manual

This document is the official manual for harp, version 1.1. It covers how to use the program, how to use libharp and how to contribute to the project.

Copyright © 2014, Thomas Feron

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.



1 Overview of harp

Harp is an HTTP reverse proxy. It accepts HTTP connections and, based on the configuration, either proxies the requests to some upstream server or serves local files.

In addition, harp has two other important features. Firstly, a configuration can contain several possible “paths”, one of which is chosen based on a hash of the client's IP address. Secondly, some of those paths can be tagged in the configuration. The tags are sent to the upstream server as a comma-separated list in the X-Tags header. An example will explain it better:

     {
       hostnames "127.0.0.1" "localhost";
       ports 80;

       [
         3 {
           tag "feature1";
           server "127.0.0.1" 3000;
         }

         7 {
           server "127.0.0.1" 3001;
         }
       ]
     }

In this simple example, 30% of the traffic will be proxied to 127.0.0.1:3000 and the remaining 70% to 127.0.0.1:3001. The first server will receive “feature1” in the X-Tags header. In practice, one would use only one of these techniques, as combining them is redundant: either deploy different versions of the application on different servers/ports, or have one server serve different versions depending on the tags.
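The weighted choice described above can be sketched in C. This is only an illustration under stated assumptions: the manual does not say which hash harp uses, so `sketch_hash` (FNV-1a over the address string) and `sketch_choose` are hypothetical names and not part of harp or libharp.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for harp's hash: FNV-1a over the address string.
static uint32_t sketch_hash(const char *ip) {
  uint32_t h = 2166136261u;
  for (; *ip != '\0'; ip++) {
    h ^= (unsigned char)*ip;
    h *= 16777619u;
  }
  return h;
}

// Returns the index of the chosen path, or -1 if the total weight is zero.
// With weights {3, 7}, slots 0-2 map to index 0 and slots 3-9 to index 1.
int sketch_choose(const char *ip, const unsigned *weights, size_t count) {
  unsigned total = 0;
  for (size_t i = 0; i < count; i++) total += weights[i];
  if (total == 0) return -1;

  unsigned slot = sketch_hash(ip) % total;
  for (size_t i = 0; i < count; i++) {
    if (slot < weights[i]) return (int)i;
    slot -= weights[i];
  }
  return -1; // not reached: slot is always smaller than total
}
```

Because the choice depends only on the address and the weights, a given client keeps landing on the same path for as long as the configuration stays the same.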



2 Installation

This chapter explains how to compile and install harp and libharp on your system.

Once you have downloaded harp-1.1.tar.gz, it can be installed by running:

     $ cd /tmp
     $ tar xzf /path/to/harp-1.1.tar.gz
     $ cd harp-1.1
     $ ./configure
     $ make
     $ sudo make install

When compiling the source code from the repository, some files need to be downloaded or generated. Before running ./configure, you need to run `sh ./bootstrap'.



2.1 Disabling syslog

By passing --disable-syslog to ./configure, all the syslog-related code is removed by the preprocessor. The commands become:

     $ ./configure --disable-syslog
     $ make
     $ sudo make install

With this option, harp simply logs to the standard error instead.



2.2 Running the tests

Running the test suite before installing the program and the library can reveal problems early, so it is a good idea to do so. Just run make check.



3 Configuration



3.1 Configuration grammar

            <config> ::= '{' <config-items> '}'

      <config-items> ::= <filter>       <config-items>
                       | <tag>          <config-items>
                       | <resolver>     <config-items>
                       | <choice-group> <config-items>
                       | <config>       <config-items>
                       | <empty>

            <filter> ::= "hostnames" <hostnames> ';'
                       | "ports"     <ports>     ';'

               <tag> ::= "tag" <string> ';'

          <resolver> ::= "static-path" <string>          ';'
                       | "server"      <string> <number> ';'

         <hostnames> ::= <string> <hostnames>
                       | <empty>

             <ports> ::= <number> <ports>
                       | <empty>

      <choice-group> ::= '[' <choices> ']'

           <choices> ::= <choice> <choices>
                       | <empty>

            <choice> ::= <number> <config>

            <string> ::= '"' <characters> '"'



3.2 Directives



3.2.1 hostnames

     hostnames "hostname1" "hostname2" [...];

The directive hostnames filters the requests by the value of the Host HTTP header. A request matches if its hostname is one of those in the list.



3.2.2 ports

     ports 80 81 [...];

The directive ports filters the requests by port. The combination of all ports in the configurations (remember there can be several configurations in a configuration file) determines which ports harp listens on. A request matches a configuration if its port is listed in this filter.
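How the two filters combine can be sketched as follows. The structure and function names are hypothetical and do not come from libharp. Two behaviours are assumptions rather than documented facts: hostnames are compared case-insensitively, and a filter with no entries is treated as matching everything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

// Hypothetical flattened view of the filters of one configuration.
struct sketch_filters {
  const char *const *hostnames;
  size_t hostname_count;
  const int *ports;
  size_t port_count;
};

// A request matches when it passes both the hostnames and the ports filter.
bool sketch_matches(const struct sketch_filters *f, const char *host, int port) {
  // Assumption: an absent hostnames filter accepts any host.
  bool host_ok = f->hostname_count == 0;
  for (size_t i = 0; i < f->hostname_count && !host_ok; i++)
    host_ok = strcasecmp(f->hostnames[i], host) == 0;

  // Assumption: an absent ports filter accepts any port.
  bool port_ok = f->port_count == 0;
  for (size_t i = 0; i < f->port_count && !port_ok; i++)
    port_ok = f->ports[i] == port;

  return host_ok && port_ok;
}
```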



3.2.3 tag

     tag "example";

When proxying a request (see server), the tags are sent to the upstream server as a comma-separated list in the X-Tags HTTP header. In the case of nested configurations, the tags from the levels above are sent as well.
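As an illustration, the header line could be assembled as in the following sketch. The function name is hypothetical, and both the order of the tags (outer levels first) and the exact spacing after the colon are assumptions; only the comma-separated format comes from this manual.

```c
#include <stdio.h>

// Writes "X-Tags: a,b,c" into out.
// Returns 0 on success, -1 if the buffer is too small.
int sketch_format_x_tags(const char *const *tags, size_t count,
                         char *out, size_t size) {
  int n = snprintf(out, size, "X-Tags: ");
  if (n < 0 || (size_t)n >= size) return -1;
  size_t used = (size_t)n;

  for (size_t i = 0; i < count; i++) {
    // Every tag but the first is preceded by a comma.
    n = snprintf(out + used, size - used, "%s%s", i == 0 ? "" : ",", tags[i]);
    if (n < 0 || (size_t)n >= size - used) return -1;
    used += (size_t)n;
  }
  return 0;
}
```

For a nested configuration tagged “feature1” at the outer level and “variant1” at the inner one, this produces `X-Tags: feature1,variant1`.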



3.2.4 static-path

Warning: This feature is pretty basic and is NOT recommended for production websites.

     static-path "/path/to/dir";

This directive attempts to resolve requests by serving files from the file system. In the example above, a request to /example will be served with the content of /path/to/dir/example. If it is a directory, harp will look for /path/to/dir/example/index.html and /path/to/dir/example/index.htm. It will set the Content-Type header for only a few recognised file extensions.
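The lookup order can be sketched by listing the paths that would be probed. The function name and the fixed buffer size are hypothetical. The sketch only builds the candidate strings: it does not call stat(), and it assumes the request path starts with a slash and has already been sanitised.

```c
#include <stdio.h>

#define SKETCH_PATH_MAX 512

// Fills candidates[0..2] with the paths probed for a request, in order:
// the path itself, then index.html and index.htm if it is a directory.
// Returns 0 on success, -1 if a path does not fit in the buffer.
int sketch_static_candidates(const char *root, const char *request_path,
                             char candidates[3][SKETCH_PATH_MAX]) {
  static const char *const suffixes[3] = { "", "/index.html", "/index.htm" };

  for (int i = 0; i < 3; i++) {
    int n = snprintf(candidates[i], SKETCH_PATH_MAX, "%s%s%s",
                     root, request_path, suffixes[i]);
    if (n < 0 || n >= SKETCH_PATH_MAX) return -1;
  }
  return 0;
}
```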



3.2.5 server

     server "hostname" 3000;

The directive server proxies the request to an upstream server with possible tags (see tag). It takes the hostname or IP address of the server and the port on which to connect. Apart from the header X-Tags, the request is transmitted as it has been received, without any modification.



3.2.6 include

     include "path/to/file.conf";

This is not really a directive as it is not defined in the grammar and is not reflected in the data structure parsed by harp. It includes another configuration file in the current one and can be anywhere inside the file. Paths are relative to the file being parsed.



3.3 Full examples

This chapter shows different possible strategies to configure harp.



3.3.1 Static files

Harp can be used to serve static files even though the support for static files is pretty basic. It's probably a better idea not to use it for production environments. It is pretty straightforward:

     {
       hostnames "www.example.com";
       ports 80;

       static-path "/var/www/example";
     }



3.3.2 Different servers

One possibility is to deploy each version of the application on a different server and/or port. Harp can then be used to despatch between the versions.

In this example, two versions are deployed on the same host as harp, on ports 3000 and 3001 respectively. One user in five should be served the version on port 3001. Here is how to do it:

     {
       hostnames "www.example.com";
       ports 80;

       [
         4 {
           server "localhost" 3000;
         }

         1 {
           server "localhost" 3001;
         }
       ]
     }



3.3.3 Feature flagging

With one server for the application, it's possible to use feature flagging in the source code and let harp send the flags (called tags in harp).

To illustrate this, let's consider an application running on port 3000 on the same host. In order to test a new feature, one user in ten should be able to use it. The application shall enable the feature when receiving the tag feature1 in the X-Tags header.

     {
       hostnames "www.example.com";
       ports 80;

       server "localhost" 3000;

       [
         1 {
           tag "feature1";
         }

         9 { }
       ]
     }



3.3.4 Nested configurations

There is no limit to the level of nesting the configuration can reach through choice groups. A choice group is simply a weighted list of possible configurations to choose from based on the IP of the client. Here is an example of a perfectly valid configuration file.

     {
       hostnames "www.example.com";
       ports 80;

       server "localhost" 3000;

       [
         2 {
           tag "feature1";

           [
             1 { tag "variant1"; }
             1 { tag "variant2"; }
           ]
         }

         1 {
           tag "basic";
         }
       ]
     }

Moreover, a configuration can contain other ones (which can contain other ones, and so on). This is useful to avoid repetition in a configuration file. `{ directive1; { directive2; } { directive3; } }' is equivalent to having two configurations `{ directive1; directive2; }' and `{ directive1; directive3; }'. A more convoluted example is the following.

     {
       directive1;

       {
         directive2;
       }

       {
         directive3;

         {
           directive4;
         }

         {
           directive5;
         }
       }
     }

This configuration file would be equivalent to this one:

     {
       directive1;
       directive2;
     }

     {
       directive1;
       directive3;
       directive4;
     }

     {
       directive1;
       directive3;
       directive5;
     }



3.3.5 Several applications

Configurations and configuration files are not to be confused. A configuration file can contain several configurations, each following the grammar (see Configuration grammar). Harp will loop through the configurations until it finds one that matches. The different configurations just need to follow each other.



3.4 Option and signal

By default, harp will load the configuration from a path like /usr/local/etc/harp.conf (on OpenBSD) or something similar depending on your system. You can see it by running harp --help.

To specify a different file, run

     $ harp --config-path /another/path.conf

or

     $ harp -c /another/path.conf

When harp is already running, it reloads the configuration file when receiving the signal SIGHUP. Simply run

     $ pkill -SIGHUP harp



4 Options

This chapter covers the different command-line options which can be passed to harp.



4.1 Run as a daemon

To run harp as a daemon, you first need to detach it, i.e. make it create its own process group. This can be achieved with the option --background (or -b).

On most systems, to bind on ports under 1024, harp needs to start as the superuser. Of course, the server shouldn't keep running with those privileges. In order to avoid this, harp should be told to change user and group. The options --user (or -u) and --group (or -g) set, respectively, the user ID and the group ID of the process.

If the group is not specified but the user is, the process will change the group to the one having the same name as the user. Note that the specified user should be able to change to the given group.

Finally, the option --chroot is used to change the process's root directory (see chroot(2)). When it is used, remember that the path to the configuration file and the path of any `static-path' directive should be relative to the chroot directory. This applies to relative paths too, as the working directory is set to the root directory once chroot() has been called.
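For readers unfamiliar with privilege dropping, the following sketch shows one conventional order of operations using standard POSIX and BSD calls. It is not harp's actual code: the function name is hypothetical, and the order of the calls and the use of setgroups() are assumptions. Only the fallback to a group named after the user and the change of working directory after chroot() come from this manual.

```c
#define _DEFAULT_SOURCE
#include <grp.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Returns 0 on success, -1 on failure. group and root_dir may be NULL.
int sketch_drop_privileges(const char *user, const char *group,
                           const char *root_dir) {
  // Look the names up first: the user and group databases may be out of
  // reach once chroot() has been called.
  struct passwd *pw = getpwnam(user);
  if (pw == NULL) return -1;
  uid_t uid = pw->pw_uid;

  // No group given: use the group with the same name as the user.
  struct group *gr = getgrnam(group != NULL ? group : user);
  if (gr == NULL) return -1;
  gid_t gid = gr->gr_gid;

  if (root_dir != NULL) {
    // The working directory becomes the new root directory.
    if (chroot(root_dir) == -1 || chdir("/") == -1) return -1;
  }

  // Change the group first: once the user ID has been dropped, the
  // process is no longer allowed to change its group.
  if (setgroups(1, &gid) == -1 || setgid(gid) == -1) return -1;
  if (setuid(uid) == -1) return -1;
  return 0;
}
```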



4.2 Other options

Suffice it to say that the options --help (or -h) and --version (or -V) are self-explanatory.

So is the option --config-path (or -c), though some considerations apply; see Option and signal and --chroot.

To change the syslog facility, use --syslog-facility with the following possible values: `auth', `authpriv', `cron', `daemon', `ftp', `kern', `lpr', `mail', `news', `syslog', `user', `local0', `local1', `local2', `local3', `local4', `local5', `local6' and `local7'.

If not specified, harp uses the default facility, see syslog(3). Also, this option is not available if harp was compiled with --disable-syslog.

Harp starts workers as threads. A worker can handle one request at a time. Therefore, the number of threads sets the number of concurrent requests that the server can handle. The option --thread-number (or -n) sets this number.



5 Error handling

This chapter explains how harp reacts to errors that might happen while processing requests. The table below summarises the behaviour for every possible case.

In case there are several resolvers (e.g. static-path or server), the request will go through every one of them in order until one succeeds. If all of them fail, the last one decides how to handle the error. For instance, if the first resolver fails with an error code corresponding to a 404 and the second one fails with an error code corresponding to a 502, a 502 will be sent back to the client.
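The rule that the last resolver decides can be sketched as follows. The resolver type and all function names are hypothetical; harp's internal interface is not described in this manual. Resolvers are modelled as functions returning 0 on success or an HTTP status on failure.

```c
#include <stddef.h>

// Hypothetical resolver type: returns 0 on success, an HTTP status otherwise.
typedef int (*sketch_resolver_t)(void);

// Sample resolvers used to illustrate the rule.
int sketch_fail_404(void) { return 404; }
int sketch_fail_502(void) { return 502; }
int sketch_succeed(void)  { return 0; }

// Tries each resolver in order. Returns 0 on the first success, otherwise
// the status reported by the last resolver that was tried.
int sketch_resolve(const sketch_resolver_t *resolvers, size_t count) {
  int status = 500; // no resolver in the configuration
  for (size_t i = 0; i < count; i++) {
    status = resolvers[i]();
    if (status == 0) return 0;
  }
  return status;
}
```

With the resolvers `sketch_fail_404` then `sketch_fail_502`, the result is 502, which matches the example given in the paragraph above.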

Scenario                                             Behaviour              Log message?

No matching configuration                            400 Bad Request        No
No resolver in the configuration                     500 Internal Error     No

When resolving with static-path
The file does not exist                              404 Not Found          No
Error when calling stat() on the path                500 Internal Error     Yes
Error when sending data                              Close the connection   Yes

When resolving with server
Can't resolve the server (getaddrinfo())             502 Bad Gateway        Yes
Error when creating the socket                       500 Internal Error     Yes
Can't connect to the upstream server                 502 Bad Gateway        No
It takes too long to connect to the server           504 Gateway Timeout    No
Error when polling the connection (poll())           504 Bad Gateway (1)    Yes
Error when making the socket non-blocking (fcntl())  500 Internal Error     Yes
Can't parse request to add the tags                  Ignore, no tags added  No
Error when sending data to the client                Close the connection   No

(1) It probably shouldn't but the implementation is simpler this way.

Connections are kept open for at least 1 minute. Afterwards, the total average speed should stay above 2 kilobytes per second (2048 bytes per second) and the maximum idle time (any period of time where no data is transferred) is 2 minutes. If any of those conditions is not met, the connection is closed and the worker is ready to process new requests.
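These timing rules can be written as a single predicate. The function name and its parameters are hypothetical, and the handling of the exact boundary values (a speed of exactly 2048 bytes per second, an idle time of exactly 2 minutes) is an assumption.

```c
#include <stdbool.h>

// elapsed: seconds since the connection was accepted
// idle:    seconds since data was last transferred
// bytes:   total number of bytes transferred so far
bool sketch_should_close(double elapsed, double idle,
                         unsigned long long bytes) {
  // Connections are kept open for at least 1 minute.
  if (elapsed < 60.0) return false;

  // Afterwards, no idle period may exceed 2 minutes...
  if (idle > 120.0) return true;

  // ...and the total average speed must stay above 2048 bytes per second.
  return (double)bytes / elapsed <= 2048.0;
}
```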



6 Bugs and missing features

If you experience problems running harp or using libharp, please report them on the issue tracker (http://hub.darcs.net/thoferon/harp/issues). By reporting bugs, you contribute to the project and your help is much appreciated.

If you think a feature is missing, you can also use the issue tracker to record it or, if you can, develop it yourself (see Contributing).



7 Libharp

The library libharp allows you to read, create and write configurations for harp. This chapter explains the basics of developing with this library.

To use it, you should compile with -lharp (for GCC) and use #include <harp.h> wherever the code is calling libharp.

For more detailed information about any functions, see the man pages.



7.1 Errors

Libharp mimics the standard library when handling errors: functions that can fail return an error code (e.g. -1), a NULL pointer or something else depending on the function, and set harp_errno. The calling code can thereafter call harp_strerror(), passing it the error number to get an error message. harp_errno is either the value of errno or a negative error number specific to libharp.

Warning: This API is not thread-safe. A race condition between two threads calling functions of the library could lead to wrong error messages.

A typical call to the library will resemble the following:

     // ...
     int rc = harp_write_configs(configs, "/some/path.conf");
     if(rc == -1) {
       fprintf(stderr, "harp_write_configs: %s\n",
               harp_strerror(harp_errno));
       return -1;
     }
     // ...



7.2 Lists

The API makes extensive use of lists, which this section covers. Let's start with an example:

     struct thing element1, element2, element3;

     // ...

     harp_list_t *list1 = harp_cons(&element1, harp_singleton(&element2));
     harp_list_t *list2 = harp_singleton(&element3);

     harp_list_t *list = harp_concat(list1, list2);

     harp_list_t *current;
     HARP_LIST_FOR_EACH(current, list) {
       struct thing *element = (struct thing*)current->element;

       // do something

       if(current->next != HARP_EMPTY_LIST) {
         // do something except for the last element
       }
     }

     // ...

Here is how lists are represented in memory:

     #define HARP_EMPTY_LIST NULL

     typedef struct harp_list {
       void *element;
       struct harp_list *next;
     } harp_list_t;

A list is a data structure pointing to an element of any type (typically cast in the calling code) and to the next node, or HARP_EMPTY_LIST if there is none.

In order to build lists, you can use harp_singleton(void *) to create a one-element list, harp_cons(void *, harp_list_t *) to prepend a new element at the beginning, harp_concat(harp_list_t *, harp_list_t *) to concatenate two lists and harp_append(harp_list_t *, void *) to append a new element at the end. Note that those functions reuse the lists they are given. Therefore, freeing the memory of one list will free part or all of the others. To avoid this problem, harp_duplicate(harp_list_t *, harp_duplicate_function_t *) duplicates a list and, if a duplication function is passed, its elements. Also, harp_append() and harp_concat() modify the list they are given (the first one for harp_concat()). All these functions return a pointer to the resulting list.

harp_last(harp_list_t *) and harp_length(harp_list_t *) are query functions and return, respectively, a pointer to the last node and the number of elements as an int.

harp_free_list(harp_list_t *, harp_free_function_t *) recursively traverses the list and frees all the nodes. If a function pointer is passed, it is used to free the elements of the list as well.

harp_find_element(harp_list_t *, harp_predicate_function_t *) returns the node for which the element matches the predicate. The predicate function takes a pointer to an element and returns a bool. If none matches, it returns HARP_EMPTY_LIST.

Finally, HARP_LIST_FOR_EACH(varname, list) is a helper macro to loop over a list's elements. The example above shows how to use it.
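To make the sharing of nodes and the predicate search concrete, here is a self-contained sketch that mirrors the structure shown earlier in this section. It does not use libharp: every `sketch_` name is a hypothetical stand-in, and returning the first matching node is an assumption about how the search behaves.

```c
#include <stdbool.h>
#include <stdlib.h>

// Same layout as harp_list_t, redefined here to keep the sketch standalone.
typedef struct sketch_list {
  void *element;
  struct sketch_list *next;
} sketch_list_t;

typedef bool sketch_predicate_t(void *);

// Prepends an element. The new node points at the tail it was given, so
// both lists share those nodes, as noted above for the libharp functions.
sketch_list_t *sketch_cons(void *element, sketch_list_t *tail) {
  sketch_list_t *node = malloc(sizeof *node);
  if (node == NULL) return NULL;
  node->element = element;
  node->next = tail;
  return node;
}

// Returns the first node whose element satisfies the predicate, or NULL.
sketch_list_t *sketch_find(sketch_list_t *list, sketch_predicate_t *predicate) {
  for (sketch_list_t *cur = list; cur != NULL; cur = cur->next)
    if (predicate(cur->element)) return cur;
  return NULL;
}

// Example predicate for lists of int pointers.
bool sketch_is_negative(void *element) {
  return *(int *)element < 0;
}
```

Because the nodes are shared, freeing the longer list here would also free the nodes of the tail it was built from; this is the situation harp_duplicate() is meant to avoid.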



7.3 Build configurations

The API to build configurations is a bit larger than the one for lists and is more likely to change, so only the core principles are covered in this section. Please see the man pages for more details.

A series of functions is available to construct configurations for harp. It all starts with harp_make_empty_config() which, as the name says, returns a pointer to a new empty configuration (harp_config_t *). Similar functions create filters and resolvers; their names start with harp_make_. These functions allocate memory for the structure. It is the responsibility of the developer to free it.

Functions starting with harp_free_ recursively free the memory allocated by other functions. For instance, harp_free_config(harp_config_t *) deallocates the memory of the struct harp_config and recurses to free all filters, resolvers, etc. Typically, some code builds a configuration and frees it in one function call.

Once filters, resolvers or others have been created, they can be added to a configuration with the functions starting with harp_cons_. harp_cons_tag(char *, harp_config_t *) takes the tag directly and does not need any harp_make_ function. Also, remember that a choice group is simply a list of choices. Hence, adding a choice group will look like this:

     // ...
     harp_config_t *config = harp_make_empty_config();

     harp_config_t *subconfig1 = harp_make_empty_config();
     harp_config_t *subconfig2 = harp_make_empty_config();

     // We need strdup() because harp_free_config() will free the tags
     harp_cons_tag(strdup("v1"), subconfig1);
     harp_cons_tag(strdup("v2"), subconfig2);

     harp_choice_t *choice1 = harp_make_choice(3, subconfig1);
     harp_choice_t *choice2 = harp_make_choice(7, subconfig2);
     harp_list_t *choice_group =
       harp_cons(choice1, harp_singleton(choice2));

     harp_cons_choice_group(choice_group, config);
     // ...

Should you need to duplicate elements of a configuration (to free the memory of a structure while reusing some of its parts for example), the functions starting with harp_duplicate_ duplicate the structures recursively and return a pointer to the newly created one. They can be freed independently thereafter.

Finally, harp_get_ports(harp_list_t *) is also used by harp to listen on the relevant ports when starting or reloading the configuration (see Option and signal). It returns a list (harp_list_t *) of all the ports found anywhere in the configurations given as parameter.



7.4 Read and write

Reading and writing configurations from a file is easy with harp_read_configs() and harp_write_configs(). Remember that a configuration file can contain several configurations.

harp_read_configs(char *) takes the path to a configuration file and returns a pointer to the list of parsed configurations. If an error occurs, it returns a NULL pointer and sets harp_errno (see Errors).

harp_write_configs(harp_list_t *, char *) takes the list of configurations to write and the path of the file to write them to. It returns 0 if successful and -1 otherwise (and also sets harp_errno).



8 Contributing

If you want to contribute to the project, there are a few things you need to know. This chapter covers the architecture of the source code, how to compile it and some conventions used. It also explains where the code is versioned and how to send your patches for them to get into the repository.

Any contribution is really welcome and you should not hesitate to ask for help should you need some. Send your questions to tho.feron@gmail.com.



8.1 Generalities

Autotools

Harp uses the Autotools for its build system. Autoconf, Automake and Libtool need to be installed to compile from the repository. The first thing to do after getting the repository is to call `sh ./bootstrap' to download and generate some files in the working directory. This script downloads any dependencies (e.g. m4 macros) and runs `autoreconf --install'. This might need to be repeated after pulling patches from the repository.

Texinfo

Texinfo is another dependency that needs to be installed on the developer's machine. On OpenBSD, the package is called “texlive_base”; there should be a similar one for your operating system.

Ideally, the documentation should always be in sync with the code so you might have to check it before recording your changes. From the project's root directory, `make info' generates the documentation and `info doc/harp.info' allows you to browse it. If you prefer HTML, use `make html' and `open doc/harp.html'.

Check

Harp's test suite uses Check (see http://check.sourceforge.net/) for the unit tests and it needs to be installed on your computer. On OpenBSD, the package is named “check” and it should be similar on your operating system.

Execinfo

This dependency is optional. It is used by tests/memory/check_memory_usage to get the address of the functions leaking memory. If libexecinfo is not available, the tool is compiled without this code and won't record any address. You will just see `0x0' when running it.



8.2 Working with the repository

At the moment, the source code is hosted at http://hub.darcs.net/thoferon/harp. You first have to clone the repository into your local file system. This is quite easy:

     $ darcs get http://hub.darcs.net/thoferon/harp local/path --lazy

The --lazy option is optional but it might get the job done faster, especially if you have a slow connection. If you plan on working offline, it might be better to remove this option.

Once some changes are ready to be recorded, simply run darcs record and an interactive prompt will ask what changes to record. This creates a new patch in the local repository. The first time it's run in a repository, it will prompt for your e-mail address. It will be used later when sending your patches, so ensure it is correct.

Before sending your patches to the project, please remember to pull the latest patches and fix any potential conflicts. Also make sure the tests are still passing and that harp is compiling and working fine. Patches can be pulled with darcs pull.

In order to send a patch bundle, use darcs send. It will interactively ask which patches to send; answer with <y> or <n>. When asked for the email address, enter a maintainer's address (currently only me, at tho.feron@gmail.com). You will then have a chance to edit the contents of the email; please explain what the changes are and anything that needs to be known to review them.



8.3 Architecture

Here is a table explaining the purpose of every directory in the repository.

src/
The source code for the harp executable. It depends on code in common/ and libharp/.
include/
The header files to be installed on the system (for libharp).
libharp/
The source code for the library libharp, which allows you to read and write configuration files.
common/
Some code that is shared between the executable and libharp but is not public, e.g. smalloc() is not exposed by libharp but is used by both the executable and the library.
tests/unit/
The unit tests for the code in both src/ and libharp/.
tests/memory/
The source code for a simple tool that starts the server, sends some requests and tracks any memory leaks (with a lot of false positives unfortunately).
doc/
As the name says, the documentation you are currently looking at.
website/
The files for http://www.harphttp.org.



8.4 Compilation

As said previously, harp uses Autotools. The first step is therefore to run ./configure before being able to compile anything. Should you modify configure.ac or pull some patches that do, `autoreconf --install' needs to be run again, or even `sh ./bootstrap'. See Generalities.

The build system is then all managed with Makefiles. `make' builds the executable, the library as well as the documentation. `make check' builds and runs the unit tests but only builds tests/memory/check_memory_usage. `make distcheck' creates a distribution tarball, extracts it in a directory and attempts to build the project and run the tests from another.

The executable depends on two libraries: libharp and libcommon. The first is a library to read and write configuration files and is installed on the user's computer. The second is simply a convenience library and is not installed. Libharp also uses libcommon during the compilation.

A second convenience library (not installed on the user's system) is created when compiling the executable: libharpapp. Its purpose is to be linked with the test programs to make compilation faster.



8.5 Conventions

The code is indented with 2 spaces and is laid out so that lines are no longer than 80 characters. In my Emacs configuration, I use the following code.

     (require 'whitespace)
     (setq whitespace-style '(face empty tabs lines-tail trailing))
     (global-whitespace-mode t)

Whenever possible, the code is aligned on some characters. For instance, the assignments can be aligned like this:

     char *brief       = "hello";
     char *longer_name = "world";

The types are ended with “_t” whether it's a struct, an enum or any other typedef. For example, to define a structure, you will use:

     typedef struct my_struct {
       // ...
     } my_struct_t;

In the header files, the names of the variables are omitted and there is a space between the type and `*' for pointers. A signature looks like this:

     char *some_function(char *, int);

It is hard to describe everything you should know about the conventions used by the code. Have a look at the code to get a better idea and, when in doubt, search for a similar thing to what you want to do elsewhere in the code.



8.6 Getting credit

When working on something, it is normal to want to get some credit for it. Please edit the file AUTHORS at the root directory and append your name and your email address if you wish to be contactable.

Also, you should edit the file COPYING and add a copyright statement after the other ones. The possible formats are as follows for, respectively, work in 2014, work in 2014, 2015 and 2016, and work in 2014 and 2016 but not 2015.

     Copyright (c) 2014, Firstname Lastname
     Copyright (c) 2014-2016, Firstname Lastname
     Copyright (c) 2014, 2016, Firstname Lastname

