Securing Jetty and Solr with PHP authentication.

If you have ever set up Solr in a production environment, you have probably wondered how to secure it. Jetty is the container that ships by default with most Solr nightly builds, and it does just fine in production with a few tweaks. The first part of this post walks through a simple setup that enables BASIC HTTP authentication in Jetty. The second part shows how to extend “Apache_Solr_HttpTransport_Abstract”, an abstract class from the solr-php-client project found here: http://code.google.com/p/solr-php-client/, so that PHP requests carry the right credentials.

Let’s get on with securing Jetty. First, let’s establish the installation path of Jetty as JETTY_HOME. For example, I have a Solr installation at “/opt/solr/solr-production”; for all intents and purposes this is Jetty’s home, so whenever I refer to JETTY_HOME I mean that directory. Within JETTY_HOME is a folder called “etc”, and inside it you will find jetty.xml and webdefault.xml. Open these two files in your favorite editor (I used vi). Inside webdefault.xml, right before the closing “</web-app>” tag, place the following:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>administrator</role-name>
    </auth-constraint>
  </security-constraint>
 
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>My Solr</realm-name>
  </login-config>

This configuration sets up a few things: the URL pattern (“/” means the entire application), the role (“administrator”), the authentication method (BASIC), and the realm under which we will be authenticating our users. Please visit http://docs.codehaus.org/display/JETTY/Realms for detailed information on securing Jetty.

Now we move on to jetty.xml inside the “JETTY_HOME/etc” directory. Search this file for “UserRealms”. Chances are the configuration for authentication realms is already there and just commented out, in which case uncomment it and adjust it to match; if not, add the following:

    <Set name="UserRealms">
      <Array type="org.mortbay.jetty.security.UserRealm">
        <Item>
          <New class="org.mortbay.jetty.security.HashUserRealm">
            <Set name="name">My Solr</Set>
            <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
            <Set name="refreshInterval">0</Set>
          </New>
        </Item>
      </Array>
    </Set>

This tells Jetty where to find the file that holds the user names and passwords, which is “JETTY_HOME/etc/realm.properties”. Note that the realm name (“My Solr”) has to match the realm-name set in webdefault.xml. Chances are the properties file does not exist yet, and it is okay to create it (sudo touch realm.properties in the JETTY_HOME/etc folder). The file has a format that must be adhered to, but before we get to that, let’s generate a password hash for the “administrator” user. I like to execute the following:

$ echo -n "test" | openssl md5
098f6bcd4621d373cade4e832627b4f6

(BTW “test” is a poor password, please don’t copy this verbatim.)

So with that hash we are ready to put an entry in the realm.properties file. As per the Jetty documentation, the format is “username: password[,rolename ...]”, where:

  • username is the user’s unique identity
  • password is the user’s (possibly obfuscated or MD5 encrypted) password
  • rolename is the user’s role

So for example:

administrator: MD5:098f6bcd4621d373cade4e832627b4f6,administrator

would work just fine. There you go, you have secured Jetty to an extent. A word of caution: chmod that file to 600 as soon as it is created, you will sleep better at night. Also chown it to root:root, assuming Jetty runs as root; if it runs under another account, make that account the owner instead, otherwise Jetty will not be able to read the file. Restart your Solr package and then visit your Solr instance in the browser; you will be prompted for a user name and password for the realm. Now we can move on to making sure our PHP requests are sent with the right authentication information.
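If you would rather generate the realm.properties entry than type it by hand, a small shell sketch along these lines should do. The function names md5_hex and realm_entry are made up for this example, and the md5sum branch assumes GNU coreutils is installed; the openssl command from earlier in the post is used as the fallback.

```shell
#!/bin/sh
# Print the lowercase hex MD5 digest of the first argument.
# printf is used instead of `echo -n`, which is not portable across shells.
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    # md5sum prints "<digest>  -"; keep only the digest
    printf '%s' "$1" | md5sum | cut -d' ' -f1
  else
    # newer OpenSSL releases print "MD5(stdin)= <digest>"; strip that prefix
    printf '%s' "$1" | openssl md5 | sed 's/^.*= //'
  fi
}

# Print one realm.properties line: "username: MD5:<digest>,rolename"
realm_entry() {
  printf '%s: MD5:%s,%s\n' "$1" "$(md5_hex "$2")" "$3"
}

realm_entry administrator test administrator
# prints: administrator: MD5:098f6bcd4621d373cade4e832627b4f6,administrator
```

To write the file in one go, something like realm_entry administrator 'your password' administrator | sudo tee "$JETTY_HOME/etc/realm.properties" > /dev/null followed by sudo chmod 600 "$JETTY_HOME/etc/realm.properties" should work, assuming JETTY_HOME is exported in your shell.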

The solr-php-client project is written in the Zend Framework coding style. That makes sense, and it integrated very easily into my company’s ZF application. I am sure the process would be similar for other Solr PHP clients, but this post focuses purely on this project. If you use the Apache_Solr_Service class you will notice that by default it uses a FileGetContents HTTP transport. This class is found in “/Apache/Solr/HttpTransport/FileGetContents.php” of the solr client download. At first glance the class looks extendable, but since the author decided to make the stream contexts private you cannot access them in a child class. Instead I decided to extend the HTTP transport abstract class found in “/Apache/Solr/HttpTransport/Abstract.php”. This allows me to pull the best parts of FileGetContents.php and merge them with the HTTP headers needed to authenticate. Here is the class in its entirety:

class Get2KnowMe_Search_FileGetContents extends Apache_Solr_HttpTransport_Abstract {
    private $authorization;
    private $username;
    private $password;
    private $_getContext;
    private $_postContext;
    private $_headContext;
 
    public function __construct($authorization = false, $username = null, $password = null) {
        $this->authorization = $authorization;
        $this->username = $username;
        $this->password = $password;
        $this->_getContext = stream_context_create();
        $this->_postContext = stream_context_create();
        $this->_headContext = stream_context_create();
    }
 
    public function performGetRequest($url, $timeout = false)
    {
        if($this->authorization) {
            stream_context_set_option($this->_getContext, 'http', 'header', 'Authorization: Basic ' . base64_encode($this->username.':'.$this->password));
        }
 
        // set the timeout if specified
        if ($timeout !== FALSE && $timeout > 0.0)
        {
                // timeouts with file_get_contents seem to need
                // to be halved to work as expected
                $timeout = (float) $timeout / 2;
 
                stream_context_set_option($this->_getContext, 'http', 'timeout', $timeout);
        }
        else
        {
                // use the default timeout pulled from default_socket_timeout otherwise
                stream_context_set_option($this->_getContext, 'http', 'timeout', $this->getDefaultTimeout());
        }
 
        // $http_response_header will be updated by the call to file_get_contents later
        // see http://us.php.net/manual/en/wrappers.http.php for documentation
        // Unfortunately, it will still create a notice in analyzers if we don't set it here
        $http_response_header = null;
        $responseBody = @file_get_contents($url, false, $this->_getContext);
 
        return $this->_getResponseFromParts($responseBody, $http_response_header);
    }
 
    public function performHeadRequest($url, $timeout = false)
    {
        if($this->authorization) {
            stream_context_set_option($this->_headContext, 'http', 'header', 'Authorization: Basic ' . base64_encode($this->username.':'.$this->password));
        }
 
        stream_context_set_option($this->_headContext, array(
                        'http' => array(
                                // set HTTP method
                                'method' => 'HEAD',
 
                                // default timeout
                                'timeout' => $this->getDefaultTimeout()
                        )
                )
        );
 
        // set the timeout if specified
        if ($timeout !== FALSE && $timeout > 0.0)
        {
                // timeouts with file_get_contents seem to need
                // to be halved to work as expected
                $timeout = (float) $timeout / 2;
 
                stream_context_set_option($this->_headContext, 'http', 'timeout', $timeout);
        }
 
        // $http_response_header will be updated by the call to file_get_contents later
        // see http://us.php.net/manual/en/wrappers.http.php for documentation
        // Unfortunately, it will still create a notice in analyzers if we don't set it here
        $http_response_header = null;
        $responseBody = @file_get_contents($url, false, $this->_headContext);
 
        return $this->_getResponseFromParts($responseBody, $http_response_header);
    }
 
    public function performPostRequest($url, $rawPost, $contentType, $timeout = false)
    {
        // Build all request headers as one string. Setting the 'header' option
        // twice would let Content-Type overwrite the Authorization header.
        $headers = "Content-Type: $contentType";

        if($this->authorization) {
            $headers .= "\r\n" . 'Authorization: Basic ' . base64_encode($this->username.':'.$this->password);
        }

        stream_context_set_option($this->_postContext, array(
                        'http' => array(
                                // set HTTP method
                                'method' => 'POST',

                                // Add our posted content type (and credentials, if any)
                                'header' => $headers,
 
                                // the posted content
                                'content' => $rawPost,
 
                                // default timeout
                                'timeout' => $this->getDefaultTimeout()
                        )
                )
        );
 
        // set the timeout if specified
        if ($timeout !== FALSE && $timeout > 0.0)
        {
                // timeouts with file_get_contents seem to need
                // to be halved to work as expected
                $timeout = (float) $timeout / 2;
 
                stream_context_set_option($this->_postContext, 'http', 'timeout', $timeout);
        }
 
        // $http_response_header will be updated by the call to file_get_contents later
        // see http://us.php.net/manual/en/wrappers.http.php for documentation
        // Unfortunately, it will still create a notice in analyzers if we don't set it here
        $http_response_header = null;
        $responseBody = @file_get_contents($url, false, $this->_postContext);
 
        // reset content of post context to reclaim memory
        stream_context_set_option($this->_postContext, 'http', 'content', '');
 
        return $this->_getResponseFromParts($responseBody, $http_response_header);
    }
 
    private function _getResponseFromParts($rawResponse, $httpHeaders)
    {
            //Assume 0, false as defaults
            $status = 0;
            $contentType = false;
 
            //iterate through headers for real status, type, and encoding
            if (is_array($httpHeaders) && count($httpHeaders) > 0)
            {
                    //look at the first headers for the HTTP status code
                    //and message (errors are usually returned this way)
                    //
                    //HTTP 100 Continue response can also be returned before
                    //the REAL status header, so we need look until we find
                    //the last header starting with HTTP
                    //
                    //the spec: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.1
                    //
                    //Thanks to Daniel Andersson for pointing out this oversight
                    while (isset($httpHeaders[0]) && substr($httpHeaders[0], 0, 4) == 'HTTP')
                    {
                            // we can do a intval on status line without the "HTTP/1.X " to get the code
                            $status = intval(substr($httpHeaders[0], 9));
 
                            // remove this from the headers so we can check for more
                            array_shift($httpHeaders);
                    }
 
                    //Look for the Content-Type response header and determine type
                    //and encoding from it (if possible - such as 'Content-Type: text/plain; charset=UTF-8')
                    foreach ($httpHeaders as $header)
                    {
                            // look for the header that starts appropriately
                            if (strncasecmp($header, 'Content-Type:', 13) == 0)
                            {
                                    $contentType = substr($header, 13);
                                    break;
                            }
                    }
            }
 
            return new Apache_Solr_HttpTransport_Response($status, $contentType, $rawResponse);
    }    
}
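To see what the class actually sends, and to check the Jetty setup without PHP, here is a rough shell sketch. It builds the same header value as the base64_encode() call above and hands it to curl. The function names, the SOLR_URL variable and the default URL (Solr's example Jetty on localhost:8983) are assumptions for this example, so adjust them to your install.

```shell
#!/bin/sh
# Build the header the PHP class sends:
# "Authorization: Basic " + base64("username:password")
basic_auth_header() {
  # tr removes the line breaks some base64 tools insert into long output
  printf 'Authorization: Basic %s' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Print the HTTP status code Solr answers with for the given credentials.
# SOLR_URL and its default value are assumptions; change them as needed.
solr_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "$(basic_auth_header "$1" "$2")" \
    "${SOLR_URL:-http://localhost:8983/solr/admin/ping}"
}

basic_auth_header administrator test
echo
# prints: Authorization: Basic YWRtaW5pc3RyYXRvcjp0ZXN0
```

With the realm in place, solr_status administrator test should print 200 and a wrong password should print 401. Keep in mind that the header is only base64 encoded, not encrypted, so anyone who can read the traffic can recover the password.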

When I create the transport I pass SOLR_USERNAME and SOLR_PASSWORD as the user name and password; both are set in a configuration file that is loaded before any calls are made. It was really easy to get the ball rolling, and Jetty is an excellent choice as a Solr container. If you have any questions, comments, concerns or tips to make Solr even more secure, don’t hesitate to post a comment.

Got bored, wrote a C program.

I got bored so I did an exercise from The C Programming Language, exercise 2-3 to be exact:

 
#include <stdio.h>
int htoi(char s[]);
 
int main() {
  char hexNum[] = "0x43fa";
  char hexNum2[] = "0xfa";
  char hexNum3[] = "0xa";
  char hexNum4[] = "0x0";
  printf("The value %s converted to %d\n", hexNum, htoi(hexNum));
  printf("The value %s converted to %d\n", hexNum2, htoi(hexNum2));
  printf("The value %s converted to %d\n", hexNum3, htoi(hexNum3));
  printf("The value %s converted to %d\n", hexNum4, htoi(hexNum4));
  return 0;
}
 
int htoi(char s[]) {
  int i = 0, n = 0, t = 0;

  /* skip the optional "0x" or "0X" prefix */
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
    i = 2;
  }

  for (; s[i] != '\0'; i++) {
    if (s[i] >= '0' && s[i] <= '9') {
      t = s[i] - '0';
    } else if (s[i] >= 'a' && s[i] <= 'f') {
      t = 10 + (s[i] - 'a');
    } else if (s[i] >= 'A' && s[i] <= 'F') {
      t = 10 + (s[i] - 'A');
    } else {
      break; /* stop at the first character that is not a hex digit */
    }
    n = (16 * n) + t;
  }

  return n;
}
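
The numbers the program should print can be cross-checked from the shell, since printf understands the same 0x notation. A quick sketch (the function name hex_to_dec is made up for this example):

```shell
#!/bin/sh
# Convert a hex literal such as 0x43fa to decimal using the shell's printf.
hex_to_dec() {
  printf '%d\n' "$1"
}

for h in 0x43fa 0xfa 0xa 0x0; do
  printf 'The value %s converted to %d\n' "$h" "$(hex_to_dec "$h")"
done
# prints:
# The value 0x43fa converted to 17402
# The value 0xfa converted to 250
# The value 0xa converted to 10
# The value 0x0 converted to 0
```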