The when and where of pulling data from the database.

If you work on an application (web, desktop, etc.), where in the program logic do you pull information about the current user? This is a question I have wrestled with for a few months now. I am still a little wishy-washy on the subject, but there are two trains of thought here: Option A and Option B. Option A is pulling the information from the database as soon as the application wakes up. Option B is pulling pieces of the current user as necessary.

Option A has the following benefits. All of the information can be pulled at once and in one location in the code. This means that if you check the data in the registry or the view, it is already set, because a mechanism in your plug-in or bootstrapping process has loaded it for you. However, if you change the user after this loading period, the loaded data is stale and you incur another load. If this is a website, you could quite possibly get away with displaying the stale data at first and then updating the presentation layer via Ajax.

Option B involves loading only what is necessary, where you need it (other than in the view, if you are in the MVC state of mind). For example, in one part of your controller you need the user’s user name and email address, then a few lines down you need their date of birth and the time their profile was last modified. Each query loads only what is necessary at that time. This presents a problem, though: you end up with small, fragmented queries, and more of them than with Option A. On the other hand, the data is as fresh as it can be, since you are pulling it as close to presentation as possible. I am on the fence about both methods, but I did some brainstorming on paper and created a few scribbles.

Both options are viable but it depends on the situation.
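
To make the two options concrete, here is a minimal PHP sketch. Everything in it is hypothetical: the UserStore class stands in for the database (it only counts "queries" against an in-memory array), and the field names are made up for illustration. Option A loads the whole row once and keeps reading from that copy; Option B asks for only the columns it needs, each time it needs them.

```php
<?php
// Hypothetical stand-in for a users table; counts how many "queries" are run.
class UserStore
{
    public $queries = 0;

    private $rows = array(
        42 => array(
            'username' => 'jdoe',
            'email'    => 'jdoe@example.com',
            'dob'      => '1980-01-01',
            'modified' => '2010-05-01 12:00:00',
        ),
    );

    // Option A: one query that returns every column for the user.
    public function fetchAll($id)
    {
        $this->queries++;
        return $this->rows[$id];
    }

    // Option B: one query per call, returning only the requested columns.
    public function fetchFields($id, array $fields)
    {
        $this->queries++;
        return array_intersect_key($this->rows[$id], array_flip($fields));
    }

    // Simulates the user being changed after data was loaded.
    public function update($id, $field, $value)
    {
        $this->rows[$id][$field] = $value;
    }
}

// Option A: load once at bootstrap, read from the loaded copy afterwards.
$storeA = new UserStore();
$user   = $storeA->fetchAll(42);
$storeA->update(42, 'email', 'new@example.com');
$staleEmail = $user['email'];   // the copy loaded at bootstrap is now stale

// Option B: fetch what is needed, where it is needed.
$storeB  = new UserStore();
$account = $storeB->fetchFields(42, array('username', 'email'));
$storeB->update(42, 'email', 'new@example.com');
$profile = $storeB->fetchFields(42, array('email', 'dob', 'modified'));
$freshEmail = $profile['email']; // fetched after the update, so it is current

printf("Option A: %d query(ies), email %s\n", $storeA->queries, $staleEmail);
printf("Option B: %d query(ies), email %s\n", $storeB->queries, $freshEmail);
```

The trade-off shows up in the counters: Option A runs a single query but hands back the old email address after the update, while Option B runs one query per lookup and sees the change.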

Pulling data diagram.

Posted in database, PHP

Securing Jetty and Solr with PHP authentication.

If you have ever set up Solr in a production environment, you have probably wondered how to secure it. Jetty is the container that ships by default with most Solr nightly builds, and it does just fine in a production environment with a few tweaks. The first part of this post introduces a simple setup for BASIC HTTP authentication with Jetty; the second part shows how to extend “Apache_Solr_HttpTransport_Abstract”, an abstract class from the Solr-Php-Client project found here: http://code.google.com/p/solr-php-client/.

Let’s get on with securing Jetty. First, let’s establish the installation path of Jetty as JETTY_HOME. For example, I have Solr installed at “/opt/solr/solr-production”; for all intents and purposes this is Jetty’s home, so when I refer to JETTY_HOME, I am referring to that directory. Within JETTY_HOME is a folder called “etc”, and inside this folder you will find jetty.xml and webdefault.xml. Crank open these two files in your favorite editor (I used vi). Inside webdefault.xml, right before the closing “</web-app>” tag, place the following:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>administrator</role-name>
    </auth-constraint>
  </security-constraint>
 
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>My Solr</realm-name>
  </login-config>

The previous configuration sets up a few things: the URL pattern (“/” means the entire application), the role (“administrator”), the authentication method (BASIC), and the realm under which we will be authenticating our users. Please visit http://docs.codehaus.org/display/JETTY/Realms for detailed information on securing Jetty.

Now we move on to jetty.xml inside the “JETTY_HOME/etc” directory. Search this file for “UserRealms”; chances are the configuration for authentication realms is already there and just commented out. If it is, uncomment it and make it match the following; if it is missing, add the following:

    <Set name="UserRealms">
      <Array type="org.mortbay.jetty.security.UserRealm">
        <Item>
          <New class="org.mortbay.jetty.security.HashUserRealm">
            <Set name="name">My Solr</Set>
            <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
            <Set name="refreshInterval">0</Set>
          </New>
        </Item>
      </Array>
    </Set>

This tells Jetty where to find the file of user names and passwords for the “My Solr” realm, namely “JETTY_HOME/etc/realm.properties”. The realm name here must match the <realm-name> set in webdefault.xml. Chances are realm.properties does not exist yet, so go ahead and create it (sudo touch realm.properties in the JETTY_HOME/etc folder). The file has a format that must be adhered to, but before we get to that, let’s generate a password for the “administrator” user. I like to execute the following:

$ echo -n "test" | openssl md5
098f6bcd4621d373cade4e832627b4f6

(BTW “test” is a poor password, please don’t copy this verbatim.)
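
If openssl is not handy, the same digest can be produced from PHP. The sketch below is only an illustration: jettyRealmEntry() is a made-up helper name, not part of Jetty or of any library. It formats one line in the realm.properties layout described next, with the “MD5:” prefix in front of the hex digest.

```php
<?php
// Hypothetical helper: formats one realm.properties line for Jetty's
// HashUserRealm, storing the password as an MD5 hex digest.
function jettyRealmEntry($username, $password, array $roles)
{
    $entry = $username . ': MD5:' . md5($password);

    // Roles are optional and follow the password as a comma-separated list.
    if (count($roles) > 0) {
        $entry .= ',' . implode(',', $roles);
    }

    return $entry;
}

$line = jettyRealmEntry('administrator', 'test', array('administrator'));
echo $line, "\n";
```

For the password “test” the digest is the same value that openssl printed above.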

So with that password we are ready to put an entry in the realm.properties file. As per the Jetty documentation, the format is “username: password[,rolename ...]”, where:

  • username is the user’s unique identity
  • password is the user’s (possibly obfuscated or MD5 encrypted) password
  • rolename is the user’s role

So for example:

administrator: MD5:098f6bcd4621d373cade4e832627b4f6,administrator

would work just fine. There you go, you have secured Jetty to an extent. A word of caution: chmod that file to 600 as soon as it is created; you will sleep better at night. Also chown it to the user that Jetty runs as (root:root in my case), because with mode 600 only the owner can read the file. Restart Solr and then visit your Solr instance in the browser; you will be prompted for a user name and password for the realm. Now we can move on to making sure our PHP requests carry the right authentication information.

The solr-php-client project is written in the Zend Framework coding style. That made sense to me, and it integrated very easily into my company’s ZF application. I am sure the process would be similar for other Solr PHP clients, but this post focuses purely on this project. If you use the Apache_Solr_Service class, you will notice that by default it uses a FileGetContents HTTP transport. This class is found in “/Apache/Solr/HttpTransport/FileGetContents.php” of the Solr client download. At first glance the class looks extendable, but since the author made the stream contexts private, you cannot access them in a child class. Instead, I decided to extend the HTTP transport abstract class found in “/Apache/Solr/HttpTransport/Abstract.php”. This allows me to pull the best parts of FileGetContents.php and merge them with the HTTP headers needed to authenticate. Here is the class in its entirety:

class Get2KnowMe_Search_FileGetContents extends Apache_Solr_HttpTransport_Abstract {
    private $authorization;
    private $username;
    private $password;
    private $_getContext;
    private $_postContext;
    private $_headContext;
 
    public function __construct($authorization = false, $username = null, $password = null) {
        $this->authorization = $authorization;
        $this->username = $username;
        $this->password = $password;
        $this->_getContext = stream_context_create();
        $this->_postContext = stream_context_create();
        $this->_headContext = stream_context_create();
    }
 
    public function performGetRequest($url, $timeout = false)
    {
        if($this->authorization) {
            stream_context_set_option($this->_getContext, 'http', 'header', 'Authorization: Basic ' . base64_encode($this->username.':'.$this->password));
        }
 
        // set the timeout if specified
        if ($timeout !== FALSE && $timeout > 0.0)
        {
                // timeouts with file_get_contents seem to need
                // to be halved to work as expected
                $timeout = (float) $timeout / 2;
 
                stream_context_set_option($this->_getContext, 'http', 'timeout', $timeout);
        }
        else
        {
                // use the default timeout pulled from default_socket_timeout otherwise
                stream_context_set_option($this->_getContext, 'http', 'timeout', $this->getDefaultTimeout());
        }
 
        // $http_response_header will be updated by the call to file_get_contents later
        // see http://us.php.net/manual/en/wrappers.http.php for documentation
        // Unfortunately, it will still create a notice in analyzers if we don't set it here
        $http_response_header = null;
        $responseBody = @file_get_contents($url, false, $this->_getContext);
 
        return $this->_getResponseFromParts($responseBody, $http_response_header);
    }
 
    public function performHeadRequest($url, $timeout = false)
    {
        if($this->authorization) {
            stream_context_set_option($this->_headContext, 'http', 'header', 'Authorization: Basic ' . base64_encode($this->username.':'.$this->password));
        }
 
        stream_context_set_option($this->_headContext, array(
                        'http' => array(
                                // set HTTP method
                                'method' => 'HEAD',
 
                                // default timeout
                                'timeout' => $this->getDefaultTimeout()
                        )
                )
        );
 
        // set the timeout if specified
        if ($timeout !== FALSE && $timeout > 0.0)
        {
                // timeouts with file_get_contents seem to need
                // to be halved to work as expected
                $timeout = (float) $timeout / 2;
 
                stream_context_set_option($this->_headContext, 'http', 'timeout', $timeout);
        }
 
        // $http_response_header will be updated by the call to file_get_contents later
        // see http://us.php.net/manual/en/wrappers.http.php for documentation
        // Unfortunately, it will still create a notice in analyzers if we don't set it here
        $http_response_header = null;
        $responseBody = @file_get_contents($url, false, $this->_headContext);
 
        return $this->_getResponseFromParts($responseBody, $http_response_header);
    }
 
    public function performPostRequest($url, $rawPost, $contentType, $timeout = false)
    {
        // Build every request header up front: stream_context_set_option()
        // replaces the 'header' option rather than appending to it, so the
        // Authorization header has to be sent together with the Content-Type
        $headers = "Content-Type: $contentType";
 
        if($this->authorization) {
            $headers .= "\r\nAuthorization: Basic " . base64_encode($this->username.':'.$this->password);
        }
 
        stream_context_set_option($this->_postContext, array(
                        'http' => array(
                                // set HTTP method
                                'method' => 'POST',
 
                                // Add our posted content type (and credentials, if any)
                                'header' => $headers,
 
                                // the posted content
                                'content' => $rawPost,
 
                                // default timeout
                                'timeout' => $this->getDefaultTimeout()
                        )
                )
        );
 
        // set the timeout if specified
        if ($timeout !== FALSE && $timeout > 0.0)
        {
                // timeouts with file_get_contents seem to need
                // to be halved to work as expected
                $timeout = (float) $timeout / 2;
 
                stream_context_set_option($this->_postContext, 'http', 'timeout', $timeout);
        }
 
        // $http_response_header will be updated by the call to file_get_contents later
        // see http://us.php.net/manual/en/wrappers.http.php for documentation
        // Unfortunately, it will still create a notice in analyzers if we don't set it here
        $http_response_header = null;
        $responseBody = @file_get_contents($url, false, $this->_postContext);
 
        // reset content of post context to reclaim memory
        stream_context_set_option($this->_postContext, 'http', 'content', '');
 
        return $this->_getResponseFromParts($responseBody, $http_response_header);
    }
 
    private function _getResponseFromParts($rawResponse, $httpHeaders)
    {
            //Assume 0, false as defaults
            $status = 0;
            $contentType = false;
 
            //iterate through headers for real status, type, and encoding
            if (is_array($httpHeaders) && count($httpHeaders) > 0)
            {
                    //look at the first headers for the HTTP status code
                    //and message (errors are usually returned this way)
                    //
                    //HTTP 100 Continue response can also be returned before
                    //the REAL status header, so we need look until we find
                    //the last header starting with HTTP
                    //
                    //the spec: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.1
                    //
                    //Thanks to Daniel Andersson for pointing out this oversight
                    while (isset($httpHeaders[0]) && substr($httpHeaders[0], 0, 4) == 'HTTP')
                    {
                            // we can do a intval on status line without the "HTTP/1.X " to get the code
                            $status = intval(substr($httpHeaders[0], 9));
 
                            // remove this from the headers so we can check for more
                            array_shift($httpHeaders);
                    }
 
                    //Look for the Content-Type response header and determine type
                    //and encoding from it (if possible - such as 'Content-Type: text/plain; charset=UTF-8')
                    foreach ($httpHeaders as $header)
                    {
                            // look for the header that starts appropriately
                            if (strncasecmp($header, 'Content-Type:', 13) == 0)
                            {
                                    $contentType = substr($header, 13);
                                    break;
                            }
                    }
            }
 
            return new Apache_Solr_HttpTransport_Response($status, $contentType, $rawResponse);
    }    
}
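
One detail worth checking in a transport like this: stream_context_set_option() replaces the value of an option; it does not append to it. If the Authorization header is set first and “header” is then set again with only a Content-Type, the credentials are dropped from the request. The sketch below is independent of Solr and makes no network calls; the credentials and content type in it are placeholders. It shows the overwrite, and one way around it, which is to send both headers in a single option value.

```php
<?php
// Placeholder credentials, for illustration only.
$auth = 'Authorization: Basic ' . base64_encode('administrator:test');

// Setting 'header' twice: the second call replaces the first.
$overwritten = stream_context_create();
stream_context_set_option($overwritten, 'http', 'header', $auth);
stream_context_set_option($overwritten, 'http', 'header', 'Content-Type: text/xml; charset=UTF-8');
$opts = stream_context_get_options($overwritten);
$authSurvivedOverwrite = strpos($opts['http']['header'], 'Authorization') !== false;

// Joining the headers with CRLF in one value keeps both of them.
$combined = stream_context_create();
stream_context_set_option(
    $combined,
    'http',
    'header',
    "Content-Type: text/xml; charset=UTF-8\r\n" . $auth
);
$opts = stream_context_get_options($combined);
$authSurvivedCombined = strpos($opts['http']['header'], 'Authorization') !== false;

var_dump($authSurvivedOverwrite);
var_dump($authSurvivedCombined);
```

The first var_dump() reports that the Authorization header did not survive, the second that it did. This matters most for POST requests, where a Content-Type header always has to be sent alongside the credentials.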

The user name and password passed to the constructor (SOLR_USERNAME and SOLR_PASSWORD in my application) are constants that are set from a configuration file before any calls are made. It was really easy to get the ball rolling, and Jetty is an excellent choice as a Solr container. If you have any questions, comments, concerns, or tips to make Solr even more secure, don’t hesitate to post a comment.

Posted in Java, Jetty, Linux, PHP, Security, Solr