Tuesday, March 20, 2012

Adventures with Solr Join

Solr. Brilliant when you can denormalise all your data into a document by document structure. Not so brilliant when you can't and you need to "join" searches across multiple and independent indices.

For example. Say you have an index of Parents and another index of Children and you want to get a list of Parents who have a Child called Tom.

The normal Solr response to this is to add the Children's names as a multivalued field in the Parent document.

However, what if you can't? There are 4 million Parents in the database with hundreds of fields, and loading the index that way would mean making 4 million sub-query calls to get the Children. And the Parents don't really change much, but the world's sperm banks are issuing and correcting thousands of parental links a day, which would mean updating thousands of Parent documents (I should mention that this whole example is entirely hypothetical!).

In this case you really need to maintain two indexes. An index of Parents and an index of Children (that contains a mother and father id field).

Both can be populated by a single query to the backing database.

The current trunk copy of Solr 4.0 has a new join feature that promises this functionality. However there are a few things to bear in mind -


  • You can only perform a join on one index with itself or across two cores in the same servlet container. No joining across multiple servlets (as far as I can tell).

  • You can join based on a criterion on the Child but you'll only get fields back from the Parent - nothing from the Child is returned.

  • There's not much documentation out there.


Some things I had to do to get my example to work -



If you want to join by filtering on a Child field that doesn't exist in the Parent, for now at least, you have to add the same field as a dummy field in the Parent schema.

If you want to perform a query on the Parents at the same time as joining with Children you need to write the join part as a nested query.


http://localhost:8983/solr/parents/select?q=alive:yes AND _query_:"{!join fromIndex=children from=fatherid to=parentid v='childname:Tom'}"


In English: Return all fathers that are alive and have a child called Tom.

Note: The field childname had to be added to the Parents schema.
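
As a rough sketch of how the request above might be assembled from code, the following Java helper builds the same nested join query and URL-encodes it. The class and method names (SolrJoinUrl, build) are made up for illustration; the core and field names are the ones from the example. It only builds a string, it does not talk to Solr, and it assumes the child query contains no single quotes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: builds the join request from the post with the q parameter URL-encoded. */
final class SolrJoinUrl {

    /**
     * baseUrl is the select handler of the core you want documents back from.
     * The join is written as a nested query so it can be ANDed with parentQuery.
     * Assumption: childQuery contains no single quotes (they would end the v='...' value).
     */
    static String build(String baseUrl, String parentQuery, String fromIndex,
                        String from, String to, String childQuery) {
        String join = "{!join fromIndex=" + fromIndex
                + " from=" + from
                + " to=" + to
                + " v='" + childQuery + "'}";
        String q = parentQuery + " AND _query_:\"" + join + "\"";
        return baseUrl + "?q=" + encode(q);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/parents/select",
                "alive:yes", "children", "fatherid", "parentid", "childname:Tom"));
    }
}
```

Encoding the whole q value matters here because the raw query contains spaces, quotes and braces that are not safe to paste into a URL as-is.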

Monday, May 02, 2011

Ubuntu Natty Narwhal (11.04) ssh connection reset

Just upgraded a development machine from Ubuntu 10.10 to 11.04 and all my ssh outbound connections stopped working with a connection reset error.
Took a while to find the solution with Google but this chap's posting helped -
Natty Narwhal: Problems connecting to servers behind (Cisco) firewalls using ssh
The top of my .ssh/config now has these two lines, which makes them apply to all connections -
Ciphers aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc
MACs hmac-md5,hmac-sha1,hmac-ripemd160

Tuesday, February 15, 2011

Ajax-solr, struts2 and a backend solr server

I'd been using solr for a while when I stumbled across the ajax-solr project. It's really quite neat, well thought out and is relatively easy to customise. Within a few hours I had it integrated into our webapp and running queries with beautifully paginated & highlighted results.
All well and good, until I started to think about how to roll it out to the production environment. The ajax-solr library is fast and works well as it communicates directly with the solr server... but in our case this is several tiers down and not accessible to the outside world. Solr itself has no security built in and the thought of trying to convince the network security guys to open up a route through to the solr server was not even worth entertaining.
There are a few postings on how to write a proxy servlet to pass requests through but they all looked too complicated and exposed too many risks. I was looking at having to throw away the investment in ajax-solr and roll my own server based solution.
However, for this webapp we use Struts2 to handle some of the requests and handle ajax requests through the actions. Why not set the ajax-solr url to point to a Struts2 url, within the action make the request to the backend solr server and then stream the results back ? Turned out to be simpler to do than write the previous sentence.
First I had to change the way the AjaxSolr.Manager crafts the url as it wasn't going to work as it was -
Changed a line in Manager.jquery.js from
jQuery.getJSON(this.solrUrl + servlet + '?' + this.store.string() + '&wt=json&json.wrf=?', {}, function (data) { self.handleResponse(data); });
to
jQuery.getJSON(this.solrUrl + '?' + this.store.string() + '&wt=json&jsonwrf=?', {}, function (data) { self.handleResponse(data); });
to rename the parameter "json.wrf" to "jsonwrf" (it's not easy to handle a parameter name with a dot in Struts2... or at least I didn't even try) and remove the "servlet" part as I needed to construct a url in the form "myaction!solr.action?q...."

Then add to my action a solr method a bit like this -

 
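    // Populated by Struts2 from the request parameters (setters not shown).
    private String q, jsonwrf;
    private Integer start;
    // Built in solr() and read later by getInputStream().
    private String solrrequest;

    public String solr() throws Exception {
        logger.debug("Original request - " + getRequest().getQueryString());

        // Rebuild the query from scratch so only the parameters below reach Solr.
        StringBuilder query = new StringBuilder();
        query.append("q=");
        query.append(URLEncoder.encode(q, "UTF-8"));
        if (start != null) {
            query.append("&start=");
            query.append(start);
        }
        query.append("&hl=on");
        query.append("&hl.fl=*");
        query.append("&wt=json");
        if (jsonwrf != null) {
            // Restore the dotted name Solr expects; encode the value so it cannot add extra parameters.
            query.append("&json.wrf=");
            query.append(URLEncoder.encode(jsonwrf, "UTF-8"));
        }

        solrrequest = "http://mysolrserver/solr/select?" + query.toString();

        logger.info("Built Http Get request to solr: " + solrrequest);

        // "jsondata" maps to the stream result in struts.xml.
        return "jsondata";
    }

    // Called by the Struts2 stream result; runs the GET (Commons HttpClient 3.x) and hands back the body.
    public InputStream getInputStream() throws Exception {
        HttpClient httpClient = new HttpClient();

        GetMethod method = new GetMethod(solrrequest);

        int statusCode = httpClient.executeMethod(method);

        logger.info("Http Get request returned response: '" + HttpStatus.getStatusText(statusCode) + "'");

        return method.getResponseBodyAsStream();

    }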


then in my struts.xml add a mapping a bit like this which makes the output a stream of json data from the getInputStream() method above.

 
        <action name="myaction" class="com.me.MyAction">  
            <result name="jsondata" type="stream">
                <param name="contentType">application/json</param>
                <param name="inputName">inputStream</param>
            </result>            
        </action>                 


and finally the setup of the ajax-solr looks a bit like this -

 
  $(function () {
   
    Manager = new AjaxSolr.Manager({
     solrUrl: 'myaction!solr.action'
    });
    
    Manager.addWidget(new AjaxSolr.ResultWidget({
        id: 'result',
        target: '#results'
      }));

    .
    . 
    .



So we have ajax-solr talking through a Struts2 action to the backend solr server. The advantage being that the access to the solr server is constrained by what you put in the action and there is only a moderate degradation in performance.
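
To make the "constrained by what you put in the action" point a little more concrete, here is a sketch of the query-string assembly pulled out into a standalone helper, so it can be exercised without Struts2 or a solr server. The class name SolrQueryString and the identifier check on the callback are my own assumptions for illustration, not part of ajax-solr or Struts2; the check assumes jQuery sends a plain identifier as the JSONP callback name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: the query string forwarded to solr, built only from known parameters. */
final class SolrQueryString {

    /**
     * Only q, start and the JSONP callback come from the browser;
     * highlighting and the response format are fixed here.
     */
    static String build(String q, Integer start, String jsonwrf) {
        StringBuilder query = new StringBuilder();
        query.append("q=").append(encode(q));
        if (start != null) {
            query.append("&start=").append(start);
        }
        query.append("&hl=on&hl.fl=*&wt=json");
        // Assumption: a valid callback is a plain identifier; anything else is dropped
        // rather than passed through to solr.
        if (jsonwrf != null && jsonwrf.matches("[A-Za-z_$][A-Za-z0-9_$.]*")) {
            query.append("&json.wrf=").append(jsonwrf);
        }
        return query.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the browser never supplies the parameter names, a request cannot switch on extra solr features (a different handler, a huge rows value and so on) unless the action explicitly adds them.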

Hope that helps someone one day solve a similar problem.

Tuesday, May 18, 2010

Unable to locate Spring NamespaceHandler for XML schema namespace [http://www.springframework.org/schema/batch]

Writing a Spring Batch project (Spring 2.5.6.SEC01, Spring Batch 2.0.0.RELEASE) and started to make use of the batch namespace to define the steps in the batch job. Everything worked fine testing in Eclipse.

Built the project using maven into an rpm and deployed to a test Linux box as normal and as per all the other batch jobs for this project. The only change to the job had been to switch some elements of the context to the batch namespace.

Tried to run the job and got the following -

Unable to locate Spring NamespaceHandler for XML schema namespace [http://www.springframework.org/schema/batch]

After much Googling I found this JIRA that explains a problem with the core Spring project overwriting and ignoring the schemas from the non core Spring projects.

Bob Miers added some help on how he fixed it. My solution is similar.

Create your own spring.handlers and spring.schemas files in your project's src/main/resources/META-INF directory, copy into them the contents of the corresponding files from both the Spring core jar and the Spring Batch jar, and bingo - problem solved (for now).
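
The merge itself is just a union of properties files. As a small illustration (not the way I actually did it, which was copy and paste), this Java sketch loads the text of several spring.handlers files into one Properties object. The class name HandlerFileMerger is hypothetical, and the two sample entries in main are only examples of what such files contain; check the real files in your jars for the exact entries.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: combine the contents of several spring.handlers (or spring.schemas) files. */
final class HandlerFileMerger {

    /** Loads each file's text in order; on duplicate keys the later file wins. */
    static Properties merge(String... fileContents) {
        Properties merged = new Properties();
        for (String content : fileContents) {
            try {
                merged.load(new StringReader(content));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample entries only; the colon in the key is escaped, as in the real files.
        String core = "http\\://www.springframework.org/schema/util="
                + "org.springframework.beans.factory.xml.UtilNamespaceHandler\n";
        String batch = "http\\://www.springframework.org/schema/batch="
                + "org.springframework.batch.core.configuration.xml.CoreNamespaceHandler\n";
        merge(core, batch).list(System.out);
    }
}
```

If the batch namespace is missing from the merged result, that is the same situation that produces the "Unable to locate Spring NamespaceHandler" error at runtime.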

Probably because I'm building an rpm I didn't have to use Bob's solution of using the Maven shade plugin.

Tuesday, May 04, 2010

Running a Maven junit test in JMeter for performance testing

We needed to load test a remote service. I already had a JUnit test written to test the service so the logical thing was to see if we could get JMeter to use it.

This pdf describes how to set it up, the problem being getting the right jars from Maven.

First you need to get Maven to produce a jar file of all your unit tests -

 
<plugin>
    <artifactId>maven-jar-plugin</artifactId>
    <executions>
        <execution>
            <id>test-jar</id>
            <phase>package</phase>
            <goals>
                <goal>test-jar</goal>
            </goals>
        </execution>
    </executions>
</plugin>


This will create a jar in your target directory with the JUnit tests in your project.

Next you need to export the dependency jars. Add this to your pom.xml -

 
<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptors>
            <descriptor>assembly.xml</descriptor>
        </descriptors>
    </configuration>
</plugin>



and create an assembly.xml file that looks a bit like this -

 
<?xml version="1.0" encoding="UTF-8"?>
<assembly>
    <id>dependencies</id>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <outputDirectory></outputDirectory>
            <unpack>false</unpack>
            <scope>test</scope>
        </dependencySet>
    </dependencySets>
</assembly>


Note how the scope is set to test - this makes sure all the dependencies scoped as test in your pom are included as well.

Now run the assembly:directory goal from the command line to generate all the dependent jars.

Now all you have to do is copy all these jars into your jmeter/lib/junit directory as the pdf describes and you're away.... just make sure you follow it to the letter.

Friday, March 19, 2010

Loading properties into Spring context files during Spring JUnit tests

Spring has some rather handy JUnit test wrapper classes like AbstractSingleSpringContextTests.

Normally that is all great. All you need to do is override the getConfigLocations method with your context filename(s) and it will load the context(s) before running the tests. All your beans are now available to call in your unit test.

However, on a project I've been working on recently we place a lot of properties in our contexts that get overwritten at runtime depending on whether we're running in development, pre-production or production (database connections etc). Loading these up as they are in a Spring JUnit test causes all sorts of problems!

What we've done is create another small context file that we've called applicationContext-test.xml that looks like this -

  
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
                           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
                           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.5.xsd
                           http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.5.xsd">

    <!-- Ensures that all the properties defined in development.properties are loaded -->
    <context:property-placeholder location="file:config/development.properties"/>

</beans>



Then just add this to the start of the list of context files we list in getConfigLocations() and all your properties get loaded and used !

  
@Override
protected String[] getConfigLocations() {
    /*
     * By including the applicationContext-test.xml it populates the parameters from the
     * dev config file (see the applicationContext-test.xml file)
     */
    return new String[] {
        "applicationContext-test.xml", "applicationContext-mq.xml", "applicationContext-db.xml"
    };
}
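
For anyone wondering what the property-placeholder in applicationContext-test.xml is doing for the other contexts, here is a rough plain-Java illustration of the idea: every ${key} in a value is swapped for the matching entry from a properties file. This is only a sketch of the concept and not Spring's implementation; the class name PlaceholderSketch and the property name db.url are made up for the example.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: replace ${key} placeholders with values from a Properties object. */
final class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String text, Properties props) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = props.getProperty(key);
            if (value == null) {
                // Fail loudly on an unknown key instead of leaving ${...} in place.
                throw new IllegalArgumentException("Could not resolve placeholder '" + key + "'");
            }
            // quoteReplacement stops '$' and '\' in the value being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties dev = new Properties();
        dev.setProperty("db.url", "jdbc:hsqldb:mem:dev"); // hypothetical property
        System.out.println(resolve("url=${db.url}", dev));
    }
}
```

Without the test context there is nothing to supply those values, which is why the other context files fall over when loaded on their own in a unit test.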

MyEclipse debugging not stopping at break points

For a long time I've had a hit and miss relationship with debugging in MyEclipse. I'd set a breakpoint, cross my fingers and hope it would stop at that point. About 80% of the time it would. The other 20% was a real pain in the butt.

Finally decided to sort it out once and for all and it turns out to be a bug in the jdk 1.6.0_14 I'm using.

All you need to do to fix it is add -XX:+UseParallelGC to your jvm arguments.

Window -> Preferences -> Java -> Installed JREs -> (edit your currently used JRE) -> Edit the "Default VM Arguments" box and enter (without the quotation marks) "-XX:+UseParallelGC"
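
If you want to confirm the argument is really being passed to the JVM your code runs in, a small check like the one below can be run from the IDE. It uses the standard RuntimeMXBean to list the JVM's input arguments; the class name JvmArgCheck is just something I made up for the sketch.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: report whether a given flag was passed to the running JVM. */
final class JvmArgCheck {

    static boolean hasArgument(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // The arguments the JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-XX:+UseParallelGC present: "
                + hasArgument(jvmArgs, "-XX:+UseParallelGC"));
    }
}
```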