org.apache.solr.update.processor
Class URLClassifyProcessor

java.lang.Object
  extended by org.apache.solr.update.processor.UpdateRequestProcessor
      extended by org.apache.solr.update.processor.URLClassifyProcessor

public class URLClassifyProcessor
extends UpdateRequestProcessor

Update processor that examines a URL and writes characteristics of that URL to other fields: its length, the number of path levels, whether it is a top-level URL (levels == 0), whether it looks like a landing/index page, a canonical representation of the URL (for example, with index.html stripped), and the domain and path parts of the URL.

This processor is intended for use when processing web resources. The values it produces can be used later for boosting or filtering.


Field Summary
 
Fields inherited from class org.apache.solr.update.processor.UpdateRequestProcessor
next
 
Constructor Summary
URLClassifyProcessor(SolrParams parameters, SolrQueryRequest request, SolrQueryResponse response, UpdateRequestProcessor nextProcessor)
           
 
Method Summary
 URL getCanonicalUrl(URL url)
          Gets a canonical form of the URL for use as the main URL
 URL getNormalizedURL(String url)
           
 boolean isEnabled()
           
 boolean isLandingPage(URL url)
          Calculates whether the URL is a landing page or not
 boolean isTopLevelPage(URL url)
          Calculates whether a URL is a top-level page
 int length(URL url)
          Calculates the length of the URL in characters
 int levels(URL url)
          Calculates the number of path levels in the given URL
 void processAdd(AddUpdateCommand command)
           
 void setEnabled(boolean enabled)
           
 
Methods inherited from class org.apache.solr.update.processor.UpdateRequestProcessor
finish, processCommit, processDelete, processMergeIndexes, processRollback
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLClassifyProcessor

public URLClassifyProcessor(SolrParams parameters,
                            SolrQueryRequest request,
                            SolrQueryResponse response,
                            UpdateRequestProcessor nextProcessor)
Method Detail

processAdd

public void processAdd(AddUpdateCommand command)
                throws IOException
Overrides:
processAdd in class UpdateRequestProcessor
Throws:
IOException

getCanonicalUrl

public URL getCanonicalUrl(URL url)
Gets a canonical form of the URL for use as the main URL

Parameters:
url - The input URL
Returns:
The URL object representing the canonical URL
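
The class description gives stripping index.html as one example of canonicalization. The following is a minimal sketch of that single idea, not the Solr implementation: the class name CanonicalUrlSketch, the helper parse, the list of index file names, and the choice to leave URLs with a query string untouched are all assumptions made for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Illustrative only: a hypothetical helper, not the Solr implementation.
class CanonicalUrlSketch {

    // Assumed index file names; this page does not list the names the processor recognizes.
    private static final String[] INDEX_NAMES = {"index.html", "index.htm"};

    // Parses an absolute URL string, converting the checked exception to an unchecked one.
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the URL with a trailing index file name removed, or the input unchanged.
    static URL canonical(URL url) {
        if (url.getQuery() != null) {
            return url; // assumption: URLs carrying a query string are left as they are
        }
        String path = url.getPath();
        for (String name : INDEX_NAMES) {
            if (path.endsWith("/" + name)) {
                // Keep the trailing slash, drop only the file name.
                String trimmed = path.substring(0, path.length() - name.length());
                try {
                    return new URI(url.getProtocol(), null, url.getHost(),
                                   url.getPort(), trimmed, null, null).toURL();
                } catch (URISyntaxException | MalformedURLException e) {
                    return url; // fall back to the input if rebuilding fails
                }
            }
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(canonical(parse("http://example.com/docs/index.html")));
        System.out.println(canonical(parse("http://example.com/docs/page.html")));
    }
}
```

A canonical form of this kind lets several spellings of the same page map to one value, which is useful when that value is used as the main URL of a document.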

length

public int length(URL url)
Calculates the length of the URL in characters

Parameters:
url - The input URL
Returns:
the length of the URL

levels

public int levels(URL url)
Calculates the number of path levels in the given URL

Parameters:
url - The input URL
Returns:
the number of levels, where a top-level URL is 0

isTopLevelPage

public boolean isTopLevelPage(URL url)
Calculates whether a URL is a top-level page

Parameters:
url - The input URL
Returns:
true if the page is a top-level page
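
To make the length, levels, and isTopLevelPage descriptions above concrete, here is a minimal sketch of how such values could be derived from a java.net.URL. It is not the Solr source. The class name UrlShapeSketch and the helper parse are hypothetical, and counting every non-empty path segment as one level (including a trailing file name) is an assumption; only the rule that a top-level URL has levels == 0 comes from the class description.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Illustrative only: hypothetical helpers, not the Solr implementation.
class UrlShapeSketch {

    // Parses an absolute URL string, converting the checked exception to an unchecked one.
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Length of the URL's string form in characters.
    static int length(URL url) {
        return url.toString().length();
    }

    // Assumption: a level is any non-empty path segment, so "" and "/" both give 0.
    static int levels(URL url) {
        int count = 0;
        for (String segment : url.getPath().split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Follows the class description: a top-level URL is one with levels == 0.
    static boolean isTopLevelPage(URL url) {
        return levels(url) == 0;
    }

    public static void main(String[] args) {
        URL root = parse("http://example.com/");
        URL deep = parse("http://example.com/docs/guide/");
        System.out.println(length(root) + " " + levels(root) + " " + isTopLevelPage(root));
        System.out.println(length(deep) + " " + levels(deep) + " " + isTopLevelPage(deep));
    }
}
```

Numeric values like these are the kind of signal the class description has in mind for later boosting or filtering, for example preferring shallow URLs over deeply nested ones.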

isLandingPage

public boolean isLandingPage(URL url)
Calculates whether the URL is a landing page or not

Parameters:
url - The input URL
Returns:
true if URL represents a landing page (index page)
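
This page does not state how a landing page is detected, so the sketch below shows only one plausible heuristic. Everything in it is an assumption: the class name LandingPageSketch, the helper parse, the list of index file names, the case-insensitive match, and the rule that a URL with a query string is never a landing page. Whether a bare directory URL should also qualify is deliberately left out.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only: a hypothetical heuristic, not the Solr implementation.
class LandingPageSketch {

    // Assumed index file names; the names the processor recognizes are not given on this page.
    private static final List<String> INDEX_NAMES = Arrays.asList("index.html", "index.htm");

    // Parses an absolute URL string, converting the checked exception to an unchecked one.
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True when the last path segment is an assumed index file name and there is no query.
    static boolean isLandingPage(URL url) {
        if (url.getQuery() != null) {
            return false; // assumption: parameterized URLs are not treated as landing pages
        }
        String path = url.getPath();
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return INDEX_NAMES.contains(lastSegment.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isLandingPage(parse("http://example.com/docs/index.html")));
        System.out.println(isLandingPage(parse("http://example.com/docs/page.html")));
    }
}
```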

getNormalizedURL

public URL getNormalizedURL(String url)
                     throws MalformedURLException,
                            URISyntaxException
Throws:
MalformedURLException
URISyntaxException
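
The method carries no description here. Its declared exceptions are consistent with parsing the string as a java.net.URI, normalizing it, and converting the result to a URL, but that is an inference rather than something this page confirms. The sketch below shows that approach using only standard JDK calls; the class name NormalizedUrlSketch and the helper normalizedOrNull are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Illustrative only: one possible normalization, not the Solr implementation.
class NormalizedUrlSketch {

    // Parses the string as a URI, removes "." and ".." path segments, and converts to a URL.
    static URL normalizedUrl(String url) throws MalformedURLException, URISyntaxException {
        return new URI(url).normalize().toURL();
    }

    // Convenience wrapper (hypothetical): returns the normalized string, or null on bad input.
    static String normalizedOrNull(String url) {
        try {
            return normalizedUrl(url).toString();
        } catch (MalformedURLException | URISyntaxException | IllegalArgumentException e) {
            // IllegalArgumentException covers relative URIs, which cannot become URLs.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizedOrNull("http://example.com/a/./b/../c"));
        System.out.println(normalizedOrNull("http://exa mple.com/"));
    }
}
```

Callers of a method with this signature need to decide what to do with input that cannot be parsed. The wrapper above maps it to null, which is only one option.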

isEnabled

public boolean isEnabled()

setEnabled

public void setEnabled(boolean enabled)


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.