Monday, February 12, 2007

Lucene demo

I unpacked the Lucene package, and everything is available under the course web site. Under docs, look at "Getting Started" and the basic Lucene demo.

I have extracted Lucene and compiled it. The gzipped tarfile is called lucene-2.0.0.tar.gz, and I'll put a link on the course web site in case you'd like (and have room for) your own copy.

Building Lucene required me to install Apache Ant, which I hope you won't need. The hardest part was modifying my CLASSPATH, as shown below:

# to build and use lucene
setenv ANT_HOME $HOME/www/676/apache-ant-1.7.0
setenv LUCHOME $HOME/www/676/lucene-2.0.0
setenv CLASSPATH .\:$LUCHOME/lucene-demos-2.0.0.jar\:$LUCHOME/lucene-core-2.0.0.jar
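Since the CLASSPATH was the hardest part, a small diagnostic can save some guessing. The sketch below is a hypothetical `ClasspathCheck` class (not part of Lucene) that prints the classpath the JVM actually sees and flags entries that don't exist on disk. It splits on `File.pathSeparator`, so it handles both the colon-separated Unix setting above and the semicolon-separated Windows form.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of Lucene: checks each classpath entry.
final class ClasspathCheck {

    // Returns the entries of a classpath string that do not exist on disk.
    // File.pathSeparator is ":" on Unix and ";" on Windows.
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<String>();
        for (String entry : classpath.split(File.pathSeparator)) {
            // Skip empty segments (e.g. from a trailing separator).
            if (entry.length() > 0 && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The classpath in effect: -cp if given, otherwise CLASSPATH.
        String cp = System.getProperty("java.class.path");
        System.out.println("classpath: " + cp);
        List<String> missing = missingEntries(cp);
        if (missing.isEmpty()) {
            System.out.println("all entries exist");
        }
        for (String entry : missing) {
            System.out.println("MISSING: " + entry);
        }
    }
}
```

Compile it with `javac ClasspathCheck.java` and run `java ClasspathCheck` in a shell where CLASSPATH is set as above; the leading `.` entry is what lets `java` find `ClasspathCheck.class` in the current directory.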


A tour of the Lucene directory might be a good idea. Building an index of the Lucene source files is gratifying!

Homework 1: Repeat this demo up to the point of creating an index of the Lucene source. Then figure out how one would modify the IndexFiles program so that it counts the number of files indexed and the number of bytes read. If you know Java well enough and have the disk space to recompile it, you may demonstrate that your modifications work. You should all get the same answers! We'll talk about this a little in class on Wednesday, and this homework will be due next Monday.
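One way to approach the counting is sketched below without any Lucene dependency, so it compiles on its own. It assumes the demo walks the directory tree recursively and handles one regular file at a time, and it takes "bytes read" to mean the sum of the file lengths. `IndexCounter` and its fields are hypothetical names; check the structure against IndexFiles.java before borrowing from it.

```java
import java.io.File;

// Hypothetical helper: walks a directory tree recursively and tallies
// files and bytes, in place of (or alongside) indexing them.
final class IndexCounter {
    long filesIndexed = 0;
    long bytesRead = 0;

    void visit(File file) {
        if (!file.canRead()) {
            return; // skip anything we cannot read
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names != null) { // list() returns null on an I/O error
                for (String name : names) {
                    visit(new File(file, name));
                }
            }
        } else {
            // In IndexFiles, this is where the document would be added to the index.
            filesIndexed++;
            bytesRead += file.length(); // assumes the whole file gets read
        }
    }

    public static void main(String[] args) {
        IndexCounter counter = new IndexCounter();
        counter.visit(new File(args.length > 0 ? args[0] : "src"));
        System.out.println(counter.filesIndexed + " files, " + counter.bytesRead + " bytes");
    }
}
```

If you add counters like these to IndexFiles itself, increment them only after a document has actually been added, so that files which fail to open are not counted.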

9 comments:

Luke said...

Running this in Windows from the command line was not too difficult. I installed Java JDK 1.6 and Apache Ant 1.7 into my Program Files directory. I set the ANT_HOME environment variable to C:\Program Files\apache-ant-1.7.0 and the JAVA_HOME environment variable to C:\Program Files\Java\jdk1.6.0. I added the following to my PATH environment variable: ;c:\program files\Java\jdk1.6.0\bin;c:\program files\apache-ant-1.7.0\bin. I didn't bother with CLASSPATH; I'll explain why below.

To build the core project, I opened a command prompt, navigated to the lucene directory, and executed "ant". To build the demo, I executed "ant jar-demo".

To run the IndexFiles class, I executed:

java -cp "build\lucene-core-2.0.1-dev.jar;build\lucene-demos-2.0.1-dev.jar" org.apache.lucene.demo.IndexFiles src

-cp sets the classpath for that one command (it overrides the CLASSPATH environment variable), so you don't need to keep changing CLASSPATH. There must be no space after the semicolon. Blogger doesn't like long unbroken words because they stretch the comments box, so watch for stray spaces if you copy the command from here.

After making changes to my Java files, I could run ant and/or ant jar-demo again to rebuild the jar files.

Unknown said...

Thanks for the info, Luke. Did anyone else have problems running Ant? My build failed, and I'm not quite sure why. This is what it spat back at me:

BUILD FAILED
C:\Program Files\Lucene-2.0.0\common-build.xml:118: The following error occurred while executing this line:
C:\Program Files\Lucene-2.0.0\common-build.xml:231: Compile failed; see the compiler error output for details.

Any ideas?

Unknown said...

I was not able to get NetBeans to read Lucene's Ant build.xml file, even though the Ant versions appeared to be compatible. I heard that Eclipse also had problems with Lucene's build.xml.

So I created the project with the "use existing sources" option. Then I had to add a reference to lucene-core-2.0.0.jar [Project Properties, Libraries, Add JAR/Folder].

Mike

Michael Wilson said...

Okay, how deep should we drill into the program?

For instance, should I assume that whenever a file is processed its entire length is read, and report the sum of the file lengths as the number of bytes read?

Or should I drill down into the program to find exactly where it reads each byte, and sum from there?
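The second option can be sketched with a small stream wrapper that counts bytes as they are actually consumed, which is one way to "drill down" without modifying Lucene's internals. `CountingInputStream` is a hypothetical class, not part of Lucene or the JDK. To use it in the demo you would need to find where each file is opened (I believe the 2.0 demo's FileDocument hands a FileReader to the "contents" field, but check the source), open a FileInputStream wrapped in this counter instead, and pass `new InputStreamReader(counter)` in its place.

```java
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: counts the bytes actually pulled through a stream.
final class CountingInputStream extends FilterInputStream {
    private long count = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates to this method,
    // so overriding it covers both kinds of array read.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CountingInputStream <file>");
            return;
        }
        CountingInputStream in = new CountingInputStream(new FileInputStream(args[0]));
        try {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard the data; only the count matters here
            }
        } finally {
            in.close();
        }
        System.out.println(in.getCount() + " bytes read");
    }
}
```

The two approaches should agree whenever every file is read to the end. They may differ if the indexer stops consuming a field early, for example because of a field-length limit such as IndexWriter's maxFieldLength.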

Unknown said...

By the way, Lucene does work with Eclipse, and I would recommend that setup: in my opinion, it is much easier to manage code in Eclipse than from the command line. If you use the tar file Dr. Nicholas linked to from the blog, it takes about 10 minutes to set up, at most.

~JC

Sandor Dornbush said...

Eclipse handles Lucene very well. I was able to get it up and running in about 10 minutes too, and I would strongly recommend it. Create a project from existing source and you should be off and running.

Unknown said...

I am taking CS 188 at UCLA. I have already set my paths and everything else correctly, but when I try to run the Lucene demo, I get the following error:

$ java org.apache.lucene.demo.IndexFiles /src
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/demo/IndexFiles

Do you guys know what the problem is? (I don't think it's the code itself, since we shouldn't need to write any code just to test the demo.)

Your help is appreciated.

Unknown said...

Sina,

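Try following Luke's post exactly, except change the java -cp ... line to use the filenames for the version of Lucene you downloaded. Also, make sure you don't have a space after the semicolon. On Linux or Mac OS X, separate the jar files with a colon instead of a semicolon and use forward slashes in the paths.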
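A NoClassDefFoundError naming IndexFiles means the JVM could not find that class on its classpath, so it can help to check the classpath directly. The hypothetical `FindClass` program below (not part of Lucene) reports whether a few classes can be loaded; `org.apache.lucene.demo.IndexFiles` should come from the demos jar and `org.apache.lucene.index.IndexWriter` from the core jar.

```java
// Hypothetical diagnostic: reports whether classes can be loaded from the current classpath.
final class FindClass {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, FindClass.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false; // not on the classpath at all
        } catch (LinkageError e) {
            return false; // found, but something it depends on is missing
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.lucene.demo.IndexFiles",   // expected in the demos jar
            "org.apache.lucene.index.IndexWriter"  // expected in the core jar
        };
        for (String name : names) {
            System.out.println((isLoadable(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

Run it with the same jar list you pass to the demo plus the directory holding FindClass.class, for example `java -cp "build/lucene-core-2.0.1-dev.jar:build/lucene-demos-2.0.1-dev.jar:." FindClass` on Unix (semicolons on Windows), adjusting the jar names to your version.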

MKS said...

Yeah Sina,

Jason is correct.

Thanks,
Manish