<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2649463705847737437</id><updated>2011-04-21T16:15:38.691-07:00</updated><title type='text'>CS 676 Spring 2007</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>33</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-8946262602412756853</id><published>2007-05-09T14:05:00.000-07:00</published><updated>2007-05-09T14:07:45.488-07:00</updated><title type='text'>programming project</title><content type='html'>When you turn in your project, I'd like to see the code you wrote or modified; figures for how large the indices are for words and n-grams, for example details from a directory listing; and a couple of sample queries and the results generated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' 
height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-8946262602412756853?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/8946262602412756853/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=8946262602412756853' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8946262602412756853'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8946262602412756853'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/05/programming-project.html' title='programming project'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-3976736347941986226</id><published>2007-05-09T12:20:00.000-07:00</published><updated>2007-05-09T12:54:23.092-07:00</updated><title type='text'>April 23, 25 and 30</title><content type='html'>I note that my blogging has taken a break.  &lt;br /&gt;&lt;br /&gt;Web search could probably be a course on its own, covering search as well as web services, RSS, and maybe semantic web stuff.  In an IR course, web search may be good as a running example, but it can't take over the course.&lt;br /&gt;&lt;br /&gt;A newer textbook can cover new topics, and that's important in an IR course.  
Is it my imagination, though, or is it true that some older textbooks are better than some new ones, even if some material is dated?&lt;br /&gt;&lt;br /&gt;Callan's paper is fine, but I feel the need to add a survey of peer-to-peer IR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-3976736347941986226?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/3976736347941986226/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=3976736347941986226' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/3976736347941986226'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/3976736347941986226'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/05/april-23-25-and-30.html' title='April 23, 25 and 30'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-5343950587247160906</id><published>2007-05-09T12:19:00.000-07:00</published><updated>2007-05-09T12:55:24.135-07:00</updated><title type='text'>May 2, 7 and 9</title><content type='html'>So far, the student talks have gone well.  People have been staying in the time limits very well, without too much prodding from me, and that's good.  
I'm learning from listening to the talks, and that's good too!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-5343950587247160906?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/5343950587247160906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=5343950587247160906' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5343950587247160906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5343950587247160906'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/05/may-2-7-and-9.html' title='May 2, 7 and 9'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-8671920100784111114</id><published>2007-04-18T15:31:00.000-07:00</published><updated>2007-04-18T15:39:42.214-07:00</updated><title type='text'>schedule of talks</title><content type='html'>5/2&lt;br /&gt;Ron Roff&lt;br /&gt;Joel G.&lt;br /&gt;Mike Wilson&lt;br /&gt;JC Montminy&lt;br /&gt;&lt;br /&gt;5/7&lt;br /&gt;Chris&lt;br /&gt;Mansi Radke&lt;br /&gt;Beenish&lt;br /&gt;Luke&lt;br /&gt;&lt;br /&gt;5/9&lt;br /&gt;Stephen&lt;br /&gt;Ginny&lt;br /&gt;Jason&lt;br /&gt;Sayeed&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;5/14&lt;br /&gt;Justin&lt;br /&gt;Mike&lt;br /&gt;Marcin&lt;br /&gt;Sandor&lt;br /&gt;Aparna&lt;div class="blogger-post-footer"&gt;&lt;img 
width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-8671920100784111114?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/8671920100784111114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=8671920100784111114' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8671920100784111114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8671920100784111114'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/04/schedule-of-talks.html' title='schedule of talks'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1849157502594157940</id><published>2007-04-17T13:44:00.000-07:00</published><updated>2007-04-17T14:04:21.732-07:00</updated><title type='text'>class Monday evening 4/16</title><content type='html'>So, you may have noticed that we had no class yesterday evening.  The Catonsville area and UMBC in particular suffered a power outage that closed us down from noon until 6pm.  
The department mail servers were running, but I had no machine that had power, so there was no way for me to notify you.&lt;br /&gt;&lt;br /&gt;In hindsight, I should have put a notice on the door, but frankly that slipped my mind.&lt;br /&gt;&lt;br /&gt;Anyway, I apologize to those who made the trip to campus for nothing.&lt;br /&gt;&lt;br /&gt;I plan to cover cross-language IR on Wednesday.  I'll be posting a paper or two shortly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1849157502594157940?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1849157502594157940/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1849157502594157940' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1849157502594157940'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1849157502594157940'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/04/class-monday-evening-417.html' title='class Monday evening 4/16'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-162893999333776755</id><published>2007-04-11T14:41:00.000-07:00</published><updated>2007-04-11T15:08:16.929-07:00</updated><title type='text'>format for writing project presentations</title><content type='html'>max. 
15 minutes - I suggest you rehearse&lt;br /&gt;&lt;br /&gt;2-3 minutes/slide&lt;br /&gt;&lt;br /&gt;title and your name&lt;br /&gt;executive summary &amp;lt;= 50 words&lt;br /&gt;what piqued your interest in this topic?&lt;br /&gt;&lt;br /&gt;present an example (multiple slides, but make it snappy)&lt;br /&gt;OR&lt;br /&gt;explain what you learned&lt;br /&gt;&lt;br /&gt;what assumptions are made?  what are the advantages and disadvantages?&lt;br /&gt;&lt;br /&gt;future work - questions that still need to be answered&lt;br /&gt;conclusions&lt;br /&gt;&lt;br /&gt;peer-review is fine!  don't be mean to each other&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-162893999333776755?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/162893999333776755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=162893999333776755' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/162893999333776755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/162893999333776755'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/04/format-for-writing-project.html' title='format for writing project presentations'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-8261047120613998749</id><published>2007-04-09T12:11:00.000-07:00</published><updated>2007-04-09T12:13:22.031-07:00</updated><title type='text'>Monday April 9</title><content type='html'>The material on passage-based retrieval won't take too long to present.  We'll probably discuss the programming project a little.  How about making the Writing Project due soon, i.e. Wednesday of next week, April 18?  Homework 3 will be due the following Monday, April 23.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-8261047120613998749?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/8261047120613998749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=8261047120613998749' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8261047120613998749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8261047120613998749'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/04/monday-april-9.html' title='Monday April 9'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-7302300495217754554</id><published>2007-04-09T12:01:00.000-07:00</published><updated>2007-04-09T12:11:19.417-07:00</updated><title type='text'>Wednesday, April 4, 2007</title><content type='html'>On Monday 4/2 we began the clustering tutorial found at http://www.cs.umbc.edu/~nicholas/clustering&lt;br /&gt;&lt;br /&gt;We finished that on Wednesday, April 4.&lt;br /&gt;&lt;br /&gt;Two search engines that use clustering are Vivisimo and iBoogie.  Both still seem to be around.&lt;br /&gt;&lt;br /&gt;Also talked about the programming project.  Basically, the project is to use the ngram package (located in Lucene's contrib directory) with the Reuters corpus.&lt;br /&gt;&lt;br /&gt;Since the Reuters corpus uses SGML markup and several documents in a single file, parsing the documents is non-trivial.  Originally I had said to index only the text inside paragraphs, but that may be too awkward.  &lt;br /&gt;&lt;br /&gt;So, what would it take to index the whole Reuters corpus, using n-grams (with n=5)?  One approach would be to use a Perl script to make a file for each document, which would involve several thousand files.  With n-grams, common n-grams will occur in each document with roughly the same frequency, so the markup shouldn't make much difference.&lt;br /&gt;&lt;br /&gt;So that's the project:  use the ngrams package to index the Reuters corpus, and use some sample queries, maybe titles with empty documents, to show that it works.  
Queries may have SGML markup.&lt;br /&gt;&lt;br /&gt;I had also described Homework 3:  Using a script or Lucene or a program of your choice, tell me the ten most common 5-grams in the Reuters corpus.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-7302300495217754554?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/7302300495217754554/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=7302300495217754554' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7302300495217754554'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7302300495217754554'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/04/wednesday-april4-2007.html' title='Wednesday, April 4, 2007'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-5946286510358666294</id><published>2007-04-02T13:13:00.000-07:00</published><updated>2007-04-02T13:29:45.455-07:00</updated><title type='text'></title><content type='html'>Wednesday 3/28&lt;br /&gt;We went over the McNamee and Mayfield paper, in some detail, and then we finished Salton and Buckley's relevance feedback paper.  
Shannon's 1948 paper is available on the &lt;a href="http://doi.acm.org/10.1145/584091.584093"&gt;web&lt;/a&gt;, and is still worth reading.&lt;br /&gt;&lt;br /&gt;The use of LSA in cross-language IR is described in several places, e.g. &lt;a href="http://lsi.research.telcordia.com/lsi/papers/XLANG96.pdf"&gt;&lt;br /&gt;http://lsi.research.telcordia.com/lsi/papers/XLANG96.pdf&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-5946286510358666294?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/5946286510358666294/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=5946286510358666294' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5946286510358666294'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5946286510358666294'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/04/wednesday-328-we-went-over-mcnamee-and.html' title=''/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-339973987641533300</id><published>2007-03-26T14:12:00.000-07:00</published><updated>2007-03-26T14:18:45.761-07:00</updated><title type='text'>plans for Monday 3/26 and Wednesday 3/28</title><content type='html'>Tonight I'll be talking about relevance feedback.  
Let me know if you think this is a good topic or not :-)&lt;br /&gt;&lt;br /&gt;Do people want to learn more about n-grams?  An article on generalized n-grams just appeared in Information Processing and Management.&lt;br /&gt;&lt;br /&gt;In my opinion, IP&amp;amp;M is one of the very best IR journals.  You can access it online at&lt;br /&gt;&lt;a href="http://www.sciencedirect.com/science/journal/03064573"&gt;http://www.sciencedirect.com/science/journal/03064573&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Wednesday is a little open - more on RF, more on n-grams, or maybe an introduction to clustering.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-339973987641533300?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/339973987641533300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=339973987641533300' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/339973987641533300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/339973987641533300'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/03/plans-for-monday-326-and-wednesday-328.html' title='plans for Monday 3/26 and Wednesday 3/28'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-4907969996773168472</id><published>2007-03-26T12:27:00.000-07:00</published><updated>2007-03-26T12:29:09.365-07:00</updated><title type='text'></title><content type='html'>&lt;h1&gt;Information Retrieval: Data Structures &amp;amp; Algorithms&lt;/h1&gt;&lt;br /&gt;&lt;h3&gt;edited by William B. Frakes and Ricardo Baeza-Yates&lt;/h3&gt;&lt;br /&gt;&lt;a href="http://www.pimpumpam.com/motoridiricerca/ir/toc.htm"&gt;&lt;br /&gt;http://www.pimpumpam.com/motoridiricerca/ir/toc.htm&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-4907969996773168472?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/4907969996773168472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=4907969996773168472' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/4907969996773168472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/4907969996773168472'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/03/information-retrieval-data-structures.html' title=''/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-6112443960294315106</id><published>2007-03-15T14:48:00.000-07:00</published><updated>2007-03-15T14:49:58.720-07:00</updated><title type='text'>more on the writing project</title><content type='html'>You may implement something, but that's not necessary.  You'll probably need to read up on your topic, focus QUICKLY on some particular subtopic, and explain it to me in ten pages...  If you write something explaining that topic to me, maybe with a simple example, that would be fine.&lt;br /&gt;&lt;br /&gt;A student wrote:&lt;br /&gt;&lt;br /&gt;&gt; Hello.  I just had a question about the writing project.&lt;br /&gt;&gt;&lt;br /&gt;&gt; What exactly are you expecting for this project?  I'm not entirely certain&lt;br /&gt;&gt; exactly what I'm supposed to write -- is this a research style project&lt;br /&gt;&gt; where I'm supposed to implement something and investigate a new topic, or&lt;br /&gt;&gt; am I going to go over a bunch of subtopics in the topic I have provided&lt;br /&gt;&gt; and inform you about them?  
I thought I had a better idea over what&lt;br /&gt;&gt; exactly was necessary, but I find myself a little confused about it right&lt;br /&gt;&gt; now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-6112443960294315106?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/6112443960294315106/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=6112443960294315106' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/6112443960294315106'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/6112443960294315106'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/03/more-on-writing-project.html' title='more on the writing project'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-5793811247947796396</id><published>2007-03-14T16:02:00.000-07:00</published><updated>2007-03-14T16:05:20.799-07:00</updated><title type='text'>Notes from March 12 and March 14</title><content type='html'>Introduced LSA on Monday.&lt;br /&gt;&lt;br /&gt;Finished LSA on Wednesday, and talked about n-grams.  I probably should have passed out Damashek 95 beforehand.  
I was asked what character set was used in the acquaintance plots; I thought it was all Unicode, but I'm not sure.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-5793811247947796396?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/5793811247947796396/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=5793811247947796396' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5793811247947796396'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5793811247947796396'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/03/notes-from-march-12-and-march-14.html' title='Notes from March 12 and March 14'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-7792557024361027172</id><published>2007-03-09T10:47:00.000-08:00</published><updated>2007-03-09T10:55:54.543-08:00</updated><title type='text'>clarification for hw 2</title><content type='html'>The idea of hw 2 is to give experience with computing tf.idf weights, and to see how stop words are treated.  One approach is to compute the tf.idf score for each of the 35 stopwords, on a per document basis.  The output would be a 330 by 35 matrix, and most of the scores should be positive but close to zero.  
Reading the output may be a little cumbersome, but this would be fine.&lt;br /&gt;&lt;br /&gt;Another approach is to keep track of the min and max values of tf, for each of the 35 stopwords, as the documents are parsed and the index built.  Then calculate idf for each stopword, and print for each stop word the min tf.idf and max tf.idf.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-7792557024361027172?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/7792557024361027172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=7792557024361027172' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7792557024361027172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7792557024361027172'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/03/clarification-for-hw-2.html' title='clarification for hw 2'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-7275357069235432617</id><published>2007-02-28T13:46:00.000-08:00</published><updated>2007-02-28T14:00:25.747-08:00</updated><title type='text'>Homework 2</title><content type='html'>This homework is due Monday, March 12&lt;br /&gt;&lt;br /&gt;1) find the module or modules in Lucene that handle stopwords.  Is there a static stopword list?  
If so, where is it?&lt;br /&gt;&lt;br /&gt;2) how would Lucene be modified in order to count the occurrences of individual stopwords? &lt;br /&gt;&lt;br /&gt;3) Make the necessary changes, and rebuild the index on the Lucene src tree as in homework 1, and have it print a report saying how many times each stopword occurred (tf) and the total number of documents in which that stopword occurred (df).   Then using (one of) Salton and Buckley's suggested tf.idf formulae for documents, print the term weight that should be given to each stopword.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-7275357069235432617?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/7275357069235432617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=7275357069235432617' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7275357069235432617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7275357069235432617'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/homework-2.html' title='Homework 2'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-3293910429223121963</id><published>2007-02-28T13:32:00.000-08:00</published><updated>2007-02-28T13:46:23.427-08:00</updated><title type='text'>Notes from 2/26, plans for 
2/28</title><content type='html'>On Monday I talked about some more writing project topics.  I started talking about probabilistic IR, using the slides posted.&lt;br /&gt;&lt;br /&gt;I'll do more with probabilistic IR this evening.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-3293910429223121963?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/3293910429223121963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=3293910429223121963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/3293910429223121963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/3293910429223121963'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/notes-from-226-plans-for-228.html' title='Notes from 2/26, plans for 2/28'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-5618916106720301126</id><published>2007-02-26T13:46:00.000-08:00</published><updated>2007-02-26T13:52:57.650-08:00</updated><title type='text'>Notes from 2/21, plans for 2/26</title><content type='html'>Spent a LOT of time last Wednesday talking about writing project topics.  Here are some more:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;There are other packages besides Lucene, e.g. Lemur, and Clairlib, and maybe others.  
A comparison of those packages would be a good topic.&lt;/li&gt;&lt;li&gt;The connection between IR and other areas, such as machine learning or NLP, can be explored.&lt;/li&gt;&lt;/ul&gt;For Monday evening 2/26, I'll talk about the Salton and Buckley paper, and introduce the concept of probabilistic IR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-5618916106720301126?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/5618916106720301126/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=5618916106720301126' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5618916106720301126'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5618916106720301126'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/notes-from-221-plans-for-226.html' title='Notes from 2/21, plans for 2/26'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-5038451973764055664</id><published>2007-02-20T19:21:00.000-08:00</published><updated>2007-02-20T19:42:41.067-08:00</updated><title type='text'>writing project</title><content type='html'>My usual procedure is to wait until later in the semester, and assign a writing project that is due at the end of the semester.  
People rarely complain, but who needs more stress at the end of the semester anyway?&lt;br /&gt;&lt;br /&gt;So let's get an early start.  Within ten days, say by Monday March 5, I'd like you to  tell me, in a short email, the topic of your paper.  It has to have something to do with IR, and NOT something that we'll be going over in class in detail, although going in depth in some topic that we mention in class is fine.  Describe your topic  in a paragraph, and list at least three references that you're thinking of consulting.  &lt;br /&gt;&lt;br /&gt;The final paper should be about ten pages, with at least ten references.  Don't let all the references be from Wikipedia.  The paper will be due on Wednesday, April 18.&lt;br /&gt;&lt;br /&gt;There are lots of possible topics!  We can start with the   &lt;br /&gt;&lt;a href="http://www.sigir2007.org/cfp.html"&gt;SIGIR Call for Papers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;and move on to the &lt;br /&gt;&lt;a href="http://www.fc.ul.pt/cikm2007/Call-for-Papers.html"&gt;CIKM Call for Papers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm not expecting original research results of conference quality (although that'd be nice) but you'll need to do something more than just a rehash of existing work.  It's always a good idea to summarize work in an area, and then suggest future work that somebody could do for a 698 project, or a thesis.  Another approach is to study some technique, and then present a new example that would help people understand it.  If you want to write a program (e.g. 
an extension or modification to Lucene) you can include that as an appendix, and it won't count towards the ten-page limit.&lt;br /&gt;&lt;br /&gt;It doesn't bother me if your writing project happens to be related to your job, or dovetails with something you're doing in another class.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-5038451973764055664?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/5038451973764055664/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=5038451973764055664' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5038451973764055664'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5038451973764055664'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/writing-project.html' title='writing project'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-6476187764288594301</id><published>2007-02-20T14:22:00.000-08:00</published><updated>2007-02-20T14:30:47.742-08:00</updated><title type='text'>Monday 2/19/07</title><content type='html'>Distributed annotated copies of the onjava.com article dated 1/15/03 and the today.java.net article dated 7/30/03.  &lt;br /&gt;&lt;br /&gt;Most people seem to have finished homework 1, and we discussed that a little.  
Getting Lucene to recompile was the hardest part, at least for me.&lt;br /&gt;&lt;br /&gt;In response to questions, I talked about phrase-based retrieval and n-gram retrieval (both character and word n-grams) as alternatives to the bag of words model.  Note that words, phrases, and n-grams each have their pros and cons; all three are just ways of deciding which terms are to be indexed.  Once the "term space" is identified, the vector space, probabilistic, or boolean models of retrieval are options.&lt;br /&gt;&lt;br /&gt;Unix tools can be used to do "sanity checks" on IR results.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-6476187764288594301?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/6476187764288594301/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=6476187764288594301' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/6476187764288594301'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/6476187764288594301'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/monday-21907.html' title='Monday 2/19/07'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-9043310174700746272</id><published>2007-02-16T11:15:00.000-08:00</published><updated>2007-02-16T11:16:28.709-08:00</updated><title 
type='text'>Homework 1 now due Wednesday 2/21</title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-9043310174700746272?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/9043310174700746272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=9043310174700746272' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/9043310174700746272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/9043310174700746272'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/homework-1-now-due-wednesday-221.html' title='Homework 1 now due Wednesday 2/21'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-347734379617458192</id><published>2007-02-16T11:05:00.000-08:00</published><updated>2007-02-16T11:13:37.767-08:00</updated><title type='text'>No class on Wednesday February 14</title><content type='html'>Notes from Monday 2/12.  Went over the slides on the vector space model, including a short example with tf.idf and calculation of a similarity coefficient.  
Did NOT cover the problem of document (or query) length normalization, nor the boolean model.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-347734379617458192?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/347734379617458192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=347734379617458192' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/347734379617458192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/347734379617458192'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/no-class-on-wednesday-february-14.html' title='No class on Wednesday February 14'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1562895490445091122</id><published>2007-02-14T11:01:00.000-08:00</published><updated>2007-02-14T12:17:15.060-08:00</updated><title type='text'>More on Lucene</title><content type='html'>For a good introduction to Lucene, you might enjoy&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html"&gt;&lt;br /&gt;http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html"&gt;&lt;br 
/&gt;http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;and this one,&lt;br /&gt;&lt;br /&gt;&lt;a href="http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html"&gt;&lt;br /&gt;http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;A Perl version of Lucene is available at&lt;br /&gt;&lt;a href="http://search.cpan.org/~tmtm/Plucene-1.25/lib/Plucene.pm"&gt;&lt;br /&gt;http://search.cpan.org/~tmtm/Plucene-1.25/lib/Plucene.pm&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1562895490445091122?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1562895490445091122/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1562895490445091122' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1562895490445091122'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1562895490445091122'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/more-on-lucene.html' title='More on Lucene'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-7180908032129053919</id><published>2007-02-12T15:17:00.001-08:00</published><updated>2007-02-12T14:04:07.608-08:00</updated><title type='text'>gzipped tarfile</title><content 
type='html'>http://www.cs.umbc.edu/~nicholas/676/lucene-2.0.0.tar.gz&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-7180908032129053919?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/7180908032129053919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=7180908032129053919' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7180908032129053919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/7180908032129053919'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/gzipped-tarfile.html' title='gzipped tarfile'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-2635821064261792267</id><published>2007-02-12T13:40:00.000-08:00</published><updated>2007-02-11T18:53:31.882-08:00</updated><title type='text'>Lucene demo</title><content type='html'>I unpacked the Lucene package, and everything is available under the course web site.  Under docs, look at "Getting Started", and the basic Lucene demo.&lt;br /&gt;&lt;br /&gt;I have extracted Lucene, and compiled it.  
The gzipped tarfile is called lucene-2.0.0.tar.gz, and I'll put a link on the course web site in case you'd like (and have room for) your own copy.&lt;br /&gt;&lt;br /&gt;Building Lucene required me to install apache ant, which I hope you won't need.  The hardest part was modifying my CLASSPATH, as shown below:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# to build and use lucene&lt;br /&gt;setenv ANT_HOME $HOME/www/676/apache-ant-1.7.0&lt;br /&gt;setenv LUCHOME $HOME/www/676/lucene-2.0.0&lt;br /&gt;setenv CLASSPATH .\:$LUCHOME/lucene-demos-2.0.0.jar\:$LUCHOME/lucene-core-2.0.0.jar&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;A tour of the Lucene directory might be a good idea.  Building an index of the Lucene source files is gratifying!&lt;br /&gt;&lt;br /&gt;Homework 1:  Repeat this demo, as far as creating an index of the Lucene source.  Then &lt;span style="font-weight: bold;"&gt;figure out&lt;/span&gt; how one would modify the IndexFiles program so that it counts the number of files indexed, and the number of bytes read.  If you know Java well enough and have the disk space to recompile it, you may demonstrate that your modifications work.  You should all get the same answers!  
We'll talk about this in class on Wednesday a little, and this homework will be due next Monday.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-2635821064261792267?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/2635821064261792267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=2635821064261792267' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/2635821064261792267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/2635821064261792267'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/lucene-demo.html' title='Lucene demo'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-8760283971345251280</id><published>2007-02-11T18:13:00.000-08:00</published><updated>2007-02-11T18:12:03.859-08:00</updated><title type='text'>Class Wed. 2/7</title><content type='html'>I spent a lot of time finishing up the slides from Chapter 1, but also touched on some ideas that get discussed in detail in Chapter 2 and beyond.&lt;br /&gt;&lt;br /&gt;The definitions of Precision and Recall are important. It is usually easy to measure precision, as long as you can tell when a specific document is relevant to a specific query. 
Recall is much harder to calculate, since the number of relevant documents in a large collection may well be unknown. I also talked about the pooled approach to IR evaluation. Without the pooled approach, evaluation of recall would be very difficult if not impossible on large collections. An overview of TREC, including a discussion of pooled document evaluation, is presented in Donna Harman's overview of TREC 4, available at http://trec.nist.gov/pubs/trec4/t4_proceedings.html.&lt;br /&gt;&lt;br /&gt;In another post I mention pivoted document length normalization, which is credited (in my mind at least) to Amit Singhal, then at Cornell and now at Google. PDLN is covered in Chapter 2. Somebody asked a question on Wednesday that brought query zoning to mind, and that is also credited to Singhal. His paper from SIGIR'97 is also worth reading, I think. This and related work can be seen at http://singhal.info/publications.html&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-8760283971345251280?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/8760283971345251280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=8760283971345251280' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8760283971345251280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/8760283971345251280'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/class-wed-27.html' title='Class Wed. 
2/7'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-6462882357872262562</id><published>2007-02-11T18:08:00.000-08:00</published><updated>2007-02-05T14:45:07.502-08:00</updated><title type='text'>two important papers</title><content type='html'>In the discussion of the vector space model, Grossman and Frieder mention Salton's November 1975 CACM paper.  I recommend that you read it.  It's available through the ACM Digital Library, and through Google Scholar.&lt;br /&gt;&lt;br /&gt;They also mention Pivoted Document Length Normalization, which appeared in the 1996 SIGIR conference.  The main author is Amit Singhal.  That paper is likely still the best explanation of PDLN, which within a year or two of its introduction was widely accepted in IR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-6462882357872262562?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/6462882357872262562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=6462882357872262562' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/6462882357872262562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/6462882357872262562'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/two-important-papers.html' title='two important 
papers'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-5067278864855284413</id><published>2007-02-05T14:43:00.000-08:00</published><updated>2007-02-05T14:45:07.572-08:00</updated><title type='text'>David D Lewis's web page</title><content type='html'>&lt;a href="http://www.daviddlewis.com/resources/testcollections/reuters21578/"&gt;Reuters corpus&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-5067278864855284413?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/5067278864855284413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=5067278864855284413' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5067278864855284413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/5067278864855284413'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/david-d-lewiss-web-page.html' title='David D Lewis&apos;s web page'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1929447264808309766</id><published>2007-02-05T11:49:00.000-08:00</published><updated>2007-02-05T12:23:08.674-08:00</updated><title type='text'>IR packages</title><content type='html'>So I mentioned the &lt;a href="http://lucene.apache.org/"&gt;Lucene project&lt;/a&gt; last time.&lt;br /&gt;&lt;br /&gt;Andrew McCallum (UMass) still makes the&lt;br /&gt;&lt;a href="http://www.cs.umass.edu/~mccallum/bow/"&gt;Bag of Words&lt;/a&gt; software available.&lt;br /&gt;&lt;br /&gt;What other packages are available to those who want to build their own IR systems?  They may or may not support multiple languages or multiple retrieval models.  They may be general purpose, or perhaps restricted to Web search for example.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.lemurproject.org/"&gt;Lemur package&lt;/a&gt; from CMU&lt;br /&gt;&lt;br /&gt;Two packages from NIST&lt;br /&gt;&lt;a href="http://www-nlpir.nist.gov/works/papers/zp2/zp2.html"&gt;PRISE&lt;/a&gt;&lt;br /&gt;&lt;a href="http://zing.ncsl.nist.gov/%7Ecugini/uicd/nirve-home.html"&gt;NIRVE&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;References to these and a couple more can be found at the&lt;br /&gt;&lt;a href="http://www.sigir.org/"&gt;SIGIR&lt;/a&gt; site.  There's some good stuff out here!&lt;br /&gt;&lt;br /&gt;MG is a little older, and is discussed in detail in the book Managing Gigabytes.  
Zettair is related to MG, I think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1929447264808309766?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1929447264808309766/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1929447264808309766' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1929447264808309766'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1929447264808309766'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/02/ir-packages.html' title='IR packages'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1216932054826849513</id><published>2007-01-31T12:54:00.001-08:00</published><updated>2007-02-05T14:06:19.959-08:00</updated><title type='text'>The project used a few years ago.</title><content type='html'>&lt;a href="http://www.csee.umbc.edu/~ian/irF02/project.html"&gt;Fall 2002 project&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1216932054826849513?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1216932054826849513/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1216932054826849513' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1216932054826849513'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1216932054826849513'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/01/project-used-few-years-ago.html' title='The project used a few years ago.'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1024019949731522198</id><published>2007-01-31T10:48:00.000-08:00</published><updated>2007-01-31T10:54:12.905-08:00</updated><title type='text'>Notes from class 1/29</title><content type='html'>As usual, setting up the computer support took some time.  I should plan time for getting my laptop booted up and connected.  Now that I know how to use Skype better, talking to remote students should be easier.&lt;br /&gt;&lt;br /&gt;I asked people to look at the project that was used a few years ago.  I also asked people to try to post a comment to the blog, but I haven't seen anything yet.&lt;br /&gt;&lt;br /&gt;On Wednesday 1/31, I'll take some Polaroids!  If I can get Google Talk to work on my laptop, that will be good.&lt;br /&gt;&lt;br /&gt;If this course were offered on-line or as a hybrid, access to the lecture slides would obviously be critical.  A set time for conference calls with students might be a good idea, although awkward for students in distant time zones.  
Maybe Skype could be used to record the lecture - but breaking it up slide by slide would help even more.&lt;br /&gt;&lt;br /&gt;So Wednesday evening I plan to take pictures, and finish the slides started last time.  We'll also discuss the project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1024019949731522198?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1024019949731522198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1024019949731522198' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1024019949731522198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1024019949731522198'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/01/notes-from-class-129.html' title='Notes from class 1/29'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1627356039625903194</id><published>2007-01-29T07:49:00.000-08:00</published><updated>2007-01-29T07:50:14.692-08:00</updated><title type='text'>Class will meet in AC IV, 015</title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1627356039625903194?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1627356039625903194/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1627356039625903194' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1627356039625903194'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1627356039625903194'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/01/class-will-meet-in-ac-iv-015.html' title='Class will meet in AC IV, 015'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-1213971105364574010</id><published>2007-01-09T13:01:00.000-08:00</published><updated>2007-01-09T13:32:51.964-08:00</updated><title type='text'></title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.amazon.com/gp/reader/1402030045/ref=sib_dp_pt/002-1065488-9840839#reader-link"&gt;&lt;img style="cursor: pointer; width: 320px;" src="http://www.amazon.com/gp/reader/1402030045/ref=sib_dp_pt/002-1065488-9840839#reader-link" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;center&gt; &lt;h1&gt;Information Retrieval&lt;/h1&gt; MW 5:30-6:45pm&lt;br /&gt;Room to be announced&lt;br /&gt;Dr. &lt;a href="http://www.cs.umbc.edu/%7Enicholas"&gt;Charles Nicholas&lt;/a&gt;&lt;br /&gt;&lt;a href="mailto:nicholas@umbc.edu"&gt;nicholas@umbc.edu&lt;/a&gt; &lt;/center&gt;    &lt;p&gt;Dr. 
Ian Soboroff taught this course a few years back, and I like his introduction:&lt;br /&gt;&lt;/p&gt; &lt;p&gt;This course is an introduction to the theory and implementation of software systems designed to search through large collections of text. Ever wonder how World-Wide Web search engines work? Ever wondered why they don't? You'll learn about it here. Information retrieval (IR) is one of the oldest branches of computer science, and has influenced nearly every aspect of computer usage: "search and replace" in a word processor, querying a card catalog, grep'ing through your source code, filtering the spam out of your email, searching the Web.&lt;/p&gt; &lt;p&gt;This course will have two main thrusts. The first is to cover the fundamentals of IR: retrieval models, search algorithms, and IR evaluation. The second is to give a taste of the implementation issues by having you write (a good chunk of) your own text search engine and test it out on a sample text collection. This will be a semester-long project, details TBA.&lt;/p&gt; &lt;p&gt;You will need to have taken the equivalent of CMSC 341 (Data Structures), and an algorithms course (441 or 641) is recommended. Linear algebra (MATH 221) and Statistics (STAT 355) are recommended but not required; they give background which will be helpful in understanding many IR concepts.&lt;/p&gt; &lt;p&gt;&lt;span style="font-weight: bold;"&gt;Text&lt;/span&gt;&lt;br /&gt;The text will be Grossman and Frieder, available at the UMBC bookstore (at least it's been ordered) as well as &lt;a href="http://www.amazon.com/Information-Retrieval-Algorithms-Heuristics-2nd/dp/1402030045/sr=8-1/qid=1168377138/ref=sr_1_1/002-1065488-9840839?ie=UTF8&amp;s=books"&gt;Amazon&lt;/a&gt;.  We will follow this book fairly closely.  Details about which chapters will be covered, and when, will follow.  
Other readings will be assigned and made available.&lt;br /&gt;&lt;/p&gt; &lt;p&gt;&lt;span style="font-weight: bold;"&gt;Grading&lt;/span&gt;&lt;br /&gt;There will be a multi-phase programming project, details to be announced, worth about 50% of the grade.  Homework will be another 25%.  There will also be a writing project, worth 25%.  Presentations on the programming project will take the place of the final exam.&lt;br /&gt;&lt;/p&gt; &lt;p&gt;&lt;span style="font-weight: bold;"&gt;Academic Integrity&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial, Helvetica, sans-serif;"&gt;&lt;i&gt;"By enrolling in this course, each student assumes the responsibilities of an active participant in UMBC's scholarly community in which everyone's academic work and behavior are held to the highest standards of honesty. Cheating, fabrication, plagiarism, and helping others to commit these acts are all forms of academic dishonesty, and they are wrong. Academic misconduct could result in disciplinary action that may include, but is not limited to, suspension or dismissal. 
To read the full Student Academic Conduct            Policy, consult the UMBC Student Handbook, the Faculty Handbook, or            the UMBC Policies section of the UMBC Directory [or for graduate courses,            the Graduate School &lt;a href="http://www.umbc.edu/gradschool"&gt;website&lt;/a&gt;]."&lt;/i&gt;&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-1213971105364574010?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/1213971105364574010/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=1213971105364574010' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1213971105364574010'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/1213971105364574010'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/01/information-retrieval-mw-530-645pm-room.html' title=''/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2649463705847737437.post-2876746434744895155</id><published>2007-01-09T12:57:00.000-08:00</published><updated>2007-01-09T13:01:12.735-08:00</updated><title type='text'>Welcome</title><content type='html'>Welcome to CS 676, a course on information retrieval!&lt;br /&gt;&lt;br /&gt;It's not enough for a course to have a web site.  Nowadays you must have a blog as well!  
So this blog will be used for much of the official and unofficial communication related to the course.&lt;br /&gt;&lt;br /&gt;Only students in the class, and I, will be able to post to the blog.  Readership alone is open to the public.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2649463705847737437-2876746434744895155?l=cs676sp07.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cs676sp07.blogspot.com/feeds/2876746434744895155/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2649463705847737437&amp;postID=2876746434744895155' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/2876746434744895155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2649463705847737437/posts/default/2876746434744895155'/><link rel='alternate' type='text/html' href='http://cs676sp07.blogspot.com/2007/01/welcome.html' title='Welcome'/><author><name>Charles</name><uri>http://www.blogger.com/profile/16432516734622948167</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
