Lucene on Saleem Ansari

/tags/lucene/
Recent content in Lucene on Saleem Ansari. (c) 2024 Saleem Ansari. Last updated Wed, 17 Apr 2013.

Understanding big Lucene index by inspecting a portion of it
/2013/04/17/understanding-big-lucene-index-by-inspecting-a-portion-of-it/
Wed, 17 Apr 2013

I was wondering whether I could get a sample out of many huge Lucene indexes and inspect it with Lukeall on my machine. I quickly realized that copying such indexes over the network would be time-consuming. First I googled for a ready-made solution that would let me copy only a few documents from the whole index into a separate (small) index. That way I could quickly understand the document structure.

Indexing the documents stored in a database using Apache Solr and Apache Tika
/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/
Mon, 04 Feb 2013

Indexing the documents stored in a database

Outline:
1. Set up a MySQL database [1] containing documents (PDF/DOC/HTML etc.).
2. Set up Apache Solr / Tika.
3. Import the documents just by hitting an import URL.

NOTE: Also check the update note at the end of this post.

These steps were done on my machine running Fedora 17. The commands can easily be adapted for other distributions.
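Step 3 of the outline ("hitting an import URL") can be sketched in shell. This is a minimal sketch that assumes the import is driven by Solr's DataImportHandler; the Solr address `http://localhost:8983/solr`, the core name `collection1`, the `/dataimport` handler path and the helper function names are illustrative assumptions, not the post's actual setup.

```shell
#!/bin/sh
# Sketch: trigger and monitor a Solr DataImportHandler (DIH) import.
# ASSUMPTIONS (not from the post): Solr runs at http://localhost:8983/solr,
# the core is named "collection1", and DIH is registered at /dataimport
# in solrconfig.xml. Override with SOLR_BASE / SOLR_CORE if yours differ.

SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"
SOLR_CORE="${SOLR_CORE:-collection1}"

# Build the DIH URL for a given command (full-import, delta-import, status, ...).
dih_url() {
  printf '%s/%s/dataimport?command=%s\n' "$SOLR_BASE" "$SOLR_CORE" "$1"
}

# Start a full import: clean=true deletes the existing index before importing,
# commit=true commits the imported documents when the import finishes.
start_full_import() {
  curl -s "$(dih_url full-import)&clean=true&commit=true"
}

# Ask DIH for the state of the current import ("busy" while running, "idle" when done).
import_status() {
  curl -s "$(dih_url status)"
}
```

For example, call `start_full_import` once, then poll `import_status` until the response reports the handler as idle. The URL is built in one helper so that a different host or core only needs the two variables changed.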
Set up a MySQL database with documents

Install MySQL Server:

Comparison on Lucene Solr and NoSQL
/2011/07/09/comparison-on-lucene-solr-and-nosql/
Sat, 09 Jul 2011

Comparison on Lucene/Solr and NoSQL
http://stackoverflow.com/questions/3215029/nosql-mongodb-vs-lucene-or-solr-as-your-database
NoSQL, Lucene and Solr
http://www.lucidimagination.com/blog/2010/04/30/nosql-lucene-and-solr/
For The Guardian, Solr is the new database
http://www.lucidimagination.com/blog/2010/04/29/for-the-guardian-solr-is-the-new-database/

Is NoSQL database an alternative for a search engine?
/2011/02/27/is-nosql-database-an-alternative-for-a-search-engine/
Sun, 27 Feb 2011

I have been thinking about this question: is a NoSQL database an alternative to a search engine? I think I just found an answer here. Let's start with some terms and definitions.
NoSQL (Not only SQL) means that a NoSQL database differs from an RDBMS in some way.
IR (Information Retrieval) is the science of searching for documents and their metadata, and of retrieving them.
Here we compare a NoSQL storage engine, MongoDB, with an Information Retrieval library, Apache Lucene.

PyLucene on Fedora 14
/2011/02/25/pylucene-on-fedora-14/
Fri, 25 Feb 2011

I couldn't install pylucene simply by running the following command:
yum install pylucene
Nor did the following work:
easy_install pylucene
Nor the following :-(
pip-python install pylucene
So I had to build it myself.
Here, I list those steps:

Install JCC
$ JCC_JDK=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64 pip-python install jcc

Download pylucene
$ wget -c http://apache.mirrors.pair.com//lucene/pylucene/pylucene-2.4.1-1-src.tar.gz
$ tar zxf pylucene-2.4.1-1-src.tar.gz
$ cd pylucene-2.4.1-1

Build and install (see http://lucene.apache.org/pylucene/documentation/install.html)
$ pushd jcc
# edit setup.py to match your environment
$ JCC_JDK=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64 python setup.py build
$ sudo python setup.