Alessio Campanelli

Mobile Developer

The Apache Lucene project develops open-source search software that provides indexing and searching functions. Let's go straight to some very simple code, which uses only a small fraction of Lucene's capabilities. I hope this short article stimulates your curiosity about this great software.

The dataset analysed

My project is based on a 4 GB dataset of tweets provided by my university. We use the jsoup library to scrape all these txt files.

Creating the Lucene documents

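In this method, I read the local dataset and create the Lucene documents that the queries will run against. I want to stress how much the choice of field type matters. Using StringField for the content of a tweet does NOT work when querying at runtime, because a StringField is indexed verbatim as a single term, so a query for one word of the tweet never matches. A TextField is passed through the analyzer and split into terms, and with it all kinds of query formats work correctly.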
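To make the difference between the two field types concrete, here is a minimal sketch in plain Java. It is a toy inverted index and not Lucene itself: the class FieldTypeDemo and its methods addVerbatim, addTokenized and search are hypothetical names invented for this illustration, and the tokenizer (lowercase, split on non-alphanumeric characters) is only a rough stand-in for a real Lucene analyzer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index, NOT Lucene: it only mimics how the two field types emit terms.
final class FieldTypeDemo {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Mimics StringField: the whole value is indexed as one exact term.
    void addVerbatim(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // Mimics TextField with a simple analyzer: lowercase, then split into words.
    void addTokenized(int docId, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact term lookup: returns the ids of the documents indexed under this term.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        String tweet = "Lucene makes search easy";

        FieldTypeDemo verbatim = new FieldTypeDemo();
        verbatim.addVerbatim(0, tweet);

        FieldTypeDemo tokenized = new FieldTypeDemo();
        tokenized.addTokenized(0, tweet);

        // A single-word query misses the verbatim index and hits the tokenized one.
        System.out.println("verbatim,  'lucene': " + verbatim.search("lucene"));
        System.out.println("tokenized, 'lucene': " + tokenized.search("lucene"));
        // The verbatim index matches only the complete, unmodified text.
        System.out.println("verbatim,  full text: " + verbatim.search(tweet));
    }
}
```

In this toy model the verbatim index answers only when the query equals the whole tweet, which is why an identifier-like value such as a user name suits that style of field better than free text does.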

Implementing the search engine

It's interesting that the ScoreDoc object carries attributes that are useful for analysis. In my case I use the rating, that is, the score assigned to each hit. The keys 'tweetUser' and 'tweetText' are the field names chosen earlier when creating the Lucene documents.
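The shape of such a search loop can be sketched in plain Java. This is again a toy model and not Lucene: the class TweetSearchDemo and its nested Hit class are hypothetical, Hit merely imitates the document id and score carried by a ScoreDoc, and the score used here (how many times the term occurs in the tweet) is an assumption that stands in for Lucene's real relevance scoring. Only the keys 'tweetUser' and 'tweetText' come from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy search loop, NOT Lucene: Hit stands in for ScoreDoc (document id plus score).
final class TweetSearchDemo {

    static final class Hit {
        final int doc;      // position of the document in the list
        final float score;  // toy rating: occurrences of the query term

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns at most topN hits for a single term, best score first.
    static List<Hit> search(List<Map<String, String>> docs, String term, int topN) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            String text = docs.get(id).get("tweetText").toLowerCase(Locale.ROOT);
            int count = 0;
            for (String token : text.split("[^\\p{Alnum}]+")) {
                if (token.equals(wanted)) {
                    count++;
                }
            }
            if (count > 0) {
                hits.add(new Hit(id, count));
            }
        }
        // Highest score first; the sort is stable, so ties keep document order.
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("tweetUser", "alice", "tweetText", "Lucene is fast"),
            Map.of("tweetUser", "bob", "tweetText", "Lucene search, and more Lucene"),
            Map.of("tweetUser", "carol", "tweetText", "Nothing relevant here"));

        // For each hit, read the stored fields back through the same keys.
        for (Hit hit : search(docs, "lucene", 10)) {
            Map<String, String> doc = docs.get(hit.doc);
            System.out.println(hit.score + "  " + doc.get("tweetUser") + ": " + doc.get("tweetText"));
        }
    }
}
```

The point of the sketch is the pattern: each hit gives a document id and a score, the id leads back to the stored document, and the field keys used at indexing time are the ones used to read the values out again.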