Each remaining word is reduced to its stem using the Porter algorithm, so that words with the same stem map to the same feature. We refer to the resulting list of word stem frequencies in a document, normalized by dividing all values by the maximum frequency in that document, as the document's feature vector or vector of term frequencies (tf values). Optionally, the tf values can be replaced by tf*idf values, where idf is the so-called inverse document frequency of a term, the inverse of the number of documents that contain the term. To prevent the idf values from dominating the tf*idf product, we dampen them by using log10(idf) instead of idf. Because idf values are a global measure that changes as the crawl progresses, they are recomputed lazily whenever a certain number of new documents have been crawled.
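
The feature construction described here can be illustrated with a minimal sketch. It is not the system's actual implementation: the names `tf_vector`, `dampened_idf`, `tfidf_vector`, `LazyIdfTable`, and the `refresh_every` threshold are hypothetical, the input is assumed to be already stop-word-filtered and Porter-stemmed, and idf is assumed to be scaled by the number of documents seen so far (N / df), which is the common convention and keeps log10(idf) non-negative.

```python
import math
from collections import Counter


def tf_vector(stems):
    """Stem frequencies divided by the maximum frequency in the document."""
    counts = Counter(stems)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: count / max_count for term, count in counts.items()}


def dampened_idf(n_docs, doc_freq):
    """log10(idf) per term. Assumption: idf = N / df (N = documents seen so far)."""
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(stems, idf):
    """tf * log10(idf); terms without a known idf get weight 0.0 (a sketch choice)."""
    return {term: tf * idf.get(term, 0.0) for term, tf in tf_vector(stems).items()}


class LazyIdfTable:
    """Refreshes idf values only after `refresh_every` new documents (hypothetical)."""

    def __init__(self, refresh_every=100):
        self.refresh_every = refresh_every
        self.n_docs = 0
        self.doc_freq = Counter()  # term -> number of documents containing it
        self.pending = 0           # documents added since the last refresh
        self.idf = {}

    def add_document(self, stems):
        self.n_docs += 1
        self.doc_freq.update(set(stems))  # count each term once per document
        self.pending += 1
        if self.pending >= self.refresh_every:
            self.idf = dampened_idf(self.n_docs, self.doc_freq)
            self.pending = 0


# Toy corpus of already-stemmed documents (illustrative data only).
docs = [
    ["crawl", "web", "crawl", "page"],
    ["web", "search", "index"],
    ["crawl", "index", "index", "index"],
]

table = LazyIdfTable(refresh_every=3)
for doc in docs:
    table.add_document(doc)

print(tf_vector(docs[0]))
print(tfidf_vector(docs[0], table.idf))
```

In this sketch the idf table stays stale between refreshes, so documents crawled in between are weighted with slightly outdated values; that is the trade-off lazy recomputation accepts in exchange for not rescanning the global statistics after every document.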