Programming in Lua - Chapter 11
Most Frequent Words
Another interlude chapter, this time covering a fairly simple task: reading a text from standard input and counting how often each word occurs in it. The program accomplishes three main tasks:
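- Reads all of the text, keeping a count of each word's occurrences in a table
- Sorts the words in descending order of frequency
- Prints the first n words with their counts, where n is the optional first command-line argument (all words are printed if it is omitted)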
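-- Count every word read from standard input
local counter = {}
for line in io.lines() do
  for word in string.gmatch(line, "%w+") do
    counter[word] = (counter[word] or 0) + 1
  end
end

-- Collect the distinct words into a list so they can be sorted
local words = {}
for w in pairs(counter) do
  words[#words + 1] = w
end

-- Sort by descending count; words with equal counts go in alphabetical order
table.sort(words, function (w1, w2)
  return counter[w1] > counter[w2] or
         counter[w1] == counter[w2] and w1 < w2
end)

-- Print the first n words, or all of them if no number was given
local n = math.min(tonumber(arg[1]) or math.huge, #words)
for i = 1, n do
  io.write(words[i], "\t", counter[words[i]], "\n")
end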
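To try the counting and sorting steps without piping in a file, the same logic can be wrapped in functions and run on a string. This is only a sketch: `count_words` and `ranked_words` are hypothetical helper names that are not part of the chapter's program, and the sample sentence is made up.

```lua
-- Hypothetical helper: count the words in a string instead of standard input.
local function count_words(text)
  local counter = {}
  for word in string.gmatch(text, "%w+") do
    counter[word] = (counter[word] or 0) + 1
  end
  return counter
end

-- Hypothetical helper: list the words by descending count,
-- breaking ties alphabetically (same comparator as the program above).
local function ranked_words(counter)
  local words = {}
  for w in pairs(counter) do
    words[#words + 1] = w
  end
  table.sort(words, function (w1, w2)
    return counter[w1] > counter[w2] or
           counter[w1] == counter[w2] and w1 < w2
  end)
  return words
end

-- Made-up sample text, used only for illustration.
local counter = count_words("the cat and the dog and the bird")
local words = ranked_words(counter)
for i = 1, math.min(3, #words) do
  io.write(words[i], "\t", counter[words[i]], "\n")
end
```

Here "the" occurs three times and "and" twice, so they come first; "bird", "cat" and "dog" all occur once, so the alphabetical tie-break puts "bird" in third place.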
Exercises
- When we apply this kind of program to a text, the most frequent words are usually uninteresting small words. Change the program above so that it ignores words with fewer than four letters.
- Repeat the previous exercise, but instead of using length as the criterion for ignoring a word, the program should read a list of words to be ignored from a text file.
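One possible approach to both exercises is to filter words inside the counting loop. The sketch below is not the book's solution: `load_ignored` and `count_filtered` are hypothetical helper names, and the sample text is made up. In a real run the ignore list would presumably come from something like `io.lines("ignore.txt")`, where the file name is an assumption; here a string iterator stands in for the file so the sketch runs on its own.

```lua
-- Hypothetical helper: build a set of ignored words from an iterator
-- that yields lines (io.lines(filename) would fit the same role).
local function load_ignored(lines)
  local ignored = {}
  for line in lines do
    for word in string.gmatch(line, "%w+") do
      ignored[word] = true
    end
  end
  return ignored
end

-- Hypothetical helper: count words, skipping any word shorter than
-- min_len characters or present in the ignored set.
local function count_filtered(text, min_len, ignored)
  local counter = {}
  for word in string.gmatch(text, "%w+") do
    if #word >= min_len and not ignored[word] then
      counter[word] = (counter[word] or 0) + 1
    end
  end
  return counter
end

-- Made-up sample text, used only for illustration.
local text = "the quick fox jumped with the quick dog"

-- First exercise: keep only words with four or more characters.
local by_length = count_filtered(text, 4, {})

-- Second exercise: no length limit, but skip the listed words.
-- string.gmatch stands in for io.lines on a two-line ignore file.
local ignored = load_ignored(string.gmatch("the and\nwith", "[^\n]+"))
local by_list = count_filtered(text, 1, ignored)
```

Storing the ignored words as table keys makes each lookup a single index operation, so the filter adds little cost per word. The resulting counters can be sorted and printed exactly as in the original program.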
Next: [[Programming in Lua - Chapter 12|Chapter 12]]