Programming in Lua - Chapter 4
Strings
A string is basically a representation of some text. They can be tiny, or huge, depending on how you’d like. We consider strings as a sequence of bytes, which we can go into more detail on later. For now, just know we can store any binary data as a string, or even Unicode representation.
Any string we make in Lua is immutable, so we can’t alter it once it’s made. Any alterations would have to be made to an entirely new string. Not only that, but Lua also has automatic memory management, so we don’t have to concern ourselves with allocation and deallocation.
> a = "My string!"
> b = string.gsub(a, 'My', 'Our')
> a --> My string!
> b --> Our string!
> #a --> 10
A helpful operator we’ll use a bunch is the length operator, or #. Just be careful, in certain encodings it doesn’t count characters but bytes instead.
The concatenation operator is two dots, ..
. This operator also always creates a new string.
Literal Strings
A literal string is any string in single or double quotation marks. If you use one, you can use the other within the same string without escaping which is a nice feature. It also has similar escape characters as Python does,
Name | Command |
---|---|
bell | \a |
back space | \b |
form feed | \f |
newline | \n |
carriage return | \r |
tab | \t |
vertical slash | \v |
backslash | \\ |
double quote | \" |
single quote | \' |
Lua allows representation of characters through many formats; numeric values, hexadecimal, ASCII, and even UTF-8 characters.
> x = '\x41'
> x --> A
> y = 'ALO\n\''
> y --> ALO
123'
> z = '\u{3b2}'
> z --> β
Long Strings
You can have like, really long strings in Lua. They are really easy to format, just put your text between double square brackets. If you end up needing to use double quotes, you can use some amount of equal signs between the brackets to differentiate them. Just make sure to use the same amount of signs for both the opening and closing.
Coercions
Funny things start to happen when you start mixing numbers and strings. For one, Lua will convert any string to a number if that string meets a numeric operator. Not only that, but Lua will convert any number to a string when expecting a string. The text recommends not counting on these in your programs, but to acknowledge their existence for potential bug fixing.
> print (10 .. 20)
1020
> "10" + 1
11.0
We can take any string and explicitly make it a number using the tonumber()
function. You can even do some crazy things with an additional parameter to define your base number;
> tonumber (" -3 ") --> -3
> tonumber (" 10e4 ") --> 100000.0
> tonumber ("10e") --> nil (not valid number)
> tonumber ("0x1.3p-4") --> 0.07421875
> tonumber ("100101", 2) --> 37
> tonumber ("100101", 36) --> 60467473
> tonumber ("fff", 16) --> 4095
> tostring(10) == "10" --> true
The last line is a small example of the reverse; how to turn a number into a string, but we have no control over the formatting of it. To avoid major issues, Lua doesn’t allow comparison between numbers and strings like 2 < "15"
.
The String Library
At the basic level, you can’t do much with strings. That’s what the string library is for. Let’s go over some useful methods;
string.len(s)
will return the length ofs
(equal to#s
)string.rep(s, n)
will repeat the given strings
,n
timesstring.reverse(s)
will reverse the strings
string.lower(s)
will return all the uppercase characters ins
as lowercasestring.upper(s)
will return the reverse of the abovestring.sub(s, i, j)
returns the characters from indexi
toj
from strings
- Important detail, this does not change the value of the original string, but creates a new string all together
string.char(x)
will return the converted characters for the given numberstring.byte(s)
will return the converted number for the given characters- Useful tech;
string.byte(s, 1, -1)
to return all codes of all characters ins
- Useful tech;
string.format()
will allow for more advanced string formatting- Takes a few different types of substitutions
d
for integerx
for hexadecimalf
for floating points
for strings
- Takes a few different types of substitutions
string.find(s, w)
searches for patterns; returning the index numbers ifw
exists ins
string.gsub(s, w, m)
replaces all occurrences ofw
ins
withm
- Also returns the amount of replacements made
Before getting into examples, we can call any function with two different ways; like the above, or with a shorten-er, s:function
;
> x = "I love lucy!"
> string.len(x) --> 12
> x:rep(2) --> I love lucy!I love lucy!
> x:reverse() --> !ycul evol I
> x:upper() --> I LOVE LUCY!
> x:sub(6, 10) --> e luc
> string.char(97) --> a
> string.byte('abc') --> 97
> tag, title = "h1", "My Title"
> string.format("< %s>%s</%s>", tag, title) --> <h1>My Title</h1>
> x:find("luc") --> 8 10
> x:gsub("lucy", "money") --> I love money! 1
The Unicode Library
As far as internet standards go, UTF-8 is the dominant decision for Unicode, and even has it’s own library. This is because some of the functions we saw above won’t work on Unicode characters. This lead to the utf8
library. It has extremely similar functions, like utf8.len()
, and utf8.char()
.
Exercises
- How can you embed the following XML into a Lua string? In two different ways?
<![CDATA[
Hello world
]]>
x = [===[
<![CDATA[
Hello world
]]>
]===]
or
x = '<!\u{5B}CDATA\u{5B}\nHello world\n\u{5D}\u{5D}>'
- Suppose you need yo write a long sequence of arbitrary bytes as a literal string. What format would you use and why?
I would use the [ [ ] ]
format (with spacing intentional for Obsidian).
- Write a function that can insert a string into a given position of another one
insert("Hello world", 1, "start: ")
:
function insert (text_string, num, in_text)
print (text_string:sub(1, num) .. in_text .. text_string:sub(num + 1, -1))
end
- Redo the previous exercise for UTF-8 strings
I think you could do this with a structure like a table or list; take all the characters from a Unicode string, gets the codes for them, then uses the concatenation to insert them into the string using normal string functions.
- Write a function to remove a slide from a string; the slice should be given by its initial position and length;
function remove (text_string, pos, leng)
print(text_string:sub(1, pos) .. text_string:sub((pos + leng) - 1), -1)
end
I can change the above with using multiple assignment to separate the index out.
Next: Chapter 5