Strings
Blade has a very rich support for strings and they can be expressed in several ways. In Blade,
strings are denoted by enclosing characters in pairs of single quotes ('...'
) or pairs of
double quotes ("..."
) and they are essentially the same.
Reference
- Strings
- Reference
- More about Strings
- Unicode and UTF-8
- String Interpolation
- Characters
- String Operations
- String Methods
- length()
- upper()
- lower()
- is_alpha()
- is_alnum()
- is_number()
- is_lower()
- is_upper()
- is_space()
- trim([chr: char])
- ltrim([chr: char])
- rtrim([chr: char])
- join(iterable: string | list | dict)
- split(delimiter: string | regex)
- index_of(str: string [, start_index: number])
- starts_with(str: string)
- ends_with(str: string)
- count(str: string)
- to_number()
- to_list()
- to_bytes()
- lpad(width: number [, fill: char])
- rpad(width: number [, fill: char]))
- match(str: string | regex [, offset: number = 0])
- matches(reg: regex [, offset: number = 0])
- replace(str: regex | string, replacement: string [, use_regex: bool = true])
- Regular Expressions
More about Strings
When strings are wrapped in '
or "
, you can escape that quotation within the string using
the backslash (\
).
For example:
%> 'string in single quote'
'string in single quote'
%> "another version with double quotes"
'another version with double quotes'
%> 'What\'s the escape character?'
"What's the escape character?"
%> "It's the \"\\\" character"
"It's the "\" character"
All Blade strings can span multiple lines whether created using single (
'
) or double ("
) quotes.For example:
# with single quotes 'Hello... World' # same with double quotes "I am a living Legend!"
In the REPL mode, strings are enclosed in quotes based on the kind of data they contains. If a
string contains no quote, it is wrapped in single quotes ('...'
). If it contains a single quote,
it is wrapped in double quotes ("..."
). If it contains both, the latter is used. This is simply
for presentation purpose. This is to discourage readers from confusing them with language schematics.
The print()
function produces a result that is more like what you'd expect. It does no extra
processing of the output.
For example:
%> print("It's the \"\\\" character")
It's the "\" character
Blade strings support a lot of special characters called escape sequence
for formatting and
they also need to be escaped with \
as follows:
Sequence | Meaning |
---|---|
\a |
Alert (Beep, Bell) |
\b |
Backspace |
\f |
Formfeed Page Break |
\n |
Newline |
\r |
Carriage Return |
\t |
Horizontal Tab |
\v |
Vertical Tab |
\\ |
Backslash |
\' |
Single Quotation Mark |
\" |
Dobule Quotation Mark |
\0 |
String terminator |
\$ |
Escape for interpolated strings |
\xhh |
Hexadecimal number |
\uhhhh |
Unicode code point below 10000 hexadecimal |
\Uhhhhhhhhh |
Unicode code point where h is a hexadecimal digit |
h
stands for hexadecimal digit.\0
used anywhere in a string will cause the rest of the string to be ignored and useless.\$
See theInterpolated Strings
section below.\u
takes 4 hexadecimal digits h after it.\U
takes 8 hexadecimal digits h after it.
Unicode and UTF-8
As mentioned in the previous section, Blade strings fully supports unicode and are UTF-8 encoded by default.
Unicode code points can be represented using Unicode \u
and \U
escape sequences.
For example,
%> '\u00a9'
'Š'
%> '10\u00B5s'
'10Âĩs'
%> '\U0002B695 is a chinese character'
'đĢ is a chinese character'
Unicode characters can also be written directly in strings. This means, that in a Blade string, you can actually use advanced texts like smilies, trademarks and many more directly in your source code.
For example:
%> 'I am đ'
'I am đ'
%> 'Black â rule'
'Black â rule'
%> 'éå¯éé常éīŧåå¯åé常å'
'éå¯éé常éīŧåå¯åé常å'
To verify our UTF-8 support, how about we try to get the length of the chinese string åå¯åé常å
.
With UTF-8 support, the length of this string should be six (6) and much longer without UTF-8 support.
To get the length of the string, we can call it's length()
method.
For example,
%> 'åå¯åé常å'.length()
6
%> 'Hello, World'.length() # compared with English text
12
As you can see, Blade returns the correct length irrespective of the language of the source text.
String Interpolation
As we write more code, we seldom find ourselves needing to join two strings together or at other times,
join a string to a declared variable. Some other times, we want to have the result of an operation or
expression within our string. All of these operations can soon become pretty verbose and tedious. Blade
allows interpolation into string literals using the $
character just as can be seen in Perl and Dart.
For example:
%> 'Sum after addition = ${10 + 15}'
'Sum after addition = 25'
The above example shows the general construct for string interpolation. Start interpolation with the
dollar ($
) character, and wrap the interpolated expression within curly braces {}
as shown above.
For another example, let's say we have two variables x
and y
declared as Number
and String
respectively and we want to have them concatenated to our string at some location, we can have
something like the following:
%> 'We have ${x} crates of ${y}'
'We have 20 crates of eggs'
To write the interpolation expression within a string without interpreting it, we need to escape the $
with a backslash (\
) as follows:
%> 'Sample interpolation: \${x * y}'
'Sample interpolation: ${x * y}'
Characters
In Blade, Characters are essentially strings with a length of one (1). No more, no less! However, there are
times when we require Characters over Strings. For example, the builtin ord()
function expects a character
and not a String. While this distinction looks thin, it is a very important distinction that must be put
to heart.
Characters are also UTF-8 compliant.
The sample code below shows an example of the clear distinctive use of characters and strings.
%> echo ord('A')
65
%> echo ord('AB')
Unhandled Exception: ord() expects character as argument, string given
StackTrace:
<repl>:1 -> @.script()
A more complex example that skips a lot into the future of this tutorial is given as below for reference.
%> import types
%> types.char('a')
true
%> types.char('ab')
false
%> types.char('å°')
true
%> types.char('å°įš')
false
Characters are always interchangeable for strings, but not the reverse.
String Operations
Blade strings support multiple operations categorized into one of the following four groups.
Two or more strings can be concatenated (glued together) via the +
operator whether it's a literal or
a variable, and a specific string can be repeated by multiplying it with a number via the *
operator.
For example:
%> 'str' + 'ing'
'string'
%> 'abc' * 4 # repeating 'abc' four times
'abcabcabcabc'
%> 'hat!' * 4 + 'rick' # and even in a more complex form
'hat!hat!hat!hat!rick'
The +
operator is quite powerful with a string, allowing you to add a string to a number or a number
to a string.
For example,
%> 5 + 'alive'
'5alive'
%> 'Base' + 64
'Base64'
Strings can also be checked for equality or inequality as needed. For example:
%> "abracadabra" == "xylophone"
false
%> "Hello, world." != "Goodbye, world."
true
%> "1 + 2 = 3" == "1 + 2 = ${1 + 2}"
true
Strings indexes can be accessed. The first character of a Blade string have an index of 0
. The result of
string indexes are characters.
For example:
%> 'Hello'[0]
'H'
%> 'Hello'[3]
'l'
Strings indexes can also be accessed with negative numbers. When using negative numbers to access string
indexes, note that the indexes will be returned in reverse. i.e. we start counting from the far right
where the first index will be -1
(since -0 is the same as 0).
For example:
%> 'Hello'[-1]
'o'
%> 'Hello'[-4]
'e'
Note that trying to access a non-existing index or an index out of the range of the length of the string will result in an error.
For example, the following code throws an exception.
%> 'Hello'[6]
Unhandled Exception: string index 6 out of range
StackTrace:
<repl>:1 -> @.script()
In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain subparts of a string.
For example:
%> 'Blade'[0,3] # characters starting from index 0 to index 3 - 1 (2)
'Bla'
%> 'Blade'[2,5] # characters from index 2 to index 5 - 1 (4)
'ade'
The general syntax for slicing in Blade is [lower limit, upper limit]
. Both lower limit and upper limit
can be omitted. When the lower limit is omitted, it defaults to 0
and when the upper limit is omitted,
it defaults to the length of the object e.g. the string length.
As with general indexing, the upper limit can also use negative numbers and follows the same rules as indexing with a negative number. A negative number in the lower limit will cause an empty object to be returned.
- Slices are lower limit inclusive and upper limit exclusive. For example, slice
[0,3]
will return a substring starting from index0
(inclusive) to index2
and index itself will be excluded.- Index
in[,i] + in[i,]
is equal to the value ofin
.
For example:
%> 'Blade'[0,3] # starting from index 0 to 2
'Bla'
%> 'Blade'[2,5] # starting from index 2 to 4
'ade'
%> 'Blade'[,] # starting from index 0 to the end
'Blade'
%> 'Blade'[,-3] # starting from index 0 to string length - 3
'Bl'
%> 'Blade'[3,] # starting from index 3 to the end
'de'
%> 'Blade'[-1,] # negative index in lower limit returns an empty string
''
%> 'Blade'[,4] # starting from index 0 to 3
'Blad'
%> 'Blade'[,3] + 'Blade'[3,] # in[,i] + in[i,]
'Blade'
Blade strings are immutable. Hence, a string cannot be changed. Assigning to an indexed position in the string results in an error:
For example,
> 'Blade'[0] = 'J'
Unhandled Exception: strings do not support object assignment
StackTrace:
<repl>:1 -> @.script()
You may notice how we are trying to assign to a string object directly instead of a variable and think that's why it isn't working. That's not why! In blade, if string wasn't immutable (e.g. Lists aren't immutable), Blade will go ahead and do that assignment. The fact that you aren't storing that value anywhere is up to you. But it's neither a syntax nor runtime error to do so.
If you need to to modify a string, you need to create a new one. Don't worry, Blade is smart enough to know when you don't need a string anymore and will gracefully delete the string for memory when necessary.
If you have no previous experience with C style languages or don't know what methods are, you may which to proceed into the next topic of the tutorial and come back here after you've completed the Class tutorial.
String Methods
Blade string comes with a lot of powerful text processing capabilities wrapped in methods described below.
length()
Returns the length of a string. Note that this method is UTF-8 compartible and will return the UTF-8
length for the string if the string contains UTF-8 characters whether written directly or via the
\u
or \U
escapes.
For example:
%> 'This is a pretty long string'.length()
28
%> 'ā¤ā¤¨ā¤ā¤ž ā¤ā¤ ā¤¸ā¤Žā¤¯'.length()
11
%> 'This text mixes English and į˛ĩčĒ'.length()
30
upper()
Returns a copy of the string with all the cased characters converted to uppercase. Note that the result
of this method may return false
when tested with is_upper()
of the string contains Unicode
characters that are not case folded.
For example:
%> 'blade'.upper()
'BLADE'
lower()
Return a copy of the string with all the cased characters converted to lowercase.
For example:
%> 'Blade Is Bae'.lower()
'blade is bae'
is_alpha()
Returns true
if all the characters in the string are all alphabeths and the string is not empty.,
otherwise returns false
.
For example:
%> 'abracadabra'.is_alpha()
true
%> 'my tooth aches'.is_alpha()
false
%> ''.is_alpha()
false
is_alnum()
Returns true
if all the characters in the string are either alphabeths or numbers and the string is
not empty, otherwise returns false
. This method is the same as string.is_alpha() or string.is_number()
.
For example:
%> '3Idiots'.is_alnum()
true
%> 'Three Idiots'.is_alnum()
false
%> '3 Idiots'.is_alnum()
false
%> '3'.is_alnum()
true
%> 'idiots'.is_alnum()
true
%> ''.is_alnum()
false
is_number()
Returns true
if all the characters in the string are all digits and the string is not empty,
otherwise returns false
.
For example:
%> '123.5'.is_number()
false
%> '1970'.is_number()
true
%> '1980s'.is_number()
false
is_lower()
Returns true
if at least one character in the string is cased, all cased characters are lower cased
and the string is not empty. Otherwise, it returns false
.
For example:
%> 'all'.is_lower()
true
%> 'all...123'.is_lower()
true
%> 'All...123'.is_lower()
false
%> ''.is_lower()
false
is_upper()
Returns true
if at least one character in the string is cased, all cased characters are upper cased
and the string is not empty. Otherwise, it returns false
.
For example:
%> 'ALL'.is_upper()
true
%> 'ALL...123'.is_upper()
true
%> 'All...123'.is_upper()
false
%> ''.is_upper()
false
is_space()
Returns true
if there are only whitespace characters in the string and the string is not empty.
Otherwise, it returns empty.
For example:
%> '. '.is_space()
false
%> '\r\n'.is_space()
true
%> '\t '.is_space()
true
trim([chr: char])
Returns a copy of the string with the given character (chr
) removed if it appears at the start or
end of the string. If chr
is not given, it defaults to a space (' '
). All matching leading and
trailing characters are removed until a character that doesn't match is encountered. If no match is
found, a copy of the original string is returned.
The square brackets (
[]
) around thechr: char
in the method definition indicates that the parameter is optional and does not mean you have to type the square brackets.
For example:
%> ' example '.trim()
'example'
%> ' example '.trim('e')
' example '
%> 'example'.trim('e')
'xampl'
ltrim([chr: char])
Similar to the trim()
method, except that this method only removes characters at the begining of
the string.
For example:
%> ' example '.ltrim()
'example '
%> 'example'.ltrim('e')
'xample'
rtrim([chr: char])
Similar to the trim()
method, except that this method only removes characters at the end of
the string.
For example:
%> ' example '.rtrim()
' example'
%> 'example'.rtrim('e')
'exampl'
join(iterable: string | list | dict)
Returns a stringwhich is a concatenation of the items in the iterable using the string as the separator. If the iterable contains just one item or the string is empty, the original element is returned. If the iterable contains non-string items, the items are converted to their string representation before joining.
Bytes
are the only non supported iterables.
For example:
%> ','.join(['ok', 1, true])
'ok,1,true'
%> '--'.join('name')
'n--a--m--e'
%> ','.join('a')
'a'
split(delimiter: string | regex)
Returns a list of words or characters in a string after separating the content of the string at every point where the delimiter is found.
If the delimiter is an empty string, the resultant list will contain the individual characters of the string in the order in which they appear in the original string. Consecutive delimiters are not grouped together and are deemed to delimit empty strings. Splitting an empty string with a specified separator returns an empty list.
This method has full UTF-8 support.
For example:
%> 'name'.split('')
[n, a, m, e]
%> '1<>2<>3'.split('<>')
[1, , 2, , 3]
%> '1,2,3'.split(',')
[1, 2, 3]
%> ''.split(',')
[]
%> 'å°įš'.split('')
[å°, įš]
%> 'who is in the garden'.split('/\s/')
[who, is, in, the, garden]
index_of(str: string [, start_index: number])
Returns the index position of the first occurrence of the string str
in the string string
. If
the str cannot be found anywhere in string, it returns -1. If the start_index
parameter is argument is given, it will start scanning from the given index.
For example:
%> 'hello, world'.index_of(' ')
6
%> 'hello, world'.index_of('e')
1
%> 'hello, world'.index_of('q')
-1
%> 'hello, world'.index_of('o')
4
%> 'hello, world'.index_of('o', 5) # next index of `o` starting from index 5.
8
starts_with(str: string)
Returns true
if the string begins with the string or character specified in str, otherwise
it returns false
.
For example:
%> 'hello, world'.starts_with('hello')
true
%> 'hello, world'.starts_with('hellios')
false
ends_with(str: string)
Returns true
if the string ends with the string or character specified in str, otherwise
it returns false
.
For example:
%> 'gumtree'.ends_with('tree')
true
%> 'gumtree'.ends_with('mree')
false
count(str: string)
Returns the number of non-overlapping occurrences of the substring str in the string.
For those coming from Python who may consider this method similar to Python's own, this method differs in that it does not allow specifying a start and end region for the operation. Blade considers this unnessary as the same can be accomplished by slicing the string.
For example:
%> 'Hallelujah'.count('l')
3
%> 'ding dong'.count('ng')
2
%> 'ding dong'[2,7].count('ng') # setting region to search for counts - 'ng do'
1
to_number()
Returns the first numeric value contained in the string if any exists or 0
if the string
contains no numeric value. Floating numbers that have the same value as their integer counterparts
will return the integer value.
For example:
%> '123.0 hell'.to_number()
123
%> '427 and 12'.to_number()
427
%> '96.3 of 31'.to_number()
96.3
%> 'error'.to_number()
0
to_list()
Returns a list whose elements consists of every character contained in the string in order of
appearance. Characters that repeat in the string will have different entries in the same index
as they appear in the string.
For example:
%> 'Blade'.to_list()
[B, l, a, d, e]
%> 'Plantation'.to_list()
[P, l, a, n, t, a, t, i, o, n]
to_bytes()
Returns the content of the string as a stream of bytes
.
The Blade REPL may trunctuate long bytes data when printing to console/terminal.
For example:
%> 'Blade'.to_bytes()
(42 6c 61 64 65)
%> 'Plantation'.to_bytes()
(50 6c 61 6e 74 61 74 69 6f 6e)
lpad(width: number [, fill: char])
Returns the string left justified in a string of length width. Padding is done using the specified
character fill if given of a space (' '
) if a fill is not specified. The original string is
returned if width is less than string.length()
.
For example:
%> 'cat'.lpad(5)
' cat'
%> 'cat'.lpad(5, '-')
'--cat'
%> 'cat'.lpad(2, '-')
'cat'
rpad(width: number [, fill: char]))
Returns the string right justified in a string of length width. Padding is done using the specified
character fill if given of a space (' '
) if a fill is not specified. The original string is
returned if width is less than string.length()
.
For example:
%> 'Hmm'.rpad(6)
'Hmm '
%> 'Hmm'.rpad(6, '.')
'Hmm...'
%> 'Hmm'.rpad(3, '.')
'Hmm'
match(str: string | regex [, offset: number = 0])
If the string str is a regular string, this method returns true
if the string contains a
substring str. Otherwise, it returns false
.
If the string str contains a valid regular expression (we'll get to that shortly below), it returns false
if a match for the regex str cannot be found in the string. Otherwise, it
returns a dictionary containing all first matching substring.
If the offset argument is specified, it becomes the offset in the string at which to start matching.
For example:
%> 'gorilla'.match('go') # regular string match
true
%> 'gorilla'.match('gox') # regular string non-match
false
%> 'gorilla'.match('/gox?/') # regular expression match
{0: go}
%> 'gorilla'.match('/gox\d/') # regular expression non-match
false
matches(reg: regex [, offset: number = 0])
Returns a dictionary containing every match of the given regular expression reg in the source string. If no match is found, an empty dictionary is returned.
If the offset argument is specified, it becomes the offset in the string at which to start matching.
For example:
%> '123 dollars'.matches('/[a-z]+|\d+/')
{0: [123, dollars]}
%> 'who is in the garden'.matches('/\w+/')
{0: [who, is, in, the, garden]}
replace(str: regex | string, replacement: string [, use_regex: bool = true])
Returns a copy of the string with all occurrences or matches of str replaced by the replacement string.
In the replacement string, if str is a regular expression, then capture groups can be referenced
using the syntax $index
. Taking as an example, capture group 0
contains the entire match and
can be used in the replacement string as $0
.
To escape the
$
sign in the replacement string, use the double backslashed (\\
).
For example:
%> 'lady friend'.replace('d', 'z') # non-regex
'lazy frienz'
%> 'John is 26 years old'.replace('/(\d+)/', '1$1') # regex example
'John is 126 years old'
%> 'John is 26 years old'.replace('/(\d+)/', '1\\$2')
'John is 1$2 years old'
When the third parameter
use_regex
is set to false, str will never be treated as a regular expression even if it contains a valid regular expression.
Apart from the above listed methods, String also implements the Iterable Decorators which we'll talk about in details under the Class lesson.
Regular Expressions
Regular expressions in Blade are simply special patterns expressed in a string following a few guidlines
that allow them to be distinguished by methods requiring them. We'll be using the term regex
or regexes
henceforth for the rest of this tutorial and most likely for the rest of the documentation.
Blade's regex is built on-top the PCRE2 library, an excellent library that already powers regular expression in many programming languages and have been around for decades. It feels like a better choice for now for Blade to depend on this library rather than invest years building one robust enough to match the library's capabilites.
In simple words, Blade's regex is PCRE compatible.
To create a valid regex in Blade, your regex pattern must be surrounded by identical non-word characters.
For example, /\d+/
. Note here how we surround our pattern \d+
with forward slashes (/
).
This tutorial will not attempt to teach regular expressions as there are many wonderful texts already written on that topic as well as many online tools for learning them in greater depths than we can cover in this tutorial. Majority of them based on the same engine we are using. So here's one of themfor your reference.
Most languages support different modifiers for regular expressions, and Blade has some too. Modifiers
are placed after a valid regex to control how the pattern is executed by the language. For example, in
Blade, the pattern /[a-z]/i
is a pattern modified with the i
modifier telling the interpreter to
make sure the matching is done case insensitive.
The following table lists Blade modifiers.
Modifier | Definition |
---|---|
i |
Case insensitive matching |
m |
Multi-line. This mode cases ^ and $ to match newlines. |
s |
Dot (. ) matches all |
x |
Extended matching (Ignore whitespace and # comments) |
A |
Force pattern anchoring |
D |
Do not match newline at the end. In this mode, $ will be the only valid line terminator. |
U |
Ungreedy match. |
u |
Treat pattern and subjects as UTF strings and use Unicode properties for \d, \w, etc. |
J |
Allow duplicate names for subpatterns |
Modifiers can be joined together to form a more powerful modification instruction. For example, you
can perform a multi-line and case-insensitive modification for our former sample as /[a-z]/mi
.
For example:
%> 'The side bar includes a Cheatsheet'.matches('/([A-Z])\w+/')
{0: [The, Cheatsheet], 1: [T, C]}
Or the same query with modifiers,
%> 'The side bar includes a Cheatsheet'.matches('/([A-Z])\w+/sim')
{0: [The, side, bar, includes, Cheatsheet], 1: [T, s, b, i, C]}