
Strings

Blade has rich support for strings, and they can be expressed in several ways. In Blade, strings are denoted by enclosing characters in pairs of single quotes ('...') or pairs of double quotes ("...") and the two forms are essentially the same.

Reference

More about Strings

When strings are wrapped in ' or ", you can escape that quotation within the string using the backslash (\).

For example:

%> 'string in single quote'
'string in single quote'
%> "another version with double quotes"
'another version with double quotes'
%> 'What\'s the escape character?'
"What's the escape character?"
%> "It's the \"\\\" character"
"It's the "\" character"

All Blade strings can span multiple lines whether created using single (') or double (") quotes.

For example:

# with single quotes
'Hello...
World'

# same with double quotes
"I am
a living
Legend!"

In REPL mode, strings are enclosed in quotes based on the kind of data they contain. If a string contains no quote, it is wrapped in single quotes ('...'). If it contains a single quote, it is wrapped in double quotes ("..."). If it contains both, the latter is used. This is purely for presentation and should not be confused with language semantics.

The print() function produces a result that is more like what you'd expect. It does no extra processing of the output.

For example:

%> print("It's the \"\\\" character")
It's the "\" character

Blade strings support a number of special characters, called escape sequences, for formatting. They are written with a leading backslash (\) as follows:

Sequence Meaning
\a Alert (Beep, Bell)
\b Backspace
\f Formfeed Page Break
\n Newline
\r Carriage Return
\t Horizontal Tab
\v Vertical Tab
\\ Backslash
\' Single Quotation Mark
\" Double Quotation Mark
\0 String terminator
\$ Escape for interpolated strings
\xhh Hexadecimal number
\uhhhh Unicode code point below 10000 hexadecimal
\Uhhhhhhhh Unicode code point where h is a hexadecimal digit
  1. h stands for hexadecimal digit.
  2. \0 used anywhere in a string will cause the rest of the string to be ignored.
  3. \$ See the Interpolated Strings section below.
  4. \u takes 4 hexadecimal digits h after it.
  5. \U takes 8 hexadecimal digits h after it.

Unicode and UTF-8

As mentioned in the previous section, Blade strings fully support Unicode and are UTF-8 encoded by default. Unicode code points can be represented using the \u and \U escape sequences.

For example,

%> '\u00a9'
'©'
%> '10\u00B5s'
'10µs'
%> '\U0002B695 is a chinese character'
'𫚕 is a chinese character'

Unicode characters can also be written directly in strings. This means that in a Blade string, you can use symbols such as emoji, trademark signs and many more directly in your source code.

For example:

%> 'I am 😀'
'I am 😀'
%> 'Black ♞ rule'
'Black ♞ rule'
%> '道可道非常道，名可名非常名'
'道可道非常道，名可名非常名'

To verify our UTF-8 support, let's get the length of the Chinese string 名可名非常名. With UTF-8 support, the length of this string should be six (6); without it, the length would be reported as 18, the number of bytes in its UTF-8 encoding. To get the length of the string, we can call its length() method.

For example,

%> '名可名非常名'.length()
6
%> 'Hello, World'.length() # compared with English text
12

As you can see, Blade returns the correct length irrespective of the language of the source text.

String Interpolation

As we write more code, we often find ourselves needing to join two strings together or, at other times, join a string to a declared variable. At other times still, we want the result of an operation or expression within our string. All of these operations can quickly become verbose and tedious. Blade allows interpolation into string literals using the $ character, as seen in Perl and Dart.

For example:

%> 'Sum after addition = ${10 + 15}'
'Sum after addition = 25'

The above example shows the general construct for string interpolation. Start interpolation with the dollar ($) character, and wrap the interpolated expression within curly braces {} as shown above.

For another example, let's say we have two variables x and y declared as a Number and a String respectively, and we want to insert them into our string at some location. We can write something like the following:

%> 'We have ${x} crates of ${y}'
'We have 20 crates of eggs'

To write the interpolation expression within a string without interpreting it, we need to escape the $ with a backslash (\) as follows:

%> 'Sample interpolation: \${x * y}'
'Sample interpolation: ${x * y}'
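
The interpolation rules above can also be sketched as a small standalone script. This is only an illustration: the values 20 and 'eggs' are assumed so that the first line matches the REPL output shown earlier, and the second message is made up for the example.

```blade
# Assumed values, chosen to match the REPL output shown above.
var x = 20
var y = 'eggs'

# Variables and whole expressions can be interpolated.
echo 'We have ${x} crates of ${y}'
echo 'Double that is ${x * 2} crates'

# Escaping the dollar sign keeps the text literal.
echo 'Template: \${x} crates of \${y}'
```

The last line should print the placeholder text itself rather than the values, since each $ is escaped.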

Characters

In Blade, characters are essentially strings with a length of one (1). No more, no less! However, there are times when a character is required rather than a string. For example, the builtin ord() function expects a character and not a string. While this distinction looks thin, it is an important one to take to heart.

Characters are also UTF-8 compliant.

The sample code below illustrates the distinction between characters and strings.

%> echo ord('A')
65
%> echo ord('AB')
Unhandled Exception: ord() expects character as argument, string given
  StackTrace:
    <repl>:1 -> @.script()

A more complex example, which jumps ahead to material covered later in this tutorial, is given below for reference.

%> import types
%> types.char('a')
true
%> types.char('ab')
false
%> types.char('地')
true
%> types.char('地点')
false

A character can always be used where a string is expected, but not the reverse.

String Operations

Blade strings support multiple operations categorized into one of the following four groups.

Two or more strings can be concatenated (glued together) via the + operator, whether they are literals or variables, and a string can be repeated by multiplying it by a number via the * operator.

For example:

%> 'str' + 'ing'
'string'
%> 'abc' * 4 # repeating 'abc' four times
'abcabcabcabc'
%> 'hat!' * 4 + 'rick' # and even in a more complex form
'hat!hat!hat!hat!rick'

The + operator is quite powerful with a string, allowing you to add a string to a number or a number to a string.

For example,

%> 5 + 'alive'
'5alive'
%> 'Base' + 64
'Base64'

Strings can also be checked for equality or inequality as needed. For example:

%> "abracadabra" == "xylophone"
false
%> "Hello, world." != "Goodbye, world."
true
%> "1 + 2 = 3" == "1 + 2 = ${1 + 2}"
true

Individual characters of a string can be accessed by index. The first character of a Blade string has an index of 0. The result of indexing a string is a character.

For example:

%> 'Hello'[0]
'H'
%> 'Hello'[3]
'l'

String indexes can also be negative. Negative indexes count from the end of the string, i.e. we start counting from the far right, where the first index is -1 (since -0 is the same as 0).

For example:

%> 'Hello'[-1]
'o'
%> 'Hello'[-4]
'e'
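
To relate the two forms of indexing, here is a minimal sketch (the variable names are illustrative only) that reads the last character of a string once with a negative index and once with the equivalent position computed from length():

```blade
var word = 'Hello'
var last = word.length() - 1   # index of the final character

# Going by the rules above, both lines should print the same character.
echo word[-1]
echo word[last]
```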

Note that trying to access a non-existing index or an index out of the range of the length of the string will result in an error.

For example, the following code throws an exception.

%> 'Hello'[6]
Unhandled Exception: string index 6 out of range
  StackTrace:
    <repl>:1 -> @.script()

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain subparts of a string.

For example:

%> 'Blade'[0,3] # characters starting from index 0 to index 3 - 1 (2)
'Bla'
%> 'Blade'[2,5] # characters from index 2 to index 5 - 1 (4)
'ade'

The general syntax for slicing in Blade is [lower limit, upper limit]. Both lower limit and upper limit can be omitted. When the lower limit is omitted, it defaults to 0 and when the upper limit is omitted, it defaults to the length of the object e.g. the string length.

As with general indexing, the upper limit can also use negative numbers and follows the same rules as indexing with a negative number. A negative number in the lower limit will cause an empty object to be returned.

  1. Slices are lower limit inclusive and upper limit exclusive. For example, slice [0,3] returns a substring from index 0 (inclusive) up to index 2; index 3 itself is excluded.
  2. For any non-negative index i within the string, in[,i] + in[i,] is equal to the value of in.

For example:

%> 'Blade'[0,3]  # starting from index 0 to 2
'Bla'
%> 'Blade'[2,5]  # starting from index 2 to 4
'ade'
%> 'Blade'[,]   # starting from index 0 to the end
'Blade'
%> 'Blade'[,-3]   # starting from index 0 to string length - 3
'Bl'
%> 'Blade'[3,]  # starting from index 3 to the end
'de'
%> 'Blade'[-1,]  # negative index in lower limit returns an empty string
''
%> 'Blade'[,4]   # starting from index 0 to 3
'Blad'
%> 'Blade'[,3] + 'Blade'[3,]     # in[,i] + in[i,]
'Blade'
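
The second rule can be checked for every split point with a short loop. This is a sketch under the slicing rules described above; the names word, i and rejoined are chosen just for this example.

```blade
var word = 'Blade'

# Split the string at each index and glue the two halves back together.
iter var i = 0; i < word.length(); i++ {
  var rejoined = word[,i] + word[i,]
  echo rejoined == word
}
```

If slicing behaves as described, every iteration should print true.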

Blade strings are immutable. Hence, a string cannot be changed. Assigning to an indexed position in the string results in an error:

For example,

%> 'Blade'[0] = 'J'
Unhandled Exception: strings do not support object assignment
  StackTrace:
    <repl>:1 -> @.script()

You may notice that we are assigning to a string literal directly instead of a variable and think that's why it isn't working. That's not why! If strings weren't immutable (lists, for example, are mutable), Blade would go ahead and do that assignment. Whether you store the value anywhere is up to you; assigning to an indexed literal is neither a syntax nor a runtime error in itself.

If you need to modify a string, you need to create a new one. Don't worry, Blade is smart enough to know when you don't need a string anymore and will gracefully delete the string from memory when necessary.
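
As a minimal sketch of building a new string instead of modifying an existing one, the snippet below combines concatenation and slicing as described earlier. The variable names are illustrative only.

```blade
var name = 'Blade'

# Take everything from index 1 onwards and put a new first letter in front.
var changed = 'J' + name[1,]

echo changed   # the new string, 'J' followed by 'lade'
echo name      # the original string is left untouched
```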

If you have no previous experience with C-style languages or don't know what methods are, you may wish to proceed to the next topic of the tutorial and come back here after you've completed the Class tutorial.

String Methods

Blade strings come with a lot of powerful text processing capabilities wrapped in the methods described below.

length()

Returns the length of a string. Note that this method is UTF-8 compatible and returns the length in characters, not bytes, when the string contains non-ASCII characters, whether written directly or via the \u or \U escapes.

For example:

%> 'This is a pretty long string'.length()
28
%> 'उनका एक समय'.length()
11
%> 'This text mixes English and 粵語'.length()
30

upper()

Returns a copy of the string with all the cased characters converted to uppercase. Note that the result may still return false when tested with is_upper() if the string contains Unicode characters that are not case folded.

For example:

%> 'blade'.upper()
'BLADE'

lower()

Returns a copy of the string with all the cased characters converted to lowercase.

For example:

%> 'Blade Is Bae'.lower()
'blade is bae'

is_alpha()

Returns true if all the characters in the string are alphabetic and the string is not empty, otherwise returns false.

For example:

%> 'abracadabra'.is_alpha()
true
%> 'my tooth aches'.is_alpha()
false
%> ''.is_alpha()
false

is_alnum()

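Returns true if all the characters in the string are either letters or digits and the string is not empty, otherwise returns false. In other words, each character must individually be a letter or a digit; a mixed string such as '3Idiots' qualifies even though neither is_alpha() nor is_number() returns true for it as a whole.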

For example:

%> '3Idiots'.is_alnum()
true
%> 'Three Idiots'.is_alnum()
false
%> '3 Idiots'.is_alnum()
false
%> '3'.is_alnum()
true
%> 'idiots'.is_alnum()
true
%> ''.is_alnum()
false

is_number()

Returns true if all the characters in the string are digits and the string is not empty, otherwise returns false.

For example:

%> '123.5'.is_number()
false
%> '1970'.is_number()
true
%> '1980s'.is_number()
false

is_lower()

Returns true if at least one character in the string is cased, all cased characters are lower cased and the string is not empty. Otherwise, it returns false.

For example:

%> 'all'.is_lower()
true
%> 'all...123'.is_lower()
true
%> 'All...123'.is_lower()
false
%> ''.is_lower()
false

is_upper()

Returns true if at least one character in the string is cased, all cased characters are upper cased and the string is not empty. Otherwise, it returns false.

For example:

%> 'ALL'.is_upper()
true
%> 'ALL...123'.is_upper()
true
%> 'All...123'.is_upper()
false
%> ''.is_upper()
false

is_space()

Returns true if there are only whitespace characters in the string and the string is not empty. Otherwise, it returns false.

For example:

%> '.     '.is_space()
false
%> '\r\n'.is_space()
true
%> '\t  '.is_space()
true

trim([chr: char])

Returns a copy of the string with the given character (chr) removed if it appears at the start or end of the string. If chr is not given, it defaults to a space (' '). All matching leading and trailing characters are removed until a character that doesn't match is encountered. If no match is found, a copy of the original string is returned.

The square brackets ([]) around the chr: char in the method definition indicate that the parameter is optional; they do not mean you have to type the square brackets.

For example:

%> '  example  '.trim()
'example'
%> '  example  '.trim('e')
'  example  '
%> 'example'.trim('e')
'xampl'

ltrim([chr: char])

Similar to the trim() method, except that this method only removes characters at the beginning of the string.

For example:

%> '  example  '.ltrim()
'example  '
%> 'example'.ltrim('e')
'xample'

rtrim([chr: char])

Similar to the trim() method, except that this method only removes characters at the end of the string.

For example:

%> '  example  '.rtrim()
'  example'
%> 'example'.rtrim('e')
'exampl'

join(iterable: string | list | dict)

Returns a string which is a concatenation of the items in the iterable using the string as the separator. If the iterable contains just one item or the string is empty, the original element is returned. If the iterable contains non-string items, the items are converted to their string representation before joining.

Bytes are the only unsupported iterable.

For example:

%> ','.join(['ok', 1, true])
'ok,1,true'
%> '--'.join('name')
'n--a--m--e'
%> ','.join('a')
'a'

split(delimiter: string | regex)

Returns a list of words or characters in a string after separating the content of the string at every point where the delimiter is found.

If the delimiter is an empty string, the resultant list will contain the individual characters of the string in the order in which they appear in the original string. Consecutive delimiters are not grouped together and are deemed to delimit empty strings. Splitting an empty string with a specified separator returns an empty list.

This method has full UTF-8 support.

For example:

%> 'name'.split('')
[n, a, m, e]
%> '1<>2<>3'.split('<>')
[1, , 2, , 3]
%> '1,2,3'.split(',')
[1, 2, 3]
%> ''.split(',')
[]
%> '地点'.split('')
[地, 点]
%> 'who is in the garden'.split('/\s/')
[who, is, in, the, garden]
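
Since split() returns a list and join() accepts one, the two methods can be combined to change the separator of a delimited string. The following is a small sketch under that assumption; the sample data and variable names are made up for illustration.

```blade
var csv = 'red,green,blue'

# Break the text apart at every comma.
var parts = csv.split(',')
echo parts

# Glue the pieces back together with a different separator.
echo ' | '.join(parts)
```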

index_of(str: string [, start_index: number])

Returns the index of the first occurrence of str in the string. If str cannot be found anywhere in the string, it returns -1. If the start_index argument is given, scanning starts from that index.

For example:

%> 'hello, world'.index_of(' ')
6
%> 'hello, world'.index_of('e')
1
%> 'hello, world'.index_of('q')
-1
%> 'hello, world'.index_of('o')
4
%> 'hello, world'.index_of('o', 5)  # next index of `o` starting from index 5.
8

starts_with(str: string)

Returns true if the string begins with the string or character specified in str, otherwise it returns false.

For example:

%> 'hello, world'.starts_with('hello')
true
%> 'hello, world'.starts_with('hellios')
false

ends_with(str: string)

Returns true if the string ends with the string or character specified in str, otherwise it returns false.

For example:

%> 'gumtree'.ends_with('tree')
true
%> 'gumtree'.ends_with('mree')
false

count(str: string)

Returns the number of non-overlapping occurrences of the substring str in the string.

For those coming from Python who may consider this method similar to Python's own, this method differs in that it does not allow specifying a start and end region for the operation. Blade considers this unnecessary as the same can be accomplished by slicing the string.

For example:

%> 'Hallelujah'.count('l')
3
%> 'ding dong'.count('ng')
2
%> 'ding dong'[2,7].count('ng') # setting region to search for counts - 'ng do'
1

to_number()

Returns the first numeric value contained in the string if any exists, or 0 if the string contains no numeric value. Floating-point numbers that have the same value as their integer counterparts are returned as integers.

For example:

%> '123.0 hell'.to_number()
123
%> '427 and 12'.to_number()
427
%> '96.3 of 31'.to_number()
96.3
%> 'error'.to_number()
0

to_list()

Returns a list whose elements are the characters of the string in order of appearance. Characters that repeat in the string get a separate entry for each occurrence, at the same index at which they appear in the string.

For example:

%> 'Blade'.to_list()
[B, l, a, d, e]
%> 'Plantation'.to_list()
[P, l, a, n, t, a, t, i, o, n]

to_bytes()

Returns the content of the string as a stream of bytes.

The Blade REPL may truncate long byte data when printing to the console/terminal.

For example:

%> 'Blade'.to_bytes()
(42 6c 61 64 65)
%> 'Plantation'.to_bytes()
(50 6c 61 6e 74 61 74 69 6f 6e)

lpad(width: number [, fill: char])

Returns the string padded on the left to a total length of width, so that the original text is right justified. Padding is done using the specified character fill if given, or a space (' ') if a fill is not specified. The original string is returned if width is less than or equal to string.length().

For example:

%> 'cat'.lpad(5)
'  cat'
%> 'cat'.lpad(5, '-')
'--cat'
%> 'cat'.lpad(2, '-')
'cat'

rpad(width: number [, fill: char])

Returns the string padded on the right to a total length of width, so that the original text is left justified. Padding is done using the specified character fill if given, or a space (' ') if a fill is not specified. The original string is returned if width is less than or equal to string.length().

For example:

%> 'Hmm'.rpad(6)
'Hmm   '
%> 'Hmm'.rpad(6, '.')
'Hmm...'
%> 'Hmm'.rpad(3, '.')
'Hmm'
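
A common use of the two padding methods together is lining up columns of text. The sketch below assumes only the behaviour shown in the examples above; the labels and numbers are made up for illustration.

```blade
# Pad labels on the right with dots and numbers on the left with spaces
# so that each row has the same overall width.
echo 'Item'.rpad(10, '.') + '3'.lpad(4)
echo 'Total'.rpad(10, '.') + '120'.lpad(4)
```

Each row should come out 14 characters wide, with the numbers aligned on their right edge.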

match(str: string | regex [, offset: number = 0])

If the string str is a regular string, this method returns true if the string contains a substring str. Otherwise, it returns false.

If the string str contains a valid regular expression (see the Regular Expressions section below), it returns false if a match for the regex str cannot be found in the string. Otherwise, it returns a dictionary containing the first match.

If the offset argument is specified, it becomes the offset in the string at which to start matching.

For example:

%> 'gorilla'.match('go')      # regular string match
true
%> 'gorilla'.match('gox')     # regular string non-match
false
%> 'gorilla'.match('/gox?/')  # regular expression match
{0: go}
%> 'gorilla'.match('/gox\d/') # regular expression non-match
false

matches(reg: regex [, offset: number = 0])

Returns a dictionary containing every match of the given regular expression reg in the source string. If no match is found, an empty dictionary is returned.

If the offset argument is specified, it becomes the offset in the string at which to start matching.

For example:

%> '123 dollars'.matches('/[a-z]+|\d+/')
{0: [123, dollars]}
%> 'who is in the garden'.matches('/\w+/')
{0: [who, is, in, the, garden]}

replace(str: regex | string, replacement: string [, use_regex: bool = true])

Returns a copy of the string with all occurrences or matches of str replaced by the replacement string.

In the replacement string, if str is a regular expression, capture groups can be referenced using the syntax $index. For example, capture group 0 contains the entire match and can be used in the replacement string as $0.

To escape the $ sign in the replacement string, use a double backslash (\\).

For example:

%> 'lady friend'.replace('d', 'z')  # non-regex
'lazy frienz'
%> 'John is 26 years old'.replace('/(\d+)/', '1$1') # regex example
'John is 126 years old'
%> 'John is 26 years old'.replace('/(\d+)/', '1\\$2')
'John is 1$2 years old'

When the third parameter use_regex is set to false, str will never be treated as a regular expression even if it contains a valid regular expression.
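
To illustrate the effect of use_regex, here is a hedged sketch using a search string that happens to look like a regex because it is wrapped in forward slashes. The sample value is made up, and the behaviour noted in the comments is inferred from the description above rather than taken from a recorded session.

```blade
var path = '1/2/3'

# Default: '/2/' is treated as a regex, so only the digit 2 should be replaced.
echo path.replace('/2/', '-')

# With use_regex set to false, the literal text /2/ should be replaced instead.
echo path.replace('/2/', '-', false)
```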

Apart from the methods listed above, String also implements the Iterable Decorators, which we'll talk about in detail under the Class lesson.

Regular Expressions

Regular expressions in Blade are simply special patterns expressed in a string following a few guidelines that allow them to be distinguished by methods requiring them. We'll be using the term regex or regexes for the rest of this tutorial and most likely for the rest of the documentation.

Blade's regex is built on top of the PCRE2 library, an excellent library that already powers regular expressions in many programming languages and has been around for decades. For now, it is a better choice for Blade to depend on this library than to invest years building one robust enough to match its capabilities.

In simple words, Blade's regex is PCRE compatible.

To create a valid regex in Blade, your regex pattern must be surrounded by identical non-word characters.

For example, /\d+/. Note here how we surround our pattern \d+ with forward slashes (/).

This tutorial will not attempt to teach regular expressions, as there are many wonderful texts already written on that topic as well as many online tools for learning them in greater depth than we can cover here. The majority of them are based on the same engine we are using. So here's one of them for your reference.

Most languages support different modifiers for regular expressions, and Blade has some too. Modifiers are placed after a valid regex to control how the pattern is executed by the language. For example, in Blade, the pattern /[a-z]/i is modified with the i modifier, which tells the interpreter to match case-insensitively.

The following table lists Blade modifiers.

Modifier Definition
i Case insensitive matching
m Multi-line. This mode causes ^ and $ to match at newlines.
s Dot (.) matches all
x Extended matching (Ignore whitespace and # comments)
A Force pattern anchoring
D Do not match newline at the end. In this mode, $ will be the only valid line terminator.
U Ungreedy match.
u Treat pattern and subjects as UTF strings and use Unicode properties for \d, \w, etc.
J Allow duplicate names for subpatterns

Modifiers can be joined together to form a more powerful modification instruction. For example, you can perform a multi-line and case-insensitive modification for our former sample as /[a-z]/mi.

For example:

%> 'The side bar includes a Cheatsheet'.matches('/([A-Z])\w+/')
{0: [The, Cheatsheet], 1: [T, C]}

Or the same query with modifiers,

%> 'The side bar includes a Cheatsheet'.matches('/([A-Z])\w+/sim')
{0: [The, side, bar, includes, Cheatsheet], 1: [T, s, b, i, C]}


