Because a string is just a view, assigning one string to another does not copy the underlying bytes. It only copies the data pointer and count. This means string assignment is always cheap.
x: string;
x = "Hello"; // x.data points at "Hello", x.count is 5
x = "Sailor"; // x.data now points at "Sailor", x.count is 6
// No memory was copied in either assignment.
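To check this yourself, compare the data pointers after an assignment. The sketch below assumes the Basic module for print. `same_view` is a hypothetical helper written for illustration; it is not part of any standard module.

```jai
#import "Basic";

// Hypothetical helper: true when two strings view the very same bytes.
same_view :: (a: string, b: string) -> bool {
    return a.data == b.data && a.count == b.count;
}

main :: () {
    a := "Hello";
    b := a;                    // Copies only the data pointer and the count.
    print("same bytes: %\n", same_view(a, b));  // same bytes: true

    b.count = 2;               // Shrinks b's view. a is unaffected.
    print("a is '%', b is '%'\n", a, b);        // a is 'Hello', b is 'He'
}
```

Because `b` is its own 16-byte view, changing `b.count` leaves `a` alone, even though both still point at the same bytes.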
Subscripting and Manipulation
Strings can be subscripted just like array views. The result is a u8, that is, a number, not some special character type. There is no dedicated character type in Jai.
x := "Sailor";
print("x[0] is: %\n", x[0]); // prints 83, the byte value of 'S'
Like other array views, strings are bounds-checked at runtime by default. Accessing an index outside the range 0 to count - 1 produces a runtime error instead of silently reading past the end of the bytes.
Because a string is a view, you can take substrings without allocating or copying data. You just manipulate count and data directly:
x := "Sailor";
x.count = 4;
print("x is: '%'\n", x); // "Sail"
x.count -= 1;
x.data += 1;
print("x is: '%'\n", x); // "ail"
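The same pointer-and-count arithmetic can be wrapped in a procedure. This is a minimal sketch: `substring_view` is a hypothetical helper, not a function from a Jai module. It asserts that the requested range fits inside the source string and never allocates.

```jai
#import "Basic";

// Hypothetical helper: returns a view of `length` bytes of `s`, starting at
// byte offset `start`. No memory is allocated; the result shares bytes with `s`.
substring_view :: (s: string, start: s64, length: s64) -> string {
    assert(start >= 0 && length >= 0 && start + length <= s.count);
    result: string;
    result.data  = s.data + start;
    result.count = length;
    return result;
}

main :: () {
    x := "Sailor";
    print("'%'\n", substring_view(x, 0, 4));  // 'Sail'
    print("'%'\n", substring_view(x, 1, 3));  // 'ail'
}
```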
Strings Are Not Zero-Terminated
Unlike C, strings in Jai are not zero-terminated. The length is always known from count, and taking substrings does not require allocating new memory or appending a null byte. This is one of the primary advantages of the array view representation.
However, the compiler does place a zero byte after every string literal in the binary. This means that if you have a compile-time constant string, you can safely pass its .data pointer to a C function that expects a zero-terminated *u8. The compiler also allows string constants to implicitly cast to *u8 for this reason.
// Suppose we have a C-style function. (This assumes a library named crt
// has already been declared, for example with #system_library.)
strlen :: (s: *u8) -> s64 #foreign crt;
// String constants implicitly cast to *u8:
len := strlen("Hello"); // This works, "Hello" is zero-terminated.
// But a non-constant string will NOT implicitly cast to *u8:
x := "Hello";
// len = strlen(x); // Compile-time error!
// If you need to pass a non-constant string to C, you must
// either copy it and add a zero, or use a helper like to_c_string.
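Here is a sketch of the copy-and-terminate route. It assumes that `to_c_string` from the Basic module allocates `count + 1` bytes, copies the string, and appends a zero byte. To avoid declaring a foreign library, it does not call a real C function. Instead, `has_terminator` (a hypothetical helper) views one byte past the copied text and checks that this byte is zero.

```jai
#import "Basic";

// Hypothetical helper: views len + 1 bytes at c and checks that the byte
// just past the string content is the zero terminator.
has_terminator :: (c: *u8, len: s64) -> bool {
    view: string;
    view.data  = c;
    view.count = len + 1;
    return view[len] == 0;
}

main :: () {
    x := "Hello";             // A string variable, not a constant.
    c := to_c_string(x);      // Heap copy of x with a zero byte appended.
    defer free(c);            // The copy is ours to release.

    print("terminated: %\n", has_terminator(c, x.count));  // terminated: true
    // c can now be passed to a #foreign procedure that expects a zero-terminated *u8.
}
```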
Encoding
Jai does not enforce any particular encoding on strings. Each element is a u8, and the language assumes that most programs will use UTF-8 most of the time. In UTF-8, the common English alphabet and punctuation fit in a single byte, while other characters take two to four bytes each. This means count tells you the number of bytes in the string, not the number of visible characters.
The default for loop iterates one byte at a time. If you need to iterate one UTF-8 character at a time, you can use an iterator from the Unicode module:
#import "Unicode";
LYRICS :: "しかし、魚の質を考えれば";
for :utf8_iter LYRICS {
    print("index [%] has utf8 value %\n", it_index, it);
}
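Even the plain byte-at-a-time loop can count characters, because every UTF-8 continuation byte has the bit pattern 10xxxxxx. The sketch below counts all bytes that are not continuation bytes. It assumes valid UTF-8 input, and `count_code_points` is a hypothetical name rather than a library procedure. Even so, code points do not always match visible characters: an accent written as a separate combining mark is a code point of its own.

```jai
#import "Basic";

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (those whose top two bits are 10). Assumes valid UTF-8 input.
count_code_points :: (s: string) -> s64 {
    n := 0;
    for s {
        if (it & 0xC0) != 0x80 {
            n += 1;
        }
    }
    return n;
}

main :: () {
    word := "héllo";  // 'é' is U+00E9, which takes two bytes in UTF-8.
    print("bytes: %, code points: %\n", word.count, count_code_points(word));
    // bytes: 6, code points: 5
}
```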
#char
#char is a compile-time directive that converts a single-character string literal into its ASCII byte value. The result is a u8.
a := #char "W"; // a is 87
b := #char " "; // b is 32
c := #char "A"; // c is 65
print("type_of(#char \"W\") is: %\n", type_of(a)); // u8
This is primarily useful when working with APIs that deal in raw byte values rather than strings (for example, checking which key was pressed) or when scanning through a string byte by byte:
s := "Hello, World!";
for s {
    if it == #char "," {
        print("comma at index %\n", it_index);
    }
}
The string argument to #char must be exactly one character long. Passing a multi-character string is a compile-time error.
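A common use of #char is parsing. The sketch below converts a run of ASCII digits into a number by comparing each byte against #char "0" and #char "9". `parse_digits` is a hypothetical helper, and it deliberately ignores overflow to keep the example short.

```jai
#import "Basic";

// Hypothetical helper: parses ASCII decimal digits into an s64.
// Returns false as the second value for empty input or any non-digit byte.
// Overflow is not checked.
parse_digits :: (s: string) -> s64, bool {
    if s.count == 0 { return 0, false; }
    value: s64 = 0;
    for s {
        if it < #char "0" || it > #char "9" { return 0, false; }
        value = value * 10 + cast(s64)(it - #char "0");
    }
    return value, true;
}

main :: () {
    n, ok := parse_digits("2024");
    print("% %\n", n, ok);  // 2024 true
}
```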
Building Strings Dynamically
Because a string is just a view, it has no built-in way to concatenate or grow, and there is no + operator for joining strings. If you see a string being assigned, you know that nothing expensive is happening. To build strings dynamically, Jai provides two main approaches.
sprint
sprint works just like print, but instead of writing to the console, it allocates and returns a new string with the formatted result:
n := 69105;
message := "For sure!!";
s := sprint("There are % leaves in the pile. %", n, message);
print("s is: '%'\n", s);
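A string returned by sprint lives on the heap until you free it. If the string only needs to exist briefly, consider tprint instead: it allocates on temporary storage rather than the heap, and that storage is reclaimed all at once when reset_temporary_storage is called.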
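The difference in ownership can be sketched as follows. The example assumes the Basic module's sprint, tprint, free, and reset_temporary_storage. `make_label` is a hypothetical helper that returns heap memory the caller must release.

```jai
#import "Basic";

// Hypothetical helper: formats a label with sprint, so the caller owns
// the returned string and is responsible for freeing it.
make_label :: (name: string, score: s64) -> string {
    return sprint("% scored %", name, score);
}

main :: () {
    label := make_label("Ada", 42);
    defer free(label.data);        // sprint allocated this on the heap.
    print("%\n", label);           // Ada scored 42

    // Short-lived text: tprint uses temporary storage, so no individual free.
    note := tprint("% bytes", label.count);
    print("%\n", note);            // 13 bytes

    reset_temporary_storage();     // Reclaims everything allocated by tprint.
}
```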
String_Builder
For heavier-duty tasks, such as building a string incrementally from many pieces, use String_Builder. It provides a buffered way of accumulating string data, and you extract the final result when you are done:
builder: String_Builder;
init_string_builder(*builder);
append(*builder, "One!");
append(*builder, "Two!");
append(*builder, "Three!");
// You can also use formatted output:
print_to_builder(*builder, " ... number %, exclamation %.", 42, "wow");
result := builder_to_string(*builder);
print("result is: '%'\n", result);
This is the idiomatic way to concatenate strings in Jai. Instead of writing a + b as you might in other languages, you append both strings to a builder and then extract the result. This makes the allocation explicit and predictable.
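For example, joining a list with a separator is a natural fit for a builder. In this sketch, `join_with` is a hypothetical helper built only from the calls shown above (init_string_builder, append, builder_to_string). The caller frees the returned string.

```jai
#import "Basic";

// Hypothetical helper: joins pieces with a separator using a String_Builder.
join_with :: (pieces: [] string, separator: string) -> string {
    builder: String_Builder;
    init_string_builder(*builder);
    for pieces {
        if it_index > 0 { append(*builder, separator); }
        append(*builder, it);
    }
    return builder_to_string(*builder);
}

main :: () {
    words := string.["One", "Two", "Three"];
    joined := join_with(words, ", ");
    defer free(joined.data);
    print("%\n", joined);  // One, Two, Three
}
```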
Escape Sequences
Inside a regular string literal, the backslash character introduces an escape sequence:
| Sequence | Meaning |
|---|---|
| `\e` | Escape |
| `\n` | Newline |
| `\r` | Carriage Return |
| `\t` | Tab |
| `\"` | The character `"` |
| `\\` | The character `\` |
| `\0` | The byte with value 0 |
| `\xAB` | The byte with hexadecimal value AB |
| `\d123` | The byte with decimal value 123 (max 255) |
| `\uABCD` | The 16-bit Unicode character U+ABCD, encoded as UTF-8 |
| `\UABCDEF12` | The 32-bit Unicode character U+ABCDEF12, encoded as UTF-8 |
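A quick way to see what these escapes produce is to inspect byte counts and values. This sketch relies on the behavior listed in the table: a single byte for `\t` and `\x41`, and a UTF-8 encoding for `\u00E9`. It uses the Basic module for print.

```jai
#import "Basic";

main :: () {
    tabbed := "a\tb\n";            // Four bytes: 'a', tab, 'b', newline.
    print("% %\n", tabbed.count, tabbed[1]);  // 4 9

    hex := "\x41";                 // The single byte 0x41, which is 'A'.
    print("%\n", hex == "A");      // true

    accented := "\u00E9";          // U+00E9 'é', encoded as two UTF-8 bytes.
    print("%\n", accented.count);  // 2
}
```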
Here Strings
When a string contains many backslashes or quotes, escape sequences become difficult to read. Jai provides #string for this situation. You write #string followed by a delimiter identifier, then all text on subsequent lines is included verbatim in the string until a line starting with the delimiter is encountered:
MY_STRING :: #string DONE
This string has "quotes" and \backslashes\ with no escaping needed.
It can span multiple lines.
DONE
The contents are completely verbatim, including leading whitespace. Line endings are normalized to \n regardless of your source file's line ending style. If you want \r\n line endings instead, use #string,cr.
Note that a #string always includes a trailing newline (from the last line before the delimiter). If you need to remove it, you can do so in code:
s := MY_STRING;
s.count -= 1; // Remove trailing newline.
Type Information
x := "Hello";
print("type_of(x) is: %\n", type_of(x)); // string
print("size_of(type_of(x)) is: %\n", size_of(type_of(x))); // 16 (8 for count + 8 for data pointer)
print("type_of(x[0]) is: %\n", type_of(x[0])); // u8
print("type_of(x.count) is: %\n", type_of(x.count)); // s64
print("type_of(x.data) is: %\n", type_of(x.data)); // *u8
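Because a string is nothing more than a count and a data pointer, any existing run of bytes can be viewed as a string without copying. This is a minimal sketch: `view_bytes` is a hypothetical helper, and it assumes the bytes hold text you want to treat as a string (here, ASCII).

```jai
#import "Basic";

// Hypothetical helper: views an existing byte buffer as a string.
// Nothing is copied; the string points at the buffer's own bytes.
view_bytes :: (bytes: [] u8) -> string {
    s: string;
    s.data  = bytes.data;
    s.count = bytes.count;
    return s;
}

main :: () {
    buffer := u8.[72, 101, 108, 108, 111];   // The ASCII codes for "Hello".
    s := view_bytes(buffer);
    print("'%' has % bytes\n", s, s.count);  // 'Hello' has 5 bytes
}
```

The string is only valid while `buffer` is, since it borrows that memory rather than owning a copy.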