uCalc String Library
uCalc String Library consists of a set of string-handling routines with names that will seem very familiar, especially to those who use either .NET's Microsoft.VisuaBasic.Strings module, VB 6, or PowerBASIC, but with support for uCalc Patterns, which infuses each routine with advanced parsing capabilities.
If you are familiar with such routines as InStr, Mid(), Left(), Right(), etc, you already know most of what you need to know to get started. This documentation focuses mainly on the additional functionality that uCalc brings to the table. The general description of each routine is similar to what you may be used to. The main differences are that text can be accessed or manipulated using special patterns, instead of just plain strings; and numeric arguments can be measured either in terms of characters, tokens, blocks, or expressions; and all routines have an additional Options parameter, which lets you select such things as whether the function call should be whitespace-sensitive and case-sensitive. By default case-sensitivity and white-space sensitivity is off, and tokens (not characters) are the unit of measurement, except where indicated otherwise. Some routines might be more familiar particularly to PowerBASIC users. As is common in BASIC dialects, character positions start at 1 (not 0).
For details about uCalc Patterns, visit http://www.ucalc.com/docs/Patterns.htm
Note: uCalc String Library routines are preceded by "uc" when called from your source code. However, when called from uCalc Console Calculator or uCalc Transform, the routines are not preceded by "uc". So for instance you would use ucInStr() if you are calling it from your source code, whereas you'd call InStr() if you are using uCalc Transform. With uCalc Fast Math Parser your source code would use ucInStr(), whereas your end-user expressions would use InStr(). In order for your uCalcFMP expressions to use the String Library, first place a call to ucLoadStringLibInFMP()
New: ucDelim, ucSplit, ucEquals, ucRange
Returns the location of text within MainString that matches Pattern.
Start - Starting character position from where to begin the search
MainString - String within which to search for Pattern
Pattern - String or pattern to find within MainString
Example: InStr with
default setting
ucInStr(1, MainString, "a test")
THIS IS A
TEST, "This is a test", (a test), a
test, a test.
^
Returns 9
Explanation:
With the default setting, "a
test"
(third argument) is a pattern and not a literal string. This pattern consists of the tokens
"a" followed by "test".
Spacing between the two does not matter, nor does casing (upper
case/lower case) by default.
Example: InStr with
special patterns
ucInStr(1, MainString, "({whatever})")
THIS IS A TEST, "This is a test", (a
test), a test, a test.
^
Returns 37
ucInStr(1, MainString, "is {etc:2}")
THIS IS A TEST, "This is a test",
(a test), a test, a test.
^
Returns 6
Explanation: In the first part, it matches the first occurrence of a pair of parenthesis with some text inside. {whatever} is a pattern variable. The name in this example was arbitrarily selected; instead of {whatever}, it could have been named {MyPattern}, or {item} just the same. The second pattern matches the token "is" followed by the next two tokens. {etc} is another pattern variable. The ":2" in {etc} tells it to pick up the next two tokens. See http://www.ucalc.com/docs/Patterns.htm for more details on uCalc Patterns.
Example: InStr with ucChar property
ucInStr(1, MainString, "a test", ucChar)
THIS IS A TEST, "This is a test", (a test), a
test, a test.
^
Returns 28
Explanation:
ucChar
makes it search character by character (the way traditional VB's InStr() function would do it), instead of token by token
(which is the default in uCalc). This is
similar to not using uCalc Patterns.
Example: InStr with ucCase
(case-sensitive) property
ucInStr(1, MainString, "a test", ucCase)
THIS IS A TEST, "This is a test", (a
test), a test, a test.
^
Returns 38
Explanation:
Here we introduce ucCase,
which makes the search case sensitive.
It skips over "A TEST", since it is all caps. Note that it also skips over the second
literal occurrence, because it is within quotes. By default comparisons are done token by
token, and "This is a
test"
(including surrounding quotes) represents one token, so it does not match the
two separate tokens: "a" followed by "test". Likewise, ucInStr(1,
MainString, "is") would match the
"IS" token following "THIS", and not the "IS"
that occurs within the word "THIS".
Example: InStr with ucSpace (whitespace-sensitive) property
ucInStr(1, MainString, "a test", ucSpace)
THIS IS A TEST, "This is a test", (a test), a
test, a test.
^
Returns 57
Explanation:
Here the pattern is whitespace-sensitive.
The given pattern has exactly one space, so it skips over occurrences
where there is more than one space between "a" and "test".
Example: InStr with ucBlock property
ucInStr(1, MainString, "a test", ucBlock)
THIS IS A TEST, "This is a test", (a test), a test, a
test.
^
Returns 48
ucInStr(1, MainString, "(a test)", ucBlock)
THIS IS A TEST, "This is a test", (a
test), a test, a test.
^
Returns 37
Explanation:
So far, we've searched character by character, or token by token. Here, we are searching block by block. A block consists of either one token, or if
the token was defined with the block property, the block starts with that token
and ends with the closing token associated with it. By default the following block token pairs
are defined ( and ), {
and }, and ( and ). The first part of the example skips over
"(a test)", and matches the
following occurrence because (a test),
including the parenthesis, represents one block unit, and it is not equal to
"a test". In the second example "(a test)" as a block matches the given
pattern "(a test)".
Example: InStr with combined properties
ucInStr(1, MainString, "testing 123")
TESTING 123, testing
123, TESTING 123, "testing 123", (testing 123), testing 123.
^
Returns 1
ucInStr(1, MainString, "testing 123", ucCase)
TESTING 123, testing 123,
TESTING 123, "testing 123", (testing 123), testing 123.
^
Returns 16
ucInStr(1, MainString, "testing 123", ucChar)
TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123),
testing 123.
^
Returns 31
ucInStr(1, MainString, "testing 123", ucCase+ucChar)
TESTING 123, testing 123, TESTING 123, "testing 123", (testing
123), testing 123.
^
Returns 47
ucInStr(1, MainString, "(a test)", ucCase+ucSpace)
TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.
^
Returns 60
ucInStr(1, MainString, "(a test)", ucCase+ucSpace+ucBlock)
TESTING 123, testing 123, TESTING 123, "testing 123",
(testing 123), testing 123.
^
Returns
74
Note:
Not all combinations will produce a logically meaningful result. For instance ucChar
cannot be combined with ucToken or ucBlock, because a character search is exclusive of token
patterns.
Returns the left-most n characters, tokens, blocks, or expressions in MainString.
Example: Counting tokens with ucLeft()
(which is the default, and not characters)
ucLeft(MainString, 5)
This "is a test"
using (the quick) brown fox, and more.
Returns: [the text that's highlighted]
Explanation:
By default functions from this library count tokens (not characters). The 5 left-most tokens are:
1. This
2. "
3. Is
4. A
5.
Test
This
example is not spacing-sensitive.
Example: Counting tokens with ucLeft()
(which is the default, and not characters)
ucLeft(MainString, 5, ucQuote)
This "is a test" using (the
quick) brown fox, and more.
Returns: [the text that's highlighted]
Explanation:
By default functions from this library count tokens (not characters). The 5 left-most tokens are:
1. This
2. "is a test"
3. using
4.
(
5.
the
This example is not spacing-sensitive. When ucQuote is on,
text within quotes (including the quotes) count as one token.
Example: Counting characters with ucLeft()
ucLeft(MainString, 5, ucChar)
This͘"is a
test" using (the quick) brown fox,
and more.
Returns:
"This " (without quotes, but including space)
Explanation:
The 5 left-most characters are:
1. T
2. h
3. i
4.
s
5.
[space]
Example: Counting blocks with ucLeft()
ucLeft(MainString, 5, ucQuote+ucBlock)
This "is a test" using (the quick) brown fox, and more.
Returns: [the text that's highlighted]
Explanation:
The 5 left-most blocks are:
1. This
2. "is a test"
3. using
4.
(the
quick)
5.
brown
Example: Counting tokens, space-sensitive with ucLeft()
ucLeft(MainString, 5, ucQuote+ucSpace)
This "is a test" using (the
quick) brown fox, and more.
Returns: [the text that's highlighted]
Explanation:
The 5 left-most whitespace-sensitive tokens are:
1. This
2. [space]
3. "is a test"
4.
[space]
5.
using
Returns the right-most n characters, tokens, blocks, or expressions in MainString.
The rules for ucRight are very similar to those for ucLeft. One important difference to consider (especially if speed is a consideration) is that unlike ucLeft, which parses only until it reaches the number of characters or tokens specified, ucRight always parses the entire MainString text before returning the right-most n characters/tokens.
Examples: See ucLeft for the concept of counting by tokens, and blocks, etc.
Returns the length of a string in terms of characters, tokens, blocks, or expressions.
Example: counting tokens
ucLen(MainString, ucQuote)
This "is a
test" using (the quick) brown fox,
and more.
Returns: 13
Explanation:
The counted tokens are:
1.
This
2.
"is
a test"
3.
Using
4.
(
5.
The
6.
Quick
7.
)
8.
Brown
9.
Fox
10.
,
11.
And
12.
More
13.
.
Example: Counting characters with ucLen
ucLen(MainString, ucChar)
This͘"is a test" using (the quick) brown fox, and more.
Returns: 56
Explanation:
This count each character much the same way VB's Len() function would.
Example: counting space-sensitive tokens with ucLen
ucLen(MainString, ucQuote+ucSpace)
This͘"is a test" using (the quick) brown fox, and more.
Returns: 21
Explanation:
The counting here is similar to the first ucLen
example except that each block of whitespace also counts as one unit. The space before "using" for instance
counts as one, and the two consecutive spaces after "using" also
count as one.
Example: Counting blocks with ucLen
ucLen(MainString, ucQuote+ucBlock)
This͘"is a test" using (the quick) brown fox, and more.
Returns: 10
Explanation:
The text blocks that are counted are:
1.
This͘
2.
"is
a test"
3.
using
4.
(the
quick)
5.
brown
6.
fox
7.
,
8.
and
9.
more
10.
.
Returns an uppercase version of MainString.
Pattern – ucUCase without the optional arguments behaves the same way as VB's UCase() function. If the optional pattern argument is used, then only text within MainString that matches Pattern is changed to uppercase.
Example:
MainString
value:
This "is a
test" using (the quick) brown
"fox", (and) more.
ucUCase(MainString, "({text})")
Returns:
This "is a
test" using (THE QUICK) brown "fox", (AND) more.
ucUCase(MainString, "{Q}{text}{Q}")
Returns:
This
"IS A TEST"
using (the quick) brown "FOX", (and) more.
ucUCase(MainString, "", ucSkipOver("({item})")
Returns:
THIS "IS A TEST" USING (the
quick) BROWN "FOX",
(and) MORE.
Explanation: Only text
matching the pattern within the given string was changed to uppercase. SkipOver() causes
it to match everything except the pattern passed to SkipOver.
Returns a lower case version of MainString.
See ucUCase.
Returns a mixed case version of MainString.
See ucUCase.
Returns a subset of a string.
Start – Starting position (counted by character, token, block, or expression)
Count – Length (in terms of characters, tokens, blocks, or expressions) of substring that should be returned.
Note: In this version, the second arg is required for it to work; use -1 to make it go to the end. [to be fixed.]
Example:
ucMid(MainString, 6)
THIS IS A TEST, "This is a test", (a test), a
test, a test.
^ ^
^ ^ ^ ^
1 2
3 4 5 6
Returns
"This is a
test", (a test), a test, a test.
ucMid(MainString, 6, 4, ucQuote)
THIS IS A TEST, "This is a test", (a test), a
test, a test.
^ ^
^ ^ ^ ^
1 2
3 4 5 6
Returns
"This is a
test", (a
The 4 tokens
counted are
1. "This is a test"
2. ,
3. (
4. a
ucMid(MainString, 6, 4, ucChar)
THIS IS A TEST, "This is a test", (a test), a
test, a test.
^^^^^^
123456
Replaces all occurrences of one string pattern within MainString with another string.
Pattern – Pattern of text to find within MainString
Replacement – Replacement string for text that matches the Pattern argument. This string can contain pattern variables from the Pattern argument.
Example: ucReplace with
default setting
ucReplace(MainString, "is {tokens:2}", "was {tokens}?")
THIS IS A TEST, This is a test, (is a test).
Returns
THIS was A TEST? This was a test?, (was a test?).
ucReplace(MainString, "is {tokens:2}", "was {tokens}?", ucSkipOver("This {etc},"))
THIS IS A TEST, This is a test, (is a test).
Returns
THIS IS A TEST, This is a test, (was a test?).
Counts the number of occurrences of text matching a given pattern within a string. The count can be based on characters, tokens, blocks, or expressions. You can select whether or not to make the result whitespace-sensitive.
Examples:
ucTally(MainString, "is")
THIS IS A TEST, "This is a test", (is a test).
Returns: 3
ucTally(MainString, "is", ucQuote)
THIS IS A TEST, "This is a test", (is a test).
Returns: 2
ucTally(MainString, "is", ucChar)
THIS IS A
TEST, "This
is a test", (is a test).
Returns: 5
Extracts characters, tokens, blocks, or expressions up to a given pattern in a string. Similar to PowerBASIC's Extract, except that that one extracts only characters.
Example:
ucExtract(MainString, "<{tag}>{word:2}</{tag}>")
This is <i>Test<br></i>. This is
<b>another test</b>. Testing 123.
Returns: This is <i>Test<br></i>. This is
Explanation:
Returns text leading up to the first occurrence of text matching a pattern
consisting of an HTML tag pair, containing exactly two words (tokens). It skips over <i>Test</i> which
has 4 tokens ("Test", "<", "br",
and ">"), and continues up until the second tag pair.
Returns text following a given pattern in a string. Similar to PowerBASIC's Remain, except that that one only deals with characters.
Example:
ucRemain(MainString, "<{tag}>{anything}</{tag}>")
This is <i>Test<br></i>. This is <b>another test</b>.
Testing 123.
Returns: . This is
<b>another test</b>. Testing 123.
Explanation:
This returns text following (but not including) text that matches the pattern.
Returns a string consisting of only text that matches a given pattern. Similar to PowerBASIC's Retain() function except that that one deals only with characters.
Example:
ucRetain(ucFile("\Test\ucFile\cdcatalog.xml"), "<artist>{etc}</artist>", ucNth(22))
Returns: <artist>Luciano Pavarotti</artist>
Explanation: This example is meant to work with
cdcatalog.xml, downloaded from http://www.w3schools.com/xml/cd_catalog.xml
. ucFile() is
a quick way to obtain all the text from one file. ucNth(22)
tells it to retain only the 22th occurrence of the pattern of text between
<artist> and </artist>.
Returns a string consisting of a computed range of values. The variable x is the default iterator.
Example:
ucRange(65, 67, "Chr(x)")
Returns: ABC
ucRange (65, 67, "Chr(x)+Chr(x+32)", ucDelim(", ", "{", "}"))
Returns: {Aa, Bb, Cc}
Explanation: The expression in quotes is evaluated with values
between 65 and 67. A concatenation of
the ASCII Chr() codes for each is returned. In the second example, a upper and lower case
pairs are computed, and they are separated with a delimiter defined by ucDelim. The first
argument of ucDelim is the separator. The optional second and third arguments are
the opening and closing string that will surround the computed range.
Compares two strings. This powerful function compares two strings, not character by character (unless you use ucChar), but by pattern. Just as with other library routines, you can toggle space, quote, block, and case sensitivity, and you can also conveniently use ucSkip() to have it disregard certain parts when doing the operation (string comparison in this case).
Example:
ucEquals("THIS is a test", "this is a test") ' True
ucEquals("THIS, is a <b>test</b>?", "This is a test.") ' False
ucEquals("THIS, is a <b>test</b>?", "This is a test.", ucSkip("{ , | ? | . | <{tag}> }")) ' True
All library routines allow a last argument that configures properties such as case sensitivity, delimiters, etc. If you will generally be using the same combination of properties, instead of passing it to each call, you can configure it before placing calls. Multiple properties are combined using the "+" operator.
Examples:
ucSetStringDefaults(ucSkip("<{tag}>"))
ucEquals("This <b>is</b> a test", "This is a TEST") ' True
ucEquals("This <b>is</b> a test", "This <i>is</i> a test") ' True
ucEquals("This <b>is</b> a test", "This is a test!") ' False
ucSetStringDefaults(ucCase+ucBlock+ucQuote)
ucLen("This (is a) ""test of tests"" ") ' Returns 3
ucLeft("This (is a) ""test of tests"" ", 2) ' Returns "This (is a)"
ucEqual("THIS IS A TEST", "This is a test") ' False
' In uCalc ConsoleCalculator
C:\> uCalc
.> SetStringDefaults(ucChar)
.> Len("This is a test")
. 14
.> Left("This is a test", 3)
. Thi
Other
functions, which can be used in the Options argument here (or with all the
other routines):
ucStart(Position)
ucLimit(Limit)
ucStartAfter(number)
ucStopAfter(number)
ucBetween(a, b)
ucNth(number)
ucPos(x)
ucLength(x)
ucText(x)
ucDelim(separator [, OpeningString, ClosingString])
ucSplit
This
splits a text into parts. By default it
splits the text into tokens. But you can
use some of the same properties described above, or you can split based on a
pattern. Once the text is split, you can
retrieve the parts with ucStringItem(n), where n is a
value between 1 and the number of parts, which is ucStringItemCount().
Example:
ucSplit("This (is a) 'test'.")
Print
ucStringItemCount() ' This would return 9
Print
ucStringItem(1)
' "This"
Print
ucStringItem(2)
' ("
Print
ucStringItem(3)
' is
Print
ucStringItem(4)
' a
Etc.
ucSplit("This<a> is<b> a<x>test<yz>.", "<{tag}>")
Here
<{tag}> is the delimiter. The
values returned by ucStringItem() would be:
This
Is
A
Test