Right now I am thinking more of "limitations of the x86 Processors", but this could
be a good place to discuss specific features as well.
In working on an algorithm for an enhanced INSTR() function, one that would
support case-insensitive matching, I found it very hard to make the comparison efficient, because each character test requires several steps and each step takes time. How long a search takes also depends strongly on the data to which it is applied.
This is because the basic architecture of the x86 family does not have any
instructions optimized for case insensitive operations, though it does have some
instructions specifically designed to work with strings. If I wrote a sub to assist
in the testing, it would likely hold a low limit (say the value of "A") in the lower
byte of the AX register, and a high limit (the value of "Z") in the upper byte of
the AX register. What I would want to know is whether some unknown byte falls
within that range or outside it. If it falls within the range, it is a capital
letter, and I could then force it to lower case with an OR 32 instruction
before attempting to match it against a character set that was already in lower
case.
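The range test described above can be sketched in a few lines. This is only an illustration in C (not PowerBASIC or assembler), and it assumes plain single-byte ASCII text; the names in_range and fold_lower are made up for the example. Subtracting the low limit and comparing as an unsigned byte covers both limits with a single compare, which a compiler can typically turn into one subtract and one compare on x86.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if c lies in the inclusive range [lo, hi].
   After subtracting lo, any byte below lo wraps around to a large unsigned
   value, so one unsigned compare checks both limits at once. */
static int in_range(uint8_t c, uint8_t lo, uint8_t hi)
{
    return (uint8_t)(c - lo) <= (uint8_t)(hi - lo);
}

/* Hypothetical helper: forces ASCII capitals to lower case with OR 32
   and leaves every other byte unchanged. Assumes ASCII only. */
static uint8_t fold_lower(uint8_t c)
{
    return in_range(c, 'A', 'Z') ? (uint8_t)(c | 32) : c;
}
```

For example, fold_lower('Q') gives 'q', while '@' and '[' (the bytes just outside the A to Z range) come back unchanged.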
Now the problem with case-insensitive matching, and with string operations in
general, is that they presume the standard A to Z alphabet and a limited range
of additional symbols, and do not support Unicode. The fact is, Unicode is still
evolving, so who knows where it will eventually lead.
I don't see any way to address the unknowns in the future of Unicode, but
range testing within a character set would be a very useful tool if it were part
of the instruction repertoire of the x86. To emulate this functionality in terms
of individual steps would be laborious and time-consuming.
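To give an idea of what emulating this step by step looks like, here is a rough sketch of a case-insensitive INSTR in C. It is a naive search under stated assumptions, not a tuned implementation: ASCII bytes only, NUL-terminated strings, a 1-based result with 0 for "not found", and 0 for an empty pattern (a choice made for this example). The names fold_ascii and instr_nocase are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: ASCII-only case fold, A..Z become a..z. */
static unsigned char fold_ascii(unsigned char c)
{
    return (unsigned char)(c - 'A') <= 'Z' - 'A' ? (unsigned char)(c | 32) : c;
}

/* Naive case-insensitive search. Returns the 1-based position of the first
   occurrence of pattern in text, or 0 if there is none. Every byte is range
   tested and folded before it is compared, which is the per-character cost
   discussed above. */
static size_t instr_nocase(const char *text, const char *pattern)
{
    if (*pattern == '\0')
        return 0;                     /* assumption: empty pattern never matches */

    for (size_t i = 0; text[i] != '\0'; i++) {
        size_t j = 0;
        /* A NUL in text never equals a folded pattern byte, so the inner
           loop stops at the end of text without reading past it. */
        while (pattern[j] != '\0' &&
               fold_ascii((unsigned char)text[i + j]) ==
               fold_ascii((unsigned char)pattern[j]))
            j++;
        if (pattern[j] == '\0')
            return i + 1;             /* whole pattern matched at offset i */
    }
    return 0;
}
```

Folding the pattern once up front, instead of on every comparison, would remove half of the per-byte work; the sketch keeps both folds inline for brevity.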
One solution for a powerful INSTR with Unicode support is the Microsoft VBScript regular expressions engine, available to PB through COM.
It lets you set the search pattern, the scope, and an ignore-case flag, and it returns a collection of matches.
Each Match object represents one occurrence of the pattern and exposes properties such as Value (the occurrence found), Length (the number of characters in the occurrence), and FirstIndex (the position of the occurrence in the source string).
That's welcome news, José. Perhaps you have a working example of how to
do this, so that people can benefit from your knowledge and experience?
Here is a quick and dirty one. Better performance can be obtained by using direct interface calls instead of Automation, and a collection enumerator instead of the Item property, but it illustrates the idea:
' SED_PBWIN
#COMPILE EXE
#DIM ALL
#INCLUDE "WIN32API.INC"
$PROGID_VBScriptRegExp = "VBScript.RegExp"
INTERFACE DISPATCH VBScriptRegExp
   MEMBER GET Pattern<&H00002711>() AS STRING
   MEMBER LET Pattern<&H00002711>() ' Parameter Type AS STRING
   MEMBER GET IgnoreCase<&H00002712>() AS INTEGER
   MEMBER LET IgnoreCase<&H00002712>() ' Parameter Type AS INTEGER
   MEMBER GET Global<&H00002713>() AS INTEGER
   MEMBER LET Global<&H00002713>() ' Parameter Type AS INTEGER
   MEMBER GET Multiline<&H00002717>() AS INTEGER
   MEMBER LET Multiline<&H00002717>() ' Parameter Type AS INTEGER
   MEMBER CALL Execute<&H00002714>(IN sourceString AS STRING<&H00000000>) AS VARIANT
   MEMBER CALL Test<&H00002715>(IN sourceString AS STRING<&H00000000>) AS INTEGER
   MEMBER CALL Replace<&H00002716>(IN sourceString AS STRING<&H00000000>, _
                                   IN replaceVar AS VARIANT<&H00000001>) AS STRING
END INTERFACE
INTERFACE DISPATCH VBScriptMatch
   MEMBER GET Value<&H00000000>() AS STRING
   MEMBER GET FirstIndex<&H00002711>() AS LONG
   MEMBER GET Length<&H00002712>() AS LONG
   MEMBER GET SubMatches<&H00002713>() AS VARIANT
END INTERFACE
INTERFACE DISPATCH VBScriptMatchCollection
   MEMBER GET Item<&H00000000>(IN index AS LONG<&H00000000>) AS VARIANT
   MEMBER GET Count<&H00000001>() AS LONG
END INTERFACE
INTERFACE DISPATCH VBScriptSubMatches
   MEMBER GET Item<&H00000000>(IN index AS LONG<&H00000000>) AS VARIANT
   MEMBER GET Count<&H00000001>() AS LONG
END INTERFACE
FUNCTION VBInstr (vText AS VARIANT, vPattern AS VARIANT) AS STRING
   LOCAL vRes AS VARIANT
   LOCAL i AS LONG
   LOCAL nCount AS LONG
   LOCAL vItem AS VARIANT
   LOCAL vIdx AS VARIANT
   LOCAL vTRUE AS VARIANT
   LOCAL vFALSE AS VARIANT
   LOCAL strOutput AS STRING
   LOCAL oMatch AS VBScriptMatch
   LOCAL oRegEx AS VBScriptRegExp
   LOCAL oMatches AS VBScriptMatchCollection
   ' Automation expects VARIANT_TRUE (-1) and VARIANT_FALSE (0)
   vTRUE = -1
   vFALSE = 0
   ' Create the regular expression object and set its options
   oRegEx = NEW VBScriptRegExp IN "VBScript.RegExp"
   OBJECT LET oRegEx.Pattern = vPattern
   OBJECT LET oRegEx.Global = vTRUE
   OBJECT LET oRegEx.IgnoreCase = vTRUE
   OBJECT LET oRegEx.MultiLine = vTRUE
   ' Run the search; the result is a collection of Match objects
   OBJECT CALL oRegEx.Execute(vText) TO vRes
   oMatches = vRes
   vRes = EMPTY
   OBJECT GET oMatches.Count TO vRes
   nCount = VARIANT#(vRes)
   ' Walk the collection (zero-based) and report each match
   FOR i = 0 TO nCount - 1
      vIdx = i AS LONG
      OBJECT GET oMatches.Item(vIdx) TO vItem
      IF VARIANT#(vItem) <> %NULL THEN
         oMatch = vItem
         vItem = EMPTY
         OBJECT GET oMatch.Value TO vRes
         strOutput = strOutput & "Found " & VARIANT$(vRes)
         ' FirstIndex is zero-based, unlike the 1-based result of INSTR
         OBJECT GET oMatch.FirstIndex TO vRes
         strOutput = strOutput & " at index " & FORMAT$(VARIANT#(vRes)) & $CRLF
         oMatch = NOTHING
      END IF
   NEXT
   ' Release the COM objects
   oRegEx = NOTHING
   oMatches = NOTHING
   oMatch = NOTHING
   FUNCTION = strOutput
END FUNCTION
FUNCTION PBMAIN
   LOCAL vText AS VARIANT
   LOCAL vPattern AS VARIANT
   LOCAL strOutput AS STRING
   vText = "blah blah a234 blah blah x345 blah blah"
   vPattern = "[A-Z][0-9][0-9][0-9]"
   strOutput = VBInstr(vText, vPattern)
   MSGBOX strOutput
END FUNCTION
Quote: "the use of the Microsoft VBScript regular expressions engine"
I am sure it is always available under Vista.
Are there OS conditions that may require additional updates before this call can be used?
Will it run, for example, on any W2K SP1 system, or will it need updates first?
What about Windows NT?
Win 95 and 98 are not of interest.
Windows 2000 ships with version 5.1. If you don't update, you will still be able to run the above code, but you won't be able to use the new methods added to the RegExp2 interface or the SubMatches collection.
The latest version is 5.6. You can download it at:
http://www.microsoft.com/downloads/details.aspx?familyid=C717D943-7E4B-4622-86EB-95A22B832CAA&displaylang=en
Thanks for the link and the info, José.
My app has to work under W2K "as-is", without the need to get Internet updates first.
That was the reason for the question.
Now I know that even under this condition, your code can be used.