• Welcome to Jose's Read Only Forum 2023.
 

FreeBASIC CWstr

Started by Juergen Kuehlwein, April 09, 2018, 11:39:00 PM



José Roca

#45
Quote
1.) I don't understand: why do you first fill a CWstr with spaces and then overwrite these spaces with the pad character?

Because the FB String function doesn't work with unicode.

Quote
2.) Why would you suppress a null string as the pad character? If I pad with a null string, then it shouldn't be padded at all; that makes sense to me.

Because a CWSTR is null terminated. Padding it with nulls will truncate it when it is cast to a WSTRING.

Quote
3.) And if the string to pad (e.g. 11 characters) is longer than the resulting string (e.g. 10 characters), your code returns only pad characters.

This is a bug. I will use FB's RSET to get the same behavior.


' ========================================================================================
PRIVATE FUNCTION AfxStrRSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
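   DIM cwsPadChar AS CWSTR = wszPadCharacter
   IF cwsPadChar = "" THEN cwsPadChar = " "     ' an empty pad string falls back to a space
   cwsPadChar = LEFT(**cwsPadChar, 1)           ' only the first character is used (** because LEFT doesn't call the cast operator)
   ' FB's STRING function doesn't work with Unicode, so fill with spaces first
   ' and then overwrite every position with the pad character
   DIM cws AS CWSTR = SPACE(nStringLength)
   FOR i AS LONG = 1 TO LEN(cws)
      MID(**cws, i, 1) = cwsPadChar
   NEXT
   ' FB's RSET right-justifies; a source longer than the buffer keeps its leftmost characters
   RSET **cws, wszMainStr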
   RETURN cws
END FUNCTION
' ========================================================================================


Quote
FB's RSET returns the leftmost 10 characters of the string to pad (truncating on the right side), which is quite unexpected to me. PB's RSET$ returns the rightmost 10 characters (the string to pad is truncated on the left side in this case), which seems the most logical choice to me and is what my code does too.

When in Rome do as the Romans do. I'm trying to follow FB rules.
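To make the difference concrete, here is a small check that uses only FB's native STRING type, so it compiles without the CWSTR include. It assumes FB's RSET keeps the leftmost characters when the source is too long, as described above; the RIGHT call only emulates the PowerBASIC-style result for comparison and is not PB code.

```freebasic
' Native FreeBASIC only: how the RSET statement treats short and long sources.
DIM sDst AS STRING = SPACE(10)
RSET sDst, "ABCDEFGHIJK"                 ' 11 characters into a 10-character buffer
PRINT "["; sDst; "]"                     ' FB keeps the leftmost 10 characters

DIM sShort AS STRING = SPACE(10)
RSET sShort, "abc"                       ' shorter source: padded with spaces on the left
PRINT "["; sShort; "]"

' PB-style truncation (keep the rightmost characters), emulated with RIGHT for contrast
DIM sPbStyle AS STRING = RIGHT("ABCDEFGHIJK", 10)
PRINT "["; sPbStyle; "]"
```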

Juergen Kuehlwein

Quote
Because the FB String function doesn't work with unicode.

... but there is a "WSTRING" function I could use.


So this code:


dim cws as CWstr = wstring(nStringLength, LEFT(wszPadCharacter, 1)) + wszMainstr


should work (even if wszPadCharacter is a null string), or do you see problems here?

I'm asking because further above you repeatedly proved me wrong. But if it worked, it would avoid the loop, which should result in a speed gain.


JK

José Roca

> ... but there is a "WSTRING" function I could use.

Duh! For some reason I missed it.

> should work (even if wszPadCharacter is a null string), or do you see problems here?

With dim cws as CWstr = wstring(10, 0) + "12345" you will get "12345" as the result, not ten nulls followed by "12345". Since FB does not natively support a dynamic Unicode string, all the intrinsic FB functions that deal with Unicode strings generate temporary null-terminated strings, and these end at the first null wide character (the double null). Try to fill a WSTRING with nulls and tell me what you get. It would be possible using our own ad hoc methods, but then we would lose integration with the FB intrinsic functions.
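The truncation can be reproduced with FB's native fixed-length WSTRING alone, without CWSTR. This is a minimal sketch; the variable name wTest is just for illustration.

```freebasic
' Native FreeBASIC only: ten null characters followed by "12345".
' The temporary produced by WSTRING(10, 0) is null-terminated, so its
' length is measured as 0 and the concatenation keeps only "12345".
DIM wTest AS WSTRING * 32 = WSTRING(10, 0) + "12345"
PRINT LEN(wTest)      ' length is measured up to the first null character
PRINT wTest
```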



José Roca

Hope that I have got it right now:


' ========================================================================================
' Returns a string containing a left-justified (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters to the right. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrLSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrLSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a right-justified (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters to the left. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrRSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrRSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   IF LEN(wszMainStr) > nStringLength THEN RETURN LEFT(wszMainStr, nStringLength)
   DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, nStringLength - LEN(wszMainStr) + 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a centered (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters on both sides (an odd extra pad character goes to the right). Otherwise,
' the function pads the string with the first character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrCSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrCSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   IF LEN(wszMainStr) > nStringLength THEN RETURN LEFT(wszMainStr, nStringLength)
   DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, (nStringLength - LEN(wszMainStr)) \ 2 + 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================
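For readers without the CWSTR include, the same fill-then-overwrite technique can be sketched with FB's native STRING type. LSetStr, RSetStr and CSetStr are hypothetical names used only for illustration, not part of the framework. Like the functions above, each one allocates a single padded string with STRING() and then overwrites part of it in place with the MID statement; an empty pad string is not handled specially here.

```freebasic
' Hypothetical helpers (not part of WinFBX): native STRING versions of the technique above.
PRIVATE FUNCTION LSetStr (BYREF sMain AS CONST STRING, BYVAL nLen AS LONG, BYREF sPad AS CONST STRING = " ") AS STRING
   DIM s AS STRING = STRING(nLen, sPad)      ' one allocation, filled with the first pad char
   MID(s, 1, LEN(sMain)) = sMain             ' overwrite in place; the MID statement never grows s
   RETURN s
END FUNCTION

PRIVATE FUNCTION RSetStr (BYREF sMain AS CONST STRING, BYVAL nLen AS LONG, BYREF sPad AS CONST STRING = " ") AS STRING
   IF LEN(sMain) > nLen THEN RETURN LEFT(sMain, nLen)   ' same truncation as FB's RSET
   DIM s AS STRING = STRING(nLen, sPad)
   MID(s, nLen - LEN(sMain) + 1, LEN(sMain)) = sMain
   RETURN s
END FUNCTION

PRIVATE FUNCTION CSetStr (BYREF sMain AS CONST STRING, BYVAL nLen AS LONG, BYREF sPad AS CONST STRING = " ") AS STRING
   IF LEN(sMain) > nLen THEN RETURN LEFT(sMain, nLen)
   DIM s AS STRING = STRING(nLen, sPad)
   ' integer division: an odd extra pad character ends up on the right
   MID(s, (nLen - LEN(sMain)) \ 2 + 1, LEN(sMain)) = sMain
   RETURN s
END FUNCTION

PRINT LSetStr("FreeBasic", 20, "*")
PRINT RSetStr("FreeBasic", 20, "*")
PRINT CSetStr("FreeBasic", 20, "*")
```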


Juergen Kuehlwein

Ah, I see you don't need the "LEFT" because "WSTRING" takes the leftmost character by default. Good point, I missed it!

Is there a special reason (speaking of "AfxStrLSet") why you write:


DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
MID(**cws, 1, LEN(wszMainStr)) = wszMainStr
RETURN cws


or would


DIM cws AS CWSTR = wszMainStr + WSTRING(nStringLength, wszPadCharacter)
RETURN MID(**cws, 1, nStringLength)


be ok as well? And why "MID" and not "LEFT"? Is "MID" faster?


Well, there was a mistake due to my incorrect wording. By a null string I meant an EMPTY string, not a chr$(0) or chr$$(0). But this raises another topic I wasn't really aware of: you cannot have chr$$(0) inside a CWstr. Well, you can, but as soon as you use it in a FreeBASIC expression, it gets truncated there. OK, I can live with that restriction.


Thinking about my previous post, I more and more tend to replace a passed empty padding string with a space (this is what you initially did and what PB does), because wszPadCharacter defaults to a space. So coding RSET_(s, count) and RSET_(s, count, "") should return the same result. And what sense would it make to use a padding function and pass a padding string that in effect doesn't pad at all?
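The proposed rule (an empty pad string behaves like the default space) can be sketched as follows. RSetPadded is a hypothetical name, and the sketch uses FB's native STRING instead of CWSTR so that it is self-contained; the truncation of a too-long source follows FB's RSET, as discussed above.

```freebasic
' Hypothetical helper: an empty pad string falls back to the default space,
' so RSetPadded(s, n) and RSetPadded(s, n, "") return the same result.
PRIVATE FUNCTION RSetPadded (BYREF sMain AS CONST STRING, BYVAL nLen AS LONG, BYREF sPadIn AS CONST STRING = " ") AS STRING
   DIM sPad AS STRING = sPadIn
   IF LEN(sPad) = 0 THEN sPad = " "                     ' empty pad string -> space
   IF LEN(sMain) > nLen THEN RETURN LEFT(sMain, nLen)   ' FB RSET-style truncation
   DIM s AS STRING = STRING(nLen, sPad)
   MID(s, nLen - LEN(sMain) + 1, LEN(sMain)) = sMain
   RETURN s
END FUNCTION

PRINT "["; RSetPadded("abc", 6); "]"
PRINT "["; RSetPadded("abc", 6, ""); "]"
PRINT "["; RSetPadded("abc", 6, "-"); "]"
```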


I don't know if our discussion here is of interest to others; this is already a lengthy thread, and it may get even lengthier. So, if you want, we could continue by e-mail. Please drop me a mail at <jk-ide at t minus online dot de> if you agree. Otherwise I will keep posting all CWstr-related stuff here.


Thanks


JK

Johan Klassen

I find the discussion interesting and educational :)

José Roca

#51
Initially, CWSTR was intended to allow the use of embedded nulls. This is why I use UBYTEs instead of WORDs, and why it had more methods, such as ToStr. But then I discovered a way to make it work with the FB intrinsic functions as if it were a native data type. As this implies casting the returned value to a WSTRING, which is the only Unicode data type natively supported by FB, I had to give up the idea of allowing embedded nulls in exchange for ease of use. Anyway, if you need a string with embedded nulls (generally to store binary data), you can use FB's STRING.
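A quick way to see that FB's STRING is binary-safe: its length comes from the string descriptor, not from a terminator, so a null byte in the middle does not shorten it. A minimal sketch; sBin is an illustrative name.

```freebasic
' Native FreeBASIC STRING with an embedded null byte.
DIM sBin AS STRING = "ab-cd"
sBin[2] = 0                ' zero-based byte index: replaces "-" with a null byte
PRINT LEN(sBin)            ' still 5: the length is kept in the descriptor
PRINT RIGHT(sBin, 2)       ' the bytes after the null are still there
```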

Regarding my use of the MID statement, it is simply a way to avoid creating multiple intermediate strings.

With


DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
MID(**cws, 1, LEN(wszMainStr)) = wszMainStr


We create just one intermediate string with WSTRING(nStringLength, wszPadCharacter), which we assign to cws, and then we modify it directly with MID(**cws, 1, LEN(wszMainStr)) = wszMainStr.

With


DIM cws AS CWSTR = wszMainStr + WSTRING(nStringLength, wszPadCharacter)
RETURN MID(**cws, 1, nStringLength)


We create three intermediate strings: one with WSTRING(nStringLength, wszPadCharacter), another to concatenate it with wszMainStr, and another with RETURN MID(**cws, 1, nStringLength). MID as a function creates a temporary string; MID as a statement doesn't.
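The difference between the two forms of MID can be seen with a native STRING: the statement writes into the existing buffer and never changes its length, while the function returns a new temporary string. A minimal sketch with illustrative variable names.

```freebasic
DIM sBuf AS STRING = "----------"       ' 10 characters
MID(sBuf, 1, 3) = "abcdef"               ' statement: writes at most 3 chars into sBuf, in place
PRINT sBuf                                ' length is still 10
DIM sPart AS STRING = MID(sBuf, 2, 4)    ' function: returns a new temporary string
PRINT sPart
```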

I prefer to post in a forum; otherwise I may have to repeat the same explanations several times. Most of my FreeBasic posts can be found in the Planet Squires forum because Paul and I have always worked very well together: I have written the framework and Paul is working on the editor and visual designer.

José Roca

Quote
Thinking about my previous post, I more and more tend to replace a passed empty padding string with a space (this is what you initially did and what PB does), because wszPadCharacter defaults to a space. So coding RSET_(s, count) and RSET_(s, count, "") should return the same result. And what sense would it make to use a padding function and pass a padding string that in effect doesn't pad at all?

One good thing about FreeBasic is that it allows optional parameters even if the parameter is not at the end of the list. If it is in the middle, followed by another optional or non-optional parameter, you can omit it with ,, (as in Visual Basic). Overloading and multiple constructors are also a godsend.
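A small sketch of an optional parameter in the middle of the list, omitted at the call site with ,,. Greet is a hypothetical function used only for illustration.

```freebasic
' Hypothetical function: the optional sGreeting sits between two required parameters.
PRIVATE FUNCTION Greet (BYREF sName AS CONST STRING, BYREF sGreeting AS CONST STRING = "Hello", BYREF sPunct AS CONST STRING) AS STRING
   RETURN sGreeting & ", " & sName & sPunct
END FUNCTION

PRINT Greet("Jose", "Hi", "!")    ' every argument supplied
PRINT Greet("Jose", , "!")        ' middle argument omitted with ,, so the default "Hello" is used
```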

José Roca

We can work with arrays of CWSTRs as easily as with arrays of STRINGs.

A two-dimensional array


DIM rg2 (1 TO 2, 1 TO 2) AS CWSTR
rg2(1, 1) = "string 1 1"
rg2(1, 2) = "string 1 2"
rg2(2, 1) = "string 2 1"
rg2(2, 2) = "string 2 2"
print rg2(2, 1)


REDIM PRESERVE / ERASE


REDIM rg(0) AS CWSTR
rg(0) = "string 0"
REDIM PRESERVE rg(0 TO 2) AS CWSTR
rg(1) = "string 1"
rg(2) = "string 2"
print rg(0)
print rg(1)
print rg(2)
ERASE rg


When the array is destroyed, because we erase it or because it goes out of scope, the destructor of each CWSTR is called, so you don't have to worry about memory leaks.
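The same mechanism can be observed with any TYPE that has a destructor. This sketch uses a hypothetical Tracker type (not CWSTR) that counts destructor calls, to show that ERASE destroys every element of a dynamic array.

```freebasic
DIM SHARED gDestroyed AS LONG = 0

' Hypothetical type: its destructor just counts how often it runs
TYPE Tracker
   nId AS LONG
   DECLARE DESTRUCTOR ()
END TYPE

DESTRUCTOR Tracker ()
   gDestroyed += 1
END DESTRUCTOR

REDIM rg(0 TO 2) AS Tracker   ' three elements
ERASE rg                      ' ERASE runs the destructor of each element
PRINT gDestroyed
```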

José Roca

#54
Thanks to this flexibility, the CVar class, which implements support for variants, is much more powerful and easier to use than PowerBasic's support for them.

We can also have arrays of CVar:


DIM rg(1 TO 2) AS CVAR
rg(1) = "string"
rg(2) = 12345.12
print rg(1)
print rg(2)


And even dynamic arrays of CVar in UDTs:


TYPE MyType
  rg(ANY) AS CVAR
END TYPE

DIM t AS MyType
REDIM t.rg(1 TO 2) AS CVAR
PRINT LBOUND(t.rg)
PRINT UBOUND(t.rg)

t.rg(1) = "String"
t.rg(2) = 12345.12

print t.rg(1)
print t.rg(2)


They can also be used in expressions together with other data types and literals, e.g.:


DIM cws AS CWSTR = "Test string"
DIM cv AS CVAR = 12345.67
PRINT cws & " " & cv & " mixing strings and variants"


Juergen Kuehlwein

A speed test shows that your code ("MID") is indeed about one third faster than mine. You know more about FreeBASIC intrinsics than I do, since I'm fairly new to FreeBASIC, and that's why I (will) keep asking...


Quote
Overloading and multiple constructors are also a godsend.


Amen, brother, I absolutely agree!


It is still astonishing to me how you managed to integrate these data types into a language that initially wasn't written for such data types! For me, when using a "tool" like this, it is always most important to learn about its capabilities and limitations in order to make the most of it. Therefore I want to understand as much as possible of how it works and why.


Thanks again


JK

Juergen Kuehlwein

The "@" operator returns the address of the CWstr class. What would you need this address for at all? Wouldn't it be better (more consistent with the other existing string types) if "@" returned a WSTRING PTR to the WSTRING data?


This code:


SELECT CASE AfxStrRSet(s, 20)



fails to compile because it returns a CWSTR. How can I make it return a WSTRING (even if this slows down execution), which is accepted? Or does this cause other problems?

What I have in mind is a one-for-all solution that can be used consistently for all available string types (STRING, ZSTRING, WSTRING and CWSTR), even at the price of slower execution. If I need more speed, I can use the more specialized, type-specific functions.


JK

José Roca

#57
Quote
The "@" operator returns the address of the CWstr class. What would you need this address for at all? Wouldn't it be better (more consistent with the other existing string types) if "@" returned a WSTRING PTR to the WSTRING data?

To pass a pointer to the class to a procedure that has a parameter declared as CWSTR PTR, to store it in a UDT that has a member declared as CWSTR PTR, etc. Besides, changing the behavior of the @ operator to return a WSTRING PTR to the WSTRING data won't work with your SELECT CASE, because you would still need to dereference it. This is what ** does.

Quote
SELECT CASE AfxStrRSet(s, 20)
fails to compile because it returns a CWSTR. How can I make it return a WSTRING (even if this slows down execution), which is accepted? Or does this cause other problems?

To return a WSTRING you would need to declare the return type as BYREF AS WSTRING and, as you're returning a reference, you would need to make the variable static, with the problems that we already discussed in the first posts when you wanted to overload the Left operator. Do you remember?

You can use SELECT CASE AfxStrRSet(s, 20).wstr or SELECT CASE **AfxStrRSet(s, 20).

Quote
What I have in mind is a one-for-all solution that can be used consistently for all available string types (STRING, ZSTRING, WSTRING and CWSTR), even at the price of slower execution. If I need more speed, I can use the more specialized, type-specific functions.

This time, it's not a matter of speed. While STRING, ZSTRING and WSTRING are primitive types natively supported by the compiler, CWSTR is a class (or TYPE), and when you use the @ operator with a type, it returns the address of the type, not of one of its members.

Besides, the cast operator of the CWSTR class should automatically do what you want, and it does in most cases, except with LEFT, RIGHT, VAL and SELECT CASE, because for these keywords FB doesn't call the cast operator of the class. This is a reported bug.

Quote
What I have in mind is a one-for-all solution that can be used consistently for all available string types (STRING, ZSTRING, WSTRING and CWSTR), even at the price of slower execution. If I need more speed, I can use the more specialized, type-specific functions.

I would also like many things, but I have to settle for what can be done.

Juergen Kuehlwein

Please forgive my ignorance. Yes, I was thinking in circles; we have already been there!


Let's summarize what I have learned (please contradict me if something is wrong):


CWstr is a type, which is different from an intrinsic variable type. The "@" operator returns the address of the type and not of its data. This is consistent with other types, and it is necessary if you want to be able to work with pointers to this type.

You can access the data of a type through its members, its member functions and its operators. In particular, the "CAST" operator allows accessing the type's data by its plain name (identifier); for instance, you can code "mytype" instead of "mytype.data".
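A minimal sketch of that mechanism with a hypothetical TagText type (not CWSTR): its CAST operator is called implicitly when the variable is assigned to a STRING, so the plain identifier stands in for an explicit member access.

```freebasic
' Hypothetical type with a CAST operator to STRING
TYPE TagText
   m_text AS STRING
   DECLARE OPERATOR CAST () AS STRING
END TYPE

' Called implicitly whenever a TagText is used where a STRING is expected
OPERATOR TagText.CAST () AS STRING
   OPERATOR = "[" & m_text & "]"
END OPERATOR

DIM tt AS TagText
tt.m_text = "abc"
DIM sOut AS STRING
sOut = tt                 ' implicit conversion through CAST
PRINT sOut
PRINT tt.m_text           ' explicit member access, for comparison
```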


Unfortunately, not every FreeBASIC keyword invokes the "CAST" operator properly, which is a known bug.


To overcome this shortcoming you "misuse" the "*" operator. Instead of dereferencing the type's pointer, you made it return a "WSTRING PTR". By prepending another "*" you dereference the returned pointer, and FreeBASIC gets to see a BYREF WSTRING (which is what the "CAST" operator does too but, as we know now, it doesn't always work as it should).


If we return a CWstr from a function as "BYREF WSTRING" (which is possible), the CWstr must not be local to that function, because it goes out of scope when the function exits (invalidating the data -> memory corruption or GPF).

If we return a CWstr as a CWstr, the data is still valid even though the local CWstr goes out of scope, because the result is returned by value as a CWstr. But then some FreeBASIC keywords complain of an "invalid data type"...


So it boils down to three ways to go:
1.) fix the compiler
2.) accept a workaround ("**" or .wstr) in certain cases
3.) find a way to return a (local) CWstr or WSTRING as a BYREF WSTRING without invalidating the data and without producing memory leaks.


Do you agree so far?


JK


José Roca

I will add a fourth way: add native support for dynamic Unicode strings to the compiler. This would be the ideal solution, but unfortunately I don't think it will ever happen because of the complexity of doing it in a cross-platform compiler.

The wstr method does the same as the cast operator:


' ========================================================================================
' Returns the string data (same as **).
' ========================================================================================
PRIVATE OPERATOR CWstr.CAST () BYREF AS WSTRING
   OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR
' ========================================================================================
' ========================================================================================
' Returns the string data (same as **).
' ========================================================================================
PRIVATE FUNCTION CWstr.wstr () BYREF AS WSTRING
   RETURN *cast(WSTRING PTR, m_pBuffer)
END FUNCTION
' ========================================================================================


But you have to call it explicitly.

** is a shortcut. It may look strange to some, but it is the only operator that can be double dereferenced, and it serves two purposes: a single * returns the address of the CWSTR buffer, and ** dereferences the string data. Some didn't like it, and this is why I added the wstr method.
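The ** trick can be illustrated with a toy type. MiniWStr is hypothetical and uses a fixed WSTRING buffer instead of CWstr's dynamic m_pBuffer; the sketch only assumes, as described above, that unary * can be overloaded globally for a TYPE to return a WSTRING PTR. Note that @ still returns the address of the TYPE itself.

```freebasic
' Hypothetical toy type standing in for CWstr
TYPE MiniWStr
   m_buf AS WSTRING * 64          ' fixed-size buffer instead of a dynamic one
END TYPE

' Unary * overloaded for the type: *x yields a WSTRING PTR to the buffer
OPERATOR * (BYREF x AS MiniWStr) AS WSTRING PTR
   OPERATOR = @x.m_buf
END OPERATOR

DIM ws AS MiniWStr
ws.m_buf = "FreeBasic"

DIM pData AS WSTRING PTR = *ws    ' one * : address of the string data
PRINT *pData
PRINT **ws                         ' two * : the string data itself, as a WSTRING

DIM pObj AS MiniWStr PTR = @ws    ' @ returns the address of the TYPE, not of its data

SELECT CASE **ws                   ' ** gives SELECT CASE a plain WSTRING to compare
   CASE "FreeBasic" : PRINT "matched"
   CASE ELSE : PRINT "no match"
END SELECT
```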