FreeBASIC CWstr

Started by Juergen Kuehlwein, April 09, 2018, 11:39:00 PM


José Roca

#75
In this code


PRIVATE FUNCTION Remove_ overload (BYREF w AS WSTRING, byval anyflag as long = 0, BYREF m AS WSTRING, _
                                   byval iflag as long = 0) AS ustring
DIM u    AS ustring = w


if you have declared USTRING as CBSTR, the passed w AS WSTRING is detected as a BSTR and it is ATTACHED, not copied, whereas if USTRING is declared as CWSTR, it is copied, not attached.

Attaching was needed because FB does not make a distinction between a BSTR and a WSTRING, since BSTR is not supported. Therefore, the CBSTR constructor checks if it is a WSTRING or a BSTR, and attaches the handle if it is a BSTR or copies the contents if it is a WSTRING. If we always copied, the intermediate BSTRings would never be freed and we would get memory leaks.


' ========================================================================================
PRIVATE CONSTRUCTOR CBStr (BYREF bstrHandle AS AFX_BSTR = NULL, BYVAL fAttach AS LONG = TRUE)
   CBSTR_DP("--BEGIN CBSTR CONSTRUCTOR AFX_BSTR - handle: " & .WSTR(bstrHandle) & " - Attach: " & .WSTR(fAttach))
   IF bstrHandle = NULL THEN
      m_bstr = SysAllocString("")
      CBSTR_DP("CBSTR CONSTRUCTOR SysAllocString - " & .WSTR(m_bstr))
   ELSE
      ' Detect if the passed handle is an OLE string
      ' If it is an OLE string it must have a descriptor; otherwise, it doesn't
      ' Get the length in bytes looking at the descriptor and divide by 2 to get the number of
      ' unicode characters, that is the value returned by the FreeBASIC LEN operator.
      DIM Res AS INTEGER = PEEK(DWORD, CAST(ANY PTR, bstrHandle) - 4) \ 2
      ' If the retrieved length is the same as the one returned by LEN, then it must be an OLE string
      IF Res = .LEN(*bstrHandle) AND fAttach <> FALSE THEN
         CBSTR_DP("CBSTR CONSTRUCTOR AFX_BSTR - Attach handle: " & .WSTR(bstrHandle))
         ' Attach the passed handle to the class
         m_bstr = bstrHandle
      ELSE
         CBSTR_DP("CBSTR CONSTRUCTOR AFX_BSTR - Alloc handle: " & .WSTR(bstrHandle))
         ' Allocate an OLE string with the contents of the string pointed to by bstrHandle
         m_bstr = SysAllocString(*bstrHandle)
      END IF
   END IF
   CBSTR_DP("--END CBSTR CONSTRUCTOR AFX_BSTR - " & .WSTR(m_bstr))
END CONSTRUCTOR
' ========================================================================================
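
The length check in the constructor above can be tried in isolation. What follows is only a minimal sketch under stated assumptions, not the framework's code: it does not call the Windows API, but builds a buffer laid out the way a BSTR is (a 4-byte byte count, then the text, then a null). MakeFakeBstr and LooksLikeBstr are hypothetical names, and SIZEOF(WSTRING) stands in for the hard-coded 2 so the sketch also runs where wide characters are not 2 bytes.

```freebasic
' Simulation only: no real BSTR is allocated and no Windows API is called.
' MakeFakeBstr and LooksLikeBstr are hypothetical helpers, not framework functions.

' Allocate a buffer with a BSTR-like layout and return a pointer to the text part.
FUNCTION MakeFakeBstr (BYREF text AS CONST WSTRING) AS WSTRING PTR
   DIM nBytes AS LONG = LEN(text) * SIZEOF(WSTRING)
   ' 4 bytes for the prefix + the characters + one terminating null character
   DIM p AS UBYTE PTR = CALLOCATE(4 + nBytes + SIZEOF(WSTRING))
   *CAST(LONG PTR, p) = nBytes                   ' length prefix, in bytes
   DIM pw AS WSTRING PTR = CAST(WSTRING PTR, p + 4)
   *pw = text                                    ' copy the characters after the prefix
   RETURN pw
END FUNCTION

' Same test as the constructor: prefix (in bytes) converted to characters vs. LEN.
' Returns a nonzero value when both lengths agree.
FUNCTION LooksLikeBstr (BYVAL pw AS WSTRING PTR) AS LONG
   DIM prefix AS LONG = *CAST(LONG PTR, CAST(UBYTE PTR, pw) - 4)
   RETURN (prefix \ SIZEOF(WSTRING)) = LEN(*pw)
END FUNCTION

DIM pFake AS WSTRING PTR = MakeFakeBstr("Hello")
IF LooksLikeBstr(pFake) THEN PRINT "prefix matches LEN: would be attached"

' Overwrite the prefix to mimic memory that is not a BSTR descriptor.
*CAST(LONG PTR, CAST(UBYTE PTR, pFake) - 4) = 999
IF LooksLikeBstr(pFake) = 0 THEN PRINT "prefix differs from LEN: would be copied"

' Free the whole buffer, starting at the prefix.
DEALLOCATE(CAST(UBYTE PTR, pFake) - 4)
```

Note that a plain WSTRING has nothing meaningful in the four bytes before its text, so a check of this kind is a heuristic: a coincidental match is conceivable.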


To force a copy, you need to change DIM u AS ustring = w to DIM u AS ustring = CWSTR(w), but then w won't be freed.

If I have reserved the use of CBSTR to COM it is for a good reason.



José Roca

In


PRIVATE FUNCTION AfxStrReplace OVERLOAD (BYREF wszMainStr AS CONST WSTRING, BYREF wszMatchStr AS WSTRING, BYREF wszReplaceWith AS WSTRING) AS CWSTR
   DIM cwsMainStr AS CWSTR = wszMainStr


as cwsMainStr is declared as a CWSTR, wszMainStr will always be copied, not attached. Therefore, this code works:


DIM cbs AS CBSTR = "1234567890"
print AfxStrReplace(cbs, "5", "x")
print cbs


Now change DIM cwsMainStr AS CWSTR = wszMainStr to DIM cwsMainStr AS CBSTR = wszMainStr and you will be asking for trouble.

Juergen Kuehlwein

OK - i didn´t know or at least i didn´t understand that!


so, when getting passed a CBstr (or an OLE wide string) like this



FUNCTION somefunc(b as CBstr) AS LONG

DIM b1 as CBstr = b

...



b1 is not a copy of b (as i would expect) but is in fact b itself, because only the OLE handle has been copied and not the data. This means when b1 goes out of scope, b is destroyed as well.

This is not what i would call "regular" string behavior! You need this for COM where in special cases the caller is responsible for freeing the passed OLE string - right ?

If this is the case, then i have two more questions:

1.) how does PowerBASIC handle this situation? I´ve never come across this in PB (maybe my fault, i´m by far not as much a COM expert as you are).

2.) if this is for special cases only, wouldn´t it have been better to have a special "attach" operator for exactly these special cases, instead of making it a standard behavior, which opens unexpected traps.


The reason i defined USTRING as CBstr (and not as CWstr, which of course is possible) is, that i hoped to have a "one for all" wide string type. A type which basically works everywhere, without having to make decisions where to use this and where to use that. When it´s about heavy string manipulation and i want more speed i can always implement CWstr for this - that´s the idea behind it.


I think, your approach was to have a separate OLE wide string type ONLY for COM, not only because it must be an OLE string, but also for implementing automatic freeing of the passed string handle. That is, when using CBstr with COM you don´t have to care about when to free passed strings and when not to - your CBstr does it automatically for you. Is this correct ?


Would it make a CBstr usable for all situations (which currently is not possible as i have learned), if the CBstr type could decide, if it is assigned a standard OLE wide string (which could happen in COM only, and in which case it should copy the handle) or if it is assigned another CBstr (which cannot be a parameter passed from COM, and in which case it should copy the data) ? Or are there still other reasons, why a CBstr cannot be used just like a CWstr ? (Maybe i have an idea how to make such a decision possible)


JK

José Roca

#78
Quote
OK - i didn´t know or at least i didn´t understand that!
so, when getting passed a CBstr (or an OLE wide string) like this

FUNCTION somefunc(b as CBstr) AS LONG
DIM b1 as CBstr = b
...


b1 is not a copy of b (as i would expect) but is in fact b itself, because only the OLE handle has been copied and not the data. This means when b1 goes out of scope, b is destroyed as well.

No. In this case, b1 will be a copy. As both b and b1 are CBSTRings, this is the constructor that will be called:


PRIVATE CONSTRUCTOR CBStr (BYREF cbs AS CBStr)
   m_bstr = SysAllocString(cbs)
END CONSTRUCTOR


Otherwise, both b and b1 will try to free the same memory!
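
The same rule can be shown without COM. This is a minimal sketch, assuming a hypothetical OwnedText type that owns a heap buffer the way CBSTR owns a BSTR handle; it is not the real CBSTR, and it uses CALLOCATE/DEALLOCATE in place of SysAllocString/SysFreeString.

```freebasic
' Hypothetical owner type for illustration only; not the framework's CBSTR.
TYPE OwnedText
   DECLARE CONSTRUCTOR (BYREF text AS CONST WSTRING)
   DECLARE CONSTRUCTOR (BYREF other AS OwnedText)   ' copy constructor
   DECLARE DESTRUCTOR ()
   m_p AS WSTRING PTR
END TYPE

CONSTRUCTOR OwnedText (BYREF text AS CONST WSTRING)
   ' Room for the characters plus the terminating null
   m_p = CALLOCATE((LEN(text) + 1) * SIZEOF(WSTRING))
   *m_p = text
END CONSTRUCTOR

CONSTRUCTOR OwnedText (BYREF other AS OwnedText)
   ' Deep copy: allocate a new buffer instead of sharing other.m_p,
   ' so each object frees only its own memory.
   m_p = CALLOCATE((LEN(*other.m_p) + 1) * SIZEOF(WSTRING))
   *m_p = *other.m_p
END CONSTRUCTOR

DESTRUCTOR OwnedText ()
   IF m_p THEN DEALLOCATE(m_p)
END DESTRUCTOR

SCOPE
   DIM a AS OwnedText = "1234567890"
   DIM b AS OwnedText = a                 ' calls the copy constructor
   IF a.m_p <> b.m_p THEN PRINT "a and b own different buffers"
   PRINT *b.m_p
END SCOPE                                 ' both destructors run here, one free per buffer
```

If the copy constructor stored other.m_p directly, both destructors would free the same pointer at END SCOPE.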

CBSTR will also make a copy if the parameter is a CWSTR, an ANSI string, a literal or a WSTRING, but will attach it if the passed parameter is a BSTR (although the parameter has been declared as a WSTRING because FB does not natively support BSTRings).

Quote
if this is for special cases only, wouldn´t it have been better to have a special "attach" operator for exactly these special cases, instead of making it a standard behavior, which opens unexpected traps.

The constructor has an optional fAttach parameter:


CONSTRUCTOR CBStr (BYREF bstrHandle AS AFX_BSTR = NULL, BYVAL fAttach AS LONG = TRUE)


When calling a COM function that returns a BSTR, you can do

DIM cbs AS CBSTR = <some function>  ' fAttach defaults to TRUE

or

DIM cbs AS CBSTR = (<some function>, FALSE)

but how are you going to pass this parameter when using the FB intrinsic string functions?

Quote
how does PowerBASIC handle this situation?

PowerBasic natively supports BSTRings and knows when it has to allocate and free them. If FB also had native support for BSTRings there would be no problems, but as it only supports WSTRINGs, its intrinsic functions are prepared to free the temporary WSTRINGs that they generate, but they have no idea of what to do with BSTRings.

Quote
I think, your approach was to have a separate OLE wide string type ONLY for COM, not only because it must be an OLE string, but also for implementing automatic freeing of the passed string handle. That is, when using CBstr with COM you don´t have to care about when to free passed strings and when not to - your CBstr does it automatically for you. Is this correct ?

Yes, and also for efficiency. If the return type is a CBSTR, I can simply return a BSTR, that will be attached. Otherwise, I will have to create a temporary CBSTR, copy the contents of the BSTR to it, free the BSTR and return the temporary CBSTR, whose contents will be copied again.

Quote
The reason i defined USTRING as CBstr (and not as CWstr, which of course is possible) is, that i hoped to have a "one for all" wide string type. A type which basically works everywhere, without having to makes decisions where to use this and where to use that. When it´s about heavy string manipulation and i want more speed i can always implement CWstr for this - that´s the idea behind it.

I know, but I have warned you that they are not interchangeable. BSTRings are managed by the Windows COM library, not FreeBasic. The first string class that I tried to write was CBSTR and I lost countless hours trying to solve all the problems. Finally, I decided to write CWSTR for general use and relegate CBSTR to COM use.

Of course, you can try to write your "interchangeable" BSTR class. If you still have some hair in your head, you will lose it.

Juergen Kuehlwein

Well, looking closer at your code for CBstr, you already did, what i had in mind. If a CBstr is assigned another CBstr, in fact it creates a new string and copies its data. It attaches only, if it is assigned an OLE string, which isn´t a CBstr.


So as a consequence - it already does, what i want, if i code it like this:



PRIVATE FUNCTION Remove_ overload (BYREF w AS USTRING, byval anyflag as long = 0, BYREF m AS WSTRING, _
                                   byval iflag as long = 0) AS ustring
DIM u    AS ustring = w



Do you see other problems with this approach in general (other than to have to adapt my code in some places)?


JK




Juergen Kuehlwein

Oh, i see we cross posted!


Quote
If you still have some hair in your head, you will lose it.


I just had a look in the mirror - enough hair for many years to come (even if i must admit, there were times when there were even more) ;-).

Juergen Kuehlwein

#81
José,


it´s not about writing an interchangeable BSTR class.

I want to use CBstr as the standard wide string class instead of CWstr. In a previous post i asked, if i could implement a CBstr wherever i can implement a CWstr, your answer was - yes, but CWstr are faster (if i recall this correctly). 


I repeat my question,



PRIVATE FUNCTION Remove_ overload (BYREF w AS USTRING, byval anyflag as long = 0, BYREF m AS WSTRING, _
                                   byval iflag as long = 0) AS ustring
DIM u    AS ustring = w


... avoids the ambiguity of "BYREF w AS WSTRING" for the receiving CBstr ...


do you see other problems with this approach (defining all ingoing strings as USTRING = CBstr) in general (arising from the fact that i explicitly use a CBstr here, other than having to adapt my code in some places)? I accept, that this may not be the fastest possible way in favour of having a generic way. If there is need for speed, i can switch to CWstr.


JK

José Roca

#82
You can do it, but in an inefficient way, using only intrinsic functions, just as a beginner would do it.


PRIVATE FUNCTION _StrRemove OVERLOAD (BYREF wszMainStr AS USTRING, BYREF wszMatchStr AS WSTRING) AS USTRING
   DIM ustr AS USTRING = wszMainStr
   DIM nLen AS LONG = LEN(wszMatchStr)
   DO
      DIM nPos AS LONG = INSTR(**ustr, wszMatchStr)
      IF nPos = 0 THEN EXIT DO
      ustr = MID(ustr, 1, nPos - 1) & MID(ustr, nPos + nLen)
   LOOP
   RETURN ustr
END FUNCTION


Multiple concatenations, multiple creation/destruction of temporary types, multiple assignments. You can say goodbye to any speed advantage when defining USTRING as CWSTR.

What I wonder is what advantage do you think you will have using CBSTR as your general data type.

Juergen Kuehlwein

Well, the advantage would be to have a universal string data type for all possible implementations!

Not everyone has your expertise and experience in coding. Look at me, i dare say, i´m definitely not a beginner, but  i had to ask a lot of questions (and maybe will have to) in order to be able to implement your work properly. Not everyone has such a long breath like me, asking and asking again until the last uncertainty is fixed.

Implementing your work into my IDE i want to present an easy to use "interface", which just works (without too many restrictions and special cases) Everyone who wants to dig deeper and wants to make the most out of it, can do so and will have to learn what i had to learn about it. But nobody should be forced to do so (my point of view)!

Maybe this is a matter of design philosophy and where to set the border, you decided to set the border when it comes to COM. Which is a logical choice - a new area requiring a new data type. Coming from PowerBASIC, where this "border" doesn´t exist, i think it would be nice to have it like there. And it would make things easier for newbies in COM.

Let´s see, what´s possible - i´m almost certain there will be more questions. Thanks for your patience!


JK



José Roca

Well, not fully universal. Don't use USTRING defined as CWSTR with COM and don't use USTRING defined as CBSTR with the functions of my framework that use CWSTR in the internal code. I think that it is a bad idea and can't anticipate all the troubles that these redefinitions can cause.

Juergen Kuehlwein

Quote
Don't use USTRING defined as CWSTR with COM

this is exactly what i have in mind:

- if your framework is not included, i will define USTRING to be a clone of CWstr. How should someone use COM without your framework? So, no problem here. 

- if your framework is included, i will define USTRING to be a CBstr, which works universally, if i adapt my functions accordingly. Your framework isn´t affected in any way, because it is written with the original definitions (a CWstr remains a CWstr and a CBstr remains a CBstr there). And as you said passing a CBstr to a function expecting a CWstr is no problem at all (if you drop the speed loss for the conversions).

The only critical situation would be, what i initially coded: passing a CBstr to a function, which expects a "byref wstring" and this wstring gets assigned to a CBstr inside the function, which would result in an unwanted "attach" rather than a "copy". I searched your framework for such a construct and couldn´t find any! So, if this is true (please contradict, if i´m wrong), we should be on the safe side regarding this.

FreeBASIC allows for overloading, so i could optimize my functions for CBstr AND CWstr separately and the "#ifdef" metastatement allows for "activating" the ones needed for the specific situation (with your framework included or without). I can have a universal wide string type (which is always "USTRING") and in case your framework is included, i have an additional wide string type for optimum speed (CWstr) and i have specialized functions for both, which share the same syntax - finally i´m getting nearer, where i wanted to get!


JK

José Roca

#86
Well, in post #64, I said: "I think that it is better that you add overloaded functions to work with your UISTRING."

In my string functions, I'm using BYREF CONST AS WSTRING for two reasons:

1.- One function fits all. It works with all the string data types: STRING, ZSTRING, WSTRING, CWSTR and CBSTR, and also string literals and CVARs.

2.- It is more efficient when passing a CWSTR or CBSTR because no conversion is performed since what is being passed is a pointer to the string data.
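
The first point can be tried with the intrinsic types alone. This is a minimal sketch, assuming a hypothetical CountChars function that is not part of the framework; CWSTR, CBSTR and CVAR are left out because they require the framework headers.

```freebasic
' CountChars is a hypothetical example function, not a framework function.
' A single BYREF CONST WSTRING parameter accepts several string types.
FUNCTION CountChars (BYREF text AS CONST WSTRING) AS LONG
   RETURN LEN(text)
END FUNCTION

DIM s AS STRING = "abc"
DIM z AS ZSTRING * 16 = "abcde"
DIM w AS WSTRING * 16 = "abcd"

PRINT CountChars("ab")   ' string literal, 2 characters
PRINT CountChars(s)      ' STRING, converted to a temporary WSTRING by the compiler
PRINT CountChars(z)      ' ZSTRING, also converted
PRINT CountChars(w)      ' WSTRING, passed by reference without conversion
```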

Microsoft didn't implement the Automation data types with speed in mind: BSTRings, VARIANTs and SAFEARRAYs are somewhat inefficient. Automation was designed mainly for Visual Basic, VBScript and Office. It is very flexible, but slow, and a pain to use with languages that don't support it natively.

Because the use of VARIANTs and SAFEARRAYs is sometimes unavoidable with COM, I have implemented CVAR and CSafeArray. My implementation of these data types is much more powerful and flexible than the PowerBasic ones.

Here you have a function that works with all the string data types:


#include once "Afx\CVAR.inc"

PRIVATE FUNCTION StrRemove (BYREF cvMainStr AS CVAR, BYREF cvMatchStr AS CVAR) AS CVAR
   DIM cv AS CVAR = cvMainStr
   DIM nLen AS LONG = LEN(cvMatchStr.wstr)
   DO
      DIM nPos AS LONG = INSTR(cv, cvMatchStr)
      IF nPos = 0 THEN EXIT DO
      cv = MID(cv, 1, nPos - 1) & MID(cv, nPos + nLen)
   LOOP
   RETURN cv
END FUNCTION

print StrRemove("Hello World. Welcome to the Freebasic World", "World")
DIM s AS STRING = "Hello World. Welcome to the Freebasic World"
PRINT StrRemove(s, "World")
DIM cws AS CWSTR = "Hello World. Welcome to the Freebasic World"
PRINT StrRemove(cws, "World")
DIM cbs AS CBSTR = "Hello World. Welcome to the Freebasic World"
PRINT StrRemove(cbs, "World")


Can you do something like this with PowerBasic support for variants?

It also works without problems with my existing string functions, so there is no need to write new ones:


DIM cv AS CVAR = "Hello World. Welcome to the Freebasic World"
PRINT AfxStrRemove(cv, "World")


You can also use them with the FB intrinsic functions:


DIM cv AS CVAR = "Test string"
cv = cv & " 123"
cv = cv & 45
cv += " - some more text"
print cv
PRINT LEFT(cv, 4)


They can store almost any data type:


DIM cv AS CVAR = "Test string"
DIM cv2 AS CVAR = 12345.67
print cv & " " & cv2


You can have arrays, safe arrays, associative arrays, stacks and queues...

If speed doesn't worry you, maybe this is the universal data type you're looking for... :)

Note: The use of my framework is required. Sorry.

Juergen Kuehlwein

José,


it´s not about getting rid of your framework!

BTW,  « Last Edit: Today at 04:30:05 AM by José Roca » when do you sleep, do you sleep at all?


Coming back to the "universal" thing: in post #35 you wrote

Quote
DIM cbs AS CBSTR, and pass cbs or cbs.sptr to IN parameters and cbs.vptr to OUT/INOUT parameters.

This means for IN parameters (to be read only) you pass a pointer to the actual data, which in turn means, i could pass a CWstr as well, i could even pass a WSTRING PTR - is this correct?

For an OUT/INOUT parameter (returned string/parameter which might be modified) you pass an OLE string handle. Therefore it MUST be a CBstr - and you must pass it as "CBstr.vptr" (not only "CBstr", which would pass the data - different syntax!). In case of a return value of a method, CBstr recognizes if it is receiving a BSTR or not, and acts accordingly (no different syntax - but it MUST be a CBstr for proper working).

So in cases, where i must pass a BSTR to COM, i CANNOT have a consistent syntax, i MUST have "...vptr" anyway - is this correct?


If this is correct, in fact i don´t have any advantage defining USTRING as CBstr, CWstr would be the better choice. But at least having a CWstr as IN parameter in COM shouldn´t be a problem then.


JK

José Roca

#88
You can't pass a CWSTR or a WSTRING to a COM procedure that expects a BSTR. The main difference between them is that a BSTR carries its length with it. If you pass a WSTRING or CWSTR, what will happen when the called code calls SysStringLen to get the length of the string?

Procedures that expect a WSTRING retrieve the length searching for a double null, but procedures that expect a BSTR retrieve the length calling SysStringLen.

CWSTR and FB's WSTRING are equivalent to PowerBasic WSTRINGZ. Can you use WSTRINGZ with procedures that expect a BSTR (WSTRING in PowerBasic)?

> So in cases, where i must pass a BSTR to COM, i CANNOT have a consistent syntax, i MUST have "...vptr" anyway - is this correct?

For OUT and IN/OUT parameters you must use .vptr. In my first implementation I used an overloaded @ operator, but then there was the problem that I could not use @ to get the address of the class.

Juergen Kuehlwein

Quote
Procedures that expect a WSTRING retrieve the length searching for a double null, but procedures that expect a BSTR retrieve the length calling SysStringLen.


Ok - my error! I cannot pass a CWstr or WSTRING directly. But when an IN parameter is defined as CBstr in the procedure header, passing a CWstr or a WSTRING instead of a CBstr (to the property, not to the COM object) should work anyway, because the incoming data type is automatically converted into an intermediate CBstr, if i recall it right.



...

property someprop(byref p as CBstr) as long
...

dim cws as CWstr = "Hello"
dim n as long   
...

n = someprop(cws)
...



As long as "p" in "someprop" is an IN parameter, this will work in general (because of the intermediately created CBstr) - is this correct?
I understand, that it fails for an IN/OUT parameter and of course i cannot code:



property someprop(byref p as CWstr) as long




JK