Jose's Read Only Forum 2023

IT-Berater: Theo Gottwald (IT-Consultant) => Freebasic => Topic started by: Juergen Kuehlwein on April 09, 2018, 11:39:00 PM

Title: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 09, 2018, 11:39:00 PM
Well José i would have preferred to ask these questions here http://www.planetsquires.com/protect/forum/index.php?topic=4049.0, but registration is not possible as it seems.

First of all, congratulations for what you have done with WinFBX! I´m most interested in the Unicode string part (CWstr, CBstr) for FreeBASIC.


1.) My first question is, why two versions of wide strings? I read a post, where you stated you are using CBstr for COM - is CWstr not working for COM ?


2.) "sptr" and "vptr" do the same thing, i would have expected "sptr" to return a pointer to the data and "vptr" to return a pointer to the type - just like "strptr" and "varptr". Is this by intention or by error ?


3.) There is a problem with "LEFT", "RIGHT" and some other statements in FreeBASIC, maybe i found a solution for this.

PRIVATE FUNCTION Left OVERLOAD (BYREF cws AS CWSTR, BYVAL nChars AS INTEGER) byref as wstring

static s as cwstr

   s = LEFT(**cws, nChars)
   RETURN *cast(wstring ptr, *s)


END FUNCTION

Implementing this you may use "LEFT" (and others) in regular code just as usual (without the need of "**"). Do you see any problems here ?


4.) To my surprise the following code:

himage = loadimage(hinst, d, %IMAGE_BITMAP, 0, 0, %LR_DEFAULTCOLOR)

works for "DIM d AS STRING" AND for "DIM d AS CWstr" - how could that happen ?

"d" (the bitmaps resource name - STRING or CWstr) is passed: BYVAL NAME AS LPCSTR. I understand how this works for a STRING, but i don´t understand how this could work for a CWstr - but it definitely does (32 and 64 bit) while

himage = loadimage(hinst, STRPTR(d), %IMAGE_BITMAP, 0, 0, %LR_DEFAULTCOLOR)

works for a string, but not for a CWstr.


5.) "STRPTR" doesn´t work for a CWstr at all (compiler error), you must use "STRPTR(**d)" or just *d, how could we make it accept "STRPTR(d)" ?


6.) having the exact same syntax for STRING (ANSI) and CWstr (Unicode) would make it possible to avoid multiple

#ifdef %unicode   
dim   d AS CWstr
#ELSE
dim   d AS STRING
#ENDIF

and to have one

#ifdef %unicode   
#define dstr CWstr
#ELSE
#define dstr STRING
#ENDIF

... and then ...

DIM d AS dstr

where "dstr" stands for "dynamic string" and could be used for ANSI and Unicode - if they only shared the same syntax...


Eager to hear your explanations and thoughts,


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 10:46:30 AM
Quote
Well José i would have preferred to ask these questions here http://www.planetsquires.com/protect/forum/index.php?topic=4049.0, but registration is not possible as it seems.

See the "READ ME FIRST" post: http://www.planetsquires.com/protect/forum/index.php?topic=3777.0

Quote
Registrations are disabled.

You MUST email support@planetsquires.com and request to be added as a member.

Please provide a "username" or "handle" when you request registration.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:04:59 AM
Quote
1.) My first question is, why two versions of wide strings? I read a post, where you stated you are using CBstr for COM - is CWstr not working for COM ?

COM uses BSTRings, that are allocated and freed by the Windows OLE library. A CWSTR is a class with an underlying double null terminated buffer that is allocated, manipulated and freed by the class, using string build techniques. The advantage of using CWSTR over CBSTR for general purpose is that CWSTR is much faster.

A BSTR carries its length with it, but a pointer to a CWSTR does not. Therefore, a CWSTR is not suitable for use with COM.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:12:32 AM
Quote
2.) "sptr" and "vptr" do the same thing, i would have expected "sptr" to return a pointer to the data and "vptr" to return a pointer to the type - just like "strptr" and "varptr". Is this by intention or by error ?

They don't do the same thing. sptr returns a pointer to the CWSTR buffer variable and vptr to the string data. You can use * or ** instead. I have implemented sptr and vptr for those that are not comfortable using * or **. I ended up having to use ** instead of @ because, otherwise, I could not use @ to get the address of the class.

However, with CBSTR, vptr is important to be used with OUT parameters:


' ========================================================================================
' * Frees the underlying BSTR and returns the BSTR pointer.
' To pass the underlying BSTR to an OUT BYVAL BSTR PTR parameter.
' If we pass a CBSTR to a function with an OUT BSTR parameter without first freeing it
' we will have a memory leak.
' ========================================================================================
PRIVATE FUNCTION CBStr.vptr () AS AFX_BSTR PTR
   CBSTR_DP("CBSTR vptr")
   IF m_bstr THEN
      SysFreeString(m_bstr)
      m_bstr = NULL
   END IF
   RETURN @m_bstr
END FUNCTION
' ========================================================================================



Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:15:10 AM
Quote
3.) There is a problem with "LEFT", "RIGHT" and some other statements in FreeBASIC, maybe i found a solution for this.

I'm already using


' ========================================================================================
PRIVATE FUNCTION Left OVERLOAD (BYREF cws AS CWSTR, BYVAL nChars AS INTEGER) AS CWSTR
   RETURN LEFT(*cast(WSTRING PTR, cws.m_pBuffer), nChars)
END FUNCTION
' ========================================================================================
' ========================================================================================
PRIVATE FUNCTION Right OVERLOAD (BYREF cws AS CWSTR, BYVAL nChars AS INTEGER) AS CWSTR
   RETURN RIGHT(*cast(WSTRING PTR, cws.m_pBuffer), nChars)
END FUNCTION
' ========================================================================================
' ========================================================================================
PRIVATE FUNCTION Val OVERLOAD (BYREF cws AS CWSTR) AS DOUBLE
   RETURN .VAL(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' ========================================================================================


You can use CWSTR as if it was a native data type, even with arrays. Currently, it works with all the intrinsic FreeBasic functions and operators except MID when used as a statement. Something like MID(cws, 2, 1) = "x" compiles but does not change the contents of the dynamic unicode string. MID(cws.wstr, 2, 1) = "x" or MID(**cws, 2, 1) = "x" works. This is because when simply passing cws, FreeBasic creates a temporary WSTRING and changes the content of that temporary string.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:22:03 AM
Quote
4.) To my surprise the following code:

himage = loadimage(hinst, d, %IMAGE_BITMAP, 0, 0, %LR_DEFAULTCOLOR)

works for "DIM d AS STRING" AND for "DIM d AS CWstr" - how could that happen ?

Because of casting. I have an overloaded cast operator that returns a pointer to the content of the underlying null terminated buffer.


' ========================================================================================
' Returns the string data (same as **).
' ========================================================================================
PRIVATE OPERATOR CWstr.CAST () BYREF AS WSTRING
   CWSTR_DP("CWSTR CAST BYREF AS WSTRING - buffer: " & .WSTR(m_pBuffer))
   OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR
' ========================================================================================
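The same mechanism can be tried in isolation. This is only a minimal sketch, not taken from WinFBX: the type NameHolder and the function ByteLen are hypothetical names invented for the illustration. It assumes nothing beyond a user-defined CAST operator being applied when the variable is passed to a pointer parameter.

```freebasic
' Hypothetical type for illustration only; CWSTR itself is more elaborate.
Type NameHolder
   txt As ZString * 32
   Declare Operator Cast () As ZString Ptr
End Type

' Implicit conversion used when a NameHolder is passed to a ZSTRING PTR parameter.
Operator NameHolder.Cast () As ZString Ptr
   Return @txt
End Operator

' Hypothetical stand-in for an API that takes a pointer to string data.
Function ByteLen (ByVal p As ZString Ptr) As Integer
   Return Len(*p)
End Function

Dim h As NameHolder
h.txt = "bitmap1"
Print ByteLen(h)   ' the compiler calls NameHolder.Cast to obtain the pointer
```

No STRPTR is involved: the variable itself is passed and the cast supplies the pointer.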

Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:30:46 AM
Quote
5.) "STRPTR" doesn´t work for a CWstr at all (compiler error), you must use "STRPTR(**d)" or just *d, how could we make it accept "STRPTR(d)" ?

You will have to undef STRPTR and then implement overloads for all the other data types. Too much messy work for something that is unneeded, since by virtue of the cast operator you don't need to use STRPTR: just pass the CWSTR variable.


' ========================================================================================
' Returns a pointer to the CWSTR buffer.
' ========================================================================================
PRIVATE OPERATOR CWstr.CAST () AS ANY PTR
   CWSTR_DP("CWSTR CAST ANY PTR - buffer: " & .WSTR(m_pBuffer))
   OPERATOR = cast(ANY PTR, m_pBuffer)
END OPERATOR
' ========================================================================================

Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:49:27 AM
Quote
6.) having the exact same syntax for STRING (ANSI) and CWstr (Unicode) would make it possible to avoid multiple

I no longer use ANSI. Like the windows API, all my framework uses Unicode. You can use ansi strings with it if you want: they will be automatically converted. My only use for FreeBasic ansi strings is when I want to easily allocate a byte buffer instead of using manual allocation/deallocation.

Where is the need for using this?


#ifdef %unicode   
#define dstr CWstr
#ELSE
#define dstr STRING
#ENDIF


Always use unicode and forget the API "A" functions. In Windows, the "A" functions are mere wrappers that convert the string parameters to unicode and call the "W" functions. Therefore, they are slightly slower and use more memory. If I could, I would remove all that ansi stuff.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 12:47:28 PM
BTW if you enable the Scintilla control to use UTF and save the file as UTF8 or UTF16, with BOM, you can use unicode string literals, e.g.

DIM cws AS CWSTR = "Дми́трий Дми́триевич Шостако́вич"

Paul's WinFBE editor for FreeBasic allows to do it.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 10, 2018, 08:51:08 PM
Thanks José for your enlightening reply!

@ 5.) so the compiler looks, if there is a matching cast operator for a type, when it encounters an unexpected type (ptr) ? - that´s pretty cool. But why then "STRPTR" and others keep failing ? Maybe the compiler doesn´t do this in all cases ? I understand and i accept that just omitting "STRPTR" is a perfectly working solution. But i want to understand the background, why it works here and doesn´t work there


@ 3.) please try the following code:

#Include Once "windows.bi"
#Include Once "Afx\AfxStr.inc"

DIM d AS CWstr = "12345"

SELECT CASE LEFT(d, 2)
  case "12"
    print "ok"
end select

sleep

it throws a compiler error! I think this is because "SELECT CASE" expects one of the native string types, but it receives a CWstr (which it doesn´t know). If you use my function (3.)), it works, because "SELECT CASE" gets a WSTRING (which it does know). My code works at the price of an intermediate copy, but maybe we could get rid of this copy. My tries kept failing, but you are by far the more experienced person with FreeBASIC, so maybe you know how to do it.



Is it possible to overload the "MID" sub in a similar way ? So you don´t have to code "**":

d = MID(**d, 2,2)

and can do

d = mid(d, 2,2)

with CWstr



Quote
BTW if you enable the Scintilla control to use UTF and save the file as UTF8 or UTF16, with BOM, you can use unicode string literals, e.g.

DIM cws AS CWSTR = "Дми́трий Дми́триевич Шостако́вич"

Paul's WinFBE editor for FreeBasic allows to do it.


Thanks, i will have a look at it


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 09:20:30 PM
> Is it possible to overload the "MID" sub in a similar way ? So you don´t have to code "**":

There is no need to overload it. Thanks to the cast operator that I have implemented it works as is:


DIM d AS CWstr = "12345"
d = MID(d, 2, 2)
print d


Prints: 23.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 09:32:32 PM
With SELECT CASE LEFT you need to use SELECT CASE LEFT(**d, 2).

Using ** is the fastest way, especially when working with big strings, since you access the string data directly without having to create intermediate strings.

However, there is no problem with MID. This works :

SELECT CASE MID(d, 1, 2)

The only problems are LEFT and RIGHT.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 09:50:37 PM
Quote
@ 5.) so the compiler looks, if there is a matching cast operator for a type, when it encounters an unexpected type (ptr) ? - that´s pretty cool. But why then "STRPTR" and others keep failing ? Maybe the compiler doesn´t do this in all cases ? I understand and i accept that just omitting "STRPTR" is a perfectly working solution. But i want to understand the background, why it works here and doesn´t work there

Because STRPTR is only prepared to work with STRING and WSTRING. And when used with WSTRING you must be aware that it still returns a ZSTRING PTR, not a WSTRING PTR.

There are operators (most of them) that are prepared to work with user defined types and others not.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 10, 2018, 09:59:13 PM
Sorry, stupid me, i mixed it up


dim d as CWSTR = "123456789"
mid(d, 2,2) = "AB"

doesn´t work, but

mid(**d,2,2) = "AB"

does

I want to get rid of the need to prepend "**" in this case (maybe by an overloaded sub "MID" ...)



Quote
With SELECT CASE LEFT you need to use SELECT CASE LEFT(**d, 2).
You don´t, if you implement it the way i proposed - and you may use "**" without problems nevertheless, if it must be as fast as possible. I wonder, if it is possible to have the best of both ways by avoiding the intermediate copy, but with my (still) limited knowledge in FreeBASIC, i wasn´t able to find a solution - maybe you can do it. 



JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 10:09:06 PM
Keep in mind that FB, as any other programming language, has bugs and quirks.

The behavior of LEFT and RIGHT with user defined types has already been reported.
See: https://sourceforge.net/p/fbc/bugs/843/

So maybe one day it will work without workarounds. Then I will have to remove the current overloads for them.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 10, 2018, 10:33:30 PM
Yes, i know. But my point was, maybe it is possible to improve the existing workarounds for "LEFT/RIGHT" and to write a new one for the MID statement (not function - this already works) in order to get a consistent syntax (as far as possible, just like with the other, native string types), and not to have to resort to "special" handling for special cases (even if these are rare cases).

Your work on CWSTR is absolutely great and almost perfect, i don´t want to criticize it! My intention is to help to make it as perfect as possible NOW (i don´t want to wait until "one day" :-)).


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 10, 2018, 11:43:41 PM
I appreciate your suggestions, but they're flawed. We have to be very cautious when using BYREF AS WSTRING.


PRIVATE FUNCTION Leftx (BYREF cws AS CWSTR, BYVAL nChars AS INTEGER) byref as wstring

static s as cwstr

   s = LEFT(**cws, nChars)
   RETURN *cast(wstring ptr, *s)

END FUNCTION

DIM cws AS CWSTR = "12345"

print LEFTx(cws, 2) & LEFTx(cws, 3)   ' wrongly prints 1212
print LEFTx(cws, 2) & LEFTx(cws, 3) &  LEFTx(cws, 5)   ' wrongly prints 121212


And regarding MID as a statement, it will lose speed, a fatal flaw since the main reason for using MID as a statement is that it is fast.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 12, 2018, 12:11:59 AM
Ok José, you got me!


The intermediate storage (which i would prefer to avoid anyway) bites me. What about this:

PRIVATE FUNCTION Leftx (BYref cws AS cwstr, BYVAL nChars AS INTEGER) as string
  function = LEFT(*cast(wSTRING PTR, cws.m_pBuffer), nChars)
END FUNCTION

according to my tests it works, can you make it fail ?


Another thing i don´t understand, is why the compiler won´t let me do this:

Function Leftx ( ByRef str As Const WString, ByVal n As Integer ) As WString


the code is taken from the help file ("Left"), but it doesn´t compile ("Expected pointer in: <this line>")


All this type casting, even with pointers, is driving me crazy. You know i have some assembler background and there a pointer is a pointer and (seemingly) non-matching types aren´t just rejected with error messages, i don´t understand. In FreeBASIC it is sometimes really frustrating to find a way to make the compiler happy.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 01:56:29 AM
> according to my tests it works, can you make it fail ?

Of course. Try using it with Russian, for example, instead of "12345".


PRIVATE FUNCTION Leftx (BYref cws AS cwstr, BYVAL nChars AS INTEGER) as string
  function = LEFT(*cast(wSTRING PTR, cws.m_pBuffer), nChars)
END FUNCTION

DIM cws AS CWSTR = "Дмитрий Дмитриевич Шостакович"
PRINT LEFTx(cws, 3)


Will print "???", which is the expected result since you are returning it as an ansi string.

Believe me. I did lose countless hours trying to find an acceptable solution, but since it is a bug of the compiler, the solution will be to fix the bug.

Anyway, with the current overloads, it works except with SELECT CASE LEFT/RIGHT. I didn't even know because I never have used LEFT and RIGHT with SELECT CASE. Indeed there are easy workarounds: assign it to a variable first, use MID or use **.

I have managed to get an almost total integration with FreeBasic intrinsics. It even works with files, e.g.:


DIM cws AS CWSTR = "Дмитрий Дмитриевич Шостакович"

DIM f AS LONG = FREEFILE
OPEN "test.txt" FOR OUTPUT ENCODING "utf16" AS #f
PRINT #f, cws
CLOSE #f

Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 02:00:19 AM
Quote
Another thing i don´t understand, is why the compiler won´t let me do this:

Function Leftx ( ByRef str As Const WString, ByVal n As Integer ) As WString


A WSTRING in FB is not like a WSTRING in PB. The PB equivalent is WSTRINGZ and, of course, you can't return a WSTRINGZ or an ASCIIZ as the result of a function. The compiler is expecting AS WSTRING PTR.

Quote
the code is taken from the help file ("Left"), but it doesn´t compile ("Expected pointer in: <this line>")

It is not an example, but a prototype to document the function. It means that the native Left function will return a WSTRING, not that you can use AS WSTRING as the result of your own function. Don't confuse the prototypes used to document the FB keywords with code.
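For illustration, one form the compiler does accept is a function that returns AS WSTRING PTR. The following is only a sketch under stated assumptions: LeftAlloc is a hypothetical helper, not part of FreeBASIC or WinFBX, and it hands ownership of the allocated memory to the caller.

```freebasic
' Hypothetical helper, not part of FreeBASIC or WinFBX.
' Returns a newly allocated copy of the first n characters of s.
' The caller owns the memory and must release it with Deallocate.
Function LeftAlloc (ByRef s As Const WString, ByVal n As Integer) As WString Ptr
   If n < 0 Then n = 0
   If n > Len(s) Then n = Len(s)
   ' One extra element for the terminating null; CAllocate zero-fills the block.
   Dim p As WString Ptr = CAllocate((n + 1) * SizeOf(WString))
   If p Then *p = Left(s, n)
   Return p
End Function

Dim p As WString Ptr = LeftAlloc(WStr("12345"), 2)
If p Then
   Print *p
   Deallocate(p)
End If
```

The manual Deallocate is the price of this approach, which is one reason a class such as CWSTR, which frees its own buffer, is more convenient as a return type.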


Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 02:05:04 AM
BTW I don't understand why you want to make your future visual designer to generate ansi or unicode. Only unicode is needed. There is not a single advantage of making an ansi GUI. Even the latest PBWIN compiler uses unicode only.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 02:21:50 AM
Quote
All this type casting, even with pointers, is driving me crazy. You know i have some assembler background and there a pointer is a pointer and (seemingly) non-matching types aren´t just rejected with error messages, i don´t understand. In FreeBASIC it is sometimes really frustrating to find a way to make the compiler happy.

FB has been written by C programmers and these guys are used to using casts in almost every line of code. This could be changed by using ANY PTR in the declares. FB uses BASIC syntax, but the declares have been prototyped to use them like with C.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 12, 2018, 07:44:54 PM
José,

Quote
Believe me. I did lose countless hours trying to find an acceptable solution

... accepted! Nevertheless this discussion wasn´t useless (at least for me). I definitely learned something and you showed a workaround for LEFT/RIGHT - MID, which can have the same syntax for both!



Quote
I don't understand why you want to make your future visual designer to generate ansi or unicode


Well, there weren´t dynamic Unicode strings in FreeBASIC before you came. Therefore i think not many at FreeBASIC are familiar with it. If the Visual Designer produced only Unicode, many could be deterred. So the idea is to pick them up where they are (ANSI) and show an easy way to switch to Unicode. As far as possible i would like everybody to be able to have it his/her way.


Thanks a lot - i will ask questions again as they arise...


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 08:42:02 PM
In fact, because it has casts to return either the content of the string or a pointer to the beginning of the string data, depending on the target type, there is no need for STRPTR, which won't work anyway.


' // Populate the ListView with some data
DIM cwsTxt AS CWSTR
DIM lvi AS LVITEM
lvi.mask = LVIF_TEXT
FOR i AS LONG = 0 to 29
   lvi.iItem = i
   lvi.iSubItem = 0
   cwsTxt = "Column 0 Row " & i
   lvi.pszText = cwsTxt
   ListView_InsertItem(hListView, @lvi)
   FOR x AS LONG = 1 TO 4
      cwsTxt = "Column " & x & " Row " & i
      ListView_SetItemText(hListView, i, x, cwsTxt)
   NEXT
NEXT

Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 09:01:03 PM
Another workaround for SELECT CASE LEFT


SELECT CASE LEFT(cws.wstr, 2)


It also works for MID


MID(cws.wstr, 3, 2) = "AB"


It does the same as **, but those not familiar with pointers may prefer it:


' ========================================================================================
' Returns the string data (same as **).
' ========================================================================================
PRIVATE FUNCTION CWstr.wstr () BYREF AS WSTRING
   CWSTR_DP("CWSTR wstr - buffer: " & .WSTR(m_pBuffer))
   RETURN *cast(WSTRING PTR, m_pBuffer)
END FUNCTION
' ========================================================================================

Title: Re: FreeBASIC CWstr
Post by: José Roca on April 12, 2018, 09:06:40 PM
And if someone can't live without STRPTR, instead of


lvc.pszText = cwsText


he can use


lvc.pszText = cwsText.sptr

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 12, 2018, 09:51:18 PM
José,


i can live without "STRPTR" perfectly well, especially since i know (and i understand why) i can pass the string instead of its pointer. "MID" can be a replacement for the buggy "LEFT/RIGHT". So i can have the same syntax for both, which is my objective.

I want to present a working solution first - then i can come up with explanations, that this or that, which may look strange, isn´t due to my bad code or some shortcomings in CWStr but a known compiler error and that you must accept certain minor restrictions when using it. In my experience, most of the time people don´t want to hear (or even know) about things that don´t work - they want to hear about things, that work (and maybe then about how and why it works).


Thanks again,


JK
Title: Re: Unicode
Post by: Juergen Kuehlwein on April 13, 2018, 07:02:05 PM
Maybe a stupid question, but how to enter russian letters in CSED? I must check "Enable unicode (UTF-8 encoding)" and then? E.g. Alt + 0411 (Numpad) should result in a cyrillic capital b. I can paste cyrillic characters, but i don´t know how to type them.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 13, 2018, 07:44:46 PM
I never have tried. You have to activate cyrillic support.
See: https://www.wikihow.com/Type-Russian-Characters
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 14, 2018, 01:53:28 PM
José,


while FreeBASIC accepts ANSI, UTF-8 and UTF-16 encoded code files as sources, PB´s compiler accepts only ANSI files, so how to embed unicode literals in code? You may append "$$", but that doesn´t allow for giving unicode inside the quotation marks. I could do something like this:


$$russian = chr$$(&H416,&H416)        -> Scintilla + PB´s IDE
w$$ = Utf8ToChr$("ФЫЙЦ")              -> Scintilla (PB´s IDE cannot display UTF-8 properly)


did i miss something ?


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 14, 2018, 02:47:51 PM
My suggestion was to allow to use unicode string literals with FreeBasic. It won't work with PowerBasic because it only supports ansi source code files.
Title: Re: FreeBASIC CWstr
Post by: Theo Gottwald on April 16, 2018, 04:15:38 PM
As far as I know the Source-Code for Freebasic is available for download - not?
Also from what i saw on the WEB-Site the programmers run away from it already.
So if there are problems, why not make an own version?

Currently i am still fine with PB as it has the best and easiest Strings of all type.
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 16, 2018, 04:24:03 PM
Which problems? The problem is with PowerBasic.

> Also from what i saw on the WEB-Site the programmers run away from it already.

The same that run away from other sites. Low-level compilers are not for everybody.

> Currently i am still fine with PB as it has the best and easiest Strings of all type.

Free Basic strings are much faster than PB strings. The problem is that it does not support dynamic unicode strings natively, but my class adds support for them and they are faster than the PB ones.

Anyway, the main purpose of using Free Basic by some PB programmers has been because it supports 64 bit. Now that I have mastered it, I don't miss PB at all.

With my framework, there is nothing that you can do with PB that you can't do with FB.

http://www.jose.it-berater.org/WinFBX/WinFBX.html

And Paul is adding a Visual Designer to his editor for FB.
Title: Re: FreeBASIC CWstr
Post by: Patrice Terrier on April 16, 2018, 04:37:50 PM
Quote
Now that I have mastered it, I don't miss PB at all.
He he, that's almost the same for all those who made the effort to go outside of their comfort zone.
Better late than never ;)
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 06, 2018, 09:22:55 PM
José,


one more question about CWSTR and CBSTR. You said CBSTRs are necessary for COM, but in general you recommend using CWSTR, because CWSTRs are faster. Do you always convert automatically from CWSTR to CBSTR whenever necessary in COM, or is it the user´s responsibility to pass the correct data type?

In other words: in an application implementing COM would i have to convert all the CWSTRs i´m using to CBSTRs before i can pass them to a COM method, or is this done automatically for me and i can pass just a CWSTR (which then gets converted to CBSTR, and CBSTR only exists to make this automatic conversion possible)? According to some tests i ran automatic conversion seems to take place. But i´m not absolutely sure, if i can rely on that under all circumstances, or if i found a code sample where by chance it works.   


Thanks


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 06, 2018, 10:25:41 PM
There is automatic conversion between CWSTR and CBSTR. Therefore, you can pass a CWSTR to a custom procedure that expects a CBSTR, but not to a procedure that expects a BSTR (without the leading "C"), no matter if it has been declared as BSTR, AFX_BSTR, WSTRING PTR or ANY PTR. For this kind of procedure, usually COM methods, use a CBSTR, e.g. DIM cbs AS CBSTR, and pass cbs or cbs.sptr to IN parameters and cbs.vptr to OUT/INOUT parameters.

It is not possible to simulate a BSTR using our own allocated memory because BSTRs are managed by the COM library and must be allocated/freed with SysAllocString/SysFreeString. Any attempt to cheat will cause problems sooner or later.
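A minimal, Windows-only sketch of that life cycle, using only the OLE Automation calls SysAllocString, SysStringLen and SysFreeString. The helper name BstrRoundTripLen is hypothetical, and the include lines are an assumption about where the FreeBASIC headers declare these functions.

```freebasic
#include once "windows.bi"
#include once "win/ole2.bi"   ' assumed to pull in the OLE Automation declarations

' Hypothetical helper: allocates a BSTR, reads its stored length, frees it.
Function BstrRoundTripLen (ByRef text As WString) As UInteger
   Dim b As BSTR = SysAllocString(@text)   ' allocated by the system, with a length prefix
   If b = 0 Then Return 0
   Dim n As UInteger = SysStringLen(b)     ' read from the prefix, not by scanning for a null
   SysFreeString(b)                        ' must be released by the same library
   Return n
End Function

Print BstrRoundTripLen(WStr("Hello"))
```

The length prefix that SysStringLen reads is what a plain pointer to a self-allocated buffer does not carry.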
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 06, 2018, 11:36:05 PM
Quote
usually COM methods


Ok - but where from do i know EXACTLY that a BSTR is expected? Each and every COM method or property?


Your code sample


...
dim t as CWSTR = "Hello World"

  #INCLUDE ONCE "Afx/AfxCom.inc"
  #INCLUDE ONCE "Afx/AfxSapi.bi"
  using Afx
  ' // The COM library must be initialized to call AfxNewCom
  CoInitialize NULL
  ' // Create an instance of the Afx_ISpVoice interface
  DIM pSpVoice AS Afx_ISpVoice PTR = AfxNewCom("SAPI.SpVoice")
  DIM pCComPtrSpVoice AS CComPtr = pSpVoice
  ' // Call the Speak method
  pSpVoice->Speak(t, 0, NULL)
  ' // Uninitialize the COM library
  CoUninitialize
  PRINT
  PRINT "Press any key..."
  SLEEP


works like a charm with a CWSTR!


Is there a FreeBASIC version of your typelib browser btw. and does it tell me when i MUST pass a CBSTR as you described?


So to be on the safe side in general and especially when dealing with COM i could go with CBSTR and implement CWSTR only when speed matters. Or are there problems as well and it is better to have CWSTR as a standard and implement CBSTR only where necessary ? But when is CBSTR REALLY necessary and is the speed gain with CWSTR really worth the conversions needed then.

As  far as i understand it up to now, i could take both of it as my standard Unicode dynamic string type, but which one should i take - the faster one or the more universal one? Currently i tend to use CBSTR as a standard and use CWSTR only, if speed is important.

Would automatic conversion cost much time, if a custom procedure had been written for CBSTR (in/out) and i passed a CWSTR instead? How could i define parameters (in CBSTR) and return type (out CBSTR) in order to minimize speed loss, if a CWSTR is passed and expected as return type instead of a CBSTR?


Tell me, if i´m wrecking your nerves, but as always, i want nothing more than the optimum ;-)


JK,
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 07, 2018, 12:29:21 AM
> Ok - but where from do i know EXACTLY that a BSTR is expected? Each and every COM method or property?

Reading the documentation.

> Your code sample works like a charm with a CWSTR!

Indeed, because this method expects a null terminated unicode string, not a BSTR!
See: https://msdn.microsoft.com/en-us/library/ee125024(v=vs.85).aspx

Automation interfaces work with BSTR and Variants, but low-level COM interfaces can work with any data type, and you need to read the documentation.

> Is there a FreeBASIC version of your typelib browser btw. and does it tell me when i MUST pass a CBSTR as you described?

There is one, TLB_100 ( available at https://github.com/JoseRoca/WinFBX ), although it is always advisable to read the documentation because sometimes the low-level interfaces expect or return a pointer to a null terminated unicode string allocated with CoTaskMemAlloc, that must be freed with CoTaskMemFree, and you can use neither CBSTR nor CWSTR with them directly.

> As  far as i understand it up to now, i could take both of it as my standard Unicode dynamic string type, but which one should i take - the faster one or the more universal one?

The loss of speed won't be significant in most cases, but if you have to do thousands of string concatenations, as it is the case with my TypeLib Browser, then it will be noticeable. CWSTR is so fast that when you click the type library to parse, the code is generated instantaneously (there is not a separate option to generate the code because it is not needed).

When I need to work with COM and BSTRs, I use CBSTR; otherwise, I use CWSTR. But if you don't need to work with big strings or do heavy string manipulation, you can use any of them or even mix them. They work transparently, so if a function returns a CWSTR you can assign it to a CBSTR or to a STRING, and vice versa. You can also mix CBSTR, CWSTR, STRING and WSTRING when concatenating strings.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 07, 2018, 12:46:57 AM
> How could i define parameters (in CBSTR) and return type (out CBSTR) in order to minimize speed loss, if a CWSTR is passed and expected as return type instead of a CBSTR?

If you declare a parameter as BYREF CBSTR and you pass a CWSTR, a conversion will be performed. It is unavoidable. But the loss of speed won't be significant. It is as if in PowerBasic you had a parameter declared as WSTRING and you passed a STRING variable: the STRING must be converted to unicode. With normal usage, you don't have to worry. You may need to use CWSTR in the cases in which you would use StringBuilder with PowerBasic. CWSTR is faster than CBSTR because it uses string builder techniques to minimize memory allocations.
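To make "string builder techniques" concrete, here is a rough sketch of the general idea: keep spare capacity and grow the buffer geometrically, so that most appends need no allocation. This is not the CWSTR source; WBuilder, AppendText and the field names are hypothetical, and the growth factor of 2 is just one common choice.

```freebasic
' Hypothetical minimal builder for illustration; this is not the CWSTR source.
Type WBuilder
   buf  As WString Ptr   ' heap buffer, kept null terminated
   used As Integer       ' characters stored, excluding the terminator
   cap  As Integer       ' characters the buffer can hold, excluding the terminator
   Declare Destructor ()
   Declare Sub AppendText (ByRef s As Const WString)
End Type

Destructor WBuilder ()
   If buf Then Deallocate(buf)
End Destructor

Sub WBuilder.AppendText (ByRef s As Const WString)
   Dim n As Integer = Len(s)
   If buf = 0 OrElse used + n > cap Then
      ' Grow geometrically so that repeated appends rarely reallocate.
      Dim newCap As Integer = IIf(cap < 16, 16, cap)
      While newCap < used + n
         newCap *= 2
      Wend
      Dim p As WString Ptr = Reallocate(buf, (newCap + 1) * SizeOf(WString))
      If p = 0 Then Exit Sub   ' allocation failed, keep the old contents
      buf = p
      cap = newCap
   End If
   *(buf + used) = s   ' copies the characters and the terminating null
   used += n
End Sub

Dim b As WBuilder
For i As Integer = 1 To 1000
   b.AppendText(WStr("abc"))
Next
Print b.used
```

With doubling, the thousand appends above trigger only a handful of reallocations, whereas allocating a fresh string of exactly the new length on every concatenation would allocate a thousand times.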

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 07, 2018, 11:46:13 PM
Thanks José, for your patience and your explanations!


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 11, 2018, 12:08:18 AM
Hi José,


CWstr (just like CBstr) has an upper limit of DWORD (2^32) size, is this correct ? In theory you could make it even larger (in 64 bit), if you defined all "UINT" as "UINTEGER", or is the restriction to 2^32 on purpose?


next question, this code gpfs:


#Define UNICODE
#include once "windows.bi"
#include once "Afx\AfxStr.inc"

#define ustring cwstr


declare function FB_MAIN as uinteger


END FB_MAIN


'***********************************************************************************************


FUNCTION FB_MAIN AS uinteger
'***********************************************************************************************
'
'***********************************************************************************************
dim i as long
dim s as string       = "ЙЦУК"
dim z as zstring * 64 = "ЙЦУКЙЦУКЙЦУК"
dim w as Wstring * 64 = "ЙЦУ"
dim u as Ustring      = "ЙЦУКЕ"
dim x as Ustring      = "ЙЦУКЕ"



'ods("start")
'  for i = 1 to 1000
'    x = rset$(s, 10)
'    x = rset$(z, 10)
'    x = rset$(w, 10)
'    x = rset$(u, 10)
'  next i
'ods("end")


using afx

'ods("start")
  for i = 1 to 1000
    x = AfxStrRSet(s, 10, " ")
    x = AfxStrRSet(z, 10, " ")
    x = AfxStrRSet(w, 10, " ")
    x = AfxStrRSet(u, 10, " ")
  next i
'ods("end")


  function = 0


end function


'***********************************************************************************************



while this code:



PRIVATE FUNCTION RSet_ overload (BYREF wszMainStr AS WSTRING, BYVAL nStringLength AS ULONG, BYref wszPadCharacter AS WSTRING = " ") AS ustring
'***********************************************************************************************

'***********************************************************************************************
dim cws as ustring = wstring(nStringLength, LEFT(wszPadCharacter, 1)) + wszMainstr

static c as long

  c = c + 1

'outputdebugstringa("ustring"+str(c))
  RETURN right(**cws, nStringLength)


END FUNCTION



succeeds.


(last sentence withdrawn ...)


JK


Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 01:32:24 AM
> CWstr (just like CBstr) has an upper limit of DWORD (2^32) size, is this correct ? In theory you could make it even larger (in 64 bit), if you defined all "UINT" as "UINTEGER", or is the restriction to 2^32 on purpose?

Yes, it is on purpose. The maximum size (in characters) is 2147483647, as all the FreeBasic string types (String, ZString, WString) and also as COM BSTR.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 02:26:42 AM
> next question, this code gpfs:

Worked with 32 bit, but GPFed with 64 bit. 32 bit and 64 bit use different assemblers. Thanks for reporting it.

I have changed the code to:


PRIVATE FUNCTION AfxStrRSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   DIM cwsPadChar AS CWSTR = wszPadCharacter
   IF cwsPadChar = "" THEN cwsPadChar = " "
   cwsPadChar = LEFT(cwsPadChar, 1)
   DIM cws AS CWSTR = SPACE(nStringLength)
   FOR i AS LONG = 1 TO LEN(cws)
      MID(**cws, i, 1) = cwsPadChar
   NEXT
   MID(**cws, nStringLength - LEN(wszMainStr) + 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION


I have also changed the code for AfxStrLSet and AfxStrCSet:


' ========================================================================================
' Returns a string containing a left-justified (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters to the right. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrLSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrLSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   DIM cwsPadChar AS CWSTR = wszPadCharacter
   IF cwsPadChar = "" THEN cwsPadChar = " "
   cwsPadChar = LEFT(cwsPadChar, 1)
   DIM cws AS CWSTR = SPACE(nStringLength)
   FOR i AS LONG = 1 TO LEN(cws)
      MID(**cws, i, 1) = cwsPadChar
   NEXT
   MID(**cws, 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================



' ========================================================================================
' Returns a string containing a centered (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters on both sides. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrCSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrCSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   DIM cwsPadChar AS CWSTR = wszPadCharacter
   IF cwsPadChar = "" THEN cwsPadChar = " "
   cwsPadChar = LEFT(cwsPadChar, 1)
   DIM cws AS CWSTR = SPACE(nStringLength)
   FOR i AS LONG = 1 TO LEN(cws)
      MID(**cws, i, 1) = cwsPadChar
   NEXT
   MID(**cws, (nStringLength - LEN(wszMainStr)) \ 2 + 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 11, 2018, 10:13:59 AM
Quote
Yes, it is on purpose. The maximum size (in characters) is 2147483647, as all the FreeBasic string types (String, ZString, WString) and also as COM BSTR.

But in theory you could make it even larger (in 64 bit), if you defined all "UINT" as "UINTEGER", or are there still other reasons, why  it wouldn´t work then ?


Comparing your code for RSET and mine:
1.), i don´t understand why you first fill a CWstr with spaces and overwrite these spaces with the pad character?
2.) why would you suppress a null string as pad character? If i pad with a null string then it shouldn´t be padded at all - makes sense to me.
3.) and if the string to pad (e.g. 11 characters) is larger than the resulting string (e.g 10 characters), only pad characters are returned with your code.

FB´s RSET returns the leftmost 10 characters of the string to pad (truncating on the right side), which is quite unexpected to me. PB´s RSET$ returns the rightmost 10 characters (the string to pad is truncated from left to right in this case), which seems to be the most logical choice to me and is, what my code does too.

JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 01:23:09 PM
> But in theory you could make it even larger (in 64 bit), if you defined all "UINT" as "UINTEGER", or are there still other reasons, why  it wouldn´t work then ?

The FreeBasic intrinsic string functions won't work with them. Also you can't allocate a BSTR bigger than 2147483647 characters.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 01:32:02 PM
Quote
1.), i don´t understand why you first fill a CWstr with spaces and overwrite these spaces with the pad character?

Because the FB String function doesn't work with unicode.

Quote
2.) why would you suppress a null string as pad character? If i pad with a null string then it shouldn´t be padded at all - makes sense to me.

Because a CWSTR is null terminated. Padding it with nulls will truncate it when it is cast to a WSTRING.

Quote
3.) and if the string to pad (e.g. 11 characters) is larger than the resulting string (e.g 10 characters), only pad characters are returned with your code.

This is a bug. I will use FB's RSET to get the same behavior.


' ========================================================================================
PRIVATE FUNCTION AfxStrRSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   DIM cwsPadChar AS CWSTR = wszPadCharacter
   IF cwsPadChar = "" THEN cwsPadChar = " "
   cwsPadChar = LEFT(cwsPadChar, 1)
   DIM cws AS CWSTR = SPACE(nStringLength)
   FOR i AS LONG = 1 TO LEN(cws)
      MID(**cws, i, 1) = cwsPadChar
   NEXT
   RSET **cws, wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================


Quote
FB´s RSET returns the leftmost 10 characters of the string to pad (truncating on the right side), which is quite unexpected to me. PB´s RSET$ returns the rightmost 10 characters (the string to pad is truncated from left to right in this case), which seems to be the most logical choice to me and is, what my code does too.

When in Rome do as the Romans do. I'm trying to follow FB rules.
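
For reference, a minimal sketch of the FB rule in question, using the intrinsic RSET statement on a plain STRING buffer (the variable name buf is only illustrative). Per the FreeBASIC documentation, RSET never changes the length of the buffer and truncates an oversized source on the right, which matches the behavior described in the quote:

```freebasic
' RSET right-justifies the source inside an existing buffer;
' the length of the buffer is never changed.
DIM buf AS STRING = SPACE(10)

RSET buf, "91.5"           ' source shorter than the buffer
PRINT "[" & buf & "]"      ' the text is padded with spaces on the left

RSET buf, "12345678901"    ' 11 characters into a 10-character buffer
PRINT "[" & buf & "]"      ' the leftmost 10 characters are expected to survive
```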
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 11, 2018, 01:57:15 PM
Quote
Because the FB String function doesn't work with unicode.

... but there is a "WSTRING" function, i could use


so this code:


dim cws as CWstr = wstring(nStringLength, LEFT(wszPadCharacter, 1)) + wszMainstr


should work (even if wszPadCharacter is a null string), or do you see problems here?

I´m asking, because further above you repeatedly proved me wrong. But if it worked, it would avoid the loop, which should result in a speed gain.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 03:42:09 PM
> ... but there is a "WSTRING" function, i could use

Duh! For some reason I missed it.

> should work (even if wszPadCharacter is a null string), or do you see problems here?

With dim cws as CWstr = wstring(10, 0) + "12345" you will get "12345" as the result, not ten nulls followed by "12345". Since FB does not natively support a dynamic unicode string, all the intrinsic FB functions that deal with unicode strings generate temporary null terminated strings and these end at the first double null. Try to fill a WSTRING with nulls and tell me what you get. It would be possible using our own ad hoc methods, but then we would lose integration with the FB intrinsic functions.
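
A small sketch of that experiment with a fixed-length WSTRING, assuming nothing beyond the intrinsic functions (the variable name w is illustrative). Since a WSTRING is null terminated, anything after the first embedded null should no longer be visible once the value has gone through an intrinsic expression:

```freebasic
' Try to store an embedded null in a WSTRING.
DIM w AS WSTRING * 32
w = "abc" + WCHR(0) + "def"

PRINT LEN(w)      ' expected: 3, the length up to the first null
PRINT w           ' only the part before the null is expected to be printed
```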


Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 04:25:57 PM
Hope that I have got it right now:


' ========================================================================================
' Returns a string containing a left-justified (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters to the right. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrLSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrLSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a right-justified (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters to the left. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrRSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrRSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   IF LEN(wszMainStr) > nStringLength THEN RETURN LEFT(wszMainStr, nStringLength)
   DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, nStringLength - LEN(wszMainStr) + 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a centered (padded) string.
' If the optional parameter wszPadCharacter is not specified, the function pads the string with
' space characters on both sides. Otherwise, the function pads the string with the first
' character of wszPadCharacter.
' Example: DIM cws AS CWSTR = AfxStrCSet("FreeBasic", 20, "*")
' ========================================================================================
PRIVATE FUNCTION AfxStrCSet (BYREF wszMainStr AS CONST WSTRING, BYVAL nStringLength AS LONG, BYREF wszPadCharacter AS WSTRING = " ") AS CWSTR
   IF LEN(wszMainStr) > nStringLength THEN RETURN LEFT(wszMainStr, nStringLength)
   DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, (nStringLength - LEN(wszMainStr)) \ 2 + 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws
END FUNCTION
' ========================================================================================

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 11, 2018, 06:50:59 PM
Ah, i see you don´t need the "LEFT", because "WSTRING" takes the leftmost character by default, good point - i missed it!

is there a special reason (speaking of "AfxStrLSet") why you code:


DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
   MID(**cws, 1, LEN(wszMainStr)) = wszMainStr
   RETURN cws


or would


DIM cws AS CWSTR = wszMainStr + WSTRING(nStringLength, wszPadCharacter)
   RETURN MID(**cws, 1, nStringLength)


be ok as well? And why "MID" and not "LEFT"? Is "MID" faster?


Well, there was a mistake due to my incorrect wording. Speaking of a null string i meant an EMPTY string not a chr$(0) or chr$$(0). But this raises another topic i wasn´t really aware of: you cannot have chr$$(0) inside a CWstr  - you can, but as soon as you use it in a FreeBASIC expression, it gets truncated there. Ok i can live with that restriction.


Thinking about my previous post i more and more tend to replace a passed empty padding string with a space (this is what you initially did and what PB does), because wszPadCharacter defaults to a space. So coding RSET_(s, count) and RSET_(s, count, "") should return the same result. And what sense would it make, if you used a padding function and passed a padding string, which in effect doesn´t pad at all?


I don´t know, if our discussion here is of interest for others, this is already a lengthy thread and maybe it is going to be even lengthier. So if you want we could go on by e-mail. Please drop me a mail at <jk-ide at t minus online dot de> if you agree. Otherwise i will keep on posting here all CWstr related stuff.


Thanks


JK
Title: Re: FreeBASIC CWstr
Post by: Johan Klassen on May 11, 2018, 07:41:06 PM
I find the discussion interesting and educational :)
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 09:24:38 PM
Initially, CWSTR was intended to allow the use of embedded nulls. This is why I use UBYTEs instead of WORDs, and why it had more methods, such as ToStr, but then I discovered a way to allow it to work with the FB intrinsic functions as if it were a native data type. But as this implies casting the returned value to a WSTRING, which is the only unicode data type natively supported by FB, I had to discard the idea of allowing embedded nulls in exchange for ease of use. Anyway, if you need a string with embedded nulls (generally to store binary data) you can use FB's STRING.

Regarding my use of the MID statement, it is simply a way to avoid the multiple creation of intermediate strings.

With


DIM cws AS CWSTR = WSTRING(nStringLength, wszPadCharacter)
MID(**cws, 1, LEN(wszMainStr)) = wszMainStr


We just create an intermediate string with WSTRING(nStringLength, wszPadCharacter), that we assign to cws, and then we modify it directly with MID(**cws, 1, LEN(wszMainStr)) = wszMainStr.

With


DIM cws AS CWSTR = wszMainStr + WSTRING(nStringLength, wszPadCharacter)
RETURN MID(**cws, 1, nStringLength)


We create three intermediate strings: one with WSTRING(nStringLength, wszPadCharacter), another to concatenate it with wszMainStr, and another with RETURN MID(**cws, 1, nStringLength). MID as a function creates a temporary string; MID as a statement doesn't.
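
The difference can be tried without WinFBX. The following is a minimal sketch that uses a fixed-length WSTRING in place of a CWSTR (buf and part are illustrative names): the MID statement overwrites part of an existing buffer, while the MID function builds and returns a new string.

```freebasic
' Build the padded buffer once ...
DIM buf AS WSTRING * 21
buf = WSTRING(20, WSTR("*"))

' ... then overwrite the first 9 characters in place (MID statement).
MID(buf, 1, 9) = "FreeBasic"
PRINT buf        ' FreeBasic***********

' MID used as a function returns a new temporary string instead.
DIM part AS WSTRING * 21
part = MID(buf, 1, 9)
PRINT part       ' FreeBasic
```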

I prefer to post in a forum; otherwise I may have to repeat the same explanations several times. Most of my FreeBasic posts can be found in the Planet Squires forum because Paul and I have always worked very well together: I have written the framework and Paul is working on the editor and visual designer.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 09:56:44 PM
Quote
Thinking about my previous post i more and more tend to replace a passed empty padding string with a space (this is what you initially did and what PB does), because wszPadCharacter defaults to a space. So coding RSET_(s, count) and RSET_(s, count, "") should return the same result. And what sense would it make, if you used a padding function and passed a padding string, which in effect doesn´t pad at all?

One good thing of FreeBasic is that it allows for optional values in the parameters, even if the parameter is not at the end of the list. If it is in the middle, followed by another optional or non optional parameter, you can omit it with ,, (like with Visual Basic). Overloading and multiple constructors are also a godsend.
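
A short sketch of this feature. PadLeft is a hypothetical helper written only for this illustration; it is not part of WinFBX:

```freebasic
' Hypothetical helper: pads s on the left up to n characters with the
' first character of ch. Both n and ch are optional.
FUNCTION PadLeft (BYREF s AS CONST STRING, BYVAL n AS LONG = 10, BYREF ch AS CONST STRING = " ") AS STRING
   IF LEN(s) >= n THEN RETURN s
   RETURN STRING(n - LEN(s), ch) + s
END FUNCTION

PRINT "[" & PadLeft("abc") & "]"           ' both optional arguments omitted
PRINT "[" & PadLeft("abc", 6) & "]"        ' trailing optional argument omitted
PRINT "[" & PadLeft("abc", , "*") & "]"    ' middle argument omitted with ,,
```

In the last call the omitted n falls back to its default of 10, so the result should be seven pad characters followed by "abc".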
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 10:01:45 PM
We can work with arrays of CWSTRs as easily as with arrays of STRINGs.

A two-dimensional array


DIM rg2 (1 TO 2, 1 TO 2) AS CWSTR
rg2(1, 1) = "string 1 1"
rg2(1, 2) = "string 1 2"
rg2(2, 1) = "string 2 1"
rg2(2, 2) = "string 2 2"
print rg2(2, 1)


REDIM PRESERVE / ERASE


REDIM rg(0) AS CWSTR
rg(0) = "string 0"
REDIM PRESERVE rg(0 TO 2) AS CWSTR
rg(1) = "string 1"
rg(2) = "string 2"
print rg(0)
print rg(1)
print rg(2)
ERASE rg


When the array is destroyed, because we erase it or because it goes out of scope, the destructor of each CWSTR is called, so you don't have to worry about memory leaks.
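
This can be observed with any TYPE that has a constructor and a destructor. A minimal sketch, in which Tracked and g_alive are hypothetical names used only to count live objects:

```freebasic
DIM SHARED g_alive AS LONG      ' number of Tracked objects currently alive

TYPE Tracked
   DECLARE CONSTRUCTOR ()
   DECLARE DESTRUCTOR ()
   dummy AS LONG
END TYPE

CONSTRUCTOR Tracked ()
   g_alive += 1
END CONSTRUCTOR

DESTRUCTOR Tracked ()
   g_alive -= 1
END DESTRUCTOR

REDIM rg(0 TO 2) AS Tracked
PRINT g_alive       ' three elements have been constructed
ERASE rg
PRINT g_alive       ' expected to be back at 0 after the destructors have run
```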
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 11, 2018, 10:13:21 PM
Thanks to this flexibility, the CVar class, which implements support for variants, is much more powerful and easier to use than PowerBasic's support for them.

We can also have arrays of CVar:


DIM rg(1 TO 2) AS CVAR
rg(1) = "string"
rg(2) = 12345.12
print rg(1)
print rg(2)


And even dynamic arrays of CVar in UDTs:


TYPE MyType
  rg(ANY) AS CVAR
END TYPE

DIM t AS MyType
REDIM t.rg(1 TO 2) AS CVAR
PRINT LBOUND(t.rg)
PRINT UBOUND(t.rg)

t.rg(1) = "String"
t.rg(2) = 12345.12

print t.rg(1)
print t.rg(2)


They can also be used in expressions together with other data types and literals, e.g.:


DIM cws AS CWSTR = "Test string"
DIM cv AS CVAR = 12345.67
PRINT cws & " " & cv & " mixing strings and variants"

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 12, 2018, 12:40:38 AM
A speed test shows, that your code ("MID") is indeed about one third faster than mine. You know more about FreeBASIC intrinsics, because i´m fairly new to FreeBASIC, and that´s why i (will) keep asking...


Quote
Overloading and multiple constructors are also a godsend.


Amen, brother - i absolutely agree!


It is still astonishing to me, how you managed to integrate these data types into a language, which initially wasn´t written for such data types! For me when using a "tool" like this, it is always most important to learn about its capacities and limitations in order to make the most out of it. Therefore i want to understand as much as possible of how it works and why.


Thanks again


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 12, 2018, 02:59:46 PM
The "@" operator returns the address of the CWstr class. What for would you need this address at all? Wouldn´t it be better (more consistent compared to the other existing string types), if "@" returned a wstring ptr to the wstring data ?


This code:


SELECT CASE AfxStrRSet(s, 20)



fails to compile, because it returns a CWSTR, how to make it return a WSTRING (even if this slows down execution speed), which is accepted? Or does this cause other problems?

What i have in mind, is a one for all solution, which can be used consistently for all available string types (STRING, ZSTRING, WSTRING and CWSTR) even at the price of slower execution. If i need more speed i can use the more specialized, type specific functions.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 12, 2018, 04:10:24 PM
Quote
The "@" operator returns the address of the CWstr class. What for would you need this address at all? Wouldn´t it be better (more consistent compared to the other existing string types), if "@" returned a wstring ptr to the wstring data ?

To pass a pointer to the class to a procedure that has a parameter declared as CWSTR PTR, to store it in a UDT that has a member declared as CWSTR PTR, etc. Besides, changing the behavior of the @ operator to return a WSTRING PTR to the WSTRING data won't work with your SELECT CASE because you will need to dereference it. This is what ** does.

Quote
SELECT CASE AfxStrRSet(s, 20)
fails to compile, because it returns a CWSTR, how to make it return a WSTRING (even if this slows down execution speed), which is accepted? Or does this cause other problems?

To return a WSTRING you will need to declare the return type AS BYREF WSTRING and, as you're returning a reference pointer, you will need to make the variable static, with the problems that we already discussed in the first posts when you wanted to overload the Left operator. Do you remember it?
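
A rough sketch of why the static variable is a problem. LeftRef is a hypothetical function written for this illustration, with a fixed-length static WSTRING standing in for a static CWSTR:

```freebasic
' Hypothetical helper: every call returns a reference to the SAME static buffer.
FUNCTION LeftRef (BYREF w AS CONST WSTRING, BYVAL n AS LONG) BYREF AS WSTRING
   STATIC buf AS WSTRING * 256
   buf = LEFT(w, n)
   RETURN *CAST(WSTRING PTR, @buf)
END FUNCTION

PRINT LeftRef("abcdef", 2)      ' a single call is fine

' Both calls below refer to the same buffer. Whether the first result is
' still intact when the concatenation reads it depends on when the compiler
' copies it, so the output cannot be relied on.
PRINT LeftRef("abcdef", 1) + LeftRef("abcdef", 3)
```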

You can use SELECT CASE AfxStrRSet(s, 20).wstr or SELECT CASE **AfxStrRSet(s, 20)

Quote
What i have in mind, is a one for all solution, which can be used consistently for all available string types (STRING, ZSTRING, WSTRING and CWSTR) even at the price of slower execution. If i need more speed i can use the more specialized, type specific functions.

This time, it's not a matter of speed. While STRING, ZSTRING and WSTRING are primitive types, natively supported by the compiler, CWSTR is a class (or TYPE), and when you use the @ operator with a type it returns the address of the type, not the address of one of its members.

Besides, the cast operator of the CWSTR class should do automatically what you want, and it does it in most cases except LEFT, RIGHT, VAL and SELECT CASE because for these keywords, FB doesn't call the cast operator of the class. This is a reported bug.

Quote
What i have in mind, is a one for all solution, which can be used consistently for all available string types (STRING, ZSTRING, WSTRING and CWSTR) even at the price of slower execution. If i need more speed i can use the more specialized, type specific functions.

I also would like many things, but I have to settle for what can be done.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 13, 2018, 02:08:15 PM
Please, forgive me my ignorance. Yes, i was thinking in circles -we have already been there!


Let´s summarize what i have learned: (please contradict, if something is wrong)


CWstr is a type, which is different from an intrinsic variable type. The "@" operator returns an address to the type and not to its data, this is consistent with other types and it is necessary, if you want to be able to work with pointers to this type.

You can access the data of a type through its members, its member functions and through operators, especially the "CAST" operator allows for accessing the type´s data by its pure name (identifier) - you can code "mytype" instead of "mytype.data" for instance.


Unfortunately not every FreeBASIC command implements the "CAST" operator properly, which is a known bug.


To overcome this shortcoming you "mis-use" the "*" operator. Instead of dereferencing the type´s pointer, you made it return a "WSTRING PTR". By prepending another "*" you dereference the returned pointer and FreeBASIC gets to see a BYREF WSTRING (which is what the "CAST" operator does too, but, as we know now, doesn´t always work as it should)


If we return a CWstr in a function as "BYREF WSTRING" (which is possible), the CWstr must not be local in that function, because it goes out of scope, when the function exits (invalidating the data -> memory corruption or GPF).

If we return a CWstr as CWstr, the data is still valid, even if the local CWstr goes out of scope, because we return it as CWstr. But then some FreeBASIC commands complain of an "invalid data type" ...


So it boils down to 3 ways to go:
1.) fix the compiler
2.) accept a workaround ("**", or .wstr) in certain cases
3.) find a way to return a (local) CWstr or WSTRING as a BYREF WSTRING without invalidating the data and without producing memory leaks.


Do you agree so far?


JK

Title: Re: FreeBASIC CWstr
Post by: José Roca on May 13, 2018, 03:59:32 PM
I will add a fourth way: Add native support for dynamic unicode strings to the compiler. This would be the ideal solution, but unfortunately I don't think that it will ever happen because of the complexity of doing it with a cross platform compiler.

The wstr method does the same as the cast operator:


' ========================================================================================
' Returns the string data (same as **).
' ========================================================================================
PRIVATE OPERATOR CWstr.CAST () BYREF AS WSTRING
   OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR
' ========================================================================================
' ========================================================================================
' Returns the string data (same as **).
' ========================================================================================
PRIVATE FUNCTION CWstr.wstr () BYREF AS WSTRING
   RETURN *cast(WSTRING PTR, m_pBuffer)
END FUNCTION
' ========================================================================================


But you have to call it explicitly.

** is a shortcut. It may look strange to some, but * is the only operator that can be dereferenced twice, and it serves two purposes: a single * returns the address of the CWSTR buffer, and ** dereferences the string data. Some didn't like it, and this is why I added the wstr method.
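
To try the CAST / wstr pattern in isolation, here is a minimal sketch. TinyWStr is a hypothetical stand-in with a fixed 64-character buffer; it only mirrors the two members shown above and is not how CWSTR manages its memory:

```freebasic
' Hypothetical miniature of the pattern: a TYPE that exposes its string
' data both through an implicit CAST operator and an explicit wstr method.
TYPE TinyWStr
   m_buf AS WSTRING * 64
   DECLARE CONSTRUCTOR (BYREF w AS CONST WSTRING)
   DECLARE OPERATOR CAST () BYREF AS WSTRING
   DECLARE FUNCTION wstr () BYREF AS WSTRING
END TYPE

CONSTRUCTOR TinyWStr (BYREF w AS CONST WSTRING)
   m_buf = w
END CONSTRUCTOR

OPERATOR TinyWStr.CAST () BYREF AS WSTRING
   OPERATOR = *CAST(WSTRING PTR, @m_buf)
END OPERATOR

FUNCTION TinyWStr.wstr () BYREF AS WSTRING
   RETURN *CAST(WSTRING PTR, @m_buf)
END FUNCTION

DIM t AS TinyWStr = TinyWStr("FreeBasic")
PRINT t                 ' implicit: the CAST operator supplies the WSTRING
SELECT CASE t.wstr      ' explicit call, for keywords that do not use CAST
   CASE "FreeBasic"
      PRINT "match"
END SELECT
```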

Title: Re: FreeBASIC CWstr
Post by: José Roca on May 13, 2018, 04:54:30 PM
I also added the sptr method, that does the same as a single *, for those used to STRPTR that forget that a CWSTR is not a string, but a TYPE, and that STRPTR can only be used with strings, not with TYPEs. With a CWSTR you don't need to use STRPTR because it has an overloaded cast operator (OPERATOR CWstr.CAST () AS ANY PTR) that allows you to pass the CWSTR variable directly to a procedure that expects a pointer to the string data, but some expect to work with it exactly as with the natively supported string types, which is not 100% possible.

Title: Re: FreeBASIC CWstr
Post by: José Roca on May 14, 2018, 12:05:47 AM
Quote
If we return a CWstr as CWstr, the data is still valid, even if the local CWstr goes out of scope, because we return it as CWstr. But then some FreeBASIC commands complain of an "invalid data type" ..

When you return a CWSTR, CBSTR, STRING, WSTRING or string literal in a function that has a CWSTR as the return type, the compiler creates a temporary CWSTR and calls its appropriate constructor. It is important to remember that you must use RETURN <CWSTR or CBSTR> instead of FUNCTION = <CWSTR or CBSTR>. The reason is that when we use RETURN the constructor is called BEFORE the local variable goes out of scope, whereas if we use FUNCTION =, the constructor of the temporary CWSTR to be returned is called AFTER the local variable has been destroyed, making it impossible to copy it. I don't know if this behavior of FUNCTION = is a bug or intentional (if it is intentional, it doesn't make sense to me).

The problem was that I had not implemented a copy operator (see below).
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 14, 2018, 01:07:59 PM
I have done further tests and FUNCTION = works fine. Maybe when I did the first tests I had not yet implemented the copy operator because I didn't know how exactly FUNCTION = worked. I'm glad that it was my mistake and not a bug.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 15, 2018, 09:59:43 PM
Well José,


i will put this aside for the moment, and direct my focus on the integration of my VD with FreeBASIC again.


But i´m pretty sure there is a solution for avoiding "**" or ".wstr" somehow or other. Look at this code:



#compiler freebasic
#compile console 32 exe

#define unicode
#include once "windows.bi"
#include once "Afx\AfxStr.inc"
#define USTRING CBSTR


declare function FB_MAIN as uinteger


END FB_MAIN


'***********************************************************************************************
'***********************************************************************************************


function returnbyref(u as ustring) byref as wstring
'***********************************************************************************************
'
'***********************************************************************************************
static i as long
static a(1 to 10) as ubyte ptr


  i = i + 1
  if i > 10 then i = 1

  if a(i) > 0 then
    deallocate a(i)
  end if

  a(i) = u.m_pBuffer


  function = *cast(WSTRING PTR, u.m_pBuffer)
  u.m_pBuffer = 0                                     'prevent buffer from being deallocated


end function


'***********************************************************************************************


PRIVATE FUNCTION left2 overload (BYREF w AS WSTRING, BYVAL n AS LONG) byref AS Wstring 'const wstring
'***********************************************************************************************
' return the leftmost n chars of w
'***********************************************************************************************
dim u as ustring = left(w, n)


  return returnbyref(u)


END FUNCTION


'***********************************************************************************************


FUNCTION FB_MAIN AS uinteger
'***********************************************************************************************
' main
'***********************************************************************************************
dim i as long
dim n as long
dim s as string        = "1234" 
dim w as Wstring * 64  = "56789"
dim u as Ustring       = "abcdef"
dim x as Ustring     


  SELECT CASE LEFT2(s, 2)           
    case "12"
      print "ok"
  end select

  sleep


  x = left2(s, 1) + left2(s, 2) + left2(s, 3)
  print x
  x = left2(w, 1) + left2(w, 2) + left2(w, 3)
  print x
  x = left2(u, 1) + left2(u, 2) + left2(u, 3)
  print x
  sleep


end function



It pools the pointers to the CWstr data, prevents the data from being deallocated when the CWstr goes out of scope, and deallocates it later. I know this is not a serious solution for many reasons. E.g. it fails, if there are too many pointers to be stored simultaneously, but it gives an idea what could be done. I have yet another idea for a completely different approach, but this will require some research - maybe later.


Another point - if i define a CBstr as USTRING, the "DelChars" function is missing when using the string helper functions (AfxStrRemove, etc.). I know, YOU don´t need this, but would it be possible to add it (DelChars to CBstr), so i could have both:

#define USTRING CWstr
#define USTRING CBstr

As far as i can see, it´s the only one missing to make it interchangeable


Thanks


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 16, 2018, 12:49:07 AM
They're easily converted from one to another, but they are not interchangeable. Redefining CWSTR to CBSTR, or vice versa, with defines is likely to cause many problems. I think that it is better that you add overloaded functions to work with your USTRING.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 26, 2018, 11:44:55 AM
Hi José,


it´s me again ...


Working with what i find in "AfxStr.inc", i think i found some problems. I put it together in this code:



'console 32

#include "afx\afxstr.inc"


'dim t as CWstr = "ФЫВЙЦУФЫ"                           'fails for "any"
'dim t as CBstr = "ФЫВЙЦУФЫ"                           'fails for "any"
dim t as wstring * 64 = "ФЫВЙЦУФЫ"                    'fails for "any"

print str(AfxStrTallyAny(t, "ФЫ"))                    'fails for wide strings (returns 0)
print str(AfxStrTally(t, "ФЫ"))                       'this works


'dim t1 as Zstring * 10 = "asdfgas"
dim t1 as string * 10 = "asdfgas"                     
print str(AfxStrTallyAny(t1, "as"))                   'this works
print str(AfxStrTally(t1, "as"))                      'this works


print str(AfxStrParseCountAny(t1, "as"))              'returns 3 - should return 5 ?


t = "aЙЦäУКЙЦöööУКЙЦУК123üüüü45"
messageboxw 0, "-" + afxstrshrink(t, " äöü") + "-", "Error", 0
messageboxw 0, "-" + afxstrshrink(t, " Й") + "-", "Error", 0
messageboxw 0, "-" + afxstrreverse(t) + "-", "Error", 0



Furthermore, i wrote my own functions for string manipulation in FreeBASIC, mostly using your code with small changes ("fb_str.inc" in "Ustring.zip" attachment). You must add a "#define USTRING Afx.CBstr" and it makes use of some constants defined in "fb.inc" e.g "any_".

Would you please be so kind as to review it or even test it with your methods, so i can be sure, that it is bug free ? The basic idea is making it possible to have an implementation of your CWstr (defined as USTRING) and string helper functions for it (and all other FreeBASIC string types) even, if WINFBX is not present or intentionally isn´t used. When WINFBX is included, USTRING gets defined as CBstr and all WINFBX functionality can be used without interfering with my additions (which are always available nevertheless).

You should not use my IDE for testing, because the latest public version still has some inconsistencies regarding FreeBASIC. There will be an update fixing this, but i would like to include the attached code and i want to be as sure as possible, that it works properly.


Thanks,


JK 
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 01:25:15 PM
Seems to be a problem with INSTR when using a CWSTR variable instead of **. This works:


PRIVATE FUNCTION AfxStrTallyAny (BYREF wszMainStr AS CONST WSTRING, BYREF wszMatchStr AS WSTRING) AS LONG
   IF LEN(wszMainStr) = 0 OR LEN(wszMatchStr) = 0 THEN EXIT FUNCTION
   ' // Remove possible duplicates in the matches string
   DIM nPos AS LONG
   DIM cwsMatchStr AS CWSTR = wszMatchStr
   FOR i AS LONG = 1 TO LEN(cwsMatchStr)
      nPos = INSTR(**cwsMatchStr, MID(wszMatchStr, i, 1))
      IF nPos = 0 THEN cwsMatchStr += MID(wszMatchStr, i, 1)
   NEXT
   ' // Do the count
   DIM nCount AS LONG
   FOR i AS LONG = 1 TO LEN(cwsMatchStr)
      nPos = 1
      DO
         nPos = INSTR(nPos, wszMainStr, MID(**cwsMatchStr, i, 1))
         IF nPos = 0 THEN EXIT DO
         IF nPos THEN
            nCount += 1
            nPos += 1
         END IF
      LOOP
   NEXT
   RETURN nCount
END FUNCTION


I will revise the string functions that use INSTR.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 02:04:40 PM
I have modified the string functions that used cws instead of **cws.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 26, 2018, 02:27:21 PM
Hi José,


i can download the attachment, but 7zip cannot open it. What did you use for creating it ?


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 02:40:22 PM
I used RAR. New attachment with .zip extension.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 02:41:28 PM
Your mid_ function is buggy. Must be if n = 0 instead of <> 0.


PRIVATE FUNCTION mid_ (BYREF w AS WSTRING, BYVAL i AS ULONG, BYVAL n AS ULONG = 0) AS ustring
'  if n <> 0 then
  if n = 0 then
    return mid(w, i)
  else
    return mid(w, i, n)
  end if
end function 


BTW if I were you I would always add explicit BYVAL or BYREF to the parameters; otherwise, you will get many warnings when compiling with the -w pedantic option.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 26, 2018, 02:57:24 PM
Yep, was meant the other way round! Thanks for the .zip


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 03:29:02 PM
Changed AfxStrParseCountAny to be consistent with PB's PARSECOUNT:


PRIVATE FUNCTION AfxStrParseCountAny (BYREF wszMainStr AS CONST WSTRING, BYREF wszDelimiter AS WSTRING = ",") AS LONG
   DIM nCount AS LONG = 1
   FOR i AS LONG = 1 TO LEN(wszDelimiter)
      nCount += AfxStrParseCount(wszMainStr, MID(wszDelimiter, i, 1))
   NEXT
   RETURN nCount
END FUNCTION
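
For comparison, here is a minimal sketch that uses only FreeBASIC intrinsics and does not need the framework. The names TallyAnySketch and ParseCountAnySketch are hypothetical and are not part of AfxStr.inc. It assumes the "any" semantics discussed above: every single character of the second string counts on its own, and each delimiter found starts one more field. With that assumption "asdfgas" with "as" gives a tally of 4 and a field count of 5, the value asked about earlier in the thread.

```freebasic
' Hypothetical helpers (not part of AfxStr.inc); intrinsic functions only.

' Count how many characters of mainStr occur in the character set matchChars.
PRIVATE FUNCTION TallyAnySketch (BYREF mainStr AS CONST WSTRING, BYREF matchChars AS CONST WSTRING) AS LONG
   DIM nCount AS LONG = 0
   FOR i AS LONG = 1 TO LEN(mainStr)
      ' INSTR returns 0 when the single character is not in the set
      IF INSTR(matchChars, MID(mainStr, i, 1)) > 0 THEN nCount += 1
   NEXT
   RETURN nCount
END FUNCTION

' Assumption: each single-character delimiter found starts one more field.
PRIVATE FUNCTION ParseCountAnySketch (BYREF mainStr AS CONST WSTRING, BYREF delims AS CONST WSTRING = ",") AS LONG
   RETURN TallyAnySketch(mainStr, delims) + 1
END FUNCTION

PRINT TallyAnySketch("asdfgas", "as")        ' 4 (a, s, a, s)
PRINT ParseCountAnySketch("asdfgas", "as")   ' 5
```

Because the loop walks the main string once, duplicate characters in the delimiter set do not need to be removed first.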
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 06:40:04 PM
When using Replace_, the contents of the original string are sometimes changed.


DIM ustr AS USTRING = "1234567890"
print Replace_(ustr, "5", "x")
print ustr


Same behavior with Insert_.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 08:38:54 PM
Also with Remove_.


DIM ustr AS USTRING = "1234567890"
print Remove_(ustr, "23")
print ustr


Looks like these are side effects of defining a USTRING as a CBSTR instead of a CWSTR. I told you that they weren't interchangeable. Changing the define from #define USTRING Afx.CBstr to #define USTRING Afx.CWstr works fine.
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 09:19:50 PM
In this code


PRIVATE FUNCTION Remove_ overload (BYREF w AS WSTRING, byval anyflag as long = 0, BYREF m AS WSTRING, _
                                   byval iflag as long = 0) AS ustring
DIM u    AS ustring = w


if you have declared USTRING as CBSTR, the passed w AS WSTRING is detected as a BSTR and it is ATTACHED, not copied, whereas if USTRING is declared as CWSTR, it is copied, not attached.

Attaching was needed because FB does not make a distinction between a BSTR and a WSTRING, since BSTR is not supported. Therefore, the CBSTR constructor checks if it is a WSTRING or a BSTR, and attaches the handle if it is a BSTR or copies the contents if it is a WSTRING. If we always copied, the intermediate BSTRings would never be freed and we would get memory leaks.


' ========================================================================================
PRIVATE CONSTRUCTOR CBStr (BYREF bstrHandle AS AFX_BSTR = NULL, BYVAL fAttach AS LONG = TRUE)
   CBSTR_DP("--BEGIN CBSTR CONSTRUCTOR AFX_BSTR - handle: " & .WSTR(bstrHandle) & " - Attach: " & .WSTR(fAttach))
   IF bstrHandle = NULL THEN
      m_bstr = SysAllocString("")
      CBSTR_DP("CBSTR CONSTRUCTOR SysAllocString - " & .WSTR(m_bstr))
   ELSE
      ' Detect whether the passed handle is an OLE string (BSTR).
      ' An OLE string has a length prefix (descriptor) in front of the data; a plain WSTRING does not.
      ' Read the length in bytes from the descriptor and divide by 2 to get the number of
      ' unicode characters, which is the value returned by the FreeBASIC LEN operator.
      DIM Res AS INTEGER = PEEK(DWORD, CAST(ANY PTR, bstrHandle) - 4) \ 2
      ' If the retrieved length is the same as the one returned by LEN, then it must be an OLE string
      IF Res = .LEN(*bstrHandle) AND fAttach <> FALSE THEN
         CBSTR_DP("CBSTR CONSTRUCTOR AFX_BSTR - Attach handle: " & .WSTR(bstrHandle))
         ' Attach the passed handle to the class
         m_bstr = bstrHandle
      ELSE
         CBSTR_DP("CBSTR CONSTRUCTOR AFX_BSTR - Alloc handle: " & .WSTR(bstrHandle))
         ' Allocate an OLE string with the contents of the string pointed to by bstrHandle
         m_bstr = SysAllocString(*bstrHandle)
      END IF
   END IF
   CBSTR_DP("--END CBSTR CONSTRUCTOR AFX_BSTR - " & .WSTR(m_bstr))
END CONSTRUCTOR
' ========================================================================================
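
The length check in the constructor can be tried out without COM. The following is only a simulation under stated assumptions: FakeBstrBlock and LooksLikeBstr are hypothetical names, the 4-byte prefix is placed by hand in a TYPE instead of being written by SysAllocString, and SIZEOF(WSTRING) is used instead of the hard-coded 2 so the sketch also holds where a wide character is not 2 bytes.

```freebasic
' Simulated BSTR layout: a 4-byte length prefix (in bytes) directly in front of the characters.
' FakeBstrBlock is a hypothetical stand-in; a real BSTR is allocated by SysAllocString.
TYPE FakeBstrBlock
   cbLen AS ULONG            ' length prefix in bytes
   txt   AS WSTRING * 32     ' null-terminated character data, starts at offset 4
END TYPE

' Same idea as the CBStr constructor: compare the prefix with the null-terminated length.
' Returns -1 if the pointer looks like a BSTR, 0 otherwise.
FUNCTION LooksLikeBstr (BYVAL p AS WSTRING PTR) AS LONG
   DIM cb AS ULONG = *CAST(ULONG PTR, CAST(UBYTE PTR, p) - 4)
   IF (cb \ SIZEOF(WSTRING)) = LEN(*p) THEN RETURN -1
   RETURN 0
END FUNCTION

DIM withPrefix AS FakeBstrBlock
withPrefix.txt = "1234567890"
withPrefix.cbLen = LEN(withPrefix.txt) * SIZEOF(WSTRING)   ' valid length prefix

DIM noPrefix AS FakeBstrBlock
noPrefix.txt = "1234567890"
noPrefix.cbLen = 0                                         ' bytes in front do not match the length

PRINT LooksLikeBstr(@withPrefix.txt)   ' -1: would be attached
PRINT LooksLikeBstr(@noPrefix.txt)     '  0: would be copied
```

The sketch also makes the limits of the test visible: it only compares two numbers, so a plain WSTRING that happens to be preceded by matching bytes would be taken for a BSTR.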


To force a copy, you need to change DIM u AS ustring = w to DIM u AS ustring = CWSTR(w), but then w won't be freed.

If I have reserved the use of CBSTR to COM it is for a good reason.


Title: Re: FreeBASIC CWstr
Post by: José Roca on May 26, 2018, 09:33:01 PM
In


PRIVATE FUNCTION AfxStrReplace OVERLOAD (BYREF wszMainStr AS CONST WSTRING, BYREF wszMatchStr AS WSTRING, BYREF wszReplaceWith AS WSTRING) AS CWSTR
   DIM cwsMainStr AS CWSTR = wszMainStr


as cwsMainStr is declared as a CWSTR, wszMainStr will always be copied, not attached. Therefore, this code works:


DIM cbs AS CBSTR = "1234567890"
print AfxStrReplace(cbs, "5", "x")
print cbs


Now change DIM cwsMainStr AS CWSTR = wszMainStr to DIM cwsMainStr AS CBSTR = wszMainStr and you will be asking for trouble.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 27, 2018, 11:58:14 AM
OK - i didn´t know or at least i didn´t understand that!


so, when getting passed a CBstr (or an OLE wide string) like this



FUNCTION somefunc(b as CBstr) AS LONG

DIM b1 as CBstr = b

...



b1 is not a copy of b (as i would expect) but is in fact b itself, because only the OLE handle has been copied and not the data. This means when b1 goes out of scope, b is destroyed as well.

This is not what i would call "regular" string behavior! You need this for COM where in special cases the caller is responsible for freeing the passed OLE string - right ?

If this is the case, then i have two more questions:

1.) how does PowerBASIC handle this situation? I´ve never come across this in PB (maybe my fault, i´m by far not as much a COM expert as you are).

2.) if this is for special cases only, wouldn´t it have been better to have a special "attach" operator for exactly these special cases, instead of making it a standard behavior, which opens unexpected traps.


The reason i defined USTRING as CBstr (and not as CWstr, which of course is possible) is, that i hoped to have a "one for all" wide string type. A type which basically works everywhere, without having to make decisions where to use this and where to use that. When it´s about heavy string manipulation and i want more speed i can always implement CWstr for this - that´s the idea behind it.


I think, your approach was to have a separate OLE wide string type ONLY for COM, not only because it must be an OLE string, but also for implementing automatic freeing of the passed string handle. That is, when using CBstr with COM you don´t have to care about when to free passed strings and when not to - your CBstr does it automatically for you. Is this correct ?


Would it make a CBstr usable for all situations (which currently is not possible as i have learned), if the CBstr type could decide, if it is assigned a standard OLE wide string (which could happen in COM only, and in which case it should copy the handle) or if it is assigned another CBstr (which cannot be a parameter passed from COM, and in which case it should copy the data) ? Or are there still other reasons, why a CBstr cannot be used just like a CWstr ? (Maybe i have an idea how to make such a decision possible)


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 27, 2018, 01:37:37 PM
Quote
OK - i didn´t know or at least i didn´t understand that!
so, when getting passed a CBstr (or an OLE wide string) like this

FUNCTION somefunc(b as CBstr) AS LONG
DIM b1 as CBstr = b
...


b1 is not a copy of b (as i would expect) but is in fact b itself, because only the OLE handle has been copied and not the data. This means when b1 goes out of scope, b is destroyed as well.

No. In this case, b1 will be a copy. As both b and b1 are CBSTRings, this is the constructor that will be called:


PRIVATE CONSTRUCTOR CBStr (BYREF cbs AS CBStr)
   m_bstr = SysAllocString(cbs)
END CONSTRUCTOR


Otherwise, both b and b1 will try to free the same memory!
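
To make the point about the copy constructor concrete, here is a minimal sketch with a hypothetical owner type. OwnedText is not CBStr: CALLOCATE and DEALLOCATE stand in for SysAllocString and SysFreeString, and only the constructors and the destructor are shown (no assignment operator).

```freebasic
' Hypothetical minimal owner type, only to illustrate copy-on-construct.
TYPE OwnedText
   p AS WSTRING PTR
   DECLARE CONSTRUCTOR (BYREF s AS CONST WSTRING)
   DECLARE CONSTRUCTOR (BYREF other AS OwnedText)
   DECLARE DESTRUCTOR ()
END TYPE

CONSTRUCTOR OwnedText (BYREF s AS CONST WSTRING)
   ' Allocate room for the characters plus the terminating null, then copy
   p = CALLOCATE((LEN(s) + 1) * SIZEOF(WSTRING))
   *p = s
END CONSTRUCTOR

' Copy constructor: allocates a second buffer instead of sharing the pointer
CONSTRUCTOR OwnedText (BYREF other AS OwnedText)
   p = CALLOCATE((LEN(*other.p) + 1) * SIZEOF(WSTRING))
   *p = *other.p
END CONSTRUCTOR

DESTRUCTOR OwnedText ()
   DEALLOCATE(p)      ' each instance frees only its own buffer
END DESTRUCTOR

DIM a AS OwnedText = OwnedText("1234567890")
DIM b AS OwnedText = a        ' copy constructor runs here
*b.p = "changed"              ' fits into b's buffer and does not touch a

PRINT *a.p                    ' 1234567890
PRINT *b.p                    ' changed
```

If the copy constructor stored other.p instead, both destructors would free the same block, which is the double free described above.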

CBSTR will also make a copy if the parameter is a CWSTR, an ANSI string, a literal or a WSTRING, but will attach it if the passed parameter is a BSTR (although the parameter has been declared as a WSTRING because FB does not natively support BSTRings).

Quote
if this is for special cases only, wouldn´t it have been better to have a special "attach" operator for exactly these special cases, instead of making it a standard behavior, which opens unexpected traps.

The constructor has an optional fAttach parameter:


CONSTRUCTOR CBStr (BYREF bstrHandle AS AFX_BSTR = NULL, BYVAL fAttach AS LONG = TRUE)


When calling a COM function that returns a BSTR, you can do

DIM cbs AS CBSTR = <some function>  ' fAttach defaults to TRUE

or

DIM cbs AS CBSTR = (<some function>, FALSE)

but how are you going to pass this parameter when using the FB intrinsic string functions?

Quote
how does PowerBASIC handle this situation?

PowerBasic natively supports BSTRings and knows when it has to allocate and free them. If FB also had native support for BSTRings there would be no problems, but as it only supports WSTRINGs, its intrinsic functions are prepared to free the temporary WSTRINGs that they generate, but they have no idea of what to do with BSTRings.

Quote
I think, your approach was to have a separate OLE wide string type ONLY for COM, not only because it must be an OLE string, but also for implementing automatic freeing of the passed string handle. That is, when using CBstr with COM you don´t have to care about when to free passed strings and when not to - your CBstr does it automatically for you. Is this correct ?

Yes, and also for efficiency. If the return type is a CBSTR, I can simply return a BSTR, that will be attached. Otherwise, I will have to create a temporary CBSTR, copy the contents of the BSTR to it, free the BSTR and return the temporary CBSTR, whose contents will be copied again.

Quote
The reason i defined USTRING as CBstr (and not as CWstr, which of course is possible) is, that i hoped to have a "one for all" wide string type. A type which basically works everywhere, without having to makes decisions where to use this and where to use that. When it´s about heavy string manipulation and i want more speed i can always implement CWstr for this - that´s the idea behind it.

I know, but I have warned you that they are not interchangeable. BSTRings are managed by the Windows COM library, not FreeBasic. The first string class that I tried to write was CBSTR and I did lose countless hours trying to solve all the problems. Finally, I decided to write CWSTR for general use and relegate CBSTR for COM use.

Of course, you can try to write your "interchangeable" BSTR class. If you still have some hair in your head, you will lose it.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 27, 2018, 01:53:03 PM
Well, looking closer at your code for CBstr, you already did, what i had in mind. If a CBstr is assigned another CBstr, in fact it creates a new string and copies its data. It attaches only, if it is assigned an OLE string, which isn´t a CBstr.


So as a consequence - it already does, what i want, if i code it like this:



PRIVATE FUNCTION Remove_ overload (BYREF w AS USTRING, byval anyflag as long = 0, BYREF m AS WSTRING, _
                                   byval iflag as long = 0) AS ustring
DIM u    AS ustring = w



Do you see other problems with this approach in general (other than to have to adapt my code in some places)?


JK



Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 27, 2018, 02:02:33 PM
Oh, i see we cross posted!


Quote
If you still have some hair in your head, you will lose it.


I just had a look in the mirror - enough hair for many years to come (even if i must admit, there were times when there were even more) ;-).
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 27, 2018, 02:24:29 PM
José,


it´s not about writing an interchangeable BSTR class.

I want to use CBstr as the standard wide string class instead of CWstr. In a previous post i asked, if i could implement a CBstr wherever i can implement a CWstr, your answer was - yes, but CWstr are faster (if i recall this correctly). 


I repeat my question,



PRIVATE FUNCTION Remove_ overload (BYREF w AS USTRING, byval anyflag as long = 0, BYREF m AS WSTRING, _
                                   byval iflag as long = 0) AS ustring
DIM u    AS ustring = w


... avoids the ambiguity of "BYREF w AS WSTRING" for the receiving CBstr ...


do you see other problems with this approach (defining all ingoing strings as USTRING = CBstr) in general (arising from the fact that i explicitly use a CBstr here, other than having to adapt my code in some places)? I accept, that this may not be the fastest possible way in favour of having a generic way. If there is need for speed, i can switch to CWstr.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 27, 2018, 03:34:00 PM
You can do it, but in an inefficient way, using only intrinsic functions, just as a beginner will do it.


PRIVATE FUNCTION _StrRemove OVERLOAD (BYREF wszMainStr AS USTRING, BYREF wszMatchStr AS WSTRING) AS USTRING
   DIM ustr AS USTRING = wszMainStr
   DIM nLen AS LONG = LEN(wszMatchStr)
   DO
      DIM nPos AS LONG = INSTR(**ustr, wszMatchStr)
      IF nPos = 0 THEN EXIT DO
      ustr = MID(ustr, 1, nPos - 1) & MID(ustr, nPos + nLen)
   LOOP
   RETURN ustr
END FUNCTION


Multiple concatenations, multiple creation/destruction of temporary types, multiple assignments. You can say goodbye to any speed advantage when defining USTRING as CWSTR.

What I wonder is what advantage do you think you will have using CBSTR as your general data type.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 27, 2018, 07:31:02 PM
Well, the advantage would be to have an universal string data type for all possible implementations!

Not everyone has your expertise and experience in coding. Look at me, i dare say, i´m definitely not a beginner, but  i had to ask a lot of questions (and maybe will have to) in order to be able to implement your work properly. Not everyone has such a long breath like me, asking and asking again until the last uncertainty is fixed.

Implementing your work into my IDE i want to present an easy to use "interface", which just works (without too many restrictions and special cases) Everyone who wants to dig deeper and wants to make the most out of it, can do so and will have to learn what i had to learn about it. But nobody should be forced to do so (my point of view)!

Maybe this is a matter of design philosophy and where to set the border, you decided to set the border when it comes to COM. Which is a logical choice - a new area requiring a new data type. Coming from PowerBASIC, where this "border" doesn´t exist, i think it would be nice to have it like there. And it would make things easier for newbies in COM.

Let´s see, what´s possible - i´m almost certain there will be more questions. Thanks for your patience!


JK


Title: Re: FreeBASIC CWstr
Post by: José Roca on May 27, 2018, 09:26:07 PM
Well, not fully universal. Don't use USTRING defined as CWSTR with COM and don't use USTRING defined as CBSTR with the functions of my framework that use CWSTR in the internal code. I think that it is a bad idea and can't anticipate all the troubles that these redefinitions can cause.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 27, 2018, 11:14:45 PM
Quote
Don't use USTRING defined as CWSTR with COM

this is exactly what i have in mind:

- if your framework is not included, i will define USTRING to be a clone of CWstr. How should someone use COM without your framework? So, no problem here. 

- if your framework is included, i will define USTRING to be a CBstr, which works universal, if i adapt my functions accordingly. Your framework isn´t affected in any way, because it is written with the original definitions (a CWstr remains a CWstr and a CBstr remains a CBstr there). And as you said passing a CBstr to a function expecting a CWstr is no problem at all (if you drop the speed loss for the conversions).

The only critical situation would be, what i initially coded: passing a CBstr to a function, which expects a "byref wstring" and this wstring gets assigned to a CBstr inside the function, which would result in an unwanted "attach" rather than a "copy". I searched your framework for such a construct and couldn´t find any! So, if this is true (please contradict, if i´m wrong), we should be on the safe side regarding this.

FreeBASIC allows for overloading, so i could optimize my functions for CBstr AND CWstr separately and the "#ifdef" metastatement allows for "activating" the ones needed for the specific situation (with your framework included or without). I can have an universal wide string type (which is always "USTRING") and in case your framework is included, i have an additional wide string type for optimum speed (CWstr) and i have specialized functions for both, which share the same syntax - finally i´m getting nearer, where i wanted to get!


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 28, 2018, 03:14:55 AM
Well, in post #64, I said: "I think that it is better that you add overloaded functions to work with your USTRING."

In my string functions, I'm using BYREF CONST AS WSTRING for two reasons:

1.- One function fits all. It works with all the string data types: STRING, ZSTRING, WSTRING, CWSTR and CBSTR, and also string literals and CVARs.

2.- It is more efficient when passing a CWSTR or CBSTR because no conversion is performed since what is being passed is a pointer to the string data.

Microsoft didn't implement the Automation data types with speed in mind: BSTRings, VARIANTs and SAFEARRAYs are somewhat inefficient. Automation was designed mainly for Visual Basic, VBScript and Office. It is very flexible, but slow, and a pain to use with languages that don't support it natively.

Because the use of VARIANTs and SAFEARRAYs is sometimes unavoidable with COM, I have implemented CVAR and CSafeArray. My implementation of these data types is much more powerful and flexible than the PowerBasic ones.

Here you have a function that works with all the string data types:


#include once "Afx\CVAR.inc"

PRIVATE FUNCTION StrRemove (BYREF cvMainStr AS CVAR, BYREF cvMatchStr AS CVAR) AS CVAR
   DIM cv AS CVAR = cvMainStr
   DIM nLen AS LONG = LEN(cvMatchStr.wstr)
   DO
      DIM nPos AS LONG = INSTR(cv, cvMatchStr)
      IF nPos = 0 THEN EXIT DO
      cv = MID(cv, 1, nPos - 1) & MID(cv, nPos + nLen)
   LOOP
   RETURN cv
END FUNCTION

print StrRemove("Hello World. Welcome to the Freebasic World", "World")
DIM s AS STRING = "Hello World. Welcome to the Freebasic World"
PRINT StrRemove(s, "World")
DIM cws AS CWSTR = "Hello World. Welcome to the Freebasic World"
PRINT StrRemove(cws, "World")
DIM cbs AS CBSTR = "Hello World. Welcome to the Freebasic World"
PRINT StrRemove(cbs, "World")


Can you do something like this with PowerBasic support for variants?

It also works without problems with my existing string functions, so there is no need to write new ones:


DIM cv AS CVAR = "Hello World. Welcome to the Freebasic World"
PRINT AfxStrRemove(cv, "World")


You can also use them with the FB intrinsic functions:


DIM cv AS CVAR = "Test string"
cv = cv & " 123"
cv = cv & 45
cv += " - some more text"
print cv
PRINT LEFT(cv, 4)


They can store almost any data type:


DIM cv AS CVAR = "Test string"
DIM cv2 AS CVAR = 12345.67
print cv & " " & cv2


You can have arrays, safe arrays, associative arrays, stacks and queues...

If speed doesn't worry you, maybe this is the universal data type you're looking for... :)

Note: The use of my framework is required. Sorry.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 28, 2018, 01:17:37 PM
José,


it´s not about getting rid of your framework!

BTW,  « Last Edit: Today at 04:30:05 AM by José Roca » when do you sleep, do you sleep at all?


Coming back to the "universal" thing: in post #35 you wrote

Quote
DIM cbs AS CBSTR, and pass cbs or cbs.sptr to IN parameters and cbs.vptr to OUT/INOUT parameters.

This means for IN parameters (to be read only) you pass a pointer to the actual data, which in turn means, i could pass a CWstr as well, i could even pass a WSTRING PTR - is this correct?

For an OUT/INOUT parameter (returned string/ parameter which might be modified) you pass an OLE string handle. Therefore it MUST be a CBstr - and you must pass it as "CBstr.vptr" (not only "CBstr", which would pass the data - different syntax!). In case of a return value of a method CBstr recognizes, if it is receiving a BSTR or not, and acts accordingly (no different syntax - but it MUST be a CBstr for proper working).

So in cases, where i must pass a BSTR to COM, i CANNOT have a consistent syntax, i MUST have "...vptr" anyway - is this correct?


If this is correct, in fact i don´t have any advantage defining USTRING as CBstr, CWstr would be the better choice. But at least having a CWstr as IN parameter in COM shouldn´t be a problem then.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 28, 2018, 01:43:16 PM
You can't pass a CWSTR or a WSTRING to a COM procedure that expects a BSTR. The main difference between them is that a BSTR carries its length with it. If you pass a WSTRING or CWSTR, what will happen when the called code will call SysStringLen to get the length of the string?

Procedures that expect a WSTRING retrieve the length searching for a double null, but procedures that expect a BSTR retrieve the length calling SysStringLen.

CWSTR and FB's WSTRING are equivalent to PowerBasic WSTRINGZ. Can you use WSTRINGZ with procedures that expect a BSTR (WSTRING in PowerBasic)?

> So in cases, where i must pass a BSTR to COM, i CANNOT have a consistent syntax, i MUST have "...vptr" anyway - is this correct?

For OUT and IN/OUT parameters you must use .vptr. In my first implementation I used an overloaded @ operator, but then there was the problem that I could not use @ to get the address of the class.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 28, 2018, 07:34:02 PM
Quote
Procedures that expect a WSTRING retrieve the length searching for a double null, but procedures that expect a BSTR retrieve the length calling SysStringLen.


Ok - my error! I cannot pass a CWstr or WSTRING directly. But when an IN parameter is defined as CBstr in the procedure header, passing a CWstr or a WSTRING instead of a CBstr (to the property, not to the COM object) should work anyway, because the incoming data type is automatically converted into an intermediate CBstr, if i recall it right.



...

property someprop(byref p as CBstr) as long
...

dim cws as CWstr = "Hello"
dim n as long   
...

n = someprop(cws)
...



As long as "p" in "someprop" is an IN parameter, this will work in general (because of the intermediately created CBstr) - is this correct?
I understand, that it fails for an IN/OUT parameter and of course i cannot code:



property someprop(byref p as CWstr) as long




JK

Title: Re: FreeBASIC CWstr
Post by: José Roca on May 28, 2018, 08:10:08 PM
> As long as "p" in "someprop" is an IN parameter, this will work in general (because of the intermediately created CBstr) - is this correct?

Of course. What I mean is that if you intend to call, for example, the GetParentFolderName method of the IFileSystem object

GetParentFolderName (BYVAL Path AS BSTR, BYVAL pbstrResult AS BSTR PTR) AS HRESULT

you have to pass a BSTR with the path and the address of another BSTR to receive the result, and if they aren't true BSTRings it won't work. You can't declare the parameters as CBSTR instead of BSTR because the called method has no idea of what a CBSTR is, and you can't declare these parameters as WSTRINGs because it will fail. What you can do is to pass cbsPath.bptr and cbsRes.vptr.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 28, 2018, 11:16:20 PM
José,


i don´t intend to call GetParentFolderName directly, i would use the wrapper function your framework supplies, which conveniently accepts CBstrs:


Quote
PRIVATE FUNCTION CFileSys.GetParentFolderName (BYREF cbsFolder AS CBSTR) AS CBSTR


and i would call it like this


Quote
p = <CFileSys>.GetParentFolderName(f)


where p MUST be a CBstr, and f can be a CBstr, CWstr, WSTRING, String and ZSTRING (and if i define USTRING as CWstr, a USTRING as well).


I didn´t find one single instance, where a BSTR is needed as parameter of a procedure i could call directly. As it seems, it is used internally alone. So when using your framework i can get away with a CWstr most of the time and sometimes (only in COM) i MUST use a CBStr for IN/OUT parameters (CBstr.vptr) and for returned strings (CBstr).

In summary this means: when using your framework i can almost have what i want, a universal string data type (USTRING defined as CWstr). And only in some special cases (as said above) i must implement a CBstr, which mimics a BSTR, which is the required string data type for COM. If someone doesn´t want to use your framework in my IDE, i will supply a wide string data type (a clone of your CWstr) and some functions for string manipulation. In this case COM (and all other functionality of your framework) is not available, and a decision between CBstr and CWstr is immaterial and pointless, because there is no use for CBstr, in fact there is no CBstr at all - so USTRING (= CWstr) can be used universally.


Do you agree?


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 29, 2018, 12:23:04 AM
It is mandatory for parameters. For strings returned as the result of a function, you can assign it to a CBSTR, CWSTR, CVAR, STRING, WSTRING or even ignore it. The returned CBSTR is temporary and destroys itself.

If somebody does not want to use my framework, it will be his loss, not mine. I'm providing with it the functionality available in PowerBasic that is missing in FreeBasic and much more, although, of course, with a different syntax and programming style.

A SDK programmer should not have many problems to use my framework, but unfortunately the PBer's have been spoiled by DDT. Very few, if any, will use CWindow directly. Instead they will use a tool like the Visual Designer in which Paul Squires is working. The generated code will use CWindow and other wrappers internally, but for the custom code they will use the OOP classes in which Paul is also working. Millions of programmers have been spoiled by Visual Basic.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 29, 2018, 11:55:00 AM
Quote
For strings returned as the result of a function, you can assign it to a CBSTR, CWSTR, CVAR, STRING, WSTRING or even ignore it. The returned CBSTR is temporary and destroys itself.


...even better! So the only place, where CBstr.vptr is mandatory in your framework, is an IN/OUT parameter in a COM method or property.



Quote
If somebody does not want to use my framework, it will be his loss, not mine.


i absolutely agree!



The Visual Designer in my IDE produces pure SDK code without any wrappers (it already does in PowerBASIC and it will do in FreeBASIC). In fact i´ve never understood, what´s the benefit of learning wrappers compared to learning the fundamentals, which can be re-used in almost every programming language for Windows. DDT and Visual Basic were attempts to make things easier (a marketing instrument) for beginners and professionals, but i agree, while maybe easier in the beginning both don´t make it easier in the long run, and both kept/keep people away from learning the fundamentals of the Windows API. I think we share the same point of view here.


Did you find any more problems in the code i posted (fb_str.inc, when USTRING is defined as CWstr)


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on May 29, 2018, 01:26:46 PM
> ...even better! So the only place, where CBstr.vptr is mandatory in your framework, is an IN/OUT parameter in a COM method or property.

Only for OUT parameters. For IN/OUT parameters not (I mentioned it previously by mistake, sorry).

The reason is to avoid memory leaks. The COM rules are strict. When we pass a BSTR pointer to a COM method with an IN/OUT parameter, the COM method reads the content of the passed BSTR, does what it needs to do, frees the passed BSTR and returns a newly allocated BSTR. When we pass it to an OUT parameter, it simply returns a pointer to a newly allocated BSTR. Therefore, if we pass a BSTR that has contents, it won't be freed and we will have a memory leak. What the .vptr method does is to free the BSTR before passing the pointer. The alternative is to clear the CBSTR before passing it.
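
The OUT-parameter rule above can be illustrated with raw BSTRs. This is only a minimal sketch for Windows, assuming the standard FreeBASIC Windows headers; GetNameOut is a hypothetical stand-in for a COM method and is not part of WinFBX, and the code does not show how CBSTR.vptr is actually implemented.

```freebasic
' Windows only. Sketch of the OUT-parameter rule with raw BSTRs.
#include once "windows.bi"
#include once "win/ole2.bi"

' Hypothetical stand-in for a COM method with an OUT BSTR parameter:
' it overwrites *pbstrOut without reading or freeing the previous value.
sub GetNameOut( byval pbstrOut as BSTR ptr )
   *pbstrOut = SysAllocString( wstr( "new value" ) )
end sub

dim b as BSTR = SysAllocString( wstr( "old value" ) )

' Free and clear the BSTR before passing its address.
' Skipping these two lines would leak the "old value" allocation.
SysFreeString( b )
b = NULL

GetNameOut( @b )
print *b

' The caller owns the BSTR returned through the OUT parameter.
SysFreeString( b )
```

With an IN/OUT parameter the callee itself is expected to free the passed BSTR, which is why the explicit free is only needed in the OUT case.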

> Did you find any more problems in the code i posted (fb_str.inc, when USTRING is defined as CWstr)

The fundamental flaw was to define it as CBSTR and then use it with a function designed to work with a CWSTR. CBSTR uses attaching instead of copying whenever possible for speed and for ease of use. If it is attached, the BSTR will be freed when the CBSTR is destroyed. If it is copied, you are responsible for freeing the original BSTR.

> I think we share the same point of view here.

If nobody learned how to use the API, there wouldn't be tools and frameworks available. Even the Windows API is a framework. My framework simply helps to do some things more easily. Instead of using CWSTR, you can work allocating and freeing your own buffers and manipulate them using pointers, but this is very hard.

When an SDK programmer tries to use a new compiler, as we are doing with FreeBasic, all he needs is to learn the intrinsics of the language, but DDTers will ask "What is the equivalent of CONTROL SET TEXT?," "What is the equivalent of CONTROL ADD LISTBOX?," etc. Except for Paul, who has used it to write his editor, nobody has helped me to debug the framework. Everybody is lurking and waiting for the upcoming visual designer.

Anyway, I believe that almost all the DDTer's will continue to use PB for ever. Therefore, my efforts are directed to extend FreeBasic, not to attract DDTers.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on November 24, 2018, 06:06:32 PM
Hello, José,


having finished my IDE´s todo list for FreeBASIC, i will have time now for returning to CWstr.

The type itself seems to work flawlessly; at least i did not encounter any more problems using it in the last month. In my view there are two problems remaining:

1.) CWstr is not multiplatform, currently it depends on windows
2.) it doesn´t integrate seamlessly into the compiler (you need to prepend "**")

Well, #1 should be solvable, you must get rid of the Windows API functions. For the copy part i already have specialized assembler functions (32 and 64 bit), which leaves the conversion functions. I didn´t try but maybe "MultiByteToWideChar" could be replaced by FB intrinsic functions (maybe in "utf_conv.bi"). There must be such (multiplatform) functions, because of the automatic conversions between STRING/ZSTRING and WSTRING taking place when assigning one type to the other.

#2 is a compiler problem not always processing CWstr as it should. Fortunately the compiler´s code is available. I don´t know anything (so far) about compiler building, nevertheless it took me only two and a half hours to code a (working) fix for the "SELECT CASE" problem, you don´t need "**" for CWstr anymore after "SELECT CASE". So fixing the compiler´s behavior is doable!


Would you like to help making your excellent work as perfect as possible ? I would appreciate that very much!


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on November 24, 2018, 07:18:58 PM
With Linux, FreeBasic uses UTF-32 (4 bytes for each character). Therefore, you may need not only to get rid of the Windows functions, but a separate class that treats each character as a DWORD.

I never have used Linux and I don't think that I will ever do.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on November 24, 2018, 11:50:15 PM
José,


are you sure ? This would mean that "WSTRING" must be handled differently for Windows and Linux by the compiler. Even if this is the case, just processing a character of dword size instead of word size for Linux shouldn´t be much of a problem. There are intrinsic defines (__FB_LINUX__, etc.) you could use for platform-dependent conditional compiling in your CWstr class.
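
A minimal sketch of such conditional compiling, assuming only the intrinsic defines __FB_WIN32__ and __FB_LINUX__. WCHAR_UNIT is a hypothetical alias used for illustration and is not part of CWstr. The program also prints the size the compiler itself uses for one WSTRING character, so the assumption can be checked on each platform.

```freebasic
' Pick the storage type for one wide character per platform.
' WCHAR_UNIT is a hypothetical name, not part of CWstr or FreeBASIC.
#if defined( __FB_WIN32__ )
   type WCHAR_UNIT as ushort    ' 16-bit unit (word size)
#elseif defined( __FB_LINUX__ )
   type WCHAR_UNIT as ulong     ' 32-bit unit (dword size)
#else
   #error "wide character size not covered by this sketch"
#endif

' A fixed-length WSTRING of 2 characters occupies 2 * (bytes per character).
dim probe as wstring * 2

print "bytes per character (alias):   "; sizeof( WCHAR_UNIT )
print "bytes per character (WSTRING): "; sizeof( probe ) \ 2
```

If both lines print the same number, buffer sizes in a class could be computed as character count times sizeof( WCHAR_UNIT ) instead of a hard-coded 2.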

I don´t use Linux either, nor do i plan to use it, but there should be enough people in the FB forum, who actually do. Maybe someone is willing to help. Would you mind me asking for help there ? I don´t want to take over your work, but i intend to optimize it, if you don´t mind and if you don´t want to take the lead.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on November 25, 2018, 09:32:52 AM
Open

The encoding to be used when reading or writing text, can be one of:

Encoding "ascii" (ASCII encoding is used, default)
Encoding "utf8" (8-bit Unicode encoding is used)
Encoding "utf16" (16-bit Unicode encoding is used)
Encoding "utf32" (32-bit Unicode encoding is used)

However, I think that Linuxers use mainly utf-8 because utf-32 is wasteful and not widely supported by third party libraries.

Regarding FreeBasic users, I don't see anybody using unicode, neither in Windows nor in Linux. I'm an exception. Even the FreeBasic "Open" statement can't open a file using an unicode file name. Go figure!

Anyway, if you want to play with it, you can use the DWSTRING class that I posted in the FreeBasic forum. It is a variation of CWSTR that uses a WSTRING pointer instead of an UBYTE pointer. You can do with it whatever you wish.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on November 25, 2018, 11:02:23 AM
José,


thanks - i will try my best


JK


Title: Re: FreeBASIC CWstr
Post by: Marc Pons on November 25, 2018, 08:05:17 PM
Hi
happy to see someone interested in unicode for freebasic

it was a long story for me, José probably remembers, especially for concatenation...

please have a look here  https://www.freebasic.net/forum/viewtopic.php?f=17&t=24070&hilit=marpon&start=45 (https://www.freebasic.net/forum/viewtopic.php?f=17&t=24070&hilit=marpon&start=45)

finally i stopped developing that point, i see you want to follow the same route ... good

if you are interested, my github with my last code https://github.com/marpon/DWSTR (https://github.com/marpon/DWSTR)
and my first attempt https://github.com/marpon/uStringW (https://github.com/marpon/uStringW)
marc
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on November 26, 2018, 06:40:47 PM
Hi Marc,


thanks, i didn´t know your version, which tries to address Linux and surrogate pairs too. Are there fundamental differences between your version and José´s version (apart from yours being multiplatform and the surrogate pair stuff) ?
If yes - what and where are they ?

You say you stopped developing that point - what exactly do you mean by "that point" - the whole thing ? Were there still unsolved problems other than having to prepend "*" ?



JK


Title: Re: FreeBASIC CWstr
Post by: Marc Pons on November 27, 2018, 10:42:46 AM
hi Juergen

i've stopped investing time on  "unicode" globally.
In fact for my own usage i do not need it,  cp1252 is enough for me.
For me it was just an exercise and a long-time attempt to push the FreeBasic team to integrate it as a native feature
and to show them that it is doable quite easily if ...

My last version has no fundamental differences from José's, which inspired me a lot, except:
   - surrogates minimal support
   - linux /windows
   -  *   compared to **   (which i did not like)
   - strptr overload
   - codepage management for win / linux
and some extra stuff

in my first version uStringW (1 year before DWSTR), you could see more conversion and file support that i did not add to DWSTR

Marc

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 02, 2018, 11:14:06 PM
Ok - i have been successful adapting the compiler´s code so that it now accepts and properly processes "SELECT CASE <USTRING>" and "MID" (function and statement) with USTRINGs.

I need all kinds of sample code snippets, where "**" (José) or "*" (Marc) must be prepended because either
- the compiler throws an error
or (even worse)
- the compiler accepts it but the result is wrong (e.g "MID" as a statement compiles, but the result is wrong for an USTRING).

Please help me with your code, so I don´t miss something...


So far it isn´t as hard as i expected, it´s all about telling the compiler to handle an USTRING just like a WSTRING in different places. I´m quite confident to get this done without major changes to the compilers inner working and therefore i hope to be able to convince the current developers to integrate it.


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 04, 2018, 06:26:12 PM
José, Marc


here is a test version of the compiler (32 and 64 bit). According to my tests prepending "*" isn´t necessary anymore. You should be able to use exactly the same syntax as with other intrinsic FreeBASIC string data types. Please help testing.


JK


PS: I forgot to mention both compilers are compiled for the standalone (not for the installer) version.
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 04, 2018, 07:24:01 PM

DIM cws AS CWSTR = "Сергей Сергеевич Прокофьев"
AfxMsg MID(cws, 2)


prints several "?"


AfxMsg MID(**cws, 2)


works fine.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 04, 2018, 11:23:44 PM
My bad,

i forgot to mention something really important: you must add "#PRAGMA DWS" before the first "#INCLUDE ..." line. This is a switch i added for dynamic wide strings, without it the compiler works as usual


Sorry,

JK
Title: Re: FreeBASIC CWstr
Post by: Marc Pons on December 05, 2018, 11:26:41 AM
Juergen

what class did you take into account CWSTR or DWSTR ?

with DWSTR, if i test with your modified fbc:
Right / Left native functions: without my own overloaded functions i still need to dereference
trim and all its variations still need to dereference

mid seems to work correctly without dereferencing

'compile with console to view the information

' windows_test_1_dwstr.bas : to test under windows the DWSTR (dynamic Wstring) class

'########################################################################
'this test code assume you are using a system codepage : 1252
' the literal inputs are dependant of that codepage,
' except utf8 inputs which are codepage independant
'########################################################################
#PRAGMA DWS
#DEFINE UNICODE                                  ' needed to messagebox only : to use wstring not string


#INCLUDE ONCE "DWSTR.inc"                        'DWSTR class

#Include Once "crt/time.bi"                      'just to measure the speed


scope                                            'interesting to check the destructor action on debugg mode

   print : print "testing  last  DWSTR.inc  "

DIM cws AS DWSTR = Dw_Wstr( "   Êàêèå-òî êðàêîçÿáðû   " , 1251)

messagebox(0, cws, "test  len =" & len(cws), 0)
messagebox(0, mid(cws,5), "mid(cws,5)   len =" & len(mid(cws,5)), 0)

cws = trim(cws)
messagebox(0, cws, "test  len =" & len(cws), 0)
messagebox(0, mid(cws,5), "mid(cws,5)   len =" & len(mid(cws,5)), 0)

dim dw1 as dwstr = dw_string(23 , &h1D11E)
   messagebox(0 , dw1 & wstr( "   dw_Len ") & wstr(Dw_Len(dw1)) , "test capacity " & wstr(dw1.capacity) , 0)

messagebox(0 , dw1 & wstr( "   SurPair ") & wstr(dw1.Sur_Count) , "test capacity " & wstr(dw1.capacity) , 0)
   dw1.replace( "_it's a test of replacing text " , 21)
   messagebox(0 , dw1 & wstr( "   dw_Len ") & wstr(Dw_Len(dw1)) , "capacity " & wstr(dw1.capacity) , 0)
messagebox(0 , "Sur_Count = " & dw1.Sur_Count & "  nb of surrogate pair " , "len =" & len(dw1) & "  dw_len =" & Dw_Len(dw1), 0)


dim dw4 as dwstr = mid(dw1, 9, 4)
messagebox(0 , ">" & dw4  & "<    len = " & len(dw4) & "    dw_len = " & dw_len(dw4), "test mid as wstring only" , 0)

DIM bs2 AS dwstr = dw_wstr( "Ð"ми́Ñ,рий Ð"ми́Ñ,риевич" , CP_UTF8)
   messagebox 0 , bs2 , "test dw_wstr CP_UTF8" , MB_OK
   messagebox 0 , mid(bs2,5) , "test mid" , MB_OK
   dim z1                as double
   dim z2                as double
dim n as long = 1000000
   PRINT : PRINT
   print : print "Press key to continue !"
   sleep

   dim         as string st1
   dim as string sText = "verif : "



   dim         as DWSTR uws, uws0,uws1,uws2
   
   dim as DWSTR uwsText = "verif : "

   dim x                 as long
   print : print
   print "=========================================="
   print "   Comparaison DWSTR Solutions : concatenation"
   print "==========================================" : print

   

   z1 = clock()
   for x = 1 to n
      st1 += "Line " & n & ", Column " & n & ": " & sText '& sText2
   NEXT
   z2 = clock()
   print : print "STRING using  &" : print right(st1, 38) + "   time = " + str(z2 - z1) + " ms   len = " & len(st1): print
print mid(st1, 37999962) + "   mid ": print
   print "==========================================" : print
   z1 = clock()
   for x = 1 to n
      uws += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " + *uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR dereferenced  using + " : print right(uws,38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws): print
print mid(uws, 37999962) + "   mid ": print

z1 = clock()
   for x = 1 to n
      uws0 += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " & *uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR dereferenced  using & " : print right(uws0,38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws0): print
print mid(uws0, 37999962) + "   mid ": print

   z1 = clock()
   for x = 1 to n
      uws1 += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " + uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR not dereferenced  using + " : print right(uws1,38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws1): print
print mid(uws1, 37999962) + "   mid ": print

   z1 = clock()
   for x = 1 to n
      uws2 += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " & uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR not dereferenced  using &  new overloaded operator " : print right(uws2, 38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws2): print
   print mid(uws2, 37999962) + "   mid " : print : print


   
   print : print

end scope


print : print "Press key to finish !"
sleep


could you post somewhere (better github) your evolution of fbc?
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 05, 2018, 04:26:25 PM
Marc, José,


currently it accounts for "CWSTR" , "CBSTR", "DWSTR" and "JK_CWSTR", so it should work with José´s WINFBX, with Marc´s DWSTR and my IDE.

LEFT, RIGHT and VAL(INT, etc.) can be fixed with overloaded functions, so no need for a fix inside the compiler.

I changed processing of "SELECT CASE", "MID" (statement and function), "INSTR(REV)", "LSET/RSET" and all converting functions (e.g "CINT") to work without prepending "*". I may have made a mistake with "TRIM" - that´s, what tests are for.

Please make further tests and help finding other possible problems. I didn´t touch "STRPTR", but in my view it would make sense, if it worked just like for STRING returning a WSTRING PTR to the string data for a dynamic wide string.

I´m going to investigate what´s wrong with "TRIM"


JK
Title: Re: FreeBASIC CWstr
Post by: Marc Pons on December 05, 2018, 05:10:19 PM
Juergen

Quote from: Juergen Kuehlwein on December 05, 2018, 04:26:25 PM
I didn´t touch "STRPTR", but in my view it would make sense, if it worked just like for STRING returning a WSTRING PTR to the string data for a dynamic wide string.

I've already overloaded strptr to return a WSTRING PTR of the data buffer

and overloaded the operator  &   for concatenation with DWSTR; string ; numeric val   

one question why do you create the pragma switch ?  to isolate code for tests?
in my opinion, if that feature is implemented officially into compiler, better not have that kind of switch
Title: Re: FreeBASIC CWstr
Post by: Marc Pons on December 05, 2018, 06:20:00 PM
Juergen

The WSTR function also needs a prepended * ; it cannot be overloaded as simply as left/right,
so it is better to act at compiler level too
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 05, 2018, 07:15:28 PM
Marc,


Quote
one question why do you create the pragma switch ?  to isolate code for tests?


Yes - it will be removed later on. Currently all of my added code is enclosed by an "IF" .... "END IF" clause testing for that pragma. I´m far from understanding everything the compiler does, so i thought it would be a good idea having an opportunity for testing with my code set to be active and vice versa, when problems occur.

I think i fixed the "TRIM" bug. Why would you want to code "WSTR(<wide string>)", which returns itself ? Could you please post code where it fails.


attached is what i currently have


JK



 
Title: Re: FreeBASIC CWstr
Post by: Marc Pons on December 06, 2018, 09:53:29 AM
Juergen,
I've just tested your last compiler version

Trim, Ltrim, Rtrim  are ok now

why adapt wstr ?
to be homogeneous   wstr already does a "conversion" from wstring
Quote
Declare Function WStr ( ByVal str As Const WString Ptr ) As WString
with DWSTR variables  uws2 , uwsText
it is faster doing uws2 &= "Line " & WSTR(n) & ", Column " & WSTR(n) & ": " & WSTR(uwsText)
compared to uws2 &= "Line " & WSTR(n) & ", Column " & WSTR(n) & ": " & uwsText

I've tested with your last evolution and it seems to work !   did you modify it, on your last version?  still not working in fact (without error message, so risky)

and again could you post your source code compiler  somewhere?
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 06, 2018, 06:34:09 PM
I think i fixed STR and WSTR as well now.

Attached is what i currently have: source code + 32 bit fbc.exe. Look at "rtl-string.bas", most changes are there. You may search the files for " JK " to find all changes applied, i added comments starting with " JK " to every new section or line.

Apart from adding a new pragma and a new compiler option for test purposes the basic principle is almost the same everywhere: look, if the current expression is a dynamic wide string (UDT + type = JK_CWSTR or CWSTR or CBSTR or DWSTR) -> jump to where WSTRINGs are processed.


I talked with coderjeff at the FreeBASIC forum about this topic and he wanted me to push my branch to the GitHub repository. I´m going to do this , if we are done with testing


JK
Title: Re: FreeBASIC CWstr
Post by: Marc Pons on December 07, 2018, 11:45:03 AM
Juergen,
I've tested and wstr and str are working now!

good job.

Quote
I talked with coderjeff at the FreeBASIC forum about this topic and he wanted me to push my branch to the GitHub repository. I´m going to do this , if we are done with testing

i think it would be better to commit into the master branch; if you make your separate branch it will not be included as a normal standard evolution.
that is what i did with my proposed __FB_GUI__ switch, which is now included; check with coderjeff.


Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 07, 2018, 02:33:25 PM
@Marc,

thanks - has your version (DWSTR) been tested with LINUX and other targets too ? Or should tests still be made ? The compiler definitely must be tested with LINUX and others, but i cannot do that, who could help ? Maybe i should start a new thread at the FreeBASIC forum  - or should i go on with yours (https://www.freebasic.net/forum/viewtopic.php?f=17&t=24070&hilit=marpon&start=45) ?


@José,

were your tests successful now as well ? Should we ask Paul about "USTRING" (see below) too, so working together we could establish a "quasi standard" for dynamic wide strings ? What do you think ?


@both

In order to avoid confusion i think the new dynamic wide string type should be named "USTRING" (This is in a row with ZSTRING and WSTRING) There is no need to actually rename it in your, José´s or my code. IMHO a #DEFINE is the best solution, so the different versions can exist side by side and it´s only a matter of a different #DEFINE to switch between them.

But we should work on a common include file, which should be added as a universal (Windows/Linux, etc) include file (.bi) to the official distribution. Maybe some of the "missing" (PB point of view) string handling functions should be added.



JK


PS: and maybe we should supply test code for testing on other targets, which actually runs all kinds of tests, where USTRINGs had known problems in the past, and prints an error message, if it finds an error.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 03:28:31 AM
Hi everyone.  My name is Jeff, one of the developers of the freebasic compiler.  Juergen had contacted me and was asking some questions about this.  I only played around with the dwstring.bi class that José posted on fb.net, but I imagine the other classes you are discussing have similar quirks.  Hopefully I can help with this a little, and I thought this would be the best place.

José, nice work on the string classes.

Juergen, I saw the modified source code you made for the compiler and I understand what you are attempting to do.  Very good effort, it takes dedication to find the way through a big & hairy program like fbc to get to a place you can make a change that does something you want.  Sincerely, I encourage you to keep at it.  The actual issue is earlier on in the translation, and hopefully I can explain.

Take this type for example:

type T
__ as integer
declare operator cast () byref as wstring
declare operator cast () as wstring ptr
declare operator cast () as string
end type

sub proc overload( byref w as wstring )
end sub

sub proc( byval w as wstring ptr )
end sub

sub proc( byref s as string )
end sub

dim x as T
proc( x )  '' error: ambiguous


type T represents kind of what's happening in the compiler.  We have string-like type that can be automatically converted to any other string-like type, and then call a function to work on it.

When it comes to implicit UDT conversion, fbc looks at all the possible matches (CAST operator) and ranks them from best to worst.  With an exact match of data type & constness being the best score.  However, as in this example, fbc doesn't know how to decide what the best match is because there is nothing to indicate what the preferred conversion & call should be. 

For the built-in string types, this decision is hard coded in to the logic, choosing the best string type, conversion, and function to call and ignoring the normal rules for overloaded functions & operators.

So here's what I was thinking, that a TYPE could be marked in source code (with a #pragma or some special syntax) to indicate that different rules should be used for overload resolution.  It could be as simple as the first declared CAST operator is the best choice when fbc has to automatically convert the type.  So this change would then be a more general feature applied to any type, not just wstrings, to give better control to the programmer over implicit casting (automatically done by compiler).



Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
Hi Jeff,


great to have you here in this thread. I´m going to be quite exhaustive in this post just to be sure we all are talking about the same thing...


So the changes to the compiler are correct, but you see possible problems arising from multiple CAST operators. The type uses two different Cast operators in two slightly different versions:

José´s Code
PRIVATE OPERATOR CWstr.CAST () AS ANY PTR
   OPERATOR = cast(ANY PTR, m_pBuffer)
END OPERATOR
' ===========================================================
' ===========================================================
' Returns the string data (same as **).
' ===========================================================
PRIVATE OPERATOR CWstr.CAST () BYREF AS WSTRING
   OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR     


Marc´s Code
'============================================================
' Cast implicitly DWSTR to different types.
' ============================================================
PRIVATE OPERATOR DWSTR.CAST() BYREF AS WSTRING
RETURN * m_pBuffer
END OPERATOR
' ============================================================
' ============================================================
PRIVATE OPERATOR DWSTR.CAST() AS ANY PTR
RETURN cast(ANY PTR , m_pBuffer)
END OPERATOR


to my understanding though internally using different definitions both return the same: an "any ptr" and a "byref wstring". So what is exposed to the compiler should be the same. Do you agree ?


A quick test shows that "any ptr" is necessary, because without it, it compiles, but the linker complains. Casting it only "as wstring" is not working.


Typically i add something like this to the compiler´s code in appropriate places;

'*************************************************************************
' JK - check for dws
'*************************************************************************
      if env.clopt.dws then
dim jk__zz as zstring ptr

        if (dtype = FB_DATATYPE_STRUCT) then
          jk__zz = nd_text->subtype->id.name

          if (*jk__zz = "JK_CWSTR") or (*jk__zz = "CWSTR") or (*jk__zz = "CBSTR") or (*jk__zz = "DWSTR")then
            goto do_ustring     
          end if
        end if
      end if



I added "#pragma dws", which sets compiler option "dws" when and as soon as found. All new code i added for getting rid of "**" is enclosed by an "IF ... END IF" clause. If "#pragma dws" is not present in the code to compile, my changes don´t become active. Initially this was meant to be a switch for testing the compiler with and without my code, in case there were problems with the re-compiled compiler - fortunately there weren´t any. In fact this pragma is not necessary for what i want to do, but it could be re-used for other things or be removed entirely.


I think, i understand the problem (for the compiler) you describe with multiple cast operators. When working on additional string handling functions, which should work for the new type(s) and the existing ones as well i got this error many times. But i was able to avoid it by adapting my code. And using all of this for quite some time, i didn´t experience errors (ambiguous ...), which couldn´t be fixed by tweaking the code. So i always thought, it´s my bad coding (being not that experienced in FreeBASIC), rather than a possible compiler problem. The bottomline is: i understand the problem you describe, but for me personally it didn´t occur (as of now) - the compiler seems to do it right.

It would help to have an example of failing code coming from this problem! 


I didn´t have a look at how the compiler resolves overloaded functions or operators. If there really is a problem with the new type, it must be fixed. But IMHO making this a general thing for all types is not a good idea. I would rather be notified about a problem (ambiguous ...), so i´m forced to fix it exactly the way i want it to work, instead of the compiler making "guesses" (even if these follow rules), which might result in malfunctioning code under the hood. A pragma would switch this feature on or off for all types in use (which as described above i would avoid). The only thing that would make sense to me, is having a new keyword like "DEFAULT" to mark a function or operator as the default one to take, if the compiler cannot resolve it, e.g:

PRIVATE OPERATOR (OVERLOAD) DEFAULT DWSTR.CAST() BYREF AS WSTRING

This gives individual control to the coder without making (maybe unwanted, because the coder isn´t aware of the ambiguity at all) guesses. If someone decides to code "DEFAULT", then he must be aware of it and then it is his responsibility.


Thanks for discussing all of this with us


JK



 




Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 07:34:39 PM
Hey Juergen,

I think we are talking about the same thing as an end result.  Though, I think we might be worlds apart in understanding on how to get there. But, you genuinely seem to have an enthusiasm for improvement and I think we both want to see a solution here.

I have read this topic completely and thoroughly from the beginning, so I have an idea of what I'm in for =).  I hope you don't mind I make multiple posts here, and I will try to answer whatever questions you raise.

The reference implementation I have been working with, I just pushed to https://github.com/jayrm/dwstring .  I wrote that around Oct 2017.  It is not fast: memory functions are hand-made but could be optimized with platform specific calls by replacing WSTRING_ALLOC, WSTRING_FREE, WSTRING_MOVEN, WSTRING_COPYN.  The guts of it probably look a lot like José's dwstring.bi, with the major difference I had CONST types in mind when I wrote it.  It will have similar issues with fbc's builtin functions like LEFT & UCASE, etc.  I stopped working on it at the time because 1) I wanted to rewrite fbc's test suite, and 2) STRING/WSTRING handling within fbc is inconsistent and that needs to be fixed.

What you want, for any implementation is a test-suite that proves all the capabilities, in a way that is independent of the thing you are testing.  Something automated that does not rely on you inspecting (visually) the output of a test program.  Either pass or fail.  You can see this in https://github.com/jayrm/dwstring/blob/master/tests.bas with the hCheckString() macro.
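
A minimal sketch of such an automated pass/fail check. It is not the hCheckString() macro from the repository; CheckWstr and g_failed are hypothetical names, and a plain fixed-length WSTRING stands in for the dynamic string type under test.

```freebasic
' Hypothetical pass/fail helper, not the hCheckString() macro from the repository.
dim shared g_failed as integer = 0

' Compare an actual result with the expected text and record failures.
sub CheckWstr( byref label as string, byref actual as wstring, byref expected as wstring )
   if actual = expected then
      print "PASS: "; label
   else
      print "FAIL: "; label
      g_failed += 1
   end if
end sub

' A fixed-length WSTRING stands in for the type under test.
dim w as wstring * 32 = "  hello  "

CheckWstr( "trim",  trim( w ),          "hello" )
CheckWstr( "mid",   mid( w, 3, 5 ),     "hello" )
CheckWstr( "ucase", ucase( trim( w ) ), "HELLO" )

print g_failed; " check(s) failed"
```

Replacing w with a CWSTR or DWSTR variable would turn the same checks into a test of whether the built-in functions accept the type without dereferencing.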

The challenge with a dynamic wstring type, is that there is some support for wstring already built in to fbc, but not everything, and it is currently not consistent.  If we were implementing some other kind of type, say dynamic UTF8 where there are no built in fbc functions to use, I think the actual issues might be more obvious.

I hope you don't mind I make multiple posts.  I will try to work through your last post.  Thanks.

Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 07:59:16 PM
Quote from: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
So the changes to the compiler are correct, but you see possible problems arising from multiple CAST operators. The type uses two different Cast operators in two slightly different versions:

Not really, sorry.  You are in the correct place for the fbc compiler code you want to influence, but not in the correct way.  Parts of rtl-string.bas haven't been changed since 2006, and it looks like it has never been updated to work with UDT's.

Specifically, for example from rtlStrLTrim()

if( dtype <> FB_DATATYPE_WCHAR ) then
f = PROCLOOKUP( STRLTRIM )
else
f = PROCLOOKUP( WSTRLTRIM )
end if

Is basically saying, if fbc didn't get a WSTRING here, then assume it's a STRING.  Most of the built in fbc string functions work this way, and it's all fine until you throw UDT's at them.  So I would say that's a bug in how fbc handles a UDT with a CAST as WSTRING PTR with some built in string functions.  Again, this part of fbc compiler code hasn't been touched in a decade.

In comparison, LEFT & RIGHT are just overloaded functions, and actually have better integration with UDT's, though you do have to overload the function to let them work.

What we really want, is that fbc knows we want to use the WSTRING version of LTRIM long before we ever get to rtlStrLTrim.  Which means that the issue is earlier on in the translation.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 08:09:04 PM
Quote from: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
to my understanding though internally using different definitions both return the same: an "any ptr" and a "byref wstring".

The CAST() as ANY PTR is irrelevant.  It actually doesn't matter for the string conversions.  What it allows is implicit cast to any ptr, and making the UDT passable to anything that would accept a pointer, which maybe makes passing a custom UDT to WINAPI functions easier, though there is a drawback.

For example, in this stripped down version:

sub procAnyPtr( byval arg as any ptr )
end sub

sub procDblPtr( byval arg as double ptr )
end sub

dim s as string
procAnyPtr( s ) '' error: type mismatch
procDblPtr( s ) '' error: type mismatch

type T
__ as integer
declare operator cast() as any ptr
end type

dim x as T
procAnyPtr( x ) '' No error - OK
procDblPtr( x ) '' No error - but probably should have


The implicit CAST() as ANY PTR allows the UDT to be passed, without compiler error/warning, to many kinds of parameters, at the cost of any kind of type checking.  Possibly convenient, also possibly creating hard to find bugs.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 08:36:35 PM
Quote from: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
Typically i add something like this to the compiler´s code in appropriate places;

'*************************************************************************
' JK - check for dws
'*************************************************************************
      if env.clopt.dws then
        dim jk__zz as zstring ptr

        if (dtype = FB_DATATYPE_STRUCT) then
          jk__zz = nd_text->subtype->id.name

          if (*jk__zz = "JK_CWSTR") or (*jk__zz = "CWSTR") or (*jk__zz = "CBSTR") or (*jk__zz = "DWSTR") then
            goto do_ustring
          end if
        end if
      end if


I added "#pragma dws", which sets compiler option "dws" when and as soon as found. All new code i added for getting rid of "**" is enclosed by an "IF ... END IF" clause. If "#pragma dws" is not present in the code to compile, my changes don´t become active. Initially this was meant to be a switch for testing the compiler with and without my code, in case there were problems with the re-compiled compiler - fortunately there weren´t any. In fact this pragma is not necessary for what i want to do, but it could be re-used for other things or be removed entirely.

If this were the solution, and we are talking fbc compiler internals, then the #pragma should just affect the UDT.  Similar to internal macros symbSetUDTIsUnion() where a status bit is attached to the UDT's information.  That way you only need to test that the UDT's typedef has the bit set and it is not tied to a specific UDT name.  This makes the solution more generic.  I've come across this many times when fixing bugs in the compiler, having to identify a root cause and solve it there, rather than the end use.

Bottom line is, it's better to solve the problem at the point where it is caused, rather than at the point which you notice it.  Unfortunately, the point where you notice it is usually more obvious than the point at which it is caused.
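
To make the difference concrete, here is a small self-contained sketch. It does not use fbc's real symbol tables: UdtInfo, UDT_FLAG_ISWSTRING and both helper functions are hypothetical names invented for this illustration. It contrasts a test tied to specific type names with a test on a status bit attached to the type:

```freebasic
'' Hypothetical stand-ins for compiler data; none of these names exist in fbc.
const UDT_FLAG_ISWSTRING = 1

type UdtInfo
    id    as string     '' name of the UDT
    flags as integer    '' status bits attached to the UDT
end type

'' Name-based test: every new implementation needs another string compare.
function IsWideStringByName(byref u as UdtInfo) as boolean
    return (u.id = "CWSTR") or (u.id = "CBSTR") or (u.id = "DWSTR")
end function

'' Flag-based test: works for any UDT that has the bit set.
function IsWideStringByFlag(byref u as UdtInfo) as boolean
    return (u.flags and UDT_FLAG_ISWSTRING) <> 0
end function

dim t as UdtInfo
t.id = "MYTEXT"
t.flags = UDT_FLAG_ISWSTRING

print IsWideStringByName(t)   '' not recognised, the name is not in the list
print IsWideStringByFlag(t)   '' recognised, independent of the name
```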
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 09:04:09 PM
Quote from: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
I think, i understand the problem (for the compiler) you describe with multiple cast operators. When working on additional string handling functions, which should work for the new type(s) and the existing ones as well i got this error many times. But i was able to avoid it by adapting my code. And using all of this for quite some time, i didn´t experience errors (ambiguous ...), which couldn´t be fixed by tweaking the code. So i always thought, it´s my bad coding (being not that experienced in FreeBASIC), rather than a possible compiler problem. The bottomline is: i understand the problem you describe, but for me personally it didn´t occur (as of now) - the compiler seems to do it right.

It would help to have an example of failing code coming from this problem! 

What "works" and what is "correct" are completely different.  While you may observe that LTRIM "works" with the new type, internally I would say that LEFT is more "correct".  And it all has to do with implicit CAST overload resolution.

If we are implementing the dynamic wide string type (dwstring) as a UDT, then for implicit CASTing it boils down to:

type T
    __ as integer
   declare operator CAST() as wstring ptr
   declare operator CAST() as string
end type

because "WSTRING PTR" and "STRING" are the only 2 types that fbc really knows how to handle.  But this is what introduces the ambiguous call that fbc (currently) does not know how to resolve.

If you look at https://github.com/jayrm/dwstring/blob/master/tests.bas , it provides an example.  Let me know if you have any trouble compiling or using.  It seems to me that the pattern for PB'ers is to include everything from one file, containing the complete implementation.  I understand the benefits.  For me, there is benefit having the interface (.bi) file separate from the implementation (.bas) file.

I don't know what your tests are, but the reason you didn't encounter any errors is that you are specifically looking at dwstring.bi (and its variants) as the "correct" way to implement the class.  Jose has done a superb job at that, but he is also working around all the current rules and quirks of fbc, and trying to make something usable within those constraints.

I guess what I'm trying to say is that Jose's implementations are very good for the current state of fbc, but that doesn't necessarily make them "correct", if it's possible to say such a thing as "correct".  If fbc has different (hopefully better) rules, then the implementation will be different.  The rules of the compiler define the required implementation, not the other way around.  Fortunately, we are in a position to set the rules.

Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 08, 2018, 09:14:13 PM
Quote from: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
I didn´t have a look at how the compiler resolves overloaded functions or operators. If there really is a problem with the new type, it must be fixed. But IMHO making this a general thing for all types is not a good idea. I would rather be notified about a problem (ambiguous ...), so i´m forced to fix it exactly the way i want it to work, instead of the compiler making "guesses" (even if these follow rules), which might result in malfunctioning code under the hood. A pragma would switch this feature on or off for all types in use (which as described above i would avoid). The only thing that would make sense to me, is having a new keyword like "DEFAULT" to mark a function or operator as the default one to take, if the compiler cannot resolve it, e.g:

PRIVATE OPERATOR (OVERLOAD) DEFAULT DWSTR.CAST() BYREF AS WSTRING

This gives individual control to the coder without making (maybe unwanted, because the coder isn´t aware of the ambiguity at all) guesses. If someone decides to code "DEFAULT", then he must be aware of it and then it is his responsibility.

Currently, fbc's #pragma's affect the compiler's decisions in a global way; they are not specific to context.  I am thinking that, if there is no formal way to declare the behaviour in the TYPE's syntax, then a #pragma could help control it, but it should be for the UDT specifically, and not globally.  Or use the #pragma PUSH/POP mechanism for it.

The "default" overload resolution basically follows what would happen in a C++ program.  It doesn't have to happen that way, it's just what we've chosen as a baseline, so far.

Specifically, in the fbc compiler code, we'd be looking at symb-proc.bas:symbFindCastOvlProc() and maybe symbFindClosestOvlProc().  Rather than finding all possible matches, an algorithm to find the best possible match, even if there are multiple matches that would work.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 09, 2018, 12:20:20 AM
Jeff,


a lot to read and digest, for me (Germany) it´s past midnight now - more tomorrow...


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 09, 2018, 02:48:07 PM
Jeff,


Quote
But, you genuinely seem to have an enthusiasm for improvement and I think we both want to see a solution here.

Yes!


First of all you must know i live in Germany and i´m not a native speaker, so there is always a chance of misunderstandings and incorrect wording on my side, especially with a quite complicated matter like this.


Then you must know, that my first and main goal was having a working solution, maybe not perfect from a compiler coder´s perspective - but perfectly working. So i coded a quick (and dirty) approach in order to get working seamlessly, what we currently have. My initial intention was to keep changes as minimal as possible, because i thought a minimal approach would be more likely to be accepted and integrated into master than a generic one, which would require changing much more code.


@post #119
Quote
What we really want is that fbc knows we want to use the WSTRING version of LTRIM long before we ever get to rtlStrLTrim. Which means that the issue is earlier on in the translation.

I know and i understand that the place i applied my changes is not the correct place for a generic solution. But i intentionally coded a specific solution for these specific variations of a dynamic wide string type we have. I didn´t want to code a generic solution.

So what i coded basically works for what we have, but it doesn´t resolve inconsistencies and quirks in fbc. We are aware of these inconsistencies and as you said José sure had a hard time to find workarounds.

If you tell me you want a generic solution - fine. This opens a totally new perspective.


@post #120
Maybe my wording was misleading. Of course "any ptr" and "byref wstring" are not the same, i know that. What i wanted to say was, José and Marc define the internal data buffer differently, both have two cast operators, nevertheless José´s cast operators return the same types as Marc´s (one of them returns an "any ptr" and the other one returns a "byref wstring").

Regarding "any ptr" i cannot tell where and why it is needed, but i can tell that linking fails, when you comment it out. Maybe it is necessary for "MultiByteToWideChar" and "WideCharToMultiByte", which are called in José´s code. José is the creator and mastermind of all of this - maybe he can tell us.

As far as i know there are intrinsic ansi/wide conversion functions, maybe implementing these, would let us get rid of the need for casting to any ptr.


@post # 121
forget about that pragma thing, i don´t want it, i don´t need it. As already said, it was a security and debugging thing for my personal use and maybe one more argument to convince you to accept changes in the compiler. Implementing a pragma you can just switch off my code. I didn´t expect you to want and to accept more than minimal changes to the compiler.


@post #122
As said above my first goal was to have a working solution not a "correct" one. A correct one requires much more changes to the compiler than i did - i understand that.

Quote
Jose has done a superb job at that, but he is also working around all the current rules and quirks of fbc, and trying to make something usable within those constraints.

What else should he(we) have done? Left and Right can be overloaded, the others can´t.


Quote
If we are implementing the dynamic wide string type (dwstring) as a UDT, then for implicit CASTing it boils down to:

type T
    __ as integer
   declare operator CAST() as wstring ptr
   declare operator CAST() as string
end type

because "WSTRING PTR" and "STRING" are the only 2 types that fbc really knows how to handle.  But this is what introduces the ambiguous call that fbc (currently) does not know how to resolve.


Maybe i still don´t understand the problem, but neither José nor Marc have or need a cast as string operator. In fact you don´t need to implicitly cast the data to a string type as you do in your code. If you really want to do that (converting from wide to ansi isn´t always a lossless conversion) the compiler does it for you automatically.

e.g

dim s as string
dim u as ustring

  u = "123"
  s = u
 
print s 


works as expected with José´s CWSTR type. In my opinion, because it isn´t a lossless conversion, there shouldn´t be any implicit casting from wide to ansi, there should be an error message (type mismatch, ambiguous..., whatever). This helps avoiding possible code malfunction coming from an inadvertent conversion.

Please have a closer look at Jose´s string helper functions (AfxStr.inc) in his WINFBX suite, or have a look at the attached file, which contains similar string handling functions built-in in my IDE. These functions work for all available string types (STRING, ZSTRING, WSTRING and USTRING); ansi to wide and wide to ansi conversions are done automatically. If you use my IDE you must add #include "ustring.inc" to be able to use the built-in dynamic wide string type (USTRING, essentially a clone of José´s CWSTR) and the mentioned string helper functions. There is more information in the help file (FreeBASIC/...)



@post #123
Quote
but it should be for the UDT specifically

So why use a pragma for this, why not add to the syntax of types in general. In parser-proc.bas there is a function "cProcHeader" parsing a procedure´s head line. Why not parse it for "DEFAULT", and set the attrib parameter accordingly (requires to add "FB_SYMBATTRIB_DEFAULT" to FB_SYMBATTRIB enum) and check for this attribute later on, if needed for overload resolution in the functions you mentioned.

This way ("DEFAULT" as a keyword) you must have it inside the necessary definitions for a type, a pragma could be anywhere in code, and it´s easy to overlook it when debugging. Another point is, you can use #define(s) for the type´s name (e.g "USTRING"). The only way to make a pragma type specific i can think of, is using the type´s name. Having #define(s) then could cause confusion.

 

Well, to summarize it:

- we want to have a working and "correct" solution for dynamic wide strings
- we have a working solution (José´s and Marc´s code + my adaptions for the compiler), which can deal with the current inconsistencies in fbc. This solution is specific to the new dynamic wide string type, it doesn´t solve existing problems in fbc, it is inconsistent in itself - but it works
- you want to have a more generic approach for UDTs and strings in general fixing those quirks and inconsistencies in fbc

Do you agree so far ?

As mentioned above either i still don´t understand it or there might be a misconception on your side (prove me wrong) in that there is a need for a "cast as string" operator (which definitely causes problems). To my understanding, there is no need for such an operator.


It´s not a problem for me, if you don´t accept my changes just as they are right now - let´s do it better. Personally i would like to have three things:

- the new type should be implemented as included file (José, Marc, my clone of José´s, maybe others)
- "USTRING" should become a reserved word for a #define for the new type (so everyone can code whatever he prefers: #define ustring JK_CWSTR/CWSTR/CBSTR/DWSTR/whatever). In other words, nobody should name his version of the new type "USTRING".
- it should allow for a seamless integration into fbc (being able to use the same syntax just like with other string types) and (if possible) it shouldn´t break already existing code, which implements José´s or Marc´s code.


Maybe it is possible to get rid of the cast as any ptr operator, then there would be only one cast operator left, so no chance for ambiguities anymore.


JK



PS: José could you please explain (possibly once more) what for is the cast as any ptr operator needed.
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 09, 2018, 03:27:58 PM
> PS: José could you please explain (possibly once more) what for is the cast as any ptr operator needed.

To be able to use a CWSTR directly with some Windows API functions without having to use casting, e.g.


' // Writing to a file
DIM cwsFilename AS CWSTR = "тест.txt"
DIM cwsText AS CWSTR = "Дмитрий Дмитриевич Шостакович"
DIM hFile AS HANDLE = CreateFileW(cwsFilename, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL)
' // CreateFileW returns INVALID_HANDLE_VALUE (not NULL) on failure
IF hFile <> INVALID_HANDLE_VALUE THEN
   DIM dwBytesWritten AS DWORD
   DIM bSuccess AS LONG = WriteFile(hFile, cwsText, LEN(cwsText) * 2, @dwBytesWritten, NULL)
   CloseHandle(hFile)
END IF


In the Windows API there are parameters declared as LPCWSTR, LPCVOID, LPBYTE, WCHAR PTR, etc., and some may cause an error or a warning if you pass a WSTRING PTR instead of ANY PTR.

Title: Re: FreeBASIC CWstr
Post by: José Roca on December 09, 2018, 04:06:20 PM
BTW I also don't use * and ** instead of only * by caprice. I needed to have the equivalents to VARPTR and STRPTR. I first used & to emulate VARPTR, but then I could not use & to get the address of the class, so I removed it. You may like or not my workaround, but it works. Also, being a COM programmer, I'm comfortable using double indirection. For those not comfortable with it, they can use the vptr and sptr methods.

For example, in this code:


DIM cwsFilename AS CWSTR = "тест.txt"
DIM cwsText AS CWSTR = "Дмитрий Дмитриевич Шостакович"
DIM hFile AS HANDLE = CreateFileW(cwsFilename, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL)
' // CreateFileW returns INVALID_HANDLE_VALUE (not NULL) on failure
IF hFile <> INVALID_HANDLE_VALUE THEN
   DIM dwBytesWritten AS DWORD
   DIM bSuccess AS LONG = WriteFile(hFile, cwsText, LEN(cwsText) * 2, @dwBytesWritten, NULL)
   CloseHandle(hFile)
END IF

hFile = CreateFileW(cwsFilename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL)
IF hFile <> INVALID_HANDLE_VALUE THEN
   DIM dwFileSize AS DWORD = GetFileSize(hFile, NULL)
   IF dwFileSize THEN
      DIM cwsOut AS CWSTR = WSPACE(dwFileSize \ 2)
      DIM dwBytesRead AS DWORD
      ' // lpNumberOfBytesRead may only be NULL when lpOverlapped is not NULL
      DIM bSuccess AS LONG = ReadFile(hFile, *cwsOut, dwFileSize, @dwBytesRead, NULL)
      PRINT cwsOut
   END IF
   ' // Close the handle even when the file is empty
   CloseHandle(hFile)
END IF


I'm using *cwsOut with ReadFile.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 09, 2018, 04:52:55 PM
Thanks José,


so the cast to any ptr operator could basically be dropped, if Jeff still sees a problem here. You added it for convenience, but without it explicit casting to the required type would work as well  - no other effects, than being not so convenient.

Is this correct ?



Quote
BTW I also don't use * and ** instead of only * by caprice. I needed to have the equivalents to VARPTR and STRPTR

I think we all know and understand that. The good thing about changing the compiler is, that indirection is not necessary anymore and the other good thing is, that you can still use indirection, if you want. You can have it both ways then.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 09, 2018, 06:21:24 PM
> Is this correct ?

Yes, it will be called only when the target is a pointer other than a wstring pointer. If you remove it, then you will need to use casting in some cases.
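
A rough sketch of what "use casting" means here, assuming a hypothetical type WBuf that only offers a cast to WSTRING PTR (it is not CWSTR, and TakesBytePtr merely stands in for an API parameter such as LPBYTE). Without a CAST() AS ANY PTR operator the conversion has to be spelled out at the call site:

```freebasic
'' Hypothetical type with a single cast operator, for illustration only.
type WBuf
    buf as wstring * 32
    declare operator cast () as wstring ptr
end type

operator WBuf.cast () as wstring ptr
    return @this.buf
end operator

'' Stand-in for an API that expects a byte pointer.
sub TakesBytePtr(byval p as ubyte ptr)
    print "first byte: "; *p
end sub

dim x as WBuf
x.buf = "A"

'' Explicit two-step conversion: UDT -> wstring ptr -> ubyte ptr.
TakesBytePtr(cast(ubyte ptr, cast(wstring ptr, x)))
```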

I will be very pleased if support for dynamic unicode strings is implemented in the compiler and more still if they also add support for BSTRings.

Every time that somebody asked for dynamic unicode string support, the replies were that it was unlikely to happen, so I decided to write my own class. The first one that I wrote was CBSTR, that deals with BSTRings. Then, to try to make it to work faster, I started to write a string builder class. I used an UBYTE PTR instead of a WSTRING PTR because it was intended to work with ansi and unicode. Then I decided to relegate CBSTR for use with COM and convert the string builder into a class to work with dynamic null terminated strings.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 09, 2018, 08:41:23 PM
Ok José,


i tried removing the cast as any ptr operator, but this breaks a lot of the string helper functions (essentially the same as yours with some variations) in that now "**" is needed in places, it wasn´t necessary before - even with the adapted compiler version. So this cast as any ptr operator is maybe used in more places than we think right now.

Having a cast as any ptr operator lets me get away with my fairly simple changes to the compiler, removing it breaks integration in places where it worked before. So integrating CWSTR into the compiler seems to work mostly because of this operator.


Jeff,

is that true, and is it that, what is giving you headaches, with the current version of the new type and my changes to the compiler ? Without a cast as any ptr operator, the only cast operator left is cast as byref wstring, and then it needs different coding to get it to work. Is it this, what you consider as the "correct" way ?

Regarding the intrinsic string functions ("TRIM", "MID", etc.) "as any" is the easy way, but there is no type checking at all and you could literally pass anything, even wrong types introducing hard to find bugs. You want better type checking there, so that only "approved" data is passed, therefore the cast as any ptr operator should be removed or type checking shouldn´t let pass as any pointers - right ?


JK


Title: Re: FreeBASIC CWstr
Post by: José Roca on December 09, 2018, 09:17:37 PM
Quote
i tried removing the cast as any ptr operator, but this breaks a lot of the string helper functions (essentially the same as yours with some variations) in that now "**" is needed in places, it wasn´t necessary before - even with the adapted compiler version. So this cast as any ptr operator is maybe used in more places than we think right now.

I did mean that you can remove it from the class and the class will still work, not that it is not needed by my framework. If you remove it, you will have to modify some of the code. How much? I don't know and I don't care, because I'm not going to remove it.

Title: Re: FreeBASIC CWstr
Post by: Marc Pons on December 10, 2018, 05:06:37 PM
@Juergen
you are reactivating my interest in that subject (after 2 years)
My new DWSTR include file is on github https://github.com/marpon/DWSTR (https://github.com/marpon/DWSTR)
i have cleaned some mistakes from it

I've also done some simple tests with linux64, it seems working but not exhaustive tests done.
(i use linux not very often) and did not notice a real need for linux, coders have the opportunity to use UTF8 ...


@Jeff
happy to see you on that topic
i think there are only 2 solutions to include dynamic wide strings into FreeBasic

first option ,
           add a new type accepted by the parser as   dynamic wide string, with  all the code to work with included into compiler
           it is my preferred option for sure

second option,
          make minimal tweaks to help external codes to work in simplest way as today
          that's what Juergen is providing, i understand it's not as "correct" as the first option
          but with a few tweaks on the native string handling functions it can make the deal (mainly for ambiguous pointers)

@José
Quote
Yes, it will be called only when the target is a pointer other than a wstring pointer. If you remove it, then you will need to use casting in some cases.
i agree with you, the cast to any ptr simplifies things in various cases; what is the point of avoiding the ** or * in some cases if we have to explicitly cast in other cases?
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 13, 2018, 03:03:44 AM
@Juergen, no, the cast as any ptr is not giving me headaches.  What I think of it or my personal preference, is irrelevant.  It's available as a choice for implementing the class, and the designer can make whatever choices they want.  I did look at many of the classes in WinFBX, and the use of '* operator' is used in a very consistent way to get a pointer to the underlying type.  And '**' obviously dereferences the pointer.  This pattern would work nicely for a variety of underlying types, not just wstring ptr.

Yes, I am looking for a solution that is more generic than what you are proposing.  Swapping the data type based on the specific TYPE name at the last moment, just before calling the intrinsic string functions, ignores many other issues related to UDT's, conversions, string/wstring disambiguation.  But if you really think this is the solution, then keep at it, create a pull request, and get the feedback from the other developers also.

I re-read your idea about the "DEFAULT" specifier and maybe something like that could work, with UDT's in general, not just [w]string's.  Allowing the programmer to tell the compiler what the preferred implicit conversion should be, and not be so strict about type matching.  Again, it will take time to develop.

@Marc, thanks,

first option, having built-in dynamic wide string is likely where we are headed.  But José is correct, the typical response is that it is unlikely to be done.  I predict it would take me about 4 months to add built in dwstring, if I work on nothing else but that.  We know the work involved and tend not to promise that something is going to get delivered soon.

second option, what Juergen is providing is just not there yet, in my opinion.  It needs more.  Believe me, the quick and dirty fixes in the compiler end up being 10 year old bugs, or features that are now so difficult to implement they never get started.

So I don't know, I wish I had better news.  If it's just me, next 1.06 release probably will take about 2 months to get out, and updating bindings probably another 3 months, and maybe dwstrings after that for 4 months.

third option
In the meantime, I don't know, maybe just use the framework as-is, because that's the best option immediately available.  José has been generous enough to share in the hopes that it is useful.

I appreciate you guys welcoming me here and the discussion.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 13, 2018, 10:45:44 PM
Jeff,


Quote
I appreciate you guys welcoming me here and the discussion.

ditto!



It´s amazing what José did, isn´t it ?


Quote
And '**' obviously dereferences the pointer.  This pattern would work nicely for a variety of underlying types, not just wstring ptr.

you need "**" every time automatic casting (cast byref as wstring, in our case) doesn´t work. Apart from this you can use it like a native variable type. So maybe a clean solution would be to fix this - make implicit casting work everywhere. This is, what you would prefer.

But i´m asking myself, is this really necessary everywhere and does it make sense? Coming from PowerBASIC with an Assembler background, initially i considered the extensive type casting needed in FreeBASIC a burden, e.g in PowerBASIC wparam and lparam of the SendMessage API are defined as LONG and you can pass nearly anything of long size to it without getting complaints from the compiler. In Assembler there is essentially no type casting needed at all. In FreeBASIC i must cast even a LONG to wparam/lparam respectively. Ok - this language doesn´t take a relaxed stance on type casting, which enforces coding discipline and helps avoiding errors (to name two pros).

But why then would it be desirable to have automatic (implicit) casting from UDTs to other types, when it is possible to cast or convert explicitly and when casting is enforced everywhere else? This is inconsistent IMHO (if you don´t reduce the need for type casting everywhere else).


In our case we have a working extension of dynamic zero terminated wide strings closing an existing shortcoming in FreeBASIC. And we want to integrate it seamlessly into the compiler, which is a special case of a class-like UDT. We want to be able to use it just like all the other already existing string types. In all other cases i can live with (i would even prefer) the need of having to code castings or conversions.


I think initially nobody thought of the possibility of creating a string type the way José did it, therefore (and maybe, but this is speculation, on purpose) the code doesn´t account for UDTs in this case. I wouldn´t call it a bug. It´s just something nobody thought of being possible or even necessary. Looking at the compiler´s code i see some places, where this (automatic casting) already works, and i see many places, where it cannot work, because of how it is coded.

So i see three options:

- make the compiler work seamlessly with what we have
- make it work in general (re-work UDT type casting everywhere)
- make all statements overload-able like "LEFT" and others.


Quote
But if you really think this is the solution

I don´t think it is the solution, but it is a solution (the best one we have so far).


Quote
Swapping the data type based on the specific TYPE name

I explicitly check for UDT as datatype and i check for the name ("jk_cwstr", "dwstring", "cwstr", "cbstr") of this udt, all other UDTs are rejected. Checking for the type´s name, this is what the compiler does for all other variable types too (e.g. "LONG"). "Long" is a reserved word as everybody knows, so where is the problem ? Establish "USTRING" (and the underlying type names) as reserved words for dynamic wide strings and everything will be fine.


My approach is a quick and dirty fix - yes. But as a first step it would allow using José´s dynamic wide strings just like native variable types (without the need for the otherwise extremely clever "**" workaround construct). The compiler can evolve (just like everything else). Including my changes into the next release, doesn´t mean accepting it forever as it is, but accepting it, until we have something better. It doesn´t establish anything, which cannot be evolved any further and it doesn´t establish in what direction this evolution would have to go.

You could announce it as what it is: a first step, a still pragma isolated part (if we want to leave it as it is) of the compiler for those who want to use it. It doesn´t affect all others.


I´m confident to get this type running for Linux too, so it would really add value for all users not only the Windows fraction.



In the meantime i did a lot of tests with my changes and there will be code for thoroughly testing every aspect of the new type and it´s integration into the compiler. I hope to be able to present code and test code soon.


I cannot guarantee for an exact schedule, because i caught a cold this week and don´t feel very well, so i don´t know how much progress i can make in the next days. As soon as i´m ready, i will post here what i have. Then there must be tests for LINUX (i already have someone for this, but of course Jeff you are invited to test also). When everything is ok, i will push it to the repo and create a pull request.


JK

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 16, 2018, 03:58:20 PM
@Marc,


Quote
i have cleaned some mistakes from it

please could you explain which mistakes and what was the problem - thanks. I plan to take your version as a basis for a universal ustring.inc, while José´s WINFBX should be a (preferred) Windows alternative.


@all,


here is, what i have right now - it is work in progress. I removed everything from José´s version, which isn´t absolutely necessary and named it "ustring.inc", the pragma now is "ustring" too, i added it to "ustring.inc".

There is a #define (#define ustring jk.DWSTR) making it work with "USTRING" as data type for dynamic wide strings. José could add a similar #define (#define ustring afx.CWSTR) to make his framework work with "USTRING" as data type for dynamic wide strings as well. So it is possible to have the same code for different implementations, if you use "USTRING" as data type (dim u as ustring ...). Of course the original name(s) (afx.cwstr, jk.dwstr, etc. as hardcoded into the compiler) may be used without any restrictions. So nobody (especially José) will have to change existing code.

I had to make two decisive changes to José´s code:
- in OPERATOR DWSTR.[] the line "nIndex -= 1" must be removed to make it consistent with wstring, otherwise u[1] <> w[1]
- m_pBuffer must be the first member variable of the type in order to make "STRPTR" work for ustrings

Please test (maybe add own tests to test.bas) and try to find all the bugs i missed so far, obviously this is for Windows only so far. The compiler should work without having to prepend "**" to ustrings anymore.


@Jeff,


as i´m fairly new to GIT, i need your help. Initially i cloned from the GIT repository without creating a fork, then i created my own branch on my local repo. You wanted it on an extra fork at GitHub - how should i proceed ?

Looking at the code, where the decision is made, which overloaded procedure should be taken, i see the compiler just throws an error (ambiguous ...) if it finds more than one possibly matching procedure. You want it to take the best matching or (if given) the "default" (see previous discussion) procedure. And you only want an error message, if there is no matching procedure at all - right?


Attached: compiler.zip (the compiler´s code + ustring.zip, which includes ustring.inc and test.bas)


Title: Re: FreeBASIC CWstr
Post by: José Roca on December 16, 2018, 05:02:49 PM
Quote
- in OPERATOR DWSTR.[] the line "nIndex -= 1" must be removed to make it consistent with wstring, otherwise u[1] <> w[1]

You're wrong. FreeBASIC's [] operator is zero based.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 16, 2018, 07:17:01 PM
José,


please run this code (for console):

#include "afx\afxstr.inc"


dim i as integer
dim u as cwstr = "1234"
dim w as wstring * 16 = "1234"


  for i as integer = 1 to len(u)
    print i
    print u[i]
    print w[i]

    if(u[i] <> w[i]) then
      print "error"
      exit for
    end if
  next i


  sleep


end



it fails on my IDE as well as on Paul´s. If you remove or comment out the line i mentioned - it runs. Please re-check.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 16, 2018, 07:29:46 PM
Your code is wrong. Try:


'#CONSOLE ON
#define UNICODE
#INCLUDE ONCE "windows.bi"
#INCLUDE ONCE "Afx/cwstr.inc"

dim u as cwstr = "1234"
dim w as wstring * 16 = "1234"

for i as integer = 0 to len(w) - 1
   print i
   print chr(u[i]); "***"
   print chr(w[i]); "---"
next

PRINT
PRINT "Press any key..."
SLEEP

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 16, 2018, 09:34:04 PM
Sorry José,


this is what i get with your code, see Jose.png

when i change one line of it to this: print chr(u[i+1]); "***", it delivers what i would call the expected result, see JK.png


i used paul´s IDE for both and this is the code i have from WINFBX:

PRIVATE OPERATOR CWStr.[] (BYVAL nIndex AS UINT) AS USHORT
   IF nIndex < 1 OR nIndex > m_BufferLen \ 2 THEN EXIT OPERATOR
   ' Get the numeric character code at position nIndex
   nIndex -= 1
   OPERATOR = PEEK(USHORT, m_pBuffer + (nIndex * 2))
END OPERATOR


So for nIndex = 0, it returns nothing - as you can see in the screenshot...


This is what i use:

PRIVATE OPERATOR DWSTR.[] (BYVAL nIndex AS ulong) AS USHORT         
'***********************************************************************************************
' Returns the corresponding ASCII or Unicode integer representation of the character at
' the position specified by the nIndex parameter. allows to use the [] syntax, e.g. value = cws[1].
' Can't be used to change a value!
'***********************************************************************************************
  IF nIndex > m_BufferLen \ 2 THEN EXIT OPERATOR
  OPERATOR = PEEK(USHORT, m_pBuffer + (nIndex * 2))
END OPERATOR



Please tell me, what´s wrong


Attached are two screenshots of the two runs with different code (marked bold above)


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 16, 2018, 09:47:43 PM
José,


put the blame on me! I just downloaded your latest version of WINFBX and there the offending line has been removed. My version of your WINFBX i used for testing was outdated. So we have the same opinion here - it´s zerobased.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 16, 2018, 10:16:07 PM
It was changed several months ago.
See: https://www.planetsquires.com/protect/forum/index.php?topic=4167.msg31799#msg31799
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 18, 2018, 03:45:54 PM
@Jeff,


forget the question about GIT, i think i managed it myself. My fork is at: https://github.com/jklwn/fbc, the latest code is in "JK-USTRING" branch. I didn´t make a pull request yet, because i want to be sure, that bugs have been found and fixed as far as possible before.


@all,


no bugs ? I´m going to add some more tests and string handling functions to ustring.inc. Next i will try to get it running for LINUX and then will be the time for a pull request, i think.



JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 20, 2018, 03:02:05 AM
Hey Juergen, that's correct for git, putting your changes on a separate branch.

I'm probably not going to be able to keep up with your posts. =)

So in earlier posts, I was talking about a variety of issues, and I know, some are not specific to what you are focused on.  Basically, we don't want to break user code now or later; sometimes it happens, and if it does, we try to have a good reason.  It's OK to have a temporary fix, but we should at least have an idea where the end goal is in the future, and that the goal is a possibility, even if it doesn't get done right away.  Really just leaving options open for ourselves for the eventual compiler feature or fix, whatever that is.   And hopefully we don't need to break user code now or in future.  The 3 issues I think are, adding the built-in dynamic wstring type, fixing UDT implicit casting for built in [w]string rtlib functions, solving ambiguous implicit UDT casting in general.  All pretty much independent of each other for implementation, but have to work together eventually in user code.

---

I only briefly looked at your branch:

You should generalize your test:

     if env.clopt.ustring then
       if (*jk__zz = "JK_CWSTR") or (*jk__zz = "CWSTR") or (*jk__zz = "CBSTR") or (*jk__zz = "DWSTR")then

to something like if( astMaybeUDTisaWstring( expr ) ) then or something similar, then just modify the AST expr.  Avoid the GOTO's, some of them are skipping over DIM statements.

Better would be to use #pragma push/pop, and attach the "UDT-is-a-Wstring" flag to the UDT's type-def.  That way any UDT NAME could work as a "wstring" for any UDT, now or future, other users can write their own classes if they want.  And if the type is declared in a namespace, or is a define, or a type alias, etc, it won't matter because the information that the UDT should be handled as a wstring is attached to the UDT typedef itself.

There seem to be a few extra files, not sure what's going on there in the ./contrib directory, the .jkp file, uppercasing filenames in .gitignore.  If you really mean to make changes to build process, etc, that should be a separate pull request.  In other words, just include what is needed for the change you are proposing.

Writing tests means adding automated tests to the ./tests directory in the test-suite.

Sorry, that's all I have time for just now. =)
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 20, 2018, 04:12:21 PM
Jeff,


regarding GIT, i (at least intentionally) didn´t change anything else than some files in the compiler folder. The .jkp file is the project file for my IDE i obviously forgot to add to the ignore list. So please turn down all other changes.


Quote
And hopefully we don't need to break user code now or in future.
I absolutely agree that this should not happen, this is one reason why i want to be able to use the already existing syntax for the new dynamic wide string type - make (keep) it consistent.


I also agree on your listing of issues. Currently we have #1 solved externally, having it built-in would require a lot of extra work. #2 works specifically for the class names we have, generalizing it would be a next possible step. I could care for #3, if you want, if all we have now is tested and implemented.

 
Quote
then just modify the AST expr.
I actually don´t change the expr, i just redirect the code flow, based on the fact that one of our UDTs is recognized (by it´s name for now). The runtime library functions can deal with it without any changes. But in case of an UDT the code flow never reaches them.


Quote
Avoid the GOTO's, some of them are skipping over DIM statements.
A compiler is a very complicated piece of software, and as long as i don´t understand in depth, what happens, and why exactly it happens, i´m extremely careful with changes and try to isolate my new code as much as possible. In other words i know that "goto" is often considered to be bad coding style. But the alternative would be to make major changes in code structure and thus potentially breaking things, i even don´t know of. I took care that skipping DIM statements is safe, in that the skipped variables are not needed anymore in places i´m jumping to.


Quote
Better would be to use #pragma push/pop
I understand the concept of #pragma, but what for is #pragma push/pop? As already explained above, it isn´t really necessary, i currently use it to isolate my new code from what´s already there. So, just in case there is unexpected behavior of the compiler, i have a means of switching my changes off, and i can test if this behavior persists or not. It´s more or less a debugging thing.


Quote
attach the "UDT-is-a-Wstring" flag to the UDT's type-def
Which would require some new syntax for specifying this flag. Why not just check for "can-cast-as-wstring", which would have the same effect without the need for any syntax change/addition.


For a schedule i would propose these steps:

1.) make sure, that, what we have now, really works FLAWLESSLY for Windows, LINUX and maybe other targets, which can deal with wide strings.

2.) integrate this into the next official release, maybe with #2 of your list, maybe without it. This is a timing thing: when will be the next release, and how much time will it need to get #2 tested and implemented?

3.) solve #2 and #3 of your list.

4.) make USTRING a built-in type (#1 of your list). Personally i can live with fact of having to include an extra file for dynamic wide strings, so it is last in my personal priority and my schedule. What do you think?


All of this doesn´t establish anything which breaks existing code or sets standards we must keep later on except for one single new reserved word: "USTRING" as new name for a dynamic wide string type.


I´m a great fan of doing things step by step (especially in coding and especially when applying changes to an otherwise working software), make sure that the current step really works before taking the next step. Everything else will lead to a big mess sooner or later.


JK

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 23, 2018, 03:25:15 PM
Just made a push to my fork at: https://github.com/jklwn/fbc

This is still work in progress, so the code doesn´t look very tidy in various places. As far as i can tell, there seem to be no errors anymore. I did tests on 32 bit and 64 bit with a test app (test.bas - in attached ustring.zip) i wrote.

- ustrings now work with all fb intrinsic string functions without the need for "**"
- "strptr" returns a zstring ptr for zstring and string, and it now returns a wstring ptr for wstring and ustring
  (this requires a small change to José´s code - see below)
- i added string (helper) functions (included in ustring.inc) similar to those available in PowerBASIC, this is based on José´s work in WINFBX, but i made it work with the same syntax as in PB. E.g. u2 = Extract_(u, any u1), u1 = Pathname_(path, u)


Changing the behavior of "strptr" shouldn´t break existing code: previously strptr returned a zstring ptr for a wstring, which was in effect useless unless casted to a different ptr. Now strptr returns a wstring ptr, and casting a wstring ptr to another ptr (or itself) doesn´t raise problems - so existing code should run anyway.

To get this to work with José´s code a small change is necessary: for "strptr" the compiler relies on the fact that for the "string" type (which is an UDT internally too) the data buffer is the first member variable. So for ustring i changed it like this:

TYPE CWSTR
    m_pBuffer AS UBYTE PTR        ' Pointer to the buffer, moved in first place

   Private:
      m_Capacity AS UINT            ' The total size of the buffer
      m_GrowSize AS LONG = 260 * 2  ' How much to grow the buffer by when required

   Public:
'      m_pBuffer AS UBYTE PTR        ' Pointer to the buffer (removed here)
      m_BufferLen AS UINT           ' Length in bytes of the current string in the buffer




José, do you see any problems here ? As m_pBuffer is public anyway, moving it in first position shouldn´t change or break anything else. "Strptr" doesn´t work for your CBstr yet, but i think i could fix this too. This is still somehow experimental, but would you change it for a final (release) version of all of this ?


I think there is a problem with the overloaded functions "CLNGINT" and "CULNGINT", please run this code with your code and then again use mine (ustring.inc):


  dim w as wstring * 50 = wstr(70000000001)
  dim u as ustring = wstr(70000000001)

  print "Test: CLNGINT"
  print "  u: -" & CLNGINT(u) & "  -  " & u
  print "  w: -" & CLNGINT(w) & "  -  " & w

  if CLNGINT(w) <> CLNGINT(u) then
    print
    Print "--- ERROR ---"
  else
    print
    print "---  OK  ---"
  end if


  print
  print "Test: CULNGINT"
  print "  u: -" & CULNGINT(u) & "  -  " & u
  print "  w: -" & CULNGINT(w) & "  -  " & w

  if CULNGINT(w) <> CULNGINT(u) then
    print
    Print "--- ERROR ---"
  else
    print
    print "---  OK  ---"
  end if


  sleep


end




I might be offline for a few days - Merry Christmas to all !


JK



PS:
@Jeff,

Quote
fixing UDT implicit casting for built in [w]string rtlib functions

I think i have a quite simple solution for this, just have a look at "Parser-Compound-Select.bas", line 119. We could make can-cast-to-(w)string functions out of it and insert tests, in all places i currently check for the UDT´s name. This would be a generic approach then.
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 23, 2018, 07:14:19 PM
Quote
José, do you see any problems here ? As m_pBuffer is public anyway, moving it in first position shouldn´t change or break anything else. "Strptr" doesn´t work for your CBstr yet, but i think i could fix this too. This is still somehow experimental, but would you change it for a final (release) version of all of this ?

I don't see any need to add support for STRPTR. Why to use the verbose STRPTR(cws) if you can use *cws, or simply cws if you're passing it to a procedure? Besides, I don't think that the FB team is going to accept a change that uses a hack.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 27, 2018, 12:24:16 PM
Hi José,


I primarily wanted to know, if you see technical problems (codewise) with that change. You are the expert for your work and you proved me wrong more than one time in the past - so i don´t see a problem here, but nevertheless i´m just asking, if you see a problem.


I didn´t ask so much, if you actually would want to apply this change, but let´s discuss this topic too.


This is the way i see it:

I´m not working against you! I want to make it easier for other people to benefit from your work. José, your WINFBX is outstanding, and you did your very best to integrate it with the compiler. My goal is to adapt the compiler for a seamless integration without breaking your work.

And i know, you don´t need it for yourself, but if you want to have other people using your work, it should integrate seamlessly, which it currently doesn´t. This is not a shortcoming of your work, but a problem with the compiler. You found workarounds doing the job perfectly well, but nevertheless people will find workarounds difficult and discouraging. So in order to encourage people a seamless integration would be a big step forward.

The more, you told me that you are not interested in Linux, but others are. They could benefit from your work as well, at least for the dynamic wide string part.

So the idea (as long as we don´t have an intrinsic dynamic wide string type) is to have an include file (ustring.inc) for this. This include file enables dynamic wide strings in Windows and Linux (and maybe others). Obviously your Framework works only for Windows. In Windows adding "ustring.inc" adds it´s dynamic wide string type, unless your CWstr is included. In other words: "ustring.inc" makes your type (CWstr) the standard dynamic wide string, if present. If CWstr is not present for whatever reason, it offers a fallback for a dynamic wide string. For Linux this fallback would be standard then.


Of course there is no need to use "strptr" (you supplied an even shorter workaround) and nobody forces you or anyone else to do so. But the task here is to make it consistent, so not being able to use "strptr" just like with the other (intrinsic) string types would break consistency. Maybe there are others wanting to use strptr, because it´s the regular way for getting a pointer to the string´s data. FB´s intrinsic "STRING" dynamic string type can do that!

And in fact, what i do is not a hack, it is exactly how the compiler does it for it´s intrinsic dynamic string ("STRING") type, which internally is an UDT too. This UDT holds 3 members: the first one is a pointer to the data (just like m_pBuffer in CWstr), the second one holds the current length of the data, and the third one holds the maximum size of the current buffer.

Your layout is similar but different in order. By making m_pBuffer the first member variable of CWstr you get the same layout in that the first member variable points to the data. So in fact a pointer to the type is a pointer to the pointer of the data. You only need to dereference the pointer to the type in order to get a pointer to the data. This means the same code in the compiler can be used to get a valid pointer to the data of CWstr (aka "strptr"). This simple change (and of course some small changes in the codeflow, to account for an UDT being passed to it) enable the compiler to work with it, just like with an intrinsic string type.
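The layout argument above can be sketched like this. It is a minimal illustration under stated assumptions: `DemoStr` is a hypothetical reduced stand-in (not the real CWSTR), and `DataPtrViaFirstMember` is a made-up helper that does by hand what the text says the compiler does for "strptr".

```freebasic
' Hypothetical stand-in for CWSTR/DWSTR, reduced to the two fields that matter here.
type DemoStr
   m_pBuffer   as ubyte ptr   ' first member: pointer to the character data
   m_BufferLen as ulong       ' length of the data in bytes
end type

' A pointer to the UDT is also a pointer to its first member,
' so one dereference yields the data pointer.
function DataPtrViaFirstMember(byref d as DemoStr) as wstring ptr
   dim pp as ubyte ptr ptr = cast(ubyte ptr ptr, @d)
   return cast(wstring ptr, *pp)
end function

dim buf as wstring * 8 = "abc"
dim d as DemoStr
d.m_pBuffer   = cast(ubyte ptr, @buf)
d.m_BufferLen = len(buf) * sizeof(wstring)

' Prints the text stored in buf, reached only through the address of d.
print *DataPtrViaFirstMember(d)
```

If m_pBuffer were not the first member, the same dereference would read some other field, which is why the member order matters for this approach.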

The very best of it is: It doesn´t cost anybody anything! It doesn´t break existing code, but it makes "Strptr" possible for CWstr without a hack!

Besides i think you didn´t write code which depends on the order of members in CWstr, you always use operators or functions to access the data, that´s what they are for. You always use it´s name, you never use the offset for retrieving the data pointer. So far for the theory, in practice, as far as i can tell, moving m_pBuffer into first position has no side effect on my applications depending on your WINFBX (amongst others a 32/64bit debugger for FB).


Why would you refuse to apply a change that doesn´t cost you anything, but will be a gain for others ? Please think about it. Of course such a change would become necessary only, if the FB team accepted my compiler changes - but this is my problem.


Did you run the test code i posted (CLNGINT ...)? When run with CWstr there is a difference in result between WSTRING and CWstr.


In the meantime i made "ustring.inc" independent of my IDE and independent of additional include files. The current state of the compiler is at: https://github.com/jklwn/fbc, and attached is what i currently have for "ustring.inc" + test code


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 27, 2018, 04:16:35 PM
I'm not refusing to apply that change in my header. What I'm saying is that I don't think that the FB team will accept your hack, but I may be wrong.

> Did you run the test code i posted (CLNGINT ...)? When run with CWstr there is a difference in result between WSTRING and CWstr.

And also if you use ustring. They will work correctly if you use **.

> The more, you told me that you are not interested in Linux, but others are. They could benefit from your work as well, at least for the dynamic wide string part.

You may need to change the code because on Linux wstrings are encoded in UCS-4 and a character takes up 4 bytes.
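A minimal sketch of what avoiding the hard-coded 2 could look like, using `sizeof(wstring)` for the size of one code unit on the current target. The helper name `ByteLength` is hypothetical and not part of either include file.

```freebasic
' Size of one wstring code unit on the current target
' (2 on Windows, 4 on Linux, as described above).
const CODE_UNIT_SIZE = sizeof(wstring)

' Byte length of a wide string's character data,
' without assuming 2 bytes per character.
function ByteLength(byref w as wstring) as uinteger
   return len(w) * sizeof(wstring)
end function

dim w as wstring * 16 = "1234"
print "bytes per code unit: "; CODE_UNIT_SIZE
print "bytes used by data:  "; ByteLength(w)

' Indexing a wstring already scales by the code unit size.
print "second code unit:    "; w[1]
```

Expressions such as "m_BufferLen \ 2" or "nIndex * 2" in the posted operators would be the places to look at under this assumption.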

> but nevertheless people will find workarounds difficult and discouraging

In fact, there are only a couple of Chinese guys that use CWSTR, and these don't seem to be easily discouraged. Most of the other users use STRING and rely on automatic conversions.

My goal was to provide a framework that works both with ansi and unicode strings without having to provide "A" and "W" versions.

To guys wanting to work with unicode with FreeBasic, dynamic strings aren't its only problem. None of the FB intrinsic functions that deal with files accept a unicode string for the file name. My framework includes several classes to deal with files.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 27, 2018, 06:02:24 PM
Quote
And also if you use ustring. They will work correctly if you use **.

When using the new compiler version, i get  a correct result for ustring and i get a correct result for **CWstr, but the result is wrong for CWstr without "**"! Why is that?

Prepending "**" makes the compiler see a wstring while it is in fact an UDT. For a wstring it uses an internal wstring conversion function, for an UDT it must use an overloaded function. These overloaded functions differ.

ustring.inc:

PRIVATE FUNCTION Valint OVERLOAD (BYREF cws AS jk.DWSTR) AS long
   RETURN .VALINT(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValLNG OVERLOAD (BYREF cws AS jk.DWSTR) AS longint
   RETURN .VALLNG(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValUint OVERLOAD (BYREF cws as jk.DWSTR) AS ulong
   RETURN .VALUINT(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValULNG OVERLOAD (BYREF cws AS jk.DWSTR) AS ulongint
   RETURN .VALULNG(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION



CWstr.inc


' =====================================================================================
' Converts the string to a 32bit integer
' =====================================================================================
PRIVATE FUNCTION ValLng OVERLOAD (BYREF cws AS CWSTR) AS LONG
   RETURN .ValLng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================
' =====================================================================================
PRIVATE FUNCTION ValInt OVERLOAD (BYREF cws AS CWSTR) AS LONG
   RETURN .ValInt(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================

' =====================================================================================
' Converts the string to an unsigned 32bit integer
' =====================================================================================
PRIVATE FUNCTION ValULng OVERLOAD (BYREF cws AS CWSTR) AS ULONG
   RETURN .ValULng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================
PRIVATE FUNCTION ValUInt OVERLOAD (BYREF cws AS CWSTR) AS ULONG
   RETURN .ValUInt(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================

' =====================================================================================
' Converts the string to a 64bit integer
' =====================================================================================
PRIVATE FUNCTION ValLongInt OVERLOAD (BYREF cws AS CWSTR) AS LONGINT
   RETURN .ValLng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================

' =====================================================================================
' Converts the string to an unsigned 64bit integer
' =====================================================================================
PRIVATE FUNCTION ValULongInt OVERLOAD (BYREF cws AS CWSTR) AS ULONGINT
   RETURN .ValULng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION
' =====================================================================================



in fact the compiler implements these overloaded functions for the corresponding "c.." conversion functions for UDTs. As long as you prepend "**" the compiler sees a wstring and everything will be fine, because it is masked - the overloaded functions will never be used.

In the new compiler it is passed as what it is, an UDT, then the overloaded functions come into play and the difference becomes obvious.

According to the help file "VAL(U)LNG" should return a "(U)LONGINT". Please re-check this.


...and yes, there still was an error in "test.bas" not revealing this for "VAL(U)LNG" as well. The number was not big enough to show the difference; it should have been "wstr(12345678900)" to catch it.
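The effect of the declared return type can be shown in isolation. This is a minimal sketch: `NarrowResult` and `WideResult` are hypothetical wrappers around the intrinsic VALLNG, standing in for the two variants of the overloaded function.

```freebasic
' Same conversion, but the result is squeezed into a 32-bit LONG.
function NarrowResult(byref w as wstring) as long
   return vallng(w)
end function

' Same conversion with a 64-bit return type.
function WideResult(byref w as wstring) as longint
   return vallng(w)
end function

' The value needs more than 32 bits, so it exposes the difference.
dim w as wstring * 50 = wstr(70000000001)

print "declared AS LONG:    "; NarrowResult(w)
print "declared AS LONGINT: "; WideResult(w)

if NarrowResult(w) <> WideResult(w) then
   print "results differ: the LONG version lost the upper bits"
end if
```

A test value that fits into 32 bits would give identical results from both functions, which is why a small number does not catch the problem.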


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 27, 2018, 08:31:54 PM
I get correct results if I only include "ustring.inc", but wrong results if I also include "cwstr.inc".


'#CONSOLE ON
#define UNICODE
#INCLUDE ONCE "windows.bi"
'#INCLUDE ONCE "Afx/cwstr.inc"    ' Wrong results if you unrem it
#INCLUDE ONCE "ustring.inc"

  dim w as wstring * 50 = wstr(70000000001)
  dim u as ustring = wstr(70000000001)

  print "Test: CLNGINT"
  print "  u: -" & CLNGINT(u) & "  -  " & u
  print "  w: -" & CLNGINT(w) & "  -  " & w

  if CLNGINT(w) <> CLNGINT(u) then
    print
    Print "--- ERROR ---"
  else
    print
    print "---  OK  ---"
  end if


  print
  print "Test: CULNGINT"
  print "  u: -" & CULNGINT(u) & "  -  " & u
  print "  w: -" & CULNGINT(w) & "  -  " & w

  if CULNGINT(w) <> CULNGINT(u) then
    print
    Print "--- ERROR ---"
  else
    print
    print "---  OK  ---"
  end if


PRINT
PRINT "Press any key..."
SLEEP

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 28, 2018, 11:45:20 AM
Ok!


Now (still using the new compiler) when you change the code in CWstr.inc like this:

PRIVATE FUNCTION ValLng OVERLOAD (BYREF cws AS CWSTR) AS LONGINT
   RETURN .ValLng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

PRIVATE FUNCTION ValULng OVERLOAD (BYREF cws AS CWSTR) AS ULONGINT
   RETURN .ValULng(*cast(WSTRING PTR, cws.m_pBuffer))
END FUNCTION

- it delivers correct results.


Please add a debug message to both of these functions and run it with "u" and with "**u", and with "AS LONG(INT)" and "AS ULONG(INT)" respectively. You may change "C(U)LNGINT" to "VAL(U)LNG" in the test code and see, what happens then

The naming of these overloadable functions (VALLNG, VALULNG) is unfortunate, if not misleading...


JK



Title: Re: FreeBASIC CWstr
Post by: José Roca on December 28, 2018, 03:52:22 PM
Good catch! I have modified these functions.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 29, 2018, 10:50:33 AM
José,


yet another thing - when you change the overloaded [] operator to:


PRIVATE OPERATOR DWSTR.[] (BYVAL nIndex AS ulong) byref AS USHORT
'***********************************************************************************************
' Returns the corresponding ASCII or Unicode integer representation of the character at the
' zerobased position specified by the nIndex parameter. Can be used to change a value too.
'***********************************************************************************************
static zero as ushort                                 'fallback for nIndex outside valid data

  IF nIndex > (m_BufferLen \ 2) - 1 THEN
    zero = 0
    OPERATOR = zero                                   'return 0
    exit operator
  end if
 
  OPERATOR = *cast(USHORT ptr, m_pBuffer + (nIndex * 2))
END OPERATOR



- it works both ways. That is, you can not only get a character value, but you also can set one! Please check.


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 29, 2018, 03:08:10 PM
Yes, it works. It can also be implemented for CBSTR:


' ========================================================================================
' Returns the corresponding ASCII or Unicode integer representation of the character at
' the zero-based position specified by the nIndex parameter (0 for the first character,
' 1 for the second, etc.), e.g. value = cws[1], cws[1] = value.
' ========================================================================================
PRIVATE OPERATOR CWStr.[] (BYVAL nIndex AS UINT) BYREF AS USHORT
   STATIC Zero AS USHORT = 0
   IF nIndex < 0 OR nIndex > (m_BufferLen \ 2) - 1 THEN RETURN Zero
   ' Get the numeric character code at position nIndex
   OPERATOR = *CAST(USHORT PTR, m_pBuffer + (nIndex * 2))
END OPERATOR
' ========================================================================================

' ========================================================================================
' Returns the corresponding ASCII or Unicode integer representation of the character at
' the zero-based position specified by the nIndex parameter (0 for the first character,
' 1 for the second, etc.), e.g. value = cbs[1], cbs[1] = value.
' ========================================================================================
PRIVATE OPERATOR CBStr.[] (BYVAL nIndex AS UINT) BYREF AS USHORT
   STATIC Zero AS USHORT = 0
   IF nIndex < 0 OR nIndex > SysStringLen(m_bstr) - 1 THEN RETURN Zero
   ' Get the numeric character code at position nIndex
   OPERATOR = *CAST(USHORT PTR, m_bstr + nIndex)
END OPERATOR
' ========================================================================================

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 29, 2018, 07:20:31 PM
José,


you must set "zero" to zero each time, as i did in my code. Initializing it to zero is not enough.

See, what happens without it, if you run this code:


dim u as Ustring = "asdfg"

  u[-1] = 1234

  print  u[-1]
  sleep



Maybe we should set the result to hFFFF, if an invalid index is given, because hFFFF is an invalid UTF16 character too - 0 isn´t, and there could be 0 at a valid index too. Or am i wrong with this?
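The point about resetting the fallback on every call can be reproduced with a small stand-alone type. This is only a sketch: `DemoBuf`, its fixed four-element array and the variable name `fallback` are hypothetical, not taken from ustring.inc or CWstr.inc.

```freebasic
' Hypothetical reduced type: four code units and a byref [] operator.
type DemoBuf
   declare operator [] (byval nIndex as ulong) byref as ushort
   m_data(0 to 3) as ushort
end type

operator DemoBuf.[] (byval nIndex as ulong) byref as ushort
   static fallback as ushort
   if nIndex > 3 then
      ' The caller gets a reference to fallback and may have written to it
      ' earlier, so it is reset on every out-of-range access.
      fallback = 0
      return fallback
   end if
   return m_data(nIndex)
end operator

dim b as DemoBuf
b[1] = 65            ' valid index: the write lands in the buffer
b[9] = 1234          ' invalid index: the write lands in the static fallback
print b[1]
print b[9]           ' 0 again, because the fallback is reset on each call
```

Without the "fallback = 0" line, the second print would show the value written through the invalid index, since a static initializer runs only once.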


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on December 29, 2018, 08:28:45 PM
What I think is that we should remove error checking and warn that if the index is invalid the result will be undefined, as the FB intrinsic [] does. The main purpose of using [] is speed and we will lose it adding error checking. Those wanting error checking can use the Char property instead.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 30, 2018, 06:30:00 AM
Quote from: Juergen Kuehlwein on December 20, 2018, 04:12:21 PM
regarding GIT, i (at least intentionally) didn´t change anything else than some files in the compiler folder. The .jkp file is the project file for my IDE i obviously forgot to add to the ignore list. So please turn down all other changes.
I see now it is due to the added .gitattributes and your local machine probably has the default system wide setting of core.autocrlf=false.  To work better with the fbc sources on multiple platforms, it is better (for the fbc source tree in particular) to have this set to core.autocrlf=true.  Line endings will automatically be converted to CRLF for all files on windows (LF on linux).  There's only a couple places in the source tree that we force LF line endings.

Also, fbc source mostly has TAB character for indent.  Comments for statements on the line above (not in line or at end of line).  And no need for long '' --- * 70 comment breaks.

>> then just modify the AST expr.
>I actually don´t change the expr, i just redirect the code flow, based on the fact that one of our UDTs is recognized (by it´s name for now).
>>Avoid the GOTO's, some of them are skipping over DIM statements.
> But the alternative would be to make major changes in code structure and thus potentially breaking things

You are making it really hard on yourself, and me trying to read your code.  Have a look at https://github.com/jayrm/fbc/commits/udt-wstring specifically in rtl-string.bas.  50 lines changed instead of 500.

Quote
Better would be to use #pragma push/pop
> but what for is #pragma push/pop?

Push/Pop, or assignment, allows to turn the new behaviour on and off within the same source code.
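A minimal sketch of the push/pop pattern. The name of the wide-string pragma option is not given here, so the existing fbc option `msbitfields` is used purely as a stand-in to show the mechanism.

```freebasic
' Save the current state of the option, then switch it on.
#pragma push(msbitfields)

' ... code placed here is compiled with the option enabled ...
print "option on for this part of the source"

' Restore whatever state was in effect before the push.
#pragma pop(msbitfields)

' ... code placed here is compiled with the previous setting again ...
print "previous setting restored"
```

Because the old state is restored by the pop, an include file can switch an option on for its own code without changing the behaviour of the file that includes it.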


>>attach the "UDT-is-a-Wstring" flag to the UDT's type-def
> Which would require some new syntax for specifying this flag. Why not just check for "can-cast-as-wstring", which would have the same effect without the need for any syntax change/addition.

I agree.  There should be no need to check for special names (it slows down the compiler).  Attaching "can-cast-as-wstring" flag to the UDT can help speed up the compiler if the check is made often.


>For a schedule i would propose these steps:
>1.) make sure, that, what we have now, really works FLAWLESSLY for Windows, LINUX and maybe other targets, which can deal with wide strings.

Where are your test files?  What source are you using to check that your changes work?

> 2.) integrate this into the next official release, maybe with #2 of your list, maybe without it. This is a timing thing: when will be the next release, and how much time will it need to get #2 tested and implemented?

"fixing UDT implicit casting for built in [w]string rtlib functions", which is what you've been working on?  The #pragma tells fbc to prefer casting to WSTRING type if it is a UDT, rather than STRING type which fbc does by default.  That's a pretty good solution, especially if you are building UNICODE only programs.

> 3.) solve #2 and #3 of your list.

The problems of #3 will be noticed with respect to strings when a UDT has both cast to wstring and string.  However, if we have fbc just prefer one or the other, there's no ambiguity for the built in run time functions.  Though the problem still exists.

> 4.) make USTRING a built-in type (#1 of your list). Personally i can live with fact of having to include an extra file for dynamic wide strings, so it is last in my personal priority and my schedule. What do you think?

Yes, later.


----
in https://github.com/jayrm/fbc/commits/udt-wstring
- I (re)added the pragma as push/pop
- made some changes to rtl-string.bas
- I didn't look much in to the other additions you are making, as I have no idea what the cases are that you are fixing.

For example, here is a test I created for the few additions that I made (sorry, macro abuse):

#include once "../WinFBX/Afx/CWStr.inc"

#pragma udt_wstring=true

type some_wide_string as CWStr

private function EscapeUnicodeToAscii _
	( _
		byref w as const wstring _
	) as string

	dim ret as string

	for i as integer = 0 to len(w)-1
		select case w[i]
		case 9
			ret &= "\t"
		case 32 to 127
			ret &= chr(w[i])
		case else
			ret &= "\u" & hex(w[i],sizeof(wstring)*2)
		end select
	next

	function = ret

end function

private sub wcheck overload _
	( _
		byref tag as const string, _
		byref expr as const string, _
		byref a as const wstring, _
		byref b as const wstring _
	)

	dim fail as boolean

	print tag & ", " & expr & ": ";

	if( len(a) <> len(b) ) then
		print !"Failed!: length does not match"
		fail = true
	elseif( a <> b ) then
		print !"Failed!: not matched"
		fail = true
	else
		print !"OK"
	end if

	if( fail ) then
		print !"\tA: """ & EscapeUnicodeToAscii( a ) & """"
		print !"\tB: """ & EscapeUnicodeToAscii( b ) & """"
		end 1
	end if
	print

end sub

private sub wcheck overload _
	( _
		byref tag as const string, _
		byref expr as const string, _
		byval a as const integer, _
		byval b as const integer _
	)

	print tag & ", " & expr & ": ";

	if( a = b ) then
		print !"OK"
	else
		print !"Failed"
	end if
	print

end sub

#macro t( expr )
	#define Xexpr(X) expr
	a2 = Xexpr(a1)
	b2 = Xexpr(b1)
	wcheck( "A=B", #expr, a2, b2 )
	wcheck( "f(A)=f(B)", #expr, Xexpr(a2), Xexpr(b2) )
	wcheck( "f(A)=f(**B)", #expr, Xexpr(a2), Xexpr(**b2) )
	#undef Xexpr
#endmacro

#macro t_int( expr )
	#define Xexpr(X) expr
	a_int = Xexpr(a1)
	b_int = Xexpr(b1)
	c_int = Xexpr(c1)
	wcheck( "f(A)=f(B)", #expr, a_int, b_int )
	wcheck( "f(A)=f(**B)", #expr, a_int, c_int )
	#undef Xexpr
#endmacro

private sub do_test _
	( _
		byref text as const wstring _
	)

	dim a1 as wstring * 40 = text
	dim b1 as some_wide_string = text
	dim c1 as wstring * 40 = **b1

	dim a2 as wstring * 40
	dim b2 as some_wide_string

	dim a_int as integer
	dim b_int as integer
	dim c_int as integer


	t( left( X, 3 ) )
	t( right( X, 3 ) )

	t( ltrim( X ) )
	t( rtrim( X ) )
	t( trim( X ) )

	t( ltrim( X, any !" \t" ) )
	t( rtrim( X, any !" \t" ) )
	t( trim( X, any !" \t" ) )

	t( mid( X, 2 ) )
	t( mid( X, 3 ) )
	t( mid( X, 2, 3 ) )
	t( mid( X, 4, 6 ) )

	t( lcase( X ) )
	t( ucase( X ) )

	t_int( asc( X ) )

	t_int( instr( X, !"\u304B" ) )
	t_int( instr( X, !"not-here" ) )
	t_int( instr( X, !"Hel" ) )
	t_int( instr( X, !"\u3055\u3093" ) )

	t_int( instrrev( X, !"\u304B" ) )
	t_int( instrrev( X, !"not-here" ) )
	t_int( instrrev( X, !"Hel" ) )
	t_int( instrrev( X, !"\u3055\u3093" ) )

end sub

do_test( !"Hello \u304A\u304B\u3042\u3055\u3093" )
do_test( !"  Hello \u304A\u304B\u3042\u3055\u3093  " )
do_test( !" \tHello \u304A\u304B\u3042\u3055\u3093\t  " )
do_test( !" \u3042 " )
do_test( !"\u3042" )



I wrote the test this way so I can inspect the results on any dumb ascii terminal, and can convert to fbc's test-suite later.  For anything you change or add, you should have a test in mind that can be automated.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 30, 2018, 03:38:56 PM
Jeff,

Quote
You are making it really hard on yourself, and on me trying to read your code.  Have a look at https://github.com/jayrm/fbc/commits/udt-wstring specifically in rtl-string.bas.  50 lines changed instead of 500.


Please have a look at post #145:

Quote
PS:
@Jeff,

    fixing UDT implicit casting for built in [w]string rtlib functions


I think i have a quite simple solution for this, just have a look at "Parser-Compound-Select.bas", line 119. We could make can-cast-to-(w)string functions out of it and insert tests, in all places i currently check for the UDT´s name. This would be a generic approach then.

you coded, what i basically proposed as a generic approach.


As for the test code: every time i had a new version of "ustring.inc", the code to test (test.bas) and/or the compiler, i added it as "ustring.zip" as an attachment at the bottom of my post. You find the last one in post #147 (at the bottom of it). As a registered member of this forum you can download it. I removed previous ones in previous posts in order not to waste José´s web space by outdated versions. My latest is attached right here.


JK

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 30, 2018, 05:03:44 PM
And Jeff,


i don´t want to make it hard on anyone! As i said before:

Quote
A compiler is a very complicated piece of software, and as long as i don´t understand in depth, what happens, and why exactly it happens, i´m extremely careful with changes and try to isolate my new code as much as possible. In other words i know that "goto" is often considered to be bad coding style. But the alternative would be to make major changes in code structure and thus potentially breaking things, i even don´t know of. I took care that skipping DIM statements is safe, in that the skipped variables are not needed anymore in places i´m jumping to.

How long have you been working on the compiler and how long have i been working on it? There is a huge difference in knowledge and experience between us on that matter. So how would you start to change an unknown piece of software consisting of almost 200 separate files? Very carefully, i think.


Don´t let yourself be fooled by the style in which i added my changes. I did it exactly this way, deliberately and on purpose. It makes debugging easier for me, because i change the original code and code flow as little as possible in order not to introduce bugs, which weren´t there before. When using a generic function, a change in one place affects code in many places. If i keep it separate (accepting the overhead of doing the same or at least similar in several places, as i did) i have a chance of testing each change separately, which otherwise could cross influence each other.

This doesn´t mean that i expect it to stay this way. The current version is a test version for me, where i test:
- is it possible at all (yes, i´m convinced, it is possible)
- do the changes work as expected (as far as i can tell - yes)
- do the changes break existing things (as far as i can tell - no)

I don´t insist on long '' --- * 70 comment breaks. I´m not particularly fond of my initials (JK), even if i sign my posts with them. I use my initials to mark the code i added, so i just have to search for "jk", which isn´t a very common character sequence in FB. And i use the heavy comment breaks to visually mark my changes. So i can easily decide what was already there and what is my addition/change. All of these "jk"-specific things should be removed for a release version.


Quote
as I have no idea what the cases are that you are fixing

I fixed every place where a wstring worked and an ustring didn´t without prepending "**". Currently i´m in a position (and this was my first goal), where i can say: it is possible and it works. It needs more testing by other people to verify this. I provided test code, maybe you would like to add some more tests, or maybe you will find yet another syntax element or a bug, which must be fixed.

I have several projects of my own with ustrings, which compile and work with the new compiler version(s) and i can compile Paul´s WinFBE with the new compiler, even if i remove all prepended "**". (the latest version of the compiler is in my fork, a compiled Windows version of the compiler + ustring.inc + test code is in the attachment at the end of my previous post)


Ustring.inc has been adapted to (hopefully) work for Linux too. Stw currently runs tests with it, because i don´t have Linux.


Let´s assume it works in Linux as well, then we have a proof of concept in that the compiler and ustring.inc work together flawlessly. As a next step, we could make it generic and replace code, which is all the same in various places. In fact it isn´t all the same in all places and i would have had a hard time debugging my changes, if i hadn´t kept it separate - maybe you wouldn´t, because you are much more familiar with the compiler than i am.


If you need some references about my coding skills, take a look at my IDE (https://jk-ide.jimdo.com/), which works for PowerBASIC and FreeBASIC. There might be some undetected quirks with FreeBASIC, because it is a recent adaption still, but it is the only IDE i know of for FreeBASIC, with a built-in stepping debugger for 32 and 64 bit apps. There are other built-in debugging features, which might come in handy.


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on December 30, 2018, 09:59:50 PM
Juergen, knowledge and experience of the compiler or coding skill has nothing to do with it.  I'm only looking at the code in front of me, and offering suggestions for next steps, as if someone wanted to merge this in to fbc's code base.  I don't know how else to offer suggestions.  More smilies? :) ;D

Factoring common logic in to a subroutine, making the code concise, is a good programming practice.  Eliminates duplicate code.  And a well named subroutine can make the caller's code easier to read for the programmer.  Also, well named variables can give the reader an idea of what the variable is used for without having to read through all the logic to figure it out.

The other "style" choices (like tabs vs spaces) are a historical artifact of the compiler's development.  Unnecessary changes create meaningless diffs that are a pain to look through when inspecting change sets.  I've seen other projects that have very rigid formatting rules, and the code is beautiful to read.  We are kind of lazy about it, so if it doesn't get addressed up front, it probably never will.

I got the ustring.inc file, I missed seeing that before, thanks.  Yes, you can not underestimate the value of a well written test-suite.  Have you looked at any of the tests in ./tests to see the format?  wstring tests are somewhat lacking.  ustring.inc is a decent start.  It's time consuming to write good tests that cover the range of use.

Anyway, consider using astMaybeConvertUdtToWstring(), I have a feeling it will simplify code in more places than you think.  That way you can debug/change common logic in one place.  (For example, check if UDT has both cast wstring/string and do something different if it does, etc, in future.) 

Also, careful in astNewCONV.  The tricky part of this procedure is that callers expect it to behave a certain way.  Changing astNewCONV won't break user code, instead you get bugs down the road where USTRING might silently succeed in places where it shouldn't.  Making changes to astNewCONV usually means inspecting every place it is called from.

I want to work on fbc 1.06 release.  I was hoping to wrap up varargs feature, but I have a feeling the internal compiler changes are too aggressive to merge in just before a release, and there are a few loose ends on the feature, so I might hold off on that just now to get working on the release.  I am guessing 2 months to get a release done.  Mostly because I haven't done it in like 10 years, and my tool chains for some targets are broken.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on December 30, 2018, 11:18:00 PM
Jeff,

Quote
Anyway, consider using astMaybeConvertUdtToWstring(), I have a feeling it will simplify code in more places than you think.  That way you can debug/change common logic in one place.  (For example, check if UDT has both cast wstring/string and do something different if it does, etc, in future.)

you are always one step ahead. I´m not reluctant to do it the way you propose, this will be the next step. But i hope you understand now, why i did it the way i did it. I must be sure not to spoil something and i must be able to debug it, until i can be sure that my changes don´t have unwanted side effects. This is because currently i understand about 10% of the code. The other 90% are still "unknown ground" i at best can make guesses at.


Did you have a look at "test.bas", which tries to test the applied changes? Should i adapt it to the format used in \tests?


I will try to comply with the coding rules you stated for FB in order to make merging as easy as possible. Don´t hesitate to correct me in this area and don´t hesitate to correct me, if i´m wrong with my coding.


Regarding astNewCONV: i changed the codeflow for conversions to Single and Double from Ustring and i changed "ldtype" to "FB_DATATYPE_WCHAR" whenever an ustring is passed. As far as i understand it, this doesn´t break anything, it enables ustrings to be processed like wstrings, which seems to work in my test.bas and other places - or did i miss something?


Happy New Year - i´ll probably be offline the next two days


JK
Title: Re: FreeBASIC CWstr
Post by: Paul Squires on December 31, 2018, 04:07:06 PM
Quote from: Jeff Marshall on December 30, 2018, 09:59:50 PM
The other "style" choices (like tabs vs spaces) are a historical artifact of the compiler's development.  Unnecessary changes create meaningless diffs that are a pain to look through when inspecting change sets.  I've seen other projects that have very rigid formatting rules, and the code is beautiful to read.  We are kind of lazy about it, so if it doesn't get addressed up front, it probably never will.

Hi Jeff, I love the idea of having a formatting code rulebook for the compiler (or FB source code in general). I need to adopt such a consistent style in my own coding as well, as I have flip flopped between styles over the years and have yet to find one that I particularly like. I find the compiler source code very nice because it makes liberal use of whitespace, large indents, and consistent spacing between variable names, unary operators, parentheses, etc.

Would you be cool if I start assembling a list of such formatting items that the compiler source code uses and post them here or in the FB forum for yourself and the other compiler pros to comment on and improve? It could possibly help with future contributors wanting to share source code.

Happy New Year to you! You're doing great work with FB and should be acknowledged for it.

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 01, 2019, 07:22:00 PM
Formatting...

while i´m willing to accept what Jeff stated as formatting rules, i prefer a slightly different format, which i have been using for many years.

My formatting rules are:

- Indentation by 2 spaces
- Procedures (or other logic entities) are separated by a separating line (*, =, whatever)
- A comment box after the procedure header explains what the procedure is for and possibly what goes in and what goes out
- All local variables are defined at the beginning of a procedure, each one getting a separate line
- Comments before code lines only for important things
- Comments at the end of a code line (column aligned) for a regular comment



personally i like to comment very much, so for me this is harder to read and to understand:

...
'***********************************************************************************************
' do this or that, return success = 1, fail = 0
'***********************************************************************************************
function do_it(s as string, p as long) as long
'loop variable
dim i as long   
'character
dim c as byte         


  'scan characters
  for i = 1 to len(s)     
    c = s[i]

    'do this
    if p > c then     
      return 1

    'do that
    elseif p < c then   
      return 1

    'otherwise do the following
    elseif p = c then   
      return 0
       
    end if
  next i
 
 
end function


'***********************************************************************************************
...


than this

...
'***********************************************************************************************


function do_it(s as string, p as long) as long
'***********************************************************************************************
' do this or that, return success = 1, fail = 0
' maybe more lines here, if needed ...
'***********************************************************************************************
dim i as long                                         'loop variable
dim c as byte                                         'character


  for i = 1 to len(s)                                 'scan characters
    c = s[i]

    if p > c then                                     'do this
      return 1

    elseif p < c then                                 'do that
      return 1

    elseif p = c then                                 'otherwise do the following
      return 0
       
    end if
  next i
 
 
end function


'***********************************************************************************************
...


Due to the structure you can easily detect the procedure header, the variables used in this procedure, where it starts, where it ends and what it does in general.

On the left side is the code without distracting interspersed comments, this way it is tighter, you need less lines and thus you can see more code lines on one screen.

On the right side neatly aligned are short comments explaining what the code does. So you have code and comments separated but still linked to each other.

I don´t use tabs, because i always found it irritating, when lines jump horizontally. One space for indentation is not enough for visually structuring nested code, two spaces do the job quite well. More spaces tend to stretch the code too much horizontally for heavily nested code, especially if you want to have comments in the same line. My IDE (ab)uses the tab key to move the caret to a fixed column and to align existing comments to this column, if possible.

I use meaningful names for procedures, but i tend to use short names for local variables like i,n,x,y,z for numeric types and a,b,c,d,s for string types. Whenever such a "meaningless" name is used, i add a comment to the definition line for clarity.

After having written and tested a procedure, i remove debugging code and add all kinds of comments in places, which proved to be difficult to code. e.g why i did it this way and why that way doesn´t work, possible side effects and the like. This seems to be much of an effort, but in the long run, at least for me, it has proved to be of enormous help.


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 01, 2019, 07:39:49 PM
@Paul,


obviously you are following this thread and you could be of great help, because your WinFBE is a large project implementing José´s WinFBX.

Currently i´m able to compile your sources (all "**" removed) and as far as i can tell, it runs as expected. Because it is your project, you must have better means of testing than i have. Testing only one´s own code is by no means sufficient for detecting all bugs, as we both definitely know. Would you help me testing the compiler changes with your project ? There soon will be a new version with generic routines as Jeff requested and i would appreciate someone as experienced as you running tests with it.


JK
Title: Re: FreeBASIC CWstr
Post by: Paul Squires on January 02, 2019, 05:39:17 AM
Quote from: Juergen Kuehlwein on January 01, 2019, 07:39:49 PM
@Paul,


obviously you are following this thread and you could be of great help, because your WinFBE is a large project implementing José´s WinFBX.

Currently i´m able to compile your sources (all "**" removed) and as far as i can tell, it runs as expected. Because it is your project, you must have better means of testing than i have. Testing only one´s own code is by no means sufficient for detecting all bugs, as we both definitely know. Would you help me testing the compiler changes with your project ? There soon will be a new version with generic routines as Jeff requested and i would appreciate someone as experienced as you running tests with it.

JK
Hi JK,

Yes, I am following this thread because I have a huge investment in wanting a dynamic unicode string data type to succeed in FreeBASIC. Obviously, my preference would be for such a native data type to be built into the compiler but, as Jeff Marshall has intimated, that is a huge undertaking.

I am using Jose's CWSTR class for all my unicode string needs at the moment and I am happy with the implementation. I am hesitant to change.

I can not use WinFBE as a testing bed for your proposed changes. As you can appreciate, the WinFBE code base is very large and is constantly changing as it is an unfinished product. I can not introduce more risk into the code base at this time as any potential encountered problems would slow WinFBE development to a crawl. I can not spend time chasing unicode errors when I have so much other work to do on the editor and visual designer. I will try to help test later once WinFBE is further developed and I can fork a version just to be used for unicode testing.

As an aside, I have started a coding style guide (written using GitHub markdown) and am fleshing out the various sections based on guides I have read for VB, C, C++, .NET, and others. Needless to say, there are widely differing opinions on some topics but others are very uniformly adopted. I will use the guide in my own programming for a while first and once it is fleshed out a bit I'll post it for open criticism. I have no illusions that such a document would ever become a standard guide for code styling for FB.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 02, 2019, 12:27:52 PM
Hi Paul,


thanks for your reply. I wasn´t actually asking you to change anything! Ustring is a fallback for Linux (maybe others) and for those, who for whatever reason don´t want to use José´s WinFBX in Windows. So please, stick to WinFBX! The new compiler version allows for skipping "**" in general (even with WinFBX), but of course you can stick to it too, if you want.

Quote
...and I can fork a version just to be used for unicode testing.

This is more what i´m asking you to do. You don´t need to manage a public fork, just for yourself take what you have, remove all "**" (for the new syntax) and use the new compiler version for compiling - just for a test. You know your code better than i do. So you know the critical sections, where it is most likely to fail, if unicode doesn´t work properly - you know better how to test your work than i do.

In other words:

Your part would be to look, if there is a different behavior between versions of your code compiled with the existing compiler version and the new compiler version using the new syntax (no "**") - not all the time, but e.g when you finished a new version. In such a case you would tell me: it works as expected, or: the new compiler raises a problem with...

My part would be to supply the new compiler versions (see attachment) and then possibly to find out, what exactly is going wrong and why.


Attached is a new compiler version, which should meet Jeff´s wishes about a more generic approach, the code in the repo hasn´t been cleaned yet, i will do that, when i´m sure that everything works.


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 04, 2019, 07:46:57 PM
My latest "ustring.inc" + "test.bas" seem to work in Linux too (attached)


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 05, 2019, 12:27:36 AM
Jeff,

i pushed cleaned code to my fork (https://github.com/jklwn/fbc), "ustring" is still used as a marker, until we can be absolutely sure, that there are no bugs anymore. All my tests show, that there are no more bugs in Windows and Linux - please re-check this.

There is a new repository for "ustring.inc" + "test.bas" here (https://github.com/jklwn/ustring), which contains the latest version of both files and some other files you can ignore.


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 05, 2019, 11:51:02 PM
Quote from: Paul Squires on December 31, 2018, 04:07:06 PM
I love the idea of having a formatting code rulebook for the compiler (or FB source code in general).
>Would you be cool if I start assembling a list of such formatting items...?

Sure, go for it, though I think it will be difficult to get everyone to agree.  For the compiler, the main items for me are:
- TAB character for indent
- lines less than 70 or 80 characters if possible
- comments start with double apostrophe '', indented to same level of scope
- comments on their own line preceding the executable statements
- there's different "rules" for rtlib/gfxlib2 source because it's in C
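
As a minimal sketch, a small procedure written to the compiler rules listed above could look like this. FindFirstSpace is a hypothetical example written for this illustration, not code from the fbc tree.

```freebasic
'' return the 1-based position of the first space in s, or 0 if none
function FindFirstSpace _
	( _
		byref s as const string _
	) as integer

	for i as integer = 0 to len( s ) - 1
		'' string indexing with [] is zero-based
		if( s[i] = asc( " " ) ) then
			return i + 1
		end if
	next

	return 0

end function

print FindFirstSpace( "hello world" )  '' 6
```

Indentation is one TAB per level, every comment starts with '' at the indent level of the statement it precedes, and no line comes close to the 70 to 80 character limit.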

There's probably many habits I have, that I don't even think about.  Mostly I follow the "style" of what's already in the code base.  Sometimes when I go back and look at old code, I can't tell if I wrote it, or v1ctor wrote it, or dkl wrote it, because we all basically follow what's already there.  I think I could probably write a short story on how I format code and why, though I'm not sure it would matter to anyone but me.  Maybe if you ask some specific questions, I could answer with an opinion.

The important goal is consistency.  When reading through 1000's of lines of code, it doesn't matter too much what the style is (everyone will have their own preference).  It matters more that it is all roughly the same style, making it easier on the eyes with few disruptions/distractions.  When I was rewriting the test-suite, I thought of creating a simple code formatting program to apply a few basic rules to sanitize the code (mostly whitespace related) before committing.  I think in the end I used a sed script.  dkl was a little irritated at all the white-space changes.  In future, I would do the white-space changing commits separate from the content changing commits.
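
One such whitespace rule, sketched in FreeBASIC under the assumption that stripping trailing spaces and tabs is among the basic rules meant. StripTrailingWhitespace is a hypothetical helper for illustration; it is neither the sed script mentioned nor an existing fbc tool.

```freebasic
'' Hypothetical helper: remove trailing spaces and tabs from one source
'' line, leaving the leading TAB indent untouched.
function StripTrailingWhitespace _
	( _
		byref text as const string _
	) as string

	return rtrim( text, any !" \t" )

end function

'' the brackets make the result visible on a terminal
print "[" & StripTrailingWhitespace( !"\tprint x  \t" ) & "]"
```

Running a pass like this in its own commit keeps whitespace-only changes out of the content diffs.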
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 06, 2019, 12:08:30 AM
Quote from: Juergen Kuehlwein on January 01, 2019, 07:22:00 PM
Formatting...
> My formatting rules are:
> ...

Yeah, if I look back at my code from 2005 or earlier, I had a similar style, mostly due to habits from using QB/VB editors and small display screens in the 1990's.  When I started on the FreeBASIC project I changed my style to match.  Some habits I kept, and so new code I write, even if it is just for me, has a different style than what I would have written 20 years ago.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 06, 2019, 12:46:42 AM
Quote from: Juergen Kuehlwein on December 30, 2018, 11:18:00 PM
Did you have a look at "test.bas", which tries to test the applied changes? Should i adapt it to the format used in \tests?

Yes.  And yes, eventually.  Any place you change in fbc code needs a test.  I noticed your reference implementation in ustring.inc + parser-procall-args.bas adds "TALLY", "PARSECOUNT", etc, features.  This is beyond what I'm familiar with.
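
For readers who have not met it, TALLY comes from PowerBASIC and counts how often a match string occurs inside another string. A minimal sketch of that idea follows, assuming non-overlapping matches. TallySketch is a hypothetical name, and this is not the implementation in ustring.inc.

```freebasic
'' Hypothetical sketch of a TALLY-like count (not the ustring.inc code):
'' number of non-overlapping occurrences of match in text.
function TallySketch _
	( _
		byref text as const wstring, _
		byref match as const wstring _
	) as integer

	dim count as integer = 0
	dim position as integer = 1

	'' an empty match would never advance, so treat it as "not found"
	if( len( match ) = 0 ) then
		return 0
	end if

	do
		position = instr( position, text, match )
		if( position = 0 ) then
			exit do
		end if
		count += 1
		'' continue searching after the match just found
		position += len( match )
	loop

	return count

end function

print TallySketch( "a,b,,c", "," )  '' 3
```

A test for a built-in version would compare results like these between WSTRING and USTRING arguments, in the same way as the other string functions.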

> Regarding astNewCONV: i changed the codeflow for conversions to Single and Double from Ustring and i changed "ldtype" to "FB_DATATYPE_WCHAR" whenever an ustring is passed. As far as i understand it, this doesn´t break anything, it enables ustrings to be processed like wstrings, which seems to work in my test.bas and other places - or did i miss something?

I started to investigate this.  It's difficult, but not impossible, to work with your branch.   I will comment more in another post.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 06, 2019, 02:35:40 AM
Quote from: Juergen Kuehlwein on January 05, 2019, 12:27:36 AM
i pushed cleaned code to my fork (https://github.com/jklwn/fbc), "ustring" is still used as a marker, until we can be absolutely sure, that there are no bugs anymore. All my tests show, that there are no more bugs in Windows and Linux - please re-check this.

There is a new repository for "ustring.inc" + "test.bas" here (https://github.com/jklwn/ustring), which contains the latest version of both files and some other files you can ignore.
Ok, I started to look at your previous branch from last week.  I have not gone in depth to your latest branch; just saw it a couple hours ago.

I know I am being picky (specific, pedantic, critical) about your branch.  So, maybe if I provide some context, you will understand why.

1) For perspective, here is what my local repository looks like: jayrm-fbc-graph-20190105.png (http://www.execulink.com/~coder/tmp/jayrm-fbc-graph-20190105.png)
- To switch between branches (git checkout), I don't expect to have to do much.
- My focus currently is 1.06.0 branch because I am creating releases.
- jklwn/JK-USTRING is your most recent push to your public repo
- jk-ustring is my local branch, with edits that clean up all the meaningless differences, get rid of the "#compile" directives that are specific to your IDE, etc.  As of your previous jklwn/JK-USTRING branch, the actual number of files changed is about 10 files.

2) When I compare origin/master to jklwn/JK-USTRING, here's what I see: jayrm-jklwn-diff-20190105.png (http://www.execulink.com/~coder/tmp/jayrm-jklwn-diff-20190105.png)
- It shows me differences that I don't care about.  Many differences are meaningless and are the result of the way you are working with fbc/master and git checkout
- you really need to set your git config core.autocrlf = true.  Then all files that are checked out from git will be converted to CRLF line endings and you can get rid of the '.gitattributes' file
- eventually, before you create a pull request to fbc/master, you should do a git rebase so that your end result is a concise number of commits.  No fault to Marc Pons, because it's only in hindsight, that's what should have been done with the __FB_GUI__ pull request.  But it never happened.  I'm trying to learn from past experience.

3) And when I look at specific code: jayrm-jklwn-file-diff-20190105.png (http://www.execulink.com/~coder/tmp/jayrm-jklwn-file-diff-20190105.png), these changes are now meaningless.  As you spend more time in the code base, these kinds of comments are not needed since they should be obvious just from reading the code.

4) Copyright on ustring.inc
Quote from: ustring.inc
' ****************************************************************************************
' This code is copied and adapted from WinFBX with explicit permission of José Roca
' under the condition that the original copyright applies (see below).
' All changes and additions are Copyright (c) 2018 Juergen Kuehlwein
' Freeware. Use at your own risk.
' THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
' EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF
' MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
' ****************************************************************************************

' ########################################################################################
' Microsoft Windows
' Implements a dynamic data type for null terminated unicode strings.
' Compiler: Free Basic 32 & 64 bit
' Copyright (c) 2016 Paul Squires & José Roca, with the collaboration of Marc Pons.
' Freeware. Use at your own risk.
' THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
' EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF
' MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
' ########################################################################################
- What is the intent here for licensing its use?  Included as part of FreeBASIC releases?
- As FreeBASIC developer, what am I allowed to do?  Can I distribute?  You are stating that if I modify it, you still have copyright on it.  I don't think that is compatible with FreeBASIC's other licenses.
- This needs to be addressed.  For ustring.bi reference implementation, the licensing terms can not be any less restrictive than current rtlib/gfxlib2 licensing.  Of course, you can create a full featured, any extensions you like version with any copyright/licensing you choose otherwise.  But, for anything that is packaged with FreeBASIC, the developer team needs control over licensing.
- Also, if it's to be included as part of FreeBASIC release, to be used by users, then we should look at a separate .bi interface + .bas implementation; not all users #include sources from one main file.

So, ustring.inc or some variation of it, eventually needs to get added as reference implementation.  As a reference implementation, it does not need to be exactly like DWSTRING, CWSTR, etc.  It's only there to test the new feature of UDT => WSTRING implicit casting.

For me, to test your branch, I must edit several files in your branch, create a new branch, and create the automatic tests that I can run on each system.  For the current time, I can give you some guidance only.  Otherwise, just now, it just takes too much time to switch branches and work with your code.

I hope I explain in a way that is not too discouraging.
Title: Re: FreeBASIC Code Styling
Post by: Paul Squires on January 06, 2019, 04:42:33 PM
Quote from: Jeff Marshall on January 05, 2019, 11:51:02 PM
Sure, go for it, though I think it will be difficult to get everyone to agree.  For the compiler, the main items for me are:
- TAB character for indent
- lines less than 70 or 80 characters if possible
- comments start with double apostrophe '', indented to same level of scope
- comments on their own line preceding the executable statements
- there's different "rules" for rtlib/gfxlib2 source because it's in C

There's probably many habits I have, that I don't even think about.  Mostly I follow the "style" of what's already in the code base.  Sometimes when I go back and look at old code, I can't tell if I wrote it, or v1ctor wrote, or dkl wrote it, because we all basically follow what's already there.  I think I could probably write a short story on how I format code and why, though I'm not sure it would matter to anyone but me.  Maybe if you ask some specific questions, I could answer with an opinion.

The important goal is consistency.  When reading through 1000's of lines of code, it doesn't matter too much what the style is (everyone will have their own preference).  It matters more that it is all roughly the same style, making it easier on the eyes with few disruptions/distractions.  When I was rewriting the test-suite, I thought of creating a simple code formatting program, just to apply a few basic rules just to sanitize the code (mostly whitespace related) before committing.  I think in the end I used a sed script.  dkl was a little irritated at all the white-space changes.  In future, I would do the white-space changing commits separate from the content changing commits.

Thanks Jeff, I started such a document because I've spent the past week refactoring my source code. It is amazing the number of nuances that you encounter when trying to stylize code. You are right about consistency though... style is very subjective, but if you deviate from that style then it is confusing not only to you as the programmer, but to the reader as well.

My document has evolved past a simple style guide for the fb compiler source code. It is more now like a discussion on various style topics and in particular how I am formatting my code. Kind of like a self documentation exercise that may prove useful to the greater FB community at some point if others wish to have a starting point for their code formatting "rules" :)   One thing is for sure, such a document would surely stir some interesting debate.

Here are some of the topics so far:
- Disk File Structure and Layout (src, bin, doc, inc, lib, doc, tests, etc)
- Filenames
- Indentation (Tabs vs. Spaces)
- Blank lines
- Whitespace
- Line length
- Comments
-- Header boilerplates (licenses)
-- File header comments
-- Function description comments
-- Multiline comments
-- Single line comments
-- End of line comments
- Keyword and variable casing
- Types, Classes, Enums (naming and format)
- Subs/Functions and use of pedantic ByRef, ByVal, etc
- Variable declarations (one or more per line? initialization on same line? Grouping of similar definitions? Top of file or next to use area?)
- Variable names (upper, lower, camel, pascal, underscores, hungarian, etc)
- Spacing (whitespace amongst keywords, parenthesis, unary operators)
- Line breaking (multiple lines via ":" operator, and "_" underscore line continuations)
- Long vs Integer (issues switching between 32/64 systems and Windows API)
- Private vs Public Functions
- Modules vs Includes (linking multiple single object files vs #include source files into main file)
- Sub vs Function (make everything Function for consistency and ease of future changing of code use?)
- Formatting of (If ElseIf Then, Select Case, Do Loop, For Next)



Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 06, 2019, 05:52:05 PM
Jeff,

Quote: in a way that is not too discouraging
no - not at all.

the latest commit is the first one you should work with, because i removed a lot of unnecessary things and i hope there won´t be many changes any more. See below for more...

Quote: This is beyond what I'm familiar with

I added functions for string manipulation, which proved to be useful in PowerBASIC. It´s mostly clones of functions José included in his WinFBX but with a more PB-like syntax. In order to have a consistent syntax for the "ANY" keyword and in order to enable the function "Pathname_" to work just like in PB, i added code in parser-procall-args.bas, which makes this possible. These functions are not a necessary part of ustring.inc and will be placed in a separate file later, but for testing it´s easier to have it in one single file - at least i thought so.

Quote: and create the automatic tests that I can run on each system

no - you just have to run test.bas, which is a collection of tests trying to address all possible aspects of implementing ustrings and a collection of tests for the new string functions which simultaneously test ustrings themselves too.

Quote: What is the intent here for licensing its use?  Included as part of FreeBASIC releases?

Basically i did not want to be held liable for any damage, which might arise from using this code, and i wanted to prevent anyone else from making money out of it, because it´s free. Otherwise you can do with it, whatever you want. In other words: José, Paul, Marc and i don´t want anyone else to be able to take ownership and sell this code for money in any way. It is acceptable though to create a commercial application, which implements this or derived code as a part of its source code.

If this code is to become a part of the FreeBASIC distribution, there will be no problem on my side (and i think José, Paul and Marc agree too) changing it to whatever is needed for FreeBASIC. It will be "ustring.bi" then and it will contain only what´s necessary for the type to work, everything else, will be in one or more (i´m working on additional array processing functions) separate .bi files, if the developers accept it.

What we currently have, is still a version for testing and not for a release!


So what can we do about my branch in order to make switching branches less work for you ?

I´m new to GIT, actually i just recently wrote a GIT integration for my IDE making use of TortoiseGit, but honestly i don´t understand, what each and every change of setting or implementation of Git-commands does to the code or to the remote repository. So far i managed to work with it, but i´m definitely not an expert. I did set git config core.autocrlf = true!
 
In case there still are bugs, which need to be fixed, i need some "markers" in the code, which allow for fast finding all places, where i added or changed it. I hate doing things twice (digging through 170 code files), therefore i usually take precautions not to have to do it twice - i add a specific comment.

I could remove all the end of line comments and add something like "(ustring)" to the preceding comment or add "ustring" as a new comment. Would this be acceptable?

Is there a way to transfer your cleaned code then to my fork, or maybe create a new fork, i could use in the future? The one i´m currently working with does contain these differences (spaces vs. tabs, LF/CRLF) and i don´t know how to get rid of all of them in order to make you happy.

I will try to rewrite test.bas so that it can be added to \test. Is there a tutorial other than the attached readme?


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 07, 2019, 12:05:01 AM
Jeff,

thinking about it, i could create a new fork (hopefully avoiding the previous errors) and re-apply my changes. Then i would discard the current one and you could use the new (cleaner) one. Does this sound more acceptable?

Furthermore i would add a cleaned "ustring.bi" (and maybe others) to the \inc folder and i would add an "ustring" folder to the \tests folder containing tests for ustrings + related additions.

Where should i put adjacent documentation, and which format would you prefer?


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 11, 2019, 10:20:18 PM
Quote from: Juergen Kuehlwein on January 07, 2019, 12:05:01 AM
i could create a new fork (hopefully avoiding the previous errors) and re-apply my changes. Then i would discard the current one and you could use the new (cleaner) one. Does this sound more acceptable?
I think that's a good approach.  I often do this myself.  I will work in a branch for a while as a work in progress (WIP).  And then create a new branch to reapply the changes from the WIP.  Doing this will clean up all the temporary commits where the feature is being revised, and the end result is a commit history that is much easier for the developers (including yourself) to follow; making it all the more likely that the pull request will be accepted without too many more revisions.

> Furthermore i would add a cleaned "ustring.bi" (and maybe others) to the \inc folder and i would add an "ustring" folder to the \tests folder containing tests for ustrings + related additions.

This sounds correct to me.  ./inc/ustring.bi implements the new feature backed by changes in ./src/compiler, and ./tests/ustring for test modules. 
How many new ".bi" files?  Maybe in "./inc/ustring/*.bi" then?

> Where should i put adjacent documentation, and which format would you prefer?

Well, initially, it could just be a ./tests/ustring/ustring.txt document file, but as a permanent feature of fbc, then it should get added to the wiki.  This feature is a little different than anything we've done before, so I would probably just start off with a single page linked from wiki's DocToc and then go from there.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 16, 2019, 04:34:48 PM
Ok Jeff,


please check the "ustring" branch at my fork (https://github.com/jklwn/fbc).

I hope this is better now. All unnecessary code changes were removed, \tests now contains an \ustring folder for ustring tests and dirlist.mk an "ustring \" line. All tests run successfully, 32 and 64 bit. I updated "readme.txt" in \tests a bit. A short documentation is in "ustring.txt" and "ustring.bi" and "stringex.bi" were added to \inc.

I´m still struggling a bit with GIT: i had to set core.autocrlf = false in order to prevent GIT from doing unwanted things to .txt files. Setting filemode = false didn´t prevent GIT from staging all (so far untouched by me) .sh files, so i added "*.sh" to .git\info\exclude, which hopefully fixes this. And i had to change the line ends of several files (e.g. emit.bi) from LF to CRLF to make them usable for me. Tell me, if there still is something "wrong" (and maybe how to make it better).


JK

 
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 20, 2019, 11:50:23 AM
Jeff,


there is a new commit at my fork (https://github.com/jklwn/fbc). I added some generic array processing macros for USTRINGs and all other array types (including tests in \tests\array, and a short documentation in array.txt).

In the meantime i realized that adding "*.sh" to .git\info\exclude doesn´t actually exclude these files, it deletes them, which is not what i want either. So i added all *.sh again with this commit. But there may be chmod changes and i don´t know how to keep these files without changing chmod (which is a Windows problem, because Windows doesn´t support an executable flag).

Please check and tell me, if you still see problems with my branch.


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 20, 2019, 01:31:40 PM
Jeff,


i think i finally found a method of leaving *.sh files just as they are. I must remove them manually one by one from the list of files to commit - quite cumbersome, but seems to work. If you know of a more convenient method, please let me know!

I´m going to make a pull request now.


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 20, 2019, 05:26:07 PM
Yes, it looks like it only has to do with line endings.

One part of the problem is how fbc project is configured compared to how your system is configured.  And maybe, you are using a .gitattributes file locally to try and manage this CRLF/LF problem.  Not sure.  Regardless, it's the line endings that are causing the grief.

The other part, is that last summer, I applied a patch from SARG and still got mixed line endings stored to the repo.  Some of the changed files you are seeing are all the files that got merged from pull request 96.  https://github.com/freebasic/fbc/pull/96 (like emit.bi edbg_stab.bas).  Where we are just now seeing the LF -> CRLF conversion.

So,

We do want every file we get from freebasic/fbc to have CRLF line endings on windows, and LF on linux.

From fbc wiki https://www.freebasic.net/wiki/wikka.php?wakka=DevGettingTheSourceCode
: The recommended setting for core.autocrlf is true, so that the FB source code in the working tree will have CRLF line endings.

Locally, for a single repo, it looks like the autocrlf setting can be stored in the .git/config file (git config --local).  Need to make sure that works like I think it should.

From this site: https://ywjheart.wordpress.com/2017/03/22/autocrlf/
: use core.autocrlf = true if you plan to use this project under Unix as well

Also, if the setting autocrlf has been changed, or in .gitattributes, then the changes may not get seen right away
Info in https://help.github.com/articles/dealing-with-line-endings/

heh, I'm trying.  Also, the development tree is shared by many people, so I keep most scripts, batch files, IDE files, tools, etc, outside of the fbc directory.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 20, 2019, 07:35:46 PM
Well, you can set autocrlf systemwide, global and local. All of these settings end up in a config file in the corresponding location. As i understand it setting autocrlf to true doesn´t affect .bas/.bi files, it´s for .txt files (at least for me it doesn´t convert .bas/.bi files at all - i tried it). Setting it to false removes all .txt files from the list of files GIT reports to have been modified. So i don´t have to deal with them anymore.

My IDE cannot work with LF line endings (which btw isn´t Windows standard), so i must convert them to CRLF. I tried .gitattributes, but you told me not to use it. Now i wrote a little utility doing that for me (all files in src\compiler and all new files i added). Copying these files (with CRLF line endings) to Linux (Ubuntu64 on a virtual machine), didn´t cause any problems there, neither did the test files.

Which files must have LF line endings ? Most files in src\compiler already have CRLF line endings anyway, it´s only a few, which i must convert. If i had a list of them, i could write an (expandable) utility for converting them from CRLF to LF (and vice versa) as needed. A simple .txt file for listing directories or single files could supply input and the utility would convert these files for Windows or Linux. What do you think ?
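
The core of such a conversion utility can be sketched in a few lines of C. This is only an illustration of the idea described above, not the utility that was actually written: the function names lf_to_crlf and crlf_to_lf are hypothetical, and reading the list of directories/files and the file I/O itself are left out.

```c
#include <stddef.h>

/* Hypothetical helper: copy src (len bytes) to dst, writing every line
   ending as CRLF. Existing CRLF pairs are kept as they are, so running it
   twice gives the same result. dst must have room for 2 * len bytes.
   Returns the number of bytes written. */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            dst[out++] = '\r';   /* bare LF: insert the missing CR */
        dst[out++] = src[i];
    }
    return out;
}

/* Hypothetical helper: the reverse direction, dropping the CR of each
   CRLF pair. dst must have room for len bytes.
   Returns the number of bytes written. */
size_t crlf_to_lf(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue;            /* skip a CR that is followed by LF */
        dst[out++] = src[i];
    }
    return out;
}
```

Both functions work on a buffer already read into memory; a file that mixes LF and CRLF comes out with one consistent ending in either direction.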


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 20, 2019, 09:18:42 PM
core.autocrlf affects all TEXT files.  Which could be ANY filename.ext, that does not have binary content.  With correct autocrlf setting, and no extra .gitignore or .gitattributes, I don't think you should have to write scripts to deal with line endings.

Easiest way to see the difference is do a new clone & checkout, forcing the autocrlf setting for comparison.

Clone & checkout into fb.0 with whatever line ending was stored in the repo (should be all LF):
git -c core.autocrlf=false  clone https://github.com/freebasic/fbc fb.0

Clone & checkout into fb.1, converting all TEXT files to CRLF line endings:
git -c core.autocrlf=true clone https://github.com/freebasic/fbc fb.1

autocrlf setting is also important when making commits back to fbc repo.
When committing from windows, autocrlf=true (CRLF's get converted back to LF's)
When committing from linux, autocrlf=false (line endings get stored as-is which should already have been LF)

The only files that must be LF are
./manual/cache/*.wakka (raw data files from wiki)
./tests/file/big-list-eol-lf.txt (a specific test that needs LF on all targets)
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 20, 2019, 09:58:20 PM
Quote: Easiest way to see the difference is do a new clone & checkout, forcing the autocrlf setting for comparison.
that´s exactly what i did! Regardless of the autocrlf setting, i always have some files with LF line endings after cloning in src\compiler - i spent days trying to get GIT to do what i want (CRLF as line endings right after cloning), it just didn´t work. Finally i gave up on it and it took me ten minutes to write my own code fixing the issue.

Quote: The only files that must be LF are
./manual/cache/*.wakka (raw data files from wiki)
./tests/file/big-list-eol-lf.txt (a specific test that needs LF on all targets)
I didn´t touch any of these files, nor were any of them part of a commit, so there should be no problem.

What about just testing and evaluating my changes and additions, while i on my part work on a solution to preserve line endings just as they are in the master branch? Or why not have line endings for the source code consistent - either all LF or all CRLF, instead of having some with LF and having others with CRLF?

Or add a file e.g "rules.txt" specifying rules like naming conventions, required line endings, exceptions from the rule and the like for the root folders of the repo. This would make things clear once and for good.


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 20, 2019, 11:11:50 PM
Re-reading the info in https://help.github.com/articles/dealing-with-line-endings/, what about a .gitattributes file in the master branch? Properly configured this would solve the problems we have for good too, because .gitattributes did work for me - but you advised me not to use it. I understand now that my implementation was improper for general use, even if it worked for me.

But it should be possible to set up a generic .gitattributes file. Because of the fact that it overrides other GIT settings, it would make clones independent of individual user settings in this respect and ensure proper line endings everywhere.

Please, think about it.


JK

 
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 21, 2019, 10:44:20 PM
Jeff,

i ran further tests showing that .gitattributes could really be the solution. As already mentioned above setting autocrlf to true doesn´t convert all .bas/.bi files properly to crlf - emit.bi and others keep failing!

Adding a .gitattributes at root level like this

# Auto detect text files and perform LF normalization
* text=auto
# BASIC source files (need cr/lf line endings)
*.bas text
*.bi text
*.inc text
*.h text
*.bat text
*.txt text
*.jkp text
*.rc text

*.gif binary
*.jpg binary
*.png binary

./manual/cache/*.wakka eol=lf
./tests/file/big-list-eol-lf.txt eol=lf


does the trick. Astonishingly enough all other problems i had with .txt and .sh files seem to be gone as well.

You may have to adapt and refine what i posted here, but in my opinion this is the way to go in order to avoid the mess we currently have. It does what i already proposed above: it allows for setting rules for line endings per file type, and it is independent of user settings because it overrides them. That is, it forces consistent line endings for the corresponding OS. Contrary to autocrlf it works - and you can control it.


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on January 24, 2019, 11:53:46 PM
Hey JK, yes, I've seen all your updates.  It's only been a few days, dude!  Please allow me some time to review and respond.  I really want to get the 1.06 release out first.  Then we can be more liberal about what gets merged in to fbc/master.  Just after a release is a good time to add new features.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on January 25, 2019, 03:17:16 PM
Quote: I really want to get the 1.06 release out first

Yes, do that! (and then let´s go on ... :-))


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 07, 2019, 11:41:32 AM
Hi José,


after a little break here is an update of the current situation:

After the 1.06 FreeBASIC release Jeff basically agreed to merge in my pull request. There might be changes to some additions (extra string and array handling functions) but he definitely supports the integration of USTRING into the compiler´s code.

That is, we can finally get rid of the need to prepend "**" for variables and expressions at all. All of these changes work with my implementation (CWSTR stripped down to the bare minimum) as well as with your original CWSTR. Existing code doesn´t have to be changed, you can still use "**", but you don´t need to anymore.


My code differs in two minor points from yours (CWSTR):
- i had to move "m_pBuffer" in first place of the member variables, this makes an improved "STRPTR" possible


TYPE DWSTR
  m_pBuffer AS UBYTE PTR                              'pointer to the buffer (in first place -> make strptr possible)

  Private:
    m_Capacity AS Ulong                               'the total size of the buffer
    m_GrowSize AS LONG = 260 * SizeOf(WSTRING)        'how much to grow the buffer by when required

  Public:
    m_BufferLen AS ulong                              'length in bytes of the current string in the buffer


STRPTR now returns a WSTRING PTR for Wstrings and Ustrings, which makes much more sense than a previously returned ZSTRING PTR. This change doesn´t break existing code, because you couldn´t use a ZSTRING PTR for processing a Wstring, you had to cast it to a WSTRING PTR or ANY PTR anyway.


- i had to remove the "CONST" qualifier in one place, this was necessary for some string handling functions to compile


    DECLARE OPERATOR CAST () BYREF AS WSTRING
    DECLARE OPERATOR CAST () AS ANY PTR



With these two small changes (which shouldn´t break anything) your CWSTR will be fully compatible to my compiler changes. Basically my USTRING implementation is written in a way, that (if present) WINFBX is the preferred way for adding dynamic wide strings. My code is only a fallback, if WINFBX is not used or cannot be used (e.g. LINUX).

Do you see any problems applying these changes to future releases of WINFBX?


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on April 08, 2019, 03:08:05 AM
No problem at all.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on April 28, 2019, 04:29:17 PM
Hi Juergen, I have been reviewing your code last few weekends.

Your original code rebased on to current master I pushed here: https://github.com/jayrm/fbc/tree/jklwn-ustring
Overall, it's a big change to the compiler, though looks like you have made changes in the right places.  However, the logic in some places ignores cases that might be present.

I've been working on adding the changes step-by-step here: https://github.com/jayrm/fbc/tree/udt-wstring

I'm using this new syntax to indicate that a UDT should behave like a wstring (or a zstring):

type T extends wstring '' or zstring
  '' ...
end type


For testing, I am using a minimal implementation.  As far as the compiler changes go, I'm not really interested in how well the UDT works as a dynamic wstring, just that it works as wstring:

type UWSTRING_FIXED extends wstring
  private:
    _data as wstring * 256
  public:
    declare constructor()
    declare constructor( byval rhs as const wstring const ptr )
    declare constructor( byval rhs as const zstring const ptr )
    declare operator cast() byref as wstring
    declare const function length() as integer
end type

constructor UWSTRING_FIXED()
  _data = ""
end constructor

constructor UWSTRING_FIXED( byval s as const wstring const ptr )
  _data = *s
end constructor

constructor UWSTRING_FIXED( byval s as const zstring const ptr )
  _data = wstr( *s )
end constructor

operator UWSTRING_FIXED.Cast() byref as wstring
  operator = *cast(wstring ptr, @_data)
end operator

const function UWSTRING_FIXED.length() as integer
  function = len( _data )
end function

operator Len( byref s as const UWSTRING_FIXED ) as integer
  return s.Length()
end operator


I'm going step-by-step to determine what changes are needed, why they are needed, and fully test.  For example, the test-suite for my branch is not passing on Travis-CI due to memory leaks I didn't notice (double free).  That could be due to my changes, or due to missing overloads in the minimal implementation.

Also I have been investigating some of the long standing WSTRING bugs posted on sf.net while working on review of your code.

A requirement that the m_pBuffer element be first element should not be required.  Instead, fbc should be calling an operator overload to get the data's address if the user has defined one.  Or maybe it needs to be a requirement of the UDT that the user writes.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on April 28, 2019, 07:56:35 PM
Yeah, we are basically assuming that the existing wstring implementation is good, but I keep running in to annoying bugs.  Like this latest one, I found:

dim w as wstring * 50 = " "
dim r as wstring * 50 = trim(w)

Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 28, 2019, 11:11:43 PM
Jeff,


"EXTENDS WSTRING" in place of "#PRAGMA ..." is fine! But why not use the type in "ustring.bi" for testing? It is thoroughly tested, contains all necessary operators and overloads and thus helps avoiding type related errors.

Quote: A requirement that the m_pBuffer element be first element should not be required.  Instead, fbc should be calling an operator overload to get the data's address if the user has defined one.  Or maybe it needs to be a requirement of the UDT that the user writes.

m_pBuffer being the first element makes it possible to change the compiler´s code for "STRPTR" to work with WSTRING and USTRING. For both WSTRING and USTRING it returns a WSTRING PTR to the wide string data, this is different from what it did before (return a ZSTRING PTR for a WSTRING), which is - at least - undesirable.
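
Why the position of m_pBuffer matters can be illustrated with a C analogy. This is a sketch only: the struct below is a hypothetical stand-in for the FreeBASIC type, not WinFBX or compiler code, and strptr_like is an invented name. In C, a pointer to a struct, suitably converted, points to its first member, so code that only has the object's address can fetch the buffer pointer without knowing the rest of the layout. Presumably the changed "STRPTR" relies on the same property.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for a dynamic wide string type whose buffer
   pointer is deliberately the first member. */
struct dwstr_like {
    wchar_t *buffer;      /* first member: pointer to the string data */
    unsigned capacity;    /* the remaining members can change freely */
    unsigned length;
};

/* Fetch the data pointer knowing only the object's address.
   Valid because a struct pointer converts to a pointer to its
   first member. */
const wchar_t *strptr_like(const void *obj)
{
    return *(wchar_t *const *)obj;
}
```

If another member were placed before the buffer pointer, strptr_like would read the wrong bytes, which is the layout dependency discussed above.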


JK 
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 29, 2019, 12:02:41 AM
The problem with:

dim w as wstring * 50 = " "
dim r as wstring * 50 = trim(w)


is a "ssize_t" overflow problem. Adding two lines to "...rtlib\strw_trim.c" solves the problem:


    ...

    chars = fb_wstr_Len( src );
    if( chars <= 0 )
        return NULL;

    if( wcscmp(src, L" ") == 0)        //added
        return NULL;                   //added

    ...



JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on April 29, 2019, 12:39:32 AM
I am still using your "ustring.bi" and "tests/ustring/*.bas" at each step to check that the changes to fbc still work with what you expect in "ustring.bi".  But "ustring.bi" is too complex to specifically test each change in the compiler, and most of the features in ustring.bi are not specifically needed to test the changes in fbc compiler.  So I am using a simplified UDT to test each compiler change.  In the end, "ustring.bi" will work too.

FYI, hides the problem, doesn't solve it.

    if( wcscmp(src, L" ") == 0)        //added
        return NULL;                   //added
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on April 29, 2019, 01:32:48 AM
I've started a discussion at https://www.freebasic.net/forum/viewtopic.php?f=17&t=27569, which includes a back link to here.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 29, 2019, 11:24:40 PM
Jeff,


Quote: FYI, hides the problem, doesn't solve it.

i agree! It makes the compiler work, but it doesn´t SOLVE the problem of the following code. Investigating this bug further shows, that "chars" after
chars = fb_wstr_CalcDiff( src, p ) + 1;

gets a value of -2147483648 where you would expect zero (p being one position "before" src should result in -1, +1 should make zero). So it seems "fb_wstr_CalcDiff" doesn´t return the expected result, if p < src.


Adding
  if (p < src)
    return NULL;


before makes it exit and return NULL, when src is all spaces. This would be expected. But "fb_wstr_CalcDiff" is used in other places as well, i didn´t look, if this (end < ini) could be a problem elsewhere too. Hope this helps...


JK
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on April 30, 2019, 10:59:01 PM
Investigating this case still further, i can say that end < ini cannot occur in all other places, where fb_wstr_CalcDiff is called.

I understand the underlying logic of the code in strw_trim.c and as far as i can tell, it is correct. What isn´t correct is the result of fb_wstr_CalcDiff in case p < src. Fb_wstr_CalcDiff should return the difference between the pointers p and src in wide characters. The corresponding code is:

return ((intptr_t)p - (intptr_t)src) / sizeof( FB_WCHAR );

As said before, this delivers correct results as long as p>src, but for p<src it fails to deliver the expected result. I´m not an expert in C coding at all, but i keep asking myself - why is this so complicated? Wouldn´t

chars = p - src + 1;

do instead of fb_wstr_CalcDiff? ... and according to my tests it does, even if p<src.


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on May 02, 2019, 10:41:47 PM
JK, thanks for investigating.  You have the right idea.

I think your expression: chars = p - src + 1;, is valid and safe.  gcc should be doing the right thing with the pointer arithmetic.

fb_wstr_CalcDiff provides some self documentation as to what the code is supposed to be doing, so might be good to keep the function.   

/* Calculate the number of characters between two pointers. */
static __inline__ ssize_t fb_wstr_CalcDiff( const FB_WCHAR *ini, const FB_WCHAR *end )
{
return end - ini;
}


For your interest:
fb_wstr_SkipCharRev function itself can return a pointer to one element before the first element, which is actually undefined behaviour in C, but often works because of how the C compiler handles it.  So I think I will make a small change in the logic for fb_wstr_SkipCharRev to avoid that and to get rid of the '+1' needed after the function is called.  The result will be that fb_wstr_CalcDiff won't ever see "end" pointer less than "ini" pointer to begin with. 
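
A minimal sketch of that idea in C, not the actual rtlib change: the names skip_char_rev_end and rtrim_len are hypothetical, and wchar_t stands in for FB_WCHAR. The reverse scan returns a pointer one past the last character to keep, so the result always stays inside [src, src + len] and the length is a plain pointer difference.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not the rtlib function: scan backwards over the
   first `len` characters of `src` and return a pointer one past the last
   character that is not `c`. The result is always in [src, src + len],
   so it never points before the first element. */
static const wchar_t *skip_char_rev_end(const wchar_t *src, size_t len, wchar_t c)
{
    assert(src != NULL);
    const wchar_t *end = src + len;
    while (end > src && end[-1] == c)
        --end;
    return end;
}

/* Length of `src` once trailing `c` characters are dropped. */
static size_t rtrim_len(const wchar_t *src, wchar_t c)
{
    const wchar_t *end = skip_char_rev_end(src, wcslen(src), c);
    return (size_t)(end - src);   /* never negative, no '+ 1' needed */
}
```

With this shape an all-spaces string simply yields a length of zero, so the caller never sees an "end" pointer below "ini".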

And, in the original expression:
return ((intptr_t)end - (intptr_t)ini) / sizeof( FB_WCHAR );
the symptom we are seeing comes from how gcc translates a division by a power of 2 into a shift instruction.  fbc does this too, but in gcc, the instruction chosen depends on the type of the divisor:

((int)end - (int)ini) / sizeof(FB_WCHAR)      '' gcc optimizes to SHR instruction (wrong!)
((int)end - (int)ini) / (int)sizeof(FB_WCHAR) '' gcc optimizes to SAR instruction (correct)

SHR instruction doesn't preserve the sign and so fails for negative values, as would be the case in (end < ini).
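
The difference can also be read at the C language level: sizeof yields an unsigned size_t, and when a signed operand of the same width is divided by it, the usual arithmetic conversions turn the signed value into unsigned first. The sketch below simulates a 32-bit build with fixed-width types; FB_WCHAR_DEMO and the three function names are hypothetical, and the 2-byte character size is an assumption (as on Windows).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t FB_WCHAR_DEMO;   /* assumption: 2-byte wide character */

/* Signed byte difference divided by an unsigned divisor, as in the
   original expression on a 32-bit target. The signed operand is
   converted to unsigned before the division takes place. */
static uint32_t diff_unsigned_divisor(int32_t byte_diff)
{
    return byte_diff / (uint32_t)sizeof(FB_WCHAR_DEMO);
}

/* Same difference with a signed divisor: the sign is preserved. */
static int32_t diff_signed_divisor(int32_t byte_diff)
{
    return byte_diff / (int32_t)sizeof(FB_WCHAR_DEMO);
}

/* Plain pointer subtraction: the result is a signed ptrdiff_t, already
   scaled to elements. Both pointers must belong to the same array. */
static ptrdiff_t diff_pointers(const FB_WCHAR_DEMO *ini, const FB_WCHAR_DEMO *end)
{
    assert(ini != NULL && end != NULL);
    return end - ini;
}
```

For a byte difference of -2 the unsigned variant gives 2147483647, and adding 1 gives 0x80000000, which matches the -2147483648 reported earlier once it is stored in a signed 32-bit variable. The signed variant and the pointer subtraction both give -1.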

I don't see any reason why fb_wstr_CalcDiff must cast pointers to integers.  But that code is nearly the same since 2005, so maybe related to an old version of gcc, I don't know.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 02, 2019, 11:58:18 PM
Jeff,

Quote
fb_wstr_CalcDiff provides some self documentation as to what the code is supposed to be doing, so might be good to keep the function.

This is why I like line comments (comments added at the end of a line) so much. They don't distract the eye when reading the code, and you can easily ignore the text in green (or whatever color you prefer for comments), especially if it is properly aligned. But in case you don't understand what's going on and why, such a comment will save you a lot of time if you must revisit this code location later on.

My memory is still working quite well, but I'm not sure I can recall everything I know right now about this code in, let's say, six or twelve months. Things which are obvious right now will become obscure over time ...

chars = p - src + 1;                                                //calculate the difference in "characters"

solves this problem for good.

Interspersed comments should be used for important information. Too many of them make the code hard to read. But those aligned comments on the right side don´t do any harm and are quite useful for explaining things, which are of lower priority but nevertheless useful for understanding the code flow.

Analyzing someone else´s code is always a challenge. Adding comments to places, where you finally understand, what the code does, helps preserving your effort. In other words i would very much appreciate, if you accepted adding line comments to the sources. I think many people interested in participating would benefit from more readable sources. As you can see from the work i have done, it´s not impossible to understand the sources, but in order to get the whole picture more comments and explanations would help a lot.

Please think about it,


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on May 03, 2019, 12:34:29 AM
The name alone, fb_wstr_CalcDiff, without any other comment or information is the documentation.  Any time this name appears, it can be known that the expression is taking difference of 2 wstr pointers.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 03, 2019, 02:59:48 PM
Well, just coding "p - src" would make this one-line function "fb_wstr_CalcDiff" obsolete and would additionally save us the overhead of a function call. The downside is, that the information provided by the descriptive function name is lost then too - a comment would help here.

I don´t want to argue, but i think you get my point!


Another topic we need a decision on is STRPTR. The way I coded it requires USTRING (or any clone of it) to have the data pointer as its first member variable. We could just make this a requirement. But what if someone doesn't comply with this rule? Unstable, erratic and maybe crashing code would be the (undesirable) result.

So a much cleaner solution would be to make STRPTR an overloadable operator. Then specific code returning a WSTRING PTR for the STRPTR operator would have to be supplied, otherwise a compiler error (Invalid data type) would be thrown.

What do you think?


JK


later: in the meantime i know how to do it (STRPTR as overloaded operator) - works like a charm. I really think it´s better than relying on the data´s position.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on May 04, 2019, 05:30:52 AM
> just coding "p - src" would make this one-line function "fb_wstr_CalcDiff" obsolete and would additionally save us the overhead of a function call

true, fb_wstr_CalcDiff has all the properties of a function: a name, parameters, type checking, a typed return value.  Except this one (like many of the functions in fb_unicode.h) is an inline function.  gcc will inline fb_wstr_CalcDiff into the code where it is used, as just 2 assembler instructions.

> So a much cleaner solution would be to make STRPTR an overloadable operator.

I think an overloadable operator was one of my thoughts also a few posts ago.  But, in the end though, STRPTR is a lot like a cast() as wstring ptr.  And one of the bugs from sf.net talks about allowing both cast() byref as wstring, cast() as wstring ptr, etc, all in one UDT, changing how fbc ranks string type matching to allow more user defined conversions.  Which made me wonder whether we should have separate WSTRPTR & STRPTR, like we have separate WSTR and STR.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on May 04, 2019, 01:10:14 PM
gcc optimization - ok, i don´t know anything about it, except that it exists.

I will have my own set of sources (with comments) and write a little utility for stripping these comments before pushing a file to the repository. This way we can have both: I can have as many comments as I like and the sources are kept "clean". Merging or rebasing will be more cumbersome for me then, but this is my problem.


WSTRPTR - i don´t see an absolute need for this, because in case of an STRPTR operator it is clear, that i want a pointer and it isn´t a matter of fbc ranking, but a matter of being defined as operator or not in an UDT. On the other hand, there already are some W... functions in FB, and WSTRPTR would make clear what kind of pointer is expected. So STRPTR alone would be sufficient, but WSTRPTR would be a clearer syntax - the more i think about it, the more i like it!

Quote
to allow more user defined conversions

I think in case of STRING/WSTRING this can cause only trouble. Where is the need for returning a STRING AND a WSTRING from the same UDT? Internally it´s an either - or, either handle the data as STRING - or handle the data as WSTRING. If the data internally is a WSTRING, converting it to STRING might spoil this data, because this isn´t a lossless conversion. The other way round would work, but it could easily be done by the user too. So allowing for returning both, might be a problem, because it can cause hard to find (WSTRING to STRING conversion under the hood) errors.


Let me know, if you want to have the code changes for making STRPTR an overloadable operator. There isn´t very much to do, just a few more lines here and there. Currently i cannot just push the sources, because i tried many things in many different places, so it isn´t obvious, what is for what.


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 01, 2019, 10:58:08 PM
In the end, I made STRPTR(wstring) return a WSTRING PTR.  I couldn't think of any reason it should not.  To quote the fb wiki, "Note that when passed a Wstring, Operator Strptr still returns a Zstring Ptr, which may not be the desired result.".  My thought is, would it EVER be the desired result?  So I think this is a safe change.

Besides, STR(ustring), and WSTR(ustring) can still be used for specific conversions.

I feel like I have made progress over last several weekends.  I wrote a lot of tests.  The MID assignment statement was a weird one, and I spent  some time on that one.  Maybe it's bugged, I don't know, it has some peculiar behaviours.  For now I left myself a note about it.

I implemented WSTR(ustring) in a way that is close to WSTR(wstring).  When testing ustring = WSTR(ustring), I came across an issue in your implementation of the DWSTR in ustring.bi; DECLARE OPERATOR LET (BYREF pwszStr AS WSTRING PTR) clears the buffer.  And if pwszStr ptr actually points to itself, the buffer is cleared before the contents are read.  Some kind of memory-move would be better.

I think the remaining parts for me are:
- LEFT/RIGHT, which involves fixing a string related bug
- SELECT CASE, which is probably OK
- IIF, which is probably OK, with efficient logic decision
- SWAP, which I need to think about a bit.
- Parameter passing, which is probably OK, so just needs the tests written.

I think I will create a pull request for the work done so far to date.  I'll expand on next steps over at fb.net some time this weekend.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on June 02, 2019, 11:02:15 PM
Jeff,


i read your post in FB-forum (please read my post there too) first and replied there first. So what´s new is the problem with MID (could you supply code, where it shows "some peculiar behaviors"?) and that ustring = WSTR(ustring) fails. To be honest i never tested that, because such code doesn´t make much sense. On the other hand it´s not forbidden and therefore it should work! I will have a look.

José, when you read this, could you have a look too?

As far as i read your changes i like your approach better than mine. My goal was to show, that it is possible, and to make it work (somehow). Your code (no wonder) is a more logical development of the compiler.


JK

   
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 03, 2019, 01:11:08 AM
> So what´s new is the problem with MID (could you supply code, where it shows "some peculiar behaviors"?)

In general, the WSTRING in rtlib support is not optimized for speed.  So for most wstring functions, there is no version that will take a length parameter, and length is always calculated with a wstrlen().  Except for MID() assignment, which always assumes the buffer is large enough, if it's a pointer (not fixed length type).  Creates some situations to watch out for:

sub print_wstr( byref s as wstring )
  var n = len(s)
  print "    " & left( str(s) & space(30), 30);
  for i as integer = 0 to n
    print hex( s[i], 4 ) & " ";
  next
  print
end sub

scope
  print "show a string"
  dim w as wstring * 16 = "abcdefgh"
  print_wstr w
  print
end scope

scope
  print "mid(wstring*n,,) overwrites null terminator"
  '' initializer doesn't clear string
  dim w as wstring * 16 = "Q"
  print_wstr w

  '' overwrites null terminator and we get garbage
  mid( w, 2, 1 ) = "X"
  print_wstr w
  print
end scope

scope
  print "mid(wstring,,) overwrites null terminator"
  dim w as wstring * 16 = "R"

  '' try a pointer, it doesn't have a fixed length limit
  dim p as wstring ptr = @w
  print_wstr *p

  '' overwrites null terminator and we get garbage
  mid( *p, 2, 1 ) = "Y"
  print_wstr *p
  print
end scope

scope
  print "mid(wstring,) writes data beyond end of string"
  dim t as wstring * 16
  dim w as wstring * 16 = "0123456789abcde"
  print_wstr w

  dim p as wstring ptr = @w
  mid( *p, 16 ) = "qrst"

  print_wstr w
  print_wstr t
  print
end scope


The rtlib needs work with new functions to handle string length parameters.  Which should give better speed and allow for embedded null characters.  Both of which are needed for the eventual addition of a builtin dynamic wstring.
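
As an illustration of what a length-aware routine could look like, here is a hedged C sketch of a bounded MID-style assignment. The function mid_assign and its parameters are hypothetical and not part of the rtlib; wchar_t stands in for FB_WCHAR. The caller passes the current string length, and the routine never writes at or beyond the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical bounded MID assignment, not an rtlib function.
   dst      : buffer holding a null-terminated string
   dst_len  : current length of dst in characters (known to the caller)
   start    : 1-based position where the replacement begins
   count    : maximum number of characters to replace
   Returns the number of characters written. The terminator at
   dst[dst_len] and everything after it are never touched. */
static size_t mid_assign(wchar_t *dst, size_t dst_len, size_t start,
                         size_t count, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    if (start == 0 || start > dst_len)
        return 0;                          /* nothing inside the string to replace */

    size_t room = dst_len - (start - 1);   /* characters left before the terminator */
    size_t src_len = wcslen(src);
    size_t n = count;
    if (n > room)    n = room;
    if (n > src_len) n = src_len;

    wmemcpy(dst + (start - 1), src, n);
    return n;
}
```

Applied to the "Q" case above, a call with start 2 writes nothing and the string stays "Q", because position 2 is where the terminator lives.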
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 03, 2019, 01:15:50 AM
> ustring = WSTR(ustring) fails

In one of your tests (tests/ustring/wstr.bas) you have:

      dim w as wstring * 50 = wchr( 1234 )
      dim u as ustring = wchr( 1234 )
      w = wstr(w)
      u = wstr(u)


In your original fbc ustring code, rtlToWstr() doesn't do anything if the argument is a "USTRING".  So it just resolves to u = u and so it calls a let operator that does check buffer address:

PRIVATE OPERATOR DWSTR.Let (BYREF cws AS DWSTR)
  IF m_pBuffer = cws.m_pBuffer THEN EXIT OPERATOR   ' // Ignore cws = cws
  this.Clear
  this.Add(cws)
END OPERATOR


In my implementation of WSTR(udt), it actually does the conversion, and so calls:

PRIVATE OPERATOR DWSTR.Let (BYREF pwszStr AS WSTRING PTR)
  this.Clear
  IF pwszStr = 0 THEN EXIT OPERATOR
  this.Add(*pwszStr)
END OPERATOR

If pwszStr points to m_pBuffer (or to the first couple of bytes, looking at Clear), the string just gets erased before the data is copied.
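
The ordering problem can be sketched in C. This is not the DWSTR code: the type dwstr_demo and the function dwstr_assign are hypothetical, and the approach shown (copy into a fresh buffer first, release the old one afterwards) is just one possible alternative to a memory-move or a pointer-equality check.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical dynamic wide string: a buffer pointer plus a stored length. */
typedef struct {
    wchar_t *buf;   /* null-terminated data, or NULL when empty */
    size_t   len;   /* length in characters, terminator excluded */
} dwstr_demo;

/* Assign from a raw pointer. The source is measured and copied before the
   target is modified, so `src` may point at or into d->buf.
   Returns 0 on success, -1 on allocation failure. */
static int dwstr_assign(dwstr_demo *d, const wchar_t *src)
{
    assert(d != NULL);
    if (src == NULL) {                 /* null source: result is empty */
        free(d->buf);
        d->buf = NULL;
        d->len = 0;
        return 0;
    }

    size_t n = wcslen(src);            /* read before anything is cleared */
    wchar_t *fresh = malloc((n + 1) * sizeof *fresh);
    if (fresh == NULL)
        return -1;
    wmemcpy(fresh, src, n);            /* old buffer is still intact here */
    fresh[n] = L'\0';

    free(d->buf);                      /* release only after the copy */
    d->buf = fresh;
    d->len = n;
    return 0;
}
```

Because nothing is cleared until the copy is done, assigning from the string's own buffer, or from a pointer into the middle of it, keeps the data intact.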
Title: Re: FreeBASIC CWstr
Post by: José Roca on June 03, 2019, 01:34:49 PM
Quote
The rtlib needs work with new functions to handle string length parameters.  Which should give better speed and allow for embedded null characters.  Both of which are needed for the eventual addition of a builtin dynamic wstring.

You seem to be thinking about BSTRings, that carry their length with them. They're a different data type. FreeBasic's WSTRING is a null terminated unicode string (maybe it should have been named ZWSTRING or WSTRINGZ to avoid confusion); therefore, it can't have embedded nulls.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on June 03, 2019, 10:52:54 PM
Jeff,

so, if you  add this line:
IF m_pBuffer = cast(ubyte ptr, pwszStr) THEN EXIT OPERATOR   'Ignore self-assign
to:

PRIVATE OPERATOR DWSTR.Let (BYREF pwszStr AS WSTRING PTR)
  IF m_pBuffer = cast(ubyte ptr, pwszStr) THEN EXIT OPERATOR   'Ignore self-assign

  this.Clear
  IF pwszStr = 0 THEN EXIT OPERATOR
  this.Add(*pwszStr)
END OPERATOR


it should work again. Could you please test?


JK
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 15, 2019, 03:13:34 PM
Hi José,

I've thought quite a bit about the WSTRING meaning, and in hindsight it would have made much more sense to name it ZWSTRING for symmetry within fbc's string types.  I have a plan that would rename wstring to zwstring over 2 releases, but it breaks everything to do with wstring, so I'm pretty sure I would get every user angry at me.  Not impossible to implement, but seems impossible to justify, so I think we must live with it.  Maybe in fbc 2.0 I can break everything. :)

I wasn't specifically thinking of embedded nulls in Z|WSTRING when looking at length parameters; only that embedded nulls may be a potential (desired) side effect when combined with a UDT that extends Z|WSTRING.  i.e. the rtlib STRING handling functions are the same for ZSTRINGS and STRINGS, the difference being where it gets the "length" of string, either from data passed (as in a string descriptor), or from always performing a strlen() call.  I was thinking mostly in terms of speed/performance, that if the length is known, like in a UDT that extends Z|WSTRING and stores length data, that it would be preferable to use the stored length rather than always calling w|strlen(), especially for large strings.

But that's all just implementation in rtlib, not really what the user sees.  I agree, from the user's point of view WSTRING should always be considered zero terminated string and nothing else, because that's what we are advertising a WSTRING to be.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 15, 2019, 03:17:10 PM
JK, I added similar logic to ignore a self-assign, and tested, so I would say "it works".  Keep in mind though, depends on what the actual implementation of "this.clear" does; if an extra null terminator is written (double null terminated) to position [1], or memory is released (free'd) instead of writing a single null at
Title: Re: FreeBASIC CWstr
Post by: José Roca on June 15, 2019, 06:51:02 PM
Quote
I was thinking mostly in terms of speed/performance, that if the length is known, like in a UDT that extends Z|WSTRING and stores length data, that it would be preferable to use the stored length rather than always calling w|strlen(), especially for large strings.

That is what I do in my CWSTR class:


PRIVATE OPERATOR LEN (BYREF cws AS CWSTR) AS UINT
   OPERATOR = cws.m_BufferLen \ 2
END OPERATOR
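
For comparison, a small C sketch of the same idea. The struct counted_wstr and both functions are hypothetical; the byte-based length field follows the m_BufferLen \ 2 expression above and assumes 2-byte characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length-carrying wide string. buffer_len is in bytes,
   mirroring the m_BufferLen \ 2 expression for 2-byte units. */
typedef struct {
    const uint16_t *data;
    size_t buffer_len;      /* bytes in use, terminator excluded */
} counted_wstr;

/* O(1): the length is read from the stored field, no scan. */
static size_t counted_len(const counted_wstr *s)
{
    assert(s != NULL);
    return s->buffer_len / sizeof(uint16_t);
}

/* O(n): what a plain null-terminated string has to do. */
static size_t scanned_len(const uint16_t *p)
{
    size_t n = 0;
    while (p[n] != 0)
        ++n;
    return n;
}
```

The two functions only disagree when the data contains an embedded null: the stored length still covers the whole buffer, while the scan stops at the first zero.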

Title: Re: FreeBASIC CWstr
Post by: Charles Pegge on June 16, 2019, 12:29:54 PM
Hi Jeff,

I've been quietly following this topic :)

String types are hard to nail down but adopting the 'w' prefix seems to provide the most logical naming scheme:

char wchar
zstring wzstring
string wstring
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on June 17, 2019, 04:01:15 PM
José,


i just wanted you to be aware of this...

Yesterday Jeff merged the compiler changes for ... EXTENDS WSTRING. Today i made a pull request adding ustring.bi and some USTRING specific tests. Ustring.bi still makes your CWstr the default for USTRINGs.

Do you want this ?

If your answer is yes, you may have to adapt your code in some places to be compatible with Jeff's code (add "EXTENDS WSTRING", prevent self assignment for WSTRING PTR, remove the "CONST" specifier somewhere). There is no need for "m_pBuffer" to be in first place anymore; this is one of the weaknesses of my code that Jeff fixed.

"**" still works, but there shouldn´t be a need for it anymore.  I have a working USTRING test version for all statements dealing with files and paths (OPEN, DIR, etc.), but it will take me some time to rewrite it a bit to be compatible and restructure it for a pull request. This will be one of the next steps.

For comparison i attach the current version of ustring.bi here


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on June 17, 2019, 07:24:09 PM
Thanks very much. I will download the latest build to test.

The much hated ** workaround is the best solution that I could find after trying many others. It was not ideal, but it worked very well. I'm very glad that Jeff has taken the trouble of modifying the compiler to allow better integration.

I will keep it because, otherwise, I will break all of the current Paul Squires' code, and also my framework.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on June 18, 2019, 11:55:23 AM
Quote
The much hated ** workaround

... was a very clever thing to code! It isn't needed anymore now, but it still works. So no need to change any code.

The main question is, do you want your CWSTR class to be the default USTRING in Windows? Now is the time for such a decision!

If so, you will have to make it compliant to the new EXTENDS WSTRING feature. The necessary changes shouldn´t break anything. I didn´t try Paul´s code with Jeff´s version yet, but i did try with my compiler version. Removing all "**" from Paul´s code worked, it compiled, and as far as i can tell, WINFBE worked as usual. So i don´t expect major problems here either. But as "**" still works, Paul wouldn´t have to change anything.

Jeff's new version passed all tests I wrote for my own purposes when developing my version. So there are already two people digging deeply into this matter who cannot find any more bugs.


JK 
Title: Re: FreeBASIC CWstr
Post by: José Roca on June 18, 2019, 07:14:15 PM
> prevent self assignment for WSTRING PTR

Where?

I already have:


' ========================================================================================
PRIVATE OPERATOR CWstr.Let (BYREF cws AS CWSTR)
   CWSTR_DP("CWSTR LET CWSTR - m_pBuffer = " & .WSTR(m_pBuffer) & " - IN buffer = " & .WSTR(cws.m_pBuffer))
   IF m_pBuffer = cws.m_pBuffer THEN EXIT OPERATOR   ' // Ignore cws = cws
   this.Clear
   this.Add(cws)
END OPERATOR
' ========================================================================================



> remove the "CONST" specifier somewhere)

Why?

So far, the only changes that I have needed to do are:


#if __FB_VERSION__ < "1.07.0"
TYPE CWSTR
#else
TYPE CWSTR EXTENDS WSTRING
#endif


And remove a wrong cast in the functions AfxBase64EncodeW and AfxBaseDecodeW.
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on June 18, 2019, 08:37:34 PM
self assign is possible with WSTRING PTR too:

PRIVATE OPERATOR DWSTR.Let (BYREF pwszStr AS WSTRING PTR)
  IF m_pBuffer = cast(ubyte ptr, pwszStr) THEN EXIT OPERATOR              'ignore self assign
  this.Clear
  IF pwszStr = 0 THEN EXIT OPERATOR
  this.Add(*pwszStr)
END OPERATOR


see here http://www.jose.it-berater.org/smfforum/index.php?topic=5253.msg23916#msg23916



and i removed CONST here:

    DECLARE OPERATOR CAST () BYREF AS WSTRING
PRIVATE OPERATOR DWSTR.CAST () BYREF AS WSTRING       'returns the string data (same as **).
  OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR


see here http://www.jose.it-berater.org/smfforum/index.php?topic=5253.msg23876#msg23876

I kept getting compiler errors, removing the CONST qualifier solved the problem. I´m sure Jeff has mentioned it too, but i cannot find it right now.


JK

Title: Re: FreeBASIC CWstr
Post by: José Roca on June 18, 2019, 08:57:22 PM
I'm not getting any error. Even Jeff is using byref as const wstring in https://www.freebasic.net/forum/viewtopic.php?f=17&p=261843#p261830
Title: Re: FreeBASIC CWstr
Post by: Juergen Kuehlwein on June 18, 2019, 09:15:32 PM
please try the following code:

dim u as ustring = wchr( 1234 )
      u = wstr(u)
      print u


and see what happens; fixing self assignment helps.


this one gives me a compiler error (with CONST):

      dim w1 as wstring * 50 = wspace(5) & "asdfghjklmnop"
      dim u1 as ustring = w1
      dim w  as Wstring * 50 = wspace(25)
      dim u  as ustring      = wspace(25)
      lset w, w1
      lset u, u1
      print u
      print w



without CONST it works properly


JK
Title: Re: FreeBASIC CWstr
Post by: José Roca on June 18, 2019, 10:09:45 PM
Ok. I have made the changes.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 30, 2019, 09:17:53 PM
Quote from: Charles Pegge on June 16, 2019, 12:29:54 PM
char wchar
zstring wzstring
string wstring


Agreed, hindsight is 20/20.  Except, I think what we have available in fbc is:

null terminated => zstring & wstring
var-len => string & ??string


It's possible to change current wstring meaning to be named wzstring, as in it is technically possible.  But, it would break so much user source code, I would be afraid that users would find my house and burn it down. ;)  So, I think we need to find a different name for "var-len wstring" to reserve, and live with the asymmetry of the type naming, forever.
Title: Re: FreeBASIC CWstr
Post by: Jeff Marshall on June 30, 2019, 09:42:03 PM
Quote from: Juergen Kuehlwein on June 18, 2019, 09:15:32 PM
this one gives me a compiler error (with CONST):

      dim w1 as wstring * 50 = wspace(5) & "asdfghjklmnop"
      dim u1 as ustring = w1
      dim w  as Wstring * 50 = wspace(25)
      dim u  as ustring      = wspace(25)
      lset w, w1
      lset u, u1
      print u
      print w


Yeah, just choose whatever is suitable for your usage.

declare operator cast() byref as CONST wstring
- fbc should throw an error if used with lset, rset, swap, mid statement, or passed to a non-const parameter in a procedure.
- this is useful if the supporting class has its own rules, for example ensuring that the string is double-null-terminated, length member is set, etc.
- because, writing to the raw wstring data bypasses any logic in the class let operators or constructors

declare operator cast() byref as wstring
- this should allow the raw wstring data to be modified without any warning or error, expecting that the user is aware that any special logic in the class operators is bypassed.
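
The trade-off can be illustrated with a C analogue. The type checked_wstr and its functions are hypothetical and only mimic the idea: a const view lets the compiler reject writes, while a mutable view allows them but skips whatever bookkeeping the class does in its own setter.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical wrapper with an invariant: len == wcslen(buf). */
typedef struct {
    wchar_t buf[32];
    size_t  len;
} checked_wstr;

/* Setter that maintains the invariant; truncates the input to fit. */
static void checked_set(checked_wstr *s, const wchar_t *src)
{
    assert(s != NULL && src != NULL);
    size_t n = wcslen(src);
    size_t cap = sizeof s->buf / sizeof s->buf[0] - 1;
    if (n > cap)
        n = cap;
    wmemcpy(s->buf, src, n);
    s->buf[n] = L'\0';
    s->len = n;
}

/* Read-only view: the compiler rejects writes through this pointer. */
static const wchar_t *checked_view(const checked_wstr *s) { return s->buf; }

/* Mutable view: writes succeed but bypass checked_set, so len can go stale. */
static wchar_t *checked_raw(checked_wstr *s) { return s->buf; }
```

After checked_set with "abcdef", writing a null at index 3 through checked_raw leaves the stored length at 6 while the actual string is only 3 characters long, which is the kind of bypass described above.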
Title: Re: FreeBASIC CWstr
Post by: José Roca on June 30, 2019, 10:22:53 PM
In my string functions in AfxStr.inc, I have added CONST to all the wstring parameters (only to avoid compiler errors if the user passes a constant string/wstring), except one in AfxStrPathName (BYREF wszFileSpec AS WSTRING), because although the function does not modify the passed parameter, it gives me an error when I try to assign it to an instance of CWSTR (cws = wszFileSpec).