 

FreeBASIC CWstr

Started by Juergen Kuehlwein, April 09, 2018, 11:39:00 PM


José Roca


DIM cws AS CWSTR = "Сергей Сергеевич Прокофьев"
AfxMsg MID(cws, 2)


prints several "?"


AfxMsg MID(**cws, 2)


works fine.


Juergen Kuehlwein

My bad,

i forgot to mention something really important: you must add "#PRAGMA DWS" before the first "#INCLUDE ..." line. This is a switch i added for dynamic wide strings, without it the compiler works as usual


Sorry,

JK

Marc Pons

Juergen

what class did you take into account CWSTR or DWSTR ?

with DWSTR, if i test with your modified fbc:
the native Right / Left functions, without my own overloaded functions, still need dereferencing
Trim and all its variations still need dereferencing

Mid seems to work correctly without dereferencing

'compile with console to view the information

' windows_test_1_dwstr.bas : to test under windows the DWSTR (dynamic Wstring) class

'########################################################################
'this test code assumes you are using a system codepage : 1252
' the literal inputs are dependent on that codepage,
' except utf8 inputs, which are codepage independent
'########################################################################
#PRAGMA DWS
#DEFINE UNICODE                                  ' needed for messagebox only : to use wstring not string


#INCLUDE ONCE "DWSTR.inc"                        'DWSTR class

#Include Once "crt/time.bi"                      'just to measure the speed


scope                                            'interesting to check the destructor action in debug mode

   print : print "testing  last  DWSTR.inc  "

DIM cws AS DWSTR = Dw_Wstr( "   Êàêèå-òî êðàêîçÿáðû   " , 1251)

messagebox(0, cws, "test  len =" & len(cws), 0)
messagebox(0, mid(cws,5), "mid(cws,5)   len =" & len(mid(cws,5)), 0)

cws = trim(cws)
messagebox(0, cws, "test  len =" & len(cws), 0)
messagebox(0, mid(cws,5), "mid(cws,5)   len =" & len(mid(cws,5)), 0)

dim dw1 as dwstr = dw_string(23 , &h1D11E)
   messagebox(0 , dw1 & wstr( "   dw_Len ") & wstr(Dw_Len(dw1)) , "test capacity " & wstr(dw1.capacity) , 0)

messagebox(0 , dw1 & wstr( "   SurPair ") & wstr(dw1.Sur_Count) , "test capacity " & wstr(dw1.capacity) , 0)
   dw1.replace( "_it's a test of replacing text " , 21)
   messagebox(0 , dw1 & wstr( "   dw_Len ") & wstr(Dw_Len(dw1)) , "capacity " & wstr(dw1.capacity) , 0)
messagebox(0 , "Sur_Count = " & dw1.Sur_Count & "  nb of surrogate pair " , "len =" & len(dw1) & "  dw_len =" & Dw_Len(dw1), 0)


dim dw4 as dwstr = mid(dw1, 9, 4)
messagebox(0 , ">" & dw4  & "<    len = " & len(dw4) & "    dw_len = " & dw_len(dw4), "test mid as wstring only" , 0)

DIM bs2 AS dwstr = dw_wstr( "Ð"ми́Ñ,рий Ð"ми́Ñ,риевич" , CP_UTF8)
   messagebox 0 , bs2 , "test dw_wstr CP_UTF8" , MB_OK
   messagebox 0 , mid(bs2,5) , "test mid" , MB_OK
   dim z1                as double
   dim z2                as double
dim n as long = 1000000
   PRINT : PRINT
   print : print "Press key to continue !"
   sleep

   dim         as string st1
   dim as string sText = "verif : "



   dim         as DWSTR uws, uws0,uws1,uws2
   
   dim as DWSTR uwsText = "verif : "

   dim x                 as long
   print : print
   print "=========================================="
   print "   Comparison DWSTR Solutions : concatenation"
   print "==========================================" : print

   

   z1 = clock()
   for x = 1 to n
      st1 += "Line " & n & ", Column " & n & ": " & sText '& sText2
   NEXT
   z2 = clock()
   print : print "STRING using  &" : print right(st1, 38) + "   time = " + str(z2 - z1) + " ms   len = " & len(st1): print
print mid(st1, 37999962) + "   mid ": print
   print "==========================================" : print
   z1 = clock()
   for x = 1 to n
      uws += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " + *uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR dereferenced  using + " : print right(uws,38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws): print
print mid(uws, 37999962) + "   mid ": print

z1 = clock()
   for x = 1 to n
      uws0 += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " & *uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR dereferenced  using & " : print right(uws0,38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws0): print
print mid(uws0, 37999962) + "   mid ": print

   z1 = clock()
   for x = 1 to n
      uws1 += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " + uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR not dereferenced  using + " : print right(uws1,38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws1): print
print mid(uws1, 37999962) + "   mid ": print

   z1 = clock()
   for x = 1 to n
      uws2 += "Line " + WSTR(n) + ", Column " + WSTR(n) + ": " & uwsText
   NEXT
   z2 = clock()
   print : print "DWSTR not dereferenced  using &  new overloaded operator " : print right(uws2, 38) + "   time = " + str(z2 - z1) + " ms   len = " & len(uws2): print
   print mid(uws2, 37999962) + "   mid " : print : print


   
   print : print

end scope


print : print "Press key to finish !"
sleep


Could you post your modified fbc somewhere (preferably GitHub)?

Juergen Kuehlwein

Marc, José,


currently it accounts for "CWSTR" , "CBSTR", "DWSTR" and "JK_CWSTR", so it should work with José´s WINFBX, with Marc´s DWSTR and my IDE.

LEFT, RIGHT and VAL(INT, etc.) can be fixed with overloaded functions, so no need for a fix inside the compiler.
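
A minimal sketch of the overloading approach mentioned above. MiniWStr is a hypothetical stand-in type with a fixed-size buffer, used only to keep the example self-contained; it is not CWSTR or DWSTR, and the real classes manage a dynamic buffer and differ in detail.

```freebasic
'' MiniWStr is a hypothetical stand-in, not the real CWSTR/DWSTR class.
type MiniWStr
    declare constructor()
    declare constructor( byref w as const wstring )
    as wstring * 64 buf            '' fixed buffer keeps the sketch short
end type

constructor MiniWStr()
    buf = wstr( "" )
end constructor

constructor MiniWStr( byref w as const wstring )
    buf = w                        '' truncated if longer than the buffer
end constructor

'' Add a LEFT overload for the UDT; it forwards to the built-in WSTRING version.
function Left overload( byref s as MiniWStr, byval n as integer ) as MiniWStr
    return MiniWStr( left( s.buf, n ) )
end function

dim w as MiniWStr = MiniWStr( wstr( "Prokofiev" ) )
dim l as MiniWStr = left( w, 4 )   '' resolves to the overload above, no "*" needed
print l.buf
```

Because the overload takes the UDT itself as its first parameter, the call site needs no dereferencing; the same pattern should carry over to RIGHT.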

I changed processing of "SELECT CASE", "MID" (statement and function), "INSTR(REV)", "LSET/RSET" and all converting functions (e.g "CINT") to work without prepending "*". I may have made a mistake with "TRIM" - that´s, what tests are for.

Please make further tests and help finding other possible problems. I didn´t touch "STRPTR", but in my view it would make sense, if it worked just like for STRING returning a WSTRING PTR to the string data for a dynamic wide string.

I´m going to investigate what´s wrong with "TRIM"


JK

Marc Pons

Juergen

Quote from: Juergen Kuehlwein on December 05, 2018, 04:26:25 PM
I didn´t touch "STRPTR", but in my view it would make sense, if it worked just like for STRING returning a WSTRING PTR to the string data for a dynamic wide string.

I've already overloaded strptr to return a WSTRING PTR of the data buffer

and overloaded the & operator for concatenation with DWSTR, string and numeric values

one question why do you create the pragma switch ?  to isolate code for tests?
in my opinion, if that feature is officially implemented in the compiler, it is better not to have that kind of switch

Marc Pons

Juergen

the WSTR function also needs a prepended * , and it cannot be overloaded as simply as left/right
so it is better to handle it at compiler level too

Juergen Kuehlwein

Marc,


Quote: one question why do you create the pragma switch ?  to isolate code for tests?


Yes - it will be removed later on. Currently all of my added code is enclosed by an "IF" .... "END IF" clause testing for that pragma. I´m far from understanding everything the compiler does, so i thought it would be a good idea to be able to test with my code switched on or off when problems occur.

I think i fixed the "TRIM" bug. Why would you want to code "WSTR(<wide string>)", which returns itself ? Could you please post code where it fails.


attached is what i currently have


JK



 

Marc Pons

Juergen,
I've just tested your last compiler version

Trim, Ltrim, Rtrim  are ok now

why adapt wstr ?
to be homogeneous: wstr already does a "conversion" from wstring
Quote: Declare Function WStr ( ByVal str As Const WString Ptr ) As WString
with DWSTR variables  uws2 , uwsText
it is faster doing uws2 &= "Line " & WSTR(n) & ", Column " & WSTR(n) & ": " & WSTR(uwsText)
compared to uws2 &= "Line " & WSTR(n) & ", Column " & WSTR(n) & ": " & uwsText

I've tested with your latest version and it seems to work !   did you modify it in your latest version?  in fact it is still not working (without an error message, so risky)

and again, could you post your compiler source code somewhere?

Juergen Kuehlwein

I think i fixed STR and WSTR as well now.

Attached is what i currently have: source code + 32 bit fbc.exe. Look at "rtl-string.bas", most changes are there. You may search the files for " JK " to find all changes applied, i added comments starting with " JK " to every new section or line.

Apart from adding a new pragma and a new compiler option for test purposes the basic principle is almost the same everywhere: look, if the current expression is a dynamic wide string (UDT + type = JK_CWSTR or CWSTR or CBSTR or DWSTR) -> jump to where WSTRINGs are processed.


I talked with coderjeff at the FreeBASIC forum about this topic and he wanted me to push my branch to the GitHub repository. I´m going to do this , if we are done with testing


JK

Marc Pons

Juergen,
I've tested and wstr and str are working now!

good job.

Quote: I talked with coderjeff at the FreeBASIC forum about this topic and he wanted me to push my branch to the GitHub repository. I´m going to do this , if we are done with testing

i think it would be better to commit into the master branch; if you make your own separate branch it will not be included as a normal standard evolution.
that is what i did with my proposed __FB_GUI__ switch, which is now included; check with coderjeff.



Juergen Kuehlwein

@Marc,

thanks - has your version (DWSTR) been tested with LINUX and other targets too ? Or do tests still have to be made ? The compiler definitely must be tested with LINUX and others, but i cannot do that. Who could help ? Maybe i should start a new thread at the FreeBASIC forum - or should i go on with yours (https://www.freebasic.net/forum/viewtopic.php?f=17&t=24070&hilit=marpon&start=45) ?


@José,

were your tests successful now as well ? Should we ask Paul about "USTRING" (see below) too, so working together we could establish a "quasi standard" for dynamic wide strings ? What do you think ?


@both

In order to avoid confusion i think the new dynamic wide string type should be named "USTRING" (this is in line with ZSTRING and WSTRING). There is no need to actually rename it in your, José´s or my code. IMHO a #DEFINE is the best solution, so the different versions can exist side by side and it´s only a matter of a different #DEFINE to switch between them.
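
A rough sketch of how such a #DEFINE switch could look. USE_DWSTR is a hypothetical symbol chosen for illustration, and the include paths are assumptions that depend on where the class files are installed.

```freebasic
'' USE_DWSTR is a hypothetical switch; the include paths are assumptions.
#define USE_DWSTR

#ifdef USE_DWSTR
    #include once "DWSTR.inc"        '' Marc's class
    #define USTRING DWSTR
#else
    #include once "Afx/CWStr.inc"    '' José's class (path assumed)
    #define USTRING CWSTR
#endif

'' User code only ever names USTRING, so switching classes is a one-line change.
dim u as USTRING = "dynamic wide string"
```

Commenting out the first #define would select the other class without touching any code that uses USTRING.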

But we should work on a common include file, which should be added as a universal (Windows/Linux, etc) include file (.bi) to the official distribution. Maybe some of the "missing" (from a PB point of view) string handling functions should be added.



JK


PS: and maybe we should supply test code for testing on other targets, which actually runs all kinds of tests, where USTRINGs had known problems in the past, and prints an error message, if it finds an error.

Jeff Marshall

Hi everyone.  My name is Jeff, one of the developers on the FreeBASIC compiler.  Juergen had contacted me and was asking some questions about this.  I only played around with the dwstring.bi class that José posted on fb.net, but I imagine the other classes you are discussing have similar quirks.  Hopefully I can help with this a little, and I thought this would be the best place.

José, nice work on the string classes.

Juergen, I saw the modified source code you made for the compiler and I understand what you are attempting to do.  Very good effort, it takes dedication to find the way through a big & hairy program like fbc to get to a place you can make a change that does something you want.  Sincerely, I encourage you to keep at it.  The actual issue is earlier on in the translation, and hopefully I can explain.

Take this type for example:

type T
__ as integer
declare operator cast () byref as wstring
declare operator cast () as wstring ptr
declare operator cast () as string
end type

sub proc overload( byref w as wstring )
end sub

sub proc( byval w as wstring ptr )
end sub

sub proc( byref s as string )
end sub

dim x as T
proc( x )  '' error: ambiguous


Type T roughly represents what's happening in the compiler.  We have a string-like type that can be automatically converted to any other string-like type, and then a function is called to work on it.

When it comes to implicit UDT conversion, fbc looks at all the possible matches (CAST operator) and ranks them from best to worst, with an exact match of data type & constness being the best score.  However, as in this example, fbc doesn't know how to decide what the best match is because there is nothing to indicate what the preferred conversion & call should be.

For the built-in string types, this decision is hard-coded into the logic, choosing the best string type, conversion, and function to call and ignoring the normal rules for overloaded functions & operators.

So here's what I was thinking, that a TYPE could be marked in source code (with a #pragma or some special syntax) to indicate that different rules should be used for overload resolution.  It could be as simple as the first declared CAST operator is the best choice when fbc has to automatically convert the type.  So this change would then be a more general feature applied to any type, not just wstrings, to give better control to the programmer over implicit casting (automatically done by compiler).




Juergen Kuehlwein

Hi Jeff,


great to have you here in this thread. I´m going to be quite exhaustive in this post just to be sure we all are talking about the same thing...


So the changes to the compiler are correct, but you see possible problems arising from multiple CAST operators. The type uses two different Cast operators in two slightly different versions:

José´s Code
PRIVATE OPERATOR CWstr.CAST () AS ANY PTR
   OPERATOR = cast(ANY PTR, m_pBuffer)
END OPERATOR
' ===========================================================
' ===========================================================
' Returns the string data (same as **).
' ===========================================================
PRIVATE OPERATOR CWstr.CAST () BYREF AS WSTRING
   OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR     


Marc´s Code
'============================================================
' Cast implicitly DWSTR to different types.
' ============================================================
PRIVATE OPERATOR DWSTR.CAST() BYREF AS WSTRING
RETURN * m_pBuffer
END OPERATOR
' ============================================================
' ============================================================
PRIVATE OPERATOR DWSTR.CAST() AS ANY PTR
RETURN cast(ANY PTR , m_pBuffer)
END OPERATOR


to my understanding, though internally using different definitions, both return the same: an "any ptr" and a "byref wstring". So what is exposed to the compiler should be the same. Do you agree ?


A quick test shows that "any ptr" is necessary, because without it, it compiles, but the linker complains. Casting it only "as wstring" is not working.


Typically i add something like this to the compiler´s code in appropriate places:

'*************************************************************************
' JK - check for dws
'*************************************************************************
      if env.clopt.dws then
        dim jk__zz as zstring ptr

        if (dtype = FB_DATATYPE_STRUCT) then
          jk__zz = nd_text->subtype->id.name

          if (*jk__zz = "JK_CWSTR") or (*jk__zz = "CWSTR") or (*jk__zz = "CBSTR") or (*jk__zz = "DWSTR") then
            goto do_ustring
          end if
        end if
      end if



I added "#pragma dws", which sets compiler option "dws" when and as soon as found. All new code i added for getting rid of "**" is enclosed by an "IF ... END IF" clause. If "#pragma dws" is not present in the code to compile, my changes don´t become active. Initially this was meant to be a switch for testing the compiler with and without my code, in case there were problems with the re-compiled compiler - fortunately there weren´t any. In fact this pragma is not necessary for what i want to do, but it could be re-used for other things or be removed entirely.


I think i understand the problem (for the compiler) you describe with multiple cast operators. When working on additional string handling functions, which should work for the new type(s) and the existing ones as well, i got this error many times. But i was able to avoid it by adapting my code. And using all of this for quite some time, i didn´t experience errors (ambiguous ...) which couldn´t be fixed by tweaking the code. So i always thought it´s my bad coding (being not that experienced in FreeBASIC), rather than a possible compiler problem. The bottom line is: i understand the problem you describe, but for me personally it didn´t occur (as of now) - the compiler seems to do it right.

It would help to have an example of failing code coming from this problem! 


I didn´t have a look at how the compiler resolves overloaded functions or operators. If there really is a problem with the new type, it must be fixed. But IMHO making this a general thing for all types is not a good idea. I would rather be notified about a problem (ambiguous ...), so i´m forced to fix it exactly the way i want it to work, instead of the compiler making "guesses" (even if these follow rules), which might result in malfunctioning code under the hood. A pragma would switch this feature on or off for all types in use (which as described above i would avoid). The only thing that would make sense to me is having a new keyword like "DEFAULT" to mark a function or operator as the default one to take, if the compiler cannot resolve it, e.g:

PRIVATE OPERATOR (OVERLOAD) DEFAULT DWSTR.CAST() BYREF AS WSTRING

This gives individual control to the coder without making (maybe unwanted, because the coder isn´t aware of the ambiguity at all) guesses. If someone decides to code "DEFAULT", then he must be aware of it and then it is his responsibility.


Thanks for discussing all of this with us


JK



 





Jeff Marshall

Hey Juergen,

I think we are talking about the same thing as an end result.  Though, I think we might be worlds apart in understanding on how to get there. But, you genuinely seem to have an enthusiasm for improvement and I think we both want to see a solution here.

I have read this topic completely and thoroughly from the beginning, so I have an idea of what I'm in for =).  I hope you don't mind I make multiple posts here, and I will try to answer whatever questions you raise.

The reference implementation I have been working with, I just pushed to https://github.com/jayrm/dwstring .  I wrote that around Oct 2017.  It is not fast: memory functions are hand-made but could be optimized with platform specific calls by replacing WSTRING_ALLOC, WSTRING_FREE, WSTRING_MOVEN, WSTRING_COPYN.  The guts of it probably look a lot like Jose's dwstring.bi, with the major difference that I had CONST types in mind when I wrote it.  It will have similar issues with fbc's builtin functions like LEFT & UCASE, etc.  I stopped working on it at the time because 1) I wanted to rewrite fbc's test suite, and 2) STRING/WSTRING handling within fbc is inconsistent and that needs to be fixed.

What you want, for any implementation is a test-suite that proves all the capabilities, in a way that is independent of the thing you are testing.  Something automated that does not rely on you inspecting (visually) the output of a test program.  Either pass or fail.  You can see this in https://github.com/jayrm/dwstring/blob/master/tests.bas with the hCheckString() macro.
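
As a rough illustration of the pass/fail idea (this is not the hCheckString() macro from the repository; checkWStr and failures are hypothetical names), a self-checking test could look something like this, here exercising only the built-in WSTRING functions:

```freebasic
'' Hypothetical pass/fail helper; checkWStr is an illustration only.
dim shared as integer failures = 0

sub checkWStr( byref label as const string, _
               byref actual as const wstring, _
               byref expected as const wstring )
    if actual = expected then
        print "PASS  "; label
    else
        print "FAIL  "; label
        failures += 1
    end if
end sub

'' Each check compares a computed result with a known-good value.
checkWStr( "LEFT",  left( wstr( "abcdef" ), 3 ),   wstr( "abc" ) )
checkWStr( "MID",   mid( wstr( "abcdef" ), 2, 3 ), wstr( "bcd" ) )
checkWStr( "LTRIM", ltrim( wstr( "  abc" ) ),      wstr( "abc" ) )

print failures; " failure(s)"
end iif( failures = 0, 0, 1 )   '' non-zero exit code signals a failed run
```

The exit code lets a script decide pass or fail without anyone reading the output; the same checks could then be repeated with a dynamic wide string type in place of the WSTR() arguments.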

The challenge with a dynamic wstring type, is that there is some support for wstring already built in to fbc, but not everything, and it is currently not consistent.  If we were implementing some other kind of type, say dynamic UTF8 where there are no built in fbc functions to use, I think the actual issues might be more obvious.

I hope you don't mind I make multiple posts.  I will try to work through your last post.  Thanks.


Jeff Marshall

Quote from: Juergen Kuehlwein on December 08, 2018, 01:14:56 PM
So the changes to the compiler are correct, but you see possible problems arising from multiple CAST operators. The type uses two different Cast operators in two slightly different versions:

Not really, sorry.  You are in the correct place for the fbc compiler code you want to influence, but not in the correct way.  Parts of rtl-string.bas haven't been changed since 2006, and it looks like they have never been updated to work with UDT's.

Specifically, for example from rtlStrLTrim()

if( dtype <> FB_DATATYPE_WCHAR ) then
    f = PROCLOOKUP( STRLTRIM )
else
    f = PROCLOOKUP( WSTRLTRIM )
end if

This is basically saying: if fbc didn't get a WSTRING here, then assume it's a STRING.  Most of the built in fbc string functions work this way, and it's all fine until you throw UDT's at them.  So I would say that's a bug in how fbc handles a UDT with a CAST as WSTRING PTR with some built in string functions.  Again, this part of fbc compiler code hasn't been touched in a decade.
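
A toy model of that two-way branch, only to make the fall-through visible. ToyType and pickLTrim are hypothetical names and have nothing to do with fbc's real data structures.

```freebasic
'' Toy model only: these names are invented and do not exist in fbc.
enum ToyType
    TOY_STRING
    TOY_WCHAR
    TOY_STRUCT      '' stands for a UDT such as a dynamic wide string class
end enum

function pickLTrim( byval dtype as ToyType ) as string
    '' Same shape as the branch quoted above: anything that is not WCHAR
    '' is treated as a plain STRING.
    if dtype <> TOY_WCHAR then
        return "STRLTRIM"
    else
        return "WSTRLTRIM"
    end if
end function

print pickLTrim( TOY_STRING )
print pickLTrim( TOY_WCHAR )
print pickLTrim( TOY_STRUCT )   '' falls into the STRING branch
```

In this model the UDT case lands in the STRING branch simply because it is not WCHAR, which mirrors the behaviour described for the built-in string functions.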

In comparison, LEFT & RIGHT are just overloaded functions, and actually have better integration with UDT's, though you do have to overload the function to let them work.

What we really want is that fbc knows we want to use the WSTRING version of LTRIM long before we ever get to rtlStrLTrim.  Which means that the issue is earlier on in the translation.