• Welcome to Jose's Read Only Forum 2023.
 

FreeBASIC CWstr

Started by Juergen Kuehlwein, April 09, 2018, 11:39:00 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Jeff Marshall

Yes, it looks like it only has to do with line endings.

One part of the problem is how fbc project is configured compared to how your system is configured.  And maybe, you are using a .gitattributes file locally to try and manage this CRLF/LF problem.  Not sure.  Regardless, it's the line endings that are causing the grief.

The other part, is that last summer, I applied a patch from SARG and still got mixed line endings stored to the repo.  Some of the changed files you are seeing are all the files that got merged from pull request 96.  https://github.com/freebasic/fbc/pull/96 (like emit.bi edbg_stab.bas).  Where we are just now seeing the LF -> CRLF conversion.

So,

We do want every file we get from freebasic/fbc to have CRLF line endings on windows, and LF on linux.

From fbc wiki https://www.freebasic.net/wiki/wikka.php?wakka=DevGettingTheSourceCode
: The recommended setting for core.autocrlf is true, so that the FB source code in the working tree will have CRLF line endings.

Locally, for a single repo, looks like setting autocrlf can be stored in ./git/config file.  (git config --local).  Need to make sure that works like I think it should.

From this site: https://ywjheart.wordpress.com/2017/03/22/autocrlf/
: use core.autocrlf = true if you plan to use this project under Unix as well

Also, if the setting autocrlf has been changed, or in .gitattributes, then the changes may not get seen right away
Info in https://help.github.com/articles/dealing-with-line-endings/

heh, I'm trying.  Also, the development tree is shared by many people, so I keep most scripts, batch files, IDE files, tools, etc, outside of the fbc directory.

Juergen Kuehlwein

Well, you can set autocrlf systemwide, global and local. All of these settings end up in a config file in the corresponding location. As i understand it setting autocrlf to true doesn´t affect .bas/.bi files, it´s for .txt files (at least for me it doesn´t convert -bas/.bi files at all - i tried it). Setting it to false removes all .txt files from the list of files GIT reports to have been modified. So i don´t have to deal with them anymore.

My IDE cannot work with LF line endings (which btw isn´t Windows standard), so i must convert them to CRLF. I tried .gitattribute, but you told me not to use it. Now i wrote a little utility doing that for me (all files in src\compile and all new files i added). Copying these files (with CRLF line endings) to Linux (Ubuntu64 on a virtual machine), didn´t cause any problems there, neither did the test files.

Which files must have LF line endings ? Most files in src\compiler already have CRLF line endings anyway, it´s only a few, which i must convert. If i had a list of them, i could write an (expandable) utility for converting them from CRLF to LF (and vice versa) as needed. A simple .txt file for listing directories or single files could supply input and the utility would convert these files for Windows or Linux. What do you think ?


JK

Jeff Marshall

core.autocrlf affects all TEXT files.  Which could be ANY filename.ext, that does not have binary content.  With correct autocrlf setting, and no extra .gitignore or .gitattributes, I don't think you should have to write scripts to deal with line endings.

Easiest way to see the difference is do a new clone & checkout, forcing the autocrlf setting for comparison.

Clone & checkout into fb.0 with whatever line ending was stored in the repo (should be all LF):
git -c core.autocrlf=false  clone https://github.com/freebasic/fbc fb.0

Clone & checkout into fb.1, converting all TEXT files to CRLF line endings:
git -c core.autocrlf=true clone https://github.com/freebasic/fbc fb.1

autocrlf setting is also important when making commits back to fbc repo.
When committing to from windows, autocrlf=true (CRLF's get converted back to LF's)
When committing to from linux, autocrlf=false (line endings get stored as-is which should already have been LF)

The only files that must be LF are
./manual/cache/*.wakka (raw data files from wiki)
./tests/file/big-list-eol-lf.txt (a specific test that needs LF on all targets)

Juergen Kuehlwein

QuoteEasiest way to see the difference is do a new clone & checkout, forcing the autocrlf setting for comparison.
that´s exactly what i did! Regardless of the autocrlf setting, i always have some files with LF line endings after cloning in src\compiler - i spend days trying to get GIT do what i want (CRLF as line endings right after cloning), it just didn´t work. Finally i gave up on it and it took me ten minutes to write own code fixing the issue.

QuoteThe only files that must be LF are
./manual/cache/*.wakka (raw data files from wiki)
./tests/file/big-list-eol-lf.txt (a specific test that needs LF on all targets)
I didn´t touch any of these files, nor were any of them part of a commit, so there should be no problem.

What about just testing and evaluating my changes and additions, while i on my part work on a solution to preserve line endings just as they are in the master branch? Or why not have line endings for the source code consistent - either all LF or all CRLF, instead of having some with LF and having others with CRLF?

Or add a file e.g "rules.txt" specifying rules like naming conventions, required line endings, exceptions from the rule and the like for the root folders of the repo. This would make things clear once and for good.


JK

Juergen Kuehlwein

#184
Re-reading the info in https://help.github.com/articles/dealing-with-line-endings/, what about a .gitattributes file in the master branch? Properly configured this would solve the problems we have for good too, because .gitattributes did work for me - but you advised me not to use it. I understand now that my implementation was improper for general use, even if it worked for me.

But it should be possible to setup a generic .gitattributes file. Because of the fact that it overrides other GIT settings, it would make clones independent of individual user settings in this repect and ensure proper line endings everywhere.

Please, think about it.


JK

 

Juergen Kuehlwein

Jeff,

i ran further tests showing that .gitattributes could really be the solution. As already mentioned above setting autocrlf to true doesn´t convert all .bas/.bi files properly to crlf - emit.bi and others keep failing!

Adding a .gitattributes at root level like this

# Auto detect text files and perform LF normalization
* text=auto
# BASIC source files (need cr/lf line endings)
*.bas text
*.bi text
*.inc text
*.h text
*.bat text
*.txt text
*.jkp text
*.rc text

*.gif binary
*.jpg binary
*.png binary

./manual/cache/*.wakka eol=lf
./tests/file/big-list-eol-lf.txt eol=lf


does the trick. Astonishingly enough all other problems i had with .txt and .sh files seem to be gone as well.

You may have to adapt and refine what i posted here, but in my opinion this the way to go in order to avoid the mess we currently have. It does what i already proposed above, it allows for setting rules for line endings per file type, it s independent of user settings because it overrides them. That is, it forces consistent line endings for the corresponding OS. On the contrary to autocrlf it works - and you can control it.


JK

Jeff Marshall

Hey JK, yes, I've seen all your updates.  It's only been a few days, dude!  Please allow me some time to review and respond.  I really want to get the 1.06 release out first.  Then we can be more liberal about what gets merged in to fbc/master.  Just after a release is a good time to add new features.

Juergen Kuehlwein

QuoteI really want to get the 1.06 release out first

Yes, do that! (and then let´s go on ... :-))


JK

Juergen Kuehlwein

Hi José,


after a little break here is an update of the current situation:

After the 1.06 FreeBASIC release Jeff basically agreed to merge in my pull request. There might be changes to some additions (extra string and array handling functions) but he definitely supports the integration of USTRING into the compiler´s code.

That is, we can finally get rid of the need to prepend "**" for variables and expressions at all. All of these changes work with my implemetation (CWSTR stripped down to the bare minimum) as well as with your original CWSTR. Existing code doesn´t have to be changed, you can still use "**", but you don´t need to anymore.


My code differs in two minor points from yours (CWSTR):
- i had to move "m_pBuffer" in first place of the member variables, this makes an improved "STRPTR" possible


TYPE DWSTR
  m_pBuffer AS UBYTE PTR                              'pointer to the buffer (in first place -> make strptr possible)

  Private:
    m_Capacity AS Ulong                               'the total size of the buffer
    m_GrowSize AS LONG = 260 * SizeOf(WSTRING)        'how much to grow the buffer by when required

  Public:
    m_BufferLen AS ulong                              'length in bytes of the current string in the buffer


STRPTR now returns a WSTRING PTR for Wstrings and Ustrings, which makes much more sense than a previously returned ZSTRING PTR. This change doesn´t break existing code, because you couldn´t use a ZSTRING PTR for processing a Wstring, you had to cast it to a WSTRING PTR or ANY PTR anyway.


- i had to remove the "CONST" qualifier in one place, this was necessary for some string handling functions to compile


    DECLARE OPERATOR CAST () BYREF AS WSTRING
    DECLARE OPERATOR CAST () AS ANY PTR



With these two small changes (which shouldn´t break anything) your CWSTR will be fully compatible to my compiler changes. Basically my USTRING implementation is written in a way, that (if pesent) WINFBX is the preferred way for adding dynamic wide strings. My code is only a fallback, if WINFBX is not used or cannot be used (e.g. LINUX).

Do you see any problems applying these changes to future releases of WINFBX?


JK

José Roca


Jeff Marshall

Hi Juergen, I have been reviewing your code last few weekends.

Your original code rebased on to current master I pushed here: https://github.com/jayrm/fbc/tree/jklwn-ustring
Overall, it's a big change to the compiler, though looks like you have made changes in the right places.  However, the logic in some places ignores cases that might be present.

I've been working on adding the changes step-by-step here: https://github.com/jayrm/fbc/tree/udt-wstring

I'm using this new syntax to indicate that a UDT should behave like a wstring (or a zstring):

type T extends wstring '' or zstring
  '' ...
end type


For testing, I am using a minimal implementation.  As far as the compiler changes go, I'm not really interested in how well the UDT works as a dynamic wstring, just that it works as wstring:

type UWSTRING_FIXED extends wstring
private:
_data as wstring * 256
public:
declare constructor()
declare constructor( byval rhs as const wstring const ptr )
declare constructor( byval rhs as const zstring const ptr )
declare operator cast() byref as wstring
declare const function length() as integer
end type

constructor UWSTRING_FIXED()
_data = ""
end constructor

constructor UWSTRING_FIXED( byval s as const wstring const ptr )
_data = *s
end constructor

constructor UWSTRING_FIXED( byval s as const zstring const ptr )
_data = wstr( *s )
end constructor

operator UWSTRING_FIXED.Cast() byref as wstring
operator = *cast(wstring ptr, @_data)
end operator

const function UWSTRING_FIXED.length() as integer
function = len( _data )
end function

operator Len( byref s as const UWSTRING_FIXED ) as integer
return s.Length()
end operator


I'm going step-by-step to determine what changes are needed, why they are needed, and fully test.  For example, test-suite for my branch not passing on Travis-CI due memory leaks I didn't notice (double free).  That could be due my changes, or due missing overloads the minimal implementation.

Also I have been investigating some of the long standing WSTRING bugs posted on sf.net while working on review of your code.

A requirement that the m_pBuffer element be first element should not be required.  Instead, fbc should be calling an operator overload to get the data's address if the user has defined one.  Or maybe it needs to be a requirement of the UDT that the user writes.

Jeff Marshall

Yeah, we are basically assuming that the existing wstring implementation is good, but I keep running in to annoying bugs.  Like this latest one, I found:

dim w as wstring * 50 = " "
dim r as wstring * 50 = trim(w)


Juergen Kuehlwein

#192
Jeff,


"EXTENDS WSTRING" in place of "#PRAGMA ..." is fine! But why not use the type in "ustring.bi" for testing? It is thoroughly tested, contains all necessary operators and overloads and thus helps avoiding type related errors.

QuoteA requirement that the m_pBuffer element be first element should not be required.  Instead, fbc should be calling an operator overload to get the data's address if the user has defined one.  Or maybe it needs to be a requirement of the UDT that the user writes.

m_pBuffer being the first element makes it possible to change the compiler´s code for "STRPTR" to work with WSTRING and USTRING. For both WSTRING and USTRING it returns a WSTRING PTR to the wide string data, this is different from what it did before (return a ZSTRING PTR for a WSTRING), which is - at least - undesirable.


JK 

Juergen Kuehlwein

The problem with:

dim w as wstring * 50 = " "
dim r as wstring * 50 = trim(w)


is a "ssize_t" overflow problem. Adding two lines to "...rtlib\strw_trim.c" solves the problem:


        ...

chars = fb_wstr_Len( src );
if( chars <= 0 )
return NULL;

if( wcscmp(src, L" ") == 0)        //added
return NULL;               //added

        ...



JK

Jeff Marshall

#194
I am still using your "ustring.bi" and "tests/ustring/*.bas" at each step to check that the changes to fbc still work with what you expect in "ustring.bi".  But "ustring.bi" is too complex to specifically test each change in the compiler, and most of the features in ustring.bi are not specifically needed to test the changes in fbc compiler.  So I am using a simplified UDT to test each compiler change.  In the end, "ustring.bi" will work too.

FYI, hides the problem, doesn't solve it.

if( wcscmp(src, L" ") == 0)        //added
return NULL;               //added