FreeBASIC CWstr

Paul Squires · January 02, 2019, 05:39:17 AM

Quote from: Juergen Kuehlwein on January 01, 2019, 07:39:49 PM
@Paul,

obviously you are following this thread and you could be of great help, because your WinFBE is a large project implementing José´s WinFBX.

Currently i´m able to compile your sources (all "**" removed) and as far as i can tell, it runs as expected. Because it is your project, you must have better means of testing than i have. Testing only one´s own code is by no means sufficient for detecting all bugs, as we both definitely know. Would you help me test the compiler changes with your project? There soon will be a new version with generic routines as Jeff requested and i would appreciate someone as experienced as you running tests with it.

JK

Hi JK,

Yes, I am following this thread because I have a huge investment in wanting a dynamic unicode string data type to succeed in FreeBASIC. Obviously, my preference would be for such a native data type to be built into the compiler but as Jeff Marshall has intimated, that is a huge undertaking.

I am using Jose's CWSTR class for all my unicode string needs at the moment and I am happy with the implementation. I am hesitant to change.

I can not use WinFBE as a testing bed for your proposed changes. As you can appreciate, the WinFBE code base is very large and is constantly changing as it is an unfinished product. I can not introduce more risk into the code base at this time as any potential encountered problems would slow WinFBE development to a crawl. I can not spend time chasing unicode errors when I have so much other work to do on the editor and visual designer. I will try to help test later once WinFBE is further developed and I can fork a version just to be used for unicode testing.

As an aside, I have started a coding style guide (written using GitHub markdown) and am fleshing out the various sections based on guides I have read for VB, C, C++, .NET, and others. Needless to say, there are widely differing opinions on some topics but others are very uniformly adopted. I will use the guide in my own programming for a while first and once fleshed out a bit then I'll post it for open criticism. I have no illusions that such a document would ever become a standard guide for code styling for FB.

Juergen Kuehlwein · January 02, 2019, 12:27:52 PM

Hi Paul,

thanks for your reply. I wasn´t actually asking you to change anything! Ustring is a fallback for Linux (maybe others) and for those, who for whatever reason don´t want to use José´s WinFBX in Windows. So please, stick to WinFBX! The new compiler version allows for skipping "**" in general (even with WinFBX), but of course you can stick to it too, if you want.

Quote: ...and I can fork a version just to be used for unicode testing.

This is more what i´m asking you to do. You don´t need to manage a public fork, just for yourself take what you have, remove all "**" (for the new syntax) and use the new compiler version for compiling - just for a test. You know your code better than i do. So you know the critical sections, where it is most likely to fail, if unicode doesn´t work properly - you know better how to test your work than i do.

In other words:

Your part would be to look, if there is a different behavior between versions of your code compiled with the existing compiler version and the new compiler version using the new syntax (no "**") - not all the time, but e.g. when you finished a new version. In such a case you would tell me: it works as expected, or: the new compiler raises a problem with...

My part would be to supply the new compiler versions (see attachment) and then possibly to find out, what exactly is going wrong and why.

Attached is a new compiler version, which should meet Jeff´s wishes about a more generic approach, the code in the repo hasn´t been cleaned yet, i will do that, when i´m sure that everything works.

JK

Juergen Kuehlwein · January 04, 2019, 07:46:57 PM

My latest "ustring.inc" + "test.bas" seem to work in Linux too (attached)

JK

Juergen Kuehlwein · January 05, 2019, 12:27:36 AM

Jeff,

i pushed cleaned code to my fork (https://github.com/jklwn/fbc), "ustring" is still used as a marker, until we can be absolutely sure, that there are no bugs anymore. All my tests show, that there are no more bugs in Windows and Linux - please re-check this.

There is a new repository for "ustring.inc" + "test.bas" here (https://github.com/jklwn/ustring), which contains the latest version of both files and some other files you can ignore.

JK

Jeff Marshall · January 05, 2019, 11:51:02 PM

Quote from: Paul Squires on December 31, 2018, 04:07:06 PM
I love the idea of having a formatting code rulebook for the compiler (or FB source code in general).

>Would you be cool if I start assembling a list of such formatting items...?

Sure, go for it, though I think it will be difficult to get everyone to agree. For the compiler, the main items for me are:
- TAB character for indent
- lines less than 70 or 80 characters if possible
- comments start with double apostrophe '', indented to same level of scope
- comments on their own line preceding the executable statements
- there's different "rules" for rtlib/gfxlib2 source because it's in C

There's probably many habits I have, that I don't even think about. Mostly I follow the "style" of what's already in the code base. Sometimes when I go back and look at old code, I can't tell if I wrote it, or v1ctor wrote, or dkl wrote it, because we all basically follow what's already there. I think I could probably write a short story on how I format code and why, though I'm not sure it would matter to anyone but me. Maybe if you ask some specific questions, I could answer with an opinion.

The important goal is consistency. When reading through 1000's of lines of code, it doesn't matter too much what the style is (everyone will have their own preference). It matters more that it is all roughly the same style, making it easier on the eyes with few disruptions/distractions. When I was rewriting the test-suite, I thought of creating a simple code formatting program, just to apply a few basic rules just to sanitize the code (mostly whitespace related) before committing. I think in the end I used a sed script. dkl was a little irritated at all the white-space changes. In future, I would do the white-space changing commits separate from the content changing commits.
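As a rough illustration of the kind of whitespace clean-up mentioned above, here is a minimal sketch that strips trailing spaces and tabs with sed. It is not the script that was actually used on the test-suite; the file name `sample.bas` and its contents are made up for the example.

```shell
set -e

# Hypothetical sample file with trailing spaces/tabs (not from the fbc tree).
workdir=$(mktemp -d)
file="$workdir/sample.bas"
printf 'print "a"  \t\n\tprint "b"\t\n' > "$file"

# Remove trailing blanks (spaces and tabs only) from every line.
# Leading TAB indentation is left untouched.
sed -e 's/[[:blank:]]*$//' "$file" > "$file.clean"
mv "$file.clean" "$file"

cat "$file"
```

Committing the result of such a pass on its own, before any content change, keeps the whitespace noise out of the diffs that reviewers actually need to read.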

Jeff Marshall · January 06, 2019, 12:08:30 AM

Quote from: Juergen Kuehlwein on January 01, 2019, 07:22:00 PM
Formatting...

> My formatting rules are:
> ...

Yeah, if I look back at my code from 2005 or earlier, I have about similar style, mostly due to habits from using QB/VB editors and small display screens in the 1990's. When I started on FreeBASIC project I changed my style to match. Some habits I kept and so new code I write, even if it is just for me has different style than what I would have written 20 years ago.

Jeff Marshall · January 06, 2019, 12:46:42 AM

Quote from: Juergen Kuehlwein on December 30, 2018, 11:18:00 PM
Did you have a look at "test.bas", which tries to test the applied changes? Should i adapt it to the format used in \tests?

Yes. And yes, eventually. Any place you change in fbc code needs a test. I noticed your reference implementation in ustring.inc + parser-procall-args.bas adds "TALLY", "PARSECOUNT", etc, features. This is beyond what I'm familiar with.

> Regarding astNewCONV: i changed the codeflow for conversions to Single and Double from Ustring and i changed "ldtype" to "FB_DATATYPE_WCHAR" whenever an ustring is passed. As far as i understand it, this doesn´t break anything, it enables ustrings to be processed like wstrings, which seems to work in my test.bas and other places - or did i miss something?

I started to investigate this. It's difficult, but not impossible, to work with your branch. I will comment more in another post.

Jeff Marshall · January 06, 2019, 02:35:40 AM

Quote from: Juergen Kuehlwein on January 05, 2019, 12:27:36 AM
i pushed cleaned code to my fork (https://github.com/jklwn/fbc), "ustring" is still used as a marker, until we can be absolutely sure, that there are no bugs anymore. All my tests show, that there are no more bugs in Windows and Linux - please re-check this.

There is a new repository for "ustring.inc" + "test.bas" here (https://github.com/jklwn/ustring), which contains the latest version of both files and some other files you can ignore.

Ok, I started to look at your previous branch from last week. I have not gone in depth to your latest branch; just saw it a couple hours ago.

I know I am being picky (specific, pedantic, critical) about your branch. So, maybe if I provide some context, you will understand why.

1) For perspective, here is what my local repository looks like: jayrm-fbc-graph-20190105.png
- To switch between branches (git checkout), I don't expect to have to do much.
- My focus currently is 1.06.0 branch because I am creating releases.
- jklwn/JK-USTRING is your most recent push to your public repo
- jk-ustring is my local branch, with edits that clean up all the meaningless differences, get rid of the "#compile" directives that are specific to your IDE, etc. As of your previous jklwn/JK-USTRING branch, the actual number of files changed is about 10 files.

2) When I compare origin/master to jklwn/JK-USTRING, here's what I see: jayrm-jklwn-diff-20190105.png
- It shows me differences that I don't care about. Many differences are meaningless and are the result of the way you are working with fbc/master and git checkout
- you really need to set your git config core.autocrlf = true. Then all files that are checked out from git will be converted to CRLF line endings and you can get rid of '.gitattributes' file
- eventually, before you create a pull request to fbc/master, you should do a git rebase so that your end result is concise number of commits. No fault to Marc Pons, because it's only in hindsight, that's what should have been done with __FB_GUI__ pull request. But it never happened. I'm trying to learn from past experience.
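For reference, the line-ending setting mentioned in the list above is written on the command line without the "=" sign. A minimal sketch, run in a throwaway repository so that no real configuration is touched (the temporary directory is only for the example):

```shell
set -e

# Throwaway repository, so the setting does not leak into a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Set the option for this repository only; add --global to apply it to all repos.
git config core.autocrlf true

# Read the value back to confirm it was stored.
value=$(git config --get core.autocrlf)
echo "core.autocrlf = $value"
```

The setting affects files as they are checked out, so a working tree that already exists may need a fresh checkout before the line endings change.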

3) And when I look at specific code: jayrm-jklwn-file-diff-20190105.png, these changes are now meaningless. As you spend more time in the code base, these kinds of comments are not needed since they should be obvious just from reading the code.

4) Copyright on ustring.inc

Quote from: ustring.inc
' ****************************************************************************************
' This code is copied and adapted from WinFBX with explicit permission of José Roca
' under the condition that the original copyright applies (see below).
' All changes and additions are Copyright (c) 2018 Juergen Kuehlwein
' Freeware. Use at your own risk.
' THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
' EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF
' MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
' ****************************************************************************************

' ########################################################################################
' Microsoft Windows
' Implements a dynamic data type for null terminated unicode strings.
' Compiler: Free Basic 32 & 64 bit
' Copyright (c) 2016 Paul Squires & José Roca, with the collaboration of Marc Pons.
' Freeware. Use at your own risk.
' THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
' EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF
' MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
' ########################################################################################

- What is intent here for licensing its use? Included as part of FreeBASIC releases?
- As FreeBASIC developer, what am I allowed to do? Can I distribute? You are stating that if I modify it, you still have copyright on it. I don't think that is compatible with FreeBASIC's other licenses.
- This needs to be addressed. For ustring.bi reference implementation, the licensing terms can not be any less restrictive than current rtlib/gfxlib2 licensing. Of course, you can create a full featured, any extensions you like version with any copyright/licensing you choose otherwise. But, for anything that is packaged with FreeBASIC, the developer team needs control over licensing.
- Also, if it's to be included as part of FreeBASIC release, to be used by users, then we should look at separate .bi interface + .bas implementation; not all users #include sources from one main file.

So, ustring.inc or some variation of it, eventually needs to get added as reference implementation. As a reference implementation, it does not need to be exactly like DWSTRING, CWSTR, etc. It's only there to test the new feature of UDT => WSTRING implicit casting.

For me, to test your branch, I must edit several files in your branch, create a new branch, and create the automatic tests that I can run on each system. For the current time, I can give you some guidance only. Otherwise, just now, it just takes too much time to switch branches and work with your code.

I hope I explain in a way that is not too discouraging.

Paul Squires · January 06, 2019, 04:42:33 PM

Quote from: Jeff Marshall on January 05, 2019, 11:51:02 PM
Sure, go for it, though I think it will be difficult to get everyone to agree. For the compiler, the main items for me are:
- TAB character for indent
- lines less than 70 or 80 characters if possible
- comments start with double apostrophe '', indented to same level of scope
- comments on their own line preceding the executable statements
- there's different "rules" for rtlib/gfxlib2 source because it's in C

There's probably many habits I have, that I don't even think about. Mostly I follow the "style" of what's already in the code base. Sometimes when I go back and look at old code, I can't tell if I wrote it, or v1ctor wrote, or dkl wrote it, because we all basically follow what's already there. I think I could probably write a short story on how I format code and why, though I'm not sure it would matter to anyone but me. Maybe if you ask some specific questions, I could answer with an opinion.

The important goal is consistency. When reading through 1000's of lines of code, it doesn't matter too much what the style is (everyone will have their own preference). It matters more that it is all roughly the same style, making it easier on the eyes with few disruptions/distractions. When I was rewriting the test-suite, I thought of creating a simple code formatting program, just to apply a few basic rules just to sanitize the code (mostly whitespace related) before committing. I think in the end I used a sed script. dkl was a little irritated at all the white-space changes. In future, I would do the white-space changing commits separate from the content changing commits.

Thanks Jeff, I started such a document because I've spent the past week refactoring my source code. It is amazing the number of nuances that you encounter when trying to stylize code. You are right about consistency though... style is very subjective, but if you deviate from that style then it is confusing not only to you as the programmer, but to the reader as well.

My document has evolved past a simple style guide for the fb compiler source code. It is more now like a discussion on various style topics and in particular how I am formatting my code. Kind of like a self documentation exercise that may prove useful to the greater FB community at some point if others wish to have a starting point for their code formatting "rules"

One thing is for sure, such a document would surely stir some interesting debate.

Here are some of the topics so far:
- Disk File Structure and Layout (src, bin, doc, inc, lib, tests, etc)
- Filenames
- Indentation (Tabs vs. Spaces)
- Blank lines
- Whitespace
- Line length
- Comments
-- Header boilerplates (licenses)
-- File header comments
-- Function description comments
-- Multiline comments
-- Single line comments
-- End of line comments
- Keyword and variable casing
- Types, Classes, Enums (naming and format)
- Subs/Functions and use of pedantic ByRef, ByVal, etc
- Variable declarations (one or more per line? initialization on same line? Grouping of similar definitions? Top of file or next to use area?)
- Variable names (upper, lower, camel, pascal, underscores, hungarian, etc)
- Spacing (whitespace amongst keywords, parenthesis, unary operators)
- Line breaking (multiple lines via ":" operator, and "_" underscore line continuations)
- Long vs Integer (issues switching between 32/64 systems and Windows API)
- Private vs Public Functions
- Modules vs Includes (linking multiple single object files vs #include source files into main file)
- Sub vs Function (make everything Function for consistency and ease of future changing of code use?)
- Formatting of (If ElseIf Then, Select Case, Do Loop, For Next)

Juergen Kuehlwein · January 06, 2019, 05:52:05 PM

Jeff,

Quote: in a way that is not too discouraging

no - not at all.

the latest commit is the first one you should work with, because i removed a lot of unnecessary things and i hope there won´t be much changes any more. See below for more...

Quote: This is beyond what I'm familiar with

I added functions for string manipulation, which proved to be useful in PowerBASIC. It´s mostly clones of functions José included in his WinFBX but with a more PB-like syntax. In order to have a consistent syntax for the "ANY" keyword and in order to enable the function "Pathname_" to work just like in PB, i added code in parser-procall-args.bas, which makes this possible. These functions are not a necessary part of ustring.inc and will be placed in a separate file later, but for testing it´s easier to have it in one single file - at least i thought so.

Quote: and create the automatic tests that I can run on each system

no - you just have to run test.bas, which is a collection of tests trying to address all possible aspects of implementing ustrings and a collection of tests for the new string functions which simultaneously test ustrings themselves too.

Quote: What is intent here for licensing it's use? Included as part of FreeBASIC releases?

Basically i wanted not to be held liable for any damage, which might arise from using this code, and i wanted to prevent anyone else from making money out of it, because it´s free. Otherwise you can do with it, whatever you want. In other words: José, Paul, Marc and me don´t want anyone else to be able to take ownership and sell this code for money in any way. It is acceptable though creating a commercial application, which implements this or derived code as a part of its source code.

If this code is to become a part of the FreeBASIC distribution, there will be no problem on my side (and i think José, Paul and Marc agree too) changing it to whatever is needed for FreeBASIC. It will be "ustring.bi" then and it will contain only what´s necessary for the type to work, everything else, will be in one or more (i´m working on additional array processing functions) separate .bi files, if the developers accept it.

What we currently have, is still a version for testing and not for a release!

So what can we do about my branch in order to make switching branches less work for you ?

I´m new to GIT, actually i just recently wrote a GIT integration for my IDE making use of TortoiseGit, but honestly i don´t understand, what each and every change of setting or implementation of Git-commands does to the code or to the remote repository. So far i managed to work with it, but i´m definitely not an expert. I did set git config core.autocrlf = true!

In case there still are bugs, which need to be fixed, i need some "markers" in the code, which allow for fast finding all places, where i added or changed it. I hate doing things twice (digging through 170 code files), therefore i usually take precautions not to have to do it twice - i add a specific comment.

I could remove all the end of line comments and add something like "(ustring)" to the preceding comment or add "ustring" as a new comment. Would this be acceptable?

Is there a way to transfer your cleaned code then to my fork, or maybe create a new fork, i could use in the future? The one i´m currently working with does contain these differences (spaces vs. tabs, LF/CRLF) and i don´t know how to get rid of all of them in order to make you happy.

I will try to rewrite test.bas so that it can be added to \test. Is there a tutorial other than the attached readme?

JK

Juergen Kuehlwein · January 07, 2019, 12:05:01 AM

Jeff,

thinking about it, i could create a new fork (hopefully avoiding the previous errors) and re-apply my changes. Then i would discard the current one and you could use the new (cleaner) one. Does this sound more acceptable?

Furthermore i would add a cleaned "ustring.bi" (and maybe others) to the \inc folder and i would add an "ustring" folder to the \tests folder containing tests for ustrings + related additions.

Where should i put adjacent documentation, and which format would you prefer?

JK

Jeff Marshall · January 11, 2019, 10:20:18 PM

Quote from: Juergen Kuehlwein on January 07, 2019, 12:05:01 AM
i could create a new fork (hopefully avoiding the previous errors) and re-apply my changes. Then i would discard the current one and you could use the new (cleaner) one. Does this sound more acceptable?

I think that's a good approach. I often do this myself. I will work in a branch for a while as a work in progress (WIP). And then create a new branch to reapply the changes from the WIP. Doing this will clean up all the temporary commits where the feature is being revised, and the end result is a commit history that is much easier for the developers (including yourself) to follow; making it all the more likely that the pull request will be accepted without too many more revisions.
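One possible way to reapply a work-in-progress branch as a single clean commit is `git merge --squash`. The sketch below is only an illustration of that idea in a throwaway repository; the branch names `base`, `wip` and `clean`, the file name and the commit messages are all made up, and `base` merely stands in for the upstream master branch.

```shell
set -e

# Throwaway repository with a local identity, so commits work anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# One commit standing in for upstream master.
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git branch -m base

# Work-in-progress branch with several temporary commits.
git checkout -q -b wip
echo "try 1" >> file.txt
git commit -q -a -m "wip: first attempt"
echo "try 2" >> file.txt
git commit -q -a -m "wip: fix"

# New branch from the base; stage the combined WIP changes, commit once.
git checkout -q -b clean base
git merge -q --squash wip
git commit -q -m "Add feature (squashed)"

# Number of commits on 'clean' that are not on 'base'.
count=$(git rev-list --count base..clean)
echo "$count"
```

An interactive rebase (`git rebase -i`) is another way to reach a similar result when more than one final commit is wanted.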

> Furthermore i would add a cleaned "ustring.bi" (and maybe others) to the \inc folder and i would add an "ustring" folder to the \tests folder containing tests for ustrings + related additions.

This sounds correct to me. ./inc/ustring.bi implements the new feature backed by changes in ./src/compiler, and ./tests/ustring for test modules.
How many new ".bi" files? Maybe in "./inc/ustring/*.bi" then?

> Where should i put adjacent documentation, and which format would you prefer?

Well, initially, it could just be a ./tests/ustring/ustring.txt document file, but as a permanent feature of fbc, then it should get added to the wiki. This feature is a little different than anything we've done before, so I would probably just start off with a single page linked from wiki's DocToc and then go from there.

Juergen Kuehlwein · January 16, 2019, 04:34:48 PM

Ok Jeff,

please check the "ustring" branch at my fork (https://github.com/jklwn/fbc).

I hope this is better now. All unnecessary code changes were removed, \tests now contains an \ustring folder for ustring tests and dirlist.mk an "ustring \" line. All tests run successfully, 32 and 64 bit. I updated "readme.txt" in \tests a bit. A short documentation is in "ustring.txt", and "ustring.bi" and "stringex.bi" were added to \inc.

I´m still struggling a bit with GIT: i had to set core.autocrlf = false in order to prevent GIT from doing unwanted things to .txt files. Setting filemode = false didn´t prevent GIT from staging all (so far untouched by me) .sh files, so i added "*.sh" to .git\info\exclude, which hopefully fixes this. And i had to change the line ends of several files (e.g. emit.bi) from LF to CRLF to make them usable for me. Tell me, if there still is something "wrong" (and maybe how to make it better).

JK

Juergen Kuehlwein · January 20, 2019, 11:50:23 AM

Jeff,

there is a new commit at my fork (https://github.com/jklwn/fbc). I added some generic array processing macros for USTRINGs and all other array types (including tests in \tests\array, and a short documentation in array.txt).

In the meantime i realized that adding "*.sh" to .git\info\exclude doesn´t actually exclude these files, it deletes them, which is not what i want either. So i added all *.sh again with this commit. But there may be chmod changes and i don´t know how to keep these files without changing chmod (which is a Windows problem, because Windows doesn´t support an executable flag).

Please check and tell me, if you still see problems with my branch.

JK

Juergen Kuehlwein · January 20, 2019, 01:31:40 PM

Jeff,

i think i finally found a method of leaving *.sh files just as they are. I must remove them manually one by one from the list of files to commit - quite cumbersome, but seems to work. If you know of a more convenient method, please let me know!
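A possible alternative, sketched here under the assumption that the unwanted changes are executable-bit (mode) changes: git can be told to ignore the working tree's file mode, and the executable bit can be recorded directly in the index. Whether this resolves the behaviour seen with TortoiseGit is not confirmed; the repository and the file `build.sh` below are made up for the example.

```shell
set -e

# Throwaway repository; build.sh is a made-up stand-in for the real scripts.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Ask git not to treat executable-bit differences as changes.
git config core.fileMode false

printf '#!/bin/sh\necho hello\n' > build.sh
git add build.sh

# Record the executable bit in the index, independent of the file system.
git update-index --chmod=+x build.sh

# First column of 'git ls-files -s' is the mode stored in the index.
mode=$(git ls-files -s build.sh | cut -d' ' -f1)
echo "$mode"
```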

I´m going to make a pull request now.

JK