Jose's Read Only Forum 2023
 

Moving TCLib Into Mainstream

Started by Frederick J. Harris, April 02, 2016, 12:03:13 AM



James C. Fuller

Fred,
  I've been busy on the File I/O front, too.
As I mentioned, I could not get your FILE approach to work with bc9, so I wrote my own.
I have also never been happy with the way BCX/bc9 handled Line Input. It translated:

OPEN a$ FOR INPUT AS fp1
WHILE NOT EOF(fp1)
  LINE INPUT fp1, a$
  ? a$
WEND

To This:


if((fp1=fopen(a,"r"))==0)
{
fprintf(stderr,"Can't open file %s\n",a);exit(1);
}
while(!EoF(fp1))
  {
    a[0]=0;
    fgets(a, 1048576,fp1);
    if(a[strlen(a)-1]==10)a[strlen(a)-1]=0;
    printf("%s\n",a);
  }


Notice the size of the buffer: fgets is told it may store up to 1,048,576 bytes (1 MB) in a.
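For comparison, here is a minimal sketch of the same loop in portable C++ with a modest, explicitly sized buffer. It tests fgets' return value instead of calling an EOF function first, and it strips the line ending without indexing `strlen(a)-1`, which points before the buffer if fgets ever leaves it empty. The helper name `read_lines` is hypothetical; lines longer than the buffer are simply reassembled from several fgets calls.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Reads every line of fp, dropping the trailing '\n' and, for CrLf files
// read in binary mode, the '\r' in front of it.
std::vector<std::string> read_lines(std::FILE* fp)
{
    std::vector<std::string> lines;
    char buf[512];                              // modest, fixed-size read buffer
    std::string line;
    bool pending = false;                       // true while a line is partially read
    while (std::fgets(buf, sizeof buf, fp))     // stops on end of file or error
    {
        line += buf;
        pending = true;
        if (line.back() == '\n')                // a complete line has arrived
        {
            line.pop_back();                    // drop '\n'
            if (!line.empty() && line.back() == '\r')
                line.pop_back();                // drop '\r' of a CrLf pair
            lines.push_back(line);
            line.clear();
            pending = false;
        }
    }
    if (pending)                                // last line had no newline
        lines.push_back(line);
    return lines;
}
```

With a 512-byte buffer, a 2,000-character line costs a few extra fgets calls instead of a megabyte of buffer.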

The bcx/bc9 translations of GET and PUT are macros, which some compilers complain about because the value returned by fread (or fwrite) is simply discarded:

GET$   FP1,A$,6

#define GET(A,B,C)fread(B,1,C,A)
#define PUT(A,B,C)fwrite(B,1,C,A)
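One way to keep the BASIC-style argument order while handing the byte count back to the caller is to use inline functions instead of bare macros. This is only a sketch; `get_bytes` and `put_bytes` are hypothetical names, not part of bcx/bc9.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// GET$ fp, buf, n  ->  reads up to n bytes into buf; returns the count actually read
inline std::size_t get_bytes(std::FILE* fp, void* buf, std::size_t n)
{
    return std::fread(buf, 1, n, fp);
}

// PUT$ fp, buf, n  ->  writes n bytes from buf; returns the count actually written
inline std::size_t put_bytes(std::FILE* fp, const void* buf, std::size_t n)
{
    return std::fwrite(buf, 1, n, fp);
}
```

The caller can then check for a short read (for example at end of file) instead of having the result thrown away inside the macro expansion.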

Your foray into TCLib and my subsequent investigations led me to rewrite File I/O for bc9.
I have the luxury of using anything I want in source form with bc9; I do not have to stuff it into a library.
Because of the way I handle UNICODE with the ULEX lexer, I can write generic code and either ULEX it or not.

I wrote a specific LineInput routine for TCLib using your String class (my fstring).
All file I/O is for ANSI files only; there are no provisions for UNICODE files.
Input is converted to UNICODE for fstring.
fstring output is converted to ANSI before it is written to the file.
Files are opened with CreateFile.
For now there is a 1024-byte ReadBuffer in the LineInput routine for TCLib.
Also note that char is written as _char so ULEX leaves it alone.

TCLib LineInput:

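int LineInput (HANDLE hFile, fstring&  sData)
{
    const DWORD cbRead = 1024;                // bytes read per call; a longer line comes back in pieces
    _char    ReadBuffer[cbRead + 1];          // +1 leaves room for the terminating null
    DWORD    dwBytesRead = 0;
    DWORD    dwPtr = 0;
    DWORD    where = 0;                       // bytes consumed: line text plus CrLf or Lf
    DWORD    j = 0;                           // length of the line text alone
    dwPtr = SetFilePointer(hFile, 0, NULL, FILE_CURRENT);   // remember where this line starts
    if(FALSE == ReadFile(hFile, ReadBuffer, cbRead, &dwBytesRead, NULL))
    {
        return -3;                            // read error
    }
    if(dwBytesRead == 0 )
    {
        return -1;                            // end of file
    }
    where = dwBytesRead;                      // no Lf found (last line, or a very long one):
    j = dwBytesRead;                          // take everything that was read
    for(DWORD i = 0; i < dwBytesRead; i++)
    {
        if(ReadBuffer[i] == 10 )              // Lf ends the line
        {
            where = i + 1;                    // always step past the Lf
            j = i;
            if(i > 0 && ReadBuffer[i - 1] == 13 )
            {
                j = i - 1;                    // drop the Cr of a CrLf pair
            }
            break;
        }
    }
    SetFilePointer(hFile, (LONG)(dwPtr + where), NULL, FILE_BEGIN);  // next call starts at the next line
    ReadBuffer[j]  = 0;
    wchar_t*  wszTo = new wchar_t[j + 1];
    if(MultiByteToWideChar(CP_ACP, 0, ReadBuffer, -1, wszTo, (int)(j + 1)) == 0)  // j chars + null
    {
        wszTo[0] = L'\0';                     // conversion failed: return an empty line
    }
    sData = wszTo;
    delete[] wszTo;
    return 0;
}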
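The heart of LineInput is deciding, from the bytes just read, how many belong to the line text and how far to advance the file pointer. Here is a portable sketch of just that step, assuming lines end in Lf or CrLf; `LineSpan` and `find_line_end` are hypothetical names. The pointer must move past the Lf whether or not a Cr precedes it, and a buffer with no Lf at all (the last line of a file, or a line longer than the buffer) consumes everything that was read.

```cpp
#include <cassert>
#include <cstddef>

struct LineSpan {
    std::size_t textLen;   // bytes of line text, without Cr/Lf
    std::size_t consumed;  // bytes to advance the file pointer
};

// Scans buf[0..n) for the first Lf. A Cr immediately before it is not part
// of the text. If no Lf is present, the whole buffer is one (partial) line.
LineSpan find_line_end(const char* buf, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        if (buf[i] == '\n')
        {
            std::size_t len = (i > 0 && buf[i - 1] == '\r') ? i - 1 : i;
            return { len, i + 1 };      // skip past the Lf in both cases
        }
    }
    return { n, n };
}
```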


I hacked your original TCLib fprintf file to use a HANDLE instead of a FILE*, but then
decided it was easier on my part to just have bc9 spit out the code, so I added an fPrintS function for TCLib use:


int fPrintS(HANDLE hFile, fstring&  sBuffer)
{
    UINT     uLen = {0};
    DWORD    dwBytesWritten = {0};
    BOOL     bErrorFlag = {0};
    // uLen = bytes needed for the ansi text INCLUDING its terminating null
    uLen = WideCharToMultiByte( CP_ACP, 0, sBuffer.lpStr(), - 1, 0, 0, 0, 0);
    if(uLen == 0)
    {
        return -1;                                // conversion failed
    }
    _char*   szTo = new _char[uLen + 2];
    WideCharToMultiByte(CP_ACP, 0, sBuffer.lpStr(), -1, szTo, uLen, NULL, NULL);
    szTo[uLen - 1] = 13;                          // Cr overwrites the null
    szTo[uLen]  = 10;                             // Lf
    bErrorFlag = WriteFile( hFile, szTo, uLen + 1,  &dwBytesWritten, NULL);  // text + CrLf
    delete []szTo;
    return bErrorFlag ? 0 : -1;                   // 0 on success, as before
}



This is my replacement for GET$ -> FGet.
It uses _msize to determine the buffer size, so the buffer must be allocated dynamically
(new, malloc, or calloc).

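DWORD FGet(HANDLE hFile, LPVOID  Buf, int bites)
{
    DWORD    dwBytesRead = {0};
    if(bites < 0 || _msize(Buf) < (size_t)bites )    // Buf must come from new/malloc/calloc
    {
        return 0;                                    // request would overrun Buf
    }
    if(!ReadFile( hFile, Buf, (DWORD)bites,  &dwBytesRead, NULL))
    {
        return 0;                                    // read error
    }
    return dwBytesRead;                              // 0 also means end of file
}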


And the PUT$ -> FPut:

BOOL FPut (HANDLE hFile, LPVOID  Buf, int bites)
{
    DWORD    dwBytesWritten = {0};
    if(bites < 0 || _msize(Buf) < (size_t)bites )    // Buf must come from new/malloc/calloc
    {
        return FALSE;                                // request would read past the end of Buf
    }
    return WriteFile(hFile, Buf, (DWORD)bites, &dwBytesWritten, NULL);
}
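_msize is a Microsoft CRT function and only works on blocks that came from that heap, so a pointer into a stack or static buffer can't be checked this way. A sketch of an alternative in which the caller passes the buffer's size along with the request; `fget_checked` is hypothetical and uses portable FILE* I/O rather than HANDLEs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Reads `bites` bytes into buf only if they fit in `bufSize`.
// Returns the bytes read, or 0 when the request would overrun buf.
std::size_t fget_checked(std::FILE* fp, void* buf, std::size_t bufSize, std::size_t bites)
{
    if (bites > bufSize)
        return 0;                       // refuse: the read would overrun buf
    return std::fread(buf, 1, bites, fp);
}
```

This works with any buffer, wherever it was allocated, at the cost of one extra argument.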


I will post some bc9Basic examples and my updated TCLib along with a new version of bc9Basic in my area soon.


James

Frederick J. Harris

Ummm!  I see you are deep into it Jim! 

You know, I'm not sure about what to do about the size of the read buffer for things like fgets/Line Input.  If we're talking lines of text in a text file, which I imagine is the typical situation, we really don't need large numbers.  I'd think 256 or 512 would be plenty large enough.  Memory being what it is (practically infinite nowadays), larger numbers like 2048 or 4096 seem reasonable too.  But my TCLib is somewhat of a special circumstance I believe.  For that I'm more in favor of the smaller numbers.

I've worked on an implementation of atof/_wtof today and got it working.  I might tweak it some yet.  I just started out with my atol and added code to deal with the decimal point.  I'll post it in a bit.  I had more or less forgotten about that one.  I guess that's how it's going to go: try to compile something different, find out what's missing, then deal with it.

James C. Fuller

Fred,
  I googled atof source and it looked pretty daunting. I'm anxious to see your implementation.
Also, I have code for Unicode text file I/O but I'm not sure how useful it would be.
I'm waiting on feedback from Patrice http://www.jose.it-berater.org/smfforum/index.php?topic=5131.msg21909;topicseen#msg21909
but maybe you have some knowledge in this area?

James

Frederick J. Harris

atof was a piece of cake.  I don't have it handy, but I'll post it in a bit.  All I did was start with atol(), locate the decimal point, and then keep dividing by 10 in a loop, once per digit after the decimal point, to put the decimal point where it should go.

I finally succeeded in getting a major program at work called the TimberBeast to compile/link with TCLib.  However, the battle's far from won.  The program starts and runs OK, but I haven't tried to calculate any of our timber sales with it yet because it surely won't work.  The issue is again - floating point numbers.  This program has hundreds and hundreds of lines of printf statements all over the place with %f format specifiers of various complexity, such as %10.2f, etc.  That is perhaps the largest failing of my TCLib.  That's where the FltToCh.cpp file enters the picture, if you recall.

So the way I see it my options are two....

1) Fix printf and sprintf to work exactly like the C Runtime;
2) Alter all my printf statements that use the %f format specifier to use FltToCh instead.
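Option 1 essentially means teaching the printf family to turn a double into fixed-point text. One common approach, sketched here under the assumption that |x| times 10^prec fits in a 64-bit integer: scale, round once, then print the integer and fractional parts with integer arithmetic. `fmt_fixed` is a hypothetical helper, not the CRT's algorithm and not TCLib's FltToCh. Because x * 10^prec is itself rounded, a value such as 2.675 (stored as slightly less than 2.675) can round up here where the CRT's %.2f rounds it down.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Formats x roughly like printf's "%<width>.<prec>f".
// Assumes |x| * 10^prec < 2^63 and x is finite.
std::string fmt_fixed(double x, int width, int prec)
{
    bool neg = std::signbit(x);
    x = std::fabs(x);
    unsigned long long scale = 1;
    for (int i = 0; i < prec; ++i)
        scale *= 10;                                   // 10^prec
    unsigned long long n = (unsigned long long)std::llround(x * (double)scale);
    unsigned long long ip = n / scale;                 // digits before the point
    unsigned long long fp = n % scale;                 // digits after the point
    std::string frac((std::size_t)prec, '0');
    for (int i = prec - 1; i >= 0; --i)                // fill the fraction right to left
    {
        frac[i] = char('0' + fp % 10);
        fp /= 10;
    }
    std::string s = std::to_string(ip);
    if (prec > 0)
        s += "." + frac;
    if (neg)
        s.insert(s.begin(), '-');
    if ((int)s.size() < width)                         // right-align, like %10.2f
        s.insert(0, (std::size_t)(width - (int)s.size()), ' ');
    return s;
}
```

Usage: `fmt_fixed(x, 10, 2)` stands in for `%10.2f`.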

I'll have to decide what to do.  Option #1 would be best but perhaps the hardest mentally to figure out.  Option #2 is the drudgery, no-brainer option where you just doggedly keep at it until it's done.

I don't know what to do about Unicode text files either.  Originally I coded my fgetws to read Unicode text files.  Then I read what you and Patrice were discussing, and decided maybe the best option was to only read ANSI files with fgetws and convert to wide characters.  Are many text files nowadays in Unicode?
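On telling Unicode text files from ANSI ones: many Unicode files announce themselves with a byte order mark (BOM) in their first bytes, but files without one (UTF-8 files especially) are common, so this is a hint rather than a guarantee. A minimal sketch; `Encoding` and `sniff_bom` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Checks the first bytes of a file for a byte order mark.
// bomLen receives how many bytes to skip before the text starts.
// (UTF-32 BOMs are not distinguished here.)
Encoding sniff_bom(const unsigned char* p, std::size_t n, std::size_t& bomLen)
{
    bomLen = 0;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) { bomLen = 3; return Encoding::Utf8; }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) { bomLen = 2; return Encoding::Utf16LE; }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) { bomLen = 2; return Encoding::Utf16BE; }
    return Encoding::Unknown;   // no BOM: ANSI, or UTF-8/UTF-16 written without a mark
}
```

A reader could call this on the first few bytes, then either convert from the ANSI code page as LineInput does or treat the rest of the file as wide characters.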

On a somewhat related note, I had quite a discussion with Martins Mozeiko over at www.handmadehero.com (when that address worked) about the whole Unicode thing and Microsoft's wchar_t and TCHARs.  He thinks it's all a truckload of crap and everybody should be using UTF-8. 

I can't really say I'm much up on character sets, but I did some little research on the issue and found out a few things that were very disconcerting to me.

For one thing, I thought this whole issue of a two-byte character capable of handling all the world's languages and symbols was settled years ago.  I suppose I got that impression from reading Charles Petzold's "Programming Windows", the 5th edition from around 1998, I believe.  Anyway, he went on to say moving from the one-byte character to the two-byte character was the final answer.  Turns out that it isn't.  There still aren't enough values in 65,536 (2^16).  I only found that out long after reading his book. 

So.... Windows' two-byte wchar_t strings are really UTF-16, and characters beyond the first 65,536 (including many less common Chinese, Japanese, and Korean characters) take two wchar_t values, called a surrogate pair.  It's a bit like the old multi-byte character sets where a lead byte specifies what is contained in the following bytes.  So I believe the original intent was to have Microsoft's wchar_t type (a 2-byte unsigned short int) be a FIXED WIDTH character encoding, but it didn't turn out that way.  For those characters it's still a VARIABLE WIDTH CHARACTER ENCODING.
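To make the variable-width point concrete: in UTF-16, code points up to U+FFFF take one 16-bit unit, and anything above takes two units, a high and a low surrogate. A sketch of that encoding rule; `to_utf16` is a hypothetical name, and char16_t stands in for Windows' 16-bit wchar_t so it compiles on any platform.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes one Unicode code point (assumed valid: <= 0x10FFFF and not itself a surrogate).
std::vector<char16_t> to_utf16(std::uint32_t cp)
{
    if (cp < 0x10000)
        return { static_cast<char16_t>(cp) };                   // one unit: fixed width holds
    cp -= 0x10000;                                              // 20 bits remain
    char16_t hi = static_cast<char16_t>(0xD800 + (cp >> 10));   // high (lead) surrogate
    char16_t lo = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low (trail) surrogate
    return { hi, lo };
}
```

For example, U+20000 (a CJK ideograph) encodes as the pair 0xD840, 0xDC00, so a String routine that assumes one wchar_t per character would count it as two.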

Martins Mozeiko faulted my String Class on that basis.  All the algorithms in my String Class assume a FIXED WIDTH character; either ansi or two-byte wchar_t.  In other words, they would fail on any String in which a character was not exactly one byte or two bytes, which is most likely to happen with Chinese, Japanese, or Korean text.  At least that's my understanding of his argument.

His solution or recommendation to me was to exclusively use UTF-8, which is a variable width character encoding where some characters are one byte, other characters are two bytes, and yet others could be three or four bytes!!!

And just how exactly does one figure the count of characters in such strings?  Well, it's not pretty.  I could spend the rest of my life, I guess, writing voluminous code to try to figure out how long a string is (he provided me links where folks are doing just that).  Maybe in my next life, after a reincarnation, I could spend that one trying to figure out how to implement Mid$ on such a String where a character could be anywhere from one to four bytes long.
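For what it's worth, counting code points in valid UTF-8 is shorter than it sounds: every byte after the first in a multi-byte sequence has the bit pattern 10xxxxxx, so counting the bytes that don't match gives the code point count, and a Mid$ only needs the byte offset of a given code point. A sketch assuming valid UTF-8; `utf8_length`, `utf8_offset`, and `utf8_mid` are hypothetical names. What a reader perceives as one character (say, a letter plus a combining accent) can still span several code points, and handling that is where the voluminous code really comes in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of code points: count every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

// Byte offset where code point number `index` (0-based) starts, or s.size() if past the end.
std::size_t utf8_offset(const std::string& s, std::size_t index)
{
    std::size_t i = 0;
    while (i < s.size() && index > 0)
    {
        ++i;
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            ++i;                        // skip continuation bytes
        --index;
    }
    return i;
}

// BASIC-style Mid$(s, start, count): start is 1-based (must be >= 1), counts are code points.
std::string utf8_mid(const std::string& s, std::size_t start, std::size_t count)
{
    std::size_t b = utf8_offset(s, start - 1);
    std::size_t e = b + utf8_offset(s.substr(b), count);
    return s.substr(b, e - b);
}
```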

At least that's my understanding of the issue.  I could be wrong and if someone knows better perhaps they could straighten me out.     

James C. Fuller

Fred,
Not bugging you (ok I am):)   I want to release an update to bc9Basic with the new TCLib support but I do need the atof to support some of the examples.

James

Frederick J. Harris

Sorry I took so long in posting this.  It's the original I got working.  I was thinking I could improve on it by simply copying the string representation of the number to another buffer without the decimal point, calling atoi on that, then dividing the result in a loop by ten until I got the right value.  Might save a couple lines from what I have now.  But here is what I have now...
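The improvement described above (copy the digits without the decimal point, convert them as an integer, then scale) could look roughly like this. A sketch only: `atof_by_copy` is hypothetical, it uses the standard strtoll where TCLib would use its own _atoi64, and it divides once by 10^(digits after the point) instead of dividing by ten in a loop, which keeps it to a single rounding step. Like the original, it has no exponent ("1e5") support.

```cpp
#include <cassert>
#include <cstdlib>

double atof_by_copy(const char* pStr)
{
    char digits[19];                  // at most 18 digits, which always fit in a long long
    int  n = 0, fracDigits = 0, extraIntDigits = 0;
    bool afterPoint = false, neg = false;

    while (*pStr == ' ' || *pStr == '\t')
        ++pStr;
    if (*pStr == '-' || *pStr == '+')
        neg = (*pStr++ == '-');
    for (; *pStr; ++pStr)
    {
        if (*pStr == '.' && !afterPoint)
            afterPoint = true;
        else if (*pStr >= '0' && *pStr <= '9')
        {
            if (n == 0 && *pStr == '0')
            {
                if (afterPoint)
                    ++fracDigits;     // leading zeros add nothing but still shift the point
            }
            else if (n < 18)
            {
                digits[n++] = *pStr;
                if (afterPoint)
                    ++fracDigits;
            }
            else if (!afterPoint)
                ++extraIntDigits;     // beyond 18 digits before the '.': keep only the magnitude
        }
        else
            break;                    // first character that can't be part of the number
    }
    digits[n] = '\0';

    double value = (double)std::strtoll(digits, nullptr, 10);
    double scale = 1.0;
    for (int i = 0; i < fracDigits; ++i)
        scale *= 10.0;                // 10^fracDigits, exact up to 10^22
    value /= scale;                   // one division instead of one per digit
    for (int i = 0; i < extraIntDigits; ++i)
        value *= 10.0;
    return neg ? -value : value;
}
```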


//==============================================================================================
//               Developed As An Addition To Matt Pietrek's LibCTiny.lib
//                             By Fred Harris, May 2016
//
//        cl atof.cpp /D "_CRT_SECURE_NO_WARNINGS" /c /W3 /DWIN32_LEAN_AND_MEAN
//==============================================================================================
#include <windows.h>
#include "stdlib.h"
typedef SSIZE_T ssize_t;

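double __cdecl atof(const char* pStr)
{
ssize_t lTotal   = 0;         // all the digits, ignoring the decimal point
bool    bNeg     = false;     // true if a leading '-' was seen
bool    bDecPt   = false;     // true once the decimal point has been passed
size_t  iDiff    = 0;         // count of digits after the decimal point
double  dblReturn;

while(*pStr==32 || *pStr==9)  // skip leading spaces and tabs
    pStr++;
if(*pStr=='-' || *pStr=='+')
{
    bNeg=(*pStr=='-');
    pStr++;
}
while(*pStr)
{
    if(*pStr=='.' && !bDecPt)
       bDecPt=true;
    else if(*pStr>='0' && *pStr<='9')
    {
       lTotal=10*lTotal+(*pStr-'0'); // Add this digit to the total.
       if(bDecPt)
          iDiff++;
    }
    else
       break;                 // stop at the first character that can't be part of the number
    pStr++;
}
if(bNeg)                      // If we have a negative sign, convert the value.
    lTotal=-lTotal;
dblReturn=(double)lTotal;
for(size_t i=0; i<iDiff; i++) // shift the decimal point into place
     dblReturn=dblReturn/10;

return dblReturn;
}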

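double __cdecl _wtof(const wchar_t* pStr)
{
ssize_t lTotal = 0;           // all the digits, ignoring the decimal point
bool    bNeg   = false;       // true if a leading '-' was seen
bool    bDecPt = false;       // true once the decimal point has been passed
size_t  iDiff  = 0;           // count of digits after the decimal point
double  dblReturn;

while(*pStr==32 || *pStr==9)  // skip leading spaces and tabs
    pStr++;
if(*pStr==L'-' || *pStr==L'+')
{
    bNeg=(*pStr==L'-');
    pStr++;
}
while(*pStr)
{
    if(*pStr==L'.' && !bDecPt)
       bDecPt=true;
    else if(*pStr>=L'0' && *pStr<=L'9')
    {
       lTotal=10*lTotal+(*pStr-L'0'); // Add this digit to the total.
       if(bDecPt)
          iDiff++;
    }
    else
       break;                 // stop at the first character that can't be part of the number
    pStr++;
}
if(bNeg)                      // If we have a negative sign, convert the value.
    lTotal=-lTotal;
dblReturn=(double)lTotal;
for(size_t i=0; i<iDiff; i++) // shift the decimal point into place
     dblReturn=dblReturn/10;

return dblReturn;
}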


Those functions need to be added to stdlib.h too...


// stdlib.h
#ifndef stdlib_h
#define stdlib_h
   #define NULL 0
   extern "C" void*   __cdecl malloc  (size_t          size);
   extern "C" void    __cdecl free    (void*           pMem);
   extern "C" long    __cdecl atol    (const char*     pStr);
   extern "C" int     __cdecl atoi    (const char*     pStr);
   extern "C" long    __cdecl _wtol   (const wchar_t*  pStr);
   extern "C" _int64  __cdecl _atoi64 (const char*     pStr);
   extern "C" _int64  __cdecl _wtoi64 (const wchar_t*  pStr);
   extern "C" double  __cdecl atof    (const char*     pStr);
   extern "C" double  __cdecl _wtof   (const wchar_t*  pStr);
   extern "C" int     __cdecl abs     (int             n   );
   extern "C" long    __cdecl labs    (long            n   );
   extern "C" _int64  __cdecl _abs64  (__int64         n   );
#endif


And maybe to tchar.h if you use those....


// tchar.h
#ifndef tchar_h
   #define tchar_h
   #ifdef  _UNICODE
      typedef wchar_t     TCHAR;
      #define _T(x)       L## x
      #define _tmain      wmain
      #define _tWinMain   wWinMain
      #define _tfopen     _wfopen
      #define _fgetts     fgetws
      #define _tprintf    wprintf
      #define _ftprintf   fwprintf
      #define _stprintf   swprintf
      #define _tcslen     wcslen
      #define _tcscpy     wcscpy
      #define _tcscat     wcscat
      #define _tcsncpy    wcsncpy
      #define _tcscmp     wcscmp
      #define _tcsicmp    _wcsicmp
      #define _tcsncmp    wcsncmp
      #define _tcsnicmp   _wcsnicmp
      #define _tcsrev     _wcsrev
      #define FltToTch    FltToWch
      #define _ttol       _wtol
      #define _ttoi64     _wtoi64
      #define _ttof       _wtof
   #else
      typedef char        TCHAR;
      #define _T(x)       x
      #define _tmain      main
      #define _tWinMain   WinMain
      #define _tfopen     fopen
      #define _fgetts     fgets
      #define _tprintf    printf
      #define _ftprintf   fprintf
      #define _stprintf   sprintf
      #define _tcslen     strlen
      #define _tcscpy     strcpy
      #define _tcscat     strcat
      #define _tcsncpy    strncpy
      #define _tcscmp     strcmp
      #define _tcsicmp    _stricmp
      #define _tcsncmp    strncmp
      #define _tcsnicmp   _strnicmp
      #define _tcsrev     _strrev
      #define FltToTch    FltToCh
      #define _ttol       atol
      #define _ttoi64     _atoi64
      #define _ttof       atof
   #endif
#endif


And in terms of my floating point dilemma with sprintf, printf, and fprintf, I'm just having to leave it the way I have it, where %f doesn't work, and one must use either my String class's String::Format() or FltToTch().

Frederick J. Harris

Had it all along.  I just forgot what I named it (a demo of atof())...


// cl Demo27.cpp /O1 /Os /GS- /link TCLib.lib kernel32.lib
// 5,120 bytes VC15
// 6,144 bytes VC19
#define UNICODE
#define _UNICODE
#include <windows.h>
#include "stdio.h"
#include "stdlib.h"
#include "tchar.h"
extern "C" int _fltused=1;    // tells the linker floating point is used; it's an int, so no 1.0

int main()
{
const TCHAR* pStrs[]={_T("  -12345.678987654"), _T("0.99"), _T("0"), _T("1"), _T("-0.009")};
TCHAR szConvertedDouble[24];
double dblNumber;

for(size_t i=0; i<sizeof(pStrs)/sizeof(pStrs[0]); i++)
{
     dblNumber=_ttof(pStrs[i]);
     FltToTch(szConvertedDouble,dblNumber,24,9,_T('.'),true);
     _tprintf(_T("szConvertedDouble=%s\n"),szConvertedDouble);
}
getchar();

return 0;
}
// Output:
// ==================
// C:\Code\VStudio\VC15\LibCTiny\x64\Test20>Demo27
// szConvertedDouble=       -12345.678987654
// szConvertedDouble=            0.990000000
// szConvertedDouble=            0.000000000
// szConvertedDouble=            1.000000000
// szConvertedDouble=           -0.009000000


And this is what my TCLib.mak file now looks like.  You don't need the last two object files (win32_crt_math.obj and win32_crt_Float.obj) if you aren't interested in x86...


PROJ       = TCLib

OBJS       = crt_con_a.obj crt_con_w.obj crt_win_a.obj crt_win_w.obj memset.obj newdel.obj printf.obj \
             sprintf.obj _strnicmp.obj strncpy.obj strncmp.obj _strrev.obj strcat.obj strcmp.obj \
             strcpy.obj strlen.obj getchar.obj alloc.obj alloc2.obj allocsup.obj FltToCh.obj atol.obj \
             _atoi64.obj abs.obj memcpy.obj strchr.obj fopen.obj fprintf.obj _stricmp.obj fgets.obj \
             atof.obj win32_crt_math.obj win32_crt_Float.obj
       
CC         = CL
CC_OPTIONS = /D "_CRT_SECURE_NO_WARNINGS" /O1 /Os /GS- /c /W3 /DWIN32_LEAN_AND_MEAN

$(PROJ).LIB: $(OBJS)
    LIB /NODEFAULTLIB /machine:x64 /OUT:$(PROJ).lib $(OBJS)

.CPP.OBJ:
    $(CC) $(CC_OPTIONS) $<

Frederick J. Harris

Back to the issue of files and Unicode again. 

As I suspected (I just tested it now to make absolutely sure), when one uses fwprintf() with wide character arguments, the C Runtime writes the data to the file as ANSI text!  Here's a program to prove it...


// cl Test1.cpp /O1 /Os /MT
// 125,952 ansi
// 128,512 wide
#define UNICODE
#define _UNICODE
#include <stdio.h>
#include <tchar.h>

int main()
{
FILE* fp=NULL;

fp=_tfopen(_T("Data.txt"),_T("w"));
if(fp)
{
    _ftprintf(fp,_T("Hello, World!\n"));
    _tprintf(_T("Hello, World!\n"));
    fclose(fp);
    getchar();
}

return 0;
}


Note the above program isn't using my TCLib.lib, but rather links with the C Runtime in the typical manner.  It opens "Data.txt" and writes "Hello, World!" to the file.  If you check the count of bytes in the file using Windows Explorer >> File Properties, you'll see it is 15 bytes: 13 bytes for "Hello, World!" and two bytes for the CrLf.  This in spite of the fact that the wide character versions of the file opening and writing functions were used.  If the file had been written as UNICODE it would have been 30 bytes.  The reason is that the C Runtime treats a stream opened in text mode as a sequence of multibyte characters and converts wide characters on output, unless the file is opened with a ccs= encoding flag such as _T("w, ccs=UTF-16LE"). 
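The two byte counts work out as follows, sketched under the assumption that the file is written in text mode (each '\n' goes out as CR LF) and, for the Unicode case, as UTF-16 with no byte order mark. `file_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes produced by a text-mode write of `text`: each '\n' becomes CR LF,
// and every resulting character takes `bytesPerChar` bytes (1 = ANSI, 2 = UTF-16, no BOM).
std::size_t file_bytes(const std::string& text, std::size_t bytesPerChar)
{
    std::size_t chars = 0;
    for (char c : text)
        chars += (c == '\n') ? 2 : 1;   // '\n' is written as "\r\n"
    return chars * bytesPerChar;
}
```

For "Hello, World!\n" that is (13 + 2) x 1 = 15 bytes as ANSI and (13 + 2) x 2 = 30 bytes as UTF-16.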

And so I have something of a disjuncture with my TCLib and the C Runtime on this issue.  That same program above using my TCLib would have output the string with two bytes per character for a total file size of 30 bytes (I've tested it).  And if you open the file in Notepad and use Save As... it'll show up as a UNICODE file.

So what to do?

I just changed my fwprintf routine to work the way the C Runtime does.  I don't see much else I can do.  I want my TCLib to be as compatible with the C Runtime as possible.  So here is that file now...


//=============================================================
//   Developed As An Addition To Matt Pietrek's LibCTiny.lib
//                By Fred Harris, March 2016
//
//  cl fprintf.cpp /O1 /Os /GS- /c /W3 /DWIN32_LEAN_AND_MEAN   
//=============================================================
#include <windows.h>
#include <stdarg.h>
#include "stdio.h"
#define EOF (-1)
#pragma comment(linker, "/defaultlib:user32.lib")

int __cdecl fprintf(FILE* fp, const char* format, ...)
{
char szBuff[1024];
DWORD cbWritten;
va_list argptr;
int retValue;

va_start(argptr, format);
retValue = wvsprintfA(szBuff, format, argptr);
va_end(argptr);
WriteFile((HANDLE)fp, szBuff, retValue, &cbWritten, 0);

return retValue;
}

int __cdecl fwprintf(FILE* fp, const wchar_t* pszFormat, ...)  // incoming pointers to character strings will be pointing to wide character strings
{                                                             
wchar_t szBuff[512];                                          // wide character strings will be written to here
char szAsci[512];                                             // this variable/buffer will contain converted string (from wide to narrow)
DWORD cbWritten;
va_list argptr;
int retValue;

va_start(argptr, pszFormat);
retValue = wvsprintfW(szBuff, pszFormat, argptr);
va_end(argptr);
int cbAsci = WideCharToMultiByte(CP_ACP,0,szBuff,retValue,szAsci,sizeof(szAsci),NULL,NULL);  // convert wide szBuff to narrow szAsci; returns # of bytes produced
WriteFile((HANDLE)fp, szAsci, cbAsci, &cbWritten, 0);                                      // output the converted bytes (may differ from retValue on DBCS code pages)

return retValue;
}


Another change, too, though not related to files.  It's an improvement on my atol(): it really needs to test each character during the conversion to make sure the character is a digit, i.e., a code between 48 and 57...


//==============================================================================================
//               Developed As An Addition To Matt Pietrek's LibCTiny.lib
//                             By Fred Harris, March 2016
//
//        cl atol.cpp /D "_CRT_SECURE_NO_WARNINGS" /c /W3 /DWIN32_LEAN_AND_MEAN
//==============================================================================================
#include "stdlib.h"

long __cdecl atol(const char* pStr)
{
char c,cNeg=NULL;           // c holds the char; cNeg holds the '-' sign.
long lTotal=0;              // The running total.

while(*pStr==32 || *pStr==9)    // skip leading spaces and tabs
    pStr++; 
if(*pStr=='-')
{
    cNeg='-';
    pStr++;
}
while(*pStr>=48 && *pStr<=57)   // stop at the first non-digit, as the C Runtime's atol does
{
    c=*pStr++;
    lTotal=10*lTotal+(c-48);    // Add this digit to the total.
}
if(cNeg=='-')               // If we have a negative sign, convert the value.
    lTotal=-lTotal;

return lTotal;
}

int __cdecl atoi (const char* pStr)
{
return ((int)atol(pStr));
}

long __cdecl _wtol(const wchar_t* pStr)
{
wchar_t c,cNeg=NULL;        // c holds the char; cNeg holds the '-' sign.
long lTotal=0;              // The running total.

while(*pStr==32 || *pStr==9)    // skip leading spaces and tabs
    pStr++; 
if(*pStr==L'-')
{
    cNeg=L'-';
    pStr++;
}
while(*pStr>=48 && *pStr<=57)   // stop at the first non-digit, as the C Runtime's _wtol does
{
    c=*pStr++;
    lTotal=10*lTotal+(c-48);    // Add this digit to the total.
}
if(cNeg==L'-')              // If we have a negative sign, convert the value.
    lTotal=-lTotal;

return lTotal;
}


int __cdecl _wtoi (const wchar_t* pStr)
{
return ((int)_wtol(pStr));
}


All this nastiness is coming out now that I'm actually using this on a major production app.