Title: Powerbasic dynamic Strings Memory usage
Post by: Theo Gottwald on August 04, 2007, 11:19:42 AM

[copied from Dominic Mitchell]

PowerBASIC strings are BSTRs.
For example, the string HI will look like this in memory.
(HI is a wide string, and the BSTR is allocated with SysAllocString)

 _______________________________________
|   |   |   |   |   |   |   |   |   |   |
| 4 | 0 | 0 | 0 | H | 0 | I | 0 | 0 | 0 | 
|___|___|___|___|___|___|___|___|___|___|
|               |               |       |
|____length_____|___character___|_null__|
    (in bytes)         data      terminator

SysStringLen will return a value of 2 (characters).

PowerBASIC uses the functions that cram two ANSI characters into a single wide character:
SysAllocStringByteLen and SysStringByteLen.

For the wide string above, SysStringByteLen will return a value of 4 (bytes).
A wide character occupies two bytes.
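
The wide-string layout described above can be reproduced with a small model. This is only a sketch in Python, not a call into the real OLE functions: `wide_bstr_image` is a hypothetical helper name, and the little-endian 4-byte prefix is assumed from the diagram.

```python
import struct

def wide_bstr_image(text: str) -> bytes:
    """Model the bytes of a BSTR that holds `text` as 16-bit wide characters."""
    data = text.encode("utf-16-le")            # two bytes per character
    prefix = struct.pack("<I", len(data))      # length in bytes, terminator excluded
    return prefix + data + b"\x00\x00"         # two-byte null terminator

image = wide_bstr_image("HI")
byte_len = struct.unpack_from("<I", image)[0]  # what SysStringByteLen would report
char_len = byte_len // 2                       # what SysStringLen would report

print(list(image))         # [4, 0, 0, 0, 72, 0, 73, 0, 0, 0]
print(byte_len, char_len)  # 4 2
```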

The same string à la PowerBASIC.
(HI is an ANSI string, and the BSTR is allocated with SysAllocStringByteLen)

 _______________________________
|   |   |   |   |   |   |   |   |
| 2 | 0 | 0 | 0 | H | I | 0 | 0 | 
|___|___|___|___|___|___|___|___|

If I could hazard a guess, the number of bytes reserved for a PowerBASIC string of n characters,
including the terminating null, would be
bytes = ((n+1)\2+1)*2
(where \ is integer division)

Therefore,
H
| 1 | 0 | 0 | 0 | H | 0 | 0 | 0 |

HI
| 2 | 0 | 0 | 0 | H | I | 0 | 0 |

HIS
| 3 | 0 | 0 | 0 | H | I | S | 0 | 0 | 0 |
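
The three layouts above follow directly from the guessed formula. Here is a minimal sketch in Python that evaluates it; `reserved_bytes` and `ansi_bstr_image` are hypothetical names, and the zero padding is the guess from this post, not documented behaviour.

```python
import struct

def reserved_bytes(n: int) -> int:
    # The guessed formula ((n+1)\2+1)*2; PowerBASIC's "\" is "//" in Python.
    return ((n + 1) // 2 + 1) * 2

def ansi_bstr_image(text: str) -> bytes:
    """Model the guessed layout: 4-byte length, data, zero fill up to reserved_bytes."""
    data = text.encode("latin-1")              # one byte per character
    prefix = struct.pack("<I", len(data))      # length in bytes
    return prefix + data.ljust(reserved_bytes(len(data)), b"\x00")

for s in ("H", "HI", "HIS"):
    print(s, list(ansi_bstr_image(s)))
# H   [1, 0, 0, 0, 72, 0, 0, 0]
# HI  [2, 0, 0, 0, 72, 73, 0, 0]
# HIS [3, 0, 0, 0, 72, 73, 83, 0, 0, 0]
```

Under this formula an odd-length string ends in three nulls and an even-length string in two, so there are never fewer than two.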

Here is another link you might want to check out.
Eric's Complete Guide To BSTR Semantics (http://blogs.msdn.com/ericlippert/archive/2003/09/12/52976.aspx)

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Edwin Knoppert on August 04, 2007, 12:35:53 PM

Good subject; I am currently busy using BSTRs as handles (c)
--

>PowerBASIC uses the functions that cram two ANSI characters in a single wide character.
SysAllocStringByteLen,SysStringByteLen

SysStringByteLen will return a value of 4(bytes).
A wide character occupies two bytes.

--

I don't think I concur with this statement; SysAllocStringByteLen() just creates a single-byte OLECHAR string.

What I would like to know is how to tell a wide string from a byte string given only a pointer.
A division helps a little: for 5 bytes of ANSI data, SysStringLen() returns 2 while SysStringByteLen() returns 5.
But that's probably not a guarantee..
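
To make the division idea concrete, here is a rough Python model of the two length functions working from the length prefix alone. The function names only mirror the API names and are not the real calls; `could_be_wide` is a hypothetical helper.

```python
import struct

def sys_string_byte_len(image: bytes) -> int:
    """Model of SysStringByteLen: the 4-byte prefix, taken as-is."""
    return struct.unpack_from("<I", image)[0]

def sys_string_len(image: bytes) -> int:
    """Model of SysStringLen: the byte length divided by the wide-character size."""
    return sys_string_byte_len(image) // 2

def could_be_wide(image: bytes) -> bool:
    # An odd byte length rules out wide data; an even one proves nothing.
    return sys_string_byte_len(image) % 2 == 0

ansi = struct.pack("<I", 5) + b"HELLO" + b"\x00\x00"
wide = struct.pack("<I", 10) + "HELLO".encode("utf-16-le") + b"\x00\x00"

print(sys_string_len(ansi), sys_string_byte_len(ansi))  # 2 5
print(sys_string_len(wide), sys_string_byte_len(wide))  # 5 10
print(could_be_wide(ansi), could_be_wide(wide))         # False True
```

So the test only works in one direction: a byte string with an even number of characters looks exactly like a wide string to it.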

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Theo Gottwald on August 04, 2007, 02:37:12 PM

Who did the visual design on that picture? Your new project :-)?

As I said, these things are from Dominic Mitchell. I copied them because I thought these are facts which may be of use from time to time.

For additional questions on this topic I can only refer you to this link, as I have not yet had reason to take a deeper look into these subjects myself.

Guide to C++ Strings and String Wrapper Classes (http://www.codeproject.com/string/cppstringguide2.asp)

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Dominic Mitchell on August 06, 2007, 03:10:07 AM

Quote
Don't think I concur with this statement, SysAllocStringByteLen() just creates a single-byte OLECHAR string

Edwin, in the land of COM, according to Don Box, OLECHAR is simply a typedef to the C data type wchar_t. Win32
platforms use the wchar_t data type to represent 16-bit Unicode characters.

By the way, have you ever seen a string that was created with a PowerBASIC intrinsic with a single null byte at the end?

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Edwin Knoppert on August 06, 2007, 09:22:13 AM

>Edwin, in the land of COM, according to Don Box, OLECHAR is simply a typedef to the C data type wchar_t. Win32
Then you misread; they compare it with that, but it isn't.

>By the way, have you ever seen a string that was created with a PowerBASIC intrinsic with a single null byte at the end?
I can't tell if it contains one or more; at least one.
The Unicode version should have at least two.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Dominic Mitchell on August 06, 2007, 02:35:26 PM

Then I guess you are in disagreement with the Platform SDK headers and the info in
this link

http://www.ecs.syr.edu/faculty/fawcett/handouts/CSE775/Presentations/BruceMcKinneyPapers/COMstrings.htm
and this one

http://www.codeproject.com/string/cppstringguide2.asp

By the way, because of the way SysAllocStringByteLen works, there will always be (in my opinion) at
least two null bytes at the end of a PowerBASIC dynamic string. You can try to disprove the formula
I posted.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Edwin Knoppert on August 06, 2007, 08:32:04 PM

Later..

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Donald Darden on August 06, 2007, 09:59:04 PM

From the PowerBasic Help File:

Quote
Dynamic (Variable-length) strings ($)

Dynamic string variables contain character data of arbitrary length. Internally, each string variable uses four bytes that contain a handle number, which is used to identify and locate information about a string. Dynamic strings can contain up to approximately 2 Gb (2^31) characters. The type-specifier character for a dynamic string is: $.
String variables are designated by following the variable name with a dollar sign ($) or the DEFSTR type definition. You can also declare dynamic string variables using the STRING keyword with the DIM statement. For example:

DIM MyStr AS STRING

PowerBASIC allocates strings using the Win32 OLE string engine. This allows you to pass strings from your program to DLLs, or API calls that support OLE strings. Note, however, that Visual Basic and Visual C++ store data in OLE strings using 16-bits per character (Unicode format) while PowerBASIC stores them in 8-bit format. In PowerBASIC, strings may contain either ASCII or ANSI string data.
The distinction between ASCII and ANSI only becomes important when using the strings for specific tasks. For example, when dealing with API calls, string data is usually interpreted as ANSI data by the API functions, whereas PowerBASIC statements such as UCASE$ treat the string data as ASCII.

Most standard DLLs designed to work with Visual Basic should still work with PowerBASIC, because VB converts OLE strings from Unicode to ANSI before passing them to a DLL, and PowerBASIC will accept and work with the ANSI string data.
The address of the contents of a non-empty string can be obtained with the STRPTR function. The address of the string handle can be obtained with VARPTR function. An empty (null) string may not return a valid STRPTR value.

Dynamic strings move in memory with each assignment statement: that is, STRPTR will return a different address when the content of the string is changed. However, the associated string handle obtained by VARPTR stays constant for the duration of the life (scope) of the string variable.

Note that what PowerBasic calls Dynamic, Variable length strings are not null-byte terminated. They have a pointer and a string length association, and
this allows null-bytes to be contained within the string itself. If you want to
talk about ASCIIZ strings, which are null-byte terminated, that is a different
animal.

I will also point out that there is no absolute way to tell whether a string pointer
refers to a wide string or not. In most cases, if the text is limited to the
ASCII (American Standard Code) symbols, every other byte of a wide string will be a zero.
That may be of some help.
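
The every-other-byte-is-zero hint can be written down as a quick check. This is a heuristic sketch in Python with a hypothetical function name; it gives wrong answers in both directions, as the examples show.

```python
def looks_like_wide_ascii(data: bytes) -> bool:
    """Heuristic only: even, non-zero length and a zero in every second byte."""
    if len(data) == 0 or len(data) % 2:
        return False
    return all(b == 0 for b in data[1::2])

print(looks_like_wide_ascii("HI".encode("utf-16-le")))    # True: wide ASCII text
print(looks_like_wide_ascii(b"HI"))                       # False: byte string
print(looks_like_wide_ascii("中文".encode("utf-16-le")))  # False: wide, but not ASCII
print(looks_like_wide_ascii(b"H\x00"))                    # True: byte string with an embedded null
```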

However, I cannot imagine a case where you would write a subordinate function or sub and just allow the user to send you any type of data during a call. It is normal to set the limits of what the data being passed must conform to, and it is then the responsibility of the calling party to make sure that data passed fits within the limits given. If you look at the Windows APIs as an example of many
functions, in each case the nature of the passed parameter is clearly defined. If you try to pass something other than expected, you do so at your peril.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: José Roca on August 06, 2007, 10:09:37 PM

Quote
Note that what PowerBasic calls Dynamic, Variable length strings are not null-byte terminated.

But it uses the Win32 OLE engine to allocate them, and this engine adds a null byte. This allows you to pass a dynamic string, using STRPTR, to a function that expects a null-terminated string without needing to add a null byte to the string (because it already has one).

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Theo Gottwald on August 07, 2007, 07:18:04 AM

What Donald wanted to say was:

They do not need a null byte as a length delimiter for normal string operations, the way ASCIIZ strings do.

Of course, as Jose and Dominic point out, PB will append a terminator anyway (a fact that is often unknown).

PS: It will be hard to find any case where Jose or Dominic are wrong, so I would not even try.
There are things in life we just have to accept :-), like the weather or this fact.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Dominic Mitchell on August 07, 2007, 01:34:11 PM

PowerBASIC allocates strings using the Win32 OLE string engine. The OLE SysAllocXXX functions
return pointers to BSTRs. Therefore, statements like this one

Quote
Note that what PowerBasic calls Dynamic, Variable length strings are not null-byte terminated.

make absolutely no sense.

Here is the definition of a BSTR from MSDN.

BSTR

A BSTR, known as Basic string or binary string, is a string data type that is used by COM, Automation, and Interop functions.

BSTRs have the following characteristics:

1. A BSTR is a composite data type that consists of:

A length prefix
A data string
A terminator

2. Length Prefix:

A four-byte integer
Occurs immediately before the first character of the data string
Contains the number of bytes in the following data string
Does not include the terminator

3. Datastring:

Windows Platform: A string of Unicode characters (wide characters, also known as double-byte characters). Also referred to as a string of OLECHARs, a data type defined as a typedef to the C data type wchar_t.
Apple PowerMac: A single-byte string.
The string can contain multiple embedded null characters.

4. Terminator:

Consists of two null characters (0x00).

Use
A BSTR is defined in oleauto.h as follows:

typedef OLECHAR FAR* BSTR;

A BSTR is therefore a pointer. The pointer points to the first character of the data string, not to the length prefix.
The BSTR string type must be used in all interfaces that will be used from Visual Basic or Java.
BSTRs are allocated using COM memory allocation functions. This allows them to be returned from methods without concern for memory allocation.
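
The detail that a BSTR points at the first character, with the length sitting four bytes earlier, can be illustrated with a toy memory model. This is only a sketch in Python: the `memory` buffer, the `base` address and the helper names are made up for illustration and nothing here touches the real allocator.

```python
import struct

memory = bytearray(32)   # toy address space, all zero to begin with
base = 8                 # hypothetical address where the allocation starts

struct.pack_into("<I", memory, base, 4)           # length prefix: 4 bytes of data
memory[base + 4:base + 8] = "HI".encode("utf-16-le")
# The two terminator bytes after the data are already zero in this buffer.

bstr = base + 4          # the "pointer" handed to the caller skips the prefix

def byte_length(mem: bytearray, ptr: int) -> int:
    """Read the length prefix stored immediately before the data."""
    return struct.unpack_from("<I", mem, ptr - 4)[0]

def read_bstr(mem: bytearray, ptr: int) -> str:
    """Read exactly byte_length bytes; the terminator is never consulted."""
    n = byte_length(mem, ptr)
    return bytes(mem[ptr:ptr + n]).decode("utf-16-le")

print(byte_length(memory, bstr))  # 4
print(read_bstr(memory, bstr))    # HI
```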

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Eros Olmi on August 07, 2007, 01:44:36 PM

Quote from: Theo Gottwald on August 07, 2007, 07:18:04 AM
There are things in life we just have to accept :-), like the weather or this fact.

I always think that what is right today could be wrong tomorrow. With this in mind, the correct way to go, for me, is:

document it myself
ask people who work on the subject and surely know better than me about that particular aspect
test it myself

But again, things can change again ... tomorrow. That's evolution, improvement in IT. And we like it, don't we? I think we like the fact that we are people open to change. That's also why many other people think our job is too complicated: they are not so open to working in a world that is continuously changing and evolving.

:D

That said, PB dynamic strings are NULL terminated!!

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Theo Gottwald on August 07, 2007, 04:31:56 PM

Quote
But again, things can change again ... tomorrow.

That's why I chose the example with the weather. It may change ...
Besides that, I am quite sure that the null-terminated string weather will stay with us for a while.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Eros Olmi on August 07, 2007, 04:46:28 PM

Oops, sorry. I got it the other way round.
You are right, both weather and null matter ;D

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Donald Darden on August 08, 2007, 06:08:11 AM

The point that is being overlooked here is that, regardless of what mechanism PowerBasic uses for managing its dynamic strings, it is able to support embedded null bytes. Thus you cannot search the string for a null byte to find its end, because you may encounter an embedded null character instead. The length of the dynamic string is managed separately from any possible terminating nulls. To a programmer, then, a dynamic string appears to be organized on a byte basis from the string pointer, for the number of bytes given by the length.

You are also ignoring the fact that PowerBasic has another string type that is handled strictly as a null-terminated string, that is defined with a maximum length, and whose current length is set by the first null byte encountered. PowerBasic calls these ASCIIZ strings. Arguing as you are without regard for these distinctions only causes others to get confused about the nature of the various strings allowed by PowerBasic. So are you going to be sticklers about mechanisms, or acknowledge intended usage?
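
The difference between the two ways of finding the end of a string shows up as soon as the data contains a null. Here is a small Python sketch of both readers; the function names are hypothetical and the buffer merely stands in for the string memory.

```python
def read_by_length(buffer: bytes, length: int) -> bytes:
    """Dynamic-string style: trust the stored length, ignore any nulls."""
    return buffer[:length]

def read_asciiz(buffer: bytes) -> bytes:
    """ASCIIZ style: stop at the first null byte encountered."""
    end = buffer.find(b"\x00")
    return buffer if end < 0 else buffer[:end]

payload = b"AB\x00CD"           # five bytes with an embedded null
stored = payload + b"\x00\x00"  # the same bytes followed by terminating nulls

print(read_by_length(stored, len(payload)))  # b'AB\x00CD'
print(read_asciiz(stored))                   # b'AB'
```

Both views are right about the same memory: the terminator is there, but only the length tells you where the data really ends.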

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: José Roca on August 08, 2007, 07:38:20 AM

Until PowerBASIC adds native Unicode support, it is of some importance to low-level COM programmers to know whether PB dynamic strings always end with one null byte or two. Why? Because there is another kind of string, the null-terminated Unicode string, not currently supported by PB. Currently, if we don't want to use SysAllocString / SysFreeString, we have to use a double-null-terminated dynamic string and use STRPTR to pass a pointer to the string data. If PB dynamic strings end with a single null, we have to add another null to the string; if they end with two nulls, we don't need to add anything; if we aren't sure, we always have to add a null to be safe.

My guess is that instead of:

| 2 | 0 | 0 | 0 | H | I | 0 | 0 |

The resulting string will be:

| 2 | 0 | 0 | 0 | H | I | 0 | x |

Where x will be uninitialized data. Sometimes it will be a null character and sometimes not.
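
If that guess is right, a reader that scans for a 16-bit null can run past the end of the data, which is why adding the terminator yourself is the safe choice. The following Python sketch models the situation; `read_wide_z` and the `GARBAGE` tail are hypothetical, with 0xAB standing in for the uninitialized byte x.

```python
def read_wide_z(buffer: bytes) -> str:
    """Scan 16-bit units until a unit that is zero in both bytes."""
    for i in range(0, len(buffer) - 1, 2):
        if buffer[i] == 0 and buffer[i + 1] == 0:
            return buffer[:i].decode("utf-16-le")
    raise ValueError("no 16-bit null terminator found")

GARBAGE = b"\x00\xab"             # one guaranteed null, then an uninitialized byte
wide = "HI".encode("utf-16-le")   # wide data carried in a byte string

unsafe = wide + GARBAGE                # relying on the allocator's terminator
safe = wide + b"\x00\x00" + GARBAGE    # explicit wide null appended to the data

print(read_wide_z(safe))   # HI
try:
    read_wide_z(unsafe)
except ValueError as err:
    print("unsafe:", err)  # the scan never sees a full 16-bit null
```

In a real process the scan would not stop with an error but keep reading whatever follows in memory.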

BTW, the adoption of Unicode is unavoidable. Since Windows NT, the Windows API is all Unicode, the ANSI functions being wrappers for the Unicode ones. Therefore, there is a speed penalty for using the Windows API ANSI functions, because the operating system has to convert the strings to Unicode, call the Unicode version of the function, and convert the results back to ANSI. One day, ANSI and ASCIIZ strings will be a thing of the past.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Dominic Mitchell on August 08, 2007, 02:03:06 PM

My observations and the formula I posted don't support that.
They back up what Bruce McKinney said about SysAllocStringByteLen cramming two ANSI characters into each wide character. That is why there are at least two nulls at the end of a PowerBASIC dynamic string. I do agree with you on playing it safe.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Donald Darden on August 08, 2007, 10:48:56 PM

I might point out that adding another NULL byte to a dynamic string for the purposes of passing it to a Windows API does not free you from the problem of whether a new string space has to be allocated and a new instance of the string constructed. Nor does it mean that the packed byte code in the string will be expanded to wide (16-bit) format. In fact, the most common method of handling dynamic strings before passing them as arguments in API calls is to allocate an ASCIIZ string of sufficient length for the purpose and ask Powerbasic to assign the dynamic string to it: aa = d$.

For converting wide code to byte code and vice versa, the current PowerBasic
compilers include the ACODE$() and UCODE$() intrinsic functions. While it is no major task to write equivalent functions in BASIC or ASM for other dialects, being able to use native functions simplifies the process.
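
For readers without PowerBASIC at hand, the round trip can be sketched in Python. The names `ucode` and `acode` only imitate the intrinsics, and the cp1252 code page is an assumption; the real functions use whatever ANSI code page the system provides.

```python
def ucode(ansi: bytes, codepage: str = "cp1252") -> bytes:
    """Byte string to wide string, in the spirit of UCODE$ (assumed code page)."""
    return ansi.decode(codepage).encode("utf-16-le")

def acode(wide: bytes, codepage: str = "cp1252") -> bytes:
    """Wide string to byte string, in the spirit of ACODE$; unmappable characters become '?'."""
    return wide.decode("utf-16-le").encode(codepage, errors="replace")

w = ucode(b"HI")
print(w)         # b'H\x00I\x00'
print(acode(w))  # b'HI'
print(acode("中".encode("utf-16-le")))  # b'?' (no such character in this code page)
```

The last line shows the catch: going from wide to byte code is lossy for anything outside the chosen code page.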

I'm not sure that Unicode (16-bit) will ever really replace byte (8-bit) code in common usage. Unless you are prepared to interpret code values beyond the standard set, you will be hard pressed to justify the rework required, and if you are expecting to continue using the standard set, then what advantage is there in doubling string lengths with every other byte value being set to null?

Microsoft's commitment to Unicode has always been half-hearted, and only progresses in spurts, likely to appease supporters of other languages rather than from any domestic need. If I wanted to write multilingual applications or process data in various tongues, then I might be a proponent of Unicode, but just as the airline industry has found it necessary to adopt a single language for tower and pilot, I expect to see English stay at the forefront of business and computer
communications for a long time to come. It may be that the Chinese will come to dominate the world and force us all to learn Mandarin or something, but then Unicode will hardly be adequate for the range of symbols then required.

Point is, Microsoft does not decide for me what I will use in my own code. If I need to interface to COM or the APIs, then I am only interested in what form my data has to be in for the purpose of the various calls. I do not plan to write my program in a style that compromises my other goals just because someone else thinks that it is the thing to do.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: José Roca on August 09, 2007, 03:40:06 AM

The purpose of adding native support to the compiler is to allow you to work with it transparently, without having to constantly use UCODE$, ACODE$ or their API counterparts. Two new datatypes can be added, e.g. WSTRING and WCHAR, and the compiler will handle them transparently, so you will work with them as you are doing with STRING and ASCIIZ.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Edwin Knoppert on August 09, 2007, 05:17:22 AM

>so you will work with them as you are doing with STRING and ASCIIZ

Hmm, I always wonder how one should program.
In C# (or VB6) the string is set like a$ = "hello"
a$'s contents are in Unicode, but it looks more like a translation from "hello" to "h e l l o " to me.
So if I am Chinese, how would I benefit from using Unicode?
Do I need to enter special character codes?

In C they use L"hello" to make it Unicode; this makes it clearer to me that one uses ANSI notation which will be converted to Unicode.
To make it a Unicode string they could use "h\0e\0l\0l\0o\0"
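
The "h\0e\0l\0l\0o\0" picture is exact for plain ASCII text, but it is only the special case where the high byte of every character is zero. A short Python sketch shows both cases; `fits_codepage` is a hypothetical helper and cp1252 is just an assumed Western code page.

```python
def fits_codepage(text: str, codepage: str = "cp1252") -> bool:
    """True if every character of `text` can be stored in the given 8-bit code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

hello_wide = "hello".encode("utf-16-le")
chinese_wide = "中文".encode("utf-16-le")

print(hello_wide)            # b'h\x00e\x00l\x00l\x00o\x00'
print(chinese_wide.hex())    # 2d4e8765: no zero bytes, the upper byte carries information
print(fits_codepage("hello"), fits_codepage("中文"))  # True False
```

So a Chinese user types the characters normally; the benefit is that each of them has a code in the wide string, which an 8-bit Western code page cannot offer.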

I would not mind supporting Unicode in my tools, but it seems I would have conversions from ANSI to Unicode all the time..

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Donald Darden on August 09, 2007, 05:57:17 AM

It really does depend on where you are coming from. Jose has devoted a lot of time to learning how to extend his reach by using APIs, COM, Variants, Objects, and so on, and has really helped show others how to get there through his examples, wrappers, conversions, and explanations. For his purposes, the use of wide character strings serves his other goals best. But if you are English-centric,
and the bulk of your code is to process data using PowerBasic, then you might easily decide that byte strings as provided by PowerBasic are the best way to go.

ACODE$() and UCODE$() provide the means to go from one form to another. If your compiler recognizes WStrings, then you do not need to specify the step of
converting, the compiler will call the necessary function automatically as part of the assign operation. But this is not magic, there is still a time penalty whenever any type of conversion is required. However, it usually takes a sizeable number of conversions before the difference can be noted. You will have observed that it is a common practice to repeat some operations many thousands of times just to get a measure of the time differences between different approaches. In actual use, this is hardly a real factor, because the time involved is actually quite short.

I think that the real complaint about ACODE$() and UCODE$() is that this method may seem a bit kludgy or awkward. It also avoids consideration of any special coding that Unicode could support that would not be reflected in the standard character set. But that is a choice for the programmer or the client to make. Some may temporize by selecting a specific font that supports symbols not available in the standard set. Same byte code, but perhaps the symbols above code value 127 will be shown differently. How such code values are then interpreted would have to be taken into account when writing the supporting program.

Note that this is an evolving area, with a lot of existing art in place. This would make it hard for anyone to come along at this point and mandate that from now on, everything should be done this way or that. It has happened, such as when the original IBM EBCDIC code was largely superseded by ASCII, but if you deal with mainframe datasets, then EBCDIC code is still very much alive and in use. Even in systems where it has largely been done away with, it may still be an option for data transfers or storage. That is at least a 50-year period in which the superseded code has continued to endure. And that is just one example of the endurance of coding methods.

Title: Re: Powerbasic dynamic Strings Memory usage
Post by: Theo Gottwald on August 10, 2007, 06:56:17 PM

I'd say that Purebasic also already has native Unicode support.
It looks to me as if this will be a new standard; in that case everything else will be outdated someday.

Jose's Read Only Forum 2023
