• Welcome to Jose's Read Only Forum 2023.
 

PowerBASIC dynamic strings: memory usage

Started by Theo Gottwald, August 04, 2007, 11:19:42 AM



Theo Gottwald

[copied from Dominic Mitchell]

PowerBASIC strings are BSTRs.
For example, the string HI will look like this in memory
(here HI is a wide string, and the BSTR is allocated with SysAllocString):
 
 _______________________________________
|   |   |   |   |   |   |   |   |   |   |
| 4 | 0 | 0 | 0 | H | 0 | I | 0 | 0 | 0 |
|___|___|___|___|___|___|___|___|___|___|
|               |               |       |
|____length_____|___character___|_null__|
   (in bytes)         data      terminator
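The layout in the diagram can be reproduced in portable C. This is a hypothetical sketch, not SysAllocString itself: the helper `layout_wide_bstr` (a name made up here) assumes plain ASCII input and writes a 4-byte little-endian byte count, each character as two UTF-16LE bytes, and a two-byte null terminator into a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not SysAllocString): writes the wide-BSTR layout of
 * an ASCII string into out[0..cap). Returns the total number of bytes
 * written, or 0 if the buffer is too small. */
size_t layout_wide_bstr(const char *ascii, unsigned char *out, size_t cap)
{
    size_t n = strlen(ascii);
    size_t total = 4 + 2 * n + 2;            /* prefix + data + terminator */
    if (total > cap)
        return 0;

    uint32_t bytes = (uint32_t)(2 * n);      /* prefix counts data bytes only */
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(bytes >> (8 * i));   /* little-endian */

    for (size_t i = 0; i < n; i++) {
        out[4 + 2 * i] = (unsigned char)ascii[i];     /* low byte: the character */
        out[5 + 2 * i] = 0;                           /* high byte: 0 for ASCII */
    }

    out[total - 2] = 0;                      /* two-byte null terminator */
    out[total - 1] = 0;
    return total;
}
```

For "HI" this produces the ten bytes 4, 0, 0, 0, 'H', 0, 'I', 0, 0, 0, matching the diagram.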

SysStringLen will return a value of 2 (characters).
     
PowerBASIC uses the functions that cram two ANSI characters into a single wide character:
SysAllocStringByteLen and SysStringByteLen.

For the wide string above, SysStringByteLen will return a value of 4 (bytes),
because a wide character occupies two bytes.
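The two length functions can be modelled in terms of the 4-byte prefix alone. This is a minimal sketch under an assumption: both functions read the same prefix, which holds a byte count, and SysStringLen simply halves it, discarding any remainder. The function names are hypothetical; the real APIs take a BSTR, not a number.

```c
#include <stdint.h>

/* Hypothetical models of SysStringByteLen and SysStringLen, expressed in
 * terms of the 4-byte length prefix, which always holds a byte count.
 * Assumption: the character count is the byte count halved, remainder
 * discarded. */
uint32_t model_sys_string_byte_len(uint32_t prefix)
{
    return prefix;                 /* bytes, excluding the terminator */
}

uint32_t model_sys_string_len(uint32_t prefix)
{
    return prefix / 2;             /* OLECHARs (two bytes each) */
}
```

For the wide HI the prefix is 4, giving 4 bytes and 2 characters. For the PowerBASIC-style HI the prefix is 2, so under the same assumption SysStringLen would report 1.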
       
The same string à la PowerBASIC
(HI is an ANSI string, and the BSTR is allocated with SysAllocStringByteLen):

_______________________________
|   |   |   |   |   |   |   |   |
| 2 | 0 | 0 | 0 | H | I | 0 | 0 |
|___|___|___|___|___|___|___|___|
   
If I may hazard a guess, the number of bytes reserved for a PowerBASIC string of n characters,
including the terminating null but not the 4-byte length prefix, would be
bytes = ((n+1)\2+1)*2   (\ is PowerBASIC integer division)
   
Therefore,
H
| 1 | 0 | 0 | 0 | H | 0 | 0 | 0 |
     
HI
| 2 | 0 | 0 | 0 | H | I | 0 | 0 |
 
HIS
| 3 | 0 | 0 | 0 | H | I | S | 0 | 0 | 0 | 
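The guessed formula can be checked against the three diagrams with a short C sketch. The helper names are hypothetical, and the byte layout only mimics what the diagrams show; nothing here calls SysAllocStringByteLen. C division on unsigned values truncates like PowerBASIC's `\`, and the result counts the bytes after the 4-byte prefix.

```c
#include <stddef.h>
#include <string.h>

/* The guessed formula: bytes reserved for n ANSI characters plus padding
 * plus terminator, NOT counting the 4-byte length prefix. */
size_t pb_reserved_bytes(size_t n)
{
    return ((n + 1) / 2 + 1) * 2;
}

/* Hypothetical helper: writes the bytes that follow the prefix, as the
 * diagrams show them: the characters, then zero bytes up to the reserved
 * size. Returns the number of bytes written, or 0 if out is too small. */
size_t pb_layout_after_prefix(const char *ansi, unsigned char *out, size_t cap)
{
    size_t n = strlen(ansi);
    size_t reserved = pb_reserved_bytes(n);
    if (reserved > cap)
        return 0;
    memcpy(out, ansi, n);
    memset(out + n, 0, reserved - n);   /* padding and terminator are all zero */
    return reserved;
}
```

Working the formula through, an even n leaves exactly two trailing zero bytes and an odd n leaves three, so under this guess a string never ends with fewer than two null bytes.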
     
     
Here is another link you might want to check out.
Eric's Complete Guide To BSTR Semantics


Edwin Knoppert

Good subject; I am currently busy using BSTRs as handles (c)
--

>PowerBASIC uses the functions that cram two ANSI characters into a single wide character.
SysAllocStringByteLen, SysStringByteLen

SysStringByteLen will return a value of 4 (bytes).
A wide character occupies two bytes.

--

I don't think I concur with this statement; SysAllocStringByteLen() just creates a single-byte OLECHAR string.

What I would like to know is how to tell a wide string from a byte string, given only the pointer.
A division helps a little: 5 bytes of ANSI (SysStringByteLen()) results in 2 via SysStringLen().
But that's probably not a guarantee.
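Only one thing can be decided from the length prefix alone. Assuming the prefix is a byte count, a genuine wide string always has an even count, so an odd count rules a wide string out, while an even count decides nothing. The function name below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check on a BSTR's byte-length prefix. A genuine wide
 * string always has an even byte count, so an odd count proves the
 * string was built byte-wise. An even count proves nothing either way. */
bool definitely_byte_string(uint32_t byte_len)
{
    return (byte_len % 2) != 0;
}
```

The 5-byte example is such a case: an odd byte count that SysStringLen (assuming it halves the prefix) reports as 2 characters. A 4-byte ANSI string and a 2-character wide string, by contrast, cannot be told apart this way.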

Theo Gottwald

Whose visual designer is that in the picture? Your new project :-)?

As I said, these notes are from Dominic Mitchell; I copied them because I thought they were facts that may be of use from time to time.

For additional questions on this topic I can only refer you to this link, as I have not yet had reason to look more deeply into these subjects myself.

Guide to C++ Strings and String Wrapper Classes


Dominic Mitchell

Quote
I don't think I concur with this statement; SysAllocStringByteLen() just creates a single-byte OLECHAR string
Edwin, in the land of COM, according to Don Box, OLECHAR is simply a typedef to the C data type wchar_t. Win32
platforms use the wchar_t data type to represent 16-bit Unicode characters.

By the way, have you ever seen a string that was created with a PowerBASIC intrinsic with a single null byte at the end?
Dominic Mitchell
Phoenix Visual Designer
http://www.phnxthunder.com

Edwin Knoppert

>Edwin, in the land of COM, according to Don Box, OLECHAR is simply a typedef to the C data type wchar_t.  Win32
Then you misread: they compare it with that, but it isn't.

>By the way, have you ever seen a string that was created with a PowerBASIC intrinsic with a single null byte at the end?
I can't tell whether it contains one or more; at least one.
The Unicode version should have at least two.

Dominic Mitchell

Then I guess you disagree with the Platform SDK headers and with the information at
this link

http://www.ecs.syr.edu/faculty/fawcett/handouts/CSE775/Presentations/BruceMcKinneyPapers/COMstrings.htm

and this one

http://www.codeproject.com/string/cppstringguide2.asp


By the way, because of the way SysAllocStringByteLen works, there will always be (in my opinion) at
least two null bytes at the end of a PowerBASIC dynamic string. You are welcome to try to disprove the
formula I posted.

Edwin Knoppert


Donald Darden

From the PowerBasic Help File:

Quote
Dynamic (Variable-length) strings ($)

Dynamic string variables contain character data of arbitrary length.  Internally, each string variable uses four bytes that contain a handle number, which is used to identify and locate information about a string.  Dynamic strings can contain up to approximately 2 Gb (2^31) characters.  The type-specifier character for a dynamic string is: $.
String variables are designated by following the variable name with a dollar sign ($) or the DEFSTR type definition.  You can also declare dynamic string variables using the STRING keyword with the DIM statement.  For example:

DIM MyStr AS STRING


PowerBASIC allocates strings using the Win32 OLE string engine.  This allows you to pass strings from your program to DLLs, or API calls that support OLE strings.  Note, however, that Visual Basic and Visual C++ store data in OLE strings using 16-bits per character (Unicode format) while PowerBASIC stores them in 8-bit format.  In PowerBASIC, strings may contain either ASCII or ANSI string data.
The distinction between ASCII and ANSI only becomes important when using the strings for specific tasks.  For example, when dealing with API calls, string data is usually interpreted as ANSI data by the API functions, whereas PowerBASIC statements such as UCASE$ treat the string data as ASCII.

Most standard DLLs designed to work with Visual Basic should still work with PowerBASIC, because VB converts OLE strings from Unicode to ANSI before passing them to a DLL, and PowerBASIC will accept and work with the ANSI string data.
The address of the contents of a non-empty string can be obtained with the STRPTR function.  The address of the string handle can be obtained with VARPTR function.  An empty (null) string may not return a valid STRPTR value.

Dynamic strings move in memory with each assignment statement: that is, STRPTR will return a different address when the content of the string is changed.  However, the associated string handle obtained by VARPTR stays constant for the duration of the life (scope) of the string variable.
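The handle/contents split described in the help text can be modelled in C. This is a hypothetical sketch, not PowerBASIC's runtime, and it uses malloc rather than the OLE allocator: `str_var` plays the part of the handle (what VARPTR would point to), and its `data` member holds the contents pointer (what STRPTR would return). Assignment allocates the new contents before freeing the old, so the contents address changes while the handle stays put.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: the struct is the fixed handle slot; data is the
 * current contents pointer, which moves on every assignment. */
typedef struct {
    char *data;
} str_var;

/* Replaces the contents of v with a copy of text. Returns 0 on success,
 * -1 if allocation fails (the old contents are then left untouched). */
int str_assign(str_var *v, const char *text)
{
    size_t n = strlen(text);
    char *fresh = malloc(n + 1);         /* new block while old is still live */
    if (fresh == NULL)
        return -1;
    memcpy(fresh, text, n + 1);

    char *old = v->data;
    v->data = fresh;                     /* same slot, new contents pointer */
    free(old);                           /* free(NULL) is a no-op */
    return 0;
}
```

Because the new block exists before the old one is released, the two contents addresses are always different, which is the behaviour the help file describes for STRPTR.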

Note that what PowerBasic calls Dynamic, Variable length strings are not null-byte terminated. They have a pointer and an associated string length, and
this allows null bytes to be contained within the string itself. If you want to
talk about ASCIIZ strings, which are null-byte terminated, that is a different
animal.
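A length-prefixed string can carry zero bytes that a null-terminated reader would misinterpret. In this hypothetical sketch, `counted_str` stores the length next to the data pointer, `length_up_to_first_null` shows what a reader that stops at the first zero would see, and `count_embedded_nulls` counts the zero bytes inside the logical length.

```c
#include <stddef.h>

/* Hypothetical length-prefixed string in the spirit of a PowerBASIC
 * dynamic string: the length is stored apart from the bytes, so the
 * data may contain zero bytes. */
typedef struct {
    size_t len;
    const char *data;
} counted_str;

/* Length a null-terminated reader would see: stops at the first zero,
 * but never reads past the stored length. */
size_t length_up_to_first_null(counted_str s)
{
    size_t i = 0;
    while (i < s.len && s.data[i] != '\0')
        i++;
    return i;
}

/* Counts the zero bytes inside the logical string. */
size_t count_embedded_nulls(counted_str s)
{
    size_t zeros = 0;
    for (size_t i = 0; i < s.len; i++)
        if (s.data[i] == '\0')
            zeros++;
    return zeros;
}
```

For the five bytes `AB\0CD`, the null-terminated view stops at 2 while the stored length stays 5.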

I will also point out that there is no absolute way to tell whether a string pointer
is pointing to a wide string or not. In most cases, if the text is
limited to ASCII (American Standard Code) characters, every other byte will be zero.
That may be of some help.
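The every-other-byte observation can be written down as a heuristic. This is a sketch only, and the function name is hypothetical: it returns true when the data looks like ASCII-only UTF-16LE text, and by construction it can be fooled in both directions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Heuristic only: ASCII-only text stored as UTF-16LE has a zero in every
 * odd-numbered byte. Non-ASCII wide text fails this test, and byte data
 * that happens to contain zeros at those positions passes it. */
bool looks_like_ascii_wide(const unsigned char *data, size_t byte_len)
{
    if (byte_len == 0 || byte_len % 2 != 0)
        return false;                    /* wide text needs an even count */
    for (size_t i = 1; i < byte_len; i += 2)
        if (data[i] != 0)
            return false;                /* a non-zero high byte */
    return true;
}
```

A wide "HI" passes and an ANSI "HI" fails, but a wide euro sign (bytes 0xAC, 0x20) is rejected, and the ANSI bytes `A\0B\0` are wrongly accepted, which is why this can only ever be "of some help".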

However, I cannot imagine a case where you would write a subordinate function or sub and just allow the user to send you any type of data during a call. It is normal to set limits on what the passed data must conform to, and it is then the responsibility of the caller to make sure the data passed fits within those limits. If you look at the Windows APIs as an example of many
functions, in each case the nature of the passed parameter is clearly defined. If you try to pass something other than what is expected, you do so at your peril.

José Roca

 
Quote
Note that what PowerBasic calls Dynamic, Variable length strings are not null-byte terminated.

But it uses the Win32 OLE engine to allocate them, and this engine adds a null byte. This allows you to pass a dynamic string, using STRPTR, to a function that expects a null-terminated string without needing to add a null byte to the string (because it already has one).
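A sketch of this point, using a mock allocator because the real SysAllocStringByteLen is a Win32 API. `mock_alloc_byte_bstr` and `mock_free_byte_bstr` are hypothetical: they mimic only the layout (prefix, data, then appended zero bytes; two here, matching the BSTR terminator) and return a pointer to the first data byte, which can go straight to a C function expecting a null-terminated string.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SysAllocStringByteLen: allocates a 4-byte
 * length prefix, n data bytes and two zero bytes, and returns a pointer
 * to the first data byte (not to the prefix). Only the layout is mimicked. */
char *mock_alloc_byte_bstr(const char *src, uint32_t n)
{
    unsigned char *block = malloc(4 + (size_t)n + 2);
    if (block == NULL)
        return NULL;
    memcpy(block, &n, 4);                 /* prefix, host byte order */
    memcpy(block + 4, src, n);
    block[4 + (size_t)n] = 0;             /* appended terminator */
    block[4 + (size_t)n + 1] = 0;
    return (char *)(block + 4);
}

/* Frees a block from mock_alloc_byte_bstr by stepping back over the prefix. */
void mock_free_byte_bstr(char *p)
{
    if (p != NULL)
        free(p - 4);
}
```

The caveat about embedded nulls still applies: for the three bytes `A\0B`, strlen on the returned pointer reports 1, not 3.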

Theo Gottwald

What Donald wanted to say was:

They do not need null-byte termination as a length limiter for normal string operations, as ASCIIZ strings do.

Of course, what Jose and Dominic point out, that PB appends a terminator anyway, is (though often unknown) the case.

PS: It will be hard to find any case where Jose or Dominic are wrong, so I would not even try.
There are things in life we just have to accept :-), like the weather or this fact.



Dominic Mitchell

PowerBASIC allocates strings using the Win32 OLE string engine.  The OLE SysAllocXXX functions
return pointers to BSTRs.  Therefore, statements like this one

Quote
Note that what PowerBasic calls Dynamic, Variable length strings are not null-byte terminated. 

make absolutely no sense.

Here is the definition of a BSTR from MSDN.

BSTR

A BSTR, known as Basic string or binary string, is a string data type that is used by COM, Automation, and Interop functions.

BSTRs have the following characteristics:

1. A BSTR is a composite data type that consists of:
  • A length prefix
  • A data string
  • A terminator
2. Length prefix:
  • A four-byte integer
  • Occurs immediately before the first character of the data string
  • Contains the number of bytes in the following data string
  • Does not include the terminator
3. Data string:
  • Windows platform: A string of Unicode characters (wide characters, also known as double-byte characters). Also referred to as a string of OLECHARs, a data type defined as a typedef to the C data type wchar_t.
  • Apple PowerMac: A single-byte string.
  • The string can contain multiple embedded null characters.
4. Terminator:
  • Consists of two null characters (0x00).
     
Use
A BSTR is defined in oleauto.h as follows:

   typedef OLECHAR FAR* BSTR;
   

  • A BSTR is therefore a pointer. The pointer points to the first character of the data string, not to the length prefix.
  • The BSTR string type must be used in all interfaces that will be used from Visual Basic or Java.
  • BSTRs are allocated using COM memory allocation functions. This allows them to be returned from methods without concern for memory allocation.
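Since a BSTR points at the first data character, the length has to be found by stepping back four bytes. A hypothetical helper, not the Win32 SysStringByteLen, sketches this; it decodes the prefix explicitly as little-endian and, as an assumption here, treats a NULL pointer as an empty string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a pointer to the first data byte, reads the
 * 4-byte length prefix stored just before it, decoded as little-endian.
 * NULL is treated as an empty string. */
uint32_t prefix_byte_len(const unsigned char *bstr)
{
    if (bstr == NULL)
        return 0;
    const unsigned char *p = bstr - 4;       /* step back to the prefix */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Applied to the PowerBASIC-style "HI" block (2, 0, 0, 0, 'H', 'I', 0, 0) with the pointer placed at 'H', it returns 2, while the pointer itself can be read as ordinary character data.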

Eros Olmi

Quote from: Theo Gottwald on August 07, 2007, 07:18:04 AM
There are things in life we just have to accept :-), like the weather or this fact.

I always think that what is right today could be wrong tomorrow. With this in mind, the correct way to go, for me, is:

  • document myself
  • ask people who work on the subject and surely know better than me about that particular aspect
  • test myself

But again, things can change again ... tomorrow. That's evolution, improvement in IT. And we like it, don't we? I think we like the fact that we are people open to change. That's also why many other people think our job is too complicated: they are not so open to working in a world that is continuously changing and evolving.

:D

That said, PB dynamic strings are NULL terminated!!
thinBasic Script Interpreter - www.thinbasic.com | www.thinbasic.com/community
Win7Pro 64bit - 8GB Ram - Intel i7 M620 2.67GHz - NVIDIA Quadro FX1800M 1GB

Theo Gottwald

Quote
But again, things can change again ... tomorrow.

That's why I chose the example of the weather. It may change ...
Besides that, I am quite sure that the null-terminated string weather will stay with us for a while.

Eros Olmi

Oops, sorry, I got it the other way round.
You are right: both weather and nulls matter ;D

Donald Darden

The point being overlooked here is that, regardless of the mechanism PowerBasic uses for managing its dynamic strings, it is able to support embedded null bytes. Thus you cannot assume that the first null byte you find when searching the string marks its end; you may encounter an embedded null character instead. And the length of a dynamic string is managed separately from any terminating nulls. To a programmer, then, a dynamic string appears to be organized as bytes starting at the string pointer, for the number of bytes given by the length.

You are also ignoring the fact that PowerBasic has another string type that is handled strictly as a null-terminated string: it is defined with a maximum length, and its current length is set by the first null byte encountered.
PowerBasic calls these ASCIIZ strings. Arguing as you are, without regard for these distinctions, only causes others to get confused about the nature of the various strings allowed by PowerBasic. So are you going to be sticklers about mechanisms, or acknowledge intended usage?
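The ASCIIZ behaviour described here can be sketched as a fixed-size C buffer. This is a hypothetical model, not PowerBASIC's implementation; it assumes the declared capacity includes room for the terminating zero, so at most capacity minus one characters fit, and longer assignments are truncated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a fixed-size ASCIIZ buffer. Assumption: the
 * capacity includes room for the terminating zero, so at most
 * ASCIIZ_CAP - 1 characters fit. */
#define ASCIIZ_CAP 8

typedef struct {
    char buf[ASCIIZ_CAP];
} asciiz8;

/* Copies up to ASCIIZ_CAP - 1 bytes of src and zero-fills the rest. */
void asciiz_assign(asciiz8 *z, const char *src, size_t src_len)
{
    size_t n = src_len < ASCIIZ_CAP - 1 ? src_len : ASCIIZ_CAP - 1;
    memcpy(z->buf, src, n);
    memset(z->buf + n, 0, ASCIIZ_CAP - n);
}

/* The current length is the position of the first zero byte. */
size_t asciiz_len(const asciiz8 *z)
{
    return strlen(z->buf);
}
```

Assigning the ten characters "HelloWorld" keeps "HelloWo" (length 7), and assigning the five bytes `AB\0CD` yields a length of 2, because the first zero byte ends the string, which is exactly where ASCIIZ and dynamic strings part ways.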