• Welcome to Jose's Read Only Forum 2023.
 

Garbage Collection

Started by Frederick J. Harris, February 22, 2013, 06:12:44 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Frederick J. Harris

Hi Charles!  Is garbage collection the trickiest part of writting a compiler such as you are doing with OxygenBasic? 

Charles Pegge

#1
Hi Fred,

Garbage collection is not too difficult for a compiler, (but a chore for manual coding). The principle is to maintain lists for all the strings and dynamic structures created by the program, then to use the lists to free all the strings when they go out of scope.

The strings I use are indirect bstrings (OLE strings) so the value of the string is a pointer to a position in one of these lists.

When a string is altered, the original Bstr is freed and the new Bstr replaces it in the list. So a string always carries the same pointer value, while the StrPtr is volatile.

To make the listing system capable of handling unlimited strings, the lists are constructed as chainable blocks each capable of holding 256 or so of Bstrings. When a block becomes full, a new block is created, and linked to the previous block. With this system, strings are easily freed in reverse order.

One important aspect of garbage collection lists, is that all the procedures used to manage them should be re-entrant, To avoid conflicts when using multiple threads. Each thread must have its own independent set of lists.

Here is concept a chainable string list :)





Charles

Theo Gottwald

Charles, Garbage Collection has the problem that during erxtensive string operation, it may leave a lot of  fragmented Memory.
A good sollution for garbage collection should also collect smaller blocks and put them together.

Charles Pegge

Theo,
I'm experimenting with a string pool for recycling small strings. This sits between the the OS and memory allocation requests. But GC deletion of strings in reverse order also helps.

Garbage collection can, of course be bypassed by using bstrings directly. Direct Bstrings are supported by all the string functions.

Charles

Frederick J. Harris

Thanks for the information Charles.  Its something I've never given much thought to, until just recently. 

Is it easier to write something like a C compiler, where there is a rather severe limit to the extent that the compiler tries to track what goes on, rather concentrating on just translating statements to ASM or machine code? 

Where I'm going with this is that while that feeling is out there that somehow the C family languages are more 'professional' or serious tools than the basic family of languages, the situation might very well be reversed in terms of said compilers, i.e., really good basic syntax languages with GC are harder to produce than C type languages?  I haven't mentioned anything about C++, because that brings in a whole host of severe problems I'd imagine for compiler writers, i.e., complicated inheritance chains, deleting the same, so on and so forth.

The reason I'm wondering these things is that for the past month, seven days a week, mornings and nights, I've been fighting the worst bug of my life, and I believe its source is GC somewhere along the line.  Thoughts anyone?


Charles Pegge

#5
Hi Fred,

I think the main reason that C has become the dominant language was its adoption by Microsoft for Windows. in the late '80s. Perhaps it was the decision of one individual (Charles Simonyi?). It could have been  Basic+. Flogging Basic (and proto-OS) to home computer makers, was Microsofts main line of business before the PC.

It is certainly an elegant language but its pointer/array system is hard to cope with. Low-level coders have to override the C compiler with endless casts, Also, expressions for calculating pointer address values, do not always behave as expected, due to strange operator associativity.

I found I had to examine the gehaviour of each construct in minute detail, before attempting to use it in my main code.

I bet your code is missing a vital pair of brackets in a pointer statement, Fred :)

Charles



Frederick J. Harris

Quote
I bet your code is missing a vital pair of brackets in a pointer statement, Fred

No, its not C code, if that's what you meant Charles.  Its my PowerBASIC ActiveX Grid code I posted here.  PBWin 10.03 broke it.  It does appear to be a memory corruption  issue such as one would have with memory over/under runs, but as you know with these things, they are hard to find.  It only occurs on Win 2000/XP, and, seemingly, only when multiple grid instantiations occur.  Most times I get GPFs when closing out.  Sometimes it doesn't occur.  I may have it narrowed down to a Parse Statement issue.  That statement was broke in the intial 10.0 release, and I sent in a bug report and it was fixed in 10.01.  10.02 worked fine too.  It was during that time period when I was developing the grid, i.e., spring/summer 2011.  In the 10.03 History.txt file it mentions changes to prevent a possible memory leak.  I'm wondering if something done there could be a cause.  In all my debug logs everything outputs perfect, including all parsed strings.  Nontheless I get the crashes at program termination.  Memory seems to be getting corrupted in a random manner.  I was wondering if attempts were being made to free memory twice.  Still trying to get to the bottom of it.  Everything works perfect on Windows 7, both 32 and 64 bit.  That's the strange thing.  If I don't solve this very soon I'm going to have to give up on it, and just use the grid compiled with 10.01 or 10.02.  I hate to leave it lying like that though, you know - unsolved.

Charles Pegge

#7
Hi Fred,

My approach would be to gradually deconstruct the grid until the malfunction disappears. Ideally disabling large chunks of code then narrowing down in a binary manner. It is ruthless, but I find it never fails, and often the bugs know their fate is inevitable and surrender early :)

Supposing you had 64k lines of code. In an ideal bug-hunt,you would be able to isolate the problem down to a single line in 15 trials.

If the failure shows up intermittently then testing each pass will take much longer, so the first thing to do is come up with a test that always crashes the code.

Charles

Frederick J. Harris

#8
Yes, that's what I've been working on Charles.  That's basically my technique too.  I've eliminated something like 1500 lines out of 3200, and all I'm left with now of the code that creates the grid is a CreateWindowEx call that creates a blank container window, and a Parse statement, the elimination of which stops the crashes.  Tomorrow I'm going to code up my own version of Parse - MyParse, or something like that, and then see what happens.  The thing that has me stumped is that aside from my COM Dll which creates the grid, I have not been able to isolate the problem, that is, I can't find any failure with Parse, except in the context of the larger component.  Of course, in the context of a COM component that creates a GUI object, we're two levels of indirection away from the client that creates the object.  But I don't see what difference that makes.  But it seems to.   

Quote
If the failure shows up intermittently then testing each pass will take much longer, so the first thing to do is come up with a test that always crashes the code.

Yep!  We're right on there too.  That's what has taken me so long. 

Charles Pegge


You could try putting the parse statement outside your grid object in a wrapper function. This would isolate all its internal memory transactions from everything else.

Theo Gottwald

The Garbage collection and here the task to avoid Memory fragmentation is something that came with LISP. I believe that PB also does such stuff internally.
Just imagine that a program works extensively with strings of all sizes, many small and some very large. At the end you get more and more fragmented and unusable Memory, if you do not "collect garbage" from time to time that means also "compact the memory".
Now the arts about it is to do that transparently without significantly slowing down the normal program execution while (possibly) moving around large chunks of memory.

Charles Pegge

Good times to release string memory:

1 When a string is altered.

2 End of all statements, involving functions that return a string.

3 End of functions that use local strings

4 End of programs that have static or global strings.

The OS is smart enough to cope with memory fragmentation issues, but I think having a string recycling pool for small strings (say <60bytes)   helps. A fixed quantum of memory is allocated to each of these. Very simple to maintain.

Application-->GarbageCollector-->Memory Pool-->OS Memory Allocation
                                               \----------------------->

Charles

Patrice Terrier

#12
Fred

I reported a similar problem with the parsing of COMMAND$ (about PB10), and it has never been solved.


QuoteProblem with COMMAND$

--------------------------------------------------------------------------------

I couldn't get the same result trying to pass a full qualified path (using white space within the argument) with the PowerBASIC COMMAND$ and the core API GetCommandLineW.

Here is the lpCmdLine i am using:
D:\Patrice Terrier\Fly!V3\Exe\BBplugin\FLY_Bubble.dll

Using COMMAND$ i get
Terrier\Fly!V3\Exe\BBplugin\FLY_Bubble.dll

Using acode$(PEEK$(lpCmdLine, 260)) from WINMAIN i get also
Terrier\Fly!V3\Exe\BBplugin\FLY_Bubble.dll

While using the core API acode$(PEEK$(GetCommandLineW(), 260)) i get
D:\Patrice Terrier\Fly!V3\Exe\BBplugin\FLY_Bubble.dll
that is the correct result i am looking for.

Also it seems that PB10 breaks the code of previous version by using WSTRINGZ within WINMAIN arguments.
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Frederick J. Harris

Since I haven't been doing much DOS for awhile, I hardly use Command$ anymore.  Had it worked correctly for you with embedded spaces such as in long filenames in pre-PB10 versions of the compiler Patrice?

Patrice Terrier

Ok, here is the example that shows the problem i have been faced with COMMAND$ using CreateProcessA.

As you can see, stangely, it works with the API GetCommandLineW (even when using CreateProcessA instead of CreateProcessW)

...
Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com