Want to start parsing C code?

Started by Steve Hutchesson, December 21, 2009, 11:41:56 AM

Steve Hutchesson

Here is an algo that strips the comments out of C code, both the old type /* comment */ and the line end type // comment.

You will not have to hold your breath waiting for this one.  ;D


#IF 0  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

                          Strip C and C++ comments from source code
                   -------------------------------------------------------------
                   stripcc removes C++ comments // and old style C comments
                   /*------------- old style C comment -----------------*/
                   removes trailing spaces on lines, with or without comments
                   -------------------------------------------------------------

                             Return value is the written length

#ENDIF ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

FUNCTION stripcc(ByVal lpsource as LONG, _
                 ByVal lnsource as LONG, _
                 ByVal lpresult as LONG) as LONG

    ! mov esi, lpsource
    ! mov edi, lpresult
    ! mov ecx, lnsource
    ! add ecx, esi            ' exit condition in ECX

  lbl1:
    ! mov al, [esi]
    ! inc esi
    ! cmp al, "/"
    ! je comment1
  rtn:
    ! cmp al, 13              ' branch to trim trailing spaces
    ! je trimr
  nxt1:
    ! mov [edi], al
    ! inc edi
    ! cmp esi, ecx
    ! je outa_here            ' exit on source length
    ! jmp lbl1

  trimr:                      ' trim trailing spaces
    ! cmp edi, lpresult       ' nothing written yet, so nothing to trim
    ! je nxt1
    ! cmp BYTE PTR [edi-1], 32
    ! jne nxt1
    ! dec edi
    ! jmp trimr

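  comment1:
    ! cmp esi, ecx            ' the slash was the last source byte, so write it
    ! jae rtn
    ! cmp BYTE PTR [esi], "/" ' read next character in ESI
    ! je cpp
    ! cmp BYTE PTR [esi], "*"
    ! jne rtn                 ' if not a comment, write byte in AL to [EDI]
    ! inc esi                 ' step past the asterisk so slash-star-slash is not read as a closed comment
    ! cmp esi, ecx
    ! jae outa_here           ' exit on source length
    ! jmp oldc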

  cpp:
    ! mov al, [esi]
    ! inc esi
    ! cmp esi, ecx
    ! je outa_here            ' exit on source length
    ! cmp al, 13
    ! je rtn
    ! jmp cpp

  oldc:
    ! mov al, [esi]
    ! inc esi
    ! cmp esi, ecx
    ! je outa_here            ' exit on source length
    ! cmp al, "*"
    ! je last
    ! jmp oldc

  last:
    ! cmp BYTE PTR [esi], "/"
    ! jne oldc
    ! inc esi
    ! cmp esi, ecx            ' comment closed on the last source byte
    ! jae outa_here           ' exit on source length
    ! jmp lbl1

  outa_here:

    ! sub edi, lpresult       ' get the byte count written to [edi]

    ! mov FUNCTION, edi

END FUNCTION

' «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
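
For anyone who wants to check results against a plain C version, here is a rough sketch of the same scan. It is not a translation of the assembler above: the name stripcc_c is made up for illustration, it trims trailing spaces before LF as well as CR, and it bounds-checks every read. Like the original, it knows nothing about string literals, so a // inside a quoted string is still treated as a comment.

```c
#include <stddef.h>

/* Hypothetical C sketch of the comment-stripping scan.
   Copies src[0..n) to dst without C and C++ comments and
   returns the number of bytes written. dst must hold n bytes. */
size_t stripcc_c(const char *src, size_t n, char *dst)
{
    size_t i = 0, o = 0;

    while (i < n) {
        char c = src[i++];

        if (c == '/' && i < n && src[i] == '/') {
            /* line comment: skip up to, but not including, the line end */
            while (i < n && src[i] != '\r' && src[i] != '\n')
                i++;
            continue;
        }

        if (c == '/' && i < n && src[i] == '*') {
            i++;    /* step past the asterisk before looking for the closer */
            while (i + 1 < n && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            /* skip the closer, or stop at end of input if unterminated */
            i = (i + 1 < n) ? i + 2 : n;
            continue;
        }

        if (c == '\r' || c == '\n') {
            /* trim trailing spaces already written on this line */
            while (o > 0 && dst[o - 1] == ' ')
                o--;
        }

        dst[o++] = c;
    }
    return o;
}
```

As with the assembler version, the output can never be longer than the input, so a result buffer the same size as the source is enough.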

Theo Gottwald

Quote: You will not have to hold your breath waiting for this one

Nobody expects to wait when using your code, Steve :-).

Are you building a C compiler after breakfast... or was this just typing for fun?

Steve Hutchesson

Theo,

The algo is very useful for working on the complete collection of Microsoft SDK header files. You really need a C compiler to get a clear look at their contents, but the task is much simpler once you strip out all of the inline comments, so that the prototypes, structures, unions and the like are compacted back together and you can at least read them and, where necessary, convert them to another language. I have to keep a reasonable amount of this stuff handy to update the windows.inc file in masm32, and without being able to automate a lot of this work it would take a lifetime to do it all manually.
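
Stripping comments tends to leave empty lines where whole-line comments used to be. The post does not say how those are handled, so the following is only one possible follow-up pass, sketched in C with a made-up name (drop_blank_lines): it copies the text and drops any line that holds nothing but spaces or tabs, which pulls a structure definition back together.

```c
#include <stddef.h>

/* Hypothetical follow-up pass: copy src[0..n) to dst, dropping lines
   that contain only spaces or tabs. Returns the number of bytes
   written. dst must hold n bytes. */
size_t drop_blank_lines(const char *src, size_t n, char *dst)
{
    size_t i = 0, o = 0;

    while (i < n) {
        size_t start = i;
        int blank = 1;

        /* scan the line body */
        while (i < n && src[i] != '\r' && src[i] != '\n') {
            if (src[i] != ' ' && src[i] != '\t')
                blank = 0;
            i++;
        }

        /* take the line terminator with the line (CRLF, LF or lone CR) */
        if (i < n && src[i] == '\r') i++;
        if (i < n && src[i] == '\n') i++;

        if (!blank) {
            for (size_t k = start; k < i; k++)
                dst[o++] = src[k];
        }
    }
    return o;
}
```

Run on a fragment such as a struct whose member comments each sat on their own line, this leaves just the opening line, the members and the closing brace on consecutive lines.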