SUMMARY: DTK C compiler (V6.3-126) with -E switch produces incorrect output

From: Matti Saarinen <mjs_at_cc.tut.fi>
Date: Fri, 15 Dec 2000 08:35:54 +0200

I received a reply from John Parks (project lead for the DTK C
compiler), whom I would like to thank. His reply follows:

  "cc" was designed to process C source input. It expects C source
  input. And when it produces explicit output, it expects that that
  output will itself be read by a C compiler.
  
  "cc -E" has a potential problem in that its output, when read by a
  compiler, may produce a different tokenization. Consider
  
      #define f(x) x
      int f(y)f(y);
  
  When f(y)f(y) is macro-expanded by cc, the result consists of 2
  tokens (y and y). If "cc -E" produces explicit output, the output
  might look like
  
      int yy;
  
  and this, if fed back into a compiler, would produce a single
  identifier token for yy. We call this "de facto token-pasting" and
  it is a problem. Users expect that the following should generate
  EXACTLY the same result
  
      % cc foo.c
  
      % cc -E foo.c > bar.c
      % cc bar.c
  
  If "de facto token-pasting" occurs, this may not be true.
  
  To combat "de facto token-pasting" cc sometimes outputs extra blank
  spaces between tokens that it thinks could possibly be "pasted"
  together if the output was fed back into a compiler. So, for the
  f(y)f(y) case, "cc -E" will output "y y".
  
  Unfortunately, the compiler is not perfect at this and sometimes
  outputs unnecessary blank spaces. That's what you ran into. Note that
  for C source code the extra blank spaces make NO (zero, zilch)
  difference. They can only matter for non-C source (which you are
  feeding to cc).
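
As an illustration of the paste-avoidance idea described above, here is a toy model in shell. The needs_space function and its rule list are hypothetical, made up for this sketch; they are not the logic DTK cc actually uses. It answers "yes" when writing two tokens back to back could be re-read as a single token.

```shell
# Toy model only: decide whether two adjacent output tokens need a
# separating space so that re-reading the text cannot merge them.
# needs_space and its rule list are hypothetical, not DTK cc's logic.
needs_space() {
    left=$1
    right=$2
    l=${left#"${left%?}"}      # last character of the left token
    r=${right%"${right#?}"}    # first character of the right token
    case "$l$r" in
        [A-Za-z0-9_][A-Za-z0-9_]) echo yes ;;                  # y + y -> yy
        '++'|'--'|'&&'|'||'|'<<'|'>>'|'=='|'->') echo yes ;;   # two-char operators
        '+='|'-='|'*='|'/='|'<='|'>='|'!=') echo yes ;;        # operator followed by =
        *) echo no ;;
    esac
}

needs_space y y       # the f(y)f(y) case from the reply; prints yes
needs_space + +       # prints yes
needs_space - lX11    # the -lX11 case from the original question; prints no
```

Under this toy rule set "-" followed by "lX11" needs no separator, which fits the point that the blank space cc emitted there was unnecessary.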
  
  So there are two ways you can fix your problem:
  
    1) BEST would be for you to use "cpp" rather than "cc -E". cpp is
       a TEXT processor, it is perfectly suited to your task, and it
       will give you your desired result.
  
    2) the "default language mode" in the DTK version of cc is
       "relaxed ANSI". The "default language mode" of the compiler
       that shipped with the OS on V4.0G was "common C". So,
  
           % cc -E is equivalent to cc -std -E
           % cc -nodtk -E is equivalent to cc -std0 -nodtk -E
  
       The "de facto token-pasting" prevention behavior that you're
       running into is enabled in "relaxed ANSI" (-std) mode but not
       "common C" mode (-std0). You can get the behavior you want by
       simply adding an explicit "-std0" to your command-line (rather
       than relying on the default).
  
  I'd strongly urge you to consider option 1. Option 2 continues to
  use cc -E as a general text processor and that is NOT its purpose.
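
The two options above can be sketched in shell as follows. PREPROCESS is a hypothetical variable name chosen for this example. The -std0 flag is the one named in the reply and applies to the DTK cc discussed here, not to C compilers in general.

```shell
# Sketch: choose a preprocessor command for non-C text such as a.c.
# PREPROCESS is a hypothetical variable name, not part of any tool.
if command -v cpp >/dev/null 2>&1; then
    PREPROCESS="cpp"             # option 1: the text-oriented preprocessor
else
    PREPROCESS="cc -std0 -E"     # option 2: force "common C" mode explicitly
fi
echo "preprocessor: $PREPROCESS"
# Usage (unquoted on purpose so the flags are split into words):
#   $PREPROCESS a.c
```

Preferring cpp first follows the recommendation in the reply; the cc fallback only makes sense on a system where cc accepts -std0.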


My original question:

> Hello
>
> I tested on 4.0G the Compaq C compiler included in the latest DTK and
> noticed that its -E switch does not work correctly.
>
> An example:
>
> % cat a.c
> #ifndef LIB_X11_LIB
> #define LIB_X11_LIB -lX11
> #endif
>
> configure___ LIBX=LIB_X11_LIB
>
> % cc -E a.c
> # 1 "a.c"
>
>
>
>
> configure___ LIBX= - lX11
>                     /\
>                     ||
> There is an extra white space before 'l'.
>
> % cc -nodtk -E a.c
> # 1 "a.c"
>
>
>
>
> configure___ LIBX=-lX11
>
>
> I read the documentation and did not find any mention of cc -E
> being deprecated or of the option's behaviour being changed.
> Therefore I think this is a bug.
>
> Of course I can avoid the problem by using cpp or 'cc -E -nodtk'
> as the preprocessor, but that is not a good solution. For
> example, the configure scripts produced by GNU autoconf use 'cc -E'
> as the preprocessor by default.
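
For configure scripts, one possible workaround is to preset the CPP variable, since autoconf-generated scripts honor a CPP value taken from the environment. This is only a sketch and was not tested in this thread; the -std0 flag comes from the reply above.

```shell
# Sketch: preset CPP so configure does not fall back to plain 'cc -E'.
CPP="cc -std0 -E"    # or CPP=cpp, as in option 1 of the reply
export CPP
echo "CPP=$CPP"
# Then run the package's configure script in its source directory:
#   ./configure
```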


-- 
- Matti -
Received on Fri Dec 15 2000 - 06:37:16 NZDT
