Detailed analysis of the C language compilation process

C language compilation process

Understanding how a C program is compiled and executed is the starting point for learning C.

In brief, a C program goes through the following steps on its way from source code to execution:

C source code

Compile -----> Generate the object code, that is, the code that runs on the target machine.

Link -----> Link the object code with the C library, merging the library code used by the source program with the object code to form the final executable binary machine code (the program).

Execute -----> Run the C program in a specific machine environment.

Represented as a pipeline, the stages are:

C source program and header files --> preprocessor (cpp) --> compiler proper --> optimizer --> assembler --> linker --> executable file

During compilation, the compiler reads the source program (a character stream), performs lexical and syntactic analysis, and converts the high-level language statements into functionally equivalent assembly code. The assembler then converts that code into machine language, and the linker produces an executable program in the executable file format that the operating system requires.

1. Preprocessing phase

The preprocessor reads the C source program and processes the preprocessing directives (the lines beginning with #) and the special symbols in it.

[Analysis] The directives fall mainly into the following four categories.

(1) Macro definition directives, such as #define Name TokenString and #undef. For #define, the preprocessor replaces every occurrence of Name in the program with TokenString, except where Name appears inside a string constant. For #undef, the macro definition is removed, so that later occurrences of the name are no longer replaced.

(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let the programmer decide which code the compiler processes by defining different macros. The preprocessor filters out the unneeded code according to the relevant conditions.

(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>. Header files generally define a large number of macros with #define (most commonly symbolic constants) and contain declarations of various external symbols. The main purpose of header files is to make certain definitions available to many different C source programs: a source file that needs these definitions only has to add an #include directive rather than repeat the definitions itself. The preprocessor adds the contents of the header file to the output file it produces for the compiler to process.

Header files included in a C source program can be provided by the system. These are usually located in the /usr/include directory and are included with angle brackets (<>). Developers can also define their own header files, which are usually placed in the same directory as the C source program and are included with double quotes ("").

(4) Special symbols. The preprocessor recognizes some special symbols. For example, the identifier __LINE__ appearing in the source program is replaced with the current line number (a decimal number), and __FILE__ is replaced with the name of the C source file being compiled. The preprocessor replaces these symbols with the appropriate values wherever they appear in the source.
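The four kinds of directive described above can be seen together in one short sketch. All names in it (BUFFER_SIZE, USE_LONG_NAMES, config.h, and the functions) are hypothetical and chosen only for illustration; the comments state what the preprocessor is expected to do with each line.

```c
#include <stddef.h>   /* (3) system header: angle brackets */
/* (3) a header of your own in the same directory would use quotes,
 *     for example a hypothetical: #include "config.h"              */

#define BUFFER_SIZE 64   /* (1) macro definition */
#define USE_LONG_NAMES   /* (1) a macro may also be defined with no value */

/* (1) the token BUFFER_SIZE is replaced, so this becomes "return 64;" */
size_t buffer_size(void) { return BUFFER_SIZE; }

/* (1) inside a string constant the name is left untouched */
const char *buffer_label(void) { return "BUFFER_SIZE"; }

#undef BUFFER_SIZE       /* (1) from here on BUFFER_SIZE is no longer a macro */

/* Because the macro is gone, the name can now denote an ordinary variable. */
static const size_t BUFFER_SIZE = 128;
size_t buffer_size_after_undef(void) { return BUFFER_SIZE; }

/* (2) only one of the two return statements reaches the compiler */
const char *feature_name(void) {
#ifdef USE_LONG_NAMES
    return "standard";
#else
    return "std";
#endif
}

/* (4) __LINE__ expands to the number of the line it appears on,
 *     so two adjacent lines differ by exactly one               */
int line_gap(void) {
    int first = __LINE__;
    int second = __LINE__;
    return second - first;
}

/* (4) __FILE__ expands to a string constant naming the source file */
const char *this_file(void) { return __FILE__; }
```

Calling buffer_size() and then buffer_size_after_undef() gives two different values for what looks like the same name, which is a quick way to see that the substitution happens before the compiler proper ever sees the code.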

What the preprocessor does is essentially substitution on the source program. The result is an output file that contains no macro definitions, no conditional compilation directives, and no special symbols. This file has the same meaning as the unpreprocessed source file, but its content is different. Next, this output file is passed to the compiler as input and translated into machine instructions.

2. Compilation phase

After preprocessing, the output file contains only constants such as numbers and strings, variable definitions, identifiers such as main, C keywords such as if, else, for, and while, and symbols such as {, }, +, -, *, /, and so on. The compiler's job is to use lexical analysis and parsing to confirm that all statements conform to the grammar rules, and then to translate them into an equivalent intermediate code representation or assembly code.
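To give a feel for what lexical analysis means, here is a minimal sketch of a scanner that splits a line of source into tokens. It is a toy under stated assumptions, not how any real compiler is written: the Token type and the functions next_token and count_tokens are hypothetical names, keywords are simply treated as identifiers, and every other non-space character is treated as a one-character symbol.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* first character of the token */
    size_t length;       /* number of characters in the token */
} Token;

/* Scan one token starting at *cursor and advance the cursor past it. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) {
        p++;                                  /* skip whitespace between tokens */
    }
    Token tok = { TOK_END, p, 0 };
    if (*p == '\0') {                         /* end of input */
        *cursor = p;
        return tok;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;                 /* identifier or keyword */
        while (isalnum((unsigned char)*p) || *p == '_') {
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;                /* decimal integer constant */
        while (isdigit((unsigned char)*p)) {
            p++;
        }
    } else {
        tok.kind = TOK_PUNCT;                 /* single-character symbol */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Count how many tokens of the given kind appear in the source text. */
size_t count_tokens(const char *source, TokenKind kind) {
    size_t count = 0;
    for (;;) {
        Token tok = next_token(&source);
        if (tok.kind == TOK_END) {
            break;
        }
        if (tok.kind == kind) {
            count++;
        }
    }
    return count;
}
```

For the input "x = a + 42;" the scanner sees two identifiers (x and a), one number (42), and three symbols (=, +, and ;). A real compiler would go on to check this token sequence against the grammar during parsing.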

3. Optimization stage

Optimization is one of the more difficult techniques in a compilation system. The problems it involves relate not only to compilation technology itself but also, to a large degree, to the hardware environment of the machine. One kind of optimization works on the intermediate code and does not depend on a specific computer. The other kind mainly targets the generation of object code. In the pipeline above, the optimization phase is placed after the compiler, which is a fairly general representation.

For the former kind of optimization, the main work is common subexpression elimination, loop optimization (hoisting code out of loops, strength reduction, transformation of loop control conditions, constant folding, and so on), copy propagation, elimination of useless assignments, and so on.
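Two of these transformations can be illustrated by applying them by hand. The sketch below is only an illustration with hypothetical function names: the first function is the code as a programmer might write it, and the second is what it could look like after hoisting the loop-invariant product out of the loop and replacing the multiplication i * 4 with a running addition (strength reduction). An optimizing compiler may or may not perform exactly these steps.

```c
/* As written: a * b and i * 4 are recomputed on every iteration. */
int sum_scaled_naive(const int *values, int count, int a, int b) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i] * (a * b) + i * 4;
    }
    return total;
}

/* After hand-applied optimization; the result is the same. */
int sum_scaled_optimized(const int *values, int count, int a, int b) {
    int total = 0;
    int scale = a * b;   /* hoisted: the product does not change inside the loop */
    int offset = 0;      /* strength reduction: i * 4 becomes repeated addition */
    for (int i = 0; i < count; i++) {
        total += values[i] * scale + offset;
        offset += 4;
    }
    return total;
}
```

Both functions return the same value for the same arguments; only the amount of work done inside the loop differs.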

The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in the machine's hardware registers, so as to reduce the number of memory accesses. In addition, how to adjust the instructions to suit the way the hardware executes them (pipelining, RISC, CISC, VLIW, and so on), so that the object code is shorter and runs more efficiently, is also an important research topic.

The optimized assembly code must be assembled by the assembler into the corresponding machine instructions before the machine can execute it.

4. Assembly process

The assembly process translates assembly language code into machine instructions for the target machine. For each C source program processed by the translation system, this step ultimately produces a corresponding object file. The object file stores machine language code that is equivalent to the source program.

An object file consists of segments. There are usually at least two segments in an object file:

Code segment:

This segment mainly contains the program's instructions. It is generally readable and executable, but generally not writable.

Data segment:

This segment mainly stores the global variables and static data used by the program. Data segments are generally readable and writable; on most modern systems they are not executable.
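As a rough illustration of the split between the two segments, the sketch below marks where each piece of a small translation unit would typically end up. The names are hypothetical, and the exact layout and section names depend on the toolchain and platform.

```c
/* Global and static variables: stored with the program's data. */
int request_limit = 100;        /* initialized global variable */
static int requests_served;     /* static variable, zero-initialized */

/* The machine instructions generated for these functions go to the
 * code segment; the variables they read and write stay in the data. */
int serve_request(void) {
    if (requests_served >= request_limit) {
        return 0;               /* limit reached, request refused */
    }
    requests_served++;
    return 1;                   /* request accepted */
}

int served_so_far(void) {
    return requests_served;
}
```

The functions can modify requests_served because the data is writable, while the instructions of serve_request themselves are normally mapped read-only.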

There are three main types of object files in the UNIX environment:

(1) Relocatable files

A relocatable file holds code and data suitable for linking with other object files to create an executable file or a shared object file.

(2) Shared object files

A shared object file holds code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.

(3) Executable files

An executable file holds a program from which the operating system can create a process.

What the assembler actually generates is the first type of object file. The latter two types require further processing, which is the job of the linker.
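On systems that use the ELF object file format (an assumption here, since the text only says "UNIX environment"), the type of an object file is recorded in its header. The following sketch classifies a file from its first 18 bytes: the magic bytes 0x7f 'E' 'L' 'F', the byte-order flag at offset 5, and the two-byte type field at offset 16 (1 for relocatable, 2 for executable, 3 for shared object). The ObjectKind type and the classify_elf function are hypothetical names, and the sketch does no file I/O.

```c
#include <stddef.h>

typedef enum {
    OBJ_UNKNOWN,
    OBJ_RELOCATABLE,
    OBJ_EXECUTABLE,
    OBJ_SHARED
} ObjectKind;

/* Classify an object file from the leading bytes of its ELF header. */
ObjectKind classify_elf(const unsigned char *header, size_t length) {
    if (length < 18) {
        return OBJ_UNKNOWN;                    /* too short to hold the type field */
    }
    if (header[0] != 0x7f || header[1] != 'E' ||
        header[2] != 'L' || header[3] != 'F') {
        return OBJ_UNKNOWN;                    /* not an ELF file */
    }

    unsigned type;
    if (header[5] == 1) {                      /* little-endian encoding */
        type = (unsigned)header[16] | ((unsigned)header[17] << 8);
    } else if (header[5] == 2) {               /* big-endian encoding */
        type = ((unsigned)header[16] << 8) | (unsigned)header[17];
    } else {
        return OBJ_UNKNOWN;
    }

    switch (type) {
    case 1:  return OBJ_RELOCATABLE;
    case 2:  return OBJ_EXECUTABLE;
    case 3:  return OBJ_SHARED;
    default: return OBJ_UNKNOWN;
    }
}
```

Note that many current toolchains build position-independent executables, which carry the shared object type in this field, so the value alone does not always distinguish a program from a library.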

5. Linker

The object files generated by the assembler cannot be executed immediately, because they may still contain many unresolved references. For example, a function in one source file may reference a symbol (such as a variable or a function) defined in another source file, or the program may call a function from a library file. All of these references are resolved by the linker.

The main job of the linker is to connect the related object files to each other, that is, to connect each symbol referenced in one file with its definition in another file, so that all of these object files become a single unified whole that the operating system can load and execute.
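A reference from one source file to a symbol defined in another looks like the sketch below. To keep the example in a single block, the contents of two hypothetical files, main.c and counter.c, are shown one after the other; in a real project each part would be compiled into its own object file and the linker would match the references in the first with the definitions in the second.

```c
/* ---- would live in main.c ---- */
extern int shared_counter;   /* declaration only: defined in another file */
int bump_counter(void);      /* prototype only: defined in another file */

/* The object file for main.c records shared_counter and bump_counter
 * as unresolved symbols. */
int use_counter(void) {
    bump_counter();
    return shared_counter;
}

/* ---- would live in counter.c ---- */
int shared_counter = 0;      /* the definition the linker looks for */

int bump_counter(void) {
    return ++shared_counter;
}
```

If counter.c were left out of the link, the linker would be expected to report shared_counter and bump_counter as undefined references.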

Depending on how the developer specifies that a given library function be linked, link processing falls into two types:

(1) Static link

With static linking, the function's code is copied from the static library in which it resides into the final executable. The code is therefore loaded into the virtual address space of the process when the program is executed. A static library is actually a collection of object files, each of which contains the code for one function or a group of related functions in the library.

(2) Dynamic link

With dynamic linking, the function's code resides in an object file called a dynamic link library or shared object. What the linker does in this case is record the name of the shared object and a small amount of other registration information in the final executable. When the executable is run, the entire contents of the dynamic link library are mapped into the virtual address space of the corresponding process. The dynamic linker then finds the corresponding function code based on the information recorded in the executable.

For the function calls in an executable, either dynamic linking or static linking can be used. Dynamic linking makes the final executable smaller and saves memory when a shared object is used by multiple processes, because only one copy of the shared object's code needs to be kept in memory. However, dynamic linking is not necessarily better than static linking; in some cases it incurs a performance penalty.

After these five stages, the C source program has finally been converted into an executable file. By default, this executable is named a.out.
