# Simple C-compiler

This is a simple work-in-progress compiler for C (or a C-like language), written
in C++ with the goal of learning to write better C++ in the future, and to serve
as a reference for my knowledge of modern C++ programming.

As of writing, a simple Fibonacci sequence program can already be compiled and
executed; see [`test.c`](./test.c).

As far as compiler design goes, this project still falls behind my other
project, [Reid-LLVM](https://git.teascade.net/teascade/reid-llvm), which is
significantly more capable and more robust.

## Structure of the program

The program is structured into several different stages, all of which are
orchestrated via `main.cpp`.

Currently the stages are as follows:

1. Firstly, the program is **tokenized**. This stage could also be called the
   lexer, depending on your preference. In this stage, the source code of the
   program is transformed into discrete tokens, which the parsing phase can
   consume more easily than raw text. The code for this stage is mostly in
   [`src/tokens.cpp`](src/tokens.cpp).
2. **TODO:** The preprocessing stage hasn't been developed yet, but it will go here.
3. Then the program is **parsed**. This is the stage where the tokens from the
   previous stage(s) are converted into an Abstract Syntax Tree (AST), which is
   a format that is easier for the computer to process. The AST itself lives in
   [`src/ast.h`](src/ast.h), and the code for the parsing phase lives in
   [`src/parsing.cpp`](src/parsing.cpp).
4. In the **typechecking** stage we do static analysis on the generated AST to
   make sure expected types match actual types, and do other checks (such as
   checking that the correct number of arguments is provided in function
   calls). The source code for this stage lives in
   [`src/typechecker.cpp`](src/typechecker.cpp).
5. Finally the program is **compiled**, or in other words **code-generated**,
   which is why this is called the **codegen** stage. This is where the AST from
   the previous stages is taken and LLVM Intermediate Representation (IR) is
   produced using the LLVM bindings. The source code for this stage resides
   mostly in [`src/codegen.cpp`](src/codegen.cpp).
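
To make the tokenizing stage described above more concrete, here is a minimal sketch of what a tokenizer can look like. It is not the code from `src/tokens.cpp`: the names `TokenKind`, `Token` and `tokenize` are hypothetical, and the token set is deliberately tiny (identifiers, integer literals and single-character symbols).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; the real set in src/tokens.cpp may differ.
enum class TokenKind { Identifier, Number, Symbol, Eof };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits source text into discrete tokens, skipping whitespace.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens
        } else if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: [A-Za-z_][A-Za-z0-9_]*
            const std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, source.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Integer literal: [0-9]+
            const std::size_t start = i;
            while (i < source.size() &&
                   std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, source.substr(start, i - start)});
        } else {
            // Anything else is treated as a one-character symbol.
            tokens.push_back({TokenKind::Symbol, std::string(1, source[i])});
            ++i;
        }
    }
    assert(i == source.size());  // every character was consumed exactly once
    tokens.push_back({TokenKind::Eof, ""});
    return tokens;
}
```

With this sketch, `tokenize("int x = 42;")` yields the tokens `int`, `x`, `=`, `42`, `;` followed by an end-of-file marker. A real C tokenizer would additionally need multi-character operators, string literals and comments.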
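
The parsing stage can be illustrated in a similar way. The following is a rough sketch of a recursive-descent parser that turns a token list into a small AST for `+` and `*` expressions. The `Expr` and `Parser` types are hypothetical and far simpler than what `src/ast.h` and `src/parsing.cpp` contain, and tokens are represented as plain strings for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: either an integer literal or a binary operation.
struct Expr {
    char op = 0;                     // 0 for a literal, otherwise '+' or '*'
    int value = 0;                   // used only when op == 0
    std::unique_ptr<Expr> lhs, rhs;  // used only when op != 0
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    // expr := term ('+' term)*
    std::unique_ptr<Expr> parse_expr() {
        auto left = parse_term();
        while (peek() == "+") {
            ++pos_;
            auto right = parse_term();
            left = make_binary('+', std::move(left), std::move(right));
        }
        return left;
    }

private:
    // term := number ('*' number)*
    std::unique_ptr<Expr> parse_term() {
        auto left = parse_number();
        while (peek() == "*") {
            ++pos_;
            auto right = parse_number();
            left = make_binary('*', std::move(left), std::move(right));
        }
        return left;
    }

    std::unique_ptr<Expr> parse_number() {
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of input");
        auto node = std::make_unique<Expr>();
        node->value = std::stoi(tokens_[pos_++]);  // throws on a non-numeric token
        return node;
    }

    static std::unique_ptr<Expr> make_binary(char op, std::unique_ptr<Expr> lhs,
                                             std::unique_ptr<Expr> rhs) {
        auto node = std::make_unique<Expr>();
        node->op = op;
        node->lhs = std::move(lhs);
        node->rhs = std::move(rhs);
        return node;
    }

    // Returns the current token, or an empty string at the end of input.
    std::string peek() const { return pos_ < tokens_.size() ? tokens_[pos_] : ""; }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Walks the tree; the result shows that precedence is encoded in its shape.
int evaluate(const Expr& e) {
    if (e.op == 0) return e.value;
    assert(e.lhs && e.rhs);  // binary nodes always have both children
    const int l = evaluate(*e.lhs);
    const int r = evaluate(*e.rhs);
    return e.op == '+' ? l + r : l * r;
}
```

Parsing the tokens `1 + 2 * 3` with this sketch produces a `+` node whose right child is the `*` node, so operator precedence is already resolved by the time later stages see the tree.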
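
The function-call check mentioned in the typechecking stage could be sketched roughly as follows. This is only an illustration under simplifying assumptions: types are compared by name, and `FunctionSignature`, `CallExpr` and `check_call` are hypothetical names that do not come from `src/typechecker.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified type model: a type is just its name.
struct FunctionSignature {
    std::string return_type;
    std::vector<std::string> param_types;
};

struct CallExpr {
    std::string callee;
    std::vector<std::string> arg_types;  // types already inferred per argument
};

// Returns an error message, or std::nullopt when the call is well-typed.
std::optional<std::string> check_call(
    const std::map<std::string, FunctionSignature>& functions,
    const CallExpr& call) {
    assert(!call.callee.empty());  // assumed: the parser never emits a nameless call

    const auto it = functions.find(call.callee);
    if (it == functions.end()) {
        return "unknown function '" + call.callee + "'";
    }

    // First check: the number of arguments must match the declaration.
    const auto& params = it->second.param_types;
    if (params.size() != call.arg_types.size()) {
        return "'" + call.callee + "' expects " + std::to_string(params.size()) +
               " argument(s), got " + std::to_string(call.arg_types.size());
    }

    // Second check: each argument type must match the expected parameter type.
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (params[i] != call.arg_types[i]) {
            return "argument " + std::to_string(i + 1) + " of '" + call.callee +
                   "' has type " + call.arg_types[i] + ", expected " + params[i];
        }
    }
    return std::nullopt;
}
```

Given a declared `int fib(int)`, a call with one `int` argument passes, while a call with two arguments, a `char` argument, or an undeclared callee each produce an error message. A real typechecker for C would also have to handle implicit conversions, which this sketch ignores.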
## Compiling and running the program

In order to compile the program, you need the following:
- CMake
- A C++20 (or newer) capable compiler
- LLVM 21.1.0 or newer

And in order to execute the compiled program (the compiler itself), you also need:
- LLVM 21.1.0 or newer (as it is dynamically linked)
- The `whereis` utility in `$PATH`
- The `ld` utility in `$PATH`

Then, to compile the program, run:
```sh
cmake -Bbuild
make -C build
```

To run the program, simply run `./build/llvm_c_compiler`. This will read a
file called `test.c` from `$PWD` and produce two files: `test.o` and `test`,
the latter being an executable compiled from the original `test.c`.