# Simple C-compiler

This is a simple work-in-progress C-compiler (or C-like) written in C++, with the goal of learning to write better C++ and to serve as a reference for my knowledge of modern C++ programming.

As of writing, a simple Fibonacci sequence program can already be compiled and executed; see [`test.c`](./test.c).

As far as compiler design goes, this project still falls behind my other project, [Reid-LLVM](https://git.teascade.net/teascade/reid-llvm), which is significantly more capable and more robust.

## Structure of the program

The program is structured into several stages, all of which are orchestrated via `main.cpp`. Currently the stages are as follows:

1. Firstly, the program is **tokenized**. This stage could also be called the lexer, depending on your preference. In this stage, the source code of the program is transformed into discrete tokens, which are easier to work with during the parsing phase than raw text. The code for this stage is mostly in [`src/tokens.cpp`](src/tokens.cpp).
2. **TODO:** The preprocessing stage hasn't been developed yet, but it will go here.
3. Then the program is **parsed**. This is the stage where the tokens from the previous stage(s) are converted into an Abstract Syntax Tree (AST), which is a format that is easier for the computer to process. The AST itself lives in [`src/ast.h`](src/ast.h), and the code for the parsing phase lives in [`src/parsing.cpp`](src/parsing.cpp).
4. In the **typechecking** stage, static analysis is performed on the generated AST to make sure expected types match true types, along with other checks (such as checking that the correct number of parameters is provided in function calls). The source code for this stage lives in [`src/typechecker.cpp`](src/typechecker.cpp).
5. Finally the program is **compiled**, or in other words **code-generated**, which is why this is called the **codegen** stage. This is where the AST from the previous stages is taken and LLVM Intermediate Representation is produced using LLVM-bindings. The source code for this stage resides mostly in [`src/codegen.cpp`](src/codegen.cpp).

## Compiling and running the program

In order to compile the program, you need the following:

- CMake
- C++20 (or newer) capable compiler
- LLVM 21.1.0 or newer

And in order to run the compiled program (the compiler itself), you also need:

- LLVM 21.1.0 or newer (as it is dynamically linked)
- `whereis`-utility in `$PATH`
- `ld`-utility in `$PATH`

Then, to compile the program, run:

```sh
cmake -Bbuild
make -C build
```

To run the program, simply run `./build/llvm_c_compiler`. This reads a file called `test.c` from `$PWD` and produces two files: `test.o` and the executable `test`, compiled from the original `test.c`.
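
Since the requirements above include several tools that must be found in `$PATH`, a small preflight check can be handy before building. The following is only a sketch: the `missing_tools` helper is a hypothetical name that is not part of this repository, and the tool list is taken from the requirements above plus `make`, which the build command uses. It does not check the LLVM or compiler versions.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repository):
# prints the name of every given tool that cannot be found in $PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the requirements above, plus make for the build step.
missing=$(missing_tools cmake make whereis ld)

if [ -n "$missing" ]; then
    # $missing is intentionally unquoted so the names print on one line.
    echo "Missing from PATH:" $missing
else
    echo "All required tools were found in PATH."
fi
```

If nothing is reported as missing, the `cmake` and `make` commands above should at least be able to start.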