Simple C-compiler (or C-like) written using C++20

Simple C-compiler

This is a simple work-in-progress C-compiler (or C-like) written in C++. The goal is to learn to write better C++, and to serve as a reference for what I know about modern C++ programming.

As of writing, a simple Fibonacci sequence program can already be compiled and executed; see test.c.

As far as compiler-design goes, this project still falls behind my other project, Reid-LLVM, which is significantly more capable and more robust.

Structure of the program

The program is structured into several different stages, all of which are orchestrated via main.cpp.

Currently the stages are as follows:

  1. Firstly, the program is tokenized. This stage could also be called the lexer, depending on your preference. In this stage, the source code for the program is transformed into discrete tokens, which are easier to work with during the parsing phase than raw text. The code for this stage is mostly in src/tokens.cpp.
  2. TODO: Preprocessing stage hasn't yet been developed, but it will go here.
  3. Then the program is parsed. This is the stage where the tokens from the previous stage(s) are converted into an Abstract Syntax Tree (AST), which is a format that is easier for the computer to process. The AST itself lives in src/ast.h, and the code for the parsing phase lives in src/parsing.cpp.
  4. In the typechecking stage we do static analysis on the generated AST to make sure expected types match true types, and do other checks (such as checking that the correct number of arguments is provided in function calls). The source code for this stage lives in src/typechecker.cpp.
  5. Finally the program is compiled, or in other words code is generated, which is why this is called the codegen stage. This is where the AST from the previous stages is taken and LLVM Intermediate Representation (IR) is produced using LLVM-bindings. The source code for this stage resides mostly in src/codegen.cpp.
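
To give a rough idea of what the tokenizing stage in step 1 does, here is a minimal sketch. It is not the code from src/tokens.cpp: the names TokenKind, Token and tokenize, as well as the three token kinds, are assumptions made up for illustration, and the real tokenizer most likely distinguishes many more kinds (keywords, multi-character operators, string literals and so on).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds; the real set in src/tokens.cpp may differ.
enum class TokenKind { Identifier, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
    std::size_t offset; // Start position in the source, kept as metadata.
};

// Splits source text into identifiers, integer literals and
// single-character symbols, skipping whitespace.
std::vector<Token> tokenize(std::string_view source) {
    auto is_space = [](char ch) { return std::isspace(static_cast<unsigned char>(ch)) != 0; };
    auto is_digit = [](char ch) { return std::isdigit(static_cast<unsigned char>(ch)) != 0; };
    auto is_ident_start = [](char ch) {
        return std::isalpha(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    auto is_ident_part = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };

    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        if (is_space(source[i])) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        TokenKind kind = TokenKind::Symbol;
        if (is_ident_start(source[i])) {
            kind = TokenKind::Identifier;
            while (i < source.size() && is_ident_part(source[i])) ++i;
        } else if (is_digit(source[i])) {
            kind = TokenKind::Number;
            while (i < source.size() && is_digit(source[i])) ++i;
        } else {
            ++i; // Any other character becomes a one-character symbol token.
        }
        tokens.push_back(Token{kind, std::string(source.substr(start, i - start)), start});
    }
    return tokens;
}
```

With this sketch, calling tokenize("int x = 42;") yields five tokens: int, x, =, 42 and ;. Note that int comes out as a plain identifier here, since the sketch has no notion of keywords. Storing the offset with each token is one simple way for later stages to point error messages back at the source.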

Compiling and running the program

In order to compile the program, you need the following:

  • CMake
  • C++20 (or newer) capable compiler
  • LLVM 21.1.0 or newer

And in order to run the compiler once it is built, you also need:

  • LLVM 21.1.0 or newer (as it is dynamically linked)
  • The whereis utility in $PATH
  • The ld utility in $PATH

Then, to compile the program you run:

cmake -Bbuild
make -C build

and to run the program, simply run ./build/llvm_c_compiler. This will read a file called test.c from $PWD and produce two files: the object file test.o and an executable called test, compiled from the original test.c.