From 763c7981a481dd367526fffde47617c808f1d1c5 Mon Sep 17 00:00:00 2001
From: Matthew Flatt
Date: Thu, 9 Jul 2020 13:32:10 -0600
Subject: [PATCH] add a guide implementation

This first cut is heavy on describing how backends work, because
that's fresh in mind and relevant to ongoing effort.

original commit: 964ce95dc910e6c1825b341bf05492af5903cc82
---
 IMPLEMENTATION | 812 +++++++++++++++++++++++++++++++++++++++++++++++++
 s/cmacros.ss   |  91 +++++-
 2 files changed, 897 insertions(+), 6 deletions(-)
 create mode 100644 IMPLEMENTATION

diff --git a/IMPLEMENTATION b/IMPLEMENTATION
new file mode 100644
index 0000000000..e42e7f9288
--- /dev/null
+++ b/IMPLEMENTATION
@@ -0,0 +1,812 @@
+Getting Started
+---------------
+
+Most of the Chez Scheme implementation is in the "s" directory. The
+C-implemented kernel is in the "c" directory.
+
+Some key files in "s":
+
+ * "cmacros.ss": object layouts and other global constants
+
+ * "syntax.ss": the macro expander
+
+ * "cpnanopass.ss": the main compiler
+
+ * "cp0.ss", "cptypes.ss", "cpletrec.ss", etc.: source-to-source
+   passes that apply before the main compiler
+
+ * "x86_64.ss", "arm64.ss", etc.: backends that are used by
+   "cpnanopass.ss"
+
+ * "ta6os.def", "tarm64le.def", etc.: one per OS-architecture
+   combination; each provides platform-specific constants that feed
+   into "cmacros.ss" and selects the backend used by "cpnanopass.ss"
+
+Scheme Objects
+--------------
+
+A Scheme object is represented at run time by a pointer. The low bits
+of the pointer indicate the general type of the object, such as "pair"
+or "closure". The memory referenced by the pointer may have an
+additional tag word to further refine the pointer-tag type.
+
+See also:
+
+  Don't Stop the BiBOP: Flexible and Efficient Storage Management for
+  Dynamically Typed Languages.
+  R. Kent Dybvig, David Eby, and Carl Bruggeman.
+  Indiana University TR #400, 1994.
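As a rough sketch of the tagging scheme described in this section, the
following Scheme code models addresses as plain exact integers. The two
tag values are the ones this section quotes from the constants file; the
helper names (`pointer-tag`, `content-address`, and so on) are made up
for illustration and are not part of Chez Scheme, and a 64-bit machine
is assumed.

```scheme
;; Hypothetical helpers; addresses are modeled as exact integers.
(define tag-mask #b111)            ; low three bits hold the primary tag
(define type-pair #b001)           ; tag value quoted in this section
(define type-typed-object #b111)   ; tag value quoted in this section
(define word-size 8)               ; assumption: 64-bit machine

;; Extract the primary tag from a tagged pointer.
(define (pointer-tag p) (bitwise-and p tag-mask))

;; Round up to the next word boundary to find where the content starts.
;; Only meaningful for pointers with a non-zero tag.
(define (content-address p)
  (+ p (- word-size (pointer-tag p))))

;; For a pair, the car is the first word and the cdr is the second.
(define (pair-car-address p) (content-address p))
(define (pair-cdr-address p) (+ (content-address p) word-size))
```

With these definitions, a pair pointer such as #x1009 has its `car` at
#x1010 (add 7) and its `cdr` at #x1018 (add 15), and a typed-object
pointer such as #x200F has its type word at #x2010 (add 1).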
+
+For example, if "cmacros.ss" says
+
+  (define-constant type-pair #b001)
+
+then that means an address with only the lowest bit set among the low
+three bits refers to a pair. To get the address where the pair content
+is stored, round *up* to the nearest word. So, on a 64-bit machine,
+add 7 to get to the `car` and add 15 to get to the `cdr`. Since
+allocation on a 64-bit machine is 16-byte aligned, the hexadecimal
+form of every pair pointer will end in "9".
+
+The `type-typed-object` type,
+
+  (define-constant type-typed-object #b111)
+
+refers to an object whose first word indicates its type. In the case
+of a Scheme record, that first word will be a record-type descriptor
+--- that is, a pointer to a record type, which is itself represented
+as a record. The base record type, `#!base-rtd`, has itself as its
+record type. Since the type bits are all ones, on a 64-bit machine,
+every object tagged with an additional type word will end in "F" in
+hexadecimal, and adding 1 to the pointer produces the address
+containing the record content (which starts with the record type, so
+add 9 instead to get to the first field in the record).
+
+As another example, a vector is represented as a `type-typed-object`
+pointer where the first word is a fixnum. That is, a fixnum used as a
+type word indicates a vector. The fixnum value is the vector's length
+in words/objects, but shifted up by 1 bit, and then the low bit is set
+to 1 for an immutable vector.
+
+Most kinds of Scheme values are represented as records, so the layout
+is defined by `define-record-type` and similar. For the primitive
+object types that are not records (and even a few that are), the
+layouts are defined in "cmacros.ss".
For example, an `exactnum` (i.e., a complex
+number with exact real and imaginary components) is defined as
+
+  (define-primitive-structure-disps exactnum type-typed-object
+    ([iptr type]
+     [ptr real]
+     [ptr imag]))
+
+The `type-typed-object` in the first line indicates that an exactnum
+is represented by a pointer that is tagged with `type-typed-object`,
+and so we should expect the first field to be a type word. That's why
+the first field above is `type`, and it turns out that it will always
+contain the value `type-exactnum`. The `iptr` type for `type` means
+"a pointer-sized signed integer". The `ptr` type for `real` and `imag`
+means "pointer" or "Scheme object".
+
+Functions and Calls
+-------------------
+
+Scheme code does not use the C stack, except to the degree that it
+interacts with C functions. Instead, the Scheme continuation is a
+separate, heap-allocated, linked list of stack segments. Locally, you
+can just view the continuation as a stack and assume that overflow
+and continuation operations are handled as needed at the boundaries.
+
+See also:
+
+  Representing Control in the Presence of First-Class Continuations.
+  Robert Hieb, R. Kent Dybvig, and Carl Bruggeman.
+  Programming Language Design and Implementation, 1990.
+
+  Compiler and Runtime Support for Continuation Marks.
+  Matthew Flatt and R. Kent Dybvig.
+  Programming Language Design and Implementation, 2020.
+
+To the degree that the runtime system needs global state, that state
+is in the thread context (so, it's thread-local), which we'll
+abbreviate as "TC". Some machine register is designated as the `%tc`
+register, and it's initialized on entry to Scheme code. For the
+definition of TC, see `(define-primitive-structure-disps tc ...)` in
+"cmacros.ss".
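The `define-primitive-structure-disps` form mentioned here is the same
form used for `exactnum` under "Scheme Objects". As a sketch of how such
a declaration relates to displacements from a tagged pointer, the code
below assumes that every field occupies one 8-byte word and that the
content starts at the next word boundary after the tagged pointer. The
`field-displacements` helper is hypothetical; it only mirrors the
arithmetic described in the text and is not the macro's implementation.

```scheme
;; Hypothetical helper: compute byte displacements, relative to a tagged
;; pointer, for a list of field names that each occupy one word.
(define (field-displacements tag word-size fields)
  (let loop ([fields fields]
             [disp (- word-size tag)]   ; distance to the first word
             [acc '()])
    (if (null? fields)
        (reverse acc)
        (loop (cdr fields)
              (+ disp word-size)
              (cons (cons (car fields) disp) acc)))))

;; exactnum: tagged with type-typed-object (#b111), three one-word fields
(define exactnum-disps (field-displacements #b111 8 '(type real imag)))

;; pair: tagged with type-pair (#b001), two one-word fields
(define pair-disps (field-displacements #b001 8 '(car cdr)))
```

Here `exactnum-disps` is `((type . 1) (real . 9) (imag . 17))` and
`pair-disps` is `((car . 7) (cdr . 15))`, which agrees with the "add 1",
"add 9", "add 7", and "add 15" offsets given under "Scheme Objects".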
+
+The first several fields of TC are virtual registers that may be
+assigned to machine registers, in which case the TC and registers are
+synced on entry and exit from Scheme code, including when calling
+kernel functionality from Scheme code. In particular, the SFP (Scheme
+frame pointer) virtual register must be assigned to a real register,
+because it's the Scheme stack pointer. The TC and SFP registers are
+the only two that absolutely must be registers, but AP (allocation
+pointer) and TRAP registers are also good candidates on architectures
+where plenty of registers are available.
+
+The Scheme stack grows up, and SFP points to the beginning (i.e., the
+low address) of the current stack frame. The first word of a stack
+frame is the return address, so a frame looks like this:
+
+                    ^
+                    |  (higher addresses)
+                  future
+                  frames
+            |------------|
+            |   var N    |
+            |------------|
+            |    ...     |  ....
+            |------------|
+            |   var 1    |  SFP[1]
+            |------------|
+            |  ret addr  |  SFP[0]
+    SFP ->  |------------|
+                 previous
+                  frames
+                    |  (lower addresses)
+                    v
+
+On entry to a Scheme function, a check ensures that the difference
+between SFP and the end of the current stack segment is big enough to
+accommodate the (spilled) variables of the called function, plus
+enough slop to deal with some primitive operations.
+
+A non-tail call moves SFP past all the live variables of the current
+function, installs the return address as a pointer within the current
+function, and then jumps to the called function. Function calls and
+returns do not use machine "call" and "return" instructions;
+everything is just a "jump". ("Call" and "return" instructions are
+used for C interactions.) It's the caller's responsibility to reset
+SFP back on return, since the caller knows how much it moved SFP
+before calling.
+
+The compiler can use a register for the return address instead of
+immediately installing it in SFP[0] on a call.
That mode is triggered
+by giving one of the registers the name `%ret` (as described in
+"Machine Registers" below). Currently, however, the called Scheme
+function will immediately copy the register into SFP[0], and it will
+always return by jumping to SFP[0]. So, until the compiler improves to
+deal with leaf functions differently, using a return register can help
+only with hand-coded leaf functions that don't immediately move the
+return register into SFP[0].
+
+There are two ways that control transitions from C to Scheme: an
+initial call through `S_generic_invoke` (see "scheme.c") or via a
+foreign callable. Both of those go through `S_call_help` (see
+"schlib.c"). The `S_generic_invoke` function calls `S_call_help`
+directly. A foreign callable is represented by generated code that
+converts arguments and then calls `S_call_help` to run the Scheme
+procedure that is wrapped by the callable.
+
+The `S_call_help` function calls the hand-coded `invoke` code (see
+"cpnanopass.ss"). The `invoke` code sets things up for the Scheme
+world and jumps to the target Scheme function. When control returns
+from the called Scheme function back to `invoke`, `invoke` finishes
+not with a C return, but by calling `S_return` (see "schlib.c"), which
+gets back to `S_call_help` through a longjmp. The indirect return
+through longjmp helps the Scheme stack and C stack be independent,
+which is part of how Scheme continuations interact with foreign
+functions.
+
+For a non-tail call in Scheme, the return address is not right after
+the jump instruction for the call. Instead, the return address is a
+little later, and there's some data just before that return address
+that describes the calling function's stack frame. The GC needs that
+information, for example, to know which part of the current Scheme
+stack is populated by live variables. The data is represented by
+either the `rp-header` or `rp-compact-header` (see "cmacros.ss") shape.
+So, when you disassemble code generated by the Chez Scheme compiler,
+you may see garbage instructions mingled with the well-formed
+instructions, but the garbage will always be jumped over.
+
+Compilation Pipeline
+--------------------
+
+Compilation
+
+ * starts with an S-expression (possibly with annotations for source
+   locations),
+
+ * converts it to a syntax object (see "syntax.ss"),
+
+ * expands macros (see "syntax.ss") and produces an `Lsrc`
+   representation in terms of core forms (see `Lsrc` in
+   "base-lang.ss"),
+
+ * performs front-end optimizations on that representation (see
+   "cp0.ss", "cptypes.ss", etc.),
+
+ * and then compiles to machine code (see "cpnanopass.ss"), which
+   involves many individual passes that convert through many different
+   intermediate forms (see "np-languages.ss").
+
+See also:
+
+  Nanopass compiler infrastructure.
+  Dipanwita Sarkar.
+  Indiana University PhD dissertation, 2008
+
+  A Nanopass Framework for Commercial Compiler Development.
+  Andrew W. Keep.
+  Indiana University PhD dissertation, 2013
+
+Note that the core macro expander always converts its input to the
+`Lsrc` intermediate form. That intermediate form can be converted back
+to an S-expression (see "uncprep.ss", whose name you should parse as
+"undo-compilerpass-representation").
+
+In the initial intermediate form, `Lsrc`, all primitive operations are
+represented as calls to functions. In later passes in "cpnanopass.ss",
+some primitive operations get inlined into a combination of core
+forms, some of which are `inline` forms. The `inline` forms eventually
+get delivered to a backend for instruction selection. For example, a
+use of safe `fx+` is inlined as argument checks that guard an `(inline
++ ...)`, and the `(inline + ...)` eventually becomes a machine-level
+addition instruction.
+
+Machine Registers
+-----------------
+
+Each backend file, such as "x86_64.ss" or "arm64.ss", starts with a
+description of the machine's registers.
It has three parts in
+`define-registers`:
+
+(define-registers
+  (reserved
+    <reg>
+    ...)
+  (allocable
+    <reg>
+    ...)
+  (machine-dependent
+    <reg>
+    ...))
+
+Each <reg> has the form
+
+  [<name> ... <preserved?> <value> <type>]
+
+ * The <name>s in one <reg> will all refer to the same register, and
+   the first <name> is used as the canonical name. By convention, each
+   <name> starts with `%`. The compiler gives specific meaning to a
+   few names listed below, and a backend can use any names otherwise.
+
+ * The <preserved?> information on preserved (i.e., callee-saved)
+   registers helps the compiler save registers as needed before some
+   C interactions.
+
+ * The <value> is for the private use of the backend. Typically,
+   it corresponds to the register's representation within machine
+   instructions.
+
+ * The <type> is either 'uptr or 'fp, indicating whether the register
+   holds a pointer/integer value (i.e., an unsigned integer that is
+   the same size as a pointer) or a floating-point value. For
+   `allocable` registers, the different types of registers represent
+   different allocation pools.
+
+The `reserved` section describes registers that the compiler needs and
+that will be used only for a designated purpose. The registers will
+never be allocated to Scheme variables in a compiled function. The
+`reserved` section must start with `%tc` and `%sfp`, and it must list
+only registers with a recognized name as the canonical name.
+
+The `machine-dependent` section describes additional registers that
+also will not be allocated. They are also not saved automatically for
+C interactions.
+
+The `allocable` section describes registers that may be mapped to
+specific purposes by using a recognized canonical name, but generally
+these registers are allocated as needed to hold Scheme variables and
+temporaries (including registers with recognized names in situations
+where the recognized purpose is not needed). Registers in this
+category are automatically saved as needed for C interactions.
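To make the shape of a register entry concrete, the sketch below models
a few entries as quoted lists and resolves any alias to its canonical
(first) name. The entries, including the machine register names and
numeric values, are invented for illustration and are not copied from a
real backend; `entry-names`, `entry-type`, and `canonical-name` are
hypothetical helpers, not part of the compiler.

```scheme
;; Hypothetical entries: names..., preserved flag, backend value, type.
(define example-regs
  '((%tc  %r14 #t 14 uptr)
    (%sfp %r13 #t 13 uptr)
    (%ac0 %rbp #t  5 uptr)
    (%fp1      #f  1 fp)))

;; All but the last three elements of an entry are names.
(define (entry-names entry)
  (let loop ([e entry] [n (- (length entry) 3)])
    (if (= n 0)
        '()
        (cons (car e) (loop (cdr e) (- n 1))))))

;; The type is the last element of an entry: uptr or fp.
(define (entry-type entry)
  (list-ref entry (- (length entry) 1)))

;; Map any alias to the canonical (first) name, or #f if unknown.
(define (canonical-name name regs)
  (cond
    [(null? regs) #f]
    [(memq name (entry-names (car regs))) (car (car regs))]
    [else (canonical-name name (cdr regs))]))
```

For example, `(canonical-name '%r13 example-regs)` produces `%sfp`, and
`(entry-type (car example-regs))` produces `uptr`.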
+
+The main recognized register names, roughly in order of usefulness as
+real machine registers:
+
+  %tc   - the first reserved register, must be mapped as reserved
+  %sfp  - the second reserved register, must be mapped as reserved
+  %ap   - allocation pointer (for fast bump allocation)
+  %trap - counter for when to check signals, including GC signal
+
+  %eap  - end of bump-allocatable region
+  %esp  - end of current stack segment
+
+  %cp   - used for a procedure about to be called
+  %ac0  - used for argument count and call results
+
+  %ac1  - various scratch and communication purposes
+  %xp   - ditto
+  %yp   - ditto
+
+Each of the registers maps to a slot in the TC, so they are sometimes
+used to communicate between compiled code and the C-implemented
+kernel. For example, `S_call_help` expects the function to be called
+in AC1 with the argument count in AC0 (as usual).
+
+A few more names are recognized to direct the compiler in different
+ways:
+
+  %ret - use a return register instead of just SFP[0]
+
+  %reify1, %reify2 - a kind of manual allocation of registers for
+                     certain hand-coded routines, which otherwise could
+                     run out of registers to use
+
+Variables and Register Allocation
+---------------------------------
+
+A variable in Scheme code can be allocated either to a register or to
+a location in the stack frame, and the same goes for temporaries that
+are needed to evaluate subexpressions. Naturally, variables and
+temporaries with non-overlapping extents can be mapped to the same
+register or frame location. Currently, only variables with the same
+type, integer/pointer versus floating-point, can be allocated to the
+same frame location.
+
+An early pass in the compiler converts mutable variables to
+pair-valued immutable variables, but assignment to variables is still
+allowed within the compiler's representation. (The early conversion of
+mutable variables ensures that mutation is properly shared for, say,
+variables in captured continuations.)
That is, even though variables
+and temporaries are typically assigned only once, the compiler's
+intermediate representation is not a single-assignment form like
+SSA.
+
+Each variable or temporary will be allocated to one spot for its
+whole lifetime. So, from the register-allocation perspective, it's
+better to use
+
+  (set! var1 ...)
+  ... var1 ...
+  ... code that doesn't use var1 ...
+  (set! var2 ...)
+  ... var2 ...
+
+than to reuse var1 like
+
+  (set! var1 ...)
+  ... var1 ...
+  ... code that doesn't use var1 ...
+  (set! var1 ...)
+  ... var1 ...
+
+Intermediate code in later passes of the compiler can also refer to
+registers directly, and those uses are taken into account by the
+register allocator.
+
+Overall, the allocator sees several kinds of "variables":
+
+ * real registers;
+
+ * Scheme variables and temporaries as represented by `uvar`s, each of
+   which is eventually allocated to a real register or to a frame
+   location;
+
+ * unspillable variables, each of which must be allocated to a real
+   register; these are introduced by a backend during the
+   instruction-selection pass, where an instruction may require a
+   register argument; and
+
+ * pre-colored unspillable variables, each of which must be allocated
+   to a specific real register; these are introduced by a backend
+   where an instruction may require an argument in a specific
+   register.
+
+The difference between a pre-colored unspillable and just using the
+real register is that you declare intent to the register allocator,
+and it can sometimes tell you if things go wrong. For example,
+
+  (set! %r1 v1)
+  (set! must-be-r1 v2)
+  ... use %r1 and must-be-r1 ...
+
+has clearly gone wrong. In contrast, the register allocator thinks
+that
+
+  (set! %r1 v1)
+  (set! %r1 v2)
+  ... use %r1, sometimes expecting v1 and sometimes v2 ...
+
+looks fine, and it may optimize away the first assignment. [Note:
+Optimized-away assignments are one of the most confusing potential
+results of register-use mistakes.]
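The earlier advice in this section about using a fresh `var2` instead of
reusing `var1` can be illustrated with a deliberately simplified model.
The sketch below treats code as a straight-line list of instructions and
computes, for one variable, the span from its first mention to its last
mention. The instruction shapes and the `extent` helper are assumptions
made for this example; the real allocator computes liveness over the
compiler's intermediate language, including control flow.

```scheme
;; Each instruction is modeled as (set! var), (use var), or (other).
;; Returns (first-index . last-index) for var, or #f if never mentioned.
(define (extent var instrs)
  (let loop ([is instrs] [i 0] [lo #f] [hi #f])
    (cond
      [(null? is) (and lo (cons lo hi))]
      [(and (pair? (cdr (car is))) (eq? (cadr (car is)) var))
       (loop (cdr is) (+ i 1) (or lo i) i)]
      [else (loop (cdr is) (+ i 1) lo hi)])))

;; Two short-lived variables...
(define separate
  '((set! var1) (use var1) (other) (set! var2) (use var2)))

;; ...versus one variable that is reused.
(define reused
  '((set! var1) (use var1) (other) (set! var1) (use var1)))
```

In this model, `(extent 'var1 separate)` is `(0 . 1)` and
`(extent 'var2 separate)` is `(3 . 4)`, so nothing is tied up while the
`(other)` instruction runs. By contrast, `(extent 'var1 reused)` is
`(0 . 4)`, so the single spot chosen for `var1` stays occupied across
the code that does not use it.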
+
+At the point where the register allocator runs, a Scheme program has
+been simplified to a sequence of assignment forms and expression
+forms, where the latter are either value-producing and sit on the
+right-hand side of an assignment or they are effectful and sit by
+themselves. The register allocator sees the first assignment to a
+variable/register as the beginning of its live range and the last
+reference as the end of its live range. In some cases, an instruction
+is written with "dummy" arguments just to expose the fact that it
+needs those arguments to stay live; for example, a jump instruction
+that implements a function-call return conceptually needs to consume
+the result-value registers (because those values need to stay live
+through the jump), even though the machine-level jump instruction
+doesn't refer to the result values. The `kill` dummy instruction can
+be used with `set!` to indicate that a variable is trashed, but the
+`kill` is discarded after register allocation. It's also possible for
+an instruction to produce results in multiple registers. So, besides
+using dummy arguments and `kill`, an instruction form can have an
+`info-kill*-live*` record attached to it, which lists the `kill*`
+variables that the expression effectively assigns and the `live*`
+variables that the expression effectively references. (Note: a `set!`
+form cannot itself have an `info-kill*-live*` record attached to it,
+because the info slot for `set!` must be an `info-live` record that
+records computed live-variable information.)
+
+As a first pass, the register allocator can look at an intermediate
+instruction sequence and determine that there are too many live
+variables, so some of them need to be spilled. The register allocator
+does that before consulting the backend. So, some of the variables in
+the intermediate form will stay as `uvar`s, and some will get
+converted to a frame reference of the form SFP[pos].
When the backend
+is then asked to select an instruction for an operation that consumes
+some variables and delivers a result to some destination variable, it
+may not be able to work with one or more of the arguments or
+destination in SFP[pos] form; in that case, it will create an
+unspillable and assign the SFP[pos] value to the unspillable, then use
+the unspillable in a generated instruction sequence. Of course,
+introducing unspillables may mean that some of the remaining `uvar`s
+no longer fit in registers after all; when that happens, the
+register allocator will discard the tentative instruction selection
+and try again after spilling more `uvar`s (which will then create even
+more unspillables locally, but those will have short lifetimes, so
+register allocation will eventually succeed). Long story short, the
+backend can assume that a `uvar` will be replaced later by a register.
+
+When reading the compiler's implementation, `make-tmp` in most passes
+creates a `uvar` (that may eventually be spilled to a stack-frame
+slot). A `make-tmp` in the instruction-selection pass, however, makes
+an unspillable. In earlier passes of the compiler, new temporaries
+must be bound with a `let` form (i.e., a `let` in the intermediate
+representation) before they can be used; in later passes, a `set!`
+initializes a temporary.
+
+In all but the very earliest passes, an `mref` form represents a
+memory reference. Typically, a memory reference consists of a
+variable and an offset. The general form is two variables and an
+offset, all of which are added to obtain an address, because many
+machines support indexed memory references of that form. The `%zero`
+pseudo-register is used as the second variable in a general `mref`
+when only one variable is needed. A variable or memory reference also
+has a type, 'uptr or 'fp, in the same way as a register.
So, a
+variable of a given type may be allocated to a register of that type,
+or it may be spilled to a frame location and then referenced through
+an `%sfp`-based `mref` using that type. In early passes of the
+compiler, `mref`s can be nested and have computed pieces (such as
+calculating the offset), but a later pass will introduce temporaries
+to flatten `mref`s into just variable/register and immediate-integer
+components.
+
+A backend may introduce an unspillable to hold an `mref` value for
+various reasons: because the relevant instruction supports only one
+register plus an offset instead of two registers, because the offset
+is too big, because the offset does not have a required alignment, and
+so on.
+
+Instruction Selection: Compiler <-> Backend
+-------------------------------------------
+
+For each primitive that the compiler will reference via `inline`,
+there must be a `declare-primitive` in "np-languages.ss". Each
+primitive is either an `effect`, a `value` that must be used on the
+right-hand side of a `set!`, or a `pred` that must be used immediately
+in the test position of an `if` --- where `set!` and `if` here refer
+to forms in the input intermediate language of the
+instruction-selection compiler pass (see `L15c` in "np-languages.ss").
+Most primitives potentially correspond to a single machine
+instruction, but any of them can expand to any number of instructions.
+
+The `declare-primitive` form binds the name formed by adding a `%`
+prefix. So, for example,
+
+  (declare-primitive logand value #t)
+
+binds `%logand`. The `(%inline name ,arg ...)` macro expands to
+`(inline ,null-info ,%name ,arg ...)`, so that's why you don't
+usually see the `%` written out.
+
+The backend implementation of a primitive is a function that takes as
+many arguments as the `inline` form, plus an additional initial
+argument for the destination in the case of a `value` primitive on the
+right-hand side of a `set!`.
The result of the primitive function is a
+list of instructions, where an instruction is either a `set!` or `asm`
+form in the output intermediate representation of the
+instruction-selection pass (see `L15d` in "np-languages.ss"). The
+`asm` form in the output language has a function that represents the
+instruction; that function again takes the arguments of the `asm`
+form, plus an extra leading argument for the destination if it's on
+the right-hand side of a `set!` (plus an argument before that for the
+machine-code sequence following the instruction, and it returns an
+extended machine-code sequence; that is, a machine-code sequence is
+built end-to-start).
+
+An instruction procedure typically has a name prefixed with `asm-`.
+So, for example, the `%logand` primitive's implementation in the
+backend may produce a result that includes a reference to an
+`asm-logand` instruction procedure. Or maybe the machine instruction
+for logical "and" has a variant that sets condition codes and one that
+doesn't, and they're both useful, so `asm-logand` actually takes a
+curried boolean to pick; in that case, `%logand` returns an
+instruction with `(asm-logand #f)`, which produces a function that
+takes the destination and `asm` arguments. Whether an argument to
+`asm-logand` is suitable for currying or inclusion as an `asm`
+argument depends on whether it makes sense in the `asm` grammar and
+whether it needs to be exposed for register allocation.
+
+The compiler may refer to some instructions directly. Of particular
+importance are `asm-move` and `asm-fpmove`, which are eventually used
+for `set!` forms after the instruction-selection pass. That is, the
+output of instruction selection still uses `set!`, and then those are
+converted to memory and register-moving instructions later. The
+instruction-selection pass must ensure that any surviving `set!`s are
+simple enough, though, to map to instructions without further register
+allocation.
In other words, the backend instruction selector should
+only return `set!`s as instructions when they are simple enough, and
+it should generate code to simplify the ones that don't start out
+simple enough. To give the backend control over `set!`s in the *input*
+of instruction selection, those are sent to the backend as `%move` and
+`%fpmove` primitives (which may simply turn back into `set!`s using
+the output language, or they may get simplified). When the compiler
+generates additional `set!`s after instruction selection, it generates
+only constrained forms, where target or source `mref`s have a single
+register and a small, aligned offset.
+
+To organize all of this work, a backend implementation like
+"x86_64.ss" or "arm64.ss" must be organized into three parts, which
+are implemented by three S-expressions:
+
+ * `define-registers`
+
+ * a module that implements primitives (that convert to instructions),
+   installing them with `primitive-handler-set!`
+
+ * a module that implements instructions (that convert to machine
+   code), a.k.a. the "assembler", defining the instructions as
+   functions
+
+That last module must also implement a few things that apply earlier
+than assembling (or even instruction selection), notably including
+`asm-foreign-call` and `asm-foreign-callable`. For more on those two,
+see "Foreign Function ABI" below.
+
+To summarize, the interface between the compiler and backend is:
+
+  primitive : L15c.Triv ... -> (listof L15d.Effect)
+
+  instruction : (listof code) L16.Triv ... -> (listof code)
+
+A `code` is mostly bytes to be emitted, but it also contains
+relocation entries and human-readable forms that are printed when
+assembly printing is enabled. The `aop-cons*` helper macro (in
+"cpnanopass.ss") is like `cons*`, but it skips its first argument if
+human-readable forms aren't being kept.
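The end-to-start construction of a machine-code list, and the idea
behind `aop-cons*`, can be sketched with ordinary procedures. Everything
below is hypothetical: `aop-cons*-sketch` is a procedure with an
explicit flag rather than the real macro, and the `(asm ...)` and
`(bytes ...)` items are invented stand-ins for human-readable forms and
encoded instructions.

```scheme
;; Hypothetical procedure version of the idea behind `aop-cons*`: like
;; `cons*`, but drop the human-readable form unless it is being kept.
(define (aop-cons*-sketch keep-asm? asm-form . rest)
  (if keep-asm?
      (apply cons* asm-form rest)
      (apply cons* rest)))

;; Two made-up instruction procedures. Each receives the code that
;; follows it (code*) and returns an extended code list.
(define (emit-add keep-asm? code*)
  (aop-cons*-sketch keep-asm? '(asm "add") '(bytes #x01) code*))

(define (emit-ret keep-asm? code*)
  (aop-cons*-sketch keep-asm? '(asm "ret") '(bytes #xc3) code*))

;; The last instruction is generated first; earlier ones are consed on.
(define (assemble keep-asm?)
  (emit-add keep-asm? (emit-ret keep-asm? '())))
```

With the flag off, `(assemble #f)` produces `((bytes 1) (bytes 195))`.
With the flag on, `(assemble #t)` produces
`((asm "add") (bytes 1) (asm "ret") (bytes 195))`, so the readable forms
are interleaved with the code they describe.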
+
+Instruction Selection: Backend Structure
+----------------------------------------
+
+To further organize the work of instruction selection and assembly,
+all of the current backends use a particular internal structure:
+
+ * primitives are defined through a `define-instruction` form that
+   helps with pattern matching and automatic conversion/simplification
+   of arguments; and
+
+ * instructions are defined as functions that use an `emit` form,
+   which in turn dispatches to functions that represent actual
+   machine-level operations, where the functions for machine-level
+   operations typically have names ending in `-op`.
+
+Consider the "arm64.ss" definition of `%logand`, which should accept a
+destination (here called "z") and two arguments:
+
+  (define-instruction value (logand)
+    [(op (z ur) (x ur) (y funkymask))
+     `(set! ,(make-live-info) ,z (asm ,info ,(asm-logand #f) ,x ,y))]
+    [(op (z ur) (x funkymask) (y ur))
+     `(set! ,(make-live-info) ,z (asm ,info ,(asm-logand #f) ,y ,x))]
+    [(op (z ur) (x ur) (y ur))
+     `(set! ,(make-live-info) ,z (asm ,info ,(asm-logand #f) ,x ,y))])
+
+The A64 instruction set supports a logical "and" on either two
+registers or a register and an immediate, but the immediate value has
+to be representable with a funky encoding. The pattern forms above
+require that the destination is always a register/variable, and either
+of the arguments can be a literal that fits into the funky encoding or
+a register/variable. The `define-instruction` macro is itself
+implemented in "arm64.ss", so it can support specialized patterns like
+`funkymask`.
+
+If a call to this `%logand` function is triggered by a form
+
+  `(set! ,info (mref ,var1 ,%zero 8) (inline ,info ,%logand ,var2 ,7))
+
+then the code generated by `define-instruction` will notice that the
+first argument (the destination) is not a register/variable, while 7
+does encode as a mask, so it will arrange to produce the same value as
+
+  (let ([u (make-tmp 'u)])
+    (list
+      (%logand u var2 7)
+      `(set!
,(make-live-info) (mref ,var1 ,%zero 8) ,u)))
+
+Then, the first case of `%logand` will match, and the result will be
+the same as
+
+  (let ([u (make-tmp 'u)])
+    (list
+      `(set! ,(make-live-info) ,u (asm ,info ,(asm-logand #f) ,var2 ,7))
+      `(set! ,(make-live-info) (mref ,var1 ,%zero 8) ,u)))
+
+If the offset 8 were instead a very large number, then auto-conversion
+would have to generate an `add` into a second temporary variable.
+Otherwise, `asm-move` would not be able to deal with the generated
+`set!` to move `u` into the destination. The implementation of
+`define-instruction` uses a `mem->mem` helper function to simplify
+`mref`s. In the "arm32.ss" backend, there's an additional `fpmem`
+pattern and `fpmem->fpmem` helper, because the constraints on memory
+references for floating-point operations are different from the
+constraints on memory references to load an integer/pointer.
+
+Note that `%logand` generates a use of the same `(asm-logand #f)`
+instruction for the register--register and the register--immediate
+cases. A more explicit distinction could be made in the output of
+instruction selection, but delaying the choice is analogous to how
+assembly languages often use the same mnemonic for related
+instructions. The `asm-move` and `asm-fpmove` instructions must
+accommodate register--memory, memory--register, and
+register--register cases, because `set!` forms after instruction
+selection can have those variants.
+
+The `asm-logand` instruction for "arm64.ss" is implemented as
+
+  (lambda (set-cc?)
+    (lambda (code* dest src0 src1)
+      (Trivit (dest src0 src1)
+        (record-case src1
+          [(imm) (n) (emit andi set-cc? dest src0 n code*)]
+          [else (emit and set-cc? dest src0 src1 code*)]))))
+
+The `set-cc?` argument corresponds to the `#f` in `(asm-logand #f)`.
+The inner lambda represents the instruction --- that is, it's the
+function in an `asm` form. The function takes `code*` first, which is
+a list of machine codes for all instructions after the `asm-logand`.
+The `dest` argument corresponds to the result register, and `src0` and
+`src1` are the two arguments.
+
+The `Trivit` form is a bridge between intermediate languages. It takes
+variables that are bound already and it rebinds them for the body of
+the `Trivit` form. Each rebinding translates the argument from an
+`L16` `Triv` record to a list that starts with 'reg, 'disp, 'index,
+'literal, or 'literal@. (Beware of missing this step, and beware of
+backends that sometimes intentionally skip this step because the
+original is known to be, say, a register.)
+
+The `emit` form is defined in the "arm64.ss" backend and others, and
+it's just a kind of function call that cooperates with `define-op`
+declarations. For example, `(define-op andi logical-op arg1 ...)`
+binds `andi-op`, and `(emit andi arg2 ...)` turns into `(logical-op
+'andi arg1 ... arg2 ...)`; that is, `logical-op` first receives the
+symbol 'andi, then arguments listed at `define-op`, then arguments
+listed at `emit`. The last argument is conventionally `code*`, which
+is the code list to be extended with new code at its beginning
+(because the machine-code list is built end to start). The bounce from
+`andi-op` to `logical-op` is because many instructions follow a
+similar encoding, such as different bitwise-logical operations like
+`and` and `or`. Meanwhile, `logical-op` uses an `emit-code` form,
+which is also in "arm64.ss" and other backends, that calls `aop-cons*`
+with a suitable human-readable addition.
+
+All of that could be done with just plain functions, but the macros
+help with boilerplate and arrange some helpful compile-time checking.
+
+Foreign Function ABI
+--------------------
+
+Support for foreign procedures and callables in Chez Scheme boils down
+to foreign calls and callable stubs for the backend.
A backend's +`asm-foreign-call` and `asm-foreign-callable` functions receive an +`info-foreign` record, which describes the argument and result types +in relatively primitive forms: + + * double + * float + * [signed] integer of {8,16,32,64} bits + * generic pointer or scheme-object (to treat as a generic pointer) + * a "&" form, which is a pointer on the Scheme side and by-value on + the C side, and can be a struct/union; layout info is reported + by `$ftd-...` helpers + +If the result type is a "&" type, then the function expects an extra +first argument on the Scheme side. That extra argument is reflected by +an extra pointer type at the start of the argument list, but the "&" +type is also left for the result type as an indication about that +first argument. In other words, the result type is effectively +duplicated in the result (matching the C view) and an argument +(matching the Scheme view) --- so, overall, the given type matches +neither the C nor Scheme view, but either view can be reconstructed. + +The compiler creates wrappers to take care of further conversion +to/from these primitive shapes. + +The `asm-foreign-call` function returns 5 values: + + * allocate : -> L13.Effect + + Any needed setup, such as allocating C stack space for arguments. + + * c-args : (listof (uvar/reg -> L13.Effect)) + + Generate code to convert each argument. The generated code will be + in reverse order, with the first argument last, because that tends + to improve register allocation. + + If the result type is "&", then `c-args` must include a function to + accept the pointer that receives the function result (i.e., the + length of `c-args` should match the length of the argument-type list + in the given `info-foreign`). The pointer may need to be stashed + somewhere by the generated code for use after the function returns.
+ + The use of the src variable for an argument depends on its type: + + - double or float: an 'fp-typed variable + - integer or pointer: a 'uptr-typed variable that has the integer + - "&": a 'uptr-typed variable that has a pointer to the argument + + * c-call : uvar/reg boolean -> L13.Effect + + Generate code to call the C function whose address is in the given + register. The boolean is #t if the call can assume that the C + function is not a varargs function on platforms where varargs + support is the default. + + * c-result : uvar/reg -> L13.Effect + + Similar to the conversions in `c-args`, but for the result, so the + given argument is a destination variable. This function will not be + used if the foreign call's result type is void. If the result is a + floating-point value, the provided destination variable has type + 'fp. + + * deallocate : -> L13.Effect + + Any needed teardown, such as deallocating C stack space. + +The `asm-foreign-callable` function returns 4 values: + + * c-init : -> L13.Effect + + Anything that needs to be done just before transitioning into + Scheme, such as saving preserved registers that can be used within + the callable stub. + + * c-args : (listof (uvar/reg -> L13.Effect)) + + Similar to the `asm-foreign-call` result case, but each function + should fill a destination variable from platform-specific argument + registers and stack locations. + + If the result type is "&", then `c-args` must include a function to + produce a pointer that receives the function result. Space for this + pointer may need to be allocated (probably on the C stack), + possibly in a way that can be found on return.
+ + The use of the destination variable is different than for the + `asm-foreign-call` in the case of floating-point arguments: + + - double or float: pointer to a flonum to be filled with the value + - integer or pointer: a 'uptr-typed variable to receive the value + - "&": a 'uptr-typed variable to receive the pointer + + * c-result : (uvar/reg -> L13.Effect) or (-> L13.Effect) + + Similar to the `asm-foreign-call` argument cases, but for a + floating-point result, the given result register holds a pointer to a + flonum. Also, if the function result is a "&" or void type, then + `c-result` takes no argument (because the destination pointer was + already produced or there's no result). + + * c-return : (-> L13.Effect) + + Generate the code for a C return, including any teardown needed to + balance `c-init`. diff --git a/s/cmacros.ss b/s/cmacros.ss index 0561611243..ef99a6e9e1 100644 --- a/s/cmacros.ss +++ b/s/cmacros.ss @@ -12,6 +12,9 @@ ;;; See the License for the specific language governing permissions and ;;; limitations under the License. +;; --------------------------------------------------------------------- +;; Initial helper macros and functions: + (define-syntax disable-unbound-warning (syntax-rules () ((_ name ...) @@ -190,6 +193,13 @@ (lambda (x) (syntax-error x "misplaced aux keyword"))) +;; --------------------------------------------------------------------- +;; Libspec representation: + +;; A libspec is a description of a runtime function to be referenced +;; by machine code, where the linker will find the library function and +;; update code to reference it as code is loaded/linked + ;; layout of our flags field: ;; bit 0: needs head space?
;; bit 1 - 9: upper 9 bits of index (lower bit is the needs head space index @@ -290,6 +300,9 @@ (fxlogand (libspec-flags libspec) (fxlognot (fxsll 1 (constant libspec-does-not-expect-headroom-index))))))])) +;; --------------------------------------------------------------------- +;; More helpers: + (define-syntax return-values (syntax-rules () ((_ args ...) (values args ...)))) @@ -328,6 +341,13 @@ [(_ foo e1 e2) e1] ... [(_ bar e1 e2) e2]))))]))) +(define-syntax log2 + (syntax-rules () + [(_ n) (integer-length (- n 1))])) + +;; --------------------------------------------------------------------- +;; Version and machine types: + (define-constant scheme-version #x0905031F) (define-syntax define-machine-types @@ -370,9 +390,8 @@ (define-constant machine-type-name (cdr (assv (constant machine-type) (constant machine-type-alist)))) -(define-syntax log2 - (syntax-rules () - [(_ n) (integer-length (- n 1))])) +;; --------------------------------------------------------------------- +;; Some object-layout constants: ; a string-char is a 32-bit equivalent of a ptr char: identical to a ; ptr char on 32-bit machines and the low-order half of a ptr char on @@ -425,6 +444,9 @@ (define-constant list-bits-mask (- (expt 2 (constant ptr-alignment)) 1)) +;; --------------------------------------------------------------------- +;; Fasl encoding tags: + ;;; fasl codes---see fasl.c for documentation of representation (define-constant fasl-type-header 0) @@ -495,6 +517,12 @@ (bytevector (constant fasl-type-header) 0 0 0 (char->integer #\c) (char->integer #\h) (char->integer #\e) (char->integer #\z))) +;; --------------------------------------------------------------------- +;; Relocation representation + +;; A relocation tells the linker where to update machine code to link +;; in library functions, literal Scheme objects, etc.
+ (define-syntax define-enumerated-constants (lambda (x) (syntax-case x () @@ -543,6 +571,9 @@ (macro-define-structure (reloc type item-offset code-offset long?)) +;; --------------------------------------------------------------------- +;; Some flags to cooperate with the C-implemented kernel: + (define-constant SERROR #x0000) (define-constant STRVNCATE #x0001) ; V for U to avoid msvc errno.h conflict (define-constant SREPLACE #x0002) @@ -621,6 +652,9 @@ (define-constant ERROR_VALUES 7) (define-constant ERROR_MVLET 8) +;; --------------------------------------------------------------------- +;; GC constants + (define-syntax define-alloc-spaces (lambda (x) (syntax-case x (real swept unswept unreal) @@ -714,7 +748,10 @@ (define-constant countof-phantom 28) (define-constant countof-types 29) -;;; type-fixnum is assumed to be all zeros by at least by vector, fxvector, +;; --------------------------------------------------------------------- +;; Tags that are part of the pointer representing an object: + +;;; type-fixnum is assumed to be all zeros by at least vector, fxvector, ;;; and bytevector index checks (define-constant type-fixnum 0) ; #b100/#b000 32-bit, #b000 64-bit (define-constant type-pair #b001) @@ -725,6 +762,9 @@ (define-constant type-immediate #b110) (define-constant type-typed-object #b111) +;; --------------------------------------------------------------------- +;; Immediate values; note that these all end with `type-immediate`: + ;;; note: for type-char, leave at least fixnum-offset zeros at top of ;;; type byte to simplify char->integer conversion (define-constant type-boolean #b00000110) @@ -740,6 +780,10 @@ (define-constant ptr sbwp #b01001110) (define-constant ptr ftype-guardian-rep #b01010110) +;; --------------------------------------------------------------------- +;; Initial type word in an object that is represented by a +;; `type-typed-object` pointer: + ;;; on 32-bit machines, vectors get two primary tag bits, including ;;; one for the
immutable flag, and so do bytevectors, so their maximum ;;; lengths are equal to the most-positive fixnum on 32-bit machines. @@ -781,6 +825,9 @@ (define-constant type-phantom #b01111110) (define-constant type-record #b111) +;; --------------------------------------------------------------------- +;; Bit and byte offsets for different types of objects: + (define-constant code-flag-system #b0000001) (define-constant code-flag-continuation #b0000010) (define-constant code-flag-template #b0000100) @@ -911,6 +958,9 @@ (fxsll (constant code-flag-single-valued) (constant code-flags-offset)))) +;; --------------------------------------------------------------------- +;; Masks and offsets for checking types: + ;; type checks are generally performed by applying the mask to the object ;; then comparing against the type code. a mask equal to ;; (constant byte-constant-mask) implies that the object being @@ -1041,6 +1091,9 @@ (define-constant stencil-vector-mask-bits (fx- (constant ptr-bits) (constant stencil-vector-mask-offset))) +;; --------------------------------------------------------------------- +;; Helpers to define object layouts: + ;;; record-datatype must be defined before we include layout.ss ;;; (maybe should move into that file??) 
;;; We allow Scheme inputs for both signed and unsigned integers to range from @@ -1264,6 +1317,9 @@ (define-constant name-field-disp field-disp) ...))))))]))) +;; --------------------------------------------------------------------- +;; Object layouts: + (define-primitive-structure-disps typed-object type-typed-object ([iptr type])) @@ -1615,6 +1671,9 @@ (with-syntax ([type (datum->syntax #'* (filter-scheme-type 'string-char))]) #''type))) +;; --------------------------------------------------------------------- +;; Flags and structures for the compiler's internal communication: + (define-constant annotation-debug #b0001) (define-constant annotation-profile #b0010) (define-constant annotation-all #b0011) @@ -1907,7 +1966,8 @@ (syntax-rules () ((_ x) (let ((t x)) (and (pair? t) (symbol? (car t))))))) -;;; heap/stack mangement constants +;; --------------------------------------------------------------------- +;; Heap/stack management constants: (define-constant collect-interrupt-index 1) (define-constant timer-interrupt-index 2) @@ -2017,6 +2077,9 @@ (lambda () (mutex-release $tc-mutex) (enable-interrupts)))]) (identifier-syntax critical-section))) +;; --------------------------------------------------------------------- +;; More object-representation flags and offsets: + (define-constant hashtable-default-size 8) (define-constant eq-hashtable-subtype-normal 0) @@ -2046,6 +2109,9 @@ (define-constant time-collector-cpu 5) (define-constant time-collector-real 6) +;; --------------------------------------------------------------------- +;; General helpers for the compiler and runtime implementation: + (define-syntax default-run-cp0 (lambda (x) (syntax-case x () @@ -2417,7 +2483,20 @@ #`(let ([x arg]) (unless (pred x) ($oops who #,(format "~~s is not a ~a" (datum type)) x)))]))) - + +;; --------------------------------------------------------------------- +;; Library entries and C entries + +;; A library entry connects with a libspec to describe a library +;; function
that can be referenced directly by machine code and that +;; will need to be updated by the linker. The C-implemented kernel may +;; also refer to these values. + +;; A C entry is a pointer communicated from the C-implemented kernel +;; to the compiler and runtime system. The linker deals with them in a +;; similar way --- it's just that they refer to C functions and globals +;; instead of Scheme-implemented functions. + (eval-when (load eval) (define-syntax lookup-libspec (lambda (x)