thesubhstack

Node JS Fundamentals


CPU architecture

Every CPU runs on a clock: a signal that ticks billions of times per second, each tick giving the CPU a heartbeat. A quartz crystal supplies a steady reference frequency, which circuitry on the chip multiplies up to the gigahertz range. Each core works through the fetch-decode-execute cycle to the beat of this clock. The program counter (PC) holds the memory address of the next instruction to run.

At fetch, the core reads the instruction stored at that address from RAM (or from cache, if it is already there). At decode, it works out what the instruction means: which operation, on what data. At execute, it does the work. The PC then advances to the next instruction's address, or is set to the target address for a jump or branch, and the cycle repeats.
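A toy CPU written in JavaScript makes the cycle concrete. The instruction set (LOADI, ADD, DEC, JNZ, HALT) and the two registers are invented for this sketch. Each loop iteration also handles one whole instruction, whereas a real core spreads an instruction over several clock cycles. The point is the shape of the loop: fetch at the PC, decode, advance the PC, execute.

```javascript
// A toy CPU. The opcodes, registers and encoding are hypothetical and do not
// correspond to any real instruction set.
function run(program, maxSteps = 1000) {
  const regs = { r0: 0, r1: 0 };
  let pc = 0; // program counter: index of the next instruction in "memory"

  for (let step = 0; step < maxSteps; step++) {
    // Fetch: read the instruction at the address held in the PC.
    const instruction = program[pc];
    // Decode: split it into an operation and its operands.
    const [op, ...args] = instruction;
    // Advance the PC by default; a jump overwrites it during execute.
    pc += 1;
    // Execute: do the work.
    switch (op) {
      case "LOADI": regs[args[0]] = args[1]; break;                    // reg <- constant
      case "ADD": regs[args[0]] = regs[args[1]] + regs[args[2]]; break; // dst <- a + b
      case "DEC": regs[args[0]] -= 1; break;                            // reg <- reg - 1
      case "JNZ": if (regs[args[0]] !== 0) pc = args[1]; break;         // jump if not zero
      case "HALT": return { regs, steps: step + 1 };
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  throw new Error("program did not halt");
}

// Sum 5 + 4 + 3 + 2 + 1 into r0 using a loop.
const program = [
  ["LOADI", "r0", 0], // 0: accumulator = 0
  ["LOADI", "r1", 5], // 1: counter = 5
  ["ADD", "r0", "r0", "r1"], // 2: accumulator += counter
  ["DEC", "r1"], // 3: counter -= 1
  ["JNZ", "r1", 2], // 4: loop back to address 2 while counter != 0
  ["HALT"], // 5
];

console.log(run(program).regs.r0); // 15
```

The JNZ instruction is what makes the program loop. It overwrites the PC instead of letting it advance to the next address.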

The machine code generated from source depends on the CPU's instruction set architecture, and there are two broad families: CISC and RISC.

CISC - Complex Instruction Set Computing

CISC came first, in an era when memory was expensive. Complex instructions meant shorter programs, less memory usage, and hardware that did the heavy lifting. A single CISC instruction could do a lot in one step.

RISC - Reduced Instruction Set Computing

RISC reduced the amount of work each instruction does. Instead of one instruction doing ten things, you have ten instructions each doing one thing.
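The difference can be sketched with two made-up instruction sets. Both programs compute mem[a] = mem[a] + mem[b]. The CISC-style version does it with one memory-to-memory instruction, while the RISC-style (load/store) version needs four simpler ones. The opcodes are hypothetical and belong to no real ISA.

```javascript
// Hypothetical instruction sets for illustration only.

// CISC style: one instruction reads both operands from memory,
// adds them and writes the result back to memory.
function runCisc(program, mem) {
  for (const [op, dst, src] of program) {
    if (op === "ADDM") mem[dst] = mem[dst] + mem[src];
    else throw new Error(`unknown opcode ${op}`);
  }
  return mem;
}

// RISC (load/store) style: only LOAD and STORE touch memory,
// and arithmetic works on registers only.
function runRisc(program, mem) {
  const regs = {};
  for (const [op, x, y, z] of program) {
    switch (op) {
      case "LOAD": regs[x] = mem[y]; break;           // x <- mem[y]
      case "ADD": regs[x] = regs[y] + regs[z]; break; // x <- y + z
      case "STORE": mem[y] = regs[x]; break;          // mem[y] <- x
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return mem;
}

const ciscProgram = [["ADDM", "a", "b"]];
const riscProgram = [
  ["LOAD", "r1", "a"],
  ["LOAD", "r2", "b"],
  ["ADD", "r1", "r1", "r2"],
  ["STORE", "r1", "a"],
];

console.log(runCisc(ciscProgram, { a: 1000, b: 2000 })); // { a: 3000, b: 2000 }
console.log(runRisc(riscProgram, { a: 1000, b: 2000 })); // { a: 3000, b: 2000 }
console.log(ciscProgram.length, riscProgram.length); // 1 4
```

The RISC program is longer, which is exactly the memory cost CISC was designed to avoid. In exchange, each of its instructions does one small, predictable thing.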

This made pipelining and parallelism easier to implement in hardware.

Pipelining works within a single core. Every instruction passes through stages: Fetch, Decode, Execute, Write. Instead of waiting for one instruction to complete fully before starting the next, the core overlaps them. While instruction 1 is executing, instruction 2 is already being decoded and instruction 3 fetched, so ideally no stage sits idle. Complexity is what breaks this. A CISC instruction that occupies the Execute stage for 30 cycles stalls everything behind it.
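A back-of-the-envelope cycle count shows both effects. This sketch assumes a simplified in-order pipeline in which every stage takes one cycle except Execute, whose length is given per instruction. Real pipelines have more stages, operand forwarding and other tricks, so the numbers only illustrate the trend.

```javascript
// Simplified model (an assumption, not a real core): 4 stages
// (Fetch, Decode, Execute, Write). Every stage takes 1 cycle except
// Execute, which takes execCycles[i] cycles for instruction i.
const STAGES = 4;

// Without pipelining, each instruction runs start to finish before the next begins.
function cyclesUnpipelined(execCycles) {
  return execCycles.reduce((sum, e) => sum + (STAGES - 1) + e, 0);
}

// With an in-order pipeline, a new instruction enters every cycle, but an
// instruction that stays in Execute for e cycles adds e - 1 stall cycles.
function cyclesPipelined(execCycles) {
  const n = execCycles.length;
  if (n === 0) return 0;
  const stalls = execCycles.reduce((sum, e) => sum + (e - 1), 0);
  return STAGES + (n - 1) + stalls;
}

const simple = Array(10).fill(1); // ten simple, 1-cycle instructions
const withComplex = [1, 1, 1, 1, 30, 1, 1, 1, 1, 1]; // one needs 30 cycles in Execute

console.log(cyclesUnpipelined(simple), cyclesPipelined(simple)); // 40 13
console.log(cyclesUnpipelined(withComplex), cyclesPipelined(withComplex)); // 69 42
```

With simple instructions, pipelining cuts 40 cycles to 13. A single 30-cycle instruction wipes out most of that gain, because the five instructions queued behind it have to wait.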

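Parallelism means doing several things at the same time. Across cores, independent threads of work run simultaneously. Within a single core, modern designs also issue several independent instructions in the same cycle to separate execution units. Simple, uniform instructions are easy to schedule this way, because the hardware can check their dependencies and spread them across whatever units are free. A complex instruction that does many things in one step is much harder to split up and schedule alongside others.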
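Whether instructions can run side by side comes down to dependencies: an instruction that needs another's result has to wait for it. The sketch below is a hypothetical model of an in-order core that can issue up to two instructions per cycle. The register names and the `issueGroups` helper are invented for illustration.

```javascript
// Hypothetical model: each cycle the core issues up to `width` consecutive
// instructions, as long as none of them depends on a register written by an
// earlier instruction in the same group. Instructions are { dst, srcs }.
function issueGroups(instructions, width = 2) {
  const groups = [];
  let current = [];
  let written = new Set(); // registers written by the current group

  for (const ins of instructions) {
    const conflicts =
      ins.srcs.some((r) => written.has(r)) || // needs a result not ready yet
      written.has(ins.dst); // would overwrite a pending result
    if (current.length === width || conflicts) {
      groups.push(current); // close this cycle's group
      current = [];
      written = new Set();
    }
    current.push(ins);
    written.add(ins.dst);
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

// Four instructions that only read r0: nothing depends on anything else.
const independent = [
  { dst: "r1", srcs: ["r0"] },
  { dst: "r2", srcs: ["r0"] },
  { dst: "r3", srcs: ["r0"] },
  { dst: "r4", srcs: ["r0"] },
];
// A chain: each instruction needs the previous one's result.
const dependent = [
  { dst: "r1", srcs: ["r0"] },
  { dst: "r2", srcs: ["r1"] },
  { dst: "r3", srcs: ["r2"] },
  { dst: "r4", srcs: ["r3"] },
];

console.log(issueGroups(independent).length); // 2 cycles
console.log(issueGroups(dependent).length); // 4 cycles
```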

Why is RISC power efficient?

Simpler instructions mean simpler hardware — fewer transistors, less circuitry, less electricity consumed per operation. This is why ARM dominates mobile and is rapidly taking over cloud infrastructure. Your phone has no cooling fan and needs a chip that gets work done without burning power. At data centre scale this compounds dramatically — AWS Graviton instances are cheaper partly because they consume significantly less power for equivalent workloads.
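The link between transistor count and power can be seen in the standard first-order formula for the dynamic (switching) power of CMOS logic. Here α is the fraction of the circuit that switches per cycle, C the switched capacitance, V the supply voltage and f the clock frequency:

```latex
P_{\text{dynamic}} \approx \alpha \, C \, V^{2} \, f
```

Fewer and simpler circuits switching per operation lower αC, and so reduce power at the same voltage and frequency.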

When is CISC needed?

Backward compatibility is CISC's trump card. Decades of software written for x86 — enterprise systems, games, legacy tooling — simply works without recompilation, and throwing that away is not an option for most organizations. This is why Intel and AMD still dominate desktops and traditional servers despite the power disadvantage — and why modern x86 chips made a quiet compromise: staying CISC on the outside for compatibility, while becoming RISC on the inside, silently translating every complex instruction into simpler micro-operations before execution.

Compiled Languages

Languages like C, Rust and Go are compiled languages. The machine code that actually runs the program is generated ahead of time (AOT), before the program is run, not during runtime.

The following walkthrough uses C. Other languages follow their own rules.

The compiler works one .c file at a time, translating it into machine code. At this stage, it doesn't know where external functions like add() or printf() live, so it leaves placeholders (unresolved symbols) instead of actual addresses. The output is an object file (.o) that contains valid machine instructions — but not yet fully connected.

Object files are incomplete in isolation. Each object file (main.o, add.o) contains correct machine code for its own functions. However, main.o, for example, contains instructions like "call add" or "call printf" without knowing the actual memory addresses of those functions. These missing links are what the linker resolves next.

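The linker (ld) takes all object files plus precompiled libraries (like libc) and matches every unresolved symbol. It scans the definitions (add in add.o, printf in libc) and replaces the placeholders with real memory addresses. For shared libraries, such as the dynamically linked libc most programs use, the linker records the dependency instead, and the dynamic loader fills in the final address when the program runs. This is where the program finally becomes a complete, connected unit instead of scattered pieces.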
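The linker's two jobs, collecting symbol definitions and then patching placeholders, can be sketched in a few lines. This sketch makes several simplifying assumptions. The object file format is invented (real ones such as ELF are binary and much richer), libc is treated as if it were linked statically, and `link` is a hypothetical helper, not the interface of ld.

```javascript
// Hypothetical object file format:
//   code:        array of instructions; a call is { op: "call", target }
//   defines:     symbol name -> index of its first instruction in `code`
//   relocations: indices in `code` whose `target` is still a symbol name
function link(objects) {
  const symbols = new Map(); // symbol name -> final address in the image
  const image = []; // the linked program: all code laid out end to end
  const bases = []; // where each object's code starts in the image

  // Pass 1: lay out every object and record where each defined symbol lands.
  for (const obj of objects) {
    const base = image.length;
    bases.push(base);
    for (const [name, offset] of Object.entries(obj.defines)) {
      if (symbols.has(name)) throw new Error(`multiple definition of '${name}'`);
      symbols.set(name, base + offset);
    }
    image.push(...obj.code.map((ins) => ({ ...ins }))); // copy, don't mutate inputs
  }

  // Pass 2: replace every placeholder (a symbol name) with a real address.
  objects.forEach((obj, i) => {
    for (const index of obj.relocations) {
      const ins = image[bases[i] + index];
      if (!symbols.has(ins.target)) {
        throw new Error(`undefined reference to '${ins.target}'`);
      }
      ins.target = symbols.get(ins.target);
    }
  });

  return { image, entry: symbols.get("main") };
}

const mainO = {
  code: [{ op: "call", target: "add" }, { op: "call", target: "printf" }, { op: "ret" }],
  defines: { main: 0 },
  relocations: [0, 1], // both calls are unresolved placeholders
};
const addO = { code: [{ op: "add" }, { op: "ret" }], defines: { add: 0 }, relocations: [] };
const libcO = { code: [{ op: "write" }, { op: "ret" }], defines: { printf: 0 }, relocations: [] };

const exe = link([mainO, addO, libcO]);
console.log(exe.image[0], exe.image[1]); // { op: 'call', target: 3 } { op: 'call', target: 5 }
```

Calling link([mainO, addO]) without the libc object throws "undefined reference to 'printf'", the toy equivalent of the familiar linker error.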

Once linking is done, the linker produces the final executable (like a.out). This file contains two key things: (1) fully resolved machine code, and (2) an executable header describing the target architecture, the entry point, and how the code (text) and data segments should be laid out in memory. This header is crucial, because it tells the OS exactly how to load and run the program.

When you run the program, the OS reads the executable header first. It maps the program into memory, loads required shared libraries, sets up the stack (arguments, environment), and then jumps to the entry point. From there, the CPU starts executing your code, and this is the moment your program actually comes to life. (On x86, the program counter is called the instruction pointer: the RIP register on x86-64.)
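On Linux, that header is the ELF header, and its first few fields can be read directly. The sketch below decodes the format, architecture and entry point of a 64-bit ELF file. For Windows (PE) and macOS (Mach-O) files it only recognises the magic numbers. `describeExecutable` is a hypothetical helper, and the usage at the end simply points it at the node binary itself (process.execPath).

```javascript
const fs = require("node:fs");

// e_machine values for a few common architectures (from the ELF specification).
const ELF_MACHINES = { 0x03: "x86", 0x28: "ARM", 0x3e: "x86-64", 0xb7: "AArch64", 0xf3: "RISC-V" };

// Hypothetical helper: inspects the first bytes of an executable file.
function describeExecutable(buf) {
  // ELF (Linux and most Unix systems) starts with 0x7f 'E' 'L' 'F'.
  if (buf.length >= 32 && buf[0] === 0x7f && buf.toString("latin1", 1, 4) === "ELF") {
    const is64 = buf[4] === 2; // EI_CLASS: 1 = 32-bit, 2 = 64-bit
    const little = buf[5] === 1; // EI_DATA: 1 = little-endian, 2 = big-endian
    const machine = little ? buf.readUInt16LE(18) : buf.readUInt16BE(18); // e_machine
    const info = {
      format: "ELF",
      bits: is64 ? 64 : 32,
      arch: ELF_MACHINES[machine] ?? `unknown (e_machine 0x${machine.toString(16)})`,
    };
    if (is64) {
      // e_entry: the virtual address the OS jumps to after loading the program
      const entry = little ? buf.readBigUInt64LE(24) : buf.readBigUInt64BE(24);
      info.entry = "0x" + entry.toString(16);
    }
    return info;
  }
  if (buf.length >= 2 && buf.toString("latin1", 0, 2) === "MZ") return { format: "PE (Windows)" };
  if (buf.length >= 4 && buf.readUInt32LE(0) === 0xfeedfacf) return { format: "Mach-O 64-bit (macOS)" };
  // 0xcafebabe marks a Mach-O universal binary (Java .class files share this magic).
  if (buf.length >= 4 && buf.readUInt32BE(0) === 0xcafebabe) return { format: "Mach-O universal (macOS)" };
  return { format: "unknown" };
}

// Usage: inspect the node binary that is running this script.
const header = Buffer.alloc(64);
const fd = fs.openSync(process.execPath, "r");
fs.readSync(fd, header, 0, header.length, 0);
fs.closeSync(fd);
console.log(describeExecutable(header));
```

On a typical x86-64 Linux machine this should report an ELF file with 64 bits, arch x86-64 and a hexadecimal entry address.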

JavaScript takes a fundamentally different approach — there is no ahead-of-time compilation step before you run your program.

Interpreted Languages

Languages like JS, Java, and Python don't convert all source code into machine code ahead of time. Instead, they use a runtime — specific to the machine — that executes code through a combination of compilation and interpretation at runtime.

Java compiles to JVM bytecode, Python compiles to CPython bytecode, and JavaScript (via V8) compiles to its own bytecode — all of them then execute that bytecode through an interpreter or JIT compiler rather than running native machine code directly.

This is why the same JS file runs on any machine as long as the right runtime is installed.
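Here is a quick way to see this in practice. The script below is identical everywhere, but what it reports depends on the native node binary running it. process.platform, process.arch, process.version and process.versions.v8 are standard Node properties.

```javascript
// portable.js: the same file, unchanged, on any machine with Node installed.
// What differs is the node binary underneath, which is native machine code
// built for that specific OS and CPU.
console.log(`Running on ${process.platform}/${process.arch}`); // e.g. "linux/x64" or "darwin/arm64"
console.log(`Runtime: Node ${process.version}, V8 ${process.versions.v8}`);
```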

How V8 Runs Your JavaScript

When we type node app.js in the terminal, the OS starts a node process and loads its machine code into RAM for the CPU to execute. Inside this process runs the V8 engine, whose sole job is to execute JavaScript. Its pipeline can be seen in action as follows -

Parser: For converting JS source to AST.

Ignition Compiler: For compiling AST to Bytecode

Ignition Interpreter: Executes bytecode by dispatching to pre-compiled machine code handlers in V8's binary.

TurboFan JIT Compiler: For compiling the hot path Bytecode to direct machine code in RAM.

Most developers think of JavaScript execution as a single step — you write code, it runs. Under the hood, V8 puts your code through a carefully designed pipeline that balances fast startup against peak execution speed.

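The Parser reads your JavaScript source and turns it into an Abstract Syntax Tree (AST), a structured representation of what your code means. To keep startup fast, V8 only pre-parses most function bodies at first and builds their full AST lazily, the first time each function is called. The AST is discarded after bytecode generation.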
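Parsing turns flat text into a tree. V8's parser handles the full JavaScript grammar. The hypothetical `parse` function below handles only numbers, identifiers, +, * and parentheses, but it builds the same kind of structure, with operator precedence encoded in the shape of the tree. The node types (Add, Mul, Num, Var) are invented for this sketch.

```javascript
// Toy recursive-descent parser (a hypothetical stand-in for V8's parser).
// Grammar:  expr   := term ("+" term)*
//           term   := factor ("*" factor)*
//           factor := number | identifier | "(" expr ")"
function parse(source) {
  if (/[^\w\s+*()]/.test(source)) throw new SyntaxError("unsupported character");
  const tokens = source.match(/\d+|[A-Za-z_]\w*|[+*()]/g) ?? [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function expr() {
    let node = term();
    while (peek() === "+") {
      next();
      node = { type: "Add", left: node, right: term() };
    }
    return node;
  }

  function term() {
    let node = factor();
    while (peek() === "*") {
      next();
      node = { type: "Mul", left: node, right: factor() };
    }
    return node;
  }

  function factor() {
    const tok = next();
    if (tok === undefined) throw new SyntaxError("unexpected end of input");
    if (tok === "(") {
      const inner = expr();
      if (next() !== ")") throw new SyntaxError("expected )");
      return inner;
    }
    if (/^\d+$/.test(tok)) return { type: "Num", value: Number(tok) };
    if (/^[A-Za-z_]\w*$/.test(tok)) return { type: "Var", name: tok };
    throw new SyntaxError(`unexpected token ${tok}`);
  }

  const ast = expr();
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token ${peek()}`);
  return ast;
}

console.log(JSON.stringify(parse("a + b * 2"), null, 2));
```

Because * binds tighter than +, b * 2 ends up as a subtree under Add rather than the other way around.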

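The Ignition Compiler takes that AST and compiles it down to bytecode: a compact, platform-neutral set of instructions stored in the heap. Again, this happens once per function. On subsequent calls V8 reuses the same bytecode instead of re-parsing or re-compiling. The exception is bytecode flushing: V8 may discard the bytecode of a function that has gone unused for a long time and regenerate it if the function is called again.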
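Compiling an AST to bytecode is essentially a tree walk that emits instructions. The sketch below targets an invented stack machine with the opcodes PushConst, LoadVar, Add, Mul and Return. V8's actual bytecode is register-based with an accumulator and far larger, so treat this as an illustration of the idea rather than of Ignition's output. Real V8 bytecode for a function can be printed with node --print-bytecode.

```javascript
// Compiles a small expression AST into bytecode for a hypothetical stack machine.
function compile(ast) {
  const bytecode = [];
  (function emit(node) {
    switch (node.type) {
      case "Num": bytecode.push(["PushConst", node.value]); break;
      case "Var": bytecode.push(["LoadVar", node.name]); break;
      case "Add":
      case "Mul":
        emit(node.left); // operands first...
        emit(node.right);
        bytecode.push([node.type]); // ...then the operator (post-order walk)
        break;
      default: throw new Error(`unknown node ${node.type}`);
    }
  })(ast);
  bytecode.push(["Return"]);
  return bytecode;
}

// AST for "a + b * 2"
const ast = {
  type: "Add",
  left: { type: "Var", name: "a" },
  right: { type: "Mul", left: { type: "Var", name: "b" }, right: { type: "Num", value: 2 } },
};

console.log(compile(ast));
// [ ['LoadVar','a'], ['LoadVar','b'], ['PushConst',2], ['Mul'], ['Add'], ['Return'] ]
```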

The Ignition Interpreter is where execution begins. For every bytecode instruction, Ignition consults a dispatch table, looks up the pre-written machine code handler for that instruction, and jumps to it. Those handlers live in V8's own binary, loaded into RAM when the process started, so Ignition's job is pure dispatch, not translation. No code is generated at this stage.
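The dispatch loop itself is small. In this sketch the handlers are JavaScript functions written ahead of time and stored in a table keyed by opcode. The interpreter only looks them up and calls them, and it never generates code. The opcodes are invented for this toy stack machine and are not real V8 bytecodes, and V8's handlers are machine code rather than JS functions.

```javascript
// Pre-written handlers, one per (hypothetical) opcode.
const handlers = {
  PushConst: (state, value) => { state.stack.push(value); },
  LoadVar: (state, name) => { state.stack.push(state.vars[name]); },
  Add: (state) => { const b = state.stack.pop(); state.stack.push(state.stack.pop() + b); },
  Mul: (state) => { const b = state.stack.pop(); state.stack.push(state.stack.pop() * b); },
  Return: (state) => { state.done = true; },
};

function interpret(bytecode, vars) {
  const state = { stack: [], vars, done: false };
  let pc = 0;
  while (!state.done) {
    const [op, operand] = bytecode[pc++];
    const handler = handlers[op]; // dispatch: table lookup...
    if (!handler) throw new Error(`no handler for ${op}`);
    handler(state, operand); // ...then jump to the handler
  }
  return state.stack.pop();
}

// Bytecode for "a + b * 2"
const bytecode = [["LoadVar", "a"], ["LoadVar", "b"], ["PushConst", 2], ["Mul"], ["Add"], ["Return"]];
console.log(interpret(bytecode, { a: 10, b: 20 })); // 50
```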

The Profiler runs silently alongside Ignition, counting how many times each function is called and observing what types flow through it. This is the intelligence that drives the next step.
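Here is a rough model of what that bookkeeping looks like. The hypothetical `withFeedback` wrapper below counts calls and records the argument types seen. In V8 this information lives in per-function feedback vectors that Ignition fills in as it runs. V8 also tracks much more than typeof (object shapes, for example), so this is only an illustration of the idea.

```javascript
// Toy feedback collector: wraps a function, counting calls and recording
// the combination of argument types seen on each call.
function withFeedback(fn) {
  const feedback = { calls: 0, argTypes: new Set() };
  function profiled(...args) {
    feedback.calls += 1;
    feedback.argTypes.add(args.map((a) => typeof a).join(","));
    return fn(...args);
  }
  profiled.feedback = feedback;
  // A single observed type signature is the easy case for an optimising compiler.
  profiled.isMonomorphic = () => feedback.argTypes.size === 1;
  return profiled;
}

const add = withFeedback((a, b) => a + b);
for (let i = 0; i < 1000; i++) add(i, 1);
console.log(add.feedback.calls, add.isMonomorphic()); // 1000 true

add("1", 2); // a new type signature appears
console.log(add.feedback.calls, add.isMonomorphic()); // 1001 false
```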

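TurboFan kicks in once a function becomes hot. V8 decides this with internal heuristics based on how much of the function's bytecode has run and the type feedback collected so far, not a fixed call count, and the thresholds change between versions. Recent V8 versions also have intermediate tiers, Sparkplug and Maglev, between Ignition and TurboFan. At that point, TurboFan takes the bytecode and compiles it into highly optimised machine code, baking in assumptions based on the observed types. From then on, the CPU runs TurboFan's output directly, with no dispatch, no handler lookup and no interpreter overhead.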

Deoptimisation is the escape hatch. If TurboFan compiled a function assuming req.id is always an integer, and a string arrives instead, those assumptions break. TurboFan's output is discarded and the function falls back to Ignition, restarting the count. Keeping types consistent in hot code paths is one of the most practical performance levers a backend engineer has in Node.js.
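You can watch tier-up and deoptimisation happen with two V8 flags that Node passes through, --trace-opt and --trace-deopt. The script below warms up a function with numeric ids and then passes a string, mirroring the req.id example. The exact log output, and whether and when V8 optimises, depends on the Node/V8 version, so treat this as an experiment rather than a guaranteed result. `nextId` is a name chosen for this sketch.

```javascript
// deopt-demo.js
// Run with:  node --trace-opt --trace-deopt deopt-demo.js
function nextId(req) {
  return req.id + 1; // only ever sees numbers during the hot loop
}

let total = 0;
for (let i = 0; i < 1e6; i++) {
  total += nextId({ id: i }); // hot and type-stable: a likely candidate for optimisation
}

// A string shows up: `+` now means string concatenation, which contradicts the
// number-only feedback the optimised code was built on, so V8 is expected to
// deoptimise here.
const surprise = nextId({ id: "42" });

console.log(total, surprise); // 500000500000 421
```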

The Node.js process that V8 runs inside is just a regular OS process — no different in structure from a C program or a Java process. The OS gives it the same standard memory layout every process gets: a Stack for function call frames, a Heap for dynamic allocations, and a Text segment for compiled machine code.

What makes V8 interesting is how it maps onto this layout — the pre-written Ignition handlers sit in the Text segment loaded at startup, bytecode and TurboFan output live on the Heap, and execution contexts are managed on the Stack. Understanding this layout is what ties the whole V8 pipeline together.
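Node exposes part of this layout at runtime. process.memoryUsage() reports the resident set size of the whole process alongside V8's heap figures, and v8.getHeapSpaceStatistics() breaks the V8 heap into its spaces. Both are real Node APIs, but the set of spaces and their names differ between V8 versions, and the numbers vary from run to run.

```javascript
const v8 = require("node:v8");

const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1) + " MB";

// Whole-process view vs. the part managed by V8.
const { rss, heapTotal, heapUsed } = process.memoryUsage();
console.log("resident set (whole process):", toMB(rss));
console.log("V8 heap used / total:", toMB(heapUsed), "/", toMB(heapTotal));

// V8 splits its heap into spaces, e.g. new_space for short-lived objects,
// old_space for long-lived ones, and code_space for generated machine code.
for (const space of v8.getHeapSpaceStatistics()) {
  console.log(space.space_name.padEnd(24), toMB(space.space_used_size));
}
```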

Typical Memory layout for any process

The Node process memory layout above follows the standard RAM layout shown below -

V8 engine memory management

For a simple program like this -

```javascript
var obj = { a: 1000, b: 2000 };

var x = 10;
console.log(x);
var y = 20;

console.log(y);

function foo() {
  var m = 100;
  console.log(m);
}

function bar() {
  var n = 100;
  console.log(n);
}

foo();
bar();
```

The memory layout will be as follows (highly simplified) -