Laptop for Local LLM and AI Development

For three years, “doing AI on a laptop” meant SSHing into a cloud GPU and watching a progress bar over Wi-Fi. That tradeoff made sense when laptops had 8 GB of VRAM and consumer-class compute. It stopped making sense the day a mobile RTX 5090 shipped with 24 GB of GDDR7 and full CUDA support, and it stops making economic sense the moment you add up twelve months of hourly cloud GPU rentals.

The Raptor X18 is the first mobile workstation we have tested where the local-AI math finally works. Mid-sized transformers fit in VRAM. ResNet50 inference happens at desktop-class throughput. And the 256 GB of DDR5 system memory means you can hold an entire dataset in RAM while you experiment.

The interesting question is not whether the laptop can run a model. It is whether the laptop can run your model, your data, and your iteration loop, all without leaving your desk. // Eurocom Test Lab · AI Bench

// 01 As-Tested Configuration

Every benchmark in this article was measured on this exact unit, on stock firmware, on Windows 11 Pro with the Balanced power plan and the discrete GPU active.

EUROCOM // RAPTOR X18 · BENCHMARK BUILD AS TESTED

Processor

Intel Core Ultra 9 275HX · 24C / 24T · 36 MB cache · 5.29 GHz boost

Graphics

NVIDIA GeForce RTX 5090 · 24 GB GDDR7 · 10,496 CUDA · Blackwell · 328 Tensor AI cores

Memory

256 GB DDR5-5600 · 4 × 64 GB Crucial SO-DIMM · 262-pin · 1.1V

Storage

Samsung 9100 PRO 1 TB · PCIe Gen5 NVMe (1 of 4 M.2 slots)

Storage Expansion

4× M.2 NVMe slots · 1× PCIe Gen5 + 3× PCIe Gen4 · RAID capable

Display

18.0" 240Hz QHD 2560 × 1600 · 16:10 · 100% DCI-P3 · eDP · Sharp LQ180R1JW01

Operating System

Microsoft Windows 11 Pro 64-bit · Balanced power plan

Security & Privacy

Physically removable webcam, microphone, and wireless module

// 02 ONNX Inference Throughput

The SPECworkstation 4.0 ONNX inference tests are the cleanest available view into how an accelerator handles production-style vision pipelines. The Raptor X18's RTX 5090 Laptop returns numbers that compare directly with current desktop RTX 5080 cards.

ResNet50 INT8 inference hits 90.04 inferences per second — a SPEC ratio of 46.90, meaning the X18 chews through that workload nearly 47× faster than the SPEC reference platform. SuperResolution INT8 lands at 40.98 inferences/sec with a SPEC ratio of 21.30. FP32 numbers are healthier still on this generation of silicon because Blackwell's Tensor cores are not bottlenecking the way Ampere mobile did.

BENCH // ONNX INFERENCE · GPU SPECWORK 4.0

ResNet50 INT8 batch 32 · inferences/sec

90.04

inf/sec

ResNet50 FP32 batch 32 · inferences/sec

52.32

inf/sec

SuperResolution INT8 batch 32

40.98

inf/sec

SuperResolution FP32 batch 32

42.34

inf/sec

SPEC AI & ML vertical composite score

2.12

ratio

// 03 GPU Compute & General Workloads

Beyond purpose-built ML benchmarks, Geekbench 6's OpenCL test and PassMark's GPU Compute score give a sense of what the RTX 5090 Laptop delivers across general-purpose GPU work — the same kind of throughput that drives kernels in PyTorch, JAX, ONNX Runtime, and TensorRT.

BENCH // GPU COMPUTE GEEKBENCH 6 · PASSMARK

Geekbench 6.7.1 OpenCL RTX 5090 Laptop · overall

235,641

score

Geekbench 6.7.1 OpenCL · Stereo Matching 1.28 Tpixels/sec

1,342,646

score

Geekbench 6.7.1 OpenCL · Edge Detection 15.4 Gpixels/sec

414,772

score

Geekbench 6.7.1 OpenCL · Particle Physics 34,810 FPS

790,934

score

PassMark GPU Compute DirectCompute / OpenCL

19,565

score

PassMark 3D Graphics Mark world percentile: 97th

35,660

score

The takeaway: even on the OpenCL side — which is not where NVIDIA puts its driver effort — the X18 lands in the same territory as a desktop RTX 4080-class card. CUDA workloads scale higher still.

// CONFIGURE YOUR OWN

Build a Raptor X18 to your spec.

Every Eurocom laptop is configured to order in Ottawa. Choose the CPU, GPU, memory, storage, and display. Add enterprise warranty and on-site service.

Open Configurator →

// 04 VRAM & the 256 GB Question

The two numbers that decide whether a model fits are VRAM and system RAM. On VRAM, the RTX 5090 Laptop's 24 GB of GDDR7 sits in the sweet spot for 7B-parameter LLMs in FP16, 13B-parameter models with 4-bit quantization, and most ONNX vision pipelines at production batch sizes. Stable Diffusion XL with ControlNet and IP-Adapter fits with headroom.

On system memory, the as-tested configuration shipped with 256 GB of DDR5-5200 across four channels. For ML practitioners this is the part that quietly transforms the workflow. Datasets that previously had to stream from disk now load entirely into RAM. Feature engineering across millions of rows happens in pandas without paging. Multiple PyTorch DataLoaders can run concurrent without OOM. And on the CPU side, MaxxMem2 measured read bandwidth at 27.3 GB/s and write bandwidth at 47.9 GB/s — numbers comfortably ahead of typical mobile workstations.

BENCH // DATA SCIENCE · CPU PIPELINE SPECWORK 4.0

Pandas data manipulation · SPEC ratio

1.18

ratio

Scikit-learn model training · SPEC ratio

1.73

ratio

XGBoost gradient boosting · SPEC ratio

1.87

ratio

Data Science vertical composite

1.56

ratio

MaxxMem2 memory read DDR5-5200 4ch

27,267

MB/s

MaxxMem2 memory write DDR5-5200 4ch

47,923

MB/s

// 05 Real Workflows

Local LLM development

For developers using local LLMs as a code-completion or agent backbone, the X18 runs 7B-parameter models in FP16 with full context length, and 13B models in 4-bit quantization at interactive speeds. Token generation is comfortable for live coding workflows, and the 256 GB of RAM means an entire vector store can sit alongside the model.

Vision pipeline development

For computer-vision engineers, batched ONNX inference on the GPU at 90 inferences/sec for ResNet50 INT8 is enough to develop and validate production pipelines locally. The CUDA toolchain works the same way it does on a desktop, with no host-side caveats.

Fine-tuning & LoRA

LoRA and QLoRA fine-tuning of mid-sized open-weight models is the sweet spot for this hardware. With 24 GB of VRAM, full LoRA adapters on 7B models fit comfortably; aggressive quantization stretches that to 13B. The bottleneck shifts to dataset IO — which the Gen5 NVMe at 12 GB/s sequential is well-equipped to handle.

// 06 Sustained Performance

ML workloads do not look like a 60-second benchmark; they look like an overnight job. We ran the 3DMark Steel Nomad stress test for 20 consecutive loops — roughly an hour of continuous GPU load — and the X18 returned a frame-rate stability of 98.8%, with the best loop at 5,771 and the worst at 5,702. Equivalent GPU-bound ML training behaviour. The vapor-chamber thermal design holds clock speeds under sustained load rather than throttling after the first ten minutes.

CPU-only stress in Prime95 stabilized at 74°C maximum, 47°C average with no thermal throttling. Combined CPU+GPU loads — the kind a mixed training run produces — held the CPU package in the high 80s and briefly to 92°C, which is within Intel's design envelope and consistent with sustained dual-die activity.

// 07 Verdict

The math on local ML hardware finally works in 2026. A Raptor X18 configured the way we tested it pays for itself in roughly twelve months versus equivalent cloud GPU rental, with the added benefit that data never leaves the machine. For ML engineers, applied researchers, and AI-product developers who care about iteration speed and data residency, the X18 is one of the few laptops in the market that genuinely qualifies as a local replacement for cloud compute.

What you give up: weight, fan noise under load, and a battery long enough to go all day unplugged. What you get: a workstation that travels, with 256 GB of replaceable DDR5 and four M.2 NVMe slots (one Gen5, three Gen4) ready for the next generation of storage.

Eurocom configures every Raptor X18 to order in Ottawa — the same chassis is available with RTX 5080 or RTX 5090 graphics, with memory and storage tiers scaled to budget. For ML teams with data-residency or compliance requirements, the webcam, microphone, and wireless module are all physically removable from the chassis — the same machine works in a secure lab and on a flight.

// Key Takeaways

ResNet50 INT8 inference at 90 inferences/sec — 46.9× the SPEC reference platform
24 GB of GDDR7 fits 7B-parameter LLMs in FP16 and 13B models with 4-bit quantization
256 GB of DDR5-5200 keeps full datasets in RAM — no paging, no streaming bottleneck
Geekbench 6 OpenCL score of 235,641 — desktop-class general GPU compute
98.8% frame-rate stability over a 20-loop GPU stress test — sustained, not peak, performance
Local AI workflows replace recurring cloud GPU bills; the X18 pays for itself in roughly 12 months

Laptop for local LLM and AI development.

// 01 As-Tested Configuration

// 02 ONNX Inference Throughput

// 03 GPU Compute & General Workloads

// 04 VRAM & the 256 GB Question

// 05 Real Workflows

Local LLM development

Vision pipeline development

Fine-tuning & LoRA

// 06 Sustained Performance

// 07 Verdict

More From the Journal