GPUs

Here are a few GPUs for getting started with local LLMs cost-effectively. You can also run models on the CPU with system RAM, but it is much slower.

The NVIDIA RTX PRO 4500 Blackwell is a professional-grade GPU built on NVIDIA's Blackwell architecture. It is designed to handle AI inference, data science, and heavy 3D rendering workloads. It features 32GB of GDDR7 memory, 10,496 CUDA cores, and support for PCIe Gen 5.

The card is available in two distinct editions:

Workstation Edition

  • Form Factor: Dual-slot, active cooling (built-in fan).

  • Power: 200 W.

  • Best for: Quiet, high-performance operation inside standard desktop chassis.

  • Price: Typically around $2,999 to $3,500.

Server Edition

  • Form Factor: Single-slot, passive cooling.

  • Power: 165 W.

  • Best for: High-density enterprise data centers and edge deployments.

Core Specs

  • Memory: 32GB GDDR7 with ECC (Error-Correcting Code).

  • Cores: 10,496 CUDA, 328 Tensor Cores (5th Gen), 82 RT Cores (4th Gen).

  • Connectivity: 4x DisplayPort 2.1b.
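To get a feel for what 32GB of memory allows, here is a rough sizing sketch. It assumes memory use is dominated by the model weights (parameters times bits per weight) plus a flat 20% allowance for the KV cache and runtime buffers. That 20% figure and the function names are assumptions for illustration; real usage depends on context length, batch size, and the inference runtime.

```python
# Back-of-the-envelope VRAM estimate. The overhead fraction is an assumed
# placeholder, not a measured value.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_fraction: float = 0.2) -> float:
    """Return an approximate memory footprint in GB (1 GB = 1e9 bytes)."""
    # 1e9 parameters * (bits / 8) bytes each = that many GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead_fraction)


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 32.0) -> bool:
    """True if the estimate fits within the card's memory (32GB by default)."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


# Hypothetical model sizes and quantization levels, chosen only as examples.
for label, params, bits in [("8B at 16-bit", 8, 16),
                            ("32B at 4-bit", 32, 4),
                            ("70B at 4-bit", 70, 4)]:
    need = estimate_vram_gb(params, bits)
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{label}: ~{need:.1f} GB, {verdict} in 32 GB")
```

Treat the result as a first filter only. A model that barely fits on paper may still run out of memory at long context lengths.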

You can also buy older cards used to lower the cost of running models locally even further.

The NVIDIA Tesla P40 and AMD Radeon Pro V620 are examples of such cards, but they are much slower than modern cards.
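One way to reason about how much slower a card might be: single-stream token generation is often limited by memory bandwidth, since each new token requires reading roughly all of the model weights once. Under that assumption, bandwidth divided by model size gives an upper bound on tokens per second. The sketch below uses made-up bandwidth figures, not datasheet values; look up the real numbers for any card you are considering.

```python
# Upper-bound estimate for decode speed, assuming generation is purely
# memory-bandwidth bound. Real throughput will be lower.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth (GB/s) divided by bytes read per token (about the model size)."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical devices with illustrative bandwidths (NOT real specifications).
devices = {
    "hypothetical system RAM": 80.0,
    "hypothetical older GPU": 350.0,
    "hypothetical modern GPU": 900.0,
}

model_size_gb = 16.0  # e.g. a 32B-parameter model at 4 bits per weight

for name, bandwidth in devices.items():
    bound = max_tokens_per_second(bandwidth, model_size_gb)
    print(f"{name}: at most ~{bound:.0f} tokens/s")
```

The bound ignores compute limits, which matter more on older architectures and during prompt processing, so the real gap between cards can be larger than the bandwidth ratio suggests.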

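Driver and software setup guides:

AMD ROCm installation on Linux: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/

NVIDIA AI Enterprise bare-metal deployment: https://docs.nvidia.com/ai-enterprise/deployment/bare-metal/latest/first-system.html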
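After installing drivers, a quick check that the system sees the GPU can save debugging time later. The sketch below shells out to nvidia-smi, which ships with the NVIDIA driver, and parses its CSV output. The helper names are hypothetical, and the script returns an empty list if nvidia-smi is missing or fails. For AMD cards it only checks whether the rocm-smi tool is on the PATH.

```python
import shutil
import subprocess


def parse_gpu_csv(text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory_mib' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        try:
            name, mem = line.rsplit(",", 1)
            gpus.append((name.strip(), int(mem.strip())))
        except ValueError:
            # Skip lines that do not match the expected format.
            continue
    return gpus


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Return detected NVIDIA GPUs, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


gpus = list_nvidia_gpus()
if gpus:
    for name, mem_mib in gpus:
        print(f"{name}: {mem_mib} MiB")
else:
    print("No NVIDIA GPU detected via nvidia-smi")

# Presence check only; this does not query the AMD card itself.
print("rocm-smi on PATH:", shutil.which("rocm-smi") is not None)
```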