I have a consumer laptop. Below is the log I got from running max serve, followed by my device's hardware specs. I have two questions:
- The only component I can upgrade is RAM. How can I calculate the minimum additional RAM required to run a specific model smoothly on my device?
- If I want to purchase (or custom-build) a device that can handle 10 concurrent requests for a specific model, how do I determine the hardware it needs? (I would like to build a home AI server for my family.)
21:42:32.839 INFO: 6901 MainThread: max.pipelines: No GPUs available, falling back to CPU
21:42:32.839 INFO: 6901 MainThread: max.pipelines: No GPUs available, falling back to CPU
21:42:40.095 WARNING: 6933 MainThread: max.pipelines: Insufficient cache memory to support a batch containing one request at the max sequence length of 131072 tokens. Need to allocate at least 1024 pages (32.00 GiB), but only have enough memory for 280 pages (8.75 GiB).
21:42:41.442 INFO: 6933 MainThread: max.pipelines: Paged KVCache Manager allocated 280 device pages using 32.00 MiB per page.
Estimated memory consumption:
Weights: 4.58 GiB
KVCache allocation: 8.75 GiB
Total estimated: 13.33 GiB used / 14.81 GiB free
Auto-inferred max sequence length: 131072
Auto-inferred max batch size: 1
21:42:35.823 INFO: 6901 MainThread: max.entrypoints: Starting server using modularai/Llama-3.1-8B-Instruct-GGUF
21:42:35.823 INFO: 6901 MainThread: max.pipelines: Loading TextTokenizer and TextGenerationPipeline(Llama3Model) factory for:
architecture: LlamaForCausalLM
devices: cpu[0]
model_path: modularai/Llama-3.1-8B-Instruct-GGUF
huggingface_revision: main
quantization_encoding: SupportedEncoding.q4_k
cache_strategy: KVCacheStrategy.PAGED
weight_path: [
llama-3.1-8b-instruct-q4_k_m.gguf ]
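The memory figures in this log can be cross-checked with simple arithmetic. The sketch below only restates numbers the log prints (the page counts, the 32.00 MiB page size, the 4.58 GiB of weights and the 14.81 GiB reported free). The helper name `pages_to_gib` and the variable names are mine, not part of MAX, and the shortfall it computes ignores whatever the OS and other programs need.

```python
MIB_PER_GIB = 1024


def pages_to_gib(pages: int, page_mib: float = 32.0) -> float:
    """Convert a number of KV cache pages to GiB (page size taken from the log)."""
    return pages * page_mib / MIB_PER_GIB


weights_gib = 4.58   # "Weights" line in the log
free_gib = 14.81     # "free" figure in the log

kv_allocated_gib = pages_to_gib(280)   # pages the server managed to allocate
kv_full_gib = pages_to_gib(1024)       # pages needed for one 131072-token request

total_now_gib = weights_gib + kv_allocated_gib
total_full_gib = weights_gib + kv_full_gib
shortfall_gib = total_full_gib - free_gib   # extra memory beyond what was free

print(f"KV cache allocated:      {kv_allocated_gib:.2f} GiB")
print(f"KV cache at full length: {kv_full_gib:.2f} GiB")
print(f"Total as allocated:      {total_now_gib:.2f} GiB")
print(f"Total at full length:    {total_full_gib:.2f} GiB")
print(f"Shortfall vs. free:      {shortfall_gib:.2f} GiB")
```

Under these assumptions, one request at the full 131072-token length would need roughly 21.77 GiB more than the log reports as free. Since the KV cache grows linearly with the number of tokens, a shorter maximum sequence length would shrink that figure proportionally.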
Below are my laptop specs:
[abuka@archlinux ~]$ sudo lshw -C display
*-display
description: VGA compatible controller
product: Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:03:00.0
logical name: /dev/fb0
version: c5
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi msix vga_controller bus_master cap_list fb
configuration: depth=32 driver=amdgpu latency=0 resolution=1920,1080
resources: irq:49 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:fcd00000-fcd7ffff
[abuka@archlinux ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 5800HS with Radeon Graphics
CPU family: 25
Model: 80
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 53%
CPU max MHz: 4465.2612
CPU min MHz: 403.4880
BogoMIPS: 6388.15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 16 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerabilities:
Gather data sampling: Not affected
Ghostwrite: Not affected
Indirect target selection: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Mitigation; Safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsa: Mitigation; Clear CPU buffers
Tsx async abort: Not affected
[abuka@archlinux ~]$
[abuka@archlinux ~]$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        11Gi       561Mi       108Mi       2.9Gi       3.1Gi
Swap:          4.0Gi       3.7Gi       345Mi
[abuka@archlinux ~]$
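
For both sizing questions, a back-of-the-envelope estimate may help frame the answer. The sketch below assumes the published Llama 3.1 8B shape (32 layers, 8 key/value heads, head dimension 128) and a 4-byte (float32) KV cache on CPU. The float32 part is my inference, since it reproduces the 32.00 GiB in the warning exactly; the log does not state it. The function names, the 8192-token context per request and the 2 GiB `overhead_gib` allowance are hypothetical placeholders, not MAX settings.

```python
GIB = 2**30


def kv_bytes_per_token(layers: int = 32, kv_heads: int = 8,
                       head_dim: int = 128, dtype_bytes: int = 4) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes


def kv_cache_gib(context_tokens: int, concurrent: int,
                 dtype_bytes: int = 4) -> float:
    """KV cache size if every concurrent request fills its whole context."""
    per_token = kv_bytes_per_token(dtype_bytes=dtype_bytes)
    return per_token * context_tokens * concurrent / GIB


def required_gib(weights_gib: float, context_tokens: int, concurrent: int,
                 overhead_gib: float = 2.0, dtype_bytes: int = 4) -> float:
    """Weights + worst-case KV cache + an assumed allowance for everything else."""
    kv = kv_cache_gib(context_tokens, concurrent, dtype_bytes)
    return weights_gib + kv + overhead_gib


# One request at the auto-inferred 131072-token length (matches the warning).
print(f"{kv_cache_gib(131072, 1):.2f} GiB KV cache")

# Hypothetical family server: 10 concurrent requests, 8192 tokens each.
print(f"{kv_cache_gib(8192, 10):.2f} GiB KV cache")
print(f"{required_gib(4.58, 8192, 10):.2f} GiB total")
```

This is a worst-case bound, because it reserves a full context for every request at once, whereas a paged cache typically hands out pages only as sequences grow. It also covers memory capacity only and says nothing about generation speed.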