The case for sovereign AI infrastructure
Engineers accustomed to cloud-native environments and managed services often find the transition to bare-metal hardware jarring. In the cloud, hardware abstraction means that firmware updates, driver conflicts, and physical networking are invisible concerns. On bare-metal systems, these layers must be managed deliberately.
Despite this friction, the strategic value of local, sovereign AI infrastructure is clear. An on-premises workstation isolates sensitive research from third-party telemetry, keeps latency predictable, and removes recurring cloud compute charges. Platforms built on modern architecture, such as the Geekom A9 with Ryzen AI Max+ 395, offer strong local capabilities. This model ships with 128GB of high-bandwidth, quad-channel LPDDR5X memory. Properly configured, it can yield up to 64GB of dedicated memory to the GPU via the BIOS, transforming a compact desktop unit into an AI workstation capable of running quantised 70-billion-parameter models entirely in RAM.
Reaching that state of stability, however, can be a demanding process. What follows is a record of the specific interventions required to stabilise the hardware and install Ubuntu 26.04 Server.
The NVMe storage failure
The device was unstable from the outset. When attempting to install an operating system, the system would drop the NVMe storage volumes entirely, leading to dracut and dev-mapper emergency mode errors. The operating system did not fail; rather, we lost the mapping to the underlying physical drive.
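A quick way to confirm whether the kernel still sees the drive is to look for NVMe namespaces under /sys/block. The following is a minimal sketch, assuming a working Linux environment with Python available (for example a live USB session); the function name is illustrative and not part of the original procedure.

```python
from pathlib import Path


def list_nvme_namespaces(sys_block: Path = Path("/sys/block")) -> list[str]:
    """Return the NVMe block devices the kernel currently exposes."""
    if not sys_block.is_dir():
        return []
    # NVMe namespaces appear as nvme<controller>n<namespace>, e.g. nvme0n1.
    return sorted(p.name for p in sys_block.iterdir() if p.name.startswith("nvme"))


if __name__ == "__main__":
    devices = list_nvme_namespaces()
    if devices:
        print("NVMe namespaces visible:", ", ".join(devices))
    else:
        print("No NVMe namespaces visible; the drive may have dropped off the bus.")
```

Running this before and after a suspected dropout shows whether the failure is at the device level rather than in the filesystem or volume mapping.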
In our case, this instability stemmed from the factory firmware (BIOS v0.12). Updating the main BIOS to v0.13 and the Embedded Controller to v0.17 stabilised the PCIe lanes and kept the storage subsystem accessible.
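Once Linux boots, the installed BIOS version can be read from the DMI entries the kernel exposes in sysfs. The sketch below assumes the vendor reports the version in a dotted numeric form such as 0.13 (optionally prefixed with "v"); that format, and the helper names, are assumptions for illustration.

```python
from pathlib import Path

# Assumption: the minimum mirrors the v0.13 release that stabilised our unit.
MIN_BIOS_VERSION = (0, 13)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as 'v0.13' or '0.13' into a comparable tuple (0, 13)."""
    cleaned = text.strip().lstrip("vV")
    return tuple(int(part) for part in cleaned.split("."))


def read_bios_version(dmi_dir: Path = Path("/sys/class/dmi/id")) -> str:
    """Read the BIOS version string the kernel exposes from the SMBIOS tables."""
    return (dmi_dir / "bios_version").read_text().strip()


def bios_is_current(version_text: str, minimum: tuple[int, ...] = MIN_BIOS_VERSION) -> bool:
    """Compare the reported version against the minimum known-good release."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        reported = read_bios_version()
        state = "ok" if bios_is_current(reported) else "older than expected"
        print(f"BIOS version {reported}: {state}")
    except (OSError, ValueError) as exc:
        # Missing sysfs entry or a version string in an unexpected format.
        print(f"Could not determine BIOS version: {exc}")
```

If the vendor string does not parse, the script reports that rather than guessing, which is the safer behaviour before trusting the storage subsystem.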
Bypassing the firmware boot loop
In our testing, the vendor’s automated USB update script proved unreliable. Following an apparently successful update, the motherboard entered an EFI Shell boot loop. The system read the startup.nsh script on the USB drive during each power cycle and attempted to flash the ROM repeatedly.
To break this loop, we had to intervene manually via the EFI shell:
- Enter the boot menu during startup and select the UEFI: Internal Shell.
- Locate the USB volume (typically mapped as fs1:).
- Execute the flash utility manually, appending the flags required to force the update:
AfuEfix64.exe [BIOS_FILE].bin /p /b /n /x /update
The /update flag was critical in our case; without it, the utility appeared only to verify the binary rather than commit the firmware change. The EFI shell also rendered in an exceptionally small font on a modern display, so using a smartphone camera to photograph the screen and check for typos before execution proved helpful.
After the main BIOS flash, we also had to update the Embedded Controller (EC). The EC handles out-of-band management tasks such as fan curves and power states. Navigating to the EC folder on the USB drive and running the manual flash tool kept thermal management synchronised with the new BIOS instructions.
Crucially, remove the USB drive physically before rebooting to prevent the shell from re-engaging the automated script.
Installing Ubuntu 26.04
In our testing, earlier versions of Ubuntu, such as 24.04, lacked the recent kernel support required for AMD’s Ryzen AI Max 300 series, resulting in fatal amdgpu initialisation errors. On Ubuntu 26.04 development builds, running kernel 7.0.0-14, that issue was resolved natively. However, to reach the installer dialogue reliably without black screens, we still had to append the nomodeset parameter to the GRUB boot line. This forces a basic software framebuffer until the installation completes.
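After installation, it is worth confirming which kernel is running and whether nomodeset is still present on the boot line, since leaving it in place keeps the GPU driver from initialising. The following is a minimal sketch; the 7.0 threshold comes from our experience described above, and the function names are illustrative.

```python
import platform
import re
from pathlib import Path

# Assumption: 7.0 is the first kernel series that initialised amdgpu for us.
MIN_KERNEL = (7, 0)


def kernel_major_minor(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '7.0.0-14-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_nomodeset(cmdline: str) -> bool:
    """Report whether nomodeset appears as a standalone kernel parameter."""
    return "nomodeset" in cmdline.split()


if __name__ == "__main__":
    try:
        version = kernel_major_minor(platform.release())
        print("Kernel new enough:", version >= MIN_KERNEL)
        cmdline = Path("/proc/cmdline").read_text()
        print("nomodeset active:", has_nomodeset(cmdline))
    except (OSError, ValueError) as exc:
        print(f"Check skipped: {exc}")
```

If the second line prints True on the installed system, the parameter has probably been carried over into the permanent GRUB configuration and should be reviewed before expecting GPU acceleration.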
During the installation procedure, two specific configurations proved necessary:
1. Avoid the Logical Volume Manager (LVM) on this unit
When configuring the disks, we found standard ext4 partitioning more reliable than LVM. Given the hardware’s history of storage instability under earlier firmware, removing the extra abstraction layer of logical volumes provided a more resilient foundation.
2. Network configuration
For the initial server provisioning, we avoided wireless networking and used a physical ethernet connection. That avoided the circular problem of needing an internet connection to download the drivers required for wireless capability.
The Ubuntu 26.04 installer also uses a lean network profile. If a port is not explicitly configured, it may default to an IPv6-only state (accept-ra: true). In our installation, configuring the interface explicitly with dhcp4: true ensured standard IPv4 routing on the local network.
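To make the explicit IPv4 setting concrete, the sketch below renders a minimal netplan document with dhcp4 enabled. The interface name enp1s0 is hypothetical (check the real name with ip link), and the helper is illustrative rather than the installer's own output.

```python
def render_netplan(interface: str, dhcp4: bool = True) -> str:
    """Render a minimal netplan document for a single ethernet port."""
    flag = "true" if dhcp4 else "false"
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {interface}:",
        f"      dhcp4: {flag}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "enp1s0" is a placeholder; substitute the interface name on your unit.
    print(render_netplan("enp1s0"), end="")
```

The rendered text would typically be saved as a .yaml file under /etc/netplan/ and activated with netplan apply; the same keys can also be entered directly in the installer's network screen.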
Conclusion
Configuring bare-metal systems can feel slow compared to cloud provisioning. However, once the firmware is stable and a suitable kernel is deployed, the Geekom A9 with Ryzen AI Max+ 395 becomes a capable foundation for local AI research.
Appendix: Technical background
For engineers interested in the underlying mechanics of these failures and their resolutions, the following technical context applies:
- The script boot loop: Vendor-supplied firmware updates frequently rely on a startup.nsh script executing from the UEFI Internal Shell. If the flash process fails, the system boot order remains pointed at the shell, causing an infinite loop on reboot.
- The AMI flash utility: AfuEfix64.exe is the standard American Megatrends (AMI) Aptio V EFI flash tool. Of the parameters used, /p, /b and /n program the main BIOS image, the boot block and the NVRAM respectively, and /x skips the ROM ID check. In our case, /update was what made the utility commit the firmware rather than only verify it.
- Storage state errors: The dracut and dev-mapper errors occur when the Linux kernel drops the NVMe drive entirely. This is generally caused by PCIe Active State Power Management (ASPM) bugs or ACPI table errors in early BIOS versions (such as v0.12). A firmware update resolves this hardware-level disconnect.
- Embedded controller synchronisation: The EC manages low-level thermals and power states. Running a modern main BIOS alongside outdated EC firmware desynchronises power management, leading to instability or thermal throttling.
- Kernel dependencies and framebuffers: AMD’s recent silicon architectures require recent kernels (7.0 and above) and updated linux-firmware to initialise the integrated GPU successfully. Older kernels fail to load the amdgpu driver correctly. Using nomodeset disables kernel mode setting, so the kernel keeps the basic firmware-provided framebuffer instead of initialising amdgpu, which greatly reduced crashes during the installation phase.
- Unified Memory Architecture (UMA) allocation: To run large language models locally via ROCm, the integrated GPU requires contiguous, pre-allocated memory. Forcing the UMA Framebuffer size in the BIOS designates a portion of the system’s LPDDR5X memory as dedicated VRAM, a critical step for hosting 70-billion-parameter models entirely in memory.
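The arithmetic behind the 70-billion-parameter claim can be sketched as follows. This is a rough estimate of weight storage only: it ignores the KV cache, activations and runtime overhead, and real quantisation formats usually spend slightly more than their nominal bits per weight. The 64GB figure is the UMA allocation quoted earlier.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    # params_billion * 1e9 weights * (bits / 8) bytes, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    vram_gb = 64  # UMA framebuffer size assigned to the GPU in the BIOS
    for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        size = weight_footprint_gb(70, bits)
        verdict = "fits" if size < vram_gb else "does not fit"
        print(f"70B at {label}: ~{size:.0f} GB of weights, {verdict} in {vram_gb} GB")
```

On these numbers, only a 4-bit (or similarly compact) quantisation of a 70B model leaves headroom inside a 64GB allocation, which is why the workload depends on quantised weights.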