The Mystery
For weeks, my Raspberry Pi cluster was going offline seemingly at random. No pattern, no warning—just sudden, complete failure. And when I say complete, I mean complete: the root partition would disappear, SSH would die mid-session, and I couldn’t even run basic commands like reboot or exit. The only solution was a hard power reset.
This was maddening because I had no logs. When your root filesystem vanishes, so do all your binaries and your ability to investigate what just happened.
The Investigation
I knew I needed logs that would survive the crashes, so I had to get creative:
- Configured rsyslog to talk to journald
- Set up remote logging to forward everything to my personal computer (a minimal sketch of that config follows this list)
- Waited for the next crash (which felt like watching paint dry, but with more anxiety)
- Trudged through mountains of logs looking for clues
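For anyone who wants to replicate the remote logging piece, here is a rough sketch. It assumes rsyslog is installed on both machines and that the receiver sits at 192.168.1.50; the file name and the address are placeholders I picked for illustration, not anything specific to my cluster:

```
# On the Pi, e.g. /etc/rsyslog.d/90-remote.conf (any *.conf file in that directory works):
# forward every facility and priority to the receiver over UDP (use @@ for TCP instead)
*.* @192.168.1.50:514

# On the receiving machine, enable the UDP listener in rsyslog's config:
# module(load="imudp")
# input(type="imudp" port="514")
```

Then restart rsyslog on both ends (`sudo systemctl restart rsyslog`). The whole point is simply that the logs land on a disk that doesn't vanish when the Pi's root filesystem does.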
Finally, I found something:
```
cluster0 fstrim[264392]: fstrim: /boot/firmware: FITRIM ioctl failed: Input/output error
cluster0 fstrim[264392]: /data/brick1: 1.7 TiB (1888567922688 bytes) trimmed on /dev/nvme0n1p3
cluster0 fstrim[264392]: /: 112.1 GiB (120364019712 bytes) trimmed on /dev/nvme0n1p2
```
And the smoking gun:
```
cluster0 kernel: nvme nvme0: I/O tag 152 (3098) opcode 0x9 (I/O Cmd) QID 1 timeout, aborting req_op:DISCARD(3) size:4096
cluster0 kernel: nvme nvme0: I/O tag 153 (2099) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:20480
cluster0 kernel: nvme nvme0: I/O tag 152 (3098) opcode 0x9 (I/O Cmd) QID 1 timeout, reset controller
cluster0 kernel: INFO: task jbd2/nvme0n1p3-:680 blocked for more than 120 seconds.
```
The fstrim systemd timer was triggering, attempting to TRIM the /boot/firmware partition (which is vfat/FAT32), and causing the NVMe controller to timeout and reset. When the controller reset, my entire system—including the root partition—would go offline.
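If you want to check whether the same timer is armed on your own system, systemd will tell you when it last ran and what it actually executes:

```
# When did fstrim last run, and when will it run next?
systemctl list-timers fstrim.timer

# What command does the service run?
systemctl cat fstrim.service
```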
The Research Phase (and a Dead End)
Before I even discovered it was fstrim causing the problem, I was reading forum posts about NVMe HATs on Raspberry Pis. The overwhelming message was: “Your HAT (or drive) doesn’t work properly. Go buy different hardware.”
Something didn’t sit right with me about that diagnosis, but without logs showing what was actually failing, I couldn’t prove otherwise.
Once I finally captured logs showing fstrim was the last thing to run before the system panicked, I started searching specifically for “fstrim Raspberry Pi issues.” Guess what I found? Almost nothing. Hardly anyone was discussing fstrim problems on a Pi, which seemed bizarre.
Then it hit me: the Raspberry Pi Imager creates that vfat /boot/firmware partition automatically. Every single person using the official imaging tool has this partition. And fstrim runs weekly by default on most Linux systems.
This means a significant number of people blaming their “incompatible” NVMe drives are probably experiencing the exact same fstrim/vfat issue. They either:
- Get lucky and the timing never quite lines up to cause a full crash
- Experience random instability they can’t explain and give up
- Replace their hardware thinking it’s defective
- Never capture logs because the root partition disappears when it crashes
I searched for fstrim solutions specifically and found nothing helpful. Most suggestions were either:
- “Your NVMe drive doesn’t support TRIM” (but mine clearly did—TRIM worked fine on my ext4 partitions!)
- “Disable TRIM entirely” (throwing the baby out with the bathwater)
- Vague workarounds that didn’t address the root cause
I could manually run fstrim on my ext4 partitions (/ and /data/brick1) without any issues—they trimmed in seconds. But running it on /boot/firmware would lock up, time out, and crash the entire system.
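For reference, the manual runs look roughly like this (-v makes fstrim report how much it trimmed), and lsblk can confirm that the drive itself advertises discard support, which is why “your drive doesn’t support TRIM” wasn’t the answer:

```
# Trimming the ext4 mounts by hand completed in seconds
sudo fstrim -v /
sudo fstrim -v /data/brick1

# This is the one that hung and took the controller down with it
# sudo fstrim -v /boot/firmware

# Nonzero DISC-GRAN / DISC-MAX means the device advertises discard support
lsblk --discard /dev/nvme0n1
```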
The Solution
Here’s what I learned: FAT/vfat filesystems have problematic TRIM support. The way the vfat filesystem’s FITRIM ioctl issues TRIM commands can cause certain NVMe controllers to time out and reset.
The fix is beautifully simple: tell fstrim to only operate on filesystem types that handle TRIM well.
Step-by-Step Fix
- Edit the fstrim systemd service:

```
sudo systemctl edit fstrim.service
```

- Add this override configuration:

```
[Service]
ExecStart=
ExecStart=/sbin/fstrim --fstab --verbose --quiet-unsupported -t ext4
```

The first empty ExecStart= clears the original command, and the second one replaces it with a version that only trims ext4 filesystems.

- Reload systemd:

```
sudo systemctl daemon-reload
```

- Test it manually:

```
sudo systemctl start fstrim.service
sudo journalctl -u fstrim.service -n 50
```
That’s it! No more crashes, no more controller resets, and TRIM still works perfectly on the partitions that matter.
Why This Works
My /etc/fstab looked like this:
```
PARTUUID=64dd9cc5-01  /boot/firmware  vfat  defaults                  0  2
PARTUUID=64dd9cc5-02  /               ext4  defaults,noatime,discard  0  1
/dev/nvme0n1p3        /data/brick1    ext4  defaults,noatime,discard  1  2
```
By adding -t ext4 to the fstrim command, we’re telling it: “Only look at ext4 filesystems when reading from fstab.” The vfat /boot/firmware partition gets automatically skipped, and since that partition rarely changes and is tiny anyway, we’re not losing anything meaningful.
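You can sanity-check which mounts the override will actually touch with a dry run before trusting the weekly timer to it (the output depends on your own fstab, of course):

```
# --dry-run prints what would be trimmed without issuing any discards
sudo /sbin/fstrim --fstab --dry-run --verbose -t ext4
```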
Scalability
The beauty of this approach is that it’s scalable and maintainable:
- Add as many ext4 partitions as you want—they’ll automatically be trimmed
- Problematic filesystem types (vfat, ntfs, etc.) are automatically excluded
- No need to manually list every partition or create exclusion lists
If you use other filesystem types that support TRIM well, you can add them:
```
ExecStart=/sbin/fstrim --fstab --verbose --quiet-unsupported -t ext4,xfs,btrfs
```
Note: If you use ZFS, don’t add it here—ZFS handles TRIM internally through zpool trim commands.
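For completeness, this is what the ZFS-side equivalent looks like; “tank” here is a placeholder pool name, not part of my setup:

```
# One-off TRIM of a pool
sudo zpool trim tank

# Or let the pool issue discards continuously as space is freed
sudo zpool set autotrim=on tank
```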
The Aftermath
After applying this fix, my Pis have been rock solid. No more random crashes, no more power resets, and TRIM is still doing its job on the partitions that matter.
It’s frustrating that this issue is so common yet the solution isn’t widely documented. Hopefully this helps someone else avoid the weeks of debugging I went through!
My Setup
For context, here’s what I’m running:
- Raspberry Pi with NVMe HAT
- Booting from NVMe drive
- Three partitions: vfat boot, ext4 root, ext4 data
- GlusterFS cluster (which is irrelevant to this issue but explains the /data/brick1 mount)
Lessons Learned
- Remote logging is essential for debugging systems that can fail catastrophically
- Not all filesystems handle all operations equally well—what works for ext4 might break on vfat
- The default configuration isn’t always right for every use case
- When in doubt, limit operations to what you know works rather than trying to exclude what you know fails
Special thanks to Claude for helping me connect the dots when I was completely stuck. Sometimes you just need a fresh perspective!