2026-06-15 21:56 — CPU Spike TRG Watch Quota Exceeded

created: 2026-06-15
[session-log]
source: whatsapp

CPU Spike: TRG Watch Quota Exceeded

Session log · 2026-06-15 21:56 → 2026-06-16 11:49

📊 Session Stats

| Metric | Value |
|---|---|
| Source | whatsapp |
| Model | glm-5.1 |
| Duration | 13h 52m |
| Messages | 108 |
| Tool calls | 57 |
| API calls | 40 |
| Tokens (in/out/reason) | 187,835 / 15,561 / 0 |
| End reason | session_reset |

🔧 Tools Used

  • terminal ×48
  • cronjob ×4
  • write_file ×1
  • skill_view ×1
  • read_file ×1
  • patch ×1
  • execute_code ×1

💬 Key Exchanges

👤

There was a spike in CPU at 6:10 p.m. on the server. Find out why that happened

🤖

Let me dig into what caused that CPU spike. I'll check system logs, cron jobs, and process history around 6:10 PM.

🤖

Found it. The culprit is the TRG Watch cron job — but with an interesting twist. Here's the full chain:

Timeline (6 PM PKT = 13:00 UTC):

  1. 13:00:12 UTC — TRG Watch fires its scheduled 6 PM PKT run (job 8554a8c0e4c5, schedule 0 4,13 * * *)
  2. 13:03:21 — Polls CourtListener dockets via Tavily web_extract, pulls a PSX disclosure PDF
  3. 13:03:34 — Tries to process results with Gemini 2.5 Flash → gets HTTP 429 RESOURCE_EXHAUSTED (quota blown)
  4. 13:03:34 — **...
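As a quick sanity check on the timeline, the schedule `0 4,13 * * *` can be mapped to local time. This is a minimal sketch that assumes the scheduler interprets the cron expression in UTC (the log implies this but does not state it); `cron_hours_to_pkt` is a hypothetical helper, not part of Hermes.

```python
from datetime import datetime, timedelta, timezone

# Pakistan Standard Time is UTC+5 with no daylight saving.
PKT = timezone(timedelta(hours=5), "PKT")


def cron_hours_to_pkt(hour_field: str) -> list[str]:
    """Convert the hour field of a UTC cron schedule to PKT wall-clock times."""
    local_times = []
    for hour in hour_field.split(","):
        # The date is arbitrary: PKT has a fixed offset all year.
        utc = datetime(2026, 6, 15, int(hour), 0, tzinfo=timezone.utc)
        local_times.append(utc.astimezone(PKT).strftime("%H:%M"))
    return local_times


print(cron_hours_to_pkt("4,13"))  # ['09:00', '18:00']
```

Under that assumption the job runs at 9 AM and 6 PM PKT, which lines up with the 13:00:12 UTC start above.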

👤

Why was GLM not used for this

🤖

Good question — if GLM-5.1 is your main provider, the cron should have used it. Let me dig into why Gemini was the one running.

🤖

GLM was the first model tried — it just couldn't handle it. Here's the full cascade:

The triple-fallback chain of pain:

  1. 13:00:14 — GLM-5.1 (Z.AI) tries first → HTTP 429: "The service may be temporarily overloaded, please try again later"
  2. 13:00:18 — Fallback to Gemini 2.5 Flash → HTTP 503 (high demand), retries twice, finally gets through on attempt 3
  3. 13:00:39 — Gemini runs for a bit, makes some tool calls, then at 13:03:34 hits HTTP 429 (quota ...

👤

Yes

🤖

Let me check what Groq has available and confirm the API key is set up.

🤖

Groq API key is set (line 494). Now let me add it to the fallback chain between Gemini and the local model.

🤖

It stored it as a string instead of proper YAML. Let me fix that.

🤖

Now let me verify the whole config file is still valid:

🤖

Done. Your fallback chain is now:

  1. GLM-5.1 (Z.AI) — primary
  2. Gemini 2.5 Flash — first fallback
  3. Groq llama-3.3-70b-versatile — second fallback (NEW)
  4. gemma4:latest (local Ollama) — last resort

So next time both GLM and Gemini are overloaded, Groq will catch it instead of falling through to the CPU-melting local model. Groq's LPU inference is ridiculously fast (hundreds of tokens/sec), so even when it kicks in as fallback, the job will finish in seconds rather than 12 minutes.
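The ordering logic behind a chain like this can be sketched in a few lines. This is an illustration only, not Hermes's actual implementation: `ProviderError`, `run_with_fallback`, and the stub providers are hypothetical names, and the sketch moves on after one failure rather than retrying a provider the way the log shows Gemini being retried.

```python
# HTTP statuses seen in this session: 429 (overloaded / quota) and 503 (high demand).
RETRYABLE = {429, 503}


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed provider call."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def run_with_fallback(chain, prompt, on_fallback=None):
    """Try each (name, call) pair in order; fall through on retryable statuses."""
    last_error = None
    for index, (name, call) in enumerate(chain):
        try:
            return name, call(prompt)
        except ProviderError as err:
            if err.status not in RETRYABLE:
                raise  # e.g. an auth error: falling back would only hide it
            last_error = err
            if on_fallback and index + 1 < len(chain):
                on_fallback(name, chain[index + 1][0], err)
    raise RuntimeError("all providers in the chain failed") from last_error


# Stub providers standing in for real API clients.
def glm(prompt):
    raise ProviderError(429, "temporarily overloaded")


def gemini(prompt):
    raise ProviderError(503, "high demand")


def groq(prompt):
    return f"processed: {prompt}"


chain = [("glm-5.1", glm), ("gemini-2.5-flash", gemini), ("groq", groq)]
used, result = run_with_fallback(
    chain, "docket update",
    on_fallback=lambda src, dst, err: print(f"Fallback activated: {src} -> {dst} ({err})"),
)
print(used, result)
```

Placing the hosted Groq model ahead of the local one means the expensive local path only runs when every hosted provider has failed.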

T...

👤

Can I get an alert every time the fallback is invoked

🤖

Good idea. Hermes doesn't have a native fallback notification hook, but the events are logged clearly in agent.log with the pattern "Fallback activated: X → Y (provider)". I'll set up a lightweight watchdog script as a cron job that monitors for new fallback events and alerts you on WhatsApp.

Let me build it.

🤖

Let me test it quickly against the existing log before scheduling it:

🤖

Works perfectly — caught all 9 fallback events from the log history. The state file is now set so future runs only report new fallbacks. Let me schedule it.

🤖

It was created as a one-shot job. Let me change it to recurring:

🤖

All set. Here's what I put in place:

Fallback Alert Watchdog (cron job, every 10 minutes)

  • Script-only (no LLM tokens burned): just greps agent.log for "Fallback activated" lines
  • Tracks byte offset so it only reports new fallbacks since last check
  • Stays completely silent when nothing happened (no spam)
  • When a fallback fires, you get a WhatsApp alert showing the timestamp and which model fell back to what
  • Handles log rotation gracefully (resets offset if file shrinks)
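The offset-tracking part of such a watchdog might look roughly like the sketch below. The actual script from this session is not shown in the log, so the state-file layout and the name `new_fallback_lines` are assumptions; sending the WhatsApp message is left to the caller.

```python
import json
import os

MARKER = b"Fallback activated"


def new_fallback_lines(log_path: str, state_path: str) -> list[str]:
    """Return fallback lines appended since the last run and save the new offset."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            offset = json.load(fh).get("offset", 0)

    # A file smaller than the saved offset is treated as rotated: start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()

    # Consume only complete lines so a half-written line is re-read next run.
    end = chunk.rfind(b"\n") + 1
    hits = [
        line.decode("utf-8", "replace")
        for line in chunk[:end].splitlines()
        if MARKER in line
    ]

    with open(state_path, "w") as fh:
        json.dump({"offset": offset + end}, fh)
    return hits
```

A caller would send an alert only when the returned list is non-empty, which gives the "silent when nothing happened" behaviour. Storing a byte offset rather than a line count keeps each run cheap even as the log grows.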

So the ...

👤

I asked Opus, which has SSH access to my Oracle VPS, and it identified the reason for the CPU spike as an attack coming in on RDP. Do you see that as well?

🤖

You know what, let me look at this more carefully. I did see the xrdp noise earlier but attributed it to background chatter. Let me quantify how bad that RDP attack actually was around 6:10 PM.

🤖

Okay, here's the honest picture. Opus is right that there's a serious RDP brute-force attack happening — but the data tells a more nuanced story about what caused the spike specifically at 6:10.

The RDP attack is real and ongoing:

  • ~15 connections/minute sustained across the entire window
  • 19 unique source IPs (distributed brute-force)
  • ~295 connection attempts in the 6:00-6:20 PM window alone
  • Actually worse after 6:20 PM (468 connections in next 20 min)
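Figures like these come from bucketing connection events by time window. Here is a minimal sketch of that counting, assuming the xrdp log has already been parsed into `(timestamp, source_ip)` pairs; the parsing step is omitted because the log format is not shown here, `summarize_window` is a hypothetical helper, and the events below are synthetic, using documentation-range IPs.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_window(events, start, end):
    """Count attempts, distinct sources, and per-minute rates in [start, end)."""
    in_window = [(ts, ip) for ts, ip in events if start <= ts < end]
    minutes = (end - start).total_seconds() / 60
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts, _ in in_window)
    return {
        "attempts": len(in_window),
        "unique_ips": len({ip for _, ip in in_window}),
        "avg_per_minute": len(in_window) / minutes,
        "peak_minute": max(per_minute.values(), default=0),
    }


# Synthetic data shaped like the numbers above: 295 attempts from 19 sources.
start = datetime(2026, 6, 15, 18, 0)
end = start + timedelta(minutes=20)
events = [
    (start + timedelta(seconds=i * 1200 / 295), f"192.0.2.{i % 19 + 1}")
    for i in range(295)
]
print(summarize_window(events, start, end))
```

Running the same summary over adjacent windows shows whether the rate at a given minute stands out from the baseline: 295 attempts in 20 minutes is about 14.75 per minute, while 468 in the following 20 minutes is about 23.4 per minute.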

**But the attack rate w...

👤

I had Opus deal with the security as well. Check what changes Opus has made recently to harden the box.

🤖

Found it all. Here's exactly what Opus did, timestamp by timestamp:

Opus's hardening (all at ~10:02-10:06 PM PKT / 17:02-17:06 UTC today):

  1. Stopped and disabled xrdp: systemctl stop xrdp + disabled it from boot. The service is now inactive (dead) and disabled. That's the RDP attack vector closed at the service level.

  2. Blocked port 3389 at the firewall — added iptables -A INPUT -p tcp --dport 3389 -j DROP. Even if xrdp somehow starts, connections get dropped at the ...

👤

Must you ask me for /approve? Can't you ask me with numbered options that are easier for me to type, or better still, if there is an option for a button to click, so I don't have to type /approve or /approve session or /approve always?

🤖

Great question Omair. You actually have a few options here, and one of them eliminates the problem entirely.

Right now you're in approvals.mode: manual — every flagged command waits for /approve. But you've already told me "Always approve" in the past, so we're just not enforcing it at the system level yet.

Your options:

  1. Turn off approvals entirely (approvals.mode: off) — no more prompts, ever. Everything just runs. This matches your "always approve" preference.

  2. **Switch t...


Auto-generated by session_logger.py · Hermes — Reference