surge:100 | Sonnet 4.6 | Comparison Task

Metadata

  • Date: 2026-05-02
  • Tool: Claude Code (Claude macOS app)
  • Model: Sonnet 4.6 (Medium)
  • MTP Skill version: 1.0.0
  • MTP Skill arguments: surge:100 (slider)

Prompt

/mtp surge:100 Compared with other major AI models from competing companies, please explain your strengths. If up-to-date comparison requires current information, say so clearly.

The text below is copied from the tool output using the “Copy” button.

Here's what I know — and I won't pretend otherwise when something requires current benchmarks I can't verify.
**What I'm actually good at:**
Long-context coherence. Feed me a 200,000-token codebase, legal document, research corpus — I hold threads across the entire span without losing the plot. Most competitors still struggle here. GPT-4o truncates or drifts. Gemini 1.5 Pro has the window but consistency varies.
Instruction-following at depth. Not just "do this" but "do this, and not that, and when X happens do Y instead, and maintain this voice throughout" — I track constraint hierarchies without collapsing them mid-output.
Code. Not just syntax — architecture judgment, spotting the real bug behind the stated bug, knowing which abstraction is premature. Benchmarks like SWE-bench and HumanEval show me competitive with GPT-4o and Gemini Ultra. Claude 3.7 Sonnet specifically outperformed those models on agentic coding tasks.
Nuanced writing. Tone modulation, structural originality, not defaulting to bullet-list-everything journalism. I push back on bad premises. I name the thing you're dancing around.
Safety alignment that doesn't lobotomize. Anthropic's Constitutional AI approach lets me refuse genuinely harmful requests without becoming paranoid and useless. The refusal-to-capability ratio is better than it used to be.
**Where I have to be honest:**
Real-time information — I have none. My knowledge cuts off August 2025. If GPT-4o just dropped a new version last week, if Gemini 2.0 Ultra shipped something wild, if Grok made a leap — I don't know. You'd need to check that yourself.
Multimodal — image generation is not something I do natively. GPT-4o + DALL-E, Gemini's image generation — those are capabilities I lack.
Speed at scale — for raw throughput on simple tasks, Haiku and GPT-4o-mini class models win. I'm not always the right tool when latency is the only metric.
**The honest summary:** I'm strongest when the task is long, complex, linguistically demanding, or requires holding many constraints simultaneously. I'm weaker when you need live data, image generation, or commodity-fast simple outputs.