I think it's much better than Opus. I would describe its code output as boring and to the point, no fluff. o3 Pro is better at abstraction, but Grok Heavy is better at bug hunting and at doing only exactly what's needed. I swapped my OpenAI Pro license for Grok, and it's good. Another big advantage is the context window size. Honestly, I use these models all day long, and I've felt that while Sonnet 3.5 was groundbreaking, Anthropic is behind Google, OpenAI, and now xAI.
For tools, I use Repo Prompt + the Grok website. Personally, I think Claude Code is overrated, and hand-building the context by selecting the files is far better for complicated tasks.