Why did Meta release OPT-175B and Llama separately?
In other words, why not keep making one model and improve it?
Meta is huge. The two were probably released by different teams working on parallel tracks, and there can be real benefits to that diversity of effort.
They're also trained in very different ways. OPT-175B roughly replicated GPT-3's recipe (175B parameters on ~180B tokens), while Llama is optimised for inference: much smaller models (7B–65B parameters) trained far past the Chinchilla compute-optimal point, on 1–1.4T tokens, so they're cheap to run once trained.
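To make "optimised for inference" concrete, here's a rough back-of-envelope sketch. It uses the standard approximations (training FLOPs ≈ 6·N·D for N parameters and D tokens, inference FLOPs ≈ 2·N per generated token); the token counts are as reported in the OPT and LLaMA papers:

```python
# Back-of-envelope: training cost vs. per-token inference cost.
# Approximations: train FLOPs ~ 6*N*D, inference FLOPs ~ 2*N per token.

def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

def infer_flops_per_token(params: float) -> float:
    return 2 * params

models = {
    "OPT-175B":  dict(params=175e9, tokens=180e9),  # GPT-3-style recipe
    "LLaMA-13B": dict(params=13e9,  tokens=1.0e12), # overtrained small model
}

for name, m in models.items():
    print(f"{name}: train ~ {train_flops(m['params'], m['tokens']):.2e} FLOPs, "
          f"inference ~ {infer_flops_per_token(m['params']):.2e} FLOPs/token")
```

By this estimate LLaMA-13B needs under half of OPT-175B's training compute, yet each generated token costs roughly 13x fewer FLOPs. That's the trade Llama makes: spend more training tokens per parameter up front so the deployed model is small and cheap to serve.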