An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates. Experts say the problem is bigger than that.
In the case of reasoning models, definitely. Reasoning datasets weren’t even a thing a year ago, and from what we know about how the larger models are trained, most task-specific training data is synthetic (often a small amount is human-generated and then synthetically augmented).
However, I think it’s safe to assume that this has been the case for regular chat models as well - the Self-Instruct and Orca papers are quite old already.