Figure 6: Fraction of preferred responses in side-by-side evaluation of Apple's foundation model against comparable models on safety prompts. Human graders found our responses safer and more helpful.

To further evaluate our models, we use the Instruction-Following Eval (IFEval) benchmark to compare their instruction-following capabilit