Add test coverage for various model architectures, and switch from orca-mini to the small llama model.