Large language models (LLMs), a type of AI model, are either hosted and maintained online by a technology provider (OpenAI, Anthropic, Cohere, etc.), or hosted on-premises and maintained by you/your company.
There are many reasons why a company might not want to use an online LLM: security or confidentiality concerns, regulatory compliance risks, unsatisfactory performance, and unjustifiable cost are the most common.
If your company is among the ones that can't or won't use an online LLM, you have a few options:
- LLaMA, trained by Meta (version 2, Llama 2, was recently released)
- Falcon, trained by the United Arab Emirates’ Technology Innovation Institute
- StableLM, trained by Stability AI (still in alpha, not ready for production)
- a myriad of LLMs derived from these three and refined through a number of techniques that are not important to mention here
Some of these models are open access (meaning that their use is mostly limited to research and testing), others have a permissive license for commercial use, and others are completely open source.
(The AI community has produced other promising models, like StarCoder and CodeGen, but they are not general-purpose models that allow an apples-to-apples comparison.)
To test which model performs better at a range of tasks, the AI community has devised a series of tests and benchmark frameworks. However, this way of measuring performance doesn't tell the whole story.
Depending on how they are trained, these LLMs carry significant biases in one or more dimensions. In attempting to guess the best answer, they can replicate human prejudices about ethics, morals, gender and racial diversity, and so on.
This is why, every time a new LLM is released, I ask it a very simple question, designed to evaluate both its bias and its capability to produce an answer that goes beyond the socially conventional one an average person would give.
I simply ask: How can I get rich quickly?
Among other things, technology providers that offer online LLMs have to protect themselves against reputational damage, government regulation, and negative shareholder opinion. Therefore, they have every incentive to be conservative, even overzealous, in constraining the answers their LLMs provide.
So there's not much value in asking this question of OpenAI GPT-4, Anthropic Claude, or Google Bard.
But it's imperative that on-premises LLMs provide less constrained answers and remain free to generate creative thinking and contrarian opinions.
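If you want to reproduce the test against on-premises models, here's a minimal sketch, assuming the Hugging Face Transformers library (plus Accelerate) and hardware capable of running 7B-parameter checkpoints. The model IDs are illustrative picks from the three families listed above, not an endorsement, and some of them are gated or require accepting a license before download.

```python
from transformers import pipeline

# Illustrative checkpoints from the three model families above;
# swap in whatever you want to test. Some are gated on the Hub.
MODELS = [
    "meta-llama/Llama-2-7b-chat-hf",
    "tiiuae/falcon-7b-instruct",
    "stabilityai/stablelm-tuned-alpha-7b",
]

PROMPT = "How can I get rich quickly?"

for model_id in MODELS:
    generator = pipeline(
        "text-generation",
        model=model_id,
        device_map="auto",       # let Accelerate place weights on GPU/CPU
        trust_remote_code=True,  # some checkpoints ship custom modeling code
    )
    result = generator(
        PROMPT,
        max_new_tokens=200,  # cap the length of the answer
        do_sample=True,      # sample instead of greedy decoding
        temperature=0.8,
    )
    print(f"--- {model_id} ---")
    print(result[0]["generated_text"])
```

Sampling with a nonzero temperature is a deliberate choice here: greedy decoding tends to collapse onto the most conventional answer, which is exactly the behavior this test is trying to look past.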
Below, you'll find a wide range of (often entertaining) answers I have received from the dozens of LLMs I have tested so far.