1. Introduction
As Sam Altman tweeted earlier this year (2024): “This is the most interesting year in human history, except for all future years.” Given the rapid pace of AI progress, that is no exaggeration. Each new LLM release seems to change the state of the game, and the latest open-source one, Llama 3 8B, demonstrates that even smaller models can excel when trained on far more tokens than the “compute-optimal” amount. Although it does not follow the Chinchilla scaling laws, it shines by training on orders of magnitude more tokens than anticipated: the compute-optimal number of tokens is about 200B, yet it was trained on 15T tokens.
To grasp what these models can achieve, check out Hugging Face’s task taxonomy. The capabilities are impressive, ranging from image generation to text processing. Text is particularly interesting because it’s our primary form of communication, especially in tech, where instructions are written out like cooking recipes in a practice we call coding.
2. Navigating the AI Landscape: Considerations and Concerns
Let’s outline your options for accessing powerful AI models:
- Local hardware, like an RTX 4090
- Cloud services:
  - Via APIs (OpenAI, Google Cloud, Claude, Groq, etc.)
  - Renting cloud hardware (Google Cloud, Microsoft Azure, AWS, Lambda Labs, RunPod, etc.)
The main decision boils down to either purchasing hardware or subscribing to a service. But what should you consider?
2.1 Privacy and Security
Cloud and LLM API providers generally offer guarantees regarding privacy and security, but how much do you trust them? Running models locally ensures you have control over data security and privacy by reducing exposure to external servers.
Even large enterprises struggle with this issue, such as when Samsung banned employees from using ChatGPT due to data leaks. If your work requires strict confidentiality or regulatory compliance, APIs and cloud VMs might be off-limits, leaving local hardware as your best bet.
In terms of privacy and security:
- Local hardware: most private/secure
- Cloud VMs
- Cloud APIs: least private/secure
2.2 Accessibility and Cost
Open-source models are closing the gap with proprietary ones, even in sheer parameter count, but they’re still behind in state-of-the-art performance. For peak performance, you’ll need to rely on big names like OpenAI, Claude, and Google.
Comparing local hardware and cloud VMs also presents challenges. While wealthy enthusiasts can set up rigs with 4× RTX 4090s, even those are limited to 96 GB of VRAM. With Llama 3’s 400-billion-parameter model on the horizon, don’t expect to run the highest-end models on consumer-grade hardware alone.
On the other hand, enterprise-grade A100s and H100s offer features unavailable in consumer GPUs (for a lengthy explanation about it, check this blog post). If you require top-tier AI performance, consumer hardware isn’t the best investment.
Cost Considerations:
It depends on how often you will use the hardware (your utilization rate). If you’re not utilizing it 24/7, cloud options like APIs or VMs will be more flexible and affordable. If you require round-the-clock usage or are training a model from scratch over multiple months, calculate whether the total cost of ownership is worthwhile.
2.3 Ethical Implications
Ethical concerns are plentiful due to the lack of transparency regarding training data, even for open-weight models. Proprietary models are worse: they provide no visibility into the weights, which means limited model interpretability. Although biases can be identified in closed models, addressing them is difficult without fine-tuning.
Models often reflect biases favoring their creators, and users are locked into their parameters. Moreover, alignment remains a gray area in some cases, which can pose challenges for legitimate use cases (e.g., film research that asks how to hijack a car). Closed-model providers might mistakenly flag such users for violating the terms of service.
Overall, the ethical issues with closed models stem from their opacity and from differing interpretations of what is right, wrong, or still an open question. Here, we have to rely on corporations to act responsibly 😥.
3. Accessing AI Power: Local Hardware vs. Cloud
Now you have the tools to make a choice, but consider these quick guidelines:
- Privacy-focused: If you need absolute privacy and security, opt for local hardware or cloud VMs;
- Experimentation: If you’re exploring possibilities, start with APIs for high-end proprietary models like ChatGPT and Claude. Test feasibility first, then try replicating results with smaller, open-weight models;
- Research and Fine-tuning: For fine-tuning or reinforcement learning (RL), start with cloud VMs. If compute demands become significant, assess whether local hardware makes sense in the long run.
If you are stuck deciding between local hardware and cloud GPUs, we can dig into the economics: “What would be the cheapest option for my project?” We will use the RTX 3090 vs. AWS V100 example found in this blog post. As a general rule: if your projects are expected to run for more than a year, investing in a desktop GPU generally proves more economical. For shorter or less intensive projects, cloud instances may be more suitable, especially for those who can capitalize on the flexibility to scale GPU resources as needed.
3.1 Comparative Cost Analysis: Desktop vs. Cloud GPU
3.1.1 Desktop GPU Costs
- Hardware: A desktop equipped with an NVIDIA RTX 3090 GPU might cost around $2,200.
- Electricity: In the US, the cost of electricity is typically about $0.12 per kWh.
Assuming the desktop has a GPU that consumes 350 watts and a CPU that consumes an additional 100 watts, the total power consumption at 15% utilization (average over a year) can be calculated as follows:
(350 W + 100 W) × 0.15 × 24 hours × 365 days ≈ 591 kWh
This results in an annual electricity cost of:
591 kWh × $0.12/kWh = $70.92
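As a quick sanity check, here is a short Python sketch of that electricity calculation, using the same assumptions as above (350 W GPU, 100 W CPU, 15% utilization, $0.12/kWh):

```python
# Annual electricity cost of a desktop ML rig (assumptions from above).
gpu_watts = 350        # RTX 3090 draw under load
cpu_watts = 100        # rest of the system
utilization = 0.15     # fraction of the year the rig is busy
price_per_kwh = 0.12   # USD, typical US rate

annual_kwh = (gpu_watts + cpu_watts) / 1000 * utilization * 24 * 365
annual_cost = annual_kwh * price_per_kwh

print(f"{annual_kwh:.0f} kWh/year -> ${annual_cost:.2f}/year")  # ≈ 591 kWh -> ≈ $71
```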
3.1.2 Cloud GPU Costs
- Usage: For an AWS on-demand instance using a V100 GPU, the rate is $2.14 per hour.
For 15% utilization over 300 days, the cost calculation is:
$2.14/hr × 0.15 × 24 hours × 300 days = $2,311.20
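The same calculation for the cloud side, as a minimal sketch with the on-demand rate above:

```python
# Cloud cost for an AWS on-demand V100 at 15% utilization over 300 days.
hourly_rate = 2.14   # USD/hour (figure from above)
utilization = 0.15
days = 300

cloud_cost = hourly_rate * utilization * 24 * days
print(f"${cloud_cost:,.2f}")  # $2,311.20
```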
3.1.3 Break-Even Analysis
By comparing these costs, the break-even point where both options cost approximately the same is around 300 days of usage. Specifically:
- Desktop total cost: $2,200 (initial cost) + $70.92 (electricity) = $2,270.92
- Cloud total cost for 300 days at 15% utilization: $2,311.20
Therefore, if you plan to use the GPU for more than 300 days, purchasing a desktop GPU becomes more cost-effective than using a cloud instance.
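To find the break-even point programmatically rather than eyeballing the two totals, here is a sketch that solves for the number of days at which the cumulative cloud spend overtakes the desktop’s total cost of ownership (all figures are the assumptions used above):

```python
# Break-even between buying a desktop GPU and renting a cloud instance.
desktop_price = 2200.0   # USD, RTX 3090 desktop
desktop_watts = 450      # GPU (350 W) + CPU (100 W)
price_per_kwh = 0.12     # USD, typical US rate
cloud_hourly  = 2.14     # USD/hour, AWS on-demand V100
utilization   = 0.15     # same utilization assumed on both sides

# Cost per calendar day on each side.
hours_per_day   = utilization * 24
desktop_per_day = desktop_watts / 1000 * hours_per_day * price_per_kwh  # electricity only
cloud_per_day   = cloud_hourly * hours_per_day

# The desktop pays off once the daily savings cover its purchase price.
break_even_days = desktop_price / (cloud_per_day - desktop_per_day)
print(f"Break-even after ~{break_even_days:.0f} days")  # ~293 days with these numbers
```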
Note: Consider hidden costs such as time for setup and ongoing maintenance of a local machine. Cloud instances, while potentially more expensive over long periods, offer significant flexibility to scale resources as needed.
3.1.4 Utilization Considerations
Utilization rates vary significantly from user to user, but what people often fail to keep in mind is that they are usually lower than expected, as the numbers from that blog post suggest:
- Personal machines often fall between 5-10%.
- PhD students typically have lower utilization rates on personal desktops (less than 15%).
- Research clusters, like those managed with Slurm at universities or within companies, often see higher utilization rates (over 35% for student clusters and over 60% for company-wide clusters).
After all that, if you are still in doubt about which GPU to use, you can check here for consumer hardware and here for professional-grade options.
4. Cost Calculator
For your convenience, we have built this calculator that helps you figure out, for your particular context, if buying local hardware is better than renting cloud solutions. Here’s how to use the calculator (all in USD):
- GPU Cost: we assume you already have some kind of desktop, so input the cost of the GPU you plan on purchasing (if additional hardware changes are required, such as a PSU upgrade, include them in the cost as well). We have defaulted to $2,000, the average cost of an RTX 4090 in May 2024.
- GPU Power Consumption: input the power consumption of your GPU card, so that the calculator can estimate the energy cost. By default this is set to 450 W, the RTX 4090’s consumption.
- Mean Annual Utilization Rate: the % of time that the computer will be running model training or inference. For example, if the total time your computer is used for ML purposes during a year equals 36 days (864 hours), then the Utilization Rate is about 10% (36/365). See the section above for guidelines on typical utilization rates based on your profile.
- Cost of Electricity: input the cost per kWh at your location. You’ll have to get this information from your utilities provider. In Medellín, Colombia, the average cost is about 1,100 COP/kWh, which is roughly $0.28 USD/kWh. Please input the cost in dollars.
- Cloud Cost Per Hour: get the cost of your cloud alternative, per hour. We’ve defaulted to $2.14 per hour for an AWS on-demand instance using a V100 GPU.
Cost Calculator
How to interpret the break-even:
- If your project, and any other projects you have planned, will run for fewer days than the break-even point -> rent cloud hardware;
- If your project, and any other projects you have planned, will run for longer than the break-even point -> buy local hardware.
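If you prefer to run the numbers yourself, here is a minimal Python sketch of the same break-even logic, parameterized with the calculator’s inputs and defaults described above (the function name is ours, not the embedded widget’s; the utilization and electricity defaults are illustrative assumptions):

```python
def break_even_days(
    gpu_cost: float = 2000.0,           # USD, RTX 4090 (May 2024 average)
    gpu_watts: float = 450.0,           # RTX 4090 power draw
    utilization: float = 0.10,          # mean annual utilization rate (assumption)
    electricity_usd_kwh: float = 0.12,  # US average from the section above (assumption)
    cloud_usd_hour: float = 2.14,       # AWS on-demand V100
) -> float:
    """Days of ownership after which buying beats renting."""
    hours_per_day = utilization * 24
    local_daily = gpu_watts / 1000 * hours_per_day * electricity_usd_kwh
    cloud_daily = cloud_usd_hour * hours_per_day
    return gpu_cost / (cloud_daily - local_daily)

print(f"Break-even: ~{break_even_days():.0f} days")  # ≈ 400 days with these defaults
```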
5. Conclusion
That’s it, AI enthusiast! Choosing where, and on what compute substrate, your application should run is a case-dependent, ever-evolving question as your project evolves and pivots. The following are two emerging trends to keep in mind.
5.1 Models Are Still Improving, Especially in Reasoning
LLMs haven’t reached their architectural limits yet, and scaling laws indicate at least an order of magnitude of headroom in parameter count for purely textual data (assuming GPT-4 has about 1.5 trillion parameters and taking the theoretical asymptote from the original scaling-laws paper). Simply throwing more compute and data at these models will lead to advancements. Reasoning capabilities are the golden goose, with major players focused on improving them.
Llama 3’s release reignited the push for smaller models, even if trained longer. This trend will expand use cases as smaller models become increasingly capable.
5.2 A World Starving for Compute
Though training foundation models currently requires a massive number of GPUs and data centers, demand for inference will also surge. Imagine multiple GPT-6-based agents working on different ideas to tackle the same real-world problem: the cycle of requiring more compute for both training and inference will continue.
To fully leverage the AI wave, understanding your requirements and keeping an eye on future developments is crucial when selecting the right hardware or cloud services.