Get started with MAX

On this page, we'll show you how to run some example projects.

By installing MAX, you understand and agree to our software license.

1. Install Magic

Magic is a virtual environment manager and package manager based on conda. You can install it on macOS and Ubuntu Linux with this command:

curl -ssL https://magic.modular.com | bash

Then run the source command printed in your terminal.

2. Run your first model

Let's start with something simple just to make sure everything works.

Clone the MAX code examples:

git clone https://github.com/modularml/max.git

Navigate into the BERT Python code example and activate the Magic environment shell (which installs MAX):
```
cd max/examples/inference/bert-python-torchscript
```
```
magic shell
```
When you first invoke an environment command such as magic shell, Magic installs MAX and all the project dependencies inside the project-specific virtual environment.

When it finishes the setup, you should see the environment name prefixed to your terminal prompt. For example:
```
(BERT Python Torchscript) jamie:~/max/examples/inference/bert-python-torchscript$
```
Use this script to download and run the BERT model in MAX:
```
bash run.sh
```
This downloads the model from HuggingFace and runs it with some input text.

You should see results like this:
```
input text: Paris is the [MASK] of France.
filled mask: Paris is the capital of France.
```
Cool, it works! (If it didn't work, let us know.)

After downloading the model, run.sh calls a Python script that executes it using the MAX Python API. You can see the code on GitHub. This API allows you to use MAX Engine as a drop-in replacement for your existing runtime, with just 3 lines of code. You can also use MAX Engine with our C and Mojo APIs.
To exit the Magic shell, just type exit:
```
exit
```

Compile time

The first time you run the example, it takes some time to compile the model. This might be unfamiliar if you're used to "eager execution" in ML frameworks, but MAX Engine uses next-generation compiler technology to optimize the graph and extract more performance, without any accuracy loss. This happens only when you load the model, and it pays dividends with significant speed-ups at run time.

If you're interested in how our performance compares to stock frameworks on different CPU architectures, check out our performance dashboard.

**Figure 1.** MAX Engine latency speed-up when running Mistral-7B vs PyTorch (MAX Engine is 2.5x faster).

3. Try Llama3 on MAX

In the previous example, we ran a PyTorch model using the MAX Engine Python API, but MAX offers much more than that. You can also use MAX to build high-performance, state-of-the-art AI models in Mojo.

Mojo is a systems programming language built from the ground up to deliver maximum performance on any hardware and enable programmability across the entire AI software stack. You don't have to write a single line of Mojo to accelerate your models with MAX Engine. However, MAX Engine and Mojo share essential compiler technologies, which means Mojo has unique abilities that unlock new levels of performance for your models in MAX Engine.

Take a look for yourself. We've built the Llama 3 large language model entirely in Mojo, using the MAX Graph API. It's incredibly fast and you can try it right now:

Navigate back to the path where you cloned our repo. Then navigate to the Graph API examples:
```
cd max/examples/graph-api
```
Now let's use magic run to execute a command inside the virtual environment, without actually activating a shell into it:
```
magic run llama3 --prompt "what is the meaning of life"
```
This time, Magic already has a cached version of MAX, downloaded for the previous example. So it only needs to download the model weights, compile the Llama 3 model, and run it.

When the pipeline begins, you'll see it print the response in real-time as tokens are emitted by the model. This is all running locally on your CPU. When it's done, you'll also see some performance stats.

Check out the Mojo Llama 3 code on GitHub.

Next steps

MAX is available now as a developer preview, and there's much more to come. To understand how MAX can accelerate your AI workloads and simplify your workflows, try some of the following:

Try more examples

See more examples that execute PyTorch and ONNX models using Python, C, and Mojo, plus GenAI models built in Mojo.

Run a model with Python

Learn how to run inference with an ONNX model, using our Python API.

Build a graph in Mojo

Learn more about how to build high-performance AI models using Mojo and the Graph API.

Deploy a model

Learn how to deploy a model using MAX Engine, Amazon SageMaker, and AWS CloudFormation.

And this is just the beginning!

In the coming months, we'll add support for GPU hardware, more extensibility APIs, and more solutions for production deployment with MAX.

Join the discussion

Get in touch with other MAX developers, ask questions, and share feedback on Discord and GitHub.

Was this page helpful?

Thank you! We'll create more content like this.

Thank you for helping us improve!
If you'd like to share more information, please report an issue on GitHub

😔 What went wrong?

1. Install Magic​

2. Run your first model​

3. Try Llama3 on MAX​

Next steps​

1. Install Magic

2. Run your first model

3. Try Llama3 on MAX

Next steps