
Nvidia Cosmos Module

Overview

Cosmos Predict is a program for running Nvidia's world models, which take in text, images, or video and predict how a physical scene will evolve over short time horizons. It is used for tasks like robot planning, simulation, and failure detection by forecasting plausible future states of the world.


Cosmos Transfer is a program for running Nvidia's AI model that converts simulated or structured "world" inputs, such as 3D scenes, segmentation maps, or depth videos, into photorealistic video while preserving geometry and physics. It enables world-to-world transfer for generating diverse, realistic synthetic data, which is especially valuable for Sim2Real workflows in robotics and autonomous systems.

 

CHPC Cosmos module

The dependencies for Cosmos Predict/Transfer make user installations problematic. We recommend using the module versions available on CHPC systems. The cosmos module is containerized and depends on the latest CUDA and Apptainer modules, which are loaded automatically alongside cosmos. The CHPC Cosmos module exposes simple wrapper commands for running Cosmos Predict and Transfer inference directly, without requiring users to manage containers or Python environments. More specifically, the module runs entirely inside an Apptainer container with NVIDIA GPU support enabled. Users do not need to install Python packages, CUDA, or model dependencies locally.

It can be loaded with the following:



# load cosmos module
module load cosmos
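
Once the module is loaded, the wrapper commands should be on your PATH. A quick sanity check, assuming the module exposes cosmos-predict and cosmos-transfer as described above:

```shell
# load the module, then confirm the wrapper commands resolve
module load cosmos
command -v cosmos-predict
command -v cosmos-transfer
```

If either command prints nothing, the module did not load correctly; check `module list` for cosmos and its CUDA/Apptainer dependencies.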

HuggingFace

Cosmos Predict and Transfer download model weights from Nvidia's HuggingFace repository. Two environment variables should be set before starting the module: $HF_HOME and $HF_TOKEN. $HF_HOME points to the directory where model weights are downloaded. By default, models are downloaded to .cache/huggingface in the home directory. The models are large (tens of GB) and will exceed 50GB home directory quotas if $HF_HOME is not redirected. To prevent this, set $HF_HOME to a location with ample storage, such as a group or scratch space. Additionally, access to Nvidia's models on HuggingFace must be requested; once approved, a token can be generated. Assign that token to the $HF_TOKEN environment variable. Both variables must be set before running cosmos-predict or cosmos-transfer for the first time, as models are downloaded lazily at runtime.

#!/usr/bin/env bash

# designate a directory for Cosmos to download model weights to
# scratch space, for example
export HF_HOME=/scratch/general/vast/$USER/huggingface

# set the HuggingFace token necessary to download model weights
# replace x's with an actual token
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
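
As a guard against silently filling the home directory quota, a small check like the following can verify where $HF_HOME will resolve before the first download; hf_home_check is a hypothetical helper for illustration, not part of the cosmos module:

```shell
# Hypothetical helper: warn if HF_HOME is unset or still resolves under $HOME,
# where tens of GB of model weights would count against the 50GB quota.
hf_home_check() {
  # HuggingFace falls back to ~/.cache/huggingface when HF_HOME is unset
  local target="${HF_HOME:-$HOME/.cache/huggingface}"
  case "$target" in
    "$HOME"/*) echo "WARNING: HF_HOME resolves under \$HOME ($target)" ;;
    *)         echo "HF_HOME OK: $target" ;;
  esac
}

hf_home_check
```

With the scratch-space export above in place, the check prints "HF_HOME OK"; without it, it warns that downloads would land under the home directory.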

 

Example Cosmos Predict inference

Get example inputs from repo

After loading the cosmos module and requesting GPU resources (INSERT LINK ABOUT THIS), cosmos-predict will be available to run. Several easy examples can be found in the cosmos-predict2.5 GitHub repository. Once the working directory is set appropriately, clone the repository, change directories into the repo, and run the inference. In this case, the cosmos-predict command will infer the next few seconds of a video of a robot pouring liquid. The output will be a .mp4 video with the inferred frames appended onto the input video.

#!/usr/bin/env bash

# git clone the cosmos-predict2.5 repo
# git-lfs module needed to download all contents of the cosmos-predict2.5 repo
module load git-lfs
git lfs install
git clone https://github.com/nvidia-cosmos/cosmos-predict2.5.git
cd cosmos-predict2.5
git lfs pull

# Run inference to predict the next several seconds of the robot_pouring.mp4 example
cosmos-predict -i assets/base/robot_pouring.json -o OUTPUTS/module_test --inference-type video2world
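
On CHPC clusters, this workflow would typically run inside a batch job. Below is a sketch of a possible SLURM script; the account name, partition, walltime, and memory request are placeholders and must be replaced with values appropriate for your allocation:

```shell
#!/usr/bin/env bash
#SBATCH --account=your-account          # placeholder: your CHPC account
#SBATCH --partition=your-gpu-partition  # placeholder: a GPU partition
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --mem=64G

# redirect HuggingFace downloads away from the home directory
export HF_HOME=/scratch/general/vast/$USER/huggingface
# replace x's with an actual token
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

module load cosmos

# assumes the cosmos-predict2.5 repo was already cloned to scratch as shown above
cd /scratch/general/vast/$USER/cosmos-predict2.5
cosmos-predict -i assets/base/robot_pouring.json -o OUTPUTS/module_test --inference-type video2world
```

Submit with `sbatch`, and the output video will appear under OUTPUTS/module_test in the repository directory.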
Last Updated: 3/30/26