Space: vaibhavpandeyvpz/stable-diffusion-fast-text-to-3d

Running on Zero

vaibhavpandeyvpz committed 6 days ago

Commit

eeef97b

0 Parent(s):

Deploy to HF spaces

Files changed (41)

.gitattributes +2 -0
.gitignore +77 -0
README.md +124 -0
app.py +472 -0
load/tets/160_tets.npz +3 -0
requirements.txt +40 -0
sf3d/models/camera.py +32 -0
sf3d/models/global_estimator/multi_head_estimator.py +118 -0
sf3d/models/image_estimator/clip_based_estimator.py +168 -0
sf3d/models/isosurface.py +229 -0
sf3d/models/mesh.py +289 -0
sf3d/models/network.py +213 -0
sf3d/models/tokenizers/dinov2.py +1196 -0
sf3d/models/tokenizers/image.py +101 -0
sf3d/models/tokenizers/triplane.py +49 -0
sf3d/models/transformers/attention.py +31 -0
sf3d/models/transformers/backbone.py +515 -0
sf3d/models/utils.py +236 -0
sf3d/system.py +534 -0
sf3d/utils.py +105 -0
texture_baker/README.md +26 -0
texture_baker/requirements.txt +2 -0
texture_baker/setup.py +142 -0
texture_baker/texture_baker/__init__.py +4 -0
texture_baker/texture_baker/baker.py +86 -0
texture_baker/texture_baker/csrc/baker.cpp +548 -0
texture_baker/texture_baker/csrc/baker.h +203 -0
texture_baker/texture_baker/csrc/baker_kernel.cu +306 -0
texture_baker/texture_baker/csrc/baker_kernel.metal +170 -0
texture_baker/texture_baker/csrc/baker_kernel.mm +260 -0
uv_unwrapper/README.md +0 -0
uv_unwrapper/requirements.txt +2 -0
uv_unwrapper/setup.py +83 -0
uv_unwrapper/uv_unwrapper/__init__.py +6 -0
uv_unwrapper/uv_unwrapper/csrc/bvh.cpp +381 -0
uv_unwrapper/uv_unwrapper/csrc/bvh.h +118 -0
uv_unwrapper/uv_unwrapper/csrc/common.h +493 -0
uv_unwrapper/uv_unwrapper/csrc/intersect.cpp +702 -0
uv_unwrapper/uv_unwrapper/csrc/intersect.h +10 -0
uv_unwrapper/uv_unwrapper/csrc/unwrapper.cpp +271 -0
uv_unwrapper/uv_unwrapper/unwrap.py +669 -0

.gitattributes ADDED

	@@ -0,0 +1,2 @@


+*.npz filter=lfs diff=lfs merge=lfs -text
+load/tets/160_tets.npz filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,77 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual environments
+venv/
+env/
+ENV/
+.venv
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# Jupyter Notebook
+.ipynb_checkpoints
+# Environment variables
+.env
+.env.local
+# Model cache
+.cache/
+*.safetensors
+*.ckpt
+*.pt
+*.pth
+# Generated files
+output/
+*.glb
+*.gltf
+*.obj
+*.ply
+# Gradio temp files
+gradio_cached_examples/
+flagged/
+# OS
+.DS_Store
+Thumbs.db
+# Logs
+*.log
+logs/
+# Temporary files
+tmp/
+temp/
+*.tmp
+# Hugging Face
+.huggingface/
+references/

README.md ADDED

	@@ -0,0 +1,124 @@

+---
+title: Stable Diffusion Fast Text to 3D
+emoji: 🎨
+colorFrom: red
+colorTo: pink
+sdk: gradio
+sdk_version: 6.1.0
+app_file: app.py
+pinned: false
+license: other
+license_name: stabilityai-ai-community
+license_link: https://huggingface.co/stabilityai/stable-fast-3d/blob/main/LICENSE.md
+models:
+  - stabilityai/stable-diffusion-xl-base-1.0
+  - stabilityai/stable-fast-3d
+gpu: true
+---
+# Text to Image to 3D Generation
+This Hugging Face Space provides a complete workflow to generate 3D models from text prompts using:
+1. **Stable Diffusion XL** - Generate high-quality images from text prompts
+2. **rembg** - Remove backgrounds from generated images
+3. **Stable Fast 3D** - Convert images to 3D mesh models
+## Features
+- 🎨 **Text to Image**: Generate images using Stable Diffusion XL base model
+- ✂️ **Background Removal**: Automatically remove backgrounds using rembg
+- 🎮 **3D Generation**: Create textured 3D mesh models from images
+- 🔄 **Step-by-step Workflow**: Review and confirm at each step
+- ⚙️ **Customizable**: Adjust remeshing options, vertex count, and texture resolution
+## How to Use
+1. **Step 1 - Text to Image**:
+   - Enter your text prompt describing what you want to generate
+   - Optionally add a negative prompt to exclude unwanted elements
+   - Adjust the number of inference steps (more steps = higher quality, slower)
+   - Click "Generate Image" and wait for the result
+2. **Step 2 - Background Removal**:
+   - Review the generated image
+   - Click "Continue to Background Removal" to remove the background
+   - Preview the result with transparency
+3. **Step 3 - 3D Generation**:
+   - Review the background-removed image
+   - Adjust 3D generation settings:
+     - **Remeshing Option**: Choose "none", "triangle", or "quad" remeshing
+     - **Target Vertex Count**: Set to -1 for automatic, or specify a target count
+     - **Texture Size**: Choose texture resolution (512-2048)
+   - Click "Continue to 3D Generation" to create the 3D model
+   - Download your GLB file
+## Tips
+- **Prompts**: Be descriptive and specific. Include style keywords like "3D render", "character", "stylized"
+- **Background Removal**: Works best with clear foreground objects
+- **3D Generation**:
+  - Use "none" remeshing for best quality
+  - Higher texture sizes produce better quality but take longer
+  - Vertex count of -1 uses the model's default
+## Technical Details
+- **Models Used**:
+  - `stabilityai/stable-diffusion-xl-base-1.0` for text-to-image
+  - `rembg` for background removal
+  - `stabilityai/stable-fast-3d` for image-to-3D
+- **Output Format**: GLB (glTF Binary) files compatible with most 3D software and viewers
+- **GPU Resource Management**:
+  - Uses `@spaces.GPU()` decorators to properly manage GPU resources in Hugging Face Spaces
+  - GPU is allocated for text-to-image generation, background removal, and 3D mesh generation
+  - Ensures efficient GPU usage across the workflow
+## Requirements
+This Space requires:
+- A CUDA-capable GPU (`app.py` raises an exception at startup if CUDA is unavailable)
+- Sufficient memory for model loading
+- Internet connection for model downloads
+- **Access to gated models**: The `stabilityai/stable-fast-3d` model is gated. To use it:
+  1. Accept the model's terms of use on [Hugging Face](https://huggingface.co/stabilityai/stable-fast-3d)
+  2. Set `HF_TOKEN` in the Space's secrets; the Space authenticates with it at startup
+### Dependencies
+This Space uses several models and packages:
+1. **Stable Diffusion XL**: Automatically downloaded from Hugging Face
+2. **rembg**: Installed via pip (included in requirements.txt)
+3. **Stable Fast 3D**:
+   - The model weights are downloaded from Hugging Face
+   - The `sf3d` Python package is included in this repository
+   - **Note**: This is a gated model - access must be granted by Stability AI
+4. **texture_baker and uv_unwrapper**:
+   - These packages are included in the repository
+   - They are automatically compiled and installed at runtime when the app starts
+   - Installation may take a few minutes on first run
+   - CUDA architecture is automatically detected or uses fallback architectures
+### Authentication
+- **Hugging Face Token**: The Space automatically authenticates using the `HF_TOKEN` environment variable
+- **Gated Model Access**: You must accept the terms and request access to `stabilityai/stable-fast-3d` on Hugging Face
+- The authentication happens at startup, so all model downloads use the authenticated session
+All required local components (the `sf3d`, `texture_baker`, and `uv_unwrapper` packages and the `load/tets` data) are included in this repository, so no additional setup is needed.
+## License
+This Space uses models with the following licenses:
+- Stable Diffusion XL: [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
+- Stable Fast 3D: [Stability AI Community License](https://huggingface.co/stabilityai/stable-fast-3d/blob/main/LICENSE.md)
+## Credits
+- [Stability AI](https://stability.ai/) for Stable Diffusion XL and Stable Fast 3D
+- [rembg](https://github.com/danielgatis/rembg) for background removal
+- Built with [Gradio](https://gradio.app/)
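
Note on the README's Dependencies section: it states that the CUDA architecture is detected automatically, with fallback architectures otherwise. The sketch below isolates that decision as a pure function so it can be checked without a GPU. `pick_cuda_arch_list` is a hypothetical helper, not part of this commit; the fallback string is copied from `app.py`, and the `None` input is an assumed stand-in for a failed detection.

```python
from typing import Optional, Tuple

# Fallback list copied from app.py in this commit.
FALLBACK_ARCHS = "7.0;7.5;8.0;8.6;8.9;9.0"


def pick_cuda_arch_list(capability: Optional[Tuple[int, int]]) -> str:
    """Return a TORCH_CUDA_ARCH_LIST value for a (major, minor) capability.

    `None` models the case where detection failed (an assumption of this sketch).
    """
    if capability is None:
        return FALLBACK_ARCHS
    major, minor = capability
    return f"{major}.{minor}"


detected = pick_cuda_arch_list((8, 6))
undetected = pick_cuda_arch_list(None)
```

In the app itself the capability tuple comes from `torch.cuda.get_device_capability(0)`, and the result is written to `os.environ["TORCH_CUDA_ARCH_LIST"]` before the extensions are built.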

app.py ADDED

	@@ -0,0 +1,472 @@

+import spaces
+import torch
+import os
+import tempfile
+import time
+from contextlib import nullcontext
+from functools import lru_cache
+from typing import Any
+import gradio as gr
+import numpy as np
+import rembg
+from diffusers import DiffusionPipeline
+from gradio_litmodel3d import LitModel3D
+from huggingface_hub import login
+from PIL import Image
+# Authenticate with Hugging Face using token from environment
+# HF_TOKEN is available in Hugging Face Spaces when it is set as a Space secret
+hf_token = os.environ.get("HF_TOKEN")
+if hf_token:
+    # Login to Hugging Face - this stores the token for all HF Hub operations
+    login(token=hf_token)
+    # Also ensure it's set as environment variable for any libraries that check it directly
+    os.environ["HF_TOKEN"] = hf_token
+    print("Authenticated with Hugging Face")
+else:
+    print("Warning: HF_TOKEN not found. Gated models may not be accessible.")
+    print("Please ensure HF_TOKEN is set in your Space's secrets.")
+if not torch.cuda.is_available():
+    raise Exception("CUDA is not available")
+# Set environment variables for building texture_baker and uv_unwrapper
+os.environ["USE_CUDA"] = "1"
+os.environ["USE_NATIVE_ARCH"] = "0"  # Disable native arch to avoid build issues
+# Set CUDA architecture list to avoid detection issues
+# PyTorch's build system fails when it can't detect GPU architectures
+# Setting TORCH_CUDA_ARCH_LIST explicitly prevents this error
+if torch.cuda.is_available():
+    try:
+        # Try to get the actual compute capability
+        compute_cap = torch.cuda.get_device_capability(0)
+        cuda_arch = f"{compute_cap[0]}.{compute_cap[1]}"
+        os.environ["TORCH_CUDA_ARCH_LIST"] = cuda_arch
+        print(
+            f"Detected CUDA capability: {cuda_arch}, setting TORCH_CUDA_ARCH_LIST={cuda_arch}"
+        )
+    except Exception as e:
+        # Fallback to common architectures if detection fails
+        # Include multiple architectures to support various GPU models
+        fallback_archs = "7.0;7.5;8.0;8.6;8.9;9.0"
+        os.environ["TORCH_CUDA_ARCH_LIST"] = fallback_archs
+        print(
+            f"Could not detect CUDA capability: {e}, using fallback architectures: {fallback_archs}"
+        )
+else:
+    # Should not happen since we check above, but just in case
+    print("Warning: CUDA not available but trying to build with CUDA support")
+os.system(
+    "USE_CUDA=1 USE_NATIVE_ARCH=0 pip install -vv --no-build-isolation ./texture_baker ./uv_unwrapper"
+)
+import sf3d.utils as sf3d_utils
+from sf3d.system import SF3D
+# Set up environment
+os.environ["GRADIO_TEMP_DIR"] = os.path.join(os.environ.get("TMPDIR", "/tmp"), "gradio")
+# Initialize rembg session
+rembg_session = rembg.new_session()
+# Constants for 3D generation
+COND_WIDTH = 512
+COND_HEIGHT = 512
+COND_DISTANCE = 1.6
+COND_FOVY_DEG = 40
+BACKGROUND_COLOR = [0.5, 0.5, 0.5]
+# Cached. Doesn't change
+c2w_cond = sf3d_utils.default_cond_c2w(COND_DISTANCE)
+intrinsic, intrinsic_normed_cond = sf3d_utils.create_intrinsic_from_fov_deg(
+    COND_FOVY_DEG, COND_HEIGHT, COND_WIDTH
+)
+generated_files = []
+# Initialize device and SF3D model (like official app)
+device = sf3d_utils.get_device()
+# SF3D model - initialized at startup like official app
+# Token is automatically used after login() call above
+sf3d_model = SF3D.from_pretrained(
+    "stabilityai/stable-fast-3d",
+    config_name="config.yaml",
+    weight_name="model.safetensors",
+)
+sf3d_model.eval()
+sf3d_model = sf3d_model.to(device)
+# SDXL pipeline - lazy loaded to save memory
+sd_pipeline = None
+def initialize_sdxl():
+    """Initialize SDXL pipeline on first use."""
+    global sd_pipeline, device
+    if sd_pipeline is None:
+        print("Loading Stable Diffusion XL model...")
+        sd_pipeline = DiffusionPipeline.from_pretrained(
+            "stabilityai/stable-diffusion-xl-base-1.0",
+            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
+            use_safetensors=True,
+            variant="fp16" if device == "cuda" else None,
+        )
+        if device == "cuda":
+            sd_pipeline = sd_pipeline.to(device)
+            # Enable memory efficient attention if available
+            try:
+                sd_pipeline.enable_xformers_memory_efficient_attention()
+            except Exception:  # xformers is optional; keep the default attention
+                pass
+        elif device == "mps":
+            sd_pipeline = sd_pipeline.to(device)
+        else:
+            sd_pipeline.enable_model_cpu_offload()
+        print("SDXL model loaded!")
+    return sd_pipeline
+@spaces.GPU()
+def generate_text_to_image(
+    prompt: str, negative_prompt: str = "", num_inference_steps: int = 30
+):
+    """Generate image from text prompt using SDXL."""
+    pipeline = initialize_sdxl()
+    print(f"Generating image from prompt: {prompt}")
+    # Generate image
+    with torch.no_grad():
+        if device == "cuda":
+            with torch.autocast(device_type="cuda", dtype=torch.float16):
+                image = pipeline(
+                    prompt=prompt,
+                    negative_prompt=negative_prompt if negative_prompt else None,
+                    num_inference_steps=num_inference_steps,
+                ).images[0]
+        else:
+            image = pipeline(
+                prompt=prompt,
+                negative_prompt=negative_prompt if negative_prompt else None,
+                num_inference_steps=num_inference_steps,
+            ).images[0]
+    return image
+@spaces.GPU()
+def remove_background_from_image(image: Image.Image) -> Image.Image:
+    """Remove background from image using rembg."""
+    print("Removing background...")
+    result = rembg.remove(image, session=rembg_session)
+    return result
+def create_batch(input_image: Image.Image) -> dict[str, Any]:
+    """Create batch for SF3D model - matches official app structure."""
+    img_cond = (
+        torch.from_numpy(
+            np.asarray(input_image.resize((COND_WIDTH, COND_HEIGHT))).astype(np.float32)
+            / 255.0
+        )
+        .float()
+        .clip(0, 1)
+    )
+    mask_cond = img_cond[:, :, -1:]
+    rgb_cond = torch.lerp(
+        torch.tensor(BACKGROUND_COLOR)[None, None, :], img_cond[:, :, :3], mask_cond
+    )
+    batch_elem = {
+        "rgb_cond": rgb_cond,
+        "mask_cond": mask_cond,
+        "c2w_cond": c2w_cond.unsqueeze(0),
+        "intrinsic_cond": intrinsic.unsqueeze(0),
+        "intrinsic_normed_cond": intrinsic_normed_cond.unsqueeze(0),
+    }
+    # Add batch dim
+    batched = {k: v.unsqueeze(0) for k, v in batch_elem.items()}
+    return batched
+def run_model(input_image, remesh_option, vertex_count, texture_size):
+    """Run SF3D model - matches official app structure."""
+    start = time.time()
+    with torch.no_grad():
+        with (
+            torch.autocast(device_type=device, dtype=torch.bfloat16)
+            if "cuda" in device
+            else nullcontext()
+        ):
+            model_batch = create_batch(input_image)
+            model_batch = {k: v.to(device) for k, v in model_batch.items()}
+            trimesh_mesh, _glob_dict = sf3d_model.generate_mesh(
+                model_batch, texture_size, remesh_option.lower(), vertex_count
+            )
+            trimesh_mesh = trimesh_mesh[0]
+    # Create new tmp file
+    tmp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".glb")
+    trimesh_mesh.export(tmp_file.name, file_type="glb", include_normals=True)
+    generated_files.append(tmp_file.name)
+    print("Generation took:", time.time() - start, "s")
+    return tmp_file.name
+@spaces.GPU()
+def generate_3d_from_image(
+    input_image: Image.Image,
+    remesh_option: str = "none",
+    vertex_count: int = -1,
+    texture_size: int = 1024,
+) -> str:
+    """Generate 3D mesh from image using SF3D."""
+    # Resize foreground if needed (like official app)
+    foreground_ratio = 0.85
+    processed_image = sf3d_utils.resize_foreground(
+        input_image, foreground_ratio, out_size=(COND_WIDTH, COND_HEIGHT)
+    )
+    return run_model(processed_image, remesh_option, vertex_count, texture_size)
+@lru_cache
+def checkerboard(squares: int, size: int, min_value: float = 0.5):
+    """Create checkerboard pattern for transparency preview."""
+    base = np.zeros((squares, squares)) + min_value
+    base[1::2, ::2] = 1
+    base[::2, 1::2] = 1
+    repeat_mult = size // squares
+    return (
+        base.repeat(repeat_mult, axis=0)
+        .repeat(repeat_mult, axis=1)[:, :, None]
+        .repeat(3, axis=-1)
+    )
+def show_mask_preview(input_image: Image.Image) -> Image.Image:
+    """Show image with checkerboard background for transparency preview."""
+    img_numpy = np.array(input_image)
+    alpha = img_numpy[:, :, 3] / 255.0
+    chkb = checkerboard(32, 512) * 255
+    new_img = img_numpy[..., :3] * alpha[:, :, None] + chkb * (1 - alpha[:, :, None])
+    return Image.fromarray(new_img.astype(np.uint8), mode="RGB")
+# Gradio Interface Functions
+def step1_generate_image(prompt, negative_prompt, num_steps):
+    """Step 1: Generate image from text."""
+    if not prompt:
+        return None, gr.update(visible=False), "Please enter a prompt"
+    try:
+        image = generate_text_to_image(prompt, negative_prompt, num_steps)
+        return (
+            image,
+            gr.update(visible=True, value="Continue to Background Removal"),
+            "Image generated successfully! Review and continue to remove background.",
+        )
+    except Exception as e:
+        return None, gr.update(visible=False), f"Error generating image: {str(e)}"
+def step2_remove_background(image):
+    """Step 2: Remove background from image."""
+    if image is None:
+        return None, None, gr.update(visible=False), "Please generate an image first"
+    try:
+        # Convert to RGB if needed
+        if image.mode != "RGB":
+            image = image.convert("RGB")
+        bg_removed = remove_background_from_image(image)
+        preview = show_mask_preview(bg_removed)
+        return (
+            bg_removed,
+            preview,
+            gr.update(visible=True, value="Continue to 3D Generation"),
+            "Background removed successfully! Review and continue to generate 3D model.",
+        )
+    except Exception as e:
+        return (
+            None,
+            None,
+            gr.update(visible=False),
+            f"Error removing background: {str(e)}",
+        )
+def step3_generate_3d(image_with_bg_removed, remesh_option, vertex_count, texture_size):
+    """Step 3: Generate 3D model from image."""
+    if image_with_bg_removed is None:
+        return gr.update(value=None, visible=False), "Please remove background first"
+    try:
+        glb_file = generate_3d_from_image(
+            image_with_bg_removed, remesh_option, vertex_count, texture_size
+        )
+        return (
+            gr.update(value=glb_file, visible=True),
+            "3D model generated successfully! You can download it below.",
+        )
+    except Exception as e:
+        return (
+            gr.update(value=None, visible=False),
+            f"Error generating 3D model: {str(e)}",
+        )
+# Create Gradio Interface
+with gr.Blocks(title="Text to Image to 3D") as demo:
+    gr.Markdown(
+        """
+    # Text to Image to 3D Generation
+    This app allows you to generate 3D models from text prompts in three steps:
+    1. **Text to Image**: Generate an image using Stable Diffusion XL
+    2. **Background Removal**: Remove the background using rembg
+    3. **3D Generation**: Create a 3D mesh model using Stable Fast 3D
+    **Instructions:**
+    - Enter your text prompt and generate an image
+    - Review the generated image and continue to remove the background
+    - Review the background-removed image and continue to generate the 3D model
+    - Download your 3D model as a GLB file
+    """
+    )
+    with gr.Row():
+        with gr.Column(scale=1):
+            gr.Markdown("### Step 1: Text to Image")
+            prompt = gr.Textbox(
+                label="Prompt",
+                placeholder="A cute robot character, 3D render, colorful",
+                lines=2,
+            )
+            negative_prompt = gr.Textbox(
+                label="Negative Prompt (optional)",
+                placeholder="blurry, low quality, distorted",
+                lines=2,
+            )
+            num_steps = gr.Slider(
+                label="Number of Inference Steps",
+                minimum=20,
+                maximum=50,
+                value=30,
+                step=5,
+            )
+            generate_btn = gr.Button("Generate Image", variant="primary")
+            step1_status = gr.Textbox(label="Status", interactive=False)
+            step1_image = gr.Image(label="Generated Image", type="pil")
+            step1_continue_btn = gr.Button(
+                "Continue to Background Removal",
+                visible=False,
+                variant="secondary",
+            )
+        with gr.Column(scale=1):
+            gr.Markdown("### Step 2: Background Removal")
+            step2_image = gr.Image(label="Image with Background Removed", type="pil")
+            step2_preview = gr.Image(
+                label="Preview (with transparency)",
+                type="pil",
+                visible=False,
+            )
+            step2_status = gr.Textbox(label="Status", interactive=False)
+            step2_continue_btn = gr.Button(
+                "Continue to 3D Generation",
+                visible=False,
+                variant="secondary",
+            )
+        with gr.Column(scale=1):
+            gr.Markdown("### Step 3: 3D Generation")
+            remesh_option = gr.Radio(
+                choices=["none", "triangle", "quad"],
+                label="Remeshing Option",
+                value="none",
+            )
+            vertex_count = gr.Slider(
+                label="Target Vertex Count (-1 for auto)",
+                minimum=-1,
+                maximum=20000,
+                value=-1,
+                step=100,
+            )
+            texture_size = gr.Slider(
+                label="Texture Size",
+                minimum=512,
+                maximum=2048,
+                value=1024,
+                step=256,
+            )
+            step3_generate_btn = gr.Button("Generate 3D Model", variant="primary")
+            step3_status = gr.Textbox(label="Status", interactive=False)
+            step3_output = LitModel3D(
+                label="3D Model",
+                visible=False,
+                clear_color=[0.0, 0.0, 0.0, 0.0],
+            )
+    # State variables
+    step2_image_state = gr.State()
+    # Event handlers
+    generate_btn.click(
+        fn=step1_generate_image,
+        inputs=[prompt, negative_prompt, num_steps],
+        outputs=[step1_image, step1_continue_btn, step1_status],
+    )
+    step1_continue_btn.click(
+        fn=step2_remove_background,
+        inputs=[step1_image],
+        outputs=[step2_image, step2_preview, step2_continue_btn, step2_status],
+    ).then(
+        fn=lambda img: img,
+        inputs=[step2_image],
+        outputs=[step2_image_state],
+    )
+    step2_continue_btn.click(
+        fn=step3_generate_3d,
+        inputs=[step2_image_state, remesh_option, vertex_count, texture_size],
+        outputs=[step3_output, step3_status],
+    )
+    # Update preview when image changes
+    step2_image.change(
+        fn=show_mask_preview,
+        inputs=[step2_image],
+        outputs=[step2_preview],
+    ).then(
+        fn=lambda: gr.update(visible=True),
+        outputs=[step2_preview],
+    )
+if __name__ == "__main__":
+    # Delete previous gradio temp dir folder (like official app)
+    if os.path.exists(os.environ["GRADIO_TEMP_DIR"]):
+        print(f"Deleting {os.environ['GRADIO_TEMP_DIR']}")
+        import shutil
+        shutil.rmtree(os.environ["GRADIO_TEMP_DIR"])
+    demo.queue()
+    demo.launch(share=False)
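
Note on `show_mask_preview` in `app.py`: it composites the RGBA image over `checkerboard(32, 512)`, which is always 512x512, so the blend only broadcasts when the input is also 512x512. SDXL's default output is 1024x1024, so a size-agnostic variant may be worth considering. The following is a minimal NumPy sketch of one, not the author's implementation; `checkerboard_like` and `composite_over_checkerboard` are hypothetical names, and the 16-pixel cell is an assumption chosen to match 512 / 32.

```python
import numpy as np


def checkerboard_like(height: int, width: int, cell: int = 16, low: float = 0.5) -> np.ndarray:
    """Return an (H, W, 3) float array alternating between `low` and 1.0."""
    rows = np.arange(height)[:, None] // cell
    cols = np.arange(width)[None, :] // cell
    # Cells whose row and column indices sum to an even number get the darker value.
    pattern = np.where((rows + cols) % 2 == 0, low, 1.0)
    return np.repeat(pattern[:, :, None], 3, axis=2)


def composite_over_checkerboard(rgba: np.ndarray) -> np.ndarray:
    """Alpha-blend a uint8 RGBA array of any size over a checkerboard."""
    alpha = rgba[:, :, 3:4].astype(np.float32) / 255.0
    board = checkerboard_like(rgba.shape[0], rgba.shape[1]) * 255.0
    blended = rgba[:, :, :3] * alpha + board * (1.0 - alpha)
    return blended.astype(np.uint8)


# Non-square test image: opaque grey in the top half, fully transparent below.
rgba = np.zeros((1024, 768, 4), dtype=np.uint8)
rgba[:512, :, :3] = 200
rgba[:512, :, 3] = 255
preview = composite_over_checkerboard(rgba)
```

Building the board from the image's own shape avoids both the fixed 512 size and the requirement that the size be a multiple of the square count.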

load/tets/160_tets.npz ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f4be37efc604d28d55a1a78c2aabefeeab7e63149f541aa45f9dd858ee35bb9
+size 15408790
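
Note on `load/tets/160_tets.npz`: the three added lines are a Git LFS pointer, not the tetrahedra data itself, which matches the `filter=lfs` rules in `.gitattributes`. A small sketch of reading such a pointer, assuming the space-separated `key value` layout shown above; `parse_lfs_pointer` is a hypothetical helper and does not fetch the real file.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1f4be37efc604d28d55a1a78c2aabefeeab7e63149f541aa45f9dd858ee35bb9\n"
    "size 15408790\n"
)
```

The `size` field says the actual archive is 15,408,790 bytes, which is why it is stored through LFS rather than committed directly.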

requirements.txt ADDED

	@@ -0,0 +1,40 @@

+wheel
+setuptools==69.5.1
+# Core dependencies
+torch==2.5.1
+torchvision==0.20.1
+numpy==1.26.4
+Pillow>=9.5.0
+# Stable Diffusion XL
+diffusers>=0.21.0
+transformers==4.42.3
+accelerate>=0.20.0
+safetensors>=0.3.0
+invisible-watermark>=0.2.0
+# Background removal
+rembg[gpu]==2.0.57; sys_platform != 'darwin'
+rembg==2.0.57; sys_platform == 'darwin'
+# Stable Fast 3D dependencies
+einops==0.7.0
+jaxtyping==0.2.31
+omegaconf==2.3.0
+open_clip_torch==2.24.0
+trimesh==4.4.1
+huggingface-hub>=0.23.2,<1.0
+pynanoinstantmeshes==0.0.3
+gpytoolbox==0.2.0
+# Gradio and UI
+gradio==4.41.0
+gradio-litmodel3d==0.0.1
+# Additional utilities
+tqdm>=4.65.0
+# (HF hack) These are installed at runtime in app.py
+# ./texture_baker/
+# ./uv_unwrapper/

sf3d/models/camera.py ADDED

	@@ -0,0 +1,32 @@

+from dataclasses import dataclass, field
+from typing import List
+import torch
+import torch.nn as nn
+from sf3d.models.utils import BaseModule
+class LinearCameraEmbedder(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        in_channels: int = 25
+        out_channels: int = 768
+        conditions: List[str] = field(default_factory=list)
+    cfg: Config
+    def configure(self) -> None:
+        self.linear = nn.Linear(self.cfg.in_channels, self.cfg.out_channels)
+    def forward(self, **kwargs):
+        cond_tensors = []
+        for cond_name in self.cfg.conditions:
+            assert cond_name in kwargs
+            cond = kwargs[cond_name]
+            # cond in shape (B, Nv, ...)
+            cond_tensors.append(cond.view(*cond.shape[:2], -1))
+        cond_tensor = torch.cat(cond_tensors, dim=-1)
+        assert cond_tensor.shape[-1] == self.cfg.in_channels
+        embedding = self.linear(cond_tensor)
+        return embedding
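
Note on `LinearCameraEmbedder.forward`: each condition keeps its leading `(B, Nv)` dimensions, the remaining dimensions are flattened, and the results are concatenated on the last axis before the linear layer. The NumPy sketch below mirrors that flattening. The model's config is not part of the lines shown here, so the choice of conditions is an assumption: a 4x4 `c2w_cond` and a 3x3 `intrinsic_normed_cond` (names taken from the batch built in `app.py`) would give 16 + 9 = 25 values, matching the default `in_channels`.

```python
import numpy as np


def flatten_conditions(conditions: dict, names: list) -> np.ndarray:
    """Flatten each (B, Nv, ...) array to (B, Nv, -1) and concatenate on the last axis."""
    parts = []
    for name in names:
        cond = conditions[name]
        parts.append(cond.reshape(*cond.shape[:2], -1))
    return np.concatenate(parts, axis=-1)


batch, views = 2, 1
# Identity matrices stand in for real camera data (illustrative values only).
conds = {
    "c2w_cond": np.tile(np.eye(4, dtype=np.float32), (batch, views, 1, 1)),
    "intrinsic_normed_cond": np.tile(np.eye(3, dtype=np.float32), (batch, views, 1, 1)),
}
flat = flatten_conditions(conds, ["c2w_cond", "intrinsic_normed_cond"])
```

The module's `assert cond_tensor.shape[-1] == self.cfg.in_channels` guards exactly this width, so a different set of conditions needs a matching `in_channels`.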

sf3d/models/global_estimator/multi_head_estimator.py ADDED

	@@ -0,0 +1,118 @@

+from dataclasses import dataclass, field
+from typing import Any, List, Optional
+import torch.nn as nn
+from jaxtyping import Float
+from torch import Tensor
+from sf3d.models.network import get_activation
+from sf3d.models.utils import BaseModule
+@dataclass
+class HeadSpec:
+    name: str
+    out_channels: int
+    n_hidden_layers: int
+    output_activation: Optional[str] = None
+    output_bias: float = 0.0
+    add_to_decoder_features: bool = False
+    shape: Optional[list[int]] = None
+class MultiHeadEstimator(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        triplane_features: int = 1024
+        n_layers: int = 2
+        hidden_features: int = 512
+        activation: str = "relu"
+        pool: str = "max"
+        # Literal["mean", "max"] = "mean"  # noqa: F821
+        heads: List[HeadSpec] = field(default_factory=lambda: [])
+    cfg: Config
+    def configure(self):
+        layers = []
+        cur_features = self.cfg.triplane_features * 3
+        for _ in range(self.cfg.n_layers):
+            layers.append(
+                nn.Conv2d(
+                    cur_features,
+                    self.cfg.hidden_features,
+                    kernel_size=3,
+                    padding=0,
+                    stride=2,
+                )
+            )
+            layers.append(self.make_activation(self.cfg.activation))
+            cur_features = self.cfg.hidden_features
+        self.layers = nn.Sequential(*layers)
+        assert len(self.cfg.heads) > 0
+        heads = {}
+        for head in self.cfg.heads:
+            head_layers = []
+            for i in range(head.n_hidden_layers):
+                head_layers += [
+                    nn.Linear(
+                        self.cfg.hidden_features,
+                        self.cfg.hidden_features,
+                    ),
+                    self.make_activation(self.cfg.activation),
+                ]
+            head_layers += [
+                nn.Linear(
+                    self.cfg.hidden_features,
+                    head.out_channels,
+                ),
+            ]
+            heads[head.name] = nn.Sequential(*head_layers)
+        self.heads = nn.ModuleDict(heads)
+    def make_activation(self, activation):
+        if activation == "relu":
+            return nn.ReLU(inplace=True)
+        elif activation == "silu":
+            return nn.SiLU(inplace=True)
+        else:
+            raise NotImplementedError
+    def forward(
+        self,
+        triplane: Float[Tensor, "B 3 F Ht Wt"],
+    ) -> dict[str, Any]:
+        x = self.layers(
+            triplane.reshape(
+                triplane.shape[0], -1, triplane.shape[-2], triplane.shape[-1]
+            )
+        )
+        if self.cfg.pool == "max":
+            x = x.amax(dim=[-2, -1])
+        elif self.cfg.pool == "mean":
+            x = x.mean(dim=[-2, -1])
+        else:
+            raise NotImplementedError
+        out = {
+            ("decoder_" if head.add_to_decoder_features else "")
+            + head.name: get_activation(head.output_activation)(
+                self.heads[head.name](x) + head.output_bias
+            )
+            for head in self.cfg.heads
+        }
+        for head in self.cfg.heads:
+            if head.shape:
+                head_name = (
+                    "decoder_" if head.add_to_decoder_features else ""
+                ) + head.name
+                out[head_name] = out[head_name].reshape(*head.shape)
+        return out
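
Note on `MultiHeadEstimator`: the three triplane planes are merged into the channel axis (`triplane_features * 3` input channels), then passed through `n_layers` convolutions with `kernel_size=3`, `stride=2`, and `padding=0` before global pooling. The sketch below works out the spatial size after that stack using the standard convolution output-size formula. The input resolution of 96 is a hypothetical value for illustration; the real triplane resolution comes from the model config, which is not shown here.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 0) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def spatial_size_after(size: int, n_layers: int = 2) -> int:
    """Apply the estimator's stride-2, unpadded 3x3 convolution n_layers times."""
    for _ in range(n_layers):
        size = conv_out_size(size)
    return size


# Hypothetical 96x96 triplane: one layer gives 47, two layers give 23.
after_one = conv_out_size(96)
after_two = spatial_size_after(96)

# Channel count entering the first convolution with the default config.
in_channels = 1024 * 3
```

Because the result is reduced with `amax` or `mean` over the last two axes, the heads see a `(B, hidden_features)` tensor regardless of this spatial size.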

sf3d/models/image_estimator/clip_based_estimator.py ADDED

	@@ -0,0 +1,168 @@

+from dataclasses import dataclass, field
+from typing import Any, List, Optional
+import open_clip
+import torch
+import torch.nn as nn
+from jaxtyping import Float
+from torch import Tensor
+from torchvision.transforms import Normalize
+from sf3d.models.network import get_activation
+from sf3d.models.utils import BaseModule
+@dataclass
+class HeadSpec:
+    name: str
+    out_channels: int
+    n_hidden_layers: int
+    output_activation: Optional[str] = None
+    output_bias: float = 0.0
+    add_to_decoder_features: bool = False
+    shape: Optional[list[int]] = None
+class ClipBasedHeadEstimator(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        model: str = "ViT-B-32"
+        pretrain: str = "laion2b_s34b_b79k"
+        distribution: str = "beta"
+        # ["mean", "mode", "sample", "sample_mean"]
+        distribution_eval: str = "mode"
+        activation: str = "relu"
+        hidden_features: int = 512
+        heads: List[HeadSpec] = field(default_factory=lambda: [])
+    cfg: Config
+    def configure(self):
+        self.model, _, self.preprocess = open_clip.create_model_and_transforms(
+            self.cfg.model, pretrained=self.cfg.pretrain
+        )
+        self.model.eval()
+        # Do not add the weights in self.model to the optimizer
+        for param in self.model.parameters():
+            param.requires_grad = False
+        assert len(self.cfg.heads) > 0
+        heads = {}
+        for head in self.cfg.heads:
+            head_layers = []
+            for i in range(head.n_hidden_layers):
+                head_layers += [
+                    nn.Linear(
+                        self.cfg.hidden_features,
+                        self.cfg.hidden_features,
+                    ),
+                    self.make_activation(self.cfg.activation),
+                ]
+            head_layers = [nn.Sequential(*head_layers)]
+            head_layers += [
+                nn.Sequential(
+                    nn.Linear(
+                        self.cfg.hidden_features,
+                        self.cfg.hidden_features,
+                    ),
+                    self.make_activation(self.cfg.activation),
+                    nn.Linear(self.cfg.hidden_features, 1),
+                )
+                for _ in range(2)
+            ]
+            heads[head.name] = nn.ModuleList(head_layers)
+        self.heads = nn.ModuleDict(heads)
+    def make_activation(self, activation):
+        if activation == "relu":
+            return nn.ReLU(inplace=True)
+        elif activation == "silu":
+            return nn.SiLU(inplace=True)
+        else:
+            raise NotImplementedError
+    def forward(
+        self,
+        cond_image: Float[Tensor, "B 1 H W 3"],
+        sample: bool = True,
+    ) -> dict[str, Any]:
+        # Run the model
+        # Resize cond_image to 224
+        cond_image = nn.functional.interpolate(
+            cond_image.flatten(0, 1).permute(0, 3, 1, 2).contiguous(),
+            size=(224, 224),
+            mode="bilinear",
+            align_corners=False,
+        )
+        cond_image = Normalize(
+            mean=open_clip.constants.OPENAI_DATASET_MEAN,
+            std=open_clip.constants.OPENAI_DATASET_STD,
+        )(cond_image)
+        image_features = self.model.encode_image(cond_image)
+        # Run the heads
+        outputs = {}
+        for head_dict in self.cfg.heads:
+            head_name = head_dict.name
+            shared_head, d1_h, d2_h = self.heads[head_name]
+            shared_features = shared_head(image_features)
+            d1, d2 = [head(shared_features).squeeze(-1) for head in [d1_h, d2_h]]
+            if self.cfg.distribution == "normal":
+                mean = d1
+                var = d2
+                if mean.shape[-1] == 1:
+                    outputs[head_name] = torch.distributions.Normal(
+                        mean + head_dict.output_bias,
+                        torch.nn.functional.softplus(var),
+                    )
+                else:
+                    outputs[head_name] = torch.distributions.MultivariateNormal(
+                        mean + head_dict.output_bias,
+                        torch.nn.functional.softplus(var).diag_embed(),
+                    )
+            elif self.cfg.distribution == "beta":
+                outputs[head_name] = torch.distributions.Beta(
+                    torch.nn.functional.softplus(d1 + head_dict.output_bias),
+                    torch.nn.functional.softplus(d2 + head_dict.output_bias),
+                )
+            else:
+                raise NotImplementedError
+        if sample:
+            for head_dict in self.cfg.heads:
+                head_name = head_dict.name
+                dist = outputs[head_name]
+                if self.cfg.distribution_eval == "mean":
+                    out = dist.mean
+                elif self.cfg.distribution_eval == "mode":
+                    out = dist.mode
+                elif self.cfg.distribution_eval == "sample_mean":
+                    # average over the 10 drawn samples (dim 0), not over the batch
+                    out = dist.sample([10]).mean(0)
+                else:
+                    # use rsample if gradient is needed
+                    out = dist.rsample() if self.training else dist.sample()
+                outputs[head_name] = get_activation(head_dict.output_activation)(out)
+                outputs[f"{head_name}_dist"] = dist
+        for head in self.cfg.heads:
+            if head.shape:
+                if not sample:
+                    raise ValueError(
+                        "Cannot reshape non-sampled probabilistic outputs"
+                    )
+                outputs[head.name] = outputs[head.name].reshape(*head.shape)
+            if head.add_to_decoder_features:
+                outputs[f"decoder_{head.name}"] = outputs[head.name]
+                del outputs[head.name]
+        return outputs
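
The "beta" branch above turns two raw head outputs into the concentration parameters of a Beta distribution by passing each through softplus, then reads off a point estimate. The following is a minimal pure-Python sketch of that arithmetic for a single scalar head. The helper names (`softplus`, `beta_params`, `beta_mean`, `beta_mode`) are illustrative and not part of the repository, and the mode formula shown is the textbook one, valid only when both parameters exceed 1; `torch.distributions.Beta` may treat the other cases differently.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); always strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(d1: float, d2: float, output_bias: float = 0.0):
    # Mirrors the source: the bias is added before softplus on both outputs.
    return softplus(d1 + output_bias), softplus(d2 + output_bias)


def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


def beta_mode(alpha: float, beta: float) -> float:
    # Textbook mode; only defined here for alpha > 1 and beta > 1 (assumption).
    if alpha <= 1.0 or beta <= 1.0:
        raise ValueError("mode formula used here requires alpha, beta > 1")
    return (alpha - 1.0) / (alpha + beta - 2.0)


if __name__ == "__main__":
    a, b = beta_params(3.0, 1.0)
    print("alpha, beta:", a, b)
    print("mean:", beta_mean(a, b))
    print("mode:", beta_mode(a, b))
```

Because softplus is strictly positive, any pair of raw outputs yields valid Beta parameters, which is presumably why the source applies it instead of using the raw values directly.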

sf3d/models/isosurface.py ADDED Viewed

	@@ -0,0 +1,229 @@

+from typing import Optional, Tuple
+import numpy as np
+import torch
+import torch.nn as nn
+from jaxtyping import Float, Integer
+from torch import Tensor
+from .mesh import Mesh
+class IsosurfaceHelper(nn.Module):
+    points_range: Tuple[float, float] = (0, 1)
+    @property
+    def grid_vertices(self) -> Float[Tensor, "N 3"]:
+        raise NotImplementedError
+    @property
+    def requires_instance_per_batch(self) -> bool:
+        return False
+class MarchingTetrahedraHelper(IsosurfaceHelper):
+    def __init__(self, resolution: int, tets_path: str):
+        super().__init__()
+        self.resolution = resolution
+        self.tets_path = tets_path
+        self.triangle_table: Integer[Tensor, "..."]
+        self.register_buffer(
+            "triangle_table",
+            torch.as_tensor(
+                [
+                    [-1, -1, -1, -1, -1, -1],
+                    [1, 0, 2, -1, -1, -1],
+                    [4, 0, 3, -1, -1, -1],
+                    [1, 4, 2, 1, 3, 4],
+                    [3, 1, 5, -1, -1, -1],
+                    [2, 3, 0, 2, 5, 3],
+                    [1, 4, 0, 1, 5, 4],
+                    [4, 2, 5, -1, -1, -1],
+                    [4, 5, 2, -1, -1, -1],
+                    [4, 1, 0, 4, 5, 1],
+                    [3, 2, 0, 3, 5, 2],
+                    [1, 3, 5, -1, -1, -1],
+                    [4, 1, 2, 4, 3, 1],
+                    [3, 0, 4, -1, -1, -1],
+                    [2, 0, 1, -1, -1, -1],
+                    [-1, -1, -1, -1, -1, -1],
+                ],
+                dtype=torch.long,
+            ),
+            persistent=False,
+        )
+        self.num_triangles_table: Integer[Tensor, "..."]
+        self.register_buffer(
+            "num_triangles_table",
+            torch.as_tensor(
+                [0, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 0], dtype=torch.long
+            ),
+            persistent=False,
+        )
+        self.base_tet_edges: Integer[Tensor, "..."]
+        self.register_buffer(
+            "base_tet_edges",
+            torch.as_tensor([0, 1, 0, 2, 0, 3, 1, 2, 1, 3, 2, 3], dtype=torch.long),
+            persistent=False,
+        )
+        tets = np.load(self.tets_path)
+        self._grid_vertices: Float[Tensor, "..."]
+        self.register_buffer(
+            "_grid_vertices",
+            torch.from_numpy(tets["vertices"]).float(),
+            persistent=False,
+        )
+        self.indices: Integer[Tensor, "..."]
+        self.register_buffer(
+            "indices", torch.from_numpy(tets["indices"]).long(), persistent=False
+        )
+        self._all_edges: Optional[Integer[Tensor, "Ne 2"]] = None
+        center_indices, boundary_indices = self.get_center_boundary_index(
+            self._grid_vertices
+        )
+        self.center_indices: Integer[Tensor, "..."]
+        self.register_buffer("center_indices", center_indices, persistent=False)
+        self.boundary_indices: Integer[Tensor, "..."]
+        self.register_buffer("boundary_indices", boundary_indices, persistent=False)
+    def get_center_boundary_index(self, verts):
+        magn = torch.sum(verts**2, dim=-1)
+        center_idx = torch.argmin(magn)
+        boundary_pos = verts == verts.max()
+        boundary_neg = verts == verts.min()
+        boundary = torch.bitwise_or(boundary_pos, boundary_neg)
+        boundary = torch.sum(boundary.float(), dim=-1)
+        boundary_idx = torch.nonzero(boundary)
+        return center_idx, boundary_idx.squeeze(dim=-1)
+    def normalize_grid_deformation(
+        self, grid_vertex_offsets: Float[Tensor, "Nv 3"]
+    ) -> Float[Tensor, "Nv 3"]:
+        return (
+            (self.points_range[1] - self.points_range[0])
+            / self.resolution  # half tet size is approximately 1 / self.resolution
+            * torch.tanh(grid_vertex_offsets)
+        )  # FIXME: hard-coded activation
+    @property
+    def grid_vertices(self) -> Float[Tensor, "Nv 3"]:
+        return self._grid_vertices
+    @property
+    def all_edges(self) -> Integer[Tensor, "Ne 2"]:
+        if self._all_edges is None:
+            # compute edges on GPU, or it would be VERY SLOW (basically due to the unique operation)
+            edges = torch.tensor(
+                [0, 1, 0, 2, 0, 3, 1, 2, 1, 3, 2, 3],
+                dtype=torch.long,
+                device=self.indices.device,
+            )
+            _all_edges = self.indices[:, edges].reshape(-1, 2)
+            _all_edges_sorted = torch.sort(_all_edges, dim=1)[0]
+            _all_edges = torch.unique(_all_edges_sorted, dim=0)
+            self._all_edges = _all_edges
+        return self._all_edges
+    def sort_edges(self, edges_ex2):
+        with torch.no_grad():
+            order = (edges_ex2[:, 0] > edges_ex2[:, 1]).long()
+            order = order.unsqueeze(dim=1)
+            a = torch.gather(input=edges_ex2, index=order, dim=1)
+            b = torch.gather(input=edges_ex2, index=1 - order, dim=1)
+        return torch.stack([a, b], -1)
+    def _forward(self, pos_nx3, sdf_n, tet_fx4):
+        with torch.no_grad():
+            occ_n = sdf_n > 0
+            occ_fx4 = occ_n[tet_fx4.reshape(-1)].reshape(-1, 4)
+            occ_sum = torch.sum(occ_fx4, -1)
+            valid_tets = (occ_sum > 0) & (occ_sum < 4)
+            occ_sum = occ_sum[valid_tets]
+            # find all vertices
+            all_edges = tet_fx4[valid_tets][:, self.base_tet_edges].reshape(-1, 2)
+            all_edges = self.sort_edges(all_edges)
+            unique_edges, idx_map = torch.unique(all_edges, dim=0, return_inverse=True)
+            unique_edges = unique_edges.long()
+            mask_edges = occ_n[unique_edges.reshape(-1)].reshape(-1, 2).sum(-1) == 1
+            mapping = (
+                torch.ones(
+                    (unique_edges.shape[0]), dtype=torch.long, device=pos_nx3.device
+                )
+                * -1
+            )
+            mapping[mask_edges] = torch.arange(
+                mask_edges.sum(), dtype=torch.long, device=pos_nx3.device
+            )
+            idx_map = mapping[idx_map]  # map edges to verts
+            interp_v = unique_edges[mask_edges]
+        edges_to_interp = pos_nx3[interp_v.reshape(-1)].reshape(-1, 2, 3)
+        edges_to_interp_sdf = sdf_n[interp_v.reshape(-1)].reshape(-1, 2, 1)
+        edges_to_interp_sdf[:, -1] *= -1
+        denominator = edges_to_interp_sdf.sum(1, keepdim=True)
+        edges_to_interp_sdf = torch.flip(edges_to_interp_sdf, [1]) / denominator
+        verts = (edges_to_interp * edges_to_interp_sdf).sum(1)
+        idx_map = idx_map.reshape(-1, 6)
+        v_id = torch.pow(2, torch.arange(4, dtype=torch.long, device=pos_nx3.device))
+        tetindex = (occ_fx4[valid_tets] * v_id.unsqueeze(0)).sum(-1)
+        num_triangles = self.num_triangles_table[tetindex]
+        # Generate triangle indices
+        faces = torch.cat(
+            (
+                torch.gather(
+                    input=idx_map[num_triangles == 1],
+                    dim=1,
+                    index=self.triangle_table[tetindex[num_triangles == 1]][:, :3],
+                ).reshape(-1, 3),
+                torch.gather(
+                    input=idx_map[num_triangles == 2],
+                    dim=1,
+                    index=self.triangle_table[tetindex[num_triangles == 2]][:, :6],
+                ).reshape(-1, 3),
+            ),
+            dim=0,
+        )
+        return verts, faces
+    def forward(
+        self,
+        level: Float[Tensor, "N3 1"],
+        deformation: Optional[Float[Tensor, "N3 3"]] = None,
+    ) -> Mesh:
+        if deformation is not None:
+            grid_vertices = self.grid_vertices + self.normalize_grid_deformation(
+                deformation
+            )
+        else:
+            grid_vertices = self.grid_vertices
+        v_pos, t_pos_idx = self._forward(grid_vertices, level, self.indices)
+        mesh = Mesh(
+            v_pos=v_pos,
+            t_pos_idx=t_pos_idx,
+            # extras
+            grid_vertices=grid_vertices,
+            tet_edges=self.all_edges,
+            grid_level=level,
+            grid_deformation=deformation,
+        )
+        return mesh
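
Two small pieces of arithmetic in `_forward` above are easy to miss inside the tensor code: how the 4-bit case index of a tetrahedron is formed from the vertex signs, and how a surface vertex is placed on an edge whose endpoints have opposite signs. The following is a minimal pure-Python sketch for a single tetrahedron and a single edge. The function names are illustrative and do not exist in the repository; the `NUM_TRIANGLES` table is copied from `num_triangles_table` in the source.

```python
# Number of triangles emitted per case index, copied from the source table.
NUM_TRIANGLES = [0, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 0]


def tet_case_index(sdf4):
    # Bit i is set when vertex i is "occupied" (sdf > 0), matching
    # (occ_fx4 * 2**arange(4)).sum(-1) in the source.
    return sum(1 << i for i, s in enumerate(sdf4) if s > 0)


def interpolate_zero_crossing(p0, p1, s0, s1):
    # Linear interpolation of the point where the field crosses zero.
    # The source negates the second value, sums, flips and divides, which
    # gives weights w0 = -s1 / (s0 - s1) and w1 = s0 / (s0 - s1).
    if (s0 > 0) == (s1 > 0):
        raise ValueError("edge does not cross the zero level set")
    denom = s0 - s1
    w0 = -s1 / denom
    w1 = s0 / denom
    return tuple(w0 * a + w1 * b for a, b in zip(p0, p1))


if __name__ == "__main__":
    idx = tet_case_index([1.0, 1.0, -1.0, -1.0])
    print("case index:", idx, "triangles:", NUM_TRIANGLES[idx])
    print(interpolate_zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 1.0, -3.0))
```

Cases 0 and 15 (all vertices on one side) produce no triangles, which corresponds to the `valid_tets` mask that discards such tetrahedra before any edge work is done.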

sf3d/models/mesh.py ADDED Viewed

	@@ -0,0 +1,289 @@

+from __future__ import annotations
+import math
+from typing import Any, Dict, Optional
+import gpytoolbox
+import numpy as np
+import pynanoinstantmeshes
+import torch
+import torch.nn.functional as F
+import trimesh
+from jaxtyping import Float, Integer
+from torch import Tensor
+from sf3d.models.utils import dot
+try:
+    from uv_unwrapper import Unwrapper
+except ImportError:
+    import logging
+    logging.warning(
+        "Could not import uv_unwrapper. Please install it via `pip install uv_unwrapper/`"
+    )
+    # Exit early to avoid further errors
+    raise ImportError("uv_unwrapper not found")
+class Mesh:
+    def __init__(
+        self, v_pos: Float[Tensor, "Nv 3"], t_pos_idx: Integer[Tensor, "Nf 3"], **kwargs
+    ) -> None:
+        self.v_pos: Float[Tensor, "Nv 3"] = v_pos
+        self.t_pos_idx: Integer[Tensor, "Nf 3"] = t_pos_idx
+        self._v_nrm: Optional[Float[Tensor, "Nv 3"]] = None
+        self._v_tng: Optional[Float[Tensor, "Nv 3"]] = None
+        self._v_tex: Optional[Float[Tensor, "Nt 2"]] = None
+        self._edges: Optional[Integer[Tensor, "Ne 2"]] = None
+        self.extras: Dict[str, Any] = {}
+        for k, v in kwargs.items():
+            self.add_extra(k, v)
+        self.unwrapper = Unwrapper()
+    def add_extra(self, k, v) -> None:
+        self.extras[k] = v
+    @property
+    def requires_grad(self):
+        return self.v_pos.requires_grad
+    @property
+    def v_nrm(self):
+        if self._v_nrm is None:
+            self._v_nrm = self._compute_vertex_normal()
+        return self._v_nrm
+    @property
+    def v_tng(self):
+        if self._v_tng is None:
+            self._v_tng = self._compute_vertex_tangent()
+        return self._v_tng
+    @property
+    def v_tex(self):
+        if self._v_tex is None:
+            self.unwrap_uv()
+        return self._v_tex
+    @property
+    def edges(self):
+        if self._edges is None:
+            self._edges = self._compute_edges()
+        return self._edges
+    def _compute_vertex_normal(self):
+        i0 = self.t_pos_idx[:, 0]
+        i1 = self.t_pos_idx[:, 1]
+        i2 = self.t_pos_idx[:, 2]
+        v0 = self.v_pos[i0, :]
+        v1 = self.v_pos[i1, :]
+        v2 = self.v_pos[i2, :]
+        face_normals = torch.cross(v1 - v0, v2 - v0, dim=-1)
+        # Splat face normals to vertices
+        v_nrm = torch.zeros_like(self.v_pos)
+        v_nrm.scatter_add_(0, i0[:, None].repeat(1, 3), face_normals)
+        v_nrm.scatter_add_(0, i1[:, None].repeat(1, 3), face_normals)
+        v_nrm.scatter_add_(0, i2[:, None].repeat(1, 3), face_normals)
+        # Normalize, replace zero (degenerate) normals with a default value
+        v_nrm = torch.where(
+            dot(v_nrm, v_nrm) > 1e-20, v_nrm, torch.as_tensor([0.0, 0.0, 1.0]).to(v_nrm)
+        )
+        v_nrm = F.normalize(v_nrm, dim=1)
+        if torch.is_anomaly_enabled():
+            assert torch.all(torch.isfinite(v_nrm))
+        return v_nrm
+    def _compute_vertex_tangent(self):
+        vn_idx = [None] * 3
+        pos = [None] * 3
+        tex = [None] * 3
+        for i in range(0, 3):
+            pos[i] = self.v_pos[self.t_pos_idx[:, i]]
+            tex[i] = self.v_tex[self.t_pos_idx[:, i]]
+            # t_nrm_idx is always the same as t_pos_idx
+            vn_idx[i] = self.t_pos_idx[:, i]
+        tangents = torch.zeros_like(self.v_nrm)
+        tansum = torch.zeros_like(self.v_nrm)
+        # Compute tangent space for each triangle
+        duv1 = tex[1] - tex[0]
+        duv2 = tex[2] - tex[0]
+        dpos1 = pos[1] - pos[0]
+        dpos2 = pos[2] - pos[0]
+        tng_nom = dpos1 * duv2[..., 1:2] - dpos2 * duv1[..., 1:2]
+        denom = duv1[..., 0:1] * duv2[..., 1:2] - duv1[..., 1:2] * duv2[..., 0:1]
+        # Avoid division by zero for degenerate texture coordinates
+        denom_safe = denom.clip(1e-6)
+        tang = tng_nom / denom_safe
+        # Update all 3 vertices
+        for i in range(0, 3):
+            idx = vn_idx[i][:, None].repeat(1, 3)
+            tangents.scatter_add_(0, idx, tang)  # tangents[n_i] = tangents[n_i] + tang
+            tansum.scatter_add_(
+                0, idx, torch.ones_like(tang)
+            )  # tansum[n_i] = tansum[n_i] + 1
+        # Also normalize it. Here we do not normalize the individual triangles first so larger area
+        # triangles influence the tangent space more
+        tangents = tangents / tansum
+        # Normalize and make sure tangent is perpendicular to normal
+        tangents = F.normalize(tangents, dim=1)
+        tangents = F.normalize(tangents - dot(tangents, self.v_nrm) * self.v_nrm)
+        if torch.is_anomaly_enabled():
+            assert torch.all(torch.isfinite(tangents))
+        return tangents
+    def quad_remesh(
+        self,
+        quad_vertex_count: int = -1,
+        quad_rosy: int = 4,
+        quad_crease_angle: float = -1.0,
+        quad_smooth_iter: int = 2,
+        quad_align_to_boundaries: bool = False,
+    ) -> Mesh:
+        if quad_vertex_count < 0:
+            quad_vertex_count = self.v_pos.shape[0]
+        v_pos = self.v_pos.detach().cpu().numpy().astype(np.float32)
+        t_pos_idx = self.t_pos_idx.detach().cpu().numpy().astype(np.uint32)
+        new_vert, new_faces = pynanoinstantmeshes.remesh(
+            v_pos,
+            t_pos_idx,
+            quad_vertex_count // 4,
+            rosy=quad_rosy,
+            posy=4,
+            creaseAngle=quad_crease_angle,
+            align_to_boundaries=quad_align_to_boundaries,
+            smooth_iter=quad_smooth_iter,
+            deterministic=False,
+        )
+        # Briefly load in trimesh
+        mesh = trimesh.Trimesh(vertices=new_vert, faces=new_faces.astype(np.int32))
+        v_pos = torch.from_numpy(mesh.vertices).to(self.v_pos).contiguous()
+        t_pos_idx = torch.from_numpy(mesh.faces).to(self.t_pos_idx).contiguous()
+        # Create new mesh
+        return Mesh(v_pos, t_pos_idx)
+    def triangle_remesh(
+        self,
+        triangle_average_edge_length_multiplier: Optional[float] = None,
+        triangle_remesh_steps: int = 10,
+        triangle_vertex_count=-1,
+    ):
+        if triangle_vertex_count > 0:
+            reduction = triangle_vertex_count / self.v_pos.shape[0]
+            print("Triangle reduction:", reduction)
+            v_pos = self.v_pos.detach().cpu().numpy().astype(np.float32)
+            t_pos_idx = self.t_pos_idx.detach().cpu().numpy().astype(np.int32)
+            if reduction > 1.0:
+                subdivide_iters = int(math.ceil(math.log(reduction) / math.log(2)))
+                print("Subdivide iters:", subdivide_iters)
+                v_pos, t_pos_idx = gpytoolbox.subdivide(
+                    v_pos,
+                    t_pos_idx,
+                    iters=subdivide_iters,
+                )
+                reduction = triangle_vertex_count / v_pos.shape[0]
+            # Simplify
+            points_out, faces_out, _, _ = gpytoolbox.decimate(
+                v_pos,
+                t_pos_idx,
+                face_ratio=reduction,
+            )
+            # Convert back to torch
+            self.v_pos = torch.from_numpy(points_out).to(self.v_pos)
+            self.t_pos_idx = torch.from_numpy(faces_out).to(self.t_pos_idx)
+            self._edges = None
+            triangle_average_edge_length_multiplier = None
+        edges = self.edges
+        if triangle_average_edge_length_multiplier is None:
+            h = None
+        else:
+            h = float(
+                torch.linalg.norm(
+                    self.v_pos[edges[:, 0]] - self.v_pos[edges[:, 1]], dim=1
+                )
+                .mean()
+                .item()
+                * triangle_average_edge_length_multiplier
+            )
+        # Convert to numpy
+        v_pos = self.v_pos.detach().cpu().numpy().astype(np.float64)
+        t_pos_idx = self.t_pos_idx.detach().cpu().numpy().astype(np.int32)
+        # Remesh
+        v_remesh, f_remesh = gpytoolbox.remesh_botsch(
+            v_pos,
+            t_pos_idx,
+            triangle_remesh_steps,
+            h,
+        )
+        # Convert back to torch
+        v_pos = torch.from_numpy(v_remesh).to(self.v_pos).contiguous()
+        t_pos_idx = torch.from_numpy(f_remesh).to(self.t_pos_idx).contiguous()
+        # Create new mesh
+        return Mesh(v_pos, t_pos_idx)
+    @torch.no_grad()
+    def unwrap_uv(
+        self,
+        island_padding: float = 0.02,
+    ) -> None:
+        uv, indices = self.unwrapper(
+            self.v_pos, self.v_nrm, self.t_pos_idx, island_padding
+        )
+        # Do store per vertex UVs.
+        # This means we need to duplicate some vertices at the seams
+        individual_vertices = self.v_pos[self.t_pos_idx].reshape(-1, 3)
+        individual_faces = torch.arange(
+            individual_vertices.shape[0],
+            device=individual_vertices.device,
+            dtype=self.t_pos_idx.dtype,
+        ).reshape(-1, 3)
+        uv_flat = uv[indices].reshape((-1, 2))
+        # uv_flat[:, 1] = 1 - uv_flat[:, 1]
+        self.v_pos = individual_vertices
+        self.t_pos_idx = individual_faces
+        self._v_tex = uv_flat
+        self._v_nrm = self._compute_vertex_normal()
+        self._v_tng = self._compute_vertex_tangent()
+    def _compute_edges(self):
+        # Compute edges
+        edges = torch.cat(
+            [
+                self.t_pos_idx[:, [0, 1]],
+                self.t_pos_idx[:, [1, 2]],
+                self.t_pos_idx[:, [2, 0]],
+            ],
+            dim=0,
+        )
+        edges = edges.sort()[0]
+        edges = torch.unique(edges, dim=0)
+        return edges
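
`_compute_vertex_normal` above accumulates unnormalized face normals onto each vertex, so larger triangles contribute more, and falls back to (0, 0, 1) where the accumulated normal is effectively zero. The following is a minimal pure-Python sketch of the same idea on plain tuples. The names `cross` and `vertex_normals` are illustrative only; the source does this with `scatter_add_` on tensors.

```python
import math


def cross(a, b):
    # Standard 3D cross product.
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def vertex_normals(verts, faces, eps=1e-20):
    # Accumulate unnormalized face normals (length = 2 * triangle area)
    # onto the three vertices of each face.
    acc = [[0.0, 0.0, 0.0] for _ in verts]
    for i0, i1, i2 in faces:
        v0, v1, v2 = verts[i0], verts[i1], verts[i2]
        e1 = tuple(b - a for a, b in zip(v0, v1))
        e2 = tuple(b - a for a, b in zip(v0, v2))
        n = cross(e1, e2)
        for idx in (i0, i1, i2):
            for k in range(3):
                acc[idx][k] += n[k]
    out = []
    for n in acc:
        sq = sum(c * c for c in n)
        if sq <= eps:
            # Degenerate or unreferenced vertex: use the default normal.
            out.append((0.0, 0.0, 1.0))
        else:
            length = math.sqrt(sq)
            out.append(tuple(c / length for c in n))
    return out


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
    faces = [(0, 1, 2)]
    print(vertex_normals(verts, faces))
```

The fourth vertex in the usage example is not referenced by any face, so it exercises the fallback branch.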

sf3d/models/network.py ADDED Viewed

	@@ -0,0 +1,213 @@

+from dataclasses import dataclass, field
+from typing import Callable, List, Optional
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from einops import rearrange
+from jaxtyping import Float
+from torch import Tensor
+from torch.amp import custom_bwd, custom_fwd
+from torch.autograd import Function
+from sf3d.models.utils import BaseModule, normalize
+from sf3d.utils import get_device
+def conditional_decorator(decorator_with_args, condition, *args, **kwargs):
+    def wrapper(fn):
+        if condition:
+            if len(kwargs) == 0:
+                return decorator_with_args
+            return decorator_with_args(*args, **kwargs)(fn)
+        else:
+            return fn
+    return wrapper
+class PixelShuffleUpsampleNetwork(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        in_channels: int = 1024
+        out_channels: int = 40
+        scale_factor: int = 4
+        conv_layers: int = 4
+        conv_kernel_size: int = 3
+    cfg: Config
+    def configure(self) -> None:
+        layers = []
+        output_channels = self.cfg.out_channels * self.cfg.scale_factor**2
+        in_channels = self.cfg.in_channels
+        for i in range(self.cfg.conv_layers):
+            cur_out_channels = (
+                in_channels if i != self.cfg.conv_layers - 1 else output_channels
+            )
+            layers.append(
+                nn.Conv2d(
+                    in_channels,
+                    cur_out_channels,
+                    self.cfg.conv_kernel_size,
+                    padding=(self.cfg.conv_kernel_size - 1) // 2,
+                )
+            )
+            if i != self.cfg.conv_layers - 1:
+                layers.append(nn.ReLU(inplace=True))
+        layers.append(nn.PixelShuffle(self.cfg.scale_factor))
+        self.upsample = nn.Sequential(*layers)
+    def forward(
+        self, triplanes: Float[Tensor, "B 3 Ci Hp Wp"]
+    ) -> Float[Tensor, "B 3 Co Hp2 Wp2"]:
+        return rearrange(
+            self.upsample(
+                rearrange(triplanes, "B Np Ci Hp Wp -> (B Np) Ci Hp Wp", Np=3)
+            ),
+            "(B Np) Co Hp Wp -> B Np Co Hp Wp",
+            Np=3,
+        )
+class _TruncExp(Function):  # pylint: disable=abstract-method
+    # Implementation from torch-ngp:
+    # https://github.com/ashawkey/torch-ngp/blob/93b08a0d4ec1cc6e69d85df7f0acdfb99603b628/activation.py
+    @staticmethod
+    @conditional_decorator(
+        custom_fwd,
+        "cuda" in get_device(),
+        cast_inputs=torch.float32,
+        device_type="cuda",
+    )
+    def forward(ctx, x):  # pylint: disable=arguments-differ
+        ctx.save_for_backward(x)
+        return torch.exp(x)
+    @staticmethod
+    @conditional_decorator(custom_bwd, "cuda" in get_device(), device_type="cuda")
+    def backward(ctx, g):  # pylint: disable=arguments-differ
+        x = ctx.saved_tensors[0]
+        return g * torch.exp(torch.clamp(x, max=15))
+trunc_exp = _TruncExp.apply
+def get_activation(name) -> Callable:
+    if name is None:
+        return lambda x: x
+    name = name.lower()
+    if name == "none" or name == "linear" or name == "identity":
+        return lambda x: x
+    elif name == "lin2srgb":
+        return lambda x: torch.where(
+            x > 0.0031308,
+            torch.pow(torch.clamp(x, min=0.0031308), 1.0 / 2.4) * 1.055 - 0.055,
+            12.92 * x,
+        ).clamp(0.0, 1.0)
+    elif name == "exp":
+        return lambda x: torch.exp(x)
+    elif name == "shifted_exp":
+        return lambda x: torch.exp(x - 1.0)
+    elif name == "trunc_exp":
+        return trunc_exp
+    elif name == "shifted_trunc_exp":
+        return lambda x: trunc_exp(x - 1.0)
+    elif name == "sigmoid":
+        return lambda x: torch.sigmoid(x)
+    elif name == "tanh":
+        return lambda x: torch.tanh(x)
+    elif name == "shifted_softplus":
+        return lambda x: F.softplus(x - 1.0)
+    elif name == "scale_-11_01":
+        return lambda x: x * 0.5 + 0.5
+    elif name == "negative":
+        return lambda x: -x
+    elif name == "normalize_channel_last":
+        return lambda x: normalize(x)
+    elif name == "normalize_channel_first":
+        return lambda x: normalize(x, dim=1)
+    else:
+        try:
+            return getattr(F, name)
+        except AttributeError:
+            raise ValueError(f"Unknown activation function: {name}")
+@dataclass
+class HeadSpec:
+    name: str
+    out_channels: int
+    n_hidden_layers: int
+    output_activation: Optional[str] = None
+    out_bias: float = 0.0
+class MaterialMLP(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        in_channels: int = 120
+        n_neurons: int = 64
+        activation: str = "silu"
+        heads: List[HeadSpec] = field(default_factory=lambda: [])
+    cfg: Config
+    def configure(self) -> None:
+        assert len(self.cfg.heads) > 0
+        heads = {}
+        for head in self.cfg.heads:
+            head_layers = []
+            for i in range(head.n_hidden_layers):
+                head_layers += [
+                    nn.Linear(
+                        self.cfg.in_channels if i == 0 else self.cfg.n_neurons,
+                        self.cfg.n_neurons,
+                    ),
+                    self.make_activation(self.cfg.activation),
+                ]
+            head_layers += [
+                nn.Linear(
+                    self.cfg.n_neurons,
+                    head.out_channels,
+                ),
+            ]
+            heads[head.name] = nn.Sequential(*head_layers)
+        self.heads = nn.ModuleDict(heads)
+    def make_activation(self, activation):
+        if activation == "relu":
+            return nn.ReLU(inplace=True)
+        elif activation == "silu":
+            return nn.SiLU(inplace=True)
+        else:
+            raise NotImplementedError
+    def keys(self):
+        return self.heads.keys()
+    def forward(
+        self, x, include: Optional[List] = None, exclude: Optional[List] = None
+    ):
+        if include is not None and exclude is not None:
+            raise ValueError("Cannot specify both include and exclude.")
+        if include is not None:
+            heads = [h for h in self.cfg.heads if h.name in include]
+        elif exclude is not None:
+            heads = [h for h in self.cfg.heads if h.name not in exclude]
+        else:
+            heads = self.cfg.heads
+        out = {
+            head.name: get_activation(head.output_activation)(
+                self.heads[head.name](x) + head.out_bias
+            )
+            for head in heads
+        }
+        return out
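
Among the activations returned by `get_activation` above, `lin2srgb` is the least self-explanatory: it applies the piecewise linear-to-sRGB transfer curve and clamps the result to [0, 1]. The following is a minimal scalar sketch in pure Python using the same constants as the source. The function below is an illustration, not the repository's tensor implementation, which uses `torch.where` and clamps the input of the power term.

```python
def lin2srgb(x: float) -> float:
    # Piecewise sRGB transfer curve with the same constants as the source:
    # a power segment above the threshold and a linear segment below it.
    if x > 0.0031308:
        y = 1.055 * x ** (1.0 / 2.4) - 0.055
    else:
        y = 12.92 * x
    # Clamp to the displayable range, as the source does with .clamp(0.0, 1.0).
    return min(max(y, 0.0), 1.0)


if __name__ == "__main__":
    for value in (-1.0, 0.0, 0.001, 0.2, 1.0, 4.0):
        print(value, "->", lin2srgb(value))
```

In the tensor version, clamping the base of the power to the threshold keeps the unused branch of `torch.where` from evaluating a fractional power of a negative or zero value, which would otherwise risk NaN values or gradients.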

sf3d/models/tokenizers/dinov2.py ADDED Viewed

	@@ -0,0 +1,1196 @@

+# coding=utf-8
+# Copyright 2023 Meta AI and The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""PyTorch DINOv2 model."""
+import collections.abc
+import math
+from dataclasses import dataclass
+from typing import Dict, List, Optional, Set, Tuple, Union
+import torch
+import torch.nn.functional as F
+import torch.utils.checkpoint
+from torch import nn
+from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+from transformers.activations import ACT2FN
+from transformers.modeling_outputs import (
+    BackboneOutput,
+    BaseModelOutput,
+    BaseModelOutputWithPooling,
+    ImageClassifierOutput,
+)
+from transformers.modeling_utils import PreTrainedModel
+from transformers.models.dinov2.configuration_dinov2 import Dinov2Config
+from transformers.pytorch_utils import (
+    find_pruneable_heads_and_indices,
+    prune_linear_layer,
+)
+from transformers.utils import (
+    add_code_sample_docstrings,
+    add_start_docstrings,
+    add_start_docstrings_to_model_forward,
+    logging,
+    replace_return_docstrings,
+)
+from transformers.utils.backbone_utils import BackboneMixin
+logger = logging.get_logger(__name__)
+# General docstring
+_CONFIG_FOR_DOC = "Dinov2Config"
+# Base docstring
+_CHECKPOINT_FOR_DOC = "facebook/dinov2-base"
+_EXPECTED_OUTPUT_SHAPE = [1, 257, 768]
+# Image classification docstring
+_IMAGE_CLASS_CHECKPOINT = "facebook/dinov2-base"
+DINOV2_PRETRAINED_MODEL_ARCHIVE_LIST = [
+    "facebook/dinov2-base",
+    # See all DINOv2 models at https://huggingface.co/models?filter=dinov2
+]
+class Dinov2Embeddings(nn.Module):
+    """
+    Construct the CLS token, mask token, position and patch embeddings.
+    """
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__()
+        self.cls_token = nn.Parameter(torch.randn(1, 1, config.hidden_size))
+        # register mask_token as a buffer, since it is not used in optimization;
+        # this avoids the need for find_unused_parameters_true
+        # self.mask_token = nn.Parameter(torch.zeros(1, config.hidden_size))
+        self.register_buffer("mask_token", torch.zeros(1, config.hidden_size))
+        self.patch_embeddings = Dinov2PatchEmbeddings(config)
+        num_patches = self.patch_embeddings.num_patches
+        self.position_embeddings = nn.Parameter(
+            torch.randn(1, num_patches + 1, config.hidden_size)
+        )
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+        self.config = config
+    def interpolate_pos_encoding(
+        self, embeddings: torch.Tensor, height: int, width: int
+    ) -> torch.Tensor:
+        """
+        This method interpolates the pre-trained position encodings so that the model can be used on
+        higher-resolution images.
+        Source:
+        https://github.com/facebookresearch/dino/blob/de9ee3df6cf39fac952ab558447af1fa1365362a/vision_transformer.py#L174
+        """
+        num_patches = embeddings.shape[1] - 1
+        num_positions = self.position_embeddings.shape[1] - 1
+        if num_patches == num_positions and height == width:
+            return self.position_embeddings
+        class_pos_embed = self.position_embeddings[:, 0]
+        patch_pos_embed = self.position_embeddings[:, 1:]
+        dim = embeddings.shape[-1]
+        height = height // self.config.patch_size
+        width = width // self.config.patch_size
+        # we add a small number to avoid floating point error in the interpolation
+        # see discussion at https://github.com/facebookresearch/dino/issues/8
+        height, width = height + 0.1, width + 0.1
+        patch_pos_embed = patch_pos_embed.reshape(
+            1, int(math.sqrt(num_positions)), int(math.sqrt(num_positions)), dim
+        )
+        patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
+        patch_pos_embed = nn.functional.interpolate(
+            patch_pos_embed,
+            scale_factor=(
+                height / math.sqrt(num_positions),
+                width / math.sqrt(num_positions),
+            ),
+            mode="bicubic",
+            align_corners=False,
+        )
+        if (
+            int(height) != patch_pos_embed.shape[-2]
+            or int(width) != patch_pos_embed.shape[-1]
+        ):
+            raise ValueError(
+                "Width or height does not match with the interpolated position embeddings"
+            )
+        patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)
+        return torch.cat((class_pos_embed.unsqueeze(0), patch_pos_embed), dim=1)
+    def forward(
+        self,
+        pixel_values: torch.Tensor,
+        bool_masked_pos: Optional[torch.Tensor] = None,
+    ) -> torch.Tensor:
+        batch_size, _, height, width = pixel_values.shape
+        patch_embeddings = self.patch_embeddings(pixel_values)
+        embeddings = patch_embeddings
+        if bool_masked_pos is not None:
+            embeddings = torch.where(
+                bool_masked_pos.unsqueeze(-1),
+                self.mask_token.to(embeddings.dtype).unsqueeze(0),
+                embeddings,
+            )
+        # add the [CLS] token to the embedded patch tokens
+        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
+        embeddings = torch.cat((cls_tokens, embeddings), dim=1)
+        # add positional encoding to each token
+        embeddings = embeddings + self.interpolate_pos_encoding(
+            embeddings, height, width
+        )
+        embeddings = self.dropout(embeddings)
+        return embeddings
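Reviewer note: the `+ 0.1` offset in `interpolate_pos_encoding` can be checked with plain arithmetic. The sketch below is illustrative and not part of the commit. It assumes that `nn.functional.interpolate` derives the output size as `floor(input_size * scale_factor)`. The helper name `interpolated_grid_size`, the 16x16 pre-trained grid and the 448x224 input are all assumptions chosen for the example.

```python
import math


def interpolated_grid_size(num_positions, height, width, patch_size):
    """Return the (rows, cols) of the position grid after interpolation.

    Assumes output_size = floor(input_size * scale_factor).
    """
    side = math.sqrt(num_positions)  # the pre-trained grid is square
    # Same steps as the method above: pixels -> patches, then nudge by 0.1
    rows = height // patch_size + 0.1
    cols = width // patch_size + 0.1
    scale = (rows / side, cols / side)
    return math.floor(side * scale[0]), math.floor(side * scale[1])


# Hypothetical case: 16x16 pre-trained grid, 448x224 image, 14-pixel patches
print(interpolated_grid_size(256, 448, 224, 14))  # (32, 16)
```

The offset keeps the product safely above the integer target, so a rounding error in the scale factor cannot drop the grid to one row or column too few.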
+class Dinov2PatchEmbeddings(nn.Module):
+    """
+    This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial
+    `hidden_states` (patch embeddings) of shape `(batch_size, seq_length, hidden_size)` to be consumed by a
+    Transformer.
+    """
+    def __init__(self, config):
+        super().__init__()
+        image_size, patch_size = config.image_size, config.patch_size
+        num_channels, hidden_size = config.num_channels, config.hidden_size
+        image_size = (
+            image_size
+            if isinstance(image_size, collections.abc.Iterable)
+            else (image_size, image_size)
+        )
+        patch_size = (
+            patch_size
+            if isinstance(patch_size, collections.abc.Iterable)
+            else (patch_size, patch_size)
+        )
+        num_patches = (image_size[1] // patch_size[1]) * (
+            image_size[0] // patch_size[0]
+        )
+        self.image_size = image_size
+        self.patch_size = patch_size
+        self.num_channels = num_channels
+        self.num_patches = num_patches
+        self.projection = nn.Conv2d(
+            num_channels, hidden_size, kernel_size=patch_size, stride=patch_size
+        )
+    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
+        # Channel-count validation is disabled here; the original check is kept for reference:
+        # num_channels = pixel_values.shape[1]
+        # if num_channels != self.num_channels:
+        #     raise ValueError(
+        #         "Make sure that the channel dimension of the pixel values matches the one set in the configuration."
+        #         f" Expected {self.num_channels} but got {num_channels}."
+        #     )
+        embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
+        return embeddings
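Reviewer note: the patch count computed in `Dinov2PatchEmbeddings.__init__` is simple integer arithmetic. The sketch below is illustrative and not part of the commit. The helper name `num_patches` is hypothetical, and the 224-pixel image and 14-pixel patch are assumed values. Under those assumptions the token count matches the middle entry of `_EXPECTED_OUTPUT_SHAPE`.

```python
from collections.abc import Iterable


def num_patches(image_size, patch_size):
    """Count non-overlapping patches, accepting ints or (height, width) pairs."""
    if not isinstance(image_size, Iterable):
        image_size = (image_size, image_size)
    if not isinstance(patch_size, Iterable):
        patch_size = (patch_size, patch_size)
    return (image_size[1] // patch_size[1]) * (image_size[0] // patch_size[0])


# Assumed values: 224x224 input, 14x14 patches
patches = num_patches(224, 14)
print(patches, patches + 1)  # 256 patches, 257 tokens once [CLS] is prepended
```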
+# Copied from transformers.models.vit.modeling_vit.ViTSelfAttention with ViT->Dinov2
+class Dinov2SelfAttention(nn.Module):
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__()
+        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(
+            config, "embedding_size"
+        ):
+            raise ValueError(
+                f"The hidden size {config.hidden_size} is not a multiple of the number of attention "
+                f"heads {config.num_attention_heads}."
+            )
+        self.num_attention_heads = config.num_attention_heads
+        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
+        self.all_head_size = self.num_attention_heads * self.attention_head_size
+        self.attention_probs_dropout_prob = config.attention_probs_dropout_prob
+        self.query = nn.Linear(
+            config.hidden_size, self.all_head_size, bias=config.qkv_bias
+        )
+        self.key = nn.Linear(
+            config.hidden_size, self.all_head_size, bias=config.qkv_bias
+        )
+        self.value = nn.Linear(
+            config.hidden_size, self.all_head_size, bias=config.qkv_bias
+        )
+        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
+    def transpose_for_scores(self, x: torch.Tensor) -> torch.Tensor:
+        new_x_shape = x.size()[:-1] + (
+            self.num_attention_heads,
+            self.attention_head_size,
+        )
+        x = x.view(new_x_shape)
+        return x.permute(0, 2, 1, 3)
+    def forward(
+        self,
+        hidden_states,
+        head_mask: Optional[torch.Tensor] = None,
+        output_attentions: bool = False,
+    ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
+        mixed_query_layer = self.query(hidden_states)
+        if hasattr(F, "scaled_dot_product_attention"):
+            assert head_mask is None and not output_attentions
+            new_size = hidden_states.size()[:-1] + (
+                self.num_attention_heads,
+                self.attention_head_size,
+            )
+            key_layer = self.key(hidden_states).reshape(new_size).transpose(1, 2)
+            value_layer = self.value(hidden_states).reshape(new_size).transpose(1, 2)
+            query_layer = mixed_query_layer.reshape(new_size).transpose(1, 2)
+            context_layer = F.scaled_dot_product_attention(
+                query_layer,
+                key_layer,
+                value_layer,
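+                # scaled_dot_product_attention applies dropout whenever dropout_p > 0,
+                # so disable it explicitly outside of training
+                dropout_p=self.attention_probs_dropout_prob if self.training else 0.0,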
+                is_causal=False,
+            )
+            context_layer = context_layer.transpose(1, 2).reshape(
+                *hidden_states.size()[:-1], -1
+            )
+        else:
+            key_layer = self.transpose_for_scores(self.key(hidden_states))
+            value_layer = self.transpose_for_scores(self.value(hidden_states))
+            query_layer = self.transpose_for_scores(mixed_query_layer)
+            # Take the dot product between "query" and "key" to get the raw attention scores.
+            attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
+            attention_scores = attention_scores / math.sqrt(self.attention_head_size)
+            # Normalize the attention scores to probabilities.
+            attention_probs = nn.functional.softmax(attention_scores, dim=-1)
+            # This is actually dropping out entire tokens to attend to, which might
+            # seem a bit unusual, but is taken from the original Transformer paper.
+            attention_probs = self.dropout(attention_probs)
+            # Mask heads if we want to
+            if head_mask is not None:
+                attention_probs = attention_probs * head_mask
+            context_layer = torch.matmul(attention_probs, value_layer)
+            context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
+            new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
+            context_layer = context_layer.view(new_context_layer_shape)
+        outputs = (
+            (context_layer, attention_probs) if output_attentions else (context_layer,)
+        )
+        return outputs
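Reviewer note: both branches of `Dinov2SelfAttention.forward` evaluate softmax(QK^T / sqrt(d)) V per head. The sketch below is illustrative and not part of the commit. It spells the formula out for a single head in pure Python, without dropout or head masking. The function name `single_head_attention` is hypothetical.

```python
import math


def single_head_attention(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v for lists of row vectors (one head)."""
    d = len(q[0])
    out = []
    for qi in q:
        # Raw attention scores, scaled by sqrt of the head size
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax over the key dimension
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Weighted sum of the value rows
        out.append([sum(p * vj[c] for p, vj in zip(probs, v)) for c in range(len(v[0]))])
    return out


# Two identical keys receive equal weight, so the output is the mean of the values
print(single_head_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]]))
```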
+# Copied from transformers.models.vit.modeling_vit.ViTSelfOutput with ViT->Dinov2
+class Dinov2SelfOutput(nn.Module):
+    """
+    The residual connection is defined in Dinov2Layer instead of here (as is the case with other models), due to the
+    layernorm applied before each block.
+    """
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__()
+        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+    def forward(
+        self, hidden_states: torch.Tensor, input_tensor: torch.Tensor
+    ) -> torch.Tensor:
+        hidden_states = self.dense(hidden_states)
+        hidden_states = self.dropout(hidden_states)
+        return hidden_states
+# Copied from transformers.models.vit.modeling_vit.ViTAttention with ViT->Dinov2
+class Dinov2Attention(nn.Module):
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__()
+        self.attention = Dinov2SelfAttention(config)
+        self.output = Dinov2SelfOutput(config)
+        self.pruned_heads = set()
+    def prune_heads(self, heads: Set[int]) -> None:
+        if len(heads) == 0:
+            return
+        heads, index = find_pruneable_heads_and_indices(
+            heads,
+            self.attention.num_attention_heads,
+            self.attention.attention_head_size,
+            self.pruned_heads,
+        )
+        # Prune linear layers
+        self.attention.query = prune_linear_layer(self.attention.query, index)
+        self.attention.key = prune_linear_layer(self.attention.key, index)
+        self.attention.value = prune_linear_layer(self.attention.value, index)
+        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)
+        # Update hyper params and store pruned heads
+        self.attention.num_attention_heads = self.attention.num_attention_heads - len(
+            heads
+        )
+        self.attention.all_head_size = (
+            self.attention.attention_head_size * self.attention.num_attention_heads
+        )
+        self.pruned_heads = self.pruned_heads.union(heads)
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        head_mask: Optional[torch.Tensor] = None,
+        output_attentions: bool = False,
+    ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
+        self_outputs = self.attention(hidden_states, head_mask, output_attentions)
+        attention_output = self.output(self_outputs[0], hidden_states)
+        outputs = (attention_output,) + self_outputs[
+            1:
+        ]  # add attentions if we output them
+        return outputs
+class Dinov2LayerScale(nn.Module):
+    def __init__(self, config) -> None:
+        super().__init__()
+        self.lambda1 = nn.Parameter(
+            config.layerscale_value * torch.ones(config.hidden_size)
+        )
+    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
+        return hidden_state * self.lambda1
+# Copied from transformers.models.beit.modeling_beit.drop_path
+def drop_path(
+    input: torch.Tensor, drop_prob: float = 0.0, training: bool = False
+) -> torch.Tensor:
+    """
+    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
+    Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks,
+    however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
+    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the
+    layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the
+    argument.
+    """
+    if drop_prob == 0.0 or not training:
+        return input
+    keep_prob = 1 - drop_prob
+    shape = (input.shape[0],) + (1,) * (
+        input.ndim - 1
+    )  # work with diff dim tensors, not just 2D ConvNets
+    random_tensor = keep_prob + torch.rand(
+        shape, dtype=input.dtype, device=input.device
+    )
+    random_tensor.floor_()  # binarize
+    output = input.div(keep_prob) * random_tensor
+    return output
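Reviewer note: the binarize-and-rescale trick in `drop_path` can be checked without tensors. The sketch below is illustrative and not part of the commit. It uses one scalar per sample, and the names `drop_path_mask` and `drop_path_scalar` are hypothetical. `floor(keep_prob + u)` with `u` uniform in [0, 1) equals 1 with probability `keep_prob`, and dividing survivors by `keep_prob` keeps the expected value unchanged.

```python
import math
import random


def drop_path_mask(batch_size, drop_prob, rng):
    """Per-sample keep mask: floor(keep_prob + U[0, 1)) is 1 with probability keep_prob."""
    keep_prob = 1.0 - drop_prob
    return [math.floor(keep_prob + rng.random()) for _ in range(batch_size)]


def drop_path_scalar(values, drop_prob, rng):
    """Zero each sample with probability drop_prob and rescale survivors by 1 / keep_prob."""
    keep_prob = 1.0 - drop_prob
    mask = drop_path_mask(len(values), drop_prob, rng)
    return [v / keep_prob * m for v, m in zip(values, mask)]


rng = random.Random(0)
samples = drop_path_scalar([1.0] * 100_000, drop_prob=0.2, rng=rng)
print(sum(samples) / len(samples))  # close to 1.0: the expectation is preserved
```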
+# Copied from transformers.models.beit.modeling_beit.BeitDropPath
+class Dinov2DropPath(nn.Module):
+    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""
+    def __init__(self, drop_prob: Optional[float] = None) -> None:
+        super().__init__()
+        self.drop_prob = drop_prob
+    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+        return drop_path(hidden_states, self.drop_prob, self.training)
+    def extra_repr(self) -> str:
+        return "p={}".format(self.drop_prob)
+class Dinov2MLP(nn.Module):
+    def __init__(self, config) -> None:
+        super().__init__()
+        in_features = out_features = config.hidden_size
+        hidden_features = int(config.hidden_size * config.mlp_ratio)
+        self.fc1 = nn.Linear(in_features, hidden_features, bias=True)
+        if isinstance(config.hidden_act, str):
+            self.activation = ACT2FN[config.hidden_act]
+        else:
+            self.activation = config.hidden_act
+        self.fc2 = nn.Linear(hidden_features, out_features, bias=True)
+    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
+        hidden_state = self.fc1(hidden_state)
+        hidden_state = self.activation(hidden_state)
+        hidden_state = self.fc2(hidden_state)
+        return hidden_state
+class Dinov2SwiGLUFFN(nn.Module):
+    def __init__(self, config) -> None:
+        super().__init__()
+        in_features = out_features = config.hidden_size
+        hidden_features = int(config.hidden_size * config.mlp_ratio)
+        hidden_features = (int(hidden_features * 2 / 3) + 7) // 8 * 8
+        self.weights_in = nn.Linear(in_features, 2 * hidden_features, bias=True)
+        self.weights_out = nn.Linear(hidden_features, out_features, bias=True)
+    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
+        hidden_state = self.weights_in(hidden_state)
+        x1, x2 = hidden_state.chunk(2, dim=-1)
+        hidden = nn.functional.silu(x1) * x2
+        return self.weights_out(hidden)
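Reviewer note: the hidden width used by `Dinov2SwiGLUFFN` is two thirds of the plain MLP width, rounded up to a multiple of 8. The sketch below is illustrative and not part of the commit. It reproduces that arithmetic, and the helper name `swiglu_hidden_features` and the sample sizes are assumptions.

```python
def swiglu_hidden_features(hidden_size, mlp_ratio):
    """Mirror the width computation: 2/3 of the MLP width, rounded up to a multiple of 8."""
    hidden_features = int(hidden_size * mlp_ratio)
    return (int(hidden_features * 2 / 3) + 7) // 8 * 8


print(swiglu_hidden_features(1536, 4))  # 4096
print(swiglu_hidden_features(100, 4))   # 272 (266 rounded up to the next multiple of 8)
```

`weights_in` projects to twice this width because its output is split into the gate (`x1`) and the value (`x2`) halves.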
+class Dinov2Layer(nn.Module):
+    """This corresponds to the Block class in the original implementation."""
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__()
+        self.norm1 = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+        self.norm1_modulation = None
+        self.attention = Dinov2Attention(config)
+        self.layer_scale1 = Dinov2LayerScale(config)
+        self.drop_path1 = (
+            Dinov2DropPath(config.drop_path_rate)
+            if config.drop_path_rate > 0.0
+            else nn.Identity()
+        )
+        self.norm2 = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+        self.norm2_modulation = None
+        if config.use_swiglu_ffn:
+            self.mlp = Dinov2SwiGLUFFN(config)
+        else:
+            self.mlp = Dinov2MLP(config)
+        self.layer_scale2 = Dinov2LayerScale(config)
+        self.drop_path2 = (
+            Dinov2DropPath(config.drop_path_rate)
+            if config.drop_path_rate > 0.0
+            else nn.Identity()
+        )
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        head_mask: Optional[torch.Tensor] = None,
+        modulation_cond: Optional[torch.Tensor] = None,
+        output_attentions: bool = False,
+    ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
+        hidden_states_norm = self.norm1(hidden_states)
+        if self.norm1_modulation is not None:
+            assert modulation_cond is not None
+            hidden_states_norm = self.norm1_modulation(
+                hidden_states_norm, modulation_cond
+            )
+        self_attention_outputs = self.attention(
+            hidden_states_norm,  # in Dinov2, layernorm is applied before self-attention
+            head_mask,
+            output_attentions=output_attentions,
+        )
+        attention_output = self_attention_outputs[0]
+        attention_output = self.layer_scale1(attention_output)
+        outputs = self_attention_outputs[
+            1:
+        ]  # add self attentions if we output attention weights
+        # first residual connection
+        hidden_states = attention_output + hidden_states
+        # in Dinov2, layernorm is also applied after self-attention
+        layer_output = self.norm2(hidden_states)
+        if self.norm2_modulation is not None:
+            assert modulation_cond is not None
+            layer_output = self.norm2_modulation(layer_output, modulation_cond)
+        layer_output = self.mlp(layer_output)
+        layer_output = self.layer_scale2(layer_output)
+        # second residual connection
+        layer_output = layer_output + hidden_states
+        outputs = (layer_output,) + outputs
+        return outputs
+    def register_ada_norm_modulation(self, norm1_mod: nn.Module, norm2_mod: nn.Module):
+        self.norm1_modulation = norm1_mod
+        self.norm2_modulation = norm2_mod
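Reviewer note: `Dinov2Layer.forward` applies two pre-norm residual branches, each of the form `x + layer_scale * sublayer(norm(x))`. The sketch below is illustrative and not part of the commit. It is a toy version on Python lists: it uses a scalar `layer_scale` instead of the per-channel `lambda1` vector and omits drop path and the optional modulation. All names in it are hypothetical.

```python
def residual_branch(x, norm, sublayer, layer_scale):
    """Compute x + layer_scale * sublayer(norm(x)) element-wise on a list."""
    branch = sublayer(norm(x))
    return [xi + layer_scale * bi for xi, bi in zip(x, branch)]


def toy_layer(x, norm1, attention, scale1, norm2, mlp, scale2):
    """Chain the attention branch and the MLP branch, as in a pre-norm block."""
    hidden = residual_branch(x, norm1, attention, scale1)  # first residual connection
    return residual_branch(hidden, norm2, mlp, scale2)  # second residual connection


def identity(v):
    return v


def double(v):
    return [2 * a for a in v]


print(toy_layer([1.0, 2.0], identity, double, 0.5, identity, double, 0.5))  # [4.0, 8.0]
```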
+# Copied from transformers.models.vit.modeling_vit.ViTEncoder with ViT->Dinov2
+class Dinov2Encoder(nn.Module):
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__()
+        self.config = config
+        self.layer = nn.ModuleList(
+            [Dinov2Layer(config) for _ in range(config.num_hidden_layers)]
+        )
+        self.gradient_checkpointing = False
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        head_mask: Optional[torch.Tensor] = None,
+        modulation_cond: Optional[torch.Tensor] = None,
+        output_attentions: bool = False,
+        output_hidden_states: bool = False,
+        return_dict: bool = True,
+    ) -> Union[tuple, BaseModelOutput]:
+        all_hidden_states = () if output_hidden_states else None
+        all_self_attentions = () if output_attentions else None
+        for i, layer_module in enumerate(self.layer):
+            if output_hidden_states:
+                all_hidden_states = all_hidden_states + (hidden_states,)
+            layer_head_mask = head_mask[i] if head_mask is not None else None
+            if self.gradient_checkpointing and self.training:
+                def create_custom_forward(module):
+                    def custom_forward(*inputs):
+                        return module(*inputs, output_attentions)
+                    return custom_forward
+                layer_outputs = torch.utils.checkpoint.checkpoint(
+                    create_custom_forward(layer_module),
+                    hidden_states,
+                    layer_head_mask,
+                    modulation_cond,
+                    use_reentrant=False,
+                )
+            else:
+                layer_outputs = layer_module(
+                    hidden_states, layer_head_mask, modulation_cond, output_attentions
+                )
+            hidden_states = layer_outputs[0]
+            if output_attentions:
+                all_self_attentions = all_self_attentions + (layer_outputs[1],)
+        if output_hidden_states:
+            all_hidden_states = all_hidden_states + (hidden_states,)
+        if not return_dict:
+            return tuple(
+                v
+                for v in [hidden_states, all_hidden_states, all_self_attentions]
+                if v is not None
+            )
+        return BaseModelOutput(
+            last_hidden_state=hidden_states,
+            hidden_states=all_hidden_states,
+            attentions=all_self_attentions,
+        )
+class Dinov2PreTrainedModel(PreTrainedModel):
+    """
+    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
+    models.
+    """
+    config_class = Dinov2Config
+    base_model_prefix = "dinov2"
+    main_input_name = "pixel_values"
+    supports_gradient_checkpointing = True
+    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
+        """Initialize the weights"""
+        if isinstance(module, (nn.Linear, nn.Conv2d)):
+            # Upcast the input in `fp32` and cast it back to desired `dtype` to avoid
+            # `trunc_normal_cpu` not implemented in `half` issues
+            module.weight.data = nn.init.trunc_normal_(
+                module.weight.data.to(torch.float32),
+                mean=0.0,
+                std=self.config.initializer_range,
+            ).to(module.weight.dtype)
+            if module.bias is not None:
+                module.bias.data.zero_()
+        elif isinstance(module, nn.LayerNorm):
+            module.bias.data.zero_()
+            module.weight.data.fill_(1.0)
+        elif isinstance(module, Dinov2Embeddings):
+            module.position_embeddings.data = nn.init.trunc_normal_(
+                module.position_embeddings.data.to(torch.float32),
+                mean=0.0,
+                std=self.config.initializer_range,
+            ).to(module.position_embeddings.dtype)
+            module.cls_token.data = nn.init.trunc_normal_(
+                module.cls_token.data.to(torch.float32),
+                mean=0.0,
+                std=self.config.initializer_range,
+            ).to(module.cls_token.dtype)
+    def _set_gradient_checkpointing(
+        self, module: Dinov2Encoder, value: bool = False
+    ) -> None:
+        if isinstance(module, Dinov2Encoder):
+            module.gradient_checkpointing = value
+DINOV2_START_DOCSTRING = r"""
+    This model is a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. Use it
+    as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and
+    behavior.
+    Parameters:
+        config ([`Dinov2Config`]): Model configuration class with all the parameters of the model.
+            Initializing with a config file does not load the weights associated with the model, only the
+            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
+"""
+DINOV2_BASE_INPUTS_DOCSTRING = r"""
+    Args:
+        pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
+            Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
+            [`BitImageProcessor.preprocess`] for details.
+        bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, sequence_length)`):
+            Boolean masked positions. Indicates which patches are masked (1) and which aren't (0). Only relevant for
+            pre-training.
+        head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
+            Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+        output_attentions (`bool`, *optional*):
+            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+            tensors for more detail.
+        output_hidden_states (`bool`, *optional*):
+            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+            more detail.
+        return_dict (`bool`, *optional*):
+            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+"""
+DINOV2_INPUTS_DOCSTRING = r"""
+    Args:
+        pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
+            Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
+            [`BitImageProcessor.preprocess`] for details.
+        head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
+            Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+        output_attentions (`bool`, *optional*):
+            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+            tensors for more detail.
+        output_hidden_states (`bool`, *optional*):
+            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+            more detail.
+        return_dict (`bool`, *optional*):
+            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+"""
+@dataclass
+class CustomBaseModelOutputWithPooling(BaseModelOutputWithPooling):
+    patch_embeddings: Optional[torch.FloatTensor] = None
+@add_start_docstrings(
+    "The bare DINOv2 Model transformer outputting raw hidden-states without any specific head on top.",
+    DINOV2_START_DOCSTRING,
+)
+class Dinov2Model(Dinov2PreTrainedModel):
+    def __init__(self, config: Dinov2Config):
+        super().__init__(config)
+        self.config = config
+        self.embeddings = Dinov2Embeddings(config)
+        self.encoder = Dinov2Encoder(config)
+        self.layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+        # Initialize weights and apply final processing
+        self.post_init()
+    def get_input_embeddings(self) -> Dinov2PatchEmbeddings:
+        return self.embeddings.patch_embeddings
+    def expand_input_channels(self, extra_input_channels: int) -> None:
+        if extra_input_channels == 0:
+            return
+        conv_old = self.embeddings.patch_embeddings.projection
+        conv_new = nn.Conv2d(
+            self.config.num_channels + extra_input_channels,
+            self.config.hidden_size,
+            kernel_size=self.config.patch_size,
+            stride=self.config.patch_size,
+        ).to(self.device)
+        with torch.no_grad():
+            # Copy the pre-trained weights into the original input channels
+            conv_new.weight[:, : self.config.num_channels] = conv_old.weight
+            conv_new.bias = conv_old.bias
+        self.embeddings.patch_embeddings.projection = conv_new
+        del conv_old
+    def _prune_heads(self, heads_to_prune: Dict[int, List[int]]) -> None:
+        """
+        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer}. See base
+        class PreTrainedModel.
+        """
+        for layer, heads in heads_to_prune.items():
+            self.encoder.layer[layer].attention.prune_heads(heads)
+    @add_start_docstrings_to_model_forward(DINOV2_BASE_INPUTS_DOCSTRING)
+    @add_code_sample_docstrings(
+        checkpoint=_CHECKPOINT_FOR_DOC,
+        output_type=BaseModelOutputWithPooling,
+        config_class=_CONFIG_FOR_DOC,
+        modality="vision",
+        expected_output=_EXPECTED_OUTPUT_SHAPE,
+    )
+    def forward(
+        self,
+        pixel_values: Optional[torch.Tensor] = None,
+        bool_masked_pos: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        modulation_cond: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+    ) -> Union[Tuple, BaseModelOutputWithPooling]:
+        output_attentions = (
+            output_attentions
+            if output_attentions is not None
+            else self.config.output_attentions
+        )
+        output_hidden_states = (
+            output_hidden_states
+            if output_hidden_states is not None
+            else self.config.output_hidden_states
+        )
+        return_dict = (
+            return_dict if return_dict is not None else self.config.use_return_dict
+        )
+        if pixel_values is None:
+            raise ValueError("You have to specify pixel_values")
+        # Prepare head mask if needed
+        # 1.0 in head_mask indicates we keep the head
+        # attention_probs has shape bsz x n_heads x N x N
+        # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
+        # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
+        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
+        embedding_output = self.embeddings(
+            pixel_values, bool_masked_pos=bool_masked_pos
+        )
+        encoder_outputs = self.encoder(
+            embedding_output,
+            head_mask=head_mask,
+            modulation_cond=modulation_cond,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+        )
+        sequence_output = encoder_outputs[0]
+        sequence_output = self.layernorm(sequence_output)
+        pooled_output = sequence_output[:, 0, :]
+        if not return_dict:
+            head_outputs = (sequence_output, pooled_output)
+            return head_outputs + encoder_outputs[1:]
+        return CustomBaseModelOutputWithPooling(
+            last_hidden_state=sequence_output,
+            pooler_output=pooled_output,
+            hidden_states=encoder_outputs.hidden_states,
+            attentions=encoder_outputs.attentions,
+            patch_embeddings=embedding_output,
+        )
+    def set_gradient_checkpointing(self, value: bool = False) -> None:
+        self._set_gradient_checkpointing(self.encoder, value)
+@add_start_docstrings(
+    """
+    Dinov2 Model transformer with an image classification head on top (a linear layer on top of the final hidden state
+    of the [CLS] token) e.g. for ImageNet.
+    """,
+    DINOV2_START_DOCSTRING,
+)
+class Dinov2ForImageClassification(Dinov2PreTrainedModel):
+    def __init__(self, config: Dinov2Config) -> None:
+        super().__init__(config)
+        self.num_labels = config.num_labels
+        self.dinov2 = Dinov2Model(config)
+        # Classifier head
+        self.classifier = (
+            nn.Linear(config.hidden_size * 2, config.num_labels)
+            if config.num_labels > 0
+            else nn.Identity()
+        )
+        # Initialize weights and apply final processing
+        self.post_init()
+    @add_start_docstrings_to_model_forward(DINOV2_INPUTS_DOCSTRING)
+    @add_code_sample_docstrings(
+        checkpoint=_IMAGE_CLASS_CHECKPOINT,
+        output_type=ImageClassifierOutput,
+        config_class=_CONFIG_FOR_DOC,
+    )
+    def forward(
+        self,
+        pixel_values: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+    ) -> Union[tuple, ImageClassifierOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
+            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
+            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
+        """
+        return_dict = (
+            return_dict if return_dict is not None else self.config.use_return_dict
+        )
+        outputs = self.dinov2(
+            pixel_values,
+            head_mask=head_mask,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+        )
+        sequence_output = outputs[0]  # batch_size, sequence_length, hidden_size
+        cls_token = sequence_output[:, 0]
+        patch_tokens = sequence_output[:, 1:]
+        linear_input = torch.cat([cls_token, patch_tokens.mean(dim=1)], dim=1)
+        logits = self.classifier(linear_input)
+        loss = None
+        if labels is not None:
+            # move labels to correct device to enable model parallelism
+            labels = labels.to(logits.device)
+            if self.config.problem_type is None:
+                if self.num_labels == 1:
+                    self.config.problem_type = "regression"
+                elif self.num_labels > 1 and (
+                    labels.dtype == torch.long or labels.dtype == torch.int
+                ):
+                    self.config.problem_type = "single_label_classification"
+                else:
+                    self.config.problem_type = "multi_label_classification"
+            if self.config.problem_type == "regression":
+                loss_fct = MSELoss()
+                if self.num_labels == 1:
+                    loss = loss_fct(logits.squeeze(), labels.squeeze())
+                else:
+                    loss = loss_fct(logits, labels)
+            elif self.config.problem_type == "single_label_classification":
+                loss_fct = CrossEntropyLoss()
+                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+            elif self.config.problem_type == "multi_label_classification":
+                loss_fct = BCEWithLogitsLoss()
+                loss = loss_fct(logits, labels)
+        if not return_dict:
+            output = (logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+        return ImageClassifierOutput(
+            loss=loss,
+            logits=logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+@add_start_docstrings(
+    """
+    Dinov2 backbone, to be used with frameworks like DETR and MaskFormer.
+    """,
+    DINOV2_START_DOCSTRING,
+)
+class Dinov2Backbone(Dinov2PreTrainedModel, BackboneMixin):
+    def __init__(self, config):
+        super().__init__(config)
+        super()._init_backbone(config)
+        self.num_features = [
+            config.hidden_size for _ in range(config.num_hidden_layers + 1)
+        ]
+        self.embeddings = Dinov2Embeddings(config)
+        self.encoder = Dinov2Encoder(config)
+        self.layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+        # Initialize weights and apply final processing
+        self.post_init()
+    def get_input_embeddings(self) -> Dinov2PatchEmbeddings:
+        return self.embeddings.patch_embeddings
+    @add_start_docstrings_to_model_forward(DINOV2_INPUTS_DOCSTRING)
+    @replace_return_docstrings(output_type=BackboneOutput, config_class=_CONFIG_FOR_DOC)
+    def forward(
+        self,
+        pixel_values: torch.Tensor,
+        output_hidden_states: Optional[bool] = None,
+        output_attentions: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+    ) -> BackboneOutput:
+        """
+        Returns:
+        Examples:
+        ```python
+        >>> from transformers import AutoImageProcessor, AutoBackbone
+        >>> import torch
+        >>> from PIL import Image
+        >>> import requests
+        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+        >>> image = Image.open(requests.get(url, stream=True).raw)
+        >>> processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
+        >>> model = AutoBackbone.from_pretrained(
+        ...     "facebook/dinov2-base", out_features=["stage2", "stage5", "stage8", "stage11"]
+        ... )
+        >>> inputs = processor(image, return_tensors="pt")
+        >>> outputs = model(**inputs)
+        >>> feature_maps = outputs.feature_maps
+        >>> list(feature_maps[-1].shape)
+        [1, 768, 16, 16]
+        ```"""
+        return_dict = (
+            return_dict if return_dict is not None else self.config.use_return_dict
+        )
+        output_hidden_states = (
+            output_hidden_states
+            if output_hidden_states is not None
+            else self.config.output_hidden_states
+        )
+        output_attentions = (
+            output_attentions
+            if output_attentions is not None
+            else self.config.output_attentions
+        )
+        embedding_output = self.embeddings(pixel_values)
+        outputs = self.encoder(
+            embedding_output,
+            output_hidden_states=True,
+            output_attentions=output_attentions,
+            return_dict=return_dict,
+        )
+        hidden_states = outputs.hidden_states if return_dict else outputs[1]
+        feature_maps = ()
+        for stage, hidden_state in zip(self.stage_names, hidden_states):
+            if stage in self.out_features:
+                if self.config.apply_layernorm:
+                    hidden_state = self.layernorm(hidden_state)
+                if self.config.reshape_hidden_states:
+                    batch_size, _, height, width = pixel_values.shape
+                    patch_size = self.config.patch_size
+                    # patch tokens are laid out row-major, so the grid is (height, width)
+                    hidden_state = hidden_state[:, 1:, :].reshape(
+                        batch_size, height // patch_size, width // patch_size, -1
+                    )
+                    hidden_state = hidden_state.permute(0, 3, 1, 2).contiguous()
+                feature_maps += (hidden_state,)
+        if not return_dict:
+            if output_hidden_states:
+                output = (feature_maps,) + outputs[1:]
+            else:
+                output = (feature_maps,) + outputs[2:]
+            return output
+        return BackboneOutput(
+            feature_maps=feature_maps,
+            hidden_states=outputs.hidden_states if output_hidden_states else None,
+            attentions=outputs.attentions if output_attentions else None,
+        )
+class CustomPatchEmbeddings(nn.Module):
+    """
+    This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial
+    `hidden_states` (patch embeddings) of shape `(batch_size, seq_length, hidden_size)` to be consumed by a
+    Transformer.
+    """
+    def __init__(
+        self, image_size: int, patch_size: int, num_channels: int, hidden_size: int
+    ):
+        super().__init__()
+        image_size = (
+            image_size
+            if isinstance(image_size, collections.abc.Iterable)
+            else (image_size, image_size)
+        )
+        patch_size = (
+            patch_size
+            if isinstance(patch_size, collections.abc.Iterable)
+            else (patch_size, patch_size)
+        )
+        num_patches = (image_size[1] // patch_size[1]) * (
+            image_size[0] // patch_size[0]
+        )
+        self.image_size = image_size
+        self.patch_size = patch_size
+        self.num_channels = num_channels
+        self.num_patches = num_patches
+        self.projection = nn.Conv2d(
+            num_channels, hidden_size, kernel_size=patch_size, stride=patch_size
+        )
+    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
+        num_channels = pixel_values.shape[1]
+        if num_channels != self.num_channels:
+            raise ValueError(
+                "Make sure that the channel dimension of the pixel values matches the one set in the configuration."
+                f" Expected {self.num_channels} but got {num_channels}."
+            )
+        embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
+        return embeddings
+class CustomEmbeddings(nn.Module):
+    """
+    Construct the CLS token, position and patch embeddings.
+    """
+    def __init__(
+        self, image_size: int, patch_size: int, num_channels: int, hidden_size: int
+    ) -> None:
+        super().__init__()
+        self.image_size = image_size
+        self.patch_size = patch_size
+        self.num_channels = num_channels
+        self.hidden_size = hidden_size
+        self.cls_token = nn.Parameter(torch.randn(1, 1, self.hidden_size))
+        self.patch_embeddings = CustomPatchEmbeddings(
+            image_size, patch_size, num_channels, hidden_size
+        )
+        num_patches = self.patch_embeddings.num_patches
+        self.position_embeddings = nn.Parameter(
+            torch.randn(1, num_patches + 1, self.hidden_size)
+        )
+    def interpolate_pos_encoding(
+        self, embeddings: torch.Tensor, height: int, width: int
+    ) -> torch.Tensor:
+        """
+        Interpolate the pre-trained position encodings so that the model can be used on higher-resolution images.
+        Source:
+        https://github.com/facebookresearch/dino/blob/de9ee3df6cf39fac952ab558447af1fa1365362a/vision_transformer.py#L174
+        """
+        num_patches = embeddings.shape[1] - 1
+        num_positions = self.position_embeddings.shape[1] - 1
+        if num_patches == num_positions and height == width:
+            return self.position_embeddings
+        class_pos_embed = self.position_embeddings[:, 0]
+        patch_pos_embed = self.position_embeddings[:, 1:]
+        dim = embeddings.shape[-1]
+        height = height // self.patch_size
+        width = width // self.patch_size
+        # we add a small number to avoid floating point error in the interpolation
+        # see discussion at https://github.com/facebookresearch/dino/issues/8
+        height, width = height + 0.1, width + 0.1
+        patch_pos_embed = patch_pos_embed.reshape(
+            1, int(math.sqrt(num_positions)), int(math.sqrt(num_positions)), dim
+        )
+        patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
+        patch_pos_embed = nn.functional.interpolate(
+            patch_pos_embed,
+            scale_factor=(
+                height / math.sqrt(num_positions),
+                width / math.sqrt(num_positions),
+            ),
+            mode="bicubic",
+            align_corners=False,
+        )
+        if (
+            int(height) != patch_pos_embed.shape[-2]
+            or int(width) != patch_pos_embed.shape[-1]
+        ):
+            raise ValueError(
+                "Width or height does not match with the interpolated position embeddings"
+            )
+        patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)
+        return torch.cat((class_pos_embed.unsqueeze(0), patch_pos_embed), dim=1)
+    def forward(
+        self,
+        pixel_values: torch.Tensor,
+    ) -> torch.Tensor:
+        batch_size, _, height, width = pixel_values.shape
+        patch_embeddings = self.patch_embeddings(pixel_values)
+        embeddings = patch_embeddings
+        # add the [CLS] token to the embedded patch tokens
+        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
+        embeddings = torch.cat((cls_tokens, embeddings), dim=1)
+        # add positional encoding to each token
+        embeddings = embeddings + self.interpolate_pos_encoding(
+            embeddings, height, width
+        )
+        return embeddings
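
The token bookkeeping in `CustomPatchEmbeddings` and `interpolate_pos_encoding` above can be checked by hand. The following is a minimal, framework-free sketch of that arithmetic, not code from this commit: the helper names `num_tokens`, `needs_interpolation` and `interpolation_scale` are hypothetical, and the 224/448 pixel sizes with a patch size of 16 are illustrative values only.

```python
import math


def num_tokens(height: int, width: int, patch_size: int) -> int:
    """One token per non-overlapping patch, plus one [CLS] token."""
    return (height // patch_size) * (width // patch_size) + 1


def needs_interpolation(height: int, width: int, patch_size: int, num_positions: int) -> bool:
    """Mirror the early-return test: the stored table is reused only for a
    square input whose patch count equals the number of stored positions."""
    num_patches = (height // patch_size) * (width // patch_size)
    return not (num_patches == num_positions and height == width)


def interpolation_scale(height: int, width: int, patch_size: int, num_positions: int):
    """Scale factors for resizing the stored square grid; the 0.1 offset
    follows the source and guards against floating point rounding."""
    grid = math.sqrt(num_positions)
    return (
        (height // patch_size + 0.1) / grid,
        (width // patch_size + 0.1) / grid,
    )


# Illustrative values (assumed, not taken from the commit): a 14 x 14 stored
# grid applied to a 448 x 448 input with 16-pixel patches.
scale_h, scale_w = interpolation_scale(448, 448, 16, 196)
target_grid = (int(14 * scale_h), int(14 * scale_w))
```

With these assumed values the target grid comes out as 28 x 28, which is what the shape check after `nn.functional.interpolate` expects for that input.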

sf3d/models/tokenizers/image.py ADDED

	@@ -0,0 +1,101 @@

+from dataclasses import dataclass
+from typing import Optional
+import torch
+import torch.nn as nn
+from einops import rearrange
+from jaxtyping import Float
+from torch import Tensor
+from sf3d.models.tokenizers.dinov2 import Dinov2Model
+from sf3d.models.transformers.attention import Modulation
+from sf3d.models.utils import BaseModule
+class DINOV2SingleImageTokenizer(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        pretrained_model_name_or_path: str = "facebook/dinov2-large"
+        width: int = 512
+        height: int = 512
+        modulation_cond_dim: int = 768
+    cfg: Config
+    def configure(self) -> None:
+        self.model = Dinov2Model.from_pretrained(self.cfg.pretrained_model_name_or_path)
+        for p in self.model.parameters():
+            p.requires_grad_(False)
+        self.model.eval()
+        self.model.set_gradient_checkpointing(False)
+        # add modulation
+        modulations = []
+        for layer in self.model.encoder.layer:
+            norm1_modulation = Modulation(
+                self.model.config.hidden_size,
+                self.cfg.modulation_cond_dim,
+                zero_init=True,
+                single_layer=True,
+            )
+            norm2_modulation = Modulation(
+                self.model.config.hidden_size,
+                self.cfg.modulation_cond_dim,
+                zero_init=True,
+                single_layer=True,
+            )
+            layer.register_ada_norm_modulation(norm1_modulation, norm2_modulation)
+            modulations += [norm1_modulation, norm2_modulation]
+        self.modulations = nn.ModuleList(modulations)
+        self.register_buffer(
+            "image_mean",
+            torch.as_tensor([0.485, 0.456, 0.406]).reshape(1, 1, 3, 1, 1),
+            persistent=False,
+        )
+        self.register_buffer(
+            "image_std",
+            torch.as_tensor([0.229, 0.224, 0.225]).reshape(1, 1, 3, 1, 1),
+            persistent=False,
+        )
+    def forward(
+        self,
+        images: Float[Tensor, "B *N C H W"],
+        modulation_cond: Optional[Float[Tensor, "B *N Cc"]],
+        **kwargs,
+    ) -> Float[Tensor, "B *N Ct Nt"]:
+        model = self.model
+        packed = False
+        if images.ndim == 4:
+            packed = True
+            images = images.unsqueeze(1)
+            if modulation_cond is not None:
+                assert modulation_cond.ndim == 2
+                modulation_cond = modulation_cond.unsqueeze(1)
+        batch_size, n_input_views = images.shape[:2]
+        images = (images - self.image_mean) / self.image_std
+        out = model(
+            rearrange(images, "B N C H W -> (B N) C H W"),
+            modulation_cond=(
+                rearrange(modulation_cond, "B N Cc -> (B N) Cc")
+                if modulation_cond is not None
+                else None
+            ),
+        )
+        local_features = out.last_hidden_state
+        local_features = local_features.permute(0, 2, 1)
+        local_features = rearrange(
+            local_features, "(B N) Ct Nt -> B N Ct Nt", B=batch_size
+        )
+        if packed:
+            local_features = local_features.squeeze(1)
+        return local_features
+    def detokenize(self, *args, **kwargs):
+        raise NotImplementedError
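
Two things happen to the input in `DINOV2SingleImageTokenizer.forward` above: pixels are normalised per channel with the registered mean and standard deviation, and a 4D input gets a temporary view axis that is squeezed away again at the end. The sketch below restates both steps in plain Python under stated assumptions: the helper names `normalize_pixel` and `output_shape` are hypothetical, and the hidden size and token count are passed in rather than read from a real checkpoint.

```python
IMAGE_MEAN = (0.485, 0.456, 0.406)  # values registered as `image_mean` in the source
IMAGE_STD = (0.229, 0.224, 0.225)   # values registered as `image_std` in the source


def normalize_pixel(rgb):
    """Apply (x - mean) / std to one RGB pixel with values in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD))


def output_shape(image_shape, hidden_size, num_tokens):
    """Shape of the returned features for a given input shape.

    (B, C, H, W)    -> (B, Ct, Nt)      single view, view axis squeezed again
    (B, N, C, H, W) -> (B, N, Ct, Nt)   N views per sample
    """
    if len(image_shape) == 4:
        return (image_shape[0], hidden_size, num_tokens)
    batch, views = image_shape[:2]
    return (batch, views, hidden_size, num_tokens)


# A pixel equal to the channel means maps to zero in every channel.
centered = normalize_pixel(IMAGE_MEAN)
```

The channel axis comes before the token axis in the result because the source permutes `last_hidden_state` from `(batch, tokens, channels)` to `(batch, channels, tokens)` before reshaping.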

sf3d/models/tokenizers/triplane.py ADDED

	@@ -0,0 +1,49 @@

+import math
+from dataclasses import dataclass
+import torch
+import torch.nn as nn
+from einops import rearrange, repeat
+from jaxtyping import Float
+from torch import Tensor
+from sf3d.models.utils import BaseModule
+class TriplaneLearnablePositionalEmbedding(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        plane_size: int = 96
+        num_channels: int = 1024
+    cfg: Config
+    def configure(self) -> None:
+        self.embeddings = nn.Parameter(
+            torch.randn(
+                (3, self.cfg.num_channels, self.cfg.plane_size, self.cfg.plane_size),
+                dtype=torch.float32,
+            )
+            * 1
+            / math.sqrt(self.cfg.num_channels)
+        )
+    def forward(self, batch_size: int) -> Float[Tensor, "B Ct Nt"]:
+        return rearrange(
+            repeat(self.embeddings, "Np Ct Hp Wp -> B Np Ct Hp Wp", B=batch_size),
+            "B Np Ct Hp Wp -> B Ct (Np Hp Wp)",
+        )
+    def detokenize(
+        self, tokens: Float[Tensor, "B Ct Nt"]
+    ) -> Float[Tensor, "B 3 Ct Hp Wp"]:
+        batch_size, Ct, Nt = tokens.shape
+        assert Nt == self.cfg.plane_size**2 * 3
+        assert Ct == self.cfg.num_channels
+        return rearrange(
+            tokens,
+            "B Ct (Np Hp Wp) -> B Np Ct Hp Wp",
+            Np=3,
+            Hp=self.cfg.plane_size,
+            Wp=self.cfg.plane_size,
+        )
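
The `rearrange` patterns in `TriplaneLearnablePositionalEmbedding` flatten `(Np Hp Wp)` with the plane axis outermost and the column axis innermost, and `detokenize` inverts that. A small sketch of the index arithmetic, assuming that row-major order; `token_index` and `token_position` are hypothetical helpers written for illustration and do not appear in the commit.

```python
def token_index(plane: int, row: int, col: int, plane_size: int) -> int:
    """Flat token index for the pattern "(Np Hp Wp)": plane-major, then row, then column."""
    return (plane * plane_size + row) * plane_size + col


def token_position(index: int, plane_size: int):
    """Inverse mapping, as used conceptually by `detokenize`."""
    plane, rest = divmod(index, plane_size * plane_size)
    row, col = divmod(rest, plane_size)
    return plane, row, col


PLANE_SIZE = 96  # default `plane_size` in the Config above
TOTAL_TOKENS = 3 * PLANE_SIZE * PLANE_SIZE  # the value `detokenize` asserts against
```

With the default `plane_size` of 96 this gives 27648 tokens per sample, so the first 9216 indices belong to plane 0, the next 9216 to plane 1, and the rest to plane 2.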

sf3d/models/transformers/attention.py ADDED

	@@ -0,0 +1,31 @@

+import torch
+import torch.nn as nn
+class Modulation(nn.Module):
+    def __init__(
+        self,
+        embedding_dim: int,
+        condition_dim: int,
+        zero_init: bool = False,
+        single_layer: bool = False,
+    ):
+        super().__init__()
+        self.silu = nn.SiLU()
+        if single_layer:
+            self.linear1 = nn.Identity()
+        else:
+            self.linear1 = nn.Linear(condition_dim, condition_dim)
+        self.linear2 = nn.Linear(condition_dim, embedding_dim * 2)
+        # Only zero init the last linear layer
+        if zero_init:
+            nn.init.zeros_(self.linear2.weight)
+            nn.init.zeros_(self.linear2.bias)
+    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
+        emb = self.linear2(self.silu(self.linear1(condition)))
+        scale, shift = torch.chunk(emb, 2, dim=1)
+        x = x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
+        return x
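
`Modulation.forward` above applies `x * (1 + scale) + shift`, where `scale` and `shift` are the two halves of the output of `linear2`. With `zero_init=True` that output is all zeros at initialisation, so the layer starts out as the identity. The sketch below shows the same formula on plain Python lists; `modulate` is a hypothetical helper and the numbers are made up for illustration.

```python
def modulate(x, scale, shift):
    """Element-wise x * (1 + scale) + shift, the formula used in Modulation.forward."""
    return [xi * (1.0 + sc) + sh for xi, sc, sh in zip(x, scale, shift)]


features = [1.0, -2.0, 0.5]

# Zero scale and shift (the state right after zero initialisation) leave x unchanged.
unchanged = modulate(features, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

# Non-zero values rescale and offset each channel independently.
modulated = modulate(features, [0.5, -1.0, 0.0], [0.1, 3.0, -0.5])
```

Starting from the identity is presumably why the tokenizer registers its modulations with `zero_init=True`: the frozen DINOv2 features are unchanged until the modulation weights are trained.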

sf3d/models/transformers/backbone.py ADDED

	@@ -0,0 +1,515 @@

+from dataclasses import dataclass
+from typing import Optional
+import torch
+import torch.nn.functional as F
+from torch import nn
+from sf3d.models.utils import BaseModule
+class GEGLU(nn.Module):
+    r"""
+    A variant of the gated linear unit activation function from https://arxiv.org/abs/2002.05202.
+    Parameters:
+        dim_in (`int`): The number of channels in the input.
+        dim_out (`int`): The number of channels in the output.
+    """
+    def __init__(self, dim_in: int, dim_out: int):
+        super().__init__()
+        self.proj = nn.Linear(dim_in, dim_out * 2)
+    def gelu(self, gate: torch.Tensor) -> torch.Tensor:
+        if gate.device.type != "mps":
+            return F.gelu(gate)
+        # mps: gelu is not implemented for float16
+        return F.gelu(gate.to(dtype=torch.float32)).to(dtype=gate.dtype)
+    def forward(self, hidden_states, scale: float = 1.0):
+        # `scale` is unused; it is kept so existing call sites keep working.
+        hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
+        return hidden_states * self.gelu(gate)
+class CrossAttention(nn.Module):
+    def __init__(
+        self,
+        dim,
+        kv_dim=None,
+        num_heads=16,
+        qkv_bias=False,
+        attn_drop=0.0,
+        proj_drop=0.0,
+    ):
+        super().__init__()
+        self.num_heads = num_heads
+        head_dim = dim // num_heads
+        self.scale = head_dim**-0.5
+        kv_dim = dim if not kv_dim else kv_dim
+        self.wq = nn.Linear(dim, dim, bias=qkv_bias)
+        self.wk = nn.Linear(kv_dim, dim, bias=qkv_bias)
+        self.wv = nn.Linear(kv_dim, dim, bias=qkv_bias)
+        self.attn_drop = attn_drop
+        self.proj = nn.Linear(dim, dim)
+        self.proj_drop = nn.Dropout(proj_drop)
+    def forward(self, x_q, x_kv):
+        B, N_q, C = x_q.shape
+        B, N_kv, _ = x_kv.shape
+        # [B, N_q, C] -> [B, N_q, H, C/H]
+        q = self.wq(x_q).reshape(B, N_q, self.num_heads, C // self.num_heads)
+        # [B, N_kv, C] -> [B, N_kv, H, C/H]
+        k = self.wk(x_kv).reshape(B, N_kv, self.num_heads, C // self.num_heads)
+        v = self.wv(x_kv).reshape(B, N_kv, self.num_heads, C // self.num_heads)
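+        # attention; dropout is only applied in training mode because
+        # scaled_dot_product_attention does not check `self.training` itself
+        x = torch.nn.functional.scaled_dot_product_attention(
+            q.permute(0, 2, 1, 3),
+            k.permute(0, 2, 1, 3),
+            v.permute(0, 2, 1, 3),
+            attn_mask=None,
+            dropout_p=self.attn_drop if self.training else 0.0,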
+            scale=self.scale,
+        ).permute(0, 2, 1, 3)
+        # [B, N_q, H, C/H] -> [B, N_q, C]
+        x = x.reshape(B, N_q, C)
+        x = self.proj(x)
+        x = self.proj_drop(x)
+        return x
+class FeedForward(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        dim_out: Optional[int] = None,
+        mult: int = 4,
+        dropout: float = 0.0,
+    ):
+        super().__init__()
+        inner_dim = int(dim * mult)
+        dim_out = dim_out if dim_out is not None else dim
+        act_fn = GEGLU(dim, inner_dim)
+        self.net = nn.ModuleList([])
+        self.net.append(act_fn)
+        self.net.append(nn.Dropout(dropout))
+        self.net.append(nn.Linear(inner_dim, dim_out))
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        for module in self.net:
+            x = module(x)
+        return x
+class BasicBlock(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        kv_dim: Optional[int] = None,
+        num_heads: int = 16,
+        qkv_bias: bool = False,
+        attn_drop: float = 0.0,
+        proj_drop: float = 0.0,
+        ff_drop: float = 0.0,
+    ):
+        super().__init__()
+        self.norm1 = nn.LayerNorm(dim)
+        self.attn1 = CrossAttention(
+            dim,
+            kv_dim=dim,
+            num_heads=num_heads,
+            qkv_bias=qkv_bias,
+            attn_drop=attn_drop,
+            proj_drop=proj_drop,
+        )
+        self.norm2 = nn.LayerNorm(dim)
+        self.attn2 = CrossAttention(
+            dim,
+            kv_dim=kv_dim,
+            num_heads=num_heads,
+            qkv_bias=qkv_bias,
+            attn_drop=attn_drop,
+            proj_drop=proj_drop,
+        )
+        self.norm3 = nn.LayerNorm(dim)
+        self.ff = FeedForward(dim, dropout=ff_drop)
+    def forward(self, z, x):
+        z_norm = self.norm1(z)
+        z = z + self.attn1(z_norm, z_norm)
+        # TODO: do we need to have the second attention when x is None?
+        z_norm = self.norm2(z)
+        z = z + self.attn2(z_norm, x if x is not None else z_norm)
+        z_norm = self.norm3(z)
+        z = z + self.ff(z_norm)
+        return z
+class SingleStreamTransformer(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        num_attention_heads: int = 16
+        attention_head_dim: int = 88
+        in_channels: Optional[int] = None
+        out_channels: Optional[int] = None
+        num_layers: int = 16
+        dropout: float = 0.0
+        norm_num_groups: int = 32
+        cross_attention_dim: Optional[int] = None
+        attention_bias: bool = False
+    cfg: Config
+    def configure(self) -> None:
+        self.num_attention_heads = self.cfg.num_attention_heads
+        self.attention_head_dim = self.cfg.attention_head_dim
+        inner_dim = self.num_attention_heads * self.attention_head_dim
+        # Define input layers
+        self.norm = torch.nn.GroupNorm(
+            num_groups=self.cfg.norm_num_groups,
+            num_channels=self.cfg.in_channels,
+            eps=1e-6,
+            affine=True,
+        )
+        self.proj_in = nn.Linear(self.cfg.in_channels, inner_dim)
+        # Define transformer blocks
+        self.transformer_blocks = nn.ModuleList(
+            [
+                BasicBlock(
+                    inner_dim,
+                    kv_dim=self.cfg.cross_attention_dim,
+                    num_heads=self.num_attention_heads,
+                    qkv_bias=self.cfg.attention_bias,
+                    proj_drop=self.cfg.dropout,
+                    ff_drop=self.cfg.dropout,
+                )
+                for _ in range(self.cfg.num_layers)
+            ]
+        )
+        # Define output layers
+        self.proj_out = nn.Linear(inner_dim, self.cfg.in_channels)
+    def forward(self, hidden_states, encoder_hidden_states=None, **kwargs):
+        residual = hidden_states
+        hidden_states = self.norm(hidden_states)
+        hidden_states = hidden_states.permute(0, 2, 1)
+        hidden_states = self.proj_in(hidden_states)
+        for block in self.transformer_blocks:
+            hidden_states = block(hidden_states, encoder_hidden_states)
+        hidden_states = self.proj_out(hidden_states).permute(0, 2, 1).contiguous()
+        # TODO: do we really need to add the residual?
+        hidden_states = hidden_states + residual
+        return hidden_states
+class FuseBlock(nn.Module):
+    """
+    Fuse X into Z with cross-attention.
+    """
+    def __init__(
+        self,
+        dim_z: int,
+        dim_x: int,
+        num_heads: int = 16,
+        qkv_bias: bool = False,
+        attn_drop: float = 0.0,
+        proj_drop: float = 0.0,
+        ff_drop: float = 0.0,
+        norm_x_input: bool = True,
+    ):
+        super().__init__()
+        self.norm_x_input = norm_x_input
+        if self.norm_x_input:
+            self.norm_x = nn.LayerNorm(dim_x)
+        self.attn = CrossAttention(
+            dim_z,
+            kv_dim=dim_x,
+            num_heads=num_heads,
+            qkv_bias=qkv_bias,
+            attn_drop=attn_drop,
+            proj_drop=proj_drop,
+        )
+        self.norm_z1 = nn.LayerNorm(dim_z)
+        self.norm_z2 = nn.LayerNorm(dim_z)
+        self.ff = FeedForward(dim_z, dropout=ff_drop)
+    def forward(self, z, x):
+        # TODO: do we need to normalize x?
+        z = z + self.attn(self.norm_z1(z), self.norm_x(x) if self.norm_x_input else x)
+        z = z + self.ff(self.norm_z2(z))
+        return z
+@torch.no_grad()
+def get_triplane_attention_mask(res):
+    """
+    Build an additive attention bias of shape `(3 * res * res, 3 * res * res)` for tokens laid out as
+    `(plane, row, column)`: entries are 0.0 where attention is allowed and `-inf` elsewhere. A token at `(i, j)` in
+    one plane may attend only to one full row or column in each of the other two planes, selected by `i` or `j`.
+    """
+    N = 3 * res * res
+    attn_mask = torch.zeros(3, res, res, 3, res, res)
+    i, j = torch.meshgrid(torch.arange(res), torch.arange(res), indexing="ij")
+    attn_mask[0, i, j, 1, i, :] = 1.0
+    attn_mask[0, i, j, 2, j, :] = 1.0
+    attn_mask[1, i, j, 0, i, :] = 1.0
+    attn_mask[1, i, j, 2, :, j] = 1.0
+    attn_mask[2, i, j, 0, :, i] = 1.0
+    attn_mask[2, i, j, 1, :, j] = 1.0
+    attn_mask = attn_mask.bool()
+    attn_bias = torch.empty_like(attn_mask, dtype=torch.float)
+    attn_bias.masked_fill_(attn_mask, 0.0)
+    attn_bias.masked_fill_(~attn_mask, float("-inf"))
+    return attn_bias.reshape(N, N)
+class TriplaneAttention(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        resolution: int,
+        num_heads: int = 16,
+        qkv_bias: bool = False,
+        attn_drop: float = 0.0,
+        proj_drop: float = 0.0,
+        full_attention: bool = False,
+    ):
+        super().__init__()
+        self.num_heads = num_heads
+        head_dim = dim // num_heads
+        self.scale = head_dim**-0.5
+        self.wq = nn.Linear(dim, dim, bias=qkv_bias)
+        self.wk = nn.Linear(dim, dim, bias=qkv_bias)
+        self.wv = nn.Linear(dim, dim, bias=qkv_bias)
+        self.attn_drop = attn_drop
+        self.proj = nn.Linear(dim, dim)
+        self.proj_drop = nn.Dropout(proj_drop)
+        self.resolution = resolution
+        self.full_attention = full_attention
+        self.attn_mask = (
+            get_triplane_attention_mask(resolution) if not full_attention else None
+        )
+    def forward(self, x):
+        B, N, C = x.shape
+        # [B, N, C] -> [B, N, H, C/H]
+        q = self.wq(x).reshape(B, N, self.num_heads, C // self.num_heads)
+        k = self.wk(x).reshape(B, N, self.num_heads, C // self.num_heads)
+        v = self.wv(x).reshape(B, N, self.num_heads, C // self.num_heads)
+        # the token count must match three resolution x resolution planes
+        assert N == self.resolution**2 * 3
+        attn_bias = (
+            self.attn_mask.to(q)
+            .unsqueeze(0)
+            .unsqueeze(0)
+            .expand(B, self.num_heads, -1, -1)
+            if not self.full_attention
+            else None
+        )
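+        # attention, restricted by attn_bias unless full_attention is set;
+        # dropout is only applied in training mode
+        x = torch.nn.functional.scaled_dot_product_attention(
+            q.permute(0, 2, 1, 3),
+            k.permute(0, 2, 1, 3),
+            v.permute(0, 2, 1, 3),
+            attn_mask=attn_bias,
+            dropout_p=self.attn_drop if self.training else 0.0,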
+            scale=self.scale,
+        ).permute(0, 2, 1, 3)
+        # [B, N, H, C/H] -> [B, N, C]
+        x = x.reshape(B, N, C)
+        x = self.proj(x)
+        x = self.proj_drop(x)
+        return x
+class TwoStreamBlock(nn.Module):
+    def __init__(
+        self,
+        dim_latent: int,
+        dim_input: int,
+        num_basic_blocks: int = 4,
+        num_heads: int = 16,
+        qkv_bias: bool = False,
+        attn_drop: float = 0.0,
+        proj_drop: float = 0.0,
+        ff_drop: float = 0.0,
+        norm_x_input: bool = True,
+        dim_cross: Optional[int] = None,
+    ):
+        super().__init__()
+        # Define the fuse block that fuses the input into the latent
+        self.fuse_block_in = FuseBlock(
+            dim_latent,
+            dim_input,
+            num_heads=num_heads,
+            qkv_bias=qkv_bias,
+            attn_drop=attn_drop,
+            proj_drop=proj_drop,
+            ff_drop=ff_drop,
+            norm_x_input=norm_x_input,
+        )
+        # Define the transformer blocks that process the latent
+        self.transformer_block = nn.ModuleList(
+            [
+                BasicBlock(
+                    dim_latent,
+                    kv_dim=dim_cross,
+                    num_heads=num_heads,
+                    qkv_bias=qkv_bias,
+                    proj_drop=proj_drop,
+                    ff_drop=ff_drop,
+                )
+                for _ in range(num_basic_blocks)
+            ]
+        )
+        # Define the fuse block that fuses the latent into the input
+        self.fuse_block_out = FuseBlock(
+            dim_input,
+            dim_latent,
+            num_heads=num_heads,
+            qkv_bias=qkv_bias,
+            attn_drop=attn_drop,
+            proj_drop=proj_drop,
+            ff_drop=ff_drop,
+            norm_x_input=norm_x_input,
+        )
+    def forward(self, latent, input, cross_input):
+        latent = self.fuse_block_in(latent, input)
+        for block in self.transformer_block:
+            latent = block(latent, cross_input)
+        input = self.fuse_block_out(input, latent)
+        return latent, input
+class TwoStreamInterleaveTransformer(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        num_attention_heads: int = 16
+        attention_head_dim: int = 64
+        raw_triplane_channels: int = 1024
+        triplane_channels: int = 1024
+        raw_image_channels: int = 1024
+        num_latents: int = 1792
+        num_blocks: int = 4
+        num_basic_blocks: int = 3
+        dropout: float = 0.0
+        latent_init_std: float = 0.02
+        norm_num_groups: int = 32
+        attention_bias: bool = False
+        norm_x_input: bool = False
+        cross_attention_dim: int = 1024
+        mix_latent: bool = True
+    cfg: Config
+    def configure(self) -> None:
+        self.mix_latent = self.cfg.mix_latent
+        # Define the dimensions
+        self.num_attention_heads = self.cfg.num_attention_heads
+        self.attention_head_dim = self.cfg.attention_head_dim
+        self.num_latents = self.cfg.num_latents
+        self.latent_dim = self.num_attention_heads * self.attention_head_dim
+        # Define input layers
+        if self.cfg.norm_num_groups > 0:
+            self.norm_triplane = torch.nn.GroupNorm(
+                num_groups=self.cfg.norm_num_groups,
+                num_channels=self.cfg.raw_triplane_channels,
+                eps=1e-6,
+                affine=True,
+            )
+        else:
+            self.norm_triplane = nn.LayerNorm(self.cfg.raw_triplane_channels)
+        self.proj_triplane = nn.Linear(
+            self.cfg.raw_triplane_channels, self.cfg.triplane_channels
+        )
+        if self.mix_latent:
+            self.norm_image = nn.LayerNorm(self.cfg.raw_image_channels)
+            self.proj_image = nn.Linear(self.cfg.raw_image_channels, self.latent_dim)
+        self.norm_latent = nn.LayerNorm(self.latent_dim)
+        self.proj_latent = nn.Linear(self.latent_dim, self.latent_dim)
+        # Define the latents
+        self.latent_init = nn.Parameter(
+            torch.zeros(1, self.num_latents, self.latent_dim)
+        )
+        nn.init.normal_(self.latent_init, std=self.cfg.latent_init_std)
+        # Define the transformer blocks
+        self.main_blocks = nn.ModuleList(
+            [
+                TwoStreamBlock(
+                    self.latent_dim,
+                    self.cfg.triplane_channels,
+                    num_basic_blocks=self.cfg.num_basic_blocks,
+                    num_heads=self.num_attention_heads,
+                    qkv_bias=self.cfg.attention_bias,
+                    proj_drop=self.cfg.dropout,
+                    ff_drop=self.cfg.dropout,
+                    norm_x_input=self.cfg.norm_x_input,
+                    dim_cross=self.cfg.cross_attention_dim,
+                )
+                for _ in range(self.cfg.num_blocks)
+            ]
+        )
+        # Define output layers
+        self.proj_out = nn.Linear(
+            self.cfg.triplane_channels, self.cfg.raw_triplane_channels
+        )
+    def forward(self, hidden_states, encoder_hidden_states, **kwargs):
+        # hidden_states: [B, triplane_dim, N_triplane] is triplane tokens
+        # encoder_hidden_states: [B, N_image, image_dim] is the image tokens
+        if isinstance(self.norm_triplane, nn.GroupNorm):
+            triplane_tokens = self.norm_triplane(hidden_states)
+            triplane_tokens = triplane_tokens.permute(
+                0, 2, 1
+            )  # [B, N_triplane, triplane_dim]
+        elif isinstance(self.norm_triplane, nn.LayerNorm):
+            triplane_tokens = self.norm_triplane(hidden_states.permute(0, 2, 1))
+        else:
+            raise ValueError("Unknown normalization layer")
+        triplane_tokens = self.proj_triplane(triplane_tokens)
+        if self.mix_latent:
+            image_tokens = self.norm_image(
+                encoder_hidden_states
+            )  # [B, N_image, image_dim]
+            image_tokens = self.proj_image(image_tokens)
+        init_latents = self.latent_init.expand(
+            hidden_states.shape[0], -1, -1
+        )  # [B, N_latent_init, latent_dim]
+        init_latents = self.norm_latent(init_latents)
+        init_latents = self.proj_latent(init_latents)
+        if self.mix_latent:
+            latent_tokens = torch.cat(
+                [image_tokens, init_latents], dim=1
+            )  # [B, N_latent, latent_dim]
+        else:
+            latent_tokens = init_latents
+        # forward the main blocks
+        for block in self.main_blocks:
+            latent_tokens, triplane_tokens = block(
+                latent_tokens, triplane_tokens, encoder_hidden_states
+            )
+        # project the triplane tokens back to the original dimension
+        triplane_tokens = self.proj_out(triplane_tokens).permute(0, 2, 1).contiguous()
+        triplane_tokens = triplane_tokens + hidden_states
+        return triplane_tokens
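
The tensor shapes in `forward` above are easy to lose track of. The following is a rough sketch of the shape bookkeeping only, in plain Python with no PyTorch. The helper name `backbone_shapes` and the example token counts are hypothetical; only the config defaults are taken from the diff above.

```python
def backbone_shapes(batch, n_image, n_triplane,
                    num_heads=16, head_dim=64,
                    raw_triplane_channels=1024, triplane_channels=1024,
                    num_latents=1792, mix_latent=True):
    """Trace tensor shapes through the forward pass (hypothetical helper)."""
    latent_dim = num_heads * head_dim  # 16 * 64 = 1024 with the defaults
    shapes = {}
    # Input triplane tokens are channel-first: [B, C_raw, N_triplane]
    shapes["hidden_states"] = (batch, raw_triplane_channels, n_triplane)
    # After normalization, permute(0, 2, 1) and proj_triplane: [B, N_triplane, C]
    shapes["triplane_tokens"] = (batch, n_triplane, triplane_channels)
    # Learned latents are expanded over the batch: [B, num_latents, latent_dim]
    shapes["init_latents"] = (batch, num_latents, latent_dim)
    # With mix_latent, projected image tokens are concatenated along dim=1
    n_latent = num_latents + (n_image if mix_latent else 0)
    shapes["latent_tokens"] = (batch, n_latent, latent_dim)
    # proj_out and permute restore the input layout so the residual add works
    shapes["output"] = (batch, raw_triplane_channels, n_triplane)
    return shapes


shapes = backbone_shapes(batch=2, n_image=100, n_triplane=300)
print(shapes["latent_tokens"])
```

The point of the sketch is that the output shape has to equal the input shape, because the last step adds `hidden_states` back as a residual.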

sf3d/models/utils.py ADDED

	@@ -0,0 +1,236 @@

+import dataclasses
+import importlib
+from dataclasses import dataclass
+from typing import Any, List, Optional, Tuple, Union
+import numpy as np
+import PIL
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from jaxtyping import Float, Int, Num
+from omegaconf import DictConfig, OmegaConf
+from torch import Tensor
+class BaseModule(nn.Module):
+    @dataclass
+    class Config:
+        pass
+    cfg: Config  # add this to every subclass of BaseModule to enable static type checking
+    def __init__(
+        self, cfg: Optional[Union[dict, DictConfig]] = None, *args, **kwargs
+    ) -> None:
+        super().__init__()
+        self.cfg = parse_structured(self.Config, cfg)
+        self.configure(*args, **kwargs)
+    def configure(self, *args, **kwargs) -> None:
+        raise NotImplementedError
+def find_class(cls_string):
+    module_string = ".".join(cls_string.split(".")[:-1])
+    cls_name = cls_string.split(".")[-1]
+    module = importlib.import_module(module_string, package=None)
+    cls = getattr(module, cls_name)
+    return cls
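+def parse_structured(fields: Any, cfg: Optional[Union[dict, DictConfig]] = None) -> Any:
+    # Drop keys from cfg that are not fields of the target dataclass.
+    # cfg defaults to None, so fall back to an empty config instead of failing on .copy()
+    cfg_ = cfg.copy() if cfg is not None else {}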
+    keys = list(cfg_.keys())
+    field_names = {f.name for f in dataclasses.fields(fields)}
+    for key in keys:
+        # This is helpful when swapping out modules from CLI
+        if key not in field_names:
+            print(f"Ignoring {key} as it's not supported by {fields}")
+            cfg_.pop(key)
+    scfg = OmegaConf.merge(OmegaConf.structured(fields), cfg_)
+    return scfg
+EPS_DTYPE = {
+    torch.float16: 1e-4,
+    torch.bfloat16: 1e-4,
+    torch.float32: 1e-7,
+    torch.float64: 1e-8,
+}
+def dot(x, y, dim=-1):
+    return torch.sum(x * y, dim, keepdim=True)
+def reflect(x, n):
+    return x - 2 * dot(x, n) * n
+def normalize(x, dim=-1, eps=None):
+    if eps is None:
+        eps = EPS_DTYPE[x.dtype]
+    return F.normalize(x, dim=dim, p=2, eps=eps)
+ValidScale = Union[Tuple[float, float], Num[Tensor, "2 D"]]
+def scale_tensor(
+    dat: Num[Tensor, "... D"], inp_scale: ValidScale, tgt_scale: ValidScale
+):
+    if inp_scale is None:
+        inp_scale = (0, 1)
+    if tgt_scale is None:
+        tgt_scale = (0, 1)
+    if isinstance(tgt_scale, Tensor):
+        assert dat.shape[-1] == tgt_scale.shape[-1]
+    dat = (dat - inp_scale[0]) / (inp_scale[1] - inp_scale[0])
+    dat = dat * (tgt_scale[1] - tgt_scale[0]) + tgt_scale[0]
+    return dat
+def dilate_fill(img, mask, iterations=10):
+    oldMask = mask.float()
+    oldImg = img
+    mask_kernel = torch.ones(
+        (1, 1, 3, 3),
+        dtype=oldMask.dtype,
+        device=oldMask.device,
+    )
+    for i in range(iterations):
+        newMask = torch.nn.functional.max_pool2d(oldMask, 3, 1, 1)
+        # Fill the extension with mean color of old valid regions
+        img_unfold = F.unfold(oldImg, (3, 3)).view(1, 3, 3 * 3, -1)
+        mask_unfold = F.unfold(oldMask, (3, 3)).view(1, 1, 3 * 3, -1)
+        new_mask_unfold = F.unfold(newMask, (3, 3)).view(1, 1, 3 * 3, -1)
+        # Average color of the valid region
+        mean_color = (img_unfold.sum(dim=2) / mask_unfold.sum(dim=2).clip(1)).unsqueeze(
+            2
+        )
+        # Extend it to the new region
+        fill_color = (mean_color * new_mask_unfold).view(1, 3 * 3 * 3, -1)
+        mask_conv = F.conv2d(
+            newMask, mask_kernel, padding=1
+        )  # Get the sum for each kernel patch
+        newImg = F.fold(
+            fill_color, (img.shape[-2], img.shape[-1]), (3, 3)
+        ) / mask_conv.clamp(1)
+        diffMask = newMask - oldMask
+        oldMask = newMask
+        oldImg = torch.lerp(oldImg, newImg, diffMask)
+    return oldImg
+def float32_to_uint8_np(
+    x: Float[np.ndarray, "*B H W C"],
+    dither: bool = True,
+    dither_mask: Optional[Float[np.ndarray, "*B H W C"]] = None,
+    dither_strength: float = 1.0,
+) -> Int[np.ndarray, "*B H W C"]:
+    if dither:
+        dither = (
+            dither_strength * np.random.rand(*x[..., :1].shape).astype(np.float32) - 0.5
+        )
+        if dither_mask is not None:
+            dither = dither * dither_mask
+        return np.clip(np.floor((256.0 * x + dither)), 0, 255).astype(np.uint8)
+    return np.clip(np.floor((256.0 * x)), 0, 255).astype(np.uint8)
+def convert_data(data):
+    if data is None:
+        return None
+    elif isinstance(data, np.ndarray):
+        return data
+    elif isinstance(data, torch.Tensor):
+        if data.dtype in [torch.float16, torch.bfloat16]:
+            data = data.float()
+        return data.detach().cpu().numpy()
+    elif isinstance(data, list):
+        return [convert_data(d) for d in data]
+    elif isinstance(data, dict):
+        return {k: convert_data(v) for k, v in data.items()}
+    else:
+        raise TypeError(
+            "Data must be in type numpy.ndarray, torch.Tensor, list or dict, getting",
+            type(data),
+        )
+class ImageProcessor:
+    def convert_and_resize(
+        self,
+        image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
+        size: int,
+    ):
+        if isinstance(image, PIL.Image.Image):
+            image = torch.from_numpy(np.array(image).astype(np.float32) / 255.0)
+        elif isinstance(image, np.ndarray):
+            if image.dtype == np.uint8:
+                image = torch.from_numpy(image.astype(np.float32) / 255.0)
+            else:
+                image = torch.from_numpy(image)
+        elif isinstance(image, torch.Tensor):
+            pass
+        batched = image.ndim == 4
+        if not batched:
+            image = image[None, ...]
+        image = F.interpolate(
+            image.permute(0, 3, 1, 2),
+            (size, size),
+            mode="bilinear",
+            align_corners=False,
+            antialias=True,
+        ).permute(0, 2, 3, 1)
+        if not batched:
+            image = image[0]
+        return image
+    def __call__(
+        self,
+        image: Union[
+            PIL.Image.Image,
+            np.ndarray,
+            torch.FloatTensor,
+            List[PIL.Image.Image],
+            List[np.ndarray],
+            List[torch.FloatTensor],
+        ],
+        size: int,
+    ) -> Any:
+        if isinstance(image, (np.ndarray, torch.FloatTensor)) and image.ndim == 4:
+            image = self.convert_and_resize(image, size)
+        else:
+            if not isinstance(image, list):
+                image = [image]
+            image = [self.convert_and_resize(im, size) for im in image]
+            image = torch.stack(image, dim=0)
+        return image
+def get_intrinsic_from_fov(fov, H, W, bs=-1):
+    focal_length = 0.5 * H / np.tan(0.5 * fov)
+    intrinsic = np.identity(3, dtype=np.float32)
+    intrinsic[0, 0] = focal_length
+    intrinsic[1, 1] = focal_length
+    intrinsic[0, 2] = W / 2.0
+    intrinsic[1, 2] = H / 2.0
+    if bs > 0:
+        intrinsic = intrinsic[None].repeat(bs, axis=0)
+    return torch.from_numpy(intrinsic)
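
`scale_tensor` above is a linear remap from one value range to another, applied elementwise to tensors. As a minimal sketch of the same arithmetic on plain floats (the name `remap` is hypothetical and not part of the repository):

```python
def remap(value, inp_range=(0.0, 1.0), tgt_range=(0.0, 1.0)):
    """Linearly map value from inp_range to tgt_range (scalar analogue of scale_tensor)."""
    lo_in, hi_in = inp_range
    lo_out, hi_out = tgt_range
    # Step 1: normalize to the unit interval
    unit = (value - lo_in) / (hi_in - lo_in)
    # Step 2: stretch and shift into the target range
    return unit * (hi_out - lo_out) + lo_out


# The midpoint of [0, 1] lands on the midpoint of [-1, 1]
print(remap(0.5, (0.0, 1.0), (-1.0, 1.0)))
```

In `sf3d/system.py` this remap is what moves grid vertices into the model's bounding box and query positions into the `[-1, 1]` range that `F.grid_sample` expects.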

sf3d/system.py ADDED

	@@ -0,0 +1,534 @@

+import os
+from contextlib import nullcontext
+from dataclasses import dataclass, field
+from typing import Any, List, Literal, Optional, Tuple, Union
+import numpy as np
+import torch
+import torch.nn.functional as F
+import trimesh
+from einops import rearrange
+from huggingface_hub import hf_hub_download
+from jaxtyping import Float
+from omegaconf import OmegaConf
+from PIL import Image
+from safetensors.torch import load_model
+from torch import Tensor
+from sf3d.models.isosurface import MarchingTetrahedraHelper
+from sf3d.models.mesh import Mesh
+from sf3d.models.utils import (
+    BaseModule,
+    ImageProcessor,
+    convert_data,
+    dilate_fill,
+    find_class,
+    float32_to_uint8_np,
+    normalize,
+    scale_tensor,
+)
+from sf3d.utils import create_intrinsic_from_fov_deg, default_cond_c2w, get_device
+try:
+    from texture_baker import TextureBaker
+except ImportError:
+    import logging
+    logging.warning(
+        "Could not import texture_baker. Please install it via `pip install texture_baker/`"
+    )
+    # Exit early to avoid further errors
+    raise ImportError("texture_baker not found")
+class SF3D(BaseModule):
+    @dataclass
+    class Config(BaseModule.Config):
+        cond_image_size: int
+        isosurface_resolution: int
+        isosurface_threshold: float = 10.0
+        radius: float = 1.0
+        background_color: list[float] = field(default_factory=lambda: [0.5, 0.5, 0.5])
+        default_fovy_deg: float = 40.0
+        default_distance: float = 1.6
+        camera_embedder_cls: str = ""
+        camera_embedder: dict = field(default_factory=dict)
+        image_tokenizer_cls: str = ""
+        image_tokenizer: dict = field(default_factory=dict)
+        tokenizer_cls: str = ""
+        tokenizer: dict = field(default_factory=dict)
+        backbone_cls: str = ""
+        backbone: dict = field(default_factory=dict)
+        post_processor_cls: str = ""
+        post_processor: dict = field(default_factory=dict)
+        decoder_cls: str = ""
+        decoder: dict = field(default_factory=dict)
+        image_estimator_cls: str = ""
+        image_estimator: dict = field(default_factory=dict)
+        global_estimator_cls: str = ""
+        global_estimator: dict = field(default_factory=dict)
+    cfg: Config
+    @classmethod
+    def from_pretrained(
+        cls, pretrained_model_name_or_path: str, config_name: str, weight_name: str
+    ):
+        if os.path.isdir(pretrained_model_name_or_path):
+            config_path = os.path.join(pretrained_model_name_or_path, config_name)
+            weight_path = os.path.join(pretrained_model_name_or_path, weight_name)
+        else:
+            config_path = hf_hub_download(
+                repo_id=pretrained_model_name_or_path, filename=config_name
+            )
+            weight_path = hf_hub_download(
+                repo_id=pretrained_model_name_or_path, filename=weight_name
+            )
+        cfg = OmegaConf.load(config_path)
+        OmegaConf.resolve(cfg)
+        model = cls(cfg)
+        load_model(model, weight_path)
+        return model
+    @property
+    def device(self):
+        return next(self.parameters()).device
+    def configure(self):
+        self.image_tokenizer = find_class(self.cfg.image_tokenizer_cls)(
+            self.cfg.image_tokenizer
+        )
+        self.tokenizer = find_class(self.cfg.tokenizer_cls)(self.cfg.tokenizer)
+        self.camera_embedder = find_class(self.cfg.camera_embedder_cls)(
+            self.cfg.camera_embedder
+        )
+        self.backbone = find_class(self.cfg.backbone_cls)(self.cfg.backbone)
+        self.post_processor = find_class(self.cfg.post_processor_cls)(
+            self.cfg.post_processor
+        )
+        self.decoder = find_class(self.cfg.decoder_cls)(self.cfg.decoder)
+        self.image_estimator = find_class(self.cfg.image_estimator_cls)(
+            self.cfg.image_estimator
+        )
+        self.global_estimator = find_class(self.cfg.global_estimator_cls)(
+            self.cfg.global_estimator
+        )
+        self.bbox: Float[Tensor, "2 3"]
+        self.register_buffer(
+            "bbox",
+            torch.as_tensor(
+                [
+                    [-self.cfg.radius, -self.cfg.radius, -self.cfg.radius],
+                    [self.cfg.radius, self.cfg.radius, self.cfg.radius],
+                ],
+                dtype=torch.float32,
+            ),
+        )
+        self.isosurface_helper = MarchingTetrahedraHelper(
+            self.cfg.isosurface_resolution,
+            os.path.join(
+                os.path.dirname(__file__),
+                "..",
+                "load",
+                "tets",
+                f"{self.cfg.isosurface_resolution}_tets.npz",
+            ),
+        )
+        self.baker = TextureBaker()
+        self.image_processor = ImageProcessor()
+    def triplane_to_meshes(
+        self, triplanes: Float[Tensor, "B 3 Cp Hp Wp"]
+    ) -> list[Mesh]:
+        meshes = []
+        for i in range(triplanes.shape[0]):
+            triplane = triplanes[i]
+            grid_vertices = scale_tensor(
+                self.isosurface_helper.grid_vertices.to(triplanes.device),
+                self.isosurface_helper.points_range,
+                self.bbox,
+            )
+            values = self.query_triplane(grid_vertices, triplane)
+            decoded = self.decoder(values, include=["vertex_offset", "density"])
+            sdf = decoded["density"] - self.cfg.isosurface_threshold
+            deform = decoded["vertex_offset"].squeeze(0)
+            mesh: Mesh = self.isosurface_helper(
+                sdf.view(-1, 1), deform.view(-1, 3) if deform is not None else None
+            )
+            mesh.v_pos = scale_tensor(
+                mesh.v_pos, self.isosurface_helper.points_range, self.bbox
+            )
+            meshes.append(mesh)
+        return meshes
+    def query_triplane(
+        self,
+        positions: Float[Tensor, "*B N 3"],
+        triplanes: Float[Tensor, "*B 3 Cp Hp Wp"],
+    ) -> Float[Tensor, "*B N F"]:
+        batched = positions.ndim == 3
+        if not batched:
+            # no batch dimension
+            triplanes = triplanes[None, ...]
+            positions = positions[None, ...]
+        assert triplanes.ndim == 5 and positions.ndim == 3
+        positions = scale_tensor(
+            positions, (-self.cfg.radius, self.cfg.radius), (-1, 1)
+        )
+        indices2D: Float[Tensor, "B 3 N 2"] = torch.stack(
+            (positions[..., [0, 1]], positions[..., [0, 2]], positions[..., [1, 2]]),
+            dim=-3,
+        ).to(triplanes.dtype)
+        out: Float[Tensor, "B3 Cp 1 N"] = F.grid_sample(
+            rearrange(triplanes, "B Np Cp Hp Wp -> (B Np) Cp Hp Wp", Np=3).float(),
+            rearrange(indices2D, "B Np N Nd -> (B Np) () N Nd", Np=3).float(),
+            align_corners=True,
+            mode="bilinear",
+        )
+        out = rearrange(out, "(B Np) Cp () N -> B N (Np Cp)", Np=3)
+        return out
+    def get_scene_codes(self, batch) -> Float[Tensor, "B 3 C H W"]:
+        # if batch[rgb_cond] is only one view, add a view dimension
+        if len(batch["rgb_cond"].shape) == 4:
+            batch["rgb_cond"] = batch["rgb_cond"].unsqueeze(1)
+            batch["mask_cond"] = batch["mask_cond"].unsqueeze(1)
+            batch["c2w_cond"] = batch["c2w_cond"].unsqueeze(1)
+            batch["intrinsic_cond"] = batch["intrinsic_cond"].unsqueeze(1)
+            batch["intrinsic_normed_cond"] = batch["intrinsic_normed_cond"].unsqueeze(1)
+        batch_size, n_input_views = batch["rgb_cond"].shape[:2]
+        camera_embeds: Optional[Float[Tensor, "B Nv Cc"]]
+        camera_embeds = self.camera_embedder(**batch)
+        input_image_tokens: Float[Tensor, "B Nv Cit Nit"] = self.image_tokenizer(
+            rearrange(batch["rgb_cond"], "B Nv H W C -> B Nv C H W"),
+            modulation_cond=camera_embeds,
+        )
+        input_image_tokens = rearrange(
+            input_image_tokens, "B Nv C Nt -> B (Nv Nt) C", Nv=n_input_views
+        )
+        tokens: Float[Tensor, "B Ct Nt"] = self.tokenizer(batch_size)
+        tokens = self.backbone(
+            tokens,
+            encoder_hidden_states=input_image_tokens,
+            modulation_cond=None,
+        )
+        direct_codes = self.tokenizer.detokenize(tokens)
+        scene_codes = self.post_processor(direct_codes)
+        return scene_codes, direct_codes
+    def run_image(
+        self,
+        image: Union[Image.Image, List[Image.Image]],
+        bake_resolution: int,
+        remesh: Literal["none", "triangle", "quad"] = "none",
+        vertex_count: int = -1,
+        estimate_illumination: bool = False,
+    ) -> Tuple[Union[trimesh.Trimesh, List[trimesh.Trimesh]], dict[str, Any]]:
+        if isinstance(image, list):
+            rgb_cond = []
+            mask_cond = []
+            for img in image:
+                mask, rgb = self.prepare_image(img)
+                mask_cond.append(mask)
+                rgb_cond.append(rgb)
+            rgb_cond = torch.stack(rgb_cond, 0)
+            mask_cond = torch.stack(mask_cond, 0)
+            batch_size = rgb_cond.shape[0]
+        else:
+            mask_cond, rgb_cond = self.prepare_image(image)
+            batch_size = 1
+        c2w_cond = default_cond_c2w(self.cfg.default_distance).to(self.device)
+        intrinsic, intrinsic_normed_cond = create_intrinsic_from_fov_deg(
+            self.cfg.default_fovy_deg,
+            self.cfg.cond_image_size,
+            self.cfg.cond_image_size,
+        )
+        batch = {
+            "rgb_cond": rgb_cond,
+            "mask_cond": mask_cond,
+            "c2w_cond": c2w_cond.view(1, 1, 4, 4).repeat(batch_size, 1, 1, 1),
+            "intrinsic_cond": intrinsic.to(self.device)
+            .view(1, 1, 3, 3)
+            .repeat(batch_size, 1, 1, 1),
+            "intrinsic_normed_cond": intrinsic_normed_cond.to(self.device)
+            .view(1, 1, 3, 3)
+            .repeat(batch_size, 1, 1, 1),
+        }
+        meshes, global_dict = self.generate_mesh(
+            batch, bake_resolution, remesh, vertex_count, estimate_illumination
+        )
+        if batch_size == 1:
+            return meshes[0], global_dict
+        else:
+            return meshes, global_dict
+    def prepare_image(self, image):
+        if image.mode != "RGBA":
+            raise ValueError("Image must be in RGBA mode")
+        img_cond = (
+            torch.from_numpy(
+                np.asarray(
+                    image.resize((self.cfg.cond_image_size, self.cfg.cond_image_size))
+                ).astype(np.float32)
+                / 255.0
+            )
+            .float()
+            .clip(0, 1)
+            .to(self.device)
+        )
+        mask_cond = img_cond[:, :, -1:]
+        rgb_cond = torch.lerp(
+            torch.tensor(self.cfg.background_color, device=self.device)[None, None, :],
+            img_cond[:, :, :3],
+            mask_cond,
+        )
+        return mask_cond, rgb_cond
+    def generate_mesh(
+        self,
+        batch,
+        bake_resolution: int,
+        remesh: Literal["none", "triangle", "quad"] = "none",
+        vertex_count: int = -1,
+        estimate_illumination: bool = False,
+    ) -> Tuple[List[trimesh.Trimesh], dict[str, Any]]:
+        batch["rgb_cond"] = self.image_processor(
+            batch["rgb_cond"], self.cfg.cond_image_size
+        )
+        batch["mask_cond"] = self.image_processor(
+            batch["mask_cond"], self.cfg.cond_image_size
+        )
+        scene_codes, non_postprocessed_codes = self.get_scene_codes(batch)
+        global_dict = {}
+        if self.image_estimator is not None:
+            global_dict.update(
+                self.image_estimator(batch["rgb_cond"] * batch["mask_cond"])
+            )
+        if self.global_estimator is not None and estimate_illumination:
+            global_dict.update(self.global_estimator(non_postprocessed_codes))
+        device = get_device()
+        with torch.no_grad():
+            with (
+                torch.autocast(device_type=device, enabled=False)
+                if "cuda" in device
+                else nullcontext()
+            ):
+                meshes = self.triplane_to_meshes(scene_codes)
+                rets = []
+                for i, mesh in enumerate(meshes):
+                    # Check for empty mesh
+                    if mesh.v_pos.shape[0] == 0:
+                        rets.append(trimesh.Trimesh())
+                        continue
+                    if remesh == "triangle":
+                        mesh = mesh.triangle_remesh(triangle_vertex_count=vertex_count)
+                    elif remesh == "quad":
+                        mesh = mesh.quad_remesh(quad_vertex_count=vertex_count)
+                    else:
+                        if vertex_count > 0:
+                            print(
+                                "Warning: vertex_count is ignored when remesh is none"
+                            )
+                    print("After Remesh", mesh.v_pos.shape[0], mesh.t_pos_idx.shape[0])
+                    mesh.unwrap_uv()
+                    # Build textures
+                    rast = self.baker.rasterize(
+                        mesh.v_tex, mesh.t_pos_idx, bake_resolution
+                    )
+                    bake_mask = self.baker.get_mask(rast)
+                    pos_bake = self.baker.interpolate(
+                        mesh.v_pos,
+                        rast,
+                        mesh.t_pos_idx,
+                    )
+                    gb_pos = pos_bake[bake_mask]
+                    tri_query = self.query_triplane(gb_pos, scene_codes[i])[0]
+                    decoded = self.decoder(
+                        tri_query, exclude=["density", "vertex_offset"]
+                    )
+                    nrm = self.baker.interpolate(
+                        mesh.v_nrm,
+                        rast,
+                        mesh.t_pos_idx,
+                    )
+                    gb_nrm = F.normalize(nrm[bake_mask], dim=-1)
+                    decoded["normal"] = gb_nrm
+                    # Copy global_dict entries prefixed with decoder_ into decoded
+                    for k, v in global_dict.items():
+                        if k.startswith("decoder_"):
+                            decoded[k.replace("decoder_", "")] = v[i]
+                    mat_out = {
+                        "albedo": decoded["features"],
+                        "roughness": decoded["roughness"],
+                        "metallic": decoded["metallic"],
+                        "normal": normalize(decoded["perturb_normal"]),
+                        "bump": None,
+                    }
+                    for k, v in mat_out.items():
+                        if v is None:
+                            continue
+                        if v.shape[0] == 1:
+                            # Skip and directly add a single value
+                            mat_out[k] = v[0]
+                        else:
+                            f = torch.zeros(
+                                bake_resolution,
+                                bake_resolution,
+                                v.shape[-1],
+                                dtype=v.dtype,
+                                device=v.device,
+                            )
+                            if v.shape == f.shape:
+                                continue
+                            if k == "normal":
+                                # Use un-normalized tangents here so that larger and smaller tris
+                                # don't affect the tangents that much
+                                tng = self.baker.interpolate(
+                                    mesh.v_tng,
+                                    rast,
+                                    mesh.t_pos_idx,
+                                )
+                                gb_tng = tng[bake_mask]
+                                gb_tng = F.normalize(gb_tng, dim=-1)
+                                gb_btng = F.normalize(
+                                    torch.cross(gb_nrm, gb_tng, dim=-1), dim=-1
+                                )
+                                normal = F.normalize(mat_out["normal"], dim=-1)
+                                # Create tangent space matrix and transform normal
+                                tangent_matrix = torch.stack(
+                                    [gb_tng, gb_btng, gb_nrm], dim=-1
+                                )
+                                normal_tangent = torch.bmm(
+                                    tangent_matrix.transpose(1, 2), normal.unsqueeze(-1)
+                                ).squeeze(-1)
+                                # Convert from [-1,1] to [0,1] range for storage
+                                normal_tangent = (normal_tangent * 0.5 + 0.5).clamp(
+                                    0, 1
+                                )
+                                f[bake_mask] = normal_tangent.view(-1, 3)
+                                mat_out["bump"] = f
+                            else:
+                                f[bake_mask] = v.view(-1, v.shape[-1])
+                                mat_out[k] = f
+                    def uv_padding(arr):
+                        if arr.ndim == 1:
+                            return arr
+                        return (
+                            dilate_fill(
+                                arr.permute(2, 0, 1)[None, ...].contiguous(),
+                                bake_mask.unsqueeze(0).unsqueeze(0),
+                                iterations=bake_resolution // 150,
+                            )
+                            .squeeze(0)
+                            .permute(1, 2, 0)
+                            .contiguous()
+                        )
+                    verts_np = convert_data(mesh.v_pos)
+                    faces = convert_data(mesh.t_pos_idx)
+                    uvs = convert_data(mesh.v_tex)
+                    basecolor_tex = Image.fromarray(
+                        float32_to_uint8_np(convert_data(uv_padding(mat_out["albedo"])))
+                    ).convert("RGB")
+                    basecolor_tex.format = "JPEG"
+                    metallic = mat_out["metallic"].squeeze().cpu().item()
+                    roughness = mat_out["roughness"].squeeze().cpu().item()
+                    if "bump" in mat_out and mat_out["bump"] is not None:
+                        bump_np = convert_data(uv_padding(mat_out["bump"]))
+                        bump_up = np.ones_like(bump_np)
+                        bump_up[..., :2] = 0.5
+                        bump_up[..., 2:] = 1
+                        bump_tex = Image.fromarray(
+                            float32_to_uint8_np(
+                                bump_np,
+                                dither=True,
+                                # Do not dither if something is perfectly flat
+                                dither_mask=np.all(
+                                    bump_np == bump_up, axis=-1, keepdims=True
+                                ).astype(np.float32),
+                            )
+                        ).convert("RGB")
+                        bump_tex.format = (
+                            "JPEG"  # PNG would be better but the assets are larger
+                        )
+                    else:
+                        bump_tex = None
+                    material = trimesh.visual.material.PBRMaterial(
+                        baseColorTexture=basecolor_tex,
+                        roughnessFactor=roughness,
+                        metallicFactor=metallic,
+                        normalTexture=bump_tex,
+                    )
+                    tmesh = trimesh.Trimesh(
+                        vertices=verts_np,
+                        faces=faces,
+                        visual=trimesh.visual.texture.TextureVisuals(
+                            uv=uvs, material=material
+                        ),
+                    )
+                    rot = trimesh.transformations.rotation_matrix(
+                        np.radians(-90), [1, 0, 0]
+                    )
+                    tmesh.apply_transform(rot)
+                    tmesh.apply_transform(
+                        trimesh.transformations.rotation_matrix(
+                            np.radians(90), [0, 1, 0]
+                        )
+                    )
+                    tmesh.invert()
+                    rets.append(tmesh)
+        return rets, global_dict
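
`query_triplane` above samples three feature planes by projecting each 3D position onto the XY, XZ and YZ planes after rescaling it to `[-1, 1]`. The sketch below shows only that projection step for a single point, in plain Python. The function name `triplane_coords` is hypothetical, and the division by `radius` is the algebraic simplification of `scale_tensor(positions, (-radius, radius), (-1, 1))`, not the repository's code.

```python
def triplane_coords(point, radius=1.0):
    """Return the 2D sample coordinates of a 3D point on the XY, XZ and YZ planes."""
    # Mapping [-radius, radius] onto [-1, 1] reduces to a division by radius
    x, y, z = (c / radius for c in point)
    # Same axis pairs as positions[..., [0, 1]], [0, 2], [1, 2] in query_triplane
    return [(x, y), (x, z), (y, z)]


for plane, uv in zip(("xy", "xz", "yz"), triplane_coords((1.0, -2.0, 0.5), radius=2.0)):
    print(plane, uv)
```

In the model, the features sampled at these three coordinates are concatenated along the channel axis (`"B N (Np Cp)"`) before being passed to the decoder.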

sf3d/utils.py ADDED

	@@ -0,0 +1,105 @@

+import os
+from typing import Any, Union
+import numpy as np
+import rembg
+import torch
+import torchvision.transforms.functional as torchvision_F
+from PIL import Image
+import sf3d.models.utils as sf3d_utils
+def get_device():
+    if os.environ.get("SF3D_USE_CPU", "0") == "1":
+        return "cpu"
+    device = "cpu"
+    if torch.cuda.is_available():
+        device = "cuda"
+    elif torch.backends.mps.is_available():
+        device = "mps"
+    return device
+def create_intrinsic_from_fov_deg(fov_deg: float, cond_height: int, cond_width: int):
+    intrinsic = sf3d_utils.get_intrinsic_from_fov(
+        np.deg2rad(fov_deg),
+        H=cond_height,
+        W=cond_width,
+    )
+    intrinsic_normed_cond = intrinsic.clone()
+    intrinsic_normed_cond[..., 0, 2] /= cond_width
+    intrinsic_normed_cond[..., 1, 2] /= cond_height
+    intrinsic_normed_cond[..., 0, 0] /= cond_width
+    intrinsic_normed_cond[..., 1, 1] /= cond_height
+    return intrinsic, intrinsic_normed_cond
+def default_cond_c2w(distance: float):
+    c2w_cond = torch.as_tensor(
+        [
+            [0, 0, 1, distance],
+            [1, 0, 0, 0],
+            [0, 1, 0, 0],
+            [0, 0, 0, 1],
+        ]
+    ).float()
+    return c2w_cond
+def remove_background(
+    image: Image.Image,
+    rembg_session: Any = None,
+    force: bool = False,
+    **rembg_kwargs,
+) -> Image.Image:
+    do_remove = True
+    if image.mode == "RGBA" and image.getextrema()[3][0] < 255:
+        do_remove = False
+    do_remove = do_remove or force
+    if do_remove:
+        image = rembg.remove(image, session=rembg_session, **rembg_kwargs)
+    return image
+def get_1d_bounds(arr):
+    nz = np.flatnonzero(arr)
+    return nz[0], nz[-1]
+def get_bbox_from_mask(mask, thr=0.5):
+    masks_for_box = (mask > thr).astype(np.float32)
+    assert masks_for_box.sum() > 0, "Empty mask!"
+    x0, x1 = get_1d_bounds(masks_for_box.sum(axis=-2))
+    y0, y1 = get_1d_bounds(masks_for_box.sum(axis=-1))
+    return x0, y0, x1, y1
+def resize_foreground(
+    image: Union[Image.Image, np.ndarray],
+    ratio: float,
+    out_size=None,
+) -> Image.Image:
+    if isinstance(image, np.ndarray):
+        image = Image.fromarray(image, mode="RGBA")
+    assert image.mode == "RGBA"
+    # Get bounding box
+    mask_np = np.array(image)[:, :, -1]
+    x1, y1, x2, y2 = get_bbox_from_mask(mask_np, thr=0.5)
+    h, w = y2 - y1, x2 - x1
+    yc, xc = (y1 + y2) / 2, (x1 + x2) / 2
+    scale = max(h, w) / ratio
+    new_image = torchvision_F.crop(
+        image,
+        top=int(yc - scale / 2),
+        left=int(xc - scale / 2),
+        height=int(scale),
+        width=int(scale),
+    )
+    if out_size is not None:
+        new_image = new_image.resize(out_size)
+    return new_image
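
The crop logic in `resize_foreground` can be followed without PIL or torchvision. The sketch below re-implements only the arithmetic with NumPy; `square_crop_window` is a hypothetical helper introduced for illustration and is not part of `sf3d/utils.py`. Note that `get_1d_bounds` returns inclusive first and last indices, so the measured height and width are one less than the number of occupied pixels.

```python
import numpy as np


def get_1d_bounds(arr):
    # First and last non-zero positions (both inclusive).
    nz = np.flatnonzero(arr)
    return nz[0], nz[-1]


def get_bbox_from_mask(mask, thr=0.5):
    fg = (mask > thr).astype(np.float32)
    assert fg.sum() > 0, "Empty mask!"
    x0, x1 = get_1d_bounds(fg.sum(axis=-2))  # collapse rows: column occupancy
    y0, y1 = get_1d_bounds(fg.sum(axis=-1))  # collapse columns: row occupancy
    return x0, y0, x1, y1


def square_crop_window(mask, ratio):
    # Hypothetical helper: returns (top, left, size) of the square crop that
    # makes the foreground's longer side occupy roughly `ratio` of the crop.
    x1, y1, x2, y2 = get_bbox_from_mask(mask)
    h, w = y2 - y1, x2 - x1
    yc, xc = (y1 + y2) / 2, (x1 + x2) / 2
    scale = max(h, w) / ratio
    return int(yc - scale / 2), int(xc - scale / 2), int(scale)


# Alpha channel with an opaque block covering rows 20..59 and columns 30..49.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:60, 30:50] = 255

bbox = get_bbox_from_mask(mask)
window = square_crop_window(mask, ratio=0.5)
print(bbox, window)
```

With `ratio=0.5` the crop side is twice the longer bounding-box side, so the object ends up with generous padding around it.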

texture_baker/README.md ADDED Viewed

	@@ -0,0 +1,26 @@

+# Texture baker
+Small texture baker which rasterizes barycentric coordinates to a tensor.
+It also implements an interpolation module that can then be used to bake vertex attributes into textures.
+## Usage
+The baker can quickly bake vertex attributes to a texture atlas based on the UV coordinates.
+It supports baking on the CPU and GPU.
+```python
+from texture_baker import TextureBaker
+mesh = ...
+uv = mesh.uv # num_vertex, 2
+triangle_idx = mesh.faces # num_faces, 3
+vertices = mesh.vertices # num_vertex, 3
+tb  = TextureBaker()
+# First get the barycentric coordinates
+rast = tb.rasterize(
+    uv=uv, face_indices=triangle_idx, bake_resolution=1024
+)
+# Then interpolate vertex attributes
+position_bake = tb.interpolate(attr=vertices, rast=rast, face_indices=triangle_idx)
+```
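
To make the two steps in the usage snippet concrete, here is a brute-force NumPy sketch of what a UV rasterizer and a barycentric interpolator compute. It is an illustration under stated assumptions, not the extension's implementation: `rasterize_reference` and `interpolate_reference` are hypothetical names, every triangle is tested against every texel (no BVH), and the texel-to-UV mapping (`u = col / res`, `v = 1 - row / res`) is assumed to match the CPU kernel in this commit.

```python
import numpy as np


def rasterize_reference(uv, faces, res):
    """Return a (res, res, 4) map of (u, v, w, face_idx); face_idx is -1 if empty."""
    rast = np.zeros((res, res, 4), dtype=np.float32)
    rast[..., 3] = -1.0
    cols, rows = np.meshgrid(np.arange(res), np.arange(res))
    px = cols / res            # UV x of each texel
    py = 1.0 - rows / res      # UV y of each texel (rows flipped)
    for f, (i0, i1, i2) in enumerate(faces):
        a, b, c = uv[i0], uv[i1], uv[i2]
        e0, e1 = b - a, c - a
        d00, d01, d11 = e0 @ e0, e0 @ e1, e1 @ e1
        denom = d00 * d11 - d01 * d01
        if denom == 0:         # skip degenerate triangles
            continue
        d20 = (px - a[0]) * e0[0] + (py - a[1]) * e0[1]
        d21 = (px - a[0]) * e1[0] + (py - a[1]) * e1[1]
        v = (d11 * d20 - d01 * d21) / denom
        w = (d00 * d21 - d01 * d20) / denom
        u = 1.0 - v - w
        # Keep the first triangle that covers a texel.
        inside = (v >= 0) & (w >= 0) & (v + w <= 1) & (rast[..., 3] < 0)
        rast[inside] = np.stack([u, v, w, np.full_like(u, f)], axis=-1)[inside]
    return rast


def interpolate_reference(attr, faces, rast):
    """Blend per-vertex attributes with the stored barycentric weights."""
    tri = rast[..., 3].astype(np.int64)
    mask = tri >= 0                       # same test as TextureBaker.get_mask
    out = np.zeros(rast.shape[:2] + (attr.shape[1],), dtype=np.float32)
    corners = attr[faces[tri[mask]]]      # (n, 3, channels)
    bary = rast[mask][:, :3]              # (n, 3)
    out[mask] = np.einsum("nk,nkc->nc", bary, corners)
    return out


uv = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
faces = np.array([[0, 1, 2]])
positions = np.array([[0, 0, 5], [1, 0, 5], [0, 1, 5]], dtype=np.float32)

rast = rasterize_reference(uv, faces, res=4)
baked = interpolate_reference(positions, faces, rast)
```

Because the example positions reuse the UV coordinates in their first two channels, each covered texel of `baked` holds its own UV location plus the constant `z = 5`, which makes the result easy to check by eye.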

texture_baker/requirements.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+torch
+numpy

texture_baker/setup.py ADDED Viewed

	@@ -0,0 +1,142 @@

+import glob
+import os
+import platform
+import torch
+from setuptools import find_packages, setup
+from torch.utils.cpp_extension import (
+    CUDA_HOME,
+    BuildExtension,
+    CppExtension,
+    CUDAExtension,
+)
+library_name = "texture_baker"
+def get_extensions():
+    debug_mode = os.getenv("DEBUG", "0") == "1"
+    use_cuda = os.getenv("USE_CUDA", "1" if torch.cuda.is_available() else "0") == "1"
+    use_metal = (
+        os.getenv("USE_METAL", "1" if torch.backends.mps.is_available() else "0") == "1"
+    )
+    use_native_arch = os.getenv("USE_NATIVE_ARCH", "1") == "1"
+    if debug_mode:
+        print("Compiling in debug mode")
+    use_cuda = use_cuda and CUDA_HOME is not None
+    extension = CUDAExtension if use_cuda else CppExtension
+    is_hip_extension = (
+        True
+        if (
+            (os.environ.get("ROCM_HOME") is not None)
+            and (torch.version.hip is not None)
+        )
+        else False
+    )
+    extra_link_args = []
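+    extra_compile_args = {
+        "cxx": (
+            [
+                "-O3" if not debug_mode else "-O0",
+                "-fdiagnostics-color=always",
+                "-fopenmp",
+            ]
+            # Parenthesized so the base flags are kept when USE_NATIVE_ARCH=0;
+            # without the parentheses the whole list collapses to [].
+            + (["-march=native"] if use_native_arch else [])
+        ),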
+        "nvcc": [
+            "-O3" if not debug_mode else "-O0",
+        ],
+    }
+    if debug_mode:
+        extra_compile_args["cxx"].append("-g")
+        if platform.system() == "Windows":
+            extra_compile_args["cxx"].append("/Z7")
+            extra_compile_args["cxx"].append("/Od")
+            extra_link_args.extend(["/DEBUG"])
+        extra_compile_args["cxx"].append("-UNDEBUG")
+        extra_compile_args["nvcc"].append("-UNDEBUG")
+        extra_compile_args["nvcc"].append("-g")
+        extra_link_args.extend(["-O0", "-g"])
+    define_macros = []
+    extensions = []
+    libraries = []
+    this_dir = os.path.dirname(os.path.curdir)
+    sources = glob.glob(
+        os.path.join(this_dir, library_name, "csrc", "**", "*.cpp"), recursive=True
+    )
+    if len(sources) == 0:
+        print("No source files found for extension, skipping extension compilation")
+        return None
+    if use_cuda:
+        define_macros += [
+            ("THRUST_IGNORE_CUB_VERSION_CHECK", None),
+        ]
+        sources += glob.glob(
+            os.path.join(this_dir, library_name, "csrc", "**", "*.cu"), recursive=True
+        )
+        if not is_hip_extension:
+            libraries += ["cudart", "c10_cuda"]
+    if use_metal:
+        define_macros += [
+            ("WITH_MPS", None),
+        ]
+        sources += glob.glob(
+            os.path.join(this_dir, library_name, "csrc", "**", "*.mm"), recursive=True
+        )
+        extra_compile_args.update(
+            {"cxx": ["-O3", "-arch", "arm64", "-mmacosx-version-min=10.15"]}
+        )
+        extra_link_args += ["-arch", "arm64"]
+    extensions.append(
+        extension(
+            name=f"{library_name}._C",
+            sources=sources,
+            define_macros=define_macros,
+            extra_compile_args=extra_compile_args,
+            extra_link_args=extra_link_args,
+            libraries=libraries
+            + [
+                "c10",
+                "torch",
+                "torch_cpu",
+                "torch_python",
+            ],
+        )
+    )
+    for ext in extensions:
+        ext.libraries = ["cudart_static" if x == "cudart" else x for x in ext.libraries]
+    print(extensions)
+    return extensions
+setup(
+    name=library_name,
+    version="0.0.1",
+    packages=find_packages(where="."),
+    package_dir={"": "."},
+    ext_modules=get_extensions(),
+    install_requires=[],
+    package_data={
+        library_name: [os.path.join("csrc", "*.h"), os.path.join("csrc", "*.metal")],
+    },
+    description="Small texture baker which rasterizes barycentric coordinates to a tensor.",
+    long_description=open("README.md").read(),
+    long_description_content_type="text/markdown",
+    url="https://github.com/Stability-AI/texture_baker",
+    cmdclass={"build_ext": BuildExtension},
+)
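
A note on building compiler-flag lists such as the `cxx` entry in `get_extensions`: Python's conditional expression binds more loosely than `+`, so `a + b if cond else c` is parsed as `(a + b) if cond else c`. The sketch below uses two hypothetical helper functions (not part of `setup.py`) to show the difference.

```python
def flags_unparenthesized(use_native_arch: bool) -> list:
    # Parsed as (base + ["-march=native"]) if use_native_arch else []
    return ["-O3", "-fopenmp"] + ["-march=native"] if use_native_arch else []


def flags_parenthesized(use_native_arch: bool) -> list:
    # The conditional now only decides whether the optional flag is appended.
    return ["-O3", "-fopenmp"] + (["-march=native"] if use_native_arch else [])


print(flags_unparenthesized(False))
print(flags_parenthesized(False))
```

Both forms agree when the condition is true; they differ only when it is false, which is why this kind of slip tends to go unnoticed on the default configuration.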

texture_baker/texture_baker/__init__.py ADDED Viewed

	@@ -0,0 +1,4 @@

+import torch  # noqa: F401
+from . import _C  # noqa: F401
+from .baker import TextureBaker  # noqa: F401

texture_baker/texture_baker/baker.py ADDED Viewed

	@@ -0,0 +1,86 @@

+import torch
+import torch.nn as nn
+from torch import Tensor
+class TextureBaker(nn.Module):
+    def __init__(self):
+        super().__init__()
+    def rasterize(
+        self,
+        uv: Tensor,
+        face_indices: Tensor,
+        bake_resolution: int,
+    ) -> Tensor:
+        """
+        Rasterize the UV layout into a texture map that stores, per texel,
+        the barycentric coordinates and the index of the covering triangle
+        Args:
+            uv (Tensor, num_vertices 2, float): UV coordinates of the mesh
+            face_indices (Tensor, num_faces 3, int): Face indices of the mesh
+            bake_resolution (int): Resolution of the bake
+        Returns:
+            Tensor, bake_resolution bake_resolution 4, float: Rasterized map
+        """
+        return torch.ops.texture_baker_cpp.rasterize(
+            uv, face_indices.to(torch.int32), bake_resolution
+        )
+    def get_mask(self, rast: Tensor) -> Tensor:
+        """
+        Get the occupancy mask from the rasterized map
+        Args:
+            rast (Tensor, bake_resolution bake_resolution 4, float): Rasterized map
+        Returns:
+            Tensor, bake_resolution bake_resolution, bool: Mask
+        """
+        return rast[..., -1] >= 0
+    def interpolate(
+        self,
+        attr: Tensor,
+        rast: Tensor,
+        face_indices: Tensor,
+    ) -> Tensor:
+        """
+        Interpolate the attributes using the rasterized map
+        Args:
+            attr (Tensor, num_vertices 3, float): Attributes of the mesh
+            rast (Tensor, bake_resolution bake_resolution 4, float): Rasterized map
+            face_indices (Tensor, num_faces 3, int): Face indices of the mesh
+        Returns:
+            Tensor, bake_resolution bake_resolution 3, float: Interpolated attributes
+        """
+        return torch.ops.texture_baker_cpp.interpolate(
+            attr, face_indices.to(torch.int32), rast
+        )
+    def forward(
+        self,
+        attr: Tensor,
+        uv: Tensor,
+        face_indices: Tensor,
+        bake_resolution: int,
+    ) -> Tensor:
+        """
+        Bake the texture
+        Args:
+            attr (Tensor, num_vertices 3, float): Attributes of the mesh
+            uv (Tensor, num_vertices 2, float): UV coordinates of the mesh
+            face_indices (Tensor, num_faces 3, int): Face indices of the mesh
+            bake_resolution (int): Resolution of the bake
+        Returns:
+            Tensor, bake_resolution bake_resolution 3, float: Baked texture
+        """
+        rast = self.rasterize(uv, face_indices, bake_resolution)
+        return self.interpolate(attr, rast, face_indices)

texture_baker/texture_baker/csrc/baker.cpp ADDED Viewed

	@@ -0,0 +1,548 @@

+#include <ATen/ATen.h>
+#include <ATen/Context.h>
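+#include <algorithm>
+#include <chrono>
+#include <cmath>
+#include <iostream>
+#include <limits>
+#include <numeric>
+#include <omp.h>
+#include <queue>
+#include <stdexcept>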
+#include <torch/extension.h>
+#ifndef __ARM_ARCH_ISA_A64
+#include <immintrin.h>
+#endif
+#include "baker.h"
+// #define TIMING
+#define BINS 8
+namespace texture_baker_cpp {
+// Calculate the centroid of a triangle
+tb_float2 triangle_centroid(const tb_float2 &v0, const tb_float2 &v1,
+                            const tb_float2 &v2) {
+  return {(v0.x + v1.x + v2.x) * 0.3333f, (v0.y + v1.y + v2.y) * 0.3333f};
+}
+float BVH::find_best_split_plane(const BVHNode &node, int &best_axis,
+                                 int &best_pos, AABB &centroidBounds) {
+  float best_cost = std::numeric_limits<float>::max();
+  for (int axis = 0; axis < 2; ++axis) // We use 2 as we have only x and y
+  {
+    float boundsMin = centroidBounds.min[axis];
+    float boundsMax = centroidBounds.max[axis];
+    if (boundsMin == boundsMax) {
+      continue;
+    }
+    // Populate the bins
+    float scale = BINS / (boundsMax - boundsMin);
+    float leftCountArea[BINS - 1], rightCountArea[BINS - 1];
+    int leftSum = 0, rightSum = 0;
+#ifndef __ARM_ARCH_ISA_A64
+#ifndef _MSC_VER
+    if (__builtin_cpu_supports("sse"))
+#elif (defined(_M_AMD64) || defined(_M_X64))
+    // SSE supported on Windows
+    if constexpr (true)
+#endif
+    {
+      __m128 min4[BINS], max4[BINS];
+      unsigned int count[BINS];
+      for (unsigned int i = 0; i < BINS; i++)
+        min4[i] = _mm_set_ps1(1e30f), max4[i] = _mm_set_ps1(-1e30f),
+        count[i] = 0;
+      for (int i = node.start; i < node.end; i++) {
+        int tri_idx = triangle_indices[i];
+        const Triangle &triangle = triangles[tri_idx];
+        int binIdx = std::min(
+            BINS - 1, (int)((triangle.centroid[axis] - boundsMin) * scale));
+        count[binIdx]++;
+        __m128 v0 = _mm_set_ps(triangle.v0.x, triangle.v0.y, 0.0f, 0.0f);
+        __m128 v1 = _mm_set_ps(triangle.v1.x, triangle.v1.y, 0.0f, 0.0f);
+        __m128 v2 = _mm_set_ps(triangle.v2.x, triangle.v2.y, 0.0f, 0.0f);
+        min4[binIdx] = _mm_min_ps(min4[binIdx], v0);
+        max4[binIdx] = _mm_max_ps(max4[binIdx], v0);
+        min4[binIdx] = _mm_min_ps(min4[binIdx], v1);
+        max4[binIdx] = _mm_max_ps(max4[binIdx], v1);
+        min4[binIdx] = _mm_min_ps(min4[binIdx], v2);
+        max4[binIdx] = _mm_max_ps(max4[binIdx], v2);
+      }
+      // gather data for the 7 planes between the 8 bins
+      __m128 leftMin4 = _mm_set_ps1(1e30f), rightMin4 = leftMin4;
+      __m128 leftMax4 = _mm_set_ps1(-1e30f), rightMax4 = leftMax4;
+      for (int i = 0; i < BINS - 1; i++) {
+        leftSum += count[i];
+        rightSum += count[BINS - 1 - i];
+        leftMin4 = _mm_min_ps(leftMin4, min4[i]);
+        rightMin4 = _mm_min_ps(rightMin4, min4[BINS - 2 - i]);
+        leftMax4 = _mm_max_ps(leftMax4, max4[i]);
+        rightMax4 = _mm_max_ps(rightMax4, max4[BINS - 2 - i]);
+        float le[4], re[4];
+        _mm_store_ps(le, _mm_sub_ps(leftMax4, leftMin4));
+        _mm_store_ps(re, _mm_sub_ps(rightMax4, rightMin4));
+        // SSE order goes from back to front
+        leftCountArea[i] = leftSum * (le[2] * le[3]); // 2D area calculation
+        rightCountArea[BINS - 2 - i] =
+            rightSum * (re[2] * re[3]); // 2D area calculation
+      }
+    }
+#else
+    if constexpr (false) {
+    }
+#endif
+    else {
+      struct Bin {
+        AABB bounds;
+        int triCount = 0;
+      } bins[BINS];
+      for (int i = node.start; i < node.end; i++) {
+        int tri_idx = triangle_indices[i];
+        const Triangle &triangle = triangles[tri_idx];
+        int binIdx = std::min(
+            BINS - 1, (int)((triangle.centroid[axis] - boundsMin) * scale));
+        bins[binIdx].triCount++;
+        bins[binIdx].bounds.grow(triangle.v0);
+        bins[binIdx].bounds.grow(triangle.v1);
+        bins[binIdx].bounds.grow(triangle.v2);
+      }
+      // Gather data for the planes between the bins
+      AABB leftBox, rightBox;
+      for (int i = 0; i < BINS - 1; i++) {
+        leftSum += bins[i].triCount;
+        leftBox.grow(bins[i].bounds);
+        leftCountArea[i] = leftSum * leftBox.area();
+        rightSum += bins[BINS - 1 - i].triCount;
+        rightBox.grow(bins[BINS - 1 - i].bounds);
+        rightCountArea[BINS - 2 - i] = rightSum * rightBox.area();
+      }
+    }
+    // Calculate SAH cost for the planes
+    scale = (boundsMax - boundsMin) / BINS;
+    for (int i = 0; i < BINS - 1; i++) {
+      float planeCost = leftCountArea[i] + rightCountArea[i];
+      if (planeCost < best_cost) {
+        best_axis = axis;
+        best_pos = i + 1;
+        best_cost = planeCost;
+      }
+    }
+  }
+  return best_cost;
+}
+void BVH::update_node_bounds(BVHNode &node, AABB &centroidBounds) {
+#ifndef __ARM_ARCH_ISA_A64
+#ifndef _MSC_VER
+  if (__builtin_cpu_supports("sse"))
+#elif (defined(_M_AMD64) || defined(_M_X64))
+  // SSE supported on Windows
+  if constexpr (true)
+#endif
+  {
+    __m128 min4 = _mm_set_ps1(1e30f), max4 = _mm_set_ps1(-1e30f);
+    __m128 cmin4 = _mm_set_ps1(1e30f), cmax4 = _mm_set_ps1(-1e30f);
+    for (int i = node.start; i < node.end; i += 2) {
+      int tri_idx1 = triangle_indices[i];
+      const Triangle &leafTri1 = triangles[tri_idx1];
+      // Check if the second actually exists in the node
+      __m128 v0, v1, v2, centroid;
+      if (i + 1 < node.end) {
+        int tri_idx2 = triangle_indices[i + 1];
+        const Triangle leafTri2 = triangles[tri_idx2];
+        v0 = _mm_set_ps(leafTri1.v0.x, leafTri1.v0.y, leafTri2.v0.x,
+                        leafTri2.v0.y);
+        v1 = _mm_set_ps(leafTri1.v1.x, leafTri1.v1.y, leafTri2.v1.x,
+                        leafTri2.v1.y);
+        v2 = _mm_set_ps(leafTri1.v2.x, leafTri1.v2.y, leafTri2.v2.x,
+                        leafTri2.v2.y);
+        centroid = _mm_set_ps(leafTri1.centroid.x, leafTri1.centroid.y,
+                              leafTri2.centroid.x, leafTri2.centroid.y);
+      } else {
+        // Otherwise do some duplicated work
+        v0 = _mm_set_ps(leafTri1.v0.x, leafTri1.v0.y, leafTri1.v0.x,
+                        leafTri1.v0.y);
+        v1 = _mm_set_ps(leafTri1.v1.x, leafTri1.v1.y, leafTri1.v1.x,
+                        leafTri1.v1.y);
+        v2 = _mm_set_ps(leafTri1.v2.x, leafTri1.v2.y, leafTri1.v2.x,
+                        leafTri1.v2.y);
+        centroid = _mm_set_ps(leafTri1.centroid.x, leafTri1.centroid.y,
+                              leafTri1.centroid.x, leafTri1.centroid.y);
+      }
+      min4 = _mm_min_ps(min4, v0);
+      max4 = _mm_max_ps(max4, v0);
+      min4 = _mm_min_ps(min4, v1);
+      max4 = _mm_max_ps(max4, v1);
+      min4 = _mm_min_ps(min4, v2);
+      max4 = _mm_max_ps(max4, v2);
+      cmin4 = _mm_min_ps(cmin4, centroid);
+      cmax4 = _mm_max_ps(cmax4, centroid);
+    }
+    float min_values[4], max_values[4], cmin_values[4], cmax_values[4];
+    _mm_store_ps(min_values, min4);
+    _mm_store_ps(max_values, max4);
+    _mm_store_ps(cmin_values, cmin4);
+    _mm_store_ps(cmax_values, cmax4);
+    node.bbox.min.x = std::min(min_values[3], min_values[1]);
+    node.bbox.min.y = std::min(min_values[2], min_values[0]);
+    node.bbox.max.x = std::max(max_values[3], max_values[1]);
+    node.bbox.max.y = std::max(max_values[2], max_values[0]);
+    centroidBounds.min.x = std::min(cmin_values[3], cmin_values[1]);
+    centroidBounds.min.y = std::min(cmin_values[2], cmin_values[0]);
+    centroidBounds.max.x = std::max(cmax_values[3], cmax_values[1]);
+    centroidBounds.max.y = std::max(cmax_values[2], cmax_values[0]);
+  }
+#else
+  if constexpr (false) {
+  }
+#endif
+  {
+    node.bbox.invalidate();
+    centroidBounds.invalidate();
+    // Calculate the bounding box for the node
+    for (int i = node.start; i < node.end; ++i) {
+      int tri_idx = triangle_indices[i];
+      const Triangle &tri = triangles[tri_idx];
+      node.bbox.grow(tri.v0);
+      node.bbox.grow(tri.v1);
+      node.bbox.grow(tri.v2);
+      centroidBounds.grow(tri.centroid);
+    }
+  }
+}
+void BVH::build(const tb_float2 *vertices, const tb_int3 *indices,
+                const int64_t &num_indices) {
+#ifdef TIMING
+  auto start = std::chrono::high_resolution_clock::now();
+#endif
+  // Create triangles
+  for (size_t i = 0; i < num_indices; ++i) {
+    tb_int3 idx = indices[i];
+    triangles.push_back(
+        {vertices[idx.x], vertices[idx.y], vertices[idx.z], static_cast<int>(i),
+         triangle_centroid(vertices[idx.x], vertices[idx.y], vertices[idx.z])});
+  }
+  // Initialize triangle_indices
+  triangle_indices.resize(triangles.size());
+  std::iota(triangle_indices.begin(), triangle_indices.end(), 0);
+  // Build BVH nodes
+  // Reserve extra capacity to fix windows specific crashes
+  nodes.reserve(triangles.size() * 2 + 1);
+  nodes.push_back({}); // Create the root node
+  root = 0;
+  // Define a struct for queue entries
+  struct QueueEntry {
+    int node_idx;
+    int start;
+    int end;
+  };
+  // Queue for breadth-first traversal
+  std::queue<QueueEntry> node_queue;
+  node_queue.push({root, 0, (int)triangles.size()});
+  // Process each node in the queue
+  while (!node_queue.empty()) {
+    QueueEntry current = node_queue.front();
+    node_queue.pop();
+    int node_idx = current.node_idx;
+    int start = current.start;
+    int end = current.end;
+    BVHNode &node = nodes[node_idx];
+    node.start = start;
+    node.end = end;
+    // Calculate the bounding box for the node
+    AABB centroidBounds;
+    update_node_bounds(node, centroidBounds);
+    // Determine the best split using SAH
+    int best_axis, best_pos;
+    float splitCost =
+        find_best_split_plane(node, best_axis, best_pos, centroidBounds);
+    float nosplitCost = node.calculate_node_cost();
+    // Stop condition: if the best cost is greater than or equal to the parent's
+    // cost
+    if (splitCost >= nosplitCost) {
+      // Leaf node
+      node.left = node.right = -1;
+      continue;
+    }
+    float scale =
+        BINS / (centroidBounds.max[best_axis] - centroidBounds.min[best_axis]);
+    int i = node.start;
+    int j = node.end - 1;
+    // Sort the triangle_indices in the range [start, end) based on the best
+    // axis
+    while (i <= j) {
+      // use the exact calculation we used for binning to prevent rare
+      // inaccuracies
+      int tri_idx = triangle_indices[i];
+      tb_float2 tcentr = triangles[tri_idx].centroid;
+      int binIdx = std::min(
+          BINS - 1,
+          (int)((tcentr[best_axis] - centroidBounds.min[best_axis]) * scale));
+      if (binIdx < best_pos)
+        i++;
+      else
+        std::swap(triangle_indices[i], triangle_indices[j--]);
+    }
+    int leftCount = i - node.start;
+    if (leftCount == 0 || leftCount == node.num_triangles()) {
+      // Leaf node
+      node.left = node.right = -1;
+      continue;
+    }
+    int mid = i;
+    // Create and set left child
+    node.left = nodes.size();
+    nodes.push_back({});
+    node_queue.push({node.left, start, mid});
+    // Create and set right child
+    // `node` is a reference into `nodes`; the reserve() above keeps it valid
+    // across push_back, so this assignment is only a defensive refresh.
+    node = nodes[node_idx];
+    node.right = nodes.size();
+    nodes.push_back({});
+    node_queue.push({node.right, mid, end});
+  }
+#ifdef TIMING
+  auto end = std::chrono::high_resolution_clock::now();
+  std::chrono::duration<double> elapsed = end - start;
+  std::cout << "BVH build time: " << elapsed.count() << "s" << std::endl;
+#endif
+}
+// Utility function to clamp a value between a minimum and a maximum
+float clamp(float val, float minVal, float maxVal) {
+  return std::min(std::max(val, minVal), maxVal);
+}
+// Function to check if a point (xy) is inside a triangle defined by vertices
+// v1, v2, v3
+bool barycentric_coordinates(tb_float2 xy, tb_float2 v1, tb_float2 v2,
+                             tb_float2 v3, float &u, float &v, float &w) {
+  // Vectors from v1 to v2, v3 and xy
+  tb_float2 v1v2 = {v2.x - v1.x, v2.y - v1.y};
+  tb_float2 v1v3 = {v3.x - v1.x, v3.y - v1.y};
+  tb_float2 xyv1 = {xy.x - v1.x, xy.y - v1.y};
+  // Dot products of the vectors
+  float d00 = v1v2.x * v1v2.x + v1v2.y * v1v2.y;
+  float d01 = v1v2.x * v1v3.x + v1v2.y * v1v3.y;
+  float d11 = v1v3.x * v1v3.x + v1v3.y * v1v3.y;
+  float d20 = xyv1.x * v1v2.x + xyv1.y * v1v2.y;
+  float d21 = xyv1.x * v1v3.x + xyv1.y * v1v3.y;
+  // Calculate the barycentric coordinates
+  float denom = d00 * d11 - d01 * d01;
+  v = (d11 * d20 - d01 * d21) / denom;
+  w = (d00 * d21 - d01 * d20) / denom;
+  u = 1.0f - v - w;
+  // Check if the point is inside the triangle
+  return (v >= 0.0f) && (w >= 0.0f) && (v + w <= 1.0f);
+}
+bool BVH::intersect(const tb_float2 &point, float &u, float &v, float &w,
+                    int &index) const {
+  const int max_stack_size = 64;
+  int node_stack[max_stack_size];
+  int stack_size = 0;
+  node_stack[stack_size++] = root;
+  while (stack_size > 0) {
+    int node_idx = node_stack[--stack_size];
+    const BVHNode &node = nodes[node_idx];
+    if (node.is_leaf()) {
+      for (int i = node.start; i < node.end; ++i) {
+        const Triangle &tri = triangles[triangle_indices[i]];
+        if (barycentric_coordinates(point, tri.v0, tri.v1, tri.v2, u, v, w)) {
+          index = tri.index;
+          return true;
+        }
+      }
+    } else {
+      if (nodes[node.right].bbox.overlaps(point)) {
+        if (stack_size < max_stack_size) {
+          node_stack[stack_size++] = node.right;
+        } else {
+          // Handle stack overflow
+          throw std::runtime_error("Node stack overflow");
+        }
+      }
+      if (nodes[node.left].bbox.overlaps(point)) {
+        if (stack_size < max_stack_size) {
+          node_stack[stack_size++] = node.left;
+        } else {
+          // Handle stack overflow
+          throw std::runtime_error("Node stack overflow");
+        }
+      }
+    }
+  }
+  return false;
+}
+torch::Tensor rasterize_cpu(torch::Tensor uv, torch::Tensor indices,
+                            int64_t bake_resolution) {
+  int width = bake_resolution;
+  int height = bake_resolution;
+  int num_pixels = width * height;
+  torch::Tensor rast_result = torch::empty(
+      {bake_resolution, bake_resolution, 4},
+      torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCPU));
+  float *rast_result_ptr = rast_result.contiguous().data_ptr<float>();
+  const tb_float2 *vertices = (tb_float2 *)uv.data_ptr<float>();
+  const tb_int3 *tris = (tb_int3 *)indices.data_ptr<int>();
+  BVH bvh;
+  bvh.build(vertices, tris, indices.size(0));
+#ifdef TIMING
+  auto start = std::chrono::high_resolution_clock::now();
+#endif
+#pragma omp parallel for
+  for (int idx = 0; idx < num_pixels; ++idx) {
+    int x = idx / height;
+    int y = idx % height;
+    int idx_ = idx * 4; // Note: *4 because we're storing float4 per pixel
+    tb_float2 pixel_coord = {float(y) / height, float(x) / width};
+    pixel_coord.x = clamp(pixel_coord.x, 0.0f, 1.0f);
+    pixel_coord.y = 1.0f - clamp(pixel_coord.y, 0.0f, 1.0f);
+    float u, v, w;
+    int triangle_idx;
+    if (bvh.intersect(pixel_coord, u, v, w, triangle_idx)) {
+      rast_result_ptr[idx_ + 0] = u;
+      rast_result_ptr[idx_ + 1] = v;
+      rast_result_ptr[idx_ + 2] = w;
+      rast_result_ptr[idx_ + 3] = static_cast<float>(triangle_idx);
+    } else {
+      rast_result_ptr[idx_ + 0] = 0.0f;
+      rast_result_ptr[idx_ + 1] = 0.0f;
+      rast_result_ptr[idx_ + 2] = 0.0f;
+      rast_result_ptr[idx_ + 3] = -1.0f;
+    }
+  }
+#ifdef TIMING
+  auto end = std::chrono::high_resolution_clock::now();
+  std::chrono::duration<double> elapsed = end - start;
+  std::cout << "Rasterization time: " << elapsed.count() << "s" << std::endl;
+#endif
+  return rast_result;
+}
+torch::Tensor interpolate_cpu(torch::Tensor attr, torch::Tensor indices,
+                              torch::Tensor rast) {
+#ifdef TIMING
+  auto start = std::chrono::high_resolution_clock::now();
+#endif
+  int height = rast.size(0);
+  int width = rast.size(1);
+  torch::Tensor pos_bake = torch::empty(
+      {height, width, 3},
+      torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCPU));
+  const float *attr_ptr = attr.contiguous().data_ptr<float>();
+  const int *indices_ptr = indices.contiguous().data_ptr<int>();
+  const float *rast_ptr = rast.contiguous().data_ptr<float>();
+  float *output_ptr = pos_bake.contiguous().data_ptr<float>();
+  int num_pixels = width * height;
+#pragma omp parallel for
+  for (int idx = 0; idx < num_pixels; ++idx) {
+    int idx_ = idx * 4; // Index into the float4 array (4 floats per pixel)
+    tb_float3 barycentric = {
+        rast_ptr[idx_ + 0],
+        rast_ptr[idx_ + 1],
+        rast_ptr[idx_ + 2],
+    };
+    int triangle_idx = static_cast<int>(rast_ptr[idx_ + 3]);
+    if (triangle_idx < 0) {
+      output_ptr[idx * 3 + 0] = 0.0f;
+      output_ptr[idx * 3 + 1] = 0.0f;
+      output_ptr[idx * 3 + 2] = 0.0f;
+      continue;
+    }
+    tb_int3 triangle = {indices_ptr[3 * triangle_idx + 0],
+                        indices_ptr[3 * triangle_idx + 1],
+                        indices_ptr[3 * triangle_idx + 2]};
+    tb_float3 v1 = {attr_ptr[3 * triangle.x + 0], attr_ptr[3 * triangle.x + 1],
+                    attr_ptr[3 * triangle.x + 2]};
+    tb_float3 v2 = {attr_ptr[3 * triangle.y + 0], attr_ptr[3 * triangle.y + 1],
+                    attr_ptr[3 * triangle.y + 2]};
+    tb_float3 v3 = {attr_ptr[3 * triangle.z + 0], attr_ptr[3 * triangle.z + 1],
+                    attr_ptr[3 * triangle.z + 2]};
+    tb_float3 interpolated;
+    interpolated.x =
+        v1.x * barycentric.x + v2.x * barycentric.y + v3.x * barycentric.z;
+    interpolated.y =
+        v1.y * barycentric.x + v2.y * barycentric.y + v3.y * barycentric.z;
+    interpolated.z =
+        v1.z * barycentric.x + v2.z * barycentric.y + v3.z * barycentric.z;
+    output_ptr[idx * 3 + 0] = interpolated.x;
+    output_ptr[idx * 3 + 1] = interpolated.y;
+    output_ptr[idx * 3 + 2] = interpolated.z;
+  }
+#ifdef TIMING
+  auto end = std::chrono::high_resolution_clock::now();
+  std::chrono::duration<double> elapsed = end - start;
+  std::cout << "Interpolation time: " << elapsed.count() << "s" << std::endl;
+#endif
+  return pos_bake;
+}
+// Registers _C as a Python extension module.
+PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {}
+// Defines the operators
+TORCH_LIBRARY(texture_baker_cpp, m) {
+  m.def("rasterize(Tensor uv, Tensor indices, int bake_resolution) -> Tensor");
+  m.def("interpolate(Tensor attr, Tensor indices, Tensor rast) -> Tensor");
+}
+// Registers CPP implementations
+TORCH_LIBRARY_IMPL(texture_baker_cpp, CPU, m) {
+  m.impl("rasterize", &rasterize_cpu);
+  m.impl("interpolate", &interpolate_cpu);
+}
+} // namespace texture_baker_cpp
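
The split search in `BVH::find_best_split_plane` is a binned surface-area heuristic in 2D: triangles are sorted into `BINS` buckets by centroid, and each of the `BINS - 1` planes between buckets is scored as `count_left * area_left + count_right * area_right`. The Python sketch below mirrors that scoring in a deliberately slow but readable form; `box_area` and `best_binned_split` are hypothetical names, and the code recomputes each side's bounding box from scratch rather than accumulating it as the C++ does.

```python
import numpy as np

BINS = 8


def box_area(points):
    """Area of the axis-aligned bounding box of an (n, 2) point array."""
    if len(points) == 0:
        return 0.0
    extent = points.max(axis=0) - points.min(axis=0)
    return float(extent[0] * extent[1])


def best_binned_split(tris, axis):
    """tris: (n, 3, 2) UV triangles. Returns (best_cost, best_plane)."""
    centroids = tris.mean(axis=1)[:, axis]
    lo, hi = centroids.min(), centroids.max()
    if lo == hi:  # all centroids coincide on this axis: nothing to split
        return float("inf"), -1
    scale = BINS / (hi - lo)
    bin_idx = np.minimum(BINS - 1, ((centroids - lo) * scale).astype(int))
    best_cost, best_plane = float("inf"), -1
    for plane in range(1, BINS):  # plane k separates bins < k from bins >= k
        left = tris[bin_idx < plane]
        right = tris[bin_idx >= plane]
        cost = len(left) * box_area(left.reshape(len(left) * 3, 2)) + len(
            right
        ) * box_area(right.reshape(len(right) * 3, 2))
        if cost < best_cost:
            best_cost, best_plane = cost, plane
    return best_cost, best_plane


# Two clusters of four unit triangles, far apart along x.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
offsets = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3])
tris = base[None] + np.stack([offsets, np.zeros_like(offsets)], axis=-1)[:, None, :]

cost, plane = best_binned_split(tris, axis=0)
no_split_cost = len(tris) * box_area(tris.reshape(-1, 2))
print(cost, plane, no_split_cost)
```

In the build loop a node is split only when the best plane cost is lower than the cost of keeping all triangles in one node, which is the comparison between `cost` and `no_split_cost` here.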

texture_baker/texture_baker/csrc/baker.h ADDED Viewed

	@@ -0,0 +1,203 @@

+#pragma once
+#if defined(__NVCC__) || defined(__HIPCC__) || defined(__METAL__)
+#define CUDA_ENABLED
+#ifndef __METAL__
+#define CUDA_HOST_DEVICE __host__ __device__
+#define CUDA_DEVICE __device__
+#define METAL_CONSTANT_MEM
+#define METAL_THREAD_MEM
+#else
+#define tb_float2 float2
+#define CUDA_HOST_DEVICE
+#define CUDA_DEVICE
+#define METAL_CONSTANT_MEM constant
+#define METAL_THREAD_MEM thread
+#endif
+#else
+#define CUDA_HOST_DEVICE
+#define CUDA_DEVICE
+#define METAL_CONSTANT_MEM
+#define METAL_THREAD_MEM
+#include <cfloat>
+#include <limits>
+#include <vector>
+#endif
+namespace texture_baker_cpp {
+// Structure to represent a 2D point or vector
+#ifndef __METAL__
+union alignas(8) tb_float2 {
+  struct {
+    float x, y;
+  };
+  float data[2];
+  float &operator[](size_t idx) {
+    if (idx > 1)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const float &operator[](size_t idx) const {
+    if (idx > 1)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const tb_float2 &rhs) const {
+    return x == rhs.x && y == rhs.y;
+  }
+};
+union alignas(4) tb_float3 {
+  struct {
+    float x, y, z;
+  };
+  float data[3];
+  float &operator[](size_t idx) {
+    if (idx > 2)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const float &operator[](size_t idx) const {
+    if (idx > 2)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+};
+union alignas(16) tb_float4 {
+  struct {
+    float x, y, z, w;
+  };
+  float data[4];
+  float &operator[](size_t idx) {
+    if (idx > 3)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const float &operator[](size_t idx) const {
+    if (idx > 3)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+};
+#endif
+union alignas(4) tb_int3 {
+  struct {
+    int x, y, z;
+  };
+  int data[3];
+#ifndef __METAL__
+  int &operator[](size_t idx) {
+    if (idx > 2)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+#endif
+};
+// BVH structure to accelerate point-triangle intersection
+struct alignas(16) AABB {
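+  // Init bounding boxes with max/min. FLT_MIN is the smallest positive
+  // float, so -FLT_MAX is used as the "nothing seen yet" value for max.
+  tb_float2 min = {FLT_MAX, FLT_MAX};
+  tb_float2 max = {-FLT_MAX, -FLT_MAX};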
+#ifndef CUDA_ENABLED
+  // grow the AABB to include a point
+  void grow(const tb_float2 &p) {
+    min.x = std::min(min.x, p.x);
+    min.y = std::min(min.y, p.y);
+    max.x = std::max(max.x, p.x);
+    max.y = std::max(max.y, p.y);
+  }
+  void grow(const AABB &b) {
+    if (b.min.x != FLT_MAX) {
+      grow(b.min);
+      grow(b.max);
+    }
+  }
+#endif
+  // Check if two AABBs overlap
+  bool overlaps(const METAL_THREAD_MEM AABB &other) const {
+    return min.x <= other.max.x && max.x >= other.min.x &&
+           min.y <= other.max.y && max.y >= other.min.y;
+  }
+  bool overlaps(const METAL_THREAD_MEM tb_float2 &point) const {
+    return point.x >= min.x && point.x <= max.x && point.y >= min.y &&
+           point.y <= max.y;
+  }
+#if defined(__NVCC__) || defined(__HIPCC__)
+  CUDA_DEVICE bool overlaps(const float2 &point) const {
+    return point.x >= min.x && point.x <= max.x && point.y >= min.y &&
+           point.y <= max.y;
+  }
+#endif
+  // Initialize AABB to an invalid state
+  void invalidate() {
+    min = {FLT_MAX, FLT_MAX};
+    max = {-FLT_MAX, -FLT_MAX}; // most negative float, not FLT_MIN
+  }
+  // Calculate the area of the AABB
+  float area() const {
+    tb_float2 extent = {max.x - min.x, max.y - min.y};
+    return extent.x * extent.y;
+  }
+};
+struct BVHNode {
+  AABB bbox;
+  int start, end;
+  int left, right;
+  int num_triangles() const { return end - start; }
+  CUDA_HOST_DEVICE bool is_leaf() const { return left == -1 && right == -1; }
+  float calculate_node_cost() {
+    float area = bbox.area();
+    return num_triangles() * area;
+  }
+};
+struct Triangle {
+  tb_float2 v0, v1, v2;
+  int index;
+  tb_float2 centroid;
+};
+#ifndef __METAL__
+struct BVH {
+  std::vector<BVHNode> nodes;
+  std::vector<Triangle> triangles;
+  std::vector<int> triangle_indices;
+  int root;
+  void build(const tb_float2 *vertices, const tb_int3 *indices,
+             const int64_t &num_indices);
+  bool intersect(const tb_float2 &point, float &u, float &v, float &w,
+                 int &index) const;
+  void update_node_bounds(BVHNode &node, AABB &centroidBounds);
+  float find_best_split_plane(const BVHNode &node, int &best_axis,
+                              int &best_pos, AABB &centroidBounds);
+};
+#endif
+} // namespace texture_baker_cpp
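
Review note on `AABB`: the struct starts from an "empty" box and widens it with `grow()`. Below is a minimal host-side sketch of that pattern, assuming stand-in types `Point2` and `Box2` that are not part of this commit. It uses `-FLT_MAX` as the empty maximum because `FLT_MIN` is the smallest positive normalized float, not the most negative one. The distinction only shows up once coordinates can be negative, which UVs in [0, 1] are not.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>

// Hypothetical stand-ins for tb_float2 and AABB from baker.h.
struct Point2 {
  float x, y;
};

struct Box2 {
  // Empty box: min starts at the largest float and max at the most negative
  // one, so the first grow() call always overwrites both corners.
  Point2 min{FLT_MAX, FLT_MAX};
  Point2 max{-FLT_MAX, -FLT_MAX};

  // Widen the box so that it includes p.
  void grow(const Point2 &p) {
    min.x = std::min(min.x, p.x);
    min.y = std::min(min.y, p.y);
    max.x = std::max(max.x, p.x);
    max.y = std::max(max.y, p.y);
  }

  // Inclusive point-in-box test, same comparison style as AABB::overlaps.
  bool contains(const Point2 &p) const {
    return p.x >= min.x && p.x <= max.x && p.y >= min.y && p.y <= max.y;
  }

  float area() const { return (max.x - min.x) * (max.y - min.y); }
};

// Builds a box around two points that both have negative coordinates.
inline Box2 box_around_negative_points() {
  Box2 box;
  box.grow({-2.0f, -3.0f});
  box.grow({-1.0f, -1.0f});
  return box;
}
```

With a positive sentinel such as `FLT_MIN`, the maximum corner in `box_around_negative_points()` would stay at the sentinel instead of moving to (-1, -1).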

texture_baker/texture_baker/csrc/baker_kernel.cu ADDED

	@@ -0,0 +1,306 @@

+#include <ATen/ATen.h>
+#include <ATen/Context.h>
+#include <ATen/cuda/CUDAContext.h>
+#include <torch/extension.h>
+#include "baker.h"
+// #define TIMING
+#define STRINGIFY(x) #x
+#define STR(x) STRINGIFY(x)
+#define FILE_LINE __FILE__ ":" STR(__LINE__)
+#define CUDA_CHECK_THROW(x) \
+	do { \
+		cudaError_t _result = x; \
+		if (_result != cudaSuccess) \
+			throw std::runtime_error(std::string(FILE_LINE " " #x " failed: ") + cudaGetErrorString(_result)); \
+	} while(0)
+#if defined(__HIPCC__)
+#define cudaMallocAsync hipMallocAsync
+#define cudaFreeAsync hipFreeAsync
+#endif
+namespace texture_baker_cpp
+{
+    __device__ float3 operator+(const float3 &a, const float3 &b)
+    {
+        return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
+    }
+    // xy: 2D test position
+    // v1: vertex position 1
+    // v2: vertex position 2
+    // v3: vertex position 3
+    //
+    __forceinline__ __device__ bool barycentric_coordinates(const float2 &xy, const tb_float2 &v1, const tb_float2 &v2, const tb_float2 &v3, float &u, float &v, float &w)
+    {
+        // Return true if the point (xy) is inside the triangle defined by the vertices v1, v2, v3.
+        // If the point is inside the triangle, the barycentric coordinates are stored in u, v, and w.
+        float2 v1v2 = make_float2(v2.x - v1.x, v2.y - v1.y);
+        float2 v1v3 = make_float2(v3.x - v1.x, v3.y - v1.y);
+        float2 xyv1 = make_float2(xy.x - v1.x, xy.y - v1.y);
+        float d00 = v1v2.x * v1v2.x + v1v2.y * v1v2.y;
+        float d01 = v1v2.x * v1v3.x + v1v2.y * v1v3.y;
+        float d11 = v1v3.x * v1v3.x + v1v3.y * v1v3.y;
+        float d20 = xyv1.x * v1v2.x + xyv1.y * v1v2.y;
+        float d21 = xyv1.x * v1v3.x + xyv1.y * v1v3.y;
+        float denom = d00 * d11 - d01 * d01;
+        v = (d11 * d20 - d01 * d21) / denom;
+        w = (d00 * d21 - d01 * d20) / denom;
+        u = 1.0f - v - w;
+        return (v >= 0.0f) && (w >= 0.0f) && (v + w <= 1.0f);
+    }
+    __global__ void kernel_interpolate(const float3* __restrict__ attr, const int3* __restrict__ indices, const float4* __restrict__ rast, float3* __restrict__ output, int width, int height)
+    {
+        // Interpolate the attr into output based on the rast result (barycentric coordinates, + triangle idx)
+        //int idx = x * width + y;
+        int idx = blockIdx.x * blockDim.x + threadIdx.x;
+        int x = idx / width;
+        int y = idx % width;
+        if (y >= width || x >= height)
+            return;
+        float4 barycentric = rast[idx];
+        int triangle_idx = int(barycentric.w);
+        if (triangle_idx < 0)
+        {
+            output[idx] = make_float3(0.0f, 0.0f, 0.0f);
+            return;
+        }
+        float3 v1 = attr[indices[triangle_idx].x];
+        float3 v2 = attr[indices[triangle_idx].y];
+        float3 v3 = attr[indices[triangle_idx].z];
+        output[idx] = make_float3(v1.x * barycentric.x, v1.y * barycentric.x, v1.z * barycentric.x)
+        + make_float3(v2.x * barycentric.y, v2.y * barycentric.y, v2.z * barycentric.y)
+        + make_float3(v3.x * barycentric.z, v3.y * barycentric.z, v3.z * barycentric.z);
+    }
+    __device__ bool bvh_intersect(
+        const BVHNode* __restrict__ nodes,
+        const Triangle* __restrict__ triangles,
+        const int* __restrict__ triangle_indices,
+        const int root,
+        const float2 &point,
+        float &u, float &v, float &w,
+        int &index)
+    {
+        constexpr int max_stack_size = 64;
+        int node_stack[max_stack_size];
+        int stack_size = 0;
+        node_stack[stack_size++] = root;
+        while (stack_size > 0)
+        {
+            int node_idx = node_stack[--stack_size];
+            const BVHNode &node = nodes[node_idx];
+            if (node.is_leaf())
+            {
+                for (int i = node.start; i < node.end; ++i)
+                {
+                    const Triangle &tri = triangles[triangle_indices[i]];
+                    if (barycentric_coordinates(point, tri.v0, tri.v1, tri.v2, u, v, w))
+                    {
+                        index = tri.index;
+                        return true;
+                    }
+                }
+            }
+            else
+            {
+                if (nodes[node.right].bbox.overlaps(point))
+                {
+                    if (stack_size < max_stack_size)
+                    {
+                        node_stack[stack_size++] = node.right;
+                    }
+                    else
+                    {
+                        // Handle stack overflow
+                        // Make sure NDEBUG is not defined (see setup.py)
+                        assert(0 && "Node stack overflow");
+                    }
+                }
+                if (nodes[node.left].bbox.overlaps(point))
+                {
+                    if (stack_size < max_stack_size)
+                    {
+                        node_stack[stack_size++] = node.left;
+                    }
+                    else
+                    {
+                        // Handle stack overflow
+                        // Make sure NDEBUG is not defined (see setup.py)
+                        assert(0 && "Node stack overflow");
+                    }
+                }
+            }
+        }
+        return false;
+    }
+    __global__ void kernel_bake_uv(
+        float2* __restrict__ uv,
+        int3* __restrict__ indices,
+        float4* __restrict__ output,
+        const BVHNode* __restrict__ nodes,
+        const Triangle* __restrict__ triangles,
+        const int* __restrict__ triangle_indices,
+        const int root,
+        const int width,
+        const int height,
+        const int num_indices)
+    {
+        //int idx = x * width + y;
+        int idx = blockIdx.x * blockDim.x + threadIdx.x;
+        int x = idx / width;
+        int y = idx % width;
+        if (y >= width || x >= height)
+            return;
+        // We index x,y but the original coords are HW. So swap them
+        float2 pixel_coord = make_float2(float(y) / height, float(x) / width);
+        pixel_coord.x = fminf(fmaxf(pixel_coord.x, 0.0f), 1.0f);
+        pixel_coord.y = 1.0f - fminf(fmaxf(pixel_coord.y, 0.0f), 1.0f);
+        float u, v, w;
+        int triangle_idx;
+        bool hit = bvh_intersect(nodes, triangles, triangle_indices, root, pixel_coord, u, v, w, triangle_idx);
+        if (hit)
+        {
+            output[idx] = make_float4(u, v, w, float(triangle_idx));
+            return;
+        }
+        output[idx] = make_float4(0.0f, 0.0f, 0.0f, -1.0f);
+    }
+    torch::Tensor rasterize_gpu(
+        torch::Tensor uv,
+        torch::Tensor indices,
+        int64_t bake_resolution)
+    {
+#ifdef TIMING
+        auto start = std::chrono::high_resolution_clock::now();
+#endif
+        constexpr int block_size = 16 * 16;
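+        // Round up so resolutions that are not a multiple of block_size are fully covered.
+        int grid_size = (bake_resolution * bake_resolution + block_size - 1) / block_size;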
+        dim3 block_dims(block_size, 1, 1);
+        dim3 grid_dims(grid_size, 1, 1);
+        int num_indices = indices.size(0);
+        int width = bake_resolution;
+        int height = bake_resolution;
+        // Step 1: create an empty tensor to store the output.
+        torch::Tensor rast_result = torch::empty({bake_resolution, bake_resolution, 4}, torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA));
+        auto vertices_cpu = uv.contiguous().cpu();
+        auto indices_cpu = indices.contiguous().cpu();
+        const tb_float2 *vertices_cpu_ptr = (tb_float2*)vertices_cpu.contiguous().data_ptr<float>();
+        const tb_int3 *tris_cpu_ptr = (tb_int3*)indices_cpu.contiguous().data_ptr<int>();
+        BVH bvh;
+        bvh.build(vertices_cpu_ptr, tris_cpu_ptr, indices.size(0));
+        BVHNode *nodes_gpu = nullptr;
+        Triangle *triangles_gpu = nullptr;
+        int *triangle_indices_gpu = nullptr;
+        const int bvh_root = bvh.root;
+        cudaStream_t cuda_stream = at::cuda::getCurrentCUDAStream();
+        CUDA_CHECK_THROW(cudaMallocAsync(&nodes_gpu, sizeof(BVHNode) * bvh.nodes.size(), cuda_stream));
+        CUDA_CHECK_THROW(cudaMallocAsync(&triangles_gpu, sizeof(Triangle) * bvh.triangles.size(), cuda_stream));
+        CUDA_CHECK_THROW(cudaMallocAsync(&triangle_indices_gpu, sizeof(int) * bvh.triangle_indices.size(), cuda_stream));
+        CUDA_CHECK_THROW(cudaMemcpyAsync(nodes_gpu, bvh.nodes.data(), sizeof(BVHNode) * bvh.nodes.size(), cudaMemcpyHostToDevice, cuda_stream));
+        CUDA_CHECK_THROW(cudaMemcpyAsync(triangles_gpu, bvh.triangles.data(), sizeof(Triangle) * bvh.triangles.size(), cudaMemcpyHostToDevice, cuda_stream));
+        CUDA_CHECK_THROW(cudaMemcpyAsync(triangle_indices_gpu, bvh.triangle_indices.data(), sizeof(int) * bvh.triangle_indices.size(), cudaMemcpyHostToDevice, cuda_stream));
+        kernel_bake_uv<<<grid_dims, block_dims, 0, cuda_stream>>>(
+            (float2 *)uv.contiguous().data_ptr<float>(),
+            (int3 *)indices.contiguous().data_ptr<int>(),
+            (float4 *)rast_result.contiguous().data_ptr<float>(),
+            nodes_gpu,
+            triangles_gpu,
+            triangle_indices_gpu,
+            bvh_root,
+            width,
+            height,
+            num_indices);
+        CUDA_CHECK_THROW(cudaFreeAsync(nodes_gpu, cuda_stream));
+        CUDA_CHECK_THROW(cudaFreeAsync(triangles_gpu, cuda_stream));
+        CUDA_CHECK_THROW(cudaFreeAsync(triangle_indices_gpu, cuda_stream));
+#ifdef TIMING
+        CUDA_CHECK_THROW(cudaStreamSynchronize(cuda_stream));
+        auto end = std::chrono::high_resolution_clock::now();
+        std::chrono::duration<double> elapsed = end - start;
+        std::cout << "Rasterization time (CUDA): " << elapsed.count() << "s" << std::endl;
+#endif
+        return rast_result;
+    }
+    torch::Tensor interpolate_gpu(
+        torch::Tensor attr,
+        torch::Tensor indices,
+        torch::Tensor rast)
+    {
+#ifdef TIMING
+        auto start = std::chrono::high_resolution_clock::now();
+#endif
+        constexpr int block_size = 16 * 16;
+        // Round up so sizes that are not a multiple of block_size are fully covered.
+        int grid_size = (rast.size(0) * rast.size(1) + block_size - 1) / block_size;
+        dim3 block_dims(block_size, 1, 1);
+        dim3 grid_dims(grid_size, 1, 1);
+        // Step 1: create an empty tensor to store the output.
+        torch::Tensor pos_bake = torch::empty({rast.size(0), rast.size(1), 3}, torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA));
+        int width = rast.size(0);
+        int height = rast.size(1);
+        cudaStream_t cuda_stream = at::cuda::getCurrentCUDAStream();
+        kernel_interpolate<<<grid_dims, block_dims, 0, cuda_stream>>>(
+            (float3 *)attr.contiguous().data_ptr<float>(),
+            (int3 *)indices.contiguous().data_ptr<int>(),
+            (float4 *)rast.contiguous().data_ptr<float>(),
+            (float3 *)pos_bake.contiguous().data_ptr<float>(),
+            width,
+            height);
+#ifdef TIMING
+        CUDA_CHECK_THROW(cudaStreamSynchronize(cuda_stream));
+        auto end = std::chrono::high_resolution_clock::now();
+        std::chrono::duration<double> elapsed = end - start;
+        std::cout << "Interpolation time (CUDA): " << elapsed.count() << "s" << std::endl;
+#endif
+        return pos_bake;
+    }
+    // Registers CUDA implementations
+    TORCH_LIBRARY_IMPL(texture_baker_cpp, CUDA, m)
+    {
+        m.impl("rasterize", &rasterize_gpu);
+        m.impl("interpolate", &interpolate_gpu);
+    }
+}
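
Review note on `barycentric_coordinates`: the kernel solves for the weights with dot products of the triangle's edge vectors. Below is a host-side reference sketch of the same formulation for checking results by hand. `Vec2` and `barycentric_ref` are hypothetical names, and the zero-denominator guard is an addition of this sketch; the kernel above does not have it.

```cpp
#include <cassert>

// Hypothetical stand-in for float2 / tb_float2.
struct Vec2 {
  float x, y;
};

// Returns true when p lies inside or on the boundary of triangle (a, b, c).
// On success u, v, w are the weights of a, b, c and sum to 1.
inline bool barycentric_ref(Vec2 p, Vec2 a, Vec2 b, Vec2 c,
                            float &u, float &v, float &w) {
  const Vec2 ab{b.x - a.x, b.y - a.y};
  const Vec2 ac{c.x - a.x, c.y - a.y};
  const Vec2 ap{p.x - a.x, p.y - a.y};

  const float d00 = ab.x * ab.x + ab.y * ab.y;
  const float d01 = ab.x * ac.x + ab.y * ac.y;
  const float d11 = ac.x * ac.x + ac.y * ac.y;
  const float d20 = ap.x * ab.x + ap.y * ab.y;
  const float d21 = ap.x * ac.x + ap.y * ac.y;

  // Zero for degenerate (collinear) triangles; guard added in this sketch.
  const float denom = d00 * d11 - d01 * d01;
  if (denom == 0.0f)
    return false;

  v = (d11 * d20 - d01 * d21) / denom;
  w = (d00 * d21 - d01 * d20) / denom;
  u = 1.0f - v - w;
  return v >= 0.0f && w >= 0.0f && v + w <= 1.0f;
}
```

For the unit right triangle (0,0), (1,0), (0,1) and the point (0.25, 0.25), the weights come out as u = 0.5, v = 0.25, w = 0.25.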

texture_baker/texture_baker/csrc/baker_kernel.metal ADDED

	@@ -0,0 +1,170 @@

+#include <metal_stdlib>
+using namespace metal;
+// This header is inlined manually
+//#include "baker.h"
+// Use the texture_baker_cpp so it can use the classes from baker.h
+using namespace texture_baker_cpp;
+// Utility function to compute barycentric coordinates
+bool barycentric_coordinates(float2 xy, float2 v1, float2 v2, float2 v3, thread float &u, thread float &v, thread float &w) {
+    float2 v1v2 = v2 - v1;
+    float2 v1v3 = v3 - v1;
+    float2 xyv1 = xy - v1;
+    float d00 = dot(v1v2, v1v2);
+    float d01 = dot(v1v2, v1v3);
+    float d11 = dot(v1v3, v1v3);
+    float d20 = dot(xyv1, v1v2);
+    float d21 = dot(xyv1, v1v3);
+    float denom = d00 * d11 - d01 * d01;
+    v = (d11 * d20 - d01 * d21) / denom;
+    w = (d00 * d21 - d01 * d20) / denom;
+    u = 1.0f - v - w;
+    return (v >= 0.0f) && (w >= 0.0f) && (v + w <= 1.0f);
+}
+// Kernel function for interpolation
+kernel void kernel_interpolate(constant packed_float3 *attr [[buffer(0)]],
+                            constant packed_int3 *indices [[buffer(1)]],
+                            constant packed_float4 *rast [[buffer(2)]],
+                            device packed_float3 *output [[buffer(3)]],
+                            constant int &width [[buffer(4)]],
+                            constant int &height [[buffer(5)]],
+                            uint3 blockIdx [[threadgroup_position_in_grid]],
+                            uint3 threadIdx [[thread_position_in_threadgroup]],
+                            uint3 blockDim [[threads_per_threadgroup]])
+{
+    // Calculate global position using threadgroup and thread positions
+    int x = blockIdx.x * blockDim.x + threadIdx.x;
+    int y = blockIdx.y * blockDim.y + threadIdx.y;
+    if (x >= width || y >= height) return;
+    int idx = y * width + x;
+    float4 barycentric = rast[idx];
+    int triangle_idx = int(barycentric.w);
+    if (triangle_idx < 0) {
+        output[idx] = float3(0.0f, 0.0f, 0.0f);
+        return;
+    }
+    float3 v1 = attr[indices[triangle_idx].x];
+    float3 v2 = attr[indices[triangle_idx].y];
+    float3 v3 = attr[indices[triangle_idx].z];
+    output[idx] = v1 * barycentric.x + v2 * barycentric.y + v3 * barycentric.z;
+}
+bool bvh_intersect(
+    constant BVHNode* nodes,
+    constant Triangle* triangles,
+    constant int* triangle_indices,
+    const thread int root,
+    const thread float2 &point,
+    thread float &u, thread float &v, thread float &w,
+    thread int &index)
+{
+    const int max_stack_size = 64;
+    thread int node_stack[max_stack_size];
+    int stack_size = 0;
+    node_stack[stack_size++] = root;
+    while (stack_size > 0)
+    {
+        int node_idx = node_stack[--stack_size];
+        BVHNode node = nodes[node_idx];
+        if (node.is_leaf())
+        {
+            for (int i = node.start; i < node.end; ++i)
+            {
+                constant Triangle &tri = triangles[triangle_indices[i]];
+                if (barycentric_coordinates(point, tri.v0, tri.v1, tri.v2, u, v, w))
+                {
+                    index = tri.index;
+                    return true;
+                }
+            }
+        }
+        else
+        {
+            BVHNode test_node = nodes[node.right];
+            if (test_node.bbox.overlaps(point))
+            {
+                if (stack_size < max_stack_size)
+                {
+                    node_stack[stack_size++] = node.right;
+                }
+                else
+                {
+                    // Handle stack overflow
+                    // Sadly, metal doesn't support asserts (but you could try enabling metal validation layers)
+                    return false;
+                }
+            }
+            test_node = nodes[node.left];
+            if (test_node.bbox.overlaps(point))
+            {
+                if (stack_size < max_stack_size)
+                {
+                    node_stack[stack_size++] = node.left;
+                }
+                else
+                {
+                    // Handle stack overflow
+                    return false;
+                }
+            }
+        }
+    }
+    return false;
+}
+// Kernel function for baking UV
+kernel void kernel_bake_uv(constant packed_float2 *uv [[buffer(0)]],
+                        constant packed_int3 *indices [[buffer(1)]],
+                        device packed_float4 *output [[buffer(2)]],
+                        constant BVHNode *nodes [[buffer(3)]],
+                        constant Triangle *triangles [[buffer(4)]],
+                        constant int *triangle_indices [[buffer(5)]],
+                        constant int &root [[buffer(6)]],
+                        constant int &width [[buffer(7)]],
+                        constant int &height [[buffer(8)]],
+                        constant int &num_indices [[buffer(9)]],
+                        uint3 blockIdx [[threadgroup_position_in_grid]],
+                        uint3 threadIdx [[thread_position_in_threadgroup]],
+                        uint3 blockDim [[threads_per_threadgroup]])
+{
+    // Calculate global position using threadgroup and thread positions
+    int x = blockIdx.x * blockDim.x + threadIdx.x;
+    int y = blockIdx.y * blockDim.y + threadIdx.y;
+    if (x >= width || y >= height) return;
+    int idx = x * width + y;
+    // Swap original coordinates
+    float2 pixel_coord = float2(float(y) / float(height), float(x) / float(width));
+    pixel_coord = clamp(pixel_coord, 0.0f, 1.0f);
+    pixel_coord.y = 1.0f - pixel_coord.y;
+    float u, v, w;
+    int triangle_idx;
+    bool hit = bvh_intersect(nodes, triangles, triangle_indices, root, pixel_coord, u, v, w, triangle_idx);
+    if (hit) {
+        output[idx] = float4(u, v, w, float(triangle_idx));
+        return;
+    }
+    output[idx] = float4(0.0f, 0.0f, 0.0f, -1.0f);
+}
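
Review note on `bvh_intersect`: both the CUDA and Metal versions walk the tree with a fixed-size explicit stack, pushing only children whose box contains the query point. Below is a simplified host-side sketch of that traversal. `Box`, `Node`, `find_leaf` and `make_two_leaf_tree` are hypothetical names, and a leaf's box test stands in for the per-triangle barycentric test done in the kernels.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

struct Box {
  float min_x, min_y, max_x, max_y;
  bool contains(float x, float y) const {
    return x >= min_x && x <= max_x && y >= min_y && y <= max_y;
  }
};

struct Node {
  Box box;
  int left, right;  // child node indices, -1 for a leaf
  int payload;      // hypothetical id stored at a leaf
};

constexpr int kNoHit = -1;
constexpr int kStackOverflow = -2;  // error code chosen for this sketch

// Returns the payload of the first leaf whose box contains (x, y).
inline int find_leaf(const std::vector<Node> &nodes, int root, float x, float y) {
  constexpr int kMaxStack = 64;
  int stack[kMaxStack];
  int size = 0;
  stack[size++] = root;
  while (size > 0) {
    const Node &node = nodes[stack[--size]];
    if (node.left == -1 && node.right == -1) {
      if (node.box.contains(x, y))
        return node.payload;
      continue;
    }
    // Push right first so that the left child is popped (visited) first.
    for (int child : {node.right, node.left}) {
      if (!nodes[child].box.contains(x, y))
        continue;
      if (size == kMaxStack)
        return kStackOverflow;
      stack[size++] = child;
    }
  }
  return kNoHit;
}

// Root covering [0,2]x[0,1] with two leaves split at x = 1.
inline std::vector<Node> make_two_leaf_tree() {
  return {Node{Box{0.0f, 0.0f, 2.0f, 1.0f}, 1, 2, -1},
          Node{Box{0.0f, 0.0f, 1.0f, 1.0f}, -1, -1, 10},
          Node{Box{1.0f, 0.0f, 2.0f, 1.0f}, -1, -1, 20}};
}
```

Returning a distinct overflow code is one option for reporting an exhausted stack on backends without device-side asserts; whether that suits the Metal path is a design choice for the author.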

texture_baker/texture_baker/csrc/baker_kernel.mm ADDED

	@@ -0,0 +1,260 @@

+#include <torch/extension.h>
+#include <ATen/ATen.h>
+#include <ATen/Context.h>
+#include "baker.h"
+#import <Foundation/Foundation.h>
+#import <Metal/Metal.h>
+#include <filesystem>
+// Helper function to retrieve the `MTLBuffer` from a `torch::Tensor`.
+static inline id<MTLBuffer> getMTLBufferStorage(const torch::Tensor& tensor) {
+  return __builtin_bit_cast(id<MTLBuffer>, tensor.storage().data());
+}
+// Helper function to create a compute pipeline state object (PSO).
+static inline id<MTLComputePipelineState> createComputePipelineState(id<MTLDevice> device, NSString* fullSource, std::string kernel_name) {
+    NSError *error = nil;
+    // Load the custom kernel shader.
+    MTLCompileOptions *options = [[MTLCompileOptions alloc] init];
+    // Add the preprocessor macro "__METAL__"
+    options.preprocessorMacros = @{@"__METAL__": @""};
+    id<MTLLibrary> customKernelLibrary = [device newLibraryWithSource: fullSource options:options error:&error];
+    TORCH_CHECK(customKernelLibrary, "Failed to create custom kernel library, error: ", error.localizedDescription.UTF8String);
+    id<MTLFunction> customKernelFunction = [customKernelLibrary newFunctionWithName:[NSString stringWithUTF8String:kernel_name.c_str()]];
+    TORCH_CHECK(customKernelFunction, "Failed to create function state object for ", kernel_name.c_str());
+    id<MTLComputePipelineState> pso = [device newComputePipelineStateWithFunction:customKernelFunction error:&error];
+    TORCH_CHECK(pso, error.localizedDescription.UTF8String);
+    return pso;
+}
+std::filesystem::path get_extension_path() {
+    // Ensure the GIL is held before calling any Python C API function
+    PyGILState_STATE gstate = PyGILState_Ensure();
+    const char* module_name = "texture_baker";
+    // Import the module by name
+    PyObject* module = PyImport_ImportModule(module_name);
+    if (!module) {
+        PyGILState_Release(gstate);
+        throw std::runtime_error("Could not import the module: " + std::string(module_name));
+    }
+    // Get the filename of the module
+    PyObject* filename_obj = PyModule_GetFilenameObject(module);
+    if (filename_obj) {
+        std::string path = PyUnicode_AsUTF8(filename_obj);
+        Py_DECREF(filename_obj);
+        PyGILState_Release(gstate);
+        // Get the directory part of the path (removing the __init__.py)
+        std::filesystem::path module_path = std::filesystem::path(path).parent_path();
+        // Append the 'csrc' directory to the path
+        module_path /= "csrc";
+        return module_path;
+    } else {
+        PyGILState_Release(gstate);
+        throw std::runtime_error("Could not retrieve the module filename.");
+    }
+}
+NSString *get_shader_sources_as_string()
+{
+    const std::filesystem::path csrc_path = get_extension_path();
+    const std::string shader_path = (csrc_path / "baker_kernel.metal").string();
+    const std::string shader_header_path = (csrc_path / "baker.h").string();
+    // Load the Metal shader from the specified path
+    NSError *error = nil;
+    NSString* shaderHeaderSource = [
+        NSString stringWithContentsOfFile:[NSString stringWithUTF8String:shader_header_path.c_str()]
+        encoding:NSUTF8StringEncoding
+        error:&error];
+    if (error) {
+        throw std::runtime_error("Failed to load baker.h: " + std::string(error.localizedDescription.UTF8String));
+    }
+    NSString* shaderSource = [
+        NSString stringWithContentsOfFile:[NSString stringWithUTF8String:shader_path.c_str()]
+        encoding:NSUTF8StringEncoding
+        error:&error];
+    if (error) {
+        throw std::runtime_error("Failed to load Metal shader: " + std::string(error.localizedDescription.UTF8String));
+    }
+    NSString *fullSource = [shaderHeaderSource stringByAppendingString:shaderSource];
+    return fullSource;
+}
+namespace texture_baker_cpp
+{
+    torch::Tensor rasterize_gpu(
+        torch::Tensor uv,
+        torch::Tensor indices,
+        int64_t bake_resolution)
+    {
+        TORCH_CHECK(uv.device().is_mps(), "uv must be an MPS tensor");
+        TORCH_CHECK(uv.is_contiguous(), "uv must be contiguous");
+        TORCH_CHECK(indices.is_contiguous(), "indices must be contiguous");
+        TORCH_CHECK(uv.scalar_type() == torch::kFloat32, "Unsupported data type: ", uv.scalar_type());
+        TORCH_CHECK(indices.scalar_type() == torch::kInt32, "Unsupported data type: ", indices.scalar_type());
+        torch::Tensor rast_result = torch::empty({bake_resolution, bake_resolution, 4}, torch::TensorOptions().dtype(torch::kFloat32).device(torch::kMPS)).contiguous();
+        @autoreleasepool {
+            auto vertices_cpu = uv.contiguous().cpu();
+            auto indices_cpu = indices.contiguous().cpu();
+            const tb_float2 *vertices_cpu_ptr = (tb_float2*)vertices_cpu.contiguous().data_ptr<float>();
+            const tb_int3 *tris_cpu_ptr = (tb_int3*)indices_cpu.contiguous().data_ptr<int>();
+            BVH bvh;
+            bvh.build(vertices_cpu_ptr, tris_cpu_ptr, indices.size(0));
+            id<MTLDevice> device = MTLCreateSystemDefaultDevice();
+            NSString *fullSource = get_shader_sources_as_string();
+            // Create a compute pipeline state object using the helper function
+            id<MTLComputePipelineState> bake_uv_PSO = createComputePipelineState(device, fullSource, "kernel_bake_uv");
+            // Get a reference to the command buffer for the MPS stream.
+            id<MTLCommandBuffer> commandBuffer = torch::mps::get_command_buffer();
+            TORCH_CHECK(commandBuffer, "Failed to retrieve command buffer reference");
+            // Get a reference to the dispatch queue for the MPS stream, which encodes the synchronization with the CPU.
+            dispatch_queue_t serialQueue = torch::mps::get_dispatch_queue();
+            dispatch_sync(serialQueue, ^(){
+                // Start a compute pass.
+                id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
+                TORCH_CHECK(computeEncoder, "Failed to create compute command encoder");
+                // Get Metal buffers directly from PyTorch tensors
+                auto uv_buf = getMTLBufferStorage(uv.contiguous());
+                auto indices_buf = getMTLBufferStorage(indices.contiguous());
+                auto rast_result_buf = getMTLBufferStorage(rast_result);
+                const int width = bake_resolution;
+                const int height = bake_resolution;
+                const int num_indices = indices.size(0);
+                const int bvh_root = bvh.root;
+                // Wrap the existing CPU memory in Metal buffers with shared memory
+                id<MTLBuffer> nodesBuffer = [device newBufferWithBytesNoCopy:(void*)bvh.nodes.data() length:sizeof(BVHNode) * bvh.nodes.size() options:MTLResourceStorageModeShared deallocator:nil];
+                id<MTLBuffer> trianglesBuffer = [device newBufferWithBytesNoCopy:(void*)bvh.triangles.data() length:sizeof(Triangle) * bvh.triangles.size() options:MTLResourceStorageModeShared deallocator:nil];
+                id<MTLBuffer> triangleIndicesBuffer = [device newBufferWithBytesNoCopy:(void*)bvh.triangle_indices.data() length:sizeof(int) * bvh.triangle_indices.size() options:MTLResourceStorageModeShared deallocator:nil];
+                [computeEncoder setComputePipelineState:bake_uv_PSO];
+                [computeEncoder setBuffer:uv_buf offset:uv.storage_offset() * uv.element_size() atIndex:0];
+                [computeEncoder setBuffer:indices_buf offset:indices.storage_offset() * indices.element_size() atIndex:1];
+                [computeEncoder setBuffer:rast_result_buf offset:rast_result.storage_offset() * rast_result.element_size() atIndex:2];
+                [computeEncoder setBuffer:nodesBuffer offset:0 atIndex:3];
+                [computeEncoder setBuffer:trianglesBuffer offset:0 atIndex:4];
+                [computeEncoder setBuffer:triangleIndicesBuffer offset:0 atIndex:5];
+                [computeEncoder setBytes:&bvh_root length:sizeof(int) atIndex:6];
+                [computeEncoder setBytes:&width length:sizeof(int) atIndex:7];
+                [computeEncoder setBytes:&height length:sizeof(int) atIndex:8];
+                [computeEncoder setBytes:&num_indices length:sizeof(int) atIndex:9];
+                // Calculate a thread group size.
+                int block_size = 16;
+                MTLSize threadgroupSize = MTLSizeMake(block_size, block_size, 1);  // Fixed threadgroup size
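+                // Round up; the kernel's bounds check discards threads outside the image.
+                MTLSize numThreadgroups = MTLSizeMake((bake_resolution + block_size - 1) / block_size, (bake_resolution + block_size - 1) / block_size, 1);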
+                // Encode the compute command.
+                [computeEncoder dispatchThreadgroups:numThreadgroups threadsPerThreadgroup:threadgroupSize];
+                [computeEncoder endEncoding];
+                // Commit the work.
+                torch::mps::commit();
+            });
+        }
+        return rast_result;
+    }
+    torch::Tensor interpolate_gpu(
+        torch::Tensor attr,
+        torch::Tensor indices,
+        torch::Tensor rast)
+    {
+        TORCH_CHECK(attr.is_contiguous(), "attr must be contiguous");
+        TORCH_CHECK(indices.is_contiguous(), "indices must be contiguous");
+        TORCH_CHECK(rast.is_contiguous(), "rast must be contiguous");
+        torch::Tensor pos_bake = torch::empty({rast.size(0), rast.size(1), 3}, torch::TensorOptions().dtype(torch::kFloat32).device(torch::kMPS)).contiguous();
+        std::filesystem::path csrc_path = get_extension_path();
+        @autoreleasepool {
+            id<MTLDevice> device = MTLCreateSystemDefaultDevice();
+            NSString *fullSource = get_shader_sources_as_string();
+            // Create a compute pipeline state object using the helper function
+            id<MTLComputePipelineState> interpolate_PSO = createComputePipelineState(device, fullSource, "kernel_interpolate");
+            // Get a reference to the command buffer for the MPS stream.
+            id<MTLCommandBuffer> commandBuffer = torch::mps::get_command_buffer();
+            TORCH_CHECK(commandBuffer, "Failed to retrieve command buffer reference");
+            // Get a reference to the dispatch queue for the MPS stream, which encodes the synchronization with the CPU.
+            dispatch_queue_t serialQueue = torch::mps::get_dispatch_queue();
+            dispatch_sync(serialQueue, ^(){
+                // Start a compute pass.
+                id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
+                TORCH_CHECK(computeEncoder, "Failed to create compute command encoder");
+                // Get Metal buffers directly from PyTorch tensors
+                auto attr_buf = getMTLBufferStorage(attr.contiguous());
+                auto indices_buf = getMTLBufferStorage(indices.contiguous());
+                auto rast_buf = getMTLBufferStorage(rast.contiguous());
+                auto pos_bake_buf = getMTLBufferStorage(pos_bake);
+                int width = rast.size(0);
+                int height = rast.size(1);
+                [computeEncoder setComputePipelineState:interpolate_PSO];
+                [computeEncoder setBuffer:attr_buf offset:attr.storage_offset() * attr.element_size() atIndex:0];
+                [computeEncoder setBuffer:indices_buf offset:indices.storage_offset() * indices.element_size() atIndex:1];
+                [computeEncoder setBuffer:rast_buf offset:rast.storage_offset() * rast.element_size() atIndex:2];
+                [computeEncoder setBuffer:pos_bake_buf offset:pos_bake.storage_offset() * pos_bake.element_size() atIndex:3];
+                [computeEncoder setBytes:&width length:sizeof(int) atIndex:4];
+                [computeEncoder setBytes:&height length:sizeof(int) atIndex:5];
+                // Calculate a thread group size.
+                int block_size = 16;
+                MTLSize threadgroupSize = MTLSizeMake(block_size, block_size, 1);  // Fixed threadgroup size
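+                // Round up; the kernel's bounds check discards threads outside the image.
+                MTLSize numThreadgroups = MTLSizeMake((rast.size(0) + block_size - 1) / block_size, (rast.size(1) + block_size - 1) / block_size, 1);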
+                // Encode the compute command.
+                [computeEncoder dispatchThreadgroups:numThreadgroups threadsPerThreadgroup:threadgroupSize];
+                [computeEncoder endEncoding];
+                // Commit the work.
+                torch::mps::commit();
+            });
+        }
+        return pos_bake;
+    }
+    // Registers MPS implementations
+    TORCH_LIBRARY_IMPL(texture_baker_cpp, MPS, m)
+    {
+        m.impl("rasterize", &rasterize_gpu);
+        m.impl("interpolate", &interpolate_gpu);
+    }
+}
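
Review note on launch sizes: dividing the pixel count by the block size covers every pixel only when the division is exact. Below is a small sketch of the usual round-up computation, which relies on an in-kernel bounds check to reject the surplus threads. The helper names `ceil_div`, `grid_size_1d` and `idle_threads_1d` are hypothetical and not part of this commit.

```cpp
#include <cassert>
#include <cstdint>

// Number of blocks of `block` threads needed to cover `total` work items.
inline std::int64_t ceil_div(std::int64_t total, std::int64_t block) {
  assert(total >= 0 && block > 0);
  return (total + block - 1) / block;
}

// 1D launch with one thread per pixel of a square image.
inline std::int64_t grid_size_1d(std::int64_t resolution, std::int64_t block_size) {
  return ceil_div(resolution * resolution, block_size);
}

// Threads that are launched but fall outside the image; the kernel's
// bounds check must make these return early.
inline std::int64_t idle_threads_1d(std::int64_t resolution, std::int64_t block_size) {
  return grid_size_1d(resolution, block_size) * block_size - resolution * resolution;
}
```

For a 1000 x 1000 bake with 256-thread blocks, truncating division yields 3906 blocks and leaves 64 pixels unwritten, while rounding up yields 3907 blocks with 192 idle threads.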

uv_unwrapper/README.md ADDED

File without changes

uv_unwrapper/requirements.txt ADDED

	@@ -0,0 +1,2 @@


+torch
+numpy

uv_unwrapper/setup.py ADDED

	@@ -0,0 +1,83 @@

+import glob
+import os
+import torch
+from setuptools import find_packages, setup
+from torch.utils.cpp_extension import (
+    BuildExtension,
+    CppExtension,
+)
+library_name = "uv_unwrapper"
+def get_extensions():
+    debug_mode = os.getenv("DEBUG", "0") == "1"
+    if debug_mode:
+        print("Compiling in debug mode")
+    is_mac = torch.backends.mps.is_available()
+    use_native_arch = not is_mac and os.getenv("USE_NATIVE_ARCH", "1") == "1"
+    extension = CppExtension
+    extra_link_args = []
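+    # Each optional flag group is parenthesized: a conditional expression binds
+    # more loosely than "+", so without the parentheses the base flags are
+    # dropped whenever use_native_arch is false.
+    extra_compile_args = {
+        "cxx": (
+            [
+                "-O3" if not debug_mode else "-O0",
+                "-fdiagnostics-color=always",
+            ]
+            # "-Xclang" and "-fopenmp" must be separate argv entries
+            + (["-Xclang", "-fopenmp"] if is_mac else ["-fopenmp"])
+            + (["-march=native"] if use_native_arch else [])
+            + (["-mmacosx-version-min=10.15"] if is_mac else [])
+        ),
+    }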
+    if debug_mode:
+        extra_compile_args["cxx"].append("-g")
+        extra_compile_args["cxx"].append("-UNDEBUG")
+        extra_link_args.extend(["-O0", "-g"])
+    define_macros = []
+    extensions = []
+    this_dir = os.path.dirname(os.path.curdir)
+    sources = glob.glob(
+        os.path.join(this_dir, library_name, "csrc", "**", "*.cpp"), recursive=True
+    )
+    if len(sources) == 0:
+        print("No source files found for extension, skipping extension compilation")
+        return None
+    extensions.append(
+        extension(
+            name=f"{library_name}._C",
+            sources=sources,
+            define_macros=define_macros,
+            extra_compile_args=extra_compile_args,
+            extra_link_args=extra_link_args,
+            libraries=(
+                ["c10", "torch", "torch_cpu", "torch_python"] + ["omp"]
+                if is_mac
+                else []
+            ),
+        )
+    )
+    print(extensions)
+    return extensions
+setup(
+    name=library_name,
+    version="0.0.1",
+    packages=find_packages(),
+    ext_modules=get_extensions(),
+    install_requires=[],
+    description="Box projection based UV unwrapper",
+    long_description=open("README.md").read(),
+    long_description_content_type="text/markdown",
+    cmdclass={"build_ext": BuildExtension},
+)
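Flag lists like `extra_compile_args` are easy to get wrong because Python's conditional expression has lower precedence than `+`. The sketch below is illustrative only (`build_flags` and its `parenthesized` switch are hypothetical and not part of `setup.py`). It compares the two spellings for a reduced flag set.

```python
def build_flags(use_native_arch, is_mac, parenthesized):
    """Build a reduced compiler flag list in two spellings (hypothetical helper)."""
    base = ["-O3", "-fopenmp"]
    if parenthesized:
        # Each optional group is its own parenthesized conditional.
        return (
            base
            + (["-march=native"] if use_native_arch else [])
            + (["-mmacosx-version-min=10.15"] if is_mac else [])
        )
    # Without parentheses this parses as:
    #   (base + [...]) if use_native_arch else (([] + [...]) if is_mac else [])
    return (
        base + ["-march=native"]
        if use_native_arch
        else [] + ["-mmacosx-version-min=10.15"] if is_mac else []
    )


if __name__ == "__main__":
    print(build_flags(False, True, parenthesized=False))
    print(build_flags(False, True, parenthesized=True))
```

In the unparenthesized form the base flags survive only when `use_native_arch` is true. Otherwise the result is either the single macOS flag or an empty list.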

uv_unwrapper/uv_unwrapper/__init__.py ADDED Viewed

	@@ -0,0 +1,6 @@

+import torch  # noqa: F401
+from . import _C  # noqa: F401
+from .unwrap import Unwrapper
+__all__ = ["Unwrapper"]

uv_unwrapper/uv_unwrapper/csrc/bvh.cpp ADDED Viewed

	@@ -0,0 +1,381 @@

+#include "bvh.h"
+#include "common.h"
+#include <cstring>
+#include <iostream>
+#include <queue>
+#include <tuple>
+#include <utility>
+namespace UVUnwrapper {
+BVH::BVH(Triangle *tri, int *actual_idx, const size_t &num_indices) {
+  // Copy tri to triangle
+  triangle = new Triangle[num_indices];
+  memcpy(triangle, tri, num_indices * sizeof(Triangle));
+  // Copy actual_idx to actualIdx
+  actualIdx = new int[num_indices];
+  memcpy(actualIdx, actual_idx, num_indices * sizeof(int));
+  triIdx = new int[num_indices];
+  triCount = num_indices;
+  bvhNode = new BVHNode[triCount * 2 + 64];
+  nodesUsed = 2;
+  memset(bvhNode, 0, triCount * 2 * sizeof(BVHNode));
+  // populate triangle index array
+  for (int i = 0; i < triCount; i++)
+    triIdx[i] = i;
+  BVHNode &root = bvhNode[0];
+  root.start = 0, root.end = triCount;
+  AABB centroidBounds;
+  UpdateNodeBounds(0, centroidBounds);
+  // subdivide recursively
+  Subdivide(0, nodesUsed, centroidBounds);
+}
+BVH::BVH(const BVH &other) // copy constructor: rebuild from the source data
+    : BVH(other.triangle, other.actualIdx, other.triCount) {}
+BVH::BVH(BVH &&other) noexcept // move constructor
+    : triIdx(std::exchange(other.triIdx, nullptr)),
+      actualIdx(std::exchange(other.actualIdx, nullptr)),
+      triCount(std::exchange(other.triCount, 0u)),
+      nodesUsed(std::exchange(other.nodesUsed, 0u)),
+      bvhNode(std::exchange(other.bvhNode, nullptr)),
+      triangle(std::exchange(other.triangle, nullptr)) {}
+BVH &BVH::operator=(const BVH &other) // copy assignment
+{
+  return *this = BVH(other);
+}
+BVH &BVH::operator=(BVH &&other) noexcept // move assignment
+{
+  std::swap(triIdx, other.triIdx);
+  std::swap(actualIdx, other.actualIdx);
+  std::swap(triangle, other.triangle);
+  std::swap(bvhNode, other.bvhNode);
+  std::swap(triCount, other.triCount);
+  std::swap(nodesUsed, other.nodesUsed);
+  return *this;
+}
+BVH::~BVH() {
+  if (triIdx)
+    delete[] triIdx;
+  if (triangle)
+    delete[] triangle;
+  if (actualIdx)
+    delete[] actualIdx;
+  if (bvhNode)
+    delete[] bvhNode;
+}
+void BVH::UpdateNodeBounds(unsigned int nodeIdx, AABB &centroidBounds) {
+  BVHNode &node = bvhNode[nodeIdx];
+#ifndef __ARM_ARCH_ISA_A64
+#ifndef _MSC_VER
+  if (__builtin_cpu_supports("sse"))
+#elif (defined(_M_AMD64) || defined(_M_X64))
+  // SSE supported on Windows
+  if constexpr (true)
+#endif
+  {
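+    // Seed the maxima with -FLT_MAX: FLT_MIN is the smallest positive float and
+    // would clamp the bounds of geometry with negative coordinates.
+    __m128 min4 = _mm_set_ps1(FLT_MAX), max4 = _mm_set_ps1(-FLT_MAX);
+    __m128 cmin4 = _mm_set_ps1(FLT_MAX), cmax4 = _mm_set_ps1(-FLT_MAX);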
+    for (int i = node.start; i < node.end; i += 2) {
+      Triangle &leafTri1 = triangle[triIdx[i]];
+      __m128 v0, v1, v2, centroid;
+      if (i + 1 < node.end) {
+        const Triangle leafTri2 = triangle[triIdx[i + 1]];
+        v0 = _mm_set_ps(leafTri1.v0.x, leafTri1.v0.y, leafTri2.v0.x,
+                        leafTri2.v0.y);
+        v1 = _mm_set_ps(leafTri1.v1.x, leafTri1.v1.y, leafTri2.v1.x,
+                        leafTri2.v1.y);
+        v2 = _mm_set_ps(leafTri1.v2.x, leafTri1.v2.y, leafTri2.v2.x,
+                        leafTri2.v2.y);
+        centroid = _mm_set_ps(leafTri1.centroid.x, leafTri1.centroid.y,
+                              leafTri2.centroid.x, leafTri2.centroid.y);
+      } else {
+        // Otherwise do some duplicated work
+        v0 = _mm_set_ps(leafTri1.v0.x, leafTri1.v0.y, leafTri1.v0.x,
+                        leafTri1.v0.y);
+        v1 = _mm_set_ps(leafTri1.v1.x, leafTri1.v1.y, leafTri1.v1.x,
+                        leafTri1.v1.y);
+        v2 = _mm_set_ps(leafTri1.v2.x, leafTri1.v2.y, leafTri1.v2.x,
+                        leafTri1.v2.y);
+        centroid = _mm_set_ps(leafTri1.centroid.x, leafTri1.centroid.y,
+                              leafTri1.centroid.x, leafTri1.centroid.y);
+      }
+      min4 = _mm_min_ps(min4, v0);
+      max4 = _mm_max_ps(max4, v0);
+      min4 = _mm_min_ps(min4, v1);
+      max4 = _mm_max_ps(max4, v1);
+      min4 = _mm_min_ps(min4, v2);
+      max4 = _mm_max_ps(max4, v2);
+      cmin4 = _mm_min_ps(cmin4, centroid);
+      cmax4 = _mm_max_ps(cmax4, centroid);
+    }
+    // _mm_store_ps requires 16-byte aligned destinations
+    alignas(16) float min_values[4], max_values[4], cmin_values[4],
+        cmax_values[4];
+    _mm_store_ps(min_values, min4);
+    _mm_store_ps(max_values, max4);
+    _mm_store_ps(cmin_values, cmin4);
+    _mm_store_ps(cmax_values, cmax4);
+    node.bbox.min.x = std::min(min_values[3], min_values[1]);
+    node.bbox.min.y = std::min(min_values[2], min_values[0]);
+    node.bbox.max.x = std::max(max_values[3], max_values[1]);
+    node.bbox.max.y = std::max(max_values[2], max_values[0]);
+    centroidBounds.min.x = std::min(cmin_values[3], cmin_values[1]);
+    centroidBounds.min.y = std::min(cmin_values[2], cmin_values[0]);
+    centroidBounds.max.x = std::max(cmax_values[3], cmax_values[1]);
+    centroidBounds.max.y = std::max(cmax_values[2], cmax_values[0]);
+  }
+#else
+  if constexpr (false) {
+  }
+#endif
+  else {
+    node.bbox.invalidate();
+    centroidBounds.invalidate();
+    // Calculate the bounding box for the node
+    for (int i = node.start; i < node.end; ++i) {
+      const Triangle &tri = triangle[triIdx[i]];
+      node.bbox.grow(tri.v0);
+      node.bbox.grow(tri.v1);
+      node.bbox.grow(tri.v2);
+      centroidBounds.grow(tri.centroid);
+    }
+  }
+}
+void BVH::Subdivide(unsigned int root_idx, unsigned int &nodePtr,
+                    AABB &rootCentroidBounds) {
+  // Create a queue for the nodes to be subdivided
+  std::queue<std::tuple<unsigned int, AABB>> nodeQueue;
+  nodeQueue.push(std::make_tuple(root_idx, rootCentroidBounds));
+  while (!nodeQueue.empty()) {
+    // Get the next node to process from the queue
+    auto [node_idx, centroidBounds] = nodeQueue.front();
+    nodeQueue.pop();
+    BVHNode &node = bvhNode[node_idx];
+    // Find the cheapest SAH split; keep the node as a leaf if splitting it
+    // would not be cheaper than leaving it whole
+    int axis = 0, splitPos = 0;
+    float cost = FindBestSplitPlane(node, axis, splitPos, centroidBounds);
+    if (cost >= node.calculate_node_cost()) {
+      node.left = node.right = -1;
+      continue; // Move on to the next node in the queue
+    }
+    int i = node.start;
+    int j = node.end - 1;
+    float scale = BINS / (centroidBounds.max[axis] - centroidBounds.min[axis]);
+    while (i <= j) {
+      int binIdx =
+          std::min(BINS - 1, (int)((triangle[triIdx[i]].centroid[axis] -
+                                    centroidBounds.min[axis]) *
+                                   scale));
+      if (binIdx < splitPos)
+        i++;
+      else
+        std::swap(triIdx[i], triIdx[j--]);
+    }
+    int leftCount = i - node.start;
+    if (leftCount == 0 || leftCount == (int)node.num_triangles()) {
+      node.left = node.right = -1;
+      continue; // Move on to the next node in the queue
+    }
+    int mid = i;
+    // Create child nodes
+    int leftChildIdx = nodePtr++;
+    int rightChildIdx = nodePtr++;
+    bvhNode[leftChildIdx].start = node.start;
+    bvhNode[leftChildIdx].end = mid;
+    bvhNode[rightChildIdx].start = mid;
+    bvhNode[rightChildIdx].end = node.end;
+    node.left = leftChildIdx;
+    node.right = rightChildIdx;
+    // Update the bounds for the child nodes and push them onto the queue
+    UpdateNodeBounds(leftChildIdx, centroidBounds);
+    nodeQueue.push(std::make_tuple(leftChildIdx, centroidBounds));
+    UpdateNodeBounds(rightChildIdx, centroidBounds);
+    nodeQueue.push(std::make_tuple(rightChildIdx, centroidBounds));
+  }
+}
+float BVH::FindBestSplitPlane(BVHNode &node, int &best_axis, int &best_pos,
+                              AABB &centroidBounds) {
+  float best_cost = FLT_MAX;
+  for (int axis = 0; axis < 2; ++axis) // We use 2 as we have only x and y
+  {
+    float boundsMin = centroidBounds.min[axis];
+    float boundsMax = centroidBounds.max[axis];
+    // Or floating point precision
+    if ((boundsMin == boundsMax) || (boundsMax - boundsMin < 1e-8f)) {
+      continue;
+    }
+    // populate the bins
+    float scale = BINS / (boundsMax - boundsMin);
+    float leftCountArea[BINS - 1], rightCountArea[BINS - 1];
+    int leftSum = 0, rightSum = 0;
+#ifndef __ARM_ARCH_ISA_A64
+#ifndef _MSC_VER
+    if (__builtin_cpu_supports("sse"))
+#elif (defined(_M_AMD64) || defined(_M_X64))
+    // SSE supported on Windows
+    if constexpr (true)
+#endif
+    {
+      __m128 min4[BINS], max4[BINS];
+      unsigned int count[BINS];
+      for (unsigned int i = 0; i < BINS; i++)
+        min4[i] = _mm_set_ps1(FLT_MAX), max4[i] = _mm_set_ps1(-FLT_MAX),
+        count[i] = 0;
+      for (int i = node.start; i < node.end; i++) {
+        Triangle &tri = triangle[triIdx[i]];
+        int binIdx =
+            std::min(BINS - 1, (int)((tri.centroid[axis] - boundsMin) * scale));
+        count[binIdx]++;
+        __m128 v0 = _mm_set_ps(tri.v0.x, tri.v0.y, 0.0f, 0.0f);
+        __m128 v1 = _mm_set_ps(tri.v1.x, tri.v1.y, 0.0f, 0.0f);
+        __m128 v2 = _mm_set_ps(tri.v2.x, tri.v2.y, 0.0f, 0.0f);
+        min4[binIdx] = _mm_min_ps(min4[binIdx], v0);
+        max4[binIdx] = _mm_max_ps(max4[binIdx], v0);
+        min4[binIdx] = _mm_min_ps(min4[binIdx], v1);
+        max4[binIdx] = _mm_max_ps(max4[binIdx], v1);
+        min4[binIdx] = _mm_min_ps(min4[binIdx], v2);
+        max4[binIdx] = _mm_max_ps(max4[binIdx], v2);
+      }
+      // gather data for the 7 planes between the 8 bins
+      __m128 leftMin4 = _mm_set_ps1(FLT_MAX), rightMin4 = leftMin4;
+      __m128 leftMax4 = _mm_set_ps1(-FLT_MAX), rightMax4 = leftMax4;
+      for (int i = 0; i < BINS - 1; i++) {
+        leftSum += count[i];
+        rightSum += count[BINS - 1 - i];
+        leftMin4 = _mm_min_ps(leftMin4, min4[i]);
+        // Use the same bin as rightSum so count and bounds describe one set
+        rightMin4 = _mm_min_ps(rightMin4, min4[BINS - 1 - i]);
+        leftMax4 = _mm_max_ps(leftMax4, max4[i]);
+        rightMax4 = _mm_max_ps(rightMax4, max4[BINS - 1 - i]);
+        // _mm_store_ps requires 16-byte aligned destinations
+        alignas(16) float le[4], re[4];
+        _mm_store_ps(le, _mm_sub_ps(leftMax4, leftMin4));
+        _mm_store_ps(re, _mm_sub_ps(rightMax4, rightMin4));
+        // SSE order goes from back to front
+        leftCountArea[i] = leftSum * (le[2] * le[3]); // 2D area calculation
+        rightCountArea[BINS - 2 - i] =
+            rightSum * (re[2] * re[3]); // 2D area calculation
+      }
+    }
+#else
+    if constexpr (false) {
+    }
+#endif
+    else {
+      struct Bin {
+        AABB bounds;
+        int triCount = 0;
+      } bin[BINS];
+      for (int i = node.start; i < node.end; i++) {
+        Triangle &tri = triangle[triIdx[i]];
+        int binIdx =
+            std::min(BINS - 1, (int)((tri.centroid[axis] - boundsMin) * scale));
+        bin[binIdx].triCount++;
+        bin[binIdx].bounds.grow(tri.v0);
+        bin[binIdx].bounds.grow(tri.v1);
+        bin[binIdx].bounds.grow(tri.v2);
+      }
+      // gather data for the 7 planes between the 8 bins
+      AABB leftBox, rightBox;
+      for (int i = 0; i < BINS - 1; i++) {
+        leftSum += bin[i].triCount;
+        leftBox.grow(bin[i].bounds);
+        leftCountArea[i] = leftSum * leftBox.area();
+        rightSum += bin[BINS - 1 - i].triCount;
+        rightBox.grow(bin[BINS - 1 - i].bounds);
+        rightCountArea[BINS - 2 - i] = rightSum * rightBox.area();
+      }
+    }
+    // calculate SAH cost for the 7 planes
+    scale = (boundsMax - boundsMin) / BINS;
+    for (int i = 0; i < BINS - 1; i++) {
+      const float planeCost = leftCountArea[i] + rightCountArea[i];
+      if (planeCost < best_cost)
+        best_axis = axis, best_pos = i + 1, best_cost = planeCost;
+    }
+  }
+  return best_cost;
+}
+std::vector<int> BVH::Intersect(Triangle &tri_intersect) {
+  /**
+   * @brief Intersect a triangle with the BVH
+   *
+   * @param tri_intersect the triangle to intersect
+   *
+   * @return the actual indices of all overlapping triangles; empty if there
+   * is no intersection
+   */
+  const int max_stack_size = 64;
+  int node_stack[max_stack_size];
+  int stack_size = 0;
+  std::vector<int> intersected_triangles;
+  node_stack[stack_size++] = 0; // Start with the root node (index 0)
+  while (stack_size > 0) {
+    int node_idx = node_stack[--stack_size];
+    const BVHNode &node = bvhNode[node_idx];
+    if (node.is_leaf()) {
+      for (int i = node.start; i < node.end; ++i) {
+        const Triangle &tri = triangle[triIdx[i]];
+        // Check that the triangle is not the same as the intersected triangle
+        if (tri == tri_intersect)
+          continue;
+        if (tri_intersect.overlaps(tri)) {
+          intersected_triangles.push_back(actualIdx[triIdx[i]]);
+        }
+      }
+    } else {
+      // Check right child first
+      if (bvhNode[node.right].bbox.overlaps(tri_intersect)) {
+        if (stack_size < max_stack_size) {
+          node_stack[stack_size++] = node.right;
+        } else {
+          throw std::runtime_error("Node stack overflow");
+        }
+      }
+      // Check left child
+      if (bvhNode[node.left].bbox.overlaps(tri_intersect)) {
+        if (stack_size < max_stack_size) {
+          node_stack[stack_size++] = node.left;
+        } else {
+          throw std::runtime_error("Node stack overflow");
+        }
+      }
+    }
+  }
+  return intersected_triangles; // Return all intersected triangle indices
+}
+} // namespace UVUnwrapper
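`FindBestSplitPlane` above evaluates a binned surface area heuristic (SAH): centroids are sorted into `BINS` buckets along one axis, and each plane between buckets is scored as `left_count * left_area + right_count * right_area`. The following is a minimal Python sketch of that idea for a single axis, loosely following the scalar fallback path. The helper names (`grow`, `merge`, `area`, `best_split`) are hypothetical. Unlike the C++ code, the sketch skips planes with an empty side instead of relying on a non-finite cost.

```python
import math
from functools import reduce

BINS = 8  # same bin count as the BINS macro in bvh.h


def grow(box, point):
    """Return box (min_x, min_y, max_x, max_y) enlarged to contain point."""
    x, y = point
    if box is None:
        return (x, y, x, y)
    return (min(box[0], x), min(box[1], y), max(box[2], x), max(box[3], y))


def merge(a, b):
    """Union of two boxes, where None stands for an empty box."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def best_split(triangles, axis):
    """Return (cost, split_bin) of the cheapest binned SAH plane on one axis."""
    centroids = [sum(v[axis] for v in tri) / 3.0 for tri in triangles]
    lo, hi = min(centroids), max(centroids)
    if hi - lo < 1e-8:
        return math.inf, None  # degenerate extent, nothing to split
    scale = BINS / (hi - lo)
    counts = [0] * BINS
    boxes = [None] * BINS
    for tri, c in zip(triangles, centroids):
        idx = min(BINS - 1, int((c - lo) * scale))
        counts[idx] += 1
        for vertex in tri:
            boxes[idx] = grow(boxes[idx], vertex)
    best_cost, best_bin = math.inf, None
    for plane in range(1, BINS):  # plane sits between bin plane-1 and bin plane
        left_n, right_n = sum(counts[:plane]), sum(counts[plane:])
        if left_n == 0 or right_n == 0:
            continue  # one side is empty, so this plane splits nothing
        left_box = reduce(merge, boxes[:plane], None)
        right_box = reduce(merge, boxes[plane:], None)
        cost = left_n * area(left_box) + right_n * area(right_box)
        if cost < best_cost:
            best_cost, best_bin = cost, plane
    return best_cost, best_bin


def make_tri(offset):
    """Unit right triangle shifted along x (test data for the sketch)."""
    return [(offset, 0.0), (offset + 1.0, 0.0), (offset, 1.0)]


tris = [make_tri(o) for o in (0.0, 0.5, 10.0, 10.5)]
split_cost, split_bin = best_split(tris, axis=0)
leaf_cost = len(tris) * area(reduce(merge, [grow(None, v) for t in tris for v in t], None))
```

For two well-separated clusters like `tris`, the split cost is far below the cost of keeping all triangles in one leaf. That comparison is what `Subdivide` uses to decide whether to split a node.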

uv_unwrapper/uv_unwrapper/csrc/bvh.h ADDED Viewed

	@@ -0,0 +1,118 @@

+#pragma once
+#include <cfloat>
+#include <cmath>
+#ifndef __ARM_ARCH_ISA_A64
+#include <immintrin.h>
+#endif
+#include <limits>
+#include <vector>
+#include "common.h"
+#include "intersect.h"
+/**
+ * Based on https://github.com/jbikker/bvh_article released under the unlicense.
+ */
+// bin count for binned BVH building
+#define BINS 8
+namespace UVUnwrapper {
+// minimalist triangle struct
+struct alignas(32) Triangle {
+  uv_float2 v0;
+  uv_float2 v1;
+  uv_float2 v2;
+  uv_float2 centroid;
+  bool overlaps(const Triangle &other) {
+    // return tri_tri_overlap_test_2d(v0, v1, v2, other.v0, other.v1, other.v2);
+    return triangle_triangle_intersection(v0, v1, v2, other.v0, other.v1,
+                                          other.v2);
+  }
+  bool operator==(const Triangle &rhs) const {
+    return v0 == rhs.v0 && v1 == rhs.v1 && v2 == rhs.v2;
+  }
+};
+// minimalist AABB struct with grow functionality
+struct alignas(16) AABB {
+  // Init bounding boxes with max/min. max starts at -FLT_MAX rather than
+  // FLT_MIN, which is the smallest positive float.
+  uv_float2 min = {FLT_MAX, FLT_MAX};
+  uv_float2 max = {-FLT_MAX, -FLT_MAX};
+  void grow(const uv_float2 &p) {
+    min.x = std::min(min.x, p.x);
+    min.y = std::min(min.y, p.y);
+    max.x = std::max(max.x, p.x);
+    max.y = std::max(max.y, p.y);
+  }
+  void grow(const AABB &b) {
+    if (b.min.x != FLT_MAX) {
+      grow(b.min);
+      grow(b.max);
+    }
+  }
+  bool overlaps(const Triangle &tri) {
+    return triangle_aabb_intersection(min, max, tri.v0, tri.v1, tri.v2);
+  }
+  float area() const {
+    uv_float2 extent = {max.x - min.x, max.y - min.y};
+    return extent.x * extent.y;
+  }
+  void invalidate() {
+    min = {FLT_MAX, FLT_MAX};
+    max = {-FLT_MAX, -FLT_MAX};
+  }
+};
+// 32-byte BVH node struct
+struct alignas(32) BVHNode {
+  AABB bbox;              // 16
+  int start = 0, end = 0; // 8
+  int left, right;
+  int num_triangles() const { return end - start; }
+  bool is_leaf() const { return left == -1 && right == -1; }
+  float calculate_node_cost() {
+    float area = bbox.area();
+    return num_triangles() * area;
+  }
+};
+class BVH {
+public:
+  BVH() = default;
+  BVH(BVH &&other) noexcept;
+  BVH(const BVH &other);
+  BVH &operator=(const BVH &other);
+  BVH &operator=(BVH &&other) noexcept;
+  BVH(Triangle *tri, int *actual_idx, const size_t &num_indices);
+  ~BVH();
+  std::vector<int> Intersect(Triangle &triangle);
+private:
+  void Subdivide(unsigned int node_idx, unsigned int &nodePtr,
+                 AABB &centroidBounds);
+  void UpdateNodeBounds(unsigned int nodeIdx, AABB &centroidBounds);
+  float FindBestSplitPlane(BVHNode &node, int &axis, int &splitPos,
+                           AABB &centroidBounds);
+public:
+  int *triIdx = nullptr;
+  int *actualIdx = nullptr;
+  unsigned int triCount;
+  unsigned int nodesUsed;
+  BVHNode *bvhNode = nullptr;
+  Triangle *triangle = nullptr;
+};
+} // namespace UVUnwrapper
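A note on seeding a running maximum for a bounding box: in C and C++, `FLT_MIN` is the smallest positive normalized float, not the most negative value. The Python sketch below illustrates the pitfall with the double-precision analogues `sys.float_info.min` and `-sys.float_info.max`. The helper `running_max` and the sample coordinates are made up for the example.

```python
import sys


def running_max(values, seed):
    """Fold max() over values, starting from seed (hypothetical helper)."""
    result = seed
    for v in values:
        result = max(result, v)
    return result


coords = [-3.0, -1.5, -0.25]  # all-negative coordinates, e.g. before packing

smallest_positive = sys.float_info.min  # analogue of FLT_MIN
most_negative = -sys.float_info.max     # analogue of -FLT_MAX

wrong = running_max(coords, smallest_positive)  # stays at the tiny positive seed
right = running_max(coords, most_negative)      # the true maximum of coords
```

With an all-negative input, the positive seed is never replaced, so the reported maximum lies outside the data. The most negative finite value is always replaced by the first real coordinate.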

uv_unwrapper/uv_unwrapper/csrc/common.h ADDED Viewed

	@@ -0,0 +1,493 @@

+#pragma once
+#include <algorithm> // std::min, std::max
+#include <array>
+#include <cmath>
+#include <cstddef> // size_t
+#include <iostream>
+#include <stdexcept>
+const float EPSILON = 1e-7f;
+// Structure to represent a 2D point or vector
+union alignas(8) uv_float2 {
+  struct {
+    float x, y;
+  };
+  float data[2];
+  float &operator[](size_t idx) {
+    if (idx > 1)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const float &operator[](size_t idx) const {
+    if (idx > 1)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const uv_float2 &rhs) const {
+    return x == rhs.x && y == rhs.y;
+  }
+};
+// Do not align as this is specifically tweaked for BVHNode
+union uv_float3 {
+  struct {
+    float x, y, z;
+  };
+  float data[3];
+  float &operator[](size_t idx) {
+    if (idx > 2) // data has 3 elements, valid indices are 0..2
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const float &operator[](size_t idx) const {
+    if (idx > 2) // data has 3 elements, valid indices are 0..2
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const uv_float3 &rhs) const {
+    return x == rhs.x && y == rhs.y && z == rhs.z;
+  }
+};
+union alignas(16) uv_float4 {
+  struct {
+    float x, y, z, w;
+  };
+  float data[4];
+  float &operator[](size_t idx) {
+    if (idx > 3)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const float &operator[](size_t idx) const {
+    if (idx > 3)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const uv_float4 &rhs) const {
+    return x == rhs.x && y == rhs.y && z == rhs.z && w == rhs.w;
+  }
+};
+union alignas(8) uv_int2 {
+  struct {
+    int x, y;
+  };
+  int data[2];
+  int &operator[](size_t idx) {
+    if (idx > 1)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const int &operator[](size_t idx) const {
+    if (idx > 1)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const uv_int2 &rhs) const { return x == rhs.x && y == rhs.y; }
+};
+union alignas(4) uv_int3 {
+  struct {
+    int x, y, z;
+  };
+  int data[3];
+  int &operator[](size_t idx) {
+    if (idx > 2)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const int &operator[](size_t idx) const {
+    if (idx > 2)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const uv_int3 &rhs) const {
+    return x == rhs.x && y == rhs.y && z == rhs.z;
+  }
+};
+union alignas(16) uv_int4 {
+  struct {
+    int x, y, z, w;
+  };
+  int data[4];
+  int &operator[](size_t idx) {
+    if (idx > 3)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  const int &operator[](size_t idx) const {
+    if (idx > 3)
+      throw std::runtime_error("bad index");
+    return data[idx];
+  }
+  bool operator==(const uv_int4 &rhs) const {
+    return x == rhs.x && y == rhs.y && z == rhs.z && w == rhs.w;
+  }
+};
+inline float calc_mean(float a, float b, float c) { return (a + b + c) / 3; }
+// Create a triangle centroid
+inline uv_float2 triangle_centroid(const uv_float2 &v0, const uv_float2 &v1,
+                                   const uv_float2 &v2) {
+  return {calc_mean(v0.x, v1.x, v2.x), calc_mean(v0.y, v1.y, v2.y)};
+}
+inline uv_float3 triangle_centroid(const uv_float3 &v0, const uv_float3 &v1,
+                                   const uv_float3 &v2) {
+  return {calc_mean(v0.x, v1.x, v2.x), calc_mean(v0.y, v1.y, v2.y),
+          calc_mean(v0.z, v1.z, v2.z)};
+}
+// Helper functions for vector math
+inline uv_float2 operator-(const uv_float2 &a, const uv_float2 &b) {
+  return {a.x - b.x, a.y - b.y};
+}
+inline uv_float3 operator-(const uv_float3 &a, const uv_float3 &b) {
+  return {a.x - b.x, a.y - b.y, a.z - b.z};
+}
+inline uv_float2 operator+(const uv_float2 &a, const uv_float2 &b) {
+  return {a.x + b.x, a.y + b.y};
+}
+inline uv_float3 operator+(const uv_float3 &a, const uv_float3 &b) {
+  return {a.x + b.x, a.y + b.y, a.z + b.z};
+}
+inline uv_float2 operator*(const uv_float2 &a, float scalar) {
+  return {a.x * scalar, a.y * scalar};
+}
+inline uv_float3 operator*(const uv_float3 &a, float scalar) {
+  return {a.x * scalar, a.y * scalar, a.z * scalar};
+}
+inline float dot(const uv_float2 &a, const uv_float2 &b) {
+  return a.x * b.x + a.y * b.y;
+}
+inline float dot(const uv_float3 &a, const uv_float3 &b) {
+  return a.x * b.x + a.y * b.y + a.z * b.z;
+}
+inline float cross(const uv_float2 &a, const uv_float2 &b) {
+  return a.x * b.y - a.y * b.x;
+}
+inline uv_float3 cross(const uv_float3 &a, const uv_float3 &b) {
+  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
+}
+inline uv_float2 abs_vec(const uv_float2 &v) {
+  return {std::abs(v.x), std::abs(v.y)};
+}
+inline uv_float2 min_vec(const uv_float2 &a, const uv_float2 &b) {
+  return {std::min(a.x, b.x), std::min(a.y, b.y)};
+}
+inline uv_float2 max_vec(const uv_float2 &a, const uv_float2 &b) {
+  return {std::max(a.x, b.x), std::max(a.y, b.y)};
+}
+inline float distance_to(const uv_float2 &a, const uv_float2 &b) {
+  return std::sqrt(std::pow(a.x - b.x, 2) + std::pow(a.y - b.y, 2));
+}
+inline float distance_to(const uv_float3 &a, const uv_float3 &b) {
+  return std::sqrt(std::pow(a.x - b.x, 2) + std::pow(a.y - b.y, 2) +
+                   std::pow(a.z - b.z, 2));
+}
+inline uv_float2 normalize(const uv_float2 &v) {
+  float len = std::sqrt(v.x * v.x + v.y * v.y);
+  return {v.x / len, v.y / len};
+}
+inline uv_float3 normalize(const uv_float3 &v) {
+  float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
+  return {v.x / len, v.y / len, v.z / len};
+}
+inline float magnitude(const uv_float3 &v) {
+  return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
+}
+struct Matrix4 {
+  std::array<std::array<float, 4>, 4> m;
+  Matrix4() {
+    for (auto &row : m) {
+      row.fill(0.0f);
+    }
+    m[3][3] = 1.0f; // All zeros except the homogeneous element m[3][3]
+  }
+  void set(float m00, float m01, float m02, float m03, float m10, float m11,
+           float m12, float m13, float m20, float m21, float m22, float m23,
+           float m30, float m31, float m32, float m33) {
+    m[0][0] = m00;
+    m[0][1] = m01;
+    m[0][2] = m02;
+    m[0][3] = m03;
+    m[1][0] = m10;
+    m[1][1] = m11;
+    m[1][2] = m12;
+    m[1][3] = m13;
+    m[2][0] = m20;
+    m[2][1] = m21;
+    m[2][2] = m22;
+    m[2][3] = m23;
+    m[3][0] = m30;
+    m[3][1] = m31;
+    m[3][2] = m32;
+    m[3][3] = m33;
+  }
+  float determinant() const {
+    return m[0][3] * m[1][2] * m[2][1] * m[3][0] -
+           m[0][2] * m[1][3] * m[2][1] * m[3][0] -
+           m[0][3] * m[1][1] * m[2][2] * m[3][0] +
+           m[0][1] * m[1][3] * m[2][2] * m[3][0] +
+           m[0][2] * m[1][1] * m[2][3] * m[3][0] -
+           m[0][1] * m[1][2] * m[2][3] * m[3][0] -
+           m[0][3] * m[1][2] * m[2][0] * m[3][1] +
+           m[0][2] * m[1][3] * m[2][0] * m[3][1] +
+           m[0][3] * m[1][0] * m[2][2] * m[3][1] -
+           m[0][0] * m[1][3] * m[2][2] * m[3][1] -
+           m[0][2] * m[1][0] * m[2][3] * m[3][1] +
+           m[0][0] * m[1][2] * m[2][3] * m[3][1] +
+           m[0][3] * m[1][1] * m[2][0] * m[3][2] -
+           m[0][1] * m[1][3] * m[2][0] * m[3][2] -
+           m[0][3] * m[1][0] * m[2][1] * m[3][2] +
+           m[0][0] * m[1][3] * m[2][1] * m[3][2] +
+           m[0][1] * m[1][0] * m[2][3] * m[3][2] -
+           m[0][0] * m[1][1] * m[2][3] * m[3][2] -
+           m[0][2] * m[1][1] * m[2][0] * m[3][3] +
+           m[0][1] * m[1][2] * m[2][0] * m[3][3] +
+           m[0][2] * m[1][0] * m[2][1] * m[3][3] -
+           m[0][0] * m[1][2] * m[2][1] * m[3][3] -
+           m[0][1] * m[1][0] * m[2][2] * m[3][3] +
+           m[0][0] * m[1][1] * m[2][2] * m[3][3];
+  }
+  Matrix4 operator*(const Matrix4 &other) const {
+    Matrix4 result;
+    for (int row = 0; row < 4; ++row) {
+      for (int col = 0; col < 4; ++col) {
+        result.m[row][col] =
+            m[row][0] * other.m[0][col] + m[row][1] * other.m[1][col] +
+            m[row][2] * other.m[2][col] + m[row][3] * other.m[3][col];
+      }
+    }
+    return result;
+  }
+  Matrix4 operator*(float scalar) const {
+    Matrix4 result = *this;
+    for (auto &row : result.m) {
+      for (auto &element : row) {
+        element *= scalar;
+      }
+    }
+    return result;
+  }
+  Matrix4 operator+(const Matrix4 &other) const {
+    Matrix4 result;
+    for (int i = 0; i < 4; ++i) {
+      for (int j = 0; j < 4; ++j) {
+        result.m[i][j] = m[i][j] + other.m[i][j];
+      }
+    }
+    return result;
+  }
+  Matrix4 operator-(const Matrix4 &other) const {
+    Matrix4 result;
+    for (int i = 0; i < 4; ++i) {
+      for (int j = 0; j < 4; ++j) {
+        result.m[i][j] = m[i][j] - other.m[i][j];
+      }
+    }
+    return result;
+  }
+  float trace() const { return m[0][0] + m[1][1] + m[2][2] + m[3][3]; }
+  Matrix4 identity() const {
+    Matrix4 identity;
+    identity.set(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1);
+    return identity;
+  }
+  Matrix4 power(int exp) const {
+    if (exp == 0)
+      return identity();
+    if (exp == 1)
+      return *this;
+    Matrix4 result = *this;
+    for (int i = 1; i < exp; ++i) {
+      result = result * (*this);
+    }
+    return result;
+  }
+  void print() {
+    // Print all entries in 4 rows with 4 columns
+    for (int i = 0; i < 4; ++i) {
+      for (int j = 0; j < 4; ++j) {
+        std::cout << m[i][j] << " ";
+      }
+      std::cout << std::endl;
+    }
+  }
+  bool invert() {
+    double inv[16], det;
+    double mArr[16];
+    // Convert the matrix to a 1D array for easier manipulation
+    for (int i = 0; i < 4; ++i) {
+      for (int j = 0; j < 4; ++j) {
+        mArr[i * 4 + j] = static_cast<double>(m[i][j]);
+      }
+    }
+    inv[0] = mArr[5] * mArr[10] * mArr[15] - mArr[5] * mArr[11] * mArr[14] -
+             mArr[9] * mArr[6] * mArr[15] + mArr[9] * mArr[7] * mArr[14] +
+             mArr[13] * mArr[6] * mArr[11] - mArr[13] * mArr[7] * mArr[10];
+    inv[4] = -mArr[4] * mArr[10] * mArr[15] + mArr[4] * mArr[11] * mArr[14] +
+             mArr[8] * mArr[6] * mArr[15] - mArr[8] * mArr[7] * mArr[14] -
+             mArr[12] * mArr[6] * mArr[11] + mArr[12] * mArr[7] * mArr[10];
+    inv[8] = mArr[4] * mArr[9] * mArr[15] - mArr[4] * mArr[11] * mArr[13] -
+             mArr[8] * mArr[5] * mArr[15] + mArr[8] * mArr[7] * mArr[13] +
+             mArr[12] * mArr[5] * mArr[11] - mArr[12] * mArr[7] * mArr[9];
+    inv[12] = -mArr[4] * mArr[9] * mArr[14] + mArr[4] * mArr[10] * mArr[13] +
+              mArr[8] * mArr[5] * mArr[14] - mArr[8] * mArr[6] * mArr[13] -
+              mArr[12] * mArr[5] * mArr[10] + mArr[12] * mArr[6] * mArr[9];
+    inv[1] = -mArr[1] * mArr[10] * mArr[15] + mArr[1] * mArr[11] * mArr[14] +
+             mArr[9] * mArr[2] * mArr[15] - mArr[9] * mArr[3] * mArr[14] -
+             mArr[13] * mArr[2] * mArr[11] + mArr[13] * mArr[3] * mArr[10];
+    inv[5] = mArr[0] * mArr[10] * mArr[15] - mArr[0] * mArr[11] * mArr[14] -
+             mArr[8] * mArr[2] * mArr[15] + mArr[8] * mArr[3] * mArr[14] +
+             mArr[12] * mArr[2] * mArr[11] - mArr[12] * mArr[3] * mArr[10];
+    inv[9] = -mArr[0] * mArr[9] * mArr[15] + mArr[0] * mArr[11] * mArr[13] +
+             mArr[8] * mArr[1] * mArr[15] - mArr[8] * mArr[3] * mArr[13] -
+             mArr[12] * mArr[1] * mArr[11] + mArr[12] * mArr[3] * mArr[9];
+    inv[13] = mArr[0] * mArr[9] * mArr[14] - mArr[0] * mArr[10] * mArr[13] -
+              mArr[8] * mArr[1] * mArr[14] + mArr[8] * mArr[2] * mArr[13] +
+              mArr[12] * mArr[1] * mArr[10] - mArr[12] * mArr[2] * mArr[9];
+    inv[2] = mArr[1] * mArr[6] * mArr[15] - mArr[1] * mArr[7] * mArr[14] -
+             mArr[5] * mArr[2] * mArr[15] + mArr[5] * mArr[3] * mArr[14] +
+             mArr[13] * mArr[2] * mArr[7] - mArr[13] * mArr[3] * mArr[6];
+    inv[6] = -mArr[0] * mArr[6] * mArr[15] + mArr[0] * mArr[7] * mArr[14] +
+             mArr[4] * mArr[2] * mArr[15] - mArr[4] * mArr[3] * mArr[14] -
+             mArr[12] * mArr[2] * mArr[7] + mArr[12] * mArr[3] * mArr[6];
+    inv[10] = mArr[0] * mArr[5] * mArr[15] - mArr[0] * mArr[7] * mArr[13] -
+              mArr[4] * mArr[1] * mArr[15] + mArr[4] * mArr[3] * mArr[13] +
+              mArr[12] * mArr[1] * mArr[7] - mArr[12] * mArr[3] * mArr[5];
+    inv[14] = -mArr[0] * mArr[5] * mArr[14] + mArr[0] * mArr[6] * mArr[13] +
+              mArr[4] * mArr[1] * mArr[14] - mArr[4] * mArr[2] * mArr[13] -
+              mArr[12] * mArr[1] * mArr[6] + mArr[12] * mArr[2] * mArr[5];
+    inv[3] = -mArr[1] * mArr[6] * mArr[11] + mArr[1] * mArr[7] * mArr[10] +
+             mArr[5] * mArr[2] * mArr[11] - mArr[5] * mArr[3] * mArr[10] -
+             mArr[9] * mArr[2] * mArr[7] + mArr[9] * mArr[3] * mArr[6];
+    inv[7] = mArr[0] * mArr[6] * mArr[11] - mArr[0] * mArr[7] * mArr[10] -
+             mArr[4] * mArr[2] * mArr[11] + mArr[4] * mArr[3] * mArr[10] +
+             mArr[8] * mArr[2] * mArr[7] - mArr[8] * mArr[3] * mArr[6];
+    inv[11] = -mArr[0] * mArr[5] * mArr[11] + mArr[0] * mArr[7] * mArr[9] +
+              mArr[4] * mArr[1] * mArr[11] - mArr[4] * mArr[3] * mArr[9] -
+              mArr[8] * mArr[1] * mArr[7] + mArr[8] * mArr[3] * mArr[5];
+    inv[15] = mArr[0] * mArr[5] * mArr[10] - mArr[0] * mArr[6] * mArr[9] -
+              mArr[4] * mArr[1] * mArr[10] + mArr[4] * mArr[2] * mArr[9] +
+              mArr[8] * mArr[1] * mArr[6] - mArr[8] * mArr[2] * mArr[5];
+    det = mArr[0] * inv[0] + mArr[1] * inv[4] + mArr[2] * inv[8] +
+          mArr[3] * inv[12];
+    if (fabs(det) < 1e-6) {
+      return false;
+    }
+    det = 1.0 / det;
+    for (int i = 0; i < 16; i++) {
+      inv[i] *= det;
+    }
+    // Convert the 1D array back to the 4x4 matrix
+    for (int i = 0; i < 4; ++i) {
+      for (int j = 0; j < 4; ++j) {
+        m[i][j] = static_cast<float>(inv[i * 4 + j]);
+      }
+    }
+    return true;
+  }
+};
+inline void apply_matrix4(uv_float3 &v, const Matrix4 matrix) {
+  float newX = v.x * matrix.m[0][0] + v.y * matrix.m[0][1] +
+               v.z * matrix.m[0][2] + matrix.m[0][3];
+  float newY = v.x * matrix.m[1][0] + v.y * matrix.m[1][1] +
+               v.z * matrix.m[1][2] + matrix.m[1][3];
+  float newZ = v.x * matrix.m[2][0] + v.y * matrix.m[2][1] +
+               v.z * matrix.m[2][2] + matrix.m[2][3];
+  float w = v.x * matrix.m[3][0] + v.y * matrix.m[3][1] + v.z * matrix.m[3][2] +
+            matrix.m[3][3];
+  if (std::fabs(w) > EPSILON) {
+    newX /= w;
+    newY /= w;
+    newZ /= w;
+  }
+  v.x = newX;
+  v.y = newY;
+  v.z = newZ;
+}
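The `apply_matrix4` helper above transforms a 3D point by a 4x4 matrix in homogeneous coordinates and divides by `w` only when `w` is not close to zero. A minimal Python sketch of the same arithmetic, for illustration only: the `EPSILON` value is an assumption (the C++ constant is defined elsewhere in `common.h`), and this version returns a new tuple instead of mutating its argument.

```python
from typing import Sequence, Tuple

EPSILON = 1e-7  # assumed tolerance; the real value lives in common.h


def apply_matrix4(
    v: Tuple[float, float, float], m: Sequence[Sequence[float]]
) -> Tuple[float, float, float]:
    """Apply a row-major 4x4 matrix to a point, with perspective divide."""
    x, y, z = v
    new_x = x * m[0][0] + y * m[0][1] + z * m[0][2] + m[0][3]
    new_y = x * m[1][0] + y * m[1][1] + z * m[1][2] + m[1][3]
    new_z = x * m[2][0] + y * m[2][1] + z * m[2][2] + m[2][3]
    w = x * m[3][0] + y * m[3][1] + z * m[3][2] + m[3][3]
    # Skip the divide when w is (nearly) zero, as the C++ helper does
    if abs(w) > EPSILON:
        new_x /= w
        new_y /= w
        new_z /= w
    return (new_x, new_y, new_z)


# A pure translation by (1, 2, 3)
translate = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Identity rotation with w = 2, so the result is halved by the divide
halve = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
```

With these matrices, `apply_matrix4((1.0, 1.0, 1.0), translate)` moves the point to `(2.0, 3.0, 4.0)`, and the `halve` matrix scales it to `(0.5, 0.5, 0.5)` through the divide by `w`.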

uv_unwrapper/uv_unwrapper/csrc/intersect.cpp ADDED

	@@ -0,0 +1,702 @@

+#include "intersect.h"
+#include "bvh.h"
+#include <algorithm>
+#include <cmath>
+#include <iostream>
+#include <stdexcept>
+#include <vector>
+bool triangle_aabb_intersection(const uv_float2 &aabbMin,
+                                const uv_float2 &aabbMax, const uv_float2 &v0,
+                                const uv_float2 &v1, const uv_float2 &v2) {
+  // Convert the min and max AABB definition to left, right, top, bottom
+  float l = aabbMin.x;
+  float r = aabbMax.x;
+  float t = aabbMin.y;
+  float b = aabbMax.y;
+  int b0 = ((v0.x > l) ? 1 : 0) | ((v0.y > t) ? 2 : 0) | ((v0.x > r) ? 4 : 0) |
+           ((v0.y > b) ? 8 : 0);
+  if (b0 == 3)
+    return true;
+  int b1 = ((v1.x > l) ? 1 : 0) | ((v1.y > t) ? 2 : 0) | ((v1.x > r) ? 4 : 0) |
+           ((v1.y > b) ? 8 : 0);
+  if (b1 == 3)
+    return true;
+  int b2 = ((v2.x > l) ? 1 : 0) | ((v2.y > t) ? 2 : 0) | ((v2.x > r) ? 4 : 0) |
+           ((v2.y > b) ? 8 : 0);
+  if (b2 == 3)
+    return true;
+  float m, c, s;
+  int i0 = b0 ^ b1;
+  if (i0 != 0) {
+    if (v1.x != v0.x) {
+      m = (v1.y - v0.y) / (v1.x - v0.x);
+      c = v0.y - (m * v0.x);
+      if (i0 & 1) {
+        s = m * l + c;
+        if (s >= t && s <= b)
+          return true;
+      }
+      if (i0 & 2) {
+        s = (t - c) / m;
+        if (s >= l && s <= r)
+          return true;
+      }
+      if (i0 & 4) {
+        s = m * r + c;
+        if (s >= t && s <= b)
+          return true;
+      }
+      if (i0 & 8) {
+        s = (b - c) / m;
+        if (s >= l && s <= r)
+          return true;
+      }
+    } else {
+      if (l == v0.x || r == v0.x)
+        return true;
+      if (v0.x > l && v0.x < r)
+        return true;
+    }
+  }
+  int i1 = b1 ^ b2;
+  if (i1 != 0) {
+    if (v2.x != v1.x) {
+      m = (v2.y - v1.y) / (v2.x - v1.x);
+      c = v1.y - (m * v1.x);
+      if (i1 & 1) {
+        s = m * l + c;
+        if (s >= t && s <= b)
+          return true;
+      }
+      if (i1 & 2) {
+        s = (t - c) / m;
+        if (s >= l && s <= r)
+          return true;
+      }
+      if (i1 & 4) {
+        s = m * r + c;
+        if (s >= t && s <= b)
+          return true;
+      }
+      if (i1 & 8) {
+        s = (b - c) / m;
+        if (s >= l && s <= r)
+          return true;
+      }
+    } else {
+      if (l == v1.x || r == v1.x)
+        return true;
+      if (v1.x > l && v1.x < r)
+        return true;
+    }
+  }
+  int i2 = b0 ^ b2;
+  if (i2 != 0) {
+    if (v2.x != v0.x) {
+      m = (v2.y - v0.y) / (v2.x - v0.x);
+      c = v0.y - (m * v0.x);
+      if (i2 & 1) {
+        s = m * l + c;
+        if (s >= t && s <= b)
+          return true;
+      }
+      if (i2 & 2) {
+        s = (t - c) / m;
+        if (s >= l && s <= r)
+          return true;
+      }
+      if (i2 & 4) {
+        s = m * r + c;
+        if (s >= t && s <= b)
+          return true;
+      }
+      if (i2 & 8) {
+        s = (b - c) / m;
+        if (s >= l && s <= r)
+          return true;
+      }
+    } else {
+      if (l == v0.x || r == v0.x)
+        return true;
+      if (v0.x > l && v0.x < r)
+        return true;
+    }
+  }
+  // Bounding box check
+  float tbb_l = std::min(v0.x, std::min(v1.x, v2.x));
+  float tbb_t = std::min(v0.y, std::min(v1.y, v2.y));
+  float tbb_r = std::max(v0.x, std::max(v1.x, v2.x));
+  float tbb_b = std::max(v0.y, std::max(v1.y, v2.y));
+  if (tbb_l <= l && tbb_r >= r && tbb_t <= t && tbb_b >= b) {
+    float v0x = v2.x - v0.x;
+    float v0y = v2.y - v0.y;
+    float v1x = v1.x - v0.x;
+    float v1y = v1.y - v0.y;
+    float v2x, v2y;
+    float dot00, dot01, dot02, dot11, dot12, invDenom, u, v;
+    // Top-left corner
+    v2x = l - v0.x;
+    v2y = t - v0.y;
+    dot00 = v0x * v0x + v0y * v0y;
+    dot01 = v0x * v1x + v0y * v1y;
+    dot02 = v0x * v2x + v0y * v2y;
+    dot11 = v1x * v1x + v1y * v1y;
+    dot12 = v1x * v2x + v1y * v2y;
+    invDenom = 1.0f / (dot00 * dot11 - dot01 * dot01);
+    u = (dot11 * dot02 - dot01 * dot12) * invDenom;
+    v = (dot00 * dot12 - dot01 * dot02) * invDenom;
+    if (u >= 0 && v >= 0 && (u + v) <= 1)
+      return true;
+    // Bottom-left corner
+    v2x = l - v0.x;
+    v2y = b - v0.y;
+    dot02 = v0x * v2x + v0y * v2y;
+    dot12 = v1x * v2x + v1y * v2y;
+    u = (dot11 * dot02 - dot01 * dot12) * invDenom;
+    v = (dot00 * dot12 - dot01 * dot02) * invDenom;
+    if (u >= 0 && v >= 0 && (u + v) <= 1)
+      return true;
+    // Bottom-right corner
+    v2x = r - v0.x;
+    v2y = b - v0.y;
+    dot02 = v0x * v2x + v0y * v2y;
+    dot12 = v1x * v2x + v1y * v2y;
+    u = (dot11 * dot02 - dot01 * dot12) * invDenom;
+    v = (dot00 * dot12 - dot01 * dot02) * invDenom;
+    if (u >= 0 && v >= 0 && (u + v) <= 1)
+      return true;
+    // Top-right corner
+    v2x = r - v0.x;
+    v2y = t - v0.y;
+    dot02 = v0x * v2x + v0y * v2y;
+    dot12 = v1x * v2x + v1y * v2y;
+    u = (dot11 * dot02 - dot01 * dot12) * invDenom;
+    v = (dot00 * dot12 - dot01 * dot02) * invDenom;
+    if (u >= 0 && v >= 0 && (u + v) <= 1)
+      return true;
+  }
+  return false;
+}
+void tri_winding(uv_float2 &a, uv_float2 &b, uv_float2 &c) {
+  float det = (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
+  // If the determinant is negative, the triangle is oriented clockwise
+  if (det < 0) {
+    // Swap vertices b and c to ensure counter-clockwise winding
+    std::swap(b, c);
+  }
+}
+struct Triangle {
+  uv_float3 a, b, c;
+  Triangle(const uv_float2 &p1, const uv_float2 &q1, const uv_float2 &r1)
+      : a({p1.x, p1.y, 0}), b({q1.x, q1.y, 0}), c({r1.x, r1.y, 0}) {}
+  Triangle(const uv_float3 &p1, const uv_float3 &q1, const uv_float3 &r1)
+      : a(p1), b(q1), c(r1) {}
+  void getNormal(uv_float3 &normal) const {
+    uv_float3 u = b - a;
+    uv_float3 v = c - a;
+    normal = normalize(cross(u, v));
+  }
+};
+bool isTriDegenerated(const Triangle &tri) {
+  uv_float3 u = tri.a - tri.b;
+  uv_float3 v = tri.a - tri.c;
+  uv_float3 cr = cross(u, v);
+  return fabs(cr.x) < EPSILON && fabs(cr.y) < EPSILON && fabs(cr.z) < EPSILON;
+}
+int orient3D(const uv_float3 &a, const uv_float3 &b, const uv_float3 &c,
+             const uv_float3 &d) {
+  Matrix4 _matrix4;
+  _matrix4.set(a.x, a.y, a.z, 1, b.x, b.y, b.z, 1, c.x, c.y, c.z, 1, d.x, d.y,
+               d.z, 1);
+  float det = _matrix4.determinant();
+  if (det < -EPSILON)
+    return -1;
+  else if (det > EPSILON)
+    return 1;
+  else
+    return 0;
+}
+int orient2D(const uv_float2 &a, const uv_float2 &b, const uv_float2 &c) {
+  float det = (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
+  if (det < -EPSILON)
+    return -1;
+  else if (det > EPSILON)
+    return 1;
+  else
+    return 0;
+}
+int orient2D(const uv_float3 &a, const uv_float3 &b, const uv_float3 &c) {
+  uv_float2 a_2d = {a.x, a.y};
+  uv_float2 b_2d = {b.x, b.y};
+  uv_float2 c_2d = {c.x, c.y};
+  return orient2D(a_2d, b_2d, c_2d);
+}
+void permuteTriLeft(Triangle &tri) {
+  uv_float3 tmp = tri.a;
+  tri.a = tri.b;
+  tri.b = tri.c;
+  tri.c = tmp;
+}
+void permuteTriRight(Triangle &tri) {
+  uv_float3 tmp = tri.c;
+  tri.c = tri.b;
+  tri.b = tri.a;
+  tri.a = tmp;
+}
+void makeTriCounterClockwise(Triangle &tri) {
+  if (orient2D(tri.a, tri.b, tri.c) < 0) {
+    uv_float3 tmp = tri.c;
+    tri.c = tri.b;
+    tri.b = tmp;
+  }
+}
+void intersectPlane(const uv_float3 &a, const uv_float3 &b, const uv_float3 &p,
+                    const uv_float3 &n, uv_float3 &target) {
+  uv_float3 u = b - a;
+  uv_float3 v = a - p;
+  float dot1 = dot(n, u);
+  float dot2 = dot(n, v);
+  u = u * (-dot2 / dot1);
+  target = a + u;
+}
+void computeLineIntersection(const Triangle &t1, const Triangle &t2,
+                             std::vector<uv_float3> &target) {
+  uv_float3 n1, n2;
+  t1.getNormal(n1);
+  t2.getNormal(n2);
+  int o1 = orient3D(t1.a, t1.c, t2.b, t2.a);
+  int o2 = orient3D(t1.a, t1.b, t2.c, t2.a);
+  uv_float3 i1, i2;
+  if (o1 > 0) {
+    if (o2 > 0) {
+      intersectPlane(t1.a, t1.c, t2.a, n2, i1);
+      intersectPlane(t2.a, t2.c, t1.a, n1, i2);
+    } else {
+      intersectPlane(t1.a, t1.c, t2.a, n2, i1);
+      intersectPlane(t1.a, t1.b, t2.a, n2, i2);
+    }
+  } else {
+    if (o2 > 0) {
+      intersectPlane(t2.a, t2.b, t1.a, n1, i1);
+      intersectPlane(t2.a, t2.c, t1.a, n1, i2);
+    } else {
+      intersectPlane(t2.a, t2.b, t1.a, n1, i1);
+      intersectPlane(t1.a, t1.b, t2.a, n2, i2);
+    }
+  }
+  target.push_back(i1);
+  if (distance_to(i1, i2) >= EPSILON) {
+    target.push_back(i2);
+  }
+}
+void makeTriAVertexAlone(Triangle &tri, int oa, int ob, int oc) {
+  // Permute a, b, c so that a is alone on its side
+  if (oa == ob) {
+    // c is alone, permute right so c becomes a
+    permuteTriRight(tri);
+  } else if (oa == oc) {
+    // b is alone, permute left so b becomes a
+    permuteTriLeft(tri);
+  } else if (ob != oc) {
+    // In case a, b, c have different orientation, put a on positive side
+    if (ob > 0) {
+      permuteTriLeft(tri);
+    } else if (oc > 0) {
+      permuteTriRight(tri);
+    }
+  }
+}
+void makeTriAVertexPositive(Triangle &tri, const Triangle &other) {
+  int o = orient3D(other.a, other.b, other.c, tri.a);
+  if (o < 0) {
+    std::swap(tri.b, tri.c);
+  }
+}
+bool crossIntersect(Triangle &t1, Triangle &t2, int o1a, int o1b, int o1c,
+                    std::vector<uv_float3> *target = nullptr) {
+  int o2a = orient3D(t1.a, t1.b, t1.c, t2.a);
+  int o2b = orient3D(t1.a, t1.b, t1.c, t2.b);
+  int o2c = orient3D(t1.a, t1.b, t1.c, t2.c);
+  if (o2a == o2b && o2a == o2c) {
+    return false;
+  }
+  // Make a vertex alone on its side for both triangles
+  makeTriAVertexAlone(t1, o1a, o1b, o1c);
+  makeTriAVertexAlone(t2, o2a, o2b, o2c);
+  // Ensure vertex a of each triangle lies on the positive side of the other
+  makeTriAVertexPositive(t2, t1);
+  makeTriAVertexPositive(t1, t2);
+  int o1 = orient3D(t1.a, t1.b, t2.a, t2.b);
+  int o2 = orient3D(t1.a, t1.c, t2.c, t2.a);
+  if (o1 <= 0 && o2 <= 0) {
+    if (target) {
+      computeLineIntersection(t1, t2, *target);
+    }
+    return true;
+  }
+  return false;
+}
+void linesIntersect2d(const uv_float3 &a1, const uv_float3 &b1,
+                      const uv_float3 &a2, const uv_float3 &b2,
+                      uv_float3 &target) {
+  float dx1 = a1.x - b1.x;
+  float dx2 = a2.x - b2.x;
+  float dy1 = a1.y - b1.y;
+  float dy2 = a2.y - b2.y;
+  float D = dx1 * dy2 - dx2 * dy1;
+  float n1 = a1.x * b1.y - a1.y * b1.x;
+  float n2 = a2.x * b2.y - a2.y * b2.x;
+  target.x = (n1 * dx2 - n2 * dx1) / D;
+  target.y = (n1 * dy2 - n2 * dy1) / D;
+  target.z = 0;
+}
+void clipTriangle(const Triangle &t1, const Triangle &t2,
+                  std::vector<uv_float3> &target) {
+  std::vector<uv_float3> clip = {t1.a, t1.b, t1.c};
+  std::vector<uv_float3> output = {t2.a, t2.b, t2.c};
+  std::vector<int> orients(output.size() * 3, 0);
+  uv_float3 inter;
+  for (int i = 0; i < 3; ++i) {
+    const int i_prev = (i + 2) % 3;
+    std::vector<uv_float3> input;
+    std::copy(output.begin(), output.end(), std::back_inserter(input));
+    output.clear();
+    for (size_t j = 0; j < input.size(); ++j) {
+      orients[j] = orient2D(clip[i_prev], clip[i], input[j]);
+    }
+    for (size_t j = 0; j < input.size(); ++j) {
+      const int j_prev = (j - 1 + input.size()) % input.size();
+      if (orients[j] >= 0) {
+        if (orients[j_prev] < 0) {
+          linesIntersect2d(clip[i_prev], clip[i], input[j_prev], input[j],
+                           inter);
+          output.push_back({inter.x, inter.y, inter.z});
+        }
+        output.push_back({input[j].x, input[j].y, input[j].z});
+      } else if (orients[j_prev] >= 0) {
+        linesIntersect2d(clip[i_prev], clip[i], input[j_prev], input[j], inter);
+        output.push_back({inter.x, inter.y, inter.z});
+      }
+    }
+  }
+  // Append only points that are not already in target (drop duplicates)
+  for (const auto &point : output) {
+    size_t j = 0;
+    bool sameFound = false;
+    while (!sameFound && j < target.size()) {
+      sameFound = distance_to(point, target[j]) <= 1e-6;
+      j++;
+    }
+    if (!sameFound) {
+      target.push_back(point);
+    }
+  }
+}
+bool intersectionTypeR1(const Triangle &t1, const Triangle &t2) {
+  const uv_float3 &p1 = t1.a;
+  const uv_float3 &q1 = t1.b;
+  const uv_float3 &r1 = t1.c;
+  const uv_float3 &p2 = t2.a;
+  const uv_float3 &r2 = t2.c;
+  if (orient2D(r2, p2, q1) >= 0) {     // I
+    if (orient2D(r2, p1, q1) >= 0) {   // II.a
+      if (orient2D(p1, p2, q1) >= 0) { // III.a
+        return true;
+      } else {
+        if (orient2D(p1, p2, r1) >= 0) {   // IV.a
+          if (orient2D(q1, r1, p2) >= 0) { // V
+            return true;
+          }
+        }
+      }
+    }
+  } else {
+    if (orient2D(r2, p2, r1) >= 0) {     // II.b
+      if (orient2D(q1, r1, r2) >= 0) {   // III.b
+        if (orient2D(p1, p2, r1) >= 0) { // IV.b (diverges from paper)
+          return true;
+        }
+      }
+    }
+  }
+  return false;
+}
+bool intersectionTypeR2(const Triangle &t1, const Triangle &t2) {
+  const uv_float3 &p1 = t1.a;
+  const uv_float3 &q1 = t1.b;
+  const uv_float3 &r1 = t1.c;
+  const uv_float3 &p2 = t2.a;
+  const uv_float3 &q2 = t2.b;
+  const uv_float3 &r2 = t2.c;
+  if (orient2D(r2, p2, q1) >= 0) {       // I
+    if (orient2D(q2, r2, q1) >= 0) {     // II.a
+      if (orient2D(p1, p2, q1) >= 0) {   // III.a
+        if (orient2D(p1, q2, q1) <= 0) { // IV.a
+          return true;
+        }
+      } else {
+        if (orient2D(p1, p2, r1) >= 0) {   // IV.b
+          if (orient2D(r2, p2, r1) <= 0) { // V.a
+            return true;
+          }
+        }
+      }
+    } else {
+      if (orient2D(p1, q2, q1) <= 0) {     // III.b
+        if (orient2D(q2, r2, r1) >= 0) {   // IV.c
+          if (orient2D(q1, r1, q2) >= 0) { // V.b
+            return true;
+          }
+        }
+      }
+    }
+  } else {
+    if (orient2D(r2, p2, r1) >= 0) {     // II.b
+      if (orient2D(q1, r1, r2) >= 0) {   // III.c
+        if (orient2D(r1, p1, p2) >= 0) { // IV.d
+          return true;
+        }
+      } else {
+        if (orient2D(q1, r1, q2) >= 0) {   // IV.e
+          if (orient2D(q2, r2, r1) >= 0) { // V.c
+            return true;
+          }
+        }
+      }
+    }
+  }
+  return false;
+}
+bool coplanarIntersect(Triangle &t1, Triangle &t2,
+                       std::vector<uv_float3> *target = nullptr) {
+  uv_float3 normal, u, v;
+  t1.getNormal(normal);
+  normal = normalize(normal);
+  u = normalize(t1.a - t1.b);
+  v = cross(normal, u);
+  // Move basis to t1.a
+  u = u + t1.a;
+  v = v + t1.a;
+  normal = normal + t1.a;
+  Matrix4 _matrix;
+  _matrix.set(t1.a.x, u.x, v.x, normal.x, t1.a.y, u.y, v.y, normal.y, t1.a.z,
+              u.z, v.z, normal.z, 1, 1, 1, 1);
+  Matrix4 _affineMatrix;
+  _affineMatrix.set(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1);
+  _matrix.invert(); // Invert the _matrix
+  _matrix = _affineMatrix * _matrix;
+  // Apply transformation
+  apply_matrix4(t1.a, _matrix);
+  apply_matrix4(t1.b, _matrix);
+  apply_matrix4(t1.c, _matrix);
+  apply_matrix4(t2.a, _matrix);
+  apply_matrix4(t2.b, _matrix);
+  apply_matrix4(t2.c, _matrix);
+  makeTriCounterClockwise(t1);
+  makeTriCounterClockwise(t2);
+  const uv_float3 &p1 = t1.a;
+  const uv_float3 &p2 = t2.a;
+  const uv_float3 &q2 = t2.b;
+  const uv_float3 &r2 = t2.c;
+  int o_p2q2 = orient2D(p2, q2, p1);
+  int o_q2r2 = orient2D(q2, r2, p1);
+  int o_r2p2 = orient2D(r2, p2, p1);
+  bool intersecting = false;
+  if (o_p2q2 >= 0) {
+    if (o_q2r2 >= 0) {
+      if (o_r2p2 >= 0) {
+        // + + +
+        intersecting = true;
+      } else {
+        // + + -
+        intersecting = intersectionTypeR1(t1, t2);
+      }
+    } else {
+      if (o_r2p2 >= 0) {
+        // + - +
+        permuteTriRight(t2);
+        intersecting = intersectionTypeR1(t1, t2);
+      } else {
+        // + - -
+        intersecting = intersectionTypeR2(t1, t2);
+      }
+    }
+  } else {
+    if (o_q2r2 >= 0) {
+      if (o_r2p2 >= 0) {
+        // - + +
+        permuteTriLeft(t2);
+        intersecting = intersectionTypeR1(t1, t2);
+      } else {
+        // - + -
+        permuteTriLeft(t2);
+        intersecting = intersectionTypeR2(t1, t2);
+      }
+    } else {
+      if (o_r2p2 >= 0) {
+        // - - +
+        permuteTriRight(t2);
+        intersecting = intersectionTypeR2(t1, t2);
+      } else {
+        // - - -
+        std::cerr << "Triangles should not be flat." << std::endl;
+        return false;
+      }
+    }
+  }
+  if (intersecting && target) {
+    clipTriangle(t1, t2, *target);
+    _matrix.invert();
+    // Apply the transform to each target point
+    for (size_t i = 0; i < target->size(); ++i) {
+      apply_matrix4(target->at(i), _matrix);
+    }
+  }
+  return intersecting;
+}
+// Helper function to calculate the area of a polygon
+float polygon_area(const std::vector<uv_float3> &polygon) {
+  if (polygon.size() < 3)
+    return 0.0f; // Not a polygon
+  uv_float3 normal = {0.0f, 0.0f, 0.0f}; // Initialize normal vector
+  // Calculate the cross product of edges around the polygon
+  for (size_t i = 0; i < polygon.size(); ++i) {
+    uv_float3 p1 = polygon[i];
+    uv_float3 p2 = polygon[(i + 1) % polygon.size()];
+    normal = normal + cross(p1, p2); // Accumulate the normal vector
+  }
+  float area =
+      magnitude(normal) / 2.0f; // Area is half the magnitude of the normal
+  return area;
+}
+bool triangle_triangle_intersection(uv_float2 p1, uv_float2 q1, uv_float2 r1,
+                                    uv_float2 p2, uv_float2 q2, uv_float2 r2) {
+  Triangle t1(p1, q1, r1);
+  Triangle t2(p2, q2, r2);
+  if (isTriDegenerated(t1) || isTriDegenerated(t2)) {
+    // std::cerr << "Degenerated triangles provided, skipping." << std::endl;
+    return false;
+  }
+  int o1a = orient3D(t2.a, t2.b, t2.c, t1.a);
+  int o1b = orient3D(t2.a, t2.b, t2.c, t1.b);
+  int o1c = orient3D(t2.a, t2.b, t2.c, t1.c);
+  std::vector<uv_float3> intersections;
+  bool intersects;
+  if (o1a == o1b && o1a == o1c) // [[likely]]
+  {
+    intersects = o1a == 0 && coplanarIntersect(t1, t2, &intersections);
+  } else // [[unlikely]]
+  {
+    intersects = crossIntersect(t1, t2, o1a, o1b, o1c, &intersections);
+  }
+  if (intersects) {
+    float area = polygon_area(intersections);
+    // std::cout << "Intersection area: " << area << std::endl;
+    if (area < 1e-10f || !std::isfinite(area)) {
+      // std::cout<<"Invalid area: " << area << std::endl;
+      return false; // Ignore intersection if the area is too small
+    }
+  }
+  return intersects;
+}
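For the coplanar case, `clipTriangle` clips one triangle against the edges of the other in a loop that resembles Sutherland-Hodgman clipping, and `polygon_area` is then used to reject overlaps with negligible area. The Python sketch below mirrors those two steps in 2D under simplifying assumptions: points are plain `(x, y)` tuples, the clip triangle is counter-clockwise, crossing edges are never parallel, and the shoelace formula stands in for the 3D cross-product sum used in the C++ code. The function names are illustrative and not part of the extension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def orient2d(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of (a, b, c); positive when counter-clockwise."""
    return a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1])


def line_intersection(a1: Point, b1: Point, a2: Point, b2: Point) -> Point:
    """Intersection of the infinite lines a1-b1 and a2-b2 (assumed non-parallel)."""
    dx1, dy1 = a1[0] - b1[0], a1[1] - b1[1]
    dx2, dy2 = a2[0] - b2[0], a2[1] - b2[1]
    d = dx1 * dy2 - dx2 * dy1
    n1 = a1[0] * b1[1] - a1[1] * b1[0]
    n2 = a2[0] * b2[1] - a2[1] * b2[0]
    return ((n1 * dx2 - n2 * dx1) / d, (n1 * dy2 - n2 * dy1) / d)


def clip_triangle(clip: Sequence[Point], subject: Sequence[Point]) -> List[Point]:
    """Clip `subject` against each edge of the counter-clockwise `clip` triangle."""
    output = list(subject)
    for i in range(3):
        a, b = clip[(i + 2) % 3], clip[i]
        points, output = output, []
        for j, cur in enumerate(points):
            prev = points[j - 1]  # wraps around to the last point when j == 0
            cur_in = orient2d(a, b, cur) >= 0
            prev_in = orient2d(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(line_intersection(a, b, prev, cur))
                output.append(cur)
            elif prev_in:
                output.append(line_intersection(a, b, prev, cur))
    return output


def polygon_area(polygon: Sequence[Point]) -> float:
    """Unsigned area via the shoelace formula; 0 for fewer than 3 points."""
    if len(polygon) < 3:
        return 0.0
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Clipping the triangle `(1, 1), (5, 1), (1, 5)` against `(0, 0), (4, 0), (0, 4)` leaves the triangle `(1, 3), (1, 1), (3, 1)`, whose area is 2. Two disjoint triangles produce an empty polygon, which the area threshold would then discard.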

uv_unwrapper/uv_unwrapper/csrc/intersect.h ADDED

	@@ -0,0 +1,10 @@

+#pragma once
+#include "common.h"
+#include <vector>
+bool triangle_aabb_intersection(const uv_float2 &aabb_min,
+                                const uv_float2 &aabb_max, const uv_float2 &v0,
+                                const uv_float2 &v1, const uv_float2 &v2);
+bool triangle_triangle_intersection(uv_float2 p1, uv_float2 q1, uv_float2 r1,
+                                    uv_float2 p2, uv_float2 q2, uv_float2 r2);
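`triangle_aabb_intersection`, declared above, begins by classifying each triangle vertex with a 4-bit code against the box edges. A code of 3 means the vertex lies inside the box, and the XOR of two vertex codes marks which box lines the connecting edge may cross. A small Python sketch of that classification follows; the tuple-based `outcode` helper is a hypothetical name used only for this illustration.

```python
from typing import Tuple

Point = Tuple[float, float]


def outcode(p: Point, aabb_min: Point, aabb_max: Point) -> int:
    """4-bit position code of p relative to the box edges.

    bit 1: x > left, bit 2: y > top, bit 4: x > right, bit 8: y > bottom.
    """
    left, top = aabb_min
    right, bottom = aabb_max
    code = 0
    if p[0] > left:
        code |= 1
    if p[1] > top:
        code |= 2
    if p[0] > right:
        code |= 4
    if p[1] > bottom:
        code |= 8
    return code


box_min, box_max = (0.0, 0.0), (1.0, 1.0)
inside = outcode((0.5, 0.5), box_min, box_max)
right_of_box = outcode((2.0, 0.5), box_min, box_max)
# Bits that differ between the two endpoints: only the right box line (bit 4)
crossed = inside ^ right_of_box
```

Note that a vertex lying exactly on the left or top edge does not get code 3, so the strict comparisons leave such cases to the later edge and corner tests.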

uv_unwrapper/uv_unwrapper/csrc/unwrapper.cpp ADDED

	@@ -0,0 +1,271 @@

+#include "bvh.h"
+#include <ATen/ATen.h>
+#include <ATen/Context.h>
+#include <chrono>
+#include <cmath>
+#include <cstring>
+#include <omp.h>
+#include <set>
+#include <torch/extension.h>
+#include <vector>
+// #define TIMING
+#if defined(_MSC_VER)
+#include <BaseTsd.h>
+typedef SSIZE_T ssize_t;
+#endif
+namespace UVUnwrapper {
+void create_bvhs(BVH *bvhs, Triangle *triangles,
+                 std::vector<std::set<int>> &triangle_per_face, int num_faces,
+                 int start, int end) {
+#pragma omp parallel for
+  for (int i = start; i < end; i++) {
+    int num_triangles = triangle_per_face[i].size();
+    Triangle *triangles_per_face = new Triangle[num_triangles];
+    int *indices = new int[num_triangles];
+    int j = 0;
+    for (int idx : triangle_per_face[i]) {
+      triangles_per_face[j] = triangles[idx];
+      indices[j++] = idx;
+    }
+    // Each thread writes to its own memory space
+    // First check if the number of triangles is 0
+    if (num_triangles == 0) {
+      bvhs[i - start] = std::move(BVH()); // Default constructor
+    } else {
+      bvhs[i - start] = std::move(
+          BVH(triangles_per_face, indices,
+              num_triangles)); // BVH now handles memory of triangles_per_face
+    }
+    delete[] triangles_per_face;
+  }
+}
+void perform_intersection_check(BVH *bvhs, int num_bvhs, Triangle *triangles,
+                                uv_float3 *vertex_tri_centroids,
+                                int64_t *assign_indices_ptr,
+                                ssize_t num_indices, int offset,
+                                std::vector<std::set<int>> &triangle_per_face) {
+  std::vector<std::pair<int, int>>
+      unique_intersections; // Store unique intersections as pairs of triangle
+                            // indices
+// Step 1: Detect intersections in parallel
+#pragma omp parallel for
+  for (int i = 0; i < num_indices; i++) {
+    if (assign_indices_ptr[i] < offset) {
+      continue;
+    }
+    Triangle cur_tri = triangles[i];
+    auto &cur_bvh = bvhs[assign_indices_ptr[i] - offset];
+    if (cur_bvh.bvhNode == nullptr) {
+      continue;
+    }
+    std::vector<int> intersections = cur_bvh.Intersect(cur_tri);
+    if (!intersections.empty()) {
+#pragma omp critical
+      {
+        for (int intersect : intersections) {
+          if (i != intersect) {
+            // Ensure we only store unique pairs (A, B) where A < B to avoid
+            // duplication
+            if (i < intersect) {
+              unique_intersections.push_back(std::make_pair(i, intersect));
+            } else {
+              unique_intersections.push_back(std::make_pair(intersect, i));
+            }
+          }
+        }
+      }
+    }
+  }
+  // Step 2: Process unique intersections
+  for (int idx = 0; idx < (int)unique_intersections.size(); idx++) {
+    int first = unique_intersections[idx].first;
+    int second = unique_intersections[idx].second;
+    int i_idx = assign_indices_ptr[first];
+    int norm_idx = i_idx % 6;
+    int axis = (norm_idx < 2) ? 0 : (norm_idx < 4) ? 1 : 2;
+    bool use_max = (i_idx % 2) == 1;
+    float pos_a = vertex_tri_centroids[first][axis];
+    float pos_b = vertex_tri_centroids[second][axis];
+    // Sort the intersections based on vertex_tri_centroids along the specified
+    // axis
+    if (use_max) {
+      if (pos_a < pos_b) {
+        std::swap(first, second);
+      }
+    } else {
+      if (pos_a > pos_b) {
+        std::swap(first, second);
+      }
+    }
+    // Update the unique intersections
+    unique_intersections[idx].first = first;
+    unique_intersections[idx].second = second;
+  }
+  // Collect only the second triangle of each pair into a set.
+  // After the sort above, the second triangle is always the occluded one.
+  std::set<int> second_intersections;
+  for (int idx = 0; idx < (int)unique_intersections.size(); idx++) {
+    int second = unique_intersections[idx].second;
+    second_intersections.insert(second);
+  }
+  for (int int_idx : second_intersections) {
+    // Move the occluded triangle to the atlas index 6 slots higher
+    int intersect_idx = assign_indices_ptr[int_idx];
+    int new_index = intersect_idx + 6;
+    new_index = std::clamp(new_index, 0, 12);
+    assign_indices_ptr[int_idx] = new_index;
+    triangle_per_face[intersect_idx].erase(int_idx);
+    triangle_per_face[new_index].insert(int_idx);
+  }
+}
+torch::Tensor assign_faces_uv_to_atlas_index(torch::Tensor vertices,
+                                             torch::Tensor indices,
+                                             torch::Tensor face_uv,
+                                             torch::Tensor face_index) {
+  // Get the number of faces
+  int num_faces = indices.size(0);
+  torch::Tensor assign_indices =
+      torch::empty(
+          {
+              num_faces,
+          },
+          torch::TensorOptions().dtype(torch::kInt64).device(torch::kCPU))
+          .contiguous();
+  auto vert_accessor = vertices.accessor<float, 2>();
+  auto indices_accessor = indices.accessor<int64_t, 2>();
+  auto face_uv_accessor = face_uv.accessor<float, 2>();
+  const int64_t *face_index_ptr = face_index.contiguous().data_ptr<int64_t>();
+  int64_t *assign_indices_ptr = assign_indices.data_ptr<int64_t>();
+  // copy face_index to assign_indices
+  memcpy(assign_indices_ptr, face_index_ptr, num_faces * sizeof(int64_t));
+#ifdef TIMING
+  auto start = std::chrono::high_resolution_clock::now();
+#endif
+  uv_float3 *vertex_tri_centroids = new uv_float3[num_faces];
+  Triangle *triangles = new Triangle[num_faces];
+  // Use std::set to store triangles for each face
+  std::vector<std::set<int>> triangle_per_face;
+  triangle_per_face.resize(13);
+#pragma omp parallel for
+  for (int i = 0; i < num_faces; i++) {
+    int face_idx = i * 3;
+    triangles[i].v0 = {face_uv_accessor[face_idx + 0][0],
+                       face_uv_accessor[face_idx + 0][1]};
+    triangles[i].v1 = {face_uv_accessor[face_idx + 1][0],
+                       face_uv_accessor[face_idx + 1][1]};
+    triangles[i].v2 = {face_uv_accessor[face_idx + 2][0],
+                       face_uv_accessor[face_idx + 2][1]};
+    triangles[i].centroid =
+        triangle_centroid(triangles[i].v0, triangles[i].v1, triangles[i].v2);
+    uv_float3 v0 = {vert_accessor[indices_accessor[i][0]][0],
+                    vert_accessor[indices_accessor[i][0]][1],
+                    vert_accessor[indices_accessor[i][0]][2]};
+    uv_float3 v1 = {vert_accessor[indices_accessor[i][1]][0],
+                    vert_accessor[indices_accessor[i][1]][1],
+                    vert_accessor[indices_accessor[i][1]][2]};
+    uv_float3 v2 = {vert_accessor[indices_accessor[i][2]][0],
+                    vert_accessor[indices_accessor[i][2]][1],
+                    vert_accessor[indices_accessor[i][2]][2]};
+    vertex_tri_centroids[i] = triangle_centroid(v0, v1, v2);
+// Assign the triangle to the face index
+#pragma omp critical
+    { triangle_per_face[face_index_ptr[i]].insert(i); }
+  }
+#ifdef TIMING
+  auto start_bvh = std::chrono::high_resolution_clock::now();
+#endif
+  BVH *bvhs = new BVH[6];
+  create_bvhs(bvhs, triangles, triangle_per_face, num_faces, 0, 6);
+#ifdef TIMING
+  auto end_bvh = std::chrono::high_resolution_clock::now();
+  std::chrono::duration<double> elapsed_seconds = end_bvh - start_bvh;
+  std::cout << "BVH build time: " << elapsed_seconds.count() << "s\n";
+  auto start_intersection_1 = std::chrono::high_resolution_clock::now();
+#endif
+  perform_intersection_check(bvhs, 6, triangles, vertex_tri_centroids,
+                             assign_indices_ptr, num_faces, 0,
+                             triangle_per_face);
+#ifdef TIMING
+  auto end_intersection_1 = std::chrono::high_resolution_clock::now();
+  elapsed_seconds = end_intersection_1 - start_intersection_1;
+  std::cout << "Intersection 1 time: " << elapsed_seconds.count() << "s\n";
+#endif
+  // Create 6 new BVHs for atlas indices 6..11 (all BVHs are freed in the cleanup below)
+  BVH *new_bvhs = new BVH[6];
+  create_bvhs(new_bvhs, triangles, triangle_per_face, num_faces, 6, 12);
+#ifdef TIMING
+  auto end_bvh2 = std::chrono::high_resolution_clock::now();
+  elapsed_seconds = end_bvh2 - end_intersection_1;
+  std::cout << "BVH 2 build time: " << elapsed_seconds.count() << "s\n";
+  auto start_intersection_2 = std::chrono::high_resolution_clock::now();
+#endif
+  perform_intersection_check(new_bvhs, 6, triangles, vertex_tri_centroids,
+                             assign_indices_ptr, num_faces, 6,
+                             triangle_per_face);
+#ifdef TIMING
+  auto end_intersection_2 = std::chrono::high_resolution_clock::now();
+  elapsed_seconds = end_intersection_2 - start_intersection_2;
+  std::cout << "Intersection 2 time: " << elapsed_seconds.count() << "s\n";
+  elapsed_seconds = end_intersection_2 - start;
+  std::cout << "Total time: " << elapsed_seconds.count() << "s\n";
+#endif
+  // Cleanup
+  delete[] vertex_tri_centroids;
+  delete[] triangles;
+  delete[] bvhs;
+  delete[] new_bvhs;
+  return assign_indices;
+}
+// Registers _C as a Python extension module.
+PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {}
+// Defines the operators
+TORCH_LIBRARY(UVUnwrapper, m) {
+  m.def("assign_faces_uv_to_atlas_index(Tensor vertices, Tensor indices, "
+        "Tensor face_uv, Tensor face_index) -> Tensor");
+}
+// Registers CPP implementations
+TORCH_LIBRARY_IMPL(UVUnwrapper, CPU, m) {
+  m.impl("assign_faces_uv_to_atlas_index", &assign_faces_uv_to_atlas_index);
+}
+} // namespace UVUnwrapper
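The second step of `perform_intersection_check` orders each intersecting pair along the projection axis of its atlas index and moves the triangle it treats as occluded six slots higher, clamped to 12. A minimal Python sketch of that bookkeeping, assuming plain lists in place of tensors and omitting the BVH query and the per-face sets; `reassign_occluded` is a hypothetical name, not a function in the extension.

```python
from typing import List, Sequence, Tuple


def reassign_occluded(
    pairs: Sequence[Tuple[int, int]],
    assign: List[int],
    centroids: Sequence[Tuple[float, float, float]],
) -> List[int]:
    """Move the occluded triangle of every intersecting pair to index + 6."""
    occluded = set()
    for first, second in pairs:
        idx = assign[first]
        axis = (idx % 6) // 2      # 0, 1 -> x; 2, 3 -> y; 4, 5 -> z
        use_max = idx % 2 == 1     # odd indices are the negative-axis faces
        pos_a = centroids[first][axis]
        pos_b = centroids[second][axis]
        # Same ordering rule as the C++ code: swap so `second` is the occluded one
        if (use_max and pos_a < pos_b) or (not use_max and pos_a > pos_b):
            first, second = second, first
        occluded.add(second)
    for tri in sorted(occluded):
        assign[tri] = min(max(assign[tri] + 6, 0), 12)
    return assign


centroids = [(0.2, 0.0, 0.0), (0.8, 0.0, 0.0), (0.2, 0.0, 0.0), (0.8, 0.0, 0.0)]
result = reassign_occluded([(0, 1), (2, 3)], [0, 0, 1, 1], centroids)
```

In this example triangles 0 and 1 share atlas index 0 and triangles 2 and 3 share index 1, so `result` becomes `[0, 6, 7, 1]`: one triangle of each pair is pushed to the second layer of atlas indices.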

uv_unwrapper/uv_unwrapper/unwrap.py ADDED

	@@ -0,0 +1,669 @@

+import math
+from typing import Tuple
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch import Tensor
+class Unwrapper(nn.Module):
+    def __init__(self):
+        super().__init__()
+    def _box_assign_vertex_to_cube_face(
+        self,
+        vertex_positions: Tensor,
+        vertex_normals: Tensor,
+        triangle_idxs: Tensor,
+        bbox: Tensor,
+    ) -> Tuple[Tensor, Tensor]:
+        """
+        Assigns each triangle to a cube face based on its averaged face normal and projects its vertices to UV coordinates
+        Args:
+            vertex_positions (Tensor, Nv 3, float): Vertex positions
+            vertex_normals (Tensor, Nv 3, float): Vertex normals
+            triangle_idxs (Tensor, Nf 3, int): Triangle indices
+            bbox (Tensor, 2 3, float): Bounding box of the mesh
+        Returns:
+            Tensor, Nf 3 2, float: UV coordinates
+            Tensor, Nf, int: Cube face indices
+        """
+        # Experiment (disabled): normalize with one uniform scale instead of per-axis scaling
+        # bbox_min = bbox[:1].mean(-1, keepdim=True)
+        # bbox_max = bbox[1:].mean(-1, keepdim=True)
+        # v_pos_normalized = (vertex_positions - bbox_min) / (bbox_max - bbox_min)
+        # Create a [0, 1] normalized vertex position
+        v_pos_normalized = (vertex_positions - bbox[:1]) / (bbox[1:] - bbox[:1])
+        # And to [-1, 1]
+        v_pos_normalized = 2.0 * v_pos_normalized - 1.0
+        # Get all vertex positions for each triangle
+        # Open question: how should a triangle be assigned to a face? Mean face position? Max vertex position?
+        v0 = v_pos_normalized[triangle_idxs[:, 0]]
+        v1 = v_pos_normalized[triangle_idxs[:, 1]]
+        v2 = v_pos_normalized[triangle_idxs[:, 2]]
+        tri_stack = torch.stack([v0, v1, v2], dim=1)
+        vn0 = vertex_normals[triangle_idxs[:, 0]]
+        vn1 = vertex_normals[triangle_idxs[:, 1]]
+        vn2 = vertex_normals[triangle_idxs[:, 2]]
+        tri_stack_nrm = torch.stack([vn0, vn1, vn2], dim=1)
+        # Just average the normals per face
+        face_normal = F.normalize(torch.sum(tri_stack_nrm, 1), eps=1e-6, dim=-1)
+        # Now decide based on the face normal in which box map we project
+        # abs_x, abs_y, abs_z = tri_stack_nrm.abs().unbind(-1)
+        abs_x, abs_y, abs_z = tri_stack.abs().unbind(-1)
+        axis = torch.tensor(
+            [
+                [1, 0, 0],  # 0
+                [-1, 0, 0],  # 1
+                [0, 1, 0],  # 2
+                [0, -1, 0],  # 3
+                [0, 0, 1],  # 4
+                [0, 0, -1],  # 5
+            ],
+            device=face_normal.device,
+            dtype=face_normal.dtype,
+        )
+        face_normal_axis = (face_normal[:, None] * axis[None]).sum(-1)
+        index = face_normal_axis.argmax(-1)
+        max_axis, uc, vc = (
+            torch.ones_like(abs_x),
+            torch.zeros_like(tri_stack[..., :1]),
+            torch.zeros_like(tri_stack[..., :1]),
+        )
+        mask_pos_x = index == 0
+        max_axis[mask_pos_x] = abs_x[mask_pos_x]
+        uc[mask_pos_x] = tri_stack[mask_pos_x][..., 1:2]
+        vc[mask_pos_x] = -tri_stack[mask_pos_x][..., -1:]
+        mask_neg_x = index == 1
+        max_axis[mask_neg_x] = abs_x[mask_neg_x]
+        uc[mask_neg_x] = tri_stack[mask_neg_x][..., 1:2]
+        vc[mask_neg_x] = -tri_stack[mask_neg_x][..., -1:]
+        mask_pos_y = index == 2
+        max_axis[mask_pos_y] = abs_y[mask_pos_y]
+        uc[mask_pos_y] = tri_stack[mask_pos_y][..., 0:1]
+        vc[mask_pos_y] = -tri_stack[mask_pos_y][..., -1:]
+        mask_neg_y = index == 3
+        max_axis[mask_neg_y] = abs_y[mask_neg_y]
+        uc[mask_neg_y] = tri_stack[mask_neg_y][..., 0:1]
+        vc[mask_neg_y] = -tri_stack[mask_neg_y][..., -1:]
+        mask_pos_z = index == 4
+        max_axis[mask_pos_z] = abs_z[mask_pos_z]
+        uc[mask_pos_z] = tri_stack[mask_pos_z][..., 0:1]
+        vc[mask_pos_z] = tri_stack[mask_pos_z][..., 1:2]
+        mask_neg_z = index == 5
+        max_axis[mask_neg_z] = abs_z[mask_neg_z]
+        uc[mask_neg_z] = tri_stack[mask_neg_z][..., 0:1]
+        vc[mask_neg_z] = -tri_stack[mask_neg_z][..., 1:2]
+        # Map uc and vc from [-1, 1] to [0, 1]
+        max_dim_div = max_axis.max(dim=0, keepdim=True).values
+        uc = ((uc[..., 0] / max_dim_div + 1.0) * 0.5).clip(0, 1)
+        vc = ((vc[..., 0] / max_dim_div + 1.0) * 0.5).clip(0, 1)
+        uv = torch.stack([uc, vc], dim=-1)
+        return uv, index
+    def _assign_faces_uv_to_atlas_index(
+        self,
+        vertex_positions: Tensor,
+        triangle_idxs: Tensor,
+        face_uv: Tensor,
+        face_index: Tensor,
+    ) -> Tensor:  # noqa: F821
+        """
+        Assigns the face UV to the atlas index
+        Args:
+            vertex_positions (Float[Tensor, "Nv 3"]): Vertex positions
+            triangle_idxs (Integer[Tensor, "Nf 3"]): Triangle indices
+            face_uv (Float[Tensor, "Nf 3 2"]): Face UV coordinates
+            face_index (Integer[Tensor, "Nf"]): Face indices
+        Returns:
+            Integer[Tensor, "Nf"]: Atlas index
+        """
+        return torch.ops.UVUnwrapper.assign_faces_uv_to_atlas_index(
+            vertex_positions.cpu(),
+            triangle_idxs.cpu(),
+            face_uv.view(-1, 2).cpu(),
+            face_index.cpu(),
+        ).to(vertex_positions.device)
+    def _find_slice_offset_and_scale(
+        self, index: Tensor
+    ) -> Tuple[Tensor, Tensor, Tensor, Tensor]:  # noqa: F821
+        """
+        Find the slice offset and scale
+        Args:
+            index (Integer[Tensor, "Nf"]): Atlas index
+        Returns:
+            Float[Tensor, "Nf"]: Offset x
+            Float[Tensor, "Nf"]: Offset y
+            Float[Tensor, "Nf"]: Division x
+            Float[Tensor, "Nf"]: Division y
+        """
+        # 6 due to the 6 cube faces
+        off = 1 / 3
+        dupl_off = 1 / 6
+        # Here, we need to decide how to pack the textures in the case of overlap
+        def x_offset_calc(x, i):
+            offset_calc = i // 6
+            # Initial coordinates - just 3x2 grid
+            if offset_calc == 0:
+                return off * x
+            else:
+                # Smaller 3x2 grid plus eventual shift to right for
+                # second overlap
+                return dupl_off * x + min(offset_calc - 1, 1) * 0.5
+        def y_offset_calc(x, i):
+            offset_calc = i // 6
+            # Initial coordinates - just a 3x2 grid
+            if offset_calc == 0:
+                return off * x
+            else:
+                # Smaller coordinates in the lowest row
+                return dupl_off * x + off * 2
+        offset_x = torch.zeros_like(index, dtype=torch.float32)
+        offset_y = torch.zeros_like(index, dtype=torch.float32)
+        offset_x_vals = [0, 1, 2, 0, 1, 2]
+        offset_y_vals = [0, 0, 0, 1, 1, 1]
+        for i in range(index.max().item() + 1):
+            mask = index == i
+            if not mask.any():
+                continue
+            offset_x[mask] = x_offset_calc(offset_x_vals[i % 6], i)
+            offset_y[mask] = y_offset_calc(offset_y_vals[i % 6], i)
+        div_x = torch.full_like(index, 6 // 2, dtype=torch.float32)
+        # All overlap elements are saved in half scale
+        div_x[index >= 6] = 6
+        div_y = div_x.clone()  # Same for y
+        # Except for the random overlaps
+        div_x[index >= 12] = 2
+        # But the random overlaps are saved in a large block in the lower thirds
+        div_y[index >= 12] = 3
+        return offset_x, offset_y, div_x, div_y
+    def _calculate_tangents(
+        self,
+        vertex_positions: Tensor,
+        vertex_normals: Tensor,
+        triangle_idxs: Tensor,
+        face_uv: Tensor,
+    ) -> Tensor:
+        """
+        Calculate per-vertex tangents by averaging the tangents of adjacent triangles
+        Args:
+            vertex_positions (Float[Tensor, "Nv 3"]): Vertex positions
+            vertex_normals (Float[Tensor, "Nv 3"]): Vertex normals
+            triangle_idxs (Integer[Tensor, "Nf 3"]): Triangle indices
+            face_uv (Float[Tensor, "Nf 3 2"]): Face UV coordinates
+        Returns:
+            Float[Tensor, "Nv 3"]: Per-vertex tangents
+        """
+        vn_idx = [None] * 3
+        pos = [None] * 3
+        tex = face_uv.unbind(1)
+        for i in range(0, 3):
+            pos[i] = vertex_positions[triangle_idxs[:, i]]
+            # t_nrm_idx is always the same as t_pos_idx
+            vn_idx[i] = triangle_idxs[:, i]
+        if torch.backends.mps.is_available():
+            tangents = torch.zeros_like(vertex_normals).contiguous()
+            tansum = torch.zeros_like(vertex_normals).contiguous()
+        else:
+            tangents = torch.zeros_like(vertex_normals)
+            tansum = torch.zeros_like(vertex_normals)
+        # Compute tangent space for each triangle
+        duv1 = tex[1] - tex[0]
+        duv2 = tex[2] - tex[0]
+        dpos1 = pos[1] - pos[0]
+        dpos2 = pos[2] - pos[0]
+        tng_nom = dpos1 * duv2[..., 1:2] - dpos2 * duv1[..., 1:2]
+        denom = duv1[..., 0:1] * duv2[..., 1:2] - duv1[..., 1:2] * duv2[..., 0:1]
+        # Avoid division by zero for degenerate texture coordinates.
+        # Note: clip(1e-6) is a lower bound, so negative determinants (mirrored UVs) also become 1e-6
+        denom_safe = denom.clip(1e-6)
+        tang = tng_nom / denom_safe
+        # Update all 3 vertices
+        for i in range(0, 3):
+            idx = vn_idx[i][:, None].repeat(1, 3)
+            tangents.scatter_add_(0, idx, tang)  # tangents[n_i] = tangents[n_i] + tang
+            tansum.scatter_add_(
+                0, idx, torch.ones_like(tang)
+            )  # tansum[n_i] = tansum[n_i] + 1
+        # Also normalize it. Here we do not normalize the individual triangles first so larger area
+        # triangles influence the tangent space more
+        tangents = tangents / tansum
+        # Normalize and make sure tangent is perpendicular to normal
+        tangents = F.normalize(tangents, dim=1)
+        tangents = F.normalize(
+            tangents
+            - (tangents * vertex_normals).sum(-1, keepdim=True) * vertex_normals
+        )
+        return tangents
+    def _rotate_uv_slices_consistent_space(
+        self,
+        vertex_positions: Tensor,
+        vertex_normals: Tensor,
+        triangle_idxs: Tensor,
+        uv: Tensor,
+        index: Tensor,
+    ) -> Tensor:
+        """
+        Rotate the UV slices so they are in a consistent space
+        Args:
+            vertex_positions (Float[Tensor, "Nv 3"]): Vertex positions
+            vertex_normals (Float[Tensor, "Nv 3"]): Vertex normals
+            triangle_idxs (Integer[Tensor, "Nf 3"]): Triangle indices
+            uv (Float[Tensor, "Nf 3 2"]): UV coordinates
+            index (Integer[Tensor, "Nf"]): Atlas index
+        Returns:
+            Float[Tensor, "Nf 3 2"]: Rotated UV coordinates
+        """
+        tangents = self._calculate_tangents(
+            vertex_positions, vertex_normals, triangle_idxs, uv
+        )
+        pos_stack = torch.stack(
+            [
+                -vertex_positions[..., 1],
+                vertex_positions[..., 0],
+                torch.zeros_like(vertex_positions[..., 0]),
+            ],
+            dim=-1,
+        )
+        expected_tangents = F.normalize(
+            torch.linalg.cross(
+                vertex_normals,
+                torch.linalg.cross(pos_stack, vertex_normals, dim=-1),
+                dim=-1,
+            ),
+            -1,
+        )
+        actual_tangents = tangents[triangle_idxs]
+        expected_tangents = expected_tangents[triangle_idxs]
+        def rotation_matrix_2d(theta):
+            c, s = torch.cos(theta), torch.sin(theta)
+            return torch.tensor([[c, -s], [s, c]])
+        # Now find the rotation
+        index_mod = index % 6  # Indices >= 6 are not expected here; the modulo is just for safety
+        for i in range(6):
+            mask = index_mod == i
+            if not mask.any():
+                continue
+            actual_mean_tangent = actual_tangents[mask].mean(dim=(0, 1))
+            expected_mean_tangent = expected_tangents[mask].mean(dim=(0, 1))
+            dot_product = torch.dot(actual_mean_tangent, expected_mean_tangent)
+            cross_product = (
+                actual_mean_tangent[0] * expected_mean_tangent[1]
+                - actual_mean_tangent[1] * expected_mean_tangent[0]
+            )
+            angle = torch.atan2(cross_product, dot_product)
+            rot_matrix = rotation_matrix_2d(angle).to(mask.device)
+            # Center the uv coordinate to be in the range of -1 to 1 and 0 centered
+            uv_cur = uv[mask] * 2 - 1  # Center it first
+            # Rotate it
+            uv[mask] = torch.einsum("ij,nfj->nfi", rot_matrix, uv_cur)
+            # Rescale uv[mask] to be within the 0-1 range
+            uv[mask] = (uv[mask] - uv[mask].min()) / (uv[mask].max() - uv[mask].min())
+        return uv
+    def _handle_slice_uvs(
+        self,
+        uv: Tensor,
+        index: Tensor,  # noqa: F821
+        island_padding: float,
+        max_index: int = 6 * 2,
+    ) -> Tensor:  # noqa: F821
+        """
+        Handle the slice UVs
+        Args:
+            uv (Float[Tensor, "Nf 3 2"]): UV coordinates
+            index (Integer[Tensor, "Nf"]): Atlas index
+            island_padding (float): Island padding
+            max_index (int): Maximum index
+        Returns:
+            Float[Tensor, "Nf 3 2"]: Updated UV coordinates
+        """
+        uc, vc = uv.unbind(-1)
+        # Get the second slice (The first overlap)
+        index_filter = [index == i for i in range(6, max_index)]
+        # Normalize them to always fully fill the atlas patch
+        for i, fi in enumerate(index_filter):
+            if fi.sum() > 0:
+                # Scale the slice, but only up to a factor of 2.
+                # This keeps the texture resolution in line with the first slice (half the UV space)
+                uc[fi] = (uc[fi] - uc[fi].min()) / (uc[fi].max() - uc[fi].min()).clip(
+                    0.5
+                )
+                vc[fi] = (vc[fi] - vc[fi].min()) / (vc[fi].max() - vc[fi].min()).clip(
+                    0.5
+                )
+        uc_padded = (uc * (1 - 2 * island_padding) + island_padding).clip(0, 1)
+        vc_padded = (vc * (1 - 2 * island_padding) + island_padding).clip(0, 1)
+        return torch.stack([uc_padded, vc_padded], dim=-1)
+    def _handle_remaining_uvs(
+        self,
+        uv: Tensor,
+        index: Tensor,  # noqa: F821
+        island_padding: float,
+    ) -> Tensor:
+        """
+        Handle the remaining UVs (The ones that are not slices)
+        Args:
+            uv (Float[Tensor, "Nf 3 2"]): UV coordinates
+            index (Integer[Tensor, "Nf"]): Atlas index
+            island_padding (float): Island padding
+        Returns:
+            Float[Tensor, "Nf 3 2"]: Updated UV coordinates
+        """
+        uc, vc = uv.unbind(-1)
+        # Get all remaining elements
+        remaining_filter = index >= 6 * 2
+        squares_left = remaining_filter.sum()
+        if squares_left == 0:
+            return uv
+        uc = uc[remaining_filter]
+        vc = vc[remaining_filter]
+        # Our remaining triangles are distributed in a rectangle
+        # The rectangle takes 0.5 of the entire uv space in width and 1/3 in height
+        ratio = 0.5 * (1 / 3)  # 1/6 of the UV space
+        # e.g. sqrt(744 / (0.5 * (1 / 3))) for 744 remaining triangles
+        mult = math.sqrt(squares_left / ratio)
+        num_square_width = int(math.ceil(0.5 * mult))
+        num_square_height = int(math.ceil(squares_left / num_square_width))
+        width = 1 / num_square_width
+        height = 1 / num_square_height
+        # The idea is again to keep the texture resolution consistent with the first slice
+        # This only occupies half the region in the texture chart but the scaling on the squares
+        # assumes full coverage.
+        clip_val = min(width, height) * 1.5
+        # Now normalize the UVs, taking the maximum scaling into account
+        uc = (uc - uc.min(dim=1, keepdim=True).values) / (
+            uc.amax(dim=1, keepdim=True) - uc.amin(dim=1, keepdim=True)
+        ).clip(clip_val)
+        vc = (vc - vc.min(dim=1, keepdim=True).values) / (
+            vc.amax(dim=1, keepdim=True) - vc.amin(dim=1, keepdim=True)
+        ).clip(clip_val)
+        # Add a small padding
+        uc = (
+            uc * (1 - island_padding * num_square_width * 0.5)
+            + island_padding * num_square_width * 0.25
+        ).clip(0, 1)
+        vc = (
+            vc * (1 - island_padding * num_square_height * 0.5)
+            + island_padding * num_square_height * 0.25
+        ).clip(0, 1)
+        uc = uc * width
+        vc = vc * height
+        # And calculate offsets for each element
+        idx = torch.arange(uc.shape[0], device=uc.device, dtype=torch.int32)
+        x_idx = idx % num_square_width
+        y_idx = idx // num_square_width
+        # And move each triangle to its own spot
+        uc = uc + x_idx[:, None] * width
+        vc = vc + y_idx[:, None] * height
+        uc = (uc * (1 - 2 * island_padding * 0.5) + island_padding * 0.5).clip(0, 1)
+        vc = (vc * (1 - 2 * island_padding * 0.5) + island_padding * 0.5).clip(0, 1)
+        uv[remaining_filter] = torch.stack([uc, vc], dim=-1)
+        return uv
+    def _distribute_individual_uvs_in_atlas(
+        self,
+        face_uv: Tensor,
+        assigned_faces: Tensor,
+        offset_x: Tensor,
+        offset_y: Tensor,
+        div_x: Tensor,
+        div_y: Tensor,
+        island_padding: float,
+    ) -> Tensor:
+        """
+        Distribute the individual UVs in the atlas
+        Args:
+            face_uv (Float[Tensor, "Nf 3 2"]): Face UV coordinates
+            assigned_faces (Integer[Tensor, "Nf"]): Assigned faces
+            offset_x (Float[Tensor, "Nf"]): Offset x
+            offset_y (Float[Tensor, "Nf"]): Offset y
+            div_x (Float[Tensor, "Nf"]): Division x
+            div_y (Float[Tensor, "Nf"]): Division y
+            island_padding (float): Island padding
+        Returns:
+            Float[Tensor, "Nf*3 2"]: Flattened per-corner UV coordinates in the atlas
+        """
+        # Place the slice first
+        placed_uv = self._handle_slice_uvs(face_uv, assigned_faces, island_padding)
+        # Then handle the remaining overlap elements
+        placed_uv = self._handle_remaining_uvs(
+            placed_uv, assigned_faces, island_padding
+        )
+        uc, vc = placed_uv.unbind(-1)
+        uc = uc / div_x[:, None] + offset_x[:, None]
+        vc = vc / div_y[:, None] + offset_y[:, None]
+        uv = torch.stack([uc, vc], dim=-1).view(-1, 2)
+        return uv
+    def _get_unique_face_uv(
+        self,
+        uv: Tensor,
+    ) -> Tuple[Tensor, Tensor]:
+        """
+        Get the unique face UV
+        Args:
+            uv (Float[Tensor, "Nf*3 2"]): Flattened per-corner UV coordinates
+        Returns:
+            Float[Tensor, "Utex 2"]: Unique UV coordinates
+            Integer[Tensor, "Nf 3"]: Per-corner index into the unique UV coordinates
+        """
+        unique_uv, unique_idx = torch.unique(uv, return_inverse=True, dim=0)
+        # And add the face to uv index mapping
+        vtex_idx = unique_idx.view(-1, 3)
+        return unique_uv, vtex_idx
+    def _align_mesh_with_main_axis(
+        self, vertex_positions: Tensor, vertex_normals: Tensor
+    ) -> Tuple[Tensor, Tensor]:
+        """
+        Align the mesh with the main axis
+        Args:
+            vertex_positions (Float[Tensor, "Nv 3"]): Vertex positions
+            vertex_normals (Float[Tensor, "Nv 3"]): Vertex normals
+        Returns:
+            Float[Tensor, "Nv 3"]: Rotated vertex positions
+            Float[Tensor, "Nv 3"]: Rotated vertex normals
+        """
+        # Use PCA to find the two main axes (the third is derived by cross product)
+        # Set the random seed so it's repeatable
+        torch.manual_seed(0)
+        _, _, v = torch.pca_lowrank(vertex_positions, q=2)
+        main_axis, seconday_axis = v[:, 0], v[:, 1]
+        main_axis = F.normalize(main_axis, eps=1e-6, dim=-1)  # 3,
+        # Orthogonalize the second axis
+        seconday_axis = F.normalize(
+            seconday_axis
+            - (seconday_axis * main_axis).sum(-1, keepdim=True) * main_axis,
+            eps=1e-6,
+            dim=-1,
+        )  # 3,
+        # Create perpendicular third axis
+        third_axis = F.normalize(
+            torch.cross(main_axis, seconday_axis, dim=-1), dim=-1, eps=1e-6
+        )  # 3,
+        # Check to which canonical axis each aligns
+        main_axis_max_idx = main_axis.abs().argmax().item()
+        seconday_axis_max_idx = seconday_axis.abs().argmax().item()
+        third_axis_max_idx = third_axis.abs().argmax().item()
+        # Now sort the axes based on the argmax so they align with the canonical axes
+        # If two axes have the same argmax move one of them
+        all_possible_axis = {0, 1, 2}
+        cur_index = 1
+        while (
+            len(set([main_axis_max_idx, seconday_axis_max_idx, third_axis_max_idx]))
+            != 3
+        ):
+            # Find missing axis
+            missing_axis = all_possible_axis - set(
+                [main_axis_max_idx, seconday_axis_max_idx, third_axis_max_idx]
+            )
+            missing_axis = missing_axis.pop()
+            # Just assign it to third axis as it had the smallest contribution to the
+            # overall shape
+            if cur_index == 1:
+                third_axis_max_idx = missing_axis
+            elif cur_index == 2:
+                seconday_axis_max_idx = missing_axis
+            else:
+                raise ValueError("Could not find 3 unique axis")
+            cur_index += 1
+        if len({main_axis_max_idx, seconday_axis_max_idx, third_axis_max_idx}) != 3:
+            raise ValueError("Could not find 3 unique axis")
+        axes = [None] * 3
+        axes[main_axis_max_idx] = main_axis
+        axes[seconday_axis_max_idx] = seconday_axis
+        axes[third_axis_max_idx] = third_axis
+        # Create rotation matrix from the individual axes
+        rot_mat = torch.stack(axes, dim=1).T
+        # Now rotate the vertex positions and vertex normals so the mesh aligns with the main axis
+        vertex_positions = torch.einsum("ij,nj->ni", rot_mat, vertex_positions)
+        vertex_normals = torch.einsum("ij,nj->ni", rot_mat, vertex_normals)
+        return vertex_positions, vertex_normals
+    def forward(
+        self,
+        vertex_positions: Tensor,
+        vertex_normals: Tensor,
+        triangle_idxs: Tensor,
+        island_padding: float,
+    ) -> Tuple[Tensor, Tensor]:
+        """
+        Unwrap the mesh
+        Args:
+            vertex_positions (Float[Tensor, "Nv 3"]): Vertex positions
+            vertex_normals (Float[Tensor, "Nv 3"]): Vertex normals
+            triangle_idxs (Integer[Tensor, "Nf 3"]): Triangle indices
+            island_padding (float): Island padding
+        Returns:
+            Float[Tensor, "Utex 2"]: Unique UV coordinates
+            Integer[Tensor, "Nf 3"]: Per-corner index into the unique UV coordinates
+        """
+        vertex_positions, vertex_normals = self._align_mesh_with_main_axis(
+            vertex_positions, vertex_normals
+        )
+        bbox = torch.stack(
+            [vertex_positions.min(dim=0).values, vertex_positions.max(dim=0).values],
+            dim=0,
+        )  # 2, 3
+        face_uv, face_index = self._box_assign_vertex_to_cube_face(
+            vertex_positions, vertex_normals, triangle_idxs, bbox
+        )
+        face_uv = self._rotate_uv_slices_consistent_space(
+            vertex_positions, vertex_normals, triangle_idxs, face_uv, face_index
+        )
+        assigned_atlas_index = self._assign_faces_uv_to_atlas_index(
+            vertex_positions, triangle_idxs, face_uv, face_index
+        )
+        offset_x, offset_y, div_x, div_y = self._find_slice_offset_and_scale(
+            assigned_atlas_index
+        )
+        placed_uv = self._distribute_individual_uvs_in_atlas(
+            face_uv,
+            assigned_atlas_index,
+            offset_x,
+            offset_y,
+            div_x,
+            div_y,
+            island_padding,
+        )
+        return self._get_unique_face_uv(placed_uv)
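
For orientation, the atlas layout implied by `_find_slice_offset_and_scale` can be sketched without PyTorch. The helper below, `atlas_tile`, is a hypothetical name introduced only for illustration and is not part of the module. It mirrors the offset and divisor arithmetic of that method, assuming each tile's extent is the reciprocal of its divisor, and returns the tile rectangle for a single atlas index.

```python
from typing import Tuple

# Column and row of each cube face inside a 3x2 grid (same tables as the method above)
COLS = [0, 1, 2, 0, 1, 2]
ROWS = [0, 0, 0, 1, 1, 1]


def atlas_tile(i: int) -> Tuple[float, float, float, float]:
    """Return (x0, y0, width, height) of the UV tile for atlas index i.

    Hypothetical helper: a scalar re-statement of the offset/divisor logic,
    not the module's own API.
    """
    col, row = COLS[i % 6], ROWS[i % 6]
    level = i // 6
    if level == 0:
        # Primary cube faces: 3x2 grid of tiles, each 1/3 x 1/3
        return col / 3, row / 3, 1 / 3, 1 / 3
    # Overlap tiles start at v = 2/3; the x shift saturates at 0.5
    x0 = col / 6 + min(level - 1, 1) * 0.5
    y0 = row / 6 + 2 / 3
    if level == 1:
        # First overlap: smaller 3x2 grid of tiles, each 1/6 x 1/6
        return x0, y0, 1 / 6, 1 / 6
    # Remaining triangles: one block with divisors (2, 3)
    return x0, y0, 1 / 2, 1 / 3


if __name__ == "__main__":
    for idx in (0, 5, 6, 11, 12):
        print(idx, atlas_tile(idx))
```

Under these assumptions, indices 0 to 5 cover the first two thirds of the v range, indices 6 to 11 cover the left half of the last third, and index 12 covers the right half of the last third.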