InstaDeepAI/instanovoplus-v1.1.0

@@ -1,205 +1,10 @@
 ---
-license: cc-by-nc-sa-4.0
-library_name: pytorch
 tags:
-- proteomics
-- mass-spectrometry
-- peptide-sequencing
-- de-novo-sequencing
-- diffusion
-- multinomial-diffusion
-- biology
-- computational-biology
-pipeline_tag: text-generation
-datasets:
-- InstaDeepAI/ms_ninespecies_benchmark
 ---
-# InstaNovoPlus: Diffusion-Powered De novo Peptide Sequencing Model
-## Model Description
-InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. This model leverages multinomial diffusion for accurate, database-free peptide identification for large-scale proteomics experiments.
-## Usage
-```python
-import torch
-import numpy as np
-import pandas as pd
-from instanovo.diffusion.multinomial_diffusion import InstaNovoPlus
-from instanovo.utils import SpectrumDataFrame
-from instanovo.transformer.dataset import SpectrumDataset, collate_batch
-from torch.utils.data import DataLoader
-from instanovo.inference import ScoredSequence
-from instanovo.inference.diffusion import DiffusionDecoder
-from instanovo.utils.metrics import Metrics
-from tqdm.notebook import tqdm
-# Load the model from the Hugging Face Hub
-model, config = InstaNovoPlus.from_pretrained("InstaDeepAI/instanovoplus-v1.1.0")
-# Move the model to the GPU if available
-device = "cuda" if torch.cuda.is_available() else "cpu"
-model = model.to(device).eval()
-# Update the residue set with custom modifications
-model.residue_set.update_remapping(
-    {
-        "M(ox)": "M[UNIMOD:35]",
-        "M(+15.99)": "M[UNIMOD:35]",
-        "S(p)": "S[UNIMOD:21]",  # Phosphorylation
-        "T(p)": "T[UNIMOD:21]",
-        "Y(p)": "Y[UNIMOD:21]",
-        "S(+79.97)": "S[UNIMOD:21]",
-        "T(+79.97)": "T[UNIMOD:21]",
-        "Y(+79.97)": "Y[UNIMOD:21]",
-        "Q(+0.98)": "Q[UNIMOD:7]",  # Deamidation
-        "N(+0.98)": "N[UNIMOD:7]",
-        "Q(+.98)": "Q[UNIMOD:7]",
-        "N(+.98)": "N[UNIMOD:7]",
-        "C(+57.02)": "C[UNIMOD:4]",  # Carboxyamidomethylation
-        "(+42.01)": "[UNIMOD:1]",  # Acetylation
-        "(+43.01)": "[UNIMOD:5]",  # Carbamylation
-        "(-17.03)": "[UNIMOD:385]",
-    }
-)
-# Load the test data
-sdf = SpectrumDataFrame.from_huggingface(
-    "InstaDeepAI/ms_ninespecies_benchmark",
-    is_annotated=True,
-    shuffle=False,
-    split="test[:10%]",  # Let's only use a subset of the test data for faster inference
-)
-# Create the dataset
-ds = SpectrumDataset(
-    sdf,
-    model.residue_set,
-    config.get("n_peaks", 200),
-    return_str=False,
-    annotated=True,
-    peptide_pad_length=model.config.get("max_length", 30),
-    reverse_peptide=False,  # we do not reverse peptide for diffusion
-    add_eos=False,
-    tokenize_peptide=True,
-)
-# Create the data loader
-dl = DataLoader(
-    ds,
-    batch_size=64,
-    num_workers=0,  # sdf requirement, handled internally
-    shuffle=False,  # sdf requirement, handled internally
-    collate_fn=collate_batch,
-)
-# Create the decoder
-diffusion_decoder = DiffusionDecoder(model=model)
-predictions = []
-log_probs = []
-# Iterate over the data loader
-for batch in tqdm(dl, total=len(dl)):
-    spectra, precursors, spectra_padding_mask, peptides, _ = batch
-    spectra = spectra.to(device)
-    precursors = precursors.to(device)
-    spectra_padding_mask = spectra_padding_mask.to(device)
-    peptides = peptides.to(device)
-    # Perform inference
-    with torch.no_grad():
-        batch_predictions, batch_log_probs = diffusion_decoder.decode(
-            spectra=spectra,
-            spectra_padding_mask=spectra_padding_mask,
-            precursors=precursors,
-            initial_sequence=peptides,
-        )
-    predictions.extend(batch_predictions)
-    log_probs.extend(batch_log_probs)
-# Initialize metrics
-metrics = Metrics(model.residue_set, config["isotope_error_range"])
-# Compute precision and recall
-aa_precision, aa_recall, peptide_recall, peptide_precision = metrics.compute_precision_recall(
-    peptides, preds
-)
-# Compute amino acid error rate and AUC
-aa_error_rate = metrics.compute_aa_er(targs, preds)
-auc = metrics.calc_auc(targs, preds, np.exp(pd.Series(probs)))
-print(f"amino acid error rate:    {aa_error_rate:.5f}")
-print(f"amino acid precision:     {aa_precision:.5f}")
-print(f"amino acid recall:        {aa_recall:.5f}")
-print(f"peptide precision:        {peptide_precision:.5f}")
-print(f"peptide recall:           {peptide_recall:.5f}")
-print(f"area under the PR curve:  {auc:.5f}")
-```
-For more explanation, see the [Getting Started notebook](https://github.com/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb) in the repository.
-## Citation
-If you use InstaNovoPlus in your research, please cite:
-```bibtex
-@article{eloff_kalogeropoulos_2025_instanovo,
-        title        = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
-                        proteomics experiments},
-        author       = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
-                        Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
-                        Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
-                        and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
-                        Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
-                        Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
-        year         = {2025},
-        month        = {Mar},
-        day          = {31},
-        journal      = {Nature Machine Intelligence},
-        doi          = {10.1038/s42256-025-01019-5},
-        issn         = {2522-5839},
-        url          = {https://doi.org/10.1038/s42256-025-01019-5}
-}
-```
-## Resources
-- **Code Repository**: [https://github.com/instadeepai/InstaNovo](https://github.com/instadeepai/InstaNovo)
-- **Documentation**: [https://instadeepai.github.io/InstaNovo/](https://instadeepai.github.io/InstaNovo/)
-- **Publication**: [https://www.nature.com/articles/s42256-025-01019-5](https://www.nature.com/articles/s42256-025-01019-5)
-## License
-- **Code**: Licensed under Apache License 2.0
-- **Model Checkpoints**: Licensed under Creative Commons Non-Commercial (CC BY-NC-SA 4.0)
-## Installation
-```bash
-pip install instanovo
-```
-For GPU support, install with CUDA dependencies:
-```bash
-pip install instanovo[cu126]
-```
-## Requirements
-- Python >= 3.10, < 3.13
-- PyTorch >= 1.13.0
-- CUDA (optional, for GPU acceleration)
-## Support
-For questions, issues, or contributions, please visit the [GitHub repository](https://github.com/instadeepai/InstaNovo) or check the [documentation](https://instadeepai.github.io/InstaNovo/).

 ---
 tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Code: [More Information Needed]
+- Paper: [More Information Needed]
+- Docs: [More Information Needed]

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a95bb269a974fbb70d914108c40d6389dc3fbe6a06d8d8f325f1f0f27ea9a88f
 size 690402444

 version https://git-lfs.github.com/spec/v1
+oid sha256:00cb39e8f7715fdfe3ab90a133ac2eb69082784c39d29f5f23a980e3967d4302
 size 690402444
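The `model.safetensors` entry is a Git LFS pointer rather than the weights themselves. In the Git LFS v1 pointer layout, `oid` is the SHA-256 of the file's bytes and `size` is its length in bytes. This change therefore records a weights file with different content but the same length (690402444 bytes). The sketch below assumes only that pointer layout. It rebuilds a pointer from a local file and compares the two pointers shown above. `lfs_pointer` and `parse_pointer` are hypothetical helpers, not part of git-lfs or `huggingface_hub`.

```python
import hashlib
import tempfile
from pathlib import Path

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(path: Path, chunk_size: int = 1 << 20) -> str:
    """Build pointer text for a file: SHA-256 of its bytes plus its byte size."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        # Stream in chunks so multi-hundred-MB checkpoints never sit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"version {LFS_SPEC}\noid sha256:{digest.hexdigest()}\nsize {size}\n"


def parse_pointer(text: str) -> dict:
    """Split the 'key value' lines of a pointer into a dict."""
    return dict(line.split(" ", 1) for line in text.splitlines() if line)


# The old and new pointers from this commit.
old = parse_pointer(
    f"version {LFS_SPEC}\n"
    "oid sha256:a95bb269a974fbb70d914108c40d6389dc3fbe6a06d8d8f325f1f0f27ea9a88f\n"
    "size 690402444\n"
)
new = parse_pointer(
    f"version {LFS_SPEC}\n"
    "oid sha256:00cb39e8f7715fdfe3ab90a133ac2eb69082784c39d29f5f23a980e3967d4302\n"
    "size 690402444\n"
)
print(old["size"] == new["size"], old["oid"] == new["oid"])  # True False

# Same length, different bytes: equal size, different oid.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.bin")
    b = Path(tmp, "b.bin")
    a.write_bytes(b"\x00" * 16)
    b.write_bytes(b"\x01" * 16)
    pointer_a = parse_pointer(lfs_pointer(a))
    pointer_b = parse_pointer(lfs_pointer(b))
```

Comparing pointers is a cheap way to tell that a checkpoint changed without downloading roughly 690 MB of weights. To verify a downloaded file, recompute its pointer and match the `oid`.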