BioGeek committed on
Commit c00adce · verified · 1 Parent(s): 594dd94

Add instanovoplus-v1.1.0 model

Files changed (2)
  1. README.md +6 -201
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,205 +1,10 @@
  ---
- license: cc-by-nc-sa-4.0
- library_name: pytorch
  tags:
- - proteomics
- - mass-spectrometry
- - peptide-sequencing
- - de-novo-sequencing
- - diffusion
- - multinomial-diffusion
- - biology
- - computational-biology
- pipeline_tag: text-generation
- datasets:
- - InstaDeepAI/ms_ninespecies_benchmark
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
  ---
 
- # InstaNovoPlus: Diffusion-Powered De novo Peptide Sequencing Model
-
-
-
- ## Model Description
-
- InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. This model leverages multinomial diffusion for accurate, database-free peptide identification for large-scale proteomics experiments.
-
-
- ## Usage
-
- ```python
- import torch
- import numpy as np
- import pandas as pd
- from instanovo.diffusion.multinomial_diffusion import InstaNovoPlus
- from instanovo.utils import SpectrumDataFrame
- from instanovo.transformer.dataset import SpectrumDataset, collate_batch
- from torch.utils.data import DataLoader
- from instanovo.inference import ScoredSequence
- from instanovo.inference.diffusion import DiffusionDecoder
- from instanovo.utils.metrics import Metrics
- from tqdm.notebook import tqdm
-
- # Load the model from the Hugging Face Hub
- model, config = InstaNovoPlus.from_pretrained("InstaDeepAI/instanovoplus-v1.1.0")
-
- # Move the model to the GPU if available
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = model.to(device).eval()
-
- # Update the residue set with custom modifications
- model.residue_set.update_remapping(
-     {
-         "M(ox)": "M[UNIMOD:35]",
-         "M(+15.99)": "M[UNIMOD:35]",
-         "S(p)": "S[UNIMOD:21]", # Phosphorylation
-         "T(p)": "T[UNIMOD:21]",
-         "Y(p)": "Y[UNIMOD:21]",
-         "S(+79.97)": "S[UNIMOD:21]",
-         "T(+79.97)": "T[UNIMOD:21]",
-         "Y(+79.97)": "Y[UNIMOD:21]",
-         "Q(+0.98)": "Q[UNIMOD:7]", # Deamidation
-         "N(+0.98)": "N[UNIMOD:7]",
-         "Q(+.98)": "Q[UNIMOD:7]",
-         "N(+.98)": "N[UNIMOD:7]",
-         "C(+57.02)": "C[UNIMOD:4]", # Carboxyamidomethylation
-         "(+42.01)": "[UNIMOD:1]", # Acetylation
-         "(+43.01)": "[UNIMOD:5]", # Carbamylation
-         "(-17.03)": "[UNIMOD:385]",
-     }
- )
-
- # Load the test data
- sdf = SpectrumDataFrame.from_huggingface(
-     "InstaDeepAI/ms_ninespecies_benchmark",
-     is_annotated=True,
-     shuffle=False,
-     split="test[:10%]", # Let's only use a subset of the test data for faster inference
- )
-
- # Create the dataset
- ds = SpectrumDataset(
-     sdf,
-     model.residue_set,
-     config.get("n_peaks", 200),
-     return_str=False,
-     annotated=True,
-     peptide_pad_length=model.config.get("max_length", 30),
-     reverse_peptide=False, # we do not reverse peptide for diffusion
-     add_eos=False,
-     tokenize_peptide=True,
- )
-
- # Create the data loader
- dl = DataLoader(
-     ds,
-     batch_size=64,
-     num_workers=0, # sdf requirement, handled internally
-     shuffle=False, # sdf requirement, handled internally
-     collate_fn=collate_batch,
- )
-
- # Create the decoder
- diffusion_decoder = DiffusionDecoder(model=model)
-
- predictions = []
- log_probs = []
-
- # Iterate over the data loader
- for batch in tqdm(dl, total=len(dl)):
-     spectra, precursors, spectra_padding_mask, peptides, _ = batch
-     spectra = spectra.to(device)
-     precursors = precursors.to(device)
-     spectra_padding_mask = spectra_padding_mask.to(device)
-     peptides = peptides.to(device)
-
-     # Perform inference
-     with torch.no_grad():
-         batch_predictions, batch_log_probs = diffusion_decoder.decode(
-             spectra=spectra,
-             spectra_padding_mask=spectra_padding_mask,
-             precursors=precursors,
-             initial_sequence=peptides,
-         )
-     predictions.extend(batch_predictions)
-     log_probs.extend(batch_log_probs)
-
- # Initialize metrics
- metrics = Metrics(model.residue_set, config["isotope_error_range"])
-
- # Compute precision and recall
- aa_precision, aa_recall, peptide_recall, peptide_precision = metrics.compute_precision_recall(
-     peptides, preds
- )
-
- # Compute amino acid error rate and AUC
- aa_error_rate = metrics.compute_aa_er(targs, preds)
- auc = metrics.calc_auc(targs, preds, np.exp(pd.Series(probs)))
-
- print(f"amino acid error rate: {aa_error_rate:.5f}")
- print(f"amino acid precision: {aa_precision:.5f}")
- print(f"amino acid recall: {aa_recall:.5f}")
- print(f"peptide precision: {peptide_precision:.5f}")
- print(f"peptide recall: {peptide_recall:.5f}")
- print(f"area under the PR curve: {auc:.5f}")
- ```
-
- For more explanation, see the [Getting Started notebook](https://github.com/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb) in the repository.
-
-
- ## Citation
-
- If you use InstaNovoPlus in your research, please cite:
-
- ```bibtex
- @article{eloff_kalogeropoulos_2025_instanovo,
-   title = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
-            proteomics experiments},
-   author = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
-             Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
-             Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
-             and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
-             Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
-             Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
-   year = {2025},
-   month = {Mar},
-   day = {31},
-   journal = {Nature Machine Intelligence},
-   doi = {10.1038/s42256-025-01019-5},
-   issn = {2522-5839},
-   url = {https://doi.org/10.1038/s42256-025-01019-5}
- }
- ```
-
-
- ## Resources
-
- - **Code Repository**: [https://github.com/instadeepai/InstaNovo](https://github.com/instadeepai/InstaNovo)
- - **Documentation**: [https://instadeepai.github.io/InstaNovo/](https://instadeepai.github.io/InstaNovo/)
- - **Publication**: [https://www.nature.com/articles/s42256-025-01019-5](https://www.nature.com/articles/s42256-025-01019-5)
-
- ## License
-
- - **Code**: Licensed under Apache License 2.0
- - **Model Checkpoints**: Licensed under Creative Commons Non-Commercial (CC BY-NC-SA 4.0)
-
- ## Installation
-
- ```bash
- pip install instanovo
- ```
-
- For GPU support, install with CUDA dependencies:
- ```bash
- pip install instanovo[cu126]
- ```
-
- ## Requirements
-
- - Python >= 3.10, < 3.13
- - PyTorch >= 1.13.0
- - CUDA (optional, for GPU acceleration)
-
-
- ## Support
-
- For questions, issues, or contributions, please visit the [GitHub repository](https://github.com/instadeepai/InstaNovo) or check the [documentation](https://instadeepai.github.io/InstaNovo/).
+ This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+ - Code: [More Information Needed]
+ - Paper: [More Information Needed]
+ - Docs: [More Information Needed]
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a95bb269a974fbb70d914108c40d6389dc3fbe6a06d8d8f325f1f0f27ea9a88f
+ oid sha256:00cb39e8f7715fdfe3ab90a133ac2eb69082784c39d29f5f23a980e3967d4302
  size 690402444
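
The `model.safetensors` change touches only the Git LFS pointer: the `oid` differs while `size` stays at 690402444 bytes, so a size check alone cannot tell the old and new weights apart. A minimal sketch of how a locally downloaded file could be checked against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library; only the Python standard library is used.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash named by the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    # Hash in chunks so a large checkpoint is never loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents of the new model.safetensors, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:00cb39e8f7715fdfe3ab90a133ac2eb69082784c39d29f5f23a980e3967d4302
size 690402444
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then indicate whether that copy corresponds to this commit or to the parent; the path is an assumption about where the file was downloaded.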