
I am interested in running UniProt's protein description model, ProtNLM, to add some extra descriptors to a large batch of protein sequences I have.

They have a trial notebook available here.

Here is the full code of the notebook:

!python3 -m pip install -q -U tensorflow==2.8.2
!python3 -m pip install -q -U tensorflow-text==2.8.2
import tensorflow as tf
import tensorflow_text
import numpy as np
import re

import IPython.display
from absl import logging

tf.compat.v1.enable_eager_execution()

logging.set_verbosity(logging.ERROR)  # Turn down tensorflow warnings

def print_markdown(string):
  IPython.display.display(IPython.display.Markdown(string))

! mkdir -p protnlm

! wget -nc https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/savedmodel__20221011__030822_1128_bs1.bm10.eos_cpu/saved_model.pb -P protnlm -q --no-check-certificate
! mkdir -p protnlm/variables
! wget -nc https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/savedmodel__20221011__030822_1128_bs1.bm10.eos_cpu/variables/variables.index -P protnlm/variables/ -q --no-check-certificate
! wget -nc https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/savedmodel__20221011__030822_1128_bs1.bm10.eos_cpu/variables/variables.data-00000-of-00001 -P protnlm/variables/ -q --no-check-certificate

imported = tf.saved_model.load(export_dir="protnlm")
infer = imported.signatures["serving_default"]

sequence = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP AVHASLDKFLASVSTVLTSKYR" #@param {type:"string"}
sequence = sequence.replace(' ', '')

names, scores = run_inference(sequence)

for name, score, i in zip(names, scores, range(len(names))):
  print_markdown(f"Prediction number {i+1}: **{name}** with a score of **{score:.03f}**")

The one change I have made is to update the TensorFlow version pins on these lines:

!python3 -m pip install -q -U tensorflow==2.8.2
!python3 -m pip install -q -U tensorflow-text==2.8.2

to >=2.8.2, since the exact 2.8.2 version could no longer be installed:

!python3 -m pip install -q -U "tensorflow>=2.8.2"
!python3 -m pip install -q -U "tensorflow-text>=2.8.2"

Now I can't run the model at all. The third cell, which takes the sequence as input, fails because it can't find the run_inference() function:

NameError                                 Traceback (most recent call last)

<ipython-input-5-4a7325a0e004> in <cell line: 0>()
      8 sequence = sequence.replace(' ', '')
      9 
---> 10 names, scores = run_inference(sequence)
     11 
     12 for name, score, i in zip(names, scores, range(len(names))):

NameError: name 'run_inference' is not defined

I didn't see this function defined anywhere in the notebook, so I assumed it was internal to TensorFlow (perhaps only in version 2.8.2, though I couldn't find anything when searching the docs) or otherwise loaded along with the model.

How can I get this script running again?

  • Don't expect that we will visit external links. You should put all the details in the question itself (not in comments), as text (not images). Commented Mar 5 at 3:47
  • I don't see def run_inference(..): ... or import run_inference anywhere, so of course it can't run. You could search for run_inference on Google; maybe it only needs some from ... import run_inference. Commented Mar 5 at 3:49
  • Searching run_inference tensorflow on Google, I found python - TensorFlow Inference - Stack Overflow, which has some def run_inference() code in the question and answers. In that question someone wrote the function from scratch; it is not part of any module. Commented Mar 5 at 3:53
  • That is... remarkable design from the Colab notebook folks. Thank you for being more thorough than me; I can't believe I overlooked that "Show code" section so many times. Commented Mar 6 at 15:39
  • It seems someone else missed this button too and filed an issue :) I think I also missed it when I was checking your link to the notebook. I had never seen a notebook with hidden cells, so I didn't expect it. Commented Mar 7 at 5:19

1 Answer


I checked the notebook, and there is a Show code button below "2. Load the model". Clicking it reveals the hidden cell that defines run_inference(seq):

#@markdown Please execute this cell by pressing the _Play_ button.

def query(seq):
  return f"[protein_name_in_english] <extra_id_0> [sequence] {seq}"

EC_NUMBER_REGEX = r'(\d+).([\d\-n]+).([\d\-n]+).([\d\-n]+)'

def run_inference(seq):
  labeling = infer(tf.constant([query(seq)]))
  names = labeling['output_0'][0].numpy().tolist()
  scores = labeling['output_1'][0].numpy().tolist()
  beam_size = len(names)
  names = [names[beam_size-1-i].decode().replace('<extra_id_0> ', '') for i in range(beam_size)]
  for i, name in enumerate(names):
    if re.match(EC_NUMBER_REGEX, name):
      names[i] = 'EC:' + name
  scores = [np.exp(scores[beam_size-1-i]) for i in range(beam_size)]
  return names, scores
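To see what the post-processing in run_inference does without loading the model, here is a sketch that applies the same steps (beam reversal, <extra_id_0> stripping, EC-number tagging, and log-score exponentiation) to mock model outputs. The byte strings and scores below are invented for illustration; they are not real ProtNLM output:

```python
import re
import numpy as np

EC_NUMBER_REGEX = r'(\d+).([\d\-n]+).([\d\-n]+).([\d\-n]+)'

# Mock "model output": beams arrive worst-first, names as byte strings
# tagged with the <extra_id_0> marker, scores as log-probabilities.
names = [b'<extra_id_0> 1.1.1.1', b'<extra_id_0> Hemoglobin subunit alpha']
scores = [np.log(0.2), np.log(0.9)]

beam_size = len(names)
# Reverse the beam order so the best prediction comes first,
# decoding bytes and stripping the sentinel marker.
names = [names[beam_size - 1 - i].decode().replace('<extra_id_0> ', '')
         for i in range(beam_size)]
# Predictions that look like EC numbers get an "EC:" prefix.
for i, name in enumerate(names):
    if re.match(EC_NUMBER_REGEX, name):
        names[i] = 'EC:' + name
# Convert log-probabilities back to probabilities, also best-first.
scores = [np.exp(scores[beam_size - 1 - i]) for i in range(beam_size)]

print(names)   # ['Hemoglobin subunit alpha', 'EC:1.1.1.1']
print(scores)  # approximately [0.9, 0.2]
```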

You can see the same code quoted in this GitHub issue:

issue with the protnlm_use_model_for_inference_uniprot_2022_04.ipynb colab notebook · Issue #2073 · google-research/google-research


I also found this notebook on GitHub, where the hidden cell's code is visible:

google-research/protnlm/protnlm_use_model_for_inference_uniprot_2022_04.ipynb
