
I am interested in running UniProt's protein description model, ProtNLM, to add some extra descriptors to a large batch of protein sequences I have.

They have a trial notebook available here.

Here is the full code of the notebook:

!python3 -m pip install -q -U tensorflow==2.8.2
!python3 -m pip install -q -U tensorflow-text==2.8.2
import tensorflow as tf
import tensorflow_text
import numpy as np
import re

import IPython.display
from absl import logging

tf.compat.v1.enable_eager_execution()

logging.set_verbosity(logging.ERROR)  # Turn down tensorflow warnings

def print_markdown(string):
  IPython.display.display(IPython.display.Markdown(string))

! mkdir -p protnlm

! wget -nc https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/savedmodel__20221011__030822_1128_bs1.bm10.eos_cpu/saved_model.pb -P protnlm -q --no-check-certificate
! mkdir -p protnlm/variables
! wget -nc https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/savedmodel__20221011__030822_1128_bs1.bm10.eos_cpu/variables/variables.index -P protnlm/variables/ -q --no-check-certificate
! wget -nc https://storage.googleapis.com/brain-genomics-public/research/proteins/protnlm/uniprot_2022_04/savedmodel__20221011__030822_1128_bs1.bm10.eos_cpu/variables/variables.data-00000-of-00001 -P protnlm/variables/ -q --no-check-certificate

imported = tf.saved_model.load(export_dir="protnlm")
infer = imported.signatures["serving_default"]

sequence = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP AVHASLDKFLASVSTVLTSKYR" #@param {type:"string"}
sequence = sequence.replace(' ', '')

names, scores = run_inference(sequence)

for name, score, i in zip(names, scores, range(len(names))):
  print_markdown(f"Prediction number {i+1}: **{name}** with a score of **{score:.03f}**")

The one change I have made is to update the TensorFlow version pins on these lines:

!python3 -m pip install -q -U tensorflow==2.8.2
!python3 -m pip install -q -U tensorflow-text==2.8.2

to >=2.8.2, since the exact 2.8.2 version could no longer be installed:

!python3 -m pip install -q -U "tensorflow>=2.8.2"
!python3 -m pip install -q -U "tensorflow-text>=2.8.2"

Now I can't run the model at all. The third cell, which takes the sequence as input, fails because it can't find the run_inference() function:

NameError                                 Traceback (most recent call last)

<ipython-input-5-4a7325a0e004> in <cell line: 0>()
      8 sequence = sequence.replace(' ', '')
      9 
---> 10 names, scores = run_inference(sequence)
     11 
     12 for name, score, i in zip(names, scores, range(len(names))):

NameError: name 'run_inference' is not defined

I didn't see this function defined anywhere in the notebook, so I assumed it was internal to TensorFlow (perhaps only in version 2.8.2, though I couldn't find anything when searching the docs) or otherwise loaded along with the model.

How can I get this script running again?

  • Don't expect that we will visit external links. You should put all the details in the question itself (not in comments), as text (not images). Commented Mar 5 at 3:47
  • I don't see def run_inference(..): ... or import run_inference anywhere, so of course it can't run. You could search for run_inference on Google; maybe it only needs some from ... import run_inference. Commented Mar 5 at 3:49
  • Searching run_inference tensorflow on Google, I found python - TensorFlow Inference - Stack Overflow, which has some def run_inference() code in the question and answers. In that question someone wrote the function from scratch; it is not part of any module. Commented Mar 5 at 3:53
  • That is... remarkable design from the Colab notebook folks. Thank you for being more thorough than me; I can't believe I overlooked that "Show code" section so many times. Commented Mar 6 at 15:39
  • It seems someone else missed this button too and filed an issue :) I think I also missed it when I was checking your link to the notebook. I had never seen a notebook with hidden cells, so I didn't expect it. Commented Mar 7 at 5:19

1 Answer


I checked the notebook, and there is a Show code button below "2. Load the model". Clicking it reveals the hidden cell that defines run_inference(seq):

#@markdown Please execute this cell by pressing the _Play_ button.

def query(seq):
  return f"[protein_name_in_english] <extra_id_0> [sequence] {seq}"

EC_NUMBER_REGEX = r'(\d+).([\d\-n]+).([\d\-n]+).([\d\-n]+)'

def run_inference(seq):
  labeling = infer(tf.constant([query(seq)]))
  names = labeling['output_0'][0].numpy().tolist()
  scores = labeling['output_1'][0].numpy().tolist()
  beam_size = len(names)
  names = [names[beam_size-1-i].decode().replace('<extra_id_0> ', '') for i in range(beam_size)]
  for i, name in enumerate(names):
    if re.match(EC_NUMBER_REGEX, name):
      names[i] = 'EC:' + name
  scores = [np.exp(scores[beam_size-1-i]) for i in range(beam_size)]
  return names, scores
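To see what the post-processing in run_inference does without loading the model, here is a sketch that applies the same steps (beam reversal, <extra_id_0> stripping, EC-number tagging, and log-score exponentiation) to mock model outputs. The byte strings and scores below are invented for illustration; they are not real ProtNLM output:

```python
import re
import numpy as np

EC_NUMBER_REGEX = r'(\d+).([\d\-n]+).([\d\-n]+).([\d\-n]+)'

# Mock "model output": beams arrive worst-first, names as byte strings
# tagged with the <extra_id_0> marker, scores as log-probabilities.
names = [b'<extra_id_0> 1.1.1.1', b'<extra_id_0> Hemoglobin subunit alpha']
scores = [np.log(0.2), np.log(0.9)]

beam_size = len(names)
# Reverse the beam order so the best prediction comes first,
# decoding bytes and stripping the sentinel marker.
names = [names[beam_size - 1 - i].decode().replace('<extra_id_0> ', '')
         for i in range(beam_size)]
# Predictions that look like EC numbers get an "EC:" prefix.
for i, name in enumerate(names):
    if re.match(EC_NUMBER_REGEX, name):
        names[i] = 'EC:' + name
# Convert log-probabilities back to probabilities, also best-first.
scores = [np.exp(scores[beam_size - 1 - i]) for i in range(beam_size)]

print(names)   # ['Hemoglobin subunit alpha', 'EC:1.1.1.1']
print(scores)  # approximately [0.9, 0.2]
```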

You can see the same code quoted in this GitHub issue:

issue with the protnlm_use_model_for_inference_uniprot_2022_04.ipynb colab notebook · Issue #2073 · google-research/google-research


I also found this notebook on GitHub, where the hidden cell's code is visible:

google-research/protnlm/protnlm_use_model_for_inference_uniprot_2022_04.ipynb
