4,462 questions
0 votes, 1 answer, 73 views
Seeding alignment algorithms
I am creating a one-step look-ahead alignment algorithm in a protein alignment context.
I am now implementing a seeded option: seeds are also provided to the function, in which the gaps are stripped ...
0 votes, 1 answer, 91 views
How do I get the UniProtKB API request results to match the results from the UniProt website search?
I have a list of strings I want to search with UniProt and then get the first entry and its information.
My problem is that the following code does return results, but it doesn't return the first ...
1 vote, 1 answer, 72 views
How can I use wildcard paths from a Pandas dataframe as required rule inputs and outputs in Snakemake?
I have a Snakemake pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main), where I generate input and output file names for various intermediate steps using wildcards that refer back to file and ...
2 votes, 1 answer, 183 views
Snakemake wildcard issues using touch() to avoid premature pipeline progression
I have written a bioinformatics pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main) in Snakemake. While it has been functional until now, one of the datasets I have tried to use it on is ...
1 vote, 0 answers, 47 views
makeblastdb Error: Duplicate seq_ids GNL|BL_ORD_ID|90650 despite unique FASTA headers and version 4 database
I’m trying to build a BLAST nucleotide database using makeblastdb (NCBI BLAST 2.16.0) inside a Singularity container. My FASTA headers have been renamed to be unique in the format:
>file<file#&...
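A first diagnostic step might be to confirm that the IDs really are unique after the kind of parsing a tool like makeblastdb applies. The sketch below is only an assumption-based check. It treats the ID as everything after `>` up to the first whitespace. It can also compare case-insensitively (`fold_case=True`), because the error message shows the ID upper-cased. Neither behaviour is confirmed makeblastdb logic.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_fasta_ids(lines: Iterable[str], fold_case: bool = False) -> Dict[str, int]:
    """Return FASTA IDs that occur more than once, mapped to their counts.

    Assumption: the ID is the header text after '>' up to the first whitespace.
    With fold_case=True, IDs are upper-cased before comparing.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split(maxsplit=1)
        if not tokens:
            continue  # empty header; worth fixing separately
        seq_id = tokens[0].upper() if fold_case else tokens[0]
        counts[seq_id] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

Usage: `with open("renamed.fasta") as fh: print(duplicate_fasta_ids(fh, fold_case=True))`, where the filename is hypothetical. An empty result suggests the headers are fine under these assumptions, and the duplicate comes from somewhere else.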
2 votes, 0 answers, 64 views
Abnormal time consumed in UMAP transformation while fitting was fine, what happened?
I was working on a bioinformatics task in Python and used UMAP in the process. Although the model was fitted in under 20 seconds, transformation failed (or at least I conclude so) given that there was ...
0 votes, 0 answers, 158 views
Why does my Transformer model not work well on single-cell multi-omic data?
The complete code and data are available at: Google Disk
I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
1 vote, 1 answer, 53 views
Rust-HTSlib Script Only Outputs BAM Header, Records Are Missing
I am working on a Rust script to process BAM files using the rust-htslib crate and expose the functionality to Python using pyo3. My goal is to read an input BAM file (which includes both a header and ...
3 votes, 1 answer, 233 views
Why does the ProtBERT model generate identical embeddings for all non-whitespace-separated (single token?) inputs?
Why do non-identical inputs to ProtBERT generate identical embeddings when non-whitespace-separated?
I've looked at answers here etc. but they appear to be different cases where the slicing of the out....
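One likely explanation is tokenization. ProtBERT's vocabulary is made of single amino acids, and the model expects them separated by spaces. An unspaced string such as "MKTAYIAK" is probably mapped to one unknown token, which would give every such input the same embedding. The sketch below reproduces the preprocessing shown on the Rostlab/prot_bert model card, as I recall it: space-separate the residues and map the rare residues U, Z, O and B to X. Please check this against the card for your model version.

```python
import re


def prepare_for_protbert(sequence: str) -> str:
    """Space-separate residues and map rare amino acids (U, Z, O, B) to X."""
    # Remove any existing whitespace and normalise to upper case first.
    compact = "".join(sequence.split()).upper()
    cleaned = re.sub(r"[UZOB]", "X", compact)
    # One residue per token, e.g. "MKTU" -> "M K T X".
    return " ".join(cleaned)
```

The prepared string is what would then be passed to the tokenizer, e.g. `tokenizer(prepare_for_protbert(seq), return_tensors="pt")`.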
3 votes, 1 answer, 57 views
Merge many .fasta files
I’m currently working with a large dataset and need help merging multiple .fasta files. Although I’m not an expert, I’ve attempted to automate this process using a Python script. However, the merging ...
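A minimal sketch of merging FASTA files by streaming them line by line, so large files are never loaded into memory whole. It assumes the inputs are already valid FASTA. Its only safeguard is adding a newline where a file lacks a trailing one, which otherwise glues the last sequence line to the next header (a common merging bug).

```python
from pathlib import Path
from typing import Iterable


def merge_fasta(inputs: Iterable[Path], output: Path) -> int:
    """Stream several FASTA files into one and return the number of records.

    A newline is added after any file that lacks a trailing one, so the last
    sequence line of one file is never glued to the next file's header.
    """
    n_records = 0
    with output.open("w") as out:
        for path in inputs:
            last_line = ""
            with path.open() as src:
                for line in src:  # line by line, so large files are not loaded whole
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            if last_line and not last_line.endswith("\n"):
                out.write("\n")
    return n_records
```

Usage: `merge_fasta(sorted(Path("fasta_dir").glob("*.fasta")), Path("merged.fasta"))`, where `fasta_dir` is a hypothetical folder. Write the output outside that folder so the glob does not pick it up on a rerun.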
1 vote, 1 answer, 43 views
How can I get CDS using annotation.gff and gene sequence.fna?
This is the CDS for Pun1 from https://www.ncbi.nlm.nih.gov/datasets/gene/id/107859694/products/
NM_001324769.1:37-1359 LOC107859694 [organism=Capsicum annuum] [GeneID=107859694] [region=cds]
...
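GFF3 start/end columns are 1-based and inclusive, so a region such as 37-1359 corresponds to the Python slice `seq[36:1359]`. The sketch below joins CDS segments taken from a GFF and reverse-complements them for minus-strand features. It assumes you have already collected the (start, end) pairs for one transcript and that the GFF seqid matches the FASTA record. The phase column and any non-ACGTN characters are not handled.

```python
from typing import Sequence, Tuple

# Complement table for upper- and lower-case nucleotides.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_cds(genomic: str, segments: Sequence[Tuple[int, int]], strand: str = "+") -> str:
    """Join CDS segments given as GFF-style 1-based, inclusive (start, end) pairs.

    On the minus strand the joined sequence is reverse-complemented.
    """
    # Sort by start so the result does not depend on GFF row order.
    joined = "".join(genomic[start - 1:end] for start, end in sorted(segments))
    if strand == "-":
        joined = joined.translate(_COMPLEMENT)[::-1]
    return joined
```

For a single-segment CDS like the one above, this reduces to `extract_cds(seq, [(37, 1359)])`.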
2 votes, 8 answers, 189 views
Remove a string character between 2 special characters in the headers of a fastq file
I have a fastq file containing several sequences with headers such as:
tail SRR11149706_1.fastq
@SRR11149706.16630586 16630586/1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA
@...
2 votes, 1 answer, 75 views
Error when running lilikoi.featuresSelection() in lilikoi R package: "non-numeric argument to binary operator"
I'm using the lilikoi R package to follow a built-in example from the official documentation. While most of the steps work correctly, I encounter an error when I attempt to run the lilikoi....
-4 votes, 1 answer, 146 views
Align two separate ggplots at their x axis but keep their y axis independent
I would like to generate a few different plots in ggplot2 and assemble them in an image tool like MS Publisher or Inkscape to get a single publication-ready figure.
Ideally, I would like to produce ...
0 votes, 1 answer, 40 views
Nextflow including function from separate “utils.nf” into main.nf with DSL2
I have written a function called "resolve" to help manage inputs for my nextflow DSL2 workflow. It works how I want, but I’d ideally like to put it in a separate utils.nf file and then ...
0 votes, 0 answers, 35 views
PSORTb Missing output file(s) error in Nextflow process
I'm a beginner here. I've built a few Nextflow workflows for other tools before. The command for PSORTb requires you to specify the directory where the output is stored, and this is where I feel the ...
0 votes, 2 answers, 137 views
How to Iterate over molecules in PyMol?
I am working with Gromacs .gro files in PyMol and running into problems with multi-stranded molecules. .gro files do not have chain identifiers, which PyMol apparently needs to calculate cartoon ...
1 vote, 0 answers, 40 views
Issues with R gawdis function. Getting Error in names(w3) <- dimnames(x)[[2]] : 'names' attribute [8] must be the same length as the vector [5]
I'm doing an assignment for an Ecology course for my master's degree. The instructions are as follows:
Using the dataset "tussock" from the FD package seen in the first class:
1- Firstly, ...
-1 votes, 1 answer, 93 views
awk command in linux does not execute; always get the error (awk: line 1: syntax error at or near '}' )
I am a beginner at using Linux bash for bioinformatics purposes, and I recently encountered an error with this 'awk' command. ChatGPT's suggestions are not helping, and the task is very basic. I have a ...
2 votes, 0 answers, 101 views
Need help optimizing nested parallel iterators in Rust
I'm pretty new to Rust and I was trying to write a project to get my hands dirty with the language, really figure it out. I wanted to write a bioinformatics tool that uses a multiple sequence ...
0 votes, 1 answer, 52 views
Creating a More Efficient For Loop for SNP Frequency
I'm trying to count the frequency of SNPs every 100,000 base locations. I'm using a VCF file I've already prepared, and my professor showed me to use code such as below:
inputfile=open("...
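Counting SNPs per 100,000-bp window does not need a nested loop. Integer division maps each position to its window in a single pass over the file. The sketch below assumes a plain-text, tab-separated VCF with CHROM and POS in the first two columns. It places positions 1 to 100,000 in the window starting at 0. Use `pos // window` instead if your course defines windows differently.

```python
from collections import Counter
from typing import Iterable


def snp_counts_per_window(vcf_lines: Iterable[str], window: int = 100_000) -> Counter:
    """Count VCF records per (chromosome, window start) bin in a single pass."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information and header lines
        chrom, pos = line.split("\t", 2)[:2]
        # POS is 1-based: positions 1..window fall in the bin starting at 0.
        start = (int(pos) - 1) // window * window
        counts[(chrom, start)] += 1
    return counts
```

Usage: `with open("prepared.vcf") as fh: counts = snp_counts_per_window(fh)`, where the filename is hypothetical. Then `sorted(counts.items())` lists the windows in order.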
0 votes, 1 answer, 102 views
Generating the global bayesian fit and local fit for a BLI experiment
I am trying to get the global bayesian fit and local fit for the raw data curve of a Bio-Layer Interferometry (BLI) experiment.
In a BLI experiment, you are first coating the tip of a sensor with ...
0 votes, 1 answer, 74 views
Python FileNotFoundError When Reading CSV Files in My Script
I'm a beginner and I'm working on a Python script that processes gene expression data, and I'm trying to plot volcano plots for different brain regions (EC, PC, and Hippocampus). However, I keep ...
0 votes, 1 answer, 127 views
Can you color branch lines in a tree based on node data? (ggtree)
Say I have a dataframe that denotes node colors for a given tree (3 nodes for clarity, but I would like to expand this to 1000 nodes)
note_df
nodes   color
node1   #0d3b66
node2   #faf0ca
node3   #f4d35e
And ...
0 votes, 1 answer, 83 views
Convert .mol files to CCD .mmcif files with AlphaFold3
I would like to convert .mol files into CCD .mmcif files, which is the input format of AlphaFold 3. In the AF3 code, there is a Python function that does this.
This function uses the Python ...
0 votes, 0 answers, 22 views
How to sync jbrowse scale with external scale
I am trying to integrate jbrowse into an existing platform. The current platform provides me with values such as the bin size, bpperpixel and start and end position. Is there a way to adjust jbrowse ...
1 vote, 1 answer, 76 views
determine position of an inserted string within another
Following this post I managed to put together a small function to place within a bigger text body (FASTA) shorter strings determined from another file based on some conditions (e.g. 100 events from a ...
0 votes, 0 answers, 114 views
FastQC Stalling Under Heavy System Load (JRE/JVM Issue)
Terrible title, and I'll update if a more effective way of asking can be suggested.
Problem
We're running a bioinformatics pipeline that uses FastQC for quality control. The pipeline is written in ...
1 vote, 2 answers, 108 views
how to randomly add a list of sequences into a text body
This is one of my first tasks with actual Python code.
What I need to do is to import a set of (FASTA) sequences, select those with a length between 400 and 500 characters (base pairs), and randomly ...
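A minimal sketch of the task, assuming the text body is a single sequence string without newlines. It keeps the sequences whose length is within the bounds, draws one random insertion point per sequence in the original text, and records where each sequence ends up. The recorded positions are 0-based offsets in the final text, so add 1 for 1-based coordinates. The function name and the `seed` parameter are illustrative choices, not from the question.

```python
import random
from typing import List, Optional, Tuple


def insert_randomly(
    text: str,
    sequences: List[str],
    min_len: int = 400,
    max_len: int = 500,
    seed: Optional[int] = None,
) -> Tuple[str, List[Tuple[int, str]]]:
    """Insert length-filtered sequences at random offsets; return text and placements."""
    rng = random.Random(seed)
    selected = [s for s in sequences if min_len <= len(s) <= max_len]
    rng.shuffle(selected)  # random order of the inserted sequences
    # One insertion offset per sequence, drawn in ORIGINAL-text coordinates.
    offsets = sorted(rng.randint(0, len(text)) for _ in selected)
    pieces: List[str] = []
    placements: List[Tuple[int, str]] = []
    prev = 0
    shift = 0  # total length inserted so far
    for offset, seq in zip(offsets, selected):
        pieces.append(text[prev:offset])
        # Final start = original offset + everything inserted before it.
        placements.append((offset + shift, seq))
        pieces.append(seq)
        shift += len(seq)
        prev = offset
    pieces.append(text[prev:])
    return "".join(pieces), placements
```

Tracking `shift` while building the result avoids searching for the inserted strings afterwards. That search can give wrong answers when an inserted sequence also occurs elsewhere in the text.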
0 votes, 1 answer, 193 views
How do you convert a .cif file to a .mol file?
I've been working with AlphaFold 3 on a Linux HPC, and I've been trying to use Posebusters to evaluate the results of AlphaFold 3 by comparing the predicted structures with the ground truth structures ...
0 votes, 0 answers, 42 views
caught segfault (address 0x2b) error when knitting BPCells + Seurat object
I have an integrated Seurat object with approximately 480k cells, integrated using the sketch-based method detailed here (leveraging the on-disk storage capabilities of BPCells). I keep getting this ...
0 votes, 0 answers, 96 views
"Discrete values supplied to continuous scale" Error in R. Not sure why as I did not even use that specific function?
I am a beginner in R and have only been doing it for a couple of weeks. Recently I have been trying to engage with more advanced R material for my work in bioinformatics. I found out about ggplot and ...
-1 votes, 1 answer, 55 views
Cannot run ProtNLM from Uniprot with Tensorflow in colab notebook
I am interested in running UniProt's protein descriptor model, ProtNLM, to add some extra descriptors for a large set of protein sequences I have.
They have a trial notebook available here.
Here is the ...
1 vote, 1 answer, 76 views
How do I properly use codonbias.scores.FrequencyOfOptimalCodons?
I am trying to write a script that analyzes codon usage in sequences using the codon-bias package.
I am trying to use the class codonbias.scores.FrequencyOfOptimalCodons, but when I do so in my code:
...
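To check whatever the package returns, it can help to compute the frequency of optimal codons (Fop) by hand. Fop is commonly defined as the number of optimal codons divided by the number of codons for amino acids that have an assigned optimal codon. The sketch below is an independent implementation of that definition and does not use or mirror the codon-bias API. The `families` and `optimal` inputs are supplied by the caller, and the values in the usage below are a toy example, not a real optimal-codon table.

```python
from typing import Dict, Set


def frequency_of_optimal_codons(cds: str, families: Dict[str, Set[str]], optimal: Set[str]) -> float:
    """Fop = optimal codons / codons of amino acids that have a defined optimal codon."""
    # Only codon families that contain at least one optimal codon are counted.
    considered: Set[str] = set()
    for codons in families.values():
        if codons & optimal:
            considered |= codons
    cds = cds.upper().replace("U", "T")
    # Read the CDS in frame; a trailing partial codon is ignored.
    usable = len(cds) - len(cds) % 3
    n_opt = n_total = 0
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in considered:
            n_total += 1
            n_opt += codon in optimal
    return n_opt / n_total if n_total else float("nan")
```

Usage with a toy table: `frequency_of_optimal_codons("AAGAAATTCATG", {"K": {"AAA", "AAG"}, "F": {"TTT", "TTC"}}, {"AAG", "TTC"})`. Of the three counted codons, two are optimal, so the result is 2/3. ATG is skipped because its family has no optimal codon in the table.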
1 vote, 0 answers, 60 views
How can I store the value of a Dash slider component?
I am developing an MRI file viewer app using dash and plotly. The way it works is I can select a specific MRI file from my dataset and the app will generate a slider that could take you through the ...
2 votes, 2 answers, 240 views
How to add a group-specific index to a polars dataframe with an expression instead of a map_groups user-defined function?
I am curious whether I am missing something in the Polars Expression library in how this could be done more efficiently. I have a dataframe of protein sequences, where I would like to create k-long ...
0 votes, 1 answer, 63 views
fastqc and trimming for RNA-seq data
Could anyone suggest what is wrong with this Snakefile code? I am trying to learn Snakemake, so could you please suggest any useful resources to read more about Snakemake? I will be thankful for all your ...
2 votes, 2 answers, 35 views
editing fasta-lines, keeping first (ENSP) and last (Gene-Symbol-Isoform) and adding Uniprot ID
I got a fasta file assembled from RNA-seq data like this:
>ENSP00000493376.2|ENST00000641515.2|ENSG00000186092.7|OTTHUMG00000001094.4|OTTHUMT00000003223.4|OR4F5-201|OR4F5|326
...
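The example header appears to follow the GENCODE translation format, with fields separated by `|`. The sketch keeps field 0 (the ENSP ID) and field 5 (the gene-symbol-isoform, `OR4F5-201` here) and appends a UniProt ID. Those field positions are assumptions based on the single example header. The ENSP-to-UniProt mapping has to come from elsewhere, such as a UniProt ID-mapping or Ensembl BioMart export. The dictionary below is a hypothetical placeholder, and unmapped entries get `NA`.

```python
from typing import Dict


def edit_header(header: str, ensp_to_uniprot: Dict[str, str], isoform_field: int = 5) -> str:
    """Rewrite a '|'-separated protein header as >ENSP|isoform|UniProt."""
    fields = header.lstrip(">").rstrip("\n").split("|")
    ensp = fields[0]
    isoform = fields[isoform_field]  # e.g. OR4F5-201 in the example header
    # Look up by the unversioned ENSP ID; "NA" if no mapping is available.
    uniprot = ensp_to_uniprot.get(ensp.split(".")[0], "NA")
    return f">{ensp}|{isoform}|{uniprot}"
```

Applied only to lines starting with `>` (sequence lines copied unchanged), this rewrites the whole file in one pass.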
1 vote, 0 answers, 55 views
Still Having Trouble Integrating LinearFold with Arnie's bpps.py – KeyError 'linearfold_v' & Import Issues Despite Proper Configuration
I am modifying bpps.py in the Arnie Python package to integrate LinearFold for RNA secondary structure predictions. However, I keep encountering "KeyError: 'linearfold_v'", and issues when ...
1 vote, 1 answer, 122 views
How to subset a nextflow channel using another channel containing only one unknown value
I have a channel which emits as follows:
[[A, B, C, D], 1]
[[A, B, C, D], 2]
[[A, B, C], 3]
[[A, B, V], 4]
[[A], 5]
[[A1, B1, D], 7]
[[A1, B1, D], 8]
I have another parameter defined by the user. The ...
0 votes, 0 answers, 28 views
Defining a complex tandem repeat motif with indels and substitutions
I am working on a tandem repeat project and I want to define a repeated motif that is complex, including indels and substitutions, with most bases being conserved. The motif varies in length between ...
1 vote, 1 answer, 78 views
Snakemake: Rule is not picked up
I was trying to create a Snakemake script to automate 3 tasks. The first one edits a .seg file so that it is the correct input for the next rule, the second rule computes an analysis for ...
0 votes, 0 answers, 44 views
Arnie (RNA Folding) Not Detecting ViennaRNA Even After Modifying utils.py
I'm trying to use the Arnie Python package for RNA folding with ViennaRNA and LinearFold. However, even after modifying utils.py to include debugging statements and manually add ViennaRNA, Arnie does ...
1 vote, 1 answer, 105 views
My .txt file can't be read by R for Gene Ontology analysis
So, I have 3 .txt files, one for each of the three categories of gene enrichment, which I downloaded from the GO platform, and they just can't be read in R. I think it's due to the inconsistent columns.
First I ...
0 votes, 0 answers, 30 views
MSA, Custom Jalview Annotation by sequence
I need to write a customized Jalview annotation to colour (for example in green) different positions for each sequence_id (on each row).
I need an example (I've already checked the doc). Help, Thx.
...
0 votes, 0 answers, 34 views
iNaturalist API – Missing observation names in response?
I wanted to see if anyone here has experience using the iNaturalist API because I'm having an issue with it. I built a web page using Python, but I'm not sure if I'm making the request incorrectly or ...
0 votes, 1 answer, 128 views
anndata.concat resulting in 4x the size of the individual files causing memory issues
I am new to anndata and would like to know whether an issue I am running into is expected or not.
I have 28 h5ad files (Tabula Sapiens)(https://figshare.com/articles/dataset/Tabula_Sapiens_v2/27921984), ...
1 vote, 1 answer, 108 views
Performance issue using HPX for parallelization in C++ code
I am trying to parallelize my code using HPX in order to improve performance. Below is the original code and my attempt to refactor it using HPX.
Original Code:
std::vector<std::vector<std::pair&...
0 votes, 0 answers, 35 views
Park and Ramirez's Bioreactor using TensorFlow - An optimization problem
I have been tasked with optimizing the productivity of Park and Ramirez's Bioreactor using TensorFlow. To achieve this, I generate a dataset by creating random values for the "Feed" variable,...
1 vote, 0 answers, 213 views
1.9 TB is insufficient memory for minimap2 whole genome alignment
After running minimap2 -ax asm5 --eqx with two fasta files that are both hifiasm assemblies scaffolded to the same reference via ragtag, minimap crashes with the following output:
[M::mm_idx_gen::25....