Newest 'bioinformatics' Questions

0 votes

1 answer

73 views

Seeding alignment algorithms

I am creating a one-step look-ahead alignment algorithm in a protein alignment context. I am now implementing a seeded option: seeds are also provided to the function, in which the gaps are stripped ...

Jaime Duarte

1

asked Oct 29 at 11:29

0 votes

1 answer

91 views

How do I get the UniprotKB API request results to match the results from Uniprot Website search?

I have a list of strings I want to search with UniProt and then get the first entry and its information. My problem is that the following code does return results, but it doesn't return the first ...

Melissa Flassig

3

asked Oct 28 at 10:43

1 vote

1 answer

72 views

How can I use wildcard paths from a Pandas dataframe as required rule inputs and outputs in Snakemake?

I have a Snakemake pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main), where I generate input and output file names for various intermediate steps using wildcards that refer back to file and ...

Vi_Varga

25

asked Oct 16 at 18:42

2 votes

1 answer

183 views

Snakemake wildcard issues using touch() to avoid premature pipeline progression

I have written a bioinformatics pipeline (https://github.com/V-Varga/SPOT-BGC/tree/main) in Snakemake. While it has been functional until now, one of the datasets I have tried to use it on is ...

Vi_Varga

25

asked Oct 14 at 12:30

1 vote

0 answers

47 views

makeblastdb Error: Duplicate seq_ids GNL|BL_ORD_ID|90650 despite unique FASTA headers and version 4 database

I’m trying to build a BLAST nucleotide database using makeblastdb (NCBI BLAST 2.16.0) inside a Singularity container. My FASTA headers have been renamed to be unique in the format: >file<file#&...

Asad Prodhan

75

asked Oct 7 at 5:20

2 votes

0 answers

64 views

Abnormal time consumed in UMAP transformation while fitting was fine, what happened?

I was working on a bioinformatics task in Python and used UMAP in the process. Although the model was fitted in under 20 seconds, transformation failed (or at least I conclude so) given that there was ...

user31535378

21

asked Sep 21 at 8:19

0 votes

0 answers

158 views

Why does my Transformer model not work well when dealing with single-cell multi-omic data?

The complete code and data are available at: Google Disk. I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...

氢氰酸

9

asked Sep 3 at 14:31

1 vote

1 answer

53 views

Rust-HTSlib Script Only Outputs BAM Header, Records Are Missing

I am working on a Rust script to process BAM files using the rust-htslib crate and expose the functionality to Python using pyo3. My goal is to read an input BAM file (which includes both a header and ...

One thousand

69

asked Aug 17 at 15:35

3 votes

1 answer

233 views

Why does the ProtBERT model generate identical embeddings for all non-whitespace-separated (single token?) inputs?

Why do non-identical inputs to ProtBERT generate identical embeddings when non-whitespace-separated? I've looked at answers here etc. but they appear to be different cases where the slicing of the out....

Maximilian Press

397

asked Jul 31 at 17:17

3 votes

1 answer

57 views

Merge many .fasta files

I’m currently working with a large dataset and need help merging multiple .fasta files. Although I’m not an expert, I’ve attempted to automate this process using a Python script. However, the merging ...

Andrea S.

31

asked Jul 23 at 18:59

1 vote

1 answer

43 views

How can I get CDS using annotation.gff and gene sequence.fna?

This is the CDS for Pun1 from https://www.ncbi.nlm.nih.gov/datasets/gene/id/107859694/products/ NM_001324769.1:37-1359 LOC107859694 [organism=Capsicum annuum] [GeneID=107859694] [region=cds] ...

alkyl official

11

asked Jul 5 at 19:23

2 votes

8 answers

189 views

Remove a string character between 2 special characters in the headers of a fastq file

I have a fastq file containing several sequences with headers such as : tail SRR11149706_1.fastq @SRR11149706.16630586 16630586/1 CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA @...

CaroZ

149

asked Jul 4 at 20:33

2 votes

1 answer

75 views

Error when running lilikoi.featuresSelection() in lilikoi R package: "non-numeric argument to binary operator"

I'm using the lilikoi R package to follow a built-in example from the official documentation. While most of the steps work correctly, I encounter an error when I attempt to run the lilikoi....

Давид Пирић

21

asked Jul 1 at 13:48

-4 votes

1 answer

146 views

Align two separate ggplots at their x axis but keep their y axis independent

I would like to generate a few different plots in ggplot2 and assemble them in an image tool like MS publisher or Inkscape to get a single publication-ready figure. Ideally, I would like to produce ...

Cobalamin

37

asked Jun 29 at 10:45

0 votes

1 answer

40 views

Nextflow including function from separate “utils.nf” into main.nf with DSL2

I have written a function called "resolve" to help manage inputs for my nextflow DSL2 workflow. It works how I want, but I’d ideally like to put it in a separate utils.nf file and then ...

rbierman

91

asked Jun 27 at 15:52

0 votes

0 answers

35 views

PSORTb Missing output file(s) error in Nextflow process

I'm a beginner here. I've built a few Nextflow workflows for other tools before. The command for PSORTb requires you to specify the directory where the output is stored and this is where I feel the ...

Sravan Krishna

1

asked Jun 15 at 9:51

0 votes

2 answers

137 views

How to Iterate over molecules in PyMol?

I am working with Gromacs .gro files in PyMol and running into problems with multi-stranded molecules. .gro files do not have chain identifiers, which PyMol apparently needs to calculate cartoon ...

Erik

332

asked Jun 12 at 12:18

1 vote

0 answers

40 views

Issues with R gawdis function. Getting Error in names(w3) <- dimnames(x)[[2]] : 'names' attribute [8] must be the same length as the vector [5]

I'm doing an assignment for an Ecology course for my master's degree. The instructions are as follows: Using the dataset "tussock" from the FD package seen in the first class: 1- Firstly, ...

Jonas Rosa

11

asked Jun 6 at 15:14

-1 votes

1 answer

93 views

awk command in linux does not execute; always get the error (awk: line 1: syntax error at or near '}' )

I am a beginner at using Linux bash for bioinformatics purposes and recently I encountered an error with this 'awk' command. ChatGPT's suggestion is not helping and the task is very basic. I have a ...

Luka Jašović

1

asked Jun 1 at 17:56

2 votes

0 answers

101 views

Need help optimizing nested parallel iterators in Rust

I'm pretty new to Rust and I was trying to write a project to get my hands dirty with the language, really figure it out. I wanted to write a bioinformatics tool that uses a multiple sequence ...

Abhirath Anand

205

asked May 18 at 4:17

0 votes

1 answer

52 views

Creating A More Efficient For Loop for SNP Frequency

I'm trying to count the frequency of SNPs every 100,000 base locations. I'm using a VCF file I've already prepared, and my professor showed me how to use code such as the following: inputfile=open("...

yamianne

1

asked May 16 at 18:50
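
The windowed count described in the question above can be sketched in a single pass with a dictionary keyed by window, which avoids rescanning the file for each 100,000-base bin. This is only a minimal sketch under assumptions: the asker's code is truncated, so the function name `count_snps_per_window` and the example records are hypothetical, and every non-header record is counted as one SNP (the question says the VCF was already prepared).

```python
from collections import Counter

WINDOW = 100_000  # window size in bases, taken from the question


def count_snps_per_window(vcf_lines, window=WINDOW):
    """Count variant records per (chromosome, window start) bin."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and VCF meta/header lines, which start with '#'
        if not line.strip() or line.startswith("#"):
            continue
        # The first two tab-separated VCF columns are CHROM and POS
        chrom, pos = line.split("\t", 2)[:2]
        # VCF positions are 1-based; shift to 0-based before binning
        start = (int(pos) - 1) // window * window
        counts[(chrom, start)] += 1
    return counts


# Hypothetical records for illustration only
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t10\t.\tA\tG",
    "chr1\t100000\t.\tC\tT",
    "chr1\t100001\t.\tG\tA",
    "chr2\t250000\t.\tT\tC",
]
counts = count_snps_per_window(example)
```

For a real file, the same function can be given an open file handle, since it only iterates over lines.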

0 votes

1 answer

102 views

Generating the global bayesian fit and local fit for a BLI experiment

I am trying to get the global bayesian fit and local fit for the raw data curve of a Bio-Layer Interferometry (BLI) experiment. In a BLI experiment, you are first coating the tip of a sensor with ...

Cobalamin

37

asked May 16 at 11:55

0 votes

1 answer

74 views

Python FileNotFoundError When Reading CSV Files in My Script

I'm a beginner and I'm working on a Python script that processes gene expression data, and I'm trying to plot volcano plots for different brain regions (EC, PC, and Hippocampus). However, I keep ...

Farah Yasser

1

asked Apr 23 at 20:14

0 votes

1 answer

127 views

Can you color branch lines in a tree based on node data? (ggtree)

Say I have a dataframe that denotes node colors for a given tree (3 nodes for clarity but would like to expand this to 1000 nodes) note_df nodes color node1 #0d3b66 node2 #faf0ca node3 #f4d35e And ...

Sam Degregori

83

asked Apr 22 at 21:07

0 votes

1 answer

83 views

Convert .mol files to CCD .mmcif files with AlphaFold3

I would like to convert .mol files into CCD .mmcif files, which is the input format of AlphaFold 3. In the code of AF3, we can find a Python function which enables it. This function uses the Python ...

user30270061

1

asked Apr 14 at 16:19

0 votes

0 answers

22 views

How to sync jbrowse scale with external scale

I am trying to integrate jbrowse into an existing platform. The current platform provides me with values such as the bin size, bpperpixel and start and end position. Is there a way to adjust jbrowse ...

david dami

83

asked Apr 14 at 15:43

1 vote

1 answer

76 views

determine position of an inserted string within another

Following this post I managed to put together a small function to place within a bigger text body (FASTA) shorter strings determined from another file based on some conditions (e.g. 100 events from a ...

Matteo

435

asked Apr 11 at 8:22

0 votes

0 answers

114 views

FastQC Stalling Under Heavy System Load (JRE/JVM Issue)

Terrible title, and I'll update if a more effective way of asking can be suggested. Problem We're running a bioinformatics pipeline that uses FastQC for quality control. The pipeline is written in ...

GilChrist19

67

asked Apr 10 at 23:47

1 vote

2 answers

108 views

how to randomly add a list of sequences into a text body

This is one of my first tasks with actual Python code. What I need to do is to import a set of (FASTA) sequences, select those with a length between 400 and 500 (base pairs) characters, and randomly ...

Matteo

435

asked Apr 8 at 12:37
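
The task described in the question above (read FASTA records, keep those 400 to 500 characters long, place them at random positions in a text body) might be sketched as follows. This is an illustration under assumptions, not the asker's code: the helper names `read_fasta` and `insert_randomly` are hypothetical, the length bounds are treated as inclusive, and positions are drawn relative to the original body.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def insert_randomly(body, inserts, rng):
    """Insert each string at a random position of the original body."""
    positions = sorted((rng.randint(0, len(body)), seq) for seq in inserts)
    # Work from the right so earlier insertions do not shift later positions
    for pos, seq in reversed(positions):
        body = body[:pos] + seq + body[pos:]
    return body


# Hypothetical input for illustration only
fasta = [">short", "ACGT", ">ok", "A" * 450, ">long", "C" * 600]
selected = [seq for _, seq in read_fasta(fasta) if 400 <= len(seq) <= 500]
result = insert_randomly("G" * 1000, selected, random.Random(0))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when the insertion positions need to be reported later.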

0 votes

1 answer

193 views

How do you convert a .cif file to a .mol file?

I've been working with AlphaFold 3 on a Linux HPC, and I've been trying to use Posebusters to evaluate the results of AlphaFold 3 by comparing the predicted structures with the ground truth structures ...

melee

113

asked Mar 31 at 16:26

0 votes

0 answers

42 views

caught segfault (address 0x2b) error when knitting BPCells + Seurat object

I have an integrated Seurat object with approximately 480k cells, integrated using the sketch-based method detailed here (leveraging the on-disk storage capabilities of BPCells). I keep getting this ...

user29769413

1

asked Mar 12 at 17:17

0 votes

0 answers

96 views

"Discrete values supplied to continuous scale" Error in R. Not sure why as I did not even use that specific function?

I am a beginner in R, and have only been doing it for a couple weeks. Recently I have been trying to engage with more advanced R material for my work in bioinformatics. I found out about ggplot and ...

user29964013

1

asked Mar 11 at 13:40

-1 votes

1 answer

55 views

Cannot run ProtNLM from Uniprot with Tensorflow in colab notebook

I am interested in running Uniprot's Protein descriptor model, ProtNLM, to add some bonus descriptors for a big chunk of protein sequence I have. They have a trial notebook available here. Here is the ...

lunchbox7804

43

asked Mar 3 at 20:10

1 vote

1 answer

76 views

How do I properly use codonbias.scores.FrequencyOfOptimalCodons?

I am trying to write a script analyzing codon usage in sequence utilizing the codon-bias package. I am trying to use the class codonbias.scores.FrequencyOfOptimalCodons, but when I do so in my code: ...

Shlomo Goren

11

asked Mar 3 at 15:45

1 vote

0 answers

60 views

How can I store the value of a Dash slider component?

I am developing an MRI file viewer app using dash and plotly. The way it works is I can select a specific MRI file from my dataset and the app will generate a slider that could take you through the ...

Malek kchaou

11

asked Mar 3 at 0:16

2 votes

2 answers

240 views

How to add a group-specific index to a polars dataframe with an expression instead of a map_groups user-defined function?

I am curious whether I am missing something in the Polars Expression library in how this could be done more efficiently. I have a dataframe of protein sequences, where I would like to create k-long ...

Olga Botvinnik

1,694

asked Feb 26 at 16:25

0 votes

1 answer

63 views

FastQC and trimming for RNA-seq data

Could anyone suggest what is wrong in this Snakefile code? I am trying to learn Snakemake, so could you please suggest any useful resources to read more about Snakemake. I will be thankful for all your ...

bioinfonext

121

asked Feb 26 at 10:04

2 votes

2 answers

35 views

editing fasta-lines, keeping first (ENSP) and last (Gene-Symbol-Isoform) and adding Uniprot ID

Ulrike Resch

21

asked Feb 22 at 14:45

1 vote

0 answers

55 views

Still Having Trouble Integrating LinearFold with Arnie's bpps.py – KeyError 'linearfold_v' & Import Issues Despite Proper Configuration

I am modifying bpps.py in the Arnie Python package to integrate LinearFold for RNA secondary structure predictions. However, I keep encountering "KeyError: 'linearfold_v'", and issues when ...

Hugh Redford

11

asked Feb 19 at 21:11

1 vote

1 answer

122 views

How to subset a nextflow channel using another channel containing only one unknown value

I have a channel which emits as follows: [[A, B, C, D], 1] [[A, B, C, D], 2] [[A, B, C], 3] [[A, B, V], 4] [[A], 5] [[A1, B1, D], 7] [[A1, B1, D], 8] I have another parameter defined by the user. The ...

Ahkam

13

asked Feb 19 at 14:06

0 votes

0 answers

28 views

Defining a complex tandem repeat motif with indels and substitutions

I am working on a tandem repeat project and I want to define a repeated motif that is complex, including indels and substitutions, with most bases being conserved. The motif varies in length between ...

Grégoire Blavier

1

asked Feb 19 at 6:03

1 vote

1 answer

78 views

Snakemake: Rule is not picked up

I was trying to create a Snakemake script to automate 3 tasks. The first one edits a .seg file in order to be the correct input for the next rule, the second rule computes an analysis for ...

Alessandra Bonilla Salon

13

asked Feb 11 at 11:12

0 votes

0 answers

44 views

Arnie (RNA Folding) Not Detecting ViennaRNA Even After Modifying utils.py

I'm trying to use the Arnie Python package for RNA folding with ViennaRNA and LinearFold. However, even after modifying utils.py to include debugging statements and manually add ViennaRNA, Arnie does ...

Hugh Redford

11

asked Feb 6 at 13:20

1 vote

1 answer

105 views

My .txt file can't be read by R for Gene Ontology analysis

So, I have 3 .txt files according to the three categories of gene enrichment I downloaded from the GO platform and they just can't be read in R, I think it's due to the inconsistent columns. First I ...

Julieta González

11

asked Feb 5 at 6:10

0 votes

0 answers

30 views

MSA, Custom Jalview Annotation by sequence

I need to write a customized Jalview annotation to colour (for example in green) different positions for each sequence_id (on each row). I need an example (I've already checked the doc). Help, Thx. ...

mscr

1

asked Jan 31 at 16:59

0 votes

0 answers

34 views

iNaturalist API – Missing observation names in response?

I wanted to see if anyone here has experience using the iNaturalist API because I'm having an issue with it. I built a web page using Python, but I'm not sure if I'm making the request incorrectly or ...

Adolfo Morales

9

asked Jan 31 at 11:19

0 votes

1 answer

128 views

anndata.concat resulting in 4x the size of the individual files causing memory issues

I am new to anndata and would like to know whether an issue that I am running into is expected or not. I have 28 h5ad files (Tabula Sapiens)(https://figshare.com/articles/dataset/Tabula_Sapiens_v2/27921984), ...

Danish Zahid Malik

713

asked Jan 27 at 12:56

1 vote

1 answer

108 views

Performance issue using HPX for parallelization in C++ code

I am trying to parallelize my code using HPX in order to improve performance. Below is the original code and my attempt to refactor it using HPX. Original Code: std::vector<std::vector<std::pair&...

tlparolin

13

asked Jan 24 at 22:19

0 votes

0 answers

35 views

Park and Ramirez's Bioreactor using TensorFlow - An optimization problem

I have been tasked with optimizing the productivity of Park and Ramirez's Bioreactor using TensorFlow. To achieve this, I generate a dataset by creating random values for the "Feed" variable,...

Bernardo Ribeiro

1

asked Jan 24 at 12:25

1 vote

0 answers

213 views

1.9 TB is insufficient memory for minimap2 whole genome alignment

After running minimap2 -ax asm5 --eqx with two fasta files that are both hifiasm assemblies scaffolded to the same reference via ragtag, minimap crashes with the following output: [M::mm_idx_gen::25....

altuffin

11

asked Jan 22 at 12:09
