I'm a beginner and I'm working on a Python script that processes gene expression data, and I'm trying to plot volcano plots for different brain regions (EC, PC, and Hippocampus). However, I keep encountering a FileNotFoundError when the script attempts to load the CSV files for each region.

What I'm trying to do: I want the script to read each of the CSV files for the respective brain regions and generate a volcano plot.

Code:

`import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# 1. Load the 203 common genes
common = pd.read_csv('common_genes_all_regions.csv')['GeneSymbol'].tolist()

def plot_volcano(region_label, filename):
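    # 2. Read without header and keep columns B (1), F (5), G (6).
    #    pandas ignores the order given in usecols and returns the columns
    #    in file order, so the names must be assigned in that order.
    df = pd.read_csv(filename, header=None, usecols=[1, 5, 6])
    df.columns = ['p_value', 'logFC', 'Gene']
    df = df[df['Gene'].isin(common)].copy()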
    df['neglog10p'] = -np.log10(df['p_value'])

    # 3. Plot
    plt.figure(figsize=(6,5))
    plt.scatter(df['logFC'], df['neglog10p'],
                c=(df['p_value']<0.05)&(df['logFC'].abs()>1),
                cmap='coolwarm', edgecolor='k', alpha=0.7)
    plt.axhline(-np.log10(0.05), color='grey', linestyle='--')
    plt.axvline(1, color='grey', linestyle='--')
    plt.axvline(-1, color='grey', linestyle='--')
    plt.title(f"{region_label} Volcano (203 common genes)")
    plt.xlabel('Log2 Fold Change')
    plt.ylabel('-Log10(p-value)')
    plt.tight_layout()
    plt.savefig(f'volcano_{region_label}.png', dpi=300)
    plt.show()
    print(f"→ Saved volcano_{region_label}.png")

# 4. Generate for each region file
plot_volcano('EC', 'EC FILE.csv')
plot_volcano('PC', 'PC FILE.csv')
plot_volcano('HIPPOCAMPUS', 'HIPPOCAMPUS FILE.csv')`

Error Message:

FileNotFoundError: [Errno 2] No such file or directory: 'EC FILE.csv'

The file names are correctly spelled, and I’ve checked the directory where the script is running.

Problem:

I have the files EC FILE.csv, PC FILE.csv, and HIPPOCAMPUS FILE.csv in the same directory as the script.

However, when I try to run the script, I get a FileNotFoundError indicating that the file 'EC FILE.csv' cannot be found, and the same happens for the other files as well.

Steps I've Taken:

  • Verified that the files are indeed in the same directory as the script.
  • Printed the absolute path of the files using os.path.abspath().
  • Checked for any typos in the file names (including case sensitivity).
  • Tried providing the absolute path directly in the plot_volcano function to see if it resolves the issue.

System Information: Operating system: macOS

What am I missing or doing wrong? Any help would be greatly appreciated

  • Keep in mind that a relative path is resolved against the directory the script was run from, not against the script's location. This makes relative paths quite fragile: if you run the script from the wrong location, the relative paths will be wrong. If you open a terminal and run python ./some/path/script.py, files are looked up in ., the directory you ran the command from, not in ./some/path. Commented Apr 23 at 21:50
  • Run print(os.getcwd()) to check whether the current directory is the one that contains the files. Commented Apr 23 at 22:14
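
The check suggested in these comments can be sketched as a small helper. This is only an illustration: `report_paths` is a hypothetical name, not part of the original script. It compares the current working directory with the script's directory and lists the files that are really there. Printing the listing with `repr` also makes trailing spaces or other invisible characters in file names visible.

```python
import os


def report_paths(script_file, filenames):
    """Return text lines describing where Python will look for the files."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    lines = [
        f"cwd:        {os.getcwd()}",
        f"script dir: {script_dir}",
        # The list repr quotes every name, exposing stray spaces or odd characters.
        f"files in script dir: {sorted(os.listdir(script_dir))!r}",
    ]
    for name in filenames:
        found_from_cwd = os.path.exists(name)
        found_next_to_script = os.path.exists(os.path.join(script_dir, name))
        lines.append(
            f"{name!r}: relative to cwd={found_from_cwd}, "
            f"next to script={found_next_to_script}"
        )
    return lines


# Typical use near the top of the script:
# print("\n".join(report_paths(__file__, ["EC FILE.csv", "PC FILE.csv"])))
```

If a file shows up as present next to the script but not relative to the cwd, the script is being launched from a different directory, which is the situation the comments describe.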

1 Answer

If all your files are in the same directory as the script and you want the script to work regardless of the current working directory, you can define a "base directory" variable like this:

from os import path
BASE_DIR = path.dirname(path.abspath(__file__))

and then use path.join to build the absolute path to each file, for example:

common = pd.read_csv(path.join(BASE_DIR, 'common_genes_all_regions.csv'))['GeneSymbol'].tolist()
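
The same idea can be wrapped in a small helper so that every file goes through one place and a missing file is reported with the directory that was actually searched. This is a minimal sketch, and `resolve_data_file` is a hypothetical name introduced here for illustration:

```python
from os import path


def resolve_data_file(base_dir, filename):
    """Return the absolute path of filename inside base_dir, or fail early."""
    full_path = path.join(base_dir, filename)
    if not path.isfile(full_path):
        # Naming the directory in the message shows where Python really looked.
        raise FileNotFoundError(
            f"Expected {filename!r} in {base_dir!r}, but it is not there"
        )
    return full_path


# In the script, BASE_DIR would be derived from __file__ as shown above:
# BASE_DIR = path.dirname(path.abspath(__file__))
# plot_volcano('EC', resolve_data_file(BASE_DIR, 'EC FILE.csv'))
```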

Of course, if you later move the script into a subfolder, for example Scripts/CSV2Volcano.py, while the CSV files stay in the main directory, you can adjust BASE_DIR by appending "..", that is:

from os import path
BASE_DIR = path.join(path.dirname(path.abspath(__file__)), "..")

However, if things get more complex later and you import this sub-folder script from a main file in the main directory, you can do this instead:

import inspect
BASE_DIR = path.dirname(path.abspath(inspect.stack()[-1].filename))

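inspect.stack()[-1] is the outermost frame on the call stack, which is normally the script you launched, that is, the main file in the main directory that does import Scripts.CSV2Volcano. BASE_DIR therefore ends up as the directory of that main script.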
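
To make the difference between the two approaches concrete, here is a hedged sketch with two hypothetical helpers, `entry_point_dir` and `module_dir`. It assumes the program is started with a plain `python main.py`. Under other launchers (for example `python -m`, a debugger, or a notebook) the outermost frame may belong to the launcher instead, so check the result before relying on it.

```python
import inspect
from os import path


def entry_point_dir():
    """Directory of the outermost frame's file, usually the launched script."""
    outermost = inspect.stack()[-1]  # the last entry is the outermost call
    return path.dirname(path.abspath(outermost.filename))


def module_dir(module_file):
    """Directory of one specific module; pass that module's __file__."""
    return path.dirname(path.abspath(module_file))


# Inside Scripts/CSV2Volcano.py, imported from main.py in the main directory:
# module_dir(__file__)  -> the Scripts folder
# entry_point_dir()     -> the main directory (where main.py lives)
```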
