Python NumPy 2.0 incompatibilities

NumPy logo

Summary

The new NumPy 2.0 is not compatible with many other packages, in particular those with compiled extensions built against NumPy 1.x, as I discovered after many lost hours.

The final Docker images can be downloaded from the alleleanalyzer repository on Docker Hub (jysgro/alleleanalyzer).

NumPy in a Docker container

I wanted to reproduce a workflow based on Python packages from a paper published in 2019, “AlleleAnalyzer: a tool for personalized and allele-specific sgRNA design” (Keough et al. 2019), by following what appeared to be a “simple” tutorial with what seemed to be “reasonable” requirements.

While “Python is great” and “environments” are allegedly “super great”, my experience with them has not been “great”, and I usually prefer to work within a Docker container, most often running some version of Linux with Python pre-installed.

The requirements within the AlleleAnalyzer GitHub repository were shown as:

pytest>=3.4
pandas>=0.21.0
numpy>=1.13.3
docopt>=0.6.2
pyfaidx>=0.5.1
regex
Biopython
bcftools>=1.5
pulp
pytables

I quickly found that bcftools is not a Python package: it is a standalone command-line tool, so pip cannot install it (Conda can, through the bioconda channel). The rest seemed “OK” at first, so I removed (or commented out) the bcftools line.

Then it appeared that pytables is not the pip name of PyTables: on PyPI the package (and the module it provides) is simply called tables, whereas pytables is the name used by Conda.
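A quick way to see which of the listed requirements are actually present in a container is a small standard-library script such as the sketch below. It assumes the pip distribution names shown (tables in place of pytables, bcftools left out), maps each one to the module it is imported as (Biopython, for instance, is imported as Bio), and treats bcftools separately as a program to be found on the PATH. REQUIREMENTS, check_requirements and external_tool are hypothetical names chosen for this illustration.

```python
import importlib.util
import shutil
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names mapped to the module each one provides.
# "tables" is PyTables on PyPI; Biopython is imported as "Bio".
REQUIREMENTS = {
    "pytest": "pytest",
    "pandas": "pandas",
    "numpy": "numpy",
    "docopt": "docopt",
    "pyfaidx": "pyfaidx",
    "regex": "regex",
    "biopython": "Bio",
    "pulp": "pulp",
    "tables": "tables",
}


def check_requirements(requirements):
    """Return {distribution: (installed version or None, module found?)}.

    Nothing is imported here, so a broken compiled extension cannot crash
    the check; find_spec only locates each top-level module on sys.path.
    """
    report = {}
    for dist, module in requirements.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None
        report[dist] = (installed, importlib.util.find_spec(module) is not None)
    return report


def external_tool(name):
    """bcftools is a command-line program, so look for it on PATH instead."""
    return shutil.which(name)  # full path, or None if not found


if __name__ == "__main__":
    for dist, (ver, found) in check_requirements(REQUIREMENTS).items():
        print(f"{dist:10} version={ver} found={found}")
    print("bcftools:", external_tool("bcftools"))
```

Running it inside the container before and after editing the requirements makes it easy to spot a package that pip silently skipped.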

However, no matter what else I tried to “fix”, I always got what appeared to be the same error:

ValueError: numpy.dtype size changed, may indicate binary incompatibility

After multiple web searches, I finally noticed that a few posts pointed to NumPy 2.0 as the likely culprit: NumPy 2.0 changed its binary interface (ABI), so compiled extension modules built against NumPy 1.x can fail at import time with exactly this kind of error. Indeed, I was able to “suddenly make it work” by removing that version and replacing it with version 1.23 (with Python 3.9) or version 1.26 (with Python 3.12, which NumPy 1.23 does not support).
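Because the error is raised when a compiled package is imported, it can help to import the suspect packages one at a time and record which ones fail, together with the Python and NumPy versions actually installed. The sketch below does that with importlib. The choice of packages to test (pandas and tables here) is an assumption, since the error message above does not name the failing module, and diagnose_imports and numpy_version are hypothetical helper names.

```python
import importlib
import sys


def numpy_version():
    """Return the installed NumPy version string, or None if it is missing."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.__version__


def diagnose_imports(module_names):
    """Import each module in turn and record how the import went."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except ValueError as exc:
            # e.g. "numpy.dtype size changed, may indicate binary
            # incompatibility": an extension built against another NumPy ABI
            results[name] = f"ValueError: {exc}"
        except ImportError as exc:  # not installed, or failed to load
            results[name] = f"ImportError: {exc}"
    return results


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "| NumPy", numpy_version())
    for name, status in diagnose_imports(["pandas", "tables"]).items():
        print(f"{name}: {status}")
```

Some packages built against NumPy 1.x report the mismatch as an ImportError rather than a ValueError, which is why both are caught.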

Written as numpy>=1.13.3, the requirement has no upper bound, so pip installs the newest release available, which at the time was version 2.0.1. Therefore, I changed that requirement to numpy==1.23 or numpy==1.26.
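The effect of an open-ended requirement can be illustrated with a simplified version check: an installer such as pip picks the newest release that satisfies every specifier, so numpy>=1.13.3 alone selects the 2.x release. The sketch below handles only plain numeric versions and a few operators (it is not a full PEP 440 implementation), and the list of releases is illustrative rather than the actual list on PyPI. It also shows an upper bound, numpy<2, as a looser alternative to the exact pins; that variant is not what was used for the Docker images.

```python
# Simplified version matching: plain numeric versions only (no pre-releases,
# post-releases or wildcards), so this is not a full PEP 440 implementation.
OPERATORS = (">=", "<=", "==", "<", ">")  # two-character operators first


def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check one specifier such as '>=1.13.3' or '<2'."""
    op = next((o for o in OPERATORS if spec.startswith(o)), None)
    if op is None:
        raise ValueError(f"unsupported specifier: {spec}")
    v, bound = parse_version(version), parse_version(spec[len(op):])
    # Pad with zeros so that '2' compares like '2.0.0', as PEP 440 does.
    width = max(len(v), len(bound))
    v += (0,) * (width - len(v))
    bound += (0,) * (width - len(bound))
    return {
        ">=": v >= bound,
        "<=": v <= bound,
        "==": v == bound,
        "<": v < bound,
        ">": v > bound,
    }[op]


def newest_allowed(candidates, specs):
    """Pick the newest candidate meeting every specifier, as pip would."""
    allowed = [c for c in candidates if all(satisfies(c, s) for s in specs)]
    return max(allowed, key=parse_version) if allowed else None


if __name__ == "__main__":
    releases = ["1.23.0", "1.26.0", "1.26.4", "2.0.1"]  # illustrative only
    print(newest_allowed(releases, [">=1.13.3"]))        # 2.0.1
    print(newest_allowed(releases, [">=1.13.3", "<2"]))  # 1.26.4
    print(newest_allowed(releases, ["==1.26"]))          # 1.26.0
```

The last line reflects a PEP 440 detail: numpy==1.26 pins exactly 1.26.0, whereas a specifier such as numpy==1.26.* or numpy~=1.26.0 would also accept later bug-fix releases.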

Final Docker images

At first I thought that the errors were caused by the many changes made since the paper was published in 2019, so I started with an older Python version: 3.9, released in October 2020.

The two “finalists”, built on Python 3.9.19 and Python 3.12.4, have been uploaded to Docker Hub as jysgro/alleleanalyzer.

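Example of running the Python 3.12.4 image from a terminal (-it keeps an interactive terminal attached, --rm deletes the container on exit, and -v $PWD:/data mounts the current directory as /data inside the container):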

docker run -it --rm -v $PWD:/data jysgro/alleleanalyzer:v1-py3.12.4

If, as in the tutorial, we are within the tutorial_directory containing the downloaded sample files, we can run the tutorial's command:

python3 ../AlleleAnalyzer/preprocessing/generate_gens_dfs/get_gens_df.py wtc_phased_hg19.bcf 1:12040238-12073572 mfn2_wtc_hg19
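To run the same step for other regions from a script rather than by hand, the tutorial's command line can be assembled in Python. In this sketch, build_get_gens_df_cmd is a hypothetical helper that only reproduces the three positional arguments seen in the tutorial command (BCF file, chromosome:start-end region, output prefix); it knows nothing about any other options get_gens_df.py may accept. It uses sys.executable, the interpreter running the script, in place of python3.

```python
import re
import subprocess
import sys

# A region such as "1:12040238-12073572": chromosome, colon, start-end.
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")

# Script path used in the tutorial command (relative to tutorial_directory).
SCRIPT = "../AlleleAnalyzer/preprocessing/generate_gens_dfs/get_gens_df.py"


def build_get_gens_df_cmd(bcf, region, out_prefix, script=SCRIPT):
    """Assemble the tutorial's get_gens_df.py call as an argument list."""
    match = REGION_RE.match(region)
    if match is None or int(match["start"]) > int(match["end"]):
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    return [sys.executable, script, bcf, region, out_prefix]


if __name__ == "__main__":
    cmd = build_get_gens_df_cmd(
        "wtc_phased_hg19.bcf", "1:12040238-12073572", "mfn2_wtc_hg19"
    )
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing an argument list instead of a single shell string avoids quoting problems with file names.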

Note: the sample files are no longer at the web address given in the tutorial. The .bcf file(s) are now located in the directory: https://alleleanalyzer.pollard.gladstone.org/excisionFinderData_public/gRNA_tutorial_sample_data/sample_input/

HDF5 format

The python3 command produces a file called mfn2_wtc_hg19.h5.

To inspect this file, one can use the h5dump utility from The HDF Group's HDF5 software, available from https://hdfgroup.org/downloads/hdf5/ (for example, h5dump -H mfn2_wtc_hg19.h5 prints only the file's structure, without the data).
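Before reaching for h5dump, a short Python check can confirm that the output really is an HDF5 file, since HDF5 files begin with a fixed 8-byte signature. If, as the pandas and tables requirements suggest, the file was written through pandas (an assumption; the tutorial does not say), the objects it contains can also be listed from inside the container with pandas.HDFStore. looks_like_hdf5 and list_keys are hypothetical names for this sketch.

```python
from pathlib import Path

# HDF5 files start with this 8-byte signature. (With a user block it sits at
# offset 512, 1024, 2048, ... instead; this sketch only checks offset 0.)
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 signature."""
    with Path(path).open("rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def list_keys(path):
    """List the objects stored in a pandas/PyTables HDF5 file.

    Requires pandas and tables, both of which are in the requirements list.
    """
    import pandas as pd  # imported lazily so looks_like_hdf5 works without it

    with pd.HDFStore(path, mode="r") as store:
        return store.keys()


if __name__ == "__main__":
    h5 = "mfn2_wtc_hg19.h5"
    if looks_like_hdf5(h5):
        print(list_keys(h5))  # then e.g. pd.read_hdf(h5, key) for one table
```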

References

Keough, K.C., Lyalina, S., Olvera, M.P. et al. AlleleAnalyzer: a tool for personalized and allele-specific sgRNA design. Genome Biol 20, 167 (2019). https://doi.org/10.1186/s13059-019-1783-3


Image Credits: portion of an Adobe Firefly rendering with prompt “A symbolic array of numbers, crisp neo-pop illustration; pencil sketch on old yellowed, stained paper” and Content Type as “Art” and “Monochromatic”. Based on the Adobe Firefly template “Old tall sailing ship on the ocean”. NumPy logo added.