Summary
The new NumPy 2.0 is not compatible with many other packages, as I discovered after many lost hours.
The final Docker images can be downloaded from Docker Hub as jysgro/alleleanalyzer.
NumPy in a Docker
I wanted to run a workflow involving Python packages from a paper published in 2019, “AlleleAnalyzer: a tool for personalized and allele-specific sgRNA design” (Keough et al. 2019), following what appeared to be a “simple” tutorial with what seemed “reasonable” requirements.
While “Python is great” and “environments” are allegedly “super great”, my experience with them has not been “great”, so I usually prefer to work within a Docker container, most often under some version of Linux with a pre-installed version of Python.
The requirements within the AlleleAnalyzer GitHub repository were shown as:
pytest>=3.4
pandas>=0.21.0
numpy>=1.13.3
docopt>=0.6.2
pyfaidx>=0.5.1
regex
Biopython
bcftools>=1.5
pulp
pytables
I rather quickly found that bcftools cannot be installed as a Python package (but might be by Conda?), while the rest seemed “OK” at first. Therefore, I removed this line (or commented it out).
Then it appeared that pytables is not the name under which the PyTables module is installed; the package is simply called tables.
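The two adjustments above can be sketched in a few lines of Python. This is only an illustration of the edits I made by hand: `fix_requirement` is a hypothetical helper name, and the list simply repeats the requirements shown earlier.

```python
def fix_requirement(line):
    """Return the adjusted requirement line, or None when it should be dropped.

    Hypothetical helper: it only knows about the two problems described above.
    """
    # Extract the bare package name, ignoring any ">=" or "==" version part.
    name = line.split(">=")[0].split("==")[0].strip().lower()
    if name == "bcftools":
        return None  # not a Python package, so it is removed from the list
    if name == "pytables":
        return "tables"  # PyTables is published under the name "tables"
    return line.strip()


original = [
    "pytest>=3.4",
    "pandas>=0.21.0",
    "numpy>=1.13.3",
    "docopt>=0.6.2",
    "pyfaidx>=0.5.1",
    "regex",
    "Biopython",
    "bcftools>=1.5",
    "pulp",
    "pytables",
]

fixed = [req for req in (fix_requirement(line) for line in original) if req is not None]
print("\n".join(fixed))
```

The printed list can be pasted back into the requirements file; bcftools then has to be provided some other way, outside of the Python requirements.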
However, regardless of what I tried to “fix”, I always ended up with what appeared to be the same error:
ValueError: numpy.dtype size changed, may indicate binary incompatibility
After multiple web searches for answers, I finally recognized that a few of the posts pointed to NumPy 2.0 as the probable cause of the problems. Indeed, I was able to “suddenly make it work” by removing that version and replacing it with version 1.23 (with Python 3.9) or version 1.26 (with Python 3.12).
The requirement written as numpy>=1.13.3 sets only a lower bound, so the newest release, version 2.0.1, gets installed. Therefore, I changed that requirement to numpy==1.23 or numpy==1.26.
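To see why the open-ended requirement lets NumPy 2 in, here is a simplified sketch of the comparison. It assumes plain dotted version numbers and uses tuple comparison, which is not the full version-matching logic that pip applies; `parse_version` and `satisfies` are hypothetical helper names.

```python
def parse_version(text):
    """Turn a plain dotted version such as '2.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, minimum, below=None):
    """Check version >= minimum and, if 'below' is given, version < below."""
    parsed = parse_version(version)
    if parsed < parse_version(minimum):
        return False
    if below is not None and parsed >= parse_version(below):
        return False
    return True


candidates = ["1.23", "1.26", "2.0.1"]

# numpy>=1.13.3 : only a lower bound, so 2.0.1 qualifies too.
open_ended = [v for v in candidates if satisfies(v, "1.13.3")]

# Adding an upper bound below 2 keeps only the 1.x releases.
capped = [v for v in candidates if satisfies(v, "1.13.3", below="2")]

print(open_ended)  # ['1.23', '1.26', '2.0.1']
print(capped)      # ['1.23', '1.26']
```

An exact pin such as numpy==1.26 is the strictest way to get the same effect. A range such as numpy>=1.13.3,<2 would be a looser alternative, though I did not test it.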
Final Docker images
At first I thought that the errors were caused by the fact that there had been too many changes since the paper was published in 2019, so I started with an older Python release closer to that date: version 3.9 (first released in October 2020).
The two “finalists”, built on Python 3.9.19 and 3.12.4, have been uploaded to Docker Hub as jysgro/alleleanalyzer.
Example to run, from a Terminal:
docker run -it --rm -v "$PWD":/data jysgro/alleleanalyzer:v1-py3.12.4
If, as in the tutorial, we are within the tutorial_directory with the downloaded sample files, we can run the tutorial's command:
python3 ../AlleleAnalyzer/preprocessing/generate_gens_dfs/get_gens_df.py wtc_phased_hg19.bcf 1:12040238-12073572 mfn2_wtc_hg19
Note: The sample files are at a different web address than the one given in the tutorial. The .bcf file(s) are now located within the directory: https://alleleanalyzer.pollard.gladstone.org/excisionFinderData_public/gRNA_tutorial_sample_data/sample_input/
HDF5 format
The file produced by the python3 command will be called mfn2_wtc_hg19.h5.
To decode this file, one can use the h5dump utility from the HDF Group software collection, which can be obtained from https://hdfgroup.org/downloads/hdf5/
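As an alternative to h5dump, the file can also be inspected from Python. The sketch below first checks the 8-byte HDF5 signature at the start of the file (it only looks at offset 0), then lists the stored keys with pandas. It assumes the file was written through pandas and PyTables, which I have not verified, and the helper names `looks_like_hdf5` and `list_hdf5_keys` are hypothetical.

```python
# Every HDF5 file begins with this fixed 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True when the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE


def list_hdf5_keys(path):
    """List the keys stored in an HDF5 file written by pandas.

    Assumption: pandas and tables (PyTables) are installed, as in the
    requirements above, and the file was produced through pandas.
    """
    import pandas as pd  # imported here so the signature check works without pandas

    with pd.HDFStore(path, mode="r") as store:
        return store.keys()
```

Inside the container, `looks_like_hdf5("mfn2_wtc_hg19.h5")` gives a quick sanity check, and `list_hdf5_keys("mfn2_wtc_hg19.h5")` should show which tables are available to load.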
References
Keough, K.C., Lyalina, S., Olvera, M.P. et al. AlleleAnalyzer: a tool for personalized and allele-specific sgRNA design. Genome Biol 20, 167 (2019). https://doi.org/10.1186/s13059-019-1783-3