Installation
Requirements
Python >= 3.8
Apache Spark >= 3.0 (native median function available in 3.4.0+, fallback for older versions)
Java 8 or 11 (required by Spark)
Install from PyPI
pip install pyspark-analyzer
Install from Source
Using pip
git clone https://github.com/bjornvandijkman1993/pyspark-analyzer.git
cd pyspark-analyzer
pip install -e .
Using uv (recommended for development)
git clone https://github.com/bjornvandijkman1993/pyspark-analyzer.git
cd pyspark-analyzer
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
Verify Installation
import pyspark_analyzer
print(pyspark_analyzer.__version__)
# Run the verification script
python examples/installation_verification.py
Troubleshooting
Java Not Found
If you encounter Java-related errors, ensure Java is properly installed:
java -version
Spark Configuration Issues
Set the following environment variables if needed:
export JAVA_HOME=/path/to/java
export SPARK_HOME=/path/to/spark
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
For more detailed troubleshooting, see our Troubleshooting Guide.