Installation
Prerequisites
| Tool | Version | Purpose |
|---|---|---|
| Docker | 20.10+ | Container runtime for all services |
| uv | 0.4+ | Python package manager |
| just | 1.0+ | Command runner |
| Python | 3.12+ | Runtime |
| Java | 17+ | Required only for PySpark |
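Before continuing, it can help to confirm each tool is installed and on your PATH (these are the standard version flags for each tool):

```shell
docker --version    # expect 20.10+
uv --version        # expect 0.4+
just --version      # expect 1.0+
python3 --version   # expect 3.12+
java -version       # expect 17+ (only needed for PySpark)
```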
Steps
1. Clone the Repository
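A minimal sketch of this step; `<repo-url>` and `<repo-directory>` are placeholders for this project's actual Git URL and checkout directory, which are not shown here:

```shell
git clone <repo-url>
cd <repo-directory>   # the directory created by the clone
```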
2. Install Dependencies
This runs uv sync --all-groups and installs pre-commit hooks.
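If you prefer to run the underlying commands yourself rather than the project's recipe, the description above corresponds to:

```shell
uv sync --all-groups        # install all dependency groups
uv run pre-commit install   # install the pre-commit hooks into .git/hooks
```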
3. Configure Environment
Edit .env to set your preferred catalog:
CATALOG="lakekeeper" # Options: nessie, lakekeeper, postgres, glue
CATALOG_NAME="lakekeeper"
AWS_ACCESS_KEY_ID="minioadmin"
AWS_SECRET_ACCESS_KEY="miniopassword"
See Configuration for all available settings.
4. Set Up Local DNS (Important)
The .env file uses Docker service names (minio, postgres_db, nessie, lakekeeper). For local Python development outside Docker, add them to your hosts file:
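For example, on Linux or macOS this can be done by appending one line to /etc/hosts (requires sudo; on Windows, edit C:\Windows\System32\drivers\etc\hosts instead):

```shell
# Map the Docker service names to the loopback address
echo "127.0.0.1 minio postgres_db nessie lakekeeper" | sudo tee -a /etc/hosts
```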
This allows the same .env to work both inside Docker containers and for local development.
5. Start Services
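The exact command is not shown here; a typical Docker Compose invocation for a multi-service stack like this would be:

```shell
docker compose up -d   # start all services in the background
docker compose ps      # confirm every service is running/healthy
```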
6. Verify
| Service | URL | Credentials |
|---|---|---|
| MinIO Console | http://localhost:9001 | minioadmin / miniopassword |
| Lakekeeper API | http://localhost:8181 | - |
| Nessie API | http://localhost:19120 | - |
| Spark Master UI | http://localhost:8081 | - |
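As a quick sanity check, you can probe each URL from the table above (the root paths are assumptions; any non-error HTTP status suggests the service is listening):

```shell
curl -s -o /dev/null -w "MinIO Console: %{http_code}\n"   http://localhost:9001
curl -s -o /dev/null -w "Lakekeeper API: %{http_code}\n"  http://localhost:8181
curl -s -o /dev/null -w "Nessie API: %{http_code}\n"      http://localhost:19120
curl -s -o /dev/null -w "Spark Master UI: %{http_code}\n" http://localhost:8081
```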