Installation

Prerequisites

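| Tool   | Version | Purpose                            |
|--------|---------|------------------------------------|
| Docker | 20.10+  | Container runtime for all services |
| uv     | 0.4+    | Python package manager             |
| just   | 1.0+    | Command runner                     |
| Python | 3.12+   | Runtime                            |
| Java   | 17+     | Required only for PySpark          |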
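
The prerequisites above can be checked before installing. The following is a minimal sketch, not part of the project: the helper names (`parse_version`, `meets_minimum`, `check_tools`) are hypothetical, and it assumes each tool prints a dotted version number in response to `--version`. Java is left out because it is needed only for PySpark.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import sys

# Minimum versions copied from the prerequisites table.
REQUIRED = {
    "docker": (20, 10),
    "uv": (0, 4),
    "just": (1, 0),
}


def parse_version(text: str) -> tuple[int, ...] | None:
    """Return the first dotted version number found in text, e.g. (20, 10, 7)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare only as many components as the minimum specifies."""
    return found[: len(minimum)] >= minimum


def check_tools() -> dict[str, bool]:
    """Map each tool name to True if it is installed and recent enough."""
    results = {}
    for name, minimum in REQUIRED.items():
        results[name] = False
        if shutil.which(name) is None:
            continue  # Tool is not on PATH.
        try:
            completed = subprocess.run(
                [name, "--version"],
                capture_output=True, text=True, check=False, timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        found = parse_version(completed.stdout)
        results[name] = found is not None and meets_minimum(found, minimum)
    # The interpreter running this script stands in for the Python requirement.
    results["python"] = sys.version_info >= (3, 12)
    return results


if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```

Running the script prints one line per tool, so a missing or outdated prerequisite shows up before `just install` fails on it.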

Steps

1. Clone the Repository

git clone https://github.com/montanarograziano/poor-man-lakehouse.git
cd poor-man-lakehouse

2. Install Dependencies

just install

This runs uv sync --all-groups and installs pre-commit hooks.

3. Configure Environment

cp .env.example .env

Edit .env to set your preferred catalog:

CATALOG="lakekeeper"          # Options: nessie, lakekeeper, postgres, glue
CATALOG_NAME="lakekeeper"
AWS_ACCESS_KEY_ID="minioadmin"
AWS_SECRET_ACCESS_KEY="miniopassword"

See Configuration for all available settings.
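
A typo in `CATALOG` is easy to miss, so it can help to validate `.env` after editing it. The sketch below is an illustration under stated assumptions rather than the project's own loader: `parse_env` and `catalog_problem` are hypothetical helpers, the parser handles only simple `KEY="value"` lines like the ones shown above, and the list of valid catalogs is taken from the inline comment in the example.

```python
from __future__ import annotations

from pathlib import Path

# Options listed in the comment next to CATALOG in the example .env.
VALID_CATALOGS = {"nessie", "lakekeeper", "postgres", "glue"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep what sits between the matching quotes.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing inline comment.
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def catalog_problem(env: dict[str, str]) -> str | None:
    """Return a message if CATALOG is missing or not a listed option."""
    catalog = env.get("CATALOG")
    if catalog is None:
        return "CATALOG is not set"
    if catalog not in VALID_CATALOGS:
        return f"CATALOG={catalog!r} is not one of {sorted(VALID_CATALOGS)}"
    return None


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        problem = catalog_problem(parse_env(env_file.read_text()))
        print(problem or "CATALOG looks fine")
    else:
        print("No .env found; run `cp .env.example .env` first")
```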

4. Set Up Local DNS (Important)

The .env file uses Docker service names (minio, postgres_db, nessie, lakekeeper). For local Python development outside Docker, add them to your hosts file:

echo "127.0.0.1 minio postgres_db nessie lakekeeper" | sudo tee -a /etc/hosts

This allows the same .env to work both inside Docker containers and for local development.
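
Because the command above appends to `/etc/hosts`, running it twice leaves duplicate lines. A small check like the one sketched here reports which service names are still unmapped, so the entry is added only when needed. `missing_hosts` is a hypothetical helper, and the script assumes a Unix-style hosts file at `/etc/hosts`.

```python
from __future__ import annotations

from pathlib import Path

# Docker service names referenced by the .env file.
SERVICE_NAMES = ("minio", "postgres_db", "nessie", "lakekeeper")


def missing_hosts(hosts_text: str, names: tuple[str, ...] = SERVICE_NAMES) -> list[str]:
    """Return the names that no active line of the hosts file maps to an address."""
    mapped = set()
    for raw_line in hosts_text.splitlines():
        # Strip comments, then split into: address, hostname, hostname, ...
        fields = raw_line.split("#", 1)[0].split()
        mapped.update(fields[1:])
    return [name for name in names if name not in mapped]


if __name__ == "__main__":
    hosts_file = Path("/etc/hosts")
    if hosts_file.exists():
        missing = missing_hosts(hosts_file.read_text())
        if missing:
            print("Not yet in /etc/hosts:", " ".join(missing))
        else:
            print("All service names are already mapped")
    else:
        print("/etc/hosts not found on this system")
```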

5. Start Services

just up lakekeeper   # Recommended: Core + Lakekeeper catalog

6. Verify

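| Service         | URL                    | Credentials                |
|-----------------|------------------------|----------------------------|
| MinIO Console   | http://localhost:9001  | minioadmin / miniopassword |
| Lakekeeper API  | http://localhost:8181  | -                          |
| Nessie API      | http://localhost:19120 | -                          |
| Spark Master UI | http://localhost:8081  | -                          |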
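
Instead of opening each URL by hand, the endpoints can be probed from a script. This is a rough sketch using only the standard library: `is_reachable` is a hypothetical helper, and it treats any HTTP response (including 401 or 404 at the root path) as evidence that the service is listening. Which services respond depends on what was started, so a failed probe for a service you did not start is expected.

```python
from __future__ import annotations

import http.client
import urllib.error
import urllib.request

# URLs copied from the verification table.
ENDPOINTS = {
    "MinIO Console": "http://localhost:9001",
    "Lakekeeper API": "http://localhost:8181",
    "Nessie API": "http://localhost:19120",
    "Spark Master UI": "http://localhost:8081",
}

# An empty ProxyHandler keeps localhost checks from going through a system proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at url, whatever the status code."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    for service, url in ENDPOINTS.items():
        state = "up" if is_reachable(url) else "not reachable"
        print(f"{service:<16} {url:<24} {state}")
```

A service that was just started may need a few seconds before it accepts connections, so rerun the check before concluding that something is broken.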

Useful Commands

just help            # List all available commands
just up nessie       # Start with Nessie catalog
just down            # Stop all services
just up-clean nessie # Clean restart (wipes data)
just lint            # Run ruff + mypy
just test            # Run all tests
just test-coverage   # Tests with coverage report
just logs            # Follow Docker logs