Barebones Pachyderm Pipeline

Here is the most vanilla Pachyderm pipeline setup you can get. It strips out the cruft so you can focus on the basic parts and explore. No local Docker registry needed, no building or pushing images, as little bullshit as possible. It is just a way to get quick feedback and make sure the basic nuts and bolts are working.

Prereq

Pachyderm (i.e., pachctl) is already set up on your machine. Go with k8s in Docker or something equally simple.

Pipeline Spec

{
  "pipeline": {
    "name": "hello"
  },
  "description": "say hello when a file shows up",
  "input": {
    "pfs": {
      "repo": "rando",
      "glob": "/*"
    }
  },
  "transform": {
    "cmd": [
      "python",
      "-c",
      "import time; from pathlib import Path; print(time.time()); print([str(x) for x in Path('/pfs/rando').iterdir()])"
    ],
    "image": "python:3.9.13"
  }
}

What this spec does:
Watches a repo named rando for anything showing up.
Uses a public Python Docker image, so there is nothing to build or push.
Transforms with inline Python, logging the time and the files received from rando.
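
For readability, the one-liner in transform.cmd can be unpacked into a small script. The sketch below does the same two things (print a timestamp, then list the files in the input directory). The directory is a parameter so you can try it locally against a temporary folder. The names list_inputs and main are made up for illustration, the sorting is an addition the one-liner does not have, and /pfs/rando only exists inside the pipeline container.

```python
import tempfile
import time
from pathlib import Path


def list_inputs(input_dir):
    """Return the paths found directly under input_dir, as sorted strings."""
    return sorted(str(p) for p in Path(input_dir).iterdir())


def main(input_dir="/pfs/rando"):
    # Same two steps as the inline command: log the time, then the files.
    print(time.time())
    print(list_inputs(input_dir))


if __name__ == "__main__":
    # Local dry run: fake the input repo with a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "some-file.txt").write_text("some sweet file\n")
        main(tmp)
```

Run locally, this prints a timestamp followed by a one-element list holding the path of the fake some-file.txt.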

Save this as hello-pipeline.json.

Reset Script

#!/usr/bin/env bash

yes | pachctl delete all

pachctl create repo rando
pachctl create pipeline -f hello-pipeline.json
echo "some sweet file" > some-file.txt
pachctl put file rando@master -f some-file.txt

What the script does:
Deletes everything in your Pachyderm cluster (careful if you have other stuff you need).
Creates the rando repo.
Creates the pipeline from hello-pipeline.json.
Creates a test file and puts it in the repo on the master branch.

Save this as reset.sh and run chmod +x reset.sh so you can execute it with ./reset.sh.

View Simple Output

pachctl logs --pipeline hello -f

This tails the logs for the hello pipeline. They may take a moment to become available right after you run ./reset.sh.
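
If you would rather not rerun the command by hand while waiting, a small retry helper is one option. This is only a sketch: wait_for_output is a hypothetical helper, not part of pachctl, and I have not checked how pachctl exits when the logs are not ready yet, so it treats both a non-zero exit and empty output as "not yet". It runs the same logs command without -f so that each attempt returns instead of following.

```python
import shutil
import subprocess
import sys
import time


def wait_for_output(cmd, attempts=10, delay=2.0):
    """Run cmd repeatedly until it exits 0 and prints something; return that output."""
    for _ in range(attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
        time.sleep(delay)
    raise TimeoutError(f"no output from {cmd!r} after {attempts} attempts")


# Only try the real command when pachctl is actually installed.
if __name__ == "__main__" and shutil.which("pachctl"):
    # Assumption: without -f the logs command prints what it has and exits.
    sys.stdout.write(wait_for_output(["pachctl", "logs", "--pipeline", "hello"]))
```

The attempts and delay defaults are arbitrary; bump them if your cluster is slow to start the pipeline's worker.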

That’s it!

You can cram bits of code into the inline Python, reset, start the logs, and put files. It is a pretty quick feedback loop with minimal external variables.
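
Once the inline code grows past a line or two, escaping it by hand inside the JSON gets tedious. One way around that, sketched here, is to keep the transform as a normal multi-line string and let a short script generate the spec. The names TRANSFORM_SOURCE and build_spec are made up for this example, and the transform body is just a placeholder; every other value is copied from the spec above.

```python
import json
import textwrap

# Hypothetical multi-line transform; swap in whatever you want to try.
TRANSFORM_SOURCE = textwrap.dedent("""\
    import time
    from pathlib import Path

    print(time.time())
    for path in sorted(Path('/pfs/rando').iterdir()):
        print(path.name, path.stat().st_size)
""")


def build_spec(source):
    """Return the hello pipeline spec with `source` as the inline command."""
    return {
        "pipeline": {"name": "hello"},
        "description": "say hello when a file shows up",
        "input": {"pfs": {"repo": "rando", "glob": "/*"}},
        "transform": {
            "cmd": ["python", "-c", source],
            "image": "python:3.9.13",
        },
    }


if __name__ == "__main__":
    # json.dumps escapes the newlines and quotes, so the output is valid JSON.
    print(json.dumps(build_spec(TRANSFORM_SOURCE), indent=2))
```

Saved under any name you like (say make_spec.py), it can be run as python make_spec.py > hello-pipeline.json, after which ./reset.sh picks up the new transform.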