Barebones Pachyderm Pipeline
Here is the most vanilla Pachyderm pipeline setup you can get. It strips away the cruft so you can focus on the basic parts and explore. No local Docker registry needed, no building or pushing images, as little bullshit as possible. This is just a way to get quick feedback and make sure the basic nuts and bolts are working.
Prereq
Pachyderm (i.e. pachctl) is already set up on your machine. Go with k8s in Docker or something similarly simple.
Pipeline Spec
{
  "pipeline": {
    "name": "hello"
  },
  "description": "say hello when a file shows up",
  "input": {
    "pfs": {
      "repo": "rando",
      "glob": "/*"
    }
  },
  "transform": {
    "cmd": [
      "python",
      "-c",
      "import time; from pathlib import Path; print(time.time()); print([str(x) for x in Path('/pfs/rando').iterdir()])"
    ],
    "image": "python:3.9.13"
  }
}
- watches a repo named rando for anything showing up
- uses a public Python Docker image
- transforms with inline Python, logging the time and the files received from rando
Save this as hello-pipeline.json.
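For readability, the one-line command in the spec can be unpacked into a small script. This is only a sketch of what the inline Python does: the function names list_inputs and report are made up for illustration, and the sorting is an addition so the output is stable.

```python
import time
from pathlib import Path


def list_inputs(root):
    """Return the paths directly under `root` as strings.

    Sorted here for stable output; the inline command in the spec
    prints them in whatever order iterdir() yields.
    """
    return sorted(str(p) for p in Path(root).iterdir())


def report(root="/pfs/rando"):
    """Print a timestamp and the files visible under `root`,
    mirroring the two print() calls in the pipeline's inline command."""
    print(time.time())
    print(list_inputs(root))
```

Inside the pipeline container, report() reads /pfs/rando just like the inline command. On your own machine that path will not exist, so point it at any directory instead, for example report(".").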
Reset Script
#!/usr/bin/env bash
yes | pachctl delete all
pachctl create repo rando
pachctl create pipeline -f hello-pipeline.json
echo "some sweet file" > some-file.txt
pachctl put file rando@master -f some-file.txt
- deletes everything (careful if you have other stuff you need)
- creates the repo
- creates the pipeline from hello-pipeline.json
- creates a test file, puts it to the repo
Save this as reset.sh and chmod +x reset.sh so you can run it with ./reset.sh.
View Simple Output
pachctl logs --pipeline hello -f
This tails the logs for the hello pipeline. They may take a moment to become available right after you run ./reset.sh.
That’s it!
You can cram bits of code into the inline Python, reset, start the logs, and put files. It is a pretty quick feedback loop with minimal external variables.
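If hand-escaping quotes inside the JSON string gets tedious, one option is to generate the spec from Python and let the json module do the quoting. This is a minimal sketch, not part of Pachyderm: build_spec and write_spec are hypothetical helper names, and the spec fields are copied from hello-pipeline.json above.

```python
import json


def build_spec(code, name="hello", repo="rando", image="python:3.9.13"):
    """Return the pipeline spec from this post with `code` as the inline Python."""
    return {
        "pipeline": {"name": name},
        "description": "say hello when a file shows up",
        "input": {"pfs": {"repo": repo, "glob": "/*"}},
        "transform": {"cmd": ["python", "-c", code], "image": image},
    }


def write_spec(code, path="hello-pipeline.json"):
    """Write the spec to `path`; json.dump escapes quotes and newlines in `code`."""
    with open(path, "w") as f:
        json.dump(build_spec(code), f, indent=2)
        f.write("\n")
```

A possible loop: call write_spec("import os; print(os.listdir('/pfs/rando'))"), run ./reset.sh, then tail the logs as before.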