Welcome to my field notes!

Field notes are notes I leave myself as I go through my day-to-day work. The hope is that other people will also find these notes useful. Note that these notes are unfiltered and unverified.

Containerized Workflows


TJ Palanca


August 7, 2022

Containerization is a very practical skill to have as a data scientist, given that we work with many disparate technologies and stacks and, to some extent, have less control over which of these technologies are selected.

Containers allow you to bundle your application's dependencies into a single image that can be run without any knowledge of the language or stack the application depends on. With new orchestration technology like Kubernetes, this makes it a lot simpler for data teams to handle their own operations.

Useful code snippets

General application run test

Sometimes you have a dockerized application and want a quick-and-dirty way to test whether it starts and keeps running. You can use this snippet in a Makefile to run on your CI system:

# Test if the container runs
    docker run -it --name $(APP_NAME) -d $(TEST_ENV) $(REPO_VER) && \
    echo "Waiting 15 seconds for app startup" && sleep 15 && \
    echo "Docker Logs ==========" && docker logs $(APP_NAME) && \
    echo "======================" && \
    if [ "$$(docker inspect -f '{{.State.Running}}' $(APP_NAME))" = "false" ]; \
    then    echo "Container failed"; \
            docker stop $(APP_NAME) | xargs docker rm; exit 1; \
    else    echo "Container succeeded"; \
            docker stop $(APP_NAME) | xargs docker rm; exit 0; \
    fi

What this does is:

  1. Runs the persistent application in detached mode
  2. Waits a specified number of seconds (15 above)
  3. Prints the container's logs regardless of whether or not the container ran successfully
  4. Checks whether the container is still running, and fails if not
  5. Stops and removes the container to clean up regardless of success or failure
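
If you want the same check outside of Make, the steps above can be sketched as a plain shell function. This is a rough sketch rather than a drop-in replacement: the function name check_container and its arguments (container name, image, wait time) are made up for illustration, and it uses docker rm -f for cleanup instead of piping docker stop into docker rm.

```shell
#!/bin/sh
# Hypothetical helper: check_container NAME IMAGE [WAIT_SECONDS]
# NAME, IMAGE and WAIT_SECONDS are illustrative, not part of the Makefile snippet.
check_container() {
    name="$1"
    image="$2"
    wait_seconds="${3:-15}"

    # 1. Start the application in detached mode; give up if it cannot start at all.
    docker run -d --name "$name" "$image" >/dev/null || return 1

    # 2. Give the application time to start (or crash).
    echo "Waiting ${wait_seconds} seconds for app startup"
    sleep "$wait_seconds"

    # 3. Print the logs whether or not the container survived.
    echo "Docker Logs =========="
    docker logs "$name"
    echo "======================"

    # 4. Record whether the container is still running before cleaning up.
    running="$(docker inspect -f '{{.State.Running}}' "$name")"

    # 5. Force-remove the container (stops it first if needed), pass or fail.
    docker rm -f "$name" >/dev/null

    if [ "$running" = "true" ]; then
        echo "Container succeeded"
        return 0
    fi
    echo "Container failed"
    return 1
}

# Example call (image name is a placeholder):
#   check_container my-app my-registry/my-app:latest 15
```

Recording the running state before cleanup means the cleanup only has to be written once, and the function's return code can be used directly as the CI step's exit status.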