Field notes are notes I leave myself as I go through my day-to-day work. The hope is that other people will also find them useful. Note that these notes are unfiltered and unverified.
Containerized Workflows
Containerization is a very practical skill to have as a data scientist, given that we work with many disparate technologies and stacks and, to some extent, have less control over the selection of these technologies.
Containers allow you to bundle your application's dependencies away by building a single image that can be run without any knowledge of the language or stack that your application depends on. Combined with orchestration technology like Kubernetes, this makes it a lot simpler for data teams to handle their own operations.
Useful code snippets
General application run test
Sometimes you have a dockerized application and want a quick and dirty way to test whether it starts and stays running as a persistent process. You can use this snippet in a Makefile and run it on your CI system:
# Test if the container runs
# (APP_NAME, TEST_ENV and REPO_VER are expected to be defined elsewhere in the Makefile)
docker-test:
	docker run -it --name $(APP_NAME) -d $(TEST_ENV) $(REPO_VER) && \
	echo "Waiting 15 seconds for app startup" && sleep 15 && \
	echo "Docker Logs ==========" && docker logs $(APP_NAME) && \
	echo "======================" && \
	if [ "$$(docker inspect -f '{{.State.Running}}' $(APP_NAME))" = "false" ]; \
	then echo "Container failed"; \
	docker stop $(APP_NAME) | xargs docker rm; exit 1; \
	else echo "Container succeeded"; \
	docker stop $(APP_NAME) | xargs docker rm; exit 0; \
	fi
What this does is:
- Runs the persistent application in detached mode
- Waits a specified number of seconds (15 above)
- Prints out the logs of the container regardless of whether or not it ran successfully
- Checks whether the container is still running, and fails if not
- Stops and removes the container to clean up regardless of success or failure.
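If you would rather not keep this logic in a Makefile, the same steps can be sketched as a plain shell function. This is only a rough sketch under a few assumptions: `smoke_test` is a hypothetical helper name I made up, the arguments (container name, image, wait time) stand in for `$(APP_NAME)`, `$(REPO_VER)` and the hard-coded 15 seconds, and cleanup uses `docker rm -f` instead of the `docker stop | xargs docker rm` pipeline above.

```shell
#!/bin/sh
# smoke_test NAME IMAGE [WAIT_SECONDS]
# Hypothetical helper (not part of Docker): starts a container, waits,
# prints its logs, and reports whether it is still running.
smoke_test() {
    name=$1
    image=$2
    wait_seconds=${3:-15}   # assumption: default to the 15 seconds used above

    # Run the application in detached mode; give up if it cannot start.
    docker run -d --name "$name" "$image" >/dev/null || return 1

    echo "Waiting $wait_seconds seconds for app startup"
    sleep "$wait_seconds"

    # Print the logs regardless of whether the container survived.
    echo "Docker Logs =========="
    docker logs "$name"
    echo "======================"

    # Record the running state before cleaning up.
    running=$(docker inspect -f '{{.State.Running}}' "$name")

    # Remove the container either way; -f also stops it if still running.
    docker rm -f "$name" >/dev/null

    if [ "$running" = "true" ]; then
        echo "Container succeeded"
        return 0
    fi
    echo "Container failed"
    return 1
}
```

A CI step could then source this file and call something like `smoke_test my-app my-repo/my-app:latest 15`, where the name and image are placeholders for your own values. The non-zero return code is what makes the CI job fail.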