2025.06.26 – Stream Notes

  • Stream Notes
    • Coding
      • Make a test set of data for GitHub Actions to work on
      • This data needs to be committed to DVC
      • What do we want to do with the GitHub Action?
        • Test that DVC/DagsHub works?
          • Seems like I need a subset of data to fit
          • Could also reduce what the runner does; maybe all it needs to do is show that the app works and/or that it can connect to whichever large-file storage is chosen
      • What benefits does DVC actually provide?
        • It was intended to let large data files and datasets live alongside code and avoid the hassles of Git LFS
        • Now they’ve introduced data pipelines
        • Reddit suggested
          • Combination of DVC and MLflow to manage my projects. DVC creates the pipelines, and at the modeling stage I register the model in MLflow.
            • DVC is very good for creating models and structuring a data pipeline, skipping stages that didn’t change. It is also a good tool for creating and reproducing experiments in a team workplace. But since you have data coming in daily in a streaming-like scenario, I wouldn’t recommend it, since its main features rely on files and model training, not on retraining and model deployment.
      • TODO Roll API key for DagsHub
      • TODO refactor workflow to simplify what the runner is doing
    • From Chat / Derail
    • We raided Valkeryias
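For the small test set the GitHub Actions runner needs, here is a minimal sketch in Python, assuming the data lives in a single CSV file with a header row. The helper name `make_ci_subset`, its parameters, and any paths are hypothetical. It uses reservoir sampling with a fixed seed, so the full file is read once without being loaded into memory, and every run produces the same subset. The output could then be tracked with `dvc add` like the full dataset.

```python
import csv
import random
from pathlib import Path


def make_ci_subset(src: Path, dst: Path, n_rows: int, seed: int = 0) -> int:
    """Write a reproducible random sample of at most n_rows data rows from src to dst.

    Hypothetical helper: assumes src is a CSV file whose first row is a header.
    Uses reservoir sampling (Algorithm R), so src is streamed row by row.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)  # fixed seed -> identical subset on every run
    reservoir: list[tuple[int, list[str]]] = []

    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < n_rows:
                # Fill the reservoir with the first n_rows rows.
                reservoir.append((i, row))
            else:
                # Keep row i with probability n_rows / (i + 1),
                # replacing a uniformly chosen existing sample.
                j = rng.randint(0, i)
                if j < n_rows:
                    reservoir[j] = (i, row)

    reservoir.sort(key=lambda item: item[0])  # restore the original row order

    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(row for _, row in reservoir)
    return len(reservoir)
```

Usage would look something like `make_ci_subset(Path("data/full.csv"), Path("data/ci_subset.csv"), n_rows=1000)`, with the row count picked to fit whatever limits the runner has. Fixing the seed matters: a subset that changed on every regeneration would get a new DVC hash and a new upload each time.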
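The Reddit point about DVC "skipping stages that didn’t change" comes down to content hashing: DVC records hashes of each pipeline stage's dependencies and outputs in `dvc.lock`, and `dvc repro` reruns only the stages whose dependencies no longer match what was recorded. The toy Python sketch that follows illustrates that idea only. `ToyPipeline`, its two stages, and the in-memory "lock" dictionary are hypothetical and far simpler than DVC itself (one input per stage, no params or command tracking, nothing written to disk).

```python
import hashlib
from typing import Callable, Dict


def digest(data: bytes) -> str:
    """Content hash used to decide whether a stage's input changed."""
    return hashlib.md5(data).hexdigest()


class ToyPipeline:
    """Hypothetical toy pipeline: a stage reruns only if its input hash changed."""

    def __init__(self) -> None:
        self.lock: Dict[str, str] = {}       # stage name -> input hash at last run
        self.outputs: Dict[str, bytes] = {}  # cached output of each stage
        self.runs: Dict[str, int] = {}       # how many times each stage actually executed

    def run_stage(self, name: str, data: bytes, fn: Callable[[bytes], bytes]) -> bytes:
        h = digest(data)
        if self.lock.get(name) == h and name in self.outputs:
            return self.outputs[name]  # input unchanged: skip and reuse cached output
        out = fn(data)
        self.lock[name] = h
        self.outputs[name] = out
        self.runs[name] = self.runs.get(name, 0) + 1
        return out

    def run(self, raw: bytes) -> bytes:
        # Stage 1 normalizes the raw bytes; stage 2 derives a "feature" from them.
        cleaned = self.run_stage("clean", raw, lambda b: b.strip().lower())
        return self.run_stage("featurize", cleaned, lambda b: str(len(b)).encode())
```

When the raw input changes but the cleaned output comes out identical, only the first stage reruns; the second sees the same input hash and is skipped, which mirrors how an unchanged intermediate file lets downstream stages be skipped.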
