2022.08.04 – Stream Notes

Stream notes

Video

Here’s the VOD:

Below is an embed


Summary

  1. Intro
    1. Admin and follow up on yesterday’s to dos
  2. Coding
    1. Fitting Random Forests
    2. lattjorr asked a cool question: How do I compare two linear regression models? One cuts the number of variables used by 2/3, so how do I explain which is better, using this Telco Customer Churn dataset?
      1. Crono’s business answer: the model that is easier to explain to the people paying you and/or the one that captures most of the true behavior with less data
        1. This Quora discussion is better because it lays out actual processes
      2. Found a discussion of the F statistic that used improper language about what a p-value means, which derailed us into talking about the controversy around p-values
        1. How the American Statistical Association feels about p-value misuse
        2. Older discussion about improper interpretations of p-values
        3. Using t tests to compare sample vs population means
      3. Related to my business-focused answer above: a Reddit post asking which 20% of tools people use to do 80% of their work
    3. Suggested checking out the subreddits for datascience and statistics
    4. kmode has an exam in a super interesting subject: Uncertainty quantification
    5. Cat struggles to drink water from a faucet
    6. Want to chill out? Watch capybara
  3. Music
    1. Italian Experimental Obscurities, Soundtracks and Library Music
    2. Guest Mix: Psychedelic Cumbia with Krishna Villar
    3. Records from Poland
    4. Reggae Inspired Japanese City Pop
  4. We got raided by R4D4R_Live!
  5. We raided AppleGlass (they/them)
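
For lattjorr's question about comparing two linear regression models, here is a minimal sketch of two common yardsticks: adjusted R² and the partial F statistic. It assumes the smaller model is nested in the bigger one (its variables are a subset of the bigger model's), and that you already have R² and the residual sum of squares (RSS) from each fit. The function names and the numbers in the usage lines are made up for illustration; they are not from the Telco Customer Churn dataset.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2: R^2 with a penalty for every extra predictor.

    n_predictors excludes the intercept.
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def partial_f_statistic(rss_reduced, rss_full, p_reduced, p_full, n_obs):
    """Partial F statistic for nested linear models.

    Asks whether the extra predictors in the full model lower the RSS
    by more than chance alone would. p_reduced and p_full count
    predictors, excluding the intercept.
    """
    extra_predictors = p_full - p_reduced
    df_resid_full = n_obs - p_full - 1
    numerator = (rss_reduced - rss_full) / extra_predictors
    denominator = rss_full / df_resid_full
    return numerator / denominator


# Hypothetical numbers: 107 rows, full model with 6 predictors,
# reduced model with 2 predictors (two thirds of the variables cut).
f_stat = partial_f_statistic(
    rss_reduced=120.0, rss_full=100.0, p_reduced=2, p_full=6, n_obs=107
)
adj_full = adjusted_r2(r2=0.80, n_obs=107, n_predictors=6)
adj_reduced = adjusted_r2(r2=0.76, n_obs=107, n_predictors=2)
```

A small F statistic (compared against an F distribution with `p_full - p_reduced` and `n_obs - p_full - 1` degrees of freedom) suggests the dropped variables were not adding much, which backs up the "simpler model that is easier to explain" answer. If the two models are not nested, the partial F test does not apply, and adjusted R² or an information criterion is the usual fallback.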

Shoutouts

Streamers who were active in chat

  1. bedtimebear_808
  2. valkeryias
  3. R4D4R_Live
  4. Yuka_with_Data

To Do

Code

  • [ ] How to change WordPress endpoint, specifically /admin
  • [ ] send Leah the math derivations for decision trees
  • [ ] Add custom function to look at the term frequency distributions
    • [ ] This is even what the scikit-learn docs do…they should just build it into the algorithms…
  • [ ] What is a good minimum document frequency for a term and what is a cutoff? 1/number of topics?
  • [ ] Look at the example on sklearn for NMF and topic extraction
  • [ ] Add tree visualization libraries to virtual environment: ETE or Graphviz (which is in sklearn now) or Plotly or treelib (which is the simplest)
  • [ ] Sery bot for follow botting protection
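
For the term frequency to-dos above, a rough sketch of what the custom function could look like, using only the standard library. It assumes the documents are already tokenized into lists of strings; the function names and the toy corpus are hypothetical. The 0.5 cutoff is just a placeholder, and the "1/number of topics" idea from the list is still an open question, not something this code answers.

```python
from collections import Counter


def document_frequencies(tokenized_docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # set() so repeats inside one doc count once
    return df


def filter_by_min_df(df, n_docs, min_df):
    """Keep terms that appear in at least a fraction min_df of the documents."""
    return {term for term, count in df.items() if count / n_docs >= min_df}


# Toy corpus, made up for illustration
docs = [["churn", "rate"], ["churn", "model"], ["churn", "rate", "rate"]]
df = document_frequencies(docs)

# Distribution of document frequencies: how many terms appear in 1 doc, 2 docs, ...
df_histogram = Counter(df.values())

# Placeholder cutoff: keep terms found in at least half of the documents
kept = filter_by_min_df(df, len(docs), min_df=0.5)
```

Looking at `df_histogram` before picking a cutoff shows how many terms a given threshold would throw away. Once a cutoff is chosen, it can be passed to scikit-learn's `CountVectorizer` or `TfidfVectorizer` through their `min_df` parameter instead of filtering by hand.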

Photo

  • [ ] Try to see if the preview pane in Bridge can get shifted left a little

Socials