2022.07.06 – Stream Notes

Stream notes

Video

Here’s the VOD:


Summary

  1. Intro
    1. Pokemon affecting brain development
  2. Code stuff
    1. What to do next with MeaLeon?
      1. Can continue to tweak stopwords, CountVectorizer, and LDA, but is it worth it? Might be approaching diminishing returns
      2. Could try using Gensim's HDP algorithm to find the ideal number of topics?
        1. Would have to install Gensim in the virtual environment
      3. Could recombine transformed matrices with original dataframe and do cosine similarity already
      4. Or a classification run
    2. Reduced parameter topic modeling
      1. Hierarchical Dirichlet Process (HDP) in Gensim
        1. Possible sklearn wrapper
      2. Non-negative matrix factorization (NMF)
        1. In Gensim
        2. In scikit-learn
    3. R4D4R had a good question: what is a good minimum document frequency for a term, and what should the cutoff be? 1/number of topics?
      1. Led to looking up TF-IDF
        1. Possible improvements include
          1. TF-IDuF
            1. Original paper PDF
            2. Original paper via ResearchGate
          2. TF-Proportional Document Frequency
          3. However, neither of these is in sklearn directly
    4. Open and run the notebook in Jupyter instead of VSCode?
      1. Having some odd errors and not sure what the source is
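
To make the TF-IDF and cosine similarity ideas from the summary concrete, here is a minimal standard-library sketch. It is not the MeaLeon code: the toy corpus, the function names, and the plain `tf * log(N / df)` weighting are assumptions chosen for illustration. scikit-learn's TfidfVectorizer uses a smoothed IDF by default, so its numbers would differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for recipe ingredient text.
docs = [
    "garlic butter shrimp pasta",
    "garlic butter chicken pasta",
    "chocolate butter cake",
]


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share garlic and pasta
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share only butter
print(sim_01, sim_02)
```

Because "butter" appears in every document, its weight is zero under this weighting, so the first and third documents end up with no similarity even though they share a term. That is the behavior TF-IDF is meant to provide: ubiquitous terms stop driving the comparison.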

Shoutouts

Streamers who were active in chat

  1. R4D4R
  2. Valkeryias
  3. intelijens
  4. Shafloy

Music

  1. UK Jazz on Vinyl with Yemeksepeti Banabi
  2. Guest Mix: Records from Panama with Jota Ortiz (7" Special)
  3. Guest Mix: Peruvian Chicha/Cumbia with Cal Jader

To Do

  • [X] Prepare Jupyter for next time
  • [X] Add !love command for TQDM
  • [ ] What is a good minimum document frequency for a term, and what should the cutoff be? 1/number of topics?
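
As a starting point for that open question, the sketch below shows what a minimum document frequency cutoff does to a vocabulary. The corpus, the `filter_by_min_df` helper, and the topic count are hypothetical, and 1/number of topics is only the heuristic floated on stream, not an established rule. The float-versus-int handling mirrors how CountVectorizer's `min_df` parameter is interpreted (a float is a proportion of documents, an int is an absolute count).

```python
from collections import Counter

# Hypothetical toy corpus; each string stands in for one document.
docs = [
    "garlic butter shrimp pasta",
    "garlic roasted chicken",
    "butter chicken curry",
    "chocolate butter cake",
]


def filter_by_min_df(documents, min_df):
    """Keep terms whose document frequency meets min_df.

    A float is treated as a fraction of all documents,
    an int as an absolute document count.
    """
    tokenized = [set(doc.split()) for doc in documents]
    df = Counter(term for tokens in tokenized for term in tokens)
    if isinstance(min_df, float):
        threshold = min_df * len(documents)
    else:
        threshold = min_df
    return sorted(term for term, count in df.items() if count >= threshold)


n_topics = 2  # hypothetical number of topics
kept_fraction = filter_by_min_df(docs, 1 / n_topics)  # at least 50% of documents
kept_count = filter_by_min_df(docs, 1)                # at least 1 document
print(kept_fraction)
print(len(kept_count))
```

With only two topics the 1/number of topics cutoff is aggressive: a term must appear in half of all documents to survive. As the topic count grows, the cutoff loosens, which is one reason to check the resulting vocabulary size before committing to it.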
