Stream notes
Video
Here’s the VOD:
Below is an embed
Summary
- Intro
- Code stuff
- What to do next with MeaLeon?
- Can continue to tweak the stopwords, CountVectorizer, and LDA, but is it worth it? Might be approaching diminishing returns on the effort
- Could try Gensim's HDP algorithm to find the ideal number of topics?
- Would have to install Gensim in the virtual environment
- Could recombine transformed matrices with original dataframe and do cosine similarity already
- Or a classification run
- Reduced parameter topic modeling
- Hierarchical Dirichlet Process in Gensim
- Non negative matrix factorization
- R4D4R had a good question: what is a good minimum document frequency for a term, and what should the cutoff be? 1/number of topics?
- Led to looking up TF-IDF
- Possible improvements include
- TF-IDuF
- TF-Proportional Document Frequency
- However, neither of these is in sklearn directly
- Open and run the notebook in Jupyter instead of VSCode?
- Having some odd errors and not sure what the source is
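The "recombine the transformed matrices and do cosine similarity" idea from the notes above can be sketched with nothing but the standard library. This is only a rough illustration, assuming each document has already been reduced to a vector of topic weights (for example, the output of an LDA transform). The document names and weights below are made up, and `most_similar` is a hypothetical helper, not something from the stream's notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical document-topic weights (one row per document, one column per topic)
topic_weights = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.6, 0.3, 0.1],
    "doc_c": [0.1, 0.1, 0.8],
}

def most_similar(query, weights):
    """Rank every other document by cosine similarity to the query document."""
    q = weights[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in weights.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = most_similar("doc_a", topic_weights)
print(ranking)
```

In a real run, sklearn's `sklearn.metrics.pairwise.cosine_similarity` would do the same job on the whole matrix at once; the hand-rolled version here is just to show what the number means.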
Shoutouts
Streamers who were active in chat
Music
- UK Jazz on Vinyl with Yemeksepeti Banabi
- Guest Mix: Records from Panama with Jota Ortiz (7" Special)
- Guest Mix: Peruvian Chicha/Cumbia with Cal Jader
To Do
- [X] Prepare Jupyter for next time
- [X] Add !love command for TQDM
- [ ] What is a good minimum document frequency for a term, and what should the cutoff be? 1/number of topics?
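To make the open question about a document frequency cutoff a bit more concrete, here is a small sketch of what such a cutoff does to a vocabulary. It assumes whitespace tokenization and a made-up four-document corpus. The `1 / n_topics` value is only the heuristic floated in the question, not a recommendation, and `filter_vocabulary` is a hypothetical helper. As far as I understand it, the `min_df` parameter of sklearn's `CountVectorizer` behaves in a similar way: a float is read as a proportion of documents and an int as an absolute count.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # set() so repeats in one doc count once
    return df

def filter_vocabulary(docs, min_df):
    """Keep terms whose document frequency meets the cutoff.

    A float min_df is treated as a proportion of documents,
    an int as an absolute number of documents.
    """
    df = document_frequency(docs)
    threshold = min_df * len(docs) if isinstance(min_df, float) else min_df
    return sorted(term for term, count in df.items() if count >= threshold)

# Made-up corpus for illustration only
docs = [
    "garlic onion tomato",
    "garlic onion basil",
    "garlic saffron",
    "onion tomato garlic",
]

n_topics = 2  # assumed value, just to try the 1/number-of-topics idea
kept = filter_vocabulary(docs, 1 / n_topics)
print(kept)
```

With this toy corpus, terms that appear in only one document fall below the cutoff and are dropped. Whether that is desirable depends on how informative the rare terms are, which is exactly the open question.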