Stream notes
Video
Here’s the VOD:
Below is an embed
Summary
- Intro
- Work stuff
- Short stream
- Coding
- MeaLeon dataset available on Kaggle
- Doing silhouette analysis on the tSVD->tSNE transformed 2D dataset. 6 clusters has highest silhouette analysis, which means that might be the ideal number
- Number starts going back up in 40s, but that’s probably just over fitting
Shoutouts
Streamers who were active in chat
To Do
Code
- [ ] Add custom function to look at the term frequency distributions
- [ ] This is even what the scikit-learn docs do…they should just build it in to the algorithms…
- [ ] What is a good minimium document frequency for a term and what is a cutoff? 1/number of topics?
- [ ] Look at the example on sklearn for NMF and topic extraction
- [ ] add ingredient information to search result box
- [ ] Make the background tile
- [ ] take another look at streamlit
- [ ] Look up math theory behind t-SNE
- [ ] optuna for automated hyperparameter searching
- [ ] Should .lock and tool-versions be added to .gitignore? I’ve never committed them
- [ ] Refactor to use Pola.rs
- [ ] Write better/more thorough docstrings
- [ ] Look up examples of applying OOP practices to data science
- [ ] Make a PR to update the documentation for how **kwargs are used for sklearn Pipeline (their example and docs are seemingly incorrect)
- [ ] Figure out the ideal process for putting a corpus with collection of documents through a sklearn pipeline considering you previously got the overall counts and then did tfidf on the individual recipes using the CV from overall.
Photo
- [ ] Try to see if the preview pane in Bridge can get shifted left a little
General Stream/Admin
- [ ] How to change wordpress endpoint, specifically /admin
- [ ] Make hand cam a separate scene?
- [ ] Make subgoal buying a wet fart soundboard (thanks chat)
– [ ] CornoZeewo scuffed model - [ ] Corno mode
- [ ] CuwonoZeewo scuffed mode
- [ ] Need a Klawful knife emote, or a shotgun (thanks R4D4R)
- [ ] Install minecraft
- [X] Check and commit from laptop
- [ ] Update suggested python dev list
- [ ] Add statquest channel
- [ ] 2022.10.14 First ever follow bot attack
- [ ] Expensive redeem: crono tells lore from vtuber chat
- [ ] Add smort command
- [ ] Maweexy wants "Data Da Da", maybe that can be a different mode
- [ ] Intelijens +1
- [ ] Keyboard command
- [ ] Really need to set up a timer for breaks/ads
- [ ] Or just have fewer ads, Crono
- [X]
Share the actual data for MeaLeon (remove it from .gitignore)- [X] Files are too big: Free GitHub has a per file limit of 50MB
- [X] JSON has been shared to Kaggle
- [ ] Add contributing guidelines to repo
- [ ] Use silhouette analysis to determine number of clusters
- [ ] Elbow method said 6, my hypothesis was 12
- [ ] Use KMeans to predict cluster for the Missing Cuisine recipes
- [ ] Run clustering on the untransformed dataframes to see what we get out
- [X] Fix cuisine filter (Vietnamese did not exclude Asian or Chinese, want to check "parent" and "sibling" cuisines)
- [] Update README