Stream notes
Video
Here’s the VOD:
Summary
- Intro
- Happy Halloween
- Weekend
- Got the Miata!
- Coding
- Try to figure out the ideal number of clusters from silhouette analysis
- From chat
- texoport: what’s new in clustering
- DBSCAN vs…
- Hypersphere density-based approach
- Maybe hold off on this one, but density
- HDBSCAN?
- LGBM or XGBoost?
- Wait but Why
- Derail
- Ming Tsai opened a new restaurant in Big Sky, Montana
- Group therapy and advice session
- We got raided by TimeEnjoyed
- MeaLeon redeem for Taiwanese braised pork
- MeaLeon redeem for fortune cookies (American)
- We got raided by dm_adam2
- We got raided by Maweexy
- MeaLeon is down…
- We got raided by data_dude
- MeaLeon…
- We raided NovaLiminal
Shoutouts
Streamers who were active in chat
To Do
Code
- [ ] Add a custom function to look at the term frequency distributions
- [ ] Even the scikit-learn docs do this… they should just build it into the algorithms…
- [ ] What is a good minimum document frequency for a term, and what is a good cutoff? 1/number of topics?
- [ ] Look at the scikit-learn example for NMF and topic extraction
- [ ] Add ingredient information to the search result box
- [ ] Make the background tile
- [ ] Take another look at Streamlit
- [ ] Look up the math behind t-SNE
- [ ] Try Optuna for automated hyperparameter searching
- [ ] Should .lock and tool-versions be added to .gitignore? I’ve never committed them
- [ ] Refactor to use Pola.rs
- [ ] Write better/more thorough docstrings
- [ ] Look up examples of applying OOP practices to data science
- [ ] Make a PR to update the documentation on how **kwargs are used with the sklearn Pipeline (their example and docs seem to be incorrect)
- [ ] Figure out the ideal process for putting a corpus (a collection of documents) through a sklearn pipeline. Previously, you got the overall counts and then ran TF-IDF on the individual recipes using the CountVectorizer (CV) fit on the overall corpus.
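As a starting point for the t-SNE reading, here is a compact statement of the standard t-SNE objective. It uses Gaussian affinities between input points $x_i$, a Student-t kernel (one degree of freedom) between embedded points $y_i$, and gradient descent on the KL divergence between the two distributions. Each bandwidth $\sigma_i$ is chosen per point so that the perplexity of $P_i$ matches the user-set perplexity.

```latex
p_{j\mid i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \ne i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}

\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j\mid i} \log_2 p_{j\mid i}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \ne l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

\frac{\partial C}{\partial y_i} = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}
```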
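For the Pipeline **kwargs PR item, this sketch shows how fit-time keyword arguments reach individual steps of a scikit-learn `Pipeline` under the default behavior (metadata routing disabled). Each key is prefixed with the step name and a double underscore. The step names, the toy data, and the `weights` values are illustrative assumptions. Whether the official docs describe this correctly is exactly what the PR item is meant to check.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
weights = np.array([1.0, 1.0, 2.0, 2.0])  # hypothetical per-sample weights

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# "<step name>__<parameter>" sends sample_weight only to the "clf" step's fit().
pipe.fit(X, y, clf__sample_weight=weights)

# The same prefix convention applies to step hyperparameters.
pipe.set_params(clf__C=0.5)
print(pipe.get_params()["clf__C"])  # 0.5
```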
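One way to express the "counts on the overall corpus, then TF-IDF on individual recipes" flow (the last item above) as a single object is sketched below. It assumes the counts come from `CountVectorizer` and the weighting from `TfidfTransformer`. Fit the pipeline once on the full corpus, so that both the vocabulary and the IDF weights are learned there. Then call `transform` on any subset of recipes. The corpus is a hypothetical stand-in. With default settings, this chain produces the same matrix as `TfidfVectorizer`, and the final check confirms that.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.pipeline import Pipeline

# Hypothetical corpus: every recipe available.
corpus = [
    "pork soy sauce star anise",
    "pork soy sauce garlic",
    "flour sugar butter egg",
    "flour sugar vanilla egg",
]

pipe = Pipeline([("cv", CountVectorizer()), ("tfidf", TfidfTransformer())])
pipe.fit(corpus)  # vocabulary and IDF are both learned from the whole corpus

# Later: weight individual recipes with the corpus-level vocabulary and IDF.
subset = corpus[:2]
X_subset = pipe.transform(subset)
print(X_subset.shape)  # (2, size of the corpus vocabulary)

# Sanity check: with defaults, the chain matches TfidfVectorizer.
reference = TfidfVectorizer().fit(corpus).transform(subset)
print(np.allclose(X_subset.toarray(), reference.toarray()))  # True
```

Words in a new recipe that are not in the corpus vocabulary are silently dropped by `transform`. So the corpus used for `fit` should be the one whose vocabulary you actually want.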
Photo
- [ ] Try to see if the preview pane in Bridge can get shifted left a little
General Stream/Admin
- [ ] How to change wordpress endpoint, specifically /admin
- [ ] Make hand cam a separate scene?
- [ ] Make buying a wet fart soundboard a subgoal (thanks chat)
- [ ] CornoZeewo scuffed model
- [ ] Corno mode
- [ ] CuwonoZeewo scuffed mode
- [ ] Need a Klawful knife emote, or a shotgun (thanks R4D4R)
- [ ] Install Minecraft
- [X] Check and commit from laptop
- [ ] Update suggested python dev list
- [ ] Add StatQuest channel
- [ ] 2022.10.14 First ever follow bot attack
- [ ] Expensive redeem: Crono tells lore from VTuber chat
- [ ] Add smort command
- [ ] Maweexy wants "Data Da Da", maybe that can be a different mode
- [ ] Intelijens +1
- [ ] !Keyboard command
- [X] Really need to set up a timer for breaks/ads
- [ ] Or just have fewer ads, Crono
- [X] Share the actual data for MeaLeon (remove it from .gitignore)
- [X] Files are too big: GitHub warns on files over 50 MB and blocks files over 100 MB
- [X] JSON has been shared to Kaggle
- [ ] Add contributing guidelines to repo
- [ ] Use silhouette analysis to determine the number of clusters
- [ ] Elbow method said 6, my hypothesis was 12
- [ ] Not worthwhile on the t-SNE-transformed data
- [ ] Use KMeans to predict cluster for the Missing Cuisine recipes
- [ ] Run clustering on the untransformed dataframes to see what we get out
- [X] Fix cuisine filter (Vietnamese did not exclude Asian or Chinese, want to check "parent" and "sibling" cuisines)
- [ ] Update README
- [ ] Check the rate limiter on Edamam; it may be too restrictive
- See if there’s an error code to diagnose
- For next stream
- Classify and/or cluster the missing labels to see how they can augment the existing recipe database
- 2022.10.31 Raided 4 times!
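For the silhouette-analysis items above, this minimal sketch scores k from 2 to 12 with KMeans and keeps the k with the highest mean silhouette. The upper bound echoes the hypothesis of 12. The synthetic four-blob data is an assumption standing in for the recipe vectors, and it is clustered directly rather than after t-SNE, in line with the note that t-SNE output was not worthwhile here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in: four tight, well-separated groups.
X, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

scores = {}
for k in range(2, 13):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean silhouette over all points

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))  # best_k is 4 for this synthetic data
```

Unlike the elbow method, this picks a single k directly. Printing the whole `scores` dict is still worthwhile, because a near-tie (say between 6 and 12) is itself informative.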
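For "Use KMeans to predict cluster for the Missing Cuisine recipes", one approach is sketched below. Fit KMeans on the TF-IDF vectors of recipes that have a cuisine, then call `predict` on the unlabeled ones. KMeans clusters have no names, so the sketch also labels each cluster by the majority cuisine of its labeled members. That mapping step, the toy data, and `n_clusters=2` are all assumptions for illustration.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for recipes with and without a cuisine label.
labeled = [
    ("pork soy sauce star anise sugar", "Taiwanese"),
    ("pork soy sauce garlic rice wine", "Taiwanese"),
    ("flour sugar butter egg", "American"),
    ("flour sugar vanilla egg", "American"),
]
missing = ["soy sauce pork belly", "butter flour sugar"]

texts, cuisines = zip(*labeled)
vec = TfidfVectorizer().fit(texts)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(texts))

# Name each cluster by the most common cuisine among its labeled members.
cluster_to_cuisine = {}
for cluster in set(km.labels_):
    members = [c for c, lab in zip(cuisines, km.labels_) if lab == cluster]
    cluster_to_cuisine[int(cluster)] = Counter(members).most_common(1)[0][0]

# Assign each unlabeled recipe to the nearest learned centroid.
missing_clusters = km.predict(vec.transform(missing))
for text, cluster in zip(missing, missing_clusters):
    print(f"{text!r} -> cluster {cluster} ({cluster_to_cuisine[int(cluster)]})")
```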
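For the Edamam rate-limiter item, a sliding-window limiter makes the limit explicit and easy to tune. This is a generic sketch, not Edamam's client or its documented quota. `max_calls` and `period` are placeholders to set from the plan's real limits. The clock and sleep functions are injectable so the limiter can be tested without actually waiting. On the error-code sub-item: HTTP 429 (Too Many Requests) is the standard status for rate limiting, but whether Edamam returns it still needs to be confirmed.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window.

    Hypothetical helper: the limits passed in are placeholders, not
    Edamam's documented quota.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def _evict(self, now):
        # Forget calls that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._evict(now)
        while len(self._calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self._calls[0]))
            now = self.clock()
            self._evict(now)
        self._calls.append(now)


# Usage: call limiter.wait() right before each API request.
limiter = SlidingWindowLimiter(max_calls=10, period=60.0)
```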