Stream notes
Video
Here’s the VOD:
Below is an embed
Summary
- Intro
- What a week
- Voted
- Coding
- We’re gonna try to fit KMeans on the TF-IDF transformed recipes and see what we get
- This was not a great idea: we have ~3400 dimensions that are largely sparse
- Trying FeatureAgglomeration to cut down on features and then trying KMeans
- All the demo silhouette analysis plots look so perfect…
- Derail/From Chat
- Add it to the training set, Walmart
- From PurplePurrPurrin: Hackers strike Australian defense contractor
- Also from PurplePurrPurrin: Optus and Medibank
- Raided by earend
- MeaLeon redeem tonkatsu (Japanese)
- Raided by maweexy
- Raided by CodingWithStrangers
- MeaLeon redeem for bruschetta (Italian)
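The coding steps from the summary (TF-IDF, then FeatureAgglomeration to cut down on features, then KMeans checked with silhouette scores) could be sketched roughly like this. This is a minimal sketch, not the code from the stream: the toy `recipes` list, the helper name `silhouette_by_k`, and the choices of `n_features` and `k_values` are all assumptions made up for illustration.

```python
from sklearn.cluster import FeatureAgglomeration, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus standing in for the real recipe ingredient lists.
recipes = [
    "pork panko cabbage soy sauce rice",
    "chicken soy sauce ginger rice scallion",
    "tofu miso dashi scallion seaweed",
    "salmon rice seaweed soy sauce wasabi",
    "tomato basil garlic olive oil bread",
    "pasta tomato garlic olive oil parmesan",
    "mozzarella tomato basil olive oil balsamic",
    "risotto parmesan butter mushroom garlic",
]


def silhouette_by_k(texts, n_features, k_values, seed=0):
    """Return {k: mean silhouette score} for KMeans on agglomerated TF-IDF features."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    # Convert to a dense array before agglomerating; fine for a toy corpus,
    # but worth watching memory on a matrix with thousands of columns.
    reduced = FeatureAgglomeration(n_clusters=n_features).fit_transform(tfidf.toarray())
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(reduced)
        scores[k] = silhouette_score(reduced, labels)
    return scores


scores = silhouette_by_k(recipes, n_features=6, k_values=[2, 3, 4])
for k, score in sorted(scores.items()):
    print(f"k={k}: mean silhouette = {score:.3f}")
```

On real recipe data the scores will probably be much less clean than the demo plots, so comparing a few values of `n_features` as well as `k` seems worthwhile.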
Shoutouts
Streamers who were active in chat
To Do
Code
- [ ] Add custom function to look at the term frequency distributions
- [ ] This is even what the scikit-learn docs do…they should just build it into the algorithms…
- [ ] What is a good minimum document frequency for a term and what is a cutoff? 1/number of topics?
- [ ] Look at the example on sklearn for NMF and topic extraction
- [ ] add ingredient information to search result box
- [ ] Make the background tile
- [ ] take another look at streamlit
- [ ] Look up math theory behind t-SNE
- [ ] optuna for automated hyperparameter searching
- [ ] Should .lock and tool-versions be added to .gitignore? I’ve never committed them
- [ ] Refactor to use Pola.rs
- [ ] Write better/more thorough docstrings
- [ ] Look up examples of applying OOP practices to data science
- [ ] Make a PR to update the documentation for how **kwargs are used for sklearn Pipeline (their example and docs are seemingly incorrect)
- [ ] Figure out the ideal process for putting a corpus (a collection of documents) through a sklearn pipeline, considering that you previously got the overall counts and then did TF-IDF on the individual recipes using the CV from the overall corpus.
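For the to-do items about term frequency distributions and a minimum document frequency, a custom function might look something like the sketch below. It uses only the standard library and naive whitespace tokenization; the function names and the toy `docs` list are hypothetical, and the 0.5 cutoff is just an example value, not a recommendation.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        # set() so a term repeated inside one document is counted once
        df.update(set(doc.lower().split()))
    return df


def df_histogram(df):
    """Map each document frequency to the number of terms that have it."""
    return Counter(df.values())


def terms_in_range(df, n_docs, min_df=0.0, max_df=1.0):
    """Keep terms whose fraction of documents lies in [min_df, max_df]."""
    return sorted(t for t, c in df.items() if min_df <= c / n_docs <= max_df)


# Hypothetical toy documents standing in for the recipe corpus.
docs = [
    "pork panko cabbage rice",
    "chicken rice ginger",
    "tomato basil garlic",
    "pasta tomato garlic parmesan",
]

df = document_frequency(docs)
hist = df_histogram(df)
kept = terms_in_range(df, len(docs), min_df=0.5)

print(hist)  # how many terms appear in 1 document, 2 documents, ...
print(kept)  # terms that appear in at least half of the documents
```

Once a cutoff looks reasonable in the histogram, the same thresholds can be passed as the `min_df` and `max_df` parameters of `CountVectorizer` or `TfidfVectorizer`.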
Photo
- [ ] Try to see if the preview pane in Bridge can get shifted left a little
General Stream/Admin
- [ ] How to change wordpress endpoint, specifically /admin
- [ ] Make hand cam a separate scene?
- [ ] Make subgoal buying a wet fart soundboard (thanks chat)
- [ ] CornoZeewo scuffed model
- [ ] Corno mode
- [ ] CuwonoZeewo scuffed mode
- [ ] Need a Klawful knife emote, or a shotgun (thanks R4D4R)
- [ ] Install minecraft
- [X] Check and commit from laptop
- [ ] Update suggested python dev list
- [ ] Add statquest channel
- [ ] 2022.10.14 First ever follow bot attack
- [ ] Expensive redeem: crono tells lore from vtuber chat
- [ ] Add smort command
- [ ] Maweexy wants "Data Da Da", maybe that can be a different mode
- [ ] Intelijens +1
- [ ] !Keyboard command
- [X] Really need to set up a timer for breaks/ads
- [ ] Or just have fewer ads, Crono
- [X] Share the actual data for MeaLeon (remove it from .gitignore)
- [X] Files are too big: free GitHub warns on files over 50MB and blocks files over 100MB
- [X] JSON has been shared to Kaggle
- [ ] Add contributing guidelines to repo
- [X] Use silhouette analysis to determine number of clusters
- [X] Elbow method said 6, my hypothesis was 12
- [ ] Not worthwhile on the t-SNE transformed data
- [ ] Use KMeans to predict cluster for the Missing Cuisine recipes
- [ ] Move to rawer data (see step below)
- [ ] Run clustering on the untransformed dataframes to see what we get out
- [X] Fix cuisine filter (Vietnamese did not exclude Asian or Chinese, want to check "parent" and "sibling" cuisines)
- [ ] Update README
- [ ] Check speed limiter on Edamam, it may be too restrictive
- See if there’s an error code to diagnose
- For next stream
- Classify and/or cluster the missing labels to see how they can augment existing recipe database
- 2022.10.31 Raided 4 times!
- [ ] Migrate to DigitalOcean already
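For the to-do item about using KMeans to predict a cluster for the Missing Cuisine recipes, one possible shape is sketched below. It assumes the model is fit on the labeled recipes and the unlabeled ones are transformed with the same fitted vectorizer. The `labeled` and `missing` lists and `n_clusters=2` are made-up placeholders, not values from the project.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical recipes that already have a cuisine label.
labeled = [
    "pork panko cabbage soy sauce rice",
    "salmon rice seaweed soy sauce wasabi",
    "tomato basil garlic olive oil bread",
    "pasta tomato garlic olive oil parmesan",
]
# Hypothetical recipes whose cuisine is missing.
missing = [
    "chicken soy sauce ginger rice",
    "mozzarella tomato basil olive oil",
]

# Fit the vocabulary and the clusters on the labeled recipes only.
vectorizer = TfidfVectorizer()
X_labeled = vectorizer.fit_transform(labeled)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_labeled)

# Reuse the fitted vectorizer so the columns line up, then assign clusters.
X_missing = vectorizer.transform(missing)
predicted = model.predict(X_missing)

for text, cluster in zip(missing, predicted):
    print(f"cluster {cluster}: {text}")
```

Terms that only appear in the unlabeled recipes are ignored by `transform`, so it may be worth checking how much of each missing recipe the vocabulary actually covers.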