Hmm it’s $600 and for people familiar with JavaScript
Need NMF to perform the dimension reduction on searched recipes so that they can be passed to HDBSCAN, but you don’t need the NMF vector for the similarity analysis
ehhhh….NMF is dimension reduction to important things…
If you have a multi-monitor setup, you can get Moo0 System Monitor: go into the settings and activate “bottleneck”, “burdened by”, “CPU loader”, “thread loader” and “handle loader” to see which process is the culprit that’s killing your system
Also, do NOT have it start with Windows: it needs admin access and it won’t ask for it on auto startup, so many of its sensors won’t work properly then
I recommend that you run “layout horizontal B”. Depending on your hardware, I also recommend that you run 0.5 refresh frequency (I have a Ryzen 1700X, so if you have anything better, you can def. run 0.5 with no issues)
TODO can you do term similarity elimination? So, “mustard” knocks out “spicy mustard”, “yellow mustard”, “brown mustard” or “garlic” knocks out “clove” and “garlic clove”, “olive oil” -> “olives” and “oil”
Could be term pruning (manual or automated)
This actually seems pretty hard: would have to do a similarity comparison within topics and vectors to remove the similar terms
Perhaps redoing the way bag of words works here (you’re allowing n grams to cross into the next item on the list) is a fix
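A minimal sketch of the bag-of-words fix, assuming each recipe arrives as a list of ingredient strings (that input shape and the name `item_ngrams` are assumptions for illustration). N-grams are built inside each list item only, so nothing like “oil garlic” can be formed across two neighbouring items.

```python
def item_ngrams(ingredients, max_n=2):
    """Build 1..max_n word n-grams per ingredient, never across ingredients."""
    grams = []
    for item in ingredients:
        words = item.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                grams.append(" ".join(words[i:i + n]))
    return grams


print(item_ngrams(["Olive oil", "garlic clove"]))
```

If the vectorizer turns out to be scikit-learn's `CountVectorizer`, a callable like this can be passed as its `analyzer` argument, but that would require feeding it the ingredient list per recipe and not one joined string.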
DONE I thought “distance to label” was not a metric in HDBSCAN, but double check
Does not look like it
Success questions
What recipes would you want to be similar?
“Garlic bread”? Maybe that should return “garlic naan” as most similar
What should be midway?
What should be wrong/don’t show as similar?
If the cuisine is too similar, don’t show (Italian pizza vs American pizza)
Edge case: similar dish name
Should there be a minimum number of similar ingredients?
Ideal use case for MeaLeon is “I have these ingredients; what can I make that’s like a recipe I’m used to making, but different?”
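For the minimum-shared-ingredients question above, one possible check is a plain set overlap. The threshold of 2 and the ingredient lists below are placeholders, not real MeaLeon data, and the function names are hypothetical.

```python
def shared_ingredients(a, b):
    """Ingredients that appear in both recipes."""
    return set(a) & set(b)


def passes_minimum(a, b, min_shared=2):
    """True if the two recipes share at least `min_shared` ingredients."""
    return len(shared_ingredients(a, b)) >= min_shared


# Made-up ingredient lists, for illustration only.
garlic_bread = ["garlic", "butter", "bread", "parsley"]
garlic_naan = ["garlic", "butter", "flour", "yogurt"]
print(shared_ingredients(garlic_bread, garlic_naan))
print(passes_minimum(garlic_bread, garlic_naan, min_shared=2))
```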
TODO put data (NMF and/or HDBSCAN) into Learning to Rank ML algo
1) Take k random recipes (like 100), then take n random pairs from those recipes and rank them manually. 2) Then calculate and plot them on the HDBSCAN 2D projection, and also on the 10-dim NMF topics (radar plot), so that you can see physical distances and how well they correlate with your manual rankings, i.e. are Italian pizza and American pizza closer in the NMF topic radar plot than garlic bread and garlic naan?
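A stdlib-only sketch of steps 1) and 2), assuming recipes are identified by integer ids and that each recipe already has a topic vector from somewhere else. All function names here are made up for this note. Spearman rank correlation is my choice for "how well they correlate"; the note itself does not name a measure.

```python
import itertools
import math
import random


def sample_pairs(recipe_ids, k, n, seed=0):
    """Pick k random recipes, then n random unordered pairs among them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(recipe_ids), k)
    all_pairs = list(itertools.combinations(chosen, 2))
    return rng.sample(all_pairs, n)


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation; undefined (division by zero) if either input is constant."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


pairs = sample_pairs(range(100), k=10, n=5)
print(pairs)
```

With manual ranks where 1 means "most similar", feed `spearman` the manual ranks and the matching pair distances: a value near +1 would mean the vector distances agree with the manual ordering, near 0 would mean they do not.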