2023.01.13 – Stream Notes

  • Intro
    • How was your day?
    • Any plans for this weekend?
  • "Coding"
    • Brainstorm how to review recipe similarity for learning to rank recommender systems
    • What is Learning to Rank (LTR)/machine-learned ranking (MLR)?
  • What are we looking to judge for MeaLeon?
    • Is the suggestion even good?
      • E.g., returning hollandaise for buffalo wings is not a good suggestion
    • How different/similar are the suggestions?
      • E.g., if I search for tiramisu, it should not return tiramisu back, even if it’s from a different cuisine
    • How similar are the ingredients?
      • E.g., this recipe looks cool, but I can’t really make it with what I actually have (e.g., chicken thighs vs. chicken breast vs. whole chicken)
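
One quick way to put a number on the "how similar are the ingredients?" question is Jaccard similarity over ingredient sets. This is a hypothetical sketch (the `jaccard` helper and the pantry/recipe data are made up for illustration, not MeaLeon code):

```python
# Hypothetical sketch: score ingredient overlap with Jaccard similarity,
# so "can I make this with what I have?" becomes a number in [0, 1].

def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# "chicken thighs" vs. "chicken breast" don't match exactly, which is
# exactly the kind of mismatch this metric surfaces
pantry = {"chicken thighs", "butter", "garlic", "lemon"}
recipe = {"chicken breast", "butter", "garlic", "thyme"}

print(round(jaccard(pantry, recipe), 2))  # 2 shared of 6 total -> 0.33
```

The obvious catch is exact string matching: ingredients would need normalization ("chicken thigh" vs. "chicken thighs") before a score like this means much.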
  • MLR Notes
    • Since MLR works better with two phases, perhaps the "fast result" can use existing TF-IDF ideas/methods, and MLR can swoop in after
    • For MLR, we use feature vectors; these features typically fall into three groups
      • Static (query-independent) features which are dependent on the document
      • Dynamic (query-dependent) features which depend on the contents of the document and the query (like TF-IDF score)
      • Query-level features (query features), like the number of words in a query
    • LETOR uses:
      • TF, TF-IDF, BM25, and language modeling scores of the document’s zones
      • Lengths and IDF sums of the document’s zones
      • The document’s PageRank
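
To make the three feature groups concrete, here is a sketch that assembles one feature vector per (query, recipe) pair. Every specific feature and the `feature_vector` helper are assumptions for illustration, not MeaLeon's actual features:

```python
# Illustrative sketch of the three MLR feature groups:
# static (query-independent), dynamic (query-dependent), and query-level.

def feature_vector(query: str, recipe: dict) -> list[float]:
    q_terms = query.lower().split()
    doc_terms = recipe["title"].lower().split() + recipe["ingredients"]

    static = [
        float(len(recipe["ingredients"])),   # depends only on the document
    ]
    dynamic = [
        # depends on both query and document: crude term-overlap
        # stand-in for a TF-IDF score
        sum(t in doc_terms for t in q_terms) / max(len(q_terms), 1),
    ]
    query_level = [
        float(len(q_terms)),                 # depends only on the query
    ]
    return static + dynamic + query_level

vec = feature_vector(
    "buffalo wings",
    {"title": "Buffalo Chicken Wings",
     "ingredients": ["chicken wings", "hot sauce", "butter"]},
)
print(vec)  # [3.0, 1.0, 2.0]
```

A real LETOR-style setup would just add more entries per group (BM25, zone lengths, PageRank, etc.) to the same flat vector.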
    • What metric will you use for MeaLeon’s learning to rank?
      • First thought was Mean Average Precision
        • I was interpreting it as how often the recommended recipes are different enough from the searched one (see "what are we looking to judge" above)
        • This would be good for binary judgements
      • DCG is used in academia
      • Can move to expected reciprocal rank (ERR) or Yandex’s pFound
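
For reference, DCG with graded relevance takes only a few lines. The grades below (0–3) are made up, since there is no judgment data for MeaLeon yet:

```python
import math

# Minimal (N)DCG sketch with graded relevance labels; MAP, by contrast,
# only works with binary relevant/not-relevant judgments.

def dcg(rels: list[int]) -> float:
    # gain 2^rel - 1, discounted by log2(rank + 1), ranks starting at 1
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels: list[int]) -> float:
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# five suggestions with hypothetical "how good is this suggestion?" grades
print(round(ndcg([3, 2, 0, 1, 2]), 3))  # -> 0.969
```

NDCG normalizes by the ideal ordering, so a perfectly ordered list scores 1.0 regardless of how many relevant items exist.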
    • Other article of interest
      • Two tower architecture with two neural nets/multilayer perceptrons
        • 1 tower processes the query, the other processes the candidate
        • Query can be a user in a user-2-item scenario (home feed) or an item in an item-2-item scenario (related items recommendation)
      • "How is this problem framed as a machine learning task?"
        • For candidate generation, it’s a supervised multi-class classification task: For each observation in the dataset, we want to accurately output the correct label among a set of possible ones
          • Perhaps we can use a softmax on the cosine similarity to determine the likelihood of falling into the "too similar (cuisine)" etc. label
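
The softmax-over-cosine-similarity idea in the last bullet can be sketched directly. The vectors below are made up; in practice they would come from TF-IDF or the two towers:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)                      # shift by max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

query = [1.0, 0.2, 0.0]              # e.g. a TF-IDF vector for "tiramisu"
candidates = [
    [1.0, 0.2, 0.0],                 # near-duplicate -> "too similar" bucket
    [0.4, 0.9, 0.1],                 # related but different
    [0.0, 0.1, 1.0],                 # unrelated
]

probs = softmax([cosine(query, c) for c in candidates])
print([round(p, 3) for p in probs])  # near-duplicate gets the most mass
```

Thresholding the near-duplicate probability could be one way to drop the tiramisu-returns-tiramisu case before ranking.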
  • Should MeaLeon return more than 5 recipes? Or more than 1 page?
    • Kinda think no
    • Presents too many choices for a simple app
    • Eventually, when user data and history are stored, showing more than 5 choices is cluttered and excessive
      • Maybe have a way to unlock "slow query" if for some reason the user doesn’t like the top 5
  • TODO #MeaLeon show an example recipe for the user’s query
