2024.10.02 – Stream Notes

by awchen
June 16, 2025

Stream Notes
- Doing
  - Work through the Vespa example(s) and see what we can adapt for MeaLeon
    - Text Search Tutorial
      - Goal of this tutorial: build an end-to-end search application that returns relevant documents for a text query
      - Looks like n-grams in Vespa refer to character n-grams, not word-based ones
        
        Vespa docs recommend n-gram matching for languages whose text is not split into word tokens (like some Asian languages)
        
        The docs also say n-grams are generally not useful for text search unless they are needed to increase recall
        
        Later in the tutorial, in the part on how queries and documents get processed, it sounds like it’s better to enforce one language than to allow autodetection, which can struggle with short query strings
      - The rank operator does not change retrieval or matching: the number of documents exposed to ranking is the same as before. It can be used to implement a variety of boosting use cases
        
        Can I use this for ingredient suggestions? Or as a way to restrict clustering of similar cuisines?
        
        Dependent on the ranking algorithm
    - TODO: Look up how to run multiple containers from one Docker image. Having an issue running different tutorials from the same Vespa Docker image
  - Machine Learning subreddit recommended this NLP Newsletter: https://nlp.elvissaravia.com
- From Chat / Derail
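
To make the character vs. word n-gram note from the Text Search Tutorial concrete, here is a minimal plain-Python sketch. It is not Vespa's implementation. The function names `char_ngrams` and `word_ngrams` and the sample strings are made up for illustration, and the gram sizes are arbitrary.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams; no tokenization is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping word n-grams after naive whitespace splitting."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Character 3-grams of a single word.
tomato_grams = char_ngrams("tomato", 3)
print(tomato_grams)

# Word 2-grams of a short phrase.
print(word_ngrams("sun dried tomato", 2))

# A partial string still shares grams with the full word, which is
# the "increased recall" effect mentioned in the notes.
shared = set(char_ngrams("mato", 3)) & set(tomato_grams)
print(sorted(shared))
```

The flip side is that unrelated words can share grams too, which is probably why the docs call n-grams generally not useful for regular text search.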
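
For the rank operator idea in the notes above, a rough sketch of what an ingredient-boosting query body could look like. Assumptions: `ingredients` is a hypothetical indexed field in a MeaLeon schema, `build_rank_query` is a made-up helper, and the sample query text is invented. In YQL, only the first operand of `rank()` decides which documents match; the other operands only expose rank features, so whether the boost changes ordering depends on the rank profile.

```python
import json


def build_rank_query(user_text: str, boost_field: str, boost_term: str) -> dict:
    """Build a Vespa query body that uses rank() to boost without filtering.

    Hypothetical helper: boost_field must exist in the schema, and
    boost_term is not escaped here, so only pass trusted plain strings.
    """
    yql = (
        "select * from sources * where "
        f'rank(userQuery(), {boost_field} contains "{boost_term}")'
    )
    # userQuery() picks up the text passed in the "query" parameter.
    return {"yql": yql, "query": user_text}


body = build_rank_query("weeknight pasta", "ingredients", "basil")
print(json.dumps(body, indent=2))
```

The dict is only built here, not sent anywhere. It would still need to be posted to a running Vespa instance's search endpoint to see whether the boost does anything useful for ingredient suggestions.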

Tags: Developer Tools, MeaLeon, NLP, Python