--- Summary:

  • Private chat / QA over docs at ~25 tokens/s with 13B Llama-v2 (on a Mac M2 Max GPU).
  • Using the @trychroma vector DB, @nomic_ai GPT4All embeddings, and Llama-v2. Full recipe added to the @LangChainAI docs: https://t.co/amzJ9ZcfeE

--- Full Article:

Author: Lance Martin
Profile: https://twitter.com/RLanceMartin
Source: https://x.com/RLanceMartin/status/1681733901042196482

--- Embedded Post (converted):

Private chat / QA over docs at ~25 tokens/s with 13B Llama-v2 (on a Mac M2 Max GPU). Using the @trychroma vector DB, @nomic_ai GPT4All embeddings, and Llama-v2.

Full recipe added to the @LangChainAI docs: https://t.co/amzJ9ZcfeE pic.twitter.com/KNgNdxRDqB

— Lance Martin (@RLanceMartin) July 19, 2023