From LLM to Embedding: The Future of Model Training

Published On Tue May 07 2024

Qompass | LinkedIn

Cost Conscious AI Services is pleased to share this: convert your favorite LLM into an embedding model with LLM2Vec, then perform RAG with your model for both generation and retrieval. Conversion follows a simple 2-step process:

  1. Load the desired LLM into the llm2vec package to convert it into an embedding model.
  2. Train with MNTP (masked next token prediction, a self-supervised objective) or unsupervised SimCSE for better results on retrieval tasks.
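
For readers new to MNTP, the data preparation can be sketched as follows. This is a minimal illustration based on the LLM2Vec paper's description, in which a masked token at position i is predicted from the representation at position i-1. It is not the llm2vec package's own code, and `build_mntp_example`, `IGNORE_INDEX`, and the token IDs are hypothetical names and values chosen for the example.

```python
# Label value that loss functions such as PyTorch's CrossEntropyLoss skip by default.
IGNORE_INDEX = -100


def build_mntp_example(token_ids, mask_positions, mask_token_id):
    """Build (inputs, labels) for masked next token prediction.

    labels[j] is the target for the logits at position j. For a masked
    position i, the original token is stored at labels[i - 1], so the model
    predicts it from the previous position's representation.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for pos in mask_positions:
        if pos <= 0 or pos >= len(token_ids):
            raise ValueError("mask positions must be between 1 and len(token_ids) - 1")
        inputs[pos] = mask_token_id          # hide the token from the model
        labels[pos - 1] = token_ids[pos]     # previous position must predict it
    return inputs, labels


# Hypothetical token IDs; 0 stands in for the mask token.
example_inputs, example_labels = build_mntp_example(
    token_ids=[11, 12, 13, 14, 15],
    mask_positions=[2, 4],
    mask_token_id=0,
)
```

Every position that is not a prediction target keeps `IGNORE_INDEX`, so only the masked tokens contribute to the loss.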

I tried embedding model conversion on Llama 3 with a single A100 GPU, training with MNTP on a 100k subset of Cosmopedia.
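
Once a model produces per-token vectors, the retrieval half of RAG reduces to pooling those vectors into one embedding per text and ranking documents by similarity. The sketch below shows that idea in plain Python with made-up 2-dimensional vectors. Mean pooling and cosine similarity are one common choice, assumed here for illustration; the function names are hypothetical and none of this is the llm2vec API.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real (non-padding) tokens into one embedding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, doc_embeddings, top_k=1):
    """Return indices of the top_k documents, most similar first."""
    ranked = sorted(
        range(len(doc_embeddings)),
        key=lambda i: cosine_similarity(query_embedding, doc_embeddings[i]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: three token vectors, the last one is padding and is masked out.
hidden_states = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
query = mean_pool(hidden_states, mask)

docs = [[0.0, 1.0], [2.0, 1.0], [-1.0, 0.0]]
best = retrieve(query, docs, top_k=2)
```

In a real pipeline the toy vectors would be replaced by the converted model's hidden states, and the retrieved documents would be passed to the same LLM as context for generation.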

Find out more here:

Embracing the Human Element in AI

Announcing a new annotated data set on @huggingface of all U.S. bills - approximately 119,000 bills - since the 108th Congress. What makes this dataset special is that it includes labels (*policy area* and *legislative subjects*) that have been painstakingly annotated over the last 20 years by expert analysts at the Congressional Research Service of the Library of Congress.

Revised framework for automated Bill of Materials generation

Are you inspired to train a model to do better?

The essential question this dataset poses is "what is this bill 'about'?", which is a challenge when selecting from thousands of subject areas to classify a bill that could run to more than 500 pages. This task takes a great deal of *judgment*. Now, with this training set, it should be possible to apply traditional classification approaches, fine-tune an LLM, or try any number of other ML approaches.
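
As one example of a traditional classification approach, here is a minimal bag-of-words naive Bayes sketch in plain Python. The four training texts and the two labels are invented for illustration and are not taken from the dataset; a real experiment would load the bill texts and their *policy area* labels instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; deliberately simple."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples, not real bill text.
toy_examples = [
    ("a bill to expand rural hospital funding", "Health"),
    ("a bill to lower prescription drug prices", "Health"),
    ("a bill to promote solar energy production", "Energy"),
    ("a bill to modernize the electric power grid", "Energy"),
]
model = train_naive_bayes(toy_examples)
health_guess = predict(model, "funding for hospital drug programs")
energy_guess = predict(model, "solar power grid upgrades")
```

A baseline like this gives a reference point against which a fine-tuned LLM can be compared on the same labels.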

It is a relevant problem because the Library of Congress continues to label bills by hand, and automation could allow them to process bills a whole lot faster. If you're interested, grab the dataset from Hugging Face, or get in touch to collaborate!

Find out more here: https://lnkd.in/gfeZ4tBn

Transforming LLMs into Embedding Powerhouses with LLM2Vec

Excited to announce Microsoft's collaboration with Seedrs to bring you SERIES AI; the disruptive AI and funding accelerator for Seed to Series A founders!

Key Dates:

  • 📅 Applications close: 10th May 2024
  • 🚀 Accelerator commences: 13th May 2024
  • 🎉 Pitching Demo Day: 17th June 2024
  • 📍 Pitching Demo Day Location: Microsoft Reactor in London Paddington

See you there!

Find out more and apply here: https://lnkd.in/eEArAqP2


Ghana Cedis Currency (GC3558) dataset acquisition process

A quick look at what it means to be environmentally conscious, as seen in the example set by @meta AI with its open reporting of CO2 emissions from the Llama 3 training process.
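
To make such reporting concrete, a common back-of-the-envelope estimate multiplies GPU hours by power draw, a data-center overhead factor (PUE), and the grid's carbon intensity. The sketch below assumes that simple formula; it is not necessarily how Meta computed its figures, and every input value is a hypothetical placeholder rather than a reported number.

```python
def estimate_training_emissions_kg(gpu_hours, gpu_power_watts, pue, carbon_intensity_kg_per_kwh):
    """Estimate CO2-equivalent emissions in kilograms for a training run.

    energy (kWh) = GPU hours * GPU power (kW) * PUE
    emissions (kg) = energy (kWh) * carbon intensity (kg CO2eq per kWh)
    """
    energy_kwh = gpu_hours * (gpu_power_watts / 1000.0) * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# Hypothetical inputs chosen only to show the arithmetic.
emissions_kg = estimate_training_emissions_kg(
    gpu_hours=1000,
    gpu_power_watts=400,
    pue=1.1,
    carbon_intensity_kg_per_kwh=0.4,
)
```

With these placeholder inputs the run uses about 440 kWh and emits roughly 176 kg of CO2 equivalent; halving the carbon intensity of the grid halves the estimate.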