How to run LLMs locally on your machine

One of the questions I had when I first started working with LLMs was how to do local development that supports quick prototyping without worrying much about cost (e.g., while still prototyping a GenAI use case, I need to test the engineering pipeline many times, which can mean many prompt requests == “$$”) or data privacy (e.g., what if I’m not comfortable sending my datasets and chat histories through OpenAI or Hugging Face API calls to remotely hosted LLMs?).

In particular, I have tried out GPT4ALL and Ollama. Here are my notes and some documentation links:

GPT4ALL (GitHub)

  • This was the very first approach I tried. It was easy to set up, and it also provides a chat client (front-end component) available for download. I only tried the backend API by installing gpt4all and langchain (a minimal install sketch follows). After that, we can download and try the different LLM models on the GPT4ALL leaderboard.
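
    For reference, both packages install straight from PyPI (a minimal sketch; if you hit compatibility errors, pin versions per the 2024-03-18 note below):

    pip install gpt4all langchain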

  • We can directly run GPT4All models with Python using gpt4all (documentation) or using LangChain (documentation):
    from langchain_community.llms import GPT4All
    from langchain.chains import LLMChain
    from langchain.prompts import PromptTemplate
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    # Path to the model weights file downloaded from GPT4ALL (stays local)
    local_path = 'path to where you save the model bin file'
    # Stream generated tokens to stdout as they arrive
    llm = GPT4All(model=local_path, callbacks=[StreamingStdOutCallbackHandler()])

    # Pass-through template with a single {question} placeholder
    template = """{question}"""

    prompt = PromptTemplate.from_template(template)
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    question = "What would be a good time to eat lunch?"

    llm_chain.run(question)
    
  • 2024-03-18 edits: The above code now throws a pydantic validation error (similar issues are raised here). After some research online, this appears to be a compatibility issue: the code used to work with pydantic==1.10.0, langchain==0.0.320, and gpt4all==2.0.0, but my environment now uses pydantic==2.6.4 and langchain==0.1.12.
    pydantic.v1.error_wrappers.ValidationError: 1 validation error for GPT4All __root__ -> __root__
    Serializable.__init__() takes 1 positional argument but 2 were given (type=type_error)
    

    Therefore, it seems further experimentation with package versions is needed before using gpt4all through LangChain; a LangChain-free alternative is sketched below.
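
    One way to sidestep the LangChain compatibility issue entirely is to call the gpt4all Python bindings directly. Below is a minimal sketch based on the gpt4all 2.x documented API; the model filename is only an example, so substitute the model file you actually downloaded:

    from gpt4all import GPT4All

    # Loads the model from the local cache, downloading it on first use.
    # "orca-mini-3b-gguf2-q4_0.gguf" is an example filename; replace it
    # with the model you downloaded.
    model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

    # Generation happens fully locally; no remote API calls are made.
    output = model.generate("What would be a good time to eat lunch?", max_tokens=128)
    print(output)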

Ollama (GitHub)

  • Very easy to set up. I followed the quickstart steps to install Ollama on my Mac, then ran the Ollama Docker image to start a local container (commands sketched below). After these steps, I was able to pull or run the different LLMs available in their model library.
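
    For reference, the container setup was roughly the following (flags as I recall them from the Ollama README, so treat them as assumptions rather than authoritative):

    # Start the Ollama server in a container, persisting models in a named volume
    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

    # Pull a model into the running container
    docker exec -it ollama ollama pull llama2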

  • We can run Ollama models directly from the terminal after setup:
    ollama run llama2 "What would be a good time to eat lunch?"
    
  • If we need to develop further in Python, Ollama is also well integrated with LangChain:
    from langchain_community.llms import Ollama
    from langchain_community.chat_models import ChatOllama
    from langchain_core.messages import HumanMessage

    # Completion-style interface: plain string in, string out
    llm = Ollama(model="llama2")
    # Chat-style interface: list of messages in, AIMessage out
    chat_model = ChatOllama(model="llama2")

    text = "What would be a good time to eat lunch?"
    messages = [HumanMessage(content=text)]

    llm.invoke(text)
    chat_model.invoke(messages)
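
    Under the hood, both wrappers talk to the local Ollama server over HTTP (port 11434 by default). If we would rather skip LangChain, here is a minimal sketch of calling the REST API directly (assuming the requests package is installed):

    import requests

    # The Ollama server listens on localhost:11434 by default
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama2",
            "prompt": "What would be a good time to eat lunch?",
            "stream": False,  # return one JSON object instead of a token stream
        },
    )
    print(resp.json()["response"])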
    
  • If we would like to use Ollama with a front-end interface, there are other GitHub repos that have already implemented the integration, which we can leverage. I’ll document my findings in another journal!

In general, I found Ollama easier to set up and to start my own experiments with locally.