Nvidia’s new tool lets you run GenAI models on your PC

Nvidia, always keen to encourage the purchase of the latest GPUs, is releasing a tool that allows owners of GeForce RTX 30-series and 40-series cards to run AI-powered chatbots offline on their Windows PCs.

The tool, called Chat with RTX, lets users customize a GenAI model along the lines of OpenAI’s ChatGPT by connecting it to documents, files, and notes that it can then query.

“Rather than searching through notes or saved content, users can simply type queries,” Nvidia writes in a blog post. “For example, one could ask, ‘What was the restaurant my partner recommended while in Las Vegas?’ and Chat with RTX will scan local files the user points it to and provide the answer with context.”

Chat with RTX defaults to AI startup Mistral’s open source model but also supports other text-based models, such as Meta’s Llama 2. Nvidia warns that downloading all the necessary files will use a significant amount of storage space: 50 GB to 100 GB, depending on the model or models selected.

Chat with RTX currently works with text, PDF, .doc, .docx, and .xml formats. When you point the app at a folder that contains supported files, the files are loaded into the model’s fine-tuning dataset. Additionally, Chat with RTX can take the URL of a YouTube playlist and load transcriptions of the videos in the playlist, allowing whichever model is selected to query their contents.
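As a rough illustration of the folder step, the sketch below gathers the files in a folder whose extensions match the formats listed above. It is not Nvidia’s code: the function name `find_supported_files` is hypothetical, treating “text” as `.txt` is an assumption, and whether Chat with RTX descends into subfolders is not stated, so the recursive search here is also an assumption.

```python
from pathlib import Path

# Extensions based on the formats named in the article.
# Assumption: "text" means plain .txt files.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".doc", ".docx", ".xml"}


def find_supported_files(folder):
    """Return a sorted list of files under `folder` with a supported extension.

    Hypothetical helper for illustration only. The recursive search is an
    assumption; the real app may only look at the top level of the folder.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_supported_files` on a notes folder would return the paths an app like this could load, while skipping images and other unsupported files.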

However, there are some limitations to be aware of, which Nvidia thankfully details in a how-to guide.

Chat with RTX does not remember context, which means the app will not consider previous questions when answering follow-up questions. For example, if you ask, “What is a common bird in North America?” and then “What color is it?”, Chat with RTX won’t know that you’re talking about birds.
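To make the limitation concrete, here is a minimal sketch of the difference between sending a follow-up question on its own and sending it together with the earlier exchange. It does not reflect how Chat with RTX is built; `build_prompt` is a hypothetical helper, and the earlier answer (“The American robin.”) is a placeholder, not real app output.

```python
def build_prompt(question, history=None):
    """Join earlier turns (if any) and the new question into one prompt string.

    Hypothetical helper for illustration; `history` is a list of
    (question, answer) pairs from earlier in the conversation.
    """
    lines = []
    for earlier_question, earlier_answer in history or []:
        lines.append(f"User: {earlier_question}")
        lines.append(f"Assistant: {earlier_answer}")
    lines.append(f"User: {question}")
    return "\n".join(lines)


# Without context: the follow-up arrives alone, so "it" refers to nothing.
stateless = build_prompt("What color is it?")

# With context: the earlier turn travels with the follow-up question.
with_history = build_prompt(
    "What color is it?",
    history=[("What is a common bird in North America?", "The American robin.")],
)
```

In the first case the model sees only “What color is it?”, which matches the behavior Nvidia describes. In the second, the word “bird” is present in the prompt, which is what an app that kept context would be supplying.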

Nvidia also acknowledges that the relevance of the app’s answers can be affected by a variety of factors, some easier to control than others, including the wording of the question, the performance of the selected model, and the size of the fine-tuning dataset. Querying facts contained in multiple documents can yield better results than asking for a summary of a single document or set of documents. Larger datasets generally improve response quality as well, as does pointing Chat with RTX at more content on a specific topic, Nvidia says.

So Chat with RTX is more of a toy than something that can be used in a production environment. Still, there’s something to be said for apps that make it easy to run AI models locally. This is a growing trend.

In a recent report, the World Economic Forum predicted “dramatic” growth in affordable devices that can run GenAI models offline, including PCs, smartphones, Internet of Things devices, and networking equipment. The reason, according to the WEF, is the clear benefits: not only are offline models inherently more private, since the data they process never leaves the device they run on, but they also have lower latency and are more cost-effective than cloud-hosted models.

Of course, democratizing the tools for running and training models opens the door to malicious actors. A quick Google search turns up many listings for models fine-tuned on malicious content from the internet. But proponents of apps like Chat with RTX argue that the benefits outweigh the drawbacks.