Nvidia's local, private AI chatbot is a high-profile step toward cloud independence.
True enough, but there are use cases where the echo chamber doesn't have to be completely restrictive. With a broad-ranging and somewhat well-curated source of data, this could be a useful system for searching and synthesizing a significant body of information.
Well, locally is the only way I'd be willing to use an AI in most cases, but doesn't this approach make it an echo chamber of my own making? I'd be producing an AI entirely influenced by my own biases.
I think you may be misunderstanding here. The model runs locally, as in it isn't pinging the cloud or forwarding your requests or results anywhere. But it's still using a starting model (well, any of several models), then connecting it to your files as a dataset. You're not locally training the full model from scratch.
Well, locally is the only way I'd be willing to use an AI in most cases, but doesn't this approach make it an echo chamber of my own making? I'd be producing an AI entirely influenced by my own biases.
If you check out the linked video, it looks like Ollama is an option along with Mistral for this. It seems to pretty much be a wrapper for those models with a nice GUI, so you can play with them in Windows without using WSL.
I'm actually more interested in this open source project https://github.com/openlm-research/open_llama
As the training data is being shared by the community. That is huge. Which leads into the question: can this use those training models?
I was hired to relay AI news quickly and briefly every day, and sadly that doesn't give me a chance to go in-depth on every topic. Just today there are 6-7 topics that are highly newsworthy that I can't hit due to time constraints. I wish I could. It's frustrating that I can't. Many times the news is about things I can't personally use or test, but people still need to know about them, and the news cycle moves very quickly. So I opt to try to hit them briefly so you can know about them, and then I hope to go in-depth on other topics strategically, where time allows.
That was good where the article at the end had the real-world critical test report, but Ars should move beyond this "Product Announcement" trend of LLM articles/coverage. (I almost said "news coverage" there instead of "article coverage", but in our current world nobody can or will come up with "news" that isn't corporate announcements of products...when surely there must be more to discuss, i.e. articles on a topic and critical analyses etc. that are not necessarily "news" and aren't PR messaging.) Right now on the front page there are "NVIDIA CEO said something" and "NVIDIA has new product" headlines, and this isn't unusual.
Marketing is a different thing. Journalism isn't supposed to just be the marketing under a different org name. We come to learn what the marketers and CEO won't say in a carefully prepared statement boasting about their newest products and tailored exclusively toward their own benefit, not the public's.
You'll end up with another you!
Well, locally is the only way I'd be willing to use an AI in most cases, but doesn't this approach make it an echo chamber of my own making? I'd be producing an AI entirely influenced by my own biases.
SLI FTW!
Does it support multiple GPUs?
Well, locally is the only way I'd be willing to use an AI in most cases, but doesn't this approach make it an echo chamber of my own making? I'd be producing an AI entirely influenced by my own biases.
I may be misunderstanding. You download an existing LLM to use as the backend, but it's also my understanding that you can point it to sources of your own choosing. Are those sources only meant to be searched, not also ingested? I suppose I always thought that whatever an LLM is searching, it is also adding to its training data, but that may not be the case (or may depend on the model).
I think you may be misunderstanding here. The model runs locally, as in it isn't pinging the cloud or forwarding your requests or results anywhere. But it's still using a starting model (well, any of several models), then connecting it to your files as a dataset. You're not locally training the full model from scratch.
This type of model does not learn or change on the fly. Apart from temporary state, it will not update at all from how it was originally trained unless it is explicitly fine-tuned on new data.
You'll end up with another you!
Chat With RTX works on Windows PCs equipped with NVIDIA GeForce RTX 30 or 40 Series GPUs with at least 8GB of VRAM. From the article
No? I think you're misunderstanding how this is useful if you're thinking of it as an echo chamber.
Well, locally is the only way I'd be willing to use an AI in most cases, but doesn't this approach make it an echo chamber of my own making? I'd be producing an AI entirely influenced by my own biases.
Is there any chance that some targeted deep-dives could become a somewhat more frequent thing? Y'all do a good job with going broad and have a few solid articles about AI/ML as a whole. But I'd dearly love to see articles that go over some of these tools/projects in more detail.
I was hired to relay AI news quickly and briefly every day, and sadly that doesn't give me a chance to go in-depth on every topic. Just today there are 6-7 topics that are highly newsworthy that I can't hit due to time constraints. I wish I could. It's frustrating that I can't. Many times the news is about things I can't personally use or test, but people still need to know about them, and the news cycle moves very quickly. So I opt to try to hit them briefly so you can know about them, and then I hope to go in-depth on other topics strategically, where time allows.
I'm a big fan of presenting information and letting readers decide without necessarily telling them what to think. There are exceptions to that, of course, where I take a harder tack on something, but I provide critical perspectives where appropriate, including in this particular article, which does not simply parrot a press release. Would Nvidia marketing want you to know that its fancy new Chat with RTX demo app crashes frequently, is an ungainly mess of dependencies, feels like an open source re-skin, and is rough around the edges? I doubt it. But this article lets you know, so we're not just an extension of marketing.
Sigh
It's funny that this article is next to this one:
AI-powered romantic chatbots are a privacy nightmare
https://meincmagazine.com/ai/2024/02/ai-powered-romantic-chatbots-are-a-privacy-nightmare/
Perhaps Nvidia's solution solves this problem...
lol, exactly what I was thinking, only 2 min faster than me!
I'm worried that even locally run, that being a vended product from a company means that it will have to have telemetry to support whatever PM in charge of this needs to show that it's driving sales of GPUs. And that usually doesn't stop there. They'll want to know what users are doing with it so they can "make it better" and sell more GPUs.
If GeForce Experience is any guide, there will be plenty of telemetry data shared. Maybe it can be opted out of, though...
I'm worried that even locally run, that being a vended product from a company means that it will have to have telemetry to support whatever PM in charge of this needs to show that it's driving sales of GPUs. And that usually doesn't stop there. They'll want to know what users are doing with it so they can "make it better" and sell more GPUs.
Everything they do drives sales of their HW.
I'm confused as to what NVIDIA gets out of this... Is it just supposed to drive sales of their hardware? If so that's fine and admirable, but is everything truly local or is the software phoning home every so often?
Training models is very computationally intensive. However, you can give a trained model some text and it will "understand" it, and you can ask questions about it. As I understand it, there's usually a pretty limited amount it can remember like that.
I may be misunderstanding. You download an existing LLM to use as the backend, but it's also my understanding that you can point it to sources of your own choosing. Are those sources only meant to be searched, not also ingested? I suppose I always thought that whatever an LLM is searching, it is also adding to its training data, but that may not be the case (or may depend on the model).
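To make the "searched, not ingested" distinction concrete, here is a minimal sketch of the retrieval idea in Python. It is only an illustration under loud assumptions: the function names are made up for this example, the scoring is plain keyword overlap (real tools typically use embedding similarity instead), and nothing here reflects how Chat with RTX is actually implemented. The point is that the model's weights never change; a few relevant snippets are just pasted into the prompt, and a size budget limits how much fits.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_chunks(question, chunks, budget_words):
    """Pick the chunks sharing the most words with the question.

    Hypothetical stand-in for a real retriever: chunks are ranked by
    keyword overlap and added until the word budget (a rough proxy for
    the model's limited context window) is used up.
    """
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        if not q_words & tokenize(chunk):
            break  # everything after this point is irrelevant to the question
        size = len(chunk.split())
        if used + size > budget_words:
            continue  # does not fit in the remaining budget, try a smaller one
        picked.append(chunk)
        used += size
    return picked


def build_prompt(question, chunks, budget_words=50):
    """Assemble the text that would be sent to an unchanged, pretrained model."""
    context = "\n".join(select_chunks(question, chunks, budget_words))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    notes = [
        "The RTX 3060 has 12 GB of VRAM.",
        "Bananas are yellow.",
        "VRAM holds the model weights during inference.",
    ]
    print(build_prompt("How much VRAM does the RTX 3060 have?", notes))
```

Delete a file from the source folder and the model "forgets" it immediately, because it was never trained on it in the first place.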
I'm sorry, but this post is nonsense. That's not how LLMs work at all. There are no built-in timeouts or anything of the sort. The compute power you have has no effect on the quality that an LLM outputs. It affects the speed and nothing more. I've studied and been active in the LLM scene for a number of years now. So I can say that with confidence.
Unless the LLM has built-in timeouts to stop itself from vanishing down recursive rabbit holes and getting lost in semantic weeds, and running it on below spec hardware means those timeouts fire after much less processing power has been spent, leading to low-quality output results.
While I disagree with WereCatf's conduct, they are not actually wrong about the importance of VRAM. The biggest bottleneck that LLMs face is not compute, but memory bandwidth. For each token that an LLM generates, each layer of the model has to be processed, which essentially means the entire model has to be read through once for each token.
No, you have no idea how to troubleshoot a problem. You say "Oh, they got really slow when they exceeded my VRAM." That doesn't necessarily mean that's the problem, it just means that you can investigate that lead.
For all you know, when the models get larger, the amount of computation required increases significantly, and the VRAM isn't the main issue.
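The memory-bandwidth point above can be put into rough numbers. The sketch below assumes the simplest possible model of it: every weight is read from memory once per generated token, so bandwidth divided by model size gives a ceiling on tokens per second. The model size and bandwidth figures are round, made-up illustrations rather than measurements of any particular card, and real throughput will land below the ceiling.

```python
def max_tokens_per_second(model_size_gb, bandwidth_gb_per_s):
    """Upper bound on generation speed if all weights are read once per token."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_per_s / model_size_gb


if __name__ == "__main__":
    # Assumed example: 7 billion parameters at 4 bits (0.5 bytes) each.
    model_gb = 7e9 * 0.5 / 1e9

    # Assumed, rounded bandwidth figures for illustration only.
    for label, bandwidth in [("weights in VRAM", 400.0), ("weights in system RAM", 50.0)]:
        ceiling = max_tokens_per_second(model_gb, bandwidth)
        print(f"{label}: at most {ceiling:.1f} tokens/s")
```

Under these assumptions the same model is several times slower once its weights spill out of VRAM into system RAM, without any change in the amount of computation per token.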
LM Studio is great, but it is not in fact open source. They have a GitHub account, but all they store there is their config files and model catalog. The app itself is entirely closed source.
https://lmstudio.ai/ is open source, supports more models, and isn't hardware-locked to Nvidia. (It even runs on Apple Silicon.)
Good to know! Thanks for the correction. llama.cpp is indeed the real deal here, at least as far as open source goes. I don't think I understood they weren't coming from the same place, which is probably how the people behind LM Studio want it.
LM Studio is great, but it is not in fact open source. They have a GitHub account, but all they store there is their config files and model catalog. The app itself is entirely closed source.
It is built on top of the open source llama.cpp project, but llama.cpp is MIT licensed, not GPL, which is why LM Studio can use it without sharing any code themselves.
While I could agree with you, this is no different than the announcements from, well, everyone else. All such announcements are promotional. How the hell would anyone know these things are here, or coming, if they weren't announced? Is that totally advertising? No. The message here is ALSO to investors. Kind of a two-fer in that respect by telling the investors, and the public, what's on tap.
That was good where the article at the end had the real-world critical test report, but Ars should move beyond this "Product Announcement" trend of LLM articles/coverage. (I almost said "news coverage" there instead of "article coverage", but in our current world nobody can or will come up with "news" that isn't corporate announcements of products...when surely there must be more to discuss, i.e. articles on a topic and critical analyses etc. that are not necessarily "news" and aren't PR messaging.) Right now on the front page there are "NVIDIA CEO said something" and "NVIDIA has new product" headlines, and this isn't unusual.
Marketing is a different thing. Journalism isn't supposed to just be the marketing under a different org name. We come to learn what the marketers and CEO won't say in a carefully prepared statement boasting about their newest products and tailored exclusively toward their own benefit, not the public's.
I have a later model RTX 30 series with 12 GB VRAM, so I may check it out when it has more polish. I set up an older model than this for my occasional brain-storming chats, but it's kind of complicated. If they have a simpler process, and it works with Linux, then it may be worthwhile to try.
Chat With RTX works on Windows PCs equipped with NVIDIA GeForce RTX 30 or 40 Series GPUs with at least 8GB of VRAM. It uses a combination of retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM software, and RTX acceleration to enable generative AI capabilities directly on users' devices.
Take a look at Ollama and the web-UI for it. With the web-UI installed, it's quite literally point-and-click to install different models and choose which one you want to use for which chat. It runs on e.g. a Linux server and can be installed natively or via e.g. Docker.
I have a later model RTX 30 series with 12 GB VRAM, so I may check it out when it has more polish. I set up an older model than this for my occasional brain-storming chats, but it's kind of complicated. If they have a simpler process, and it works with Linux, then it may be worthwhile to try.
Like I kept seeing people complain that Alan Wake 2 wouldn't run on their "expensive GPU"...from 2018. And then be angry at the developers and say they're lazy and just "not optimizing."
There are other ways to run these models. Unless you're intent on using this specific wrapper, you might as well use one of the alternatives.
I got the hardware to run this (30xx series GPU with 16GB VRAM and 32GB RAM). But I will need to upgrade to Windows 11, since that's stated as required at the download site.
Damn it, finally I may be forced to upgrade since I want to try this out...