I tried setups that used Ollama. Even with an RTX 3070 (8GB) and a 3080 (10GB), I wasn't able to use any models for tool calling unless Ollama offloaded a considerable amount of work to the CPU, which slowed everything to a crawl.
I'm considering getting a 5090 (32GB) to try again with more recent models like GLM 4.7.
What are you looking to do?
