You may need to use the `gpu_memory_limit` and/or `lora_on_cpu` config options to avoid running out of memory. If you still run out of CUDA memory, you can try to merge in system RAM instead.
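A minimal sketch of how these options might appear in a config file, assuming the YAML config format used by the surrounding project (values shown are illustrative, not recommendations):

```yaml
# Cap GPU memory usage during the merge (illustrative value)
gpu_memory_limit: 20GiB

# Load and merge the LoRA weights on CPU instead of GPU
lora_on_cpu: true
```

If merging on CPU, hiding the GPUs from the process (e.g. by setting `CUDA_VISIBLE_DEVICES` to an empty string before launching the merge) is a common way to force the merge into system RAM.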