{"id":2300,"date":"2025-02-11T14:44:18","date_gmt":"2025-02-11T05:44:18","guid":{"rendered":"https:\/\/blog.peddals.com\/?p=2300"},"modified":"2025-11-15T22:39:20","modified_gmt":"2025-11-15T13:39:20","slug":"fine-tune-vram-size-of-mac-for-llm","status":"publish","type":"post","link":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/","title":{"rendered":"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)"},"content":{"rendered":"\n<p>When using a large language model (LLM) locally, the key point to pay attention to is how to run it at 100% GPU usage, that is, how to fit everything into VRAM (GPU memory). If the model overflows from VRAM, it can cause a decrease in response speed, make the entire OS laggy, and in the worst case, crash the OS.<\/p>\n\n\n\n<p>When using a local LLM, the combination of the parameter size and quantization size of the model that can be run, as well as the context length available for use, is generally determined by the capacity of the Unified Memory installed on an Apple Silicon Mac. This article will share methods to exceed the &#8220;set&#8221; limitations through some deeper settings, optimizing the processing speed and usable context length of local LLMs. If your Mac has a larger amount of Unified Memory installed, it becomes possible to run multiple LLMs or even larger models (= with higher performance) that were previously difficult to execute.<\/p>\n\n\n\n<p>Fine-tuning a generative AI model is not something amateurs can easily undertake, but since &#8220;environmental fine-tuning&#8221; is involved, you can easily try it out and see results right away. This covers the basics, so even if you&#8217;re a beginner you should give it a read if interested.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">First, let&#8217;s find out the model size that works on your Mac<\/h2>\n\n\n\n<p>Mac&#8217;s Unified Memory can be accessed by both the CPU and GPU, but there is a set proportion that the GPU can use. 
Based on some posts I&#8217;ve seen on forums, if no settings have been changed, for Unified Memory of <strong>64GB or more<\/strong>, it seems that <strong>up to 3\/4 (75%)<\/strong> can be used by the GPU; for <strong>less than 64GB, about 2\/3 (approximately 66%)<\/strong> can be utilized. Since my Mac has 32GB RAM installed, this means the GPU can use up to 21.33GB of it. If <a href=\"https:\/\/lmstudio.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">LM Studio<\/a> is installed, you can check the hardware resources (Command + Shift + H), where VRAM will show something like the screenshot below.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"374\" height=\"242\" src=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/Memory-Capacity.png\" alt=\"\" class=\"wp-image-2092\" style=\"width:207px;height:auto\" srcset=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/Memory-Capacity.png 374w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/Memory-Capacity-300x194.png 300w\" sizes=\"auto, (max-width: 374px) 100vw, 374px\" \/><\/figure>\n\n\n\n<p>When you see &#8220;Likely too large&#8221; in red while downloading a model in LM Studio, it is telling you that the model is too big for your VRAM capacity. 
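<\/p>\n\n\n\n<p>That rule of thumb can be turned into a quick estimate of the default VRAM limit for your own RAM size. Here is a minimal sketch (the function name is mine, and the 2\/3 vs. 3\/4 split is the forum-reported figure, not an official one):<\/p>\n\n\n\n

```shell
# Estimate the default GPU (VRAM) limit macOS applies, using the
# forum-reported rule of thumb: 3/4 of RAM at 64GB or more, ~2/3 below that.
estimate_vram_mb() {
  ram_gb=$1
  if [ "$ram_gb" -ge 64 ]; then
    echo $((ram_gb * 1024 * 3 / 4))
  else
    echo $((ram_gb * 1024 * 2 / 3))
  fi
}

estimate_vram_mb 32    # -> 21845 MB, i.e. the 21.33GB LM Studio reports
estimate_vram_mb 64    # -> 49152 MB (48GB)
```

\n\n\n\n<p>On a real Mac you could feed in the installed RAM via <code>sysctl -n hw.memsize<\/code> (bytes) divided by 1073741824.<\/p>\n\n\n\n<p>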
The following screenshot shows the <a href=\"https:\/\/huggingface.co\/mlx-community\/DeepSeek-R1-Distill-Llama-70B-8bit\" target=\"_blank\" rel=\"noreferrer noopener\">DeepSeek R1<\/a> 70B model in 8-bit quantized MLX format: at 74.98GB, LM Studio is warning that it may not work in this environment.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"846\" height=\"100\" src=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image.png\" alt=\"\" class=\"wp-image-2093\" style=\"width:502px;height:auto\" srcset=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image.png 846w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-300x35.png 300w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-768x91.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p>In <a href=\"https:\/\/ollama.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Ollama<\/a>, a similar value is written as <code>recommendedMaxWorkingSetSize<\/code> in the log file. 
Below are the outputs from my environment (server2.log was the latest log file):<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">%<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">grep<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">~\/.ollama\/logs\/server2.log<\/span><span style=\"color: #D4D4D4\">|<\/span><span style=\"color: #DCDCAA\">tail<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #569CD6\">-5<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span 
style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Looking for a model of usable size (for beginners)<\/h2>\n\n\n\n<p>Just because a model 
you want to use is smaller than your VRAM does not mean it will be usable. The prompts you input and the text the LLM outputs also consume VRAM. Therefore, even a model that is itself 21GB, just under my 21.33GB limit, won&#8217;t run smoothly, because no VRAM is left over for the context. To find a model that fits within your actual VRAM, try models of plausible sizes one by one, based on the following two points.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Look to models with <strong>fewer parameters<\/strong> (if 140B or 70B are not feasible, consider 32B \u2192 14B \u2192 7B, etc.)<\/li>\n\n\n\n<li>Search for <strong>quantized<\/strong> models (such as 8bit, 4bit for MLX or Q8, Q4_K_M, etc. for GGUF format models)<\/li>\n<\/ol>\n\n\n\n<p>Models with a smaller number of parameters tend to be created by distilling the original larger model or training it on less data; the goal is to reduce the amount of knowledge while minimizing the degradation in features and performance. Depending on the capabilities and use cases of the models themselves, many popular ones these days are usable at around 10 to 30 billion parameters. With fewer parameters, the computation (inference) time also becomes shorter.<\/p>\n\n\n\n<p>The other factor, &#8220;quantization&#8221;, reduces the size of a model with a different approach. Though the analogy is not entirely accurate, you can think of it as lowering the resolution or color depth of an image: it shrinks the model to a point where performance degradation is barely noticeable. Quantization also increases processing speed. Generally, it is said that with 8-bit or Q8 quantization, the benefits of faster processing and smaller size outweigh the small loss in performance. 
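<\/p>\n\n\n\n<p>A back-of-the-envelope way to relate these two knobs to file size: a model weighs roughly parameters &#215; bits per weight &#247; 8 bytes. A sketch (the helper name is mine; real GGUF\/MLX files add metadata and keep some tensors at higher precision, so treat the result as a lower bound):<\/p>\n\n\n\n

```shell
# Rough model file size: parameters x bits-per-weight / 8 bytes.
# Real GGUF/MLX files are somewhat larger (metadata, mixed-precision tensors).
model_size_gb() {
  params_b=$1   # parameter count in billions
  bits=$2       # bits per weight (16, 8, 4, ...)
  echo $((params_b * bits * 1000000000 / 8 / 1024 / 1024 / 1024))
}

model_size_gb 70 8    # -> 65 (GB); the actual 8-bit MLX file above is 74.98GB
model_size_gb 32 4    # -> 14 (GB); roughly why a 32B Q4_K_M can fit in ~21GB
```

\n\n\n\n<p>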
The model size decreases as the number gets smaller, but so does performance; around 4-bit or Q4_K_M is generally considered the minimum threshold for maintaining decent performance (the last letters S\/M\/L in GGUF format stand for Small\/Medium\/Large sizes).<\/p>\n\n\n\n<p>After trying several downloads, you&#8217;ll find the maximum model size you can use on your Mac. In my case, with models that offer multiple parameter sizes, I download one that pushes the limit, such as 32B Q4_K_M, and also an F16 or Q8 build of a smaller size like 14B.<\/p>\n\n\n\n<p>Please note that when choosing a vision model, VLM, or so-called multimodal model, it is better to select one that is even smaller than you would for a language model (LLM). Tasks such as reading an image and determining what is depicted often require more VRAM, since images tend to be larger than text.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Download and use LLMs<\/h2>\n\n\n\n<p>LM Studio allows you to download directly via the Download button and chat through its GUI. For Ollama, after selecting a model on the <a href=\"https:\/\/ollama.com\/search\" target=\"_blank\" rel=\"noreferrer noopener\">Models<\/a> page, choose the parameter count and quantization from the dropdown menu if offered, then download and run it from Terminal.app (<code>ollama run modelname<\/code>). Both applications can function as API servers, allowing you to use downloaded models from other apps. I often use <a href=\"https:\/\/dify.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">Dify<\/a>, which makes it easy to create AI applications. For how to use the APIs of Ollama and LM Studio via Dify, please check my posts below. (Japanese only for now. 
I&#8217;ll translate in the near future.)<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-peddals-blog wp-block-embed-peddals-blog\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"9FTsoAoBNy\"><a href=\"https:\/\/blog.peddals.com\/run-dify-and-ollama-on-separate-macs\/\">Dify \u3068 Ollama \u3092\u5225\u3005\u306e Mac \u3067\u52d5\u304b\u3059\u30ed\u30fc\u30ab\u30eb LLM \u74b0\u5883<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Dify \u3068 Ollama \u3092\u5225\u3005\u306e Mac \u3067\u52d5\u304b\u3059\u30ed\u30fc\u30ab\u30eb LLM \u74b0\u5883&#8221; &#8212; Peddals Blog\" src=\"https:\/\/blog.peddals.com\/run-dify-and-ollama-on-separate-macs\/embed\/#?secret=3y0mcAQEBQ#?secret=9FTsoAoBNy\" data-secret=\"9FTsoAoBNy\" width=\"525\" height=\"296\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Japanese only for now. 
Get it translated by your web browser.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-peddals-blog wp-block-embed-peddals-blog\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"aaKOZ9IWXm\"><a href=\"https:\/\/blog.peddals.com\/pixtral-mlx-lm-studio-mac\/\">Mac \u306e\u30ed\u30fc\u30ab\u30eb\u30aa\u30f3\u30ea\u30fc\u74b0\u5883\u3067\u3001\u753b\u50cf\u8a8d\u8b58 AI \u306e Pixtral 12B MLX \u7248\u3092\u4f7f\u3046 (LM Studio \u7de8)<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Mac \u306e\u30ed\u30fc\u30ab\u30eb\u30aa\u30f3\u30ea\u30fc\u74b0\u5883\u3067\u3001\u753b\u50cf\u8a8d\u8b58 AI \u306e Pixtral 12B MLX \u7248\u3092\u4f7f\u3046 (LM Studio \u7de8)&#8221; &#8212; Peddals Blog\" src=\"https:\/\/blog.peddals.com\/pixtral-mlx-lm-studio-mac\/embed\/#?secret=tqSz4fFBre#?secret=aaKOZ9IWXm\" data-secret=\"aaKOZ9IWXm\" width=\"525\" height=\"296\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Japanese only for now. Get it translated by your web browser.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What is the Context Length<\/h2>\n\n\n\n<p>&#8220;Context length&#8221; refers to the size of the text (actually tokens) exchanged between a user and an LLM during chat. It seems that this varies by model (tokenizer), with Japanese being approximately 1 character = 1+\u03b1 tokens, and English being about <strong>1 word = 1 (+\u03b1) token(s)<\/strong>. 
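<\/p>\n\n\n\n<p>Tokens are not just a unit of text: each token of context also consumes VRAM through the model&#8217;s KV cache, which grows linearly with context length. Here is a rough fp16 sketch (the function name and the architecture numbers are hypothetical; the real layer count, KV-head count, and head dimension differ per model):<\/p>\n\n\n\n

```shell
# Rough fp16 KV-cache cost of a context window:
# 2 (K and V) x layers x kv_heads x head_dim x 2 bytes x tokens.
kv_cache_mb() {
  ctx=$1; layers=$2; kv_heads=$3; head_dim=$4
  echo $((2 * layers * kv_heads * head_dim * 2 * ctx / 1024 / 1024))
}

kv_cache_mb 4096 64 8 128     # -> 1024 MB for a 4k context
kv_cache_mb 32768 64 8 128    # -> 8192 MB for 32k: eight times the VRAM
```

\n\n\n\n<p>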
Additionally, each model has a maximum context length it can handle, which you can check using the <code>ollama show modelname<\/code> command in Ollama or by clicking on the gear icon next to the model name in My Models on the left side in LM Studio.<\/p>\n\n\n\n<p>When chatting with Ollama from the terminal, the default context length seems to be 2048, and when chatting within the app using LM Studio, it is 4096. If you want to handle longer texts, you need to change the model settings or specify them via the API. Note that <strong>increasing the context length requires more VRAM capacity<\/strong>, and <strong>if it overflows, performance will slow down<\/strong>. I have documented the solution in the following article.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-peddals-blog wp-block-embed-peddals-blog\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"Ks9EJWiJCR\"><a href=\"https:\/\/blog.peddals.com\/en\/optimize-context-size-for-ollama-server\/\">A solution for slow LLMs on Ollama server when accessing from Dify or Continue<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;A solution for slow LLMs on Ollama server when accessing from Dify or Continue&#8221; &#8212; Peddals Blog\" src=\"https:\/\/blog.peddals.com\/en\/optimize-context-size-for-ollama-server\/embed\/#?secret=INZwKG0gTS#?secret=Ks9EJWiJCR\" data-secret=\"Ks9EJWiJCR\" width=\"525\" height=\"296\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">If the Japanese page opens, click &#8220;English&#8221; on the right-hand side.<\/figcaption><\/figure>\n\n\n\n<p>The article you are currently reading explains how to fine-tune macOS itself. 
<strong>This allows for increasing the amount of VRAM (allocation) that can be used by the GPU, enabling the use of larger models and handling longer contexts.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Check Activity Monitor for resource usage<\/h2>\n\n\n\n<p>First, let&#8217;s confirm that the model is performing well by checking system resource usage while the LLM is running. This can be done using Activity Monitor in the Utilities folder on macOS. If memory pressure remains stable in the green and the GPU stays at Max status, AI operations are being conducted within the hardware capacity limits of your Mac. Even if memory pressure is yellow, it is acceptable as long as it is steady without fluctuations. Below is an example from running deepseek-r1:32b Q4_K_M on Ollama from Dify (the low load on CPU and GPU is due to other applications).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"672\" height=\"505\" src=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-4.png\" alt=\"\" class=\"wp-image-2098\" style=\"width:449px;height:auto\" srcset=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-4.png 672w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-4-300x225.png 300w\" sizes=\"auto, (max-width: 672px) 100vw, 672px\" \/><figcaption class=\"wp-element-caption\">When the model was loaded and working<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"663\" height=\"504\" src=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-2.png\" alt=\"\" class=\"wp-image-2096\" style=\"width:447px;height:auto\" srcset=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-2.png 663w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-2-300x228.png 300w\" sizes=\"auto, (max-width: 663px) 100vw, 663px\" 
\/><figcaption class=\"wp-element-caption\">Once inference was complete, Ollama released memory usage.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"671\" height=\"505\" src=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-5.png\" alt=\"\" class=\"wp-image-2100\" style=\"width:447px;height:auto\" srcset=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-5.png 671w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/image-5-300x226.png 300w\" sizes=\"auto, (max-width: 671px) 100vw, 671px\" \/><figcaption class=\"wp-element-caption\">Even when the memory pressure is yellow but flat, LLM and macOS are working stably.<\/figcaption><\/figure>\n\n\n\n<p>Also, the <code>ollama ps<\/code> command shows how much memory the model is using and the CPU\/GPU load. In the following example, the model occupies 25GB and is running 100% on the GPU.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">%<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">ollama<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">ps<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">NAME<\/span><span style=\"color: #D4D4D4\">               <\/span><span style=\"color: #CE9178\">ID<\/span><span style=\"color: #D4D4D4\">              <\/span><span style=\"color: #CE9178\">SIZE<\/span><span style=\"color: #D4D4D4\">     <\/span><span style=\"color: #CE9178\">PROCESSOR<\/span><span style=\"color: #D4D4D4\">    <\/span><span style=\"color: #CE9178\">UNTIL<\/span><span style=\"color: #D4D4D4\">               <\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">deepseek-r1:32b<\/span><span style=\"color: #D4D4D4\">    <\/span><span style=\"color: #B5CEA8\">38056<\/span><span style=\"color: #CE9178\">bbcbb2d<\/span><span style=\"color: #D4D4D4\">    <\/span><span style=\"color: #B5CEA8\">25<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">GB<\/span><span style=\"color: #D4D4D4\">    <\/span><span style=\"color: #B5CEA8\">100<\/span><span style=\"color: #CE9178\">%<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">GPU<\/span><span style=\"color: #D4D4D4\">     <\/span><span style=\"color: #B5CEA8\">29<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">minutes<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">from<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">now<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Fine-tuning (1) Increase the usable VRAM capacity<\/h2>\n\n\n\n<p>The blog post above describes how to manipulate context length so as not to exceed the VRAM size specified for macOS (66% or 75% of unified memory). 
Below, I will explain how to change this limitation and increase the amount of VRAM available to the GPU. This setting is likely to be effective on Macs with more than 32GB of RAM, and the larger the installed RAM, the bigger the gain (with 128GB of RAM, the standard 96GB of VRAM can be increased to 120GB!).<\/p>\n\n\n\n<p>One note: the command I am introducing is only valid for macOS 15.0 and above. There seems to be another command that works with earlier versions, but since I haven&#8217;t tried it myself, I won&#8217;t cover it here. Also, obviously, you cannot specify more than your actual RAM size (referenced from: <a href=\"https:\/\/ml-explore.github.io\/mlx\/build\/html\/python\/_autosummary\/mlx.core.metal.set_wired_limit.html\" target=\"_blank\" rel=\"noreferrer noopener\">mlx.core.metal.set_wired_limit<\/a>). On the plus side, settings made with this command revert to the default when your Mac restarts, so you can try them with almost no risk.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to change, check, and reset VRAM capacity<\/h3>\n\n\n\n<p>Before making changes, let&#8217;s decide how much VRAM to allocate to the GPU. A good approach is to reserve what your frequently used apps need and assign the remaining RAM to the GPU. If you&#8217;re unsure, you could keep 8GB (the minimum RAM size of Macs up to M3) for the CPU and allocate all the rest to VRAM (that&#8217;s what I did). The unit for the setting is MB (megabytes), so multiply the number of gigabytes by 1024. In my case, since I want 24GB of VRAM out of 32GB total minus 8GB for the CPU, I allocate 24 * 1024 = 24576. 
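<\/p>\n\n\n\n<p>That arithmetic is trivial, but worth wrapping in a helper if you plan to experiment with several values (the function name is mine, for illustration):<\/p>\n\n\n\n

```shell
# Convert a desired GPU allocation from GB to the MB value that
# the iogpu.wired_limit_mb sysctl key (macOS 15+) expects.
gb_to_mb() { echo $(($1 * 1024)); }

gb_to_mb 24    # -> 24576
gb_to_mb 120   # -> 122880 (e.g. on a 128GB machine)
```

\n\n\n\n<p>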
The command would look like this, but you should change 24576 to your desired allocation value and execute it:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>sudo sysctl iogpu.wired_limit_mb=24576<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 
2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sysctl<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">iogpu.wired_limit_mb=<\/span><span style=\"color: #B5CEA8\">24576<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#1E1E1E;color:#c7c7c7;font-size:12px;line-height:1;position:relative\">Zsh<\/span><\/div>\n\n\n\n<p>Example:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">%<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sysctl<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: 
#CE9178\">iogpu.wired_limit_mb=<\/span><span style=\"color: #B5CEA8\">24576<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">Password:<\/span><span style=\"color: #D4D4D4\"> (input <\/span><span style=\"color: #CE9178\">password<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">if<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">required<\/span><span style=\"color: #D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">iogpu.wired_limit_mb:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">0<\/span><span style=\"color: #D4D4D4\"> -&gt; <\/span><span style=\"color: #B5CEA8\">24576<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>This will be reflected immediately. In LM Studio, you just need to quit and relaunch it, then open Command + Shift + H to see the set VRAM size.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"368\" height=\"240\" src=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/32.00-GB.png\" alt=\"\" class=\"wp-image-2122\" style=\"width:186px;height:auto\" srcset=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/32.00-GB.png 368w, https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/32.00-GB-300x196.png 300w\" sizes=\"auto, (max-width: 368px) 100vw, 368px\" \/><figcaption class=\"wp-element-caption\">It was 21.33 GB previously, so gained 2.67 GB!<\/figcaption><\/figure>\n\n\n\n<p>Check Ollama log after running an LLM to see the new VRAM size (although it is not the specified value, you can see the increased value):<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 
2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">%<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">grep<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">~\/.ollama\/logs\/server2.log<\/span><span style=\"color: #D4D4D4\">|<\/span><span style=\"color: #DCDCAA\">tail<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span 
style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">22906.50<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #569CD6\">*<\/span><span style=\"color: #CE9178\">before<\/span><span style=\"color: #569CD6\">*<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">25769.80<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #569CD6\">*<\/span><span style=\"color: #CE9178\">now<\/span><span style=\"color: #569CD6\">*<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">25769.80<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">25769.80<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">ggml_metal_init:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">recommendedMaxWorkingSetSize<\/span><span style=\"color: #D4D4D4\">  <\/span><span style=\"color: #CE9178\">=<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">25769.80<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">MB<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Here are a couple more related 
commands:<\/p>\n\n\n\n<p>Check the current value:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">%<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sysctl<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">iogpu.wired_limit_mb<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">Password:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">iogpu.wired_limit_mb:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">24576<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D4D4D4\">(<\/span><span style=\"color: #DCDCAA\">Default<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">is<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">zero<\/span><span style=\"color: 
#D4D4D4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">iogpu.wired_limit_mb:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">0<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Set the value to default:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">%<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">sysctl<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">iogpu.wired_limit_mb=<\/span><span style=\"color: #B5CEA8\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">Password:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">iogpu.wired_limit_mb:<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">24576<\/span><span style=\"color: #D4D4D4\"> -&gt; <\/span><span style=\"color: 
#B5CEA8\">0<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>If something goes wrong with this setting, simply reboot the Mac, and <strong>it will revert to the default value<\/strong>.<\/p>\n\n\n\n<p>If the new setting works well, you may want to keep the larger VRAM capacity even after rebooting. You can do so by adding the setting to the <code>\/etc\/sysctl.conf<\/code> file with the commands below. Replace the number in the last line with the size you want to specify. Note, however, that a value greater than the installed RAM capacity causes an error and cannot be set, so proceed carefully to avoid a failed startup.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#1E1E1E\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#D4D4D4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>sudo touch \/etc\/sysctl.conf\nsudo 
chown root:wheel \/etc\/sysctl.conf\nsudo chmod 0644 \/etc\/sysctl.conf\necho \"iogpu.wired_limit_mb=24576\" >> \/etc\/sysctl.conf<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dark-plus\" style=\"background-color: #1E1E1E\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #DCDCAA\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">touch<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">\/etc\/sysctl.conf<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">chown<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">root:wheel<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">\/etc\/sysctl.conf<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">sudo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">chmod<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #B5CEA8\">0644<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">\/etc\/sysctl.conf<\/span><\/span>\n<span class=\"line\"><span style=\"color: #DCDCAA\">echo<\/span><span style=\"color: #D4D4D4\"> <\/span><span style=\"color: #CE9178\">&quot;iogpu.wired_limit_mb=24576&quot;<\/span><span 
style=\"color: #D4D4D4\"> &gt;&gt; <\/span><span style=\"color: #CE9178\">\/etc\/sysctl.conf<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#1E1E1E;color:#c7c7c7;font-size:12px;line-height:1;position:relative\">Zsh<\/span><\/div>\n\n\n\n<p>After rebooting, if the value set by <code>sudo sysctl iogpu.wired_limit_mb<\/code> is as expected, you are done. If you want to manually reset it to the default value, use <code>sudo sysctl iogpu.wired_limit_mb=0<\/code>. To completely revert to the default settings, remove the added line from <code>\/etc\/sysctl.conf<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 2 is now available.<\/h2>\n\n\n\n<p>Actually, I was planning to include the settings for Ollama&#8217;s K\/V cache in this article as well, but it has become quite long, so I wrote it in a different post below. By configuring the K\/V cache (and Flash attention), you can reduce the VRAM usage while minimizing the performance degradation of the LLM, and also improve processing speed. 
<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-peddals-blog wp-block-embed-peddals-blog\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"kzFc90wux7\"><a href=\"https:\/\/blog.peddals.com\/en\/ollama-vram-fine-tune-with-kv-cache\/\">Optimizing Ollama VRAM Settings for Using Local LLM on macOS (Fine-tuning: 2)<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Optimizing Ollama VRAM Settings for Using Local LLM on macOS (Fine-tuning: 2)&#8221; &#8212; Peddals Blog\" src=\"https:\/\/blog.peddals.com\/en\/ollama-vram-fine-tune-with-kv-cache\/embed\/#?secret=oV9I66neGV#?secret=kzFc90wux7\" data-secret=\"kzFc90wux7\" width=\"525\" height=\"296\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Image by Stable Diffusion (Mochi Diffusion)<\/summary>\n<p>&#8220;Growing juicy apple&#8221; or &#8220;apple started shining&#8221; would be closer to the image I had in mind, but none of the generated images satisfied me. 
Finally, this simple prompt generated an image that looked fine.<\/p>\n\n\n\n<p>Date:<br>2025-1-29 23:50:07<\/p>\n\n\n\n<p>Model:<br>realisticVision-v51VAE_original_768x512_cn<\/p>\n\n\n\n<p>Size:<br>768 x 512<\/p>\n\n\n\n<p>Include in Image:<br>an apple turning shiny red<\/p>\n\n\n\n<p>Exclude from Image:<\/p>\n\n\n\n<p>Seed:<br>3293091901<\/p>\n\n\n\n<p>Steps:<br>20<\/p>\n\n\n\n<p>Guidance Scale:<br>20.0<\/p>\n\n\n\n<p>Scheduler:<br>DPM-Solver++<\/p>\n\n\n\n<p>ML Compute Unit:<br>CPU &amp; GPU<\/p>\n<\/details>\n","protected":false},"excerpt":{"rendered":"<p>When using a large language model (LLM) locally, the key point to pay attention to is how to run it at 100% GP &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":2135,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/blog.peddals.com\/?p=2091","footnotes":""},"categories":[25,8],"tags":[12],"class_list":["post-2300","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","category-macos","tag-mac","en-US"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1) | Peddals Blog<\/title>\n<meta name=\"description\" content=\"Mac fine-tuning (1), increasing VRAM allocation for local LLMs\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/\" \/>\n<meta 
property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1) | Peddals Blog\" \/>\n<meta property=\"og:description\" content=\"Mac fine-tuning (1), increasing VRAM allocation for local LLMs\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/\" \/>\n<meta property=\"og:site_name\" content=\"Peddals Blog\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-11T05:44:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-15T13:39:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/an-apple-turning-shiny-red.505.3293091901.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"768\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Handsome\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Handsome\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"34 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/\"},\"author\":{\"name\":\"Handsome\",\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/#\\\/schema\\\/person\\\/81b2dabca748c3d11a45722f02d9d994\"},\"headline\":\"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)\",\"datePublished\":\"2025-02-11T05:44:18+00:00\",\"dateModified\":\"2025-11-15T13:39:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/\"},\"wordCount\":2099,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/#\\\/schema\\\/person\\\/81b2dabca748c3d11a45722f02d9d994\"},\"image\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/blog.peddals.com\\\/wp-content\\\/uploads\\\/2025\\\/01\\\/an-apple-turning-shiny-red.505.3293091901.jpg\",\"keywords\":[\"mac\"],\"articleSection\":[\"AI\\\/LLM\",\"macOS\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/\",\"url\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/\",\"name\":\"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1) | Peddals 
Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/blog.peddals.com\\\/wp-content\\\/uploads\\\/2025\\\/01\\\/an-apple-turning-shiny-red.505.3293091901.jpg\",\"datePublished\":\"2025-02-11T05:44:18+00:00\",\"dateModified\":\"2025-11-15T13:39:20+00:00\",\"description\":\"Mac fine-tuning (1), increasing VRAM allocation for local LLMs\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#primaryimage\",\"url\":\"https:\\\/\\\/blog.peddals.com\\\/wp-content\\\/uploads\\\/2025\\\/01\\\/an-apple-turning-shiny-red.505.3293091901.jpg\",\"contentUrl\":\"https:\\\/\\\/blog.peddals.com\\\/wp-content\\\/uploads\\\/2025\\\/01\\\/an-apple-turning-shiny-red.505.3293091901.jpg\",\"width\":768,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/en\\\/fine-tune-vram-size-of-mac-for-llm\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u30db\u30fc\u30e0\",\"item\":\"https:\\\/\\\/blog.peddals.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/#website\",\"url\":\"https:\\\/\\\/blog.peddals.com\\\/\",\"name\":\"Peddals Blog\",\"description\":\"AI, LLM, Python, Mac, Pythonista3, iOS, etc. 
in Japanese and English\",\"publisher\":{\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/#\\\/schema\\\/person\\\/81b2dabca748c3d11a45722f02d9d994\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/blog.peddals.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/blog.peddals.com\\\/#\\\/schema\\\/person\\\/81b2dabca748c3d11a45722f02d9d994\",\"name\":\"Handsome\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g\",\"caption\":\"Handsome\"},\"logo\":{\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1) | Peddals Blog","description":"Mac fine-tuning (1), increasing VRAM allocation for local LLMs","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/","og_locale":"en_US","og_type":"article","og_title":"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1) | Peddals Blog","og_description":"Mac fine-tuning (1), increasing VRAM allocation for local LLMs","og_url":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/","og_site_name":"Peddals Blog","article_published_time":"2025-02-11T05:44:18+00:00","article_modified_time":"2025-11-15T13:39:20+00:00","og_image":[{"width":768,"height":512,"url":"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/an-apple-turning-shiny-red.505.3293091901.jpg","type":"image\/jpeg"}],"author":"Handsome","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Handsome","Est. 
reading time":"34 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#article","isPartOf":{"@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/"},"author":{"name":"Handsome","@id":"https:\/\/blog.peddals.com\/#\/schema\/person\/81b2dabca748c3d11a45722f02d9d994"},"headline":"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)","datePublished":"2025-02-11T05:44:18+00:00","dateModified":"2025-11-15T13:39:20+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/"},"wordCount":2099,"commentCount":0,"publisher":{"@id":"https:\/\/blog.peddals.com\/#\/schema\/person\/81b2dabca748c3d11a45722f02d9d994"},"image":{"@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/an-apple-turning-shiny-red.505.3293091901.jpg","keywords":["mac"],"articleSection":["AI\/LLM","macOS"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/","url":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/","name":"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1) | Peddals 
Blog","isPartOf":{"@id":"https:\/\/blog.peddals.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#primaryimage"},"image":{"@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/an-apple-turning-shiny-red.505.3293091901.jpg","datePublished":"2025-02-11T05:44:18+00:00","dateModified":"2025-11-15T13:39:20+00:00","description":"Mac fine-tuning (1), increasing VRAM allocation for local LLMs","breadcrumb":{"@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#primaryimage","url":"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/an-apple-turning-shiny-red.505.3293091901.jpg","contentUrl":"https:\/\/blog.peddals.com\/wp-content\/uploads\/2025\/01\/an-apple-turning-shiny-red.505.3293091901.jpg","width":768,"height":512},{"@type":"BreadcrumbList","@id":"https:\/\/blog.peddals.com\/en\/fine-tune-vram-size-of-mac-for-llm\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u30db\u30fc\u30e0","item":"https:\/\/blog.peddals.com\/"},{"@type":"ListItem","position":2,"name":"Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)"}]},{"@type":"WebSite","@id":"https:\/\/blog.peddals.com\/#website","url":"https:\/\/blog.peddals.com\/","name":"Peddals Blog","description":"AI, LLM, Python, Mac, Pythonista3, iOS, etc. 
in Japanese and English","publisher":{"@id":"https:\/\/blog.peddals.com\/#\/schema\/person\/81b2dabca748c3d11a45722f02d9d994"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.peddals.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/blog.peddals.com\/#\/schema\/person\/81b2dabca748c3d11a45722f02d9d994","name":"Handsome","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g","caption":"Handsome"},"logo":{"@id":"https:\/\/secure.gravatar.com\/avatar\/51d7363349ec538c4d62c9ebe89488fd7388729ad0c9dfeebd8bb32ebfb11f17?s=96&d=mm&r=g"}}]}},"_links":{"self":[{"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/posts\/2300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/comments?post=2300"}],"version-history":[{"count":15,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/posts\/2300\/revisions"}],"predecessor-version":[{"id":3203,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/posts\/2300\/revisions\/3203"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/media\/2135"}],"wp:attachment":[{"href":"https:\/\/blog.peddals.co
m\/wp-json\/wp\/v2\/media?parent=2300"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/categories?post=2300"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.peddals.com\/wp-json\/wp\/v2\/tags?post=2300"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}