Peddals Blog

2025-06-17

Apply local aliases and exports in pipenv/Python

This article explains how to automatically load a configuration file (similar to .zshrc) containing aliases and exports from the current directory whenever zsh (or bash) starts. The Python virtual environment tool pipenv launches a new shell, so the aliases and other settings are applied automatically when you enter it. When you exit pipenv, that shell ends and its settings go with it, so they do not affect your base shell or other environments. It is easy to implement and works well, but, probably because pipenv isn’t that popular, I could not easily find this exact information. Hope you love it.

Contents

1 Environment
2 Steps
3 Similar method for venv
- 3.1 Difference with venv
4 Why I needed this

Environment

Shell: zsh (bash seems to work as well.)

Python virtual environment: pipenv

OS: macOS Sequoia (should not matter)

Steps

To use pipenv on macOS, first install it with brew install pipenv. Creating a virtual environment then takes just a few steps:

mkdir my_project # Create a project directory.
cd my_project # Go into the directory.
pipenv --python 3.11 # Create a virtual environment with Python 3.11
pipenv shell # Enter the virtual env. To exit, enter "exit" or press ctrl + D

Add a line to ~/.zshrc.

Add the following to the .zshrc in your home directory.

# Load .zshrc.local if it exists in the current directory.
[[ -f .zshrc.local ]] && source .zshrc.local

The left side of && tests whether a .zshrc.local file exists in the current directory. If it does, the source command on the right side runs and loads the file. (I haven’t tried it, but the same syntax should apparently work in bash as well.)
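Because the line above sources any .zshrc.local in whatever directory a shell starts in, a slightly stricter variant may be worth considering. The following is only a sketch: load_local_rc is a hypothetical helper name, and it assumes that pipenv exports PIPENV_ACTIVE=1 in the shells it spawns. It is written in plain POSIX style so it should behave the same in zsh and bash.

```shell
# Hypothetical helper for ~/.zshrc (not part of the original setup).
# Assumption: pipenv sets PIPENV_ACTIVE=1 inside "pipenv shell".
load_local_rc() {
  # Only load the local file when we are inside a pipenv shell
  # and the file exists in the current directory.
  if [ "${PIPENV_ACTIVE:-}" = "1" ] && [ -f ./.zshrc.local ]; then
    . ./.zshrc.local
  fi
}

# Run once at shell startup, exactly where the one-liner would go.
load_local_rc
```

With this variant, opening an ordinary terminal inside the project directory leaves your base shell untouched, and only pipenv shell picks up the local aliases and exports.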

Place .zshrc.local in the root of the pipenv project directory.

You can add anything you could write in .zshrc, such as alias or export lines. Here is a simple sample for reference:

alias t='time'
export HW="Hello, World!"

Enter the virtual environment and test.

Example of commands and outputs:

$ pipenv shell # Enter the virtual env.
$ t # alias of the time command
(Output of the time command appears here.)

$ echo $HW
Hello, World!

Exit from the virtual environment and test.

Example of commands and outputs:

$ exit # or ctrl + D to exit.
$ t
zsh: command not found: t

$ echo $HW

(An empty line appears here.)

Note

If the project will be published on GitHub or elsewhere, be sure to add .zshrc.local to .gitignore.
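As a small convenience, the note above can be turned into a helper that adds the entry only once. This is a sketch; ignore_local_rc is a hypothetical name, and it assumes an existing .gitignore ends with a newline.

```shell
# Hypothetical helper: add .zshrc.local to a project's .gitignore exactly once.
ignore_local_rc() {
  dir=${1:-.}
  # grep flags: -q quiet, -x match the whole line, -F treat the pattern literally.
  # If the entry is missing (or .gitignore does not exist yet), append it.
  grep -qxF '.zshrc.local' "$dir/.gitignore" 2>/dev/null ||
    printf '%s\n' '.zshrc.local' >> "$dir/.gitignore"
}

# Usage, from the project root:
#   ignore_local_rc .
```

Running it a second time changes nothing, so it is safe to keep in a project setup script.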

Similar method for venv

The most popular virtual environment tool, venv, does not launch a new shell, so similar functionality has to be implemented another way.

Add the following to the bottom of bin/activate in your venv directory. It is the same line that was added to ~/.zshrc above.

# Load .zshrc.local if it exists in the current directory.
[[ -f .zshrc.local ]] && source .zshrc.local

It involves one more step than using pipenv, but this allows you to do roughly the same thing.

Difference with venv

With venv, no new shell is launched; .zshrc.local is simply loaded into your current shell. As a result, the aliases and exports remain in effect even after you leave the environment with deactivate. It is probably best to close the terminal session afterwards to avoid potential conflicts.

Why I needed this

Recently, I have been running mlx_lm.server to use MLX versions of LLMs. Unlike with Ollama, memory often isn’t released (memory pressure stays high), so I have no choice but to stop the server with Ctrl + C each time and restart it from the CLI. However, it was stressful not being able to re-run it immediately with a single press of the up arrow key when I had been entering commands in another terminal window. That’s why I thought of creating an alias that is valid only in the pipenv environment.

I couldn’t find the solution on the web as easily as I expected, but after consulting local QwQ, Qwen3, and ChatGPT, I eventually figured it out by myself. When I asked each LLM for an evaluation, they praised me with “Great!” and made me happy. That was the motivation for this blog post. Haha!

Image by Stable Diffusion (Mochi Diffusion)

I couldn’t imagine what kind of image would go well with this post, so I just got a showroom with various bicycles generated. Still not sure if the image is a good match.

Date:
2025-6-14 19:47:15

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
showroom with different types of bycicles

Exclude from Image:

Seed:
1251791658

Steps:
20

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & GPU

2025-02-11 / 2025-03-03

Optimizing Ollama VRAM Settings for Using Local LLM on macOS (Fine-tuning: 2)

As of January 2025, there are settings for acceleration and VRAM optimization available for trial use in Ollama. It seems that both may become standard settings soon, but since the latest version 0.5.7 at the time of writing requires users to set them up themselves, I will share how to do so.

For those using local LLMs on Apple Silicon Mac (M series CPU), please also check out the previous article. It introduces how to allocate memory to the Mac’s GPU.

Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)

Contents

1 Environment
2 Official sources of information
3 Fine-tuning (2) Reduce VRAM Usage and Increase Speed with Flash Attention
4 Fine-tuning (3) Reduce VRAM Usage by K/V Cache Quantization
5 Set Variables Permanently
6 Super Helpful Tool – Interactive VRAM Estimator
7 Last Miscellaneous Notes

Environment

These are Ollama settings, so they should not depend on the OS, but I only cover how to apply them on macOS. Also, there seem to be other ways to install Ollama, such as building from source, using brew, or running it in Docker, but I don’t know how to configure it without the app, so please look into that yourself. Sorry.

macOS: Sequoia 15.1.1
Ollama: 0.5.7 (Ollama.app downloadable at the official website.)

Official sources of information

Ollama FAQ:

How can I enable Flash Attention? (Flash Attention environment variable)

How can I set the quantization type for the K/V cache? (K/V cache environment variable and notes)

The blog of the contributor who introduced K/V caching features to Ollama:

Bringing K/V Context Quantisation to Ollama (Technical details. Very interesting.)

Fine-tuning (2) Reduce VRAM Usage and Increase Speed with Flash Attention

The method I wrote in my previous blog post above was (1), so here I will start from (2).

First, enable Flash Attention in Ollama. Flash Attention helps reduce VRAM usage and also increases the computation speed of LLMs. As it has been explained in various documents, there don’t seem to be any negative impacts from enabling this feature. While some claim that it triples the speed, even if it doesn’t quite do that, there’s no reason not to enable it if all effects are positive. It seems likely that Ollama will default to having this enabled in the future, but for now, you need to enable it yourself. If you’re using a Mac, run the following command in Terminal:

launchctl setenv OLLAMA_FLASH_ATTENTION 1

To disable it (revert), change the value above from 1 to 0. To check the current setting, run the getenv command. Below is an example with the feature enabled, which returns 1.

% launchctl getenv OLLAMA_FLASH_ATTENTION
1

Fine-tuning (3) Reduce VRAM Usage by K/V Cache Quantization

K/V cache quantization seems to be a technique that reduces the memory required by the context cache by quantizing it, which also improves computational efficiency. It is sometimes called K/V context cache quantization. While fine-tuning (1) increased the VRAM available for loading LLMs so that larger models or longer contexts could be handled, K/V cache quantization achieves a similar result by reducing the memory used while the model runs. Just as 8-bit quantization of the model itself causes only minor performance degradation while improving speed, K/V cache quantization is expected to have a similar effect on the context cache. With 8-bit quantization applied to the K/V cache, the required memory is about half of what it would be without quantization, which allows roughly double the usable context length.

This feature is currently marked as Experimental in Ollama, and performance may degrade with embedding models, vision/multimodal models, or models with a high attention head count. For that reason, Ollama seems to disable this setting automatically when it detects an embedding model. Keeping in mind that compatibility with a given model can be an issue, try it out and disable it if performance drops. Unfortunately, there is currently no way to enable or disable it per model.

Here are the settings. For quantization you can choose between 8-bit (q8_0) and 4-bit (q4_0); the default is no quantization (f16). With 4-bit the memory reduction is significant, but performance also decreases, so choose 8-bit unless you need to run a model that previously could not fit on the GPU alone. Flash Attention must be enabled as a prerequisite, so complete fine-tuning (2) above first. The command for Mac (for 8-bit) is as follows:

launchctl setenv OLLAMA_KV_CACHE_TYPE "q8_0"

To reset to default, specify “f16” as the value. To check the current setting, run the getenv command. Example:

% launchctl getenv OLLAMA_KV_CACHE_TYPE
q8_0

After setting this up, run a model in Ollama and check the logs to see the quantization type and cache size. In the following example, the first half of the output shows the default f16, and after the change it becomes q8_0 with a smaller overall size.

(Feb 16, 2025: corrected command.)

% grep "KV self size" ~/.ollama/logs/server2.log|tail
llama_new_context_with_model: KV self size  = 1792.00 MiB, K (f16):  896.00 MiB, V (f16):  896.00 MiB
llama_new_context_with_model: KV self size  = 1536.00 MiB, K (f16):  768.00 MiB, V (f16):  768.00 MiB
llama_new_context_with_model: KV self size  =  512.00 MiB, K (f16):  256.00 MiB, V (f16):  256.00 MiB
llama_new_context_with_model: KV self size  = 1792.00 MiB, K (f16):  896.00 MiB, V (f16):  896.00 MiB
llama_new_context_with_model: KV self size  = 1792.00 MiB, K (f16):  896.00 MiB, V (f16):  896.00 MiB
llama_new_context_with_model: KV self size  =  952.00 MiB, K (q8_0):  476.00 MiB, V (q8_0):  476.00 MiB
llama_new_context_with_model: KV self size  =  952.00 MiB, K (q8_0):  476.00 MiB, V (q8_0):  476.00 MiB
llama_new_context_with_model: KV self size  =  680.00 MiB, K (q8_0):  340.00 MiB, V (q8_0):  340.00 MiB
llama_new_context_with_model: KV self size  =  816.00 MiB, K (q8_0):  408.00 MiB, V (q8_0):  408.00 MiB
llama_new_context_with_model: KV self size  = 1224.00 MiB, K (q8_0):  612.00 MiB, V (q8_0):  612.00 MiB
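The “about half” figure can be checked against the log by hand. The sketch below assumes the q8_0 layout used by llama.cpp, where each block of 32 values is stored as 32 one-byte integers plus one 2-byte scale (34 bytes), compared with 64 bytes for the same 32 values in f16. The function name kv_q8_0_mib is hypothetical.

```shell
# Estimate the q8_0 K/V cache size from the f16 size.
# Assumption: q8_0 stores 32 values in 34 bytes, f16 stores them in 64 bytes.
kv_q8_0_mib() {
  echo $(( $1 * 34 / 64 ))
}

kv_q8_0_mib 1792   # prints 952
```

This matches the 1792.00 MiB (f16) and 952.00 MiB (q8_0) lines in the log above, so the saving is slightly less than half because of the per-block scale.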

Set Variables Permanently

With the two setup methods above, the settings are reset when the Mac restarts. Below, I introduce a way to create a script that sets the environment variables and can be launched when you log in.

1. Launch Script Editor in Applications > Utilities.

2. Press Command + N to open a new window and copy-paste the script below. It simply sets the environment variables and then launches Ollama.

do shell script "launchctl setenv OLLAMA_HOST \"0.0.0.0\""
do shell script "launchctl setenv OLLAMA_FLASH_ATTENTION 1"
do shell script "launchctl setenv OLLAMA_KV_CACHE_TYPE \"q8_0\""
tell application "Ollama" to run

3. File menu > Export As > set like below and Save:

Export As: LaunchOllama.app
Where: Applications
File Format: Application

4. Open Apple menu > System Settings > General > Login Items.

5. If Ollama.app is already listed, select it and click the [ – ] button to remove it.

6. Click [ + ] and select the LaunchOllama.app you created in step 3.

7. Reboot your Mac, log in, open http://localhost:11434, and run a command such as launchctl getenv OLLAMA_FLASH_ATTENTION to confirm that 1 is returned.

Super Helpful Tool – Interactive VRAM Estimator

The blog of the K/V cache feature contributor, introduced earlier, includes a super useful tool called Interactive VRAM Estimator. With it you can find out whether a model you want to use will fit in your VRAM. From the combination of the model’s parameter size, the context length, and the quantization level, it estimates the total VRAM size for each K/V cache quantization level.

For example, for DeepSeek-R1:32B_Q4_K_M, you would choose 32B and Q4_K_M. If you have set the K/V cache to Q8_0 as described here, watch the Total of the green bar while selecting the Context Size to estimate the VRAM required for that combination.

It estimates 16K tokens should fit in 21.5GB VRAM

With an estimated 32K (= 32768) tokens, it exceeds my Mac’s VRAM of 24GB, so I enabled the Advanced mode in the top right to find a more aggressive number. By tweaking the Context Size slider while keeping an eye on the Total for Q8_0, it seems that 24K (24 * 1024 = 24576) fits within 23GB. Awesome, huh?

So, here’s the result of running ollama ps after putting 24576 in the Size of context window for the generative AI app I made with Dify. It’s processing at a neat 100% GPU usage. Victory!

This is where you set the context length of your AI app in Dify:

Last Miscellaneous Notes

In this article and the previous one, I introduced ways to tune the environment so that LLMs run effectively. Since I only have 32GB of unified memory, using LLMs has always been a challenge for me. Thanks to new technology, it has become easier than before to enjoy open-source LLMs. I hope this helps even one more person do so.

I have not investigated execution speed, so please try it out yourself. At the very least, once you understand and apply the methods for accommodating the memory LLMs require, or for fitting them 100% into VRAM, I think you will find that recent models can be quite enjoyable at a practical speed. 10 tokens per second should be enough in most cases.

To be honest, I think it’s tough to do all sorts of things with a local LLM on just 16GB. On the other hand, if you have 128GB, you could run multiple local LLMs in parallel.

Recently, while Chinese companies’ models have been highly praised for their performance, there are also discussions about prohibiting their use due to concerns over information leaks. Since you can run them locally, you don’t need to worry and can try them freely. Personally, I like the performance and quick response of the newly released French model mistral-small:24b. It’s also very nice that it doesn’t involve Chinese language or characters like Chinese-made models do (maybe I’m a bit sick of it). Does anyone know when the final (non-preview) version of QwQ will be available?

Image by Stable Diffusion (Mochi Diffusion)

Simply, I asked for an image of lots of goods loaded onto a llama. Initially, I had Mistral-Small 24B create prompts based on my image, but it was completely unsatisfactory. It seems that rather than writing all sorts of things, just listing essential words and repeating generation leads to something more fitting.

Date:
2025-2-2 1:55:30

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
A Llama with heavy load of luggage on it

Exclude from Image:

Seed:
2221886765

Steps:
20

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & GPU

2025-02-11

Optimizing VRAM Settings for Using Local LLM on macOS (Fine-tuning: 1)

When using a large language model (LLM) locally, the key point to pay attention to is how to run it at 100% GPU usage, that is, how to fit everything into VRAM (GPU memory). If the model overflows from VRAM, it can cause a decrease in response speed, make the entire OS laggy, and in the worst case, crash the OS.

When using a local LLM, the combination of the parameter size and quantization size of the model that can be run, as well as the context length available for use, is generally determined by the capacity of the Unified Memory installed on an Apple Silicon Mac. This article will share methods to exceed the “set” limitations through some deeper settings, optimizing the processing speed and usable context length of local LLMs. If your Mac has a larger amount of Unified Memory installed, it becomes possible to run multiple LLMs or even larger models (= with higher performance) that were previously difficult to execute.

Fine-tuning a generative AI model is not something amateurs can easily undertake, but since “environmental fine-tuning” is involved, you can easily try it out and see results right away. This covers the basics, so even if you’re a beginner you should give it a read if interested.

Contents

1 First, let’s find out the model size that works on your Mac
2 Looking for a model of usable size (for beginners)
3 Download and use LLMs
4 What is the Context Length
5 Check Activity Monitor for resources usage
6 Fine-tuning (1) Increase the usable VRAM capacity
- 6.1 How to change, check, and reset VRAM capacity
7 Part 2 is now available.

First, let’s find out the model size that works on your Mac

Mac’s Unified Memory can be accessed by both the CPU and GPU, but there is a set proportion that the GPU can use. Based on some posts I’ve seen on forums, if no settings have been changed, for Unified Memory of 64GB or more, it seems that up to 3/4 (75%) can be used by the GPU; for less than 64GB, about 2/3 (approximately 66%) can be utilized. Since my Mac has 32GB RAM installed, this means the GPU can use up to 21.33GB of it. If LM Studio is installed, you can check the hardware resources (Command + Shift + H), where VRAM will show something like the below.
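The rule of thumb above can be written as a small calculation. This is only a sketch: the 2/3 and 3/4 ratios are the forum-reported figures quoted above, not documented values, and default_gpu_limit_mib is a hypothetical name. It takes the installed memory in GB and prints the estimated GPU share in MB (GB * 1024).

```shell
# Estimate the default GPU-usable share of Unified Memory.
# Assumption: 3/4 of memory for 64GB or more, otherwise 2/3 (forum-reported).
default_gpu_limit_mib() {
  total_gb=$1
  if [ "$total_gb" -ge 64 ]; then
    echo $(( total_gb * 1024 * 3 / 4 ))
  else
    echo $(( total_gb * 1024 * 2 / 3 ))
  fi
}

default_gpu_limit_mib 32    # prints 21845 (about 21.33GB)
default_gpu_limit_mib 128   # prints 98304 (96GB)
```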

When you see “Likely too large” in red while downloading a model in LM Studio, it is telling you that the model is too big for your VRAM capacity. The following screenshot shows the 70B-parameter DeepSeek R1 model in 8-bit quantized MLX format, which takes up 74.98GB, so LM Studio is letting you know that it may not work in your environment.

In Ollama, a similar value is output as recommendedMaxWorkingSetSize in the log file. Below is the output from my environment (server2.log was the latest log file):

% grep recommendedMaxWorkingSetSize ~/.ollama/logs/server2.log|tail -5
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB

Looking for a model of usable size (for beginners)

Just because a model you want to use is smaller than your VRAM does not mean it will be usable. The prompts you input and the text output by the LLM also use VRAM. Therefore, if the model itself is 21GB in size, it won’t run smoothly in about 21GB of VRAM. To find a model that fits within your actual VRAM, try models of plausible sizes one after another based on the following guidelines.

Look to models with fewer parameters (if 140B or 70B are not feasible, consider 32B → 14B → 7B, etc.)
Search for quantized models (such as 8bit, 4bit for MLX or Q8, Q4_K_M, etc. for GGUF format models)

Models with a smaller number of parameters tend to be created by distilling the original larger model or training them on less data. The goal is to reduce the amount of knowledge while minimizing degradation in features and performance. Depending on the capabilities and use cases of the models themselves, many popular ones these days are usable at around 10 to 30 billion parameters. With fewer parameters, the computation (inference) time also becomes shorter.

The other factor “quantization” is a method to reduce the size of a model using a different approach. Although this expression may not be common and might not be entirely accurate, it can be interpreted similarly to reducing resolution or decreasing color depth in images. While it’s not exactly the same upon closer inspection, it’s a technique that reduces the size to an extent where performance degradation is barely noticeable. Quantization also increases processing speed. Generally, it is said that with 8-bit or Q8 quantization, the benefits of faster processing and smaller size outweigh the percentage of performance loss. The model size decreases as the number gets smaller, but so does performance; therefore, around 4-bit or Q4_K_M would be considered the minimum threshold to maintain decent performance (the last letters S/M/L in GGUF format stand for Small/Medium/Large sizes).
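To get a first rough idea of whether a parameter size and quantization level might fit, a back-of-the-envelope calculation can help. The sketch below is my simplification, not an exact formula: it multiplies the parameter count (in billions) by the bits per weight and divides by 8, ignoring quantization scales and metadata, so real files are somewhat larger (for example, the 70B 8-bit MLX model mentioned earlier is 74.98GB). The name approx_model_gb is hypothetical.

```shell
# Rough lower bound for model size in GB:
#   parameters (billions) * bits per weight / 8
approx_model_gb() {
  params_b=$1
  bits=$2
  echo $(( params_b * bits / 8 ))
}

approx_model_gb 70 8   # prints 70
approx_model_gb 32 4   # prints 16
```

Remember to leave headroom on top of this for the context (prompts and generated text).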

After trying out several downloads, you will get a feel for the maximum model size you can use on your Mac. In my case, for models that come in multiple parameter sizes, I download one that pushes the limit, such as 32B Q4_K_M, and also download the F16 or Q8 version of a smaller size such as 14B.

Please note, when choosing a vision model, VLM, or so-called multimodal models, it is better to select ones that are even smaller in size compared to language models (LLM). This is because processing tasks such as reading images and determining what is depicted often requires more VRAM, given that images tend to be larger in size than text.

Download and use LLMs

LM Studio allows you to download directly via the Download button and conduct chats through its GUI. For Ollama, after selecting a model on the Models page, if there are options for parameter counts or quantization, choose them from the dropdown menu, then download and run using the Terminal.app (ollama run modelname). Both applications can function as API servers, allowing you to use downloaded models from other apps. I often use Dify, which makes it easy to create AI applications. For methods on how to use the APIs of Ollama and LM Studio via Dify, please check my posts below. (Japanese only for now. I’ll translate in the near future.)

Dify と Ollama を別々の Mac で動かすローカル LLM 環境 (A local LLM environment running Dify and Ollama on separate Macs)

Japanese only for now. Get it translated by your web browser.

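Mac のローカルオンリー環境で、画像認識 AI の Pixtral 12B MLX 版を使う (LM Studio 編) (Using the MLX version of the image recognition AI Pixtral 12B in a local-only environment on Mac, LM Studio edition)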

Japanese only for now. Get it translated by your web browser.

What is the Context Length

“Context length” refers to the size of the text (actually tokens) exchanged between a user and an LLM during chat. It seems that this varies by model (tokenizer), with Japanese being approximately 1 character = 1+α tokens, and English being about 1 word = 1 (+α) token(s). Additionally, each model has a maximum context length it can handle, which you can check using the ollama show modelname command in Ollama or by clicking on the gear icon next to the model name in My Models on the left side in LM Studio.

When chatting with Ollama from the terminal, the default context length seems to be 2048, and when chatting within the app using LM Studio, it is 4096. If you want to handle longer texts, you need to change the model settings or specify them via the API. Note that increasing the context length requires more VRAM capacity, and if it overflows, performance will slow down. I have documented the solution in the following article.

A solution for slow LLMs on Ollama server when accessing from Dify or Continue

If the Japanese page opens, click “English” on the right-hand side.
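As an illustration of specifying the context length via the API, the sketch below builds a request body for Ollama’s /api/generate endpoint with the num_ctx option. The helper name generate_body is hypothetical, and the function does no JSON escaping, so it assumes the model name and prompt contain no quotes or backslashes.

```shell
# Hypothetical helper: print a JSON body requesting a specific context length.
# Arguments: model name, context length (tokens), prompt.
# Assumption: model and prompt contain no quotes or backslashes.
generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%d}}\n' \
    "$1" "$3" "$2"
}

generate_body "deepseek-r1:32b" 8192 "Hello"

# To send it to a local Ollama server:
#   curl -s http://localhost:11434/api/generate \
#     -d "$(generate_body "deepseek-r1:32b" 8192 "Hello")"
```

Keep in mind that a larger num_ctx needs more VRAM, as described above.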

The article you are currently reading explains how to tune macOS itself. This increases the amount of VRAM (allocation) that the GPU can use, enabling larger models and longer contexts.

Check Activity Monitor for resources usage

First, let’s confirm that the model is performing well by checking system resource usage while the LLM is running. You can do this with Activity Monitor in the Utilities folder on macOS. If memory pressure stays steady in the green and the GPU stays at Max, AI processing is running within the hardware capacity of your Mac. Even if memory pressure is yellow, it’s acceptable as long as it is steady without fluctuations. Below is an example from running deepseek-r1:32b Q4_K_M on Ollama from Dify (the low load on CPU and GPU is due to other applications).

Once inference was complete, Ollama released memory usage.

Even when the memory pressure is yellow but flat, LLM and macOS are working stably.

Also, you can see the size of the memory being used by the model with the ollama ps command and the load on the CPU/GPU. In the following example, it shows that 25GB is being processed 100% on GPU VRAM.

%  ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL               
deepseek-r1:32b    38056bbcbb2d    25 GB    100% GPU     29 minutes from now

Fine-tuning (1) Increase the usable VRAM capacity

The blog post above describes how to manipulate context length so as not to exceed the VRAM size specified for macOS (66% or 75% of unified memory). Below, I will explain a method to change this limitation and increase the amount of VRAM capacity available to the GPU. This setting is likely to be effective on Macs with more than 32GB of RAM. The larger the installed RAM capacity, the higher the effect (with 128GB of RAM, standard 96GB of VRAM can be increased to 120GB !!).

One note: the commands I am introducing are only valid on macOS 15.0 and above. There seems to be another command that works on earlier versions, but since I haven’t tried it myself, I won’t introduce it here. Also, obviously, you cannot specify more than your actual RAM size (reference: mlx.core.metal.set_wired_limit). On the positive side, settings made by this command revert to the default when you restart your Mac, so you can try them with almost no risk.

How to change, check, and reset VRAM capacity

Before making changes, let’s decide how much VRAM capacity to allocate to the GPU. A good approach is to reserve the RAM needed by the apps you frequently use and assign the rest to the GPU. If you’re unsure, you could keep 8GB (the minimum RAM size of Macs up to M3) for the CPU and allocate all the rest to VRAM (that’s what I did). The value is specified in MB (megabytes), so multiply the number of gigabytes by 1024. In my case, I want 24GB of VRAM (32GB total minus 8GB for the CPU), so I specify 24 * 1024 = 24576. The command looks like this; change 24576 to your desired value and execute it:

sudo sysctl iogpu.wired_limit_mb=24576

Example:

% sudo sysctl iogpu.wired_limit_mb=24576
Password: (input password if required)
iogpu.wired_limit_mb: 0 -> 24576

This will be reflected immediately. In LM Studio, you just need to quit and relaunch it, then open Command + Shift + H to see the set VRAM size.

It was 21.33 GB previously, so gained 2.67 GB!
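The allocation arithmetic described above can be sketched as a small helper. The function name wired_limit_mb is hypothetical; it only prints the command so that you can review the value before running anything with sudo.

```shell
# Compute the iogpu.wired_limit_mb value:
#   (total RAM in GB - RAM reserved for the CPU in GB) * 1024
wired_limit_mb() {
  total_gb=$1
  reserve_gb=$2
  echo $(( (total_gb - reserve_gb) * 1024 ))
}

limit=$(wired_limit_mb 32 8)
# Print the command instead of executing it.
echo "sudo sysctl iogpu.wired_limit_mb=$limit"
# prints: sudo sysctl iogpu.wired_limit_mb=24576
```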

Check Ollama log after running an LLM to see the new VRAM size (although it is not the specified value, you can see the increased value):

% grep recommendedMaxWorkingSetSize ~/.ollama/logs/server2.log|tail
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB *before*
ggml_metal_init: recommendedMaxWorkingSetSize  = 25769.80 MB *now*
ggml_metal_init: recommendedMaxWorkingSetSize  = 25769.80 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 25769.80 MB
ggml_metal_init: recommendedMaxWorkingSetSize  = 25769.80 MB
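One likely explanation for why the log does not show exactly 24576 (this is my assumption, not something the log states) is a difference in units: the value given to sysctl counts units of 1024 * 1024 bytes, while the log appears to report decimal MB (1,000,000 bytes). A quick conversion reproduces the logged number. The name mib_to_mb is hypothetical.

```shell
# Convert a size in units of 1024*1024 bytes to decimal megabytes.
mib_to_mb() {
  awk -v mib="$1" 'BEGIN { printf "%.2f\n", mib * 1048576 / 1000000 }'
}

mib_to_mb 24576   # prints 25769.80
```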

Here are a couple more related commands:

Check the current value:

% sudo sysctl iogpu.wired_limit_mb
Password:
iogpu.wired_limit_mb: 24576

(Default is zero)
iogpu.wired_limit_mb: 0

Set the value to default:

% sudo sysctl iogpu.wired_limit_mb=0
Password:
iogpu.wired_limit_mb: 24576 -> 0

If something goes wrong with this setting, go ahead and reboot the Mac, and it will revert to the default value.

If a certain value works well, you may want to keep the new VRAM capacity even after rebooting. You can do this by adding the setting to the /etc/sysctl.conf file with the following commands. Replace the number in the last line with the size you want. Note that specifying a value greater than the RAM capacity causes an error and the value is not applied, so work carefully to avoid problems at startup.

sudo touch /etc/sysctl.conf
sudo chown root:wheel /etc/sysctl.conf
sudo chmod 0644 /etc/sysctl.conf
echo "iogpu.wired_limit_mb=24576" | sudo tee -a /etc/sysctl.conf

After rebooting, if the value shown by sudo sysctl iogpu.wired_limit_mb is as expected, you are done. If you want to manually reset it to the default value, use sudo sysctl iogpu.wired_limit_mb=0. To completely revert to the default settings, remove the added line from /etc/sysctl.conf.

Part 2 is now available.

Actually, I was planning to include the settings for Ollama’s K/V cache in this article as well, but it has become quite long, so I wrote it in a different post below. By configuring the K/V cache (and Flash attention), you can reduce the VRAM usage while minimizing the performance degradation of the LLM, and also improve processing speed.

Optimizing Ollama VRAM Settings for Using Local LLM on macOS (Fine-tuning: 2)

Image by Stable Diffusion (Mochi Diffusion)

“Growing juicy apple” or “apple started shining” are closer to the image I had in mind, but none of the generated images satisfied me. Finally, this simple prompt generated an image that looked fine.

Date:
2025-1-29 23:50:07

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
an apple turning shiny red

Exclude from Image:

Seed:
3293091901

Steps:
20

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & GPU

2024-12-27

MLX-LM API streaming QwQ-32B-Preview with Dify (faster than Ollama)

In this Ollama GitHub issue, there are many comments requesting support for the MLX backend, and some even write that it is 20-40% faster than llama.cpp (GGUF). Curious about these comments, I decided to try the MLX version of my favorite QwQ-32B-Preview – QwQ is Alibaba Qwen team’s open reasoning large language model (LLM) similar to OpenAI’s o1, which iteratively improves answer accuracy.

In conclusion, the MLX version is indeed slightly faster. The person who wrote the comment mentioned using an M3 Mac, so the difference might be more noticeable on newer Mac models with M4 chips. Since I tried it out, I’ll leave the method here for reference: using MLX-LM as a local LLM model provider for Dify.

By the way, is this an official Ollama X post? It could also be interpreted as hinting that Ollama will officially support the MLX backend.

Can’t wait for Ollama’s MLX support ❤️❤️❤️
— ollama (@ollama) December 7, 2024

Contents

1 What’s MLX?
2 What’s MLX-LM?
3 You can use LM Studio
4 Launch MLX-LM API Server
5 Register in Dify
- 5.1 Add as an OpenAI-API Compatible Model
- 5.2 Create a Chatbot
6 Dify judged MLX was the winner
7 Can MLX-LM Replace Ollama?

What’s MLX?

To put it simply, MLX is Apple’s official machine learning framework for Apple Silicon. It can utilize both the GPU and CPU. Although it may not always achieve peak performance, some reports from various experiments show that it can be faster than using PyTorch with MPS in certain cases.

MLX official GitHub: https://ml-explore.github.io/mlx/build/html/index.html

So, when we refer to an “MLX version of LLM,” we are talking about an open large language model (LLM) that has been converted to run using the MLX framework.

What’s MLX-LM?

MLX-LM is an execution environment for large language models (LLMs) that have been converted to run using MLX. In addition to running the models, it also includes features such as converting models from Hugging Face into MLX format and running an API server. This article introduces how to use it as an API server.

MLX-LM official GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md

There is also a similar execution environment MLX-VLM, which supports vision models such as Pixtral and Qwen2-VL.

MLX-VLM official GitHub: https://github.com/Blaizzy/mlx-vlm

There is also a Python package FastMLX that can function as an API server for both MLX-LM and MLX-VLM. Functionally, it is quite appealing. However, the vision models only accept image URLs or paths (which makes them unusable with Dify), and text streaming often fails and throws exceptions. It requires a lot of effort to make it work properly, so I have given up for now. If you are interested, give it a try.

FastMLX official GitHub: https://github.com/arcee-ai/fastmlx

You can use LM Studio

LM Studio can use MLX models, so if you don’t need to use Dify or prefer not to, you can stop reading here. Additionally, you can register LM Studio as an OpenAI API-compatible model provider in Dify. However, with LM Studio, responses from the LLM may not stream smoothly. Therefore, if you plan to use MLX LLMs with Dify, it is better to utilize the API server functionality of MLX-LM.

Launch MLX-LM API Server

Install

To use MLX-LM, install it in your virtual environment. The version I confirmed was 0.20.4, the latest at the time.

pip install mlx-lm

Start API Server Once

To start the server, use the mlx_lm.server command (note that the command uses an underscore, unlike the package name mlx-lm, which uses a dash). If Dify or other API clients are running on different hosts, or if another server is already using the port, you can specify options as shown in the example below. In my case, Dify is running on another Mac and there’s also a text-to-speech server running on my main Mac, so I specify both the host and the port. For more details on the options, check mlx_lm.server --help. The --log-level option is optional.

mlx_lm.server --host 0.0.0.0 --port 8585 --log-level INFO

The server is running when you see output like this:

% mlx_lm.server --host 0.0.0.0 --port 8585 --log-level INFO
/Users/handsome/Documents/Python/FastMLX/.venv/lib/python3.11/site-packages/mlx_lm/server.py:682: UserWarning: mlx_lm.server is not recommended for production as it only implements basic security checks.
  warnings.warn(
2024-12-15 21:33:25,338 - INFO - Starting httpd at 0.0.0.0 on port 8585...

Download LLM

I selected the 4-bit quantized model of QwQ (18.44GB) because it must fit in 32GB of RAM.

HuggingFace: https://huggingface.co/mlx-community/QwQ-32B-Preview-4bit

Open another terminal window while the MLX-LM server is running, save a simple script like the one below (as add_models.py in this example), and then run it with Python to download the model.

import requests

url = "http://localhost:8585/v1/models"
params = {
    "model_name": "mlx-community/QwQ-32B-Preview-4bit",
}

response = requests.post(url, params=params)
print(response.json())

python add_models.py

Once the download is complete, you can stop the server by pressing Ctrl + C. By the way, the model downloaded using this method can also be loaded by LM Studio. If you want to try both applications, downloading via command line will help reduce storage space (although the folder names become non-human friendly in LM Studio).

Start API Server with an LLM

The model is saved in ~/.cache/huggingface/hub/, and for this example, it will be in the folder models--mlx-community--QwQ-32B-Preview-4bit. The path passed to the server command needs to go deeper into the snapshot directory where the config.json file is located.

The command to start the API server would look like this:

mlx_lm.server --host 0.0.0.0 --port 8585 --model /Users/handsome/.cache/huggingface/hub/models--mlx-community--QwQ-32B-Preview-4bit/snapshots/e3bdc9322cb82a5f92c7277953f30764e8897f85
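If you don’t want to dig for the snapshot folder by hand, the lookup described above can be sketched in Python. This is only a sketch: the function name find_snapshot_dir is made up for illustration, and it assumes the cache layout described above (a models--<org>--<name> folder with a snapshots subfolder that holds config.json).

```python
from pathlib import Path
from typing import Optional


def find_snapshot_dir(repo_id: str, hub_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the newest snapshot folder of repo_id that contains config.json."""
    if hub_dir is None:
        hub_dir = Path.home() / ".cache" / "huggingface" / "hub"
    # The cache folder name replaces "/" in the repo id with "--".
    snapshots = hub_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return None
    candidates = [p for p in snapshots.iterdir() if (p / "config.json").is_file()]
    if not candidates:
        return None
    # If several snapshots exist, prefer the most recently modified one.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Prints the path to pass to --model, or None if the model is not downloaded yet.
print(find_snapshot_dir("mlx-community/QwQ-32B-Preview-4bit"))
```

The printed path can be pasted after --model in the command above.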

Once the server starts, you can confirm installed models by navigating to: http://localhost:8585/v1/models

{"object": "list", "data": [{"id": "mlx-community/QwQ-32B-Preview-4bit", "object": "model", "created": 1734266953}]}
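The same check can be scripted instead of opening a browser. The sketch below uses only the standard library; the helper names are made up for illustration, and the offline demo uses a response shaped like the one shown above rather than a live server.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list:
    """Pull the "id" of every entry in an OpenAI-style model list."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str) -> list:
    """GET <base_url>/models and return the model ids (needs a running server)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/models", timeout=10) as resp:
        return extract_model_ids(json.load(resp))


# Offline demo with a response shaped like the one shown above.
sample = json.loads(
    '{"object": "list", "data": [{"id": "mlx-community/QwQ-32B-Preview-4bit", '
    '"object": "model", "created": 1734266953}]}'
)
print(extract_model_ids(sample))
```

With the server running, fetch_model_ids("http://localhost:8585/v1") should return the same kind of list.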

Register in Dify

Add as an OpenAI-API Compatible Model

To register the model in Dify, add it as an OpenAI-API-compatible LLM. The model name is the ID used throughout this article, mlx-community/QwQ-32B-Preview-4bit. The API endpoint URL needs to include the port number and /v1 (in this setup, http://<address of the Mac running MLX-LM>:8585/v1), and you can use something like \n\n for the Delimiter.
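Before registering the endpoint in Dify, it may help to confirm that it answers a plain OpenAI-style chat request. The following is a minimal sketch using only the standard library. The helper names, the test prompt, and the 256-token limit are made up for illustration, and the response parsing assumes the usual OpenAI chat completion shape.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a minimal, non-streaming OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat(base_url: str, payload: dict) -> str:
    """POST the payload to <base_url>/chat/completions and return the reply text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request(
    "mlx-community/QwQ-32B-Preview-4bit", "Say hello in one sentence."
)
print(json.dumps(payload, indent=2))
# With the server running:
# print(send_chat("http://localhost:8585/v1", payload))
```

If send_chat returns text, the same base URL (ending in /v1) is what goes into Dify.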

Create a Chatbot

When creating a Chatbot Chatflow, select the model you just added with 4096 for the Max Tokens. This size fits in 32GB RAM and runs 100% on GPU. To avoid getting answers in Chinese, try the sample System prompt below. QwQ may still use some Chinese sentences from time to time though.

Never ever use Chinese. Always answer in English or language used to ask.

Compared to Ollama, the configurable parameters are limited for OpenAI API-compatible models.

That’s about it. Enjoy the speed of MLX version of your LLM.

Dify judged MLX was the winner

Now that everything is set up, I created chatbots using the same conditions with both GGUF (ollama pull qwq:32b-preview-q4_K_M) and MLX. The settings were as follows: Temperature=0.1, Size of context window=4096, Keep Alive=30m, with all other settings at their default values. I asked seven different types of questions to see the differences.

Based on Dify’s Monitoring, it seems that the MLX version was 30-50% faster. However, in practical use, I didn’t really notice a significant difference; both seemed sufficiently fast to me. Additionally, the performance gap tended to be more noticeable with larger amounts of generated text. In this test, MLX produced more text before reaching an answer, which might have influenced the results positively for MLX. The nature of the QwQ model may also have contributed to these favorable outcomes.

Overall, it’s reasonable to say that MLX is about 30% faster than GGUF, without exaggeration. First image below is MLX and the next one is GGUF.

Ollama (GGUF) 10 T/s is also fast enough.
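For reference, here is how a "percent faster" figure relates to tokens per second. This is just the arithmetic, not the measurement itself: the 10 T/s baseline comes from the note above, while the 13 T/s value is a hypothetical number chosen to illustrate a 30% gap.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Generation throughput: tokens produced divided by elapsed seconds."""
    return tokens / seconds


def percent_faster(candidate_tps: float, baseline_tps: float) -> float:
    """How much faster the candidate is than the baseline, in percent."""
    return (candidate_tps / baseline_tps - 1.0) * 100.0


gguf_tps = 10.0  # Ollama (GGUF), from the note above
mlx_tps = 13.0   # hypothetical value for illustration only

print(f"{percent_faster(mlx_tps, gguf_tps):.0f}% faster")
# Time to generate 1,000 tokens at each speed:
print(f"GGUF: {1000 / gguf_tps:.0f} s, MLX: {1000 / mlx_tps:.0f} s")
```

This also shows why the gap is easier to notice with long outputs: the saved time grows with the number of generated tokens.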

Prompts I used for performance testing:

(1) Math:
I would like to revisit and learn calculus (differential and integral) now that I am an adult. Could you teach me the basics?

(2) Finance and documentation:
I would like to create a clear explanation of a balance sheet. First, identify the key elements that need to be communicated. Next, consider the points where beginners might make mistakes. Then, create the explanation, and finally, review the weak points of the explanation to produce a final version.

(3) Quantum biology:
Explain photosynthesis in quantum biology using equations.

(4) Python scripting:
Please write a Python script to generate a perfect maze. Use "#" for walls and " " (space) for floors. Add an "S" at the top-left floor as the start and a "G" at the bottom-right floor as the goal. Surround the entire maze with walls.

(5) Knowledge:
Please output the accurate rules for the board game Othello (Reversi).

(6) Planning:
You are an excellent web campaign marketer. Please come up with a "Fall Reading Campaign" idea that will encourage people to share on social media.

### Constraints
- The campaign should be easy for everyone to participate in.
- Participants must post using a specific hashtag.
- The content should be engaging enough that when others read the posts, they want to mention or create their own posts.
- This should be an organic buzz campaign without paid advertising.

(7) Logic puzzle:
Among A to D, three are honest and one is a liar. Who is the liar?

A: D is lying.
B: I am not lying.
C: A is not lying.
D: B is lying.

Can MLX-LM Replace Ollama?

If you plan to stick with a single LLM, I think MLX-LM is fine. However, in terms of ease of use and convenience, Ollama is clearly superior, so MLX-LM may not be ideal for those who frequently switch between multiple models. FastMLX, which was mentioned earlier, allows model switching from the client side, so it could be a viable option if you are seriously considering migrating. That said, based on what seems to be an official X post from Ollama, they might eventually support MLX, so I’m inclined to wait for that.

Regardless, this goes slightly off the original GGUF vs MLX comparison, but personally, I find QwQ’s output speed sufficient for chat-based applications. It’s smart as well (I prefer Qwen2.5 Coder for coding, though). Try it out if you haven’t.

Oh, by the way, most of this post was translated by QwQ from Japanese. Isn’t that great?

Image by Stable Diffusion (Mochi Diffusion)

When I asked for images of “a robot running on a big apple”, most of them showed a robot in NYC. Yeah, sure. I simply ran several attempts and picked the one that looked the best. If the model had learned from old school Japanese anime and manga, I might have gotten something closer to my expectation.

Date:
2024-12-16 0:38:20

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
fancy illustration, comic style, smart robot running on a huge apple

Exclude from Image:

Seed:
2791567837

Steps:
26

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & GPU

2024-12-26

Solve the problem that the mouse wheel scrolls backwards for a moment

This article is mainly for Mac users. The mice in question are average or budget models that cost around $10 to $40. The issue I was able to resolve occurred with a mouse that connects through a USB dongle.

Note: This is not about “how to reverse the direction of the mouse wheel rotation and the scroll direction on a Mac.” That setting can be easily found.

This article explains how to resolve an issue where the scroll direction momentarily reverses while using the mouse. Specifically, when you stop scrolling and then scroll in the same direction again after a short pause, the page briefly scrolls the opposite way, causing you to lose your place. If this method does not solve your issue, please try the other solutions available on many other websites.

Contents

1 Resolution
2 Why This Works – Here’s My Guess
3 Test Results and the Mouse I’m Using

Resolution

In System Settings (under the Apple menu), open Mouse and lower the Scrolling Speed by about two notches. This should resolve the issue described above. The optimal setting may vary, so adjust it to see what works best for you. You might not even notice a significant decrease in scroll speed even if you lower it by three or four notches.

Why This Works – Here’s My Guess

If you’ve ever taken apart a mouse after hearing advice like “clean the wheel every few months,” you might know that most mouse wheels have many fine grooves on the inside. Light or laser is emitted from one side and detected by a sensor on the other side to read the wheel’s rotation. With cheaper mice, when Mac increases the sensitivity (or scroll speed), the mouse may not be able to keep up with the reading speed required, resulting in an incorrect detection of reverse rotation. By slowing down the scroll speed, you allow the Mac to read the wheel’s movements more accurately and consistently, which should result in the correct behavior. That’s my guess, anyway.

Test Results and the Mouse I’m Using

In my case, I use the same mouse with Windows for work and Mac for personal use and have never experienced this issue on Windows. This led me to rule out issues such as dirt, hardware malfunction, or battery depletion.

One day, I recalled that decades ago, back when mice still had balls inside, the old Mac OS read mouse input at a higher resolution/frequency than Windows to allow finer control. Based on this, I tried lowering the scrolling speed, and voila! It worked! Since making this change, the issue has barely occurred at all. None of the web pages Google suggested had this information, so I hope this article will help you mitigate the annoying mouse wheel issue.

For reference, the mouse I am using is a Logitech M220 (laser, silent type, comes with a USB dongle) which costs only about $10. I really like the appropriate resistance of the wheel and the ease of clicking, so it’s great that this solution worked.

Image by Stable Diffusion (Mochi Diffusion)

For the top image, when I tried to create an illustration of a mouse hero defeating a mad scientist, I ended up with only knock-off images resembling famous mouse characters. Since none of those looked fine to use, I ultimately went with a strange character that no one would complain about.

Date:
2024-10-18 0:29:23

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
comicbook cover, the super hero mouse-man versus a mad doctor

Exclude from Image:

Seed:
2438098213

Steps:
25

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & GPU

2024-09-01 / 2025-01-01

A solution for slow LLMs on Ollama server when accessing from Dify or Continue

Recently, the performance of open-source and open-weight LLMs has been amazing, and for coding assistance, DeepSeek Coder V2 Lite Instruct (16B) is sufficient, while for Japanese and English chat or translation, Llama 3.1 Instruct (8B) is enough. When running Ollama from the Terminal app and chatting, the generated text and response speed are truly surprising, making it feel like you can live without the internet for a while.

However, when using the same model through Dify or Visual Studio Code’s LLM extension Continue, you may notice the response speed becomes extremely slow. In this post, I will introduce a solution to this problem. Your problem may be caused by something else, but since it is easy to check and fix, I recommend checking the Conclusion section of this post.

Contents

1 Confirmed Environment
- 1.1 OS and app versions:
- 1.2 LLM and size
2 Conclusion
3 Changing Context Length
- 3.1 Dify
- 3.2 Continue (VS Code LLM extension)
4 Checking Context Length of LLM
5 Context Length in App Settings
- 5.1 Dify > Model Provider > Ollama
- 5.2 Continue > “models”
6 What’s happening when LLM processing is slow (based on what I see)
7 Finding a suitable size of context length
8 Ollama, I’m sorry (you can skip this)

Confirmed Environment

OS and app versions:

macOS: 14.5
Ollama: 0.3.8
Dify: 0.6.15
Visual Studio Code - Insiders: 1.93.0-insider
Continue: 0.8.47

LLM and size

Model name	Model size	Context length	Ollama download command
llama3.1:8b-instruct-fp16	16 GB	131072	`ollama pull llama3.1:8b-instruct-fp16`
deepseek-coder-v2:16b-lite-instruct-q8_0	16 GB	163840	`ollama run deepseek-coder-v2:16b-lite-instruct-q8_0`
deepseek-coder-v2:16b-lite-instruct-q6_K	14 GB	163840	`ollama pull deepseek-coder-v2:16b-lite-instruct-q6_K`

A Mac with 32GB RAM can run any of them in memory.

Conclusion

Check the context length and lower it.

By setting “Size of context window” in Dify or Continue to a sufficiently small value, you can solve this problem. Don’t set a number just because the model supports it or for future use; instead, use the default value (2048) or 4096 and test chatting with a small number of words. If you get a response as you expect, congrats, the issue is resolved.

Context size: Also called "context window" or "context length," this is the total number of tokens that an LLM can process in one interaction, input and output combined. Token count is roughly comparable to word count in English and other supported languages. In the table above, Llama 3.1 has a context size of 131072 tokens, so if you split it evenly, it can handle roughly 65,536 words of text each for input and output.

Changing Context Length

Dify

Open the LLM block in the studio app and click on the model name to access detailed settings.
Scroll down to find “Size of cont…” (Size of context window) and uncheck it or enter 4096.
The default value is 2048 when unchecked.

Continue (VS Code LLM extension)

Open the config.json file from the gear icon in the Continue pane.
Change the contextLength and maxTokens values to 4096 and 2048, respectively. Note that maxTokens is the maximum number of tokens generated by the LLM, so we set it to half of contextLength.

    {
      "title": "Chat: llama3.1:8b-instruct-fp16",
      "provider": "ollama",
      "model": "llama3.1:8b-instruct-fp16",
      "apiBase": "http://localhost:11434",
      "contextLength": 4096,
      "completionOptions": {
        "temperature": 0.5,
        "top_p": "0.5",
        "top_k": "40",
        "maxTokens": 2048,
        "keepAlive": 3600
      }
    }
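If you maintain several entries like this, the "maxTokens is half of contextLength" rule can be generated rather than typed. The sketch below builds an entry with the same keys as the example above. The function name and the title format are made up for illustration, and only a subset of the completion options is included.

```python
import json


def continue_model_entry(model: str, context_length: int = 4096) -> dict:
    """Build a Continue "models" entry with maxTokens set to half the context."""
    return {
        "title": f"Chat: {model} ({context_length})",
        "provider": "ollama",
        "model": model,
        "apiBase": "http://localhost:11434",
        "contextLength": context_length,
        "completionOptions": {
            # Leave the other half of the window for the prompt.
            "maxTokens": context_length // 2,
            "keepAlive": 3600,
        },
    }


print(json.dumps(continue_model_entry("llama3.1:8b-instruct-fp16"), indent=2))
```

The printed JSON can be pasted into the “models” list of config.json.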

Checking Context Length of LLM

The easiest way is to use Ollama’s ollama show <modelname> command to display the context length. Example:

% ollama show llama3.1:8b-instruct-fp16
  Model                                          
  	arch            	llama 	                         
  	parameters      	8.0B  	                         
  	quantization    	F16   	                         
  	context length  	131072	                         
  	embedding length	4096  	                         
  	                                               
  Parameters                                     
  	stop	"<|start_header_id|>"	                      
  	stop	"<|end_header_id|>"  	                      
  	stop	"<|eot_id|>"         	                      
  	                                               
  License                                        
  	LLAMA 3.1 COMMUNITY LICENSE AGREEMENT        	  
  	Llama 3.1 Version Release Date: July 23, 2024
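If you want to read that value from a script, the text output can be parsed. This is a minimal sketch that assumes the output keeps the "context length" label shown above; the function name is made up for illustration, and the demo runs on a pasted sample rather than calling Ollama.

```python
import re
from typing import Optional


def parse_context_length(show_output: str) -> Optional[int]:
    """Find the number that follows "context length" in `ollama show` output."""
    match = re.search(r"context length\s+(\d+)", show_output)
    return int(match.group(1)) if match else None


# Sample pasted from the output above (whitespace simplified).
sample = """\
  Model
    arch              llama
    parameters        8.0B
    quantization      F16
    context length    131072
    embedding length  4096
"""
print(parse_context_length(sample))
```

To use it on a live system, pass in the text captured from the command, for example with subprocess.run(["ollama", "show", "llama3.1:8b-instruct-fp16"], capture_output=True, text=True).stdout.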

Context Length in App Settings

Dify > Model Provider > Ollama

When adding an Ollama model to Dify, you can override the default value of 4096 for Model context length and Upper bound for max tokens. Since setting an upper limit here may make debugging difficult if issues arise, it’s better to set both values to the model’s context length and adjust the Size of context window in individual AI apps.

Continue > “models”

In the “models” section of config.json, you can add multiple entries with different context lengths and tell them apart by including a description like “Fastest Max Size” or “4096” in the title. For example, I set the title to “Chat: llama3.1:8b-instruct-fp16 (Fastest Max Size)” and changed the contextLength value to 24576 and the maxTokens value to 12288. This combination was the highest that I confirmed working perfectly on my Mac with 32 GB RAM.

    {
      "title": "Chat: llama3.1:8b-instruct-fp16 (Fastest Max Size)",
      "provider": "ollama",
      "model": "llama3.1:8b-instruct-fp16",
      "apiBase": "http://localhost:11434",
      "contextLength": 24576,
      "completionOptions": {
        "temperature": 0.5,
        "top_p": "0.5",
        "top_k": "40",
        "maxTokens": 12288,
        "keepAlive": 3600
      }
    }

What’s happening when LLM processing is slow (based on what I see)

When using ollama run, the LLM runs quickly, but when using Ollama through Dify or Continue, it becomes slow because of the large context length. Let’s check the process with ollama ps. Below are two examples: the first had the max context length of 131072 and the second had 24576:

% ollama ps
NAME                     	ID          	SIZE 	PROCESSOR      	UNTIL               
llama3.1:8b-instruct-fp16	a8f4d8643bb2	49 GB	54%/46% CPU/GPU	59 minutes from now	

% ollama ps
NAME                     	ID          	SIZE 	PROCESSOR	UNTIL              
llama3.1:8b-instruct-fp16	a8f4d8643bb2	17 GB	100% GPU 	4 minutes from now

In the slow case, SIZE is much larger than the actual model size (16 GB), and processing is split between CPU (54%) and GPU (46%). It seems that Ollama treats the LLM as a larger model when a large context length is passed via the API, regardless of the actual number of tokens being processed. This is only my assumption, but the output above points that way.
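To spot this situation without reading the table by eye, the PROCESSOR column can be parsed. This is a sketch that assumes the two formats seen in the output above ("54%/46% CPU/GPU" and "100% GPU"); the function name is made up for illustration.

```python
import re
from typing import Optional


def gpu_share(ps_line: str) -> Optional[int]:
    """Return the GPU percentage from one `ollama ps` row, or None if not found."""
    split = re.search(r"(\d+)%/(\d+)% CPU/GPU", ps_line)
    if split:
        return int(split.group(2))
    whole = re.search(r"(\d+)% (GPU|CPU)", ps_line)
    if whole:
        pct = int(whole.group(1))
        return pct if whole.group(2) == "GPU" else 100 - pct
    return None


# Rows pasted from the two examples above (whitespace simplified).
slow = "llama3.1:8b-instruct-fp16  a8f4d8643bb2  49 GB  54%/46% CPU/GPU  59 minutes from now"
fast = "llama3.1:8b-instruct-fp16  a8f4d8643bb2  17 GB  100% GPU  4 minutes from now"

for row in (slow, fast):
    share = gpu_share(row)
    note = "OK" if share == 100 else "partly on CPU, try a smaller context length"
    print(f"GPU {share}%: {note}")
```

Anything below 100% means part of the work has spilled over to the CPU, which is the slow case described above.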

Finding a suitable size of context length

After understanding the situation, let’s take countermeasures. If you can live with 4096 tokens, it’s fine, but I want to process as many tokens as possible. Unfortunately, I couldn’t find Ollama’s specifications, so I tried adjusting the context length by hand and found that a value of 24576 (4096*6) works for Llama 3.1 8B F16 and DeepSeek-Coder-V2-Lite-Instruct Q6_K.

Note that using non-multiple-of-4096 values may cause character corruption, so be careful. Also, when using Dify, the SIZE value will be smaller than in Continue.
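One way to get a feel for why the context length matters so much is to estimate the key/value cache, which grows linearly with the context length. The sketch below is only a rough estimate, not a description of Ollama’s actual allocation. The default values are assumptions taken from the published Llama 3.1 8B configuration (32 layers, 8 KV heads, head dimension 128) with a 16-bit cache. It ignores other buffers, so it will not reproduce the SIZE column exactly, and it says nothing about the character corruption mentioned above.

```python
def kv_cache_bytes(
    context_length: int,
    n_layers: int = 32,       # assumption: Llama 3.1 8B
    n_kv_heads: int = 8,      # assumption: grouped-query attention with 8 KV heads
    head_dim: int = 128,      # assumption: 4096 hidden size / 32 attention heads
    bytes_per_value: int = 2, # assumption: 16-bit cache
) -> int:
    """Estimate KV cache size: keys and values for every layer and every token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_length


for ctx in (4096, 24576, 131072):
    print(f"{ctx:>6} tokens: about {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
```

Under these assumptions the cache alone goes from about half a gibibyte at 4096 tokens to the size of the model itself at 131072, which fits the general direction of the ollama ps output, even if not the exact numbers.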

Ollama, I’m sorry (you can skip this)

I thought Ollama’s server processing was malfunctioning because the LLM ran quickly on the CLI but became slow when used through the API. However, after trying the advice “Try setting context length to 4096” from an issue discussion about Windows + GPU, I found that it actually solved the problem.

Ollama, I’m sorry for doubting you!

Image by Stable Diffusion (Mochi Diffusion)

This time I wanted an image of a small bike overtaking a luxurious van or camper, but somehow it wasn’t as easy as I thought. Most of the generated images had two bikes, a bike and a van in opposite lanes, a van cut off at the edge of the frame, etc. Only this one had a bike leading a van.

Date:
2024-9-1 2:57:00

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
A high-speed motorcycle overtaking a luxurious van

Exclude from Image:

Seed:
2448773039

Steps:
20

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
All

2024-07-22

Run Meta’s Audio Generation AI model, AudioGen, on macOS with MPS (GPU)

Meta, the company behind Facebook, released AudioCraft – an AI capable of generating music and sound effects from English text. The initial version, v0.0.1, dropped in June 2023, followed by a few revisions, and the latest as of this writing is v1.3.0 from May 2024. The best part? You can run it locally for free!

However, there’s a catch: official support is limited to NVIDIA GPUs or CPUs. macOS users are stuck with CPU-only execution. Frustrating, right?

After much research and experimentation, I discovered a way to speed up the generation process for AudioGen, AudioCraft’s sound effects generator, by leveraging Apple Silicon’s GPU – MPS (Metal Performance Shaders)!

In this article, I’ll share my findings and guide you through the steps to unlock faster audio generation on your Mac.

AudioCraft: https://ai.meta.com/resources/models-and-libraries/audiocraft

GitHub: https://github.com/facebookresearch/audiocraft

Contents

1 Notes
2 Environment Setup
3 Sample Code
4 Usage
5 MPS cannot be used with MusicGen or MAGNeT.

Notes

While AudioCraft’s code is released under the permissive MIT license, it’s important to note that the model weights (the pre-trained files downloaded from Hugging Face) are distributed under the CC-BY-NC 4.0 license, which prohibits commercial use. Therefore, be mindful of this restriction if you plan to publicly share any audio generated using AudioCraft.

AudioCraft also includes MusicGen, a model for generating music, as well as MAGNeT, a newer, faster, and supposedly higher-performing model. Unfortunately, I wasn’t able to get these models running with MPS.

While development isn’t stagnant, there are a few open issues on GitHub, hinting at possible future official support. However, even though you can run AudioCraft locally for free, unlike platforms like Stable Audio which offer commercial licenses for a fee, it seems unlikely that any external forces besides the passionate efforts of open-source programmers will drive significant progress. So, let’s manage our expectations!

Environment Setup

Confirmed Working Environment

macOS: 14.5
ffmpeg version 7.0.1

Setup Procedure

Install ffmpeg if not installed yet. You need brew installed.

brew install ffmpeg

Create a directory and clone the AudioCraft repository. Choose your preferred directory name.

mkdir AudioCraft_MPS
cd AudioCraft_MPS
git clone https://github.com/facebookresearch/audiocraft.git .

Set up a virtual environment. I prefer pipenv, but feel free to use your favorite. Python 3.9 or above is required.

pipenv --python 3.11
pipenv shell

Install PyTorch, pinned to version 2.1.0.

pip install torch==2.1.0

Set the xformers version to 0.0.20 in requirements.txt. MPS doesn’t support xformers, but this was the easiest workaround. The example below uses vim, but feel free to use your preferred text editor.

vi requirements.txt
#xformers<0.0.23
xformers==0.0.20
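If you would rather script this edit than open an editor, the change can be sketched as a small text rewrite. The function name is made up for illustration, and the sample text is an invented excerpt, not the real contents of requirements.txt.

```python
import re


def pin_xformers(requirements_text: str, version: str = "0.0.20") -> str:
    """Comment out the existing xformers line and add a pinned one below it."""
    pinned = f"xformers=={version}"
    lines = []
    for line in requirements_text.splitlines():
        if re.match(r"\s*xformers\b", line) and line.strip() != pinned:
            lines.append("#" + line)  # keep the original requirement as a comment
            lines.append(pinned)
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"


# Invented excerpt for demonstration only.
sample = "torch==2.1.0\nxformers<0.0.23\nnum2words\n"
print(pin_xformers(sample), end="")
```

To apply it to the real file, read requirements.txt with pathlib.Path.read_text(), pass the text through pin_xformers, and write the result back. Running it twice leaves the file unchanged.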

Install everything, and the environment is set up!

pip install -e .

Edit one file to use MPS for generation.

Modify the following file so that only the decoding step falls back to the CPU while the rest of the generation runs on MPS:

audiocraft/models/encodec.py

The line numbers might vary depending on the version of the cloned repository, but the target is the decode() method within the class EncodecModel(CompressionModel):. Comment out the first out = self.decoder(emb) and add the if/else block below it.

    def decode(self, codes: torch.Tensor, scale: tp.Optional[torch.Tensor] = None):
        """Decode the given codes to a reconstructed representation, using the scale to perform
        audio denormalization if needed.

        Args:
            codes (torch.Tensor): Int tensor of shape [B, K, T]
            scale (torch.Tensor, optional): Float tensor containing the scale value.

        Returns:
            out (torch.Tensor): Float tensor of shape [B, C, T], the reconstructed audio.
        """
        emb = self.decode_latent(codes)
        #out = self.decoder(emb)
        # Below if block is added based on https://github.com/facebookresearch/audiocraft/issues/31
        if emb.device.type == 'mps':
            # XXX: Since mps-decoder does not work, cpu-decoder is used instead
            out = self.decoder.to('cpu')(emb.to('cpu')).to('mps')
        else:
            out = self.decoder(emb)

        out = self.postprocess(out, scale)
        # out contains extra padding added by the encoder and decoder
        return out

The code mentioned above was written by EbaraKoji (whose name suggests he might be Japanese?) from the following issue. I tried using his forked repository, but unfortunately, it didn’t work for me.

https://github.com/facebookresearch/audiocraft/issues/31#issuecomment-1705769295

Sample Code

This code below is slightly modified from something found elsewhere. Let’s put it in the demos directory along with other executable demo codes.

from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write
import argparse
import time

model = AudioGen.get_pretrained('facebook/audiogen-medium', device='mps')
model.set_generation_params(duration=5)  # generate [duration] seconds.

start = time.time()
def generate_audio(descriptions):
    wav = model.generate(descriptions)  # generates samples for all descriptions in array.

    for idx, one_wav in enumerate(wav):
        # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
        audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
        print(f'Generated {idx}.wav.')
        print(f'Elapsed time: {round(time.time()-start, 2)}')

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate audio based on descriptions.")
    parser.add_argument("descriptions", nargs='+', help="List of descriptions for audio generation")
    args = parser.parse_args()
    
    generate_audio(args.descriptions)

The key part is device='mps' on line 6. This instructs it to use the GPU for generation. Changing it to 'cpu' will make generation slower but won’t consume as much memory. There is also a smaller pre-trained audio model, facebook/audiogen-small, which I haven’t tested.
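If the same script should also run on a machine without MPS, the device can be chosen at runtime instead of being hard-coded. This is a minimal sketch, assuming a PyTorch version that provides torch.backends.mps (1.12 or later). The function name pick_device is made up for illustration, and the import is guarded only so the sketch itself runs where PyTorch is missing.

```python
try:
    import torch
except ImportError:  # lets the sketch run even where PyTorch is not installed
    torch = None


def pick_device() -> str:
    """Return "mps" when PyTorch reports the Apple GPU as available, else "cpu"."""
    if torch is not None and torch.backends.mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
# In the sample code above, the model line would then become:
# model = AudioGen.get_pretrained('facebook/audiogen-medium', device=device)
```

On an Apple Silicon Mac with a working PyTorch install this should pick "mps"; elsewhere it falls back to "cpu".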

Usage

Note: The first time you run it, the pre-trained audio model will be downloaded, which may take some time.

You can describe the desired sounds in English as arguments, and it will generate audio files named 0.wav, 1.wav, and so on. The generation time doesn’t increase much whether you provide one or multiple arguments, so I recommend generating several at once.

python demos/audiogen_mps_app.py "text 1" "text 2"

Example:

python demos/audiogen_mps_app.py "heavy rain with a clap of thunder" "knocking on a wooden door" "people whispering in a cave" "racing cars passing by"

/Users/handsome/Documents/Python/AudioCraft_MPS/.venv/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Generated 0.wav.
Elapsed time: 53.02
Generated 1.wav.
Elapsed time: 53.08
Generated 2.wav.
Elapsed time: 53.13
Generated 3.wav.
Elapsed time: 53.2

On an M2 Max with 32GB RAM, starting with low memory pressure, a 5-second file takes around 60 seconds to generate, and a 10-second file takes around 100 seconds.

There’s a warning that appears right after running it, but since it works, I haven’t looked into it further. You can probably ignore it as long as you don’t upgrade the PyTorch (torch) version.

MPS cannot be used with MusicGen or MAGNeT.

I tried to make MusicGen work with MPS using a similar approach, but it didn’t succeed. It does run on CPU, so you can try the GUI with python demos/musicgen_app.py.

MAGNeT seems to be a more advanced version, but I couldn’t get it running even on CPU. Looking at the following issue and the linked commit, it appears that it should work, but I was unsuccessful in getting it to run myself.

https://github.com/facebookresearch/audiocraft/issues/396

So, that concludes our exploration for now.

Image by Stable Diffusion (Mochi Diffusion)
This part, which I’ve been writing at the end of each article, will now only be visible to those who open this specific title. It’s not very relevant to the main content.
This time, it generated many good images with a simple prompt. I chose the one that seemed least likely to trigger claustrophobia.

Date:
2024-7-22 1:52:43

Model:
realisticVision-v51VAE_original_768x512_cn

Size:
768 x 512

Include in Image:
future realistic image of audio generative AI

Exclude from Image:

Seed:
751124804

Steps:
20

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
All

2024-04-30 / 2024-05-27

How I developed “Speech + Subtitles Player” desktop app with Flet for Python.

Flet lets you develop cool desktop apps in Python. I previously released an app that can play audio and display subtitles (SRT) simultaneously, as well as edit subtitles. How did I make it? Here’s the background, steps, and code. The finished product is a standalone desktop app, and it’s not overly complicated. However, using Python + Flet to create a single application from start to finish is not something you see often, so I hope this blog helps some Flet app developers! It’s a long read, so I suggest searching for keywords or using the table of contents rather than reading the whole post.

Speech + Subtitles (SRT) Player app made with Flet

Contents

1 Code and assets
2 Flet official documentations
3 Background and things not technical
- 3.1 Developer (myself) and background
- 3.2 My development environment
4 My development process
5 Overview
- 5.1 GUI layout
- 5.2 Code overview
  - 5.2.1 About SRT file format
6 Explanation of the code
7 Things I Thought About While Creating This App and Why I Wrote This Article
8 Image by Stable Diffusion

Code and assets

The Python code, along with the logos and images used for running and building the app, is all stored on GitHub.

Flet official documentations

If you’re using Flet for the first time, please read the official documentation first.

New releases are announced on the official blog and Discord, and other miscellaneous links can be found on the support page.

Background and things not technical

Developer (myself) and background

I am an IT manager at a Japanese office of a global company. Programming is my hobby, and I have been creating small, unfinished, and experimental programs for several decades (in 8bit old-school BASIC, HyperCard/HyperTalk, HTML/JavaScript, and Python). I have read several introductory books on Python, but I have only read about 60-80% of each one. I tend to get bored and stop reading before the end because I start thinking about creating something new instead. In the past, I have used Tkinter and PySimpleGUI to create desktop applications, but I have not been satisfied with the results. Recently, I discovered Flet, which has a beautiful design and allows me to create desktop, web, and mobile applications with relative ease. I was so impressed that I started experimenting with it immediately. One day, I was amazed by OpenAI’s Whisper, an excellent speech recognition tool, and on impulse, I started developing a subtitle editing app using Flet (I couldn’t find anything similar in the market). Before that, I had created a password generator app using Flet, which is available both as a desktop application and on the web.

My development environment

Mac (started with Mac mini M1 16GB RAM then migrated to a Mac Studio M2 Max 12-core CPU / 30-core GPU / 32GB RAM, refurbished)
Keyboard: HHKB Pro 2 Type-S (USB only model)
Mouse: Logi’s silent mouse
Monitors: Dell 4K 27-inch and QHD 24-inch
IDE: Visual Studio Code – Insiders (since I first started using the M1 Mac mini)
Version control: GitHub and GitHub Desktop
Image generation: Mochi Diffusion (I used Keynote to design the app logo)
Speech recognition/text generation: MLX Whisper and a simple SRT generation code
Files for test: m4a audio file generated by yt_dlp and SRT text file as mentioned above.
Memo, task management: Smartsheet (free account) and Apple default Memo app
Hand writing/drawing: Whiteboard notebook nu board and PILOT Board Master S (links are amazon Japan pages)
Frequently visited websites: Flet official website, Discord server, and Copilot free version
Python: 3.11.7
Flet: 0.21.2 (pip install flet==0.21.2)
My other blog posts have additional information on how to build a Flet app:

How to build Python GUI app on macOS with Flet

Successful build of Flet app with Audio control

My development process

It was something like the list below. I thought of a plan and started implementing it, but Flet’s implementation method wasn’t clear to me and I couldn’t make it work as expected. I spent several days going back and forth between reading the documentation and trying to write code, but I couldn’t get it right. However, my motivation didn’t drop even though I was stuck for a while. Instead, I took a break from Flet app development and worked on other things, like improving Whisper’s recognition accuracy by adjusting parameters or imagining the target users and use cases of the app. After a few days, I came back to Flet with refreshed energy and enthusiasm, and I was able to complete the app at last.

started my project by creating an audio control that looked like the official one.
added a local audio file to the code and played it.
drew an interface rough draft on a whiteboard (nu board) while imagining the app’s features.
implemented a slider that would move based on the audio playback status and displayed the elapsed time in the console. However, the slider did not seem to move, so I thought something was wrong with my code and spent some time troubleshooting. It turned out the audio file was so long that the movement was not noticeable. The slider had been moving all along.
conversely, added a feature that allowed users to move the slider and play the audio from the designated position. This feature took several days to implement, but it was worth it.
started by creating a FilePicker that would allow users to select audio files. This feature was quite straightforward and didn’t require much effort. macOS remembers folder previously accessed. Nice.
implemented the ability to automatically load any matching subtitles or transcripts when an audio file is selected. This feature was quite useful and made the app more convenient for users.
added a function that would automatically generate a button from each loaded subtitle by referring to the official To-Do app tutorial. It was great seeing my app generating buttons.
implemented the conversion logic between milliseconds and the 00:00:00,000 format. I used Copilot’s suggestions and started getting help from Copilot more frequently.
rewrote the main part to a class. After that, I gradually understood the importance and meaning of Python classes.
rewrote the entire code to use async functions. However, this did not improve the response time when there were many buttons on the screen. Later, Flet became async-first, and I had unknowingly taken the lead. Haha.
implemented class interactions (e.g., playing audio based on the current flow, scrolling through subtitles, jumping to timestamp, etc.). I also relearned classes in practice.
implemented file saving and loading functions. The operating system warns the user when there is a conflict with an existing file. It’s very convenient.
implemented a SnackBar to notify users if there was no file to load. This was easy to use and did not obstruct the interface.
hit a bug where the write dialog would not open and the app would stop responding. Since the issue occurred only rarely and I could not reproduce it reliably, I replaced the export-as dialog with individual buttons.
took a break from debugging and prepared the app for release by finding free fonts for my logo and icon. I was simply too tired to investigate the cause of the bug and wanted to escape for a little while.
added copyright information, made overall design adjustments, and prepared the app for release.
found that the built macOS app crashed because of NumPy, and I could not resolve it. I filed an issue on GitHub.
attempted to make a web app instead, but I couldn’t open local files directly and gave up for now.
released the app on GitHub and blogged about it. At this point, the app could be run with python main.py.
Copilot suggested a NumPy-free implementation, which I used to build the macOS app successfully.
added the build process to the GitHub README and wrote a blog post about it.
Finally, I started writing this article.

Overview

GUI layout

Please excuse the handwritten text and drawing. The whiteboard itself is an app (=page) and you can see that there is a large column in the middle of it, which contains the main class definition. The other two sections, Audio and Dialogs, are usually not displayed and are added to the page from the main function. Everything else is wrapped in containers or rows and added to the page from top to bottom.

Code overview

Here’s a breakdown of the code by line number (xx-yy) and general content:

(1-4) I’m importing Flet and other modules – I use os for path operations and datetime only to add dates to file names, so almost all the necessary elements and features of my app are being created using Flet alone.
(6-79) Function block – conversion between milliseconds and digital format, and conversion of loaded text into a list for use within the app.
(81-183) SubButton class that generates subtitle buttons from the list – initialization, a build method to lay out text and buttons, and methods that handle the processing when a button is clicked.
(185-791) The main AudioSubPlayer class of the app – first, in the initialization function (lines 187-374), all buttons, text fields, and other Flet controls used in the app’s layout are defined as attributes like self.foobar; then, in the next method block (lines 376-738), the async logic for events such as clicks is defined; and finally, in the build function (lines 740-791), the page layout is defined.
(793-812) The main function – defining the basic structure of the window using async, and adding audio and dialog instances as overlays to the page.
(815) Calls the main function

I think there are unnecessarily long parts in my code, but Flet code seems to become lengthy in general.

About SRT file format

The subtitle file format currently supported by this app is SRT. It’s a text file with the extension .srt. According to Wikipedia, it originated from a text subtitle file format generated by SubRip, a Windows freeware. I adopted it because Whisper uses it as an output format for speech-to-text conversion. You can find my blog post here about how to use Whisper to convert audio files into SRT format on macOS (with some simple Python code).

Each subtitle in an SRT file consists of four parts: an index number, a time range (start time, “-->”, end time), the text, and an empty line. Here is a sample of what this looks like (the beginning of Steve Jobs’ famous speech):

1
00:00:00,000 --> 00:00:02,720
 Today, I want to tell you three stories from my life.

2
00:00:03,040 --> 00:00:04,620
 That's it. No big deal.

3
00:00:04,980 --> 00:00:06,160
 Just three stories.

The start and end times use two digits each for hours, minutes, and seconds, followed by a comma and three digits of milliseconds. Whisper’s output should work as is, but this app does not support multi-line subtitles, so please combine them into one line if your file has any. When using Whisper, blank lines with the same timestamp may be produced when speech recognition doesn’t work as intended; these are automatically removed when the file is read by this application.
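As a rough illustration of the format described above, here is a minimal sketch that parses SRT text into a list of [index, start_ms, end_ms, text] entries and skips entries whose text is blank. The names parse_srt and to_ms are hypothetical and this is not the app’s own create_subtitles() function; it assumes single-line subtitles, as the app does.

```python
import re

# Matches a time range line such as "00:00:00,000 --> 00:00:02,720".
TIME_RANGE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text):
    """Hypothetical parser: returns [index, start_ms, end_ms, text] entries."""
    subtitles = []
    # Subtitle entries are separated by an empty line.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIME_RANGE.search(lines[1])
        if match is None:
            continue
        text = lines[2].strip() if len(lines) > 2 else ""
        if not text:
            continue  # Skip entries with blank text.
        groups = match.groups()
        subtitles.append(
            [int(lines[0]), to_ms(*groups[:4]), to_ms(*groups[4:]), text]
        )
    return subtitles


sample = """1
00:00:00,000 --> 00:00:02,720
 Today, I want to tell you three stories from my life.

2
00:00:03,040 --> 00:00:04,620
 That's it. No big deal.

3
00:00:04,620 --> 00:00:04,620

"""
print(parse_srt(sample))
```

The third entry in the sample has no text, so only the first two entries end up in the list.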

Explanation of the code

From here on, I will walk through the actual code. I won’t go into much detail about Flet’s basics, and I’ll proceed in an order that seems easier to understand. It would be helpful if you could open the code in an editor and/or run the app while reading this.

The Flet framework is imported at the beginning of the code as ft.

The last line ft.app(target=main, assets_dir="assets") creates the app

This last line creates the Flet app. With target=main, I’m specifying the main function as the app itself. assets_dir="assets" sets the ‘assets’ folder in the same directory as the code to be used for storing files such as images that the app will use. If you’re going to build your code as an executable app, I suggest naming the Flet app’s main file main.py, the function inside the code main, and the folder assets, so that when building the app you can simply run flet build macos (for macOS).

ft.app(target=main, assets_dir="assets")

“async def main” function to create a window and add overlay

This is a function that is called when the code is executed. It generates a Page instance, which serves as the foundation for the Flet app. After specifying the window title, initial size, and color theme, it adds an overlay with invisible audio files and dialogs to the page.

At line 806, an instance of AudioSubPlayer is created, and a function called load_audio, which adds the audio control to the overlay, is passed in. The next line adds this instance to the page. This allows audio files to be added to the page from within the class.

At lines 810-811, dialogs for opening and saving files are added as an overlay to the page using overlay.extend().

There might be alternative ways to manage overlays, but since adding overlays to a page couldn’t be achieved from within the UserControl class, I used this approach.
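The idea of handing load_audio to the class can be sketched without Flet at all. In the simplified example below, FakePage and Player are hypothetical stand-ins for ft.Page and AudioSubPlayer, and the callback is synchronous rather than async; it only shows how a closure lets the class touch the page overlay without holding a reference to the page.

```python
class FakePage:
    """Stand-in for ft.Page; only the overlay list is modelled."""

    def __init__(self):
        self.overlay = []


class Player:
    """Stand-in for AudioSubPlayer: it receives a callback, not the page."""

    def __init__(self, load_audio):
        self.audio1 = "audio-control"
        self.load_audio = load_audio

    def open_file(self):
        # The class has no reference to the page; it just calls the callback.
        self.load_audio()


def main(page):
    def load_audio():
        # Closure: 'page' and 'app' are looked up when the function is called,
        # so it is fine that 'app' is assigned after this definition.
        page.overlay.append(app.audio1)

    app = Player(load_audio)
    app.open_file()
    return app


page = FakePage()
main(page)
print(page.overlay)  # ['audio-control']
```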

Calling page.update() updates (redraws) the page controls. In Flet, if you make any visual changes, updating the relevant control applies the changes to the GUI. If the change is part of a larger process, you can update once at the end. So, for example, line 798 is unnecessary, my apologies (since I’ve already referred to line numbers in various places in this post, I won’t remove the line, to keep the text consistent).

See the code:

# Main function that builds window and adds page. Also, adds audio file and dialogs that are invisible as overlay.
async def main(page: ft.Page):
    page.title = 'Speech + Subtitles Player'
    page.window_height = 800
    page.theme_mode=ft.ThemeMode.SYSTEM
    page.update()

    # Appends audio as an overlay to the page.
    async def load_audio():
        page.overlay.append(app.audio1)
        page.update()

    # Creates an instance of AudioSubPlayer class. Passes load_audio for the instance to append audio to the page. 
    app = AudioSubPlayer(load_audio)
    page.add(app)

    # Adds dialog instance methods to the page.
    page.overlay.extend([app.pick_speech_file_dialog, app.pick_text_file_dialog, 
                         app.export_as_srt_dialog, app.export_as_txt_dialog])
    page.update()

The main part of the app, “AudioSubPlayer” class

The main class is a custom control that inherits from UserControl and implements a user-defined control. The `build()` method, which is required by UserControl, is where UI is constructed. So, let’s take a look at its contents first (although this `UserControl` has been deprecated in Flet version 0.21.0, it still works in my local version 0.21.2, so I’ll continue with the explanation). However, please note that there will likely be significant changes before the official major release, and when using a new framework, it’s essential to check the release notes for any breaking changes.

UI layout by “def build(self)”

The block starting at line 740 builds the user interface by creating self.view as an instance of Column. This corresponds to the largest box in the hand-drawn diagram, and its contents all sit within this column.

As I wrote the code, I noticed that the layout was becoming increasingly complex. To make it easier to maintain in the future, I focused solely on defining the layout here and writing controls separately. This way, the build() method will be simpler and easier to read.

In Flet, UI components are stacked from top to bottom in the order you write them. Therefore, when you want to place multiple controls side by side, put them inside a Row and define their layout accordingly. For example, on line 748, there is a Row that contains a button to open an audio file and text displaying the file name, which are displayed horizontally.

While coding, I think it's a good idea to put each property (such as alignment and color) on its own line, like lines 773-778. This way, you can easily add or comment out individual properties. Once finalized, you can then combine all the properties onto one line like line 771.

See the code:

# === BUILD METHOD ===
def build(self):
    self.view = ft.Column(expand=True, controls=[
        ft.Container(content=
            ft.Column(controls=[
                ft.Row(controls=[
                    self.base_dir,
                ]),
                ft.Row(controls=[
                    self.speech_file_button,
                    self.speech_file_name,
                ]),
                ft.Row(controls=[
                    self.text_file_button,
                    self.text_file_name,
                    self.save_button,
                    #self.export_button,
                    self.export_as_srt_button,
                    self.export_as_txt_button,
                ]),
                self.audio_slider,
                ft.Row([
                    self.position_text,
                    self.duration_text,
                ], alignment=ft.MainAxisAlignment.SPACE_BETWEEN),
                ft.Row(controls=[
                    self.rewind_button,
                    self.play_button,
                    self.faster_sw,
                    self.sub_scroller_sw,
                ]),
            ]), expand=False, border_radius=10, border=ft.border.all(1), padding=10, 
        ),
        ft.Container(content=
            self.subs_view,
            border_radius=10,
            border=ft.border.all(1),
            padding=5,
        ),
        ft.Row(controls=[
            ft.Text(text_align=ft.TextAlign.START,
                    spans=[ft.TextSpan('© 2024 Peddals.com', url="https://blog.peddals.com")],
                    ), 
            ft.Image(src='in_app_logo_small.png'),
        ],alignment=ft.MainAxisAlignment.SPACE_BETWEEN,
        ),
        ft.Container(content=
            self.notification_bar)
        ],
        )

    return self.view

In the method “def __init__(self, load_audio)” all controls are defined and initialized.

From line 187 onwards, the initialization part begins. First, instance variables are initialized and the function for loading audio files is received, as mentioned earlier. The following lines up to 374 mainly consist of defining and initializing controls. While it would be quite extensive to explain each one individually, I can provide an overview: they define visual properties such as text or icons displayed on the control, along with methods that are called when specific events occur.

Typical definition of a Button control

As a common usage example, I will explain the contents of a button control definition for loading a text file.

        # Open text file button
        self.text_file_button = ft.ElevatedButton(
            text='Open SRT/TXT File',
            icon=ft.icons.TEXT_SNIPPET_OUTLINED,
            on_click=self.pre_pick_text_file,
            disabled=True,
            width=210,
        )

First, please understand that these contents serve as initial values. The properties can be changed by other methods; thus, they define the application’s state at startup.

(Line 237) An instance of ft.ElevatedButton is created with the name self.text_file_button (it might be difficult to notice in a dark theme, but it looks like a slightly raised button). Properties and event handlers are passed as comma-separated arguments inside the parentheses.

(Line 238) Define button display text using the text property.

(Line 239) Specify an icon to be included in the button using the icon property. The position of the icon is fixed at the left end and cannot be changed. For how to find and confirm the name of an icon, please see the note below.

(Line 240) Define the method to be called when the on_click event occurs (i.e., when the button is clicked).

(Line 241) At app startup, the button is disabled by setting the disabled property to True. After the audio file has been loaded, set it to False to make the button clickable.

(Line 242) Fix the width of the button to 210 pixels.

Although it isn’t used for this button, setting the tooltip property allows you to display a short text when you hover your mouse cursor over the control.

I’ll come back to this later, but a control’s properties can be changed from methods or functions with an assignment like self.text_file_button.disabled = False, followed by an update. For each control, refer to the official documentation for available properties, methods, and events.

Icons can be searched for on this icons browser page. Unfortunately, as of this article's publication, clicking on displayed icons in Safari does not copy their names. You will need to use Chrome or manually enter the icon name that appears on hover (Visual Studio Code will autocomplete icon names as well). If you're using import flet as ft, write icon=ft.icons.THUMB_UP.

Hover on an icon to find its name if copy won’t work.

Most controls and their properties should be easy to understand, but I found the process of opening or saving a file using the FilePicker control hard to follow. Therefore, I will explain it separately.

See the code:

    def __init__(self, load_audio):
        super().__init__()
        self.position = 0
        self.duration = 0
        self.isPlaying = False
        self.load_audio = load_audio

        # == Controls ==
        
        # Audio control with default properties
        self.audio1 = ft.Audio(
            src='',
            volume=1,
            balance=0,
            playback_rate=1,
            on_loaded=self.loaded,
            on_position_changed = self.position_changed,
            on_state_changed = self.playback_completed,
        )

        # Path of the audio file
        self.base_dir = ft.Text(value=f"Base Directory: ")

        # Open speech file button
        self.speech_file_button = ft.ElevatedButton(
            text='Open Speech File', 
            icon=ft.icons.RECORD_VOICE_OVER_OUTLINED, 
            width=210,
            on_click=self.pre_pick_speech_file,
        )

        # Speech file picker control
        self.pick_speech_file_dialog = ft.FilePicker(on_result=self.pick_speech_file_result)

        # Speech file name
        self.speech_file_name = ft.Text(value='← Click to open a speech file.')

        # Alert dialog that opens if subtitle was edited but not saved when Open Speech File button is clicked.
        self.speech_save_or_cancel_dialog = ft.AlertDialog(
            modal=True,
            title=ft.Text('Change not saved.'),
            content=ft.Text('Do you want to discard the change?'),
            actions=[
                #ft.TextButton('Save', on_click=self.save_then_open, tooltip='Save then open another file.'),
                ft.TextButton('Open without save', on_click=self.open_speech_without_save, tooltip='Change will be lost.'),
                ft.TextButton('Cancel', on_click=self.close_speech_save_or_cancel_dialog),
            ]
        )

        # Open text file button
        self.text_file_button = ft.ElevatedButton(
            text='Open SRT/TXT File',
            icon=ft.icons.TEXT_SNIPPET_OUTLINED,
            on_click=self.pre_pick_text_file,
            disabled=True,
            width=210,
        )
        
        # Text file picker control
        self.pick_text_file_dialog = ft.FilePicker(on_result=self.pick_text_file_result)

        # Text file name
        self.text_file_name = ft.Text(value='No file selected.')

        # Save button to update edited subtitles. No dialog, it just overwrites current text file.
        self.save_button = ft.ElevatedButton(
            text='Save', 
            icon=ft.icons.SAVE_OUTLINED, 
            tooltip='Update current SRT/TXT file.',
            disabled=True,
            on_click=self.save_clicked
            )
        
        # Export as SRT button which opens a save dialog. Only available when SRT is open because SRT needs timestamp.
        self.export_as_srt_button = ft.ElevatedButton(
            text = 'SRT',
            icon=ft.icons.SAVE_ALT,
            on_click=self.export_as_srt,
            disabled=True,
            tooltip='Export as SRT file.'
        )

        # Export as SRT file picker
        self.export_as_srt_dialog = ft.FilePicker(on_result=self.export_as_srt_result)

        # Export as TXT button which opens a save dialog. TXT has not timestamp, subtitle text only.
        self.export_as_txt_button = ft.ElevatedButton(
            text = 'TXT',
            icon=ft.icons.SAVE_ALT,
            on_click=self.export_as_txt,
            disabled=True,
            tooltip='Export as TXT file.'
        )

        # Export as TXT file picker
        self.export_as_txt_dialog = ft.FilePicker(on_result=self.export_as_txt_result)

        # Export button to open a dialog (not in use)
        self.export_button = ft.ElevatedButton(
            text='Export as...', 
            icon=ft.icons.SAVE_ALT, 
            on_click=self.open_export_dialog,
            disabled=True,
            )
        
        # Export as dialog (not in use)
        self.export_dialog = ft.AlertDialog(
            modal = True,
            title = ft.Text('Export text as...'),
            content = ft.Text('Please select a file type.'),
            actions = [
                ft.TextButton('SRT', on_click=self.export_as_srt, tooltip='Subtitles with timestamps'),
                ft.TextButton('TXT', on_click=self.export_as_txt, tooltip='Subtitles only (no timestamps)'),
                #ft.TextButton('CSV', on_click=self.export_csv, tooltip='Comma separated value'),
                # I guess no one needs subtitles in CSV...
                ft.TextButton('Cancel', on_click=self.close_export_dialog),
            ],
            actions_alignment=ft.MainAxisAlignment.SPACE_BETWEEN,
        )
        
        # Alert dialog that opens if subtitle was edited but not saved when Open SRT/TXT File button is clicked.
        self.text_save_or_cancel_dialog = ft.AlertDialog(
            modal=True,
            title=ft.Text('Change not saved.'),
            content=ft.Text('Do you want to discard the change?'),
            actions=[
                #ft.TextButton('Save', on_click=self.save_then_open, tooltip='Save then open another file.'),
                ft.TextButton('Open without save', on_click=self.open_text_without_save, tooltip='Change will be lost.'),
                ft.TextButton('Cancel', on_click=self.close_text_save_or_cancel_dialog),
            ]
        )
        # Audio position slider
        self.audio_slider = ft.Slider(
            min = 0,
            value = int(self.position/10000),
            label = "{value}ms",
            on_change = self.slider_changed,
        )

        # Current playing position and duration of audio file
        self.position_text = ft.Text(value='Current position')
        self.duration_text = ft.Text(value='Duration (hh:mm:ss,nnn)')
        
        # Rewinds 5 seconds
        self.rewind_button = ft.ElevatedButton(
            icon=ft.icons.REPLAY_5,
            text="5 secs",
            tooltip='Rewind 5 secs',
            on_click=self.rewind_clicked,
            disabled=True,
        )

        # Play/Pause button. After loading audio file, this button will always be focused (space/enter to play/pause).
        self.play_button = ft.ElevatedButton(
            icon=ft.icons.PLAY_ARROW,
            text = "Play",
            on_click=self.play_button_clicked,
            disabled=True,
        )

        # 1.5x faster toggle switch
        self.faster_sw = ft.Switch(
            label='1.5x',
            value=False,
            on_change=self.playback_rate,
        )

        # Auto scroll toggle switch
        self.sub_scroller_sw = ft.Switch(
            label='Auto scroll',
            value=True,
        )
                
        # Area to add subtitles as buttons
        self.subs_view = ft.Column(
            spacing = 5,
            height= 400,
            width = float("inf"),
            scroll = ft.ScrollMode.ALWAYS,
            auto_scroll=False,
        )

        # Notification bar control at the bottom
        self.notification_bar=ft.SnackBar(
            content=ft.Text('Speech + Subtitle Player'),
            duration=2000,
            bgcolor=ft.colors.BLUE_GREY_700,
        )

Class Method (Logic) Part

From version 0.21.0, Flet has become an async-first framework, and it is recommended to create functions or methods in the form of async def unless synchronous processing is necessary. This can improve the responsiveness of your app, making it easier to manage without worrying about the details. Personally, I unfortunately started using async (await and control.update_async()) with an earlier version, then upgraded to a later async-first Flet version, and found I had to rewrite a lot of code… anyway, from line 378 to 738, most of the methods are defined as async def, and self.update() is used to update the view.

Below, I’ll describe some of the methods that I’d like to explain.

Process after completion of loading audio file, async def loaded(self, e)

The method from line 378 is called when an audio file has been loaded. It contains various changes to properties, as well as conversion processing for using subtitle files within the app. This is the longest single method in the entire code.

The first 30 lines or so are quite straightforward, setting values for properties of controls such as sliders, text, and buttons. The first three lines do something like this.

self.audio_slider.max = int(await self.audio1.get_duration_async())

We are using the get_duration_async() method of the Audio control to retrieve the duration (in milliseconds) of the audio, and assigning it to the max property of the slider control audio_slider. In Flet version 0.21.2, when using a method that returns a value like this, we need to use the await ~ <method>_async() syntax, which is different from other parts of the code.

self.duration_text.value = f'{ms_to_hhmmssnnn(self.audio_slider.max)}'

We are taking the milliseconds obtained earlier and converting it to the format “00:00:00,000” for display as a text on the right side of the slider. The function ms_to_hhmmssnnn() (which I got from Copilot) is used for this conversion.
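The app’s ms_to_hhmmssnnn() is not reproduced in this article, so the following is only a minimal sketch of how such a conversion could be written. The name ms_to_timestamp is hypothetical and the real function may differ in its details.

```python
def ms_to_timestamp(ms):
    """Format milliseconds as 'hh:mm:ss,nnn' (hypothetical helper)."""
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


print(ms_to_timestamp(2720))     # 00:00:02,720
print(ms_to_timestamp(3723004))  # 01:02:03,004
```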

self.audio_slider.divisions = self.audio_slider.max//1000

We are dividing the slider into 1-second intervals (1000 milliseconds) using its divisions property. This is because the slider does not display numerical values unless it is divided. Furthermore, Flet only generates audio playback events at 1-second intervals, so we do this to match that timing. In reality, since the slider’s value cannot be changed from the millisecond display in this app, displaying the numerical value has little significance here.

The next if block (from line 383) handles processing when a subtitle file is found. The create_subtitles() function processes the subtitle file internally and stores it in a list format in self.subtitles. When a text file (.txt) is loaded, it does not contain timestamps, so all timestamps are stored as 55:55:55,555 (201,355.555 seconds, or 201355555 milliseconds) for simplicity’s sake. This value can be referenced throughout the code where necessary. There’s no specific reason to select fives, but reading a 56-hour audio file is unlikely.
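To check the arithmetic behind the dummy timestamp, here is a small sketch of the reverse conversion, from the hh:mm:ss,nnn format to milliseconds. The names timestamp_to_ms, DUMMY_MS, and is_plain_text are hypothetical; the app’s own conversion function may be written differently.

```python
def timestamp_to_ms(timestamp):
    """Convert 'hh:mm:ss,nnn' to milliseconds (hypothetical helper)."""
    hms, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


# The dummy timestamp used for .txt files, expressed in milliseconds.
DUMMY_MS = timestamp_to_ms("55:55:55,555")
print(DUMMY_MS)  # 201355555


def is_plain_text(subtitles):
    """True when the first entry carries the dummy start time (a .txt file)."""
    return bool(subtitles) and subtitles[0][1] == DUMMY_MS
```

201355555 is the value the loaded() method compares against to tell a .txt file from an .srt file.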

The code from lines 397 to 406 mainly focuses on making buttons for audio playback clickable. In this app, audio playback and pause buttons are usually focused by default, allowing users to control playback with space or enter keys. Initially, I wanted to focus on the Open Speech File button at startup, then switch to the play/pause button once a file is loaded, but it didn’t work out that way. Some leftover code from this attempt remains in lines 398-403.

The code from lines 408 to 433 processes the list of subtitles already generated, adjusting various settings for both TXT files without timestamps and SRT files with timestamps. For each subtitle line, a button is created. The actual content of the button is created in another class, SubButton(), but here, an instance named sub is created and appended to the controls list of the app’s bottom half screen area using self.subs_view.controls.append(sub).

The code from lines 436 to 443 displays a message at the bottom of the screen depending on whether a subtitle file exists. The self.open_notification_bar method is used, which takes only text and simply sends a notification when called. In cases where a subtitle file was not found, it is called with the type set to ‘error’ and displayed for a longer time in an error color.

See the code:

    # Called once audio file is loaded. Enable/disable buttons, create subtitles list, etc.
    async def loaded(self, e):
        self.audio_slider.max = int(await self.audio1.get_duration_async())
        self.duration_text.value = f'{ms_to_hhmmssnnn(self.audio_slider.max)}'
        self.audio_slider.divisions = self.audio_slider.max//60
        # Enables buttons if associated text file exists.
        if self.text_file != 'No Text File.':
            # Call function to create the list of subtitles, self.subtitles.
            self.subtitles = create_subtitles(self.text_file)
            self.save_button.text = 'Save'
            self.save_button.disabled=False
            self.export_button.disabled=False
            self.export_as_srt_button.disabled=False
            self.export_as_txt_button.disabled=False
        # Disable buttons if associated text file does not exist.
        else:
            self.save_button.disabled=True
            self.export_button.disabled=True
            self.export_as_srt_button.disabled=True
            self.export_as_txt_button.disabled=True
            self.subtitles = []
        self.speech_file_button.autofocus=False
        self.speech_file_button.update()
        self.play_button.disabled=False
        self.play_button.focus()
        self.play_button.autofocus=True
        self.play_button.update()
        self.rewind_button.disabled=False
        self.text_file_button.disabled=False
        self.subs_view.controls.clear()
        
        # Create buttons of subtitles from the list self.subtitles.
        if self.subtitles != []:
            # .txt or .srt file
            for i in range(len(self.subtitles)):
                index = self.subtitles[i][0]
                start_time = self.subtitles[i][1]
                # .txt file (timestamp is dummy, 55:55:55,555) disable buttons.
                if self.subtitles[0][1]== 201355555:
                    self.sub_scroller_sw.value=False
                    self.sub_scroller_sw.disabled=True
                    self.export_dialog.actions[0].disabled=True
                    self.export_as_srt_button.disabled=True
                # .srt file
                else:
                    self.sub_scroller_sw.value=True
                    self.sub_scroller_sw.disabled=False
                self.sub_scroller_sw.update()
                end_time = self.subtitles[i][2]
                text = self.subtitles[i][3]
                
                # Create button instance of each subtitle. Include methods and controls for the instance to call or update.
                sub = SubButton(index, start_time, end_time, text, self.sub_time_clicked, self.play_button, 
                                self.save_button, self.subtitles)

                # Add button to the subtitle button area, subs_view.
                self.subs_view.controls.append(sub)

            # Call snackbar to show a notification.
            notification = f'Subtitle file loaded: {os.path.basename(self.text_file)}'
            await self.open_notification_bar(notification)
        
        # No text file found. Call snackbar to show an alert.
        else:
            notification = f'Subtitle file (.srt or .txt) not found.'
            await self.open_notification_bar(notification, type='error')
            print('Subtitle file not found.')

        self.update()

Method when playback position changes, async def position_changed(self, e)

The method on lines 447 to 454 is called whenever the playback position of the audio file changes, that is, when the on_position_changed event of self.audio1 fires. In practice, this happens automatically every second during playback, and also when the user moves the slider manually or clicks a timestamp. Let’s take a look at the code.

self.audio_slider.value = e.data
#print("Position:", self.audio_slider.value)
self.position_text.value = ms_to_hhmmssnnn(int(e.data))

The handler assigned to on_position_changed receives an event argument e. The value of e.data is the playback position (elapsed time) in milliseconds, so it is assigned to the value property of the audio_slider control to move the slider. The same value, converted to a readable format by ms_to_hhmmssnnn, is also assigned to the value property of the position_text control, which is displayed on the left side of the slider.
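
The helper ms_to_hhmmssnnn itself is not shown in this section. A minimal sketch of such a conversion might look like the following; format_ms is a hypothetical stand-in, and the app's own implementation may differ.

```python
def format_ms(ms: int) -> str:
    """Convert milliseconds to an SRT-style 'hh:mm:ss,nnn' string (hypothetical helper)."""
    hours, rem = divmod(ms, 3_600_000)    # 1 hour   = 3,600,000 ms
    minutes, rem = divmod(rem, 60_000)    # 1 minute = 60,000 ms
    seconds, millis = divmod(rem, 1_000)  # 1 second = 1,000 ms
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# The dummy value used for .txt files corresponds to 55:55:55,555.
dummy = format_ms(201355555)
```

With this sketch, the dummy timestamp 201355555 used for .txt files formats as 55:55:55,555, which matches the comment in the loading code.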

if (self.sub_scroller_sw.value == True) and (self.text_file_name.value != 'No Text File.'):
   self.scroll_to(self.audio_slider.value)
self.update()

The code above checks two conditions: the state of the auto-scroll switch for subtitles and whether a subtitle file is loaded. When no subtitle file is loaded, the file name field displays “No Text File.”, and that string doubles as the flag. When both conditions are true, it calls the scroll_to method to scroll the subtitles, passing self.audio_slider.value as an argument. Finally, self.update() redraws the control so that the new slider position and playback time appear on screen.

Process when the slider position changes, async def slider_changed(self, e)

The method on lines 457 to 460 is called when the slider position changes, specifically when the on_change event of the self.audio_slider control is triggered.

self.audio1.seek(int(self.audio_slider.value))

The seek method of self.audio1 is called with the value of the slider (self.audio_slider.value) to change the playback position. After that, it’s just a matter of updating; changing the audio playback position is extremely simple.

Play button, async def play_button_clicked(self, e) and async def playback_completed(self, e)

The code on lines 463 to 488 handles the Play button. Depending on the current state (just loaded, playing, paused, or finished), a click calls the appropriate method of self.audio1 to start, resume, or pause playback. The icon and text of the button are updated to match.

I thought it would be possible to get the playing status (e.g., “playing”) from e.data, but unfortunately, it didn’t work out. Instead, I created an instance variable self.isPlaying to track the state. The button could have shown fixed content such as “(Play / Pause)”, but I wanted icons that change with the situation, which also came in handy during debugging when I wanted to see the status.

See the code:

    # Change Play/Pause status and icon when called.
    async def play_button_clicked(self, e):
        self.position = await self.audio1.get_current_position_async()
        if (self.isPlaying == False) and (self.position == 0):
            self.audio1.play()
            self.isPlaying = True
            self.play_button.icon=ft.icons.PAUSE
            self.play_button.text = "Playing"
        elif self.isPlaying == False:
            self.audio1.resume()
            self.isPlaying = True
            self.play_button.icon=ft.icons.PAUSE
            self.play_button.text = "Playing"
        else:
            self.audio1.pause()
            self.isPlaying = False
            self.play_button.icon=ft.icons.PLAY_ARROW
            self.play_button.text = "Paused"
        self.update()
    
    # When audio playback is complete, reset play button and status.
    async def playback_completed(self, e):
        if e.data == "completed":
            self.isPlaying = False 
            self.play_button.icon=ft.icons.PLAY_ARROW
            self.play_button.text = "Play"
        self.update()
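
The three branches of play_button_clicked can be read as a small state table. The sketch below restates that logic as a pure function so that it can be checked without Flet; next_play_state and its return tuple are hypothetical names for illustration only.

```python
def next_play_state(is_playing: bool, position_ms: int):
    """Return (audio_action, new_is_playing, button_text) for one click of the Play button."""
    if not is_playing and position_ms == 0:
        # Nothing has been played yet: start from the beginning.
        return "play", True, "Playing"
    if not is_playing:
        # Paused somewhere in the middle: continue from there.
        return "resume", True, "Playing"
    # Currently playing: pause.
    return "pause", False, "Paused"


first_click = next_play_state(False, 0)
second_click = next_play_state(True, 3200)
third_click = next_play_state(False, 3200)
```

Here the three clicks map to play, pause, and resume in that order, which mirrors the if / elif / else chain above.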

Rewind and 1.5x speed, async def rewind_clicked(self, e) and async def playback_rate(self, e)

The code on lines 491 to 507 handles the rewind button and the 1.5x playback speed switch. The rewind function simply moves back 5 seconds while making sure the position doesn’t become negative. The 1.5x playback speed is also straightforward: it assigns 1.5 to the playback_rate property of the Audio control when the switch is on. Note that after changing the speed, it’s necessary to update the Audio control using await self.audio1.update_async().

As a fundamental principle of app design, I aimed for a simple, intuitive design that is easy to operate, and I added only the features that are necessary. The rewind button is one of them. In my own experience, I often forget to pause playback while editing subtitles. Moreover, the subtitle being played typically appears at the top, so it's convenient to have a button that goes back a little without stopping playback. If needed, I can click it multiple times to rewind further. The reason I didn't use 3 or 6 seconds is simply that there are no icons available for those numbers.

The 1.5x speed switch follows the current trend of saving time. I did try 2x speed, but personally felt it was a bit too extreme, so I settled on 1.5x instead. On iOS and macOS, the playback rate is limited to the range of 0.5 to 2, so you may want to experiment with changing self.audio1.playback_rate = 1.5 depending on your needs or target users.

See the code:

    # When 5 secs button is clicked, rewind 5 seconds.
    async def rewind_clicked(self, e):
        if self.audio_slider.value <= 5*1000:
            self.audio_slider.value = 0
        else:
            self.audio_slider.value -= 5*1000
        self.audio1.seek(int(self.audio_slider.value))
        #print(int(self.audio_slider.value))
        self.update()
    
    # Switch playback rate between normal and 1.5x faster.
    async def playback_rate(self, e):
        if self.faster_sw.value == True:
            self.audio1.playback_rate = 1.5
        else:
            self.audio1.playback_rate = 1
        #print(f'Playback rate: {self.audio1.playback_rate}')
        await self.audio1.update_async()
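
The arithmetic behind both controls is small enough to sketch as pure functions. rewind_position and clamp_rate are hypothetical helpers, not part of the app; the 0.5 to 2 range is the iOS/macOS limit mentioned above.

```python
REWIND_MS = 5 * 1000  # same 5-second step as the rewind button


def rewind_position(position_ms: int, step_ms: int = REWIND_MS) -> int:
    """Move back by step_ms without going below zero."""
    return max(0, position_ms - step_ms)


def clamp_rate(rate: float, low: float = 0.5, high: float = 2.0) -> float:
    """Keep a requested playback rate inside the supported range."""
    return min(high, max(low, rate))


near_start = rewind_position(3000)
mid_track = rewind_position(12000)
too_fast = clamp_rate(3)
```

max(0, ...) gives the same result as the if / else in rewind_clicked, and a clamp like this could be useful if the rate ever becomes user-configurable.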

Timestamp buttons, async def sub_time_clicked(self, start_time)

The code on lines 510 to 514 handles a click on a timestamp button after an SRT file has been loaded. When the button is clicked, playback jumps to the position of that timestamp. If playback is paused, it resumes.

The timestamp buttons are generated by another class, SubButton. When an instance of this class is created, this method is passed to it so that the button can call it when clicked. On a click, the jump_clicked() method of the SubButton instance passes its own start_time to this method, which uses the seek method of the Audio control to jump to that time.

Let’s go through the code and its explanation step by step.

# Create button instance of each subtitle. Include methods and controls for the instance to call or update.
sub = SubButton(index, start_time, end_time, text, self.sub_time_clicked, self.play_button, 
                self.save_button, self.subtitles)

This is the code that creates the instance. The method self.sub_time_clicked is passed as an argument.

# Create button of subtitle text.
class SubButton(ft.UserControl):
    def __init__(self, index, start_time, end_time, text, sub_time_clicked, play_button, save_button, subtitles):
        super().__init__()
        # Parameter of each subtitle.
        self.index = index
        self.start_time = start_time
        self.end_time = end_time
        self.text = text
        # Passed methods and controls to call and update.
        self.sub_time_clicked = sub_time_clicked

This is the first part of the initialization of the other class, SubButton, which creates the buttons. The class stores the objects passed from its parent class as its own attributes; self.sub_time_clicked on the last line is the one that matters here.

# When timestamp clicked calls AudioSubPlayer.sub_time_clicked to jump to button position.
async def jump_clicked(self, e):
    await self.sub_time_clicked(self.start_time)

This is the method that is called when the timestamp button is clicked as a result of an on_click event. This method uses self.start_time and self.sub_time_clicked to execute a method from its parent class.

And finally, this method plays the audio at the position of start_time.

# When the timestamp is clicked, jump to its position and play if not playing.
async def sub_time_clicked(self, start_time):
    self.audio1.seek(int(start_time))
    if self.isPlaying == False:
        await self.play_button_clicked(start_time)
    self.update()

Even though I understood Python and classes to some degree, it took me a long time to figure out how to call a method of the parent from an instance of another class. It wasn't just a simple Google search away, as I struggled to translate my intentions into searchable keywords. To those who are still stuck in the ways of GOTO/GOSUB, which are an extinct species of knowledge, I highly recommend thoroughly studying Python classes.
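
Stripped of all Flet details, the pattern is just passing a bound method to another object and calling it later. Below is a minimal sketch with made-up class names (Player and JumpButton are not the app's classes):

```python
import asyncio


class JumpButton:
    """Child object: remembers its own start time and the callback it was given."""

    def __init__(self, start_time, on_jump):
        self.start_time = start_time
        self.on_jump = on_jump  # bound method of the parent, stored for later use

    async def clicked(self):
        # Calls the parent's method, passing this button's own start time.
        await self.on_jump(self.start_time)


class Player:
    """Parent object: creates the buttons and hands each one its own method."""

    def __init__(self):
        self.position = 0
        self.buttons = [JumpButton(t, self.jump_to) for t in (0, 1500, 4200)]

    async def jump_to(self, start_time):
        self.position = start_time


player = Player()
asyncio.run(player.buttons[2].clicked())
```

After the simulated click, player.position holds the start time of the third button. Because self.jump_to is a bound method, it still knows which Player it belongs to when the child calls it.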

Scroll of subtitle buttons, async def scroll_to(self, e)

The code on lines 517 to 525 scrolls the subtitles. This method is called only when a time-stamped SRT file is open, specifically from the position_changed method. The argument e passed to this method contains the playback position of the audio in milliseconds. The instance variable self.subtitles is a 2D list in which each inner list contains an index number, a start time, an end time, and the text. This method uses the index number and the end time.

What I wanted to do here was move the subtitles corresponding to the currently playing audio to the top. However, Flet can only retrieve the playback position of an audio file once per second, so it scrolls to the subtitle button with the end time closest to that value if it’s larger than the current playback position. This may not be perfectly synchronized in real-time, but it will ensure that the currently playing subtitles are displayed either at the top or the second position.

Let’s go through the code and explanation step by step.

end_time = [item[2] for item in self.subtitles]

The local variable end_time of type list is assigned with all the end times of the subtitles.

index = min(range(len(end_time)), key=lambda i: abs(end_time[i]-e))

The local variable index is assigned with the position of the subtitle that has the end time closest to the current playback position. The index is an integer starting from 0.

key=str(self.subtitles[index][0])

The local variable key is assigned the index number from the SRT file, converted to a string. The index numbers in an SRT file start from 1 and are not necessarily consecutive, so I’ve added an extra step to allow for missing index numbers (in reality, after writing this code, I generated the subtitles list inside the app with code that makes the indices consecutive, so key = str(index+1) would give the same result).

self.subs_view.scroll_to(key=key, duration =1000)

The scroll_to method of the subs_view instance, which is a Column object, scrolls to the button whose key equals the local variable key, with an animation duration of 1000 milliseconds (1 second). The left-hand key is the parameter of the scroll_to method, while the right-hand key is the local variable holding the index number as a string.
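
To see how the lookup behaves, the same three expressions can be run on a small made-up subtitle list. The wrapper function nearest_subtitle_key and the sample data are only for illustration.

```python
# Each entry: [index number, start time (ms), end time (ms), text]
subtitles = [
    [1, 0, 2000, "Hello."],
    [2, 2000, 4500, "How are you?"],
    [3, 4500, 9000, "Fine, thanks."],
]


def nearest_subtitle_key(subtitles, position_ms):
    """Return the key (index number as a string) of the subtitle whose end time is closest."""
    end_time = [item[2] for item in subtitles]
    index = min(range(len(end_time)), key=lambda i: abs(end_time[i] - position_ms))
    return str(subtitles[index][0])


key_at_3s = nearest_subtitle_key(subtitles, 3000)
key_at_4s = nearest_subtitle_key(subtitles, 4000)
key_at_8s = nearest_subtitle_key(subtitles, 8000)
```

At 3000 ms the closest end time is still that of subtitle 1, so the key is "1" even though subtitle 2 is playing; at 4000 ms it becomes "2". This is why the playing subtitle ends up in the first or second position rather than always at the top.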

I'd like to add two points to this explanation. The first point is that if you have many buttons, the Flet app's performance can become sluggish. In particular, if you have over 300 buttons, the window movement can behave strangely. This is not a problem with CPU or memory usage, but rather a specification-related issue in Flet. If you're planning to create an app that uses many lists, I think it would be better to consider using a different control. When I researched this, I couldn't find any other controls that allow scrolling and on_click event handling, but there may be some workaround.

The second point is related to NumPy usage. In Flet version 0.21.2 on macOS, if you use NumPy in your code (as I did initially at line 520), the built app will crash when run. This is a problem that occurs only when building for macOS. I rewrote my code to avoid this issue, and I've written about it in a separate article. → NumPy issue is resolved by Flet. See this post.

Loading audio file, async def pre_pick_speech_file(self, e), and related processes

From here, I will explain the methods and controls related to loading an audio file. Relying on OS features makes some things easier, but Flet’s FilePicker control alone is not enough to implement “Open File” and “Save File”; quite a few additional pieces are needed. Specifically, to open a file you basically need to do the following:

Create an instance of the dialog control.
Add it to the page.
Create a button that triggers the “Open File” dialog event and place it on the page.
Create a method to receive the file selection event and process it.

In this application, I also created methods that handle the case where the subtitle text has unsaved changes and prompt the user to either discard or keep them. As a result, there are two sets of methods, one for each type of file (audio and text), although they perform similar operations. In hindsight, it would have been better to make them more modular and reuse the code, but as it stands, each has its own. Reading and writing files is quite complex and requires a lot of attention to detail, so I found this part of the project the most challenging. From here, I will explain the code step by step, following the actual workflow.

# Speech file picker control
self.pick_speech_file_dialog = ft.FilePicker(on_result=self.pick_speech_file_result)

This is an instance of the FilePicker control, which opens the OS’s “Open File” dialog. When a file is actually selected, the on_result event occurs and calls the self.pick_speech_file_result method.

# Adds dialog instance methods to the page.
page.overlay.extend([app.pick_speech_file_dialog, app.pick_text_file_dialog, 
                     app.export_as_srt_dialog, app.export_as_txt_dialog])

The dialog is added to the page using overlay.extend, which will be used for all file reading and writing operations. This is similar to adding an Audio control, and it’s being done outside of the class in the async def main() method.

# Open speech file button
self.speech_file_button = ft.ElevatedButton(
    text='Open Speech File', 
    icon=ft.icons.RECORD_VOICE_OVER_OUTLINED, 
    width=210,
    on_click=self.pre_pick_speech_file,
)

This is a button that calls self.pre_pick_speech_file when clicked.

# Called once Open Speech File button is clicked to pause playback and check if changes saved.
async def pre_pick_speech_file(self, e):
    if self.isPlaying == True:
        await self.play_button_clicked(e)
    if self.save_button.text == '*Save':
        #print('Save is not done.')
        await self.speech_save_or_cancel()
    else:
        await self.pick_speech_file()

This method performs some processing before the “Open File” dialog is actually opened. First, if playback is in progress, it is paused. Then, if there are unsaved changes to the subtitles (indicated by an asterisk on the “Save” button), a dialog asks whether to proceed without saving; otherwise, the method that opens the “Open File” dialog is called directly. All of these method calls require await. To pause playback, self.play_button_clicked(e) is called with the argument e because the parameter is required even though it isn’t used.

# Opens a dialog if change is not saved.
async def speech_save_or_cancel(self):
    self.page.dialog = self.speech_save_or_cancel_dialog
    self.speech_save_or_cancel_dialog.open = True
    self.page.update()

This method is called when there are unsaved changes. It assigns an instance of AlertDialog (self.speech_save_or_cancel_dialog) as the dialog of the page and sets its open property to True to display it.

# Alert dialog that opens if subtitle was edited but not saved when Open Speech File button is clicked.
self.speech_save_or_cancel_dialog = ft.AlertDialog(
    modal=True,
    title=ft.Text('Change not saved.'),
    content=ft.Text('Do you want to discard the change?'),
    actions=[
         #ft.TextButton('Save', on_click=self.save_then_open, tooltip='Save then open another file.'),
         ft.TextButton('Open without save', on_click=self.open_speech_without_save, tooltip='Change will be lost.'),
         ft.TextButton('Cancel', on_click=self.close_speech_save_or_cancel_dialog),
    ]
)

This is a dialog that opens when there are unsaved changes. It has buttons for “Open without save” and “Cancel”, which allow you to open the file without saving or cancel the operation respectively. Although I wanted to add a button to save here as well, it didn’t work out and the “Save” button remains commented out.

# Closes the above dialog.
async def close_speech_save_or_cancel_dialog(self, e):
    self.speech_save_or_cancel_dialog.open = False
    self.page.update()

This is the cancel processing. It simply sets the open property of the dialog to False, closing it.

# Opens audio file pick.
async def open_speech_without_save(self, e):
    self.speech_save_or_cancel_dialog.open = False
    self.page.update()
    await self.pick_speech_file()

This is a method called when the user selects to open the file without saving in the dialog. It closes the dialog, updates the page, and then calls self.pick_speech_file().

# Opens audio file pick dialog. Only allow compatible extensions.
async def pick_speech_file(self):
    self.pick_speech_file_dialog.pick_files(
        dialog_title='Select a speech (audio) file',
        allow_multiple=False,
        allowed_extensions=['mp3', 'm4a', 'wav', 'mp4', 'aiff', 'aac'],
        file_type=ft.FilePickerFileType.CUSTOM,
    )

Finally, this is the method that opens the “Open File” dialog, using the pick_files() method of the self.pick_speech_file_dialog control defined earlier. The file types that can be opened are limited by two arguments: allowed_extensions and file_type=ft.FilePickerFileType.CUSTOM. When a file is selected, the on_result event occurs and calls the self.pick_speech_file_result method. Since the dialog is provided by the OS, the Flet app does not need to keep track of the previously opened folder; when a file is opened again, the dialog starts in the same folder.

# Called when audio file pick dialog is closed. If file is selected, call self.check_text_file to load text file.
async def pick_speech_file_result(self, e: ft.FilePickerResultEvent):
    if e.files:
        #print(f'e.files = {e.files}')
        self.speech_file_name.value = ''.join(map(lambda f: f.name, e.files))
        self.speech_file = ''.join(map(lambda f: f.path, e.files))
        #print(f'Full path= {self.speech_file}')
        self.audio1.src = self.speech_file
        self.base_dir.value=f"Directory: {os.path.dirname(self.speech_file)}"
        await self.check_text_file()
        self.update()
        await self.load_audio()

This method receives an argument e of type ft.FilePickerResultEvent, which contains information about the selected file. It extracts the file name f.name and absolute path f.path from e.files. The file name is assigned to self.speech_file_name.value for display, and the path is assigned to self.speech_file and then to self.audio1.src so that the audio can be loaded. The method then calls the async self.check_text_file() method, which checks whether a matching text file exists, updates the display, and finally loads the audio file with the load_audio() method.

# Checks if audioFileName.srt or .txt exists to automatically load it.
async def check_text_file(self):
    #print(f'Speech file = {self.speech_file}')
    tmp_file = os.path.splitext(self.speech_file)[0]
    if os.path.exists(tmp_file+'.srt'):
        self.text_file = tmp_file+'.srt'
        self.text_file_name.value = os.path.basename(self.text_file)
    elif os.path.exists(tmp_file+'.txt'):
        self.text_file = tmp_file+'.txt'
        self.text_file_name.value = os.path.basename(self.text_file)
    else:
        self.text_file = self.text_file_name.value = 'No Text File.'
        self.save_button.disabled=True
        self.export_button.disabled=True
        self.sub_scroller_sw.disabled=True
    #print(f'Subtitle file = {self.text_file_name.value}')

This method prepares to read the selected audio file, checking if a file with the same name but with an extension of .srt or .txt exists. If neither exists, it disables buttons such as Save.
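
The lookup order (.srt first, then .txt) can be tried out in isolation. The sketch below uses pathlib instead of os.path and a hypothetical function name, find_companion_text; it only illustrates the search order and is not the app's code.

```python
import tempfile
from pathlib import Path


def find_companion_text(audio_path):
    """Return the .srt (preferred) or .txt file next to the audio file, or None."""
    base = Path(audio_path)
    for ext in (".srt", ".txt"):
        candidate = base.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None


# Try it in a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "talk.mp3"
    audio.touch()
    none_found = find_companion_text(audio)        # no text file yet
    (Path(tmp) / "talk.txt").touch()
    txt_found = find_companion_text(audio).name    # only .txt exists
    (Path(tmp) / "talk.srt").touch()
    srt_found = find_companion_text(audio).name    # .srt wins over .txt
```

Returning None instead of the string 'No Text File.' is a design choice of this sketch; the app uses the display string itself as the flag.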

After this, the method self.load_audio() on line 801 is called, which adds the audio file to the page. When the loading of the audio file is complete, the event on_loaded is triggered for self.audio1, and then the method self.loaded explained at the beginning is called.

None of this code validates the contents of the files, and yet the combined code is already this long. Each step is simple once you understand the process, but keeping the whole flow in your head can be quite challenging. When adding the code for reading subtitle files, I created a checklist in the free version of Smartsheet (example capture below) and coded my way through it. This article will not cover the part related to reading text files because it does pretty much the same thing.

async def save_clicked(self, e), called to save and overwrite subtitle files

This method, located on lines 641-651, overwrites the open subtitle file with the changed content. It writes only when there are unsaved changes, and it calls the method that matches the type of the open file, .srt or .txt.

See the code:

    # Updates current open file.
    async def save_clicked(self, e):
        #print(f'File: {self.text_file}')
        extension = os.path.splitext(self.text_file)[1]
        #print(f'Extension: {extension}')
        if self.save_button.text==('*Save'):
            if extension == '.srt':
                await self.save_as_srt(self.text_file)
            elif extension == '.txt':
                await self.save_as_txt(self.text_file)
            self.save_button.text=('Save')
        self.update()

Overwriting SRT file, async def save_as_srt(self, save_file_name)

This method, located on lines 670-684, overwrites an SRT file. The save_file_name variable contains the absolute path of the file to write. The self.subtitles list is structured for ease of use within the app, so the method writes each entry to the file in SRT format: the index number, start time --> end time, and the subtitle text followed by a blank line (\n). After writing is complete, it sends a notification message to the bottom of the window and updates the display.

See the code:

    # Saves as .srt file.
    async def save_as_srt(self, save_file_name):
        with open(save_file_name, 'w') as srt:
            for i in self.subtitles:
                for j in range(len(i)):
                    if j % 4 == 0:
                        srt.write('%s\n' % i[j])
                    elif j % 4 == 1:
                        start = ms_to_hhmmssnnn(int(i[j]))
                        end = ms_to_hhmmssnnn(i[j+1])
                        srt.write(f'{start} --> {end}\n')
                    elif j % 4 == 3:
                        srt.write('%s\n\n' % i[j])
        notification = f'Subtitle saved as an SRT file: {os.path.basename(save_file_name)}'
        await self.open_notification_bar(notification)
        self.update()
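
To make the resulting file layout concrete, here is a sketch that builds the same text in memory for a made-up subtitle list. to_srt and fmt are hypothetical helpers; fmt stands in for the app's ms_to_hhmmssnnn, which is not shown in this section.

```python
def fmt(ms: int) -> str:
    """Hypothetical stand-in for ms_to_hhmmssnnn: milliseconds to 'hh:mm:ss,nnn'."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(subtitles) -> str:
    """Format [index, start_ms, end_ms, text] entries as SRT text."""
    blocks = []
    for index, start, end, text in subtitles:
        # One entry: index line, time range line, text line, then a blank line.
        blocks.append(f"{index}\n{fmt(start)} --> {fmt(end)}\n{text}\n\n")
    return "".join(blocks)


srt_text = to_srt([[1, 0, 2000, "Hello."], [2, 2000, 4500, "How are you?"]])
```

Unpacking each entry into four names avoids the j % 4 bookkeeping, at the cost of assuming every entry has exactly four elements.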

Overwriting TXT file, async def save_as_txt(self, save_file_name)

This method, located on lines 705-713, overwrites a TXT file. Unlike SRT files, which contain additional information such as timestamps and blank lines, the subtitles are simply represented as strings in this format. Therefore, it only writes the string parts of the self.subtitles list to the file. After writing is complete, it sends a message to the bottom of the window and updates the display.

See the code:

    # Saves as .txt file.
    async def save_as_txt(self, save_file_name):
        with open(save_file_name, 'w') as txt:
            for i in self.subtitles:
                for j in range(len(i)):
                    if j % 4 == 3:
                        txt.write('%s\n' % i[j])
        notification = f'Subtitle saved as a TXT file: {os.path.basename(save_file_name)}'
        await self.open_notification_bar(notification)
        self.update()

Export as SRT/TXT, async def export_as_srt(self, e) and async def export_as_txt(self, e)

The code on lines 654-667 for exporting as SRT and on lines 687-702 for exporting as TXT could have been merged and simplified, but I was too lazy to do it. When either button is clicked, if a file with the same name already exists, it will suggest a new name by adding the date and time to the filename and open a file save dialog. When a TXT file is open, only export as TXT is enabled since there is no timestamp information (it’s unable to generate an SRT file).

TXT cannot be exported as SRT.

Similarly, opening a dialog that lets the user specify the file name and save location requires separate code for the control, for adding it to the page, and for processing the result. The flow is almost identical to loading an audio file, so I won’t go into details. Whichever button is clicked, the file is finally written by the save_as_srt or save_as_txt method introduced earlier.

See the code to export as SRT:

# Exports current open SRT file as another SRT file.
async def export_as_srt(self, e):
    if os.path.splitext(self.text_file)[1] == '.srt':
        suggested_file_name = os.path.basename(self.text_file).split('.', 1)[0]+'_'+datetime.now().strftime("%Y%m%d%H%M")+'.srt'
    self.export_as_srt_dialog.save_file(
        dialog_title='Export as an SRT file',
        allowed_extensions=['srt'],
        file_name = suggested_file_name,
        file_type=ft.FilePickerFileType.CUSTOM,
    )

# Checks result of Export as SRT File Picker and passes absolute path to self.save_as_srt if exists.
async def export_as_srt_result(self, e: ft.FilePickerResultEvent):
    if e.path:
        await self.save_as_srt(e.path)

See the code to export as TXT:

# Exports current open text file as a TXT file.
async def export_as_txt(self, e):
    if os.path.exists(os.path.splitext(self.text_file)[0]+'.txt'):
        suggested_file_name = os.path.basename(self.text_file).split('.', 1)[0]+'_'+datetime.now().strftime("%Y%m%d%H%M")+'.txt'
    else:
        suggested_file_name = os.path.basename(self.text_file).split('.', 1)[0]+'.txt'
    self.export_as_txt_dialog.save_file(
        dialog_title='Export as a TXT file',
        allowed_extensions=['txt'],
        file_name = suggested_file_name,
        file_type=ft.FilePickerFileType.CUSTOM,
    )

# Checks result of Export as TXT File Picker and passes absolute path to self.save_as_txt if exists.
async def export_as_txt_result(self, e: ft.FilePickerResultEvent):
    if e.path:
        await self.save_as_txt(e.path)
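
The file name suggestion shared by both export methods can be sketched as one small function. suggest_export_name and its parameters are hypothetical; the now argument exists only so that the result is reproducible in an example.

```python
import os
from datetime import datetime


def suggest_export_name(text_file, extension, add_timestamp, now=None):
    """Build a suggested file name, optionally with a _YYYYmmddHHMM suffix."""
    now = now or datetime.now()
    # Same expression as in the export methods: keep everything before the first dot.
    stem = os.path.basename(text_file).split('.', 1)[0]
    if add_timestamp:
        stem += '_' + now.strftime("%Y%m%d%H%M")
    return stem + extension


fixed_time = datetime(2024, 5, 1, 9, 30)
with_stamp = suggest_export_name('/tmp/talk.srt', '.srt', True, now=fixed_time)
without_stamp = suggest_export_name('/tmp/talk.srt', '.txt', False)
```

Note that split('.', 1)[0] cuts at the first dot, so a file named my.talk.srt would yield my as the stem; os.path.splitext would keep my.talk.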

Notification at the bottom, async def open_notification_bar(self, notification, type='normal')

This method, located on lines 716-725, displays a notification at the bottom of the window. It uses Flet’s SnackBar control, defined at line 370, which is displayed only when needed and disappears automatically. Sample notification content:

The method displays a notification with the specified text and type (normal or error). If no type is specified, the notification is treated as normal and shown for 2 seconds. Error notifications use red text on a yellow background and a longer duration of 4 seconds (4000 ms). There are various ways to specify colors, but I used named colors from this page. The text color is specified as a property of the Text control within the SnackBar’s content property, while the color of the notification area, bgcolor, is specified as a property of the SnackBar itself, which is not intuitive. After configuring the content, the method opens the notification with open=True, and it disappears automatically after the specified time.

Definition of the control is right below (line 369~), and the method starts from line 715.

    # Notification bar control at the bottom
    self.notification_bar=ft.SnackBar(
        content=ft.Text('Speech + Subtitle Player'),
        duration=2000,
        bgcolor=ft.colors.BLUE_GREY_700,
    )

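Upon reviewing it again, I realize that the control definition above already specifies the notification area color, so at first glance there should be no need to specify it again in the method. However, because the error branch changes bgcolor to yellow, the normal branch does need to set it back.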

    # Opens notification bar with given text. If type is 'error', shows message longer with caution color.
    async def open_notification_bar(self, notification, type='normal'):
        if type == 'normal':
            self.notification_bar.content=ft.Text(notification, color=ft.colors.LIGHT_BLUE_ACCENT_400)
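            self.notification_bar.bgcolor=ft.colors.BLUE_GREY_700
            # Reset the duration in case a previous error notification changed it to 4000.
            self.notification_bar.duration=2000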
        elif type == 'error':
            self.notification_bar.content=ft.Text(notification, color=ft.colors.RED)
            self.notification_bar.bgcolor=ft.colors.YELLOW
            self.notification_bar.duration=4000
        self.notification_bar.open=True 
        self.notification_bar.update()
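
One way to keep the two variants tidy is a lookup table that always supplies every setting, so nothing from a previous notification is left over. This is only a sketch: notification_style is a hypothetical helper, and the color values are plain illustrative strings rather than Flet constants.

```python
# Every style lists all three settings so that each notification starts from a known state.
STYLES = {
    "normal": {"text_color": "light blue", "bgcolor": "blue grey", "duration": 2000},
    "error": {"text_color": "red", "bgcolor": "yellow", "duration": 4000},
}


def notification_style(kind: str = "normal") -> dict:
    """Return the settings for a notification type, falling back to 'normal'."""
    return STYLES.get(kind, STYLES["normal"])


default_style = notification_style()
error_style = notification_style("error")
```

Falling back to the normal style for an unknown type is a choice made in this sketch; the method above simply leaves the content unchanged in that case.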

Class to generate subtitle buttons, SubButton

The class that creates buttons from subtitles and timestamps is a custom control that inherits from UserControl. It was created by modifying the “To-Do app” example from the official Flet tutorial, so its internal structure is slightly different from the main class. It follows the order of initialization, layout building, button click methods, and so on.

Initialization method, def __init__(self, index, start_time, end_time, text, sub_time_clicked, play_button, save_button, subtitles)

When the parent class creates an instance of a button, it passes not only the index number, start time, end time, and text of the subtitle for display, but also methods and controls of the parent class, such as the save button. The subtitles list, which holds the loaded subtitles as a 2D list, is passed so that it can be directly manipulated when the text is edited. Passing the entire subtitles list to each instance is not a good example to follow, as it may be one cause of the laggy window movement.

See the code:

def __init__(self, index, start_time, end_time, text, sub_time_clicked, play_button, save_button, subtitles):
    super().__init__()
    # Parameter of each subtitle.
    self.index = index
    self.start_time = start_time
    self.end_time = end_time
    self.text = text
    # Passed methods and controls to call and update.
    self.sub_time_clicked = sub_time_clicked
    self.play_button = play_button
    self.save_button = save_button
    self.subtitles = subtitles
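
One possible alternative to handing every button the whole list is to pass a small callback that updates a single entry. The sketch below is only an illustration under assumptions: SubtitleRowSketch, PlayerSketch, and update_text are hypothetical names that stand in for SubButton and the parent class, no Flet controls are involved, and each subtitle is assumed to be laid out as [index, start_time, end_time, text]. It shows how the button's reach can be narrowed. It does not measure whether this changes performance.

```python
class SubtitleRowSketch:
    """Stand-in for SubButton: keeps its own data plus one callback."""

    def __init__(self, index, text, on_text_changed):
        self.index = index
        self.text = text
        # Hypothetical callback supplied by the parent class.
        self.on_text_changed = on_text_changed

    def save(self, new_text):
        # Update the local copy, then let the parent update its list.
        self.text = new_text
        self.on_text_changed(self.index, new_text)


class PlayerSketch:
    """Stand-in for the parent class that owns the subtitles list."""

    def __init__(self, subtitles):
        self.subtitles = subtitles
        self.rows = [
            SubtitleRowSketch(item[0], item[3], self.update_text)
            for item in subtitles
        ]

    def update_text(self, index, new_text):
        # Index numbers start at 1, so subtract 1 for the list position.
        self.subtitles[int(index) - 1][3] = new_text


subs = [["1", 0, 1000, "Hello"], ["2", 1000, 2000, "World"]]
player = PlayerSketch(subs)
player.rows[1].save("Flet")
```

After the last line, only the second entry of subs has changed, and the row never held a reference to the list itself.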

Layout of time stamp and subtitle buttons, etc., def build(self)

In lines 97-150, the control instance is generated and initialized, and then returned to the parent class. The first half of the code up to line 123 creates buttons for timestamps, subtitles, and a placeholder for editing mode, and then wraps them together into the display control self.display_view.

The key of self.display_start_time is the index number, which serves as a target specification for scrolling when a timestamp button is clicked.

The if block starting from line 126 checks the type of subtitle file loaded and modifies the tooltip displayed when hovering over the timestamp button.

Lines 132-149 are setting up the editing mode for subtitle text. By default, it is set to visible=False, making it invisible.

The official To-Do app example has edit and delete buttons for each item, but in this application, the delete button is not necessary. Instead, the edit button is replaced with a subtitle itself that serves as a button, making it more intuitive to use. Additionally, by allowing the editing mode to be cancelled, I avoided implementing an undo feature.

Clicking a subtitle button enables editing. Press Enter to confirm.

See the code:

# === BUILD METHOD ===
def build(self):
    # Start time button
    self.display_start_time = ft.TextButton(text=f"{ms_to_hhmmssnnn(int(self.start_time))}",
                                        # Disable jump button if loaded text is TXT, no timestamp.
                                        disabled=(self.start_time==201355555),
                                        # When enabled, jump to the key when clicked.
                                        key=self.index,
                                        width=130,
                                        on_click=self.jump_clicked,)

    # Subtitle text button in display view. Click to edit.
    self.display_text= ft.TextButton(text=f"{self.text}", 
                                     on_click=self.edit_clicked, 
                                     tooltip='Click to edit')

    # Placeholder of subtitle text button in edit view.
    self.edit_text = ft.TextField(expand=1)

    # Put controls together. Left item is the key=index.
    self.display_view = ft.Row(
        alignment=ft.MainAxisAlignment.START,
        controls=[
            ft.Text(value=self.index, width=30),
            self.display_start_time,
            self.display_text,
        ]
    )

    # Change tool tip of start time button which is only clickable for SRT.
    if self.start_time==201355555:
        self.display_start_time.tooltip='Jump not available'
    else:
        self.display_start_time.tooltip='Click to jump here'

    # Subtitle edit view visible when clicked.
    self.edit_view = ft.Row(
        visible=False,
        #alignment=ft.MainAxisAlignment.SPACE_BETWEEN,
        #vertical_alignment=ft.CrossAxisAlignment.CENTER,
        controls=[
            self.edit_text,
            ft.IconButton(
                icon=ft.icons.DONE_OUTLINE_OUTLINED,
                tooltip='Update Text',
                on_click=self.save_clicked,
            ),
            ft.IconButton(
                icon=ft.icons.CANCEL_OUTLINED,
                tooltip='Close without change',
                on_click=self.cancel_clicked,
            )
        ]
    )
    return ft.Column(controls=[self.display_view, self.edit_view])

Subtitle editing mode, async def edit_clicked(self, e)

Lines 155-161 enable the editing mode when a subtitle button is clicked. Calling the focus() method allows keyboard input right away, and the on_submit event is set to call the self.save_clicked method when the Enter key is pressed.

See the code:

# Opens editable text button with subtitle. Hit enter key or click checkmark to call save_clicked.
async def edit_clicked(self, e):
    self.edit_text.value = self.display_text.text
    self.edit_text.focus()
    self.display_view.visible = False
    self.edit_view.visible = True
    self.edit_text.on_submit = self.save_clicked
    self.update()

Update of subtitle, async def save_clicked(self, e)

Lines 164-172 run when the check (overwrite) button is clicked or when Enter is pressed, confirming the edited subtitle. The save button gets an asterisk (*) to indicate that there are unsaved edits, and the self.subtitles list, which holds the subtitles, is updated with the changed text. Focus is returned to the audio playback button, so playback and pause can be controlled with the Space or Enter key.

See the code:

# Updates edited subtitle, change save button, revert focus back to Play button.
async def save_clicked(self, e):
    self.display_text.text= self.edit_text.value
    self.display_view.visible = True
    self.edit_view.visible = False
    self.save_button.text = '*Save'
    self.subtitles[int(self.index)-1][3]=self.display_text.text
    self.play_button.focus()
    self.save_button.update()
    self.update()

Cancel editing, async def cancel_clicked(self, e)

Lines 175-179 handle cancellation when the (×) button is clicked. Although it’s called “cancel”, it simply discards the changed content and ends the editing mode, returning focus to the playback button. I had wanted to get the same behavior with the Esc key, but Flet didn’t have a simple way to do so, so I had to give up on that idea.
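
No listing for cancel_clicked is shown here. Going only by the description above and by the shape of save_clicked, a sketch of the behavior might look like the following. This is an assumption, not the author's code: SubButtonSketch is a hypothetical stand-in, and SimpleNamespace objects replace the Flet controls so that the example runs on its own.

```python
import asyncio
from types import SimpleNamespace


class SubButtonSketch:
    """Hypothetical stand-in for SubButton while it is in editing mode."""

    def __init__(self):
        self.display_view = SimpleNamespace(visible=False)
        self.edit_view = SimpleNamespace(visible=True)
        self.display_text = SimpleNamespace(text="original subtitle")
        self.edit_text = SimpleNamespace(value="half-typed change")
        self.focused = None
        # Records the focus request instead of moving real keyboard focus.
        self.play_button = SimpleNamespace(focus=self._focus_play_button)

    def _focus_play_button(self):
        self.focused = "play_button"

    def update(self):
        # A real control would redraw itself here.
        pass

    async def cancel_clicked(self, e):
        # Leave display_text untouched, so the typed change is discarded.
        self.display_view.visible = True
        self.edit_view.visible = False
        self.play_button.focus()
        self.update()


row = SubButtonSketch()
asyncio.run(row.cancel_clicked(None))
```

The only difference from save_clicked in this sketch is that neither display_text nor the subtitles list is written to.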

Jump by timestamp button, async def jump_clicked(self, e)

Lines 182-183 pass the start_time of the subtitle to the parent class’s sub_time_clicked method when a timestamp button is clicked, so that playback starts from that point. await is needed here because sub_time_clicked uses the seek method of the Audio control, which is an asynchronous operation.

See the code:

# When timestamp clicked calls AudioSubPlayer.sub_time_clicked to jump to button position.
async def jump_clicked(self, e):
    await self.sub_time_clicked(self.start_time)

Functions not related to Flet GUI

Functions that are necessary for the app, but not related to the Flet GUI, are grouped together at the top of the code. In brief, they are:

A function that converts milliseconds to a time string in SRT format
A function that does the reverse conversion
A function that reads and processes subtitle files (TXT or SRT) into a list that can be used within the app
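
The two conversion functions are not listed in this article, so here is a minimal sketch of what they could look like. The name ms_to_hhmmssnnn appears in the build method above. hhmmssnnn_to_ms is a hypothetical name for the reverse conversion, and both bodies are assumptions based on the usual SRT timestamp layout HH:MM:SS,mmm rather than the app's actual code.

```python
def ms_to_hhmmssnnn(milliseconds: int) -> str:
    """Convert milliseconds to an SRT-style timestamp, HH:MM:SS,mmm."""
    seconds, ms = divmod(milliseconds, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def hhmmssnnn_to_ms(timestamp: str) -> int:
    """Convert an SRT-style timestamp, HH:MM:SS,mmm, back to milliseconds."""
    hms, ms = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(ms)
```

For example, 3723004 milliseconds is 1 hour, 2 minutes, 3 seconds and 4 milliseconds, so the first function returns 01:02:03,004 and the second turns that string back into 3723004.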

That is all for the explanation of the code.

Things I Thought About While Creating This App and Why I Wrote This Article

Flet makes it really easy to create an app with a modern design, and that’s its big charm. Buttons and text that become GUI elements are built in, as are audio file processing, sliders, notifications, and dialogs, so you can arrange them roughly without worrying about the details and create an app quickly. I feel it’s like having a high-quality 3D printer, where Python code becomes real, just like 3D model data turns into a physical object. That’s exciting.

The official documentation is comprehensive and easy to follow, so you can apply it to your own app. Since most of the features are designed to work in a web browser, there are many live examples available that you can try out. It’s also enjoyable to explore and find components to use in your own app by modifying them. By going to this gallery page, you can likely test most of the controls and features, and even check the actual code on GitHub, which is very helpful.

However, Flet is not a mouse-operated GUI creation tool like those that were popular in the distant past. Therefore, all aspects of your app will need to be implemented using Python code. Additionally, the completed app’s interface will be implemented as a web frontend (HTML, JavaScript, CSS). While knowledge of these is not essential, having some understanding of them can make creating a finished product easier. I imagine that someone who has no intention of learning about front-end development might still be able to use Flet effectively by reading 1-2 introductory web front-end books and then understanding the documentation with ease.

It seems that there is an increasing amount of Japanese information about Flet, but when I searched for articles about Flet, most of them were of the “trying it out” type. I couldn’t find many articles that actually explained how to create a practical app using Flet. While the official documentation covers everything, I thought it was still not simple enough when you actually use it.

So, when my app “Speech plus Subtitles Player” was almost complete, I decided to write an article about developing a Flet app. My thinking was that there must be people who really need such an article, even if it’s imperfect and has some unnecessary parts. I’m glad that both the app and this document are finally done.

For the Python logic parts, I relied heavily on Copilot (free version) to help me write them. It appears that Copilot doesn’t know much about Flet itself, but in terms of Python, I found it quite reliable. Large Language Models (LLMs, AI) have made it possible to use code generation tools that can even help with debugging. This means that if you have an idea, the difficulty of creating an app has significantly decreased. (By the way, I got huge help from LLMs in translating this article from Japanese into English. The last half was almost entirely done by Llama 3 8B Instruct with Ollama.)

It’s a great feeling to see your own idea come to life, even if it’s a simple app. Once built, you can share it with others and get feedback. I highly recommend trying out Flet for creating apps – it’s definitely worth a shot!

Image by Stable Diffusion

Probably because of the model I used, most of the generated images were of realistic white old men. I replaced the prompt “realistic, masterpiece, best quality” with “cartoon”. From among the drawings I received, I adopted the one that mostly ignored my instructions as the eye-catching image for this article. The model remains the same, but the difference in the generated images is significant due to the change in prompts.

Date:
2024-4-8 19:16:06

Model:
realisticVision-v20_split-einsum

Size:
512 x 512

Include in Image:
cartoon, retro future, guy partially gray hair with glasses, white t-shirt, typing keyboard, happy coding!

Exclude from Image:
frame, old, fat, suit

Seed:
2776787021

Steps:
50

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & Neural Engine

2024-04-21

flet build macos now supports NumPy (official bug-fix)

There was a bug in Flet (versions 0.21.0 and 0.21.1) that caused macOS apps built with NumPy to crash immediately after launch. I reported this issue on GitHub some time ago, and finally Feodor Fitsner, the author, updated me with a solution. To fix the issue, you simply need to build your Flet app again.

Contents

1 Solution
2 Another way
3 Thanks to the solution
4 Image by Stable Diffusion

Solution

According to the author, the build template for Flet version 0.21.2 and the latest version 0.22.0 (as of April 19, 2024) has been modified. Flet itself has not been changed. Therefore, if you are using NumPy with a Flet app on macOS, you should be good to go with these versions. Just rebuild now (flet build macos) and it should work.

If you’re running an older version of Flet, upgrade to the latest with pip install --upgrade flet. You can specify a version as well if needed:

pip install flet==0.21.2

I upgraded to 0.22.0, rebuilt my SPS Player app for macOS with the NumPy code in place, and confirmed that it didn’t crash. If you are not sure which Flet version you have, try the command below:

pip list|grep flet

# result will be like the below:
flet                  0.21.2
flet-core             0.21.2
flet-runtime          0.21.2
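
The same check can be done from Python with the standard library. The sketch below is only an illustration: installed_version and is_at_least are hypothetical helper names, and the comparison assumes plain numeric versions such as 0.21.2, not pre-release strings.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a plain numeric version like '0.21.2' into (0, 21, 2)."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(found, minimum):
    """True if version string found is the same as or newer than minimum."""
    return version_tuple(found) >= version_tuple(minimum)


flet_version = installed_version("flet")
if flet_version is None:
    print("flet is not installed in this environment")
else:
    print("flet", flet_version, "fixed template:", is_at_least(flet_version, "0.21.2"))
```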

Another way

According to a comment in the issue, adding the code below to your Flet app also resolves the problem. I won’t bother, though, as long as my code works with the recent Flet versions.

import os

os.environ["OPENBLAS_NUM_THREADS"] = "1"

In the related issue you can find more detailed info.
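
One detail worth keeping in mind with this workaround is ordering. OpenBLAS reads OPENBLAS_NUM_THREADS when the library is loaded, so the variable should be set before NumPy is imported for the first time. The sketch below shows one way to arrange that at the top of the main script. The try/except is only there so the example also runs where NumPy is not installed.

```python
import os

# Set the variable first; it has no effect on a NumPy that is already loaded.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

try:
    import numpy as np  # imported only after the variable is set
except ImportError:
    np = None  # NumPy is optional for this sketch
```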

Thanks to the solution

Flet is open-source software that is available for free, so I didn’t request an update or pressure the author to fix anything. However, I am grateful that they took the time to address this issue and make Flet even better. To contribute to Flet, I will continue developing my app with it.

Image by Stable Diffusion

You can see what I wanted from the prompt. The quality of each blog post’s image is vastly different and it’s getting out of hand. Since I got the M2 Max, I’ve been asking for 6 images with 30 steps and then adjusting the step and prompt or model based on the outcome.

Date:
2024-4-20 0:14:30

Model:
fruity-mix_split-einsum_compiled

Size:
512 x 512

Include in Image:
fantazy, realistic painting, a wizard with a magic wand killed misterious creature by fireball

Exclude from Image:

Seed:
2031597071

Steps:
50

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
All

2024-04-03 (updated 2024-04-04)

Successful build of Flet app with Audio control

I have successfully built my own app, Speech + Subtitles Player, made with Flet for Python, by eliminating a line of code that required NumPy. However, adding a copyright through a template caused the build process to fail, probably because the Audio control seems to conflict with the template file. Unfortunately, the official documentation does not mention the challenges with NumPy or Audio + template on macOS. If you’re looking for a solution, you have found the right place.

Contents

1 (My) Environment
2 Rewrite code to eliminate NumPy
3 Simple build works but it fails by specifying template
4 SOLUTION: Use command options instead of template
5 Some notes about the built app
6 Bonus: Get coding assistance from Copilot (Bing Chat)
7 Image by Stable Diffusion

(My) Environment

The issues and the solutions were found in the following environment. Different versions of Flet and/or Flutter might give different results.

Hardware: Mac Studio M2 Max 12-core CPU / 30-core GPU / 32GB RAM
OS: macOS Sonoma 14.3.1
Xcode: 15.3
Python 3.11.7
Flet: 0.21.2
Flutter 3.19.3
Dart: 3.3.1
CocoaPods: 1.15.2

Rewrite code to eliminate NumPy

My previous code used NumPy only to find the subtitle end time closest to the current audio position, so that the view can scroll there. It’s line #520. By changing it to line #522, I was able to remove NumPy from my code (the GitHub code is updated). After commenting out import numpy as np, executing flet build macos --include-packages flet_audio builds the app with no issue. I do not notice any delay from this change.

    # Called when slider position is changed and scroll to subtitle with the nearest end_time.
    async def scroll_to(self, e):
        end_time = [item[2] for item in self.subtitles]
        # Numpy is only used below:
        #index = np.argmin(np.abs(np.array(end_time) - e))
        # Below works without using Numpy:
        index = min(range(len(end_time)), key=lambda i: abs(end_time[i]-e))
        key=str(self.subtitles[index][0])
        self.subs_view.scroll_to(key=key, duration =1000)
        self.update()
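
Since the end times come from a subtitle file in ascending order, a binary search is another NumPy-free option. This is a different technique from the min() line above, not the app's code. nearest_index is a hypothetical helper, and the sketch assumes the list is sorted with no duplicate values.

```python
from bisect import bisect_left


def nearest_index(sorted_values, target):
    """Return the index of the value closest to target.

    Assumes sorted_values is in strictly increasing order. On a tie the
    earlier index wins, which matches what min() with a key returns.
    """
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    pos = bisect_left(sorted_values, target)
    if pos == 0:
        return 0
    if pos == len(sorted_values):
        return len(sorted_values) - 1
    before, after = sorted_values[pos - 1], sorted_values[pos]
    return pos - 1 if target - before <= after - target else pos


end_time = [1000, 3000, 6000, 9000, 12000]  # sample end times in milliseconds
index = nearest_index(end_time, 7000)
```

With the sample data, 7000 is closest to 6000, so index is 2. For a list of a few hundred subtitles the linear scan is unlikely to be a bottleneck, so this is mostly an option for very long files.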

If you use NumPy a lot in a time-sensitive app, you need to wait for Flet (flet build) to work with NumPy on Mac (I submitted an issue). Alternatively, you can use flet pack main.py, which is not officially recommended but works (see the official document).

Simple build works but it fails by specifying template

As a next step, I tried to add a copyright by specifying the template directory and ran into the error below, which lists a lot of complicated version dependencies.

% flet build macos --build-version "1.0.1" --template flet-build-template --include-packages flet_audio
Creating Flutter bootstrap project…OK
Customizing app icons and splash images…OK
Generating app icons…Because flet_audio <0.20.1 depends on flet ^0.20.0 and flet_audio >=0.20.1 <0.20.2 depends on flet ^0.20.1, flet_audio <0.20.2 requires flet ^0.20.0. 
And because flet_audio ^0.20.2 depends on flet ^0.20.2 and flet_audio >=0.21.0 <0.21.1 depends on flet ^0.21.0, flet_audio <0.21.1 requires flet ^0.20.0 or ^0.21.0. 
And because flet_audio >=0.21.1 <0.21.2 depends on flet ^0.21.1 and flet_audio >=0.21.2 depends on flet ^0.21.2, every version of flet_audio requires flet ^0.20.0 or >=0.21.0 <0.22.0.
So, because fletaudioplayback depends on both flet ^0.19.0 and flet_audio any, version solving failed.

You can try the following suggestion to make the pubspec resolve:
* Try upgrading your constraint on flet: dart pub add flet:^0.21.2

Error building Flet app - see the log of failed command above.

To avoid this error, you must forget about the template.

SOLUTION: Use command options instead of template

I know it’s not smart, but that’s the only way I found. Trash the template folder and use a couple of options as below. I hope audio + template will work in the near future as well.

flet build macos --build-version "1.0.1" \
--copyright "Copyright (c) 2024 Peddals.com" \
--product "Speech+SubtitlesPlayer" --include-packages flet_audio

Quick hints: --product names your app. If you don’t use this option, the parent folder name becomes your app name. --include-packages flet_audio is a requirement for an app that uses the Audio control. See this page for the full list of options.

An additional tip regarding your folder structure: when you build an app, remove all unnecessary files and folders. When I first built this app, a few large audio files sat in a folder other than the assets folder; the build process took a long time and the built app was 1.7GB in size. After removing the audio files, the app size became 260MB.
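
A quick way to spot such files before building is to list the largest ones under the project folder. The sketch below uses only the standard library. large_files is a hypothetical helper, and the 10 MB default threshold is an arbitrary assumption.

```python
import os


def large_files(root, min_bytes=10 * 1024 * 1024):
    """Return (size, path) pairs for files of at least min_bytes, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

Calling large_files(".") from the project root before flet build gives a list to review. Anything in it that the app does not need at runtime is a candidate to move out of the folder.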

Some notes about the built app

When I launched the app in build/macos for the first time, I saw a blank white window, and it took several seconds to draw contents such as buttons, text, and a logo image. The reason was not the app itself but the anti-virus software on my Mac, which was working hard and consuming 300% of CPU to scan the app. Once the scan was done, the app started running quickly. While playing audio, CPU usage is around 30-40%. I’m not sure why, but the splash logo does not appear when the app is launched. Since it does not harm functionality, I decided not to bother. Everything else works as expected. Nice.

Bonus: Get coding assistance from Copilot (Bing Chat)

To rewrite the function from NumPy to native Python code, I got assistance from the free version of Copilot, formerly Bing Chat. You’d better use a paid generative AI service for serious production work, but the free version works fine as long as you understand that a few rounds of back-and-forth are needed to get the expected answer. Try giving sufficient but short requirements and finishing within 5 round-trips. Below is the actual chat I had with Copilot as an example.

You

can you help me rewrite an existing python code with numpy to python native without numpy? I’ll share a code with you

Copilot

Certainly! Please go ahead and share the existing Python code that uses NumPy, and I’ll help you rewrite it using native Python constructs. 😊

You

end_time is a list containing integers from the smallest to the largest. e is also an integer. import numpy as np. This is the actual code to be rewritten: index = np.argmin(np.abs(np.array(end_time) - e))

Copilot

Certainly! Let’s rewrite the given code snippet that uses NumPy to achieve the same functionality using native Python without relying on NumPy.

The original code snippet calculates the index of the element in the end_time list that is closest to the value of e. We’ll achieve the same result using native Python constructs.

Here’s the rewritten code:

def find_closest_index(end_time, e):
    # Initialize variables to keep track of the minimum difference and index
    min_difference = float('inf')  # Initialize with positive infinity
    closest_index = None

    # Iterate through the end_time list
    for i, time_value in enumerate(end_time):
        # Calculate the absolute difference between the current value and e
        difference = abs(time_value - e)

        # Update the closest index if the current difference is smaller
        if difference < min_difference:
            min_difference = difference
            closest_index = i

    return closest_index

# Example usage
end_time = [1, 3, 6, 9, 12]  # Replace with your actual end_time list
e = 7  # Replace with your actual value of e
result_index = find_closest_index(end_time, e)
print("Closest index:", result_index)

In this rewritten code:

We manually iterate through the end_time list.
We calculate the absolute difference between each value in the list and e.
We keep track of the index corresponding to the smallest difference encountered.

Feel free to adapt this approach to your specific use case, and let me know if you need further assistance! 😊

You

can you rewrite it to a one-liner with lambda?

Copilot

Certainly! Here’s the equivalent one-liner using a lambda function:

find_closest_index = lambda end_time, e: min(range(len(end_time)), key=lambda i: abs(end_time[i] - e))

This lambda function calculates the index of the element in the end_time list that is closest to the value of e. It uses the min() function with a custom key function to find the index with the smallest absolute difference.

Feel free to use this concise one-liner in your code! 😊

Image by Stable Diffusion

This time I needed to exclude a few words, as the initially generated images were all of fat old men with big stomachs. A face in close-up is detailed enough, but when it’s far away Stable Diffusion tends to lose focus. The thumbnail looks fine though.

Date:
2024-3-31 1:55:34

Model:
realisticVision-v20_split-einsum

Size:
512 x 512

Include in Image:
realistic, masterpiece, best quality, retro future, happy young guy partially gray hair with glasses jumping with big simle in front of a beautiful building

Exclude from Image:
frame, old, fat, suit

Seed:
1847693693

Steps:
50

Guidance Scale:
20.0

Scheduler:
DPM-Solver++

ML Compute Unit:
CPU & Neural Engine