Link: Better Siri is coming: what Apple’s research says about its AI plans

In a paper called “LLM in a flash: Efficient Large Language Model Inference with Limited Memory” (all these papers have really boring titles but are really interesting, I promise!), researchers devised a system for storing a model’s data, which is usually stored on your device’s RAM, on the SSD instead. “We have demonstrated the ability to run LLMs up to twice the size of available DRAM [on the SSD],” the researchers wrote, “achieving an acceleration in inference speed by 4-5x compared to traditional loading methods in CPU, and 20-25x in GPU.” By taking advantage of the most inexpensive and available storage on your device, they found, the models can run faster and more efficiently.
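
To make the "keep it on the SSD, not in RAM" idea a bit more concrete, here is a toy sketch of my own. It is not the method from the paper, just the general trick of leaving weights in a file and pulling in only the rows you need through a memory map. The layer shape, the file layout, and the names `write_weights`, `load_rows`, and `demo` are all made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layer shape, chosen only for this illustration.
ROWS, COLS = 1000, 64
ROW_BYTES = COLS * 4  # each value is a little-endian float32


def write_weights(path):
    """Write a fake weight matrix to disk; every value in row r is float(r)."""
    with open(path, "wb") as f:
        for r in range(ROWS):
            f.write(struct.pack(f"<{COLS}f", *([float(r)] * COLS)))


def load_rows(path, row_ids):
    """Read only the requested rows from the file through a read-only mmap."""
    rows = {}
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        for r in row_ids:
            start = r * ROW_BYTES
            rows[r] = struct.unpack(f"<{COLS}f", mm[start:start + ROW_BYTES])
    return rows


def demo():
    """Build the file, fetch three rows, and report bytes requested vs. file size."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_weights(path)
        rows = load_rows(path, [3, 500, 999])
        bytes_requested = len(rows) * ROW_BYTES
        return rows, bytes_requested, os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    rows, bytes_requested, file_size = demo()
    print(f"requested {bytes_requested} of {file_size} bytes")
```

The whole matrix lives in the file, and the program only asks for the slices it needs. The operating system still reads in whole pages behind the scenes, so treat the byte count as what the code requested, not what the disk actually served.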

--

Yoooo, this is a quick note on a link that made me go, WTF? Find all past links here.