In the LLM world, a well-built, well-run system must satisfy two major requirements at inference time:

  1. High accuracy, which drives more usage
  2. Low latency, which keeps costs low

These two requirements sit on opposite sides of a tradeoff that affects almost all ML models and systems. The most accurate LLMs are highly complex multi-layer transformer models trained on trillions of tokens, so at inference time they need billions of computations to generate a single token, consuming a large share of the available compute. Three common techniques address this problem:

  1. Quantization: representing weights (and often activations) in a lower-precision format such as int8 instead of float32 (a minimal sketch follows this list)
  2. Pruning: removing weights or entire structures that contribute little to the model's output
  3. Distillation: training a smaller "student" model to mimic the behavior of a larger "teacher" model
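
To make the first technique concrete, here is a minimal sketch of symmetric int8 weight quantization in NumPy. The function names (`quantize_int8`, `dequantize`) and the toy 4x4 weight matrix are illustrative assumptions, not code from any particular library; real systems use calibrated, often per-channel schemes, but the core idea of trading precision for memory and compute is the same.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric int8 quantization: map float32 weights to int8 plus one scale factor."""
    scale = np.abs(weights).max() / 127.0               # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

# Toy 4x4 matrix standing in for a transformer layer's weights (illustrative only).
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

The int8 copy uses a quarter of the memory of the float32 original, and integer matrix multiplies are cheaper on most hardware; the price is the small reconstruction error printed at the end.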