Deploying Defog On-Prem for the Enterprise

Prerequisites

Deploying on GPU

Our text-to-SQL model requires a GPU with at least 24GB of VRAM. A single GPU is sufficient for most use cases. If your use case is not latency-sensitive, we recommend either of the following GPUs:

NVIDIA RTX 4090 (if you are deploying on your own physical hardware)
NVIDIA A10 (if you are deploying on a cloud provider)
- AWS: g5.2xlarge
- Azure: Standard_NV36ads_A10_v5

These GPUs provide the best performance for the cost. You can expect a median latency of 5-6 seconds for query generation.
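
To confirm that a host meets the 24GB VRAM minimum before deploying, a check along the following lines may help. This is a minimal sketch, not part of Defog's tooling: it assumes `nvidia-smi` is installed alongside the NVIDIA driver, and the helper names and the tolerance value are illustrative. The tolerance is there because cards sold as 24GB generally report somewhat less than 24576 MiB of total memory.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 24 * 1024  # 24GB minimum stated in the requirements
# Assumption: allow some slack, since nominal 24GB cards report
# slightly less total memory than 24576 MiB.
TOLERANCE_MIB = 2048


def parse_gpu_memory(csv_text):
    """Parse 'name, memory_mib' lines into (name, mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, _, mem = line.rpartition(",")
        mem = mem.strip()
        if mem.isdigit():  # skip lines where memory is not a plain number
            gpus.append((name.strip(), int(mem)))
    return gpus


def meets_vram_requirement(mib, minimum=MIN_VRAM_MIB, tolerance=TOLERANCE_MIB):
    """Return True if the reported memory is within tolerance of the minimum."""
    return mib >= minimum - tolerance


def query_gpus():
    """Return (name, mib) for each visible GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected")
    for name, mib in gpus:
        status = "OK" if meets_vram_requirement(mib) else "below 24GB minimum"
        print(f"{name}: {mib} MiB ({status})")
```

Keeping the parsing separate from the `nvidia-smi` call means the threshold logic can be exercised on a machine without a GPU.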

If you have a latency-sensitive use case, we recommend using an H100 GPU. With an H100, you can expect a median latency of 2-3 seconds for query generation.
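
The GPU guidance above amounts to a small lookup from a latency target to a hardware choice. The sketch below encodes it using only the median latency ranges quoted in this section; the function and constant names are hypothetical, and comparing against the upper end of each range is a conservative assumption.

```python
# Upper bound of the median query-generation latency (seconds) quoted in
# this section, ordered from the cost-effective option to the fastest.
GPU_OPTIONS = [
    ("NVIDIA RTX 4090 or A10", 6),
    ("NVIDIA H100", 3),
]


def recommend_gpu(target_median_latency_s):
    """Return the first listed GPU option whose quoted latency fits the target.

    Returns None when no option documented here meets the target.
    """
    for name, upper_bound_s in GPU_OPTIONS:
        if upper_bound_s <= target_median_latency_s:
            return name
    return None


if __name__ == "__main__":
    for target in (10, 4, 1):
        print(target, "->", recommend_gpu(target))
```

A result of `None` means the target is tighter than any figure quoted here, so it should not be read as a statement about what other hardware could achieve.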

Deploying on CPU

We do not recommend deploying on CPU, as query generation is significantly slower than on GPU. If you must deploy on CPU, we recommend a machine with at least 32 cores and 64GB of RAM. With these specs, you can expect a median latency of 30-40 seconds for query generation.
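
For a CPU-only host, the core and memory minimums can be checked with the Python standard library. This is a rough sketch under a few assumptions: the helper names are illustrative, `os.cpu_count()` reports logical CPUs (which can exceed the physical core count), and the memory probe relies on `os.sysconf`, which is unavailable on Windows. Memory is compared in decimal gigabytes so that a nominal 64GB machine, which reports slightly less than 64 GiB to the operating system, still passes.

```python
import os

MIN_CORES = 32   # minimum core count recommended for CPU deployment
MIN_RAM_GB = 64  # minimum RAM recommended for CPU deployment


def meets_cpu_requirements(cores, ram_gb):
    """Return True if both the core count and RAM meet the minimums."""
    return cores >= MIN_CORES and ram_gb >= MIN_RAM_GB


def detect_host():
    """Return (logical_cpu_count, ram_in_decimal_gb) for the current host.

    RAM is reported as 0 when it cannot be determined on this platform.
    """
    cores = os.cpu_count() or 0
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        ram_bytes = 0
    return cores, ram_bytes / 1e9


if __name__ == "__main__":
    cores, ram_gb = detect_host()
    verdict = "meets" if meets_cpu_requirements(cores, ram_gb) else "is below"
    print(f"{cores} logical CPUs, {ram_gb:.1f} GB RAM: "
          f"{verdict} the CPU deployment minimums")
```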

Overview Architecture