Thursday, 26 June 2025

MLflow

📦 What Is MLflow?

MLflow is an open-source platform for managing the complete machine learning lifecycle, including:
• Experiment tracking
• Model versioning & registry
• Reproducible runs
• Model deployment

Originally developed by Databricks, MLflow works with any ML library (e.g., PyTorch, TensorFlow, Hugging Face) and integrates well with SageMaker, Kubernetes, Azure ML, and other MLOps platforms.
________________________________________
🔄 MLflow Components (4 Key Modules)

Component | Description
✅ Tracking | Logs experiments: parameters, metrics, artifacts
📦 Projects | Defines reusable and shareable ML code (optional)
🏷️ Model Registry | Manages model versions and lifecycle stages such as "Staging" and "Production"
🚀 Model Serving | Deploys models locally, on SageMaker, Azure, or Databricks
________________________________________
✅ Why Use MLflow? (Advantages)

🔍 1. Track Every Experiment Run
• Record hyperparameters, datasets used, metrics (BLEU, F1), models trained, and source code.
• Know which model version worked best.

📊 2. Compare Performance Across Runs
• The UI or CLI lets you sort runs by test accuracy, BLEU, F1, etc.
• Easily choose the top-performing run.

🔄 3. Ensure Reproducibility
• MLflow logs the Python environment, Git SHA, and dependencies.
• Anyone can rerun the experiment later with the same results.

🧰 4. Register & Manage Models
• Assign versions and lifecycle stages: "Staging", "Production", "Archived".
• Automate deployment with CI/CD pipelines.

🚀 5. Deploy Anywhere
• You can deploy to:
  o A local Flask server (mlflow models serve)
  o AWS SageMaker
  o Azure ML
  o Kubernetes
________________________________________
📘 What Are Experiments in MLflow?

An experiment in MLflow is a logical group of runs for a specific task or problem.

✅ Example: If you are working on a GenAI model for insurance claim triage, your experiment might be:

mlflow.set_experiment("insurance-claims-genai")

Then every mlflow.start_run() is logged under that experiment.
________________________________________
💡 What's Logged in an Experiment?

Type | Examples
Parameters | learning_rate, model_name, max_tokens
Metrics | train_loss, test_bleu, accuracy
Artifacts | model weights, tokenizer, confusion matrix PNGs
Source Code | Git SHA, environment, Python packages
________________________________________
📈 Visualizing Experiments in the MLflow UI

You can:
• See a list of runs under an experiment
• Compare BLEU scores and loss curves
• Click each run to view logs, artifacts, and code

📝 Useful when trying different:
• Prompt engineering strategies
• Model sizes (flan-t5, llama2, etc.)
• Learning rates
• Knowledge base grounding quality
________________________________________
👨‍💻 Real Example (From Your Use Case)

Imagine you're training multiple GenAI models to process insurance claim documents:

with mlflow.start_run():
    mlflow.log_param("model_name", "flan-t5-small")
    mlflow.log_param("epochs", 3)
    mlflow.log_metric("test_bleu", 0.89)
    mlflow.pytorch.log_model(model, "model")

You run this 5 times with different models. Later, you:
• Open the MLflow UI
• Sort runs by test_bleu
• Pick the run with the highest BLEU score
• Register and deploy that model
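Instead of sorting runs by hand in the UI, the same "pick the run with the best BLEU" step can be scripted. Here is a minimal sketch, assuming the runs above were logged to the "insurance-claims-genai" experiment on your configured tracking server (column names follow MLflow's params./metrics. prefix convention):

import mlflow

# Look up the experiment and pull its runs into a pandas DataFrame,
# sorted so the highest test_bleu comes first.
experiment = mlflow.get_experiment_by_name("insurance-claims-genai")
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.test_bleu DESC"],
)

best = runs.iloc[0]
print("Best run:", best["run_id"])
print("Model:", best["params.model_name"], "BLEU:", best["metrics.test_bleu"])

The run_id printed here is the same ID you would later pass when registering the model.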
________________________________________
📌 Summary

Feature | What It Does | Why It Matters
🧪 Experiment Tracking | Logs params, metrics, artifacts | Reproducibility, tuning
📋 Model Registry | Versioned model management | CI/CD, production rollout
📊 UI & Comparison | Side-by-side metrics for multiple runs | Better decision-making
🌐 Deployment | One-click deployment to SageMaker, local, etc. | Fast, scalable inference
🧠 Use Case Fit | Fine-tuning LLMs, GenAI, RAG, NLP, CV | Especially helpful in complex workflows
________________________________________
🧪 What Is an MLflow Experiment?

An MLflow experiment is a collection of runs (training executions) where you test different model configurations to find the best one.

Each experiment:
• Tracks hyperparameters, metrics, artifacts, and code version
• Helps you compare performance across different runs
• Can be visualized in the MLflow UI or queried via the API
________________________________________
✅ Insurance GenAI Use Case – MLflow Experiment Examples

✅ Experiment Name: "insurance-claims-genai"

You're trying to generate claim triage summaries using different models, prompt styles, or training settings.
________________________________________
🔍 Example 1: Compare Model Architectures

🔍 Goal: Evaluate which base model performs best for summarizing claims

Run | model_name | test_bleu | Status
Run 1 | flan-t5-small | 0.82 | OK
Run 2 | flan-t5-base | 0.86 | Better
Run 3 | llama2-7b | 0.91 | Best

mlflow.set_experiment("insurance-claims-genai")

with mlflow.start_run():
    mlflow.log_param("model_name", "flan-t5-base")
    mlflow.log_param("epochs", 3)
    mlflow.log_param("learning_rate", 5e-5)
    mlflow.log_metric("test_bleu", 0.86)
    mlflow.pytorch.log_model(model, "model")
________________________________________
🔍 Example 2: Evaluate Prompt Engineering Variants

🔍 Goal: Test how different prompt templates affect model output

Run | prompt_template | test_bleu
A | "question: ... context: ..." | 0.78
B | "You are a claim adjuster. Q: ..." | 0.83
C | "Claim details: ... What should I do?" | 0.87

mlflow.set_experiment("insurance-claims-genai")

with mlflow.start_run():
    mlflow.log_param("prompt_template", "Claim details: ... What should I do?")
    mlflow.log_param("context_window", 512)
    mlflow.log_metric("test_bleu", 0.87)
    mlflow.pytorch.log_model(model, "model")
________________________________________
🔍 Example 3: Evaluate Retrieval Effectiveness (RAG)

🔍 Goal: Compare the impact of knowledge base documents on RAG quality

Run | kb_documents | top_k | test_bleu
Run A | Auto policy only | 5 | 0.75
Run B | Auto + Fraud detection | 5 | 0.81
Run C | Full policy + SOPs | 10 | 0.88

mlflow.set_experiment("insurance-claims-genai")

with mlflow.start_run():
    mlflow.log_param("knowledge_base", "Auto + Fraud")
    mlflow.log_param("top_k_chunks", 5)
    mlflow.log_metric("test_bleu", 0.81)
________________________________________
🔍 Example 4: Track Data Version and Token Limit

🔍 Goal: Check how the model performs on different data snapshots and token limits

Run | data_version | max_tokens | test_bleu
A | v1-2024-12 | 256 | 0.79
B | v2-2025-01 | 512 | 0.85
C | v2-2025-01 | 1024 | 0.84

mlflow.set_experiment("insurance-claims-genai")

with mlflow.start_run():
    mlflow.log_param("data_version", "v2-2025-01")
    mlflow.log_param("max_tokens", 512)
    mlflow.log_metric("test_bleu", 0.85)
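The four examples above log prompt templates, model names, and data versions as parameters. Longer items — the full prompt file, sample generated summaries, evaluation plots — are better attached to the same run as artifacts. A minimal sketch (the prompt text, file paths, and sample output below are illustrative, not taken from the original runs):

import mlflow

mlflow.set_experiment("insurance-claims-genai")

with mlflow.start_run():
    mlflow.log_param("model_name", "flan-t5-base")
    mlflow.log_metric("test_bleu", 0.86)

    # Attach the full prompt template as a text artifact on this run.
    mlflow.log_text(
        "Claim details: {claim_text}\nWhat should I do?",
        artifact_file="prompts/triage_prompt.txt",
    )

    # Attach a sample generation for manual review in the UI.
    mlflow.log_dict(
        {"claim_id": "C-1001", "generated_summary": "Likely auto glass claim; route to fast-track team."},
        artifact_file="samples/example_output.json",
    )

These artifacts then show up under the run in the MLflow UI, right next to the logged model.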
________________________________________
📈 How You Use These Experiments

Once you've logged 10–20 of these runs:
• Open the MLflow UI
• Go to the "insurance-claims-genai" experiment
• Sort or filter by test_bleu or model_name
• Click the best run → register it → deploy to SageMaker
________________________________________
🧠 Pro Interview Tip

If they ask "How do you know your GenAI model is improving?", answer:

“We track multiple experiments in MLflow, comparing metrics like BLEU, latency, and token usage across different model variants and prompt styles. It gives us visibility into what improves performance and what doesn’t. Once we identify the best run, we register and promote the model into staging or production.”

🔹 1. We Track Multiple Experiments in MLflow

You define a single experiment:

mlflow.set_experiment("insurance-claims-genai")

You then run training with different model architectures, prompts, or dataset versions.
________________________________________
🧪 Example 1: Run with flan-t5-small and prompt style A

with mlflow.start_run():
    mlflow.log_param("model", "flan-t5-small")
    mlflow.log_param("prompt_style", "simple_question_context")
    mlflow.log_param("max_tokens", 512)

    # BLEU and latency results from the test set
    mlflow.log_metric("test_bleu", 0.78)
    mlflow.log_metric("avg_latency_ms", 180)
    mlflow.log_metric("avg_token_usage", 140)

    mlflow.pytorch.log_model(model, "model")
________________________________________
🧪 Example 2: Run with flan-t5-base and prompt style B

with mlflow.start_run():
    mlflow.log_param("model", "flan-t5-base")
    mlflow.log_param("prompt_style", "instruction_following")
    mlflow.log_param("max_tokens", 512)

    mlflow.log_metric("test_bleu", 0.85)
    mlflow.log_metric("avg_latency_ms", 210)
    mlflow.log_metric("avg_token_usage", 175)

    mlflow.pytorch.log_model(model, "model")
________________________________________
📊 2. We Compare Metrics like BLEU, Latency, and Token Usage

After 5–10 runs like these, you open the MLflow Tracking UI and compare the results:

Run | Model | Prompt Style | BLEU | Latency (ms) | Token Usage
1 | flan-t5-small | simple_question | 0.78 | 180 | 140
2 | flan-t5-base | instruction_following | 0.85 | 210 | 175
3 | llama2-7b | context_augmented | 0.91 | 250 | 210

You can sort by BLEU score, filter by model name, or plot BLEU vs. latency. This helps you make data-driven decisions:
• Do bigger models improve accuracy?
• Does a certain prompt style reduce latency?
• Are higher token usage costs justified?
________________________________________
🚀 3. Once We Identify the Best Run...

Say Run #3 (llama2-7b) gives you the best BLEU (0.91) with acceptable latency. You then promote this run.

📌 Register it:

from mlflow import register_model

register_model(
    model_uri="runs:/abc1234567890/model",  # run ID from MLflow
    name="insurance-claims-triage-model"
)

📌 Set the model version to "Production":

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="insurance-claims-triage-model",
    version="3",
    stage="Production"
)
________________________________________
🌐 4. Deploy or Serve It

Once promoted, you can:

• Deploy to SageMaker:

mlflow.sagemaker.deploy(
    app_name="insurance-claims-genai",
    model_uri="models:/insurance-claims-triage-model/Production",
    region_name="us-east-1"
)

• Or serve locally for testing:

mlflow models serve -m models:/insurance-claims-triage-model/Production -p 5000
________________________________________
📈 Final Outcome
• You ran 5+ variants (different models, prompts, data)
• Tracked everything in MLflow
• Compared metrics like BLEU, latency, and token cost
• Promoted the best run to production
• Exposed it via API Gateway + Lambda
________________________________________
✅ Visual Flow:

Train (Runs 1–5) ──▶ Log to MLflow ──▶ Compare BLEU/latency ──▶ Register Best ──▶ Promote to Prod ──▶ Deploy to SageMaker ──▶ API Gateway
________________________________________
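The last bullet of the final outcome ("exposed it via API Gateway + Lambda") usually comes down to a small handler that calls the SageMaker endpoint created by mlflow.sagemaker.deploy. A minimal sketch, assuming the endpoint kept the app_name used above; the claim text is made up, and the exact payload shape depends on your MLflow version and the signature the model was logged with:

import json
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Hypothetical request body — adjust to match the scoring format
# expected by your MLflow model server version.
payload = {"inputs": ["Claim details: rear-end collision, minor bumper damage. What should I do?"]}

response = runtime.invoke_endpoint(
    EndpointName="insurance-claims-genai",  # app_name passed to mlflow.sagemaker.deploy
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(response["Body"].read().decode("utf-8"))

Wrap this in a Lambda handler and put API Gateway in front of it, and you have the serving path shown in the visual flow above.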
