Thursday, 26 June 2025
Bedrock - 2
🔹 1. Data Preparation: Splitting into Train and Test Sets
🧠 Why?
To evaluate how well your model generalizes to unseen examples. A 70/30 split is a common practice:
• 70% for model training
• 30% for testing/evaluation
🧪 Code:
from sklearn.model_selection import train_test_split
import json

# Load raw insurance claim dataset (one JSON record per line)
with open("claims_dataset.jsonl", "r") as f:
    records = [json.loads(line) for line in f]

train_records, test_records = train_test_split(records, test_size=0.3, random_state=42)

# Save split datasets
with open("claims_train.jsonl", "w") as f:
    for r in train_records:
        f.write(json.dumps(r) + "\n")
with open("claims_test.jsonl", "w") as f:
    for r in test_records:
        f.write(json.dumps(r) + "\n")
________________________________________
🔹 2. Model Training & MLflow Integration
🧠 Why?
To track experiments, compare performance across runs, and register the best model for deployment.
✅ MLflow Features Used:
• mlflow.start_run(): Starts a new experiment run
• mlflow.log_param(): Logs hyperparameters (e.g., model name, batch size)
• mlflow.log_metric(): Logs evaluation metrics (e.g., BLEU score)
• mlflow.pytorch.log_model(): Logs trained model
• mlflow.register_model(): Pushes model to Model Registry
• mlflow.sagemaker.deploy(): Deploys directly to SageMaker
🧪 Code Summary:
with mlflow.start_run():
    mlflow.log_param("model", "flan-t5-small")
    trainer.train()
    mlflow.log_metric("test_bleu", 0.88)
    mlflow.pytorch.log_model(model, "model")
This makes your training and evaluation auditable, reproducible, and deployable.
________________________________________
🔹 3. Model Evaluation (on 30% test set)
🧠 Why?
To assess your model’s ability to generalize to unseen insurance claims.
✅ Metric Used:
• BLEU Score (Bilingual Evaluation Understudy): Measures quality of generated text by comparing it to reference answers.
🧪 Code:
from nltk.translate.bleu_score import sentence_bleu

total_bleu = 0
for d in test_data:
    # Generate an answer with your fine-tuned model
    # (generate_answer is a placeholder for your model's generation call)
    generated = generate_answer(d["question"])
    bleu = sentence_bleu([d["answer"].split()], generated.split())
    total_bleu += bleu

avg_bleu = total_bleu / len(test_data)
mlflow.log_metric("test_bleu", avg_bleu)
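One caveat worth knowing: on short answers, plain sentence_bleu often returns 0 because there are no matching 4-grams. NLTK's SmoothingFunction mitigates this; a small self-contained check (the reference and candidate strings are illustrative):

```python
# Without smoothing, a short but correct answer can score 0 BLEU because
# higher-order n-gram counts are empty; method1 smoothing avoids that.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
reference = [["proceed", "with", "review"]]   # tokenized reference answer
candidate = ["proceed", "with", "review"]      # tokenized generated answer

score = sentence_bleu(reference, candidate, smoothing_function=smooth)
```

Passing smoothing_function=smooth into the evaluation loop above keeps short answers from being unfairly zeroed out.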
________________________________________
🔹 4. MLflow Model Registry + SageMaker Deployment
🧠 Why?
To promote models through lifecycle stages (Staging → Production) and simplify model deployment.
✅ Flow:
• Register model:
mlflow.register_model("runs:/<run_id>/model", "insurance-claims-triage-model")
• Deploy model:
mlflow.sagemaker.deploy(app_name="insurance-claims-genai", ...)
This step creates a SageMaker HTTPS endpoint where your model can be invoked with live input.
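The register-promote-deploy flow above can be stitched together as follows. This is a minimal sketch: the run ID, role ARN, and region are placeholders, and the AWS-touching calls are commented out so nothing deploys by accident. It assumes MLflow's classic stage-based registry (Staging/Production):

```python
# Build the "runs:/<run_id>/<artifact_path>" URI that mlflow.register_model expects
def runs_uri(run_id: str, artifact_path: str = "model") -> str:
    return f"runs:/{run_id}/{artifact_path}"

# import mlflow
# import mlflow.sagemaker
# from mlflow.tracking import MlflowClient
#
# # 1. Register the logged model under a registry name
# result = mlflow.register_model(runs_uri("<run_id>"), "insurance-claims-triage-model")
#
# # 2. Promote the new version through lifecycle stages (Staging -> Production)
# MlflowClient().transition_model_version_stage(
#     name="insurance-claims-triage-model",
#     version=result.version,
#     stage="Production",
# )
#
# # 3. Deploy the Production version to a SageMaker HTTPS endpoint
# mlflow.sagemaker.deploy(
#     app_name="insurance-claims-genai",
#     model_uri="models:/insurance-claims-triage-model/Production",
#     execution_role_arn="<sagemaker-execution-role-arn>",
#     region_name="us-east-1",
# )
```

Using the models:/name/stage URI (rather than a raw run URI) means the deployment always picks up whichever version is currently in Production.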
________________________________________
🔹 5. AWS Bedrock Knowledge Base (RAG Support)
🧠 Why?
To give the model grounded knowledge of policies, rules, and procedures from your S3 documents (PDFs, manuals, SOPs).
✅ How it Works:
1. Upload policy docs (e.g., auto_policy_2022.pdf) to S3
2. Create Bedrock Knowledge Base
o Choose vector store (e.g., OpenSearch)
o Choose model (Claude, Titan, etc.)
3. Query via the Bedrock Agent Runtime retrieve_and_generate() API, which retrieves relevant chunks from the KB before generation
🧪 Sample Call:
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "Customer submitted claim with photo only"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "kb-id",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
            ...
        },
    },
)
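Step 1 of the flow (getting the policy documents into S3) can be sketched with boto3. The bucket name and key prefix are illustrative, and the upload call is commented out so the snippet is safe to run as-is:

```python
import os

# Store documents under a common prefix so the Knowledge Base data source
# can point at s3://<bucket>/policy-docs/
def kb_object_key(local_path: str, prefix: str = "policy-docs") -> str:
    return f"{prefix}/{os.path.basename(local_path)}"

# import boto3
# s3 = boto3.client("s3")
# s3.upload_file(
#     Filename="auto_policy_2022.pdf",
#     Bucket="claims-kb-bucket",          # your KB source bucket
#     Key=kb_object_key("auto_policy_2022.pdf"),
# )
```

After uploading, trigger a sync on the Knowledge Base data source so the new documents are chunked and indexed into the vector store.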
________________________________________
🔹 6. API Gateway + Lambda for Public Access
🧠 Why?
To expose your SageMaker model as a public REST API for integration with apps, CRMs, customer portals, etc.
✅ Flow:
1. Create a Lambda function that:
o Parses user input
o Sends request to SageMaker model
o Returns generated response
2. Attach API Gateway to the Lambda
o Use Lambda Proxy integration
o Enable CORS if needed
🧪 Lambda Code:
import json
import os
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    body = json.loads(event["body"])
    question = body.get("question")
    response = runtime.invoke_endpoint(
        EndpointName=os.environ['ENDPOINT_NAME'],
        ContentType='application/json',
        Body=json.dumps({"inputs": question})
    )
    result = json.loads(response['Body'].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps({"answer": result[0]['generated_text']})
    }
________________________________________
🔹 7. Final API Call (User View)
🧪 CURL Command:
curl -X POST https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/triage \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Customer submitted claim with photo, no police report. Amount is $3000"
  }'
✅ Output:
{
"answer": "Proceed with review. Police report not required under $5000."
}
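The same call from Python, using only the standard library. The endpoint URL is the placeholder from the curl example and must be replaced with your real API Gateway URL; the network call itself is commented out:

```python
import json
import urllib.request

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/triage"

def build_request(question: str) -> urllib.request.Request:
    # Same payload and Content-Type header as the curl command
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# req = build_request("Customer submitted claim with photo, no police report. Amount is $3000")
# with urllib.request.urlopen(req) as resp:
#     answer = json.loads(resp.read())["answer"]
```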
________________________________________
📊 Summary Table
Stage | Tool | Purpose
Data Split | sklearn | Train/Test separation
Model Training | HuggingFace Transformers | Fine-tune LLM
Experiment Tracking | MLflow | Param, metric, model tracking
Model Registry | MLflow | Lifecycle management
Model Deployment | SageMaker via MLflow | HTTPS inference endpoint
Knowledge Base | AWS Bedrock | RAG grounding from documents
API Exposure | API Gateway + Lambda | Public interface for GenAI