Thursday, 26 June 2025

BedRock -2

🔹 1. Data Preparation: Splitting into Train and Test Sets

🔧 Why? To evaluate how well your model generalizes to unseen examples. A 70/30 split is a common practice:
• 70% for model training
• 30% for testing/evaluation

🧪 Code:

```python
import json

from sklearn.model_selection import train_test_split

# Load raw insurance claim dataset (one JSON record per line)
with open("claims_dataset.jsonl", "r") as f:
    records = [json.loads(line) for line in f]

train_records, test_records = train_test_split(records, test_size=0.3, random_state=42)

# Save split datasets
with open("claims_train.jsonl", "w") as f:
    for r in train_records:
        f.write(json.dumps(r) + "\n")

with open("claims_test.jsonl", "w") as f:
    for r in test_records:
        f.write(json.dumps(r) + "\n")
```

________________________________________

🔹 2. Model Training & MLflow Integration

🔧 Why? To track experiments, compare performance across runs, and register the best model for deployment.

✅ MLflow Features Used:
• mlflow.start_run(): Starts a new experiment run
• mlflow.log_param(): Logs hyperparameters (e.g., model name, batch size)
• mlflow.log_metric(): Logs evaluation metrics (e.g., BLEU score)
• mlflow.pytorch.log_model(): Logs the trained model
• mlflow.register_model(): Pushes the model to the Model Registry
• mlflow.sagemaker.deploy(): Deploys directly to SageMaker

🧪 Code Summary:

```python
with mlflow.start_run():
    mlflow.log_param("model", "flan-t5-small")
    trainer.train()
    mlflow.log_metric("test_bleu", 0.88)
    mlflow.pytorch.log_model(model, "model")
```

This makes your training and evaluation auditable, reproducible, and deployable.

________________________________________

🔹 3. Model Evaluation (on the 30% test set)

🔧 Why? To assess your model's ability to generalize to unseen insurance claims.

✅ Metric Used:
• BLEU Score (Bilingual Evaluation Understudy): Measures the quality of generated text by comparing it to reference answers.
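To build intuition for what BLEU rewards, here is a self-contained, simplified re-implementation: the geometric mean of clipped n-gram precisions times a brevity penalty. This is an illustration only; the evaluation loop itself should use nltk's sentence_bleu, which handles smoothing and edge cases properly.

```python
import math
from collections import Counter

def simple_bleu(reference, candidate, max_n=2):
    """Simplified BLEU sketch: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

ref = "police report not required under 5000".split()
print(simple_bleu(ref, ref))  # identical candidate scores 1.0
print(simple_bleu(ref, "police report required".split()))  # partial match scores lower
```

A perfect match scores 1.0; a short, partial answer is penalized both on bigram precision and by the brevity penalty.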
🧪 Code:

```python
from nltk.translate.bleu_score import sentence_bleu

total_bleu = 0
for d in test_data:
    # generate_answer() stands in for the model-inference call,
    # which was elided in the original post
    generated = generate_answer(d["question"])
    bleu = sentence_bleu([d["answer"].split()], generated.split())
    total_bleu += bleu

avg_bleu = total_bleu / len(test_data)
mlflow.log_metric("test_bleu", avg_bleu)
```

________________________________________

🔹 4. MLflow Model Registry + SageMaker Deployment

🔧 Why? To promote models through lifecycle stages (Staging → Production) and simplify model deployment.

✅ Flow:
• Register model: mlflow.register_model("runs:/<run_id>/model", "insurance-claims-triage-model")
• Deploy model: mlflow.sagemaker.deploy(app_name="insurance-claims-genai", ...)

This step creates a SageMaker HTTPS endpoint where your model can be invoked with live input.

________________________________________

🔹 5. AWS Bedrock Knowledge Base (RAG Support)

🔧 Why? To give the model grounded knowledge of policies, rules, and procedures from your S3 documents (PDFs, manuals, SOPs).

✅ How it Works:
1. Upload policy docs (e.g., auto_policy_2022.pdf) to S3
2. Create a Bedrock Knowledge Base
   o Choose a vector store (e.g., OpenSearch)
   o Choose a model (Claude, Titan, etc.)
3. Query the Knowledge Base at inference time so relevant chunks are retrieved before generation; in boto3 this is the Bedrock Agent Runtime retrieve_and_generate() API

🧪 Sample Call:

```python
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "Customer submitted claim with photo only"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "kb-id",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)
```

________________________________________

🔹 6. API Gateway + Lambda for Public Access

🔧 Why? To expose your SageMaker model as a public REST API for integration with apps, CRMs, customer portals, etc.

✅ Flow:
1. Create a Lambda function that:
   o Parses user input
   o Sends the request to the SageMaker model
   o Returns the generated response
2. Attach API Gateway to the Lambda
   o Use Lambda Proxy integration
   o Enable CORS if needed

🧪 Lambda Code:

```python
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    body = json.loads(event["body"])
    question = body.get("question")

    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],
        ContentType="application/json",
        Body=json.dumps({"inputs": question}),
    )

    result = json.loads(response["Body"].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps({"answer": result[0]["generated_text"]}),
    }
```

________________________________________

🔹 7. Final API Call (User View)

🧪 CURL Command:

```bash
curl -X POST https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/triage \
  -H "Content-Type: application/json" \
  -d '{ "question": "Customer submitted claim with photo, no police report. Amount is $3000" }'
```

✅ Output:

```json
{ "answer": "Proceed with review. Police report not required under $5000." }
```

________________________________________

📊 Summary Table

| Stage | Tool | Purpose |
| --- | --- | --- |
| Data Split | sklearn | Train/Test separation |
| Model Training | HuggingFace Transformers | Fine-tune LLM |
| Experiment Tracking | MLflow | Param, metric, model tracking |
| Model Registry | MLflow | Lifecycle management |
| Model Deployment | SageMaker via MLflow | HTTPS inference endpoint |
| Knowledge Base | AWS Bedrock | RAG grounding from documents |
| API Exposure | API Gateway + Lambda | Public interface for GenAI |
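As a closing sanity check, the Lambda handler's request/response contract can be exercised locally with a stubbed SageMaker runtime client, so no AWS resources are needed. This is a minimal sketch: FakeRuntime, its echoed answer, and the endpoint name are illustrative stand-ins, not the deployed code.

```python
import io
import json

class FakeRuntime:
    """Stub for boto3's sagemaker-runtime client; echoes the question."""
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        question = json.loads(Body)["inputs"]
        payload = json.dumps([{"generated_text": f"Echo: {question}"}]).encode()
        return {"Body": io.BytesIO(payload)}

def lambda_handler(event, context, runtime=FakeRuntime()):
    # Same shape as the real handler above, with the client injectable
    body = json.loads(event["body"])
    response = runtime.invoke_endpoint(
        EndpointName="insurance-claims-genai",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": body.get("question")}),
    )
    result = json.loads(response["Body"].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps({"answer": result[0]["generated_text"]}),
    }

event = {"body": json.dumps({"question": "Claim with photo only?"})}
out = lambda_handler(event, None)
print(out["statusCode"])  # 200
print(json.loads(out["body"])["answer"])  # Echo: Claim with photo only?
```

Injecting the client as a default argument keeps the production code path unchanged while making the handler testable before API Gateway is attached.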
