In today's fast-evolving technology landscape, the integration of Artificial Intelligence (AI) into Internet of Things (IoT) systems has become increasingly prevalent. AI-enhanced IoT systems have the potential to revolutionize industries such as healthcare, manufacturing, and smart cities. However, deploying and maintaining these systems can be challenging due to the complexity of the AI models and the need for seamless updates and deployments. This article is tailored for software engineers and explores best practices for implementing Continuous Integration and Continuous Deployment (CI/CD) pipelines for AI-enabled IoT systems, ensuring smooth and efficient operations.

Introduction To CI/CD in IoT Systems

CI/CD is a software development practice that emphasizes the automated building, testing, and deployment of code changes. While CI/CD has traditionally been associated with web and mobile applications, its principles can be effectively applied to AI-enabled IoT systems. These systems often consist of multiple components, including edge devices, cloud services, and AI models, making CI/CD essential for maintaining reliability and agility.

Challenges in AI-Enabled IoT Deployments

AI-enabled IoT systems face several unique challenges:

Resource Constraints: IoT edge devices often have limited computational resources, making it challenging to deploy resource-intensive AI models.
Data Management: IoT systems generate massive amounts of data, and managing this data efficiently is crucial for AI model training and deployment.
Model Updates: AI models require periodic updates to improve accuracy or adapt to changing conditions. Deploying these updates seamlessly to edge devices is challenging.
Latency Requirements: Some IoT applications demand low-latency processing, necessitating efficient model inference at the edge.

Best Practices for CI/CD in AI-Enabled IoT Systems

Version Control: Implement version control for all components of your IoT system, including AI models, firmware, and cloud services. Use tools like Git to track changes and collaborate effectively. Create separate repositories for each component, allowing for independent development and testing.
Automated Testing: Implement a comprehensive automated testing strategy that covers all aspects of your IoT system. This includes unit tests for firmware, integration tests for AI models, and end-to-end tests for the entire system. Automation ensures that regressions are caught early in the development process.
Containerization: Use containerization technologies like Docker to package AI models and application code. Containers provide a consistent environment for deployment across various edge devices and cloud services, simplifying the deployment process.
Orchestration: Leverage container orchestration tools like Kubernetes to manage the deployment and scaling of containers across edge devices and cloud infrastructure. Kubernetes ensures high availability and efficient resource utilization.
Continuous Integration for AI Models: Set up CI pipelines specifically for AI models. Automate model training, evaluation, and validation. This ensures that updated models are thoroughly tested before deployment, reducing the risk of model-related issues (a minimal sketch of such a validation gate appears at the end of this article).
Edge Device Simulation: Simulate edge devices in your CI/CD environment to validate deployments at scale. This allows you to identify potential issues related to device heterogeneity and resource constraints early in the development cycle.
Edge Device Management: Implement device management solutions that facilitate over-the-air (OTA) updates. These solutions should enable remote deployment of firmware updates and AI model updates to edge devices securely and efficiently.
Monitoring and Telemetry: Incorporate comprehensive monitoring and telemetry into your IoT system. Use tools like Prometheus and Grafana to collect and visualize performance metrics from edge devices, AI models, and cloud services. This helps detect issues and optimize system performance.
Rollback Strategies: Prepare rollback strategies in case a deployment introduces critical issues. Automate the rollback process to quickly revert to a stable version in case of failures, minimizing downtime.
Security: Security is paramount in IoT systems. Implement security best practices, including encryption, authentication, and access control, at both the device and cloud levels. Regularly update and patch security vulnerabilities.

CI/CD Workflow for AI-Enabled IoT Systems

Let's illustrate a CI/CD workflow for AI-enabled IoT systems:

Version Control: Developers commit changes to their respective repositories for firmware, AI models, and cloud services.
Automated Testing: Automated tests are triggered upon code commits. Unit tests, integration tests, and end-to-end tests are executed to ensure code quality.
Containerization: AI models and firmware are containerized using Docker, ensuring consistency across edge devices.
Continuous Integration for AI Models: AI models undergo automated training and evaluation. Models that pass predefined criteria are considered for deployment.
Device Simulation: Simulated edge devices are used to validate the deployment of containerized applications and AI models.
Orchestration: Kubernetes orchestrates the deployment of containers to edge devices and cloud infrastructure based on predefined scaling rules.
Monitoring and Telemetry: Performance metrics, logs, and telemetry data are continuously collected and analyzed to identify issues and optimize system performance.
Rollback: In case of deployment failures or issues, an automated rollback process is triggered to revert to the previous stable version.
Security: Security measures, such as encryption, authentication, and access control, are enforced throughout the system.

Case Study: Smart Surveillance System

Consider a smart surveillance system that uses AI-enabled cameras for real-time object detection in a smart city. Here's how CI/CD principles can be applied:

Version Control: Separate repositories for camera firmware, AI models, and cloud services enable independent development and versioning.
Automated Testing: Automated tests ensure that camera firmware, AI models, and cloud services are thoroughly tested before deployment.
Containerization: Docker containers package the camera firmware and AI models, allowing for consistent deployment across various camera models.
Continuous Integration for AI Models: CI pipelines automate AI model training and evaluation. Models meeting accuracy thresholds are considered for deployment.
Device Simulation: Simulated camera devices validate the deployment of containers and models at scale.
Orchestration: Kubernetes manages container deployment on cameras and cloud servers, ensuring high availability and efficient resource utilization.
Monitoring and Telemetry: Metrics on camera performance, model accuracy, and system health are continuously collected and analyzed.
Rollback: Automated rollback mechanisms quickly revert to the previous firmware and model versions in case of deployment issues.
Security: Strong encryption and authentication mechanisms protect camera data and communication with the cloud.

Conclusion

Implementing CI/CD pipelines for AI-enabled IoT systems is essential for ensuring the reliability, scalability, and agility of these complex systems. Software engineers must embrace version control, automated testing, containerization, and orchestration to streamline development and deployment processes. Continuous monitoring, rollback strategies, and robust security measures are critical for maintaining the integrity and security of AI-enabled IoT systems. By adopting these best practices, software engineers can confidently deliver AI-powered IoT solutions that drive innovation across various industries.
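As referenced in the best practices above, here is a minimal sketch of the kind of validation gate a CI pipeline could run before promoting a newly trained model. The file path, dataset loader, and accuracy threshold are illustrative assumptions rather than part of the original article; the point is simply that the job fails fast if a retrained model regresses.

Python

import sys

import joblib
from sklearn.datasets import load_iris          # stand-in for your real holdout dataset
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.90                        # promotion criterion (assumed value)


def main() -> int:
    # Load the candidate model produced earlier in the pipeline (path is an assumption)
    model = joblib.load("artifacts/candidate_model.pkl")

    # Evaluate on a fixed holdout set so runs are comparable across pipeline executions
    X, y = load_iris(return_X_y=True)
    accuracy = accuracy_score(y, model.predict(X))
    print(f"candidate accuracy: {accuracy:.3f}")

    # A non-zero exit code fails the CI job, blocking deployment of a regressed model
    return 0 if accuracy >= ACCURACY_THRESHOLD else 1


if __name__ == "__main__":
    sys.exit(main())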
This article is intended for data scientists, AI researchers, machine learning engineers, and advanced practitioners in the field of artificial intelligence who have a solid grounding in machine learning concepts, natural language processing, and deep learning architectures. It assumes familiarity with neural network optimization, transformer models, and the challenges of integrating real-time data into generative AI systems.

Introduction

Retrieval-Augmented Generation (RAG) models have emerged as a compelling solution to augment the generative capabilities of AI with external knowledge sources. These models synergize neural retrieval methods with seq2seq generation models to introduce non-parametric data into the generative process, significantly expanding the potential of AI to handle information-rich tasks. In this article, we'll provide a technical exposition of RAG architectures, delve into their operational intricacies, and offer a quick evaluation of their utility in professional settings, along with an overview of their strengths, limitations, and the computational considerations intrinsic to their deployment.

Generative AI has traditionally been constrained by the static knowledge encapsulated within its parameters at the time of training. Retrieval-Augmented Generation models revolutionize this paradigm by leveraging external knowledge sources, providing a conduit for AI models to access and utilize vast repositories of information in real time.

Technical Framework of RAG Models

A RAG model functions through an orchestrated two-step process: a retrieval phase followed by a generation phase. The retrieval component, often instantiated by a Dense Passage Retriever (DPR), employs a BERT-like architecture for encoding queries and documents into a shared embedding space. The generation component is typically a Transformer-based seq2seq model that conditions its outputs on the combined embeddings of the input and retrieved documents.

The Retriever: Dense Passage Retrieval

The retrieval phase is crucial for the RAG architecture. It employs a dense retriever, which is fine-tuned on a dataset of (query, relevant document) pairs. The DPR encodes both queries and documents into vectors in a continuous space, using a dual-encoder architecture.

Python

import torch
from transformers import (
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
    DPRContextEncoder, DPRContextEncoderTokenizer,
)

# Define tokenizers and encoders for the question and context sides
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
context_tokenizer = DPRContextEncoderTokenizer.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')
context_encoder = DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

# query (str), list_of_documents (list[str]), and k (int) are assumed to be defined by the caller
# Tokenize the query and the candidate documents
question_tokens = question_tokenizer(query, return_tensors='pt')
context_tokens = context_tokenizer(list_of_documents, padding=True, truncation=True, return_tensors='pt')

# Encode question and context into embeddings
question_embeddings = question_encoder(**question_tokens)['pooler_output']
context_embeddings = context_encoder(**context_tokens)['pooler_output']

# Calculate similarities and retrieve top-k documents
similarity_scores = torch.matmul(question_embeddings, context_embeddings.T)
top_k_indices = similarity_scores.topk(k).indices
retrieved_docs = [list_of_documents[index] for index in top_k_indices[0]]

The Generator: Seq2Seq Model

For the generation phase, RAG employs a seq2seq framework, often instantiated by a model like BART or T5, capable of generating text based on the enriched context provided by retrieved documents.
The cross-attention layers are crucial for the model to interweave the input and retrieved content coherently.

Python

from transformers import BartForConditionalGeneration, BartTokenizer

# Initialize the seq2seq generation model and its tokenizer
seq2seq_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

# Generate a response conditioned on the input and the retrieved documents.
# document_embeddings is assumed to hold the encoder outputs computed over the
# retrieved passages from the previous step.
input_ids = tokenizer(query, return_tensors='pt').input_ids
outputs = seq2seq_model.generate(input_ids, encoder_outputs=document_embeddings)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

Performance Optimization and Computational Considerations

Training RAG models involves optimizing the dense retriever and the seq2seq generator in tandem. This necessitates backpropagating the loss from the output of the generator through to the retrieval component, a process that can introduce computational complexity and necessitate high-throughput hardware accelerators.

Python

from torch.nn.functional import cross_entropy

# input_for_generation, labels, contrastive_loss_function, true_indices, and
# optimizer are assumed to be defined as part of the surrounding training loop.
optimizer.zero_grad()

# Compute generation loss
prediction_scores = seq2seq_model(input_for_generation).logits
generation_loss = cross_entropy(prediction_scores.view(-1, tokenizer.vocab_size), labels.view(-1))

# Compute contrastive loss for retrieval.
# Contrastive loss encourages the correct documents to have higher similarity scores.
retrieval_loss = contrastive_loss_function(similarity_scores, true_indices)

# Combine losses and backpropagate
total_loss = generation_loss + retrieval_loss
total_loss.backward()
optimizer.step()

Applications and Implications

RAG models have broad implications across a spectrum of applications, from enhancing conversational agents with real-time data fetching capabilities to improving the relevance of content recommendations. They also stand to make significant impacts on the efficiency and accuracy of information synthesis in research and academic settings.

Limitations and Ethical Considerations

Practically, RAG models contend with computational demands, latency in real-time applications, and the challenge of maintaining up-to-date external databases. Ethically, there are concerns regarding the propagation of biases present in the source databases and the veracity of information being retrieved.

Conclusion

RAG models represent a significant advancement in generative AI, introducing the capability to harness external knowledge in the generation process. This article has provided a technical exploration of the RAG framework and has underscored the need for ongoing research into optimizing their performance and ensuring their ethical use. As the field evolves, RAG models stand to redefine the landscape of AI's generative potential, opening new avenues for knowledge-driven applications.
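The snippets above intentionally leave some wiring (the encoders, the optimizer, and the handling of retrieved documents) to the reader. For a self-contained illustration of the retrieve-then-generate loop, the Hugging Face transformers library packages a retriever and a generator into a single RAG model; the sketch below uses the library's bundled dummy index, so its output is illustrative only.

Python

from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

# Load a pretrained RAG model. The dummy dataset avoids downloading the full
# Wikipedia index and is meant only for experimentation (requires the datasets
# and faiss packages for the retrieval index).
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

# Retrieval and generation both happen inside generate(): the question is encoded,
# passages are fetched from the index, and the seq2seq generator conditions on them.
inputs = tokenizer("What is retrieval-augmented generation?", return_tensors="pt")
generated_ids = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])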
Welcome to the world of machine learning, where computers learn from data and make predictions without explicit programming. At the heart of this technology lies the concept of a "model."

What Is a Model?

In traditional programming, we create functions/methods that receive inputs/parameters and return a result based on a formula. For example, imagine a Java method that applies the formula y = 3x + 1.

Java

public int formula(int x) {
    return 3 * x + 1;
}

The above code would return the following data for x and y:

x: -1, 0, 1, 2, 3, 4
y: -2, 1, 4, 7, 10, 13

Now, imagine that rather than the formula, you have lots of x and y values. You can create a machine learning model to discover the formula and predict new values. As a real-life example, we can use the facial recognition that happens in the gallery of our phones. We have several inputs (photos) and outputs (people's names), and the machine learning model is the formula that knows how to recognize people. As you give names to people in the photos, you're feeding the model with data, and it is constantly retrained to better recognize those people.

Python: The Language of Machine Learning

Python has become the de facto language for machine learning. Its vast ecosystem of libraries, including TensorFlow and Keras, makes it a powerhouse for building and training models. If you're curious about stepping into the world of machine learning, Python is your trusty companion on this journey.

Our Model

For simplicity, we'll use the x and y data above to train a model that will know how to predict a y value based on x.

Python

import tensorflow as tf
import numpy as np
from tensorflow import keras
import os


def build_model():
    # Create a model that receives 1 input value and returns 1 output value
    model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
    # Define the algorithms for learning. You don't need to worry about this for now
    model.compile(optimizer='sgd', loss='mean_squared_error')
    return model


def train_model(model, xs, ys, epochs=500):
    # Train the model. Here we're telling the algorithm to run through the data 500 times,
    # adjusting its internal weights each time to better match the input and output data.
    model.fit(xs, ys, epochs=epochs)


def predict_with_model(model, input_data):
    # Predict using the trained model
    return model.predict([input_data])


def save_model(model, export_path):
    # Save the model
    tf.keras.models.save_model(
        model,
        export_path,
        overwrite=True,
        include_optimizer=True,
        save_format=None,
        signatures=None,
        options=None
    )


def main():
    # Input data
    xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
    ys = np.array([-2.0, 1.0, 4.0, 7.0, 10.0, 13.0], dtype=float)

    # Build the model
    model = build_model()

    # Train the model
    train_model(model, xs, ys)

    # Predict the value for x = 10. It will print a number very close to 31, like 30.9994 or something
    prediction = predict_with_model(model, 10.0)
    print(prediction)

    # Save the model
    model_dir = "./model"
    version = 1
    export_path = os.path.join(model_dir, str(version))
    print('export_path = {}\n'.format(export_path))

    save_model(model, export_path)
    print('\nSaved model: ' + export_path)


if __name__ == "__main__":
    main()

Run the above Python code to create, train, and test the model. It will create the model under the ./model directory.

Serving the Model

Once you have created the model and have it under the ./model directory, you can serve it as a REST API.
To do so, you can use the tensorflow/serving container image:

Shell

podman run -p 8501:8501 \
  --name=tf_serving \
  --mount type=bind,source=./model,target=/models/model \
  -e MODEL_NAME=model \
  -t tensorflow/serving

Consuming the Model

Once your container is up and running, you can send a request to make an inference. Run the following command to infer the y value for x = 10:

Shell

curl -d '{"instances": [[10.0]]}' \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8501/v1/models/model:predict

You should see a result similar to the following:

Shell

{
  "predictions": [[30.9971237]]
}

That's all, Folks! You've just created, trained, served, and consumed your first machine-learning model. You can find the source code used in this post on GitHub. Feel free to ask any questions in the comments, and stay tuned for more.
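If you'd rather call the serving endpoint from Python than from the command line, the same prediction request can be made with the requests library (assuming the tensorflow/serving container above is running locally on port 8501); a minimal sketch:

Python

import requests

# Same request as the curl example: predict y for x = 10
payload = {"instances": [[10.0]]}
resp = requests.post("http://localhost:8501/v1/models/model:predict", json=payload, timeout=10)
resp.raise_for_status()

# The response mirrors the JSON shown above, e.g. {"predictions": [[30.9971237]]}
print(resp.json()["predictions"][0][0])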
Whether it's crafting personalized content or tailoring images to user preferences, the ability to generate visual assets based on a description is quite powerful. But text-to-image conversion typically involves deploying an end-to-end machine learning solution, which is quite resource-intensive. What if this capability was an API call away, thereby making the process simpler and more accessible for developers?

This tutorial will walk you through how to use AWS CDK to deploy a serverless image generation application implemented using AWS Lambda and Amazon Bedrock, which is a fully managed service that makes base models from Amazon and third-party model providers (such as Anthropic, Cohere, and more) accessible through an API. Developers can leverage leading foundation models through a single API while maintaining the flexibility to adopt new models in the future.

The solution is deployed as a static website hosted on Amazon S3, accessible via an Amazon CloudFront domain. Users can enter the image description, which will be passed on to a Lambda function (via Amazon API Gateway), which in turn will invoke the Stable Diffusion model on Amazon Bedrock to generate the image. The entire solution is built using Go - this includes the Lambda function (using the aws-lambda-go library) as well as the complete solution deployment using AWS CDK. The code is available on GitHub.

Prerequisites

Before starting this tutorial, you will need the following:

An AWS Account (if you don't yet have one, you can create one and set up your environment here)
Go (v1.19 or higher)
AWS CDK
AWS CLI
Git
Docker

Clone this GitHub repository and change to the right directory:

git clone https://github.com/build-on-aws/amazon-bedrock-lambda-image-generation-golang
cd amazon-bedrock-lambda-image-generation-golang

Deploy the Solution Using AWS CDK

To start the deployment, simply invoke cdk deploy.

cd cdk
export DOCKER_DEFAULT_PLATFORM=linux/amd64
cdk deploy

You will see a list of resources that will be created and will need to provide your confirmation to proceed (output shortened for brevity).

Bundling asset BedrockLambdaImgeGenWebsiteStack/bedrock-imagegen-s3/Code/Stage...
✨  Synthesis time: 7.84s
//.... omitted
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:
//.... omitted
Do you wish to deploy these changes (y/n)? y

This will start creating the AWS resources required for the application. If you want to see the AWS CloudFormation template which will be used behind the scenes, run cdk synth and check the cdk.out folder.

You can keep track of the progress in the terminal or navigate to the AWS console: CloudFormation > Stacks > BedrockLambdaImgeGenWebsiteStack.

Once all the resources are created, you can try out the application. You should have:

The image generation Lambda function and API Gateway
An S3 bucket to host the website's HTML page
CloudFront distribution
And a few other components (like IAM roles, permissions, S3 bucket policy, etc.)

The deployment can take a bit of time since creating the CloudFront distribution is a time-consuming process. Once complete, you should get a confirmation along with the values for the S3 bucket name, API Gateway URL, and the CloudFront domain name.

Update the HTML Page and Copy It to the S3 Bucket

Open the index.html file in the GitHub repo, and locate the following text: ENTER_API_GATEWAY_URL.
Replace this with the API Gateway URL that you received as the CDK deployment output above.

To copy the file to S3, I used the AWS CLI:

aws s3 cp index.html s3://<name of the S3 bucket from CDK output>

Verify that the file was uploaded:

aws s3 ls s3://<name of the S3 bucket from CDK output>

Now you are ready to access the website!

Verify the Solution

Enter the CloudFront domain name in your web browser to navigate to the website. You should see the website with a pre-populated description that can be used as a prompt. Click Generate Image to start the process. After a few seconds, you should see the generated image.

Modify the Model Parameters

The Stable Diffusion model allows us to refine the generation parameters as per our requirements. The Stability.ai Diffusion models support the following controls:

Prompt strength (cfg_scale) controls the image's fidelity to the prompt, with lower values increasing randomness.
Generation step (steps) determines the accuracy of the result, with more steps producing more precise images.
Seed (seed) sets the initial noise level, allowing for reproducible results when using the same seed and settings.

Click Show Configuration to edit these. The maximum values for cfg_scale and steps are 30 and 150, respectively.

Don't Forget To Clean Up

Once you're done, to delete all the services, simply use:

cdk destroy

#output prompt (choose 'y' to continue)
Are you sure you want to delete: BedrockLambdaImgeGenWebsiteStack (y/n)?

You were able to set up and try the complete solution. Before we wrap up, let's quickly walk through some of the important parts of the code to get a better understanding of what's going on behind the scenes.

Code Walkthrough

Since we will only focus on the important bits, a lot of the code (print statements, error handling, etc.) has been omitted for brevity.

CDK

You can refer to the CDK code here. We start by creating the API Gateway and the S3 bucket.

apigw := awscdkapigatewayv2alpha.NewHttpApi(stack, jsii.String("image-gen-http-api"), nil)

bucket := awss3.NewBucket(stack, jsii.String("website-s3-bucket"), &awss3.BucketProps{
	BlockPublicAccess: awss3.BlockPublicAccess_BLOCK_ALL(),
	RemovalPolicy:     awscdk.RemovalPolicy_DESTROY,
	AutoDeleteObjects: jsii.Bool(true),
})

Then we create the CloudFront Origin Access Identity and grant S3 bucket read permissions to the CloudFront Origin Access Identity principal. Then we create the CloudFront distribution:

Specify the S3 bucket as the origin.
Specify the Origin Access Identity that we created before.

oai := awscloudfront.NewOriginAccessIdentity(stack, jsii.String("OAI"), nil)
bucket.GrantRead(oai.GrantPrincipal(), "*")

distribution := awscloudfront.NewDistribution(stack, jsii.String("MyDistribution"), &awscloudfront.DistributionProps{
	DefaultBehavior: &awscloudfront.BehaviorOptions{
		Origin: awscloudfrontorigins.NewS3Origin(bucket, &awscloudfrontorigins.S3OriginProps{
			OriginAccessIdentity: oai,
		}),
	},
	DefaultRootObject: jsii.String("index.html"), //name of the file in S3
})

Then, we create the image generation Lambda function along with IAM permissions (to the function execution IAM role) to allow it to invoke Bedrock operations.
function := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("bedrock-imagegen-s3"),
	&awscdklambdagoalpha.GoFunctionProps{
		Runtime: awslambda.Runtime_GO_1_X(),
		Entry:   jsii.String(functionDir),
		Timeout: awscdk.Duration_Seconds(jsii.Number(30)),
	})

function.AddToRolePolicy(awsiam.NewPolicyStatement(&awsiam.PolicyStatementProps{
	Actions:   jsii.Strings("bedrock:*"),
	Effect:    awsiam.Effect_ALLOW,
	Resources: jsii.Strings("*"),
}))

Finally, we configure Lambda function integration with API Gateway, add the HTTP routes, and specify the API Gateway endpoint, S3 bucket name, and CloudFront domain name as CloudFormation outputs.

functionIntg := awscdkapigatewayv2integrationsalpha.NewHttpLambdaIntegration(jsii.String("function-integration"), function, nil)

apigw.AddRoutes(&awscdkapigatewayv2alpha.AddRoutesOptions{
	Path:        jsii.String("/"),
	Methods:     &[]awscdkapigatewayv2alpha.HttpMethod{awscdkapigatewayv2alpha.HttpMethod_POST},
	Integration: functionIntg,
})

awscdk.NewCfnOutput(stack, jsii.String("apigw URL"), &awscdk.CfnOutputProps{Value: apigw.Url(), Description: jsii.String("API Gateway endpoint")})
awscdk.NewCfnOutput(stack, jsii.String("cloud front domain name"), &awscdk.CfnOutputProps{Value: distribution.DomainName(), Description: jsii.String("cloud front domain name")})
awscdk.NewCfnOutput(stack, jsii.String("s3 bucket name"), &awscdk.CfnOutputProps{Value: bucket.BucketName(), Description: jsii.String("s3 bucket name")})

Lambda Function

You can refer to the Lambda function code here. In the function handler, we extract the prompt from the HTTP request body and the configuration from the query parameters. Then it's used to call the model using the bedrockruntime.InvokeModel function. Note that the JSON payload sent to Amazon Bedrock is represented by an instance of the Request struct.

The output body returned from the Amazon Bedrock Stable Diffusion model is a JSON payload that is converted into a Response struct that contains the generated image as a base64 string. This is returned as an events.APIGatewayV2HTTPResponse object along with CORS headers.
func handler(ctx context.Context, req events.APIGatewayV2HTTPRequest) (events.APIGatewayV2HTTPResponse, error) {

	prompt := req.Body

	cfgScaleF, _ := strconv.ParseFloat(req.QueryStringParameters["cfg_scale"], 64)
	seed, _ := strconv.Atoi(req.QueryStringParameters["seed"])
	steps, _ := strconv.Atoi(req.QueryStringParameters["steps"])

	payload := Request{
		TextPrompts: []TextPrompt{{Text: prompt}},
		CfgScale:    cfgScaleF,
		Steps:       steps,
	}

	if seed > 0 {
		payload.Seed = seed
	}

	payloadBytes, err := json.Marshal(payload)

	output, err := brc.InvokeModel(context.Background(), &bedrockruntime.InvokeModelInput{
		Body:        payloadBytes,
		ModelId:     aws.String(stableDiffusionXLModelID),
		ContentType: aws.String("application/json"),
	})

	var resp Response

	err = json.Unmarshal(output.Body, &resp)

	image := resp.Artifacts[0].Base64

	return events.APIGatewayV2HTTPResponse{
		StatusCode:      http.StatusOK,
		Body:            image,
		IsBase64Encoded: false,
		Headers: map[string]string{
			"Access-Control-Allow-Origin":  "*",
			"Access-Control-Allow-Methods": "POST,OPTIONS",
		},
	}, nil
}

//request/response model

type Request struct {
	TextPrompts []TextPrompt `json:"text_prompts"`
	CfgScale    float64      `json:"cfg_scale"`
	Steps       int          `json:"steps"`
	Seed        int          `json:"seed"`
}

type TextPrompt struct {
	Text string `json:"text"`
}

type Response struct {
	Result    string     `json:"result"`
	Artifacts []Artifact `json:"artifacts"`
}

type Artifact struct {
	Base64       string `json:"base64"`
	FinishReason string `json:"finishReason"`
}

Conclusion

In this tutorial, you used AWS CDK to deploy a serverless image generation solution that was implemented using Amazon Bedrock and AWS Lambda and was accessed using a static website on S3 via a CloudFront domain. If you are interested in an introductory guide to using the AWS Go SDK and Amazon Bedrock Foundation Models (FMs), check out this blog post. Happy building!
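If you are exploring the same API from Python rather than Go, the equivalent InvokeModel call can be made with boto3. This is a hedged sketch rather than code from the article's repository; the model ID and parameter values are assumptions to verify against the current Amazon Bedrock documentation.

Python

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")  # region/credentials come from your AWS config

# Same payload shape as the Go Request struct above
payload = {
    "text_prompts": [{"text": "a lighthouse on a cliff at sunset, oil painting"}],
    "cfg_scale": 10,
    "steps": 30,
    "seed": 42,
}

response = bedrock.invoke_model(
    modelId="stability.stable-diffusion-xl-v1",  # assumed model ID; verify in your account
    body=json.dumps(payload),
    contentType="application/json",
    accept="application/json",
)

# The response body mirrors the Go Response struct: artifacts[0].base64 holds the image
result = json.loads(response["body"].read())
image_bytes = base64.b64decode(result["artifacts"][0]["base64"])

with open("output.png", "wb") as f:
    f.write(image_bytes)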
In the rapidly evolving domain of machine learning (ML), the ability to seamlessly package and deploy models is as crucial as the development of the models themselves. Containerization has emerged as the game-changing solution to this, offering a streamlined path from the local development environment to production. Docker, a leading platform in containerization, provides the tools necessary to encapsulate ML applications into portable and scalable containers. This article delves into the step-by-step process of containerizing a simple ML application with Docker, making it accessible to ML practitioners and enthusiasts alike. Whether you're looking to share your ML models with the world or seeking a more efficient deployment strategy, this tutorial is designed to equip you with the fundamental skills to transform your ML workflows using Docker.

Docker and Containerization

Docker is a powerful platform that has revolutionized the development and distribution of applications by utilizing containerization, a lightweight alternative to full-machine virtualization. Containerization involves encapsulating an application and its environment (dependencies, libraries, and configuration files) into a container, which is a portable and consistent unit of software. This approach ensures that the application runs uniformly and consistently across any infrastructure, from a developer's laptop to a high-compute cloud-based server. Unlike traditional virtual machines that replicate an entire operating system, Docker containers share the host system's kernel, making them much more efficient, fast to start, and less resource-intensive. Docker's simple and straightforward syntax hides the complexity often involved in deployment processes, streamlining the workflow and enabling a DevOps approach to the lifecycle management of the software development process.

Tutorial

Below is a step-by-step tutorial that will guide you through the process of containerizing a simple ML application using Docker.

Setting Up Your Development Environment

Before you start, make sure you have Docker installed on your machine. If not, you can download it from the Docker website.

Creating a Simple Machine Learning Application

For this tutorial, let's create a simple Python application that uses the Scikit-learn library to train a model on the Iris dataset.

Create a Project Directory

Open your terminal or command prompt and run the following:

Shell

mkdir ml-docker-app
cd ml-docker-app

Set up a Python Virtual Environment (Optional, but Recommended)

Shell

python3 -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate

Create a requirements.txt File

List the Python packages that your application requires. For our simple ML application:

Shell

scikit-learn==1.0.2
pandas==1.3.5

Create the Machine Learning Application Script

Save the following code into a file named app.py in the ml-docker-app directory:

Python

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create a Random Forest classifier
clf = RandomForestClassifier()

# Train the model using the training sets
clf.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = clf.predict(X_test)

# Model accuracy: how often is the classifier correct?
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

# Save the trained model
joblib.dump(clf, 'iris_model.pkl')

Install the Dependencies

Run the following command to install the dependencies listed in requirements.txt:

Shell

pip install -r requirements.txt

Run Your Application

Run your application to make sure it works:

Shell

python3 app.py

You should see the accuracy of the model printed to the console and a file named iris_model.pkl created, which contains the trained model. This script provides an end-to-end flow of a very basic machine learning task: loading data, preprocessing it, training a model, evaluating the model, and then saving the trained model for future use.

Containerize the Application With Docker

Create a 'Dockerfile'

In the root of your ml-docker-app directory, create a file named Dockerfile with the following content:

Dockerfile

# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy the current directory contents into the container at /usr/src/app
COPY . .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Run app.py when the container launches
CMD ["python", "./app.py"]

Build the Docker Image

Run the following command in your terminal to build the Docker image:

Shell

docker build -t ml-docker-app .

Run the Docker Container

Once the image is built, run your application in a Docker container:

Shell

docker run ml-docker-app

If everything is set up correctly, Docker will run your Python script inside a container, and you should see the accuracy of the model outputted to your terminal, just like when you ran the script natively.

Tag and Push the Container to Docker Hub

Log in to Docker Hub from the Command Line

Once you have a Docker Hub account, you need to log in through the command line on your local machine. Open your terminal and run:

Shell

docker login

You will be prompted to enter your Docker ID and password. Once logged in successfully, you can push images to your Docker Hub repository.

Tag Your Docker Image

Before you can push an image to Docker Hub, it must be tagged with your Docker Hub username. If you don't tag it correctly, Docker will not know where to push the image. Assuming your Docker ID is username and you want to name your Docker image ml-docker-app, run:

Shell

docker tag ml-docker-app username/ml-docker-app

This will tag the local ml-docker-app image as username/ml-docker-app, which prepares it to be pushed to your Docker Hub repository.

Push the Image to Docker Hub

To push the image to Docker Hub, use the docker push command followed by the name of the image you want to push:

Shell

docker push username/ml-docker-app

Docker will upload the image to your Docker Hub repository.

Check the Pushed Container Image on Docker Hub

You can go to your Docker Hub repository and see the recently pushed image.

That's it! You have successfully containerized a simple machine learning application, pushed it to Docker Hub, and made it available to be pulled and run from anywhere.
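If you want to sanity-check the artifact your container produces, the saved iris_model.pkl can be loaded and queried from any environment that has scikit-learn and joblib installed; a minimal sketch:

Python

import joblib

# Load the model artifact produced by app.py
model = joblib.load("iris_model.pkl")

# Predict the class for one Iris sample (sepal length/width, petal length/width in cm)
sample = [[5.1, 3.5, 1.4, 0.2]]
print(model.predict(sample))          # e.g. [0] -> setosa
print(model.predict_proba(sample))    # class probabilities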
My most-used Gen AI trick is the summarization of web pages and documents. Combined with semantic search, summarization means I waste very little time searching for the words and ideas I need when I need them. Summarization has become so important that I now use it as I write to ensure that my key points show up in ML summaries. Unfortunately, it's a double-edged sword: will reliance on deep learning lead to an embarrassing, expensive, or career-ending mistake because the summary missed something, or worse, because the summary hallucinated? Fortunately, many years as a technology professional have taught me the value of risk management, and that is the topic of this article: identifying the risks of summarization and the (actually pretty easy) methods of mitigating the risks.

Determining the Problem

For all of software development history, it was pretty easy to verify that our code worked as required. Software and computers are deterministic, finite state automata, i.e., they do what we tell them to do (barring cosmic rays or other sources of Byzantine failure). This made testing for correct behavior simple. Every possible unit test case could be handled by assertEquals(actual, expected), assertTrue, assertSame, assertNotNull, assertTimeout, and assertThrows. Even the trickiest dynamic string methods could be handled by combinations of assertTrue(string.Contains(a)), assertTrue(string.Contains(b)), and so on.

But that was then. We now have large language models, which are fundamentally random systems. Not even the full alphabet of contains(a), contains(b), or contains(c) is up to the task of verifying the correct behavior of Gen AI when the response to an API call can vary by an unknowable degree. Neither JUnit nor NUnit nor PyUnit has assertMoreOrLessOK(actual, expected). And yet, we still have to test these Gen AI APIs and monitor them in production. Once your Gen AI feature is in production, traditional observability methods will not alert you to any of the potential failure modes described below.

So, the problem is how to ensure that the content returned by Gen AI systems is consistent with expectations, and how can we monitor them in production? For that, we have to understand the many failure modes of LLMs. Not only do we have to understand them, we have to be able to explain them to our non-technical colleagues - before there's a problem. LLM failure modes are unique and present some real challenges to observability. Let me illustrate with a recent example from OpenAI that wasn't covered in the mainstream news but should have been. Three researchers from Stanford University and UC Berkeley had been monitoring ChatGPT to see if it would change over time, and it did.

Problem: Just Plain Wrong

In one case, the investigators repeatedly asked ChatGPT a simple question: Is 17,077 a prime number? Think step by step and then answer yes or no. ChatGPT responded correctly 98% of the time in March of 2023. Three months later, they repeated the test, but ChatGPT answered incorrectly 87% of the time! It should be noted that OpenAI released a new version of the API on March 14, 2023. Two questions must be answered: Did OpenAI know the new release had problems, and why did they release it? If they didn't know, then why not? This is just one example of the challenges you face in monitoring Generative AI. Even if you have full control of the releases, you have to be able to detect outright failures. The researchers have made their code and instructions available on GitHub, which is highly instructive.
They have also added some additional materials and an update. This is a great starting point if your use case requires factual accuracy.

Problem: General Harms

In addition to accuracy, it's very possible for Generative AI to produce responses with harmful qualities such as bias or toxicity. HELM, the Holistic Evaluation of Language Models, is a living and rapidly growing collection of benchmarks. It can evaluate more than 60 public or open-source LLMs across 42 scenarios, with 59 metrics. It is an excellent starting point for anyone seeking to better understand the risks of language models and the degree to which various vendors are transparent about the risks associated with their products. Both the original paper and code are freely available online.

Model collapse is another potential risk; if it happens, the results will be known far and wide. Mitigation is as simple as ensuring you can return to the previous model. Some researchers claim that ChatGPT and Bard are already heading in that direction.

Problem: Model Drift

Why should you be concerned about drift? Let me tell you a story. OpenAI is a startup; the one thing a startup needs more than anything else is rapid growth. The user count exploded when ChatGPT was first released in December of 2022. Starting in June of 2023, however, user count started dropping and continued to drop through the summer. Many pundits speculated that this had something to do with student users of ChatGPT taking the summer off, but commentators had no internal data from OpenAI, so speculation was all they could do. Understandably, OpenAI has not released any information on the cause of the drop.

Now, imagine that this happens to you. One day, usage stats for your Gen AI feature start dropping. None of the other typical business data points to a potential cause. Only 4% of customers tend to complain, and your complaints haven't increased. You have implemented excellent API and UX observability; neither response time nor availability shows any problems. What could be causing the drop? Do you have any gaps in your data?

Model drift is the gradual change in LLM responses due to changes in the data, the language model, or the cultures that provide the training data. The changes in LLM behavior may be hard to detect when looking at individual responses. Data drift refers to changes in the input data the model processes over time. Model drift refers to changes in the model's performance over time after it has been deployed and can result in:

Performance degradation: The model's accuracy decreases on the same test set due to data drift.
Behavioral drift: The model makes different predictions than originally, even on the same data.

However, drift can also refer to concept drift, which leads to models learning outdated or invalid conceptual assumptions, leading to incorrect modeling of the current language. It can cause failures on downstream tasks, like generating appropriate responses to customer messages.

And the Risks?

So far, the potential problems we have identified are failure and drift in the Generative AI system's behavior, leading to unexpected outcomes. Unfortunately, it is not yet possible to categorically state what the risks to the business might be, because nobody can determine beforehand what the possible range of responses might be with non-deterministic systems.
You will have to anticipate the potential risks on a Gen AI use-case-by-use-case basis: is your implementation offering financial advice or responding to customer questions for factual information about your products? LLMs are not deterministic; a statement that, hopefully, means more to you now than it did three minutes ago. This is another challenge you may have when it comes time to help non-technical colleagues understand the potential for trouble. The best thing to say about risk is that all the usual suspects are in play (loss of business reputation, loss of revenue, regulatory violations, security).

Fight Fire With Fire

The good news is that mitigating the risks of implementing Generative AI can be done with some new observability methods. The bad news is that you have to use machine learning to do it. Fortunately, it's pretty easy to implement. Unfortunately, you can't detect drift using your customer prompts - you must use a benchmark dataset.

What You're Not Doing

This article is not about detecting drift in a model's dataset - that is the responsibility of the model's creators, and the work to detect drift is serious data science. If you have someone on staff with a degree in statistics or applied math, you might want to attempt to detect drift using the method (maximum mean discrepancy) described in this paper: Uncovering Drift In Textual Data: An Unsupervised Method For Detecting And Mitigating Drift In Machine Learning Models.

What Are You Doing?

You are trying to detect drift in a model's behavior using a relatively small dataset of carefully curated text samples representative of your use case. Like the method above, you will use discrepancy, but not for an entire set. Instead, you will create a baseline collection of prompts and responses, with each prompt-response pair sent to the API 100 times, and then calculate the mean and variance for each prompt. Then, every day or so, you'll send the same prompts to the Gen AI API and look for excessive variance from the mean. Again, it's pretty easy to do.

Let's Code!

Choose a language model to use when creating embeddings. It should be as close as possible to the model being used by your Gen AI API. You must be able to have complete control over this model's files, all of its configurations, and all of the supporting libraries that are used when embeddings are created and when similarity is calculated. This model becomes your reference: the equivalent of the 1 kg sphere of pure silicon that serves as a global standard of mass.

Java Implementation

The how-do-I-do-this-in-Java experience for me, a 20-year veteran of Java coding, was painful until I sorted out the examples from Deep Java Library (DJL). Unfortunately, DJL has a very limited list of native language models available compared to Python. Though somewhat over-engineered, the Java code is almost as pithy as the Python equivalent. The Java listings themselves are not reproduced here; they cover:

The setup of the LLM used to create sentence embedding vectors
The code to create the text embedding vectors and compare the semantic similarity between two texts
The function that calculates the semantic similarity

(A Python sketch of the same similarity-and-variance calculation appears at the end of this article.)

Put It All Together

As mentioned earlier, the goal is to be able to detect drift in individual responses. Depending on your use case and the Gen AI API you're going to use, the number of benchmark prompts, the number of responses that form the baseline, and the rate at which you sample the API will vary. The steps go like this:

1. Create a baseline set of prompts and Gen AI API responses that are strongly representative of your use case: 10, 100, or 1,000.
   Save these in Table A.
2. Create a baseline set of responses: for each of the prompts, send it to the API 10, 50, or 100 times over a few days to a week, and save the text responses. Save these in Table B.
3. Calculate the similarity between the baseline responses: for each baseline response, calculate the similarity between it and the corresponding response in Table A. Save these similarity values with each response in Table B.
4. Calculate the mean, variance, and standard deviation of the similarity values in Table B and store them in Table A.
5. Begin the drift detection runs: perform the same steps as in step 1 every day or so. Save the results in Table C.
6. At the end of each detection run, calculate the similarity between the responses in Table C and the baseline responses in Table A.
7. When all the similarities have been calculated, look for any outside the original variance. For those responses with excessive variance, review the original prompt, the original response from Table A, and the latest response in Table C. Is there enough of a difference in the meaning of the latest response? If so, your Gen AI API model may be drifting away from what the product owner expects; chat with them about it.

Result

The data, when collected and charted, should look something like this: the chart shows the result of a benchmark set of 125 prompts sent to the API 100 times over one week - the Baseline samples. The mean similarity for each prompt was calculated and is represented by the points in the Baseline line and mean plot. The latest run of the same 125 benchmark samples was sent to the API yesterday, and their similarity was calculated against the baseline mean values - the Latest samples. The responses of individual samples that seem to vary quite a bit from the mean are reviewed to see if there is any significant semantic discrepancy with the baseline response. If that happens, review your findings with the product owner.

Conclusion

Non-deterministic software will continue to be a challenge for engineers to develop, test, and monitor until the day that the big AI brain takes all of our jobs. Until then, I hope I have forewarned and forearmed you with clear explanations and easy methods to keep you smiling during your next Gen AI incident meeting. And, if nothing else, this article should help you to make the case for hiring your own data scientist. If that's not in the cards, then… math?
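As referenced above, here is a minimal Python sketch of the similarity-and-variance idea. The original article implements this in Java with DJL; the sentence-transformers model, the 3-sigma threshold, and the in-memory "tables" below are illustrative assumptions rather than the author's exact choices.

Python

import numpy as np
from sentence_transformers import SentenceTransformer, util

# Reference embedding model (an assumption; pin the exact model files and version you use)
model = SentenceTransformer("all-MiniLM-L6-v2")


def similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the embeddings of two texts."""
    emb_a, emb_b = model.encode([text_a, text_b])
    return float(util.cos_sim(emb_a, emb_b))


# Table A: canonical baseline response for one benchmark prompt
baseline_response = "Our premium plan includes 24/7 support and a 99.9% uptime SLA."

# Table B: repeated responses collected from the Gen AI API during the baseline period
baseline_samples = [
    "The premium plan comes with round-the-clock support and a 99.9% uptime SLA.",
    "Premium customers get 24/7 support plus a 99.9% uptime guarantee.",
    # ... 100 samples in practice
]

scores = np.array([similarity(baseline_response, s) for s in baseline_samples])
mean, std = scores.mean(), scores.std()

# Table C: today's response for the same prompt
latest_response = "We offer email support on weekdays."
latest_score = similarity(baseline_response, latest_response)

# Flag responses that fall well outside the baseline variance (3-sigma is an assumption)
if abs(latest_score - mean) > 3 * std:
    print(f"Possible drift: similarity {latest_score:.2f} vs baseline mean {mean:.2f}")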
Would you leave a Google Staff Research Engineer role just because you want your TV to automatically pause when you get up to get a cup of tea? Actually, how is that even relevant, you might ask. Let's see what Pete Warden, former Google Staff Research Engineer and now CEO and Founder of Useful Sensors, has to say about that.

From Jetpac To Google and TinyML, From Google To AI in a Box

Pete Warden wrote the world's only mustache detection image processing algorithm. He also was the founder and CTO of the startup Jetpac, where he raised a Series A from Khosla Ventures, built a technical team, and created a unique data product that analyzed the pixel data of over 140 million photos from Instagram and turned them into in-depth guides for more than 5,000 cities around the world. Jetpac was acquired by Google in 2014, and Warden was a Google Staff Research Engineer from then until March 2022. That's when he founded Useful Sensors, which he sees as the evolution of the work he'd been doing at Google.

Warden was the Technical Lead of the TensorFlow Mobile team, responsible for deep learning on mobile and embedded devices. Warden is sometimes credited as having kickstarted the TinyML subdomain of machine learning. Naturally, much of what he did was based on things others were already working on: "A lot of my contribution has been helping publicize and document a bunch of these engineering practices that have emerged," Warden said. Either way, TinyML is getting big, and Warden is a big part of it. Tiny machine learning (TinyML) is broadly defined as a fast-growing field of machine learning technologies and applications, including hardware, algorithms, and software capable of performing on-device sensor data analytics at extremely low power, typically in the mW range and below, hence enabling a variety of always-on use cases and targeting battery-operated devices.

Useful Sensors just launched a product called AI in a Box, which it dubs an "offline, private, open source LLM for conversations and more." Even though it's not the first product Useful Sensors has created, it's the first one that's officially launched. That was a good opportunity to catch up with Warden and talk about what Useful Sensors is working on.

Simplicity and the Creepiness Factor

While it is true that Warden cited wanting his TV to automatically pause when he gets up to get a cup of tea as part of the reason why he started Useful Sensors, some context is definitely needed. Part of Warden's motivation for his work on TensorFlow for embedded devices was to see it used in everyday objects. As Warden related, when he went to talk to companies that made light switches or TVs to tell them "all about this wonderful open source code that they could get for free, and all the conferences and documentation and examples and books," they would hear him out. But then, in the end, they'd usually say something like: "That's great. But we barely have a software engineering team, let alone a machine learning team. So can you just give us something that gives us a voice interface or tells us when somebody sits down in front of the TV?"

That's quite telling, and producing self-contained AI-enhanced hardware is a valid reason to set out on a new venture. However, that alone is something Google itself could have achieved. Google Pixel, for example, already provides automatic captions running on-device for content that plays on the phone.
But there's something more: privacy and data sovereignty, aka "the creepiness factor." In a video hosted on the Useful Sensors home page, Warden mentions how, during his tenure at Google, he would often get questions about whether Google was spying on people. Those questions are valid ones, triggered by a widely observed phenomenon: when mentioning topic XYZ in the vicinity of your phone, you will often get bombarded with ads about XYZ for days on end. Warden, on his part, swears that the code he was working on does not do that. But, he goes on to add, he has no way of proving that because the code is proprietary. Plus, we may add, there's nothing anyone can say about other parts of Google's codebase, or other apps for that matter. It's hard to light-heartedly dismiss such a widely shared experience.

Useful Sensors

That brings us to the core of it all - what Useful Sensors does and how it's different. The vision, as Warden put it, is to be able to run machine learning locally and to do it in a private and checkable way. Everything should run locally with no internet connection, so conversations and data are completely secure. No account, setup, or subscription needed.

Warden shared that Useful Sensors has already launched the Person Sensor, a small board that provides an indication of whether there's a person nearby, as well as a tiny QR code reader. Both run entirely locally and retail at $10 and $7, respectively, Warden said. But these products have something else in common too: they are aimed at makers, i.e., hobbyists with enough motivation and technical skills to tinker with them, but also at electronics vendors. As Warden shared, Useful Sensors is currently in talks with a number of electronics vendors. Useful Sensors products are being evaluated, and Warden is hopeful that it won't be too long before they end up being included in devices sold in the market. In fact, that is the audience that holds the greatest promise for Useful Sensors. Its backers also see the potential, apparently, as the company has received $5 million in seed funding already.

Warden co-founded Useful Sensors with CTO Manjunath Kudlur, formerly of Cerebras. Kudlur was the compiler team lead at Cerebras as well as one of the founders of Google's TensorFlow and Nvidia's CUDA. Warden said Kudlur contributes greatly to things such as accelerating transformer models for Useful Sensors. The team lists a total of 8 people at this point, but if their plans come to fruition, another funding round and growth are well within sight, as per Warden.

AI in a Box, the product that Useful Sensors just launched, seems like it was designed to do a number of things. First, it can grow awareness for Useful Sensors, as it's aimed at makers. As Warden said, people can tinker with the code, but it already does some useful things out of the box. It can provide live captions, as well as receive voice commands and translate between multiple major languages on the fly. AI in a Box can also help raise some cash for Useful Sensors. But perhaps more importantly, it positions Useful Sensors as an ecosystem provider. This seems like part of the vision for the company, and Warden shared that he's hoping people will get creative with AI in a Box. In fact, he added, some examples of things people have built with Useful Sensors products already exist on Hackster.

Under the Hood

AI in a Box features a RockChip 3588S SoC with an NPU.
The NPU is a unit specifically designed to accelerate neural networks, and the team was able to leverage it to enable a Large Language Model to run locally. AI in a Box is built on a foundation of open-source models like Whisper and Llama 2. In the same vein, the company is releasing all the code to accelerate and control the system under an open-source license. Useful Sensors' library for optimized transformer inference on the RockChip NPU is also available. The idea is that transparency should help with security and privacy auditing, with Warden noting he'd be happy to have regulators audit Useful Sensors products. Releasing open-source code will also enable developers to use the system as a base to build their own real-time voice input applications in Python.

Warden said that once they were able to get real-time speech-to-text working, there were lots of choices around which LLMs they could run locally. The team is also looking into doing some of its own fine-tuning, but they've been able to get a long way by just providing prompt contexts for interactions. As Warden noted, anybody who's familiar with LLMs would probably easily recognize what Useful Sensors did. Real-time speech-to-text enables AI in a Box to function as a keyboard too, among other things. Having an LLM in the mix opens up a further range of possibilities. For example, LLMs are known to be able to interface with APIs. Warden mentioned Raspberry Pi as an example that could enable people to control a number of devices using voice commands.

AI, Innovation, Empowerment

An example of what people are already working on, Warden said, is an actor who is using the company's Person Sensor to automate spotlight operation for solo performances. Rather than having to pay someone to operate a spotlight, the actor is hoping it should be possible to automate this. In a way, that's a perfect metaphor for the double-edged sword that innovation and AI truly are. That may sound like a good idea for the actor, but what about the operator?

"That's a really big question with anything around innovation. If we're, quote-unquote, making things more efficient, what are the societal impacts of that? A big part of what I'm trying to do is get these technologies into people's hands so that it's not just a bunch of engineers who are making these decisions about what we should do. People can try these models for themselves and see, for example, how useful but also how flawed the current generation of LLMs are. I don't want us technocrats to be the ones making these decisions. I want a well-informed public who are actually able to say - hey, this is what we want," Warden said.

That certainly sounds like a noble aspiration. How compatible it really is with VC backing, a cut-throat competitive landscape dominated by the Googles of the world, the public's ability to elaborate its own take on things, and administrations' willingness to take the public into account remains to be seen.
Alluxio, a leading data platform company, recently announced Alluxio Enterprise AI, a new solution purpose-built to accelerate enterprise artificial intelligence (AI) and machine learning (ML) workloads.

The Growing Need for AI-Optimized Infrastructure

Many organizations are investing in AI to drive digital transformation and gain a competitive advantage. However, legacy data infrastructure often hinders AI adoption due to challenges like:
Slow data access and GPU underutilization
Fragmented data across siloed on-premise and cloud environments
Complex data pipelines that slow down model development
Rising infrastructure costs to meet AI workload demands

According to Alluxio's Director of Product Management, Adit Madan, "Challenges around low performance, data accessibility, GPU scarcity, complex data engineering, and underutilized resources frequently hinder enterprises' ability to extract value from their AI initiatives." Alluxio Enterprise AI provides an AI-optimized data platform designed to overcome these challenges with innovations tailored specifically for machine learning workloads.

Key Capabilities To Accelerate AI Workloads

Alluxio Enterprise AI delivers several key innovations:
Intelligent caching – Alluxio's distributed memory caching feeds data to GPUs at high throughput and low latency to maximize utilization. Tiered storage between memory and disk provides optimized data access.
Unified data access – Alluxio offers a single interface for managing workloads across on-premise, multi-cloud, and hybrid environments, simplifying access by removing silos.
Scalable architecture – A decentralized design scales to manage over 100 billion objects on commodity cloud storage, keeping pace with growing demands.
Accelerated pipelines – Alluxio accelerates training by reducing data loading times and speeds up deployment through extreme concurrency.

Quantifiable Performance Improvements

Organizations using Alluxio for AI workloads are seeing major improvements in speed:
Alibaba runs 80% of its deep learning training on Alluxio, at massive scale with billions of objects. This provides a high-performance foundation for its latest innovations.
A generative AI customer improved their time-to-accuracy by 3x when scaling training jobs, with Alluxio eliminating storage bottlenecks.

Streamlines End-to-End ML Pipelines

Alluxio integrates across the full machine learning pipeline:
Simplifies data engineering by enabling reliable, high-speed access to datasets anywhere
Caches data in memory during training to maximize GPU utilization and throughput
Accelerates deployment by serving models concurrently at scale with low latency

This provides a unified data orchestration layer that connects disparate storage systems, AI frameworks, and on-prem/cloud environments. (A brief sketch of what this data-access pattern can look like from a training job's point of view appears below.)

Empowering Data Teams With Optimized Infrastructure

Alluxio Enterprise AI benefits various data team roles:
Data engineers gain simplified pipelines and data accessibility
ML researchers leverage maximum GPU performance to accelerate experiments
IT architects get a scalable platform to support AI growth

Expert Guidance on AI Infrastructure Trends

Madan adds, "On the infrastructure side, this crucial junction of leveraging innovation on the hardware and software side is where innovation will happen — making sure we are leveraging GPUs in accelerated compute most efficiently."
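Here is that sketch: a minimal example of a PyTorch training job reading its dataset through an Alluxio POSIX (FUSE) mount. The mount point and dataset layout are assumptions for illustration, not Alluxio defaults; the point is that the training code only sees ordinary local paths while the data platform handles caching and remote storage behind them.

Python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical Alluxio FUSE mount point; the files behind it may physically
# live in S3, HDFS, or an on-prem object store, with hot data cached close
# to the GPU nodes.
DATA_ROOT = "/mnt/alluxio/datasets/imagenet/train"

train_set = datasets.ImageFolder(
    root=DATA_ROOT,
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ]),
)

# Plain local-filesystem reads: no custom client code is needed in the loop.
loader = DataLoader(train_set, batch_size=256, num_workers=8, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    # ... forward/backward pass goes here ...
    break  # single batch shown for illustration

The design choice the sketch highlights is that the training code stays unchanged; caching and unification happen at the storage layer.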
Recommendations for Starting the AI Journey

For enterprises starting with AI, Alluxio recommends beginning with a pilot project focused on a single use case. Once a successful foundation is built, scale out with governance, security, and management best practices in place. Prioritize usability and reliability early on, and seek out technologies like Alluxio that simplify infrastructure complexity for data teams.

The Future of AI Infrastructure

As AI workloads continue to evolve rapidly, Alluxio aims to stay at the forefront with optimizations for emerging hardware, frameworks, and use cases. For example, Alluxio is powering innovation in areas like large language models at organizations like Alibaba. As AI models become larger and more complex, optimized data infrastructure becomes even more critical.

Accelerating Enterprise AI Adoption

In summary, Alluxio Enterprise AI overcomes the key challenges of AI infrastructure, enabling enterprises to simplify, accelerate, and scale AI workloads for faster time-to-value. With an AI-optimized data platform, organizations can modernize infrastructure to maximize the ROI of AI investments and stay competitive. Alluxio's innovations help power the next generation of data-driven applications and unlock the true strategic value of AI.
This is an article from DZone's 2023 Kubernetes in the Enterprise Trend Report. For more: Read the Report

Kubernetes streamlines cloud operations by automating key tasks, specifically deploying, scaling, and managing containerized applications. With Kubernetes, you can group hosts running containers into clusters, simplifying cluster management across public, private, and hybrid cloud environments. AI/ML and Kubernetes work together seamlessly, simplifying the deployment and management of AI/ML applications. Kubernetes offers automatic scaling based on demand and efficient resource allocation, and it ensures high availability and reliability through replication and failover features. As a result, AI/ML workloads can share cluster resources efficiently with fine-grained control. Kubernetes' elasticity adapts to varying workloads and integrates well with CI/CD pipelines for automated deployments. Monitoring and logging tools provide insights into AI/ML performance, while cost-efficient resource management optimizes infrastructure expenses. This partnership streamlines the AI/ML development process, making it agile and cost-effective. Let's see how Kubernetes can join forces with AI/ML.

The Intersection of AI/ML and Kubernetes

The partnership between AI/ML and Kubernetes empowers organizations to deploy, manage, and scale AI/ML workloads effectively. However, running AI/ML workloads presents several challenges, and Kubernetes addresses them through:
Resource management – Kubernetes allocates and scales CPU and memory resources for AI/ML Pods, preventing contention and ensuring fair distribution.
Scalability – Kubernetes adapts to changing AI/ML demands with auto-scaling, dynamically expanding or contracting clusters.
Portability – AI/ML models deploy consistently across various environments using Kubernetes' containerization and orchestration.
Isolation – Kubernetes isolates AI/ML workloads within namespaces and enforces resource quotas to avoid interference.
Data management – Kubernetes simplifies data storage and sharing for AI/ML with persistent volumes.
High availability – Kubernetes guarantees continuous availability through replication, failover, and load balancing.
Security – Kubernetes enhances security with features like RBAC and network policies.
Monitoring and logging – Kubernetes integrates with monitoring tools like Prometheus and Grafana for real-time AI/ML performance insights.
Deployment automation – AI/ML models often require frequent updates. Kubernetes integrates with CI/CD pipelines, automating deployment and ensuring that the latest models are pushed into production seamlessly.

As a concrete illustration of the resource management point, the sketch below submits a GPU training job with explicit requests and limits using the Kubernetes Python client.
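This is a minimal sketch rather than a production manifest; the namespace, image name, and GPU resource key are assumptions for illustration (requesting nvidia.com/gpu, for example, presumes the NVIDIA device plugin is installed on the cluster).

Python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use load_incluster_config() inside a Pod).
config.load_kube_config()

# Requests/limits let the scheduler place the Pod correctly and keep the
# training workload from starving its neighbours.
resources = client.V1ResourceRequirements(
    requests={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
    limits={"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"},
)

container = client.V1Container(
    name="trainer",
    image="registry.example.com/ml/train:1.0",  # hypothetical training image
    command=["python", "train.py"],
    resources=resources,
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="train-recsys", namespace="ml-team"),
    spec=client.V1JobSpec(
        backoff_limit=1,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-team", body=job)

A ResourceQuota on the ml-team namespace would then cap the total CPU, memory, and GPU the team's jobs can claim, which ties into the isolation point above.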
Let's look into some real-world use cases to better understand how companies and products can benefit from Kubernetes and AI/ML.

Real-World Use Cases

Recommendation systems – Personalized content recommendations in streaming services, e-commerce, social media, and news apps
Image and video analysis – Automated image and video tagging, object detection, facial recognition, content moderation, and video summarization
Natural language processing (NLP) – Sentiment analysis, chatbots, language translation, text generation, voice recognition, and content summarization
Anomaly detection – Identifying unusual patterns in network traffic for cybersecurity, fraud detection, and quality control in manufacturing
Healthcare diagnostics – Disease detection through medical image analysis, patient data analysis, drug discovery, and personalized treatment plans
Autonomous vehicles – Self-driving cars use AI/ML for perception, decision-making, route optimization, and collision avoidance
Financial fraud detection – Detecting fraudulent transactions in real time to prevent financial losses and protect customer data
Energy management – Optimizing energy consumption in buildings and industrial facilities for cost savings and environmental sustainability
Customer support – AI-powered chatbots, virtual assistants, and sentiment analysis for automated customer support, inquiries, and feedback analysis
Supply chain optimization – Inventory management, demand forecasting, and route optimization for efficient logistics and supply chain operations
Agriculture and farming – Crop monitoring, precision agriculture, pest detection, and yield prediction for sustainable farming practices
Language understanding – Advanced language models for understanding and generating human-like text, enabling content generation and context-aware applications
Medical research – Drug discovery, genomics analysis, disease modeling, and clinical trial optimization to accelerate medical advancements

Table 1: Real-world AI/ML use cases and examples

Example: Implementing Kubernetes and AI/ML

As an example, let's introduce a real-world scenario: a medical research system whose main purpose is to investigate the causes of Parkinson's disease. The system analyzes imaging data (tomography scans and images) and personal patient data (where consent for its use has been given). The following is a simplified, high-level example:

Figure 1: Parkinson's disease medical research architecture

The architecture contains the following steps and components:
Data collection – gathering various data types, including structured, unstructured, and semi-structured data like logs, files, and media, in Azure Data Lake Storage Gen2
Data processing and analysis – utilizing Azure Synapse Analytics, powered by Apache Spark, to clean, transform, and analyze the collected datasets
Machine learning model creation and training – employing Azure Machine Learning, integrated with Jupyter notebooks, for creating and training ML models
Security and authentication – ensuring data and ML workload security and authentication through the Keycloak framework and Azure Key Vault
Container management – managing containers using Azure Container Registry
Deployment and management – using Azure Kubernetes Service to handle ML model deployment, with management facilitated through Azure VNets and Azure Load Balancer
Model performance evaluation – assessing model performance using log metrics and monitoring provided by Azure Monitor
Model retraining – retraining models as required with Azure Machine Learning

To make the deployment and management step concrete, the sketch below shows a model-serving Deployment and Service created with the Kubernetes Python client.
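This is a minimal sketch under stated assumptions: the container image name, namespace, and port are hypothetical; in the architecture above, the image would be pulled from Azure Container Registry into an AKS cluster.

Python
from kubernetes import client, config

config.load_kube_config()

APP = "parkinsons-model"  # hypothetical app label

container = client.V1Container(
    name=APP,
    image="myregistry.azurecr.io/parkinsons-model:1.0",  # hypothetical ACR image
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "1", "memory": "2Gi"},
        limits={"cpu": "2", "memory": "4Gi"},
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name=APP, labels={"app": APP}),
    spec=client.V1DeploymentSpec(
        replicas=2,  # replication provides the high availability noted earlier
        selector=client.V1LabelSelector(match_labels={"app": APP}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": APP}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name=APP),
    spec=client.V1ServiceSpec(
        selector={"app": APP},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="medical-research", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="medical-research", body=service)

An Azure Load Balancer would typically sit in front of such a Service (type LoadBalancer) to expose it, matching the networking components in the architecture above.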
Now, let's examine how security is handled for AI/ML in Kubernetes.

Data Analysis and Security in Kubernetes

In Kubernetes, data analysis involves processing and extracting insights from large datasets using containerized applications. Kubernetes simplifies data orchestration, ensuring data is available where and when needed. This is essential for machine learning, batch processing, and real-time analytics tasks.

ML workloads on Kubernetes require a strong security foundation, and robust security practices are essential to safeguard data in AI/ML and Kubernetes environments. This includes data encryption at rest and in transit, access control mechanisms, regular security audits, and monitoring for anomalies. Additionally, Kubernetes offers features like role-based access control (RBAC) and network policies to restrict unauthorized access. To summarize, here is an AI/ML-on-Kubernetes security checklist:

Access control
Set RBAC for user permissions
Create dedicated service accounts for ML workloads
Apply network policies to control communication

Image security
Only allow trusted container images
Keep container images regularly updated and patched

Secrets management
Securely store and manage sensitive data (Secrets)
Implement regular Secret rotation

Network security
Segment your network for isolation
Enforce network policies for Ingress and egress traffic

Vulnerability scanning
Regularly scan container images for vulnerabilities

Last but not least, let's look into distributed ML in Kubernetes.

Distributed Machine Learning in Kubernetes

Security is an important topic; however, selecting the proper distributed ML framework also solves many problems. Distributed ML frameworks and Kubernetes together provide the scalability, security, resource management, and orchestration capabilities essential for efficiently handling the computational demands of training complex ML models on large datasets. Here are a few popular open-source distributed ML frameworks and libraries compatible with Kubernetes:

TensorFlow – An open-source ML framework that provides tf.distribute.Strategy for distributed training. Kubernetes can manage TensorFlow tasks across a cluster of containers, enabling distributed training on extensive datasets.
PyTorch – Another widely used ML framework that can be employed in a distributed manner within Kubernetes clusters. It facilitates distributed training through tools like PyTorch Lightning and Horovod.
Horovod – A distributed training framework, compatible with TensorFlow, PyTorch, and MXNet, that seamlessly integrates with Kubernetes. It allows for the parallelization of training tasks across multiple containers.

These are just a few of the many great platforms available; a minimal multi-worker training sketch with TensorFlow follows below.
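The sketch assumes the TF_CONFIG environment variable describing the worker topology is injected by whatever launches the Pods (for example, a training operator such as Kubeflow's TFJob) and uses a toy Keras model purely for illustration.

Python
import tensorflow as tf

# Reads the worker topology from the TF_CONFIG environment variable, which a
# Kubernetes training operator (or your own manifests) is assumed to set.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

# Model and optimizer are created inside the strategy scope so their variables
# are mirrored and gradients are aggregated across workers.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Toy dataset; in practice each worker would stream its shard from shared storage.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model.fit(x_train, y_train, epochs=2, batch_size=256)

Each worker runs the same script; the strategy synchronizes gradients across replicas, which is why this style of training pairs naturally with Kubernetes' replica management.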
Finally, let's summarize how we can benefit from using AI and Kubernetes together.

Conclusion

In this article, we reviewed real-world use cases spanning various domains, including healthcare, recommendation systems, and medical research. We also walked through a practical example that illustrates the application of AI/ML and Kubernetes in a medical research use case. Kubernetes and AI/ML are a natural fit because Kubernetes provides a robust and flexible platform for deploying, managing, and scaling AI/ML workloads. Kubernetes enables efficient resource utilization, automatic scaling, and fault tolerance, which are critical for handling the resource-intensive and dynamic nature of AI/ML applications. It also promotes containerization, simplifying the packaging and deployment of AI/ML models and ensuring consistent environments across all stages of the development pipeline. Overall, Kubernetes enhances the agility, scalability, and reliability of AI/ML deployments, making it a fundamental tool in modern software infrastructure.

This is an article from DZone's 2023 Kubernetes in the Enterprise Trend Report. For more: Read the Report
Apache Kafka has emerged as a clear leader in corporate architecture for moving from data at rest (DB transactions) to event streaming. There are many presentations that explain how Kafka works and how to scale this technology stack (either on-premise or in the cloud). Building a microservice that uses ChatGPT-generated code to consume messages and then enrich, transform, and persist them is the next phase of this project. In this example, we will be consuming input from an IoT device (a Raspberry Pi) that sends a JSON temperature reading every few seconds.

Consume a Message

As each Kafka event message is produced (and logged), a Kafka microservice consumer is ready to handle each message. I asked ChatGPT to generate some Python code, and it gave me the basics to poll and read from the named "topic." What I got was a pretty good start for consuming a topic, key, and JSON payload. ChatGPT also created code to persist this to a database using SQLAlchemy. I then wanted to transform the JSON payload and use API Logic Server (ALS, an open-source project on GitHub) rules to unwrap the JSON, validate, calculate, and produce a new set of message payloads whenever the source temperature falls outside a given range.

Shell
ChatGPT: “design a Python Event Streaming Kafka Consumer interface”

Note: ChatGPT selected the Confluent Kafka libraries (and their Docker Kafka container); you can modify the code to use other Python Kafka libraries.

SQLAlchemy Model

Using API Logic Server (ALS, a Python open-source platform), we connect to a MySQL database. ALS will read the tables and create a SQLAlchemy ORM model, a react-admin user interface, a safrs JSON Open API (Swagger), and a running REST web service for each ORM endpoint. The new Temperature table will hold the timestamp, the IoT device ID, and the temperature reading. Here we use the ALS command-line utility to create the ORM model:

Shell
ApiLogicServer create --project_name=iot --db_url=mysql+pymysql://root:password@127.0.0.1:3308/iot

The API Logic Server-generated class used to hold our Temperature values:

Python
class Temperature(SAFRSBase, Base):
    __tablename__ = 'Temperature'
    _s_collection_name = 'Temperature'  # type: ignore
    __bind_key__ = 'None'

    Id = Column(Integer, primary_key=True)
    DeviceId = Column(Integer, nullable=False)
    TempReading = Column(Integer, nullable=False)
    CreateDT = Column(TIMESTAMP, server_default=text("CURRENT_TIMESTAMP"), nullable=False)
    KafkaMessageSent = Column(Boolean, default=False)

Changes

So instead of persisting the raw Kafka JSON consumer message to a SQL database (and firing rules to do the work there), we unwrap the JSON payload (util.row_to_entity) and insert it into the Temperature table. We let the declarative rules handle each temperature reading.

Python
entity = models.Temperature()
util.row_to_entity(message_data, entity)
session.add(entity)

When the consumer receives a message, it adds the new row to the session, which will trigger the commit_event rule (below).

Declarative Logic: Produce a Message

Using API Logic Server (an automation framework built using SQLAlchemy, Flask, and the LogicBank spreadsheet-like rules engine: formula, sum, count, copy, constraint, event, etc.), we add a declarative commit_event rule on the ORM entity Temperature. As each message is persisted to the Temperature table, the commit_event rule is called. If the temperature reading exceeds MAX_TEMP or falls below MIN_TEMP, we will send a Kafka message on the topic “TempRangeAlert”.
We also add a constraint to make sure we receive data within a normal range (32-132). Another event consumer will handle the alert message.

Python
from confluent_kafka import Producer

conf = {'bootstrap.servers': 'localhost:9092'}
producer = Producer(conf)

MAX_TEMP = arg.MAX_TEMP or 102
MIN_TEMP = arg.MIN_TEMP or 78

def produce_message(
        row: models.Temperature,
        old_row: models.Temperature,
        logic_row: LogicRow):
    if logic_row.isInserted() and row.TempReading > MAX_TEMP:
        producer.produce(topic="TempRangeAlert", key=str(row.Id),
            value=f"The temperature {row.TempReading}F exceeds {MAX_TEMP}F on Device {row.DeviceId}")
        row.KafkaMessageSent = True
    if logic_row.isInserted() and row.TempReading < MIN_TEMP:
        producer.produce(topic="TempRangeAlert", key=str(row.Id),
            value=f"The temperature {row.TempReading}F is less than {MIN_TEMP}F on Device {row.DeviceId}")
        row.KafkaMessageSent = True

Rules.constraint(models.Temperature,
    as_expression=lambda row: row.TempReading < 32 or row.TempReading > 132,
    error_message="Temperature {row.TempReading} is out of range")
Rules.commit_event(models.Temperature, calling=produce_message)

We only produce an alert message if the temperature reading is greater than MAX_TEMP or less than MIN_TEMP. The constraint will check the temperature range before calling the commit event (note that rules are always unordered and can be introduced as specifications change).

TDD Behave Testing

Using TDD (test-driven development), we can write a Behave test that inserts records directly into the Temperature table and then checks the returned KafkaMessageSent value. Behave begins with a Feature/Scenario (.feature file). For each scenario, we write a corresponding Python step implementation using Behave decorators.

Feature Definition

Plain Text
Feature: TDD Temperature Example

  Scenario: Temperature Processing
    Given A Kafka Message Normal (Temperature)
    When Transactions normal temperature is submitted
    Then Check KafkaMessageSent Flag is False

  Scenario: Temperature Processing
    Given A Kafka Message Abnormal (Temperature)
    When Transactions abnormal temperature is submitted
    Then Check KafkaMessageSent Flag is True

TDD Python Class

Python
from behave import *
import safrs
from database import models  # generated models (assumes the standard ALS project layout)

db = safrs.DB
session = db.session

def insertTemperature(temp: int) -> bool:
    entity = models.Temperature()
    entity.TempReading = temp
    entity.DeviceId = 999  # test device id (DeviceId is an Integer column)
    session.add(entity)
    session.commit()  # committing fires the constraint and commit_event rules
    return entity.KafkaMessageSent

@given('A Kafka Message Normal (Temperature)')
def step_impl(context):
    context.temp = 85  # within the normal 78 to 102 band, so no alert is produced
    assert True

@when('Transactions normal temperature is submitted')
def step_impl(context):
    context.response_text = insertTemperature(context.temp)

@then('Check KafkaMessageSent Flag is False')
def step_impl(context):
    assert context.response_text == False

Summary

Using ChatGPT to generate the Kafka message code for both the consumer and the producer seems like a good starting point (install Confluent's Docker image for Kafka to follow along). Combining it with API Logic Server's declarative logic rules, which let us add formulas, constraints, and events to the normal flow of transactions into our SQL database while producing (and transforming) new Kafka messages, is a great combination. ChatGPT and declarative logic are the next level of "paired programming."
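For reference, a minimal consumer loop in the spirit of the ChatGPT-generated code described above might look like the following sketch; the broker address, topic name, and consumer group are assumptions for illustration.

Python
import json
from confluent_kafka import Consumer

conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'temperature-readers',
    'auto.offset.reset': 'earliest',
}
consumer = Consumer(conf)
consumer.subscribe(['temperature'])  # assumed topic name for the IoT readings

try:
    while True:
        msg = consumer.poll(1.0)  # wait up to one second for a message
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        message_data = json.loads(msg.value().decode('utf-8'))
        # This is where the ALS snippet from the article takes over: unwrap
        # message_data into a Temperature row and add it to the session so the
        # declarative rules fire.
        print(msg.key(), message_data)
finally:
    consumer.close()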