IoT, or the Internet of Things, is a technological field that makes it possible for users to connect devices and systems and exchange data over the internet. Through DZone's IoT resources, you'll learn about smart devices, sensors, networks, edge computing, and many other technologies — including those that are now part of the average person's daily life.
The Akenza IoT platform, on its own, excels at collecting and managing data from a myriad of IoT devices. However, it is integrations with other systems, such as enterprise resource planning (ERP), customer relationship management (CRM) platforms, workflow management, or environmental monitoring tools, that enable a complete view of the entire organizational landscape.

Complementing Akenza's capabilities, and enabling smooth integrations, is the versatility of Python programming. Given how flexible Python is, the language is a natural choice when looking for a bridge between Akenza and the unique requirements of an organization looking to connect its intelligent infrastructure.

This article is about combining the two, Akenza and Python. At the end of it, you will have:

- A bi-directional connection to Akenza using Python and WebSockets.
- A Python service subscribed to and receiving events from IoT devices through Akenza.
- A Python service that will be sending data to IoT devices through Akenza.

Since WebSocket connections are persistent, their usage enhances the responsiveness of IoT applications, which in turn helps exchanges occur in real time, fostering a dynamic and agile integrated ecosystem.

Python and Akenza WebSocket Connections

First, let's have a look at the full Python code, which will be discussed later.

```python
# -*- coding: utf-8 -*-

# Zato
from zato.server.service import WSXAdapter

# ###############################################################################################
# ###############################################################################################

if 0:
    from zato.server.generic.api.outconn.wsx.common import OnClosed, \
         OnConnected, OnMessageReceived

# ###############################################################################################
# ###############################################################################################

class DemoAkenza(WSXAdapter):

    # Our name
    name = 'demo.akenza'

    def on_connected(self, ctx:'OnConnected') -> 'None':
        self.logger.info('Akenza OnConnected -> %s', ctx)

# ###############################################################################################

    def on_message_received(self, ctx:'OnMessageReceived') -> 'None':

        # Confirm what we received
        self.logger.info('Akenza OnMessageReceived -> %s', ctx.data)

        # This is an indication that we are connected ..
        if ctx.data['type'] == 'connected':

            # .. for testing purposes, use a fixed asset ID ..
            asset_id:'str' = 'abc123'

            # .. build our subscription message ..
            data = {'type': 'subscribe', 'subscriptions': [{'assetId': asset_id, 'topic': '*'}]}
            ctx.conn.send(data)

        else:
            # .. if we are here, it means that we received a message other than type "connected".
            self.logger.info('Akenza message (other than "connected") -> %s', ctx.data)

# ##############################################################################################

    def on_closed(self, ctx:'OnClosed') -> 'None':
        self.logger.info('Akenza OnClosed -> %s', ctx)

# ##############################################################################################
# ##############################################################################################
```

Now, deploy the code to Zato and create a new outgoing WebSocket connection. Replace the API key with your own and make sure to set the data format to JSON.
Receiving Messages From WebSockets

The WebSocket Python services that you author have three methods of interest, each reacting to specific events:

- on_connected: Invoked as soon as a WebSocket connection has been opened. Note that this is a low-level event and, in the case of Akenza, it does not yet mean that you are able to send or receive messages.
- on_message_received: The main method you will work with most of the time. Invoked each time a remote WebSocket sends, or pushes, an event to your service. With Akenza, this method will be invoked each time Akenza has something to inform you about, e.g., that your subscription to messages was successful.
- on_closed: Invoked when a WebSocket has been closed. It is no longer possible to use a WebSocket once it has been closed.

Let's focus on on_message_received, which is where the majority of the action takes place. It receives a single parameter of type OnMessageReceived, which describes the context of the received message. That is, it is in the "ctx" that you will find both the current request and a handle to the WebSocket connection through which you can reply to the message.

The two important attributes of the context object are:

- ctx.data: A dictionary of data that Akenza sent to you.
- ctx.conn: The underlying WebSocket connection through which the data was sent and through which you can send a response.

Now, the logic in the if/else block of on_message_received is clear:

First, we check if Akenza confirmed that we are connected (type=='connected'). You need to check the type of a message each time Akenza sends something to you and react to it accordingly.

Next, because we know that we are already connected (e.g., our API key was valid), we can subscribe to events from a given IoT asset. For testing purposes, the asset ID is given directly in the source code but, in practice, this information would be read from a configuration file or database.

Finally, for messages of any other type, we simply log their details. Naturally, a full integration would handle them as required in the given circumstances, e.g., by transforming and pushing them to other applications or management systems.

A sample message from Akenza will look like this:

```
INFO - WebSocketClient - Akenza message (other than "connected") -> {'type': 'subscribed', 'replyTo': None, 'timeStamp': '2023-11-20T13:32:50.028Z', 'subscriptions': [{'assetId': 'abc123', 'topic': '*', 'tagId': None, 'valid': True}], 'message': None}
```

How To Send Messages to WebSockets

An aspect not to be overlooked is communication in the other direction, that is, sending messages to WebSockets. For instance, you may have services invoked through REST APIs, or perhaps from a scheduler, whose job will be to transform such calls into configuration commands for IoT devices.

Here is the core part of such a service, reusing the same Akenza WebSocket connection:

```python
# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

# ##############################################################################################
# ##############################################################################################

class DemoAkenzaSend(Service):

    # Our name
    name = 'demo.akenza.send'

    def handle(self) -> 'None':

        # The connection to use
        conn_name = 'Akenza'

        # Get a connection ..
        with self.out.wsx[conn_name].conn.client() as client:

            # .. and send data through it.
            client.send('Hello')

# ##############################################################################################
# ##############################################################################################
```

Note that responses to the messages sent to Akenza will be received using your first service's on_message_received method. WebSockets-based messaging is inherently asynchronous, and the channels are independent.

Now, we have a complete picture of real-time IoT connectivity with Akenza and WebSockets. We are able to establish persistent, responsive connections to assets, and we can subscribe to and send messages to devices, which lets us build intelligent automation and integration architectures that make use of powerful, emerging technologies.
Open-source software has changed the face of the software industry for good. Now, open-source hardware promises to do the same for the electronics sector. These platforms have been popular among hobbyists for years. As the IoT grows and production costs rise, the trend is creeping into commercial integrated circuit design, too. Where it proceeds from here will reshape electronics design. Here's what that could look like.

Faster IoT Growth

As more manufacturers capitalize on open-source hardware, the IoT will skyrocket. There could be more than 29 billion active IoT connections by 2027, but that growth isn't possible if all other conditions remain unchanged from today. Materials are too expensive and production is too centralized to foster enough competition.

The open-source movement changes things. Open-source designs dramatically lower the barrier to entry for electronics design, of which the IoT is undoubtedly the most promising category. Businesses can customize and produce devices without lengthy, expensive R&D. In some cases, they won't even have to manufacture critical components themselves.

This democratization of device development will foster significant growth. Smaller companies will be able to compete in the crowded but ever-growing IoT market, leading to an influx of new gadgets.

Rapid Innovation

Relatedly, the open-source movement will pave the way for more innovative devices. Access to ready-made, proven designs shortens production timelines enough to allow smaller, more risk-taking players to enter the field. If these end devices remain open-source, they'll spur a wave of collaboration to drive device optimization further.

Open-source designs let anyone with access to electronic circuit design software see and refine others' work. Thousands — even millions — of businesses and hobbyists can collaborate to make high-functioning devices, just as open-source software has enabled innovation. One designer may have a good foundational idea but fail to identify all EMI issues. Another user could see their design and then develop and test an alteration that's more resistant to interference. By speeding up the development process, a business is able to complete comprehensive EMC testing sooner to improve performance and ensure compliance. As a result, new, feature-rich devices can emerge that never would've made it to market otherwise.

Consistent Security Standards

This collaboration and innovation will prove particularly useful in cybersecurity. Like in open-source software, open-source hardware opens designs to input from multiple parties to find and patch vulnerabilities. As that happens, the IoT can overcome its prominent security shortcomings.

Collaborative integrated circuit design will also make it easier to adhere to regulatory guidelines. Programs like the U.S. Cyber Trust Mark incentivize higher security standards, but achieving this recognition can be difficult for smaller, less experienced developers. Improving each other's open-source designs makes it easier for more devices to meet these requirements. Businesses could distribute Cyber Trust Mark-compliant designs for others to build on, leading to a proliferation of high-security IoT endpoints. IoT security would improve across the board without jeopardizing interoperability.

A Rise in Custom Products

Because open-source designs streamline development, they leave more room for further tweaks. That could lead to custom electronics becoming more common across both consumer and commercial segments.
Open-source components tend to be versatile by design. Devs can easily make small adjustments in electronic circuit design software to create one-off, purpose-built devices based on these modular platforms. Singular, one-size-fits-all electronics could give way to highly personalized options in some markets. Personalization is important, with 62% of consumers today saying a brand would lose their loyalty if it didn't offer personalized experiences. Typically, this tailoring applies to marketing recommendations or UI preferences, but hardware can cash in on it through open-source designs.

New Revenue Streams

Similarly, open-source hardware paves the way for new monetization strategies. Democratized design and production make the electronics market fairer but could also make it crowded. However, businesses can counteract that competition by profiting from open-source designs and devices.

Electronics companies could build and sell development kits containing basic microcontrollers and tools to program them. Instead of relying on proprietary devices outperforming others, they'd profit from making it easier to get into device design. Current open-source giants have found considerable success through this model.

Alternatively, some enterprises could offer to manufacture smaller businesses' open-source designs. Providing low-cost, small-scale manufacturing for devices using the same templates would be relatively straightforward but increasingly profitable as more parties try to capitalize on the open-source movement.

Obstacles in the Road Ahead

The extent of open-source hardware's impact on electronics design is still uncertain. While it could likely lead to all these benefits, it also faces several challenges to mainstream adoption.

The most significant of these is the volatility and high cost of the necessary raw materials. Roughly 70% of all silicon materials come from China. This centralization makes prices prone to fluctuations from local disruptions in China or throughout the supply chain. Similarly, long shipping distances raise related prices for U.S. developers. Even if integrated circuit design becomes more accessible, these costs keep production out of reach, slowing open-source devices' growth.

Similarly, industry giants may be unwilling to accept the open-source movement. While open-source designs open new revenue streams, these market leaders profit greatly from their proprietary resources. The semiconductor fabs supporting these large companies are even more centralized. It may be difficult for open-source hardware to compete if these organizations don't embrace the movement.

These obstacles don't mean open-source alternatives can't be successful, but they cast a shadow over their large-scale industry impact. This movement has experienced significant growth lately, so it will likely change the electronics market in some capacity. How far that impact goes depends on how the industry responds to these challenges.

Open-Source Hardware Will Shape the Future

Open-source hardware may seem like a distant dream, but consider its software counterpart. It's changed the industry despite the popularity of proprietary alternatives and development complications. When dev tools and methods become more accessible, the same thing could happen in hardware.

As open-source integrated circuit design grows, it could make electronics a more diverse, resilient, and democratized industry. Achieving that will be difficult, but the results will benefit virtually all parties involved.
Building a battery-powered IoT device is a very interesting challenge. Advancements in technology have enabled IoT module chips to perform more functions than ever before. There are chipsets available today that are smaller than a penny, yet they are equipped with GPS, Wi-Fi, cellular connectivity, and application processing capabilities. With these developments, it is an opportune time for the creation of small form factor IoT devices that can connect to the cloud and solve interesting problems in all domains.

There are plenty of applications where the IoT device needs to be accessed externally by an application to request a specific task to be executed. Think of a smart home lock device that needs to open and close the door lock from a mobile application, or an asset tracking device that must start logging its location upon request. The basic idea is to be able to issue a command to these devices and make them perform an action.

Problem Statement

At first glance, it may seem pretty straightforward to achieve this. Connect the IoT device to the network, pair it with a server, and have it wait for commands. What is the big deal, one may ask? There are a few hurdles to making it work within a low-power, battery-driven setup.

Cellular radio is expensive in terms of battery. Keeping the device constantly connected to the network will drain the battery really fast. Maintaining a constant network connection can consume a significant amount of power and reduce the battery life span to an unacceptable level for most battery-powered applications, where the expected battery life span is in the months-to-years range. Such a device must be designed to conserve power as much as possible to extend its operational life.

Applying first-principles thinking to the problem of high power consumption caused by cellular connections, we can ask a critical question: Is it necessary for the device to remain connected to the network at all times? In most cases, the device may not need to transfer or receive any data, and it may not make sense to keep it connected to the network. What if we could make it stay in 'sleep' mode and periodically wake it up to check for any incoming commands? This would mean the device only connects to the network when necessary, hence saving power. This is the fundamental idea behind the LTE-M power-saving modes. There are two modes that LTE-M offers: PSM and eDRX. Let's look at them in detail.

PSM Mode

When in PSM mode, an IoT device can request to enter a dormant state (RRC Idle) for a duration of up to 413 days. This mode is particularly useful for devices that need to periodically transmit data over the network and then go back to sleep. It is important to note that the device is entirely inaccessible while it is in the dormant state. This is the power profile of a device in PSM mode.

[Fig 1: PSM Power profile]

The device remains in a dormant state for an extended period until it wakes up to transfer data. Additionally, there is a brief paging window during which the device is reachable, after which it returns to sleep mode. Requesting longer sleep times can result in greater power savings, but it may also require sacrificing data transfer frequency. Ultimately, it is up to the developer to determine the ideal tradeoff between sleep time and required data resolution, depending on the application requirements. A smart energy meter might be perfectly fine sending updates once per day, but a GPS tracker that needs to send more frequent updates would need shorter sleep times.
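As an illustration of how a device might request PSM from the network, here is a minimal sketch that sends the standard 3GPP AT+CPSMS command to a cellular modem over a serial port using pyserial. The serial port name, baud rate, and timer bit strings are assumptions for illustration only; actual timer encodings, and whether the network grants the requested values, vary by module and carrier, and a similar pattern applies to eDRX via AT+CEDRXS.

```python
# Minimal sketch: request LTE-M PSM from a cellular modem via AT commands.
# Assumptions: the modem is on /dev/ttyUSB0 at 115200 baud and supports the
# standard 3GPP AT+CPSMS command. Timer values below are illustrative only;
# the network may grant different values (or none at all).
import serial

def send_at(ser: serial.Serial, command: str) -> str:
    """Send one AT command and return the raw response."""
    ser.write((command + "\r\n").encode())
    return ser.read(256).decode(errors="replace")

with serial.Serial("/dev/ttyUSB0", 115200, timeout=2) as ser:
    # Requested periodic TAU (T3412) and active time (T3324) as 8-bit strings.
    # These are example encodings, not recommendations -- consult the module docs.
    response = send_at(ser, 'AT+CPSMS=1,,,"10100101","00100001"')
    print("PSM request response:", response)

    # Read back what the modem has stored; the values actually granted by the
    # network typically appear later in registration URCs, depending on the module.
    print(send_at(ser, "AT+CPSMS?"))
```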
eDRX Mode

The term eDRX stands for "extended Discontinuous Reception." The concept is similar to PSM mode in that the device goes into RRC Idle, but unlike PSM mode, in eDRX the device wakes up periodically to "check in" with the network every eDRX cycle period. The maximum sleep time for eDRX is up to 43 minutes for devices using LTE-M.

[Fig 2: eDRX power profile]

Compared to eDRX, a PSM device requires significantly more time to wake up from sleep mode and become active, as it must establish a connection with the network before it can receive application data. When using eDRX, the device only needs to wake up and listen for 1 ms, whereas with PSM, the device must wake up, receive, and transmit control messages for approximately 100-200 ms before it can receive a message from the cloud application, resulting in a roughly 100-fold difference. [1]

While these power modes seem promising for a low-power networked IoT device, building a reachable device, i.e., a device that can be reached with a message packet sent externally over the network, is still challenging. We will go into the details of the challenges that are encountered with TCP-, SMS-, and UDP-based communication protocols.

Communication Protocol Considerations

There are a variety of IoT communication protocols available for different kinds of IoT applications, but for the purpose of this article, we will categorize them into two kinds:

- Connection-based: TCP-based MQTT, WebSockets, HTTP
- Connection-less: UDP-based MQTT-SN, CoAP, LWM2M, and SMS

Connection-Based Protocols

These communication protocols require an active connection over the network for data to transfer. The connection is established over a three-way handshake, and it looks something like this:

[Fig 3: Three-way handshake illustration]

Notice that before any actual payload data can be transferred (shown in green), there are three network payloads that need to be sent back and forth. This overhead may seem small for a system where data consumption or battery life isn't an issue, but in our case, it would cost a lot of precious battery resources, sometimes consuming more data than the actual payload just to initiate a connection.

Secondly, notice in the figure that the client is the one initiating the connection; this is intentional and a big limitation of connection-based (TCP, MQTT, etc.) protocols in the application of a network-reachable device, which requires that the device respond to the server's requests. One could have the client initiate a first connection and keep it open, but keeping the connection alive requires keep-alive packets to be sent at every pre-defined interval, which again costs a tremendous amount of battery, making these kinds of protocols impractical for a low-power, network-reachable device.

[Fig 4: TCP-based communication protocol power profile]

Connection-Less Protocols

Connection-less protocols require no active connection with the server for data transfer to occur. These offer a much more promising outlook for the network-reachable IoT device application. Some examples of such protocols are MQTT-SN, CoAP, and LWM2M, which are implemented on top of UDP. These protocols are lightweight and optimized for power and resource consumption, working in favor of the resource-restricted platforms they're intended to be used on. There is, however, a logistical challenge with these networks with regard to the device being network reachable.
Despite the protocols themselves being connection-less, the underlying network still requires occasional data transfer to maintain active NAT translations. NAT (Network Address Translation) devices, commonly used in home and enterprise networks, have timeouts for the translation entries in their tables. If a device using connection-less protocols remains dormant for an extended period, exceeding the NAT timeout, the corresponding translation entry is purged, making the original client unreachable from the network. To address this, periodic keep-alive messages or other mechanisms have to be implemented to maintain NAT translations and ensure continuous network reachability for IoT devices using connection-less protocols. That, however, is very expensive on the battery and not ideal for an IoT device where battery is a scarce resource.

Conclusion

As we have explored in this article, building a network-reachable device is a challenge on all layers, from device constraints to infrastructure limitations. UDP-based communication protocols offer a promising future for low-power, network-reachable devices because, fundamentally, the logistical challenge of NAT timeouts mentioned above can be eliminated by using a private network and bypassing NAT altogether. As interest in network-reachable devices grows, we should see commercial network infrastructures that offer out-of-the-box solutions for these challenges.
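To make the connection-less pattern concrete, below is a minimal sketch of a UDP "check-in" loop in Python: the device sends a tiny datagram to a server, briefly listens for a queued command, and then goes back to sleep. The server address, payload format, and 30-second interval are illustrative assumptions; in a real deployment the interval would be tuned against the NAT timeout (or dropped entirely on a private network that bypasses NAT), and the payload would typically be a CoAP or MQTT-SN message rather than a raw string.

```python
# Minimal sketch of a connection-less "check-in" loop over UDP.
# Assumptions: a hypothetical command server at example.com:5683 that replies
# with a pending command (or nothing) when the device checks in.
import socket
import time

SERVER = ("example.com", 5683)   # hypothetical endpoint
CHECK_IN_INTERVAL_S = 30         # illustrative; tune against NAT timeout / sleep schedule

def check_in(device_id: str) -> bytes | None:
    """Send one small datagram and listen briefly for a queued command."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)  # stay awake only briefly
        sock.sendto(f"ping:{device_id}".encode(), SERVER)
        try:
            command, _addr = sock.recvfrom(512)
            return command
        except socket.timeout:
            return None  # nothing queued for us

while True:
    cmd = check_in("device-001")
    if cmd:
        print("Received command:", cmd.decode(errors="replace"))
    time.sleep(CHECK_IN_INTERVAL_S)  # on real hardware this would be deep sleep / PSM
```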
The rapid growth of the Internet of Things (IoT) has revolutionized the way we connect and interact with devices and systems. However, this surge in connectivity has also introduced new security challenges and vulnerabilities. IoT environments are increasingly becoming targets for cyber threats, making robust security measures essential. Security Information and Event Management (SIEM) systems, such as Splunk and IBM QRadar, have emerged as critical tools in bolstering IoT security. In this article, we delve into the pivotal role that SIEM systems play in monitoring and analyzing security events in IoT ecosystems, ultimately enhancing threat detection and response.

The IoT Security Landscape: Challenges and Complexities

IoT environments are diverse, encompassing a wide array of devices, sensors, and platforms, each with its own set of vulnerabilities. The challenges of securing IoT include:

- Device diversity: IoT ecosystems comprise devices with varying capabilities and communication protocols, making them difficult to monitor comprehensively.
- Data volume: The sheer volume of data generated by IoT devices can overwhelm traditional security measures, leading to delays in threat detection.
- Real-time threats: Many IoT applications require real-time responses to security incidents. Delayed detection can result in significant consequences.
- Heterogeneous networks: IoT devices often connect to a variety of networks, including local, cloud, and edge networks, increasing the attack surface.

The Role of SIEM Systems in IoT Security

SIEM systems are designed to aggregate, analyze, and correlate security-related data from various sources across an organization's IT infrastructure. When applied to IoT environments, SIEM systems offer several key benefits:

- Real-time monitoring: SIEM systems provide continuous monitoring of IoT networks, enabling organizations to detect security incidents as they happen. This real-time visibility is crucial for rapid response.
- Threat detection: By analyzing security events and logs, SIEM systems can identify suspicious activities and potential threats in IoT ecosystems. This proactive approach helps organizations stay ahead of cyber adversaries.
- Incident response: SIEM systems facilitate swift incident response by alerting security teams to anomalies and security breaches. They provide valuable context to aid in mitigation efforts.
- Log management: SIEM systems collect and store logs from IoT devices, allowing organizations to maintain a comprehensive record of security events for auditing and compliance purposes.

Splunk: A Leading SIEM Solution for IoT Security

Splunk is a renowned SIEM solution known for its powerful capabilities in monitoring and analyzing security events, making it well-suited for IoT security. Key features of Splunk include:

- Data aggregation: Splunk can collect and aggregate data from various IoT devices and systems, offering a centralized view of the IoT security landscape.
- Advanced analytics: Splunk's machine-learning capabilities enable it to detect abnormal patterns and potential threats within IoT data streams.
- Real-time alerts: Splunk can issue real-time alerts when security events or anomalies are detected, allowing for immediate action.
- Custom dashboards: Splunk allows organizations to create custom dashboards to visualize IoT security data, making it easier for security teams to interpret and respond to events.

IBM QRadar: Strengthening IoT Security Posture

IBM QRadar is another SIEM solution recognized for its effectiveness in IoT security.
It provides the following advantages:

- Threat intelligence: QRadar integrates threat intelligence feeds to enhance its ability to detect and respond to IoT-specific threats.
- Behavioral analytics: QRadar leverages behavioral analytics to identify abnormal activities and potential security risks within IoT networks.
- Incident forensics: QRadar's incident forensics capabilities assist organizations in investigating security incidents within their IoT environments.
- Compliance management: QRadar offers features for monitoring and ensuring compliance with industry regulations and standards, a critical aspect of IoT security.

In the realm of IoT security, SIEM systems such as Splunk and IBM QRadar serve as indispensable guardians. Offering real-time monitoring, advanced threat detection, and swift incident response, they empower organizations to fortify their IoT ecosystems. As the IoT landscape evolves, the integration of these SIEM solutions becomes paramount in ensuring the integrity and security of connected devices and systems. Embracing these tools is a proactive stride toward a safer and more resilient IoT future.
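As a concrete illustration of how IoT security events reach a SIEM in the first place, here is a minimal sketch that forwards a device event to Splunk's HTTP Event Collector (HEC) using the Python requests library. The host name, token, index, and event fields are placeholders for illustration; HEC must be enabled on your Splunk deployment, and QRadar would typically ingest a comparable event via syslog or its own APIs instead.

```python
# Minimal sketch: forward an IoT security event to a SIEM via Splunk's
# HTTP Event Collector (HEC). Host, token, index, and event fields below
# are placeholders -- substitute values from your own deployment.
import time
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # placeholder host
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                    # placeholder token

def send_iot_event(device_id: str, event_type: str, detail: str) -> None:
    payload = {
        "time": time.time(),
        "sourcetype": "iot:security",   # assumed custom sourcetype
        "index": "iot",                 # assumed index name
        "event": {
            "device_id": device_id,
            "event_type": event_type,
            "detail": detail,
        },
    }
    resp = requests.post(
        HEC_URL,
        json=payload,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        timeout=5,
        verify=True,  # keep TLS verification on in production
    )
    resp.raise_for_status()

# Example: report repeated failed logins observed on a smart camera.
send_iot_event("camera-042", "auth_failure_burst", "15 failed logins in 60s")
```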
The rapid rise of wearable devices and the expanding realm of the Internet of Things (IoT) have firmly gripped the world's technological spotlight. The integration of IoT with wearable devices has become a pivotal focus within the tech sphere, offering potential that promises progress across sectors as diverse as healthcare, fitness, transportation, and beyond. In this article, we take an in-depth look at the integration of IoT and wearable technology: why it has become so significant, the formidable challenges that lie in its path, and the software development that drives it.

Surveying the Landscape of Wearable Devices

From its humble beginnings as a simple wristwatch, wearable technology has evolved significantly. Nowadays, it encompasses a diverse range of devices, including fitness trackers, smart glasses, smart clothing, and even implantable medical tools. These technologies diligently gather data from both the human body and the surrounding environment. The information collected is then transmitted to other devices, networks, or cloud platforms for thorough examination and analysis. Some of the main distinguishing factors that make wearable devices unique include:

- Data sensing: Wearable devices gather a wide range of information, encompassing measurements like heart rate, step count, body temperature, and various other metrics.
- Data processing: Some wearables handle data directly on the device, while others send it to smartphones or the cloud for thorough analysis.
- Data transmission: Data is transmitted through an array of communication protocols, including Bluetooth, Wi-Fi, or cellular networks.
- User interface: Wearable devices should offer a multifaceted user experience through displays, auditory interfaces, and haptic feedback mechanisms.

The Synergy of IoT Integration

IoT represents a vast interconnected realm where devices converse with one another and interface seamlessly with external systems. This interconnectedness contributes to the creation of a smarter and more automated environment. When wearable devices integrate into this IoT ecosystem, they not only coexist but also share and receive data from other connected devices, amplifying their functionality. This union brings a multitude of advantages, including:

- Seamless data exchange: Wearable devices gain value when they can readily exchange data with an array of IoT peers such as smartphones, laptops, or home automation systems.
- Enhanced monitoring: Practitioners in the healthcare industry can perform real-time remote patient monitoring, simplifying the provision of timely and responsive care.
- Improved user experience: By connecting wearables with smart home systems, users can create personalized experiences, such as setting the thermostat to their preferred temperature when they arrive home.
- Data analytics: The combined data streams from wearable devices and diverse IoT sensors go beyond the mere collection of information; they become valuable insights into user behavior, health patterns, and beyond.
- Efficient resource management: Within sectors like manufacturing and logistics, IoT integration gives businesses the power to optimize resource allocation, curbing operational expenses.

Software Development for Wearable IoT

Software development plays a pivotal role in the success of wearable technology integration. Developers need to consider various aspects when creating software for this complex environment:

Device Compatibility

Ensuring software compatibility is the focal point when dealing with a multitude of wearable devices, each featuring its own distinct operating system, sensors, and capabilities. Developers are confronted with the formidable challenge of creating applications that go beyond device limitations, either by adopting cross-device designs or by customizing the software for various platforms.

Data Security

Given the sensitive nature of the data collected by wearables, data security is of the utmost importance. Encryption during transmission and robust security measures to thwart unauthorized access must be implemented. Data protection and secure authentication mechanisms must be prioritized by software developers.

Low-Power Consumption

Wearable devices often run on limited battery power. Therefore, the software must be optimized for energy efficiency. Developers need to write code that minimizes resource consumption and includes features like background processing and power-saving modes.

Real-Time Data Processing

Many wearable IoT applications depend on real-time data processing. The software must analyze data swiftly, offering timely insights or triggering actions almost instantly. To keep latency minimal, developers write code that processes data rapidly, giving users instantaneous responsiveness.

Cloud Integration

For many wearable devices, cloud integration is essential. Cloud platforms enable data storage, analysis, and accessibility from multiple devices. Software developers must create applications that can interact with cloud services with ease.

Challenges and Considerations

While IoT-wearable integration presents significant opportunities, it also comes with its share of challenges and considerations:

Privacy Concerns

The convergence of wearable technology and the gathering of personal data raises privacy concerns. Users may find the collection of sensitive health and location information unsettling. As a result, careful management by developers and organizations becomes essential, along with complete transparency and reliable opt-in/opt-out mechanisms.

Regulatory Compliance

Especially in the health-centric sphere, adherence to stringent regulations like the US's HIPAA is obligatory. Software developers bear the responsibility of ensuring their applications align with these regulations, steering clear of legal entanglements.

Data Quality and Accuracy

For wearable gadgets to provide significant insights, data accuracy is non-negotiable. The responsibility of executing meticulous data validation and error-checking procedures falls on software developers. Accuracy holds utmost importance, as flawed data can compromise the reliability of conclusions and recommendations derived from analyses.
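Tying together the data transmission and real-time processing points above, here is a minimal sketch that receives live heart-rate notifications from a wearable over Bluetooth LE using the bleak Python library and the standard Heart Rate Measurement characteristic (GATT 0x2A37). The device address is a placeholder, and the parsing handles only the basic flag bit for 8- vs 16-bit values; real integrations would also handle reconnects, pairing, and the vendor-specific characteristics many wearables expose instead of the standard service.

```python
# Minimal sketch: stream heart-rate notifications from a BLE wearable via the
# standard Heart Rate Measurement characteristic (0x2A37).
# The device address below is a placeholder -- discover yours with BleakScanner.
import asyncio
from bleak import BleakClient

DEVICE_ADDRESS = "AA:BB:CC:DD:EE:FF"  # placeholder; format is platform-specific
HR_MEASUREMENT_UUID = "00002a37-0000-1000-8000-00805f9b34fb"

def handle_heart_rate(_sender, data: bytearray) -> None:
    # First byte is a flags field; bit 0 selects uint8 vs uint16 heart-rate format.
    if data[0] & 0x01:
        bpm = int.from_bytes(data[1:3], byteorder="little")
    else:
        bpm = data[1]
    print(f"Heart rate: {bpm} bpm")

async def main() -> None:
    async with BleakClient(DEVICE_ADDRESS) as client:
        await client.start_notify(HR_MEASUREMENT_UUID, handle_heart_rate)
        await asyncio.sleep(30)  # stream notifications for 30 seconds
        await client.stop_notify(HR_MEASUREMENT_UUID)

asyncio.run(main())
```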
Interoperability

Ensuring that wearable devices can communicate with other IoT devices can be a complex task. Developers need to consider interoperability standards like Bluetooth, Zigbee, or Thread to achieve seamless integration.

Conclusion

The fusion of IoT with wearable technology is reshaping numerous sectors. From healthcare and fitness to agriculture and retail, this convergence propels industries into innovative realms. Navigating this landscape demands adept software development, an endeavor riddled with challenges encompassing device compatibility, data security, frugal power consumption, and real-time data processing. Software developers are the architects of this change: armed with skill and creativity, they turn these foundational elements into reality, ushering in a future where technology augments every facet of our existence.
In this tutorial, you'll learn how to publish images from an ESP32-CAM board to multiple browser clients using MQTT (Message Queuing Telemetry Transport). This setup will enable you to create a platform that functions similarly to a live video stream, viewable by an unlimited number of users.

Prerequisites

Before diving in, make sure you have completed the following prerequisite tutorials:

- Your First Xedge32 Project: This tutorial covers essential setup and configuration instructions for Xedge32 running on an ESP32.
- Your First MQTT Lua Program: This tutorial introduces the basics of MQTT and how to write a simple Lua program to interact with MQTT.

By building on the knowledge gained from these foundational tutorials, you'll be better equipped to follow along with this tutorial.

Publishing ESP32-CAM Images via MQTT

In the MQTT CAM code, our primary focus is publishing images without subscribing to other events. This publishing operation is managed by a timer event, which publishes images based on the intervals specified.

Setting Up the Timer

First, let's create a timer object. This timer will trigger the publishImage function at specific intervals.

```lua
timer = ba.timer(publishImage)
```

To interact with the ESP32 camera, initialize a camera object like so:

```lua
cam = esp32.cam(cfg)
```

The cfg parameter represents a configuration table. Important: make sure it matches the settings for your particular ESP32-CAM module. See the Lua CAM API for details.

Handling MQTT Connection Status

For monitoring MQTT connections, use the following callback function:

```lua
local function onstatus(type, code, status)
   if "mqtt" == type and "connect" == code and 0 == status.reasoncode then
      timer:set(300, false, true) -- Activate timer every 300 milliseconds
      return true -- Accept connection
   end
   timer:cancel()
   return true -- Keep trying
end
```

The above function starts the timer when a successful MQTT connection is made. If the connection drops, it cancels the timer but will keep attempting to reconnect.

Image Publishing via Timer Callback

The core of the image publishing mechanism is the timer callback function, publishImage. This function captures an image using the camera object and publishes it via MQTT. The timer logic supports various timer types. Notably, this version operates as a Lua coroutine (akin to a thread). Within this coroutine, it continually loops and hibernates for the duration defined by the timer through coroutine.yield(true).

```lua
function publishImage()
   local busy = false
   while true do
      if mqtt:status() < 2 and not busy then
         busy = true -- thread busy
         ba.thread.run(function()
            local image = cam:read()
            mqtt:publish(topic, image)
            busy = false -- no longer running
         end)
      end
      coroutine.yield(true) -- sleep
   end
end
```

The above function maintains flow control by not publishing an image if two images already populate the MQTT client's send queue. The cam:read function can be time-consuming -- not in human time, but in terms of microcontroller operations. As such, we offload the task of reading from the CAM object onto a separate thread. While this step isn't strictly necessary, it enhances the performance of applications juggling multiple operations alongside reading from the CAM. For a deeper dive into the threading intricacies, you are encouraged to refer to the Barracuda App Server's documentation on threading.
The following shows the complete MQTT CAM code:

```lua
local topic = "/xedge32/espcam/USA/92629"
local broker = "broker.hivemq.com"

-- Settings for 'FREENOVE ESP32-S3 WROOM' CAM board
local cfg={
   d0=11, d1=9, d2=8, d3=10, d4=12, d5=18, d6=17, d7=16,
   xclk=15, pclk=13, vsync=6, href=7, sda=4, scl=5,
   pwdn=-1, reset=-1, freq="20000000", frame="HD"
}

-- Open the cam
local cam,err=esp32.cam(cfg)
assert(cam, err) -- Throws error if 'cfg' incorrect

local timer -- Timer object; set below.

-- MQTT connect/disconnect callback
local function onstatus(type,code,status)
   -- If connecting to broker succeeded
   if "mqtt" == type and "connect" == code and 0 == status.reasoncode then
      timer:set(300,false,true) -- Activate timer every 300 milliseconds
      trace"Connected"
      return true -- Accept connection
   end
   timer:cancel()
   trace("Disconnect or connect failed",type,code)
   return true -- Keep trying
end

-- Create MQTT client
local mqtt=require("mqttc").create(broker,onstatus)

-- Timer coroutine function activated every 300 millisecond
function publishImage()
   local busy=false
   while true do
      --trace(mqtt:status(), busy)
      -- Flow control: If less than 2 queued MQTT messages
      if mqtt:status() < 2 and not busy then
         busy=true
         ba.thread.run(function()
            local image,err=cam:read()
            if image then
               mqtt:publish(topic,image)
            else
               trace("cam:read()",err)
            end
            busy=false
         end)
      end
      coroutine.yield(true) -- sleep
   end
end
timer = ba.timer(publishImage)
```

While we have already covered the majority of the program's functionality, there are a few aspects we haven't touched upon yet:

- Topic and Broker Configuration: local topic = "/xedge32/espcam/USA/92629" sets the MQTT topic where the images will be published. Change this topic to your address. local broker = "broker.hivemq.com" specifies the MQTT broker's address. The public HiveMQ broker is used in this example.
- ESP32 Camera Configuration (cfg): This block sets up the specific pin configurations and settings for your ESP32-CAM board. Replace these settings with those appropriate for your hardware.
- Creating the MQTT Client: The MQTT client is created with the require("mqttc").create(broker, onstatus) function, passing in the broker address and the onstatus callback.
- Creating the Timer Object for publishImage: The timer is created by calling ba.timer and passing in the publishImage callback, which will be activated at regular intervals. This is the mechanism that continually captures and publishes images.

Subscribing to CAM Images With a JavaScript-Powered HTML Client

To visualize the images published by the ESP32 camera, you can use an HTML client. The following client will subscribe to the same MQTT topic to which the camera is publishing images. The client runs purely in your web browser and does not require any server setup.
The entire code for the HTML client is shown below:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Cam Images Over MQTT</title>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/mqtt/5.0.0-beta.3/mqtt.min.js"></script>
  <script>
    const topic="/xedge32/espcam/USA/92629";
    const broker="broker.hivemq.com";
    window.addEventListener("load", (event) => {
      let img = document.getElementById("image");
      let msg = document.getElementById("msg");
      let frameCounter=0;
      const options = {
        clean: true,
        connectTimeout: 4000,
        port: 8884 // Secure websocket port
      };
      const client = mqtt.connect("mqtts://"+broker+"/mqtt",options);
      client.on('connect', function () {
        msg.textContent="Connected; Waiting for images...";
        client.subscribe(topic);
      });
      client.on("message", (topic, message) => {
        const blob = new Blob([message], { type: 'image/jpeg' });
        img.src = URL.createObjectURL(blob);
        frameCounter++;
        msg.textContent = `Frames: ${frameCounter}`;
      });
    });
  </script>
</head>
<body>
  <h2>Cam Images Over MQTT</h2>
  <div id="image-container">
    <img id="image"/>
  </div>
  <p id="msg">Connecting...</p>
</body>
</html>
```

MQTT JavaScript Client

At the top of the HTML file, the MQTT JavaScript library is imported to enable MQTT functionalities. This is found within the <script src=".......mqtt.min.js"></script> line.

Body Layout

The HTML body contains a <div> element with an id of "image-container" that will house the incoming images, and a <p> element with an id of "msg" that serves as a placeholder for status messages.

MQTT Configuration

In the JavaScript section, two constants, topic and broker, are defined. These must correspond to the topic and broker configurations in your mqttcam.xlua file.

Connecting to the MQTT Broker

The client initiates an MQTT connection to the specified broker using the mqtt.connect() method. It uses the secure WebSocket port 8884 for this connection.

Handling Incoming Messages

Upon a successful connection, the client subscribes to the topic. Any incoming message on this topic is expected to be a binary JPEG image. The message is converted into a Blob and used as the source for the image element.

Frame Counter

A frameCounter variable keeps count of the incoming frames (or images) and displays this count as a text message below the displayed image.

By having this HTML file open in a web browser, you'll be able to visualize in real time the images that are being published to the specified MQTT topic.

Preparing the Code

Step 1: Prepare the Lua Script as Follows

1. As explained in the tutorial Your First Xedge32 Project, when the Xedge32-powered ESP32 is running, use a browser and navigate to the Xedge IDE.
2. Create a new Xedge app called "cam" and LSP-enable the app.
3. Expand the cam app now visible in the left pane tree view.
4. Right-click the cam app and click New File in the context menu.
5. Type camtest.lsp and click Enter.
6. Open the camtest.lsp file at GitHub and click the copy raw file button.
7. Go to the Xedge IDE browser window and paste the content into the camtest.lsp file. Important: Adjust the cfg settings in camtest.lsp to match your specific ESP32-CAM board. See the Lua CAM API for details.
8. Click Save and then click Open to test your cam settings. Make sure you see the image generated by the LSP script before proceeding.
9. Right-click the cam app and click New File in the context menu.
10. Type mqttcam.xlua and click Enter.
11. Open the mqttcam.xlua file at GitHub and click the copy raw file button.
12. Go to the Xedge IDE browser window and paste the content into the mqttcam.xlua file.
13. Using the Xedge editor, update the topic variable /xedge32/espcam/USA/92629 in the Lua script to your desired MQTT topic. Important: Copy the cfg settings from camtest.lsp and replace the cfg settings in mqttcam.xlua with the settings you already tested in camtest.lsp.
14. Click the Save & Run button to save and start the example.

Step 2: Prepare the HTML/JS File as Follows

1. Download mqttcam.html, open the file in any editor, and ensure the topic in the HTML file matches the topic you set in the Lua script. Save the mqttcam.html file.
2. Open mqttcam.html: Double-click the mqttcam.html file or drag and drop it into your browser. Note: this file is designed to be opened directly from the file system. You do not need a web server to host this file.
3. Observe the output: The webpage will display the images being published by the ESP32-CAM. The number of frames received will be displayed below the image.

Potential Issues With ESP32-CAM Boards and Solutions

ESP32-CAM boards are widely recognized for their versatility and affordability. However, they're not without their challenges. One of the significant issues users might face with ESP32-CAM boards is interference between the camera read operation and the built-in WiFi module. Let's delve into the specifics.

Problem: Interference and WiFi Degradation

When the ESP32-CAM board is in operation, especially during the camera's read operation, it can generate noise. This noise interferes with the built-in WiFi, which results in:

- Reduced range: The distance over which the WiFi can effectively transmit and receive data can be notably decreased.
- Decreased throughput: The speed and efficiency at which data is transmitted over the WiFi network can be considerably hampered.

Solutions

To combat these issues, consider the following solutions:

- Use a CAM board with an external antenna: Several ESP32-CAM boards come equipped with or support the use of an external antenna. By using such a board and connecting an external antenna, you can boost the WiFi signal strength and range, mitigating some of the interference caused by the camera operations.
- Integrate the W5500 Ethernet chip: If your application demands consistent and robust data transmission, consider incorporating the W5500 Ethernet chip. By using Ethernet instead of WiFi, you sidestep the interference issues associated with WiFi on the ESP32-CAM board. Xedge32 is equipped with integrated Ethernet drivers. When paired with hardware that supports it, like the W5500 chip, it can facilitate smooth and interference-free data transfer, ensuring that your application remains stable and efficient.

In conclusion, while the ESP32-CAM board is an excellent tool for a myriad of applications, it's crucial to be aware of its limitations and know how to circumvent them to ensure optimal performance.

References

- Lua MQTT API
- Lua timer API
- Lua CAM API
This blog post explores the state of data streaming for the energy and utilities industry in 2023. The evolution of utility infrastructure, energy distribution, customer services, and new business models requires real-time end-to-end visibility, reliable and intuitive B2B and B2C communication, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. Data streaming allows integrating and correlating data in real time at any scale to improve most workloads in the energy sector.

I look at trends in the utilities sector to explore how data streaming helps as a business enabler, including customer stories from SunPower, 50hertz, Powerledger, and more. A complete slide deck and on-demand video recording are included.

General Trends in the Energy and Utilities Industry

The energy and utilities industry is fundamental for a sustainable future. Gartner explores the Top 10 Trends Shaping the Utility Sector in 2023: "In 2023, power and water utilities will continue to face a variety of forces that will challenge their business and operating models and shape their technology investments. Utility technology leaders must confidently compose the future for their organizations in the midst of uncertainty during this energy transition volatile period — the future that requires your organizations to be both agile and resilient."

From System-Centric and Large to Smaller-Scale and Distributed

The increased use of digital tools makes the expected structural changes in the energy system possible.

Energy AI Use Cases

Artificial Intelligence (AI) with technologies like Machine Learning (ML) and Generative AI (GenAI) is a hot topic across all industries. Innovation around AI disrupts many business models, tasks, business processes, and labor. NVIDIA created an excellent diagram showing the various opportunities for AI in the energy and utilities sector. It separates the scenarios by segment: upstream, midstream, downstream, power generation, and power distribution.

Cybersecurity: The Threat Is Real!

McKinsey & Company explains that "the cyber threats facing electric-power and gas companies include the typical threats that plague other industries: data theft, billing fraud, and ransomware. However, several characteristics of the energy sector heighten the risk and impact of cyber threats against utilities."

Data Streaming in the Energy and Utilities Industry

Adopting trends like predictive maintenance, track and trace, proactive sales and marketing, or threat intelligence is only possible if enterprises in the energy sector can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later (whatever later means). Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities.

5 Ways Utilities Accomplish More With Real-Time Data

"After creating a collaborative team that merged customer experience and digital capabilities, one North American utility went after a 30 percent reduction in its cost-to-serve customers in some of its core journeys." As the Utilities Analytics Institute explains: "Utilities need to ensure that the data they are collecting is high quality, specific to their needs, preemptive in nature, and, most importantly, real-time."
The following five characteristics are crucial to add value to real-time data:

- High-Quality Data
- Data Specific to Your Needs
- Make Your Data Proactive
- Data Redundancy
- Data is Constantly Changing

Real-Time Data for Smart Meters and Common Praxis

Smart meters are a perfect example of increasing business value with real-time data streaming. As Clou Global confirms: "The use of real-time data in smart grids and smart meters is a key enabler of the smart grid." Possible use cases include:

- Load Forecasting
- Fault Detection
- Demand Response
- Distribution Automation
- Smart Pricing

Processing and correlating events from smart meters with stream processing is just one IoT use case.

Cloud Adoption in the Utilities and Energy Sector

Accenture points out that 84% use Cloud SaaS solutions and 79% use Cloud PaaS solutions in the energy and utilities market for various reasons:

- New approach to IT
- Incremental adoption
- Improved scalability, efficiency, agility, and security
- Unlock most business value

This is a general statistic, but it applies to all components in the data-driven enterprise, including data streaming. A company does not just move a specific application to the cloud; this would be counter-intuitive from a cost and security perspective. Hence, most companies start with a hybrid architecture and bring more and more workloads to the public cloud.

Architecture Trends for Data Streaming

The energy and utilities industry applies various trends for enterprise architectures for cost, flexibility, security, and latency reasons. The three major topics I see these days at customers are:

- Global data streaming
- Edge computing and hybrid cloud integration
- OT/IT modernization

Let's look deeper into some enterprise architectures that leverage data streaming for energy and utilities use cases.

Global Data Streaming Across Data Centers, Clouds, and the Edge

Energy and utilities require data infrastructure everywhere. While most organizations have a cloud-first strategy, there is no way around running some workloads at the edge outside a data center for cost, security, or latency reasons. Data streaming is available everywhere. Data synchronization across environments, regions, and clouds is possible with open-source Kafka tools like MirrorMaker. However, this requires additional infrastructure and development/operations efforts. Innovative solutions like Confluent's Cluster Linking leverage the Kafka protocol for real-time replication. This enables much easier deployments and significantly reduced network traffic.

Edge Computing and Hybrid Cloud Integration

Kafka deployments look different depending on where they need to be deployed. Fully managed serverless offerings like Confluent Cloud are highly recommended in the public cloud to focus on business logic with reduced time-to-market and TCO. In a private cloud, data center, or edge environment, most companies deploy on Kubernetes today to provide a similar cloud-native experience. Kafka can also be deployed on industrial PCs (IPC) and other industrial hardware. Many use cases exist for data streaming at the edge. Sometimes, a single broker (without high availability) is good enough.

No matter how you deploy data streaming workloads, a key value is the unidirectional or bidirectional synchronization between clusters. Often, only curated and relevant data is sent to the cloud for cost reasons. Also, command and control patterns can start a business process in the cloud and send events to the edge.
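To ground the smart meter use case mentioned above in something concrete, here is a minimal sketch of a producer that publishes meter readings to a Kafka topic using the confluent-kafka Python client. The broker address, topic name, and JSON message layout are assumptions for illustration; a production setup would typically add authentication (e.g., SASL/TLS) and a schema registry.

```python
# Minimal sketch: publish smart meter readings to a Kafka topic with the
# confluent-kafka Python client. Broker address, topic name, and the JSON
# message layout are illustrative assumptions, not a reference schema.
import json
import time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

def delivery_report(err, msg):
    """Log whether each reading was durably written to the topic."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

def publish_reading(meter_id: str, kwh: float) -> None:
    reading = {
        "meter_id": meter_id,
        "kwh": kwh,
        "timestamp": int(time.time() * 1000),  # epoch millis
    }
    producer.produce(
        topic="smart-meter-readings",          # assumed topic name
        key=meter_id,                          # key by meter for per-meter ordering
        value=json.dumps(reading).encode(),
        callback=delivery_report,
    )
    producer.poll(0)  # serve delivery callbacks

publish_reading("meter-0042", 1.27)
producer.flush()  # block until all queued messages are delivered
```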
OT/IT Modernization With Data Streaming

The energy sector operates many monoliths: inflexible and closed software and hardware products. This is changing in this decade. OT/IT modernization and digital transformation require open APIs, flexible scale, and decoupled applications (from different vendors). Many companies leverage Apache Kafka to build a postmodern data historian to complement or replace existing expensive OT middleware.

Just to be clear: Kafka and any other IT software like Spark, Flink, or Amazon Kinesis are NOT hard real-time. They cannot be used for safety-critical use cases with deterministic systems like autonomous driving or robotics. That is the domain of C, Rust, or other embedded software. However, data streaming connects the OT and IT worlds. As part of that, connectivity with robotic systems, intelligent vehicles, and other IoT devices is the norm for improving logistics, integration with ERP and MES, aftersales, etc.

New Customer Stories for Data Streaming in the Energy and Utilities Sector

So much innovation is happening in the energy and utilities sector. Automation and digitalization change how utilities monitor infrastructure, build customer relationships, and create completely new business models. Most energy service providers use a cloud-first approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating IT infrastructure. Elastic scalability gets even more critical with all the growing networks, 5G workloads, autonomous vehicles, drones, and other innovations.

Here are a few customer stories from worldwide energy and utilities organizations:

- 50hertz: A grid operator's modernization of its legacy, monolithic, and proprietary SCADA infrastructure to cloud-native microservices and a real-time data fabric powered by data streaming.
- SunPower: Solar solutions across the globe where 6+ million devices in the field send data to the streaming platform. However, sensor data alone is not valuable! Fundamentals for delivering customer value include measurement ingestion, metadata association, storage, and analytics.
- aedifion: Efficient management of real estate to operate buildings better and meet environmental, social, and corporate governance (ESG) goals. Secure connectivity and reliable data collection are implemented with Confluent Cloud (which replaced the existing MQTT-based pipeline).
- Ampeers Energy: Decarbonization for real estate. The service provides district management with IoT-based forecasts and optimization, plus local energy usage accounting. The real-time analytics of time-series data is implemented with OPC-UA, Confluent Cloud, and TimescaleDB.
- Powerledger: Green energy trading with blockchain-based tracking, tracing, and trading of renewable energy from rooftop solar power installations and virtual power plants. Non-fungible tokens (NFTs) represent renewable energy certificates (RECs) in a decentralized rather than the conventional unidirectional market. Confluent Cloud ingests data from smart electricity meters.

Resources To Learn More

This blog post is just the starting point. Learn more about data streaming in the energy and utilities industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-Demand Video Recording

The video recording explores the energy and utilities industry's trends and architectures for data streaming. The primary focus is the data streaming case studies.
Slides
If you prefer learning from slides, check out the deck used for the above recording: Slide Deck: The State of Data Streaming for Energy & Utilities in 2023
Case Studies and Lightboard Videos for Data Streaming in the Energy and Utilities Industry
The state of data streaming for energy and utilities in 2023 is fascinating. New use cases and case studies come up every month. This includes better data governance across the entire organization, real-time data collection and processing across hybrid edge and cloud infrastructures, data sharing and B2B partnerships for new business models, and many more scenarios. We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Stay tuned; I will update the links in the next few weeks and publish a separate blog post for each story and lightboard video. And this is just the beginning. Every month, we will talk about the status of data streaming in a different industry. Manufacturing was first, financial services second, then retail, telcos, gaming, and so on. Check out my other blog posts. Let's connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.
This is an article from DZone's 2023 Data Pipelines Trend Report. For more: Read the Report
Data-driven design is a game changer. It uses real data to shape designs, ensuring products match user needs and deliver user-friendly experiences. This approach fosters constant improvement through data feedback and informed decision-making for better results. In this article, we will explore the importance of data-driven design patterns and principles, and we will look at an example of how the data-driven approach works with artificial intelligence (AI) and machine learning (ML) model development.
Importance of Data-Driven Design
Data-driven design is crucial as it uses real data to inform design decisions. This approach ensures that designs are tailored to user needs, resulting in more effective and user-friendly products. It also enables continuous improvement through data feedback and supports informed decision-making for better outcomes. Data-driven design includes the following:
Data visualization – Aids designers in comprehending trends, patterns, and issues, thus leading to effective design solutions.
User-centricity – Data-driven design begins with understanding users deeply. Gathering data about user behavior, preferences, and challenges enables designers to create solutions that precisely meet user needs.
Iterative process – Design choices are continuously improved through data feedback. This iterative method ensures designs adapt and align with user expectations as time goes on.
Measurable outcomes – Data-driven design targets measurable achievements, like enhanced user engagement, conversion rates, and satisfaction.
That is the theory; let's reinforce it with some good examples of products based on data-driven design:
Netflix uses data-driven design to predict what content its customers will enjoy. It analyzes daily plays, subscriber ratings, and searches, ensuring its offerings match user preferences and trends.
Uber uses data-driven design by collecting and analyzing vast amounts of data from rides, locations, and user behavior. This helps them optimize routes, estimate fares, and enhance user experiences. Uber continually improves its services by leveraging data insights based on real-world usage patterns.
Waze uses data-driven design by analyzing real-time GPS data from drivers to provide accurate traffic updates and optimal route recommendations. This data-driven approach ensures users have the most up-to-date and efficient navigation experience based on current road conditions and user behavior.
Common Data-Driven Architectural Principles and Patterns
Before we jump into data-driven architectural patterns, let's reveal what data-driven architecture and its fundamental principles are.
Data-Driven Architectural Principles
Data-driven architecture involves designing and organizing systems, applications, and infrastructure with a central focus on data as a core element. Within this architectural framework, decisions concerning system design, scalability, processes, and interactions are guided by insights and requirements derived from data. Fundamental principles of data-driven architecture include:
Data-centric design – Data is at the core of design decisions, influencing how components interact, how data is processed, and how insights are extracted.
Real-time processing – Data-driven architectures often involve real-time or near real-time data processing to enable quick insights and actions.
Integration of AI and ML – The architecture may incorporate AI and ML components to extract deeper insights from data.
Event-driven approach – Event-driven architecture, where components communicate through events, is often used to manage data flows and interactions.
Data-Driven Architectural Patterns
Now that we know the key principles, let's look into data-driven architecture patterns. Distributed data architecture patterns include the data lakehouse, data mesh, data fabric, and data cloud.
Data Lakehouse
A data lakehouse allows organizations to store, manage, and analyze large volumes of structured and unstructured data in one unified platform. Data lakehouse architecture provides the scalability and flexibility of data lakes along with the data processing capabilities and query performance of data warehouses. This concept is perfectly implemented in Delta Lake. Delta Lake is an extension of Apache Spark that adds reliability and performance optimizations to data lakes (a minimal sketch follows the pros and cons table below).
Data Mesh
The data mesh pattern treats data like a product and sets up a system where different teams can easily manage their own data areas. The data mesh concept is similar to how microservices work in development: each part operates on its own, but they all collaborate to make up the whole product or service of the organization. Companies usually use conceptual data modeling to define their domains while working toward this goal.
Data Fabric
Data fabric is an approach that creates a unified, interconnected system for managing and sharing data across an organization. It integrates data from various sources, making it easily accessible and usable while ensuring consistency and security. A good example of a solution that implements data fabric is Apache NiFi. It is an easy-to-use data integration and data flow tool that enables the automation of data movement between different systems.
Data Cloud
A data cloud provides a single and adaptable way to access and use data from different sources, boosting teamwork and informed choices. These solutions offer tools for combining, processing, and analyzing data, empowering businesses to leverage their data's potential, no matter where it is stored. Presto exemplifies an open-source solution for building a data cloud ecosystem. Serving as a distributed SQL query engine, it empowers users to retrieve information from diverse data sources such as cloud storage systems, relational databases, and beyond.
Now we know what data-driven design is, including its concepts and patterns. Let's have a look at the pros and cons of this approach.
Pros and Cons of Data-Driven Design
It's important to know the strong and weak areas of a particular approach, as it allows us to choose the most appropriate one for our architecture and product. Here, I gathered some pros and cons of data-driven architecture:
PROS AND CONS OF DATA-DRIVEN DESIGN
Pros:
Personalized experiences – Data-driven architecture supports personalized user experiences by tailoring services and content based on individual preferences.
Better customer understanding – Data-driven architecture provides deeper insights into customer needs and behaviors, allowing businesses to enhance customer engagement.
Informed decision-making – Data-driven architecture enables informed and data-backed decision-making, leading to more accurate and effective choices.
Cons:
Privacy concerns – Handling large amounts of data raises privacy and security concerns, requiring robust measures to protect sensitive information.
Complex implementation – Implementing data-driven architecture can be complex and resource-intensive, demanding specialized skills and technologies.
Dependency on data availability – The effectiveness of data-driven decisions relies on the availability and accuracy of data, leading to potential challenges during data downtimes.
Table 1
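Here is the lakehouse sketch referenced above: a minimal, illustrative use of Delta Lake on Apache Spark, assuming the pyspark and delta-spark packages are installed. The storage path and column names are made up for the example; the point is that raw records land in a Delta table and the same table is then queried like a warehouse table.
Python
# A minimal lakehouse sketch with Delta Lake on Apache Spark (illustrative only).
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Land raw events in a Delta table (data lake flexibility) ...
raw = spark.createDataFrame(
    [(1, "sensor-a", 21.5), (2, "sensor-b", 22.1)],
    ["id", "device", "temperature"],
)
raw.write.format("delta").mode("append").save("/tmp/lake/readings")

# ... then query the same table with warehouse-style analytics.
readings = spark.read.format("delta").load("/tmp/lake/readings")
readings.groupBy("device").avg("temperature").show()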
Data-Driven Approach in ML Model Development and AI
A data-driven approach in ML model development involves placing a strong emphasis on the quality, quantity, and diversity of the data used to train, validate, and fine-tune ML models. It involves understanding the problem domain, identifying potential data sources, and gathering sufficient data to cover different scenarios. Data-driven decisions help determine the optimal hyperparameters for a model, leading to improved performance and generalization. Let's look at an example of data-driven architecture based on AI/ML model development. The architecture represents a factory alerting system: the factory has cameras that shoot short video clips and photos and send them to our system for analysis, and our system has to react quickly if there is an incident. Below, we share an example of data-driven architecture using Azure Machine Learning, Data Lake, and Data Factory. This is only an example; there are a multitude of tools out there that can leverage data-driven design patterns.
The IoT Edge custom module captures real-time video streams, divides them into frames, and forwards results and metadata to Azure IoT Hub.
The Azure Logic App watches IoT Hub for incident messages, sending SMS and email alerts and relaying video fragments and inferencing results to Azure Data Factory.
Azure Data Factory orchestrates the process by fetching raw video files from the Azure Logic App, splitting them into frames, converting inferencing results to labels, and uploading the data to Azure Blob Storage (the ML data repository).
Azure Machine Learning begins model training, validating the data from the ML data store and copying the required datasets to premium blob storage.
Using the dataset cached in premium storage, Azure Machine Learning trains the model, validates its performance, scores against the new model, and registers it in the Azure Machine Learning registry.
Once the new ML inferencing module is ready, Azure Pipelines deploys the module container from Container Registry to the IoT Edge module within IoT Hub, updating the IoT Edge device with the updated ML inferencing module.
Figure 1: Smart alerting system with data-driven architecture
Conclusion
In this article, we dove into data-driven design concepts and explored how they merge with AI and ML model development. Data-driven design uses insights to shape designs for better user experiences, employing iterative processes, data visualization, and measurable outcomes. We've seen real-world examples like Netflix using data to predict content preferences and Uber optimizing routes via user data. Data-driven architecture, encompassing patterns like data lakehouse and data mesh, orchestrates data-driven solutions. Lastly, our factory alerting system example showcases how AI, ML, and data orchestrate an efficient incident response. A data-driven approach empowers innovation, intelligent decisions, and seamless user experiences in the tech landscape.
This is an article from DZone's 2023 Data Pipelines Trend Report. For more: Read the Report
Originally, the term "data pipeline" focused primarily on the movement of data from one point to another: a technical mechanism to ensure data flows from transactional databases to destinations such as data warehouses, or to aggregate this data for analysis. Fast forward to the present day, and data pipelines are no longer seen as IT operations but as a core component of a business's transformation model. New cloud-based data orchestrators are an example of this evolution; they allow data pipelines to be integrated seamlessly with business processes, and they have made it easier for businesses to set up, monitor, and scale their data operations. At the same time, data repositories are evolving to support both operational and analytical workloads on the same engine.
Figure 1: Retail business replenishment process
Consider a replenishment process for a retail company, a mission-critical business process, in Figure 1. The figure is a clear example of where the new data pipeline approach is having a transformational impact on the business. Companies are evolving from corporate management applications to new data pipelines that include artificial intelligence (AI) capabilities to create greater business impact. Figure 2 demonstrates an example of this.
Figure 2: Retail replenishment data pipeline
We no longer see a data process based on data movement; rather, we see a business process that includes machine learning (ML) models or integration with distribution systems. It will be exciting to see how data pipelines evolve with the emergence of the new generative AI.
Data Pipeline Patterns
All data pipeline patterns are composed of the following stages, although each one has a workflow and use cases that make it different:
Extract – To retrieve data from the source system without modifying it. Data can be extracted from several sources such as databases, files, APIs, streams, and more.
Transform – To convert the extracted data into final structures that are designed for analysis or reporting. The transformed data is stored in an intermediate staging area.
Load – To load the transformed data into the final target database.
Today, after the evolution of data pipelines, these activities are known as data ingestion, and they determine a pipeline's pattern, as we will see below. Here are some additional activities and components that are now part and parcel of modern data pipelines (a short sketch of the first three follows this list):
Data cleaning is a crucial step in the data pipeline process that involves identifying and correcting inconsistencies and inaccuracies in datasets, such as removing duplicate records or handling missing values.
Data validation ensures the data being collected or processed is accurate, reliable, and meets the specified criteria or business rules. This includes checking whether the data is of the correct type, falls within a specified range, and that all required data is present and not missing.
Data enrichment improves the quality, depth, and value of the dataset by adding relevant supplementary information that was not originally present, drawing on additional information from external sources.
Machine learning can help enhance various stages of the pipeline, from data collection to data cleaning, transformation, and analysis, thus making it more efficient and effective.
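As promised above, here is a minimal, illustrative sketch of cleaning, validation, and enrichment with pandas. The column names and business rules are assumptions made for the example, not part of any specific pipeline described in this article.
Python
# Illustrative cleaning, validation, and enrichment step for a pipeline (assumed schema).
import pandas as pd

def clean_and_validate(df: pd.DataFrame) -> pd.DataFrame:
    # Data cleaning: drop exact duplicates and fill missing quantities with 0.
    df = df.drop_duplicates().copy()
    df["quantity"] = df["quantity"].fillna(0)

    # Data validation: enforce types and a simple business rule.
    df["quantity"] = df["quantity"].astype(int)
    invalid = df[(df["quantity"] < 0) | (df["unit_price"] <= 0)]
    if not invalid.empty:
        raise ValueError(f"{len(invalid)} rows violate basic business rules")

    # Data enrichment: derive a value that downstream steps need.
    df["line_total"] = df["quantity"] * df["unit_price"]
    return df

orders = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [2, 2, None],
    "unit_price": [9.99, 9.99, 4.50],
})
print(clean_and_validate(orders))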
Extract, Transform, Load
Extract, transform, load (ETL) is a fundamental process pattern in data warehousing that involves moving data from the source systems to a centralized repository, usually a data warehouse. In ETL, all the workload related to the transformation and storage of the raw data is executed in a layer prior to the target system.
Figure 3: ETL data pipeline
The workflow is as follows:
Data is extracted from the source system.
Data is transformed into the desired format in an intermediate staging area.
Transformed data is loaded into the data warehouse.
When to use this pattern:
Target system performance – If the target database, usually a data warehouse, has limited resources and poor scalability, we want to minimize the impact on performance.
Target system capacity – When the target system has limited storage capacity or the price per GB is very high, we are interested in transforming and storing the raw data in a cheaper layer.
Pre-defined structure – When the structure of the target system is already defined.
ETL Use Case
This example is the classic ETL for an on-premise system where, because of the data warehouse's computational and storage capacity, we are neither interested in storing the raw data nor in executing the transformation in the data warehouse itself. This is more economical and efficient for an on-premises solution that is not highly scalable, or where scaling comes at a very high cost.
Figure 4: ETL sales insights data pipeline
Extract, Load, Transform
Modern cloud-based data warehouses and data lakes, such as Snowflake or BigQuery, are highly scalable and optimized for in-house processing, which allows them to handle large-scale transformations more efficiently in terms of performance and more cheaply in terms of cost. In extract, load, transform (ELT), the data retrieved from the source systems is loaded directly, without transformation, into a raw layer of the target system. The transformations are then performed inside the target system. This pattern is probably the most widely used in modern data stack architectures.
Figure 5: ELT data pipeline
The workflow is as follows:
Data is extracted from the source system.
Data is loaded directly into the data warehouse.
Transformation occurs within the data warehouse itself.
When to use this pattern:
Cloud-based modern warehouse – Modern data warehouses are optimized for in-house processing and can handle large-scale transformations efficiently.
Data volume and velocity – When handling large amounts of data or near real-time processing.
ELT Use Case
New cloud-based data warehouse and data lake solutions are highly performant and highly scalable. In these cases, the data repositories themselves are better suited for this work than external processes. The transformation process can take advantage of new features and run data transformation queries inside the data warehouse faster and at a lower cost (a minimal sketch of the ELT flow follows below).
Figure 6: ELT sales insights data pipeline
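Here is the ELT sketch referenced above, using Python's built-in sqlite3 module as a stand-in for a cloud warehouse; the table names and aggregation query are illustrative assumptions. The point is that raw data is loaded first and the transformation runs inside the target system.
Python
# Illustrative ELT flow: load raw data, then transform inside the "warehouse".
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land raw sales records untransformed in a raw layer.
conn.execute("CREATE TABLE raw_sales (sold_at TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("2023-09-01", "berlin", 120.0), ("2023-09-01", "madrid", 80.5),
     ("2023-09-02", "berlin", 95.0)],
)

# Transform: the heavy lifting runs inside the warehouse itself.
conn.execute("""
    CREATE TABLE daily_store_sales AS
    SELECT sold_at, store, SUM(amount) AS total_amount
    FROM raw_sales
    GROUP BY sold_at, store
""")

for row in conn.execute("SELECT * FROM daily_store_sales ORDER BY sold_at, store"):
    print(row)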
Reverse ETL
Reverse ETL is a new data pattern that has grown significantly in recent years and has become fundamental for businesses. It is composed of the same stages as a traditional ETL, but functionally, it does just the opposite: it takes data from the data warehouse or data lake and loads it into the operational systems. Nowadays, analytical platforms generate information of differential value for businesses, and do so in a very agile manner. Bringing this information back into operational systems makes it actionable across other parts of the business in a more efficient and likely higher-impact way.
Figure 7: Reverse ETL data pipeline
The workflow is as follows:
Data is extracted from the data warehouse.
Data is transformed into the desired format in an intermediate staging area.
Data is loaded directly into the operational system.
When to use this pattern:
Operational use of analytical data – to send insights back to operational systems to drive business processes
Near real-time business decisions – to send insights back to systems that can trigger near real-time decisions
Reverse ETL Use Case
One of the most important things in e-commerce is being able to predict which items your customers are interested in; this type of analysis requires different sources of information, both historical and real-time. The data warehouse contains historical and real-time data on customer behavior, transactions, website interactions, marketing campaigns, and customer support interactions. The reverse ETL process enables e-commerce to operationalize the insights gained from its data analysis and take targeted actions to enhance the shopping experience and increase sales.
Figure 8: Reverse ETL customer insights data pipeline
The Rise of Real-Time Data Processing
As businesses become more data-driven, there is an increasing need to have actionable information as quickly as possible. This evolution has driven the transition from batch processing to real-time processing, which allows data to be processed immediately as it arrives. The advent of new technological tools and platforms capable of handling real-time data processing, such as Apache Kafka, Apache Pulsar, or Apache Flink, has made it possible to build real-time data pipelines. Real-time analytics became crucial for scenarios like fraud detection, IoT, edge computing, recommendation engines, and monitoring systems. Combined with AI, it allows businesses to make automatic, on-the-fly decisions.
The Integration of AI and Advanced Analytics
Advancements in AI, particularly in areas like generative AI and large language models (LLMs), will transform data pipeline capabilities such as enrichment, data quality, data cleansing, anomaly detection, and transformation automation. Data pipelines will evolve exponentially in the coming years, becoming a fundamental part of the digital transformation, and companies that know how to take advantage of these capabilities will undoubtedly be in a much better position. Some of the activities where generative AI will be fundamental and will change the value of, and way of working with, data pipelines include:
Data cleaning and transformation
Anomaly detection
Enhanced data privacy
Real-time processing
AI and Advanced Analytics Use Case
E-commerce platforms receive many support requests from customers every day, including a wide variety of questions and responses written in different conversational styles. Increasing the efficiency of the chatbot is important for improving the customer experience, and many companies decide to implement a chatbot using a GPT model. In this case, we need to provide all the information from questions, answers, and technical product documentation that is available in different formats. The new generative AI and LLM models not only allow us to provide innovative solutions for interacting with humans, but also increase the capabilities of our data pipelines, such as data cleaning or transcription. Training the GPT model requires clean and preprocessed text data. This involves removing any personally identifiable information, correcting spelling and grammar mistakes, and removing any irrelevant information. A GPT model can be trained to perform these tasks automatically.
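As a purely illustrative example of that kind of pre-processing step (independent of any particular GPT tooling), the sketch below masks obvious PII and normalizes whitespace before a support ticket enters a training corpus. The regular expressions and placeholder tokens are assumptions made for the example.
Python
# Illustrative text cleaning for chatbot training data (assumed patterns and tokens).
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(ticket_text: str) -> str:
    # Mask obvious PII so it never reaches the training corpus.
    text = EMAIL.sub("[EMAIL]", ticket_text)
    text = PHONE.sub("[PHONE]", text)
    # Light cleanup: collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(scrub("Hi, I'm Jane (jane.doe@example.com, +1 415-555-0100).  My order is late!"))
# -> "Hi, I'm Jane ([EMAIL], [PHONE]). My order is late!"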
Figure 9: ETL data pipeline with AI for chatbot content ingestion
Conclusion
Data pipelines have evolved a lot in the last few years, initially with the advent of streaming platforms and later with the explosion of the cloud and new data solutions. This evolution means that every day they have a greater impact on business value, moving from a data movement solution to a key element in business transformation. The explosive growth of generative AI solutions in the last year has opened up an exciting path, as they have a significant impact on all stages of the data pipeline; therefore, the near future is undoubtedly linked to AI. Such a disruptive evolution requires the adaptation of organizations, teams, and engineers to enable them to use the full potential of the technology. The data engineer role must evolve to acquire more business and machine learning skills. This is a new digital transformation, and perhaps the most exciting and complex movement in recent decades.
Apache Kafka has emerged as a clear leader in corporate architecture for moving from data at rest (DB transactions) to event streaming. There are many presentations that explain how Kafka works and how to scale this technology stack (either on-premise or in the cloud). Building a microservice that uses ChatGPT-generated code to consume messages and enrich, transform, and persist them is the next phase of this project. In this example, we will be consuming input from an IoT device (Raspberry Pi) which sends a JSON temperature reading every few seconds.
Consume a Message
As each Kafka event message is produced (and logged), a Kafka microservice consumer is ready to handle each message. I asked ChatGPT to generate some Python code, and it gave me the basics to poll and read from the named "topic." What I got was a pretty good start to consume a topic, key, and JSON payload. ChatGPT also created code to persist this to a database using SQLAlchemy. I then wanted to transform the JSON payload and use API Logic Server (ALS, an open source project on GitHub) rules to unwrap the JSON, validate, calculate, and produce a new set of message payloads whenever the source temperature is outside a given range.
Shell
ChatGPT: "design a Python Event Streaming Kafka Consumer interface"
Note: ChatGPT selected the Confluent Kafka libraries (and their Docker Kafka container); you can modify your code to use other Python Kafka libraries.
SQLAlchemy Model
Using API Logic Server (ALS: a Python open-source platform), we connect to a MySQL database. ALS will read the tables and create an SQLAlchemy ORM model, a react-admin user interface, a safrs-JSON Open API (Swagger), and a running REST web service for each ORM endpoint. The new Temperature table will hold the timestamp, the IoT device ID, and the temperature reading. Here we use the ALS command line utility to create the ORM model:
Shell
ApiLogicServer create --project_name=iot --db_url=mysql+pymysql://root:password@127.0.0.1:3308/iot
The API Logic Server-generated class used to hold our Temperature values:
Python
# Excerpt from the generated models.py
class Temperature(SAFRSBase, Base):
    __tablename__ = 'Temperature'
    _s_collection_name = 'Temperature'  # type: ignore
    __bind_key__ = 'None'

    Id = Column(Integer, primary_key=True)
    DeviceId = Column(Integer, nullable=False)
    TempReading = Column(Integer, nullable=False)
    CreateDT = Column(TIMESTAMP, server_default=text("CURRENT_TIMESTAMP"), nullable=False)
    KafkaMessageSent = Column(Boolean, default=False)
Changes
Instead of saving the Kafka JSON consumer message as-is in the SQL database, we unwrap the JSON payload (util.row_to_entity) and insert it into the Temperature table. We let the declarative rules handle each temperature reading.
Python
entity = models.Temperature()
util.row_to_entity(message_data, entity)  # copy the JSON fields onto the ORM row
session.add(entity)
When the consumer receives a message, it adds the new row to the session, which will trigger the commit_event rule (below) when the session is committed.
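For context, a consumer loop along these lines could poll the topic, unwrap the JSON, and add the row to the session. This is only a sketch, not the code ChatGPT generated: the broker address, group id, and topic name are assumptions, and the models, util, and session handles are assumed to be the same project objects used elsewhere in this article.
Python
# Sketch only (assumptions noted above), not the generated consumer.
import json
from confluent_kafka import Consumer

import safrs                      # session access, as in the Behave test below
from database import models      # assumed: the project's generated ORM classes
import util                      # assumed: project helper providing row_to_entity()

db = safrs.DB
session = db.session

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',   # assumed broker
    'group.id': 'temperature-reader',
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['temperature'])          # assumed topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        message_data = json.loads(msg.value())    # e.g. {"DeviceId": 1, "TempReading": 77}
        entity = models.Temperature()
        util.row_to_entity(message_data, entity)  # unwrap JSON into the ORM row
        session.add(entity)
        session.commit()                          # commit fires the declarative rules
finally:
    consumer.close()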
Declarative Logic: Produce a Message
Using API Logic Server (an automation framework built using SQLAlchemy, Flask, and the LogicBank spreadsheet-like rules engine: formula, sum, count, copy, constraint, event, etc.), we add a declarative commit_event rule on the ORM entity Temperature. As each message is persisted to the Temperature table, the commit_event rule is called. If the temperature reading exceeds MAX_TEMP or falls below MIN_TEMP, we will send a Kafka message on the topic "TempRangeAlert". We also add a constraint to make sure we receive data within a normal range (32-132). We will let another event consumer handle the alert message.
Python
from confluent_kafka import Producer

conf = {'bootstrap.servers': 'localhost:9092'}
producer = Producer(conf)

MAX_TEMP = arg.MAX_TEMP or 102
MIN_TEMP = arg.MIN_TEMP or 78

def produce_message(
        row: models.Temperature,
        old_row: models.Temperature,
        logic_row: LogicRow):
    if logic_row.isInserted() and row.TempReading > MAX_TEMP:
        producer.produce(topic="TempRangeAlert", key=str(row.Id),
            value=f"The temperature {row.TempReading}F exceeds {MAX_TEMP}F on Device {row.DeviceId}")
        row.KafkaMessageSent = True
    if logic_row.isInserted() and row.TempReading < MIN_TEMP:
        producer.produce(topic="TempRangeAlert", key=str(row.Id),
            value=f"The temperature {row.TempReading}F is less than {MIN_TEMP}F on Device {row.DeviceId}")
        row.KafkaMessageSent = True

Rules.constraint(models.Temperature,
    as_expression=lambda row: 32 <= row.TempReading <= 132,  # condition that must hold for each row
    error_message="Temperature {row.TempReading} is out of range")
Rules.commit_event(models.Temperature, calling=produce_message)
We only produce an alert message if the temperature reading is greater than MAX_TEMP or less than MIN_TEMP. The constraint checks the temperature range before the commit event is called (note that rules are always unordered and can be introduced as specifications change).
TDD Behave Testing
Using TDD (Test-Driven Development), we can write a Behave test to insert records directly into the Temperature table and then check the returned value of KafkaMessageSent. Behave begins with a Feature/Scenario (.feature file). For each scenario, we write a corresponding Python class using Behave decorators.
Feature Definition
Plain Text
Feature: TDD Temperature Example

  Scenario: Temperature Processing
     Given A Kafka Message Normal (Temperature)
      When Transactions normal temperature is submitted
      Then Check KafkaMessageSent Flag is False

  Scenario: Temperature Processing
     Given A Kafka Message Abnormal (Temperature)
      When Transactions abnormal temperature is submitted
      Then Check KafkaMessageSent Flag is True
TDD Python Class
Python
from behave import *
import safrs

from database import models  # generated ORM classes

db = safrs.DB
session = db.session

def insertTemperature(temp: int) -> bool:
    entity = models.Temperature()
    entity.TempReading = temp
    entity.DeviceId = 0  # test device (DeviceId is an Integer column)
    session.add(entity)
    session.commit()  # committing fires the declarative rules
    return entity.KafkaMessageSent

@given('A Kafka Message Normal (Temperature)')
def step_impl(context):
    context.temp = 86
    assert True

@when('Transactions normal temperature is submitted')
def step_impl(context):
    context.response_text = insertTemperature(context.temp)

@then('Check KafkaMessageSent Flag is False')
def step_impl(context):
    assert context.response_text == False
Summary
Using ChatGPT to generate the Kafka message code for both the consumer and producer seems like a good starting point; install Confluent's Kafka Docker container to try it. Using API Logic Server for the declarative logic lets us add formulas, constraints, and events to the normal flow of transactions into our SQL database and produce (and transform) new Kafka messages, which makes the two a great combination. ChatGPT and declarative logic are the next level of "pair programming."