Have you ever encountered a use case where you need to connect to multiple sources and aggregate their responses into a single response? If so, you already have some intuition for scatter-gather. Scatter-gather is an enterprise integration pattern: a design pattern for scaling distributed systems or services running in the cloud, and an answer to the question, "How do I write code to fetch data from N sources?" The scatter-gather pattern provides guidelines for spreading load across distributed systems, achieving parallelism, and reducing the response time of applications that handle requests from many users. In this article, we'll investigate this pattern in depth. I'll solve one simple problem by applying scatter-gather and later discuss the applicability of the pattern to complex use cases.

What Are the Features of the Scatter-Gather Pattern?

The scatter-gather pattern is applicable when we can divide a big task into smaller, independent sub-tasks that can be executed simultaneously by multiple cores or distributed computing nodes. Independent sub-tasks should be executed asynchronously within a predefined time window. The scatter-gather pattern is hierarchical in nature, which makes it useful when designing highly scalable systems deployed in the cloud. Scatter-gather can be enhanced further with messaging patterns using Kafka, RabbitMQ, NATS, or the managed messaging services of the major public cloud providers. We'll examine each feature with a simple example or use case, starting with how this pattern solves the following problem.

How To Fetch Data From N Sources

Let us take an example to understand a problem where the pattern can be effective. Our use case is to build an application that lists hotel prices. People travel worldwide and want a way to compare hotel prices at a given location. Here is a high-level design for the requirement.

Figure 1: High-level design of the hotel price comparison application

So, our main requirement is: "Write code to retrieve hotel prices from three hotel price listing websites." We'll solve this problem for the case where a single service instance is running and serving clients logged in from mobile apps or laptops. I am using Java to solve this problem; you can use your preferred programming language.

Approach 1

Make three serial HTTP calls to the Booking, Expedia, and Hotels websites and return the prices to the user. These are the steps involved: the getHotelPrice() method runs on the main thread of our hotel price comparison service. Whenever a user calls getHotelPrice() with a hotel name or ID via an API, our service makes an HTTP call to Booking.com and waits for a response, then makes a second call to Expedia.com and again waits for a response, and so forth. Finally, the service returns the aggregated hotel prices from all websites. This is a simple but inefficient approach, as we waste CPU resources while waiting for responses from the hotel listing websites.

Figure 2: Sequential calls to hotel price listing websites

Approach 2

Our service makes three HTTP calls to Booking.com, Expedia.com, and Hotels.com at the same time and then waits for the responses. This is a better solution than Approach 1 because it uses CPU resources effectively, but there is still an issue, as we'll see below.
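Before moving on, here is a minimal sketch of Approach 1's serial flow for contrast; the fetch methods are hypothetical stand-ins for blocking HTTP calls, not part of any real client library:

```java
import java.util.ArrayList;
import java.util.List;

// Approach 1 sketch: three blocking calls, one after another.
// Total latency is the SUM of the three response times, and the
// CPU idles during each wait.
public class SequentialPriceFetcher {

    public List<Double> getHotelPrice(String hotelId) {
        List<Double> prices = new ArrayList<>();
        prices.add(fetchFromBooking(hotelId));  // wait for Booking.com...
        prices.add(fetchFromExpedia(hotelId));  // ...then wait for Expedia.com...
        prices.add(fetchFromHotels(hotelId));   // ...then wait for Hotels.com
        return prices;
    }

    // Hypothetical blocking calls; a real version would use an HTTP client
    private double fetchFromBooking(String hotelId) { return 100.15; }
    private double fetchFromExpedia(String hotelId) { return 110.15; }
    private double fetchFromHotels(String hotelId) { return 105.25; }
}
```

Approach 2 issues the same three calls concurrently, so its latency is bounded by the slowest response rather than the sum of all of them; that is the version we'll build with CompletableFuture below.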
Figure 3: Parallel calls to hotel price listing websites

An HTTP call is an I/O-bound operation, and it is possible that one of the hotel listing websites is unavailable (due to planned downtime or some other error) when our hotel price comparison service requests a price. It's undesirable to ask the client to wait forever for hotel prices. Let's tweak our problem statement: "Write code to retrieve hotel prices from three hotel price listing websites, waiting for a maximum of 2 seconds." By introducing a 2-second timeout, our service won't wait forever for a response from an unavailable website. Instead, it may return a partial response (ignoring delayed or unavailable responses) after waiting 2 seconds.

Please Show Me Some Code

There are different ways to solve this problem in Java, e.g., futures/callables, the Phaser synchronization barrier, or concurrent locks and conditions. I am writing the code using CompletableFuture, which represents the future result of an asynchronous operation executed on a different thread. Let's break down our solution:

- Write a service to list the price of a given hotel from three hotel price lister platforms: Booking, Expedia, and Hotels.
- The service should be able to return a partial response when one of the platforms is unavailable or slow.

Step 1: Create an enumeration of hotel lister platforms.

```java
public enum HotelsListerPlatform {
    BOOKING,
    EXPEDIA,
    HOTELS
}
```

Step 2: Create an interface named PriceFetcher. We'll create three separate classes for Booking, Expedia, and Hotels that implement this interface.

```java
public interface PriceFetcher {
    CompletableFuture<Double> fetchHotelPrice(String hotelId);
}
```

Step 3: Create a factory to resolve the hotel lister platform name in our HotelsPriceComparisonService class.

```java
public class HotelsListerFactory {

    private static final Map<HotelsListerPlatform, PriceFetcher> priceFetcherFactory = Map.of(
            HotelsListerPlatform.BOOKING, new BookingPriceFetcher(),
            HotelsListerPlatform.EXPEDIA, new ExpediaPriceFetcher(),
            HotelsListerPlatform.HOTELS, new HotelsPriceFetcher()
    );

    public static PriceFetcher getPriceFetcher(HotelsListerPlatform hotelsListerPlatform) {
        return priceFetcherFactory.get(hotelsListerPlatform);
    }
}
```

Step 4: Implement the interface for each platform. (We introduce an artificial delay in the HotelsPriceFetcher class to simulate slowness of the Hotels.com API.)

```java
public class BookingPriceFetcher implements PriceFetcher {
    @Override
    public CompletableFuture<Double> fetchHotelPrice(String hotelId) {
        // Simulate a fast response after 100 ms
        return CompletableFuture.supplyAsync(() -> 100.15,
                CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));
    }
}

public class ExpediaPriceFetcher implements PriceFetcher {
    @Override
    public CompletableFuture<Double> fetchHotelPrice(String hotelId) {
        // Simulate a fast response after 100 ms
        return CompletableFuture.supplyAsync(() -> 110.15,
                CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));
    }
}

public class HotelsPriceFetcher implements PriceFetcher {
    @Override
    public CompletableFuture<Double> fetchHotelPrice(String hotelId) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Artificial delay to simulate a slow response
                Thread.sleep(3000);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            return 105.25;
        }, CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));
    }
}
```

Step 5: Create a class with the scatter-gather business logic using CompletableFuture and a timeout.

```java
public class HotelsPriceComparisonService {

    public List<Double> fetchHotelPrices(String hotelId, long timeoutInSeconds) {
        // Kick off all platform calls; each returns a CompletableFuture
        List<CompletableFuture<Double>> hotelPricesFutures = Stream.of(HotelsListerPlatform.values())
                .map(hotelsListerPlatform -> {
                    PriceFetcher hotelsPriceFetcher = HotelsListerFactory.getPriceFetcher(hotelsListerPlatform);
                    return hotelsPriceFetcher.fetchHotelPrice(hotelId);
                })
                .toList();
        try {
            // Wait (up to the timeout) on a future that completes when all the futures complete
            CompletableFuture.allOf(hotelPricesFutures.toArray(new CompletableFuture[hotelPricesFutures.size()]))
                    .get(timeoutInSeconds, TimeUnit.SECONDS);
        } catch (Exception ex) {
            // You may log the exception here; a timeout simply means some platforms were too slow
        }
        // Return a complete or partial (when a listing website is unavailable) list of prices
        return hotelPricesFutures.stream()
                // Keep only the futures that completed successfully within the timeout
                .filter(future -> future.isDone() && !future.isCompletedExceptionally())
                .map(CompletableFuture::join)
                .toList();
    }
}
```

Step 6: Write some unit tests to verify the business logic.

```java
@Test
public void shouldReturnResponsesFromAllHotelsListerPlatformsWhenTimeoutIsMoreThanSlowestPlatform() {
    HotelsPriceComparisonService hotelsPriceComparisonService = new HotelsPriceComparisonService();
    List<Double> hotelPrices = hotelsPriceComparisonService.fetchHotelPrices("Hyatt", 4);
    Assertions.assertEquals(List.of(100.15, 110.15, 105.25), hotelPrices);
}

@Test
public void shouldReturnPartialResponseWhenTimeoutIsLessThanSlowestPlatform() {
    HotelsPriceComparisonService hotelsPriceComparisonService = new HotelsPriceComparisonService();
    List<Double> hotelPrices = hotelsPriceComparisonService.fetchHotelPrices("Hyatt", 2);
    Assertions.assertEquals(List.of(100.15, 110.15), hotelPrices);
}
```

Congratulations, you have successfully implemented a simple version of scatter-gather. In real scenarios, problems are not as simple as calling three websites to get hotel prices. Real-world applications are a mix of CPU-bound and I/O-bound tasks where a single process or machine can become a bottleneck due to limited memory, network, or disk bandwidth. Instead of parallelizing an application across multiple cores on a single machine, we can use the scatter-gather pattern to parallelize requests across multiple processes on many different machines. This ensures that the bottleneck in each process remains the CPU, since the memory, network, and disk bandwidth are spread across different machines.

Hierarchical Scatter-Gather

In the last section, we implemented the simplest form of scatter-gather. Let's assume that our hotel price comparison website is becoming popular and successfully handles the load from a small number of users in certain locations. Now we plan to expand to many more users and many more hotel price lister platforms. We want to expand our business to the USA to help tourists plan their hotel stays while they vacation across different states. The simplest version of scatter-gather won't be able to sustain this load, but the hierarchical nature of the pattern will help us solve the problem. Before delving into this complex scalability problem, let's first understand hierarchical scatter-gather with a simple example. Imagine an organization running a set of surveys for all employees. These surveys give every employee the opportunity to share 360-degree feedback about the organization's culture and practices.
This is a perfect example to demonstrate the idea. The main challenge is to distribute the surveys to all employees and gather the results back efficiently. A binary tree data structure can solve this problem: each tree node represents an employee, survey tasks are distributed among the employees in a full binary tree, each employee processes their assigned survey tasks, and results are gathered from the leaf nodes up to the root of the tree.

```java
// A recursive method to create a binary tree of workers
private static WorkerTreeNode createBinaryTree(int depth) {
    if (depth <= 0) return null;
    var node = new WorkerTreeNode(new Worker(UUID.randomUUID().toString()));
    node.setLeft(createBinaryTree(depth - 1));
    node.setRight(createBinaryTree(depth - 1));
    return node;
}

// Scatter method to assign the list of tasks to each employee node in the binary tree
private static void scatter(List<SurveyTask> tasks, WorkerTreeNode node) {
    if (node == null) return;
    tasks.stream().forEach(task -> node.worker.tasks.add(task));
    scatter(tasks, node.left);
    scatter(tasks, node.right);
}

// Gather method to aggregate results from the employee nodes in the binary tree
private static List<CompletableFuture<String>> gather(WorkerTreeNode node) {
    if (node == null) return new ArrayList<>();
    var leftResultsTask = gather(node.left);
    var rightResultsTask = gather(node.right);
    var results = new ArrayList<CompletableFuture<String>>();
    node.worker.tasks.forEach(task -> results.add(node.worker.executeTask(task)));
    results.addAll(leftResultsTask);
    results.addAll(rightResultsTask);
    return results;
}

public static void main(String[] args) throws ExecutionException, InterruptedException {
    var surveyTasks = IntStream.rangeClosed(1, 3)
            .mapToObj(i -> new SurveyTask(i, String.format("Data for survey task %d", i)))
            .toList();
    // Create a binary tree
    var root = createBinaryTree(2);
    // Scatter surveyTasks to the workers in the tree
    scatter(surveyTasks, root);
    // Gather results from all workers in the tree
    var gatheredResults = gather(root);
    var results = CompletableFuture.allOf(gatheredResults.toArray(new CompletableFuture<?>[0]))
            .thenApply(v -> gatheredResults.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList())
            ).get();
    System.out.println(String.join("\n", results));
}
```

The output after running the scatter and gather phases shows three employees, each performing three survey tasks, for a total of nine entries.

Hierarchical Scatter-Gather To Scale Workloads in Cloud Infrastructure

Distributing tasks using a hierarchical or tree structure is an application of the scatter-gather pattern for scaling workloads in a distributed or cloud environment. In this tree structure, when the root node receives an incoming request from a client, it divides the request into multiple independent tasks and assigns them to available leaf nodes: the "scatter" phase. Leaf nodes are machines or containers available on a distributed network. Each leaf node works individually and in parallel on its computation, generating a fraction of the response. Once finished, each leaf sends its partial response back to the root, which collects them all as part of the "gather" phase. The root node then combines the partial responses and returns the result to the client. The diagram below illustrates this.
Figure 4: Hierarchical scatter-gather using Kubernetes pods

This strategy lets us exploit cloud infrastructure to allocate as many virtual machines, compute nodes, or Kubernetes pods as needed and scale horizontally. If a particular leaf node is slow, the root node can also reshuffle the load to meet a desired response time. Now we are in a position to solve the scalability problem of our hotel price comparison website by applying hierarchical scatter-gather. Our goal is to launch the website in the USA. To improve the performance of our application, we can divide the USA into regions: New England, Great Lakes, Midwest, Northwest, Southwest, South, and Southeast. When a request arrives, the root node divides it and assigns it to seven leaf nodes to process in parallel. Each leaf node is responsible for a specific region of the country and returns a list of all hotel prices for that region. Finally, the root node merges all the results into the final list shown to the customer. There are two major problems with this approach:

Slow or non-responsive leaf nodes: Even though our application uses cloud infrastructure, there is always a likelihood that machines become unavailable due to network or infrastructure issues (see the fallacies of distributed systems). Moreover, our leaf nodes communicate with external hotel price lister websites that may not respond swiftly. There is nothing to worry about, though, as we already solved this problem in the simple version that compared prices from three lister websites. You guessed it: timeouts. The root node must set an upper limit on the desired response time. If a leaf node does not respond within that time, the root node can ignore it and return the partial response collected from the available leaf nodes.

Tight coupling between the root node and leaf nodes: Our implementation suffers from tight runtime coupling between the root node and the leaf nodes, where the availability of our hotel price comparison website is directly impacted by the availability of the leaf nodes and the third-party hotel price lister websites. A splendid way to eliminate tight runtime coupling is to use message brokers. A message broker acts as an intermediary, holding a message until the dependent services are ready to process it. This asynchronous approach allows services to operate independently, breaking the tight coupling of the initial solution, i.e., synchronous HTTP-based communication.

Loosely Coupled Hierarchical Scatter-Gather

Two messaging concepts can be implemented between the root node and leaf nodes to remove runtime tight coupling:

Topic: A topic works on the publish-subscribe model. A single message can be received by many subscribers, i.e., the root node asks for hotel prices from all leaf nodes distributed across the seven US regions. The topic may also store the message so that it can be read again later.

Figure 5: Publish-subscribe communication using a topic

Queue: A queue works on a point-to-point communication model. It delivers each message to exactly one consumer; messages are queued until the next available node consumes a message from the queue, one at a time. A message is deleted from the queue as soon as it is consumed. A queue can work with acknowledgment or without it (fire and forget) regarding the successful consumption of the message.
Figure 6: Point-to-point communication using a queue

Now we can use these messaging concepts to scale our hotel price comparison application. A combination of a topic and a queue can achieve runtime loose coupling between the root and leaf nodes. Whenever a request for hotel prices at a certain location arrives, the root node broadcasts a message to the leaf nodes. All working leaf nodes subscribe to the incoming message, process it, call the external hotel price lister platforms, and publish their price results to a response queue. The root node consumes the results from that queue, aggregates them, and responds to the user. The figure below shows the implementation.

Figure 7: Loosely coupled scatter-gather with a topic and a queue

What Is the Right Number of Leaf Nodes?

We have a workable solution for scaling the hotel price comparison application, but there is no silver bullet in software design. Scatter-gather provides a way to parallelize load across several leaf nodes, but increased parallelization comes at a cost, so choosing the right number of leaf nodes is crucial when implementing distributed systems.

Nonlinear performance gain: There is overhead involved in serving HTTP requests from clients, i.e., parsing HTTP requests, processing, and routing messages to topics and queues. This overhead is negligible with a small number of leaf nodes, but as you keep adding leaf nodes, the overhead eventually dominates the compute cost. Adding more leaf nodes is therefore not proportional to gaining more performance.

Sluggish leaf nodes: The root node waits for the results from all leaf nodes, which means our application cannot be faster than the slowest leaf node. This is generally known as the "straggler node" problem, and a handful of slow nodes, or even a single one, can substantially delay the overall response time.

But our question is still unanswered: what is the right number of leaf nodes? It depends on the commitment made to the end user, i.e., the Service Level Agreement (SLA). When our hotel price comparison application promises to be available 96% of the time (57m 36s of allowed downtime per day), it needs a precise numerical availability target from its dependent systems, and that numerical target is the Service Level Objective (SLO). There is a great post from Google explaining these terms.

Example: how many leaf nodes can we run under a 96% availability SLA? Let's assume we run our application on a Kubernetes cluster whose nodes are hosted on a public cloud provider's machines with a guaranteed 99% availability SLA. Calculating a composite SLA is simple mathematics: multiply availabilities for serial dependencies, and multiply unavailabilities for parallel (redundant) dependencies. Since the root node needs a response from every leaf node, the leaf nodes are serial dependencies. The composite SLA of our application with four leaf nodes is 0.99 × 0.99 × 0.99 × 0.99 ≈ 0.96, i.e., four leaf nodes still meet the desired SLA. Let's see what happens when we add a fifth leaf node: 0.99 × 0.99 × 0.99 × 0.99 × 0.99 ≈ 0.95, i.e., the extra leaf node breaches our SLA, and it gets worse as we add more. This calculation shows that adding more machines does not help improve our promised SLA.
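The composite-SLA arithmetic above is easy to reproduce; here is a small sketch of my own (assuming each leaf node independently offers 99% availability and that the root needs all of them):

```java
// Composite availability of N serial dependencies, each at 99%:
// multiply the individual availabilities together.
public class CompositeSla {

    public static void main(String[] args) {
        double nodeAvailability = 0.99;
        for (int leafNodes = 1; leafNodes <= 6; leafNodes++) {
            double composite = Math.pow(nodeAvailability, leafNodes);
            System.out.printf("%d leaf nodes -> composite SLA = %.4f%n",
                    leafNodes, composite);
        }
        // Prints ~0.9606 for 4 nodes (meets a 96% SLA) and
        // ~0.9510 for 5 nodes (breaches it)
    }
}
```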
Conclusion

In this article, we covered the basics of the scatter-gather pattern, first solving a simple use case on a single machine and later scaling it to a distributed environment. The pattern has several advantages, such as improved performance through parallel processing, increased fault tolerance by reallocating failed tasks to available nodes, better horizontal scalability, and good resource utilization. There are disadvantages as well: a complex topology between processing nodes may lead to high latency, and a poorly chosen number of leaf nodes may result in sub-optimal performance. Overall, the scatter-gather pattern is a powerful technique for developing high-performing and scalable distributed applications.

References

- Enterprise Integration Patterns
- SRE Fundamentals
- Recommendations for defining reliability targets
- Designing Distributed Systems
My favorite technical experience from grad school was all the cool ways we were trying to squeeze every last bit of performance out of the IBM JVM (now called Eclipse OMR). The majority of those optimizations required an intricate understanding of how CPUs and memories look under the hood. But why is there such an impressive performance gain in padding objects with blank space to the closest multiple of 64 bytes and ensuring they always start at addresses that are exactly divisible by 64? We'll need a bit of background before we can answer this.

As anyone with a passion for computing and software engineering knows, the quest for heightened performance is never-ending. Up until the dawn of the new millennium, we enjoyed rapid CPU clock-rate increases, and the expectation was that applications would continue to double in speed with each generation. Moore's law, proposed in 1965, predicted that transistor counts would continue to double every two years. However, the quirks of quantum mechanics derailed the straightforward approach of merely miniaturizing transistors. Once a transistor narrows down to a mere few nanometers (with a handful of atoms acting as the barrier), electrons' quantum tunneling starts to significantly bypass voltage barriers. This quantum behavior sets definite limits on our ability to extract performance from CPUs by simply miniaturizing or overclocking them. So CPU architects started looking for alternative ways to sustain the momentum promised by Moore's law. Instead of shrinking the atomic gaps between input-output pathways, they introduced (massive) parallelism. Not only could multiple threads execute simultaneously on a CPU, but with hyperthreading, even a single core could manage multiple threads concurrently.

A simplified CPU architecture

There are always tradeoffs, though, and this created a new bottleneck: the memory wall. While CPUs evolved at breakneck speeds, RAM lagged behind. Don't quote me on this, but from the CPU's perspective, RAM is roughly as fast as a dot-matrix printer. This causes CPUs to idle while waiting for new data to consume. To address this, we adopted a smart workaround: minimize reliance on the snail-paced RAM by keeping essential data within arm's reach, courtesy of small, nimble memory units called CPU caches. Paired with cache-aware software engineering design patterns, contemporary cutting-edge CPU cache algorithms can minimize RAM fetches while also enabling multiple CPUs to seamlessly share and edit data, sidestepping synchronization hurdles.

In this article, we'll delve into the nuances of cache locality and the art of crafting software to harness the power of CPU caches, emphasizing their crucial role in application performance. We will uncover the brilliance behind contemporary CPU cache algorithms, prefetching mechanisms, and other strategies to sidestep the imposing memory wall, before wrapping everything up with an actual cache-aware experiment we conducted on two CPUs. And by the end, we will be able to answer why 64-byte object alignment makes good sense!

The Nerdy Stuff

My old students (and maybe myself too at some point) used to confuse Computer Organization and Computer Architecture, so let's start by seeing how these two Computer Engineering concepts differ.

Computer Organization

Computer organization delves into how the CPU's hardware components are interconnected and how they function.
It deals with the operational units, their interconnections, and their functionality, and includes features that remain transparent to the programmer, such as control signals, interfaces between the computer and peripherals, and the memory technology used. It is exactly because these features are transparent (i.e., not directly controlled by the engineer) that we should pay extra attention to them to maximize performance!

Computer Architecture

In contrast, computer architecture focuses on the design and specification of the computing system from the perspective of a programmer. It looks at concepts like instruction set architectures (ISAs), machine code, and fundamental logical structures. Architecture defines what the system can do, while organization details how it achieves things. In object-oriented programming terms, Computer Architecture is the Interface, while Computer Organization is the Concrete Class. And because of this abstraction into machine code, computer architecture is normally the realm of compilers, unless a developer writes in x86, amd64, or arm32 directly, of course!

Zooming Into CPU Caches

The analogy we often draw is that the CPU cache is like your fridge at home, while the RAM is the distant supermarket. Think of the cache as a handy spot where the CPU stores a limited quantity of "food" (or data) that it frequently needs, though this "food" might sometimes go stale or expire. While the CPU can always make a trip all the way to the "supermarket" (RAM) for a more extensive selection, the journey consumes time and energy: the hassle of driving, fuel, and the dreaded traffic (or databus) jams.

IBM Power 9 diagram

The following CPU cache concepts could probably fill an entire undergrad course, so feel free to read up on them for further detail.

Cache Line

At the heart of a cache lies the cache line (or cache block). Data isn't fetched in single bytes or words but in blocks called cache lines. Pay attention here; this is crucial! You can think of a CPU cache as a collection of cache blocks, each able to store one cache line's worth of data. Let's work out an example. Many contemporary CPU organizations set the cache line size at 64 bytes. When you execute a command that makes a cache reference, say load 6432 (meaning "read the word located at address 6,432 in the process's virtual address space"), the address is first divided by the cache line size. So:

6432 div 64 = 100 and 6432 mod 64 = 32

This means the word we want to read starts in the middle of cache line number 100, 32 bytes in. Assuming we are working on a 64-bit system, the word is 8 bytes long. That means (again, pay attention here) the word we are reading is fully contained in the 100th cache line, and we need only that one cache line fetched (or already present in the cache) to perform the load operation.
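To make this arithmetic concrete, here is a tiny sketch of my own (assuming the 64-byte line size used throughout this article):

```java
// Maps a byte address to its cache line number and the offset within
// that line, assuming 64-byte cache lines.
public class CacheLineMath {

    static final int CACHE_LINE_SIZE = 64;

    public static void main(String[] args) {
        long address = 6432;
        long line = address / CACHE_LINE_SIZE;    // 6432 / 64 = 100
        long offset = address % CACHE_LINE_SIZE;  // 6432 % 64 = 32
        // An 8-byte word starting at offset 32 ends at offset 40 (< 64),
        // so the whole word lives in line 100 and one fetch suffices.
        System.out.println("line=" + line + ", offset=" + offset);
    }
}
```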
Hierarchies

Modern systems employ a hierarchical approach to caching: L1, L2, L3, and so on, with L1 being the smallest and fastest and L3 (or beyond) being larger and slower. This hierarchy trades access speed against size to ensure that frequently accessed data stays close to the CPU. The CPU first looks in the smallest and fastest L1 (which is actually two caches, one for data and one for instructions). If the data is not found, it moves on to L2, and so on, until it's forced to fetch the data from RAM. The Last-Level Cache (LLC) marks the boundary of the CPU cache before the CPU databus is used to reach RAM. Note that in CPU organization lingo, "upwards" typically means moving closer to the CPU (from the LLC towards L1) and "downwards" further away (from L1 towards the LLC). So a memory request starts from the CPU and travels downwards to the first cache level that has the data, while the data travels upwards from the cache level where it was found (or from main memory) through the cache hierarchy until it reaches the CPU.

Associativity

A cache operates similarly to a hash map, with buckets being analogous to cache blocks. Associativity determines how many locations in the cache (i.e., cache blocks) a given cache line can end up in. A cache can be direct-mapped (each memory location maps to exactly one cache spot), fully associative (data can go anywhere in the cache), or set-associative (a middle-ground approach). The tradeoff is that direct mapping needs very little hardware and is super fast, but conflicting cache lines immediately evict each other. On the other end, full associativity maximizes cache utilization but requires more logic to be printed on the chip.

Prefetching

Apart from fetching a new cache line from memory, the CPU cache can go a step further and try to predict the lines your application will soon need. Prefetching, i.e., predicting future data needs and preemptively loading the data before it's explicitly requested, helps reduce cache misses. One common pattern is that when a specific cache line is brought into the cache, its neighbors (spatially close in main memory) come along too. In our previous example, fetching cache line 100 might also prefetch lines 99 and 101, even if they are not needed (yet).

Eviction Policies

When a cache (or every cache block an address is allowed to map to) is full and new data needs to be loaded, we must determine which existing cache line to eject. Popular policies include Least Recently Used (LRU), First-In-First-Out (FIFO), and Random. The tradeoff of cache utilization vs. circuit complexity still stands, so extremely complicated eviction policies are avoided. Contention for cache blocks doesn't only come from your application's own references, though; your application (OS process, in particular) is not the only one competing for CPU cache resources. Since all OS processes run concurrently, there is plenty of potential for reduced performance due to CPU cache interference.

Cache Hits, Misses, and References

We move now to events that are direct and measurable via hardware performance counters. When data is found in the cache, it's a hit. If not, it's a miss, leading to a fetch from main memory, which incurs a performance penalty. A cache reference, on the other hand, is a request made to the cache. Every memory request always results in an L1 cache reference, and the reference propagates further downward only if the higher-level cache (like L1) is unable to serve the CPU; if any cache level can satisfy the request, it won't continue further. Hence, the number of LLC references is generally far smaller than the number of CPU references. Going back to hardware counters, CPUs may count and expose references, misses, hits, and other cache metrics per cache level, so we might be able to retrieve, say, L1-instruction misses and LLC references.

Cold Cache vs. Warm Cache

A cold cache is one that hasn't been populated with data relevant to the current operations, leading to high miss rates initially.
This can also happen when the relevant data gets evicted, either because the same process needed new, different data or because another application running concurrently on the CPU interfered. As operations proceed and the cache gets populated, it "warms up," increasing the hit rate. Keeping the cache warm is a core strategy for improving performance.

Coherence

Multicore systems add further complexity, leading to data races. Different cores can make the same reference, which is fine if they only want to read it. But what happens if they also want to write it? In that case, local caches need to be recursively synchronized, potentially all the way up to main memory. Coherence ensures that all processors in a multi-processor system have a consistent view of memory. Looking at a common cache-and-core organization, a core might have its own L1 cache, while a nearby group of cores shares an L2, and an even broader group of cores shares an L3, and so on. This approach can both increase cache utilization and minimize the up-down cache traversal needed to maintain coherence, as long as execution threads that operate on the same data are placed on nearby cores that use the same cache hierarchy.

True and False Sharing

What do we mean by cores sharing the same data? Remember that from a cache perspective, a reference is not a CPU word (8 bytes on a 64-bit system) but is equal to a cache line (commonly 64 bytes). So coherence needs to be maintained across the whole cache line, not just the actual word the programs read and write. As a result, cache line synchronization is triggered for two reasons when two cores reference the same cache line: either both cores look at the same word in the cache line (true sharing) or at different words in the same cache line (false sharing). Both types exert equal pressure on the cache, but it does feel a bit painful to incur a slowdown for data you don't even share! To avoid false sharing, you need to design your applications with thread locality in mind.

NUMA and Cache Design

Extra complexity arises when we add multiple main memories (RAMs) to the hierarchy, a feature used by datacenter servers. Non-Uniform Memory Access (NUMA) architectures do this by using multiple sockets for massively multicore chips, with each socket connected to its own RAM. Now, say a thread runs on socket 1 but needs a memory reference that lives with socket 2. The second socket must interrupt its operations to fetch data for the thread on socket 1. Imagine paying frequent cross-socket synchronization penalties because of false-sharing events caused by a lack of NUMA-awareness in your application. Ouch.

Cache and Memory Locality

Assuming software engineers develop their applications using well-structured design patterns, programs tend to reuse data and instructions they've recently accessed (temporal locality) and to access data elements that are close together in memory (spatial locality). Let's see how effective caching strategies exploit these tendencies.
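A classic way to see spatial locality in action (a sketch of my own, separate from the experiment later in this article): traversing a 2D array row by row touches memory sequentially and rides the prefetched cache lines, while traversing it column by column keeps jumping across them.

```java
// Spatial locality sketch: identical work, two traversal orders.
public class LocalityDemo {

    public static void main(String[] args) {
        int n = 4096;
        int[][] grid = new int[n][n];
        long sum = 0;

        // Row-major: consecutive elements share cache lines -> cache-friendly
        for (int row = 0; row < n; row++)
            for (int col = 0; col < n; col++)
                sum += grid[row][col];

        // Column-major: each access lands in a different row's backing array,
        // far away in memory -> cache-hostile
        for (int col = 0; col < n; col++)
            for (int row = 0; row < n; row++)
                sum += grid[row][col];

        System.out.println(sum); // keep the loops from being optimized away
    }
}
```

On typical hardware, timing the two loops separately shows the column-major pass running several times slower, purely because of cache behavior.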
Your program is likely to maintain a contiguous memory space for its code. A CPU runs a perpetual fetch-decode-execute cycle: in each clock cycle, it fetches the next instruction from main memory, decides what to do with it, and then runs it. This is where the instruction L1 cache comes into play. The CPU is guaranteed to always need the next instruction, so a specialized cache just for instruction memory pays off. Instructions are normally fetched sequentially (assuming your application doesn't perform frequent jumps to distant addresses), so a simple prefetch can minimize instruction misses.

Next, your program maintains a call stack that stores information about the currently executing function and those that called it, as well as local variables and parameters. This info is also normally stored together and in sequential order. Once again, a simple prefetching approach will do great!

Things become a bit more complicated when you start using the heap, which happens when you allocate objects. The OS, and even the language runtimes above it (if you're in a high-level language like Java, C#, or Node.js), will do their best to allocate your objects in contiguous areas. Language runtimes, in particular, will try to keep the cache warm when they run background tasks (e.g., garbage collection) to minimize the impact on your application.

When multithreading comes into play, an engineer is likely to maintain a high level of isolation among the operational specifics of each thread and then periodically synchronize with other threads on common data. The cache hierarchy, combined with smart thread allocation (usually by the OS) on appropriate cores, can maximize the use of L1 caches per thread while minimizing round trips to lower-level caches when synchronization is needed. And when we take multiprocessing into account, we have multiple OS processes (applications) to manage; some might be interrelated parts of a broader system, while others might be completely unrelated and even belong to different users or tenants. Smart thread allocation mechanisms and even thread pinning (restricting a process's threads to a predefined subset of the available cores) can maximize cache utilization while minimizing sharing contention. For example, a tenant on a datacenter server could be assigned entirely within a single NUMA node.

Finally, let's tackle the question posed in the introduction: why does padding objects with blank space to the closest multiple of 64 bytes, and ensuring they always start at addresses exactly divisible by 64, improve performance? Now we know that this optimization improves locality for a small space tradeoff. The contents of an object are likely to be accessed at the same time; by fitting them all into a single cache line where possible, we achieve good temporal locality. For longer objects, we achieve good spatial locality by storing the object in multiple sequential cache lines. And when multithreading brings the risk of cache invalidation due to maintaining coherence (potentially across NUMA nodes), padding objects into separate cache lines avoids the worst-case scenario of false sharing.
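Since we've arrived at the padding question, here is what the trick can look like in practice. This is a best-effort sketch of my own: the JVM gives no hard layout guarantees and may reorder fields, and the JDK's internal @jdk.internal.vm.annotation.Contended annotation exists for the same purpose.

```java
// A counter padded so that two adjacent instances are unlikely to share
// a 64-byte cache line: 8 bytes of hot data + 7 x 8 bytes of padding.
public class PaddedCounter {

    public volatile long value;      // the hot, contended field
    long p1, p2, p3, p4, p5, p6, p7; // padding to fill out the cache line
}
```

If two threads each hammer a plain volatile long sitting next to the other's in memory, every write invalidates the other thread's cache line (false sharing); give each thread its own PaddedCounter instead, and in the best case those invalidations disappear.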
Experiment (Or Let's Thrash the Cache)

Let's conduct an experiment to check some of these ideas. Being too lazy to write it all myself anymore, I got ChatGPT (with a few back-and-forths and some final editing on my part) to write a Java application that creates and repeatedly accesses a buffer array. In each iteration, it increments the buffer's contents by a random number. Crucially, we scan the contents with a variable stride length that loops back to the start of the buffer if it overflows. That way, we ensure exactly the same number of operations regardless of the stride length.

```java
import java.util.concurrent.ThreadLocalRandom;

public class Main {

    /**
     * Usage: bufferSizeMB strideLength repetitions
     * Recommended examples:
     * 1) Uses cache locality and prefetching to be fast: 512 1 16
     * 2) Breaks cache locality and prefetching to be slow: 512 129 16
     *
     * No. 2 is 2 x cache-line-size + 1 to also break prefetching,
     * assuming a line size of 64 bytes.
     */
    public static void main(String[] args) {
        try {
            final int bufferSizeMB = Integer.parseInt(args[0]);
            final int strideLength = Integer.parseInt(args[1]);
            final int repetitions = Integer.parseInt(args[2]);
            final byte[] buffer = new byte[bufferSizeMB * (1 << 20)];
            performBufferScans(buffer, strideLength, repetitions);
        } catch (Throwable thr) {
            System.err.println("Usage: bufferSizeMB strideLength repetitions");
        }
    }

    private static void performBufferScans(byte[] buffer, int strideLength, int repetitions) {
        final long t0 = System.currentTimeMillis();
        for (int i = 0; i < repetitions; i++) {
            System.out.println("Running " + (i + 1) + " out of " + repetitions + " repetitions");
            performBufferScan(buffer, strideLength);
        }
        final long t1 = System.currentTimeMillis();
        System.out.println("Execution time = " + (t1 - t0) + "ms");
    }

    private static void performBufferScan(byte[] buffer, int strideLength) {
        int position = 0;
        for (int i = 0; i < buffer.length; i++) {
            buffer[position] += ThreadLocalRandom.current().nextInt();
            position = (position + strideLength) % buffer.length;
        }
    }
}
```

At a glance, we wouldn't expect any performance difference. However, due to cache locality effects, we actually get significant slowdowns! To validate these hypotheses, we can use the Linux perf tool to collect CPU hardware metrics:

```shell
perf stat -e LLC-loads,LLC-load-misses,l2_lines_out.useless_hwpf,instructions,cycles \
  java Main <buffer-size-mb> <stride-length> <repetitions>
```

- LLC-loads: The number of memory references made at the Last-Level Cache.
- LLC-load-misses: The number of LLC loads that were not found in the cache and had to be fetched from RAM.
- l2_lines_out.useless_hwpf: The number of prefetched cache lines that were never used.
- instructions: The number of CPU instructions executed.
- cycles: The number of CPU cycles the application took.

Results

All stride lengths execute the same number of CPU instructions. However, as the stride length increases, so does the application's execution time. Looking at the effectiveness of the CPU, we can see that as the stride length increases, more and more cycles are required on average to execute an instruction, which is caused by decreased utilization of the CPU cache. The proportion of cache misses climbs to virtually 100% by a stride length of about 100 and remains more or less stable from that point on (even though performance continues to deteriorate). This happens because the cache line size on the tested hardware is 64 bytes; the moment we cross that length, nearby accesses no longer benefit from data already loaded in the cache. The biggest telltale of the cache line size is the effect on prefetching utilization: there is a sharp increase in the number of prefetches that are never used (almost 100%) the moment we cross roughly twice the cache line length (64 bytes on this system). At that point, bringing in the next cache line is useless because the application simply jumps over it! Finally, we see the full picture of why higher strides cause further delays by studying the raw numbers of LLC loads and misses.
We can see that for low stride lengths, the number of LLC loads (and misses) is very low. This indicates that the inner cache layers (L1 and L2) are not getting overwhelmed and can hold the relevant references without bubbling them up to the LLC. However, as the stride length increases, the smaller L1 and L2 are quickly thrashed and start relying on the LLC, which for the most part doesn't contain the requested references (although, due to the 16 repetitions, a reference might still be present from the previous iteration and not yet evicted) and has to fetch them from RAM, resulting in significant slowdowns.

A final observation concerns the dip in cache metrics for the 4097 case, even though the execution time is still higher. It's likely that because the stride length is so large, the scan loops back to the start of the buffer quickly, so more items are likely to be found in the Last-Level Cache; in other words, the cache stays warm. However, the inner caches (L1 and L2) are completely circumvented, which results in a bigger slowdown. Remember when we discussed how CPU references are not equal to cache references? Well, as LLC references start to approach the actual number of CPU references, the cache becomes more and more practically removed from the system. Eventually, it's almost as if it doesn't exist, and the CPU fetches data directly from RAM.

Validation Mini Experiment

To further test our observations, we can do a simple comparison between two CPUs, a newer and an older one. You might be thinking: what is there to test? A new CPU will just do better, right? Well, yes, but remember earlier when we said newer CPUs are no longer clocked faster? For the new CPU to actually be faster, it uses other tricks, like smarter cache designs and algorithms and denser parallelism. In this test, we executed the Java buffer application from above on an Intel Core i5 and an Intel Core i7, using the same stride lengths as before. Comparing performance between the older and newer CPUs, the slowdowns are far more dramatic for the newer CPU. So, our cache-thrashing attack has somehow confounded the Intel Core i7 more than its predecessor. Looking at the actual execution times, the Core i7 starts very strong with low stride lengths, which represent expected, well-behaved software access patterns. However, the Core i7's advantage disappears as the cache is thrashed, and the two even reach rough equality at a stride length of 1025! Because this is a single-threaded application (and doesn't even allocate objects that would trigger garbage collection), much of the performance difference must be due to advanced cache designs. From our simple experiment, then, we can infer that Intel achieved a speedup of roughly 2x between the Core i5 and the Core i7 just by developing more advanced cache algorithms, as long as you write your application in a cache-aware way, of course.

Final Thoughts

I hope you enjoyed this window into computer and software performance engineering! The goal, as ever, is that a customer should measure the performance of their software and not the overhead of their monitoring tools.
In the ever-evolving landscape of database technology, staying ahead of the curve is not just an option; it's a necessity. As modern applications continue to grow in complexity and global reach, the role of the underlying database becomes increasingly critical. It's the backbone that supports the seamless functioning of applications and the storage and retrieval of vast amounts of data. In this era of global-scale applications, having a high-performance, flexible, and efficient database is paramount. As the demands of modern applications surge, the need for a database that can keep pace has never been greater. The "ultra-database" has become a key player in ensuring that applications run seamlessly and efficiently around the globe. These databases need to offer a unique combination of speed, versatility, and adaptability to meet the diverse requirements of various applications, from e-commerce platforms to IoT systems. They need to be more than just data repositories; they must serve as intelligent hubs that can quickly process, store, and serve data while facilitating real-time analytics, security, and scalability. The ideal ultra-database is not just a storage facility; it's an engine that drives the dynamic, data-driven applications that define the modern digital landscape. The latest release, HarperDB 4.2, introduces a unified development architecture for enterprise applications, providing an approach to building global-scale applications.

HarperDB 4.2

HarperDB 4.2 is a comprehensive solution that seamlessly combines an ultra-fast database, user-programmable applications, and data streaming into one cohesive technology. The result is a development environment that simplifies the complex, accelerates the slow, and reduces costs. HarperDB 4.2 offers a unified platform that empowers developers to create applications that can span the globe, handling data easily and quickly. In this tutorial, we will explore the features of HarperDB 4.2 and show you how to harness its power in conjunction with Java and Quarkus. We will take you through the steps to leverage HarperDB's new capabilities to build robust, high-performance applications with Quarkus, demonstrating the impressive potential of this unified development architecture. So, join us on this enlightening journey and revolutionize your application development process.

Creating a Quarkus Microservice API With HarperDB, Part 1: Setting Up the Environment

This section will guide you through configuring your development environment and creating the project setup needed to get started.

Step 1: Configuring the Environment

Before diving into development, you need to set up your environment. We'll start by running HarperDB in a Docker container. Open your terminal and run the following command:

```shell
docker run -d -e HDB_ADMIN_USERNAME=root -e HDB_ADMIN_PASSWORD=password -e HTTP_THREADS=4 \
  -p 9925:9925 -p 9926:9926 harperdb/harperdb
```

This command downloads and runs the HarperDB Docker container with the specified configuration and exposes the necessary ports for communication.

Step 2: Creating a Schema and Table

With HarperDB up and running, the next step is to create a schema and define a table to store animal data. We will use curl commands to interact with HarperDB's RESTful API.
1. Create a schema named "dev" by executing the following command:

```shell
curl --location --request POST 'http://localhost:9925/' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data-raw '{
    "operation": "create_schema",
    "schema": "dev"
}'
```

This command sends a POST request to create the "dev" schema.

2. Next, create a table named "animal" with "scientificName" as the hash attribute:

```shell
curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
    "operation": "create_table",
    "schema": "dev",
    "table": "animal",
    "hash_attribute": "scientificName"
}'
```

This command establishes the "animal" table within the "dev" schema.

3. Now, add the required attributes for the "animal" table by creating the "name," "genus," and "species" attributes:

```shell
curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
    "operation": "create_attribute",
    "schema": "dev",
    "table": "animal",
    "attribute": "name"
}'

curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
    "operation": "create_attribute",
    "schema": "dev",
    "table": "animal",
    "attribute": "genus"
}'

curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
    "operation": "create_attribute",
    "schema": "dev",
    "table": "animal",
    "attribute": "species"
}'
```

These commands add the "name", "genus", and "species" attributes to the "animal" table within the "dev" schema. With HarperDB configured and the schema and table set up, you can start building your Quarkus-based microservice API to manage animal data. Stay tuned for the next part of the tutorial, where we'll dive into the development process.

Building the Quarkus Application

We configured HarperDB and prepared the environment; now we'll start building our Quarkus application to manage animal data. Quarkus offers an intuitive web-based project generator that simplifies the initial setup. Visit the Quarkus project generator and follow these steps:

1. Select the extensions you need for your project. For this tutorial, add "JAX-RS" and "JSON" to handle REST endpoints and JSON serialization.
2. Click the "Generate your application" button.
3. Download the generated ZIP file and extract it to your desired project directory.

With your Quarkus project generated, you're ready to move on. Our project will use the DataFaker library to generate animal data and the HarperDB Java driver to interact with the HarperDB database. To include the HarperDB Java driver, please read the previous article.

In your Quarkus project, create a Java record to represent the Animal entity. This record has fields for the scientific name, name, genus, and species, allowing you to work with animal data efficiently.

```java
public record Animal(String scientificName, String name, String genus, String species) {

    public static Animal of(Faker faker) {
        var animal = faker.animal();
        return new Animal(
                animal.scientificName(),
                animal.name(),
                animal.genus(),
                animal.species()
        );
    }
}
```

This record includes a factory method, of, that generates an Animal instance with random data using the DataFaker library.
We'll use this method to populate our database with animal records. Next, we'll set up CDI (Contexts and Dependency Injection) to handle database connections and data access. Here's an example of a ConnectionSupplier class that manages database connections (the JDBC URL below is a placeholder; use the connection string for your HarperDB driver setup):

```java
@ApplicationScoped
public class ConnectionSupplier {

    private static final Logger LOGGER = Logger.getLogger(ConnectionSupplier.class.getName());

    // Placeholder: replace with the JDBC URL for your HarperDB driver setup
    private static final String JDBC_URL = "<your-harperdb-jdbc-url>";

    @Produces
    @RequestScoped
    public Connection get() throws SQLException {
        LOGGER.info("Creating connection");
        // Create and return the database connection
        return DriverManager.getConnection(JDBC_URL);
    }

    public void dispose(@Disposes Connection connection) throws SQLException {
        LOGGER.info("Closing connection");
        connection.close();
    }
}
```

The ConnectionSupplier class uses CDI annotations to produce and dispose of database connections, letting Quarkus manage the connection lifecycle for you. Next, let's create the AnimalDAO class to interact with the database using JDBC. This class has methods for inserting and querying animal data; the SQL statements shown here are illustrative sketches against the dev.animal table created earlier, so adjust them to your driver's dialect as needed.

```java
@ApplicationScoped
public class AnimalDAO {

    private final Connection connection;

    public AnimalDAO(Connection connection) {
        this.connection = connection;
    }

    public void insert(Animal animal) {
        // Prepare and execute the SQL INSERT statement to insert the animal data
        var sql = "INSERT INTO dev.animal (scientificName, name, genus, species) VALUES (?, ?, ?, ?)";
        try (var statement = connection.prepareStatement(sql)) {
            statement.setString(1, animal.scientificName());
            statement.setString(2, animal.name());
            statement.setString(3, animal.genus());
            statement.setString(4, animal.species());
            statement.execute();
        } catch (SQLException exception) {
            throw new RuntimeException(exception);
        }
    }

    public Optional<Animal> findById(String id) {
        // Prepare and execute the SQL SELECT statement to find an animal by its ID
        var sql = "SELECT scientificName, name, genus, species FROM dev.animal WHERE scientificName = ?";
        try (var statement = connection.prepareStatement(sql)) {
            statement.setString(1, id);
            var resultSet = statement.executeQuery();
            if (resultSet.next()) {
                return Optional.of(new Animal(
                        resultSet.getString("scientificName"),
                        resultSet.getString("name"),
                        resultSet.getString("genus"),
                        resultSet.getString("species")));
            }
            return Optional.empty();
        } catch (SQLException exception) {
            throw new RuntimeException(exception);
        }
    }

    // Other methods for data retrieval and manipulation (findAll, delete, etc.)
}
```

In the AnimalDAO class, you use JDBC to perform database operations, and you can add more methods to handle tasks such as updating and deleting animal records. The AnimalService class then generates animal data and uses the AnimalDAO for database interaction.

```java
@ApplicationScoped
public class AnimalService {

    private final Faker faker;
    private final AnimalDAO dao;

    @Inject
    public AnimalService(Faker faker, AnimalDAO dao) {
        this.faker = faker;
        this.dao = dao;
    }

    // Methods for generating and managing animal data, delegating to the DAO
    public void generateRandom() {
        dao.insert(Animal.of(faker));
    }

    public void insert(Animal animal) {
        dao.insert(animal);
    }

    // findAll and delete would delegate to the matching DAO methods
}
```

In the AnimalService, you use the DataFaker library to generate random animal data and the AnimalDAO for database operations.
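One piece of wiring the snippets above assume but do not show: CDI needs a producer before Faker can be injected into AnimalService. A minimal sketch (hypothetical, not from the original tutorial):

```java
// Hypothetical CDI producer so that a Faker instance can be @Inject-ed;
// adjust the scope and configuration to taste.
@ApplicationScoped
public class FakerProducer {

    @Produces
    @ApplicationScoped
    public Faker faker() {
        return new Faker();
    }
}
```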
With these components in place, you've set up the foundation for your Quarkus-based microservice API with HarperDB.

Create the AnimalResource Class

In this final part of the tutorial, we will create an AnimalResource class to expose our animal service through HTTP endpoints, along with sample curl commands demonstrating how to consume these endpoints locally. The class interacts with the AnimalService to handle HTTP requests and responses.

```java
@Path("/animals")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class AnimalResource {

    private final AnimalService service;

    public AnimalResource(AnimalService service) {
        this.service = service;
    }

    @GET
    public List<Animal> findAll() {
        return this.service.findAll();
    }

    @POST
    public Animal insert(Animal animal) {
        this.service.insert(animal);
        return animal;
    }

    @DELETE
    @Path("{id}")
    public void delete(@PathParam("id") String id) {
        this.service.delete(id);
    }

    @POST
    @Path("/generate")
    public void generateRandom() {
        this.service.generateRandom();
    }
}
```

In this class, we've defined several RESTful endpoints:

- GET /animals: Returns a list of all animals.
- POST /animals: Inserts a new animal.
- DELETE /animals/{id}: Deletes an animal by its ID.
- POST /animals/generate: Generates random animal data.

Here are curl commands to test the HTTP endpoints locally using http://localhost:8080/animals/ as the base URL.

Retrieve all animals (GET):

```shell
curl -X GET http://localhost:8080/animals/
```

Insert a new animal (POST):

```shell
curl -X POST -H "Content-Type: application/json" -d '{
    "scientificName": "Panthera leo",
    "name": "Lion",
    "genus": "Panthera",
    "species": "Leo"
}' http://localhost:8080/animals/
```

Delete an animal by ID (DELETE), replacing {id} with the ID of the animal you want to delete:

```shell
curl -X DELETE http://localhost:8080/animals/{id}
```

Generate random animal data (POST); this endpoint doesn't require any request data:

```shell
curl -X POST http://localhost:8080/animals/generate
```

These curl commands let you interact with the Quarkus-based microservice API, retrieving, inserting, and deleting animal data. The random-data endpoint is valuable for populating your database with test data. With these RESTful endpoints, you have a fully functional Quarkus application integrated with HarperDB to manage animal data over HTTP. You can extend and enhance this application further to meet your specific requirements. Congratulations on completing this tutorial!

Conclusion

In this tutorial, we embarked on a journey to build a Quarkus-based microservice API integrated with HarperDB, a robust, high-performance database. We started by setting up our environment and creating a Quarkus project with the necessary extensions. Leveraging the DataFaker library, we generated random animal data to populate our HarperDB database. The core of our application was the seamless integration with HarperDB, showcasing the capabilities of the HarperDB Java driver. We used CDI to manage database connections efficiently and created a structured data access layer with the AnimalDAO class, through which we performed database operations such as inserting and querying animal data. With the AnimalService class, we combined the generated data with database operations, bringing our animal data management to life. Finally, we exposed the animal service through RESTful endpoints in the AnimalResource class, allowing us to interact with it over HTTP. You can explore the complete source code of this project on GitHub; feel free to fork, modify, and extend it to suit your needs. As you continue your journey into the world of HarperDB and Quarkus, consult the comprehensive HarperDB documentation to dive deeper into HarperDB's capabilities and features.
Stay informed about the latest updates, release notes, and news on HarperDB’s official website to ensure you’re always working with the most up-to-date information. Check out the latest release notes to discover what’s new and improved in HarperDB. By combining Quarkus and HarperDB, you’re well-equipped to build efficient and scalable applications that meet the demands of the modern digital landscape. Happy coding!
DevOps has proven successful in streamlining the product delivery process. As firms around the world adopted data-driven strategies, a similarly well-structured framework was needed to maximize the value of corporate data. Data-driven insights let people make sound decisions based on verifiable evidence rather than on faulty assumptions and forecasts. To better understand the distinction between DataOps and DevOps, it helps to start with clear definitions. DevOps represents a paradigm shift in how development and software teams deliver output effectively, while DataOps centers on optimizing and refining intelligent systems and analytical models through the expertise of data analysts and data engineers.

What Is DataOps?

DataOps helps your data analytics and decision-making processes reach their full potential. Reducing the cost of data management is one of its main goals: by automating manual data-gathering and processing steps, you minimize labor-intensive operations and free up precious resources. This not only saves money but also frees your team to concentrate on more strategic projects. Enhancing data quality is equally central to DataOps. Continuous monitoring lets you spot and fix problems or irregularities in your data pipeline in real time, ensuring the reliability and accuracy of the insights and information you rely on and making educated judgments possible.

What Is DevOps?

DevOps is an approach to software development that focuses on smooth operation and continuous improvement. It resembles the agile development method but takes it a step further by involving the IT operations and quality assurance teams, so the development team is responsible both for creating the product and for how it performs after deployment. DevOps aims to improve collaboration and remove obstacles in the development process. It's all about efficiency, and it brings benefits like better communication between product teams, cost savings, continuous improvement, and quick responses to customer feedback.

DataOps and DevOps: Similarities

DataOps and DevOps share a common foundation in agile project management. The following sections delve into the specific aspects that highlight this shared background.

Agile Methodology

The Agile methodology serves as the foundation for both DevOps and DataOps, which extend it to the domains of software development and data analysis, respectively. Agile places a strong emphasis on adaptable thinking and swift adjustments to address evolving business requirements and capitalize on emerging technologies and opportunities. DevOps and DataOps adhere to this philosophy to optimize their respective pipelines.

Iterative Cycles

Both methodologies employ brief iterative cycles to generate outcomes efficiently and gather input from stakeholders to guide subsequent actions. Incremental development enables users to benefit promptly from the deliverable and assess its alignment with fundamental requirements. DevOps or DataOps teams can then build subsequent layers of the product or alter their trajectory as necessary.

Collaboration

DataOps and DevOps are all about teamwork and collaboration!
In DataOps, data engineers and scientists team up with business users and analysts to uncover valuable insights that align with business goals. Meanwhile, in DevOps, the development, operations, and quality assurance teams join forces to create top-notch software that customers will love. The best part? Both models put a huge emphasis on gathering feedback from end users, because their satisfaction is the ultimate measure of success.

DataOps and DevOps: Differences

Outcomes

When it comes to achieving results, DataOps is all about creating a seamless flow of data and ensuring that valuable information reaches the hands of end users. To maximize efficiency, this includes developing data transformation applications and optimizing infrastructure. DevOps, on the other hand, adopts a somewhat different strategy by emphasizing the rapid delivery of excellent software to clients. DevOps seeks to deliver a minimum viable product (MVP) as rapidly as possible by distributing updates and making incremental adjustments based on insightful consumer input. The best thing, though? In subsequent development cycles, its functionality can be extended and improved to give clients the best experience possible.

Testing

In DataOps, it is important to verify test results because the true value or statistic is unknown. This may raise questions about the relevance of the data and whether the most recent information is being used, which requires validation to ensure confidence in the analysis. In DevOps, the outcomes are clearly defined and expected, making the testing phase simpler. The main focus is on whether the application achieves the desired result. If it is successful, the process continues; if not, debugging and retesting are done.

Workflow

The main goals of DataOps are real-time data processing for decision-making and the consistent delivery of high-quality data through the data pipeline. Because data sets constantly evolve and expand, building pipelines for new use cases is only one part of the job; maintaining and improving the underlying infrastructure matters just as much. In contrast, DevOps, while also prioritizing efficiency, follows a structured sequence of stages in its pipeline. Some organizations employ DevOps and continuous integration/continuous deployment (CI/CD) to frequently introduce new features. However, the velocity of a DataOps pipeline surpasses that of DevOps, as it promptly processes and transforms newly collected data, potentially resulting in multiple deliveries per second depending on the volume of data.

Feedback

DataOps places a high emphasis on soliciting feedback from business users and analysts to ensure that the final deliverable is in line with their specific requirements. These stakeholders possess valuable contextual knowledge about the data-generating business processes and the findings they make based on the information provided. In contrast, DevOps does not always need customer feedback unless a particular aspect of the application fails to meet users' needs. If the end users are content, their feedback becomes optional. Nevertheless, teams should actively monitor application usage and DevOps metrics to evaluate overall satisfaction, identify areas for enhancement, and guarantee that the product fulfills all intended use cases.

Which One Is Better for You?

Although they may sound like fancy buzzwords, DevOps and DataOps are revolutionizing the fields of software development and data engineering.
These two techniques share some concepts, such as efficiency, automation, and cooperation, but their focus differs. Let's start with DevOps. This approach is all about optimizing software delivery and IT operations. With DevOps, development, testing, and deployment run like a well-oiled machine, and you can put an end to the annoying delays and bottlenecks that used to bog down your development process. It's all about simplifying processes and making everything operate seamlessly. On the other hand, we have DataOps. This methodology takes data management to a whole new level. It's not just about storing and organizing data anymore: the goal of DataOps is to improve your analytics and decision-making processes. It's analogous to having a crystal ball that provides insights and forecasts based on your data. By making smarter, data-driven decisions with DataOps, you can gain a competitive advantage in the market.
There is no doubt that Jira is one of the most popular project management and issue-tracking tools for organizations. It provides a great number of benefits to teams, including improved collaboration between technical and non-technical teams, increased visibility, enhanced productivity, better project planning, flexible customization, scalability, comprehensive reporting, agile methodology support, and, of course, easy compatibility with other Atlassian cloud products such as Bitbucket and Confluence. However, what would your team do if something went wrong with your Jira data? In this article, we take a deep dive into Jira security best practices. First, though, let's take a quick tour of the security risks and threats your Jira data can face and the security approaches Atlassian uses to protect your data against them.

Jira Security Risks

Protecting data should be one of the most important tasks for security leaders. If you don't believe it, remember the Atlassian outage in April 2022, which left 775 customers unable to access their Jira data for almost a week, or the numerous vulnerabilities and security flaws tracked in Jira that could lead to lost credentials and data breaches. It's not only vulnerabilities, outages, ransomware, and malware activity that can threaten your Jira account data, though. To this "why-protect-Jira-data" list, you can easily add human mistakes, hardware and software errors, compliance with strict security audits, and the Atlassian Cloud Shared Responsibility Model.

Atlassian Security Approaches and Shared Responsibility Model

Like every major cloud service provider, Atlassian follows a shared responsibility model. According to the Atlassian Cloud Shared Responsibility Model:

The provider concentrates on its infrastructure integrity and is responsible for the hosting, system, and application.
The customer is in charge of protecting their Jira account data, as account-level protection isn't included in Atlassian's responsibilities and competence.

To keep its products secure, Atlassian has built a security philosophy that commits the company to leading its users in cloud and product security, to exceeding industry security standards, certification criteria, and client requirements for cloud security, and to being transparent about its programs, processes, and metrics. Focusing on the fundamentals of security, the provider runs a number of programs to ensure its approach to security is proactive and wide-reaching, including the Security Champions Program, Security Detections Program, and Bug Bounty Program.

Security in Jira: Atlassian's New DevSecOps Feature

Recently, Atlassian introduced its new "Security in Jira" feature, which lets Jira users integrate popular security tools in the app's Security tab. Suzie Prince, head of product for DevOps at Atlassian, states: "Our goal with Security in Jira is to make security a native part of the agile planning rituals central to excellent software teams. With the Security tab, we're shifting security left while increasing transparency across tools and teams so Jira Software's more than 100,000 customers will now be able to more easily and effectively address vulnerabilities." The software provider is thus confident that integrating security tools with Jira Software's Security tab will better equip DevOps teams to simplify their processes and respond to vulnerabilities fast.
Jira Security Best Practices

By following these security best practices, you can build a solid foundation for safeguarding your company's vital Jira data.

1. Set Up User Access Controls

Imagine a new user who creates a password without paying much attention to it, deciding that his birth date is suitable. Will it be easy for a threat actor to break into his Jira account? Probably, yes. That's why it's important that all your employees use strong passwords. Atlassian's password policies let you set password strength requirements and expiry dates to reduce the risk of password-related compromises. Here are the tips Atlassian provides in its official documents: Atlassian Password Policy: Tips for setting strong passwords.

2. Use Organizations for Central Visibility and Management

Atlassian has an outstanding feature for managing and controlling all the users who have access not only to Jira-related products (Jira Software, Jira Work Management, and Jira Service Management) but also to other Atlassian products, Bitbucket and Confluence. This feature is "organizations," an administration layer that gives admins a mechanism to enforce the appropriate security policies across the Atlassian accounts in their company. With it, admins have complete visibility and control over the company's employees who use Jira regularly and can enforce security controls like SSO and automated user provisioning across the company by validating the corporate domain and managing all Atlassian accounts within the organization. If you don't intend to establish an organization and enforce security policies across it, you will need to configure your Atlassian infrastructure so that only specific cloud sites, products, or repositories contain sensitive information, and you may need to limit the number of people who have access to those sites, products, or repositories.

3. Update Jira Regularly

Atlassian constantly monitors its security posture and detects vulnerabilities that can potentially threaten its environment. To deal with these threats effectively, the service provider regularly releases patches for its products. So, it's critical to keep your Jira up to date to enhance security and withstand ever-appearing bugs.

4. Implement Network Security Measures

To protect your Jira environment and its data from unauthorized access, malware, and other cyber threats, you can implement network security measures like:

Firewalls, which block unauthorized access to the network and the Jira system and filter traffic to and from Jira, preventing malicious traffic from entering the network
Intrusion detection or prevention systems (IDS/IPS), which monitor network traffic for any sign of malicious activity
Virtual private networks (VPNs), which create a secure, encrypted connection between remote users and your company's Jira system
Network segmentation, which helps your company separate Jira from other parts of the network and restrict access to Jira to authorized users and systems
Two-step verification, which serves as an extra layer of protection for access to the Jira account by requiring not only a password but also verification by phone or email (by the way, Atlassian permits enforcing individual two-step verification as well as two-step authentication across your entire organization)
5. Configure SSO With Your Identity Provider

If you want to control your team's account access, you can enable single sign-on (SSO) across all the SaaS applications your company uses. It helps mitigate the security risks brought on by the company's increasing usage of cloud applications and logins. What benefits does the integration between your SSO and the Atlassian provider bring? It enables:

Centralized management of your authentication policies
Just-in-time provisioning
An automatic lockout when you deactivate a user from your SSO

Atlassian provides a few options for SSO. The first comes with a subscription to Atlassian Access, which enables SAML SSO between your cloud products and the identity provider of your choice. The other is SSO with G Suite, which is useful if you rely heavily on Google Workspace.

6. Install Automated Provisioning and De-Provisioning

How much time do you need to spend on creating new accounts or updating the accounts of employees who, for example, transfer to another department? Yep; it takes time. With automated user provisioning, however, you can save time on this operation, as it allows for direct synchronization between your identity provider and Jira, so there is no need to manually create user accounts when a new employee joins your team. Automated de-provisioning, on the other hand, removes access for those who leave your company without your intervention, reducing the risk of information breaches caused by forgetting to restrict ex-employees' access. With Atlassian, you can use provisioning with G Suite (with one note: no group categorization is reflected in your organization) or provisioning with SCIM, which permits you to sync your identity provider with Jira.

7. Conduct Regular Security Assessments

It's important to test and monitor your Jira security on a regular basis to identify and address any vulnerabilities or security weaknesses. For that reason, you can:

Review Jira audit logs, which give you a wider view of which events and actions are logged in your Jira and your configuration options
Regularly audit your Jira accounts, even if you have SSO or two-step verification enabled
Perform penetration testing, which simulates a cyberattack on your Jira system and identifies any weaknesses that could be exploited
Conduct vulnerability assessments using vulnerability scanning tools to identify potential vulnerabilities in your Jira environment

8. Educate Your Team on Security Procedures

It's important to educate your team about security procedures, as people forget even what they know well. So, as a security leader, you should remind your employees to create strong passwords, enable two-step verification, enforce SSO, use API tokens for Jira REST API basic authentication (a sketch of what that looks like in code follows below), and restrict access to pages or tickets that include sensitive information.
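As a concrete illustration of the API token advice, here is a minimal Java sketch of calling the Jira Cloud REST API with basic authentication. The site URL and email are placeholders, and the endpoint shown (/rest/api/3/myself, which returns the current user) is just one example; the token is read from an environment variable so it never lands in source control:

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class JiraApiTokenExample {
    public static void main(String[] args) throws Exception {
        // Placeholder account email and a token from an environment variable;
        // API tokens replace passwords for Jira Cloud REST basic auth.
        String email = "user@example.com";
        String apiToken = System.getenv("JIRA_API_TOKEN");
        String auth = Base64.getEncoder()
                .encodeToString((email + ":" + apiToken).getBytes());

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://your-domain.atlassian.net/rest/api/3/myself"))
                .header("Authorization", "Basic " + auth)
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}

The same pattern works for any Jira REST endpoint; the point is that scripts and integrations authenticate with a revocable token instead of a user's password.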
9. Back Up Your Jira Environment

Having regular backups of your Jira environment gives your product team assurance that nothing will interrupt their working process: even during an Atlassian outage or a ransomware attack, they will be able to recover their Jira backup and keep the business running. Here are the Jira backup best practices you should follow to have your Jira data available and recoverable anytime:

Your backup solution should cover all your Jira data, including projects, workflows, users, attachments, etc.
Your backup provider should offer multi-storage support that permits you to back up your Jira data to several destinations.
You should be able to follow the 3-2-1 backup rule, keeping at least 3 copies in no fewer than 2 storage instances, one of which is offsite.
Your backup should be ransomware-proof and encrypted in flight and at rest with your own encryption key.
Your backup should include Jira restore and disaster recovery technology that permits you to restore your data fast in any event of failure, with features such as restore to the same or a new account (or to a free Jira account with a no-user recovery option), point-in-time instant restore, and Cloud2Cloud or Cloud2Local restore.
Your backup solution should let you schedule Jira backups automatically.
Your backup solution should provide central monitoring management to help you keep track of your Jira backup performance.

Of course, as a security leader, you can appoint certain members of your team to build scripts and carry out backups regularly as part of their everyday duties, but this takes time and distracts them from their core tasks; alternatively, you can use third-party backup software to protect your Jira data.

10. Set Up a Mobile Policy for Your Cloud Applications

You can turn on prevention of copy-paste or screen recording for cloud mobile apps as an additional layer of protection. Using Atlassian Access, it's possible to configure a mobile application management (MAM) policy or configure your existing mobile device management (MDM) solution using the AppConfig standard (supported by most MDM solutions, such as Microsoft Intune, VMware, and JAMF).

Final Thought: Has Everything Been Mentioned?

Atlassian puts a lot of effort into securing its products, but you shouldn't forget that the security of your Jira data is your responsibility. It's important to adhere to the security best practices above to be sure that your sensitive data is protected. One more tip: only install apps and plugins from trusted sources, such as the Atlassian Marketplace, and review and remove any unused or unnecessary ones on a regular basis.
The Initial Need Leading to CQRS

The traditional CRUD (Create, Read, Update, Delete) pattern has been a mainstay in system architectures for many years. In CRUD, reading and writing operations are usually handled by the same data model and often by the same database schema. While this approach is straightforward and intuitive, it becomes less effective as systems scale and as requirements become more complex. For instance, consider a large-scale e-commerce application with millions of users. This system faces conflicting demands: it needs to quickly read product details, reviews, and user profiles, but it also has to handle thousands of transactions, inventory updates, and order placements efficiently. As both reading and writing operations grow, using a single model for both can lead to bottlenecks, impacting performance and user experience.

Basics of the CQRS Pattern

CQRS was introduced to address these scaling challenges. The essence of the pattern lies in its name: Command Query Responsibility Segregation. Commands are responsible for any change in the system's state (like placing an order or updating a user profile), while queries handle data retrieval without any side effects. In a CQRS system, these two operations are treated as entirely distinct responsibilities, often with separate data models, databases, or even separate servers or services. This allows each to be tuned, scaled, and maintained independently of the other, aligning with the specific demands of each operation.

CQRS Components

Commands

Commands are the directive components that perform actions or changes within the system. They should be named to reflect intent and context, such as PlaceOrder or UpdateUserProfile. Importantly, commands are responsible for changes and therefore should not return data. Commands are also where validation, authorization, and business logic are applied before any state change takes effect.

Queries

Queries, on the other hand, handle all request-for-data operations. They are constructed to provide optimized, denormalized views of the data tailored for specific use cases, and the read models they serve can be structured and tuned independently of the write model, even when those read models become complex.

Command and Query Handlers

Handlers serve as the brokers that facilitate the execution of commands and queries. Command handlers execute the logic tied to data mutations while ensuring validations and business rules are adhered to. Query handlers manage the retrieval of data, potentially involving complex aggregations or joins to form the requested read model. The sketch below illustrates how these pieces fit together.
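To ground these terms, here is a minimal, illustrative Java sketch of a command, a query, and their handlers. It is not tied to any framework, and all names are invented for the example; in-memory maps stand in for the write and read stores:

Java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// A command expresses intent to change state and returns no data.
record PlaceOrder(String orderId, String productId, int quantity) {}

// A query requests data and has no side effects.
record GetOrder(String orderId) {}

// A read model shaped for the consumer, not for the write side.
record OrderView(String orderId, String productId, int quantity) {}

class CommandHandler {
    private final Map<String, OrderView> writeStore;

    CommandHandler(Map<String, OrderView> writeStore) {
        this.writeStore = writeStore;
    }

    void handle(PlaceOrder command) {
        // Validation and business rules live here, before any mutation.
        if (command.quantity() <= 0) {
            throw new IllegalArgumentException("Quantity must be positive");
        }
        writeStore.put(command.orderId(),
                new OrderView(command.orderId(), command.productId(), command.quantity()));
    }
}

class QueryHandler {
    private final Map<String, OrderView> readStore;

    QueryHandler(Map<String, OrderView> readStore) {
        this.readStore = readStore;
    }

    Optional<OrderView> handle(GetOrder query) {
        // Queries only read; they never mutate state.
        return Optional.ofNullable(readStore.get(query.orderId()));
    }
}

public class CqrsSketch {
    public static void main(String[] args) {
        // One map stands in for both stores here; in a dual-database setup,
        // the write and read stores would be separate, synchronized systems.
        Map<String, OrderView> store = new ConcurrentHashMap<>();
        new CommandHandler(store).handle(new PlaceOrder("o-1", "p-42", 2));
        System.out.println(new QueryHandler(store).handle(new GetOrder("o-1")));
    }
}

Note how the command handler returns nothing while the query handler returns a view object; that asymmetry is the heart of the pattern.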
Single Database vs Dual Database Approach

Single Database

In this model, both command and query operations are performed on a single database, but with distinct models or schemas. Even though both operations share the same physical storage, they might utilize different tables, views, or indexes optimized for their specific requirements. It could be represented as follows:

Single database representation

Benefits:
Simplified infrastructure and reduced overhead
Immediate consistency, as there's no lag between write and read operations

Trade-Offs:
The shared resource can still become a bottleneck during heavy concurrent operations.
Less flexibility in tuning and scaling operations independently

Dual Database Approach

Here, command and query operations are entirely separated, using two distinct databases or storage systems. The write database is dedicated to handling commands, while the read database serves query operations. A synchronization mechanism must be added to keep the two aligned. It could be represented as follows:

Multi database representation

Benefits:
Individual tuning, scaling, and optimization for each operation
Potential for each database to be hosted on different servers or services, distributing load; the database technologies can also differ, allowing more modularity for specific needs

Trade-Offs:
Introduces complexity in ensuring data consistency between the two databases
Requires synchronization mechanisms to bridge the potential misalignment or latency between the write and read databases: a write operation may update the command database while the read database does not yet reflect the change. To address this, synchronization techniques range from simple polling to more intricate methods like Event Sourcing, where modifications are captured as a series of events and replayed to update the read database.

The single database approach offers simplicity; the dual database configuration provides more flexibility and scalability at the cost of added complexity. The choice between these approaches depends on the specific requirements and challenges of the system in question, particularly around performance needs and consistency requirements.

Benefits and Trade-Offs of CQRS Pattern

Benefits:
Performance optimization: By separating read and write logic, you can independently scale and optimize each aspect. For instance, if a system is read-heavy, you can allocate more resources to handling queries without being bogged down by the demands of write operations.
Flexibility and scalability: With separate models for reading and writing, it's easier to introduce changes in one without affecting the other. This segregation not only allows for more agile development and easier scalability but also offers protection against common issues associated with eager entity loading. By distinctly handling reads and writes, systems can be optimized to avoid unnecessary data loading, thereby improving performance and reducing resource consumption.
Simplified codebase: Separating command and query logic can lead to a more maintainable and less error-prone codebase. Each model has a clear responsibility, reducing the likelihood of bugs introduced due to intertwined logic.

Trade-Offs:
Data consistency: With different models for read and write, achieving data consistency can be challenging. CQRS often goes hand-in-hand with the "eventual consistency" model, which might not be suitable for all applications.
Complexity overhead: Introducing CQRS can add complexity, especially if it's paired with other patterns like Event Sourcing. It's crucial to assess whether the benefits gained outweigh the added intricacy.
Increased development effort: Having separate models for reading and writing means, in essence, maintaining two distinct parts of the system. This can increase the initial development effort and the ongoing maintenance overhead.

Conclusion

CQRS offers a transformative approach to data management, bringing significant advantages in terms of performance, scalability, and code clarity. However, it's not a silver bullet. Like all architectural decisions, adopting CQRS should be a measured choice, considering both its benefits and the challenges it introduces.
For systems where scalability and performance are paramount, and where the trade-offs are acceptable, CQRS can be a game-changer, elevating the system’s robustness and responsiveness to new heights.
It's right in the middle of the busy conference season, and I was prepping for an upcoming conference talk. As I often do, I went to Neo4j Aura to spin up a free database and use Cypher with APOC to import data from an API, but this API requires a header, and the APOC procedure that adds headers to a request is blocked by security in Aura. Hmm, I needed a new route. I decided to try JBang, a tool for scripting with Java. I had heard about it but hadn't tried it yet. It's pretty cool, so I wanted to share my onboarding.

What Is JBang?

Java developers have lamented the lack of a scripting language for Java for years. JBang solves this problem. I found an excellent overview of JBang in a post on InfoQ (Scripting Java with a jBang):

JBang provides a way of running Java code as a script...[It] is a launcher script, written in bash and powershell, that can discover or download a JVM, and then (down)load the Java script given in an argument. The implementation of JBang is a Java JAR archive, which it then launches to execute further commands. JBang can run jsh or java files; the latter is a standard Java class with a main() method. However, unlike JShell, comments at the top of JBang allow dependencies to be automatically downloaded and set up on the classpath. JShell allows adding JARs to the classpath at launch, but any (recursive) dependencies have to be added manually.

JBang seems like a nicer alternative to either a full-fledged Java project or a Linux script. Let's get a bit more detail about the data API we will pull from before we dive into writing the script!

Setup: Install/Download

First, we need to install JBang from the download page: find the download for your operating system and choose an install type. Since I use SDKMan to manage my Java versions, I installed JBang with SDKMan, too.

Shell
sdk install jbang

Several IDEs have plugins for JBang as well, including IntelliJ. The IntelliJ plugin seems to have several nice features, including import suggestions. However, I had trouble utilizing it from an existing project or a randomly created script; I had to create a separate project initialized with JBang. I probably need to play with this a bit more, since it would simplify the import problem (discussed in a bit). Anyway, I decided to mess with the plugin later and just use the command line for now.

API Details

I wanted to import data for traveling with pets, and the Yelp Fusion API was one that I knew I wanted to use. This was also the one that required a header on the request, which led me down the path toward JBang in the first place. The Yelp API has a really useful playground where I could test a few requests before I started writing the script. I also used the playground to verify syntax and get sample code for an API call in Java.

Write the Script

In the playground, you can choose the endpoint you want to hit, any parameters, as well as the language you want to use to make the request.
I chose Java and the parameters I knew I needed, and it gave me the following code:

Java
OkHttpClient client = new OkHttpClient();

Request request = new Request.Builder()
    .url("https://api.yelp.com/v3/businesses/search?location=" + location + "&categories=" + category + "&attributes=dogs_allowed&limit=50&sort_by=distance")
    .get()
    .addHeader("accept", "application/json")
    .addHeader("Authorization", "Bearer " + yelpApiKey)
    .build();

Response response = client.newCall(request).execute();

Now, I tweaked the code above to use placeholder variables for location, category, and yelpApiKey so that I could pass in arbitrary values later. The code sample from the playground auto-includes your API token, so I copied/pasted the block above into my JBang script, and then I needed to go back and add dependencies. This was where JBang was a little less convenient and where an IDE plugin might come in handy. I had to go to Maven Central and search for the dependencies I needed (a skeleton showing where those dependency comments go appears at the end of this section). There isn't an auto-import, which makes sense, since we don't have a build tool like Maven or Gradle that could potentially search dependencies for useful import suggestions. I also wanted to pull pet travel data from several (of the many) categories Yelp offers. Since there is a high request limit but a smaller result limit, I decided to hit the endpoint for each category independently to retrieve the maximum results for each category. I also wanted a parameter for the location so that I could pull data for different cities. Finally, I needed a file to output the results so that I wouldn't have to hit the API each time I might want to load the data. I added the following variables to the script:

Java
String filename = "yelpApi.json";
String[] yelpCategories = {"active","arts","food","hotelstravel","nightlife","pets","restaurants","shopping"};
String location = "New%20York%20City";

Last but not least, I needed to create the JSON object to format and hold the results and then write that to the JSON file.

Java
try {
    JSONObject json = new JSONObject();
    JSONArray jsonArray = new JSONArray();
    String jsonData = "";
    OkHttpClient client = new OkHttpClient().newBuilder().connectTimeout(20, TimeUnit.SECONDS).build();

    for (String category : yelpCategories) {
        <API call>
        jsonData = response.body().string();
        JSONObject obj = new JSONObject(jsonData);
        JSONArray array = obj.getJSONArray("businesses");
        JSONObject place = new JSONObject();
        int n = array.length();
        for (int i = 0; i < n; ++i) {
            place = array.getJSONObject(i);
            if (!place.isEmpty()) {
                json.append(category, place);
            }
        }
    }

    FileWriter myWriter = new FileWriter(filename);
    myWriter.write(json.toString(4));
    myWriter.close();
    System.out.println("Successfully wrote to Yelp file.");
} catch (IOException e) {
    e.printStackTrace();
}

Following this, I needed a few more import statements. You might notice that I added a connect timeout to the request. This is because the servers for one of the APIs were sometimes a bit sluggish, and I decided to wrap the other API calls with the same timeout protection to prevent the script from hanging or erroring out. The full version of the code is available on GitHub.

Running the Script

To run, we use the command `jbang` plus the name of the script file:

Shell
jbang travelPetDataImport.java

This will run the script and output the results to the file we specified. We can check the file to make sure the data was written as we expected.
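Since the dependency comments themselves haven't been shown yet, here is a minimal, hypothetical skeleton of the top of such a script. The JBang header line and //DEPS comments are standard JBang conventions, but the coordinates and versions shown are illustrative:

Java
///usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS com.squareup.okhttp3:okhttp:4.12.0
//DEPS org.json:json:20231013

import okhttp3.OkHttpClient;
import org.json.JSONObject;

public class travelPetDataImport {
    public static void main(String[] args) throws Exception {
        // Script body goes here: build the OkHttp request per category,
        // aggregate the results into a JSONObject, and write the file.
    }
}

JBang reads the //DEPS comments, resolves those artifacts from Maven Central, and puts them on the classpath before running main(), which is exactly the convenience the InfoQ quote above describes.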
Wrap Up!

I was really impressed and happy with the capabilities and simplicity of JBang! It provided a straightforward way to write a script using the same Java syntax I'm comfortable with, and it was easy to get started. Next time, I'd like to figure out the IDE plugin so that I can hopefully take advantage of import suggestions and other efficiencies available. I'm looking forward to using JBang more in the future!

Resources

GitHub repository: Accompanying code for this blog post
Website: JBang
Documentation: JBang
Data: Yelp Fusion API
As Facebook's user base and social graph complexity have expanded exponentially, the need for a highly scalable and efficient data storage solution has become increasingly critical. Enter TAO (The Associations and Objects), Facebook's custom-built distributed data store, designed to manage the social graph and provide low-latency access to user data. In this article, we will take an in-depth look at TAO, exploring its technical features, architecture, and the role it plays in optimizing Facebook's performance.

TAO: A Graph-Based Data Model

At its essence, TAO is an elegant and efficient graph-based data model that comprises two primary entities: objects and associations. Objects are nodes within the social graph, representing users, pages, posts, or comments. Associations, on the other hand, symbolize relationships between these objects, such as friendships, likes, or shares. Objects are identified by a 64-bit Object Identifier (OID), while associations are represented by an Association Identifier (AID). Both objects and associations possess a type that delineates their schema and behavior within the system. Here's a simplified example:

Python
# Create a user object
user = Object("user", 123456789)
user.set("name", "John Doe")
user.set("birthdate", "1990-01-01")

# Create a page object
page = Object("page", 987654321)
page.set("title", "AI Chatbots")
page.set("category", "technology")

# Create an association between the user and the page
like = Association("like", user, page)
like.set("timestamp", "2021-01-01T12:34:56")

Dissecting TAO's Architecture

The TAO system is composed of three primary components: TAO clients, TAO servers, and caching layers. Let's delve into each component individually.

TAO Clients: Embedded within Facebook's web servers, TAO clients manage incoming read and write requests from users. They communicate with TAO servers to retrieve or update data, while also maintaining a local cache for rapid access to frequently requested data.
TAO Servers: Tasked with storing and managing TAO's actual data, these servers are organized into multiple clusters. Each cluster contains a partition of the overall data, which is replicated across several TAO servers to ensure fault tolerance and load balancing.
Caching Layers: TAO employs a two-tiered caching mechanism to optimize data access. The first level cache is maintained by TAO clients, while the second level cache is a separate distributed system called Memcache. By caching data at both levels, the need for resource-intensive database operations is minimized, thus enhancing overall performance.

Balancing Consistency and Performance

TAO is engineered to deliver eventual consistency, meaning that updates may not be immediately visible to all clients. However, this trade-off enables improved performance and scalability. To achieve this balance, TAO employs a combination of techniques, including:

Write-Through Caching: Upon receiving a write request, a client first updates its local cache before forwarding the request to the TAO server. This ensures that subsequent reads from the same client remain consistent with the latest updates.
Cache Invalidation: When a TAO server processes a write request, it broadcasts an invalidation message to all clients and Memcache servers. This mechanism guarantees that outdated data is eventually purged from caches and replaced with the most recent version.
Read Repair: If a client detects an inconsistency between its local cache and the TAO server, it can issue a read repair request to synchronize the local cache with the correct data.

Conclusion

TAO serves as a vital component of Facebook's infrastructure, facilitating the efficient storage and retrieval of billions of objects and associations within the social graph. Its distributed architecture, caching mechanisms, and consistency model have been meticulously designed to ensure high performance, scalability, and fault tolerance. By understanding the technical nuances of TAO, we can appreciate the challenges inherent in large-scale distributed systems and glean valuable insights for constructing our own scalable applications.
Asynchronous programming is a cornerstone of modern game development, enabling developers to execute multiple tasks simultaneously without blocking the main thread. This is crucial for maintaining smooth gameplay and enhancing the overall user experience. One of the most powerful yet often misunderstood features for achieving this in Unity is Coroutines. In this article, we will demystify Unity's Coroutines, starting with a foundational understanding of what coroutines are in the broader context of programming. We'll then narrow our focus to Unity's specific implementation of this concept, which allows for simplified yet robust asynchronous programming. By the end of this article, you'll have a solid grasp of how to start, stop, and effectively utilize coroutines in your Unity projects. Whether you're new to Unity or an experienced developer looking to deepen your understanding of this feature, this article aims to provide you with the knowledge you need to make your game development journey smoother and more efficient. Stay tuned as we delve into the ABCs of Unity's Coroutines: from basics to implementation.

Understanding Coroutines

Coroutines are a fascinating and powerful construct in the world of programming. They allow for a more flexible and cooperative form of multitasking compared to the traditional methods. Before diving into Unity's specific implementation, let's first understand what coroutines are in a general programming context. Coroutines are a type of control structure where flow control is cooperatively passed between two different routines without returning. In simpler terms, imagine you have two tasks, A and B. Normally, you would complete task A before moving on to task B. However, with coroutines, you can start task A, pause it, switch to task B, and then resume task A from where you left off. This is particularly useful for tasks that don't need to be completed in one go and can yield control to other tasks for better efficiency. Traditional functions and methods have a single entry point and a single exit point. When you call a function, it runs to completion before returning control back to the caller. Coroutines, on the other hand, can have multiple entry and exit points. They can pause their execution at specific points, yield control back to the calling function, and then resume from where they left off. In Unity, this ability to pause and resume makes coroutines extremely useful for a variety of tasks such as animations, timing, and handling asynchronous operations without the complexity of multi-threading.

Unity's Take on Coroutines

Unity's implementation of coroutines is a bit unique but very much in line with the engine's overall design philosophy of simplicity and ease of use. Unity uses C#'s IEnumerator interface for its coroutine functionality. This allows you to use the `yield` keyword to pause and resume the coroutine's execution. Here's a simple Unity C# example to demonstrate a coroutine:

C#
using System.Collections;
using UnityEngine;

public class CoroutineExample : MonoBehaviour
{
    // Start is called before the first frame update
    void Start()
    {
        StartCoroutine(SimpleCoroutine());
    }

    IEnumerator SimpleCoroutine()
    {
        Debug.Log("Coroutine started");
        yield return new WaitForSeconds(1);
        Debug.Log("Coroutine resumed");
        yield return new WaitForSeconds(1);
        Debug.Log("Coroutine ended");
    }
}

In this example, the SimpleCoroutine method is defined as an IEnumerator. Inside the coroutine, we use Debug.Log to print messages to the Unity console.
The yield return new WaitForSeconds(1); line pauses the coroutine for one second. After the pause, the coroutine resumes and continues its execution. This is a very basic example, but it demonstrates the core concept of how Unity's coroutines work. They allow you to write asynchronous code in a more straightforward and readable manner, without getting into the complexities of multi-threading or callback hell. In summary, Unity's coroutines offer a powerful yet simple way to handle asynchronous programming within the Unity engine. They leverage the IEnumerator interface and the yield keyword to provide a flexible mechanism for pausing and resuming tasks, making it easier to create smooth and responsive games.

Unity's Implementation of Coroutines

Unity's approach to coroutines is both elegant and practical, fitting well within the engine's broader design philosophy. The implementation is rooted in C#'s IEnumerator interface, which provides the necessary methods for iteration. This allows Unity to use the yield keyword to pause and resume the execution of a coroutine, making it a powerful tool for asynchronous programming within the Unity environment.

Explanation of How Unity Has Implemented Coroutines

In Unity, coroutines are essentially methods that return an IEnumerator interface. The IEnumerator interface is part of the System.Collections namespace and provides the basic methods for iterating over a collection. However, Unity cleverly repurposes this interface to control the execution flow of coroutines. Here's a simple example to illustrate how Unity's coroutines work:

C#
using System.Collections;
using UnityEngine;

public class CoroutineDemo : MonoBehaviour
{
    void Start()
    {
        StartCoroutine(MyCoroutine());
    }

    IEnumerator MyCoroutine()
    {
        Debug.Log("Coroutine started at time: " + Time.time);
        yield return new WaitForSeconds(2);
        Debug.Log("Coroutine resumed at time: " + Time.time);
    }
}

In this example, the MyCoroutine method returns an IEnumerator. Inside the coroutine, we log the current time, then use yield return new WaitForSeconds(2); to pause the coroutine for 2 seconds. After the pause, the coroutine resumes, and we log the time again. The IEnumerator interface is the cornerstone of Unity's coroutine system. It provides the methods MoveNext(), Reset(), and the property Current, which are used internally by Unity to control the coroutine's execution. When you use the yield keyword in a coroutine, you're essentially providing a point where the MoveNext() method will pause and later resume the execution. The yield return statement can take various types of arguments to control the coroutine's behavior. For example:

yield return null: Waits until the next frame.
yield return new WaitForSeconds(float): Waits for a specified time in seconds.
yield return new WaitForEndOfFrame(): Waits until the frame's rendering is done.
yield return new WWW(string): Waits for a web request to complete. (Note that WWW is deprecated in current Unity versions in favor of UnityWebRequest, but it still illustrates the idea of yielding on an asynchronous operation.)
Here's an example that combines multiple yield statements:

C#
using System.Collections;
using UnityEngine;

public class MultipleYields : MonoBehaviour
{
    void Start()
    {
        StartCoroutine(ComplexCoroutine());
    }

    IEnumerator ComplexCoroutine()
    {
        Debug.Log("Started at frame: " + Time.frameCount);
        yield return null;
        Debug.Log("One frame later: " + Time.frameCount);

        Debug.Log("Waiting for 2 seconds...");
        yield return new WaitForSeconds(2);
        Debug.Log("Resumed after 2 seconds.");

        yield return new WaitForEndOfFrame();
        Debug.Log("Waited for end of frame.");
    }
}

In this example, the ComplexCoroutine method uses different types of yield statements to control its execution flow. This showcases the flexibility and power of using the IEnumerator interface in Unity's coroutines. In summary, Unity's implementation of coroutines via the IEnumerator interface provides a robust and flexible way to handle asynchronous tasks within your game. Whether you're animating characters, loading assets, or making network calls, coroutines offer a straightforward way to perform these operations without blocking the main thread, thereby keeping your game smooth and responsive.

Starting and Stopping Coroutines

Understanding how to start and stop coroutines is crucial for effectively managing asynchronous tasks in Unity. The engine provides simple yet powerful methods to control the lifecycle of a coroutine, allowing you to initiate, pause, and terminate them as needed. In this section, we'll delve into these methods and their usage. Starting a coroutine in Unity is straightforward. You use the StartCoroutine() method, which is a member of the MonoBehaviour class. This method takes an IEnumerator as an argument, which is the coroutine you want to start. Here's a basic example:

C#
using System.Collections;
using UnityEngine;

public class StartCoroutineExample : MonoBehaviour
{
    void Start()
    {
        StartCoroutine(MyCoroutine());
    }

    IEnumerator MyCoroutine()
    {
        Debug.Log("Coroutine has started.");
        yield return new WaitForSeconds(2);
        Debug.Log("Two seconds have passed.");
    }
}

In this example, the Start() method calls StartCoroutine(MyCoroutine()), initiating the coroutine. The MyCoroutine method is defined as an IEnumerator, fulfilling the requirement for the StartCoroutine() method. You can also start a coroutine by passing a string name of the method:

C#
StartCoroutine("MyCoroutine");

However, this approach is generally less efficient and more error-prone, as it relies on reflection and won't be checked at compile-time.

How to Stop a Coroutine Using StopCoroutine() and StopAllCoroutines() Methods

Stopping a coroutine is just as important as starting one, especially when you need to manage resources or change the flow of your game dynamically. Unity provides two methods for this: StopCoroutine() and StopAllCoroutines().

StopCoroutine(): This method stops a specific coroutine. You can pass it the Coroutine reference returned by StartCoroutine(), the IEnumerator instance that was used to start the coroutine, or the string name of the coroutine method. Note that passing a fresh IEnumerator (e.g., calling StopCoroutine(MyCoroutine())) creates a new enumerator that doesn't match the running one, so the safest approach is to keep the reference returned by StartCoroutine():

C#
using System.Collections;
using UnityEngine;

public class StopCoroutineExample : MonoBehaviour
{
    private Coroutine runningCoroutine;

    IEnumerator MyCoroutine()
    {
        while (true)
        {
            Debug.Log("Coroutine is running.");
            yield return new WaitForSeconds(1);
        }
    }

    void Start()
    {
        // Keep the reference so we can stop this exact coroutine later.
        runningCoroutine = StartCoroutine(MyCoroutine());
    }

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space))
        {
            StopCoroutine(runningCoroutine);
            Debug.Log("Coroutine has been stopped.");
        }
    }
}

In this example, pressing the spacebar will stop the `MyCoroutine` coroutine, which is running in an infinite loop.
StopAllCoroutines(): This method stops all coroutines running on the current MonoBehaviour script.

C#
using System.Collections;
using UnityEngine;

public class StopAllCoroutinesExample : MonoBehaviour
{
    IEnumerator Coroutine1()
    {
        while (true)
        {
            Debug.Log("Coroutine1 is running.");
            yield return new WaitForSeconds(1);
        }
    }

    IEnumerator Coroutine2()
    {
        while (true)
        {
            Debug.Log("Coroutine2 is running.");
            yield return new WaitForSeconds(1);
        }
    }

    void Start()
    {
        StartCoroutine(Coroutine1());
        StartCoroutine(Coroutine2());
    }

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space))
        {
            StopAllCoroutines();
            Debug.Log("All coroutines have been stopped.");
        }
    }
}

In this example, pressing the spacebar will stop all running coroutines (Coroutine1 and Coroutine2) on the current MonoBehaviour. In summary, Unity provides a straightforward yet powerful set of tools for managing the lifecycle of coroutines. The StartCoroutine() method allows you to initiate them, while StopCoroutine() and StopAllCoroutines() give you control over their termination. These methods are essential for writing efficient and manageable asynchronous code in Unity.

Conclusion

Coroutines in Unity serve as a powerful tool for handling a variety of tasks in a more manageable and efficient manner. They offer a simplified approach to asynchronous programming, allowing developers to write cleaner, more readable code. By understanding the basics, you've laid the groundwork for diving into more complex and nuanced aspects of using coroutines in Unity. In this article, we've covered the foundational elements:

Understanding Coroutines: We started by defining what coroutines are in a general programming context, emphasizing their ability to pause and resume execution, which sets them apart from conventional functions and methods.
Unity's Implementation of Coroutines: We delved into how Unity has uniquely implemented coroutines using the `IEnumerator` interface. This implementation allows for the use of the yield keyword, which is central to pausing and resuming coroutine execution.
Starting and Stopping Coroutines: Finally, we explored the methods Unity provides for controlling the lifecycle of a coroutine. We discussed how to initiate a coroutine using StartCoroutine() and how to halt its execution using StopCoroutine() and StopAllCoroutines().

As we move forward, there are several advanced topics to explore:

Concept of Yielding: Understanding the different types of yield instructions can help you control your coroutines more effectively. This includes waiting for seconds, waiting for the end of the frame, or even waiting for asynchronous operations to complete.
Execution Flow: A deeper look into how coroutines affect the overall execution flow of your Unity project can provide insights into optimizing performance and resource management.
Practical Use-Cases: Coroutines are versatile and can be used in a myriad of scenarios like animations, AI behavior, procedural generation, and network calls, among others.

In the next article, we will delve into these advanced topics, providing you with the knowledge to leverage the full power of Unity's coroutines in your projects. Whether you're a beginner just getting your feet wet or a seasoned developer looking to optimize your code, understanding coroutines is a valuable skill that can elevate your Unity development experience.
This is an article from DZone's 2023 Kubernetes in the Enterprise Trend Report. For more: Read the Report

Kubernetes streamlines cloud operations by automating key tasks, specifically deploying, scaling, and managing containerized applications. With Kubernetes, you have the ability to group hosts running containers into clusters, simplifying cluster management across public, private, and hybrid cloud environments. AI/ML and Kubernetes work together seamlessly, simplifying the deployment and management of AI/ML applications. Kubernetes offers automatic scaling based on demand and efficient resource allocation, and it ensures high availability and reliability through replication and failover features. As a result, AI/ML workloads can share cluster resources efficiently with fine-grained control. Kubernetes' elasticity adapts to varying workloads and integrates well with CI/CD pipelines for automated deployments. Monitoring and logging tools provide insights into AI/ML performance, while cost-efficient resource management optimizes infrastructure expenses. This partnership streamlines the AI/ML development process, making it agile and cost-effective. Let's see how Kubernetes can join forces with AI/ML.

The Intersection of AI/ML and Kubernetes

The partnership between AI/ML and Kubernetes empowers organizations to deploy, manage, and scale AI/ML workloads effectively. However, running AI/ML workloads presents several challenges, and Kubernetes addresses those challenges effectively through:

Resource management – This allocates and scales CPU and memory resources for AI/ML Pods, preventing contention and ensuring fair distribution.
Scalability – Kubernetes adapts to changing AI/ML demands with auto-scaling, dynamically expanding or contracting clusters.
Portability – AI/ML models deploy consistently across various environments using Kubernetes' containerization and orchestration.
Isolation – Kubernetes isolates AI/ML workloads within namespaces and enforces resource quotas to avoid interference.
Data management – Kubernetes simplifies data storage and sharing for AI/ML with persistent volumes.
High availability – This guarantees continuous availability through replication, failover, and load balancing.
Security – Kubernetes enhances security with features like RBAC and network policies.
Monitoring and logging – Kubernetes integrates with monitoring tools like Prometheus and Grafana for real-time AI/ML performance insights.
Deployment automation – AI/ML models often require frequent updates. Kubernetes integrates with CI/CD pipelines, automating deployment and ensuring that the latest models are pushed into production seamlessly.

Let's look into the real-world use cases to better understand how companies and products can benefit from Kubernetes and AI/ML.
REAL-WORLD USE CASES

Recommendation systems – Personalized content recommendations in streaming services, e-commerce, social media, and news apps
Image and video analysis – Automated image and video tagging, object detection, facial recognition, content moderation, and video summarization
Natural language processing (NLP) – Sentiment analysis, chatbots, language translation, text generation, voice recognition, and content summarization
Anomaly detection – Identifying unusual patterns in network traffic for cybersecurity, fraud detection, and quality control in manufacturing
Healthcare diagnostics – Disease detection through medical image analysis, patient data analysis, drug discovery, and personalized treatment plans
Autonomous vehicles – Self-driving cars use AI/ML for perception, decision-making, route optimization, and collision avoidance
Financial fraud detection – Detecting fraudulent transactions in real time to prevent financial losses and protect customer data
Energy management – Optimizing energy consumption in buildings and industrial facilities for cost savings and environmental sustainability
Customer support – AI-powered chatbots, virtual assistants, and sentiment analysis for automated customer support, inquiries, and feedback analysis
Supply chain optimization – Inventory management, demand forecasting, and route optimization for efficient logistics and supply chain operations
Agriculture and farming – Crop monitoring, precision agriculture, pest detection, and yield prediction for sustainable farming practices
Language understanding – Advanced language models for understanding and generating human-like text, enabling content generation and context-aware applications
Medical research – Drug discovery, genomics analysis, disease modeling, and clinical trial optimization to accelerate medical advancements

Table 1

Example: Implementing Kubernetes and AI/ML

As an example, let's introduce a real-world scenario: a medical research system. The main purpose is to investigate and find the cause of Parkinson's disease. The system analyzes graphics (tomography data and images) and personal patient data (for patients who have permitted its use). The following is a simplified, high-level example:

Figure 1: Parkinson's disease medical research architecture

The architecture contains the following steps and components:

Data collection – gathering various data types, including structured, unstructured, and semi-structured data like logs, files, and media, in Azure Data Lake Storage Gen2
Data processing and analysis – utilizing Azure Synapse Analytics, powered by Apache Spark, to clean, transform, and analyze the collected datasets
Machine learning model creation and training – employing Azure Machine Learning, integrated with Jupyter notebooks, for creating and training ML models
Security and authentication – ensuring data and ML workload security and authentication through the Keycloak framework and Azure Key Vault
Container management – managing containers using Azure Container Registry
Deployment and management – using Azure Kubernetes Services to handle ML model deployment, with management facilitated through Azure VNets and Azure Load Balancer
Model performance evaluation – assessing model performance using log metrics and monitoring provided by Azure Monitor
Model retraining – retraining models as required with Azure Machine Learning

Now, let's examine security and how it lives in Kubernetes and AI/ML.
Now, let's examine security and how it lives in Kubernetes and AI/ML.

Data Analysis and Security in Kubernetes

In Kubernetes, data analysis involves processing and extracting insights from large datasets using containerized applications. Kubernetes simplifies data orchestration, ensuring data is available where and when it is needed, which is essential for machine learning, batch processing, and real-time analytics tasks.

ML analyses on Kubernetes require a strong security foundation, and robust practices are essential to safeguard data in AI/ML and Kubernetes environments. This includes encrypting data at rest and in transit, access control mechanisms, regular security audits, and monitoring for anomalies. Additionally, Kubernetes offers features like role-based access control (RBAC) and network policies to restrict unauthorized access.

To summarize, here is an AI/ML-on-Kubernetes security checklist (a sketch of the access control items follows the list):

Access control
- Set up RBAC for user permissions
- Create dedicated service accounts for ML workloads
- Apply network policies to control communication

Image security
- Only allow trusted container images
- Keep container images regularly updated and patched

Secrets management
- Securely store and manage sensitive data (Secrets)
- Rotate Secrets regularly

Network security
- Segment your network for isolation
- Enforce network policies for ingress and egress traffic

Vulnerability scanning
- Regularly scan container images for vulnerabilities
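To ground the access control items, here is a minimal sketch; the namespace, names, and labels are illustrative. A dedicated ServiceAccount runs the ML Pods, a Role grants only read access to ConfigMaps and Secrets, a RoleBinding ties the two together, and a NetworkPolicy admits traffic to the inference Pods only from Pods labeled as the API gateway.

# Dedicated identity for ML workloads
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ml-worker
  namespace: ml-workloads
---
# Least-privilege role: read-only access to config and secrets
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ml-worker-role
  namespace: ml-workloads
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
---
# Bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ml-worker-binding
  namespace: ml-workloads
subjects:
  - kind: ServiceAccount
    name: ml-worker
    namespace: ml-workloads
roleRef:
  kind: Role
  name: ml-worker-role
  apiGroup: rbac.authorization.k8s.io
---
# Only Pods labeled as the API gateway may reach the inference Pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ml-inference-ingress
  namespace: ml-workloads
spec:
  podSelector:
    matchLabels:
      app: model-inference
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: api-gateway  # hypothetical label for allowed callers

Pods running as ml-worker then hold no more API permissions than the Role grants, and any ingress traffic not matched by the policy is dropped.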
Last but not least, let's look into distributed ML in Kubernetes.

Distributed Machine Learning in Kubernetes

Security is an important topic; however, selecting the proper distributed ML framework solves many problems as well. Distributed ML frameworks and Kubernetes together provide the scalability, security, resource management, and orchestration capabilities essential for efficiently handling the computational demands of training complex ML models on large datasets. Here are a few popular open-source distributed ML frameworks and libraries compatible with Kubernetes (a job sketch follows the list):

TensorFlow – An open-source ML framework that provides tf.distribute.Strategy for distributed training. Kubernetes can manage TensorFlow tasks across a cluster of containers, enabling distributed training on extensive datasets.
PyTorch – Another widely used ML framework that can be employed in a distributed manner within Kubernetes clusters. It facilitates distributed training through tools like PyTorch Lightning and Horovod.
Horovod – A distributed training framework, compatible with TensorFlow, PyTorch, and MXNet, that integrates seamlessly with Kubernetes. It allows training tasks to be parallelized across multiple containers.

These are just a few of the many great platforms available.
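As a sketch of the cluster side of distributed training, here is a hypothetical TFJob manifest. It assumes the Kubeflow Training Operator is installed in the cluster, and the image name is a placeholder for a training image whose script uses tf.distribute.MultiWorkerMirroredStrategy. The operator creates the chief and worker Pods and injects the TF_CONFIG environment variable that the strategy reads to discover its peers.

# Distributed TensorFlow training managed by the Kubeflow Training Operator
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: distributed-training
  namespace: ml-workloads
spec:
  tfReplicaSpecs:
    Chief:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow  # the operator expects this container name
              image: registry.example.com/ml/tf-train:1.0  # placeholder training image
    Worker:
      replicas: 3
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: registry.example.com/ml/tf-train:1.0  # placeholder training image
              resources:
                requests:
                  cpu: "4"
                  memory: 16Gi

Each replica runs the same training script; the strategy splits each batch across workers and aggregates gradients synchronously.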
Finally, let's summarize how we can benefit from using AI and Kubernetes in the future.

Conclusion

In this article, we reviewed real-world use cases spanning various domains, including healthcare, recommendation systems, and medical research, and we walked through a practical example that illustrates the application of AI/ML and Kubernetes in a medical research setting.

Kubernetes and AI/ML are essential together because Kubernetes provides a robust and flexible platform for deploying, managing, and scaling AI/ML workloads. It enables efficient resource utilization, automatic scaling, and fault tolerance, which are critical for handling the resource-intensive and dynamic nature of AI/ML applications. It also promotes containerization, simplifying the packaging and deployment of AI/ML models and ensuring consistent environments across all stages of the development pipeline. Overall, Kubernetes enhances the agility, scalability, and reliability of AI/ML deployments, making it a fundamental tool in modern software infrastructure.