API development is a big part of what I love about software. Whether it’s building integrations or crafting APIs for decoupled web applications, it’s usually just me and the code. Most of the time, I work as a solo API developer. Going solo has its perks: fast decisions and full control. But it's a double-edged sword since keeping everything in my head makes handoffs and delegation tricky. And going solo limits the size and complexity of projects I can work on. After all, I am just one person. Postman is my primary tool for API work — sending requests, managing environments, and running tests. I’m familiar with my solo workflow. But I’ve started to wonder: in a team environment, what more can Postman offer? How might it enhance collaboration and streamline the dev process? To explore these questions, I started working on an example API, which I call “X Nihilo.” Example API: X Nihilo X Nihilo helps you generate 280-character tweets based on parameters you store or send. You provide a topic, a goal, a description of the tone to take, and a description of the audience. Behind the scenes, the API will send a request to OpenAI’s API for text completion, which will assist in generating the tweet. In addition, you can save the strings you use for tone and audience then reuse them in subsequent tweet requests. Let’s walk through my basic API dev workflow. Going At It Alone: My Solo Workflow The first step in my workflow is to design the API and write up an OpenAPI spec. In Postman, I created a new API, and then I started a new API definition. After some thinking (and working directly with ChatGPT, which was great for generating an initial OpenAPI spec based on my descriptions), I had my spec written: With my OpenAPI spec in place, I came to a fork in the road. Should I set up a mock server and some example requests and responses to show what it would look like to interact with this API? Or should I start writing implementation code? As a solo developer, I can only be an API producer or an API consumer at any given time. So I decided: no need to build mocks — the consumer in me would have to wait. Let’s write some code! A Few Moments Later… Using Node.js with Express, and talking to a PostgreSQL database, I had my basic API implemented. Here’s a rundown of everything that I needed to build: POST /signin takes a username and password, authenticates against records in the database, and then returns a signed JWT which can be used in all subsequent requests. POST /generateTweet generates a 280-character (max) tweet. It takes a topic (string) and a goal (string). It also takes either a tone (string) or a toneId (integer ID of a stored tone), along with either an audience (string) or an audienceId (integer ID of a stored audience). Whenever tone and/or audience strings are provided, the API will save these to the database. GET /tones returns a list of tone IDs and corresponding strings. GET /audiences does the same for reusable audience strings. DELETE /tones takes a tone ID and deletes that tone record. DELETE /audiences does the same for audience records. After the initial implementation was done, it was time to get back to Postman to start running some requests. Create an Environment With Variables First, I created a new environment called “Test.” I added variables to store my root_url along with a valid username and password. Create a Collection and a Request Then, I created a new collection and added my first request: a POST request to /signin, to try out authentication. 
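For a sense of what that request hits on the server side, here is a minimal sketch of how a /signin route like mine could be implemented in Express with TypeScript. This is an illustrative sketch rather than the actual X Nihilo code; the credential check is a placeholder and the secret handling is simplified:

TypeScript
import express from "express";
import jwt from "jsonwebtoken";

const app = express();
app.use(express.json());

// Placeholder for a real lookup against the PostgreSQL users table.
const checkCredentials = async (username: string, password: string): Promise<boolean> => {
  return username.length > 0 && password.length > 0; // sketch-only logic
};

app.post("/signin", async (req, res) => {
  const { username, password } = req.body;
  if (!(await checkCredentials(username, password))) {
    return res.status(401).json({ error: "Invalid username or password" });
  }
  // Sign a JWT that the client sends as a Bearer token on all subsequent requests.
  const token = jwt.sign({ sub: username }, process.env.JWT_SECRET as string, { expiresIn: "1h" });
  return res.json({ token });
});

app.listen(3000);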
With my API server running in a terminal window, I sent my first request. Success! I got my token, which I would need in any future requests. I created a new request, this time to generate a tweet. I made sure to set the Authorization to use “Bearer Token,” and I provided the token that I just received from the previous request. Here’s the response: It works! Summing up the Solo Approach That’s a basic sneak peek into my solo workflow. Of course, I would do a few other things along the way: Write a pre-request script to perform a /signin request and then set an environment variable based on the token in the response. Create requests for all other endpoints in the OpenAPI spec. Write tests for each endpoint, making sure to cover my edge cases. If I’m working solo, this basic workflow gets me pretty close to the finish line. While that’s fine, I know that I’m probably only scratching the surface of available features in Postman. And I know that I would need a lot more from Postman if I was working on a team. The Aha! Moment: Why Consider Postman for Teams? What if I could no longer be a solo API developer for X Nihilo? This could happen due to several reasons: X Nihilo grows in size and complexity, and a single API developer is no longer enough to support it. X Nihilo is only a small part of a larger API project involving multiple API developers or maybe even multiple API teams. Not all of my API projects for clients will be small ones that I can build on my own. At times, I’ll need to be part of a team that builds an API. I might even need to lead the API team. When that happens, I would need to leave my solo mindset behind and leave my solo way of doing things in Postman. That motivated me to look into Postman’s team-related features. Exploring Postman's Team (Enterprise) Features Postman has a free tier, and it offers some limited collaboration features, which might be sufficient if you’re a small team (meaning not more than three developers). Postman has additional features as part of its Enterprise Essentials tier. These are great for API teams in larger organizations that use Postman across the board. Workspace Sharing A workspace lets your teams collaborate on multiple API projects. This is great if you have different teams working on different APIs, but those APIs interact with one another (which is typically the case in larger software organizations). Workspaces are excellent for enabling real-time collaboration. Team members can edit API documentation, work together on crafting requests or writing tests, and build out a mock server that the entire team can use. As edits are made, they’re synced with the entire team in real-time. Version Control While I cared about version control for my code, as a solo API developer, I didn’t care much about version control of my Postman collections or environments. If I change something (modify a request, update a test, remove an environment variable), I’m the only one affected. No big deal. When you work on a team, you definitely want to know when things change. And sometimes, you need to roll back changes. Fortunately, for teams using Postman Enterprise, Postman gives you access to a changelog for collections, workspaces, and APIs. You can roll back collections to earlier points in time. As an API developer on a team, you’ll need this. Most of the time, it’s because Bob screwed up the collection. Sometimes, though, you’re Bob. Role-Based Access and User Organization Not everybody in a Postman workspace should have the same level of permissions. 
Some team members are testers, and they just need the ability to send requests and run tests — but not modify them. Others might be designers who are only allowed to modify API definitions. In Postman, you can assign roles to team members. This affects the kind of access and level of permission that each team member has within a team or workspace. Teams can also have private workspaces, restricting access from others outside the team. I also noticed that Postman Enterprise supports “domain capture.” This basically means you can set up all the users in your organization by giving access to everyone from the (for example) mycompany.biz domain. Inline Comments and Discussions Postman does have one collaboration feature, which is available on all its plans, not just its Enterprise Essentials. This is the ability for team members to add comments to collections (on folders, requests, examples, or pull requests). Being able to comment and discuss directly in Postman is huge for team API development. Since most of your team’s API dev work will happen in Postman, it makes sense to have your discussion there (instead of in GitHub or some other external service). Team API Development: Postman for the Win Whenever I join a team, I bring the tools that I love using, but I don’t really push them on my teammates. I figure they’ve got their own tools. So, to each their own. But I have a feeling that most API developers have Postman in their toolbelt. It would just be a shame if several API developers on a team all used Postman individually, with a solo mindset, and without taking advantage of some of these team features. If they took advantage of Postman Enterprise features, they would get… Increased Efficiency In a shared workspace, you get to start with the shared API definition. Whenever a team member edits the definition (or documentation or tests or environments), the edits sync up, and everybody has the latest and greatest. Everybody works off the same set of requests in a collection, and everybody has access to all the tests that will be used. All of these conveniences will improve a team’s efficiency, which in turn will skyrocket their development velocity. Fewer Misunderstandings When everybody is working from the same source of truth (your workspace), and they can make comments and converse within the tool they’re using for development, this will lead to fewer misunderstandings. Your team won’t lose time to confusion over whether that endpoint was supposed to take query params or path params. Everything will be clear. Key Takeaway I didn’t know that Postman was for teams and enterprises. Now that I know, here’s my big takeaway: Team players use tools that are good for the team. Postman has been great for me when I’m a solo API developer. Fortunately, it also has features that would make it great for me when I’m on an API dev team. For me, that’s real-time sync of edits within collections, changelogs, and version control, and granular permissions for access. Now, I’m excited to take on larger API projects with larger teams. Happy coding!
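One small appendix to the solo workflow described earlier: the pre-request script that signs in and stores the token in an environment variable could look roughly like this in Postman's JavaScript scripting sandbox (the variable and field names are assumptions):

JavaScript
// Pre-request script: sign in once and store the JWT for the collection's requests.
pm.sendRequest({
    url: pm.environment.get("root_url") + "/signin",
    method: "POST",
    header: { "Content-Type": "application/json" },
    body: {
        mode: "raw",
        raw: JSON.stringify({
            username: pm.environment.get("username"),
            password: pm.environment.get("password")
        })
    }
}, (err, response) => {
    if (!err && response.code === 200) {
        // Subsequent requests can reference {{token}} as a Bearer token.
        pm.environment.set("token", response.json().token);
    }
});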
The topic of Serverless testing is a hot one at the moment. There are many different approaches and opinions on how best to do it. In this post, I'm going to share some advice on how we tackled this problem, what the benefits of our approach are, and how things could be improved. The project in question is Stroll Insurance, a fully Serverless application running on AWS. In previous posts, we have covered some of the general lessons learned from this project, but in this post, we are going to focus on testing. For context, the web application is built with React and TypeScript, which makes calls to an AppSync API that uses Lambda and DynamoDB datasources. We use Step Functions to orchestrate the flow of events for complex processing like purchasing and renewing policies, and we use S3 and SQS to process document workloads.

The Testing 'Triangle'

When the project started, it relied heavily on unit testing. This isn't necessarily a bad thing, but we needed a better balance between getting features delivered and maintaining quality. Our testing pyramid looked more like a lopsided triangle: essentially, we had an abundance of unit tests and very few E2E tests. This worked really well for the initial stages of the project, but as the product and AWS footprint grew in complexity, we could see that a number of critical parts of the application had no test coverage. Specifically, we had no tests for:

- Direct service integrations used by AppSync resolvers and Step Functions
- Event-driven flows like document processing

These underpin critical parts of the application. If they stop working, people will be unable to purchase insurance! One problem that we kept experiencing was that unit tests would continue to pass after a change to a Lambda function, but subsequent deployments would fail. Typically, this was because the developer had forgotten to update the permissions in CDK. As a result, we created a rule that everyone had to deploy and test their changes locally first, in their own AWS sandbox, before merging. This worked, but it was an additional step that could easily be forgotten, especially when under pressure or time constraints.

Balancing the Triangle

So, we agreed that it was time to address the elephant in the room. Where are the integration tests? Our motivation was simple: there is a lack of confidence when we deploy to production, meaning we perform a lot of manual checks before we deploy, and sometimes these don't catch everything. This increases our lead time and reduces our deployment frequency. We would like to invert this. The benefits for our client were clear:

- Features go live quicker, reducing time to market while still maintaining quality
- Gives them a competitive edge
- Shortens the feedback loop, enabling iteration on ideas over a shorter period of time
- Critical aspects of the application are tested
- Issues can be diagnosed quicker
- Complex bugs can be reproduced
- Reduces the risk of lost business

Integration testing can mean a lot of different things to different teams, so our definition was this: an integration test in this project is one that validates integrations with AWS services (e.g., DynamoDB, S3, SQS) but not third parties. Those should be mocked out instead.

Breaking Down the Problem

We decided to start small by first figuring out how to test a few critical paths that had caused us issues in the past.
We made a list of how we “trigger” a workload:

- S3 → SQS → Lambda
- DynamoDB Stream → SNS → Lambda
- SQS → Lambda
- Step Function → Lambda

The pattern that emerged was that we have an event that flows through a messaging queue, primarily SQS and SNS. There are a number of observations we can make about this:

- There's no real business logic to test until a Lambda function or a State Machine is executed, but we still want to test that everything is hooked up correctly.
- We have the most control over the Lambda functions, and it will be easier to control the test setup in there.
- We want to be able to put a function or a State Machine into “test mode” so that it will know when to make mocked calls to third parties.
- We want to keep track of test data that is created so we can clean it up afterward.

Setting the Test Context

One of the most critical parts of the application is how we process insurance policy documents. This has enough complexity for us to develop a good pattern for writing our tests so that other engineers could build upon it in the future. This was the first integration test we were going to write. The flow is like this:

- The file is uploaded to the S3 bucket.
- This event is placed onto an SQS queue with a Lambda trigger.
- The Lambda function reads the PDF metadata and determines who the document belongs to. It fetches some data from a third-party API relating to the policy and updates a DynamoDB table.
- The file is moved to another bucket for further processing.

We wanted to assert that:

- The file no longer exists in the source bucket
- The DynamoDB table was updated with the correct data
- The file exists in the destination bucket

This would be an incredibly valuable test. Not only does it verify that the workload is behaving correctly, but it also verifies that the deployed infrastructure is working properly and that it has the correct permissions. For this to work, we needed to make the Lambda function aware that it was running as part of a test so that it would use a mocked response instead. The solution that we came up with was to attach some additional metadata to the object when it was uploaded at the start of the test case: an is-test flag set as object metadata (a sketch of the upload appears a little further below). If the S3 object is moved to another bucket as part of its processing, then we also copy its metadata. The metadata is never lost, even in more complex or much larger end-to-end workflows.

The Middy Touch

Adding the is-test flag to our object metadata gave us our way of passing some kind of test context into our workload. The next step was to make the Lambda function capable of discovering the context and then using that to control how it behaves under test. For this, we used Middy. If you're not familiar, Middy is a middleware framework specifically designed for Lambda functions. Essentially, it allows you to wrap your handler code up so that you can do some before and after processing. I'm not going to do a Middy deep dive here, but the documentation is great if you haven't used it before. We were already using Middy for various different things, so it was a great place to do some checks before we executed our handler. The logic is simple: in the before phase of the middleware, check for the is-test flag in the object's metadata, and if true, set a global test context so that the handler is aware it's running as part of a test.
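Before moving on to the after phase, here is a minimal sketch of the test-side upload that attaches the flag. This is an assumption of how it could look using the AWS SDK v3 for JavaScript; the helper name, bucket, and key are illustrative rather than the project's actual code:

TypeScript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { readFile } from "node:fs/promises";

const s3 = new S3Client({});

// Upload the source document with the is-test flag attached as S3 object metadata.
// The bucket, key, and file path are placeholders for this sketch.
export const uploadTestDocument = async (bucket: string, key: string, localPath: string): Promise<void> => {
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: key,
    Body: await readFile(localPath),
    Metadata: { "is-test": "true" }, // exposed to consumers as the x-amz-meta-is-test header
  }));
};

Because the flag travels with the object (and is copied when the object moves between buckets), every downstream Lambda invocation can discover it from the object's metadata.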
In the after phase (which is triggered after the handler is finished), clear the context to avoid any issues for subsequent invocations of the warmed-up function:

TypeScript
export const S3SqsEventIntegrationTestHandler = (logger: Logger): middy.MiddlewareObj => {
  // this happens before our handler is invoked.
  const before: middy.MiddlewareFn = async (request: middy.Request<SQSEvent>): Promise<void> => {
    const objectMetadata = await getObjectMetadata(request.event);
    const isIntegrationTest = objectMetadata.some(metadata => metadata["is-test"] === "true");
    setTestContext({ isIntegrationTest });
  };

  // this happens after the handler is invoked.
  const after: middy.MiddlewareFn = (): void => {
    setTestContext({ isIntegrationTest: false });
  };

  return { before, after, onError };
};

Here's the test context code. It follows a simple TypeScript pattern to make the context read-only:

TypeScript
export interface TestContext {
  isIntegrationTest: boolean
}

const _testContext: TestContext = {
  isIntegrationTest: false
};

export const testContext: Readonly<TestContext> = _testContext;

export const setTestContext = (updatedContext: TestContext): void => {
  _testContext.isIntegrationTest = updatedContext.isIntegrationTest;
};

I think this is the hardest part about solving the Serverless testing “problem.” I believe the correct way to do this is in a real AWS environment, not a local simulator, and just making that deployed code aware that it is running as part of a test is the trickiest part. Once you have some kind of pattern for that, the rest is straightforward enough. We then built upon this pattern for each of our various triggers, building up a set of middleware handlers for each trigger type. For our S3 middleware, we pass the is-test flag in an object's metadata, but for SQS and SNS, we pass the flag using message attributes.

A Note on Step Functions

By far the most annoying trigger to deal with was Lambda Functions invoked by a State Machine task. There is no easy way of passing metadata around each of the states in a State Machine - a global state would be really helpful (but would probably be overused and abused by people). The only thing that is globally accessible by each state is the Context Object. Our workaround was to use a specific naming convention when the State Machine is executed, with the execution name included in the Context Object and therefore available to every state in the State Machine. For State Machines that are executed by a Lambda Function, we can use our testContext to prefix all State Machine executions with "IntegrationTest-". This is obviously a bit of a hack, but it does make it easy to spot integration test runs in the execution history of the State Machine. We then make sure that the execution name is passed into each Lambda Task and that our middleware is able to read the execution name from the event. (Note that $$ provides access to the Context Object.) Another difficult thing to test with Step Functions is error scenarios. These will often be configured with retry and backoff functionality, which can make tests too slow to execute. Thankfully, there is a way around this, which my colleague, Tom Bailey, has covered in a great post. I would recommend giving that a read.

Mocking Third-Party APIs

We're now at a point where a Lambda Function is being invoked as part of our workload under test. That function is also aware that it's running as part of a test. The next thing we want to do is determine how we can mock the calls to our third-party APIs.
There are a few options here:

- Wiremock: You could host something like Wiremock in the AWS account and call the mocked API rather than the real one. I've used Wiremock quite a bit, and it works really well, but it can be difficult to maintain as your application grows. Plus, it's another thing that you have to deploy and maintain.
- API Gateway: Either spin up your own custom API for this or use the built-in mock integrations.
- DynamoDB: This is our current choice. We have a mocked HTTP client that, instead of making an HTTP call, queries a DynamoDB table for a mocked response, which has been seeded before the test has run.

Using DynamoDB gave us the flexibility we needed to control what happens for a given API call without having to deploy a bunch of additional infrastructure. (A rough sketch of this mocked client is included at the end of the post.)

Asserting That Something Has Happened

Now, it's time to determine if our test has actually passed or failed. A typical test would be structured like this:

TypeScript
it("should successfully move documents to the correct place", async () => {
  const seededPolicyData = await seedPolicyData();

  await whenDocumentIsUploadedToBucket();

  await thenDocumentWasDeletedFromBucket();
  await thenDocumentWasMovedToTheCorrectLocation();
});

With our assertions making use of the aws-testing-library:

TypeScript
async function thenDocumentWasMovedToTheCorrectLocation(): Promise<void> {
  await expect({
    region,
    bucket: bucketName,
  }).toHaveObject(expectedKey);
}

The aws-testing-library gives you a set of really useful assertions with built-in delays and retries. For example, checking an item exists in DynamoDB:

TypeScript
await expect({
  region: 'us-east-1',
  table: 'dynamo-db-table',
}).toHaveItem({
  partitionKey: 'itemId',
});

Checking an object exists in an S3 bucket:

TypeScript
await expect({
  region: 'us-east-1',
  bucket: 's3-bucket',
}).toHaveObject('object-key');

Checking if a State Machine is in a given state:

TypeScript
await expect({
  region: 'us-east-1',
  stateMachineArn: 'stateMachineArn',
}).toBeAtState('ExpectedState');

It's important to note that because you're testing in a live, distributed system, you will have to allow for cold starts and other non-deterministic delays when running your tests. It certainly took us a while to get the right balance between retries and timeouts. While at times it has been flaky, the benefits of having these tests far outweigh the occasional test failure.

Running the Tests

There are two places where these tests get executed: developer machines and CI. Each developer on our team has their own AWS account. They regularly deploy a full version of the application and run these integration tests against it. What I really like to do is get into a test-driven development flow where I write the integration test first, make my code changes, which are hot-swapped using CDK, and then run my integration test until it turns green. This would be pretty painful if I were waiting on a full stack to deploy each time, but Hot Swap works well at reducing the deployment time. On CI, we run these tests against a development environment after a deployment has finished.

It Could Be Better

There are a number of things that we would like to improve upon in this approach:

- Temporary environments: We would love to run these tests against temporary environments when a Pull Request is opened.
- Test data cleanup: Sometimes, tests are flaky and don't clean up after themselves properly. We have toyed with the idea of setting a TTL on DynamoDB records when data is created as part of a test.
- Run against production: We don't run these against production yet, but that is the goal.
- Open source the middleware: I think more people could make use of the middleware than just us, but we haven't got around to open-sourcing it yet.
- AWS is trying to make it better: Serverless testing is a hot topic at the moment, and AWS has responded with some great resources.

Summary

While there are still some rough edges to our approach, the integration tests really helped with the issues we have already outlined, and the results can be nicely summarised with three of the four key DORA metrics:

- Deployment frequency: The team's confidence increased when performing deployments, which increased their frequency.
- Lead time for changes: Less need for manual testing reduced the time it takes for a commit to make it to production.
- Change failure rate: Permissions errors no longer happen in production, and bugs are caught sooner in the process. The percentage of deployments causing a failure in production was reduced.
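For readers curious what the DynamoDB-backed mock mentioned earlier might look like, here is a rough sketch. The client interface, table name, and key layout are assumptions made for illustration; they are not the project's actual code:

TypeScript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";
import { testContext } from "./test-context";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// A hypothetical interface that the real HTTP client for the third-party policy API also implements.
export interface PolicyApiClient {
  getPolicy(policyId: string): Promise<Record<string, unknown>>;
}

// Instead of making an HTTP call, look up a response that the test seeded into DynamoDB beforehand.
export class MockedPolicyApiClient implements PolicyApiClient {
  constructor(private readonly tableName: string) {}

  async getPolicy(policyId: string): Promise<Record<string, unknown>> {
    const result = await docClient.send(new GetCommand({
      TableName: this.tableName,
      Key: { pk: `MOCK#getPolicy#${policyId}` }, // assumed key layout for seeded responses
    }));
    if (!result.Item) {
      throw new Error(`No mocked response seeded for policy ${policyId}`);
    }
    return result.Item.response as Record<string, unknown>;
  }
}

// The handler picks the mocked client only when the middleware has flagged an integration test run.
export const createPolicyApiClient = (realClient: PolicyApiClient, mockTableName: string): PolicyApiClient =>
  testContext.isIntegrationTest ? new MockedPolicyApiClient(mockTableName) : realClient;

The test would seed an item with the matching key (and, as discussed above, possibly a TTL) before uploading the document that kicks off the workload.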
In the dynamic landscape of web development, the choice of an API technology plays a pivotal role in determining the success and efficiency of a project. In this article, we embark on a comprehensive exploration of three prominent contenders: REST, gRPC, and GraphQL. Each of these technologies brings its own set of strengths and capabilities to the table, catering to different use cases and development scenarios. What Is REST? REST API, or Representational State Transfer Application Programming Interface, is a set of architectural principles and conventions for building web services. It provides a standardized way for different software applications to communicate with each other over the Internet. REST is often used in the context of web development to create scalable and maintainable APIs that can be easily consumed by a variety of clients, such as web browsers or mobile applications. Key characteristics of a REST API include: Statelessness: Each request from a client to a server contains all the information needed to understand and process the request. The server does not store any information about the client's state between requests. This enhances scalability and simplifies the implementation on both the client and server sides. Resource-based: REST APIs are centered around resources, which are identified by URLs (Uniform Resource Locators). These resources can represent entities like objects, data, or services. CRUD (Create, Read, Update, Delete) operations are performed on these resources using standard HTTP methods like GET, POST, PUT, and DELETE. Representation: Resources are represented in a format such as JSON (JavaScript Object Notation) or XML (eXtensible Markup Language). Clients can request different representations of a resource, and the server will respond with the data in the requested format. Uniform interface: REST APIs maintain a uniform interface, making it easy for developers to understand and work with different APIs. This uniformity is achieved through a set of constraints, including statelessness, resource-based representation, and standard HTTP methods. Stateless communication: Communication between the client and server is stateless, meaning that each request from the client contains all the information necessary for the server to fulfill that request. The server does not store any information about the client's state between requests. Client-server architecture: REST APIs follow a client-server architecture, where the client and server are independent entities that communicate over a network. This separation allows for flexibility and scalability, as changes to one component do not necessarily affect the other. Cacheability: Responses from the server can be explicitly marked as cacheable or non-cacheable, allowing clients to optimize performance by caching responses when appropriate. REST APIs are widely used in web development due to their simplicity, scalability, and compatibility with the HTTP protocol. They are commonly employed to enable communication between different components of a web application, including front-end clients and back-end servers, or to facilitate integration between different software systems. Pros and Cons of REST REST has several advantages that contribute to its widespread adoption in web development. One key advantage is its simplicity, as RESTful APIs are easy to understand and implement. This simplicity accelerates the development process and facilitates integration between different components of a system. 
The statelessness of RESTful communication allows for easy scalability, as each request from the client contains all the necessary information, and servers don't need to maintain client state between requests. REST's flexibility, compatibility with various data formats (commonly JSON), and support for caching enhance its overall performance. Its well-established nature and support from numerous tools and frameworks make REST a popular and accessible choice for building APIs. However, REST does come with certain disadvantages. One notable challenge is the potential for over-fetching or under-fetching of data, where clients may receive more information than needed or insufficient data, leading to additional requests. The lack of flexibility in data retrieval, especially in scenarios where clients require specific data combinations, can result in inefficiencies. Additionally, while REST is excellent for stateless communication, it lacks built-in support for real-time features, requiring developers to implement additional technologies or workarounds for immediate data updates. Despite these limitations, the advantages of simplicity, scalability, and widespread support make REST a robust choice for many web development projects.

What Is gRPC?

gRPC, which stands for "gRPC Remote Procedure Calls," is an open-source RPC (Remote Procedure Call) framework developed by Google. It uses HTTP/2 as its transport protocol and Protocol Buffers (protobuf) as the interface description language. gRPC facilitates communication between client and server applications, allowing them to invoke methods on each other as if they were local procedures, making it a powerful tool for building efficient and scalable distributed systems. Key features of gRPC include: Performance: gRPC is designed to be highly efficient, leveraging the capabilities of HTTP/2 for multiplexing multiple requests over a single connection. It also uses Protocol Buffers, a binary serialization format, which results in faster and more compact data transmission compared to traditional text-based formats like JSON. Language agnostic: gRPC supports multiple programming languages, enabling developers to build applications in languages such as Java, C++, Python, Go, Ruby, and more. This language-agnostic nature promotes interoperability between different components of a system. IDL (Interface Definition Language): gRPC uses Protocol Buffers as its IDL for defining the service methods and message types exchanged between the client and server. This provides a clear and structured way to define APIs, allowing for automatic code generation in various programming languages. Bidirectional streaming: One of gRPC's notable features is its support for bidirectional streaming. This means that both the client and server can send a stream of messages to each other over a single connection, providing flexibility in communication patterns. Code generation: gRPC generates client and server code based on the service definition written in Protocol Buffers. This automatic code generation simplifies the development process and ensures that the client and server interfaces are in sync. Strong typing: gRPC uses strongly typed messages and service definitions, reducing the chances of runtime errors and making the communication between services more robust. Support for authentication and authorization: gRPC supports various authentication mechanisms, including SSL/TLS for secure communication. It also allows for the implementation of custom authentication and authorization mechanisms. gRPC is particularly well-suited for scenarios where high performance, scalability, and efficient communication between distributed systems are critical, such as in microservices architectures. Its use of modern protocols and technologies makes it a compelling choice for building complex and scalable applications.

Pros and Cons of gRPC

gRPC presents several advantages that contribute to its popularity in modern distributed systems. One key strength is its efficiency, as it utilizes the HTTP/2 protocol, enabling multiplexing of multiple requests over a single connection and reducing latency. This efficiency, combined with the use of Protocol Buffers for serialization, results in faster and more compact data transmission compared to traditional REST APIs, making gRPC well-suited for high-performance applications. The language-agnostic nature of gRPC allows developers to work with their preferred programming languages, promoting interoperability in heterogeneous environments. The inclusion of bidirectional streaming and strong typing through Protocol Buffers further enhances its capabilities, offering flexibility and reliability in communication between client and server components. While gRPC offers substantial advantages, it comes with certain challenges. One notable drawback is the learning curve associated with adopting gRPC, particularly for teams unfamiliar with Protocol Buffers and the concept of remote procedure calls. Debugging gRPC services can be more challenging due to the binary nature of Protocol Buffers, requiring specialized tools and knowledge for effective troubleshooting. Additionally, the maturity of the gRPC ecosystem may vary across different languages and platforms, potentially impacting the availability of third-party libraries and community support. Integrating gRPC into existing systems or environments that do not fully support HTTP/2 may pose compatibility challenges, requiring careful consideration before migration. Despite these challenges, the efficiency, flexibility, and performance benefits make gRPC a compelling choice for certain types of distributed systems.

What Is GraphQL?

GraphQL is a query language for APIs (Application Programming Interfaces) and a runtime for executing those queries with existing data. It was developed by Facebook in 2012 and later open-sourced in 2015. GraphQL provides a more efficient, powerful, and flexible alternative to traditional REST APIs by allowing clients to request only the specific data they need. Key features of GraphQL include: Declarative data fetching: Clients can specify the structure of the response they need, including nested data and relationships, in a single query. This eliminates over-fetching and under-fetching of data, ensuring that clients precisely receive the information they request. Single endpoint: GraphQL APIs typically expose a single endpoint, consolidating multiple RESTful endpoints into one. This simplifies the API surface and allows clients to request all the required data in a single query. Strong typing and schema: GraphQL APIs are defined by a schema that specifies the types of data that can be queried and the relationships between them. This schema provides a clear contract between clients and servers, enabling strong typing and automatic validation of queries. Real-time updates (subscriptions): GraphQL supports real-time data updates through a feature called subscriptions.
Clients can subscribe to specific events, and the server will push updates to the client when relevant data changes. Introspection: GraphQL APIs are self-documenting. Clients can query the schema itself to discover the types, fields, and relationships available in the API, making it easier to explore and understand the data model. Batched queries: Clients can send multiple queries in a single request, reducing the number of network requests and improving efficiency. Backend aggregation: GraphQL allows the backend to aggregate data from multiple sources, such as databases, microservices, or third-party APIs, and present it to the client in a unified way. GraphQL is often used in modern web development, particularly in single-page applications (SPAs) and mobile apps, where optimizing data transfer and minimizing over-fetching are crucial. It has gained widespread adoption and is supported by various programming languages and frameworks, both on the client and server sides. Deciding the Right API Technology Choosing between REST, gRPC, and GraphQL depends on the specific requirements and characteristics of your project. Each technology has its strengths and weaknesses, making them more suitable for certain use cases. Here are some considerations for when to choose REST, gRPC, or GraphQL: Choose REST when: Simplicity is key: REST is straightforward and easy to understand. If your project requires a simple and intuitive API, REST might be the better choice. Statelessness is sufficient: If statelessness aligns well with your application's requirements and you don't need advanced features like bidirectional streaming, REST is a good fit. Widespread adoption and compatibility: If you need broad compatibility with various clients, platforms, and tooling, REST is well-established and widely supported. Choose gRPC when: High performance is critical: gRPC is designed for high-performance communication, making it suitable for scenarios where low latency and efficient data transfer are crucial, such as microservices architectures. Strong typing is important: If you value strong typing and automatic code generation for multiple programming languages, gRPC's use of Protocol Buffers can be a significant advantage. Bidirectional streaming is needed: For applications that require bidirectional streaming, real-time updates, and efficient communication between clients and servers, gRPC provides a robust solution. Choose GraphQL when: Flexible data retrieval is required: If your application demands flexibility in data retrieval and allows clients to specify the exact data they need, GraphQL's query language provides a powerful and efficient solution. Reducing over-fetching and under-fetching is a priority: GraphQL helps eliminate over-fetching and under-fetching of data by allowing clients to request only the specific data they need. This is beneficial in scenarios where optimizing data transfer is crucial. Real-time updates are essential: If real-time features and the ability to subscribe to data updates are critical for your application (e.g., chat applications, live notifications), GraphQL's support for subscriptions makes it a strong contender. Ultimately, the choice between REST, gRPC, and GraphQL should be based on a careful evaluation of your project's requirements, existing infrastructure, and the specific features offered by each technology. Additionally, consider factors such as developer familiarity, community support, and ecosystem maturity when making your decision. 
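To make the GraphQL option more concrete, here is a small sketch of the kind of single request a client sends to one GraphQL endpoint to fetch exactly the fields it needs. The endpoint URL and the user/orders schema are invented for illustration:

TypeScript
// One request to a single endpoint, asking only for the fields the client needs.
const query = `
  query UserWithOrders($id: ID!) {
    user(id: $id) {
      name
      orders(last: 5) {
        id
        total
      }
    }
  }
`;

async function fetchUserWithOrders(userId: string): Promise<unknown> {
  const response = await fetch("https://api.example.com/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables: { id: userId } }),
  });
  const { data, errors } = await response.json();
  if (errors) {
    throw new Error(`GraphQL request failed: ${JSON.stringify(errors)}`);
  }
  return data.user; // only the requested name and last five orders come back
}

A comparable REST interaction would often need separate calls such as /users/{id} and /users/{id}/orders, which is exactly the over-fetching and under-fetching trade-off discussed earlier.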
It's also worth noting that hybrid approaches, where different technologies are used for different parts of an application, can be viable in certain scenarios. Conclusion The choice between REST, gRPC, and GraphQL is a nuanced decision that hinges on the specific requirements and objectives of a given project. REST, with its simplicity and widespread adoption, remains a solid choice for scenarios where ease of understanding and compatibility are paramount. Its statelessness and broad support make it an excellent fit for many web development projects. On the other hand, gRPC emerges as a powerful contender when high performance and efficiency are critical, particularly in microservices architectures. Its strong typing, bidirectional streaming, and automatic code generation make it well-suited for applications demanding low-latency communication and real-time updates. Meanwhile, GraphQL addresses the need for flexible data retrieval and the elimination of over-fetching and under-fetching, making it an optimal choice for scenarios where customization and optimization of data transfer are essential, especially in applications requiring real-time features. Ultimately, the decision should be guided by a careful assessment of project requirements, developer expertise, and the specific features offered by each technology, recognizing that a hybrid approach may offer a pragmatic solution in certain contexts.
What Is an API Gateway? An API Gateway is a tool that acts as an intermediary for requests from clients seeking resources from servers or microservices. It manages, routes, aggregates, and secures the API requests. Like previous patterns we have explored, this is often described as a “microservices context” pattern, but this is not necessarily the case. It could be worth using in many “not microservices” cases and sometimes shouldn’t be used in microservices. Let’s go deeper into the details. Request Routing This involves taking a client’s request and determining which service/services should handle it. It could have different aspects: Dynamic routing: API Gateways can dynamically route requests based on URL paths, HTTP methods, HTTP headers, etc. NB: This could be useful in the case of a multi-tenancy context. (See Architecture Patterns: Multi-tenancy with Keycloak, Angular, and Spring Boot) Service versioning: Allows multiple versions of a service to coexist; Clients can specify which version they want to interact with. NB: This is very useful in microservices, but also in SOA or any kind of exposed API that needs to allow different versions. Load distribution: Some gateways can distribute load to multiple instances of a service, often in conjunction with a load balancer. API Composition Combining multiple service requests into a single response to streamline client communication. This could be done with: Aggregation: For instance, a client might want details about a user and their orders. Instead of making separate calls, a single call is made, and the gateway fetches data from the user and order services, aggregating the results. Transformation: Transforming data from multiple services into a format expected by the client NB: This part could also be achieved with another pattern called backend for the frontend (BFF). It could also be combined with the gateway depending on your needs. (See Backend For Frontend (BFF) Pattern) Rate Limiting Restricting the number of requests a user or service can make within a given time frame. This is very useful for protecting your API and can have several uses: Client-specific limits: Different clients can have different rate limits based on their roles, subscription levels, etc. Burst vs. sustained limits: Allow short bursts of traffic or limit requests over a more extended period Preventing system overload: Ensures that services aren’t overwhelmed with too many requests, leading to degradation or failures Security Ensuring that only authorized requests reach the services could also provide client-specific authentication. Authentication: Verifying the identity of clients using methods like JWT, OAuth tokens, API keys, etc. Authorization: Determining what an authenticated client is allowed to do Threat detection: Some gateways can identify and block potential security threats like DDoS attacks, SQL injections, etc. (Related to the previous point) Caching Caching is temporarily storing frequently used data to speed up subsequent requests. This depends on the caching strategy you are trying to achieve. It could also be done in BFF, maybe not at all. Response caching: Store service responses for common requests to avoid redundant processing. TTL (Time-To-Live): Ensuring cached data isn’t too old, defining how long it should be stored Cache invalidation: Mechanisms to remove outdated or incorrect data Service Discovery This refers to finding the network locations of service instances automatically. 
Dynamic location: In dynamic environments like Kubernetes, services might move around. The gateway keeps track of where they are, so the consumer doesn’t have to worry about it; it’s a way of decoupling the scalability effects from the client side. Health checks: If a service instance fails a health check, the gateway won’t route requests to it. It prevents consumers from requesting to reach a down service, which improves the quality of failure management and may also prevent some types of exploits. Integration with Service Discovery tools: Often integrated with tools like Consul, Eureka, or Kubernetes service discovery Analytics and Monitoring Analytics and monitoring relate to gathering data on API usage and system health. DevOps would be very interested in this feature, especially because it allows them to have a good view of the activity on a whole system, regardless of its complexity. Logging: Capturing data about every request and response Metrics: Tracking key metrics like request rate, response times, error rates (classified by HTTP code), etc. Visualization: Integration with tools like Grafana or Kibana to visualize the data Alerting: Notifying system operators if something goes wrong or metrics breach a threshold This could be a representation: Benefits and Trade-Offs Benefits Simplified Client By having a unified access point, clients can communicate without knowing the intricacies of the backend services. This simplifies client development and maintains a consistent experience, as they don’t need to handle the varied endpoints and protocols directly. Centralized Management A major advantage of the API Gateway is that common functionalities such as rate limiting or security checks are handled in a single place. This reduces redundant code and ensures a consistent application of rules and policies. Cross-Cutting Concerns Concerns that apply to multiple services, like logging or monitoring, can be handled at the gateway level. This ensures uniformity and reduces the overhead of implementing these features in every single service. Optimized Requests and Responses Based on the client’s needs (e.g., mobile vs web vs desktop), the API Gateway can modify requests and responses. This ensures clients receive data in the most optimal format, reducing unnecessary payload and enhancing speed. Increased Security By centralizing authentication and authorization mechanisms, the gateway provides a consistent and robust security barrier. It also can encrypt traffic, providing an added layer of data protection. Stability Features like circuit breaking prevent overloading a service, ensuring smooth system operation. The gateway can quickly reroute or pause requests if a service becomes unresponsive, maintaining overall system health. Trade-Offs Single Point of Failure (SPOF) Without appropriate high availability and failover strategies, the API Gateway can become a system’s Achilles heel. If it goes down, all access to the backend services may be cut off. This then became a very strategic and sensitive point for the production environment, but not only that, when working on the gateway in a given environment, all the services that depend on that access are affected, so it has to be considered in both the production and development aspects. Complexity Introducing an API Gateway adds another component to manage and operate. This can increase deployment complexity and necessitate additional configuration and maintenance. 
I would agree with Elon Musk that “the best part is no part" - this could be true for aerospace development as well as software engineering. Latency As all requests and responses pass through the gateway, there’s potential for added latency, especially if extensive processing or transformation is involved. The same as the previous point, usually more components on the transaction path imply more time. Scaling Issues High traffic can stress the API Gateway. Proper scaling strategies, both vertical and horizontal, are essential to ensure the gateway can handle peak loads without degrading performance. (Linked to SPOF point) Potential Inefficiencies Without careful design, the gateway can introduce inefficiencies, such as redundant API calls or unnecessary data transformations. Proper optimization and continuous monitoring are crucial. Conclusion The API Gateway, like many architectural patterns, offers a robust suite of functionalities that cater to an array of needs in modern software systems. By providing a unified entry point for client requests, it streamlines, secures, and optimizes interactions between clients and services, especially in microservices-based systems. However, as with any tool or pattern, it comes with its unique set of challenges. The potential for increased latency, the need for special scaling considerations, and the risk of introducing a single point of failure underscores the importance of careful planning, design, and continuous monitoring. When employed judiciously, and with a thorough understanding of both its benefits and trade-offs, an API Gateway can prove invaluable in achieving scalable, secure, and efficient system interactions. As always, architects and developers must weigh the pros and cons to determine the fit of the API Gateway within their specific context, ensuring they harness its power while mitigating potential pitfalls.
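To ground a few of the responsibilities described above (dynamic routing by path, per-client rate limiting, and a basic health endpoint), here is a minimal sketch of a gateway built on Express. The service URLs, limits, and the express-rate-limit and http-proxy-middleware packages are illustrative choices under assumed defaults, not a prescription for any particular product:

TypeScript
import express from "express";
import rateLimit from "express-rate-limit";
import { createProxyMiddleware } from "http-proxy-middleware";

const app = express();

// Client-specific rate limiting: 100 requests per minute, keyed by API key when present.
app.use(rateLimit({
  windowMs: 60_000,
  max: 100,
  keyGenerator: (req) => req.header("x-api-key") ?? req.ip ?? "anonymous",
}));

// Dynamic routing by URL path; the /v1 prefix doubles as a simple form of service versioning.
app.use("/v1/users", createProxyMiddleware({ target: "http://users-service:8080", changeOrigin: true }));
app.use("/v1/orders", createProxyMiddleware({ target: "http://orders-service:8080", changeOrigin: true }));

// A trivial gateway-level health endpoint that monitoring can probe.
app.get("/healthz", (_req, res) => res.json({ status: "ok" }));

app.listen(8080, () => console.log("API gateway listening on :8080"));

Cross-cutting concerns such as authentication, caching, and logging would typically be added as further middleware in front of the proxied routes, which is what makes the single entry point both powerful and, as noted above, a component worth operating carefully.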
In an era where data reigns supreme, integrating data management with business intelligence (BI) is no longer an option — it's a strategic imperative. But this imperative is also fraught with challenges and complexities, given the unique attributes of each field. Data management serves as the foundational bedrock, focusing on the secure and organized handling of data across its lifecycle. On the other hand, business intelligence (BI) is the analytical engine that transforms this raw data into meaningful insights that drive business decisions. Together, they form a formidable duo capable of delivering enhanced insights, agile responses, and a more robust understanding of market dynamics. The symbiosis between data management and BI is like a well-oiled machine, where each cog plays an indispensable role. Yet, achieving a seamless integration between these two vital components is easier said than done. From technological incongruities to organizational hurdles, there's a labyrinth of challenges that organizations must navigate to create an integrated environment. However, the promise of more actionable insights, heightened efficiency, and a stronger competitive edge makes this integration an unmissable opportunity. This guide aims to unravel the complexities of integrating data management with business intelligence. We'll delve into the pillars that constitute each domain, explore the challenges that may arise during the integration process, and provide best practices to achieve a successful integration. Let's embark on this journey through the landscape of data management and BI integration — a venture that promises to redefine how organizations perceive, manage, and benefit from their data assets. The Pillars of Data Management Data management is not a monolithic structure; it is a compilation of several essential components that contribute to the overall wellness of an organization’s data environment. At its core are elements like data integration, data quality, data governance, and data security. Data integration refers to the complex orchestration of various data sources into a cohesive data reservoir, be it a data lake or a traditional data warehouse. The element of data quality, on the other hand, emphasizes ensuring the accuracy, consistency, and reliability of this pooled data. Likewise, data governance provides an organizational framework, outlining who owns the data and how it should be used, thereby enhancing compliance and accountability. Data security focuses on the protection measures and protocols that safeguard critical data from unauthorized access and breaches. Each of these components acts like a cog in the machine, essential for ensuring that the data at hand is not only vast but also accurate, compliant, and secure. The Essentials of Business Intelligence (BI) While data management focuses on the "raw materials," business intelligence (BI) is all about turning those materials into something valuable — actionable insights. Data Visualization Data visualization is the initial stage where raw data starts to assume a shape that is understandable and interpretable. Tools like Tableau or Power BI offer advanced visualization capabilities that go beyond traditional charts and graphs, enabling businesses to view their data in a more interactive and insightful manner. Data Analysis This is where the heavy lifting occurs. 
Data analysis can range from simple descriptive analytics that tell you what happened, to more complex machine learning models that can predict what could happen in the future. Gartner’s Carlie Idoine succinctly puts it, “Data and analytics leaders must evolve their organizational models and metrics to focus not just on data and platform control, but also on business outcomes.” Reporting Reporting synthesizes all of the analysis and visualization into digestible pieces of information that can guide business decisions. This goes beyond simply presenting data; it involves explaining what the data means and how it can affect various aspects of the business. Modern BI tools are increasingly incorporating features like natural language generation to make reports more accessible and easier to understand, even for those without a technical background. The Symbiotic Relationship Between Data Management and BI When data management and BI are discussed separately, each seems like a full-bodied discipline in itself. However, when these two are integrated, the real magic happens — a synergetic relationship forms, adding layers of robustness and dependability to business insights. Data management ensures the integrity of the raw data, which then feeds into BI tools for further analysis and visualization. In essence, effective data management acts like a gatekeeper, ensuring that the data entering the BI process is of high quality, is well-governed, and is secure. On the flip side, BI tools can provide feedback into the data management processes, identifying gaps in data quality or suggesting new integration points. This closed-loop system ensures that the two disciplines augment each other, leading to a significantly more powerful data strategy. Challenges in Integration Integrating data management and business intelligence is akin to merging two complex ecosystems, each with its own unique attributes and requirements. As one navigates through the labyrinth of integration, multiple challenges often surface. Technological Challenges The first hurdle many organizations face is technological. The software landscape in both data management and BI is vast, often leading to tool incompatibility. Mismatched data formats, for example, can cause disruptions in the data pipeline. Traditional BI tools may not be fully equipped to handle real-time streaming data or may not be compatible with more modern data storage solutions like data lakes. These technological incongruities can obstruct seamless integration and often require additional layers of transformation or mapping, adding to the complexity and operational overhead. Cultural Challenges Organizational culture, surprisingly, plays a significant role in the integration process. A legacy mindset that views data management and BI as distinct, separate entities can hamper integration efforts. Resistance to change can manifest in various ways — from the reluctance to adopt new technologies to internal politics surrounding data ownership and access. "Culture eats strategy for breakfast," opined management guru Peter Drucker. The implication is clear: an uncooperative culture can unravel even the most well-thought-out integration plans. Process-Related Challenges Process barriers often emerge when organizations lack a centralized vision for data management and BI. Departments might adopt ad hoc practices, creating disjointed data silos that are difficult to integrate later. 
The absence of a unified data governance strategy can also lead to issues such as data duplication, inconsistency, and even breaches of compliance. These process-related challenges make it critical to have an organization-wide strategy for the integration to be successful. Best Practices for Integration Navigating the labyrinth of challenges requires a systematic approach, grounded in best practices that have been tried and tested. Here are some effective strategies to ensure a smooth integration process. Adopt a Unified Approach The first step toward effective integration is to establish a unified approach that brings together stakeholders from both data management and BI under a common framework. This involves creating an enterprise-wide data strategy that serves as a blueprint for integration. Organizations can use architectural approaches such as DataOps or MLOps to automate and streamline the integration process. Prioritize Data Governance Data governance is often the unsung hero in any successful integration initiative. Robust governance practices ensure that data quality is maintained, roles and responsibilities are clearly defined, and compliance is upheld. "In the world of big data, good data governance is the key to any company's success," remarks data management thought leader David Marco. A strong governance policy can act as a lighthouse, steering the organization clear of potential pitfalls like data inconsistency and non-compliance. Modernize Data Integration Techniques Traditional ETL (Extract, Transform, Load) processes may not suffice in today's fast-paced, data-intensive environments. Instead, organizations are increasingly adopting ELT (Extract, Load, Transform) techniques, which are more flexible and compatible with cloud-based data storage solutions. In a similar vein, employing real-time data integration strategies can also augment BI processes, enabling more timely and accurate decision-making. Leverage Emerging Technologies The advancements in AI and machine learning offer new avenues for automating many of the cumbersome, manual tasks involved in data management. Natural Language Processing (NLP) algorithms can automate data tagging and categorization. Machine learning models can predict data quality issues before they become critical, allowing for proactive measures. By leveraging these emerging technologies, organizations can make the integration process more efficient and future-ready. Real-World Case Studies Several companies have successfully navigated the complexities of integrating data management and BI. For instance, a leading healthcare provider managed to unify its disparate data sources into a single data lake, applying robust data governance policies. This integration enabled the organization to not only comply with healthcare regulations but also to generate more nuanced patient care insights via its BI tools. Another example comes from the retail sector, where a multinational company integrated its data management and BI capabilities to create a real-time inventory tracking system. The result was a highly responsive supply chain that could adapt to market demands in near real-time, driving significant cost savings and operational efficiencies. Future Trends Looking forward, the intersection between data management and BI is set to become even more dynamic. Technologies like data mesh are revolutionizing how we think about data architecture, making it more decentralized yet integrated.
Real-time analytics are becoming the norm rather than the exception, powered by advancements in stream processing and event-based architectures. Even AI and machine learning are no longer just buzzwords; they're becoming integral components that can automate many data management tasks and add a predictive layer to BI tools. This confluence of emerging trends suggests that the integration of data management and BI will continue to evolve, offering new avenues for driving business value and innovation. Bridging Data Management and Business Intelligence for a Cohesive, Data-Driven Future In summary, the integration of data management with business intelligence is not just a technical requirement but a business imperative. It adds a layer of integrity, reliability, and depth to the data that feeds into BI systems, enriching the actionable insights that these systems produce. It's not just about having data or insights; it's about having data that you can trust and insights that you can act upon. By embracing an integrated approach, organizations stand to gain far more than the sum of the individual benefits offered by each discipline. They set the stage for a data-rich, insight-rich future, where data doesn’t just sit in silos but flows seamlessly through pipelines, contributing to a 360-degree view of the business landscape. With the ongoing advancements in both fields, there has never been a better time to merge these two worlds for a holistic, data-driven strategy.
The world of application integration is witnessing a transformative shift, one that is redefining the landscape of software development and deployment. This transformation is underpinned by the rise of containerization technologies, which encapsulate applications and their dependencies within isolated, consistent environments. Historically, application integration has faced myriad challenges, from compatibility issues between different systems to the complexities of scaling applications in response to fluctuating demands. Containers have emerged as a solution to these challenges, offering a paradigm that enhances agility, scalability, and efficiency. This comprehensive exploration delves into the evolution of application integration, the revolutionary impact of containerization technologies such as Docker and Kubernetes, its applications across various sectors, specific use cases, and the challenges that must be navigated. As we examine this fascinating topic, we uncover not just a technological innovation but a shift in thinking that is reshaping the very fabric of the software industry. Evolution of Application Integration The Early Days Application integration has its roots in early enterprise systems, where mainframes and bespoke applications were the norm. The integration was mainly manual and lacked standardization. It was primarily concerned with connecting different in-house systems to ensure that data and processes flowed uniformly. Transition to SOA The introduction of Service-Oriented Architecture (SOA) marked a turning point in application integration. By defining interfaces in terms of services, it allowed different applications to communicate without needing to know the underlying details. SOA became a key factor in easing the integration process, but it was not without its challenges. It often led to complicated configurations and difficulties in managing services across different systems. Containerization as a Response The limitations of traditional methods led to the emergence of containerization as a novel approach to integration. By encapsulating applications and dependencies within isolated environments called containers, this approach allowed for more scalable, agile, and consistent deployment across various platforms. The Rise of Containerization Technologies Containers represent a groundbreaking form of virtualization. Unlike traditional virtual machines that include a full operating system, containers encapsulate an application and its dependencies within a consistent environment. This allows them to be lightweight, efficient, and highly portable. What Are Containers? At their core, containers are isolated environments that run a single application along with its dependencies, libraries, and binaries. By sharing the host system's kernel, they avoid the overhead of running multiple operating systems, offering a more streamlined and responsive experience. Containers vs. Virtual Machines While virtual machines virtualize the hardware, containers virtualize the operating system. This fundamental difference leads to containers being more efficient, as they eliminate the need for a separate OS for each application. This efficiency translates to faster startup times, lower resource consumption, and increased scalability. Key Technologies: Docker and Kubernetes Docker: Revolutionizing Containerization Docker has become a cornerstone of containerization. It provides a platform where developers can create, package, and deploy applications within containers effortlessly.
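To make that create-package-deploy workflow concrete, here is a minimal, hypothetical sketch: a Dockerfile for an imaginary Node.js service is written from the shell, built into an image, and started as a container. The service name, port, base image, and file names are illustrative assumptions, not details from this article.
Shell
# Write a minimal Dockerfile for a hypothetical Node.js service
cat > Dockerfile <<'EOF'
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
EOF

# Package the application and its dependencies into an image
docker build -t my-service:1.0 .

# Run it locally as an isolated, portable container
docker run -d --name my-service -p 3000:3000 my-service:1.0
The same image can then be pushed to a registry and run unchanged on any host with a container runtime, which is exactly the portability argument made above.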
Docker’s real power comes from its simplicity and accessibility, making it an essential tool for modern development. Docker Architecture Docker utilizes a client-server architecture. The Docker client communicates with the Docker daemon, responsible for building, running, and managing containers. This architecture simplifies both development and deployment, ensuring consistency across various environments. Docker Images and Containers A Docker image is a snapshot of an application and its dependencies. Docker containers are the runtime instances of these images, encompassing everything needed to run the application. This distinction between images and containers ensures repeatability and consistency, eliminating the "it works on my machine" problem. Kubernetes: Orchestrating Containers While Docker simplifies creating and running containers, Kubernetes focuses on managing them at scale. It's an orchestration platform that handles deployment, scaling, and management of containerized applications. Kubernetes Architecture Kubernetes operates based on a cluster architecture. It consists of a master node responsible for the overall management of the cluster and worker nodes that run the containers. This structure facilitates high availability, load balancing, and resilience. Kubernetes in Action Kubernetes automates many of the manual processes involved in managing containers. It can automatically deploy or kill containers based on defined rules, distribute loads, and heal failed containers, making it essential for large-scale applications. Impact on Development and Deployment Containerization technologies have profoundly impacted both development and deployment, introducing new paradigms and methodologies. Streamlined Development Process Containerization simplifies the development process by standardizing the environment across different stages. This ensures that the application behaves consistently from development through to production. Deployment and Scaling With container orchestration through Kubernetes, deployment and scaling become automated and highly responsive. Organizations can swiftly adapt to changing demands, scaling up or down as needed without human intervention. Collaboration and Innovation Containerization fosters collaboration across development, testing, and operations teams. By ensuring environment consistency, it encourages more iterative and innovative approaches, allowing teams to experiment without risking the broader system. In the words of Solomon Hykes, founder of Docker, "Containers are changing the way people think about developing, deploying, and maintaining software." Containerization in Application Integration Unifying Disparate Systems Containerization facilitates the integration of disparate systems by encapsulating them within uniform environments. This unification simplifies the complexities of connecting different technologies and platforms, fostering a more collaborative and efficient workflow. Microservices and Scalability The adoption of containerization in microservices architecture provides a pathway to create more modular, resilient, and scalable applications. Containers enable the individual services to be developed and deployed independently while still maintaining seamless integration. Facilitating Digital Transformation Containerization is playing a significant role in driving digital transformation initiatives within organizations. 
It supports rapid innovation and agility, enabling businesses to adapt and respond to the ever-changing market landscape. Challenges and Considerations Security Concerns Security remains a significant challenge when implementing containerization. Containers can present vulnerabilities if not configured and managed correctly. This requires constant vigilance and adherence to best practices to maintain the integrity of the containerized environment. Performance Considerations While containerization offers many efficiencies, it also brings some performance considerations. Understanding the resources utilized by containers and tuning them appropriately is vital to ensure that the system performs optimally. Compliance and Governance Integrating containerization into existing enterprise systems also requires attention to compliance with various regulations and governance policies. This requires thorough planning and alignment with organizational standards and legal requirements. As Adrian Cockcroft, the former VP of Cloud Architecture Strategy at Amazon Web Services (AWS), insightfully noted, "Containerization's impact reaches far beyond just technological considerations. It's reshaping how we think about applications, from development to deployment, integration, and management." Real-World Applications of Containerization in Application Integration The real-world applications of containerization in application integration are a testament to the transformative power of this technology. Organizations across different industries have realized significant benefits through its adoption. Financial Industry A global financial institution grappling with a multitude of applications and complex legacy systems turned to containerization as a solution. Implementing Docker and Kubernetes, they were able to orchestrate a unified platform that enhanced communication across various business functions. The project reduced operational costs, improved efficiency, and fostered a culture of innovation. Healthcare Sector In the healthcare sector, a leading hospital network leveraged containerization to integrate various patient care systems. This ensured that patient records, treatment plans, and medical histories were accessible across different departments and locations. By providing a consistent and secure environment, containerization enabled better collaboration between healthcare professionals, leading to improved patient outcomes. E-Commerce An e-commerce giant harnessed the power of containerization to integrate its supply chain management, inventory tracking, and customer relationship systems. The containerized environment allowed for real-time updates and synchronization, enabling the company to respond rapidly to market trends and customer demands. The enhanced agility and responsiveness proved crucial in maintaining a competitive edge in the fast-paced online marketplace. Use Cases in Containerized Application Integration The use cases of containerization in application integration are vast and diverse, reflecting the adaptability and potential of this technology. Microservices Architecture Microservices architecture, a key trend in software development, has found a strong ally in containerization. By allowing individual services to run in separate containers, developers can create more modular and scalable applications.
This approach not only makes deployment and maintenance more straightforward but also facilitates more flexible development cycles, catering to the unique demands of each service. Cross-Platform Integration Containerization alleviates the long-standing struggle with cross-platform compatibility issues. Whether integrating applications running on Linux with those on Windows or bridging the gap between on-premises and cloud systems, containerization ensures that the application environment remains consistent. This consistency accelerates development, simplifies testing, and ensures that applications run smoothly across different platforms. Enhancing Scalability For businesses operating in a dynamic market, scalability is often a pressing concern. Containerization's inherent ability to scale quickly enables organizations to adapt to changing business conditions without overhauling their existing infrastructure. It allows them to deploy or modify services efficiently, whether scaling up to meet peak demands or scaling down during quieter periods. As Mark Russinovich, CTO of Microsoft Azure, states, "Containerization is not a mere technology trend; it's a strategic enabler that's shaping the future of application integration, offering unprecedented agility, scalability, and efficiency." The Horizon: Containerization's Revolutionary Impact Containerization in application integration is more than a mere technological advancement; it represents a philosophical shift in how we approach software development and integration. From its roots in resolving the complexities of integrating disparate systems to its current role in facilitating microservices architecture, cross-platform integration, and scalability, containerization stands as a testament to innovation and adaptability. The real-world applications across industries like finance, healthcare, and e-commerce, coupled with specific use cases such as enhancing scalability and ensuring consistency across platforms, paint a vivid picture of containerization's expansive influence. While challenges related to security, performance, and compliance cannot be overlooked, the trajectory of containerization's growth, underscored by technologies like Docker and Kubernetes, demonstrates a forward-thinking approach that is here to stay. In reflecting on the journey of containerization, one can't help but agree with Martin Fowler, a prominent figure in software development, who remarked, "Containerization has not only solved technical problems but has started a conversation about collaboration, consistency, and experimentation that transcends traditional boundaries."
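As a closing, concrete illustration of the scalability and orchestration themes discussed above, the following sketch uses kubectl to run several replicas of a containerized service and then scale them up and back down on demand. The deployment name and image are placeholders, and the commands assume access to a Kubernetes cluster with kubectl configured.
Shell
# Create a deployment running three replicas of a placeholder image
kubectl create deployment web --image=nginx:1.25-alpine --replicas=3

# Expose it inside the cluster behind a single service endpoint
kubectl expose deployment web --port=80

# Scale out for peak demand, then back down during quieter periods
kubectl scale deployment web --replicas=10
kubectl scale deployment web --replicas=2

# Watch the orchestrator reconcile the desired replica count
kubectl get pods -l app=web
Because the desired state is declared rather than scripted, the orchestrator keeps restoring it if containers fail, which is the self-healing behavior described earlier.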
In the ever-evolving landscape of database technology, staying ahead of the curve is not just an option, it’s a necessity. As modern applications continue to grow in complexity and global reach, the role of the underlying database becomes increasingly critical. It’s the backbone that supports the seamless functioning of applications and the storage and retrieval of vast amounts of data. In this era of global-scale applications, having a high-performance, flexible, and efficient database is paramount. As the demands of modern applications surge, the need for a database that can keep pace has never been greater. The “ultra-database” has become a key player in ensuring that applications run seamlessly and efficiently globally. These databases need to offer a unique combination of speed, versatility, and adaptability to meet the diverse requirements of various applications, from e-commerce platforms to IoT systems. They need to be more than just data repositories. They must serve as intelligent hubs that can quickly process, store, and serve data while facilitating real-time analytics, security, and scalability. The ideal ultra-database is not just a storage facility; it’s an engine that drives the dynamic, data-driven applications that define the modern digital landscape. The latest release of HarperDB 4.2 introduces a unified development architecture for enterprise applications, providing an approach to building global-scale applications. HarperDB 4.2 HarperDB 4.2 is a comprehensive solution that seamlessly combines an ultra-fast database, user-programmable applications, and data streaming into a cohesive technology. The result is a development environment that simplifies the complex, accelerates the slow, and reduces costs. HarperDB 4.2 offers a unified platform that empowers developers to create applications that can span the globe, handling data easily and quickly. In this tutorial, we will explore the features of HarperDB 4.2 and show you how to harness its power in conjunction with Java Quarkus. We will take you through the steps to leverage HarperDB’s new capabilities to build robust and high-performance applications with Quarkus, demonstrating the impressive potential of this unified development architecture. So, join us on this enlightening journey and revolutionize your application development process. Creating a Quarkus Microservice API With HarperDB, Part 1: Setting up the Environment This section will guide you through configuring your development environment and creating the necessary project setup to get started. Step 1: Configuring the Environment Before diving into the development, you need to set up your environment. We’ll start by running HarperDB in a Docker container. To do this, open your terminal and run the following command:
Shell
docker run -d -e HDB_ADMIN_USERNAME=root -e HDB_ADMIN_PASSWORD=password -e HTTP_THREADS=4 -p 9925:9925 -p 9926:9926 harperdb/harperdb
This command downloads and runs the HarperDB Docker container with the specified configuration. It exposes the necessary ports for communication. Step 2: Creating a Schema and Table With HarperDB up and running, the next step is to create a schema and define a table to store animal data. We will use the “curl” commands to interact with HarperDB’s RESTful API.
Create a schema named “dev” by executing the following command:
Shell
curl --location --request POST 'http://localhost:9925/' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data-raw '{
  "operation": "create_schema",
  "schema": "dev"
}'
This command sends a POST request to create the “dev” schema. Next, create a table named “animal” with “scientificName” as the hash attribute using the following command:
Shell
curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "create_table",
  "schema": "dev",
  "table": "animal",
  "hash_attribute": "scientificName"
}'
This command establishes the “animal” table within the “dev” schema. Finally, add the required attributes for the “animal” table by creating “name,” “genus,” and “species” attributes:
Shell
curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "create_attribute",
  "schema": "dev",
  "table": "animal",
  "attribute": "name"
}'

curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "create_attribute",
  "schema": "dev",
  "table": "animal",
  "attribute": "genus"
}'

curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "create_attribute",
  "schema": "dev",
  "table": "animal",
  "attribute": "species"
}'
These commands add the “name”, “genus”, and “species” attributes to the “animal” table within the “dev” schema. With HarperDB configured and the schema and table set up, you can start building your Quarkus-based microservice API to manage animal data. Stay tuned for the next part of the tutorial, where we’ll dive into the development process. Building the Quarkus Application We configured HarperDB and prepared the environment. Now, we’ll start building our Quarkus application to manage animal data. Quarkus makes it easy with a handy project generator, so let’s begin. Quarkus offers an intuitive web-based project generator that simplifies the initial setup. Visit the Quarkus project generator (code.quarkus.io) and follow these steps:
Select the extensions you need for your project. Add “JAX-RS” and “JSON” for this tutorial to handle REST endpoints and JSON serialization.
Click the “Generate your application” button.
Download the generated ZIP file and extract it to your desired project directory.
With your Quarkus project generated, you’re ready to move on. Our project will use the DataFaker library and the HarperDB Java driver to generate animal data and to interact with the HarperDB database. To include the HarperDB Java driver, please read the previous article. In your Quarkus project, create a Java record to represent the Animal entity. This record will have fields for the scientific name, name, genus, and species, allowing you to work with animal data efficiently.
Java
public record Animal(String scientificName, String name, String genus, String species) {

    public static Animal of(Faker faker) {
        var animal = faker.animal();
        return new Animal(
                animal.scientificName(),
                animal.name(),
                animal.genus(),
                animal.species()
        );
    }
}
This record includes a factory method, of, that generates an Animal instance with random data using the DataFaker library.
We’ll use this method to populate our database with animal records. In your Quarkus project, we’ll set up CDI (Contexts and Dependency Injection) to handle database connections and data access. Here’s an example of how to create a ConnectionSupplier class that manages database connections:
Java
@ApplicationScoped
public class ConnectionSupplier {

    private static final Logger LOGGER = Logger.getLogger(ConnectionSupplier.class.getName());

    @Produces
    @RequestScoped
    public Connection get() throws SQLException {
        LOGGER.info("Creating connection");
        // Create and return the database connection, e.g., using DriverManager.getConnection
    }

    public void dispose(@Disposes Connection connection) throws SQLException {
        LOGGER.info("Closing connection");
        connection.close();
    }
}
The ConnectionSupplier class uses CDI annotations to produce and dispose of database connections. This allows Quarkus to manage the database connection lifecycle for you. Let’s create the AnimalDAO class to interact with the database using JDBC. This class will have methods for inserting and querying animal data.
Java
@ApplicationScoped
public class AnimalDAO {

    private final Connection connection;

    public AnimalDAO(Connection connection) {
        this.connection = connection;
    }

    public void insert(Animal animal) {
        try {
            // Prepare and execute the SQL INSERT statement to insert the animal data
        } catch (SQLException exception) {
            throw new RuntimeException(exception);
        }
    }

    public Optional<Animal> findById(String id) {
        try {
            // Prepare and execute the SQL SELECT statement to find an animal by ID
        } catch (SQLException exception) {
            throw new RuntimeException(exception);
        }
    }

    // Other methods for data retrieval and manipulation
}
In the AnimalDAO class, you’ll use JDBC to perform database operations. You can add more methods to handle various database tasks, such as updating and deleting animal records. The AnimalService class will generate animal data and utilize the AnimalDAO for database interaction.
Java
@ApplicationScoped
public class AnimalService {

    private final Faker faker;

    private final AnimalDAO dao;

    @Inject
    public AnimalService(Faker faker, AnimalDAO dao) {
        this.faker = faker;
        this.dao = dao;
    }

    // Implement methods for generating and managing animal data
}
In the AnimalService, you’ll use the DataFaker library to generate random animal data and the AnimalDAO for database operations. With these components in place, you’ve set up the foundation for your Quarkus-based Microservice API with HarperDB. In the next part of the tutorial, we’ll dive into developing RESTful endpoints and data management. Create AnimalResource Class In this final part of the tutorial, we will create an AnimalResource class to expose our animal service through HTTP endpoints. Additionally, we will provide sample curl commands to demonstrate how to consume these endpoints locally. Create an AnimalResource class with RESTful endpoints for managing animal data. This class will interact with the AnimalService to handle HTTP requests and responses.
Java
@Path("/animals")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class AnimalResource {

    private final AnimalService service;

    public AnimalResource(AnimalService service) {
        this.service = service;
    }

    @GET
    public List<Animal> findAll() {
        return this.service.findAll();
    }

    @POST
    public Animal insert(Animal animal) {
        this.service.insert(animal);
        return animal;
    }

    @DELETE
    @Path("{id}")
    public void delete(@PathParam("id") String id) {
        this.service.delete(id);
    }

    @POST
    @Path("/generate")
    public void generateRandom() {
        this.service.generateRandom();
    }
}
In this class, we’ve defined several RESTful endpoints, including:
GET /animals: Returns a list of all animals.
POST /animals: Inserts a new animal.
DELETE /animals/{id}: Deletes an animal by its ID.
POST /animals/generate: Generates random animal data.
Here are curl commands to test the HTTP endpoints locally using http://localhost:8080/animals/ as the base URL: Retrieve All Animals (GET)
Shell
curl -X GET http://localhost:8080/animals/
Insert a New Animal (POST)
Shell
curl -X POST -H "Content-Type: application/json" -d '{ "scientificName": "Panthera leo", "name": "Lion", "genus": "Panthera", "species": "Leo" }' http://localhost:8080/animals/
Delete an Animal by ID (DELETE) Replace {id} with the ID of the animal you want to delete:
Shell
curl -X DELETE http://localhost:8080/animals/{id}
Generate Random Animal Data (POST) This endpoint doesn’t require any request data:
Shell
curl -X POST http://localhost:8080/animals/generate
These curl commands allow you to interact with the Quarkus-based microservice API, performing actions such as retrieving, inserting, and deleting animal data. The generated random data endpoint is valuable for populating your database with test data. With these RESTful endpoints, you have a fully functional Quarkus application integrated with HarperDB to manage animal data over HTTP. You can extend and enhance this application further to meet your specific requirements. Congratulations on completing this tutorial! Conclusion In this tutorial, we embarked on a journey to build a Quarkus-based Microservice API integrated with HarperDB, a robust, high-performance database. We started by setting up our environment and creating a Quarkus project with the necessary extensions. Leveraging the DataFaker library, we generated random animal data to populate our HarperDB database. The core of our application was the seamless integration with HarperDB, showcasing the capabilities of the HarperDB Java driver. We used CDI to manage database connections efficiently and created a structured data access layer with the AnimalDAO class. Through this, we performed database operations, such as inserting and querying animal data. With the implementation of the AnimalService class, we combined the generated data with database operations, bringing our animal data management to life. Finally, we exposed our animal service through RESTful endpoints in the AnimalResource class, allowing us to interact with the service through HTTP requests. You can explore the complete source code of this project on GitHub. Feel free to fork, modify, and extend it to suit your needs. As you continue your journey into the world of HarperDB and Quarkus, remember to consult the comprehensive HarperDB documentation available at HarperDB Documentation to dive deeper into the capabilities and features of HarperDB.
Stay informed about the latest updates, release notes, and news on HarperDB’s official website to ensure you’re always working with the most up-to-date information. Check out the latest release notes to discover what’s new and improved in HarperDB. By combining Quarkus and HarperDB, you’re well-equipped to build efficient and scalable applications that meet the demands of the modern digital landscape. Happy coding!
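As an optional follow-up, you can also verify the data your Quarkus service writes by calling HarperDB's operations API directly, the same API used in the setup steps earlier. This is a minimal sketch, assuming the schema, table, and Basic Auth credentials from this tutorial; the record values are made up.
Shell
# Insert a sample record directly into dev.animal via the operations API
curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "insert",
  "schema": "dev",
  "table": "animal",
  "records": [
    { "scientificName": "Panthera onca", "name": "Jaguar", "genus": "Panthera", "species": "Onca" }
  ]
}'

# Read the table back with HarperDB's SQL operation to confirm the data is there
curl --location 'http://localhost:9925' \
--header 'Authorization: Basic cm9vdDpwYXNzd29yZA==' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "sql",
  "sql": "SELECT * FROM dev.animal"
}'
Seeing the inserted record come back from the SQL query confirms that the containerized HarperDB instance and your application are pointed at the same schema and table.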
In my previous articles, we've discussed in detail how to architect global API layers and multi-region service meshes using Kong and YugabyteDB. However, the solutions presented still harbored a bottleneck and a single point of failure: the database Kong uses internally to store its metadata and application-specific configurations. This guide demonstrates how to eliminate this final bottleneck by running Kong on YugabyteDB, a distributed SQL database built on PostgreSQL. Kong's Default Database Kong uses PostgreSQL as a database for its own needs. Taking a look at the database schema created by Kong during the bootstrap process, you'll find dozens of tables and other database objects that store metadata and application-specific configurations:
SQL
kong=# \d
                    List of relations
 Schema |             Name              | Type  |  Owner
--------+-------------------------------+-------+----------
 public | acls                          | table | postgres
 public | acme_storage                  | table | postgres
 public | basicauth_credentials         | table | postgres
 public | ca_certificates               | table | postgres
 public | certificates                  | table | postgres
 public | cluster_events                | table | postgres
 public | clustering_data_planes        | table | postgres
 public | consumers                     | table | postgres
 public | filter_chains                 | table | postgres
 public | hmacauth_credentials          | table | postgres
 public | jwt_secrets                   | table | postgres
 public | key_sets                      | table | postgres
 public | keyauth_credentials           | table | postgres
 public | keys                          | table | postgres
.. the list goes on
PostgreSQL works perfectly well for those Kong deployments that don't need to scale across multiple availability zones, regions, or data centers. However, when an application needs to deploy Kong Gateway or Kong Mesh across various locations, a standalone PostgreSQL server can become a bottleneck or single point of failure. Initially, Kong offered Apache Cassandra as an alternative to PostgreSQL for those wishing to architect distributed APIs and service meshes. But later, Cassandra support was officially deprecated. The Kong team stated that PostgreSQL would remain the only officially supported database. Why Distributed PostgreSQL? Even though Cassandra was deprecated, the demand among Kong users for a distributed version of Postgres didn't wane, driven by several reasons:
High availability: API layers and service meshes must be resilient against all kinds of potential outages, including zone and region-level incidents.
Scalability: From global load balancers to the API and database layers, the entire solution needs to handle both read and write workloads at low latency.
Data regulations: When an API or mesh spans multiple jurisdictions, certain API endpoints may be required to store specific settings and configurations within data centers located in a particular geography.
As a result, members from both the Kong and YugabyteDB communities began to work on adapting YugabyteDB for distributed Kong deployments. Why YugabyteDB? YugabyteDB is a distributed SQL database that is built on PostgreSQL. The upper half of YugabyteDB, the query layer, is PostgreSQL, with modifications needed for YugabyteDB's distributed storage layer. Essentially, you can think of YugabyteDB as a distributed Postgres. Provided that YugabyteDB maintains feature and runtime compatibility with Postgres, the majority of applications, libraries, drivers, and frameworks designed for Postgres should operate seamlessly with YugabyteDB, requiring no code changes.
For instance, one of the earlier articles shows how to deploy Kubernetes on YugabyteDB, using the integration initially created for Postgres. Back in 2022, following the Cassandra deprecation, Kong was not compatible with YugabyteDB due to the absence of certain Postgres features in the distributed database engine. However, this changed with the release of YugabyteDB version 2.19.2, which included support for all the features necessary for Kong. Next, we'll explore how to get Kong Gateway up and running on a multi-node YugabyteDB cluster. Starting a Multi-Node YugabyteDB Cluster There are many ways to start Kong Gateway and YugabyteDB. One of the options is to run everything inside Docker containers. So, let's use this approach today. First off, create a custom docker network for the YugabyteDB and Kong containers:
Shell
docker network create custom-network
Next up, get a three-node YugabyteDB cluster running:
Shell
mkdir $HOME/yb_docker_data

docker run -d --name yugabytedb_node1 --net custom-network \
  -p 15433:15433 -p 7001:7000 -p 9001:9000 -p 5433:5433 \
  -v $HOME/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --base_dir=/home/yugabyte/yb_data --daemon=false

docker run -d --name yugabytedb_node2 --net custom-network \
  -p 15434:15433 -p 7002:7000 -p 9002:9000 -p 5434:5433 \
  -v $HOME/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --join=yugabytedb_node1 --base_dir=/home/yugabyte/yb_data --daemon=false

docker run -d --name yugabytedb_node3 --net custom-network \
  -p 15435:15433 -p 7003:7000 -p 9003:9000 -p 5435:5433 \
  -v $HOME/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --join=yugabytedb_node1 --base_dir=/home/yugabyte/yb_data --daemon=false
And finally, verify the cluster's status by opening the YugabyteDB UI on the first node (exposed on port 15433 in the command above). Starting Kong Gateway To deploy Kong Gateway on YugabyteDB using Docker, follow the steps below. First, connect to YugabyteDB and create the kong database using psql or your preferred SQL tool:
Shell
psql -h 127.0.0.1 -p 5433 -U yugabyte

create database kong;
\q
Next, start the Kong bootstrapping and migration process:
Shell
docker run --rm --net custom-network \
  -e "KONG_DATABASE=postgres" \
  -e "KONG_PG_HOST=yugabytedb_node1" \
  -e "KONG_PG_PORT=5433" \
  -e "KONG_PG_USER=yugabyte" \
  -e "KONG_PG_PASSWORD=yugabyte" \
  kong:latest kong migrations bootstrap
KONG_DATABASE: set to postgres, which directs Kong to continue using the PostgreSQL implementation for its metadata storage.
KONG_PG_HOST: Kong can interface with any node within the YugabyteDB cluster. The chosen node will route Kong's requests and manage their execution across the cluster.
The bootstrapping process can take up to 5 minutes, during which there may be no log output. Once completed, the following log messages will indicate a successful finish:
Shell
....
migrating response-ratelimiting on database 'kong'...
response-ratelimiting migrated up to: 000_base_response_rate_limiting (executed)
migrating session on database 'kong'...
session migrated up to: 000_base_session (executed)
session migrated up to: 001_add_ttl_index (executed)
session migrated up to: 002_320_to_330 (executed)
58 migrations processed
58 executed
Database is up-to-date
Finally, launch the Kong Gateway container, configured to utilize YugabyteDB as the database backend:
Shell
docker run -d --name kong-gateway \
  --net custom-network \
  -e "KONG_DATABASE=postgres" \
  -e "KONG_PG_HOST=yugabytedb_node1" \
  -e "KONG_PG_PORT=5433" \
  -e "KONG_PG_USER=yugabyte" \
  -e "KONG_PG_PASSWORD=yugabyte" \
  -e "KONG_PROXY_ACCESS_LOG=/dev/stdout" \
  -e "KONG_ADMIN_ACCESS_LOG=/dev/stdout" \
  -e "KONG_PROXY_ERROR_LOG=/dev/stderr" \
  -e "KONG_ADMIN_ERROR_LOG=/dev/stderr" \
  -e "KONG_ADMIN_LISTEN=0.0.0.0:8001, 0.0.0.0:8444 ssl" \
  -p 8000:8000 \
  -p 8443:8443 \
  -p 127.0.0.1:8001:8001 \
  -p 127.0.0.1:8444:8444 \
  kong:latest
Test Kong’s operation by sending a request to the Gateway:
Shell
curl -i -X GET --url http://localhost:8001/services
Then, return to the YugabyteDB UI, selecting 'kong' from the 'Databases' menu to view the dozens of tables and indexes Kong uses internally. Job done! In Summary Even though the Kong team stopped supporting Cassandra for distributed deployments, their initial bet on PostgreSQL paid off over time. As one of the fastest-growing databases, Postgres has a rich ecosystem of extensions and other products that extend its use cases. Kong users required a distributed version of Postgres for APIs and service meshes spanning various locations, and that use case was eventually addressed by YugabyteDB, a distributed database built on PostgreSQL.
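For readers who want to go one step beyond the smoke test above, the gateway can be exercised end to end by registering a service and a route through the same Admin API and then sending traffic through the proxy port. This is a minimal sketch; the upstream URL (httpbin.org) is only an illustrative placeholder, not part of the original setup.
Shell
# Register an upstream service with the Admin API (port 8001)
curl -i -X POST http://localhost:8001/services \
  --data name=example-service \
  --data url=https://httpbin.org

# Attach a route so the proxy knows which requests to forward
curl -i -X POST http://localhost:8001/services/example-service/routes \
  --data name=example-route \
  --data 'paths[]=/demo'

# Send a request through the proxy port (8000); Kong forwards it upstream
curl -i http://localhost:8000/demo/get
A successful response from the proxied request confirms that Kong is reading its configuration from the YugabyteDB cluster and routing traffic as expected.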
Mocking, in a broader software development and testing context, is a technique used to simulate the behavior of certain components or objects in a controlled manner. It involves creating fake or mock objects that imitate the behavior of real objects or components within a software system. Mocking is often used in various stages of software development, including testing, to isolate and focus on specific parts of a system while ignoring the complexities of its dependencies. Mocking allows developers and testers to isolate specific parts of a system for testing without relying on the actual implementation of external components, services, or modules. Benefits of Mocking A few of the benefits of Mocking are: Simulation of Dependencies: Mocking involves creating mock objects or functions that imitate the behavior of real components or services that a software system relies on. These mock objects or functions provide predefined responses to interactions, such as method calls or API requests. Isolation: Mocking helps in isolating the unit or component being tested from the rest of the system. This isolation ensures that any issues or failures detected during testing are specific to the unit under examination and not caused by external factors. Speed and Efficiency: Using mock objects or functions can expedite the testing process because they provide immediate and predictable results. There is no need to set up and configure external services or wait for actual responses. Reduced Risk: Since mocking avoids the use of real external dependencies, there is a reduced risk of unintended side effects or data corruption during testing. The test environment remains controlled and predictable. Speed: Real dependencies might involve time-consuming operations or external services like databases, APIs, or network calls. Mocking these dependencies can significantly speed up the testing process because you eliminate the need to perform these real operations. Reduced Resource Usage: Since mocks don’t use real resources like databases or external services, they reduce the load on these resources during testing, which can be especially important in shared development and testing environments. Mocking API In Cypress? Mocking APIs in Cypress is a powerful technique for simulating APIs and external services in your tests. This allows you to create controlled environments for testing different scenarios without relying on the actual services. To mock an API in Cypress, you can use the cy.intercept() command. This command intercepts HTTP requests and returns a predefined response. Cypress’s cy.intercept() method empowers developers and testers to intercept and manipulate network requests, allowing them to simulate various scenarios and responses, thus making it an indispensable tool for testing in dynamic and unpredictable environments. By crafting custom responses or simulating error conditions, you can comprehensively assess your application’s behavior under diverse conditions. To use cy.intercept(), you must first specify the method and URL of the request to intercept. You can also specify other parameters, such as the request body and headers. The second argument to cy.intercept() is the response that you want to return. 
For example, the following code mocks the response to a GET request to the /api/users endpoint:
cy.intercept('GET', '/api/users', {
  statusCode: 200,
  body: [
    {
      id: 1,
      name: 'John Doe',
      email: 'john.doe@example.com',
    },
    {
      id: 2,
      name: 'Jane Doe',
      email: 'jane.doe@example.com',
    },
  ],
});
Example Mocking API Data in Cypress Step 1 Create a Scenario To Automate To gain deeper insights, we’ll automate a specific scenario against the Angular RealWorld demo app at https://angular.realworld.io/. Below are the steps we are going to automate for mocking the data:
Visit the website at https://angular.realworld.io/.
Upon opening the page, two requests are triggered in the Network call — one for Tags and the other for Articles.
Intercept the Tags request, and instead of the original list of Tags, insert two new tags: “Playwright” and “QAAutomationLabs”. Make sure to verify that these tags are displayed correctly in the user interface.
Intercept the Article request, and instead of the original list of articles, provide just one article with modified details. You should change the username, description, and the number of likes. Afterward, confirm that these modifications are accurately reflected in the user interface.
Before automating the above steps, let's create the data that we want to mock. Step 2 Create Data To Mock the API Create two files named mockTags.json and mockArticles.json. 1. mockTags.json
{
  "tags": [
    "Cypress",
    "Playwright",
    "SLASSCOM"
  ]
}
2. mockArticles.json
{
  "articles": [
    {
      "title": "Hi qaautomationlabs.com",
      "slug": "Hi - qaautomationlabs.com",
      "body": "qaautomationlabs",
      "createdAt": "2020-09-26T03:18:26.635Z",
      "updatedAt": "2020-09-26T03:18:26.635Z",
      "tagList": [],
      "description": "SLASSCOM QUALITY SUMMIT 2023",
      "author": {
        "username": "Kailash Pathak",
        "bio": null,
        "image": "https://static.productionready.io/images/smiley-cyrus.jpg",
        "following": false
      },
      "favorited": false,
      "favoritesCount": 1000
    }
  ],
  "articlesCount": 500
}
Step 3 Create Script Let’s create the script for mocking the API data.
describe("API Mocking in Cypress using cy.intercept Method ", () => {
  beforeEach(() => {
    cy.visit("https://angular.realworld.io/");
    cy.intercept("GET", "https://api.realworld.io/api/tags", {
      fixture: "mockTags.json",
    });
    cy.intercept(
      "GET",
      "https://api.realworld.io/api/articles?limit=10&offset=0",
      { fixture: "mockArticles.json" }
    );
  });

  it("Mock API Tags, and then validate on UI", () => {
    cy.get(".tag-list", { timeout: 1000 })
      .should("contain", "Cypress")
      .and("contain", "Playwright");
  });

  it("Mock the Article feed, and then validate on UI", () => {
    cy.get("app-favorite-button.pull-xs-right").contains("10");
    cy.get(".author").contains("Kailash Pathak");
    cy.get(".preview-link > p").contains("SLASSCOM QUALITY SUMMIT 2023");
  });
});
Code Walkthrough Let me break down the code step by step:
describe("API Mocking in Cypress using cy.intercept Method", () => { ... }): This is a test suite description. It defines a test suite titled "API Mocking in Cypress using cy.intercept Method."
beforeEach(() => { ... }): This is a hook that runs before each test case in the suite. It sets up the initial conditions for the tests.
cy.visit("https://angular.realworld.io/");: It opens the web page at https://angular.realworld.io/ using Cypress.
cy.intercept("GET", "https://api.realworld.io/api/tags", { fixture: "mockTags.json" });: This line intercepts a GET request to the tags endpoint and responds with data from the fixture file "mockTags.json." It mocks the API call to retrieve tags.
cy.intercept("GET", "https://api.realworld.io/api/articles?limit=10&offset=0", { fixture: "mockArticles.json" });: Similar to the previous line, this intercepts a GET request to "article" and responds with data from the fixture file "mockArticles.json." It mocks the API call to retrieve articles. it("Mock API Tags, and then validate on UI", () => { ... }): This is the first test case. It verifies that the mocked tags are displayed on the UI. cy.get(".tag-list", { timeout: 1000 })...: It selects an element with the class "tag-list" and waits for it to appear for up to 1000 milliseconds. Then, it checks if it contains the tags "Cypress" and "Playwright." it("Mock the Article feed, and then validate on UI", () => { ... }): This is the second test case. It verifies that the mocked articles are displayed correctly on the UI. cy.get("app-favorite-button.pull-xs-right").contains("10");: It selects an element with the class "app-favorite-button.pull-xs-right" and checks if it contains the text "10." cy.get(".author").contains("Kailash Pathak");: It selects an element with the class "author" and checks if it contains the text "Kailash Pathak." cy.get(".preview-link > p").contains("SLASSCOM QUALITY SUMMIT 2023");: It selects an element with the class "preview-link" and checks if it contains the text "SLASSCOM QUALITY SUMMIT 2023." Step 4 Execute the Script Run the command yarn Cypress Open. Default Data displaying in the site: The below tags are displayed by default in the site for Tags. Below Feed are displayed by default in the site for Tags. Data After Mocking the Data In the screenshot below, you can see the tags that we have provided in mockTags.json replaced with the default tags. In the screenshot below, you can see the Feed that we have provided in mockArticles.json replaced with the default Feeds. Conclusion In conclusion, mocking API responses in Cypress is a powerful technique for testing your application’s frontend behavior in a controlled and predictable manner. It promotes faster, more reliable, and isolated testing of various scenarios, helping you catch bugs and ensure your application works as intended. Properly maintained mock data and well-structured tests can be invaluable assets in your testing strategy.
When the summer of 2017 arrived, I transitioned to a project that allowed me to contribute remotely. After 25 years of working in technology across eight different employers, I was finally able to determine if working 100% remotely would lead to productivity gains … or losses. Just under 3 years before the pandemic arrived, I discovered I was twice as productive as before – simply by reviewing my commit history from in-office time periods where I was doing similar work. I also felt like the quality of my work improved, largely because of the ability to place myself in focus mode, devoid of any unexpected distractions. I also realized which tooling dependencies I leaned on to help me become more productive. Aside from the IntelliJ IDEA integrated development environment (IDE), Postman was a critical tool that helped me become successful as a service developer and architect. In a bit of a personal retrospective, I wanted to talk about how Postman has helped me to build thriving APIs and to succeed while working in a 100% remote environment. About Postman Postman started as a side project of software engineer Abhinav Asthana, with a goal to simplify the API testing process. At the time I started using Postman, it was a free plugin that I could use within my Chrome browser. Here’s an example of what Postman looked like back then: Fast-forward to today, where Postman is a standalone client used by over 25 million developers globally and half a million companies, including Box, LinkedIn, Paylocity, Paypal, Sling, Twilio, Twitter/X, and WhatsApp. Incredibly, 98% of today’s Fortune 500 companies are using Postman. If you are interested in the evolution of Postman, check out The New Postman API Platform: Redefining API Management for the API-First World. My Original Use Case When I started using Postman in 2014, I was in dire need of a better way to exercise my APIs. Using a cURL command from my Windows-based developer machine was painful to say the least, relying heavily on copying and pasting from a Notepad document I kept on my desktop. Just recapping that time of my life sends chills down my spine. With the Postman app installed in Chrome, I was able to onboard quickly by importing those cURL commands from my Notepad document. As a result of the simple-but-effective user interface, my requests were presented in a better light and easily cloned to further exercise my services. As I gained experience using Postman, I added environments and variables to my requests and started grouping them in collections that were much easier to reference. This approach allowed me to write a given request one time and be able to use the same request in every application environment – from local to development and even production. Leveraging Postman for API Services Today, as a service developer, I see Postman fitting in at multiple places during the API life cycle. API-First Although companies like Salesforce and eBay found success with APIs in the early 2000s, the concept of API-first – defining your API specification before any coding begins – did not become mainstream until about five years later. My first entry point into using API-first was for the RESTful API I built for a fitness-based service I created with my sister-in-law. As part of that effort, I was building both the client and the service, using Angular and Spring Boot, respectively. I used Postman to define my API specification. The benefit of using API-first for the project was that it allowed me to work on both aspects at the same time.
So I could work on the service for a period of time, then switch over and work on the client to keep things fresh. The API-first specification served as a contract that I could rely on, and I was even able to create a mock server in Postman. This allowed me to make client calls to a service that wasn’t actually ready for use yet. Test Generation For every request in Postman, a Tests tab provides auto-created code stubs which allow API requests to be executed programmatically using JavaScript. Not familiar with how to write tests? Postbot provides AI-like interactivity to help with the test script creation process. Once ready, these requests can be executed at the collection level, giving developers the ability to test an entire API. Currently in beta, Postman collections include a “Generate tests” feature that creates tests without the need to write JavaScript. I recently used this feature for a collection I created to validate that my APIs were running as expected in multiple regions across the United States. CI Pipelines Postman integrates with continuous integration (CI) pipelines, allowing tests to be executed during the CI life cycle via the Postman command-line interface (CLI). What this means is that builds can fail in cases where expectations established in Postman are not being met. Let’s take a second to understand what this means. APIs created using an API-first specification, created in Postman, can later be validated by tests that were created in Postman. This is in comparison to integration and regression tests that are often created via a secondary development life cycle – at great risk of becoming out of sync with the underlying API specification. Where Postman Soars – Distributed Developers The biggest challenge I faced after becoming 100% remote was the inability to roll my chair over to another software engineer’s workspace for instant feedback or validation. This effort, while disruptive to the target of my questions, did help me personally – but at the cost of the gained knowledge not residing anywhere else but inside my mind. With my closest team member now a little over 1,100 miles from me, sliding my chair over to his desk is no longer a reasonable option. However, with Postman, I no longer have that need. Shared Workspaces and Collections Postman allows engineers to establish workspaces and collections that can be stored in a central location. Think of collections as groups of related items. For Postman, this is more than a simple HTTP request, as shown below: A workspace in Postman contains one or more collections to make the engineer’s life easier. I’ve often established a workspace for my team and a collection for each service that we support. The cool thing about Postman workspaces is that they can even be made public for anyone to use. In fact, when I was writing a series of articles about Salesforce, I leveraged a publicly available workspace, Salesforce Platform APIs, to understand their APIs better. Next-Gen Engineering One change that I have seen proliferate recently is the elimination of the quality engineer role. As a result, quality engineering tasks have become part of the software engineering role. This honestly makes sense to me, because often the quality engineer was left to support the features created by multiple software engineers, leading to an abundance of work being placed onto the quality engineer before features could be released.
With this transformation, engineers become responsible for adding the expected level of quality coverage within each feature. Postman provides this ability via JavaScript-based tests and automated test generation that can be integrated into the CI pipeline, as noted above. Peeking Into Enterprise Essentials For larger teams, Postman also offers an Enterprise Essentials solution that provides the following benefits:
Real-time collaboration with change tracking and notifications
Workspace templates to reduce duplicate effort across projects
API development that allows producers and consumers to work together, making it a shared initiative
Especially as I think about the remote work aspect of being a service developer (or any kind of software developer, for that matter), the Enterprise Essentials features around workspaces and collections make the most sense to me. Collaboration is so key to the scale of what we do. When working with multiple teams across multiple services and APIs, these tools and features make all the difference. At this enterprise level, Postman can accelerate time to market and identify challenges faster – potentially before the first line of code is written. Conclusion My readers may recall that I have been focused on the following mission statement, which I feel can apply to any IT professional: “Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else.” - J. Vester To say that Postman adheres to my personal mission statement is a bit of an understatement. Instead, I would go as far as to say that Postman has been a key to my personal success as a software engineer for the past eight years. No joke. Postman was there when I needed to be saved from executing painful cURL commands on my Windows-based computer, and it has helped me write APIs that have scaled both vertically and horizontally, building upon an API-first strategy. Postman has eased any challenges associated with team members being located in remote locations around the world – sharing workspaces and collections, plus using the Postman CLI to execute API tests as part of the CI pipeline. Postman has allowed my team to focus on building services that help extend the intellectual property of our current priority. As I close out this personal retrospective, if you are looking for a way to boost the productivity of your team, I ask that you consider exploring what Postman has to offer. Have a really great day!