In Java Persistence API (JPA) development, the flexibility and dynamism of queries play a pivotal role, especially when dealing with dynamic search interfaces or scenarios where the query structure is known only at runtime. The JPA Criteria API is a powerful tool for constructing such dynamic queries, allowing developers to define complex search criteria programmatically. One critical aspect of real-world applications, particularly those that provide a search interface for specific records, is pagination. Pagination not only enhances the user experience by presenting results in manageable chunks but also reduces resource consumption on the application side. This article explores the synergy between the JPA Criteria API and pagination and walks through the steps involved in implementing pagination with a Criteria query.

Note: This explanation assumes a working knowledge of the JPA Criteria API.

Implementation Steps

Step 1: Fetching Records

Java

public List<Post> filterPosts(Integer size, Integer offset) {
    CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
    CriteriaQuery<Post> criteriaQuery = criteriaBuilder.createQuery(Post.class);
    Root<Post> root = criteriaQuery.from(Post.class);

    // Optional: Add selection criteria/predicates
    // List<Predicate> predicates = new ArrayList<>();
    // predicates.add(criteriaBuilder.equal(root.get("status"), "published"));
    // criteriaQuery.where(predicates.toArray(new Predicate[0]));

    List<Post> postList = entityManager
        .createQuery(criteriaQuery)
        .setFirstResult(offset)
        .setMaxResults(size)
        .getResultList();
    return postList;
}

In this step, we use the CriteriaBuilder and CriteriaQuery to construct a query for the desired entity (Post, in this case). The from method specifies the root of the query. If needed, you can add selection criteria or predicates to narrow down the result set. Finally, the setFirstResult and setMaxResults methods are used for pagination, where offset specifies the start position and size specifies the maximum number of results.
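Before moving on to counting, it may help to see the optional predicate part filled in. The following is a minimal sketch, assuming a Post entity with "status" and "title" attributes; the attribute names and the extra parameter are illustrative, not taken from a real schema:

Java

public List<Post> filterPublishedPosts(String titlePart, Integer size, Integer offset) {
    CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
    CriteriaQuery<Post> criteriaQuery = criteriaBuilder.createQuery(Post.class);
    Root<Post> root = criteriaQuery.from(Post.class);

    // Build the predicate list dynamically, e.g., from optional search parameters.
    List<Predicate> predicates = new ArrayList<>();
    predicates.add(criteriaBuilder.equal(root.get("status"), "published"));
    if (titlePart != null && !titlePart.isBlank()) {
        predicates.add(criteriaBuilder.like(root.get("title"), "%" + titlePart + "%"));
    }
    criteriaQuery.where(predicates.toArray(new Predicate[0]));

    // Pagination works exactly as in Step 1.
    return entityManager.createQuery(criteriaQuery)
        .setFirstResult(offset)
        .setMaxResults(size)
        .getResultList();
}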
Step 2: Count All Records

Java

private int totalItemsCount(Predicate finalPredicate) {
    try {
        CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
        CriteriaQuery<Long> criteriaQuery = criteriaBuilder.createQuery(Long.class);
        Root<Post> root = criteriaQuery.from(Post.class);
        // Optional: If joins are involved, you need to specify them as well
        // Join<Post, Comments> joinComments = root.join("comments");
        return Math.toIntExact(entityManager.createQuery(criteriaQuery
                .select(criteriaBuilder.count(root))
                .where(finalPredicate))
            .getSingleResult());
    } catch (Exception e) {
        log.error("Error fetching total count: {}", e.getMessage());
    }
    return 0;
}

In this step, we define a method to count all records that satisfy the criteria. The criteriaBuilder is used to construct a CriteriaQuery of type Long to perform the count. Note that the count query needs its own root (and any joins the predicate relies on). The count query is constructed with the select and where methods, and the result is obtained using getSingleResult. This implementation provides insight into how the JPA Criteria API can be utilized for efficient pagination. I hope it helped.
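As a small addition, the two steps above are typically stitched together into a single page response. The snippet below is only a sketch of that idea; the PageResult record and the way the predicate is passed around are illustrative and not part of the original code:

Java

// Illustrative page wrapper; not a JPA type.
public record PageResult<T>(List<T> content, int totalItems, int totalPages) { }

public PageResult<Post> filterPostsPage(Predicate finalPredicate, int size, int offset) {
    List<Post> content = filterPosts(size, offset);   // Step 1: fetch one page
    int totalItems = totalItemsCount(finalPredicate); // Step 2: count all matching rows
    int totalPages = (int) Math.ceil((double) totalItems / size);
    return new PageResult<>(content, totalItems, totalPages);
}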
The AIDocumentLibraryChat project uses the Spring AI project with OpenAI to search a document library for answers to questions. To do that, Retrieval Augmented Generation (RAG) is used on the documents.

Retrieval Augmented Generation

The process looks like this:

Upload Document
Store the document in the Postgresql DB.
Split the document to create embeddings.
Create the embeddings with a call to the OpenAI embedding model.
Store the document embeddings in the Postgresql vector DB.

Search Documents
Create the search prompt.
Create an embedding of the search prompt with a call to the OpenAI embedding model.
Query the Postgresql vector DB for the documents with the nearest embedding distances.
Query the Postgresql DB for the document.
Create a prompt with the search prompt and the document text chunk.
Request an answer from the GPT model and show the answer based on the search prompt and the document text chunk.

Document Upload

The uploaded document is stored in the database to have the source document of the answer. The document text has to be split into chunks to create embeddings per chunk. The embeddings are created by an OpenAI embedding model and are vectors with more than 1500 dimensions that represent the text chunk. The embedding is stored in an AIDocument with the chunk text and the id of the source document in the vector database.

Document Search

The document search takes the search prompt and uses the OpenAI embedding model to turn it into an embedding. The embedding is used to search the vector database for the nearest neighbor vector, that is, the text chunk whose embedding has the biggest similarity to the embedding of the search prompt. The id in the AIDocument is used to read the document from the relational database. With the search prompt and the text chunk of the AIDocument, the document prompt is created. Then the OpenAI GPT model is called with the prompt to create an answer based on the search prompt and the document context. That causes the model to create answers that are closely based on the documents provided and improves the accuracy. The answer of the GPT model is returned and displayed with a link to the document to provide the source of the answer.

Architecture

The architecture of the project is built around Spring Boot with Spring AI. The Angular UI provides the user interface to show the document list, upload the documents, and provide the search prompt with the answer and the source document. It communicates with the Spring Boot backend via the REST interface. The Spring Boot backend provides the REST controllers for the frontend and uses Spring AI to communicate with the OpenAI models and the Postgresql vector database. The documents are stored with JPA in the Postgresql relational database. Postgresql is used because it combines the relational database and the vector database in one Docker image.

Implementation

Frontend

The frontend is based on lazy loaded standalone components built with Angular. The lazy loaded standalone components are configured in the app.config.ts:

TypeScript

export const appConfig: ApplicationConfig = {
  providers: [provideRouter(routes), provideAnimations(), provideHttpClient()]
};

The configuration sets the routes and enables the HTTP client and the animations.
The lazy loaded routes are defined in app.routes.ts:

TypeScript

export const routes: Routes = [
  {
    path: "doclist",
    loadChildren: () => import("./doc-list").then((mod) => mod.DOCLIST),
  },
  {
    path: "docsearch",
    loadChildren: () => import("./doc-search").then((mod) => mod.DOCSEARCH),
  },
  { path: "**", redirectTo: "doclist" },
];

In 'loadChildren', the 'import("...").then((mod) => mod.XXX)' loads the route lazily from the provided path and sets the exported routes defined in the 'mod.XXX' constant. The lazy loaded route 'docsearch' has the index.ts to export the constant:

TypeScript

export * from "./doc-search.routes";

That exports the doc-search.routes.ts:

TypeScript

export const DOCSEARCH: Routes = [
  {
    path: "",
    component: DocSearchComponent,
  },
  { path: "**", redirectTo: "" },
];

It defines the routing to the 'DocSearchComponent'. The file upload can be found in the DocImportComponent with the template doc-import.component.html:

HTML

<h1 mat-dialog-title i18n="@@docimportImportFile">Import file</h1>
<div mat-dialog-content>
  <p i18n="@@docimportFileToImport">File to import</p>
  @if(uploading) {
    <div class="upload-spinner"><mat-spinner></mat-spinner></div>
  } @else {
    <input type="file" (change)="onFileInputChange($event)">
  }
  @if(!!file) {
    <div>
      <ul>
        <li>Name: {{file.name}}</li>
        <li>Type: {{file.type}}</li>
        <li>Size: {{file.size}} bytes</li>
      </ul>
    </div>
  }
</div>
<div mat-dialog-actions>
  <button mat-button (click)="cancel()" i18n="@@cancel">Cancel</button>
  <button mat-flat-button color="primary" [disabled]="!file || uploading"
    (click)="upload()" i18n="@@docimportUpload">Upload</button>
</div>

The file upload is done with the '<input type="file" (change)="onFileInputChange($event)">' tag. It provides the upload feature and calls the 'onFileInputChange(...)' method after each file selection. The 'Upload' button calls the 'upload()' method to send the file to the server on click. The doc-import.component.ts has the methods for the template:

TypeScript

@Component({
  selector: 'app-docimport',
  standalone: true,
  imports: [CommonModule, MatFormFieldModule, MatDialogModule, MatButtonModule,
    MatInputModule, FormsModule, MatProgressSpinnerModule],
  templateUrl: './doc-import.component.html',
  styleUrls: ['./doc-import.component.scss']
})
export class DocImportComponent {
  protected file: File | null = null;
  protected uploading = false;
  private destroyRef = inject(DestroyRef);

  constructor(private dialogRef: MatDialogRef<DocImportComponent>,
    @Inject(MAT_DIALOG_DATA) public data: DocImportComponent,
    private documentService: DocumentService) { }

  protected onFileInputChange($event: Event): void {
    const files = !$event.target ? null : ($event.target as HTMLInputElement).files;
    this.file = !!files && files.length > 0 ? files[0] : null;
  }

  protected upload(): void {
    if(!!this.file) {
      const formData = new FormData();
      formData.append('file', this.file as Blob, this.file.name as string);
      this.documentService.postDocumentForm(formData)
        .pipe(tap(() => {this.uploading = true;}),
          takeUntilDestroyed(this.destroyRef))
        .subscribe(result => {this.uploading = false; this.dialogRef.close();});
    }
  }

  protected cancel(): void {
    this.dialogRef.close();
  }
}

This is the standalone component with its module imports and the injected 'DestroyRef'. The 'onFileInputChange(...)' method takes the event parameter and stores its 'files' property in the 'files' constant. Then it checks for the first file and stores it in the 'file' component property. The 'upload()' method checks for the 'file' property and creates the 'FormData()' for the file upload.
The 'formData' constant gets the form field name ('file'), the content ('this.file'), and the filename ('this.file.name') appended. Then the 'documentService' is used to post the 'FormData()' object to the server. The 'takeUntilDestroyed(this.destroyRef)' function unsubscribes the RxJS pipeline after the component is destroyed. That makes unsubscribing pipelines very convenient in Angular.

Backend

The backend is a Spring Boot application with the Spring AI framework. Spring AI manages the requests to the OpenAI models and to the vector database.

Liquibase Database Setup

The database setup is done with Liquibase, and the script can be found in the db.changelog-1.xml:

XML

<databaseChangeLog
  xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
    http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">
  <changeSet id="1" author="angular2guy">
    <sql>CREATE EXTENSION if not exists hstore;</sql>
  </changeSet>
  <changeSet id="2" author="angular2guy">
    <sql>CREATE EXTENSION if not exists vector;</sql>
  </changeSet>
  <changeSet id="3" author="angular2guy">
    <sql>CREATE EXTENSION if not exists "uuid-ossp";</sql>
  </changeSet>
  <changeSet author="angular2guy" id="4">
    <createTable tableName="document">
      <column name="id" type="bigint">
        <constraints primaryKey="true"/>
      </column>
      <column name="document_name" type="varchar(255)">
        <constraints notNullConstraintName="document_document_name_notnull" nullable="false"/>
      </column>
      <column name="document_type" type="varchar(25)">
        <constraints notNullConstraintName="document_document_type_notnull" nullable="false"/>
      </column>
      <column name="document_content" type="blob"/>
    </createTable>
  </changeSet>
  <changeSet author="angular2guy" id="5">
    <createSequence sequenceName="document_seq" incrementBy="50" startValue="1000" />
  </changeSet>
  <changeSet id="6" author="angular2guy">
    <createTable tableName="vector_store">
      <column name="id" type="uuid" defaultValueComputed="uuid_generate_v4 ()">
        <constraints primaryKey="true"/>
      </column>
      <column name="content" type="text"/>
      <column name="metadata" type="json"/>
      <column name="embedding" type="vector(1536)">
        <constraints notNullConstraintName="vectorstore_embedding_type_notnull" nullable="false"/>
      </column>
    </createTable>
  </changeSet>
  <changeSet id="7" author="angular2guy">
    <sql>CREATE INDEX vectorstore_embedding_index ON vector_store USING HNSW (embedding vector_cosine_ops);</sql>
  </changeSet>
</databaseChangeLog>

In changeset 4, the table for the JPA document entity is created with the primary key 'id'. The content type/size is unknown, and because of that, the content column is set to 'blob'. In changeset 5, the sequence for the JPA entity is created with the default properties of the Hibernate 6 sequences that are used by Spring Boot 3.x. In changeset 6, the table 'vector_store' is created with a primary key 'id' of type 'uuid' that is created by the 'uuid-ossp' extension. The column 'content' is of type 'text' ('clob' in other databases) to have a flexible size. The 'metadata' column stores the metadata of the AIDocuments in a 'json' type. The 'embedding' column stores the embedding vector with the number of OpenAI dimensions. In changeset 7, the index for the fast search of the 'embedding' column is set. Due to the limited parameters of the Liquibase '<createIndex ...>', '<sql>' is used directly to create it.
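To make the role of that index more concrete, here is the kind of SQL a nearest-neighbor lookup runs against the 'vector_store' table. This is only an illustration and not code from the project; '<=>' is pgvector's cosine distance operator, and the embedding is passed in its text form and cast to the vector type:

Java

// Illustrative only: a raw nearest-neighbor query with Spring's JdbcTemplate.
// The HNSW index from changeset 7 speeds up this ORDER BY; Spring AI's
// PgVectorStore issues a comparable query internally.
List<Map<String, Object>> nearest = jdbcTemplate.queryForList("""
    SELECT id, content, metadata
    FROM vector_store
    ORDER BY embedding <=> ?::vector
    LIMIT 4
    """, searchEmbedding); // e.g. "[0.12,-0.03,...]" with 1536 values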
Spring Boot / Spring AI implementation The DocumentController for the frontend looks like this: Java @RestController @RequestMapping("rest/document") public class DocumentController { private final DocumentMapper documentMapper; private final DocumentService documentService; public DocumentController(DocumentMapper documentMapper, DocumentService documentService) { this.documentMapper = documentMapper; this.documentService = documentService; } @PostMapping("/upload") public long handleDocumentUpload( @RequestParam("file") MultipartFile document) { var docSize = this.documentService .storeDocument(this.documentMapper.toEntity(document)); return docSize; } @GetMapping("/list") public List<DocumentDto> getDocumentList() { return this.documentService.getDocumentList().stream() .flatMap(myDocument ->Stream.of(this.documentMapper.toDto(myDocument))) .flatMap(myDocument -> { myDocument.setDocumentContent(null); return Stream.of(myDocument); }).toList(); } @GetMapping("/doc/{id}") public ResponseEntity<DocumentDto> getDocument( @PathVariable("id") Long id) { return ResponseEntity.ofNullable(this.documentService .getDocumentById(id).stream().map(this.documentMapper::toDto) .findFirst().orElse(null)); } @GetMapping("/content/{id}") public ResponseEntity<byte[]> getDocumentContent( @PathVariable("id") Long id) { var resultOpt = this.documentService.getDocumentById(id).stream() .map(this.documentMapper::toDto).findFirst(); var result = resultOpt.stream().map(this::toResultEntity) .findFirst().orElse(ResponseEntity.notFound().build()); return result; } private ResponseEntity<byte[]> toResultEntity(DocumentDto documentDto) { var contentType = switch (documentDto.getDocumentType()) { case DocumentType.PDF -> MediaType.APPLICATION_PDF; case DocumentType.HTML -> MediaType.TEXT_HTML; case DocumentType.TEXT -> MediaType.TEXT_PLAIN; case DocumentType.XML -> MediaType.APPLICATION_XML; default -> MediaType.ALL; }; return ResponseEntity.ok().contentType(contentType) .body(documentDto.getDocumentContent()); } @PostMapping("/search") public DocumentSearchDto postDocumentSearch(@RequestBody SearchDto searchDto) { var result = this.documentMapper .toDto(this.documentService.queryDocuments(searchDto)); return result; } } The 'handleDocumentUpload(...)' handles the uploaded file with the 'documentService' at the '/rest/document/upload' path. The 'getDocumentList()' handles the get requests for the document lists and removes the document content to save on the response size. The 'getDocumentContent(...)' handles the get requests for the document content. It loads the document with the 'documentService' and maps the 'DocumentType' to the 'MediaType'. Then it returns the content and the content type, and the browser opens the content based on the content type. The 'postDocumentSearch(...)' method puts the request content in the 'SearchDto' object and returns the AI generated result of the 'documentService.queryDocuments(...)' call. 
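The 'SearchDto' itself is not shown in the listings; a plausible shape, limited to the fields the controller and service actually reference, would be:

Java

// Hypothetical sketch of the request payload for '/rest/document/search'.
public class SearchDto {
    public enum SearchType { DOCUMENT, PARAGRAPH }

    private String searchString;
    private SearchType searchType;

    public String getSearchString() { return searchString; }
    public void setSearchString(String searchString) { this.searchString = searchString; }

    public SearchType getSearchType() { return searchType; }
    public void setSearchType(SearchType searchType) { this.searchType = searchType; }
}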
The method 'storeDocument(...)' of the DocumentService looks like this:

Java

public Long storeDocument(Document document) {
  var myDocument = this.documentRepository.save(document);
  Resource resource = new ByteArrayResource(document.getDocumentContent());
  var tikaDocuments = new TikaDocumentReader(resource).get();
  record TikaDocumentAndContent(org.springframework.ai.document.Document document, String content) { }
  var aiDocuments = tikaDocuments.stream()
    .flatMap(myDocument1 -> this.splitStringToTokenLimit(myDocument1.getContent(), CHUNK_TOKEN_LIMIT)
      .stream().map(myStr -> new TikaDocumentAndContent(myDocument1, myStr)))
    .map(myTikaRecord -> new org.springframework.ai.document.Document(myTikaRecord.content(),
      myTikaRecord.document().getMetadata()))
    .peek(myDocument1 -> myDocument1.getMetadata().put(ID, myDocument.getId().toString()))
    .toList();
  LOGGER.info("Name: {}, size: {}, chunks: {}", document.getDocumentName(),
    document.getDocumentContent().length, aiDocuments.size());
  this.documentVsRepository.add(aiDocuments);
  return Optional.ofNullable(myDocument.getDocumentContent()).stream()
    .map(myContent -> Integer.valueOf(myContent.length).longValue())
    .findFirst().orElse(0L);
}

private List<String> splitStringToTokenLimit(String documentStr, int tokenLimit) {
  List<String> splitStrings = new ArrayList<>();
  var tokens = new StringTokenizer(documentStr).countTokens();
  var chunks = Math.ceilDiv(tokens, tokenLimit);
  if (chunks == 0) {
    return splitStrings;
  }
  var chunkSize = Math.ceilDiv(documentStr.length(), chunks);
  var myDocumentStr = new String(documentStr);
  while (!myDocumentStr.isBlank()) {
    splitStrings.add(myDocumentStr.length() > chunkSize ?
      myDocumentStr.substring(0, chunkSize) : myDocumentStr);
    myDocumentStr = myDocumentStr.length() > chunkSize ?
      myDocumentStr.substring(chunkSize) : "";
  }
  return splitStrings;
}

The 'storeDocument(...)' method saves the document to the relational database. Then the document is converted into a 'ByteArrayResource' and read with the 'TikaDocumentReader' of Spring AI to turn it into an AIDocument list. Then the AIDocument list is flatmapped to split the documents into chunks with the 'splitStringToTokenLimit(...)' method; the chunks are turned into new AIDocuments with the 'id' of the stored document in the metadata map. The 'id' in the metadata enables loading the matching document entity for the AIDocuments. Then the embeddings for the AIDocuments are created implicitly with the call to the 'documentVsRepository.add(...)' method, which calls the OpenAI embedding model and stores the AIDocuments with their embeddings in the vector database. Then the result is returned.

The method 'queryDocuments(...)' looks like this:

Java

public AiResult queryDocuments(SearchDto searchDto) {
  var similarDocuments = this.documentVsRepository.retrieve(searchDto.getSearchString());
  var mostSimilar = similarDocuments.stream()
    .sorted((myDocA, myDocB) -> ((Float) myDocA.getMetadata().get(DISTANCE))
      .compareTo(((Float) myDocB.getMetadata().get(DISTANCE)))).findFirst();
  var documentChunks = mostSimilar.stream().flatMap(mySimilar ->
    similarDocuments.stream().filter(mySimilar1 ->
      mySimilar1.getMetadata().get(ID).equals(mySimilar.getMetadata().get(ID)))).toList();
  Message systemMessage = switch (searchDto.getSearchType()) {
    case SearchDto.SearchType.DOCUMENT -> this.getSystemMessage(documentChunks,
      (documentChunks.size() <= 0 ? 2000 : Math.floorDiv(2000, documentChunks.size())));
    case SearchDto.SearchType.PARAGRAPH -> this.getSystemMessage(mostSimilar.stream().toList(), 2000);
  };
  UserMessage userMessage = new UserMessage(searchDto.getSearchString());
  Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
  LocalDateTime start = LocalDateTime.now();
  AiResponse response = aiClient.generate(prompt);
  LOGGER.info("AI response time: {}ms",
    ZonedDateTime.of(LocalDateTime.now(), ZoneId.systemDefault()).toInstant().toEpochMilli()
    - ZonedDateTime.of(start, ZoneId.systemDefault()).toInstant().toEpochMilli());
  var documents = mostSimilar.stream().map(myGen -> myGen.getMetadata().get(ID))
    .filter(myId -> Optional.ofNullable(myId).stream().allMatch(myId1 -> (myId1 instanceof String)))
    .map(myId -> Long.parseLong(((String) myId)))
    .map(this.documentRepository::findById)
    .filter(Optional::isPresent)
    .map(Optional::get).toList();
  return new AiResult(searchDto.getSearchString(), response.getGenerations(), documents);
}

private Message getSystemMessage(
    List<org.springframework.ai.document.Document> similarDocuments, int tokenLimit) {
  String documents = similarDocuments.stream()
    .map(entry -> entry.getContent())
    .filter(myStr -> myStr != null && !myStr.isBlank())
    .map(myStr -> this.cutStringToTokenLimit(myStr, tokenLimit))
    .collect(Collectors.joining("\n"));
  SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(this.systemPrompt);
  Message systemMessage = systemPromptTemplate.createMessage(Map.of("documents", documents));
  return systemMessage;
}

private String cutStringToTokenLimit(String documentStr, int tokenLimit) {
  String cutString = new String(documentStr);
  while (tokenLimit < new StringTokenizer(cutString, " -.;,").countTokens()) {
    cutString = cutString.length() > 1000 ?
      cutString.substring(0, cutString.length() - 1000) : "";
  }
  return cutString;
}

The method first loads the documents best matching the 'searchDto.getSearchString()' from the vector database. To do that, the OpenAI embedding model is called to turn the search string into an embedding, and with that embedding the vector database is queried for the AIDocuments with the lowest distance (the distance between the vectors of the search embedding and the database embedding). Then the AIDocument with the lowest distance is stored in the 'mostSimilar' variable. Then all the AIDocuments of the document chunks are collected by matching the document entity id in their metadata. The 'systemMessage' is created with the 'documentChunks' or the 'mostSimilar' content. The 'getSystemMessage(...)' method takes them, cuts the content chunks to a size that the OpenAI GPT models can handle, and returns the 'Message'. Then the 'systemMessage' and the 'userMessage' are turned into a 'prompt' that is sent with 'aiClient.generate(prompt)' to the OpenAI GPT model. After that, the AI answer is available, and the document entity is loaded with the id from the metadata of the 'mostSimilar' AIDocument. The 'AiResult' is created with the search string, the GPT answer, and the document entity, and is returned.
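The article does not show the content of 'this.systemPrompt'. To illustrate how the retrieved chunks end up in the prompt, a stand-in template could look like the one below; the wording is invented, and only the '{documents}' placeholder matters, since that is what 'createMessage(Map.of("documents", documents))' fills in:

Java

// Invented example prompt; only the {documents} placeholder is significant.
private final String systemPrompt = """
    You are a helpful assistant. Answer the user's question using only the
    information in the DOCUMENTS section. If the answer is not contained
    there, say that you do not know.

    DOCUMENTS:
    {documents}
    """;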
The vector database repository DocumentVsRepositoryBean with the Spring AI 'VectorStore' looks like this:

Java

@Repository
public class DocumentVSRepositoryBean implements DocumentVsRepository {
  private final VectorStore vectorStore;

  public DocumentVSRepositoryBean(JdbcTemplate jdbcTemplate, EmbeddingClient embeddingClient) {
    this.vectorStore = new PgVectorStore(jdbcTemplate, embeddingClient);
  }

  public void add(List<Document> documents) {
    this.vectorStore.add(documents);
  }

  public List<Document> retrieve(String query, int k, double threshold) {
    return new VectorStoreRetriever(vectorStore, k, threshold).retrieve(query);
  }

  public List<Document> retrieve(String query) {
    return new VectorStoreRetriever(vectorStore).retrieve(query);
  }
}

The repository has the 'vectorStore' property that is used to access the vector database. It is created in the constructor from the injected parameters with the 'new PgVectorStore(...)' call. The PgVectorStore class is provided by Spring AI as its support for the Postgresql vector database extension. It has the 'embeddingClient' to use the OpenAI embedding model and the 'jdbcTemplate' to access the database. The method 'add(...)' calls the OpenAI embedding model and adds the AIDocuments to the vector database. The 'retrieve(...)' methods query the vector database for the embeddings with the lowest distances.

Conclusion

Angular made the creation of the frontend easy. The standalone components with lazy loading have kept the initial load small. The Angular Material components have helped a lot with the implementation and are easy to use. Spring Boot with Spring AI has made the use of Large Language Models easy. Spring AI provides the framework to hide the creation of embeddings and provides an easy-to-use interface to store the AIDocuments in a vector database (several are supported). The creation of the embedding for the search prompt to load the nearest AIDocuments is also done for you, and the interface of the vector database is simple. The Spring AI prompt classes also make the creation of the prompt for the OpenAI GPT models easy. Calling the model is done with the injected 'aiClient', and the results are returned. Spring AI is a very good framework from the Spring team. There have been no problems with the experimental version. With Spring AI, Large Language Models are now easy to use on our own documents.
Brain-computer interfaces (BCIs) have emerged as a groundbreaking technology that enables direct communication between the human brain and external devices. BCIs have the potential to revolutionize various fields, including medical, entertainment, and assistive technologies. This developer-oriented article delves deeper into the concepts, applications, and challenges of BCI technology and explores how Java, a widely-used programming language, can be employed in developing BCI applications. Understanding Brain-Computer Interfaces (BCIs) A BCI is a system that acquires, processes and translates brain signals into commands that can control external devices. The primary components of a BCI include: Signal acquisition: Capturing brain signals using non-invasive or invasive methods. Non-invasive techniques, such as Electroencephalography (EEG), are commonly used due to their ease of use and lower risk. Invasive techniques, like Electrocorticography (ECoG), offer higher signal quality but require surgical implantation. Signal processing: Improving the quality of acquired brain signals through preprocessing techniques like filtering and amplification. Various algorithms are then used to extract relevant features from the signals. Classification and translation: Employing machine learning algorithms to classify the extracted features and translate them into commands that can control external devices. Device control: Sending the translated commands to the target device, which can range from computer cursors to robotic limbs. Java Libraries and Frameworks for BCI Development Java offers several libraries and frameworks that can be utilized for various stages of BCI development. Some key libraries and frameworks include: Java Neural Network Framework (JNNF): JNNF is an open-source library that provides tools for creating, training, and deploying artificial neural networks. It can be used for feature extraction, classification, and translation in BCI applications. Encog: Encog is a machine learning framework that supports various neural network architectures, genetic algorithms, and support vector machines. It can be employed for signal processing, feature extraction, and classification in BCI development. Java Data Acquisition (jDaq): jDaq is a Java library that provides a high-level interface to data acquisition hardware, such as EEG devices. It can be used for acquiring brain signals in real-time. Java OpenCV: OpenCV is a popular computer vision library that has Java bindings. It can be used for processing and analyzing brain signal data in BCI applications. Developing a BCI Application With Java: A Step-by-Step Guide Acquire brain signals: Connect your EEG device to your computer and use a library like jDaq to acquire brain signals in real-time. Ensure that the device driver and SDK are compatible with Java. Preprocess and filter signals: Use libraries like Java OpenCV or Encog to preprocess the acquired signals by removing noise, artifacts, and other unwanted elements. Apply suitable filters, such as bandpass or notch filters, to isolate relevant frequency bands. Extract features: Implement feature extraction algorithms, such as Fast Fourier Transform (FFT) or Wavelet Transform, to extract relevant features from the preprocessed signals. You can use libraries like JNNF or Encog for this purpose. Train a Classifier: Split the extracted features into training and testing datasets. 
Use machine learning algorithms, such as neural networks or support vector machines, to train a classifier on the training dataset. Libraries like JNNF and Encog can be employed for this task. Translate brain signals: Implement a real-time system that acquires brain signals, preprocesses them, extracts features, and classifies them using the trained classifier. Translate the classification results into commands that can control external devices. Control external devices: Send the translated commands to the target device using appropriate communication protocols, such as Bluetooth, Wi-Fi, or USB. Ensure that the device is compatible with Java and has the necessary APIs for communication. Code Snippet Example Here's a simple example of a Java code snippet that demonstrates the basic structure of a BCI application. In this example, we'll use a mock dataset to simulate brain signal acquisition and the Encog library for feature extraction and classification. The example assumes you have already trained a classifier and saved it as a file. First, add the Encog library to your project. You can download the JAR file from the official website (http://www.heatonresearch.com/encog/) or use a build tool like Maven or Gradle. Import the necessary classes: Java import org.encog.engine.network.activation.ActivationSigmoid; import org.encog.ml.data.MLData; import org.encog.ml.data.MLDataPair; import org.encog.ml.data.basic.BasicMLData; import org.encog.ml.data.basic.BasicMLDataSet; import org.encog.neural.networks.BasicNetwork; import org.encog.neural.networks.layers.BasicLayer; import org.encog.persist.EncogDirectoryPersistence; Define a method for preprocessing and feature extraction. This is just a placeholder; you should replace it with your actual preprocessing and feature extraction logic. 
Java private static double[] preprocessAndExtractFeatures(double[] rawBrainSignal) { // Preprocess the raw brain signal and extract features double[] extractedFeatures = new double[rawBrainSignal.length]; // Your preprocessing and feature extraction logic here return extractedFeatures; } Load the trained classifier (a neural network in this case) from a file and create a method to classify the extracted features: Java private static BasicNetwork loadTrainedClassifier(String classifierFilePath) { BasicNetwork network = (BasicNetwork) EncogDirectoryPersistence.loadObject(new File(classifierFilePath)); return network; } private static int classifyFeatures(double[] extractedFeatures, BasicNetwork network) { MLData input = new BasicMLData(extractedFeatures); MLData output = network.compute(input); // Find the class with the highest output value int predictedClass = 0; double maxOutputValue = output.getData(0); for (int i = 1; i < output.size(); i++) { if (output.getData(i) > maxOutputValue) { maxOutputValue = output.getData(i); predictedClass = i; } } return predictedClass; } Finally, create a main method that simulates brain signal acquisition, preprocesses and extracts features, and classifies them using the trained classifier: Java public static void main(String[] args) { // Load the trained classifier String classifierFilePath = "path/to/your/trained/classifier/file.eg"; BasicNetwork network = loadTrainedClassifier(classifierFilePath); // Simulate brain signal acquisition (replace this with actual data from your EEG device) double[] rawBrainSignal = new double[]{0.5, 0.3, 0.8, 0.2, 0.9}; // Preprocess the raw brain signal and extract features double[] extractedFeatures = preprocessAndExtractFeatures(rawBrainSignal); // Classify the extracted features int predictedClass = classifyFeatures(extractedFeatures, network); System.out.println("Predicted class: " + predictedClass); // Translate the predicted class into a command for an external device // Your translation logic here // Send the command to the target device // Your device control logic here } This example demonstrates the basic structure of a BCI application using Java and the Encog library. You should replace the placeholder methods for preprocessing, feature extraction, and device control with your actual implementation according to your specific BCI application requirements. Challenges and Future Directions Despite the promising potential of BCIs, several challenges need to be addressed: Signal quality: Improving the quality and reliability of brain signal acquisition remains a significant challenge, particularly for non-invasive methods. User training: Users often require extensive training to generate consistent and distinguishable brain signals for accurate BCI control. Ethical and privacy concerns: The development and use of BCIs raise ethical questions related to data privacy, informed consent, and potential misuse of the technology. Conclusion Brain-computer interfaces hold immense potential in transforming various fields by enabling direct communication between the human brain and external devices. Java, with its rich libraries, frameworks, and cross-platform compatibility, can play a crucial role in developing BCI applications. However, addressing the challenges related to signal quality, user training, and ethical concerns is essential for the widespread adoption and success of this revolutionary technology.
As most technologies or dependencies evolve fast, it's sometimes hard to make the initial setup or an upgrade go smoothly. The goal of this article is to provide a summary of the Maven setup for the Querydsl framework, depending on the used technology. After that, let's see a short overview of the Querydsl solution.

In This Article, You Will Learn

How to set up Querydsl with Spring Boot 2.x (i.e., Java EE) and Spring Boot 3.x (i.e., Jakarta EE)
What a Maven classifier is
How the Maven classifier is used in the Querydsl build
Usage of the Eclipse Transformer plugin

Querydsl Setup

There are several possibilities to set up the Querydsl framework in a Spring Boot application. The correct approach depends on the technologies used. Before we get into it, let's start with the recommended official setup.

Official Setup

The Querydsl framework has a nice documentation site. The Maven integration is described in Chapter 2.1.1, where the recommended setup is based on the querydsl-jpa and querydsl-apt dependencies and the usage of the apt-maven-plugin plugin. The querydsl-apt dependency isn't mentioned on the official site, but such a dependency is needed for the generation of the metadata Q classes (see the Metadata article). If we don't use the querydsl-apt dependency, then we get an error like this:

Plain Text

[INFO] --- apt:1.1.3:process (default) @ sat-jpa ---
error: Annotation processor 'com.querydsl.apt.jpa.JPAAnnotationProcessor' not found
1 error

The full working Maven setup based on the official recommendation is like this:

XML

<dependencies>
  ...
  <dependency>
    <groupId>com.querydsl</groupId>
    <artifactId>querydsl-jpa</artifactId>
  </dependency>
  ...
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>com.mysema.maven</groupId>
      <artifactId>apt-maven-plugin</artifactId>
      <version>1.1.3</version>
      <executions>
        <execution>
          <goals>
            <goal>process</goal>
          </goals>
          <configuration>
            <outputDirectory>target/generated-sources/java</outputDirectory>
            <processor>com.querydsl.apt.jpa.JPAAnnotationProcessor</processor>
          </configuration>
        </execution>
      </executions>
      <dependencies>
        <dependency>
          <groupId>com.querydsl</groupId>
          <artifactId>querydsl-apt</artifactId>
          <version>${querydsl.version}</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
</project>

This setup works, and the Maven build is successful (see the log below). Unfortunately, several errors can be found there. In our case, the logs contain, e.g., error: cannot find symbol import static com.github.aha.sat.jpa.city.City_.COUNTRY.
Plain Text

[INFO] ---------------------< com.github.aha.sat:sat-jpa >---------------------
[INFO] Building sat-jpa 0.5.2-SNAPSHOT
[INFO] from pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- clean:3.2.0:clean (default-clean) @ sat-jpa ---
[INFO] Deleting <local_path>\sat-jpa\target
[INFO]
[INFO] --- apt:1.1.3:process (default) @ sat-jpa ---
<local_path>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityRepository.java:3: error: cannot find symbol
import static com.github.aha.sat.jpa.city.City_.COUNTRY;
^
  symbol:   class City_
  location: package com.github.aha.sat.jpa.city
<local_path>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityRepository.java:3: error: static import only from classes and interfaces
import static com.github.aha.sat.jpa.city.City_.COUNTRY;
^
<local_path>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityRepository.java:4: error: cannot find symbol
import static com.github.aha.sat.jpa.city.City_.NAME;
^
  symbol:   class City_
  location: package com.github.aha.sat.jpa.city
...
19 errors
[INFO]
[INFO] --- resources:3.2.0:resources (default-resources) @ sat-jpa ---
...
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 51, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO]
[INFO] --- jar:3.2.2:jar (default-jar) @ sat-jpa ---
[INFO] Building jar: <local_path>\sat-jpa\target\sat-jpa.jar
[INFO]
[INFO] --- spring-boot:2.7.5:repackage (repackage) @ sat-jpa ---
[INFO] Replacing main artifact with repackaged archive
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  15.680 s
[INFO] Finished at: 2023-09-20T08:43:59+02:00
[INFO] ------------------------------------------------------------------------

Let's focus on how to fix this issue in the next parts.

Setup for Java EE With Spring Boot 2.x

Once I found this StackOverflow issue, I realized that the apt-maven-plugin is no longer needed. The trick lies in using the querydsl-apt dependency with a jpa classifier instead of using the apt-maven-plugin plugin. Note: the apt-maven-plugin seems to be deprecated since Querydsl 3 (see the following). With that, the simplified Maven setup looks like this:

XML

<dependencies>
  ...
  <dependency>
    <groupId>com.querydsl</groupId>
    <artifactId>querydsl-jpa</artifactId>
  </dependency>
  <dependency>
    <groupId>com.querydsl</groupId>
    <artifactId>querydsl-apt</artifactId>
    <version>${querydsl.version}</version>
    <classifier>jpa</classifier>
    <scope>provided</scope>
  </dependency>
  ...
</dependencies>

Note: Once we specify the classifier, we also need to specify a version of the dependency. Therefore, we cannot rely on the version defined in Spring Boot anymore. The logs from the Maven build are clean now.

Plain Text

[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------< com.github.aha.sat:sat-jpa >---------------------
[INFO] Building sat-jpa 0.5.2-SNAPSHOT
[INFO] from pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- clean:3.2.0:clean (default-clean) @ sat-jpa ---
[INFO] Deleting <local_path>\sat-jpa\target
[INFO]
[INFO] --- resources:3.2.0:resources (default-resources) @ sat-jpa ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'UTF-8' encoding to copy filtered properties files.
[INFO] Copying 1 resource
[INFO] Copying 2 resources
...
[INFO]
[INFO] --- jar:3.2.2:jar (default-jar) @ sat-jpa ---
[INFO] Building jar: <local_path>\sat-jpa\target\sat-jpa.jar
[INFO]
[INFO] --- spring-boot:2.7.5:repackage (repackage) @ sat-jpa ---
[INFO] Replacing main artifact with repackaged archive
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  15.148 s
[INFO] Finished at: 2023-09-20T08:56:42+02:00
[INFO] ------------------------------------------------------------------------

Setup for Jakarta With Spring Boot 3.x

As Spring Boot 3 relies on the Jakarta EE specification instead of Java EE, we need to adjust our Maven setup a little bit. The change is described in Upgrade to Spring Boot 3.0, and this article's setup is based on it. Basically, we just need to use the jakarta classifier instead of the jpa classifier.

XML

<dependencies>
  ...
  <dependency>
    <groupId>com.querydsl</groupId>
    <artifactId>querydsl-jpa</artifactId>
    <version>${querydsl.version}</version>
    <classifier>jakarta</classifier>
  </dependency>
  <dependency>
    <groupId>com.querydsl</groupId>
    <artifactId>querydsl-apt</artifactId>
    <version>${querydsl.version}</version>
    <classifier>jakarta</classifier>
    <scope>provided</scope>
  </dependency>
  ...
</dependencies>

The Maven build output is the same as in the Java EE setup.

Maven Classifiers

Let's shed light on the Querydsl solution first, and then on the Maven classifier usage behind it.

Querydsl Solution

Querydsl generates metadata Q classes for every entity in order to be able to write queries easily. The querydsl-apt dependency achieves this with an instance of JPAAnnotationProcessor (see, e.g., Java Annotation Processing and Creating a Builder for more details on annotation processing). The exact implementation of the annotation processor depends on the technology used. The desired processor is defined in the javax.annotation.processing.Processor file located in the used querydsl-apt dependency. The content of this file has to be the fully qualified class name of the desired annotation processor, e.g., com.querydsl.apt.jpa.JPAAnnotationProcessor.

Let's go back to the classifiers for a while. Querydsl supports several classifiers (e.g., jpa, jdo, roo, etc.), and each of them needs a different treatment based on the used technology. Therefore, Querydsl needs to specify the supported annotations for each technology. For JPA, Querydsl supports these classifiers: the jpa classifier for the old Java EE (with the javax.persistence package) and the jakarta classifier for the new Jakarta EE (with the jakarta.persistence package), as you already know.

Purpose of the Maven Classifier

The purpose of the Maven classifier is explained on the official site as follows:

The classifier distinguishes artifacts that were built from the same POM but differ in content. It is some optional and arbitrary string that — if present — is appended to the artifact name just after the version number. As a motivation for this element, consider for example a project that offers an artifact targeting Java 11 but at the same time also an artifact that still supports Java 1.8. The first artifact could be equipped with the classifier jdk11 and the second one with jdk8 such that clients can choose which one to use.

Please check, e.g., this guide for more information about Maven classifier usage. In our case, all the available classifiers for the querydsl-apt dependency can be listed online here. Similarly, you can also see all the classifiers for the querydsl-jpa dependency here.
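To see what the generated Q classes buy you at query time, here is a small usage sketch. It is hypothetical: it assumes a City entity with a name attribute and a JPAQueryFactory created from an EntityManager, which is not part of the article's setup discussion:

Java

// Hypothetical query using a generated Q class (QCity) with Querydsl.
JPAQueryFactory queryFactory = new JPAQueryFactory(entityManager);
QCity city = QCity.city;

List<City> cities = queryFactory
        .selectFrom(city)
        .where(city.name.startsWith("New"))
        .orderBy(city.name.asc())
        .fetch();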
When com.querydsl.apt.jpa.JPAAnnotationProcessor class is de-compiled from querydsl-apt-5.0.0-jpa.jar and querydsl-apt-5.0.0-jakarta.jar dependencies then we can see the only difference (see depicted below) is in the used imports (see lines 5-11). As a result, the JPAAnnotationProcessor is capable of handling different annotations in our classes (see lines 16-20). Use of Maven Classifier All Maven classifiers supported by Querydsl are defined in descriptors element specified in pom.xml file (see lines 11-20) as: XML <plugin> <artifactId>maven-assembly-plugin</artifactId> <executions> <execution> <id>apt-jars</id> <goals> <goal>single</goal> </goals> <phase>package</phase> <configuration> <descriptors> <descriptor>src/main/general.xml</descriptor> <descriptor>src/main/hibernate.xml</descriptor> <descriptor>src/main/jdo.xml</descriptor> <descriptor>src/main/jpa.xml</descriptor> <descriptor>src/main/jakarta.xml</descriptor> <descriptor>src/main/morphia.xml</descriptor> <descriptor>src/main/roo.xml</descriptor> <descriptor>src/main/onejar.xml</descriptor> </descriptors> <outputDirectory>${project.build.directory}</outputDirectory> </configuration> </execution> </executions> </plugin> This configuration is used in order to build multiple JARs defined by descriptors (see above). Each descriptor defines all the specifics for the technology. Usually, the XML descriptor just specifies the source folder (see line 11 in jpa.xml). XML <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd"> <id>jpa</id> <formats> <format>jar</format> </formats> <includeBaseDirectory>false</includeBaseDirectory> <fileSets> <fileSet> <directory>src/apt/jpa</directory> <outputDirectory>/</outputDirectory> </fileSet> <fileSet> <directory>${project.build.outputDirectory}</directory> <outputDirectory>/</outputDirectory> </fileSet> </fileSets> </assembly> However, the definition of Jakarta EE is a little bit more complicated. The key part in jakarta.xml is unpacking of JAR (see line 16) and using the jakarta classifier (see line 18) in order to activate the Eclipse Transformer Plugin. XML <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd"> <id>jakarta</id> <formats> <format>jar</format> </formats> <includeBaseDirectory>false</includeBaseDirectory> <moduleSets> <moduleSet> <useAllReactorProjects>true</useAllReactorProjects> <includes> <include>${project.groupId}:${project.artifactId}</include> </includes> <binaries> <unpack>true</unpack> <includeDependencies>false</includeDependencies> <attachmentClassifier>jakarta</attachmentClassifier> <outputDirectory>/</outputDirectory> </binaries> </moduleSet> </moduleSets> <fileSets> <fileSet> <directory>src/apt/jpa</directory> <outputDirectory>/</outputDirectory> </fileSet> </fileSets> </assembly> Note: the value in the id element is used as the classifier, see here. Eclipse Transformer Plugin The last piece in the puzzle lies in the usage of the already-mentioned org.eclipse.transformer.maven plugin. 
Eclipse Transformer provides tools and runtime components that transform Java binaries, such as individual class files and complete JARs and WARs, mapping changes to Java packages, type names, and related resource names. The org.eclipse.transformer.maven plugin is defined on lines 171-187 in querydsl-apt dependency as: XML <plugin> <groupId>org.eclipse.transformer</groupId> <artifactId>org.eclipse.transformer.maven</artifactId> <version>0.2.0</version> <executions> <execution> <id>jakarta-ee</id> <goals> <goal>run</goal> </goals> <phase>package</phase> <configuration> <classifier>jakarta</classifier> </configuration> </execution> </executions> </plugin> Note: See this blog for more information about Eclipse Transformer plugin usage. Conclusion This article has covered Querydsl setups for Java EE and Jakarta EE. The rest explained the usage of the Maven classifier by Querydsl. The used source code (even though it wasn't a lot here) is available here. Disclaimer: The article is based on my investigation when I tried to figure out the solution. Please let me know of any inaccuracies or misleading information.
It's right in the middle of the busy conference season, and I was prepping for an upcoming conference talk. As I often do, I went to Neo4j Aura to spin up a free database and use Cypher with APOC to import data from an API, but this API requires a header, and the APOC procedure that adds headers to a request is blocked by security in Aura. Hmm, I needed a new route. I decided to try JBang, which is a tool for scripting with Java. I had heard about it but hadn't tried it yet. It's pretty cool, so I wanted to share my onboarding. What Is JBang? Java developers have lamented the lack of a scripting language for Java for years. JBang solves this problem. I found an excellent overview of JBang from a post on InfoQ (Scripting Java with a jBang). JBang provides a way of running Java code as a script...[It] is a launcher script, written in bash and powershell, that can discover or download a JVM, and then (down)load the Java script given in an argument. The implementation of JBang is a Java JAR archive, which it then launches to execute further commands. JBang can run jsh or java files; the latter is a standard Java class with a main() method. However, unlike JShell, comments at the top of JBang allow dependencies to be automatically downloaded and set up on the classpath. JShell allows adding JARs to the classpath at launch, but any (recursive) dependencies have to be added manually. JBang seems like a nicer alternative to either using a full-fledged Java project or a Linux script. Let's get a bit more detail about the data API we will pull from before we dive into writing the script! Setup: Install/Download First, we need to install JBang from the download page. I had to find the download for my operating system and then choose an install type. Since I use SDKMan to manage my Java versions, I installed JBang with SDKMan, too. Shell sdk install jbang Several IDEs have plugins for JBang, as well, including IntelliJ. The IntelliJ plugin seems to have several nice features, including import suggestions. However, I had trouble utilizing it from an existing project or randomly created script, but I had to create a separate project initialized with JBang. I probably need to play with this a bit more since it would simplify the import problem (discussed in a bit). Anyway, I decided to mess with the plugin later and just use the command line for now. API Details I wanted to import data for traveling with pets, and the Yelp Fusion API was one that I knew I wanted to use. This was also the one that required a header on the request, which led me down the path toward JBang in the first place. The Yelp API has a really useful playground where I could test a few requests before I started writing the script. I also used the playground to verify syntax and get sample code for an API call in Java. Write the Script In the playground, you can choose the endpoint you want to hit, any parameters, as well as the language you want to use to make the request. 
I chose Java and the parameters I knew I needed, and it gave me the following code:

Java

OkHttpClient client = new OkHttpClient();

Request request = new Request.Builder()
  .url("https://api.yelp.com/v3/businesses/search?location=" + location +
    "&categories=" + category +
    "&attributes=dogs_allowed&limit=50&sort_by=distance")
  .get()
  .addHeader("accept", "application/json")
  .addHeader("Authorization", "Bearer " + yelpApiKey)
  .build();

Response response = client.newCall(request).execute();

Now, I tweaked the code a bit above to use placeholder variables for location, category, and yelpApiKey so that I could pass in arbitrary values later. The code sample from the playground auto-includes your API token, so I copied/pasted the block above into my JBang script, and then I needed to go back and add dependencies. This was where JBang was a little less convenient and where an IDE plugin might come in handy. I had to go to Maven Central and search for the dependencies I needed. There isn't an auto-import, which makes sense since we don't have a dependency manager like Maven or Spring that could potentially search dependencies for useful import suggestions.

I also wanted to pull pet travel data from several (of the many) categories Yelp offers. Since there is a high request limit but a smaller result limit, I decided to hit the endpoint for each category independently to retrieve the maximum results for each category. I also wanted a parameter for the location so that I could pull data for different cities. Finally, I needed a file to output the results so that I wouldn't have to hit the API each time I might want to load the data. I added the following variables to the script:

Java

String filename = "yelpApi.json";
String[] yelpCategories = {"active","arts","food","hotelstravel","nightlife","pets","restaurants","shopping"};
String location = "New%20York%20City";

Last but not least, I needed to create the JSON object to format and hold the results and then write that to the JSON file.

Java

try {
  JSONObject json = new JSONObject();
  JSONArray jsonArray = new JSONArray();
  String jsonData = "";
  OkHttpClient client = new OkHttpClient().newBuilder()
    .connectTimeout(20, TimeUnit.SECONDS).build();

  for (String category : yelpCategories) {
    <API call>

    jsonData = response.body().string();
    JSONObject obj = new JSONObject(jsonData);
    JSONArray array = obj.getJSONArray("businesses");
    JSONObject place = new JSONObject();
    int n = array.length();
    for (int i = 0; i < n; ++i) {
      place = array.getJSONObject(i);
      if (!place.isEmpty()) {
        json.append(category, place);
      }
    }
  }

  FileWriter myWriter = new FileWriter(filename);
  myWriter.write(json.toString(4));
  myWriter.close();
  System.out.println("Successfully wrote to Yelp file.");
} catch (IOException e) {
  e.printStackTrace();
}

Following this, I needed a few more import statements. You might notice that I added a connect timeout to the request. This is because the servers for one of the APIs were sometimes a bit sluggish, and I decided to wrap the other API calls with the same timeout protection to prevent the script from hanging or erroring out. The full version of the code is available on GitHub.

Running the Script

To run, we can use the command `jbang` plus the name of the script file. So our command would look like the following:

Shell

jbang travelPetDataImport.java

This will run the script and output the results to the file we specified. We can check the file to make sure the data was written as we expected.
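For reference, the top of such a JBang script is where the dependencies end up, declared as //DEPS comments right under the launcher line. The skeleton below is only a sketch; the exact coordinates and versions are illustrative, and the complete script lives in the GitHub repository:

Java

///usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS com.squareup.okhttp3:okhttp:4.12.0
//DEPS org.json:json:20231013

// Trimmed skeleton of travelPetDataImport.java.
public class travelPetDataImport {
    public static void main(String[] args) throws Exception {
        // ... Yelp API calls, JSON assembly, and file writing go here ...
    }
}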
Wrap Up!

I was really impressed and happy with the capabilities and simplicity of JBang! It provided a straightforward way to write a script using the same Java syntax I'm comfortable with, and it was easy to get started. Next time, I'd like to figure out the IDE plugin so that I can hopefully take advantage of import suggestions and other efficiencies available. I'm looking forward to using JBang more in the future!

Resources

GitHub repository: Accompanying code for this blog post
Website: JBang
Documentation: JBang
Data: Yelp Fusion API
In the ever-evolving landscape of software engineering, the database stands as a cornerstone for storing and managing an organization's critical data. From ancient caves and temples that symbolize the earliest forms of information storage to today's distributed databases, the need to persistently store and retrieve data has been a constant in human history. In modern applications, the significance of a well-managed database is indispensable, especially as we navigate the complexities of cloud-native architectures and application modernization.

Why a Database?

1. State Management in Microservices and Stateless Applications

In the era of microservices and stateless applications, the database plays a pivotal role in housing the state and is crucial for user information and stock management. Despite the move towards stateless designs, certain aspects of an application still require a persistent state, making the database an integral component.

2. Seizing Current Opportunities

The database is not just a storage facility; it encapsulates the current opportunities vital for an organization's success. Whether it's customer data, transaction details, or real-time analytics, the database houses the pulse of the organization's present, providing insights and supporting decision-making processes.

3. Future-Proofing for Opportunities Ahead

As organizations embrace technologies like Artificial Intelligence (AI) and Machine Learning (ML), the database becomes the bedrock for unlocking new opportunities. Future-proofing involves not only storing current data efficiently but also structuring the database to facilitate seamless integration with emerging technologies.

The Challenges of Database Management

Handling a database is not without its challenges. The complexity arises from various factors, including modeling, migration, and the constant evolution of products.

1. Modeling Complexity

The initial modeling phase is crucial, often conducted when a product is in its infancy, or the organization lacks the maturity to perform optimally. The challenge lies in foreseeing the data requirements and relationships accurately.

2. Migration Complexity

Unlike code refactoring on the application side, database migration introduces complexity that surpasses application migration. The need for structural changes, data transformations, and ensuring data integrity makes database migration a challenging endeavor.

3. Product Evolution

Products evolve, and so do their data requirements. The challenge is to manage the evolutionary data effectively, ensuring that the database structure remains aligned with the changing needs of the application and the organization.

Polyglot Persistence: Exploring Database Options

In the contemporary software landscape, the concept of polyglot persistence comes into play, allowing organizations to choose databases that best suit their specific scenarios. This approach involves exploring relational databases, NoSQL databases, and NewSQL databases based on the application's unique needs.

Integrating Database and Application: Bridging Paradigms

One of the critical challenges in mastering Java Persistence lies in integrating the database with the application. This integration becomes complex due to the mismatch between programming paradigms in Java and database systems.

Patterns for Integration

Several design patterns aid in smoothing the integration process. Patterns like Driver, Active Record, Data Mapper, Repository, DAO (Data Access Object), and DTO (Data Transfer Object) provide blueprints for bridging the gap between the Java application and the database.
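As a small illustration of one of these patterns, here is a minimal Repository-style contract. It is a sketch with a hypothetical Book type, not material from the talk; the point is that business code depends on a domain-level interface while the persistence technology behind it can change:

Java

import java.util.Optional;

// Hypothetical domain type.
record Book(String isbn, String title) { }

// Repository contract expressed in domain terms; a JPA, JDBC, or NoSQL
// implementation can sit behind it without affecting callers.
interface BookRepository {
    Optional<Book> findByIsbn(String isbn);
    Book save(Book book);
    void deleteByIsbn(String isbn);
}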
Data-Oriented vs. Object-Oriented Programming

While Java embraces object-oriented principles like inheritance, polymorphism, encapsulation, and types, the database world revolves around normalization, denormalization, and structural considerations. Bridging these paradigms requires a thoughtful approach.

Principles of Data-Oriented Programming
- Separating code (behavior) from data: encourage a clean separation between business logic and data manipulation.
- Representing data with generic data structures: use generic structures to represent data, ensuring flexibility and adaptability.
- Treating data as immutable: embrace immutability to enhance data consistency and reliability.
- Separating data schema from data representation: decouple the database schema from the application's representation of data to facilitate changes without affecting the entire system.

Principles of Object-Oriented Programming
- Expose behavior and hide data: maintain a clear distinction between the functionality of objects and their underlying data.
- Abstraction: utilize abstraction to simplify complex systems and focus on essential features.
- Polymorphism: leverage polymorphism to create flexible and reusable code.

Conclusion

Mastering Java persistence requires a holistic understanding of these principles, patterns, and paradigms. The journey involves selecting the proper database technologies and integrating them seamlessly with Java applications while ensuring adaptability to future changes. In this dynamic landscape, success stories, documentation, and a maturity model serve as guiding beacons, aiding developers and organizations in their pursuit of efficient and robust database management for cloud-native applications and modernization initiatives.

Video and Slide Presentation: Slides
When the time comes to pick and download a suite of office applications, we can expect that the MS Office suite will dominate the conversation. Each application included in the MS Office package comes equipped with dozens of powerful, intuitive features, and all the OpenXML file formats – .DOCX, .XLSX, .PPTX, etc. – can shoulder the burden of immense, data-intensive edits and developer customizations while retaining a relatively small file size through efficient compartmentalization and lossless compression.

However, OpenXML files aren't the only ones represented in XML and zipped into convenient lossless file containers. The Apache OpenOffice suite – whose OpenDocument formats were standardized in the mid-2000s, right around the time OpenXML became the standard for MS Office files – also offers an XML-based file structure with lossless Zip compression, and this option makes for a nice alternative to OpenXML files when cost considerations come into play. For the average business user, OpenOffice files offer mostly the same basic content-creation functionality as MS Office files, and the key tradeoffs typically only become noticeable when the myriad built-in connections and compatibility features MS Office provides (with its much broader suite of business applications) come into consideration.

On the other hand, for the average developer building applications and workflows that automatically interact with and make changes to office documents in one shape or another, the XML-based file structure is all that really counts. It's easy to leverage a standardized, existing set of developer tools to unzip and manipulate XML-based file content within the logic of any file-processing application. Just like a developer in an MS Office environment can, for example, use custom code or an API solution to unzip a DOCX file and extract copies of all the image files nested within the document, a developer in an OpenOffice environment can do essentially the same thing with an ODT file using similar knowledge of XML data structures.

Ironically, perhaps, one of the most practical similarities between OpenOffice files and MS Office files is their inherent lack of presentability as a final product. When OpenOffice users wrap up their data analysis, content writing, or presentation projects, it's unlikely anyone reviewing their finalized content will be opening an .ODT, .ODS, or .ODP file to do so – just like they wouldn't view MS Office content in .DOCX, .XLSX, or .PPTX format either. Chances are extremely high that they'll open that finalized content in a vector or raster PDF iteration of those files – and there's good reason for that, of course. Once converted to PDF, that content can't be easily altered or stolen, and on top of that, the presentability of PDF – whether viewed in a browser client or within a PDF reader – far outclasses the noisy, work-in-progress look of content opened within its original processing application.

Just like developers working in MS Office environments, developers working in OpenOffice environments need access to quick, secure, and easy-to-use solutions for converting finalized OpenOffice content to PDF when users require it. To that end, I'll demonstrate three free-to-use APIs designed to convert all major Open Document Format file types to PDF. This tutorial also provides ready-to-run code examples further down the page to make the process of structuring API calls easy.
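Before moving on to the conversion APIs, here is what that "unzip and look inside" idea looks like in practice: a minimal sketch that uses only the JDK's java.util.zip classes to list the images embedded in an ODT document. The file path is hypothetical, and the Pictures/ folder name follows the usual ODF packaging convention.

Java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ListOdtImages {
    public static void main(String[] args) throws IOException {
        // Hypothetical path; any ODF document (.odt, .ods, .odp) is just a Zip container.
        try (ZipFile odt = new ZipFile("/path/to/document.odt")) {
            odt.stream()
               .map(ZipEntry::getName)
               // Embedded media are conventionally stored under "Pictures/".
               .filter(name -> name.startsWith("Pictures/"))
               .forEach(System.out::println);
        }
    }
}

The same approach works for DOCX and other OpenXML containers; only the internal folder layout differs.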
Demonstration

Each of the below API solutions will allow developers to convert .ODT, .ODS, and .ODP files to standard PDFs using minimal Java code examples in secure, in-memory requests, with all document data released upon completion. Each request can be authorized with a single free-tier API key, which allows up to 800 API calls per month.

To begin structuring our API request, we can start by installing the Java SDK. To install with Maven, let's first add a reference to the repository in pom.xml:

XML
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Next, let's add a reference to the dependency in pom.xml:

XML
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>

With SDK installation complete, we can use the below code to convert our ODT files to PDF:

Java
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ConvertDocumentApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
try {
    byte[] result = apiInstance.convertDocumentOdtToPdf(inputFile);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ConvertDocumentApi#convertDocumentOdtToPdf");
    e.printStackTrace();
}

And we can use the below code to convert our ODS files to PDF:

Java
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ConvertDocumentApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
try {
    byte[] result = apiInstance.convertDocumentOdsToPdf(inputFile);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ConvertDocumentApi#convertDocumentOdsToPdf");
    e.printStackTrace();
}

Lastly, we can use the below code to convert our ODP presentations to PDF:

Java
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ConvertDocumentApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
try {
    byte[] result = apiInstance.convertDocumentOdpToPdf(inputFile);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ConvertDocumentApi#convertDocumentOdpToPdf");
    e.printStackTrace();
}

We'll receive the bytes of our new PDF file in the API response, and we can go ahead and create our new file with that content.
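Since each call hands back the PDF as a byte array (the samples above just print it), a small helper like the following hedged sketch can write the result to disk. The class name and output path are hypothetical; it uses only the JDK's java.nio.file API.

Java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfFileWriter {

    // Persists the byte[] returned by one of the conversion calls above.
    // Usage (hypothetical): PdfFileWriter.writePdf(result, "/path/to/output.pdf");
    public static void writePdf(byte[] pdfBytes, String outputPath) throws IOException {
        Files.write(Path.of(outputPath), pdfBytes);
    }
}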
It looks like Java 21 is going to pose a strong challenge to Node.js! There are two massive performance enhancements in Java 21, and they address two of Java's often-criticized areas: threads and blocking IO (somewhat fair criticism) and GC (relatively unfair criticism?).

Major highlights of Java 21:
- Project Loom and virtual threads
- ZGC (upgraded)

1. Virtual Threads

For the longest time, we have looked at non-blocking IO and async operations, and then at Promises and Async/Await for orchestrating them. So, we have had to deal with callbacks and do things like Promise.all() or CompletableFuture.thenCompose() to join several async operations and process the results. More recently, Reactive frameworks have come into the picture to "compose" tasks as functional pipelines and then run them on thread pools or executors. Reactive functional programming is much better than "callback hell", but it also forced us into a functional programming model just so that non-blocking/async could be done in an elegant way.

Virtual threads are bringing an end to callbacks and promises. The Java team has succeeded in providing an almost drop-in replacement for threads with dirt-cheap virtual threads. So, even if you do the old Thread.sleep(5000), the virtual thread will detach instead of blocking. In terms of numbers, a regular laptop can handle 2,000 to 5,000 platform threads, whereas the same machine can handle one million or more virtual threads. In fact, the official recommendation is to avoid pooling virtual threads; every task should run on a new virtual thread. Virtual threads support everything: sleep, wait, ThreadLocal, locks, etc.

Virtual threads allow us to just write regular old iterative and "seemingly blocking" code and let Java detach/attach real threads so that it becomes non-blocking and high-performance. However, we still need to wait for library and framework implementers like Apache Tomcat and Spring to move from native threads to virtual threads. Once the frameworks complete the transition, all Java microservices/monoliths that use these upgraded frameworks will become non-blocking automatically.

Take the example of some of the thread pools we encounter in our applications: the Apache Tomcat NIO connector has 25-50 worker threads; imagine it having 50,000 virtual threads instead. An Apache Camel listener usually has 10-20 threads; imagine it having 1,000-2,000 virtual threads. Of course, there are no thread pools with virtual threads, so they will simply have thousands of threads. This just about puts a full stop to "thread starvation" in Java. Just by upgrading to frameworks and libraries that fully take advantage of Java 21, all our Java microservices will become non-blocking with existing code.

(Caveat: some operations, like synchronized blocks, will still pin virtual threads. However, if we replace them with virtual-thread-friendly alternatives like Lock.lock(), then virtual threads will be able to detach and do other tasks until the lock is acquired. For this, a little bit of code change is needed from library authors, and in some cases in project code bases, to get the full benefit of virtual threads.)

2. ZGC

ZGC now supports terabyte-size Java heaps with consistent sub-millisecond pauses. There are no important caveats: it may use, say, 5-10% more memory or have 5-10% slower allocation, but there are no more stop-the-world GC pauses and no more practical heap size limits.

Together, these two performance improvements are going to strengthen Java's position among programming languages.
It may pause the rising dominance of Node.js and, to some extent, of Reactive programming. Reactive/functional programming may still be good for code readability and for managing heavily event-driven applications, but we no longer need reactive programming just to do non-blocking IO in Java.
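To make the "regular, seemingly blocking code" point concrete, here is a minimal, self-contained sketch (not tied to Tomcat, Camel, or any framework mentioned above) that runs ten thousand sleeping tasks on virtual threads via the standard Executors.newVirtualThreadPerTaskExecutor() added in Java 21:

Java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One virtual thread per task; no pool sizing, no callbacks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i ->
                executor.submit(() -> {
                    // "Blocking" sleep: the carrier thread is released while we wait.
                    Thread.sleep(Duration.ofSeconds(1));
                    return i;
                }));
        } // close() waits for the submitted tasks to finish
    }
}

With platform threads, a pool of this size would be prohibitively expensive; with virtual threads the same blocking style scales, which is exactly the claim made above.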
When doing unit tests, you have probably found yourself in the situation of having to create objects over and over again. To do this, you must call the class constructor with the corresponding parameters. So far, nothing unusual, but most probably, there have been times when the values of some of these fields were irrelevant for the test, or when you had to create nested "dummy" objects simply because they were mandatory in the constructor. All this has probably generated some frustration at some point and made you question whether you were doing it right: if that is really the way to do unit tests, is it worth the effort?

Typically, a test must have a clear objective. Therefore, it is expected that within the SUT (system under test) some fields really are the object of the test while others are irrelevant.

Let's take an example. Suppose we have the class Person with the fields Name, Email, and Age. On the other hand, we want to unit test a service that, receiving a Person object, tells us whether that person can travel by bus for free or not. We know that this calculation depends only on the age: children under 14 years old travel for free. Therefore, in this case, the Name and Email fields are irrelevant.

In this example, creating Person objects would not involve too much effort, but suppose the fields of the Person class grow or nested objects start appearing: Address, Relatives (a list of Persons), a phone list, etc. Now, there are several issues to consider:
- It is more laborious to create the objects.
- What happens when the constructor or the fields of the class change?
- When there are lists of objects, how many objects should I create?
- What values should I assign to the fields that do not influence the test? Is it good if the values are always the same, without any variability?

Two well-known design patterns are usually used to solve this situation: Object Mother and Builder. In both cases, the idea is to have "helpers" that facilitate the creation of objects with the characteristics we need, as in the sketch shown after this list. Both approaches are widespread, are adequate, and favor the maintainability of the tests. However, they still do not resolve some issues:
- When constructors change, the code stops compiling, even for fields that do not affect the tests.
- When new fields appear, we must update the code that generates the test objects.
- Generating nested objects is still laborious.
- Mandatory but unused fields are hard-coded and assigned by default, so the tests have no variability.
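As a point of reference, a typical Object Mother helper for the Person example above might look like the following sketch. It is only an illustration: the Person constructor signature is an assumption based on the fields described earlier (name, email, age).

Java
// Hypothetical Object Mother for the Person example (name, email, age assumed).
public class PersonMother {

    // A person who travels for free (under 14).
    public static Person aChild() {
        return new Person("Any Name", "child@example.com", 10);
    }

    // A person who pays the fare.
    public static Person anAdult() {
        return new Person("Any Name", "adult@example.com", 35);
    }
}

Helpers like this must be updated every time Person changes, even when the affected fields are irrelevant to the test, which is precisely the maintenance cost described above.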
One of the Java libraries that can solve these problems is EasyRandom. Next, we will see how it works.

What Is EasyRandom?

EasyRandom is a Java library that facilitates the generation of random data for unit and integration testing. The idea behind EasyRandom is to provide a simple way to create objects with random values that can be used in tests. Instead of manually defining values for each class attribute in each test, EasyRandom automates this process, automatically generating random data for each attribute. The library handles primitive data types, custom classes, collections, and other types of objects. It can also be configured to respect specific rules and data-generation restrictions, making it quite flexible.

Here is a basic example of how EasyRandom can be used to generate a random object:

Java
import org.jeasy.random.EasyRandom;

public class EasyRandomExample {
    public static void main(String[] args) {
        EasyRandom easyRandom = new EasyRandom();
        Person randomPerson = easyRandom.nextObject(Person.class);
        System.out.println(randomPerson);
    }
}

In this example, Person is a dummy class, and easyRandom.nextObject(Person.class) generates an instance of Person with random values for its attributes. As can be seen, the generation of these objects does not depend on the class constructor, so the test code will keep compiling even if the SUT changes. This solves one of the biggest problems in maintaining an automated test suite.

Why Is It Interesting?

Using the EasyRandom library for testing your applications has several advantages:
- Simplified random data generation: it automates generating random data for your objects, saving you from writing repetitive code for each test.
- Facilitates unit and integration testing: by automatically generating test objects, you can focus on testing the code's behavior instead of worrying about manually creating test data.
- Data customization: although it generates random data by default, EasyRandom also allows you to customize certain fields or attributes if necessary, so you can adjust the generation to your needs.
- Reduced human error: manual generation of test data can lead to errors, especially when dealing with many fields and combinations. EasyRandom helps minimize human error by generating consistent random data.
- Simplified maintenance: if your class requirements change (new fields, types, etc.), you do not need to manually update your test data, as EasyRandom will generate it automatically.
- Improved readability: using EasyRandom makes your tests cleaner and more readable, since you do not need to define test values explicitly in each case.
- Faster test development: by reducing the time spent creating test objects, you can develop tests faster and more effectively.
- Ease of use: adding this library to a Java project is practically immediate, and it is extremely easy to use.

Where Can You Apply It?

This library will allow us to simplify the creation of objects for our unit tests, but it can also be of great help when we need to generate a set of test data. This can be achieved by using the DTOs of our application and generating random objects to later dump them into a database or file.

Where it is not recommended: this library may not be worthwhile in projects where object generation is not complex or where we need precise control over all the fields of the objects involved in the test.

How To Use EasyRandom

Let's see EasyRandom in action with a real example, including the environment used and the prerequisites.

Prerequisites
- Java 8+
- Maven or Gradle

Initial Setup

Inside our project, we must add a new dependency. The pom.xml file would look like this:

XML
<dependency>
    <groupId>org.jeasy</groupId>
    <artifactId>easy-random-core</artifactId>
    <version>5.0.0</version>
</dependency>

Basic Use Case

The most basic use case has already been shown above. In that example, values are assigned to the fields of the Person class in a completely random way. Obviously, when testing, we will need to have control over some specific fields. Let's see this with an example, recalling that EasyRandom can also be used with primitive types. Our example could look like this.
Java
public class PersonServiceTest {

    private final EasyRandom easyRandom = new EasyRandom();
    private final PersonService personService = new PersonService();

    @Test
    public void testIsAdult() {
        Person adultPerson = easyRandom.nextObject(Person.class);
        adultPerson.setAge(18 + easyRandom.nextInt(80));

        assertTrue(personService.isAdult(adultPerson));
    }

    @Test
    public void testIsNotAdult() {
        Person minorPerson = easyRandom.nextObject(Person.class);
        minorPerson.setAge(easyRandom.nextInt(17));

        assertFalse(personService.isAdult(minorPerson));
    }
}

As we can see, this way of generating test objects protects us from changes in the Person class and allows us to focus only on the field we are interested in.

We can also use this library to generate lists of random objects:

Java
@Test
void generateObjectsList() {
    EasyRandom generator = new EasyRandom();

    // Generate a list of 5 Persons
    List<Person> persons = generator.objects(Person.class, 5)
            .collect(Collectors.toList());

    assertEquals(5, persons.size());
}

This test, in itself, is not very useful; it simply demonstrates the ability to generate lists, which could be used, for example, to dump data into a database.

Generation of Parameterized Data

Let's now see how to use this library to gain more precise control over the generation of the object itself. This is done through parameterization.

Set the value of a field. Let's imagine that, for our tests, we want to keep certain values constant (an ID, a name, an address, etc.). To achieve this, we configure the initialization of objects using EasyRandomParameters and locate the fields by their name. Let's see how:

Java
EasyRandomParameters params = new EasyRandomParameters();
// Assign a value to the field through a lambda function
params.randomize(named("age"), () -> 5);

EasyRandom easyRandom = new EasyRandom(params);
// The object will always have an age of 5
Person person = easyRandom.nextObject(Person.class);

Of course, the same can be done with collections or complex objects. Let's suppose that our Person class contains an Address class inside and that, in addition, we want to generate a list of two persons. Let's see a more complete example:

Java
EasyRandomParameters parameters = new EasyRandomParameters()
        .randomize(Address.class, () -> new Address("Random St.", "Random City"));

EasyRandom easyRandom = new EasyRandom(parameters);
return Arrays.asList(
        easyRandom.nextObject(Person.class),
        easyRandom.nextObject(Person.class));

Suppose now that a person can have several addresses, so the Address field becomes a list inside the Person class. With this library, we can also make our collections have a variable size. This is something we can also do using parameters:

Java
EasyRandomParameters parameters = new EasyRandomParameters()
        .randomize(Address.class, () -> new Address("Random St.", "Random City"))
        .collectionSizeRange(2, 10);

EasyRandom easyRandom = new EasyRandom(parameters);
// The object will have a list of between 2 and 10 addresses
Person person = easyRandom.nextObject(Person.class);

Setting Pseudo-Random Fields

As we have seen, setting values is quite simple and straightforward. But what if we want to control the randomness of the data? Say we want to generate random names of people, but real-looking names, not just strings of unconnected characters. This need is perhaps even clearer when we are interested in having randomness in fields such as email, phone number, ID number, card number, city name, etc.
For this purpose, it is useful to combine EasyRandom with other data-generation libraries. One of the best known is Faker. Combining both libraries, we could get code like this:

Java
EasyRandomParameters params = new EasyRandomParameters();

// Generate a number between 0 and 17
params.randomize(named("age"), () -> Faker.instance().number().numberBetween(0, 17));
// Generate random "real" names
params.randomize(named("name"), () -> Faker.instance().name().fullName());

EasyRandom easyRandom = new EasyRandom(params);
Person person = easyRandom.nextObject(Person.class);

There are a multitude of parameters that allow us to control the generation of objects.

Closing

EasyRandom is a library that should be part of your toolkit if you write unit tests, as it helps keep them maintainable. In addition, and although it may seem strange, introducing some controlled randomness into tests is not necessarily a bad thing: in a way, it automatically generates new test cases and increases the probability of finding bugs in the code.
Backpressure is a critical concept in software development, particularly when working with data streams. It refers to the control mechanism that maintains the balance between data production and consumption rates. This article will explore the notion of backpressure, its importance, real-world examples, and how to implement it using Java code.

Understanding Backpressure

Backpressure is a technique employed in data-streaming systems where the data production rate may surpass the consumption rate. This imbalance can lead to data loss or system crashes due to resource exhaustion. Backpressure allows the consumer to signal the producer when it is ready for more data, preventing the consumer from being overwhelmed.

The Importance of Backpressure

In systems without backpressure management, consumers may struggle to handle the influx of data, leading to slow processing, memory issues, and even crashes. By implementing backpressure, developers can ensure that their applications remain stable, responsive, and efficient under heavy loads.

Real-World Examples

Video Streaming Services
Platforms like Netflix, YouTube, and Hulu utilize backpressure to deliver high-quality video content while ensuring the user's device and network can handle the incoming data stream. Adaptive Bitrate Streaming (ABS) dynamically adjusts the video stream quality based on the user's network conditions and device capabilities, mitigating potential issues caused by overwhelming data.

Traffic Management
Backpressure is analogous to traffic management on a highway. If too many cars enter the highway at once, congestion occurs, leading to slower speeds and increased travel times. Traffic signals or ramp meters can be used to control the flow of vehicles onto the highway, reducing congestion and maintaining optimal speeds.

Implementing Backpressure in Java

Java provides a built-in mechanism for handling backpressure through the Flow API, introduced in Java 9. The Flow API supports the Reactive Streams specification, allowing developers to create systems that handle backpressure effectively.
Here's an example of a simple producer-consumer system using Java's Flow API:

Java
import java.util.concurrent.*;
import java.util.concurrent.Flow.*;

public class BackpressureExample {

    public static void main(String[] args) throws InterruptedException {
        // Create a custom publisher
        CustomPublisher<Integer> publisher = new CustomPublisher<>();

        // Create a subscriber and register it with the publisher
        Subscriber<Integer> subscriber = new Subscriber<>() {
            private Subscription subscription;
            private ExecutorService executorService = Executors.newFixedThreadPool(4);

            @Override
            public void onSubscribe(Subscription subscription) {
                this.subscription = subscription;
                subscription.request(1);
            }

            @Override
            public void onNext(Integer item) {
                System.out.println("Received: " + item);
                executorService.submit(() -> {
                    try {
                        Thread.sleep(1000); // Simulate slow processing
                        System.out.println("Processed: " + item);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    subscription.request(1);
                });
            }

            @Override
            public void onError(Throwable throwable) {
                System.err.println("Error: " + throwable.getMessage());
                executorService.shutdown();
            }

            @Override
            public void onComplete() {
                System.out.println("Completed");
                executorService.shutdown();
            }
        };

        publisher.subscribe(subscriber);

        // Publish items
        for (int i = 1; i <= 10; i++) {
            publisher.publish(i);
        }

        // Wait for the subscriber to finish processing and close the publisher
        Thread.sleep(15000);
        publisher.close();
    }
}

Java
class CustomPublisher<T> implements Publisher<T> {

    private final SubmissionPublisher<T> submissionPublisher;

    public CustomPublisher() {
        this.submissionPublisher = new SubmissionPublisher<>();
    }

    @Override
    public void subscribe(Subscriber<? super T> subscriber) {
        submissionPublisher.subscribe(subscriber);
    }

    public void publish(T item) {
        submissionPublisher.submit(item);
    }

    public void close() {
        submissionPublisher.close();
    }
}

In this example, we create a CustomPublisher class that wraps the built-in SubmissionPublisher. The CustomPublisher can be further customized to generate data based on specific business logic or external sources. The Subscriber implementation processes the received items in parallel using an ExecutorService, which allows the subscriber to handle higher volumes of data more efficiently. Note that the onComplete() method shuts down the executorService to ensure proper cleanup. Error handling is also improved in the onError() method: if an error occurs, the executorService is shut down to release resources.

Conclusion

Backpressure is a vital concept for managing data-streaming systems, ensuring that consumers can handle incoming data without being overwhelmed. By understanding and implementing backpressure techniques, developers can create more stable, efficient, and reliable applications. Java's Flow API provides an excellent foundation for building backpressure-aware systems, allowing developers to harness the full potential of reactive programming.