Kubernetes in the Enterprise
In 2022, Kubernetes became a central component for containerized applications, and it is nowhere near its peak. In fact, based on our research, 94 percent of survey respondents believe that Kubernetes will be a bigger part of their system design over the next two to three years. With Kubernetes expected to become even more entrenched in systems, what do adoption and deployment methods look like compared to previous years? DZone's Kubernetes in the Enterprise Trend Report provides insights into how developers are leveraging Kubernetes in their organizations. It focuses on the evolution of Kubernetes beyond container orchestration, advancements in Kubernetes observability, Kubernetes in AI and ML, and more. Our goal for this Trend Report is to help inspire developers to leverage Kubernetes in their own organizations.
Will technology, automation, or AI take our jobs? This is something we hear a lot at the moment, but it's always been this way. Right back at the start of my career, I automated something that was canned the moment I left because folks were afraid of losing their jobs. Some background on why I am writing this post: I was reading a tweet today from Michelle Bakels that was a cautionary tale from her enterprise days. I'm not going to dump the entire text here; go read the tweet, as it's fascinating. The TL;DR is that folks are scared of telling IT what they do in case it is automated and they are out of a job.

Will Technology Replace Us?
This is something that is constantly talked about in the tech world, though it feels like it has really come to the surface recently with the rise of AI. Will technology replace us? Will AI take our jobs (one of the catalysts of the recent SAG-AFTRA strikes in the TV and movie industry)? Will self-driving cars replace taxis? Will cars replace horses? Will movable type replace handwritten books? The list goes on. Rather than dive into this discussion, I thought it would be fun to tell a story of my own: how I automated something during an internship, and how it got killed the minute I left because folks were worried it would replace their jobs.

Jim Gets an Internship at Marconi
This was just after I graduated in 1999 — I started working on a summer internship at Marconi (playing the mamba, listening to the radio) in their underwater weapons division. This was probably the start of my route toward being a pacifist and hating all weapons and the military-industrial complex. My job was to analyze torpedo data. Marconi built these $1M Spearfish torpedoes that didn't work very well, and they would do trials on them in various parts of the world (Faslane in Scotland, deep-water testing in the Caribbean). The test torpedoes would record data about their decisions, as well as telemetry such as depth, speed, and location. They would also record radar data and what they identified in the way of ships. The goal was to drop a torpedo in the water and send it at a ship, then gather the data afterward to see if it did the right thing. It is not quite testing in production, as the torpedoes had no explosives, but it is the closest you can get.

The Reporting Process
Once the trials were over, someone had to analyze the results, and this was the team I was on. The process was horribly manual:
Remotely log into a mainframe and request a printout of the data from the torpedo trial as a hex dump on green-and-white continuous-feed paper.
Work through the hex dump using highlighters to "read" what was happening. Hex codes marked the steps in the output, and each one was followed by many bytes of information. For example, one hex code was for direction, so you knew the next X bytes were the direction information, and the character after that began the next set of data. Very laborious and error-prone.
Enter all this data into an Excel spreadsheet showing each step.
Extract the location and direction data and put it into another page of the spreadsheet.
Generate a chart to show the torpedo's movements.
Create a Word document with all the steps that the torpedo took, along with the charts.
Actually do the analysis and look for issues for the software teams to investigate.
(There are probably more steps that I have forgotten, as it was 24 years ago.)
Apart from the last step, which actually used your brain, the rest of the process was repetitive and dull.
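To make the hex-dump walk concrete, here is a minimal sketch in Python of the kind of decoding just described. The record codes, field widths, and units are invented for illustration and bear no relation to the real trial data format.

```python
# Hypothetical sketch of the hex-dump decoding described above. The codes,
# field widths, and units are made up for illustration; the real trial data
# format is not reproduced here.

RECORD_LAYOUT = {
    0x01: ("direction", 2),  # next 2 bytes: heading in degrees
    0x02: ("depth", 2),      # next 2 bytes: depth in metres
    0x03: ("speed", 1),      # next 1 byte: speed in knots
}

def decode_trial_dump(raw: bytes) -> list[tuple[str, int]]:
    """Walk the dump byte by byte and return (step name, value) pairs."""
    i = 0
    steps = []
    while i < len(raw):
        code = raw[i]
        i += 1
        if code not in RECORD_LAYOUT:
            continue  # unknown code: skip it, as a human with a highlighter would
        name, width = RECORD_LAYOUT[code]
        value = int.from_bytes(raw[i:i + width], byteorder="big")
        i += width
        steps.append((name, value))
    return steps

if __name__ == "__main__":
    dump = bytes.fromhex("0100b4020064030a")  # direction 180, depth 100, speed 10
    for name, value in decode_trial_dump(dump):
        print(f"{name}: {value}")
```

The highlighter work described above is essentially this loop, executed by hand on paper.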
It could easily take a week or two to get a report ready for the final analysis step. And there was a whole team doing this, most of them ex-Ministry of Defence folks who just wanted to get to their pension.

In Comes Jim With an Idea!
So I had an idea! I was an engineer, after all. I could automate all the things, I'd be the hero, I'd save the company money, and everyone would love me. So I set about doing this whilst I was working on a report.

Automate All the Things
First, I had to automate getting the data — you can't do that from a printer. Luckily, the mainframe that I logged into to print could just dump the data onto the screen. How could I get it in a form I could work with? Excel! So I wrote some good old VBA in Excel to connect over SSH to the mainframe and download all the data. I had one control sheet where you entered the trial number, a button to click, and away it would go, downloading the data and filling out a second sheet. Next was replacing the highlighter pen. Easy to do in Excel: work through each cell to get the action for each step, highlight the cells as if I were using a highlighter pen and paper, and extract the data for each action. Once I had the actions, it was easy to convert the hex codes into the step information that was wanted. Same with locations — easy to extract and plot on a chart. Finally, it was time to create the Word doc — and VBA can do this nicely.

The Result
All in all, I spent about four weeks on this and was able to generate a report ready for analysis in about 10 minutes, as opposed to one to two weeks. I packaged it as an add-in so anyone could use it — just add the add-in, enter the trial number, and away it went and spat out the doc. I even gave it a cool name: Spearfish Trials Analysis Report Writers Automated Reporting Suite — or STAR WARS for short (this was a few months after Episode I came out, after all). I was impressed with myself, and I could save everyone time!

The Reception
Everyone hated it. Simple as that. I was told there was no way it could be better than a human, and one of the top report writers challenged me to produce a better report. In 10 minutes, I had one. In four days, she had her version, and she'd made some mistakes — which made her hate it (and me) even more. After I left, it was deleted, and everyone forgot about its existence. It took me a while to figure out why: everyone was afraid. This one spreadsheet could replace almost the entire team of seven people. You'd just need the spreadsheet and one person to do the analysis, as the bulk of the job was manually processing reports. I could have put six people out of work in just four weeks. Regardless of how you feel about waste in enterprises, these were six humans with families, who had houses, mortgages, rent, and bills, and who needed to eat and to live. The lives of six families could have been destroyed with one spreadsheet.

Conclusion
Yes, automation is coming. Yes, technology can solve so many things. Yes, in theory, if a job is replaced by automation, then folks can focus their knowledge and energy on solving bigger problems. But the reality is that business is run by people who want to make as much money as possible, so automation is a way they can cut jobs and boost shareholder value, whatever that is (I'm guessing making rich folks richer). This is not always the case, and it is a very cynical and dystopian view, but in most cases, it is the truth.
Even in the high-paying tech world of multi-trillion-dollar companies, folks are dropped in an instant to boost share prices. So it's not surprising that folks are worried about computers taking their jobs. If we are in the throes of an AI revolution, then maybe we need to really push the conversation around universal basic income, and tax large corporations and billionaires to pay for it.
Site reliability engineering is a practice that has been growing in popularity among many businesses. Also known as SRE, it puts a premium on monitoring, tracking bugs, and creating systems and automation that solve problems for the long term. Many companies today are fond of deploying band-aid solutions that leave them with flawed systems that easily fall apart when bugs arise. SRE practice fixes that by putting a premium on proactively monitoring for problems and creating long-term solutions. As more companies adopt SRE, they change the way IT departments operate.

What Is IT Ops?
Information technology operations (IT Ops) is the discipline of overseeing the management of information technology infrastructure and the lifecycle of applications. IT Ops focuses on ensuring that the company's IT infrastructure is healthy, secure, and scalable. IT Ops is a broad term that encompasses a variety of departments, each contributing to the overall success of IT operations.

SRE vs. DevOps
With regard to SRE vs. DevOps, it helps to think of one as the goal and the other as the means of getting to that goal. DevOps intends to bridge development and operations into one; site reliability engineering makes that intention a possibility. So, from a bird's-eye point of view, DevOps is the goal and SRE is the method. DevOps talks about what needs to get done to align the objectives and activities of development and operations. SRE answers the question, "How do we make that happen?" Here are some ways that SRE positively impacts a business's operations.

1. Software-First Approach
Any company maintaining an SRE team will often hear them talking about automating processes with software. At the heart of site reliability engineering is the goal of automating processes so that issues are solved once and for all. A common misconception about SRE is that its goal is to spot the leaks and patch them up. But SRE is more about creating a system that automatically changes the pipe when leaks happen. Much of SRE is about developing software and systems that automate incident management. This automation-first mindset puts a premium on system builders in IT and teaches the whole company to adopt the same school of thought in everything it does. Why stick with manual tasks when you can automate them?

2. Focus on SLOs and Error Budgets
One of the priorities of an SRE team is to determine a service-level objective (SLO), a bare-minimum goal for availability. The SLO is the minimum level of availability a team must meet for a system or piece of software serving users. The team then sets an error budget, which indicates the margin of error allowed for the system. What this means is that SRE takes the commitment to providing an exceptional customer experience seriously. Even the way SRE teams approach bug tracking should take a user experience approach. This, among many other SRE practices, helps bridge the gap between how people use systems and how developers can design them to meet minimum standards of excellence.
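To put rough numbers on the SLO and error-budget idea above, here is a minimal sketch. The 99.9% target, the 30-day window, and the incident durations are assumptions for illustration, not figures from this article.

```python
# Minimal error-budget sketch. The SLO target, window, and incident
# durations below are illustrative assumptions only.

SLO_TARGET = 0.999             # 99.9% availability objective
WINDOW_MINUTES = 30 * 24 * 60  # 30-day rolling window

def error_budget_minutes(slo: float, window_minutes: int) -> float:
    """Total allowed downtime in the window implied by the SLO."""
    return (1.0 - slo) * window_minutes

def budget_report(incident_minutes: list[float]) -> None:
    budget = error_budget_minutes(SLO_TARGET, WINDOW_MINUTES)
    spent = sum(incident_minutes)
    remaining = budget - spent
    verdict = "budget left: keep shipping" if remaining > 0 else "budget blown: stabilize first"
    print(f"Allowed downtime: {budget:.1f} min, used: {spent:.1f} min, "
          f"remaining: {remaining:.1f} min ({verdict})")

if __name__ == "__main__":
    # Three hypothetical outages this month: 12, 7, and 20 minutes.
    budget_report([12, 7, 20])
```

The value of the calculation is the trade-off it encodes: while budget remains, the team can spend reliability on change; once it is gone, reliability work takes priority.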
3. Proactive Stability Assurance
What makes a great site reliability engineer is the ability to be proactive. Given that 93% of SREs correlate their work with "monitoring and alerting," critical problem-solving skills are a must. With that skillset in IT operations, it affects the whole department and even the whole company, pushing for a solution-oriented culture as a whole. A proactive culture brings greater stability assurance to systems and operations.

4. Dev and Ops Collaboration
For site reliability management to be effective, collaboration and alignment must happen. This is probably why 81% of SREs do most of their work in the office. While work-from-home setups among SREs have become more common over the years, the point is that SRE practices revolve around collaboration. The SRE culture advocates aligning on business objectives and monitoring them using service-level agreements (SLAs) and metrics that help us understand performance and error management. The main job of an SRE team is to spot errors in systems, find the root problems, and resolve them. By seeking to maintain a healthy system in collaboration with all players and departments, an SRE or SRE team encourages hand-in-hand work and, in a sense, "forces" us to band together to solve system issues.

5. Commoditizing Efficiency and SRE Solutions
SRE roles and responsibilities can be quite extensive and, thus, expensive, especially for smaller organizations. The cost of building your own incident management system, for instance, can be astronomical, which might be justified if you're a company like Facebook or Google. But what if you're a tech startup or a small to medium tech company? In response to the need to commoditize more efficient practices, the incident management system market has grown over the years.

Adopting the SRE Model
Technology is forever changing the way companies operate, and many of the activities that businesses take on are becoming more digitized. SRE allows people from various practices, both tech and non-tech, to take a software development approach to everything. As teams bring an SRE maturity model, SRE principles, practices, and skills into the mix, it revolutionizes the way we approach problems and come up with solutions. Here's how a team might take on an SRE model or approach in their company.

Define a framework
The first step to deploying an SRE model is defining the framework. Decide on the parameters, tools, and culture that your department or team will take on, and resolve to use the systems you put in place.

Hire skilled engineers
There's a debate as to whether SRE teams need developers who are great at operations or operations people who are great at development. Chicken-and-egg banter aside, what matters is that SRE teams have people who understand both the engineering side and the system application and operations side of the game.

Implement tools and technologies
SRE teams use every available tool, including open-source projects for SRE, to bring greater stability to a company's systems. A company will also need an incident management system in place. With good SRE and incident management tools, smaller companies can work incidents even with on-call or part-time SREs who come in only when necessary, improving engineering delivery considerably, speeding up recovery, and reducing SLO breaches.

Update processes
With the way that problems adapt, solution-makers need to adapt too. SRE is built on the principle of adaptability — being able to shift, pivot, and change when times change. As the old cliche goes, the only constant in this world is change.
And in the uncertain, ambiguous, and volatile world that we live in, where things that can go wrong most likely will go wrong (as Murphy's law states), adaptability in a team or organization can be extremely helpful. One thing that helps SRE teams pivot more easily is having the right IT management software tools to monitor, analyze, and implement fixes for incidents, bugs, and problems at the operational level. Equipping an SRE or SRE team well makes it much easier to create solutions to prevalent problems.

Change the culture to support the model
At the heart of SRE is not a system or a piece of software, but a culture. That culture highlights three non-negotiables: proactivity, solution focus, and user experience. A department dedicated to DevOps and SRE, and the whole company, for that matter, should support that model.

Conclusion
To remain competitive in the evolving landscape, organizations are encouraged to explore and implement the SRE model. Embracing the SRE model is not just a technological shift but a cultural one, emphasizing proactivity, solution focus, and user experience.
TL;DR: How to Spot Successful Scrum Masters
In this article, I unravel the secrets of what makes a Scrum Master not just good but amazingly outstanding. From regularly achieving Sprint Goals and delivering value to customers to building stakeholder rapport with ease, discover the traits that set successful Scrum Masters apart. We also shed light on the pitfalls to avoid if you want to keep the respect of your teammates and, probably, your job.

The Successful Scrum Masters
Evaluating a Scrum Master's effectiveness involves several major indicators:

Sprint Goal Achievement Consistency: The Scrum Master's impact can be assessed by observing how consistently the team accomplishes the Sprint Goals set for each Sprint, contributing to the overarching Product Goal. Regularly achieving these goals suggests the Scrum Master is effective in creating an environment where Product Backlog management and Sprint Planning work well and include all team members. This consistency in reaching Sprint Goals, in line with the Scrum Guide's emphasis, is a crucial indicator of the team's progress and the Scrum Master's adeptness in guiding them along the path to fulfilling the current Product Goal.

Quality of Increments: Evaluating the quality and consistency of the Increments produced by the team is another vital indicator of a Scrum Master's effectiveness. In Scrum, creating high-quality Increments that meet the Definition of Done and customer expectations reflects the team's collective effort and the Scrum Master's success in promoting an environment aligned with Agile principles. This involves ensuring fewer defects, positive user feedback, and alignment with customer requirements, indicating that the team is collaboratively delivering working software effectively and consistently.

Stakeholder Feedback: Collecting feedback from stakeholders outside the Scrum team, such as users, customers, and internal business partners, offers valuable insights into the Scrum Master's performance. Positive stakeholder feedback regarding effective communication, educational efforts, progress transparency, and satisfaction with outcomes reflects the Scrum Master's successful facilitation of Agile principles. This emphasis on stakeholder collaboration demonstrates the Scrum Master's role in ensuring that the team's work aligns with broader organizational goals and external user needs.

Measuring Value Creation: In recognizing the contributions of Scrum Masters, a comprehensive approach is to evaluate their impact on key outcome-based metrics that focus on creating value for customers and contributing to the organization's bottom line. This includes assessing customer satisfaction rates and determining how effectively the Scrum Master guides the team to meet customer needs and expectations. Also important is measuring business impact and ROI, reflecting the Scrum Master's role in supporting the Product Owner's efforts to align team efforts with strategic business goals. User engagement metrics provide insights into the usability and appeal of the product, while quality metrics like defect rates indicate the team's commitment to excellence under the Scrum Master's leadership. Finally, the Scrum Master's encouragement of innovation and continuous improvement within the team ensures ongoing value creation, making these metrics integral to understanding their genuine contribution to the organization.
Impediment Resolution and Innovation: Evaluating a Scrum Master's effectiveness in resolving impediments and fostering innovation and process improvements is also crucial. Their ability to promptly and effectively address obstacles, be they technical, process-related, or interpersonal, is essential for sustaining team momentum. Simultaneously, their success in encouraging the team to adopt innovative practices and continuously refine processes reflects adherence to the Agile Manifesto's principle of continuous improvement. This dual focus ensures the Scrum Master contributes significantly to the team's overall success and adaptability.

Improvements: Evaluating the tangible improvements made from action items identified in Retrospectives is another indicator of a Scrum Master's effectiveness. This part involves looking at how well the team implements changes and improvements in their processes and the outcomes of these changes. Continuous improvement is a core Agile principle, and effective Retrospectives are crucial for this process.

Team Health: Regular team surveys can provide insights into the Scrum Master's influence on team morale and collaboration. Questions can focus — among other areas — on the team's perception of the Scrum Master's leadership, communication skills, and effectiveness in removing impediments. High levels of team satisfaction often correlate with successfully living Scrum principles, aligning with the Agile Manifesto's emphasis on motivated individuals.

Attributes of Successful Scrum Masters
Now, let's shift our focus and delve into the attributes that distinguish a successful Scrum Master:

Facilitation Skills: Successful Scrum Masters are adept at facilitating team meetings and Scrum events, ensuring they are focused and productive. They create an environment where team members feel comfortable sharing ideas and concerns, fostering open communication. This skill is essential in realizing the Agile Manifesto's value of "Individuals and interactions over processes and tools." They guide the team in defining objectives, making decisions, and solving problems efficiently, ensuring that events run effectively. Successful Scrum Masters also educate their teammates in applying basic facilitation techniques so they can cover for them.

Conflict Resolution: Conflict is inevitable in any team setting. Successful Scrum Masters are skilled in conflict resolution, fostering a healthy, collaborative environment. They address issues promptly and constructively, ensuring that conflicts do not hinder team progress and adhering to Scrum values. This ability is essential for maintaining Scrum's collaborative spirit.

Servant Leadership: The Scrum Guide 2020 emphasizes the role of the Scrum Master as a "true leader." Successful Scrum Masters prioritize their team's and organization's needs over their personal agenda. They empower team members, encourage autonomy, and help remove obstacles. This approach also aligns with the Agile Manifesto's principles of building projects around motivated individuals, providing them with the environment and support they need, and trusting them to get the job done.

Empathy and Emotional Intelligence: High emotional intelligence enables Scrum Masters to understand and manage their emotions and support their team members accordingly. They are empathetic and capable of sensing team dynamics and addressing underlying issues.
This emotional understanding fosters a trusting and psychologically safe environment, which is crucial for effective collaboration and communication.

Continuous Improvement Focus: Effective Scrum Masters constantly seek opportunities for process improvement, embodying the Agile principle of reflecting on becoming more effective. They encourage the team to embrace change and continuously improve their practices, workflows, and behaviors, leading to increased productivity and product quality.

Transparency Advocate: Promoting transparency in all aspects of work is crucial. Successful Scrum Masters ensure the team's work, challenges, and successes are visible to all stakeholders. This focus aligns with the Scrum Guide's emphasis on openness and honesty and supports the Agile Manifesto's value of customer collaboration over contract negotiation.

Technical Understanding: While they don't need to be technical experts, successful Scrum Masters have a good grasp of the technical aspects of projects. This understanding helps them facilitate technical discussions, comprehend challenges faced by the team, and assist in removing technical impediments. It is beneficial that Scrum Masters have sufficient knowledge of the domain to be effective in their roles.

Attributes of Unsuccessful Scrum Masters
Finally, we need to address traits of failing Scrum Masters, for example:

Poor Facilitation: Ineffective facilitation of meetings and Scrum events can lead to disorganization and wasted time. Unsuccessful Scrum Masters might struggle to keep meetings on track, fail to engage all team members, or be unable to achieve meeting objectives, which goes against the idea of effective interactions.

Directive Approach: A command-and-control approach contradicts the Scrum Master's role as a servant leader. Unsuccessful Scrum Masters who dictate rather than guide their teams impede the team's ability to self-organize and innovate.

Lack of Engagement: Unsuccessful Scrum Masters often lack engagement with their teams. They may fail to understand team dynamics or be unaware of the challenges faced by team members. This detachment leads to ineffectiveness and a lack of alignment with Scrum values, undermining team morale and productivity.

Inadequate Conflict Management: Failing to manage conflicts effectively can create a toxic team environment. Unsuccessful Scrum Masters who avoid addressing conflicts, sweep them under the rug, or handle them poorly disrupt team collaboration and communication, essential elements of successful Scrum teams.

Resistance to Change: An inability to adapt to changes or incorporate feedback is detrimental in Agile environments. Unsuccessful Scrum Masters who resist change and act dogmatically hinder the team's progress and go against the Agile principle of welcoming changing requirements, even late in development.

Ignoring Technical Aspects: A lack of understanding of the technical aspects of projects and products can prevent Scrum Masters from effectively removing impediments or facilitating technical discussions. This gap can lead to delays and misunderstandings, diminishing the team's success potential.

If you would like to learn more about Scrum Master anti-patterns, delve into the following article: Scrum Master Anti-Patterns — 20 Signs Your Scrum Master Needs Help.
Food for Thought
Consider the following questions to help your teams and your organization have successful Scrum Masters and embrace agility fully:

Considering the increasing pressure on Scrum Masters to demonstrate their direct contribution to organizational success, what innovative strategies can they employ to showcase their value beyond traditional agile metrics?

How can Scrum Masters evolve their role to become indispensable in organizations skeptical about the tangible benefits of agile practices, especially in sectors where Agile is not yet the norm?

Considering the rapid advancements in AI and automation, how can Scrum Masters leverage these technologies to improve their effectiveness while maintaining agile principles?

Successful Scrum Masters — Conclusion
The role of a Scrum Master transcends mere process facilitation; it's about nurturing an environment where Agile principles thrive. Successful Scrum Masters blend skills like effective facilitation, servant leadership, and conflict resolution with deep empathy and a focus on continuous improvement. Conversely, recognizing the traits of unsuccessful Scrum Masters is equally vital for course correction and maintaining integrity. This article not only sheds light on what makes a Scrum Master successful but also invites readers to ponder the evolving dynamics of leadership in the face of new challenges, technological advancements, and economic hardship. What attributes of successful Scrum Masters have you observed? Please share your experience in the comments.
As regular readers know, I recently changed companies. After all the interviews, the next part of that process was the offer negotiation phase. To be incredibly transparent, I hate that part of the interview process like almost nothing else in my life. It's gut-churning and mind-numbing and terror-inducing all at the same time. I always feel like I'm doing it wrong, and at the end of the process, I'm certain I've made horrible mistakes that will haunt me for the rest of my career.

In the 35+ years I've worked in tech, I've changed jobs several times, each time interviewing with several companies before making a career move. After some back-of-the-napkin math, I realized I've received dozens of offers over the course of my career. Anyone who's worked in tech understands this is pretty typical, and not just in this current period of marketplace unrest, with the "great resignation/reshuffling/whatever" and weekly announcements of layoffs and companies getting acquired, spun off, or shut down like it was a game of whack-a-mole where the moles were actually Russian nesting dolls. Moving around like that means I've negotiated my salary, benefits, days off, and so on a bunch of times. And you'd think that after so much practice over so many years, it would feel like old hat. No big deal. A walk in the park. Nothing could be further from the truth.

There are, of course, plenty of resources for learning how to be a better negotiator and to take control (not to mention ownership) of one's career. I'm a big fan of Chris Voss' book "Never Split the Difference." I'm an equally big fan of Josh Doody, who I've enjoyed hearing from on three separate appearances on Corey Quinn's Screaming in the Cloud podcast. But even with the information I gleaned from those (and other) resources, I'm still uncomfortable with the entire process. Negotiating the purchase of a car or house is so much easier emotionally because objects aren't tied to your self-worth the way a job offer is tied to your value as a professional. This makes it more important than ever to take a principle we hold true in our datacenters and apply it to our careers: namely, if something isn't a core competency for the business, don't spend huge amounts of time, effort, or budget to build it internally. And it's an indisputable fact that, while salary negotiation is important to my career, it's utterly irrelevant to my day-to-day job.

This is why, when my friend and former colleague Kymberlee Price (aka "KymPossible" on Bluesky, hence the title of this blog) offered to help with the negotiation phase during my last job search, I jumped at the chance. If you are currently in the negotiation phase for your own job, stop reading this and contact her. If you are hunting for your next job and know that negotiations are on the horizon, write down her info and contact her after you finish reading this. And if you aren't looking for a new job at this moment, I'd remind you that, in tech at least, your transition to turning on that green "Open to Work" banner on LinkedIn can happen at any time. So grab her info and put it somewhere for safekeeping. If you're thinking, "I'm just not comfortable with being super aggressive toward my future employer," you need to move over and make some room, because I'm not either. In fact, almost every one of Kym's clients says the same thing. And the truth is, nobody is saying you need to be. In fact, aggressive is the very LAST thing you need for effective negotiations. Firm? Absolutely. Confident? 100%.
But aggressive? Never. Like any great tech practitioner, Kym has a side hustle: helping people negotiate their salary. By day, she's a kick-ass product security leader. Kym relies on data and analysis to identify how the offer on the table stands up to industry standards, your current compensation, the likely movement of the market in the next few years, and more. Kym's process takes the emotion completely out of the conversation. Her view of "total compensation" goes way beyond what I'm used to considering. Not only did she factor in the usual elements like stock options, bonuses, etc., but she also included out-of-pocket healthcare costs, 401k matching, PTO, and a projection of how the complete offer will "age" over five years, based on average raises, bonuses, and so on.

The approach is hard to argue against. It's not a fierce contest of wills where one side insists they are "worth more" and the other does their best to convince them otherwise. It's a frank and fact-based discussion of how the company's offer compares competitively to other options available, and it proposes adjustments to close any gaps so you can happily accept. A company can still choose to say, "That's not something we can do," but this is a far cry from the standard pushback of "We don't believe we (or anyone else) should do this." Which, I gotta tell you, is a breath of fresh air.

Now, all of this implies that you're always going to get a lowball offer – which isn't the case. Kym tells me that when clients get a phenomenal offer straight out of the gate, she provides them with a few key questions to ask the recruiter to confirm it really is a great offer and ensure there aren't any surprises later. Sometimes the base compensation is great, but you're losing $50-100k a year in benefits, retirement matching, etc. Maybe you're okay with that tradeoff, but you need to know about it up front.

As a tech professional, I need to build and maintain my skills as a developer, sysadmin, and engineer. I need to stay up to date on the latest tools, techniques, and technologies. But I don't need to spend precious time learning the fine art of salary negotiation, even though I will probably have to do it several more times before I retire. Not when there are amazing folks like Kymberlee Price around to do it, and do it far better than I could ever imagine.
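To show what a five-year "aging" projection of total compensation might look like, here is a minimal sketch. Every number in it (base salary, raise and bonus rates, 401k match, healthcare delta, PTO value) is an invented assumption for illustration; this is not Kym's model or anyone's real offer.

```python
# Rough total-compensation projection over five years.
# All figures (salary, raise %, bonus %, 401k match, healthcare delta,
# PTO value) are invented for illustration only.

def project_offer(base: float, bonus_pct: float, annual_raise: float,
                  match_401k: float, healthcare_delta: float,
                  pto_days: int, years: int = 5) -> float:
    """Return estimated total compensation over `years`, in dollars."""
    total = 0.0
    salary = base
    for _ in range(years):
        daily_rate = salary / 260            # roughly 260 working days per year
        total += (salary
                  + salary * bonus_pct       # target bonus
                  + salary * match_401k      # employer retirement match
                  - healthcare_delta         # extra out-of-pocket healthcare
                  + pto_days * daily_rate)   # value of paid time off
        salary *= 1 + annual_raise           # assume a steady annual raise
    return total

if __name__ == "__main__":
    offer_a = project_offer(150_000, 0.10, 0.03, 0.04, 2_000, 20)
    offer_b = project_offer(160_000, 0.05, 0.03, 0.00, 6_000, 15)
    print(f"Offer A over 5 years: ${offer_a:,.0f}")
    print(f"Offer B over 5 years: ${offer_b:,.0f}")
```

Even a toy model like this makes the article's point concrete: two offers with similar base salaries can diverge significantly once benefits, matching, and raises are projected out.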
Hello DZone Community! Recently, you might have seen our announcement about the updates to the Core program. We've received a lot of great feedback about the new program, and we're very excited to continue growing and expanding it to more members! But that was just the beginning. We've been working hard on improvements across the entire DZone community, and today, we are thrilled to announce some big improvements to your DZone profiles! There's a lot to unpack with these new profiles, but the overall gist of it is that it gives them a fresh new look (ooh shiny!!) and adds some new features for you. Among other things, we've added:
A section for your education, training, and credentials earned
Sections for any Trend Reports and Refcards you've contributed to
A section for any DZone events you've been a part of
While all members will receive the above updates to their profiles, we've built some additional features for our Core members. They truly go above and beyond for the DZone community by being highly engaged and regularly contributing expert content to the site. These additional changes will help continue to elevate them as thought leaders both within the DZone community and across the industry at large. Core member profiles will now have:
Optimized profile
A place to add open-source projects they're working on or support
A section recognizing when they're highlighted as a Featured Expert on DZone
A new, exclusive banner showcasing their Core membership
We could not be more excited to roll out these new profiles to you all. Every single one of our contributors is essential to what we do at DZone, and these new profiles will help highlight to our community and the rest of our audience just how knowledgeable and important you are to DZone. We literally would not be here without you! If you haven't already and would like to begin your contributor journey, you can start by creating your own article! Our team of editors is here to help along the way. You can reach out to editors@dzone.com with any of your content questions. Please spend some time poking around your new profile, and let us know what you think. We're always open to feedback and new ideas! Drop us a line at community@dzone.com with your thoughts. We are so incredibly grateful for all you do for DZone!
Sincerely,
The DZone Team
This is an article from DZone's 2023 Observability and Application Performance Trend Report. For more: Read the Report

From cultural and structural challenges within an organization to balancing daily work and dividing it between teams and individuals, scaling teams of site reliability engineers (SREs) comes with many challenges. However, fostering a resilient site reliability engineering (SRE) culture can facilitate the gradual and sustainable growth of an SRE team. In this article, we explore the challenges of scaling and review a successful scaling framework. This framework is suitable for guiding emerging teams and startups as they cultivate an evolving SRE culture, as well as for established companies with firmly entrenched SRE cultures.

The Challenges of Scaling SRE Teams
As teams scale, complexity may increase as it can be more difficult to communicate, coordinate, and maintain a team's coherence. Below is a list of challenges to consider as your team and/or organization grows:
Rapid growth – Rapid growth leads to more complex systems, which can outpace the capacity of your SRE team, leading to bottlenecks and reduced reliability.
Knowledge-sharing – Maintaining a shared understanding of systems and processes may become difficult, making it challenging to onboard new team members effectively.
Tooling and automation – Scaling without appropriate tooling and automation can lead to increased manual toil, reducing the efficiency of the SRE team.
Incident response – Coordinating incident responses can become more challenging, and miscommunications or delays can occur.
Maintaining a culture of innovation and learning – This can be challenging as SREs may become more focused on solving critical daily problems and less focused on new initiatives.
Balancing operational and engineering work – Since SREs are responsible for both operational tasks and engineering work, it is important to ensure that these teams have enough time to focus on both areas.

A Framework for Scaling SRE Teams
Scaling may come naturally if you do the right things in the right order. First, you must identify what your current state is in terms of infrastructure. How well do you understand the systems? Determine existing SRE processes that need improvement. For the SRE processes that are necessary but are not employed yet, find the tools and the metrics necessary to start. Collaborate with the appropriate stakeholders, use feedback, iterate, and improve.

Step 1: Assess Your Current State
Understand your system and create a detailed map of your infrastructure, services, and dependencies. Identify all the components in your infrastructure, including servers, databases, load balancers, networking equipment, and any cloud services you utilize. It is important to understand how these components are interconnected and dependent on each other — this includes understanding which services rely on others and the flow of data between them. It's also vital to identify and evaluate existing SRE practices and assess their effectiveness:
Analyze historical incident data to identify recurring issues and their resolutions.
Gather feedback from your SRE team and other relevant stakeholders. Ask them about pain points, challenges, and areas where improvements are needed.
Assess the performance metrics related to system reliability and availability. Identify any trends or patterns that indicate areas requiring attention.
Evaluate how incidents are currently being handled. Are they being resolved efficiently? Are post-incident reviews being conducted effectively to prevent recurrences?

Step 2: Define SLOs and Error Budgets
Collaborate with stakeholders to establish clear and meaningful service-level objectives (SLOs) by determining the acceptable error rate and creating error budgets based on the SLOs. SLOs and error budgets can guide resource allocation optimization. Computing resources can be allocated to areas that directly impact the achievement of the SLOs. SLOs set clear, achievable goals for the team and provide a measurable way to assess the reliability of a service. By defining specific targets for uptime, latency, or error rates, SRE teams can objectively evaluate whether the system is meeting the desired standards of performance. Using specific targets, a team can prioritize their efforts and focus on areas that need improvement, thus fostering a culture of accountability and continuous improvement. Error budgets provide a mechanism for managing risk and making trade-offs between reliability and innovation. They allow SRE teams to determine an acceptable threshold for service disruptions or errors, enabling them to balance the need for deploying new features or making changes to maintain a reliable service.

Step 3: Build and Train Your SRE Team
Identify talent according to the needs of each and every step of this framework. Look for the right skillset and cultural fit, and be sure to provide comprehensive onboarding and training programs for new SREs. Beware of the golden rule that culture eats strategy for breakfast: Having the right strategy and processes is important, but without the right culture, no strategy or process will succeed in the long run.

Step 4: Establish SRE Processes, Automate, Iterate, and Improve
Implement incident management procedures, including incident command and post-incident reviews. Define a process for safe and efficient changes to the system.

Figure 1: Basic SRE process

One of the cornerstones of SRE involves how to identify and handle incidents through monitoring, alerting, remediation, and incident management. Swift incident identification and management are vital in minimizing downtime, which can prevent minor issues from escalating into major problems. By analyzing incidents and their root causes, SREs can identify patterns and make necessary improvements to prevent similar issues from occurring in the future. This continuous improvement process is crucial for enhancing the overall reliability and performance whilst ensuring the efficiency of systems at scale. Improving and scaling your team can go hand in hand.

Monitoring
Monitoring is the first step in ensuring the reliability and performance of a system. It involves the continuous collection of data about the system's behavior, performance, and health. This can be broken down into:
Data collection – Monitoring systems collect various types of data, including metrics, logs, and traces, as shown in Figure 2.
Real-time observability – Monitoring provides real-time visibility into the system's status, enabling teams to identify potential issues as they occur.
Proactive vs. reactive – Effective monitoring allows for proactive problem detection and resolution, reducing the need for reactive firefighting.

Figure 2: Monitoring and observability

Alerting
This is the process of notifying relevant parties when predefined conditions or thresholds are met. It's a critical prerequisite for incident management.
This can be broken down into:
Thresholds and conditions – Alerts are triggered based on predefined thresholds or conditions. For example, an alert might be set to trigger when CPU usage exceeds 90% for five consecutive minutes.
Notification channels – Alerts can be sent via various notification channels, including email, SMS, or pager, or even integrated into incident management tools.
Severity levels – Alerts should be categorized by severity levels (e.g., critical, warning, informational) to indicate the urgency and impact of the issue.

Remediation
This involves taking actions to address issues detected through monitoring and alerting. The goal is to mitigate or resolve problems quickly to minimize the impact on users.
Automated actions – SRE teams often implement automated remediation actions for known issues. For example, an automated scaling system might add more resources to a server when CPU usage is high.
Playbooks – SREs follow predefined playbooks that outline steps to troubleshoot and resolve common issues. Playbooks ensure consistency and efficiency during remediation efforts.
Manual interventions – In some cases, manual intervention by SREs or other team members may be necessary for complex or unexpected issues.

Incident Management
Effective communication, knowledge-sharing, and training are crucial during an incident, and most incidents can be reproduced in staging environments for training purposes. Regular updates are provided to stakeholders, including users, management, and other relevant teams. Incident management includes a culture of learning and continuous improvement: The goal is not only to resolve the incident but also to prevent it from happening again.

Figure 3: Handling incidents

A robust incident management process ensures that service disruptions are addressed promptly, thus enhancing user trust and satisfaction. In addition, by effectively managing incidents, SREs help preserve the continuity of business operations and minimize potential revenue losses. Incident management plays a vital role in the scaling process since it establishes best practices and promotes collaboration, as shown in Figure 3. As the system scales, the frequency and complexity of incidents are likely to increase. A well-defined incident management process enables the SRE team to manage the growing workload efficiently.

Conclusion
SRE is an integral part of the SDLC. At the end of the day, your SRE processes should be integrated into the entire process of development, testing, and deployment, as shown in Figure 4.

Figure 4: Holistic view of development, testing, and the SRE process

Iterating on and improving the steps above will inevitably lead to more work for SRE teams; however, this work can pave the way for sustainable and successful scaling of SRE teams at the right pace. By following this framework and overcoming the challenges, you can effectively scale your SRE team while maintaining system reliability and fostering a culture of collaboration and innovation. Remember that SRE is an ongoing journey, and it is essential to stay committed to the principles and practices that drive reliability and performance.
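As a concrete illustration of the threshold rule mentioned above (alert when CPU usage exceeds 90% for five consecutive minutes), here is a minimal sketch. The metric feed and the notification call are stubs; a real team would express this rule in its monitoring and alerting stack rather than in hand-rolled code.

```python
# Minimal sketch of a threshold alert: fire when CPU usage exceeds 90%
# for five consecutive one-minute samples. The metric feed and the
# notification channel are stubs standing in for a real monitoring stack.

from collections import deque

THRESHOLD = 90.0   # percent CPU
CONSECUTIVE = 5    # one-minute samples that must all breach

class CpuAlert:
    def __init__(self):
        self.window = deque(maxlen=CONSECUTIVE)
        self.firing = False

    def observe(self, cpu_percent: float) -> None:
        self.window.append(cpu_percent)
        breached = (len(self.window) == CONSECUTIVE
                    and all(v > THRESHOLD for v in self.window))
        if breached and not self.firing:
            self.firing = True
            self.notify(f"CRITICAL: CPU > {THRESHOLD}% for {CONSECUTIVE} minutes")
        elif not breached and self.firing:
            self.firing = False
            self.notify("RESOLVED: CPU back under threshold")

    @staticmethod
    def notify(message: str) -> None:
        # Stand-in for email/SMS/pager/incident-tool integration.
        print(message)

if __name__ == "__main__":
    alert = CpuAlert()
    for sample in [85, 92, 95, 93, 96, 94, 91, 70]:
        alert.observe(sample)
```

The same shape applies to the severity levels and notification channels described above: the condition decides whether the alert fires, and the notify step decides where it goes.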
The purpose of project management is to complete a project of agreed quality within a certain budget and time frame. Though software development projects come with their own nuances, the principles of project management can be applied to manage software projects better. This is because, like any other project, a software project goes through the stages of initiation, planning, execution, and completion. Project managers who are new to software development project management tend to make some basic mistakes in the early stages of their careers. Project management hacks can help you improve your work productivity, team performance, and overall project success rate. In this article, I will talk about the importance of using project management hacks and share the top 11 project management hacks for software project managers, based on my experience of managing projects for over a decade.

Importance of Adopting Effective Project Management Hacks
Project management hacks are the tips and tricks that help you become more efficient and more productive and make your work life easier. The primary focus of using these hacks is to save your time and make work management easier for you. For example, using workflow automation to assign repetitive tasks or to send automated notifications once a particular stage of a task is reached is a project management hack. It helps you save time and effort. Likewise, there are various project management hacks that can help you with project planning, goal setting, resource management, team management, and so on.

11 Project Management Hacks to Drive Successful Software Development
The role of a software project manager is to provide technical guidance to the team and manage the people. Here are the 11 project management hacks you can use for effective software development project management:

Phase 1: Pre-Project Planning Hacks

1. Define the Full Scope of the Project
The first step of project management is to define the scope of the project. It covers the project goals, deliverables, tasks, costs, and deadlines. It is very important for a manager to define the full scope of the project before starting to work on a software project with a team. A well-defined project scope helps you:
Align the software with the business needs and identify the boundaries and constraints of the project.
Ensure all the key stakeholders agree on what the project is all about and what your team will work on.
Reduce the likelihood of scope creep for the development team.

2. Choose a High-Performance Team
Without the right team, it is hard to deliver a project successfully. If you already have resources, assess their professional and personal attributes to find the best-suited developers for the project. If you are hiring new resources, participate in the hiring process to bring the right talent on board. It will help you have the right people on your team.

3. Establish Clear Communication Channels
Many different team members are involved in a software project. Also, a project can run anywhere from three months to a year or more. Therefore, it is very important to establish dedicated channels of communication for the project life cycle. This helps you streamline communication, facilitate collaboration with the team, and record all the conversations for effective tracking and conflict resolution.

4. Manage Risks Effectively
New project managers tend to over-optimize resources and set unrealistic deadlines. Try to manage the project risks effectively.
Based on their severity, divide the risks into three categories: high, medium, and low. Plan an effective solution for every risk. This will help you manage risks better and avoid project failure. Beyond that, proactively identify and assess potential risks during the project. It will help you find issues that can derail the project and develop contingency plans to mitigate their potential impact.

Phase 2: Project Execution and Monitoring Hacks

5. Break Large Goals Into Small Goals
Many beginner software project managers make the mistake of not breaking large goals into measurable small ones. As a result, they are not able to achieve project goals in time due to a lack of tracking and goal setting. Identify key milestones and set measurable small goals around those milestones. Use your technical expertise to set the most useful KPIs to better track the project's progress and the performance of the team.

6. Allocate the Work as per Resource Capability
In the real world, resource A is not simply equal to resource B. For example, one developer may build a feature in X days while another developer may take longer. One developer may be good at testing while another is better at coding or deployment. You have to plan and allocate the work according to each resource's capabilities, experience, and strengths. Simply allocating tasks equally to each team member does not work. You have to be smart in task delegation and resource planning.

7. Communicate Clearly
To develop software successfully, every member of the team, from business analyst and product owner to designer, developer, tester, and DevOps engineer, needs to work in unison. Communicate each individual's roles and responsibilities. Tell them how their goals impact the overall progress of the project and the performance of other team members. Give them feedback when required. Keep the team in the loop for every change in requirements and for feedback from the client or stakeholders. It will help your team work productively and make your life easier.

8. Use the Right Project Management Tools
It is hard to manage software projects on pen and paper or using outdated spreadsheets. Use dedicated project management software to plan, execute, and manage the project. It will provide you with the tools to plan a project, create tasks, build a workflow, set goals, track progress, manage teamwork, facilitate team communication, share documents, and prepare reports. It will help you stay on top of project management and work efficiently with ease.

9. Embrace Agile Software Development
More than 70% of U.S. companies use the Agile methodology for software development. Use Agile frameworks such as Scrum or Kanban for software development project management. They will help you better control software development, manage project risks, and handle the team. Why? You are developing the software in short sprints, which are easy to plan and manage. You are continuously releasing software to the market for end-user feedback and receiving client approval at every stage. And you are constantly reviewing progress based on the feedback and adapting for improved performance.

Phase 3: Post-Execution Hacks

10. Involve Team Members in the Project Review
All successful project managers recommend reviewing the project's performance once it is finished. A detailed project review tells you what went well, what went wrong, and what areas you need to improve. Do not review the project only through your own eyes. Request feedback from your entire team to learn about the positives and negatives.
It will help you improve significantly for the next project.

11. Celebrate the Success of the Team and Recognize Achievements
It is very important to keep your team happy. A good project manager is also a good people manager. Listen to your team's needs proactively. Appreciate your team's efforts. Recognize good team performances and celebrate successes. This will give a huge boost to the morale of the employees.

Wrapping Up
A software project manager has to take care of many responsibilities. Anything that helps a project manager save time, work efficiently, and improve productivity is welcome. Project management hacks are the tips and tricks that make your work life easier, improve productivity, and help your team work efficiently.
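As a small illustration of the workflow-automation hack mentioned earlier (automatic notifications when a task reaches a particular stage), here is a tool-agnostic sketch. The stages, the task structure, and the notify stub are assumptions for illustration; in practice you would configure this kind of rule inside your project management software rather than write it yourself.

```python
# Illustrative sketch of stage-change notifications for tasks.
# The stages, the Task shape, and the notify() stub are invented for this
# example; real teams would configure equivalent rules in their PM tool.

from dataclasses import dataclass

STAGES = ["backlog", "in progress", "code review", "testing", "done"]
NOTIFY_ON = {"code review", "testing", "done"}  # stages worth announcing

@dataclass
class Task:
    title: str
    assignee: str
    stage: str = "backlog"

def notify(message: str) -> None:
    # Stand-in for email/chat integration.
    print(f"[notification] {message}")

def move_task(task: Task, new_stage: str) -> None:
    """Advance a task and announce the stages the team cares about."""
    if new_stage not in STAGES:
        raise ValueError(f"unknown stage: {new_stage}")
    task.stage = new_stage
    if new_stage in NOTIFY_ON:
        notify(f"'{task.title}' ({task.assignee}) moved to {new_stage}")

if __name__ == "__main__":
    task = Task("Implement login API", "Priya")
    move_task(task, "in progress")   # silent
    move_task(task, "code review")   # notifies reviewers
    move_task(task, "done")          # notifies stakeholders
```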
In the realm of professional pursuits, there exists a common misconception that managing software development is akin to riding a bike – a static skill that, once acquired, can be smoothly pedaled forward with minimal adjustments. However, in the fast-evolving landscape of technology, such a comparison is not only overly simplistic but can lead to profound misjudgments in leadership. Unlike the steadfast predictability of a bicycle ride, software development is a dynamic and ever-changing process that defies the static nature of traditional analogies. As we celebrate the first birthday of our software endeavors, it's imperative to address the fallacy that managing software projects is as straightforward as steering a two-wheeler down a familiar path. This misapprehension often stems from leaders who, having once mastered coding or project management, find themselves trapped in a mindset that underestimates the fluidity of the software development journey. In this article, we unravel the intricacies of why software development is fundamentally distinct from riding a bike, shedding light on the pitfalls that managers and CTOs may encounter when they cling to static paradigms in a world that thrives on adaptability and innovation. Join us as we explore the dynamic nature of software development and challenge the notion that it can be steered with the simplicity of a handlebar.
A Brief History of Software Development
Embarking on a journey through the epochs of software development is akin to navigating a landscape that constantly redefines itself. This section explores the dynamic evolution that has shaped the essence of how we conceive, craft, and deliver software solutions. As we traverse the annals of time, we'll unveil the intricate tapestry of changes woven together to form the contemporary fabric of software development. From the early days when data was a precious commodity to the present era of information abundance, from the rigid structures of waterfall methodologies to the agile dance of cloud-native development, each phase has left an indelible mark on the software development saga. Join us as we delve into the database dilemmas, architectural ascents, methodology metamorphoses, and user experience unleashing that define the narrative of our digital evolution. As we stand on the cusp of an era where artificial intelligence and generative AI promise to reshape the very foundations of our craft, it becomes imperative to reflect on the past, understand the present, and anticipate the future. The history of software development is not merely a chronological progression; it is a story of adaptation, innovation, and resilience. So, let us journey together through the corridors of time, where each twist and turn reveals a new facet of this ever-evolving realm. Welcome to this exploration of the dynamic symphony that is the history of software development.
In the not-so-distant past, the scarcity and costliness of data storage spurred a focus on normalizing databases to conserve every precious byte. However, as technology advanced, we witnessed a paradigm shift. The advent of NoSQL databases prompted a reevaluation of our practices, challenging the once-unquestioned norms of normalization. Today, we find ourselves navigating the complexities of denormalization and replication, leveraging the databases' capabilities to handle the deluge of data in the age of information abundance.
Figure: Historical cost of computer memory and storage
As access to computing power expanded with the rise of cloud platforms, the architectural landscape underwent a metamorphosis. Traditional monolithic structures gave way to the nimble and scalable world of microservices. With the cloud offering a buffet of resources, developers embraced a distributed approach, empowering them to create systems that are not only resilient but also capable of seamlessly scaling to meet the demands of modern applications.
Figure: New architecture using microservices
The software development life cycle has witnessed its own evolution, from the rigidity of waterfall methodology to the agility of modern development practices. The cloud-native methodology has emerged as a flexibility champion, enabling teams to iterate rapidly and respond to changing requirements.
Today, we stand in the era of Agile, where collaboration, adaptability, and continuous delivery reign supreme, ushering in an age where the pace of development matches the speed of technological innovation.
Figure: Agile methodology process
Gone are the days when users patiently queue in lines to make a purchase. The digital age has ushered in a new era of seamless experiences, where transactions occur at the tap of a screen. The evolution of software has not only transformed the way we develop applications but has fundamentally altered user expectations, demanding intuitive interfaces and instant gratification. Artificial intelligence (AI) stands as the next frontier as we peer into the future. The integration of AI and generative AI has the potential to revolutionize how we conceive, build, and optimize software. Algorithms that learn and adapt, coupled with the ability to generate code, hint at a future where development becomes an even more symbiotic dance between human creativity and machine intelligence. In this ever-shifting landscape, software development remains a dynamic canvas where each stroke of innovation leaves an indelible mark. As we navigate the currents of change, it's crucial to recognize that the journey is far from over – the horizon holds new technologies, methodologies, and challenges, inviting us to continuously adapt, learn, and redefine the future of software development.
Why Past Successes May Lead to Future Failures
In the dynamic realm of software development, the adage "what worked in the past will work in the future" is a dangerous oversimplification that can steer leaders and C-level executives into turbulent waters. This section aims to unravel why a deep understanding of the industry's evolution is not merely beneficial but imperative for those guiding the ship. While the fundamental principles of computer science serve as a bedrock, the landscape in which they are applied undergoes perpetual transformation. Managers, CTOs, and executives who once thrived as hands-on engineers may be treading on thin ice if they believe their past achievements grant them a timeless understanding of the field. The danger lies in assuming that what was effective in the past remains applicable in an industry where change is the only constant. As software development evolves, so do the methodologies, tools, and paradigms that govern it. Leaders who cease to code and detach from the front lines risk becoming obsolete in their understanding of current practices. The disconnect between the executive suite and the development trenches can lead to misguided decisions, as what may have been best practice a decade ago might now be an antiquated approach. To remain relevant and effective, leaders must embrace the ethos of lifelong learning. This includes staying abreast of emerging technologies, methodologies, and trends. Arrogance and an unwillingness to adapt can hinder progress, whereas humility and a willingness to learn from younger, less experienced team members can foster a collaborative and innovative environment. In the evolving landscape, leadership roles have transformed as well. The emergence of positions like Staff Engineer exemplifies a harmonious convergence of coding proficiency and strategic thinking. This hybrid role acknowledges the value of technical prowess while emphasizing the strategic vision necessary for leadership positions. It's a testament that one need not abandon the code editor to ascend the career ladder.
Recognizing that the history of software development is a dynamic narrative, not a static manual, is crucial for effective leadership. Managers and executives must acknowledge that the very fabric of the industry has changed, and what led to success in the past may not be a blueprint for the future. By staying curious, embracing continuous learning, and fostering a culture of collaboration, leaders can navigate the currents of software development and guide their teams toward success in an ever-evolving landscape.
Summary
As our journey through the dynamic history of software development comes to a close, it's crucial to distill the essence of our exploration into actionable insights for leaders and visionaries.
1. Embrace the Current: Leaders must internalize the fluid nature of software development. Acknowledge that what worked yesterday might not work tomorrow, and be prepared to adapt swiftly to the evolving currents of technology and methodologies.
2. Continuous Learning Is Key: The heartbeat of effective leadership in software development is a commitment to continual learning. Staying curious, remaining open to new ideas, and fostering a culture of shared knowledge ensures that leaders don't just lead; they inspire growth.
3. Humility Fuels Innovation: A humble leader is an influential leader. Recognizing the value of diverse perspectives, including those of younger team members, fosters an environment where innovation can flourish. Arrogance, on the other hand, creates blind spots that hinder progress.
4. The Hybrid Leader: The emergence of roles like the Staff Engineer signals a departure from traditional hierarchies. Leaders need not forsake coding to ascend the ladder; instead, they can integrate technical expertise with strategic vision, creating a harmonious synergy that propels teams forward.
5. Navigate With Purpose: Purposeful navigation is paramount in the dynamic seas of software development. Leaders must define clear goals, inspire their teams, and foster an environment where adaptability is not a reaction but a proactive stance.
As we chart the course ahead, remember that leadership in software development is not about steering a static vessel but mastering the art of sailing through ever-changing waters. Embrace the dynamism, learn continually, lead with humility, and set sail toward a future where innovation and adaptability are the guiding stars. The dynamic journey continues, and effective leadership will always be the compass for success in software development. Safe travels!
Do you excel in the art of setting unattainable, imposed, or plain non-existent Sprint Goals? In other words, are you good at missing Sprint Goals with regularity? If not, don't worry; help is on the way! In this article, we'll explore how to consistently miss the mark. For example, enjoy the thrill of cherry-picking unrelated backlog items and defining success by sheer output, not outcome. Countless Scrum Teams have thoroughly tested all suggestions. They are ideally suited for teams who love the challenge of aimlessly wandering through Sprints!
The Essence and Inherent Importance of the Sprint Goal
Before we indulge ourselves in missing Sprint Goals and, thus, failing core responsibilities as a Scrum Team, let's revisit the original ideas behind Sprint Goals: The Sprint Goal is a Scrum team's single objective for the Sprint, delivering the most valuable result from the customers' and the organization's perspective. It informs the composition of the Sprint Backlog and becomes a part of it, thus acting as a beacon that guides the Developers during the Sprint. Moreover, it is instrumental to creating the Sprint plan, having a successful Daily Scrum, and collaborating and supporting each other as a Scrum team. Also, the Sprint Goal helps the Scrum team to identify whether their work was successful: did we accomplish the goal at the end of the Sprint? In that respect, it separates a few weeks of working on "stuff" from experiencing the satisfaction and joy of being a successful Scrum team, delivering value to customers and the organization. The Sprint Goal thus supports a Scrum team — and its organization — to move from an industrial paradigm-driven output orientation, the proverbial feature factory, to an outcome-based approach to solving your customers' most valuable problem every Sprint. This change of perspective has a far-reaching consequence: every Sprint, the Scrum team strives to accomplish the Sprint Goal, which is different from maximizing output in the form of work hours or the number of work items. The process of forming a Sprint Goal begins with Sprint Planning, when the Developers, the Product Owner, and the Scrum Master come together to decide what to build next, ensuring the delivery of maximum value to customers in the forthcoming Sprint.
How to Create Sprint Goals
Initially, the Product Owner highlights the overarching Product Goal and outlines the business aim for the new Sprint. Using this as a foundation, the Scrum team collaboratively establishes the Sprint Goal, considering various factors such as:
Team availability during the Sprint.
Any changes in team composition, including new members joining or existing members departing.
The desired quality level as specified in the Definition of Done.
The team's proficiency with the necessary technology.
The availability of required tools.
Dependencies on other teams or suppliers.
Specific governance requirements that need to be met.
The necessity to manage daily operations, like maintaining the product's functionality, and how this impacts team capacity.
Following this, the Developers pledge their commitment to the Sprint Goal. It's important to understand that this commitment isn't to a fixed amount of work, such as the tasks listed in the Sprint Backlog after Sprint Planning. Scrum focuses on outcomes rather than outputs. In response, the Developers then project the work needed to reach the Sprint Goal. They do this by selecting items from the Product Backlog to include in the Sprint Backlog.
If additional, previously unidentified tasks are necessary to achieve the Sprint Goal, they add these to the Sprint Backlog. Moreover, the Developers form an initial plan for accomplishing their projection. Doing so for the first two or three days is advisable, as the team will begin gathering insights once the work commences. Detailed planning for the entire Sprint at this stage would be counterproductive. 10 Sure-Fire Ways to Miss Your Sprint Goals Here are my top ten approaches to missing Sprint Goals to ensure you will fail your stakeholders every single Sprint: No Visualization of Progress: The Developers cannot promptly assess whether they are on track to achieve the Sprint Goal. This lack of clarity often stems from inadequate tracking and visualization of progress. The Daily Scrum addresses this by ensuring the team is aligned and on track, with adjustments made as needed to the plan or Sprint Backlog. Without a clear understanding of their progress, Developers are less likely to meet the Sprint Goal, as success in Sprints builds from growing confidence over time, not last-minute efforts. Kanban through the Backdoor: The Scrum team consistently takes on too many tasks, leading to a regular overflow of unfinished work into the next Sprint — without further consideration or inspection. This practice, especially when 30 to 40 percent of tasks routinely spill over, indicates a shift towards a ‘time-boxed Kanban’ style rather than adhering to Scrum principles. This habitual spillover suggests a need to reassess and realign the team’s approach to fit the Scrum framework better. Scope Stretching or Gold-Plating: The Developers expand the scope of the Sprint beyond the agreed-upon Sprint Goal by adding extra, unnecessary work to the Product Backlog items in the Sprint Backlog. This issue arises when Developers disregard the original scope agreement with the Product Owner and unilaterally decide to enhance tasks without consultation. This behavior can lead to questionable allocation of development time, as it shifts focus away from the agreed priorities and goals, potentially impacting the team’s ability to deliver value effectively. This anti-pattern may reflect a disconnect between the Developers and the Product Owner, undermining the collaborative spirit essential for proper Scrum implementation. Cherry-Picking Product Backlog Items: The Developers select Product Backlog items unrelated to the Sprint Goal, resulting in a disorganized assortment of tasks. This issue often arises from a lack of a clear Sprint Goal or a goal that is too vague or simply a task list. Factors contributing to this pattern may include the need to address urgent technical issues, a desire to pursue new learning opportunities or disagreement with the product direction. If these scenarios don’t apply, it raises concerns about the team’s unity and effectiveness, suggesting they might operate more as individuals than as a cohesive Scrum team. The Imposed Sprint Goal: In this case, the Sprint Goal is not a collective decision of the Scrum team but rather dictated by an individual, often a dominant Product Owner or lead engineer. This scenario often unfolds in environments lacking psychological safety, where team members, despite foreseeing potential failure, remain silent and unopposed to the imposition. This pattern reflects a deeper issue within the team, signaling a departure from the core Scrum Values. 
Some team members may have resigned themselves to the status quo, losing interest in continuous improvement and collaboration. In such cases, the team might be more accurately described as a group of individuals working in parallel, more focused on their paychecks than genuine teamwork and shared success. The Overly Ambitious Sprint Goal: In this scenario, Scrum teams, often new ones, set unattainably high Sprint Goals, leading to an oversized Sprint Backlog and inevitable underdelivery at Sprint's end. This issue typically decreases as the team gains experience and better understands their capacity and customer problems. Mature Scrum teams learn to align their capabilities with their aspirations, ensuring they deliver the best possible value to customers and the organization. Lack of Focus: The organization treats the Scrum team as a jack-of-all-trades unit, burdening them with various unrelated tasks that hamper the team's ability to formulate a cohesive Sprint Goal. Such a scenario is counterproductive to Scrum's essence, which is about tackling complex problems through self-managing, autonomous teams and minimizing development risks. While Scrum excels at achieving specific objectives, its effectiveness diminishes when external stakeholders dictate the team's workload in detail. This approach undermines Scrum's core principle of focused, goal-oriented work and risks turning the team into a reactive rather than proactive unit. No Space for Non-Sprint Goal-Related Work: The Scrum team focuses solely on the Sprint Goal, overlooking other critical tasks such as customer support and organizational demands. Effective Scrum practice requires balancing the Sprint Goal with responding to unexpected, yet crucial, issues. Ignoring significant problems, like a critical bug or a malfunctioning payment system, just because they fall outside the Sprint Goal can quickly erode stakeholder trust. Scrum is about adaptability and responding to new challenges, not rigidly adhering to an initial plan and turning the Sprint into a Waterfall-ish time box. Regularly Not Delivering the Sprint Goal: Some Scrum teams fail to meet their Sprint Goals with the precision of Swiss clockwork. This ongoing issue undermines Scrum's core objective: solving customer problems effectively and aiding organizational sustainability. Scrum's usefulness relies on meeting the Sprint Goal, which should be the norm, not the exception. Continual failures, whether due to technical issues, skill shortages, or unforeseen complexities, call into question the validity of using Scrum. A successful application of Scrum involves a commitment to goals in return for decision-making autonomy and self-organization, not merely mimicking Kanban under the guise of Scrum. No Sprint Goal: Here, the Product Owner presents a disparate collection of tasks, lacking a cohesive objective, which leaves the Scrum team without clear direction. This situation indicates a potential misapplication of Scrum principles, suggesting that shifting to a more flow-based system like Kanban might better suit the team's needs. Typically, this pattern arises when a Product Owner is either overwhelmed by stakeholder demands or lacks the experience to align tasks effectively with the team's overall Product Goal.
Food for Thought — Missing Sprint Goals
Consider the following questions to help your teams and your organization avoid missing Sprint Goals and embrace agility fully: Are there other underlying team dynamics or organizational practices contributing to these anti-patterns?
What are the long-term impacts of these anti-patterns on the overall health and productivity of the Scrum team and its standing within the organization? How can the Scrum framework be adapted or reinforced to mitigate these anti-patterns, especially in diverse or rapidly changing work environments?
Conclusion
These ten Sprint Goal anti-patterns highlight various challenges that Scrum teams may face, from minor inefficiencies to major dysfunctions that can significantly undermine Scrum principles and, thus, the team's effectiveness. Addressing these issues requires a nuanced understanding of team dynamics, organizational culture, and a commitment to continuous improvement and adherence to Scrum values. By recognizing and proactively addressing these anti-patterns, Scrum teams can enhance their ability to deliver value effectively and sustainably. Which anti-patterns have you encountered, and how did you counter them to avoid missing Sprint Goals? Please share your experience in the comments.
Site reliability engineering (SRE) is a discipline in which automated software systems are built to manage the development operations (DevOps) of a product or service. In other words, SRE automates the functions of an operations team via software systems. The main purpose of SRE is to encourage the deployment and proper maintenance of large-scale systems. In particular, site reliability engineers are responsible for ensuring that a given system's behavior consistently meets business requirements for performance and availability. Furthermore, whereas traditional operations teams and development teams often have opposing incentives, site reliability engineers are able to align incentives so that both feature development and reliability are promoted simultaneously.
Basic SRE Principles
This article covers the key principles that underlie SRE, with examples and illustrations to clarify each one.
Embrace risk: No system can be expected to have perfect performance. It's important to identify potential failure points, create mitigation plans, and budget a certain percentage of business costs to address these failures in real time. Example: A week consists of 168 hours of potential availability. The business sets an expectation of 165 hours of uptime per week to account for both planned maintenance and unplanned failures.
Set service level objectives (SLOs): Set reasonable expectations for system performance to ensure that customers and internal stakeholders understand how the system is supposed to perform at various levels. Remember that no system can be expected to have perfect performance. Examples: The website is up and running 99% of the time. 99% of all API requests return a successful response. The server output matches client expectations 99% of the time. 99% of all API requests are delivered within one second. The server can handle 10,000 requests per second.
Eliminate work through automation: Automate as many tasks and processes as possible. Engineers should focus on developing new features and enhancing existing systems at least as often as addressing real-time failures. Example: Production code automatically generates alerts whenever an SLO is violated. The automated alerts send tickets to the appropriate incident response team with relevant playbooks to take action.
Monitor systems: Use tools to monitor system performance. Observe performance, incidents, and trends. Examples: A dashboard that displays the proportion of client requests and server responses that were delivered successfully in a given time period. A set of logs that displays the expected and actual output of client requests and server responses in a given time period.
Keep things simple: Release frequent, small changes that can be easily reverted to minimize production bugs. Delete unnecessary code instead of keeping it for potential future use. The more code and systems that are introduced, the more complexity is created; it's important to prevent accidental bloat. Example: Changes in code are always pushed via a version control system that tracks code writers, approvers, and previous states.
Outline the release engineering process: Document your established processes for development, testing, automation, deployments, and production support. Ensure that the process is accessible and visible. Example: A published playbook lists the steps to address a reboot failure.
The playbook contains references to relevant SLOs, dashboards, previous tickets, sections of the codebase, and contact information for the incident response team.
Embrace Risk
No system can be expected to have perfect performance. It's important to create reasonable expectations about system performance for both internal stakeholders and external users.
Key Metrics
For services that are directly user-facing, such as static websites and streaming, two common and important ways to measure performance are time availability and aggregate availability. The previous section gave an example of calculating time availability for a service. For other services, additional factors are important, including speed (latency), accuracy (correctness), and volume (throughput). An example calculation for latency is as follows: Suppose 10 different users send identical HTTP requests to your website, and they are all served properly. The response times are monitored and recorded as follows: 1 ms, 3 ms, 3 ms, 4 ms, 1 ms, 1 ms, 1 ms, 5 ms, 3 ms, and 2 ms. The average response time, or latency, is 24 ms / 10 requests = 2.4 ms. Choosing key metrics makes explicit how the performance of a service is evaluated, and therefore which factors pose a risk to service health. In the above example, identifying latency as a key metric marks response time as an essential property of the service. Thus, a risk to the reliability of the service is "slowness," or high latency.
Define Failure
In addition to measuring risks, it's important to clearly define which risks the system can tolerate without compromising quality and which risks must be addressed to ensure quality. Two common measurements that address failure are mean time to failure (MTTF) and mean time between failures (MTBF). The most robust way to define failures is to set SLOs, monitor your services for violations of those SLOs, and create alerts and processes for fixing violations. These are discussed in the following sections.
Error Budgets
The development of new production features always introduces new potential risks and failures; aiming for a 100% risk-free service is unrealistic. The way to align the competing incentives of pushing development and maintaining reliability is through error budgets. An error budget provides a clear metric that allows a certain proportion of failure from new releases in a given planning cycle. If the number or length of failures exceeds the error budget, no new releases may occur until a new planning period begins. The following is an example error budget:
Planning cycle: Quarter
Total possible availability: 2,190 hours
SLO: 99% time availability
Error budget: 1% time availability = 21.9 hours
Suppose the development team plans to release 10 new features during the quarter, and the following occurs: The first feature doesn't cause any downtime. The second feature causes downtime of 10 hours until fixed. The third and fourth features each cause downtime of 6 hours until fixed. At this point, the error budget for the quarter has been exceeded (10 + 6 + 6 = 22 > 21.9), so the fifth feature cannot be released. In this way, the error budget has ensured an acceptable feature release velocity while not compromising reliability or degrading user experience.
Set Service Level Objectives (SLOs)
The best way to set performance expectations is to set specific targets for different system risks. These targets are called service level objectives, or SLOs.
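Before turning to concrete SLO targets, the arithmetic from the latency and error budget examples above can be captured in a few lines of Python. This is a minimal sketch using the illustrative numbers from this section; the helper functions are hypothetical and are not part of any particular monitoring tool.

from typing import Iterable

def average_latency_ms(samples_ms: Iterable[float]) -> float:
    """Average response time across recorded request latencies, in milliseconds."""
    samples = list(samples_ms)
    return sum(samples) / len(samples)

def error_budget_hours(total_hours: float, slo_availability: float) -> float:
    """Downtime allowed in a planning cycle for a given time-availability SLO."""
    return total_hours * (1 - slo_availability)

def releases_allowed(downtime_hours: Iterable[float], budget_hours: float) -> bool:
    """New releases are paused once cumulative downtime exceeds the error budget."""
    return sum(downtime_hours) <= budget_hours

# Figures from the examples above
print(average_latency_ms([1, 3, 3, 4, 1, 1, 1, 5, 3, 2]))   # 2.4 ms
budget = error_budget_hours(2190, 0.99)                      # 21.9 hours for a 99% SLO
print(releases_allowed([0, 10, 6, 6], budget))               # False: 22 hours > 21.9 hours

In practice, these figures would come from a monitoring system rather than hard-coded samples, but the calculation is the same one used above to decide that the fifth feature release cannot proceed.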
The following examples illustrate SLOs based on different risk measurements:
Time availability: Website running 99% of the time
Aggregate availability: 99% of user requests processed
Latency: 1 ms average response time per request
Throughput: 10,000 requests handled every second
Correctness: 99% of database reads accurate
Depending on the service, some SLOs may be more complicated than just a single number. For example, a database may exhibit 99.9% correctness on reads but have the 0.1% of errors it incurs always be related to the most recent data. If a customer relies heavily on data recorded in the past 24 hours, then the service is not reliable. In this case, it makes sense to create a tiered SLO based on the customer's needs. Here is an example:
Level 1 (records within the last 24 hours): 99.99% read accuracy
Level 2 (records within the last 7 days): 99.9% read accuracy
Level 3 (records within the last 30 days): 99% read accuracy
Level 4 (records within the last 6 months): 95% read accuracy
Costs of Improvement
One of the main purposes of establishing SLOs is to track how reliability affects revenue. Revisiting the sample error budget from the section above, suppose there is a projected service revenue of $500,000 for the quarter. This can be used to translate the SLO and error budget into real dollars. Thus, SLOs are also a way to measure objectives that are indirectly related to system performance.
SLO 95%: error budget 5%, revenue lost $25,000
SLO 99%: error budget 1%, revenue lost $5,000
SLO 99.9%: error budget 0.1%, revenue lost $500
SLO 99.99%: error budget 0.01%, revenue lost $50
Using SLOs to track indirect metrics, such as revenue, allows one to assess the cost of improving a service. In this case, spending $10,000 on improving the SLO from 95% to 99% is a worthwhile business decision. On the other hand, spending $10,000 on improving the SLO from 99% to 99.9% is not.
Eliminate Work Through Automation
One characteristic that distinguishes SRE teams from traditional DevOps is the ability to scale up the scope of a service without scaling the cost of the service. Called sublinear growth, this is accomplished via automation. In a traditional development-operations split, the development team pushes new features, while the operations team dedicates 100% of its time to maintenance. Thus, a pure operations team will need to grow 1:1 with the size and scope of the service it is maintaining: If it takes O(10) system engineers to serve 1,000 users, it will take O(100) engineers to serve 10,000 users. In contrast, an SRE team operating according to best practices will devote at least 50% of its time to developing systems that remove the basic elements of effort from the operations workload. Some examples of this include the following: A service that detects which machines in a large fleet need software updates and schedules software reboots in batches over regular time intervals. A "push-on-green" module that provides an automatic workflow for the testing and release of new code to relevant services. An alerting system that automates ticket generation and notifies incident response teams.
Monitor Systems
To maintain reliability, it is imperative to monitor the relevant analytics for a service and use monitoring to detect SLO violations.
As mentioned earlier, some important metrics include:
The amount of time that a service is up and running (time availability)
The number of requests that complete successfully (aggregate availability)
The amount of time it takes to serve a request (latency)
The proportion of responses that deliver expected results (correctness)
The volume of requests that a system is currently handling (throughput)
The percentage of available resources being consumed (saturation)
Sometimes durability is also measured, which is the length of time that data is stored with accuracy.
Dashboards
A good way to implement monitoring is through dashboards. An effective dashboard will display SLOs, include the error budget, and present the different risk metrics relevant to the SLO.
Figure: Example of an effective SRE dashboard
Logs
Another good way to implement monitoring is through logs. Logs that are both searchable in time and categorized by request are the most effective. If an SLO violation is detected via a dashboard, a more detailed picture can be created by viewing the logs generated during the affected timeframe.
Figure: Example of a monitoring log
Whitebox Versus Blackbox
The type of monitoring discussed above, which tracks the internal analytics of a service, is called whitebox monitoring. Sometimes it's also important to monitor the behavior of a system from the "outside," which means testing the workflow of a service from the point of view of an external user; this is called blackbox monitoring. Blackbox monitoring may reveal problems with access permissions or redundancy.
Automated Alerts and Ticketing
One of the best ways for SREs to reduce effort is to use automation during monitoring for alerts and ticketing. The resulting SRE process is much more efficient than a traditional operations process. A traditional operations response may look like this: A web developer pushes a new update to an algorithm that serves ads to users. The developer notices that the latest push is reducing website traffic due to an unknown cause and manually files a ticket about reduced traffic with the web operations team. A system engineer on the web operations team receives the ticket. After troubleshooting, the problem is diagnosed as a latency issue caused by a stuck cache. The web operations engineer contacts a member of the database team for help. The database team looks into the codebase and identifies a fix for the cache settings so that data is refreshed more quickly and latency is decreased. The database team updates the cache refresh settings, pushes the fix to production, and closes the ticket. In contrast, an SRE operations response may look like this: The ads SRE team creates a deployment tool that monitors three different traffic SLOs: availability, latency, and throughput. A web developer is ready to push a new update to an algorithm that serves ads, which they push using the SRE deployment tool. Within minutes, the deployment tool detects reduced website traffic. It identifies a latency SLO violation and creates an alert. The on-call site reliability engineer receives the alert, which contains a proposal for updated cache refresh settings to make processing requests faster. The site reliability engineer accepts the proposed changes, pushes the new settings to production, and closes the ticket. By using an automated system for alerting and proposing changes to the database, the communication required, the number of people involved, and the time to resolution are all reduced.
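To complement whitebox metrics with a view from the outside, a blackbox probe exercises the service the way an external user would. Below is a minimal sketch using only the Python standard library; the URL, timeout, and latency threshold are placeholders rather than values from any real setup.

import time
import urllib.request

# Hypothetical endpoint and threshold; substitute your own service and SLO values.
PROBE_URL = "https://example.com/health"
LATENCY_THRESHOLD_SECONDS = 1.0

def blackbox_probe(url):
    """Issue one external request and return its HTTP status and latency."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=5) as response:
        status = response.status
    return status, time.monotonic() - start

def check_probe(url):
    """Print an alert if the probe fails, errors, or is slower than the threshold."""
    try:
        status, latency = blackbox_probe(url)
    except Exception as error:
        print(f"Blackbox probe failed for {url}: {error}")
        return
    if status != 200:
        print(f"Blackbox probe returned HTTP {status} for {url}")
    elif latency > LATENCY_THRESHOLD_SECONDS:
        print(f"Blackbox probe latency {latency:.2f}s exceeds {LATENCY_THRESHOLD_SECONDS}s for {url}")

check_probe(PROBE_URL)

In practice, such probes would run on a schedule from outside the production environment and feed their results into the same alerting and ticketing pipeline described above.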
The following code block is a simplified Python sketch of latency and throughput thresholds, with automated alerts triggered when a violation is detected. The threshold values and in-memory metric stores are illustrative stand-ins for a real metrics backend.

import time
from statistics import quantiles

# Illustrative SLO thresholds
LATENCY_SLO_THRESHOLD = 0.1       # 99th-percentile latency target, in seconds
THROUGHPUT_SLO_THRESHOLD = 10000  # requests per second the service is sized for

# In-memory stores standing in for a real metrics backend
request_latencies = []            # observed request latencies, in seconds
request_count = 0                 # requests seen in the current window
window_start = time.time()

def record_request(latency_seconds):
    """Record one request's latency and increment the request counter."""
    global request_count
    request_latencies.append(latency_seconds)
    request_count += 1

def check_latency_slo():
    """Alert if the 99th-percentile latency exceeds the SLO threshold."""
    if len(request_latencies) < 2:
        return
    latency_99th_percentile = quantiles(request_latencies, n=100)[98]
    if latency_99th_percentile > LATENCY_SLO_THRESHOLD:
        print(f"Latency SLO violated! 99th percentile response time is "
              f"{latency_99th_percentile:.3f} seconds.")

def check_throughput_slo():
    """Alert if the request rate exceeds what the service is sized to handle."""
    elapsed = time.time() - window_start
    if elapsed <= 0:
        return
    current_throughput = request_count / elapsed
    if current_throughput > THROUGHPUT_SLO_THRESHOLD:
        print(f"Throughput SLO violated! Current throughput is "
              f"{current_throughput:.0f} requests per second.")

Example of automated alert checks
Keep Things Simple
The best way to ensure that systems remain reliable is to keep them simple. SRE teams should be hesitant to add new code, preferring instead to modify and delete code where possible. Every additional API, library, and function added to production software increases dependencies in ways that are difficult to track, introducing new points of failure. Site reliability engineers should aim to keep their code modular. That is, each function in an API should serve only one purpose, as should each API in a larger stack. This type of organization makes dependencies more transparent and also makes diagnosing errors easier.
Playbooks
As part of incident management, playbooks for typical on-call investigations and solutions should be authored and published publicly. Playbooks for a particular scenario should describe the incident (and possible variations), list the associated SLOs, reference appropriate monitoring tools and codebases, offer proposed solutions, and catalog previous approaches.
Outline the Release Engineering Process
Just as an SRE codebase should emphasize simplicity, so should an SRE release process. Simplicity is encouraged through a few principles:
Smaller size and higher velocity: Rather than large, infrequent releases, aim for a higher frequency of smaller ones. This allows the team to observe changes in system behavior incrementally and reduces the potential for large system failures.
Self-service: An SRE team should completely own its release process, which should be automated effectively. This both eliminates work and encourages small, high-velocity pushes.
Hermetic builds: The process for building a new release should be hermetic, or self-contained. That is to say, the build process must be locked to known versions of existing tools (e.g., compilers) and must not depend on tools outside the build environment.
Version Control
All code releases should be submitted within a version control system to allow for easy reversions in the event of erroneous, redundant, or ineffective code.
Code Reviews
The process of submitting releases should be accompanied by a clear and visible code review process. Basic changes may not require approval, whereas more complicated or impactful changes will require approval from other site reliability engineers or technical leads.
Recap of SRE Principles
The main principles of SRE are embracing risk, setting SLOs, eliminating work via automation, monitoring systems, keeping things simple, and outlining the release engineering process. Embracing risk involves clearly defining failure and setting error budgets. The best way to do this is by creating and enforcing SLOs, which track system performance directly and also help identify the potential costs of system improvement. The appropriate SLO depends on how risk is measured and on the needs of the customer. Enforcing SLOs requires monitoring, usually through dashboards and logs. Site reliability engineers focus on project work in addition to development operations, which allows services to expand in scope and scale while maintaining low costs. This is called sublinear growth and is achieved through automating repetitive tasks. Monitoring that automates alerting creates a streamlined operations process, which increases reliability. Site reliability engineers should keep systems simple by reducing the amount of code written, encouraging modular development, and publishing playbooks with standard operating procedures. SRE release processes should be hermetic and push small, frequent changes using version control and code reviews.