Network, Software Engineering and Systems - Technology Performance Pulse

How to Prepare for Your DevOps Interview

DZone

SEPTEMBER 5, 2019

Over the past decade, DevOps has emerged as a new tech culture and career that marries the rapid iteration desired by software development with the rock-solid stability of the infrastructure operations team. As of August 2019, there were over 50,000 DevOps job listings on LinkedIn in the United States alone.

DevOps

DevOps Software Engineering Infrastructure Engineering

Designing Instagram

High Scalability

JANUARY 11, 2022

The streaming data store makes the system extensible to support other use-cases (e.g. …). FUN FACT: In this talk, Rodrigo Schmidt, director of engineering at Instagram, talks about the different challenges they have faced in scaling the data infrastructure at Instagram.

Design

Design Media Storage Logistics

What Is Load Testing? Ensuring Robust System Performance Under Pressure

DZone

JULY 5, 2023

While load testing may sound like an esoteric domain exclusive to software engineers or network administrators, it is, in fact, a silent superhero in our increasingly digital world: it keeps the digital infrastructure running smoothly, even during peak usage times.

Systems

Systems Testing Software Engineering Performance

Protect your organization against zero-day vulnerabilities

Dynatrace

AUGUST 3, 2022

Malicious attackers have gotten increasingly better at identifying vulnerabilities and launching zero-day attacks to exploit these weak points in IT systems. A zero-day exploit is a technique an attacker uses to take advantage of an organization’s vulnerability and gain access to its systems. … half of all corporate networks.

Java

Java Traffic Benchmarking Strategy

Site reliability done right: 5 SRE best practices that deliver on business objectives

Dynatrace

MAY 31, 2023

This shift is leading more organizations to hire site reliability engineers to guarantee the reliability and resiliency of their services. SRE applies the disciplines of software engineering to infrastructure management, both on-premises and in the cloud.

Best Practices

Best Practices DevOps Latency Metrics

Bringing AV1 Streaming to Netflix Members’ TVs

The Netflix TechBlog

NOVEMBER 9, 2021

The Android launch leveraged the open-source software decoder dav1d built by the VideoLAN, VLC, and FFmpeg communities and sponsored by AOMedia. We were very pleased to see that AV1 streaming improved members’ viewing experience, particularly under challenging network conditions.

Media

Media Open Source Software Engineering Efficiency

Consistent caching mechanism in Titus Gateway

The Netflix TechBlog

NOVEMBER 3, 2022

As the number of Titus users increased over the years, the load and pressure on the system increased substantially. (… cell): Titus Job Coordinator is a leader-elected process managing the active state of the system. For example, a batch workflow orchestration system may create multiple jobs that are part of a single workflow execution.

Cache

Cache Latency Traffic Systems

Snap: a microkernel approach to host networking

The Morning Paper

NOVEMBER 10, 2019

Snap: a microkernel approach to host networking, Marty et al., SOSP’19. This paper describes the networking stack, Snap, that has been running in production at Google for more than three years. You need a lot of software engineers and the willingness to rewrite a lot of software to entertain that idea.

Network

Network Transportation Latency Entertainment

Extend the AI and automation core of Dynatrace with host extensions to resolve infrastructure problems

Dynatrace

MAY 13, 2020

OneAgent gives you all the operational and business performance metrics you need, from the front end to the back end and everything in between: cloud instances, hosts, network health, processes, and services. Your GPU-based machine learning system crashes, and you don’t know why? Example 1: Gain visibility into your NVIDIA GPUs.

Infrastructure

Infrastructure Metrics Monitoring Software Engineering

AWS observability: AWS monitoring best practices for resiliency

Dynatrace

NOVEMBER 22, 2021

Visibility into system activity and behavior has become increasingly critical given organizations’ widespread use of Amazon Web Services (AWS) and other serverless platforms. AWS Lambda makes it easy to design, run, and maintain application systems without having to provision or manage infrastructure.

Best Practices

Best Practices AWS Monitoring Serverless

Netflix at AWS re:Invent 2019

The Netflix TechBlog

NOVEMBER 22, 2019

4:45pm-5:45pm, NFX 202: A day in the life of a Netflix Engineer. Dave Hahn, SRE Engineering Manager. Abstract: Netflix is a large, ever-changing ecosystem serving millions of customers across the globe through cloud-based systems and a globally distributed CDN.

AWS

AWS Entertainment Open Source Benchmarking

All of Netflix’s HDR video streaming is now dynamically optimized

The Netflix TechBlog

NOVEMBER 29, 2023

Join us and be a part of the amazing team that brought you this tech blog. Open positions: Software Engineer, Cloud Gaming Software Engineer, Live Streaming …

Open Source

Open Source Software Engineering Internet

What is application security? And why it needs a new approach

Dynatrace

MARCH 17, 2021

Application security is a software engineering term that refers to several different types of security practices designed to ensure applications do not contain vulnerabilities that could allow illicit access to sensitive data, unauthorized code modification, or resource hijacking. So, why is all this important?

Open Source

Open Source Cloud Games Java

Edge Authentication and Token-Agnostic Identity Propagation

The Netflix TechBlog

FEBRUARY 9, 2021

A few years ago, we decided to address this complexity by spinning up a new initiative, and eventually a new team, to move the complex handling of user and device authentication, and various security protocols and tokens, to the edge of the network, managed by a set of centralized services, and a single team.

Architecture

Architecture Latency Servers Website

The Show Must Go On: Securing Netflix Studios At Scale

The Netflix TechBlog

SEPTEMBER 13, 2021

Nearly all of the blockers related to systems in which (usually for historical reasons) some application team was solving both authentication and application routing in a custom way. A big part of their work is this idea of harvesting developer intent and automating the necessary touchpoints across our systems.

Internet

Internet Cloud Traffic

Experimentation is a major focus of Data Science across Netflix

The Netflix TechBlog

JANUARY 11, 2022

Data Scientists play a vital role in building automated systems that leverage causal inference to decide how we spend our advertising budget. One way we do this is through constantly improving the recommendation systems that produce a personalized home page experience for each of our members.

Innovation

Innovation Metrics Engineering Testing

Migrating a privacy-safe information extraction system to a Software 2.0 design

The Morning Paper

FEBRUARY 16, 2020

This is a comparatively short (7 pages) but very interesting paper detailing the migration of a software system to a ‘Software 2.0’ design. A really interesting thing happens when you go from developing a Software 1.0 (i.e., …

Systems

Systems Design Software

Evolution of ML Fact Store

The Netflix TechBlog

APRIL 26, 2022

The first version of our logger library optimized for storage by deduplicating facts, and optimized for network I/O by using different compression methods for each fact. Hence, we designed a comprehensive system that monitors the quality of data flowing through Axion to detect corruptions, whether introduced by Axion or outside Axion.

Storage

Storage Design Scalability Latency
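The Evolution of ML Fact Store excerpt describes a logger that deduplicates facts for storage and applies a compression method per fact. A minimal sketch of that idea, assuming facts arrive as raw bytes, deduplication is keyed by a SHA-256 digest, and "per-fact compression" is modeled as picking whichever standard-library codec yields the smallest payload. The `FactLogger` class and its codec table are hypothetical, not the Axion implementation.

```python
import bz2
import hashlib
import lzma
import zlib
from typing import Callable, Dict, List, Tuple

# Hypothetical codec table: name -> (compress, decompress), all from the stdlib.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


class FactLogger:
    """Illustrative logger: each distinct fact is stored once, compressed
    with whichever codec produces the smallest payload for that fact."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, bytes]] = {}  # digest -> (codec, payload)
        self.refs: List[str] = []  # one digest per logged record, duplicates included

    def log(self, fact: bytes) -> str:
        digest = hashlib.sha256(fact).hexdigest()
        if digest not in self._store:  # only new facts are compressed and stored
            codec, payload = min(
                ((name, funcs[0](fact)) for name, funcs in CODECS.items()),
                key=lambda item: len(item[1]),
            )
            self._store[digest] = (codec, payload)
        self.refs.append(digest)
        return digest

    def unique_facts(self) -> int:
        return len(self._store)

    def read(self, digest: str) -> bytes:
        codec, payload = self._store[digest]
        return CODECS[codec][1](payload)
```

Logging the same fact twice stores it once and records two references; choosing the codec per fact lets repetitive facts and short facts each get whichever encoding suits them.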

Millions of tiny databases

The Morning Paper

MARCH 3, 2020

In the same spirit as Paxos Made Live, this paper describes the details, choices and tradeoffs that are required to put a consensus system into production. Physalia is designed to offer consistency and high-availability, even under network partitions. In theory, systems built using this pattern can achieve extremely high availability.

Database

Database AWS Network Design

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages (Part 1)

The Morning Paper

JANUARY 21, 2020

Allspaw highlights four key challenges: the systems are uniquely opaque, with multiple layers of abstraction hiding underlying complexity; performance varies even under normal conditions; and services are increasingly interdependent, including across organisational boundaries. Moreover, causality is complex and networked.

Internet

Internet Strategy Energy

AI meets operations

O'Reilly

FEBRUARY 2, 2020

Second, the behavior of AI systems changes over time. Is it important to observe what happens on each layer of a neural network? Given the source code and the training data, you could retrain a model, but it almost certainly wouldn’t be the same because of randomization in the training process.

Software Architecture

Software Architecture Monitoring Software Engineering Code
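The AI meets operations excerpt notes that retraining from the same code and data almost certainly yields a different model because training is randomized. A toy illustration, assuming a one-parameter-pair linear model fitted by stochastic gradient descent whose randomness comes from the initial weights and the shuffle order; the model, learning rate, and epoch count are arbitrary choices, and real frameworks have further sources of nondeterminism (e.g. parallel hardware) that a seed alone does not remove.

```python
import random
from typing import List, Optional, Tuple

# Synthetic data for y = 2x + 1.
DATA: List[Tuple[float, float]] = [(float(x), 2.0 * x + 1.0) for x in range(10)]


def train(seed: Optional[int] = None, epochs: int = 5, lr: float = 0.01) -> Tuple[float, float]:
    """Fit y = w*x + b with SGD. Randomness enters via the initial
    weights and the per-epoch shuffle, both drawn from one generator."""
    rng = random.Random(seed)  # seed=None seeds from system entropy, so runs differ
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(DATA)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


same_a, same_b = train(seed=7), train(seed=7)  # identical: same seed, same draws
other = train(seed=8)                          # different initial weights and order
```

Pinning the seed makes a run repeatable on the same runtime; changing or omitting it reproduces the drift the excerpt describes.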

Teaching rigorous distributed systems with efficient model checking

The Morning Paper

APRIL 16, 2019

Teaching rigorous distributed systems with efficient model checking, Michael et al. The paper describes the labs environment, DSLabs, developed at the University of Washington to accompany a course in distributed systems. Enabling students to build running, performant versions of all of those systems in the time available is one challenge.

Systems

Systems Efficiency Testing Design

Using SQL Server’s SNITrace to Troubleshoot Networking Issues

SQL Server According to Bob

JANUARY 11, 2020

In the process of tracking down a few TCP 10054 issues (highlighted here: [link]), I also used the SNITrace (SNI Trace) capabilities.

Network

Network Servers Processing Software Engineering

Growth Engineering at Netflix - Automated Imagery Generation

The Netflix TechBlog

FEBRUARY 9, 2021

The homepage needs to load in a reasonable amount of time, even in poor network conditions. Let’s put it all together and review the system interaction diagram. We need to be able to easily determine what imagery is present for a given platform, region, and language.

Engineering

Engineering Storage Latency Entertainment

Tackling the Pipeline Problem in the Architecture Research Community

ACM Sigarch

APRIL 8, 2019

However, we often hear anecdotes that the number of prospective graduate students applying to computer architecture/systems is small and shrinking. Networking sessions that create opportunities for students to interact with graduate students and established architects in academia and industry. Why is that?

Architecture

Architecture Open Source Hardware Software Engineering

SQL Server on IoT Edge and Developer Machines – Smaller Footprint

SQL Server According to Bob

MAY 19, 2019

Partitioning allows SQL Server to scale to the largest systems with record-setting performance. When a latch is promoted to a super-latch on a 64-CPU system, the memory requirement becomes 32 + (32 × 64) = 2,080 bytes. However, on smaller systems and VMs, partitioning may not be required to maintain performance.

IoT

IoT Servers Development Cache
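The super-latch figure in the SQL Server on IoT Edge excerpt is plain arithmetic: a fixed 32 bytes plus 32 bytes per CPU. A small helper, assuming that formula scales linearly with the CPU count (the excerpt only states the 64-CPU case), shows how the footprint grows and why small machines may do without partitioning. The constant names are illustrative.

```python
BASE_BYTES = 32        # fixed component of the excerpt's formula
PER_CPU_BYTES = 32     # per-CPU component of the excerpt's formula


def superlatch_bytes(cpus: int) -> int:
    """Memory for a latch promoted to a super-latch: 32 + (32 * cpus)."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    return BASE_BYTES + PER_CPU_BYTES * cpus


for n in (2, 4, 64):
    print(n, superlatch_bytes(n))  # 2 -> 96, 4 -> 160, 64 -> 2080
```

The 64-CPU result matches the excerpt's 2,080 bytes; on a 2-CPU VM the same latch would need only 96 bytes.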

Scaling symbolic evaluation for automated verification of systems code with Serval

The Morning Paper

NOVEMBER 12, 2019

Scaling symbolic evaluation for automated verification of systems code with Serval, Nelson et al. Serval is a framework for developing automated verifiers of systems software. Serval enables us, with a reasonable effort, to develop multiple verifiers, apply the verifiers to a range of systems, and find previously unknown bugs.

Code

Code Systems Programming Google

Cloud Adoption in 2020

O'Reilly

MAY 19, 2020

Software engineers represent the largest cohort, comprising almost 20% of all respondents (see Figure 1). Technical leads and architects (about 11%) are next, followed by software and systems architects (9+%). For this audience, SRE’s future is brighter than AI’s, however.

Cloud

Cloud Serverless AWS DevOps

Ten Tips For The Aspiring Designer Beginners (Part 1)

Smashing Magazine

JANUARY 5, 2022

That’s right; I’ve parked day-to-day design work in favor of becoming someone very active in the design community, focusing on best-practice design advice and scalable systems. Prioritize networking over pushing pixels. … “A Design System Is Not A Sticker Sheet,” by Corey Roth.

Design

Design Website Social Media Best Practices

Communal Computing’s Many Problems

O'Reilly

JULY 20, 2021

There may be alarm systems. The concept of Zero Trust Networks speaks to this problem. There have been cases of harassment, intimidation, and domestic abuse by people whose access should have been revoked: for example, an ex-partner turning off the heating system. It’s important to account for children from the beginning.

Google

Google Games Technology

What is a Site Reliability Engineer (SRE)?

Dotcom-Monitor

OCTOBER 6, 2021

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. The term site reliability engineering first came into existence at Google in 2003, when a site reliability team was created. At that time, the team was made up of software engineers.

Engineering

Engineering DevOps Monitoring Google

Reverb: speculative debugging for web applications

The Morning Paper

JANUARY 26, 2020

In the context of the papers we’ve been looking at recently, and for a constrained environment, Reverb is helping its users to form an accurate mental model of the system state, and to form and evaluate hypotheses in situ (… candidate bug-fixes) during replay.

Programming

Programming Servers Network Latency

The Persistent Imbalance Between Supply and Demand for Software Development Labor

The Agile Manager

JANUARY 31, 2014

Yet we continue to find new applications for software: it is increasingly a product differentiator (embedded systems) or a product category of its own (social networking). Economic and perhaps even political pressure will intensify to industrialize software development. But demand tends to be impatient.

Software

Software Software Development Government

Why is Hiring so Hard? How to Improve Your Hiring Fortunes

Strategic Tech

SEPTEMBER 29, 2018

Finding good software engineers takes so long and requires so much effort… but it doesn’t have to. Improving your hiring fortunes is not just about optimising your hiring process; it’s about making systemic changes to your organisation. Hiring is so hard: finding … Contact me if you are interested or would like to know more.

Software Engineering

Software Engineering Open Source Engineering Java

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 30, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. We both have had long careers supporting system administration, and LISA has always felt like a homecoming, reuniting with old friends while welcoming newcomers. Join us for 3 days in Nashville at LISA'18.

DevOps

DevOps Network Best Practices Programming

Automating chaos experiments in production

The Morning Paper

JULY 4, 2019

Are you ready to take your system assurance programme to the next level? In all cases we need to be able to carefully monitor the impact on the system, and back out if things start going badly wrong. Netflix’s system is deployed on the public cloud as a complex set of interacting microservices.

Latency

Latency Engineering Metrics Traffic
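The Automating chaos experiments in production excerpt stresses monitoring an experiment's impact and backing out when things go badly wrong. A minimal sketch of such an abort guard, assuming traffic is split between a control group and an experiment group and that the health signal is a request success rate. The `should_abort` name, the 5-percentage-point threshold, and the minimum sample size are hypothetical illustrations, not Netflix's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    requests: int
    successes: int

    @property
    def success_rate(self) -> float:
        # Treat an empty group as healthy rather than dividing by zero.
        return self.successes / self.requests if self.requests else 1.0


def should_abort(control: GroupStats, experiment: GroupStats,
                 max_drop: float = 0.05, min_requests: int = 100) -> bool:
    """Return True when the experiment group's success rate falls more than
    `max_drop` below the control group's, once both have enough traffic."""
    if control.requests < min_requests or experiment.requests < min_requests:
        return False  # not enough data yet to judge either way
    return control.success_rate - experiment.success_rate > max_drop


# Example: 99.5% vs 90.0% success is a 9.5-point drop, so the guard trips.
trip = should_abort(GroupStats(1000, 995), GroupStats(1000, 900))
```

Comparing against a concurrently running control, rather than a fixed absolute threshold, keeps ordinary traffic fluctuations from triggering an abort on their own.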

SQL Mysteries: Why is my SQL Server experiencing lots of 17830 (TCP 10054) errors?

SQL Server According to Bob

OCTOBER 24, 2019

I thought the network trace might reveal a SYN, a long delay that exceeded the connection timeout, and a close (RST) from the client. What was causing the SQL Server networking client to open a TCP connection and then close it, without exceeding the connection timeout and without attempting the TDS login activities?

Servers

Servers Network Software Engineering Traffic

USENIX LISA 2018: CFP Now Open

Brendan Gregg

APRIL 29, 2018

USENIX’s LISA conference is the premier event for topics in production system engineering. We both have had long careers supporting system administration, and LISA has always felt like a homecoming, reuniting with old friends while welcoming newcomers. Join us for 3 days in Nashville at LISA'18.

DevOps

DevOps Network Best Practices Programming

SQL Server on Linux: CU4 – NewSequentialId() – Uuid

SQL Server According to Bob

FEBRUARY 22, 2018

On Windows, the system stores this in the registry and increments the value during startup. MAC address: this is usually associated with a system component (network card); for SQL Server on Linux, SQLPAL generates a pseudo-MAC using a uuid (uuid_generate) and preserves the value in the instance_id file during the first startup.

Servers

Servers Software Engineering Database Network
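The NewSequentialId() excerpt describes generating a pseudo-MAC from a UUID on first startup and preserving it in an instance_id file, so the node component stays stable across restarts. A minimal sketch of that generate-once-then-reuse pattern, assuming the file holds a single hex string; the `load_or_create_node` helper and the use of Python's `uuid.uuid1(node=...)` to produce time-based UUIDs are illustrations, not SQLPAL's implementation.

```python
import os
import tempfile
import uuid


def load_or_create_node(path: str) -> int:
    """Return a persisted 48-bit pseudo-MAC, creating and saving it on first use."""
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read().strip(), 16)
    # Take 48 bits from a random UUID and set the multicast bit, which
    # RFC 4122 recommends for node IDs that are not real MAC addresses.
    node = (uuid.uuid4().int & ((1 << 48) - 1)) | (1 << 40)
    with open(path, "w") as f:
        f.write(f"{node:012x}")
    return node


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "instance_id")
    first = load_or_create_node(path)   # generated and written to disk
    second = load_or_create_node(path)  # read back on a later "startup"
    print(first == second, uuid.uuid1(node=first))
```

Because the node is read back rather than regenerated, every UUID produced after a restart carries the same node field as those produced before it.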

Microservices – What CSPs can Learn From IT

VoltDB

DECEMBER 8, 2017

Omnipresent connectivity is driving new disruptive business models, which are further driving up demands on the networks. Virtualization of appliances and systems is seen as a necessary step to add the agility to meet these increasing and evolving service demands. [John Abraham] I don’t think so. There is not a consensus yet.

Latency

Latency Virtualization Cloud Software Engineering
