
Stuff The Internet Says On Scalability For September 6th, 2019


Wake up! It’s HighScalability time:

Coolest or most coolest thing ever?

Do you like this sort of Stuff? I’d love your support on Patreon. I wrote Explain the Cloud Like I’m 10 for people who need to understand the cloud. And who doesn’t these days? On Amazon it has 54 mostly 5 star reviews (125 on Goodreads). They’ll learn a lot and likely add you to their will.

Number Stuff:

  • lots: programmers who can’t actually program. 
  • 2x: faster scheduling of jobs across a datacenter using reinforcement learning, a trial-and-error machine-learning technique, to tailor scheduling decisions to specific workloads in specific server clusters. 
  • 300 msecs: time it takes a proposed Whole Foods biometric payment system to scan your hand and process your transaction.
  • $8 million: Slack revenue loss from 2 hours of downtime. (catchpoint email)
  • 8.4 million+: websites participating in Google’s user tracking/data gathering network. It broadcasts personal data about visitors to these sites to 2,000+ companies, hundreds of billions of times a day
  • 20x: BlazingSQL faster than Apache Spark on Google Cloud Platform using NVIDIA’s T4 GPUs by loading data directly into GPU memory using GPU DataFrame (GDF).
  • 405: agencies with access to Ring data. 
  • middle: age at which entrepreneurs are most successful. Youth is not a key trait of successful entrepreneurs. 
  • 5: years until we have carbon nanotube chips in our computers.
  • 5 billion: DVDs shipped by Netflix over 21 years.
  • 51%: chance the world as we know it will not end by 2050.
  • 1,100: US business email compromise scams per month at a cost of $300 million. 

Quotable Stuff:

  • @kennwhite: Merkle trees aren’t gonna fix a low-bid state contractor unpatched 2012 IIS web server
  • Werner Vogels: To succeed in using application development to increase agility and innovation speed, organizations must adopt five elements, in any order: microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security. The common thing we have seen, though, is that customers who build modern applications see benefits across their entire businesses, especially in how they allocate time and resources. They spend more time on the logic that defines their business, scale up systems to meet peak customer demand easily, increase agility, and deliver new features to market faster and more often.
  • @Carnage4Life: This post points out that rents consume $1 out of every $8 of VC investment in the Bay Area. 
  • @kentonwilliston: Too little, too late. RISC-V has already cornered the “open” core market IMO, and if I wanted a second option it’s hard to see why I’d go with Power over others like MIPS Open
  • echopom: > Why Does Developing on Kubernetes Suck? IMHO because we are in a phase of transition. Having worked for years in the software industry, I’m convinced we are halfway to a much bigger transformation for Software Engineers, SREs, Developers, etc. I work in a Neobank (N26, Revolut, etc.); we are currently in the process of re-writing our entire Core Banking System with MicroServices on top of Kubernetes with Kafka. Not a single day passes without engineers needing to have an exchange about defining basically all of the terms that exist within the K8s/Docker/Kafka world. – What’s a Pod? How does a pod behave if Kafka goes down? Do we really need ZooKeeper? etc. Their workflow is insanely complex and requires hours if not a day to deploy a single change… obviously let’s not even talk about the amount of work our SRE has in the pipe to “package” the entire stack of 150+ services in K8s through a single YAML file….
  • millerm: I have had this thought for many years. Where is all the perfectly designed, bug free, maintenance-bliss, fully documented, fully tested, future-proofed code located so we can all marvel at its glory?
  • @dvassallo: I agree with the advice. Still, I like these PaaS experiments. There’s a big opportunity for “conceptual compression” on AWS, and I bet one day we’ll see a good PaaS/framework that would be a good choice for the average Twitter for Pets app. And I doubt that would come from AWS.
  • JPL: Atomic clocks combine a quartz crystal oscillator with an ensemble of atoms to achieve greater stability. NASA’s Deep Space Atomic Clock will be off by less than a nanosecond after four days and less than a microsecond (one millionth of a second) after 10 years. This is equivalent to being off by only one second every 10 million years.
  • Nathan Schneider: Pursuing decentralization at the expense of all else is probably futile, and of questionable usefulness as well. The measure of a technology should be its capacity to engender more accountable forms of trust.
  • @tef_ebooks: docker is just static linking for millenials
  • @Hacksterio: “But it’s not until we look at @TensorFlow Lite on the @Raspberry_Pi 4 that we see the real surprise. Here we see a 3X-4X increase in inferencing speed between our original TensorFlow benchmark, and the new results using TensorFlow Lite…”
  • @cmeik: I used to think, and have for many years, that partial failure was the fundamental thing and that’s what needed to be surfaced. I’m not sure I believe that anymore; I’m starting to think it’s more about uncertainty instead. But, I don’t know.
  • @benedictevans: Fun with maths: The Moto MC68000 CPU in the original Mac had 68k transistors. Apple sold 372k units in 1984. 68k x 372k=25.3bn The A12X SoC in an iPad Pro has 10bn transistors. So, if you’re inclined to really unfair comparisons: 3 iPads => all Macs sold in the first year
  • atombender: This meme needs to die. Kubernetes is not overkill for non-Google workloads. In my current work, we run several Kubernetes clusters via GKE on Google Cloud Platform. We’re a tiny company — less than 20 nodes running web apps, microservices and search engines — but we’re benefiting hugely from the operational simplicity of Kubernetes. Much, much, much better than the old fleet of Puppet-managed VMs we used to run. Having surveyed the competition (Docker Swarm, Mesos/Marathon, Rancher, Nomad, LXD, etc.), I’m also confident that Kubernetes was the right choice. Kubernetes may be a large and complex project, but the problem it solves is also complex. Its higher-level cluster primitives are vastly better adapted to modern operations than the “simple” Unix model of daemons and SSH and what not. The attraction isn’t just the encapsulation that comes with containers, but the platform that virtualizes physical nodes and allows containers to be treated as ephemeral workloads, along with supporting primitives like persistent volumes, services, ingresses and secrets, and declarative rules like horizontal autoscalers and disruption budgets. Given this platform, you have a “serverless”-like magically scaling machine full of tools at your fingertips. You don’t need a huge workload to benefit from that.
  • cryptica: I’m starting to think that many of the most successful tech companies of the past decade are not real monopolies but succeeded purely because the centralization of capital made it difficult for alternative projects to compete for a limited period of time. Even projects with strong network effects are unlikely to last forever.
  • Code Lime: [Hitting the same database from several microservices] almost refutes the whole philosophy of microservice architecture. They should be independent and self-contained. They should own their data and have complete freedom on how it is persisted. They are abstractions that help de-couple processes. Obviously, they come with a fair amount of overhead for this flexibility. Yet, flexibility is what you should aim for.
  • gervase: When I was running hiring at a previous startup, we ran into this issue often. When I proposed adding FizzBuzz to our screening process, I got a fair amount of pushback from the team that it was a waste of the candidates’ time. Once we’d actually started using it, though, we found it filtered between 20-30% of our applicant pool, even when we let them use literally any language they desired, presumably their strongest.
  • @jensenharris: There’s no such thing as a “startup inside of a big company.” This misnomer actively misleads both big company employees working in such teams as well as people toiling in actual startups. Despite all best efforts to create megacorp “startups”, they can never exist. Here’s why: 1) The most fundamental, pervasive background thread of an early-stage startup is that when it fails, everyone has to find a new job. The company is gone, kaput, relegated to the dustbin of Crunchbase. The company literally lives & dies on the work every employee does every day.
  • @math_rachel: “A company can easily lose sight of its strategy and instead focus strictly on the metrics that are meant to represent it… Wells Fargo never actually had a cross-selling strategy. It had a cross-selling metric.”
  • @ben11kehoe: Don’t put your processed data in the same bucket as the raw ingested data—different lifecycle and backup requirements #sdmel19
  • Lauren Feiner: The proposed solutions focus on removing weaker players from the ecosystem and undermining the hate clusters from within. Johnson and his team suggest that, rather than attacking a highly vocal and powerful player, social media platforms remove smaller clusters and randomly remove individual members. Removing just 10% of members from a hate cluster would cause it to begin to collapse, the researchers say.
  • @mathiasverraes: Philosophy aside, the important questions are, does exposing persisted events have the same practical downsides (in the long term) as exposing state? If so, are there better mitigations? Are the downsides outweighing the upsides? I’m leaning to no, yes, no.
  • @jessitron: “building software isn’t at all like assembling a car. In terms of managing growth, it’s more like raising a child or tending a garden.” @KevinSimler
  • Kevin Simler: In a healthy piece of code, entropic decay is staved off by dozens of tiny interventions — bug fixes, test fixes, small refactors, migrating off a deprecated API
  • streetcat1: First, I must say that the inventors of UML saw it as the last layer. The grand vision was complete code generation from UML diagrams. And this was the overall grand vision that drove OO in general. I think that this is what is happening now with the “low code” startups. The whole idea is to separate the global decisions (which are hard to change) – e.g. architecture, what classes, what each class does – from the local ones (e.g. which data structure to use). So you would use UML for the global decisions, and then make programming the classes almost mechanical.
  • Yegor Bugayenko: Our empirical evidence suggests even expert programmers really learn to program within a given domain. When expert programmers switch domains, they do no better than a novice. Expertise in programming is domain-specific. We can teach students to represent problems in a form the computer could solve in a single domain, but to teach them how to solve in multiple domains is a big-time investment. Our evidence suggests students graduating with a four-year undergraduate degree don’t have that ability. Solving problems with a computer requires skills and knowledge different from solving them without a computer. That’s computational thinking. We will never make the computer completely disappear. The interface between humans and computers will always have a mismatch, and the human will likely have to adapt to the computer to cover that mismatch. But the gap is getting smaller all the time. In the end, maybe there’s not really that much to teach under this definition of computational thinking. Maybe we can just design away the need for computational thinking.
  • @aphyr: If you do this, it makes life so much easier. Strict serializability and linearizability become equivalent properties over histories of txns on maps. If you insist on making the individual r/w micro-ops linearizable, it *breaks* serializability, as we previously discussed.
  • Jennifer Riggins: Datadog itself conducts regular game days where it kills a certain service or dependency to learn what threatens resiliency. These game days are partnerships between the people building whatever’s being tested — as they know best and are initially on-call if it breaks — and a site reliability engineer. This allows the team to test monitoring and alerting, making sure that dashboards are in place and there are runbooks and docs to follow, making sure that the site reliability engineer is equipped to eventually take over.
  • dragonsh: Indeed Uber did try to enter Indonesia and failed; they are out of most of Southeast and East Asia because they don’t have enough engineering talent to build a system for those specific countries. Local companies like Grab in Singapore, Gojek in Indonesia, and Didi in China beat them. So why would you think those companies do not have the talent to build systems better suited to their own environment than Uber?
  • blackoil: We handle peaks of 800k tps in a few systems. It is for an analytical platform. Partition in Kafka by some evenly distributed key, create simple apps that read from a partition and process it, commit the offset. Avoid communication between processes/threads. Repartition using Kafka only. For some cases we had to implement sampling where the use case required highly skewed partitions. [A minimal sketch of this pattern appears after this list.]
  • chihuahua: I was working at Amazon when the 2-pizza team idea was introduced. A week or two later, we thought “we’re a 2-pizza team now, let’s order some pizza”. That’s when we found out that there was no budget for pizza; it was merely a theoretical concept. At the time the annual “morale budget” (for food and other items) was about $5 per person. These days I think the morale budget is a bit higher; in 2013 there were birthday cakes once a month.
  • kator: Another thing that often gets overlooked is the concept of “Single Threaded Owner”. I’m an STO on a topic, which means I write and communicate the known truth and our strategy and plans, I participate in discussions around that topic, I talk to customers about it, and I read industry news and leverage my own experience in that topic. Others know me as that STO and reach out to me with related topics; if something makes sense to me in my topic area then I try to address it, if not I connect the person with another STO I think would be interested in their idea or problem. Success at Amazon is deeply driven by networking. We have an internal tool called Phonetool which allows you to quickly navigate the company and find people who are close to the topic you have in mind. I keep thinking it’s like the six degrees of separation concept: if somebody doesn’t know the topic they know someone who is closer to the topic, and within a couple of emails you are in a conversation with someone on the other side of the company who is passionate, fired up, and knows more about the topic than you thought could be known. They’re excited to talk to you about their topic and teach you or learn from your new idea related to their area of focus.
  • Const-me: You know what is a waste of my time? When I wrote a good answer to a question which I think is fine, which then goes to oblivion because some other people, who often have absolutely no clue what’s asked, decide the question is not good enough.
  • Matt Parsons: Names can’t transmit meaning. They can transmit a pointer, though, which might point to some meaning. If that meaning isn’t the right meaning, then the recipient will misunderstand. Misunderstandings like this can be difficult to track down, because our brains don’t give us a type error with a line and column number to look at. Instead, we just feel confused, and we have to dig through our concept graph to figure out what’s missing or wrong.
  • Avast: The findings from the analysis of the obtained snapshot of the C&C server were quite surprising. All of the executable files on the server were infected with the Neshta file infector. The authors of Retadup accidentally infected themselves with another malware strain. This only proves a point that we have been trying to make – in good humor – for a long time: malware authors should use robust antivirus protection.
  • @greenbirdIT: According to 1 study, computational resources required to train large AI models is doubling every three to four months. 
  • @ACLU: Amazon wants to connect doorbell cameras to facial recognition databases, with the ability to call the police if any “suspicious” people are detected.
  • @esh: Kira just discovered the joy of increasing the AWS Lambda MemorySize from the default of 128 to 1792, resulting in the use of a full CPU and a much faster response time. Her Slack command now answers in 2.5 seconds instead of 35 seconds. And the cool thing is that it costs the same to run it faster.
  • @johncutlefish: “We know something is working when we spend longer on it, instead of shorter. Her team was delivering into production daily/weekly. They could have easily bragged about how “quickly” they “move things to done”. But she didn’t”
  • tilolebo: Isn’t it possible to just set the haproxy maxconn to a slightly lower value than what the backend can deal with, and then let the reverse proxy retry once with another backend? Or even queue it for some hundreds of milliseconds before that? This way you avoid overloading backends. Also, haproxy provides tons of real-time metrics, including the high-water marks for concurrent and queued connections.
  • ignoramous: Scheduled tasks are a great way to brown-out your downstream dependencies. In one instance, mdadm RAID checks caused P99 latency spikes the first Sunday of every month [0] (the default setting). It caused a lot of pain to our customers until the check was IO throttled, which meant spikes weren’t as high, but lasted for a longer time. Scheduled tasks are a great way to brown-out yourself.
  • @sfiscience: “The universe is not a menu. There’s no reason to think it’s full of planets just waiting for humans to turn up. For most of Earth’s history, it hasn’t been comfortable for humans.” – Olivia Judson
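
A minimal kafka-python sketch of the pattern blackoil describes above: partition by an evenly distributed key, run simple consumers that each read a partition, and commit the offset only after processing. The library choice (kafka-python), broker address, topic, and key are illustrative assumptions, not details from the quote.

from kafka import KafkaConsumer, KafkaProducer

def process(payload):
    pass  # placeholder for the application's own per-record logic

# Producer: keyed sends hash the key to a partition, so related events stay together.
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('events', key=b'user-42', value=b'{"action": "click"}')
producer.flush()

# Consumer: read records, process them, then commit the offset explicitly.
consumer = KafkaConsumer(
    'events',
    bootstrap_servers='localhost:9092',
    group_id='analytics',
    enable_auto_commit=False,
)
for record in consumer:
    process(record.value)
    consumer.commit()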

Useful Stuff:

  • Always on the lookout for examples from different stacks. Here’s a new power couple: Using Backblaze B2 and Cloudflare Workers for free image hosting. It looks pretty straightforward and even better “Everything I’ve mentioned in the post is 100% free, assuming you stay within reasonable limits.” Backblaze B2 includes 10 GB of storage for free and charges $0.005/GB/month beyond that. Cloudflare Workers also offers a free tier which includes 100,000 requests every 24 hours, with a maximum of 1,000 requests every 10 minutes. Also, Migrating 23TB from S3 to B2 in just 7 hours
  • Anything C can do Rust can do better. Well, not quite yet. Intel’s Josh Triplett on what it would take for Rust to be comparable to C. Rust is 4 years old. Rust needs full parity with C to support the long tail of systems software. Rust has automatic memory management without GC: calls to free are inserted by the compiler at compile time. Like C, Rust does not have a runtime. Unlike C, Rust has safe concurrent programming; the memory safety makes it easier to implement safe concurrency. Rust would have prevented 73% of security bugs in Mozilla. Rust needs better C interoperability. Rust needs to improve code size by not linking in unused code. It needs to support inline assembly. It needs safe SIMD intrinsics. It needs to support bfloat16 to minimize storage space and bandwidth for floating point calcs.
  • Apparently we now need 2FA all the way down. Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case: Criminals used artificial intelligence-based software to impersonate a chief executive’s voice and demand a fraudulent transfer of €220,000 ($243,000) in March in what cybercrime experts described as an unusual case of artificial intelligence being used in hacking.
  • What’s Mars Solar Conjunction, and Why Does It Matter? For 10 days we won’t talk to devices on Mars. Why? “because Mars and Earth will be on opposite sides of the Sun, a period known as Mars solar conjunction. The Sun expels hot, ionized gas from its corona, which extends far into space. During solar conjunction, this gas can interfere with radio signals when engineers try to communicate with spacecraft at Mars, corrupting commands and resulting in unexpected behavior from our deep space explorers. To be safe, engineers hold off on sending commands when Mars disappears far enough behind the Sun’s corona that there’s increased risk of radio interference.” This period when commands are not sent is called a “command moratorium.” Talk about a maintenance window! This is the kind of thing Delay-tolerant networking has to take into account. Machines will need enough native intelligence to survive without human guiding hands.
  • Ready for a new aaS? iPaaS is integration platform as a service: iPaaS lets you connect anything to anything, in any way, and anywhere. iPaaS works very well in huge enterprise environments that need to integrate a lot of on-premises applications and cloud-based applications or data providers.
  • Living life on the edge. Macquarie Bank replaced 60 EC2 instances with code running at Lambda@Edge for lower latency and an 80% cost savings. At the edge, before a response goes back to the client, they inject a few headers: HSTS to require encryption, and X-Frame-Options to prevent pages from being loaded in an iframe and to protect against cross-site scripting attacks. They also validate the JWT token and redirect to a login page if it’s invalid. WAF and Shield are also used for protection. A minimal sketch of the header-injection idea appears after this list.
  • I had no idea you could and should prune Lambda versions. The Dark Side of AWS Lambda: Lambda versions every function. When you couple CI/CD with rapid development and Lambda functions, you get many versions. Hundreds even. And Lambda code storage is limited to 75GB. We hit that limit, and we hit it hard. AWS does allow you to delete specified versions of functions that are no longer in use. A minimal pruning sketch appears after this list.
  • Was Etsy too good to be true? Platforms follow a life cycle: 
    • Most platform users can’t make a living: Though he once dreamed of Etsy sellers making their livings selling things they made themselves, he knows now that was never really what happened for the vast majority. Even when he was CEO and things were small and maybe idyllic, only a fraction of a percentage of sellers were making more than $30,000 a year. 
    • The original platform niche is abandoned as the platform searches for more revenue by broadening its audience: “It’s just a place to sell now,” Topolski says, delineating her personal relationship with the platform that built her business and helped her find the community that makes up much of her world.
    • Platform costs are shifted to users: “I get it, Etsy as a whole needs to be competitive in a marketplace that’s completely shifted toward being convenient,” she tells me. “But it’s a financial issue for people like me whose products are extremely expensive to ship. All of a sudden my items are $10 to $15 more expensive, but I didn’t add any value to justify that pricing.” 
    • User margins become a source of platform profits: before Silverman took over, an Etsy executive told Forbes that more than 50 percent of Etsy’s revenue comes from seller services, like its proprietary payment processing system, which takes a fee of 3 percent, plus 25 cents per US transaction (the company made it the mandatory default option in May, removing the option for sellers to use individual PayPal accounts). New advertising options and customer support features in Etsy Plus — available to sellers willing to pay a $10 monthly fee — expand on that.
    • An edifice complex often signals the end: One moment that sticks out in her mind: a tour of Etsy’s new nine-story, 200,000-square-foot offices in Brooklyn’s Dumbo neighborhood, which opened in the spring of 2016. “I remember immediately getting this sinking feeling that none of it was for us,” she says. It didn’t seem like the type of place she could show up for a casual lunch. It was nice that the building was environmentally-friendly, that it was big and beautiful. It was weird that there was so much more security and less crafting, replaced by the sleek lines of a grown-up startup.
    • Once valuable platform users become just another metric/kpi: “We’re the heart of the company, creating literally all content and revenue,” she says, “and suddenly we weren’t particularly welcome anymore.”
  • koreth: I have been working on an ES/CQRS system for about 4 years and enjoy it…it’s a payment-processing service. 
    • What have the costs been? The message-based model is kind of viral. Ramping up new engineers takes longer, because many of them also have never seen this kind of system before. Debugging gets harder because you can no longer look at a simple stack trace.  I’ve had to explain why I’m spending time hacking on the framework rather than working on the business logic.
    • What have the benefits been? The fact that the inputs and outputs are constrained makes it phenomenally easier to write meaningful, non-brittle black-box unit tests. Having the ability to replay the event log makes it easy to construct new views for efficient querying. Debugging gets easier because you have an audit trail of events and you can often suck the relevant events into a development environment and replay them. Almost nothing had to change in the application code when we went from a single-node-with-hot-standby configuration to a multiple-active-nodes configuration. The audit trail is the source of truth, not a tacked-on extra thing that might be wrong or incomplete. 
  • How long before SSDs replace HDDs? DSHR says a lot longer than you might think. The purchase cost of an HDD is much more than 20% of the power and cooling costs over its service life. So speed isn’t as important as low $/TB. Speed in nearline is nice, but it isn’t what the nearline tier is for. At 5x the cost, SSDs won’t justify wholesale replacement of the nearline tier. The recent drop in SSD prices reflects the transition to 3D flash; the transition to 4D flash is far from imminent, so this is a one-time effect.
  • As soon as you have the concept of a transaction — a group of read and write operations — you need to have rules for what happens during the timeline between the first of the operations of the group and the last of the operations of the group. An explanation of the difference between Isolation levels vs. Consistency levels: Database isolation refers to the ability of a database to allow a transaction to execute as if there are no other concurrently running transactions (even though in reality there can be a large number of concurrently running transactions). The overarching goal is to prevent reads and writes of temporary, incomplete, aborted, or otherwise incorrect data written by concurrent transactions. Database consistency is defined in different ways depending on the context, but when a modern system offers multiple consistency levels, they define consistency in terms of the client view of the database. If two clients can see different states at the same point in time, we say that their view of the database is inconsistent (or, more euphemistically, operating at a “reduced consistency level”). Even if they see the same state, but that state does not reflect writes that are known to have committed previously, their view is inconsistent with the known state of the database. 
  • How We Manage a Million Push Notifications an Hour. Key idea: Each time we found a point which needed to handle multiple implementations of the same core logic, we put it behind a dedicated service: Multiple devices for a user was put behind the token service. Multiple applications were given a common interface on notification server. Multiple providers were handled by individual job queues and notification workers. Also, Rust at OneSignal
  • Having attended more than a few Hadoop meetups, this was like reading that a young friend was moving into a retirement home. What happened to Hadoop.
    • Something happened within the big data world to erode Hadoop’s foundation of a distributed file system (HDFS) coupled with a compute engine for running MapReduce (the original Hadoop programming model) jobs: 
      • Mobile phones became smartphones and began generating streams of real-time data.
      • Companies were reminded that they had already invested untold billions in relational database and data warehouse technologies
      • Competitive or, at least, alternative projects such as Apache Spark began to spring up from companies, universities, and web companies trying to push Hadoop, and the whole idea of big data, beyond its early limitations.
      • Venture capital flowed into big data startups. 
      • Open source, now very much in the mainstream of enterprise IT, was getting better
      • Cloud computing took over the world, making it easier not just to virtually provision servers, but also to store data cheaply and to use managed services that tackle specific use cases
      • Docker and Kubernetes were born. Together, they opened people’s eyes to a new way of packaging and managing applications and infrastructure
      • Microservices became the de facto architecture for modern applications
    • What are the new trends?
      • Streaming data and event-driven architectures are rising in popularity. 
      • Apache Kafka is becoming the nervous system for more data architectures.
      • Cloud computing dominates infrastructure, storage, and data-analysis and AI services.
      • Relational databases — including data warehouses — are not going anywhere.
      • Kubernetes is becoming the default orchestration layer for everything.
  • 6 Lessons we learned when debugging a scaling problem on GitLab.com: But the biggest lesson is that when large numbers of people schedule jobs at round numbers on the clock, it leads to really interesting scaling problems for centralized service providers like GitLab. If you’re one of them, you might like to consider putting in a random sleep of maybe 30 seconds at the start, or picking a random time during the hour and sleeping until then, just to be polite and fight the tyranny of the clock. A minimal jitter sketch appears after this list.
  • Federated GraphQL Server at Scale: Zillow Rental Manager Real-time Chat Application: We share how we try to achieve developer productivity and synergy between different teams by having a federated GraphQL server…we decided to go with a full-fledged GraphQL, Node, React-Typescript application which would form the frontend part of the satellite…Both Rental Manager and Renter Hub talk to the Satellite GraphQL server (express-graphql server) which maps requests to the appropriate endpoint in the Satellite API after passing through the authentication service for each module…We implemented a layered approach where each module houses multiple features and each feature has its own schema, resolvers, tests, and services. This strategy allows us to isolate each feature into its own folder and then stitch everything together at the root of our server. Each feature has its own schema and is written in a file with a .graphql extension so that we can leverage all the developer tooling around GraphQL.
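
Following up on the Macquarie Bank item above: here is a minimal sketch, assuming a Python Lambda@Edge handler attached to CloudFront's origin-response event, of injecting security headers before the response goes back to the client. The header values are illustrative defaults, not Macquarie's actual configuration, and the JWT validation step is omitted.

def handler(event, context):
    # CloudFront hands the origin's response to the function in the event record.
    response = event['Records'][0]['cf']['response']
    headers = response['headers']

    # HSTS: require HTTPS on all subsequent requests.
    headers['strict-transport-security'] = [{
        'key': 'Strict-Transport-Security',
        'value': 'max-age=63072000; includeSubDomains; preload'
    }]
    # Refuse to be framed, to mitigate clickjacking.
    headers['x-frame-options'] = [{'key': 'X-Frame-Options', 'value': 'DENY'}]
    # Legacy browser XSS filter header, in the same defense-in-depth spirit.
    headers['x-xss-protection'] = [{'key': 'X-XSS-Protection', 'value': '1; mode=block'}]

    return response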
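
And for the Lambda version-pruning item: a minimal boto3 sketch that deletes old numbered versions of one function while keeping anything an alias still points to, plus the newest few for rollback. The function name and the keep-five policy are assumptions for illustration, not part of the article.

import boto3

lam = boto3.client('lambda')
FUNCTION = 'my-function'  # hypothetical function name

# Versions referenced by an alias must be kept.
in_use = {a['FunctionVersion'] for a in lam.list_aliases(FunctionName=FUNCTION)['Aliases']}

# Collect all numbered versions (the listing is paginated).
versions, marker = [], None
while True:
    kwargs = {'FunctionName': FUNCTION}
    if marker:
        kwargs['Marker'] = marker
    page = lam.list_versions_by_function(**kwargs)
    versions += [v['Version'] for v in page['Versions'] if v['Version'] != '$LATEST']
    marker = page.get('NextMarker')
    if not marker:
        break

# Keep the newest five versions for rollback; delete the rest if no alias uses them.
for version in sorted(versions, key=int)[:-5]:
    if version not in in_use:
        lam.delete_function(FunctionName=FUNCTION, Qualifier=version)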
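
Finally, the GitLab lesson about the tyranny of the clock reduces to a few lines: add random jitter before a scheduled job starts so thousands of clients don't all hit a provider at exactly :00. The 30-second bound comes from the article; the job body is a placeholder.

import random
import time

def do_work():
    pass  # placeholder for the real scheduled job

def run_scheduled_job():
    time.sleep(random.uniform(0, 30))  # spread load away from the top of the hour/minute
    do_work()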

Soft Stuff:

  • cloudstateio/cloudstate: a standards effort defining a specification, protocol, and reference implementation, aiming to extend the promise of Serverless and its Developer Experience to general-purpose application development. CloudState builds on and extends the traditional stateless FaaS model by adding support for long-lived addressable stateful services and a way of accessing mapped well-formed data via gRPC, while allowing for a range of different consistency models—from strong to eventual consistency—based on the nature of the data and how it should be processed, managed, and stored.
  • TULIPP (article): makes it possible to develop energy-efficient embedded image processing systems more quickly and less expensively, with a drastic reduction in time-to-market. The results are impressive: processing that originally took several seconds to analyze a single image on a high-end PC can now run on the drone in real time, i.e. approximately 30 images are analyzed per second. The speed of the pedestrian detection algorithm was increased by a factor of 100: the system can now analyze 14 images per second compared to one image every seven seconds. Enhancement of X-ray image quality by applying noise-removing image filters allowed reducing the intensity of radiation during surgical operations to one fourth of the previous level. At the same time, energy consumption could be significantly reduced for all three applications.

Pub Stuff:

  • A link layer protocol for quantum networks: Here, we take the first step from a physics experiment to a quantum internet system. We propose a functional allocation of a quantum network stack, and construct the first physical and link layer protocols that turn ad-hoc physics experiments producing heralded entanglement between quantum processors into a well-defined and robust service. This lays the groundwork for designing and implementing scalable control and application protocols in platform-independent software.
  • How Chemistry Computes: Language Recognition by Non-Biochemical Chemical Automata. From Finite Automata to Turing Machines:  Our Turing machine uses the Belousov-Zhabotinsky chemical reaction and checks the same symbol in an Avogadro′s number of processors. Our findings have implications for chemical and general computing, artificial intelligence, bioengineering, the study of the origin and presence of life on other planets, and for artificial biology.
  • Choosing a cloud DBMS: architectures and tradeoffs: My key takeaways as a TL;DR: Store your data in S3; Use portable data format that gives you future flexibility to process it with multiple different systems (e.g. ORC or Parquet); Use Athena for workloads it can support (Athena could not run 4 of the 22 TPC-H queries, and Spectrum could not run 2 of them), especially if you are doing less frequent ad-hoc queries.
  • The Art Of PostgreSQL: is the new edition of my previous release, Mastering PostgreSQL in Application Development. It contains mostly fixes to the old content, a new title, and a new book design (PDF and paperback). Content wise, The Art of PostgreSQL also comes with a new whole chapter about PostgreSQL Extensions.
  • TeaVaR: Striking the Right Utilization-Availability Balance in WAN Traffic Engineering: We advocate a novel approach to this challenge that draws inspiration from financial risk theory: leverage empirical data to generate a probabilistic model of network failures and maximize bandwidth allocation to network users subject to an operator-specified availability target. Our approach enables network operators to strike the utilization-availability balance that best suits their goals and operational reality. We present TeaVaR (Traffic Engineering Applying Value at Risk), a system that realizes this risk management approach to traffic engineering (TE).

from High Scalability

Top Redis Use Cases by Core Data Structure Types


Redis, short for Remote Dictionary Server, is a BSD-licensed, open-source, in-memory key-value data structure store written in C by Salvatore Sanfilippo and first released on May 10, 2009. Depending on how it is configured, Redis can act like a database, a cache or a message broker. It’s important to note that Redis is a NoSQL database system. This implies that unlike SQL (Structured Query Language) driven database systems like MySQL, PostgreSQL, and Oracle, Redis does not store data in well-defined database schemas which constitute tables, rows, and columns. Instead, Redis stores data in data structures, which makes it very flexible to use. In this blog, we outline the top Redis use cases by the different core data structure types.

Data Structures in Redis

Let’s have a look at some of the data types that Redis supports. In Redis, we have strings, lists, sets, sorted sets, and hashes, which we are going to cover in this article. Additionally, we have other data types such as bitmaps, hyperloglogs, geospatial indexes with radius queries, and streams. While there are some Redis GUI tools written by the Redis community, the command line is by far the most important client, unlike popular SQL databases whose users often prefer GUI management systems, for instance, phpMyAdmin for MySQL and pgAdmin for PostgreSQL.

Let us take a closer look at the data types that exist in Redis.

Redis Strings

Redis Strings are the most basic type of Redis value, leveraged by all other data structure types, and are quite similar to strings in other programming languages such as Java or Python. Strings, which can contain any data type, are considered binary safe and have a maximum length of 512MB. Here are a couple of useful commands for Redis strings:

To store a string ‘john’ under a key such as ‘student’ in Redis, run the command:

SET “student” “john”

To retrieve the string, use the GET command as shown:

GET “student”

To delete the string contained in the key use the DEL command:

DEL “student”

Redis Strings Use Cases

  1. Session Cache: Many websites leverage Redis Strings to create a session cache to speed up their website experience by caching HTML fragments or pages. Since data is stored temporarily in the RAM, this attribute makes Redis a perfect choice as a session cache. It is able to temporarily store user-specific data, for instance, items stored in a shopping cart in an online store, which is crucial in that your users do not lose their data in the event they log out or lose connection.
  2. Queues: Any application that deals with traffic congestion, messaging, data gathering, job management, or packet routing should consider a Redis Queue, as this can help you manage your queue size by rate of arrival and departure for resource distribution.
  3. Usage & Metered Billing: A lesser known use case for Redis Strings is real-time metering for consumption-based pricing models. This allows SaaS platforms that bill based on actual usage to meter their customers’ activity, such as in the telecommunications industry where they may charge for text messages or minutes.
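
To make the session-cache and metered-billing cases concrete, here is a minimal redis-py sketch; the key names, the 30-minute TTL, and the usage counter are illustrative assumptions rather than part of the article.

import redis

r = redis.Redis(host='localhost', port=6379)

# Session cache: keep the serialized cart for 30 minutes, then let it expire.
r.setex('session:user:1001', 1800, '{"cart": ["sku-1", "sku-2"]}')
cart = r.get('session:user:1001')

# Metered billing: count billable units per customer per day with an atomic counter.
r.incrby('usage:sms:cust:42:2019-09-06', 3)
sent_today = int(r.get('usage:sms:cust:42:2019-09-06'))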

Redis Lists

Lists contain strings that are sorted by their insertion order. With Redis Lists, you can add items to the head or tail of a list, which is very useful for queueing jobs. If there are more urgent jobs you need executed, these can be pushed in front of other lower priority jobs in the queue. We use the LPUSH command to insert an element at the head (left) of the list, and the RPUSH command to insert at the tail (right). Let’s look at an example:

LPUSH list x   # now the list is "x"

LPUSH list y   # now the list is "y","x"

RPUSH list z   # now the list is "y","x","z" (notice how the ‘z’ element was added to the end of the list by RPUSH command)

Redis List Use Cases

  1. Social Networking Sites: Social platforms like Twitter use Redis Lists to populate their timelines or homepage feeds, and can customize the top of their feeds with trending tweets or stories.
  2. RSS Feeds: Create news feeds from custom sources where you can pull the latest updates and allow interested followers to subscribe to your RSS feed.
  3. Leaderboards: Forums like Reddit and other voting platforms leverage Redis Lists to add articles to the leaderboard and sort by most voted entries.

Learn how to build your own Twitter feed in our Caching tweets using Node.js, Redis and Socket.io blog post.
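
Here is a minimal redis-py sketch of the queueing pattern described above: normal jobs are appended with RPUSH, urgent jobs jump the line with LPUSH, and a worker blocks on the head of the list. The queue and job names are illustrative.

import redis

r = redis.Redis()

r.rpush('jobs', 'resize-image:123')  # normal jobs go to the tail
r.rpush('jobs', 'resize-image:124')
r.lpush('jobs', 'charge-card:999')   # urgent job pushed to the head

# Worker: block until a job is available, then take it from the head.
_, job = r.blpop('jobs')
print(job)  # b'charge-card:999' is served first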

Redis Sets

Redis Sets are powerful data types that support powerful operations like intersections and unions. They are not in any order and are usually used when you want to perform an audit and see relationships between various variables. Sets are reasonably fast, and regardless of the number of elements you have stored, it will take the same time to add or remove items in a set. Furthermore, sets do not allow duplicate members, so a member added multiple times to a set is simply stored once. This is handled by the SADD command, which avoids duplication of multiple similar entries. SADD can be employed when checking unique values, and can also be used for scheduling jobs running in the background, including cron jobs, which are automated scripts.

Sets are particularly helpful for analyzing real-time customer behavior on your online shopping site. For instance, if you’re running an online clothing store, Redis Sets support relationship-matching operations such as unions, intersections, and differences (commonly applied in Venn diagrams) to give an accurate picture of customer behavior. You can retrieve data on shopping patterns between genders, which clothing products sell the most, and which hours record the highest sales.

Redis Sets Use Cases

  1. Analyzing Ecommerce Sales: Many online stores use Redis Sets to analyze customer behavior, such as searches or purchases for a specific product category or subcategory. For example, an online bookstore owner can find out how many customers purchased medical books in Psychology.
  2. IP Address Tracking: Redis Sets are a great tool for developers who want to analyze all of the IP addresses that visited a specific website page or blog post, and to be able to ignore all of the duplicates for unique visitors with their SADD function.
  3. Inappropriate Content Filtering: For any app that collects user input, it’s a good idea to implement content filtering for inappropriate words. You can do this with Redis Sets by adding the words you’d like to filter to a set key with the SADD command.
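
A minimal redis-py sketch of the IP-address-tracking case above: SADD silently ignores duplicates, so the set holds only unique visitors. The key names and addresses are made up for illustration.

import redis

r = redis.Redis()

for ip in ['203.0.113.7', '198.51.100.4', '203.0.113.7']:  # note the duplicate
    r.sadd('visitors:post:42', ip)

unique_visitors = r.scard('visitors:post:42')                 # 2, not 3
seen_before = r.sismember('visitors:post:42', '203.0.113.7')  # True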

Sorted Sets

As the name suggests, Redis Sorted Sets are a collection of strings that assign an order to your elements, and are one of the most advanced data structures in Redis. These are similar to Redis Sets, only that Sets have no order while Sorted Sets associate every member with a score. Sorted Sets are known for being very fast, as you can return ordered lists and access elements in the shortest time possible.

Redis Sorted Sets Use Cases

  1. Q&A Platforms: Many Q&A platforms like Stack Overflow and Quora use Redis Sorted Sets to rank the highest voted answers for each proposed question to ensure the best quality content is listed at the top of the page.
  2. Gaming App Scoreboards: Online gaming apps leverage Redis Sorted Sets to maintain their high score lists, as scores can be repeated, but the strings which contain the unique user details cannot.
  3. Task Scheduling Service: Redis Sorted Sets are a great tool for a task scheduling service, as you can associate a score to rank the priority of a task in your queue. For any task that does not have a score, the WEIGHTS option defaults to 1.
  4. Geo Hashing: The Redis geo indexing API uses a Sorted Set for the Geo Hash technique, which allows you to index locations based on latitude and longitude, turning multi-dimensional data into linear data.
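
A minimal redis-py sketch of the scoreboard case above: ZADD records member/score pairs and ZREVRANGE returns the top entries already ordered. Player names and scores are illustrative, and the mapping form of zadd assumes redis-py 3.x.

import redis

r = redis.Redis()

r.zadd('leaderboard', {'alice': 3200, 'bob': 2900, 'carol': 4100})
top_three = r.zrevrange('leaderboard', 0, 2, withscores=True)
# [(b'carol', 4100.0), (b'alice', 3200.0), (b'bob', 2900.0)]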

Redis Hashes

Redis Hashes are maps between string fields and string values. This is the go-to data type if you need to essentially create a container of unique fields and their values to represent objects. Hashes allow you to store a large number of fields, up to 2^32 – 1 field-value pairs (more than 4 billion), while taking up very little space. You should use Redis Hashes whenever possible, as you can use a small Redis instance to store millions of objects. You can use basic hash command operations, such as get, set, and exists, in addition to many advanced operations.

Redis Hashes Use Cases

  1. User Profiles: Many web applications use Redis Hashes for their user profiles, as they can use a single hash for all the user fields, such as name, surname, email, password, etc.
  2. User Posts: Social platforms like Instagram leverage Redis Hashes to map all the archived user photos or posts back to a single user. The hashing mechanism allows them to look up and return values very quickly, fit the data in memory, and leverage data persistence in the event one of their servers dies.
  3. Storing Multi-Tenant Metrics: Multi-tenant applications can leverage Redis hashes to record and store their product and sales metrics in a way that guarantees solid separation between each tenant, as hashes can be encoded efficiently in a very small memory space.
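
A minimal redis-py sketch of the user-profile case above: one hash per user, one field per attribute, fetched in a single round trip with HGETALL. Field names and values are illustrative.

import redis

r = redis.Redis()

r.hset('user:1001', 'name', 'Ada')
r.hset('user:1001', 'email', 'ada@example.com')
r.hset('user:1001', 'plan', 'pro')

profile = r.hgetall('user:1001')
# {b'name': b'Ada', b'email': b'ada@example.com', b'plan': b'pro'}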

Who uses Redis?

Redis has found a huge market share across the travel and hospitality, community forums, social media, SaaS, and ecommerce industries to name just a few. Some of the leading companies who use Redis include Pinterest, Uber, Slack, Airbnb, Twitter, and Stack Overflow. Here are some stats on Redis popularity today:

  • 4,107 companies reported using Redis on StackShare
  • 8,759 developers stated using Redis on StackShare
  • 38,094 GitHub users have starred Redis
  • #8 ranked database on DB-Engines with a score of 144.08

from High Scalability

Stuff The Internet Says On Scalability For August 23rd, 2019


Wake up! It’s HighScalability time:

Absurd no more. This Far Side cartoon is now reality.

Do you like this sort of Stuff? I’d love your support on Patreon. I wrote Explain the Cloud Like I’m 10 for people who need to understand the cloud. And who doesn’t these days? On Amazon it has 54 mostly 5 star reviews (125 on Goodreads). They’ll learn a lot and likely add you to their will.

Number Stuff:

  • 7.11 trillion: calls to the DynamoDB API, peaking at 45.4 million requests per second, during 48 hours of Prime Day. Amazon Aurora also supports the network of Amazon fulfillment centers. On Prime Day, 1,900 database instances processed 148 billion transactions, stored 609 terabytes of data, and transferred 306 terabytes of data. The EBS team added an additional 63 petabytes of storage ahead of Prime Day; the resulting fleet handled 2.1 trillion requests per day and transferred 185 petabytes of data per day.
  • 768 million: US vacation days go wasted. Do something fun! Your creativity depends on it.
  • $1.2 billion: data labeling industry for AI by 2023, up from $500 million last year. 
  • 23: local Texas government agencies struck by ransomware. Attacks against businesses and governments are up by 365%.
  • 1.2 trillion: transistor deep learning processor. 56 times larger than the largest GPU today. 
  • 97%: of code in a modern web app comes from npm.
  • 70%: of all Java apps are bottlenecked on memory churn. 
  • 25 years: time it has taken ecommerce to reach 10% of retail sales in the U.S.
  • 8 million: universes simulated by two-thousand processors crunching data simultaneously over three weeks.
  • 5.3 million: stolen credit card numbers go on sale.
  • 50%: per year growth in bandwidth needs for hyperscalers and cloud builders.

Quotable Stuff:

  • Cardinal Richelieu: Give me six lines written by an honest man, and I will find something in it with which to hang him.
  • @bassamtabbara: Multicloud is knowing that I can change carriers without having to change phone numbers
  • Benjamin Woodruff: Instagram Server is entirely Python powered. Well, mostly. There’s also some Cython, and our dependencies include a fair amount of C++ code exposed to Python as C extensions. Our server app is a monolith, one big codebase of several million lines and a few thousand Django endpoints [1], all loaded up and served together.
  • Stephen Shankland: The fine-tuned AI chips – which have 6 billion transistors apiece – are “smart” enough to power Tesla’s full self-driving abilities in the future, according to the company. Their performance has improved by a factor of 21, compared to the earlier Nvidia chips. Ganesh Venkataramanan, one of the chip designers and a former AMD processor engineer, said that in order to meet “performance levels at the power constraints and the form factor constraints we had, we had to design something of our own.” The chips, optimized for self-driving cars, run at 2GHz and perform 36 trillion operations a second.
  • @trevorbrindlejs: Hiring is Broken: (research paper) Candidates are concerned about: – Relevance of questions – anxiety during interview – frustration/humiliation (affect) – lack of typical dev env (affordances) – the time required to practice – being disqualified on an unfair criteria
  • @jzawodn: True for me today: “Debugging is like being the detective in a crime movie where you are also the murderer.”
  • Dan Goodin: A rash of supply chain attacks hitting open source software over the past year shows few signs of abating, following the discovery this week of two separate backdoors slipped into a dozen libraries downloaded by hundreds of thousands of server administrators.
  • kruzes: Sorry, but at this point, I need to assume Javascript/Node culture is beyond saving. Just today I was trialing Firebase Cloud Functions. The hello world of it requires a package “firebase-functions”, by Google themselves. It has 73 dependencies, by 83 maintainers. There’s no being careful with that, I have to assume Google was careful on my behalf, right?
  • Bechtolsheim: People always talk about this as if there is some magic to it, but there really is just substitution from a cost/performance and technology standpoint from the previous generation. And the speed of adoption is largely driven by the relative price/performance. In the dark ages between 2000 and 2010, there was a 10 Gb/sec standard in 2000, but the equipment was so expensive that very few people could justify deploying it. It took almost ten years for the cost to come down before there was some adoption. In the cloud, this pace doesn’t work at all because they will never adopt a technology unless it is cheaper on Day One.
  • @copyconstruct: How to make FaaS fully functional. We now have millions of cores and petabytes of RAM. We need a programming model that’ll allow us to unlock the full potential of the cloud and serverless.
  • Albert Kozłowski: However, after a couple of years and despite how much things have changed in terms of technology, I believe that code ownership and feature teams had the biggest impact on how software is developed within organizations that adopted microservices. In my opinion, having smaller teams with clear ownership brings a lot of joy to the day-to-day development work and gives developers the kind of freedom that sparks creativity.
  • @PDChina: For the first time in China, #AI assistive technology was used in a trial at Shanghai No 2 Intermediate People’s Court on Wed, the Legal Daily reported. When the judge, public prosecutor or defender asked the AI system, it displayed all related evidence on a courtroom screen.
  • dwl-sdca: The IT person doesn’t make the funding decision. A relative of mine works for a small county agency. The IT department wanted to buy two external drives to support staggered off-site backups. The total cost was less than US$1000. The request was refused. They countered with a request for one backup drive. That too was refused.
  • David Auerbach: Two consequences of this massive increase in data processing are a drive toward ubiquity of the models used, and an increasing human opacity to these models, whether or not such opacity is intended or inevitable. If our lives are going to be encoded (in the non-programming sense) by computers, computer science should assume reductionism, ubiquity, and opacity as intrinsic properties (and problems) of the models its methods generate.
  • Ed Sperling: Putting all of this in perspective, all of the major chipmakers are tackling similar problems in their target markets. They are improving performance per watt through a combination of general-purpose processors and custom accelerators, and in many cases they are making it possible to replace modules more easily and quickly from one market to the next, and as algorithms are updated. They also are improving throughput of data on-chip, off-chip to memory, and prioritizing the movement of different kinds of data.
  • Jason Torchinsky: These pumps are expensive pieces of equipment, costing many thousands of dollars each. This isn’t some toy; the Encore line of pumps are serious machines, designed for serious business. That’s why this is all so baffling: why is that image so, you know, shitty?
  • @benedictevans: China’s ability to fire the CEO of Cathay Pacific shows how irrelevant it is to ask who technically owns Huawei. What matters is whether the state has effective control, not legal control, and we know the answer to that for any company in or even near China.
  • Zak Jason: “I can’t tell you the number of times women have filled our questionnaires with no details except ‘We want an Instagram-perfect spot for brunch.”
  • Dr. Ian Cutress: One of the key critical future elements about this world of compute is moving data about. Moving data requires power, to the point where calling data from memory can consume more power than actually doing ‘compute’ work on it. This is why we have caches, but even these require extensive management built in to the CPU. For simple operations, like bit-shifts or AND operations, the goal is to move the ability to do that compute onto the main DRAM itself, so it doesn’t have to shuttle back and forth. Citing performance examples, UPMEM has stated that they have seen speedups of 22x—25x on Genomics pattern matching, an 18x speed up in throughput for database index searching at 1/100th the latency, and a 14x TCO gain for index search applications.
  • More_front_IPC says: So it appears that with higher core clocks no longer being as readily available with process node shrinks as they were in the past, there is now the impetus for AMD and Intel to go wider order superscalar and move more in the IBM Power 8/9/10 direction, with some very wide order superscalar offerings, in order to get IPC higher now that the clock-speed low-hanging fruit is no longer there as an easy way to get more performance.
  • Lorin Hochstein: To get better at avoiding or mitigating future incidents, you need to understand the conditions that enabled past incidents to occur. Counterfactual reasoning is actively harmful for this, because it circumvents inquiry into those conditions. It replaces “what were the circumstances that led to person X taking action Y” with “person X should have done Z instead of Y”.
  • @GossiTheDog: I’m beginning to think that rather than dumping firmware for IoT ovens and toasters and hyping insignificant bugs on blogs, the security industry should be dumping and examining the firmware of security industry products – as this shit looks backdoored to hell.
  • Cade Metz: One day, who knows when, artificial intelligence could hollow out the job market. But for now, it is generating relatively low-paying jobs. The market for data labeling passed $500 million in 2018 and it will reach $1.2 billion by 2023, according to the research firm Cognilytica. This kind of work, the study showed, accounted for 80 percent of the time spent building A.I. technology.
  • TSMC: First, let’s discuss the elephant in the room. Some people believe that Moore’s Law is dead because they believe it is no longer possible to continue to shrink the transistor any further. Just to give you an idea of the scale of the modern transistor, the typical gate is about 20 nanometers long. A water molecule is only 2.75 Angstrom or 0.275 nanometer in diameter! You can now start counting the number of atoms in a transistor. At this scale, many factors limit the fabrication of the transistor. The primary challenge is the control of materials at the atomic level. How do you place individual atoms to create a transistor? How do you do this for billions of transistors found on a modern chip? How do you build these chips that have billions of transistors in a cost effective manner?
  • ChuckMcM: It wasn’t until it started obviously failing to be true that semiconductor companies started arguing in favor of an interpretation they could meet, rather than admit that Moore’s law was dead, as pretty much any engineer actually building systems would tell you. Somewhere there must be a good play on the Monty Python Parrot sketch where Moore’s law stands in for the parrot and a semiconductor marketing manager stands in for the hapless pet shop owner. It is really hard to make smaller and smaller transistors. And the laws of physics interfere. Further, it’s really hard to get the heat out of a chip when you boost the frequency. Dennard and others have characterized those limits more precisely, and as we hit those limits, progress along that path slows to a crawl or stops. Amdahl pretty famously characterized the limits of parallelism; we are getting closer to that one too, even for things that are trivially parallelized like graphics or neural nets.
  • Charlie Demerjian: As you can see from the chart above, a 32C Epyc, probably the 7452, beats the best of Intel’s Cascade line by a substantial margin. The Intel 8280L (Note that we will use the L/M variants from here on out because the crippled -nothing parts are not comparable to AMD’s Epyc line in our eyes) is a $17,906 part where the 7452 costs $3400. And has more features like 8-channel DDR at higher speeds, PCIe4, more than 2x the PCIe lanes, etc etc. On the down side the 7452 consumes 20W more to do so. Depending on your TCO calculations, usually $1-3/W/year, this could add almost $300 to AMD’s tab, reducing the price differential to a mere $14,506 per socket.
  • joezydeco: When working with credit cards and chip/PIN systems the entry of the PIN needs to be secure. This usually means the scanning lines from the keypad go directly to a security-hardened subprocessor inside the pump – the same one reading the PAN from the EMV chip or magstripe. Then the PIN/PAN block is encrypted and sent off to the application processor and/or bank to complete the transaction. If PIN entry was offloaded to the application processor, that processor would need to be audited to make sure of certain requirements (PIN isn’t sniffable, it isn’t held in RAM after deallocation, encryption isn’t breakable, etc).

Useful Stuff:

  • Do you have psychic scars from the endless Extreme Programming, Agile, and Lean wars? Would you like an approach to building and delivering software that isn’t just another version of identity politics? Then you’ll probably appreciate this interview with Ryan Singer, head of Product Strategy at Basecamp. Shaping, betting, and building. It all sounds so reasonable you could probably make some money from it on ASMR YouTube. And there’s a free book too! Just go to basecamp.com/shapeup. If you are looking for a methodology you’ve probably already done worse—many times.
  • Eric Brewer on Why Envoy and Istio are the Future of Networking. Most people, when they think of a service, think about the API. But that’s only half of a service; operations is the other half. When deploying a service you think about policies: DDoS, who can call it, quotas, authentication, security, etc. You’re not thinking about what the service does; you’re not thinking about the API. This means you can decouple developers from operations by moving all the operational concerns to operations. Ideally the two (developers and operations) can coexist without much interaction. But this is not how it works historically. Historically, developers encode access control checks, quota checks, etc. in source code. The problem is that this means you have to negotiate what goes inside every service, which adds coordination overhead. So, put all those things in the service infrastructure. This is where Istio and Envoy come in: Envoy is the proxy that sits next to each service, and Istio is the control plane that configures and manages those proxies. Operational checks are pushed into the proxy. Developers just write the meat of the service. Decoupling lets both sides go faster. Also, Service Mesh Day Recap
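To make the decoupling concrete, here is a deliberately in-process Python sketch of the idea: the business handler knows nothing about access control or quotas, and the "operational" checks are layered on from the outside, which is roughly what a sidecar proxy like Envoy does out of process under Istio's control. The caller names, quota numbers, and helper functions are invented for illustration; this is not Istio's API.

```python
# Business logic: no auth, quota, or policy code in here.
def get_recommendations(user_id):
    return ["item-1", "item-2"]

# "Operational" concerns, owned by operations and layered on from outside,
# the way a sidecar proxy applies them out of process. Policies are made up.
ALLOWED_CALLERS = {"web-frontend", "mobile-bff"}   # illustrative access policy
QUOTA = {"web-frontend": 1000, "mobile-bff": 100}  # illustrative per-caller quota
usage = {}

def with_policies(handler):
    def wrapped(caller, *args, **kwargs):
        if caller not in ALLOWED_CALLERS:
            raise PermissionError(f"{caller} may not call this service")
        usage[caller] = usage.get(caller, 0) + 1
        if usage[caller] > QUOTA[caller]:
            raise RuntimeError(f"{caller} is over quota")
        return handler(*args, **kwargs)
    return wrapped

service = with_policies(get_recommendations)
print(service("web-frontend", "user-42"))
```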
  • Embedded interpreters have been causing problems for millions of years. The Obscure Virus Club. Retroviruses are explained as embedded enzyme interpreters that read RNA and reverse-transcribe it into DNA so the virus can insert its own genetic information into the cells it infects. Reverse transcriptase
  • AnandTech with wall to wall coverage of the Hot Chips 31 conference. I hope Dr. Ian Cutress gets a few days off. You might like: Tesla Solution for Full Self Driving; NVIDIA Releases GeForce 436.02 Driver: Integer Scaling Support for Turing, Freestyle Sharpening, & More; and Dr. Lisa Su, CEO of AMD Live Blog.
  • Highlights from Git 2.23. IshKebab: I’m pretty blown away that they’re finally admitting that the current git CLI is an unintuitive mess.
  • Lesson Learned from Queries over 1.3 Trillion Rows of Data Within Milliseconds of Response Time at Zhihu.com
    • Zhihu is the Quora of China. We currently have 220 million registered users, and 30 million questions with more than 130 million answers. With approximately 100 billion rows of data accruing each month and growing, this number will reach 3 trillion in two years. 
    • TiDB, an open source MySQL-compatible NewSQL Hybrid Transactional/Analytical Processing (HTAP) database, empowered us to get real-time insights into our data (a minimal client sketch follows this list).
    • TiDB’s key features: Horizontal scalability; MySQL-compatible syntax; Distributed transactions with strong consistency; Cloud-native architecture; Minimal extract, transform, load (ETL) with HTAP; Fault tolerance and recovery with Raft; Online schema changes
    • The top layer: stateless and scalable client APIs and proxies. These components are easy to scale out.
    • The middle layer: soft-state components, with layered Redis caches as the main part. When services break down, these components can self-recover by restoring data saved in the TiDB cluster.
    • The bottom layer: the TiDB cluster stores all the stateful data. Its components are highly available, and if a node crashes, it can self-recover its service.
    • The 99th percentile response time was about 25 ms, and the 99.9th percentile response time was about 50 ms.
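A minimal sketch of what "MySQL-compatible" means in practice: TiDB speaks the MySQL wire protocol, so an ordinary MySQL client library can talk to the cluster. The host, credentials, and schema below are illustrative assumptions, not Zhihu's actual setup.

```python
# Minimal sketch: querying TiDB through an ordinary MySQL client library.
# TiDB speaks the MySQL wire protocol, so pymysql works unchanged.
# Host, credentials, and table names below are made up for illustration.
import pymysql

conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="app", password="secret", database="qa")

with conn.cursor() as cur:
    # An OLTP-style point read and an OLAP-style aggregate can hit the
    # same cluster; TiDB routes both through its distributed query engine.
    cur.execute("SELECT answer_id, body FROM answers WHERE question_id = %s LIMIT 10", (42,))
    print(cur.fetchall())

    cur.execute("SELECT question_id, COUNT(*) FROM answers "
                "GROUP BY question_id ORDER BY 2 DESC LIMIT 5")
    print(cur.fetchall())

conn.close()
```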
  • A whole bunch of Key Takeaway Points and Lessons Learned from QCon New York 2019.
  • Know thyself. Why our team cancelled our move to microservices: After a month of investigation and preparation, we cancelled the move, instead deciding to stick with our monolith. For us, microservices were not only going to not help us; they were going to hurt our development process…Once everything started getting hard, and the clear path forward started to get lost, we paused, and realized we didn’t know why we were doing any of this. We didn’t have a list of our pain points, and we had no clear understanding of how this would help solve any pain points we do have. Worse, microservices might be just about to create a whole set of new problems for us…After months of investigation and work, we abandoned the project and spent the remaining time performing some minor refactors to our “monolith”.
  • There are a lot of options. Next up? We need stateful solutions. Serverless on GCP: Firebase (serverless applications, BaaS), Cloud Functions (serverless functions, FaaS), App Engine (serverless platforms, PaaS), Cloud Run (serverless containers, CaaS), Kubernetes Engine, Compute Engine. 
  • You need to wait for the push of a button and then let an LED flash exactly five times? You need to control a battery-operated night light? A short survey of sub $0.10 microcontrollers. Amazingly there are quite a few. Also, Making A Three Cent Microcontroller Useful
  • Monolist: By abstracting away retry behavior, ensuring that jobs are idempotent, and making sure that we’re always getting closer to success, our end users are fully oblivious to the errors, and can focus on staying productive, writing code, and being the best they can be at their jobs.
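The Monolist point is easier to see with a sketch. Below is a minimal, assumption-laden illustration of the two ingredients they describe: jobs keyed by an idempotency token so a retry cannot double-apply, and a retry wrapper that keeps moving the job toward success. Names like `already_processed` and `mark_processed` are hypothetical stand-ins, not Monolist's code.

```python
import time

processed = set()  # stand-in for a durable idempotency store

def already_processed(job_id):
    return job_id in processed

def mark_processed(job_id):
    processed.add(job_id)

def run_idempotent(job_id, work, max_attempts=5):
    """Retry `work` with backoff; skip it entirely if a prior attempt succeeded."""
    if already_processed(job_id):
        return  # a retry of an already-applied job is a no-op
    for attempt in range(1, max_attempts + 1):
        try:
            work()
            mark_processed(job_id)
            return
        except Exception:
            if attempt == max_attempts:
                raise  # surface to a dead-letter queue or a human
            time.sleep(2 ** attempt)  # exponential backoff before the next try

run_idempotent("invoice-123", lambda: print("sending invoice"))
run_idempotent("invoice-123", lambda: print("sending invoice"))  # second call is a no-op
```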
  • We used AWS to create a global on-demand server infrastructure. Spawning Game Servers on AWS. It goes pretty much as you might expect. They went with AWS Elastic Container Service (ECS) using Fargate. They only pay for what they use, and since game play is variable you don’t want to stand up a fixed fleet of machines. The biggest issue seemed to be minimizing startup times, given that the Steam update takes about 40 seconds. They minimized the Docker image and went with a regional architecture.
  • Free Neo4j Data Science and Graph Algorithm courses.
  • There’s a DevOps report. The 2019 Accelerate State of DevOps: Elite performance, productivity, and scaling. The shocking conclusion is that DevOps is the future. And if you’re an elite DevOps performer you’re 24 times more likely to fully exploit the cloud, and low performers use more proprietary software than high and elite performers. So if you want to be leet you know what you need to do.
  • Common Design Patterns in Distributed Architectures: Command and Query Responsibility Segregation (CQRS); Two-Phase Commit (2PC); Saga; Sidecar. Saga is an asynchronous design pattern meant to overcome the disadvantages of synchronous patterns such as 2PC. It uses an event bus to communicate between microservices: the bus carries requests between services, with each participating service executing a local transaction and emitting an event. Other services listen for events, and the first service to intercept an event performs the required action. Sidecar decomposes an application into isolated components, with each sidecar bundling the dependencies and packages it requires.
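As a rough illustration of the saga idea (local transactions plus compensating actions, coordinated through events rather than a global 2PC lock), here is a deliberately simplified, in-process sketch. The service names and steps are invented for the example; a real saga would run each step in a separate service reacting to events on the bus.

```python
class SagaFailed(Exception):
    pass

def run_saga(steps):
    """Each step is a (do, undo) pair. On failure, run the compensations in reverse."""
    done = []
    for do, undo in steps:
        try:
            do()
            done.append(undo)
        except Exception as exc:
            for compensate in reversed(done):
                compensate()  # best-effort rollback via compensating transactions
            raise SagaFailed(exc)

# Hypothetical order flow: each step would normally be a separate service
# committing only its own local transaction and emitting an event.
run_saga([
    (lambda: print("reserve inventory"), lambda: print("release inventory")),
    (lambda: print("charge card"),       lambda: print("refund card")),
    (lambda: print("create shipment"),   lambda: print("cancel shipment")),
])
```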
  • Maybe we should just give up on the idea of reuse and make coding applications easier? It seems that even with the best intentions and the highest skill levels, reuse remains elusively situational. Building the New Uber Freight App as Lists of Modular, Reusable Components. Would anyone be surprised in a year to see another post explaining how the app grew more complex over time and the previous reusable component system to rule them all didn’t work as well as planned and needed to be reconceptualized?

Pub Stuff:

  • Anna: A KVS For Any Scale: In contrast, we explore how a system can be architected to scale across many orders of magnitude by design. We explore this challenge in the context of a new key-value store system called Anna: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency. Our design rests on a simple architecture of coordination-free actors that perform state update via merge of lattice-based composite data structures. We demonstrate that a wide variety of consistency models can be elegantly implemented in this architecture with unprecedented consistency, smooth fine-grained elasticity, and performance that far exceeds the state of the art.
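The "merge of lattice-based composite data structures" idea can be made concrete with a toy example: replicas apply updates locally and periodically merge, and because the merge is commutative, associative, and idempotent, they converge without coordination. This is only a sketch of the general technique, not Anna's actual data structures.

```python
# Toy lattice: a grow-only map whose values are merged with max().
# The merge is commutative, associative, and idempotent, so replicas can
# exchange and apply states in any order and still converge.
def merge(a, b):
    out = dict(a)
    for key, value in b.items():
        out[key] = max(out.get(key, value), value)
    return out

replica1 = {"x": 3, "y": 1}
replica2 = {"x": 2, "z": 7}

assert merge(replica1, replica2) == merge(replica2, replica1) == {"x": 3, "y": 1, "z": 7}
```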
  • Hiring is Broken: What Do Developers Say About Technical Interviews?: Technical interviews (a problem-solving form of interview in which candidates write code) are commonplace in the software industry, and are used by several well-known companies including Facebook, Google, and Microsoft.
  • UNIVERSEMACHINE: The correlation between galaxy growth and dark matter halo assembly from z = 0−10: We present a method to flexibly and self-consistently determine individual galaxies’ star formation rates (SFRs) from their host haloes’ potential well depths, assembly histories, and redshifts. The public data release (DR1) includes the massively parallel (>10^5 cores) implementation (the UNIVERSEMACHINE), the newly compiled and remeasured observational data, derived galaxy formation constraints, and mock catalogues including lightcones.

from High Scalability

Stuff The Internet Says On Scalability For August 16th, 2019

Wake up! It’s HighScalability time:

Do you like this sort of Stuff? I’d love your support on Patreon. I wrote Explain the Cloud Like I’m 10 for people who need to understand the cloud. And who doesn’t these days? On Amazon it has 53 mostly 5 star reviews (124 on Goodreads). They’ll learn a lot and likely add you to their will.

Number Stuff:

  • $1 million: Apple finally using their wealth to improve security through bigger bug bounties.
  • $4B: Alibaba cloud service yearly run rate, growth of 66%. Says they’ll overtake Amazon in 4 years. 
  • 200 billion: Pinterest pins pinned across more than 4 billion boards by 300 million users.
  • 21: technology startups took in mega-rounds of $100 million or more. 
  • 3%: of users pass their queries through resolvers that actively work to minimize the extent of leakage of superfluous information in DNS queries.
  • < 50%: Google searches result in a click. SEO dies under walled garden shade.
  • 4 million: DDoS attacks in the last 6 months, frequency grew by 39 percent in the first half of 2019. IoT devices are under attack within minutes. Rapid weaponization of vulnerable services continued. 
  • 200: distributed microservices in S3, up from 8 when it started 13 years ago.
  • 50%: cumulative improvement to Instagram.com’s feed page load time.
  • $318 million: Fortnite monthly revenue, likely had more than six consecutive months with at least one million concurrent active users.
  • $18,000: in fines because you just had to have the license plate NULL. 
  • $6.1 billion: Uber created Dutch weapon to avoid paying taxes.
  • 14.5%: drop in 1H19 global semiconductor sales.
  • 13%: fall in ad revenue for newspapers. 

Quotable Stuff:

  • Donald Hoffman: That is what evolution has done. It has endowed us with senses that hide the truth and display the simple icons we need to survive long enough to raise offspring. Space, as you perceive it when you look around, is just your desktop—a 3D desktop. Apples, snakes, and other physical objects are simply icons in your 3D desktop. These icons are useful, in part, because they hide the complex truth about objective reality.
  • rule11: First lesson of security: there is (almost) always a back door.
  • Paul Ormerod: A key discovery in the maths of how things spread across networks is that in any networked system, any shock, no matter how small, has the potential to create a cascade across the system as a whole. Watts coined the phrase “robust yet fragile” to describe this phenomenon. Most of the time, a network is robust when it is given a small shock. But a shock of the same size can, from time to time, percolate through the system. I collaborated with Colbaugh on this seeming paradox. We showed that it is in fact an inherent property of networked systems. Increasing the number of connections causes an improvement in the performance of the system, yet at the same time, it makes it more vulnerable to catastrophic failures on a system-wide scale.
  • @jeremiahg: InfoSec is ~$127B industry, yet there’s no price tags on any vendor website. For some reason it’s easier to find out what a private plane costs than a ‘next-gen’ security product. Oh yah, and let’s not forget the lack of warranties.
  • Hall’s Law:  the maximum complexity of artifacts that can be manufactured at scales limited only by resource availability doubles every 10 years. 
  • YouTube moderator: “Our responsibility was never to the creators or to the users,” one former moderator told the Post. “It was to the advertisers.”
  • reaperducer: It’s for this reason that I’ve stopped embedding micro data in the HTML I write. Micro data only serves Google. Not my clients. Not my sites. Just Google. Every month or so I get an e-mail from a Google bot warning me that my site’s micro data is incomplete. Tough. If Google wants to use my content, then Google can pay me. If Google wants to go back to being a search engine instead of a content thief and aggregator, then I’m on board.
  • Maxime Puteaux: The small satellite launch market has grown to account for “69% of the satellites launched last year in number of satellites but only 4% of the total mass launched (i.e 372 tons). … The smallsat market experienced a 23% compound annual growth rate (CAGR) from 2009 to 2018” with even greater growth expected in the future, dominated by the launch needs of constellations.
  • @Electric_Genie: San Diego has a huge, machine-intelligence-powered smart streetlight network that monitors traffic to time traffic signals. Now, they’ve added ability to detect pedestrians and cyclists
  • Simon Wardley: How to create a map? Well, I start off with a systems diagram, I give it an anchor at the top. In this case, I put customer and then I describe position through a value chain. A customer wants online photo storage, which needs website, which needs platform, which needs computer, which needs power, and of course, the stuff at the bottom is less visible to the customer than the stuff at the top.
  • Charity Majors: When we blew up the monolith into many services, we lost the ability to step through our code with a debugger: it now hops the network.  Our tools are still coming to grips with this seismic shift.
  • Livia Gershon: According to McLaren, from 1884 to 1895, the Matrimonial Herald and Fashionable Marriage Gazette promised to provide “HIGH CLASS MATCHES” to U.K. men and women looking for wives and husbands. Prospective spouses could place ads in the paper or work directly with staff of the associated World’s Great Marriage Association to privately make a connection.
  • @KarlBode: There is absolutely ZERO technical justification for bandwidth caps and overage fees on cable networks. Zero. It’s a glorified price hike on captive US customers who already pay more for bandwidth than most developed nations due to limited competition.
  • Fowler: That’s the other piece of app trackers: they do a whole bunch of bad things for our phone. Over the course of a week, I found 5,400 different trackers activated on my iPhone. Yours might be different. I may have more apps than you. But that’s still quite a lot. If you multiplied that out by an entire month, it would have taken up 1.5 gigabytes of data just going to trackers from my phone. To put that in some context, the basic data plan from AT&T is only 3 gigabytes.
  • Kate Green: Starshot is straightforward, at least in theory. First, build an enormous array of moderately powerful lasers. Yoke them together—what’s called “phase lock”—to create a single beam with up to 100 gigawatts of power. Direct the beam onto highly reflective light sails attached to spacecraft weighing less than a gram and already in orbit. Turn the beam on for a few minutes, and the photon pressure blasts the spacecraft to relativistic speeds.
  • Markham Heid: Beeman says activities that are too demanding of our brain or attention — checking email, reading the news, watching TV, listening to podcasts, texting a friend, etc. — tend to stifle the kind of background thinking or mind-wandering that leads to creative inspiration. 
  • @ben11kehoe: Aurora never downsizes storage. Continue to pay at the highest roll you’ve ever made.
  • John Allspaw: Resilience is not preventative design, it is not fault-tolerance, it is not redundancy. If you want to say fault-tolerance, just say fault-tolerance. If you want to say redundancy, just say redundancy. You don’t have to say resilience just because, you can, and you absolutely are able to. I wish you wouldn’t, but you absolutely can, and that’ll be fine as well.
  • Matthew Ball: But, again, lucrative “free-to-play” games have been around for more than a decade. In fact, it turns out the most effective way to generate billions of dollars is to not require a player spend a single one (all of the aforementioned billion-dollar titles are also free-to-play). 
  • TrailofBits: Smart contract vulnerabilities are more like vulnerabilities in other systems than the literature would suggest. A large portion (about 78%) of the most important flaws (those with severe consequences that are also easy to exploit) could probably be detected using automated static or dynamic analysis tools.
  • @sfiscience: 1/2″Once you induce [auto safety] regulatory protection, there is a decline in the number of highway deaths. And then in 3-4 years, it goes right up to where it was before the safety regulation is imposed.”  2/2 There’s a kind of “risk homeostasis” with regulation: as people feel safer, they take more risks (eg, seatbelts led to faster driving and more pedestrian deaths). One exception:  @NASCAR deaths went UP with safety innovations. “People are not dumb, but they’re not rational-expectations-efficient either.”  
  • Michael F. Cohen: It may be hard to believe, but only a few years ago we debated when the first computer graphics would appear in a movie such that you could not tell if what you were looking at was real or CG. Of course, now this question seems silly, as almost everything we see in action movies is CG and you have no chance of knowing what is real or not.
  • Dropbox: Much like our data storage goals, the actual cost savings of switching to SMR (Shingled Magnetic Recording) have met our expectations. We’re able to store roughly 10 to 20 percent more data on an SMR drive than on a PMR drive of the same capacity at little to no cost difference. But we also found that moving to the high-capacity SMR drives we’re using now has resulted in more than a 20 percent savings overall compared to the last generation storage design.
  • Riot Games: The patch size was 68 MB for RADS and 83 MB for the new patcher. Despite the larger download size, the average player was able to update the game in less than 40 seconds, compared to over 8 minutes with the old patcher.
  • @grossdm: For a decade, VCs have been subsidizing the below-market provision of services to urban-dwellers: transport, food delivery, office space. Now the baton is being passed to public shareholders, who will likely have less patience. 20 years ago, public investors very quickly walked away from the below-market provision of e-commerce and delivery services  — i.e. Webvan. 
  • Julia Grace: Looking back, I should have done a lot more reorgs [at Slack] and I should’ve broken up a lot more parts of the organization so that they could have more specialization, but instead, it was working so we kept it all together.
  • Thomas Claburn: “No iCloud subscriber bargained for or agreed to have Apple turn his or her data – whether encrypted or not – to others for storage,” the complaint says. “…The subscribers bargained for, agreed, and paid to have Apple – an entity they trusted – store their data. Instead, without their knowledge or consent, these iCloud subscribers had their data turned over by Apple to third-parties for these third-parties to store the data in a manner completely unknown to the subscribers.”
  • @glitchx86: Some merit to TM: it solves the problem of the correctness of lock-based concurrent programs. TM hides all the complexity of verifying deadlock-free software .. and it isn’t an easy task 
  • @narayanarjun: We were experiencing 40ms latency spikes on queries at @MaterializeInc and @nikhilbenesch tracked it down to TCP NODELAY, and his PR just cracks me up. The canonical cite is a hacker news comment ((link: https://news.ycombinator.com/item?id=10608356) news.ycombinator.com/item?id=106083…) signed by John Nagle himself, and I can’t even.
  • Donald Hoffman: Perhaps the universe itself is a massive social network of conscious agents that experience, decide, and act. If so, consciousness does not arise from matter; this is a big claim that we will explore in detail. Instead, matter and spacetime arise from consciousness—as a perceptual interface.
  • MacCárthaigh: From the very beginning at AWS, we were building for internet scale. AWS came out of amazon.com and had to support amazon.com as an early customer, which is audacious and ambitious. They’re a pretty tough customer, as you can imagine, one of the busiest websites on Earth. At internet scale, it’s almost all uncoordinated. If you think about CDNs, they’re just distributed caches, and everything’s eventually consistent, and that’s handling the vast majority of things.
  • Jack Clark: Being able to measure all the ways in which AI systems fail is a superpower, because such measurements can highlight the ways existing systems break and point researchers towards problems that can be worked on.
  • Google: We investigated the remote attack surface of the iPhone, and reviewed SMS, MMS, VVM, Email and iMessage. Several tools which can be used to further test these attack surfaces were released. We reported a total of 10 vulnerabilities, all of which have since been fixed. The majority of vulnerabilities occurred in iMessage due to its broad and difficult to enumerate attack surface. Most of this attack surface is not part of normal use, and does not have any benefit to users. Visual Voicemail also had a large and unintuitive attack surface that likely led to a single serious vulnerability being reported in it.  Overall, the number and severity of the remote vulnerabilities we found was substantial. Reducing the remote attack surface of the iPhone would likely improve its security.
  • sleepydog: I work in GCP support. I think you would be surprised. Of course Linux is more common, but we still support a lot of customers who use Windows Server, SQL Server, and .NET for production.
  • Laurence Tratt: performance nondeterminism increasingly worries me, because even a cursory glance at computing history over the last 20 years suggests that both hardware (mostly for performance) and software (mostly for security) will gradually increase the levels of performance nondeterminism over time. In other words, using the minimum time of a benchmark is likely to become more inaccurate and misleading in the future…
  • Geoff Tate: A year ago, if you talked to 10 automotive customers, they all had the same plan. Everyone was going straight to fully autonomous, 7nm, and they needed boatloads of inference throughput. They wanted to license IP that they would integrate into a full ADAS chip they would design themselves. They didn’t want to buy chips. That story has backpedaled big time. Now they’re probably going to buy off-the-shelf silicon, stitch it together to do what they want, and they’re going to take baby steps rather than go to Level 5 right away.
  • Ann Steffora Mutschler: In discussions with one of the Tier 0.5 suppliers about whether sensor fusion is the way to go or if it makes better sense to do more of the computation at the sensor itself, one CTO remarked that certain types of sensor data are better handled centrally, while other types of sensor data are better handled at the edge of the car, namely the sensor, Fritz said.
  • Dai Zovi: A software engineering team would write security features, then actively go to the security team to talk about it and for advice. We want to develop generative cultures, where risk is shared. It’s everyone’s concern. If you build security responsibility into every team, you can scale much more powerfully than if security is only the security staff’s responsibility.
  • Nitasha Tiku: But that didn’t mean things would go back to normal at Google. Over the past three years, the structures that once allowed executives and internal activists to hash out tensions had badly eroded. In their place was a new machinery that the company’s activists on the left had built up, one that skillfully leveraged media attention and drew on traditional organizing tactics. Dissent was no longer a family affair. And on the right, meanwhile, the pipeline of leaks running through Google’s walls was still going as strong as ever.
  • Graham Allan: There’s another bottleneck that SoC designers are starting to struggle with, and it’s not just about bandwidth. It’s bandwidth per millimeter of die edge. So if you have a bandwidth budget that you need for your SoC, a very easy exercise is to look at all the major technologies you can find. If you have HBM2E, you can get on the order of 60+ gigabytes per second per millimeter of die edge. You can only get about a sixth of that for GDDR6. And I can only get about a tenth of that with LPDDR5.
  • Brian Bailey: If the industry is willing to give von Neumann the boot, it should perhaps go the whole way and stop considering memory to be something shared between instructions and data and start thinking about it as an accelerator. Viewed that way, it no longer has to be compared against logic or memory, but should be judged on its own merits. If it accelerates the task and uses less power, then it is a purely economic decision if the area used is worth it, which is the same as every other accelerator.
  • Barbara Tversky: This brings us to our First Law of Cognition: There are no benefits without costs. Searching through many possibilities to find the best can be time consuming and exhausting. Typically, we simply don’t have enough time or energy to search and consider all the possibilities. The evidence on action is sufficient to declare the Second Law of Cognition: Action molds perception. There are those who go farther and declare that perception is for action. Yes, perception serves action, but perception serves so much more. 
  • Jez Humble: testing is for known knowns, monitoring is for known unknowns, observability is for unknown unknowns
  • @briankrebs: Being in infosec for so long takes its toll. I’ve come to the conclusion that if you give a data point to a company, they will eventually sell it, leak it, lose it or get hacked and relieved of it. There really don’t seem to be any exceptions, and it gets depressing
  • Brendon Foye: The hyperscale giant today released a new co-branding guide (pdf), instructing partners in the AWS Partner Network (APN) how to position their marketing material when going to market with AWS. Among the guidelines, AWS said it won’t approve the use of terms like “multi-cloud,” “cross cloud,” “any cloud,” “every cloud,” “or any other language that implies designing or supporting more than one cloud provider.”
  • Newley Purnell: Startup Engineer.ai says it uses artificial-intelligence technology to largely automate the development of mobile apps, but several current and former employees say the company exaggerates its AI capabilities to attract customers and investors.
  • George Dyson: If you look at the most interesting computation being done on the Internet, most of it now is analog computing, analog in the sense of computing with continuous functions rather than discrete strings of code. The meaning is not in the sequence of bits; the meaning is just relative. Von Neumann very clearly said that relative frequency was how the brain does its computing. It’s pulse frequency coded, not digitally coded. There is no digital code.
  • Brendon Dixon: Because they’ve chosen to not deeply learn their deep learning systems—continuing to believe in the “magic”—the limitations of the systems elude them. Failures “are seen as merely the result of too little training data rather than existential limitations of their correlative approach” (Leetaru). This widespread lack of understanding leads to misuse and abuse of what can be, in the right venue, a useful technology.
  • Ewan Valentine: I could be completely wrong on this, but over the years, I’ve found that OO is great for mapping concepts, domain models together, and holding state. Therefore I tend to use classes to give a name to a concept and map data to it. For example, entities, repositories, and services, things which deal with data and state, I tend to create classes for. Whereas deliveries and use cases, I tend to treat functionally. The way this ends up looking, I have functions, which have instances of classes, injected through a higher-order function. The functional code then interacts with the various objects and classes passed into it, in a functional manner. I may fetch a list of items from a repository class, map through them, filter them, and pass the results into another class which will store them somewhere, or put them in a bucket.
  • Timothy Morgan: But what we do know is that the [Cray] machine will weigh in at around 30 megawatts of power consumption, which means it will have more than 10X the sustained performance of the current Sierra system on DOE applications and around 4X the performance per watt. This is a lot better energy efficiency than many might have been expecting – a few years back there was talk of exascale systems requiring as much as 80 megawatts of juice, which would have been very rough to pay for at $1 per watt per year. With those power consumption numbers, it would have cost $500 million to build El Capitan but it would have cost around $400 million to power it for five years; at 30 megawatts, you are in the range of $150 million, which is a hell of a lot more feasible even if it is an absolutely huge electric bill by any measure.
  • Timothy Prickett Morgan: All of us armchair architecture quarterbacks have been thinking the CPU of the future looks like a GPU card, with some sort of high bandwidth memory that’s really close. 
  • Garrett Heinlen (Netflix): I believe GraphQL also goes a step further beyond REST and it helps an entire organization of teams communicate in a much more efficient way. It really does change the paradigm of how we build systems and interact with other teams, and that’s where the power truly lies. Instead of the back end dictating, “Here are the APIs you receive and here’s the shape in the format you’re going to get,” they express what’s possible to access. The clients have all the power, pulling in just the data they need. The schema is the API contract between all teams and it’s a living, evolving source of truth for your organization. Gone are the days of people throwing code over the wall saying, “Good luck, it’s done.” Instead, GraphQL promotes more of a uniform working experience between front end and back end, and I would go further to say even product and design could be involved in this process as well to understand the business domain that you’re all working within.

Useful Stuff:

  • Fun thread. @jessfraz: Tell me about the weirdest bug you had that caused a datacenter outage, can be anywhere in the stack including human error. @dormando: one day all the sun servers fired temp alarms and shut off. thought AC had died or there was a fire. Turns out cleaners had wedged the DC door open, causing a rapid humidity shift, tricking the sensors. @ewindisch: connection pool leak in a distributed message queue I wrote caused the cascade failure of a datacenter’s network switches. This brought a large independent cloud provider offline around 2013. @davidbrunelle: Unexpected network latency caused TCP sockets to stay open indefinitely on a fleet of servers running an application. This eventually led to PAT exhaustion causing around ~50% of outbound calls from the datacenter to fail causing a DC-wide brownout.
  • What happens when you go from LAMP to serverless: case study of externals.io. 90% of the requests are below 100ms. $17.37/month. Generally low effort migration.
  • By continuously monitoring increases in spend, we end up building scalable, secure and resilient Lambda based solutions while maintaining maximum cost-effectiveness. How We Reduced Lambda Functions Costs by Thousands of Dollars: In the last 7 months, we started using Lambda based functions heavily in production. It allowed us to scale quickly and brought agility to our development activities…We were serving +80M Lambda invocations per day across multiple AWS regions with an unpleasant surprise in the form of a significant bill…once we started running heavy workloads in production, the cost became significant and we spent thousands of dollars daily…to reduce AWS Lambda costs, we monitored Lambda memory usage and execution time based on logs stored in CloudWatch…we created dynamic visualizations on Grafana based on metrics available in the timeseries database and we were able to monitor in near real-time Lambda runtime usage…we gained insights into the right sizing of each Lambda function deployed in our AWS account and we avoided excessive over-allocation of memory, significantly reducing the Lambda cost…To gather more insights and uncover hidden costs, we had to identify the most expensive functions. That’s where Lambda tags come into play. We leveraged that metadata to break down the cost per stack…By reducing the invocation frequency (controlling concurrency with SQS), we reduced the cost by up to 99%…we’re evaluating alternative services like Spot Instances & Batch Jobs to run heavy non-critical workloads considering the hidden costs of Serverless…we were using SNS and we had issues with handling errors and Lambda timeouts, so we changed our architecture to use SQS instead and we configured a dead letter queue to reduce the number of times the same message can be handled by the Lambda function (avoid recursion), reducing the number of invocations.
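To see why right-sizing memory matters so much, remember that Lambda bills per GB-second of compute plus a per-request fee, so a rough cost model is easy to write down. The rates below are the commonly cited 2019 us-east-1 prices (about $0.0000166667 per GB-second and $0.20 per million requests) and are used only as an illustration; check current pricing before relying on them.

```python
# Back-of-the-envelope Lambda cost model (illustrative 2019 rates, not a quote).
GB_SECOND = 0.0000166667        # dollars per GB-second of compute
PER_MILLION_REQUESTS = 0.20     # dollars per million invocations

def monthly_cost(invocations_per_day, avg_duration_ms, memory_mb, days=30):
    gb_seconds = invocations_per_day * days * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    requests = invocations_per_day * days
    return gb_seconds * GB_SECOND + requests / 1_000_000 * PER_MILLION_REQUESTS

# 80M invocations/day at 200 ms average: over-allocated 1024 MB vs right-sized 256 MB.
print(monthly_cost(80_000_000, 200, 1024))  # roughly $8,480/month
print(monthly_cost(80_000_000, 200, 256))   # roughly $2,480/month
```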
  • Six Shades of Coupling: Content Coupling, Common Coupling, External Coupling, Control Coupling, Stamp Coupling and Data Coupling. 
  • When does redundancy actually help availability?: The complexity added by introducing redundancy mustn’t cost more availability than it adds. The system must be able to run in degraded mode. The system must reliably detect which of the redundant components are healthy and which are unhealthy. The system must be able to return to fully redundant mode.
  • AI Algorithms Need FDA-Style Drug Trials. The problem with this idea is molecules do not change, whereas software continuously changes and learning software by definition changes reactively. No static process like a one-and-done drug trial will yield meaningful results. We need a different approach that considers the unique role software plays in systems. Certainly vendors can’t be trusted. Any AI will tell you that. Perhaps create a set of test courses that platforms can be continuously tested and fuzzed against?
  • AWS Lambda is not ready to replace conventional EC2. Why we didn’t brew our Chai on AWS Lambda: Chai Point, India’s largest organized Chai retailer, with over 150 stores and over 1,000 boxC (IoT-enabled Chai and Coffee vending machines) designed for corporate customers, serves approximately 250k cups of chai per day across all channels…Most of Chai Point’s stores and boxC machines typically run between 7 AM and 9 PM…[Lambda cold start is] one of the most critical and deciding factors for us to move the Shark infrastructure back to EC2…AWS Lambda has a limit of 50 MB as the maximum deployment package…it takes a delay of 1–2 minutes for logs to appear in CloudWatch which makes it difficult for immediate debugging in a test environment…when it comes to deploying it in enterprise solutions where there are inter-service dependencies I think there is still time, especially for languages like Java.
  • Facebook Performance @Scale 2019 recap videos are now available. 
  • Sharing is caring until it becomes overbearing. Dropbox no longer shares code between platforms. Their policy now is to use the native language on each platform. It is simply easier and quicker to write code twice. And you don’t have to train people on using a custom stack. The tools are native. So when people move on you have not lost critical expertise. The one codebase to rule them all dream dies hard. No doubt it will be back in short order, filtered through some other promising stack.
  • Everyone these days wants your functions. Oracle Functions Now Generally Available. It’s built on the Apache 2.0 licensed Fn Project. Didn’t see much in the way of reviews or on costs.
  • On LeanXcale database. Interview with Patrick Valduriez and Ricardo Jimenez-Peris: There is a class of new NewSQL databases in the market, called Hybrid Transaction and Analytics Processing (HTAP). NewSQL is a recent class of DBMS that seeks to combine the scalability of NoSQL systems with the strong consistency and usability of RDBMSs. LeanXcale’s architecture is based on three layers that scale out independently: 1) KiVi, the storage layer that is a relational key-value data store, 2) the distributed transactional manager that provides ultra-scalable transactions, and 3) the distributed query engine that enables scaling out both OLTP and OLAP workloads. The storage layer is a proprietary relational key-value data store, called KiVi, which we have developed. Unlike traditional key-value data stores, KiVi is not schemaless, but relational. Thus, KiVi tables have a relational schema, but can also have a part that is schemaless. The relational part enabled us to enrich KiVi with predicate filtering, aggregation, grouping, and sorting. As a result, we can push down all algebraic operators below a join to KiVi and execute them in parallel, thus saving the movement of a very large fraction of rows between the storage layer and the query engine layer.
  • Apollo Day New York City 2019 Recap
    • During his keynote, DeBergalis announced one of Apollo’s most anticipated innovations, Federation, which utilizes the idea of a new layer in the data stack to directly meet developers’ needs for a more scalable, reliable, and structured solution to a centralized data graph.
    • Federation paired with existing features of Apollo’s platform like schema change validation listing creates a flow where teams can independently push updates to product microservices. This triggers re-computation of the whole graph, which is validated and then pushed into the gateway. Once completed, all applications contain changes in the part of the graph that is available to them. These events happen independently, so there is a way to operate, which allows each team to be responsible solely for its piece.
    • Another key concept that DeBergalis detailed was the idea that a “three-legged” stack is emerging in front-end development. The “legs” of this new “stool” that form the basis of this stack are React, Apollo, and Typescript. React provides developers with a system for managing user components, Apollo provides developers a system for managing data, and Typescript provides a foundation underneath that provides static typing end-to-end through the stack.
  • Lesson: sticker shock—in Google Cloud everything costs more than you think it will, but it’s still worth it. Etsy’s Big Data Cloud Migration. Etsy generates a terabyte of data a day; they run hundreds of Hadoop workflows and thousands of jobs daily. They started out on prem. They migrated to the cloud over a year and a half ago, driven by needing both the machine and people resources required to keep up with machine learning and data processing tasks. Moving into the cloud decoupled systems so groups can operate independently. With their on-prem system they didn’t worry about optimization, but in the cloud you must, because the cloud will do whatever you tell it to do, at a price. In the cloud there’s a business case for making things more efficient. They rearchitected as they moved over. Managed services were a huge win. As they grew bigger they simply didn’t have the resources and the expertise to run all the needed infrastructure. That’s now Google’s job. This allowed having more generalized teams. It would be impossible for their team of 4 to manage all the things they use in GCP. Specialization is not required to run things. If you need it you just turn it on. That includes services like BigTable, k8s, Cloud Pub/Sub, Cloud Dataflow, and AI. It allows Etsy to punch above their weight class. They have a high level of support, with Google employees embedded on their team. Etsy didn’t lift and shift; they remade the platform as they moved over. If they had to do it over again they might have tried for a middle road, changing things before the migration.
  • Facebook Systems @Scale 2019 recap videos are now available.
  • The human skills we need in an unpredictable world. Efficiency and robustness trade off against each other. The more efficient something is, the less slack there is to handle the unexpected. When you target efficiency you may be making yourself more vulnerable to shocks.
  • The lesson is, you can’t wait around for Netflix or anyone else to promote your show. It’s up to you to create the buzz. How a Norwegian Viking Comedy Producer Hacked Netflix’s Algorithm: The key to landing on Netflix’s radar, he knew, would be to hack its recommendation engine: get enough people interested in the show early…Three weeks before launch, he set up a campaign on Facebook, paying for targeted posts and Facebook promotions. The posts were fairly simple — most included one of six short (20- to 25-second) clips of the show and a link, either to the show’s webpage or to media coverage. They used so-called A/B testing — showing two versions of a campaign to different audiences and selecting the most successful — to fine-tune. The U.S. campaign didn’t cost much — $18,500, which Tangen and his production partners put up themselves — and it was extremely precise. In just 28 days, the Norsemen campaign reached 5.5 million Facebook users, generating 2 million video views and some 6,000 followers for the show. Netflix noticed. “Three weeks after we launched, Netflix called me: ‘You need to come to L.A., your show is exploding,'” Tangen recalls. Tangen invested a further $15,000 to promote the show on Facebook worldwide, using what he had learned during the initial U.S. campaign.
  • How did NASA Steer the Saturn V? Triply redundant in logic. Doubly redundant in memory. Two results are compared to make sure they agree. If the same numbers aren’t returned, a subroutine is called to determine which number makes the most sense at that point in the flight. Across all Saturn flights there were fewer than 10 miscompares. More components mean less reliability. There was never a catastrophic failure. The biggest problem was vibration.
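A toy sketch of the 2-out-of-3 voting idea described above; the tie-breaking fallback here is just a median pick, standing in for the flight-phase-specific "which number makes the most sense" subroutine.

```python
def vote(a, b, c, fallback):
    """Triple modular redundancy: accept any value two channels agree on;
    otherwise defer to a plausibility check (here, a stand-in fallback)."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return fallback(a, b, c)  # e.g. pick the value closest to the predicted trajectory

median = lambda *xs: sorted(xs)[1]
print(vote(10, 10, 13, fallback=median))  # -> 10 (two channels agree)
print(vote(9, 10, 12, fallback=median))   # -> 10 (no majority, fall back to the median)
```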
  • Interesting idea: instead of interviews, use how well a candidate performs on training software to determine how well they know a set of skills. The Cloudcast – Cloud Computing. The role of the generalist is gone. Pick a problem people are struggling with, become an expert at solving it, and market yourself as the person who has the skill to solve it.
  • The end state for any application is to write its own scheduler. Making Instagram.com faster: Part 1. Use preload tags to start fetching resources as soon as possible. You can even preload GraphQL requests to get a head start on those long queries. Preloads have a higher network priority. They use a preload tag for all script resources and place them in the order they will be needed. They load in new batches before the user hits the end of their current feed, using a prioritized task abstraction that handles queueing of asynchronous work (in this case, a prefetch for the next batch of feed posts). If the user scrolls close enough to the end of the current feed, the priority of this prefetch task is raised to ‘high’ by cancelling the pending idle callback and firing off the prefetch immediately. Once the JSON data for the next batch of posts arrives, a sequential background prefetch of all the images in that preloaded batch is queued. The images are prefetched sequentially in the order the posts are displayed in the feed rather than in parallel, so the download and display of images in posts closest to the user’s viewport are prioritized. Also Preemption in Nomad, a greedy algorithm that scales
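Instagram's implementation is client-side JavaScript; the Python below is only a toy illustration of the scheduling pattern they describe (a priority queue of prefetch tasks that can be cancelled and re-queued at higher priority), not their code.

```python
import heapq

LOW, HIGH = 1, 0  # smaller value runs first

class PrefetchQueue:
    """Toy prioritized task queue: schedule prefetches at low priority and
    promote them to high priority when the user nears the end of the feed."""
    def __init__(self):
        self.heap, self.seq = [], 0

    def schedule(self, priority, name, fn):
        entry = {"cancelled": False, "name": name, "fn": fn}
        heapq.heappush(self.heap, (priority, self.seq, entry))
        self.seq += 1
        return entry

    def promote(self, entry):
        entry["cancelled"] = True            # drop the pending low-priority copy
        return self.schedule(HIGH, entry["name"], entry["fn"])

    def run_all(self):
        while self.heap:
            _, _, entry = heapq.heappop(self.heap)
            if not entry["cancelled"]:
                entry["fn"]()

q = PrefetchQueue()
pending = q.schedule(LOW, "prefetch-next-batch", lambda: print("fetching next feed batch"))
# ...user scrolls near the bottom of the current feed...
q.promote(pending)
q.run_all()  # the promoted copy runs once; the cancelled low-priority copy is skipped
```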
  • Native lazy loading has arrived! Adding the loading attribute to the images decreased the load time on a fast network connection by ~50% — it went from ~1 second to < 0.5 seconds, as well as saving up to 40 requests to the server 🎊. All of those performance enhancements just from adding one attribute to a bunch of images!
  • Maybe it should just be simpler to create APIs? Simple Two-way Messaging using the Amazon SQS Temporary Queue Client. It seems a lot of people use queues for front-end/back-end communication because it’s simpler to set up and easier to secure than creating an HTTP endpoint. So AWS came up with a virtual queue that lets you multiplex many virtual queues over a single physical queue. No extra cost. It’s all done on the client. A clever tag-based heartbeat mechanism is used to garbage collect queues.
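The Temporary Queue Client itself is a Java library, so the following is only a rough boto3 sketch of the underlying request-response pattern: the requester sends a request carrying a reply-to queue, and the responder answers on that queue. Queue names are made up, and the virtual-queue multiplexing the client adds on top is omitted.

```python
import uuid
import boto3

sqs = boto3.client("sqs")

# Requester: create (or reuse) the request queue and a response queue, then send
# a request that tells the responder where to reply. Names are illustrative only.
request_q = sqs.create_queue(QueueName="orders-requests")["QueueUrl"]
reply_q = sqs.create_queue(QueueName=f"orders-responses-{uuid.uuid4().hex[:8]}")["QueueUrl"]

sqs.send_message(
    QueueUrl=request_q,
    MessageBody="get-order-status:1234",
    MessageAttributes={"ReplyTo": {"DataType": "String", "StringValue": reply_q}},
)

# Responder side (normally a separate process): read the request, do the work,
# and send the answer back to the queue named in the ReplyTo attribute.
msgs = sqs.receive_message(QueueUrl=request_q, MessageAttributeNames=["ReplyTo"],
                           WaitTimeSeconds=10, MaxNumberOfMessages=1).get("Messages", [])
for m in msgs:
    reply_to = m["MessageAttributes"]["ReplyTo"]["StringValue"]
    sqs.send_message(QueueUrl=reply_to, MessageBody="status:shipped")
    sqs.delete_message(QueueUrl=request_q, ReceiptHandle=m["ReceiptHandle"])

# Back on the requester: wait for the response on the reply queue.
resp = sqs.receive_message(QueueUrl=reply_q, WaitTimeSeconds=10)
print(resp.get("Messages", [{}])[0].get("Body"))
```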
  • Monolith to Microservices to Serverless — Our journey: A large part of our technology stack at that time comprised a Spring-based application and a MySQL database running on VMs in a data centre…The application was working for our thousands of customers, day in, day out, with little to no downtime. But it couldn’t be denied that new features were becoming difficult to build and the underlying infrastructure was beginning to struggle to scale as we continued to grow as a business…We needed a drastic rethink of our infrastructure and that came in the shape of Docker containers and Kubernetes…We took a long hard look at our codebase and with the ‘independent loosely coupled services’ mantra at the forefront of our minds we were quickly able to break off large parts of the monolith into smaller much more manageable services. New functionality was designed and built in the same way and we were quickly up to a 2 node K8s cluster with over 35 running pods….Fast forward to today and we have now been using AWS for well over 2 years, we have migrated the core parts of our reporting suite into the cloud and where appropriate all new functionality is built using serverless AWS services. Our ‘serverless first’ ethos allows us to build highly performant and highly scalable systems that are quick to provision and easy to manage.
  • This is Crypto 101. Security Now 727 BlackHat & DefCon. Steve Gibson details how electronic hotel locks can protect themselves against replay attacks: All that’s needed to prevent this is for the door, when challenged to unlock, to provide a nonce for the phone to sign and return. The door contains a software ratchet. This is a counter which feeds a secretly-keyed AES symmetric cipher. Each door lock is configured with its own secret key which is never exposed. The AES cipher which encrypts a counter, produces a public elliptic key which is used to verify signatures. So the door lock first checks the key that it is currently valid for and has been using. If that fails, it checks ahead to the next public key to see whether that one can verify the returned signature. If not, it ignores the request. But if the next key does successfully verify the request’s signature it makes the next key permanent, ratcheting forward and forgetting the previous no-longer-valid key. This means that the door locks do not need to communicate with the hotel. Each door lock is able to operate autonomously with its own secret key which determines the sequence of its public keys. The hotel system knows each room’s secret key so it’s able to issue the proper private signing key to each guest for the proper room. If that system is designed correctly, no one with a copy of the Mobile Key software, and the ability to eavesdrop on the conversation, is able to gain any advantage from doing so.
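Gibson's description is dense, so here is a loose sketch of just the ratchet part: the lock derives a sequence of verification keys from a per-door secret and a counter, accepts a credential valid under either the current key or the next one, and ratchets forward when the next one is used, so an eavesdropped old credential is useless. Real systems use asymmetric signatures over a fresh nonce; HMAC stands in here purely to keep the illustration short, and all names are invented.

```python
import hmac
import hashlib

DOOR_SECRET = b"per-door-secret-never-leaves-the-lock"  # illustrative secret

def key_for(counter):
    # Each ratchet position derives its own verification key from the door
    # secret and the counter (a stand-in for the described key schedule).
    return hmac.new(DOOR_SECRET, counter.to_bytes(8, "big"), hashlib.sha256).digest()

def sign(counter, nonce):
    # What the hotel system would provision onto the guest's phone (sketch only).
    return hmac.new(key_for(counter), nonce, hashlib.sha256).digest()

class DoorLock:
    def __init__(self):
        self.counter = 0  # current position of the ratchet

    def try_unlock(self, nonce, tag):
        for step in (0, 1):  # accept the current key or the next one
            expected = hmac.new(key_for(self.counter + step), nonce, hashlib.sha256).digest()
            if hmac.compare_digest(expected, tag):
                self.counter += step  # ratchet forward; older keys are never valid again
                return True
        return False

lock = DoorLock()
nonce = b"fresh-challenge-from-the-door"
print(lock.try_unlock(nonce, sign(1, nonce)))  # new guest's key: unlocks and ratchets to 1
print(lock.try_unlock(nonce, sign(0, nonce)))  # previous guest's (replayed) key: rejected
```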
  • Trip report: Summer ISO C++ standards meeting (Cologne). Reddit trip report. C++20 is now feature complete. Added: modules, coroutines, concepts including in the standard library via ranges, <=> spaceship including in the standard library, broad use of normal C++ for direct compile-time programming, ranges, calendars and time zones, text formatting, span, and lots more. Contracts were moved to C++21. 
  • Ingesting data at “Bharat” Scale: Initially, we considered Redis for our failover store, but with serving an average ingestion rate of 250K events per second, we would end up needing large Redis clusters just to support minutes worth of panic of our message bus. Finally, we decided to use a failover log producer that writes logs locally to disk. This periodically rotates & uploads to S3…We’ve seen outages, where our origin crashes & as it tries to recover, it is inundated with client retries & pending requests in the surge queue. That’s a recipe for cascading failure…We want to continue to serve the requests we can sustain, for anything over that, sorry, no entry. So we added a rate-limit to each of our API servers. We arrived at this configuration after a series of simulations & load-tests, to truly understand at what RPS our boxes will not sustain the load. We use nginx to control the number of requests per second using a leaky bucket algorithm. The target tracking scaling trigger is 3/4th of the rate-limit, to allow for the room to scale; but there are still occasions where large surges are too quick for target-tracking scaling to react.
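The nginx side of that setup is configuration (`limit_req` and a target-tracking autoscaling trigger set below the limit), but the leaky bucket algorithm it implements is simple enough to sketch. A minimal, illustrative Python version, not their production code:

```python
import time

class LeakyBucket:
    """Admit roughly `rate` requests per second with a burst allowance of
    `capacity`; anything beyond that is rejected. nginx's limit_req with a
    burst= setting behaves along these lines."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.level, self.last = 0.0, time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.level = max(0.0, self.level - (now - self.last) * self.rate)  # leak over time
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

bucket = LeakyBucket(rate=100, capacity=20)   # ~100 rps sustained, short bursts of 20
accepted = sum(bucket.allow() for _ in range(1000))
print(f"accepted {accepted} of 1000 back-to-back requests")
```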

Soft Stuff:

  • jedisct1/libsodium: Sodium is a new, easy-to-use software library for encryption, decryption, signatures, password hashing and more. It is a portable, cross-compilable, installable, packageable fork of NaCl, with a compatible API, and an extended API to improve usability even further. Its goal is to provide all of the core operations needed to build higher-level cryptographic tools.
  • amejiarosario/dsa.js-data-structures-algorithms-javascript: In this repository, you can find the implementation of algorithms and data structures in JavaScript. This material can be used as a reference manual for developers, or you can refresh specific topics before an interview. Also, you can find ideas to solve problems more efficiently.
  • linkedin/brooklin: Brooklin is a distributed system intended for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale. Designed for multitenancy, Brooklin can simultaneously power hundreds of data pipelines across different systems and can easily be extended to support new sources and destinations.
  • gojekfarm/hospital: Hospital is an autonomous healing system for any system. Any failure or fault that occurs in the system will be resolved automatically by Hospital according to a given run-book, without manual intervention.
  • BlazingDB/pyBlazing: BlazingSQL is a GPU accelerated SQL engine built on top of the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
  • serverless/components: Forget infrastructure — Serverless Components enables you to deploy entire serverless use-cases, like a blog, a user registration system, a payment system or an entire application — without managing complex cloud infrastructure configurations.

Pub Stuff:

  • Zooming in on Wide-area Latencies to a Global Cloud Provider: The network communications between the cloud and the client have become the weak link for global cloud services that aim to provide low latency services to their clients. In this paper, we first characterize WAN latency from the viewpoint of a large cloud provider Azure, whose network edges serve hundreds of billions of TCP connections a day across hundreds of locations worldwide. 
  • What is Applied Category Theory? Two themes that appear over and over (and over and over and over) in applied category theory are functorial semantics and compositionality. 
  • ML can never be fair. On Fairness and Calibration: In this paper, we investigate the tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e. equal false-negatives rates across groups), and show that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets.

from High Scalability

Stuff The Internet Says On Scalability For August 2nd, 2019

Wake up! It’s HighScalability time—once again:

Do you like this sort of Stuff? I’d greatly appreciate your support on Patreon. I wrote Explain the Cloud Like I’m 10 for people who need to understand the cloud. And who doesn’t these days? On Amazon it has 52 mostly 5 star reviews (121 on Goodreads). They’ll learn a lot and hold you in even greater awe.

Number Stuff:

  • $9.6B: games investment in last 18 months, equal to the previous five years combined.
  • $3 million: won by a teenager in the Fortnite World Cup.  
  • 100,000: issues in Facebook’s codebase fixed from bugs found by static analysis. 
  • 106 million: Capital One IDs stolen by a former Amazon employee. (complaint)
  • 2 billion: IoT devices at risk because of 11 VXWorks zero day vulnerabilities.
  • 2.1 billion: parking spots in the US, taking 30% of city real estate, totaling 34 billion square meters, the size of West Virginia, valued at 60 trillion dollars.
  • 2.1 billion: people use Facebook, Instagram, WhatsApp, or Messenger every day on average. 
  • 100: words per minute from Facebook’s machine-learning algorithms capable of turning brain activity into speech. 
  • 51%: Facebook and Google’s share of the global digital ad market.
  • 56.9%: Raleigh, NC was the top U.S. city for tech job growth.
  • 20-30: daily CPAN (Perl) uploads. 700-800 for Python.
  • 476 miles: LoRaWAN (Low Power, Wide Area (LPWA)) distance world record broken using 25mW transmission power.
  • 74%: Skyscanner savings using spot instances and containers on the Kubernetes cluster.
  • 49%: say convenience is more important than price when selecting a provider.
  • 30%: Airbnb app users prefer a non-default font size.
  • 150,000: number of databases migrated to AWS using the AWS Database Migration Service.
  • 1 billion: Google Photos users. @MikeElgan: same size as Instagram but far larger than Twitter, Snapchat or Pinterest
  • 300M: Pinterest monthly active users, with revenue of $261 million, up 64% year-over-year, on losses of $26 million for the second quarter of 2019.
  • 7%: of all dating app messages were rated as false.
  • $100 million: Goldman Sachs spend to improve stock trades from hundreds of milliseconds down to 100 microseconds while handling more simultaneous trades. The article mentions using microservices and event sourcing, but it’s not clear how that’s related.

Quotable Stuff:

  • Josh Frydenberg, Australian Treasurer: Make no mistake, these companies are among the most powerful and valuable in the world. They need to be held to account and their activities need to be more transparent.
  • Neil Gershenfeld: Fabrication merges with communication and computation. Most fundamentally, it leads to things like morphogenesis and self-reproducing assemblers. Most practically, it leads to almost anybody can make almost anything, which is one of the most disruptive things I know happening right now. Think about this range I talked about as for computing the thousand, million, billion, trillion now happening for the physical world, it’s all here today but coming out on many different length scales.
  • Alan Kay: Marvin and Seymour could see that most interesting systems were crossconnected in ways that allowed parts to be interdependent on each other—not hierarchical—and that the parts of the systems needed to be processes rather than just “things”
  • Lawrence Abrams: Now that ransomware developers know that they can earn monstrous payouts from local cities and insurance policies, we see a new government agency, school district, or large company getting hit with a ransomware attack every day.
  • @tmclaughbos: A lot of serverless adoption will fail because organizations will push developers to assume more responsibility down the stack instead of forcing them to move up the stack closer to the business.
  • Lightstep: Google Cloud Functions’ reusable connection insertion makes the requests more than 4 times faster [than S3] both in region and cross region.
  • Henry A. Kissinger, Eric Schmidt, Daniel Huttenlocher: The evolution of the arms-control regime taught us that grand strategy requires an understanding of the capabilities and military deployments of potential adversaries. But if more and more intelligence becomes opaque, how will policy makers understand the views and abilities of their adversaries and perhaps even allies? Will many different internets emerge or, in the end, only one? What will be the implications for cooperation? For confrontation? As AI becomes ubiquitous, new concepts for its security need to emerge. The three of us differ in the extent to which we are optimists about AI. But we agree that it is changing human knowledge, perception, and reality—and, in so doing, changing the course of human history. We seek to understand it and its consequences, and encourage others across disciplines to do the same.
  • minesafetydisclosures: Visa’s business is all about scale. That’s because the company’s fixed costs are high, but the cost of processing a transaction is essentially zero. Said more simply, it takes a big upfront investment in computers, servers, personnel, marketing, and legal fees to run Visa. But those costs don’t increase as volume increases; i.e., they’re “fixed”. So as Visa processes more transactions through their network, profit swells. As a result, the company’s operating margin has increased from 40% to 65%. And the total expense per transaction has dropped from a dime to a nickel; of which only half of a penny goes to the processing cost. Both trends are likely to continue.
  • noobiemcfoob: Summarizing my views: MQTT seems as opaque as WebSockets without the benefits of being built on a very common protocol (HTTP) and being used in industries beyond just IoT. The main benefits proponents of MQTT argue for (low bandwidth, small libraries) don’t seem particularly true in comparison to HTTP and WebSockets.
  • erincandescent: It is still my opinion that RISC-V could be much better designed; though I will also say that if I was building a 32 or 64-bit CPU today I’d likely implement the architecture to benefit from the existing tooling.
  • Director Jon Favreau~  the plan was to create a virtual Serengeti in the Unity game engine, then apply live action filmmaking techniques to create the film — the “Lion King” team described this as a “virtual production process.”
  • Alex Heath: In confidential research Mr. Cunningham prepared for Facebook CEO Mark Zuckerberg, parts of which were obtained by The Information, he warned that if enough users started posting on Instagram or WhatsApp instead of Facebook, the blue app could enter a self-sustaining decline in usage that would be difficult to undo. Although such “tipping points” are difficult to predict, he wrote, they should be Facebook’s biggest concern. 
  • jitbit: Well, to be embarrassingly honest… We suck at pricing. We were offering “unlimited” plans to everyone until recently. And the “impressive names” like you mention, well, they mostly pay us around $250 a month – which used to be our “Enterprise” pricing plan with unlimited everything (users, storage, agents etc.) So I guess the real answer is – we suck at positioning and we suck at marketing. As the result – profits were REALLY low (Lesson learned – don’t compete on pricing). P.S. Couple of years ago I met Thomas from “FE International” at some conference, really experienced guy, who told me “dude, this is crazy, dump the unlimited plan like right now” so we did. So I guess technically we can afford a PaaS now…
  • 1e-9: The markets are kind of like a massive, distributed, realtime, ensemble, recursive predictor that performs much better than any one of its individual component algorithms could. The reason why shaving a few milliseconds (or even microseconds) can be beneficial is because the price discovery feedback loops get faster, which allows the system to determine a giant pricing vector that is more self-consistent, stable, and beneficial to the economy. It’s similar to how increasing the sample rate of a feedback control system improves performance and stability. Providers of such benefits to the markets get rewarded through profit.
  • @QuinnyPig: There’s something else afoot too. I fix cloud bills. If I offer $10k to save $100k people sign off. If I offer $10 million to save $100 million people laugh me out of the room. Large numbers are psychologically scary.
  • mrjn:  Is it worth paying $20K for any DB or DB support? If it would save you 1/10th of an engineer per year, it becomes immediately worth. That means, can you avoid 5 weeks of one SWE by using a DB designed to better suit your dataset? If the answer is yes (and most cases it is), then absolutely that price is worth. See my blog post about how much money it must be costing big companies building their graph layers. Second part is, is Dgraph worth paying for compared to Neo or others? Note that the price is for our enterprise features and support. Not for using the DB itself. Many companies run a 6-node or a 12-node distributed/replicated Dgraph cluster and we only learn that much later when they’re close to pushing it into production and need support. They don’t need to pay for it, the distributed/replicated/transactional architecture of Dgraph is all open source. How much would it cost if one were to run a distributed/replicated setup of another graph DB? Is it even possible, can it execute and perform well? And, when you add support to that, what’s the cost?
  • @codemouse: It’s halfway to 2020. At this point, if any of your strategy is continued investment into your data centers you’re doing it wrong. Yes migration may take years, but you’re not going to be doing #cloud or #ops better than @awscloud
  • hermitdev: Not Citibank, but previously worked for a financial firm that sold a copy of it’s back office fund administration stack. Large, on site deployment. It would take a month or two to make a simple DNS change so they could locate the services running on their internal network. The client was a US depository trust with trillions on deposit. No, I wont name any names. But getting our software installed and deployed was as much fun as extracting a tooth with a dull wood chisel and a mallet.
  • Insikt Group: Approximately 50% of all activity concerning ransomware on underground forums are either requests for any generic ransomware or sales posts for generic ransomware from lower-level vendors. We believe this reflects a growing number of low-level actors developing and sharing generic ransomware on underground forums.
  • Facebook: For classes of bugs intended for all or a wide variety of engineers on a given platform, we have gravitated toward a “diff time” deployment, where analyzers participate as bots in code review, making automatic comments when an engineer submits a code modification. Later, we recount a striking situation where the diff time deployment saw a 70% fix rate, where a more traditional “offline” or “batch” deployment (where bug lists are presented to engineers, outside their workflow) saw a 0% fix rate.
  • Andy Rachleff: Venture capitalists know that the thing that causes their companies to go out of business is lack of a market, not poor execution. So it’s a fool’s errand to back a company that proposes to do a ride-hailing service or renting a room or something as crazy as that. Again–how would you know if it’s going to work? So the venture industry outsourced that market risk to the angel community. The angel community thinks they won it away from the venture community, but nothing could be further from the truth, because it’s a sucker bet. It’s a horrible risk/reward. The venture capitalists said, “Okay, let the angels invest at a $5 million valuation and take all of that market risk. We’ll invest at a $50 million valuation. We have to pay up if it works.” Now they hope the company will be worth $5 billion to make the same return as they would have in the old model. Interestingly, there now are as many companies worth $5 billion today as there were companies worth $500 million 20 years ago, which is why the returns of the premier venture capital firms have stayed the same or even gone up.
  • imagetic: I dealt with a lot of high traffic live streaming video on Facebook for several years. We saw interaction rates decline almost 20x in a 3 year period but views kept increasing. Things just didn’t add up when the dust settled and we’d look at the stats. It wouldn’t be the least bit surprised if every stat FB has fed me was blown extremely out of proportion.
  • prism1234: If you are designing a small embedded system, and not a high performance general computing device, then you already know what operations your software will need and can pick what extensions your core will have. So not including a multiply by default doesn’t matter in this case, and may be preferred if your use case doesn’t involve a multiply. That’s a large use case for risc-v, as this is where the cost of an arm license actually becomes an issue. They don’t need to compete with a cell phone or laptop level cpu to still be a good choice for lots of devices.
  • oppositelock: You don’t have time to implement everything yourself, so you delegate. Some people now have credentials to the production systems, and to ease their own debugging, or deployment, spin up little helper bastion instances, so they don’t have to use 2FA each time to use SSH or don’t have to deal with limited-time SSH cert authorities, or whatever. They roll out your fairly secure design, and forget about the little bastion they’ve left hanging around, open to 0.0.0.0 with the default SSH private key every dev checks into git. So, any former employee can get into the bastion.
  • Lyft: Our tech stack comprises Apache Hive, Presto, an internal machine learning (ML) platform, Airflow, and third-party APIs.
  • Casey Rosenthal: It turns out that redundancy is often orthogonal to robustness, and in many cases it is absolutely a contributing factor to catastrophic failure. The problem is, you can’t really tell which of those it is until after an incident definitively proves it’s the latter.
  • Colm MacCárthaigh: There are two complementary tools in the chest that we all have these days, that really help combat Open Loops. The first is Chaos Engineering. If you actually deliberately go break things a lot, that tends to find a lot of Open Loops and make it obvious that they have to be fixed.
  • @eeyitemi: I’m gonna constantly remind myself of this everyday. “You can outsource the work, but you can’t outsource the risk.” @Viss 2019
  • Ben Grossman~ this could lead to a situation where filmmaking is less about traditional “filmmaking or storytelling,” and more about “world-building”: “You create a world where characters have personalities and they have motivations to do different things and then essentially, you can throw them all out there like a simulation and then you can put real people in there and see what happens.”
  • cheeze: I’m a professional dev and we own a decent amount of perl. That codebase is by far the most difficult to work in out of anything we own. New hires have trouble with it (nobody learns perl these days). Lots of it is next to unreadable.
  • Annie Lowrey: All that capital from institutional investors, sovereign wealth funds, and the like has enabled start-ups to remain private for far longer than they previously did, raising bigger and bigger rounds. (Hence the rise of the “unicorn,” a term coined by the investor Aileen Lee to describe start-ups worth more than $1 billion, of which there are now 376.) Such financial resources “never existed at scale before” in Silicon Valley, says Steve Blank, a founder and investor. “Investors said this: ‘If we could pull back our start-ups from the public market and let them appreciate longer privately, we, the investors, could take that appreciation rather than give it to the public market.’ That’s it.”
  • alexis_fr: I wonder if the human life calculation worked well this time. As far as I see, Boeing lost more than the sum of the human lives; they also lost reputation for everything new they’ve designed in the last 7 years being corrupted, and they also engulfed the reputation of FAA with them, whose agents would fit the definition of “corrupted” by any people’s definition (I know, they are not, they just used agents of Boeing to inspect Boeing because they were understaffed), and the FAA showed the last step of failure by not admitting that the plane had to be stopped until a few days after the European agencies. In other words, even in financial terms, it cost more than damages. It may have cost the entire company. They “DeHavailland”’ed their company. Ever heard of DeHavailland? No? That’s probably to do with their 4 successive deintegrating planes that “CEOs have complete trust in.” It just died, as a name. The risk is high.
  • Neil Gershenfeld: computer science was one of the worst things ever to happen to computers or science, why I believe that, and what that leads me to. I believe that because it’s fundamentally unphysical. It’s based on maintaining a fiction that digital isn’t physical and happens in a disconnected virtual world.
  • @benedictevans: Netflix and Sky both realised that a new technology meant you could pay vastly more for content than anyone expected, and take it to market in a new way. The new tech (satellite, broadband) is a crowbar for breaking into TV. But the questions that matter are all TV questions
  • @iamdevloper: Therapist: And what do we do when we feel like this? Me: buy a domain name for the side project idea we’ve had for 15 seconds. Therapist: No
  • @dvassallo: Step 1: Forget that all these things exist: Microservices, Lambda, API Gateway, Containers, Kubernetes, Docker. Anything whose main value proposition is about “ability to scale” will likely trade off your “ability to be agile & survive”. That’s rarely a good trade off. 4/25 Start with a t3.nano EC2 instance, and do all your testing & staging on it. It only costs $3.80/mo. Then before you launch, use something bigger for prod, maybe an m5.large (2 vCPU & 8 GB mem). It’s $70/mo and can easily serve 1 million page views per day.
  • PeteSearch: I believe we have an opportunity right now to engineer-in privacy at a hardware level, and set the technical expectation that we should design our systems to be resistant to abuse from the very start. The nice thing about bundling the audio and image sensors together with the machine learning logic into a single component is that we have the ability to constrain the interface. If we truly do just have a single pin output that indicates if a person is present, and there’s no other way (like Bluetooth or WiFi) to smuggle information out, then we should be able to offer strong promises that it’s just not possible to leak pictures. The same for speech interfaces, if it’s only able to recognize certain commands then we should be able to guarantee that it can’t be used to record conversations.
  • Murat: As I have mentioned in the previous blog post, MAD questions, Cosmos DB has operationalized a fault-masking streamlined version of replication via nested replica-sets deployed in fan-out topology. Rather than doing offline updates from a log, Cosmos DB updates database at the replicas online, in place, to provide strong consistent and bounded-staleness consistency reads among other read levels. On the other hand, Cosmos DB also maintains a change log by way of a witness replica, which serves several useful purposes, including fault-tolerance, remote storage, and snapshots for analytic workload.
  • grauenwolf: That’s where I get so frustrated. Far too often I hear “premature optimization” as a justification for inefficient code when doing it the right way would actually require the same or less effort and be more readable.
  • Murat: Leader – I tell you Paxos joke, if you accept me as leader. Quorum – Ok comrade. Leader – Here is joke! (*Transmits joke*) Quorum – Oookay… Leader – (*Laughs* hahaha). Now you laugh!! Quorum – Hahaha, hahaha.
  • Manmax75: The amount of stories I’ve heard from SysAdmins who jokingly try to access a former employers network with their old credentials only to be shocked they still have admin access is a scary and boggling thought.
  • @dougtoppin: Fargate brings significant opportunity for cost savings and to get the maximum benefit the minimal possible number of tasks must be running to handle your capacity needs. This means quickly detecting request traffic, responding just as quickly and then scaling back down.
  • @evolvable: At a startup bank we got management pushback when revealing we planned to start testing in production – concerns around regulation and employees accessing prod. We changed the name to “Production Verification”. The discussion changed to why we hadn’t been doing it until now. 
  • @QuinnyPig: I’m saying it a bit louder every time: @awscloud’s data transfer pricing is predatory garbage. I have made hundreds of thousands of consulting dollars straightening these messes out. It’s unconscionable. I don’t want to have to do this for a living. To be very clear, it’s not that the data transfer pricing is too expensive, it’s that it’s freaking inscrutable to understand. If I can cut someone’s bill significantly with a trivial routing change, that’s not the customer’s fault.
  • @PPathole: Alternative Big O notations: O(1) = O(yeah) O(log n) = O(nice) O(nlogn) = O(k-ish) O(n) = O(ok) O(n²) = O(my) O(2ⁿ) = O(no) O(n^n) = O(f*ck) O(n!) = O(mg!)
  • Brewster Kahle: There’s only a few hackers I’ve known like Richard Stallman, he’d write flawless code at typing speed. He worked himself to the bone trying to keep up with really smart former colleagues who had been poached from MIT. Carpal tunnel, sleeping under the desk, really trying hard for a few years and it was killing him. So he basically says I give up, we’re going to lose the Lisp machine. It was going into this company that was flying high, it was going to own the world, and he said it was going to die, and with it the Lisp machine. He said all that work is going to be lost, we need a way to deal with the violence of forking. And he came up with the GNU public license. The GPL is a really elegant hack in the classic sense of a hack. His idea of the GPL was to allow people to use code but to let people put it back into things. Share and share alike.

Useful Stuff:

  • It’s probably not a good idea to start a Facebook poll on the advisability of your pending nuptials a day before the wedding. But it is very funny and disturbingly plausible. Made Public. Another funny/sad one is using a ML bot to “deal with” phone scams. The sad part will be when both sides are just AIs trying to socially engineer each other and half the world’s resources become dedicated to yet another form of digital masturbation. Perhaps we should just stop the MADness?
  • Urgent/11 Zero Day Vulnerabilities Impacting VxWorks, the Most Widely Used Real-Time Operating System (RTOS). I read this with special interest because I’ve used VxWorks on several projects. Not once do I ever remember anyone saying “I wonder if the TCP/IP stack has security vulnerabilities?” We looked at licensing costs, board support packages, device driver support, tool chain support, ISR service latencies, priority inversion handling, task switch determinacy, etc. Why did we never think of these kinds of potential vulnerabilities? One reason is social proof. Surely all these other companies use VxWorks, it must be good, right? Another reason is VxWorks is often used within a secure perimeter. None of the network interfaces are supposed to be exposed to the internet, so remote code execution is not part of your threat model. But in reality you have no idea if a customer will expose a device to the internet. And you have no idea if later product enhancements will place the device on the internet. Since it seems all network devices expand until they become a router, this seems a likely path to Armageddon. At that point nobody is going to requalify their entire toolchain. That just wouldn’t be done in practice. VxWorks is dangerous because everything is compiled into a single image that boots and runs, much like a unikernel. At least when I used it that was the case. VxWorks is basically just a library you link into your application that provides OS functionality. You write the boot code, device drivers, and other code to make your application work. So if there’s a remote code execution bug it has access to everything. And a lot of these images are built into ROM, so they aren’t upgradeable. And even if the images are upgradeable in EEPROM or flash, how many people will actually do that? Unless you pay a lot of money you do not get the source to VxWorks. You just get libraries and header files. So you have no idea what’s going on in the network stack. I’m surprised VxWorks never tested their stack against a fuzzing kind of attack. That’s a great way to find bugs in protocols. Though nobody can define simplicity, many of the bugs were in the handling of the little-used TCP Urgent Pointer feature. Anyone surprised that code around this is broken? Who uses it? It shouldn’t be in the stack at all. Simple to say, harder to do.
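    • The fuzzing point is easy to act on, even informally. Below is a minimal sketch, assuming a hypothetical lab device at 192.0.2.10, that throws TCP segments with randomized Urgent Pointer values and junk payloads at a stack using scapy. A serious effort would use a protocol-aware or coverage-guided fuzzer and monitor the target for crashes, but even this kind of blind probing tends to shake out bugs in rarely exercised code paths:

      import random
      from scapy.all import IP, TCP, Raw, send

      TARGET = "192.0.2.10"   # hypothetical device under test on an isolated lab network
      PORT = 23

      for _ in range(1000):
          junk = bytes(random.getrandbits(8) for _ in range(random.randint(0, 64)))
          pkt = (IP(dst=TARGET)
                 / TCP(dport=PORT, flags="PAU", urgptr=random.randint(0, 0xFFFF))
                 / Raw(load=junk))
          send(pkt, verbose=False)   # needs root; watch the target for hangs or resets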
  • JuliaCon 2019 videos are now available. You might like Keynote: Professor Steven G. Johnson and The Unreasonable Effectiveness of Multiple Dispatch
  • CERN is Migrating to open-source technologies. Microsoft wants too much for their licenses so CERN is giving MS the finger.
  • Memory and Compute with Onur Mutlu:
    • The main problem is that DRAM latency is hardly improving at all. From 1999 to 2017, DRAM capacity has increased by 128x, bandwidth by 20x, but latency only by 1.3x! This means that more and more effort has to be spent tolerating memory latency.  But what could be done to actually improve memory latency?
    • You could “easily” get a 30% latency improvement by having DRAM chips provide a bit more precise information to the memory controller about actual latencies and current temperatures.
    • Another concept to truly break the memory barrier is to move the compute to the memory. Basically, why not put the compute operations in memory?  One way is to use something like High-Bandwidth Memory (HBM) and shorten the distance to memory by stacking logic and memory.
    • Another rather cool (but also somewhat limited) approach is to actually use the DRAM cells themselves as a compute engine. It turns out that you can do copy, clear, and even logic ops on memory rows by using the existing way that DRAMs are built and adding a tiny amount of extra logic.
  • Want to make something in hardware? Like Pebble, Dropcam, or Ring. Who you gonna call? Dragon Innovation. Hear how on the AMP Hour podcast episode #451 – An Interview with Scott Miller
    • Typical customers build between 5k and 1 million units, but they will talk with you at 100 units. Customers usually start small. They’ve built a big toolbox for IoT, so they don’t need to reinvent the wheel every time; they have designs for sensing, processing, electronics on the edge, radios, and all the different security layers. They can deploy quickly with little customization.
    • Dragon is moving into doing the design, manufacturing, packaging, issuing all POs, and installation support. They call this Product as a Service (PaaS)—a full end-to-end provider. Say you have a sensor to determine when avocados are ripe: you would pay per sensor per month, or maybe per avocado, instead of making a one-time sale. Seeing more non-traditional players getting into the IoT space with different revenue models, Dragon has an opportunity to innovate on its business model. 
    • Consumer is dying and industrial is growing. A trend they are seeing in the US is a contraction of business-to-consumer startups in the hardware space, but an expansion of industrial IoT. There have been a bunch of high-profile bankruptcies in the consumer space (Anki, Jibo).
    • Europe is growing. Overall huge growth in industrial startups across Europe. Huge number of capable factories in the EU. They get feet on the ground to find and qualify factories. They have over 2000 factories in their database. 75% in China, increasingly more in the EU and the US. 
    • Factories are going global. Seeing a lot of companies driven out of China by the 25% tariffs, moving into Asian pacific countries like Taiwan, Singapore, Vietnam, Indonesia, Malaysia. Coming up quickly, but not up to China’s level yet. Dragon will include RFQs on a global basis, including factories from the US, China, EU, Indonesia, Vietnam, to see what the landed cost is as a function of geography. 
    • Factories are different in different countries. In China factories are vertically integrated: mold making, injection molding, final assembly, test, and packaging, all under one roof, which is very convenient. In the US and Europe factories are more horizontal, so it takes a lot more effort to put together your supply chain. As an example of the degree of vertical integration, one factory in China would make its own paint and cardboard. 
    • Automation is huge in China. Chinese labor rates are on average 5 to 6 dollars an hour, depending on region, factory, and training, so the focus is on automation. One factory they worked with had 100,000 workers; now it has 30,000 because of automation.
    • Automation is different in China. Automation in China is bottom-up: they’ll build a simple robot that attaches to a soldering iron and will solder the leads. In the US it is top-down: build a huge, fully functioning worker that can do anything instead of a task-specific robot. China is really good at building stuff, so they build task-specific robots to make their processes more efficient. Since products are always changing, this allows them to stay nimble. 
    • Also Strange Parts: Design for Manufacturing Course, How I Made My Own iPhone – in China.
  • BigQuery best practices: Controlling costs: Query only the columns that you need; Don’t run queries to explore or preview table data; Before running queries, preview them to estimate costs; Use the query validator; Use the maximum bytes billed setting to limit query costs; Do not use a LIMIT clause as a method of cost control; Create a dashboard to view your billing data so you can make adjustments to your BigQuery usage. Also consider streaming your audit logs to BigQuery so you can analyze usage patterns; Partition your tables by date; If possible, materialize your query results in stages; If you are writing large query results to a destination table, use the default table expiration time to remove the data when it’s no longer needed; Use streaming inserts only if your data must be immediately available.
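    • Two of those tips, previewing to estimate costs and capping maximum bytes billed, are one-liners in the google-cloud-bigquery Python client. A small sketch (the table name and the 10 GB cap are made-up examples):

      from google.cloud import bigquery

      client = bigquery.Client()
      sql = "SELECT user_id, event_ts FROM `myproject.analytics.events` WHERE event_date = '2019-08-01'"

      # Dry run: reports how many bytes the query would scan without running (or billing) it.
      dry = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False))
      print(f"Would scan {dry.total_bytes_processed / 1e9:.2f} GB")

      # Hard cap: the job fails instead of billing for more than 10 GB scanned.
      capped = client.query(sql, job_config=bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024 ** 3))
      rows = list(capped.result())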
  • Boeing has changed a lot over the years. Once upon a time I worked on a project with Boeing and the people were excellent. This is something I heard: “The changes can be attributed to the influence of the McDonnell family who maintain extremely high influence through their stock shares resulting from the merger. It has been gradually getting better recently but still a problem for those inside who understand the real potential impact.”
  • Maybe we are all just random matrices? What Is Universality? It turns out there are deep patterns in complex correlated systems that lie somewhere between randomness and order. They arise from components that interact and repel one another. Do such patterns exist in software systems? Also, Bubble Experiment Finds Universal Laws
  • PID Loops and the Art of Keeping Systems Stable
    • I see a lot of places where control theory is directly applicable but rarely applied. Auto-scaling and placement are really obvious examples, we’re going to walk through some, but another is fairness algorithms. A really common fairness algorithm is how TCP achieves fairness. You’ve got all these network users and you want to give them all a fair slice. Turns out that a PID loop is what’s happening. In system stability, how do we absorb errors, recover from those errors? 
    • Something we do in CloudFront is we run a control system. We’re constantly measuring the utilization of each site and depending on that utilization, we figure out what’s our error, how far are we from optimized? We change the mass or radius of effect of each site, so that at our really busy time of day, really close to peak, it’s servicing everybody in that city, everybody directly around it drawing those in, but that at our quieter time of day can extend a little further and go out. It’s a big system of dynamic springs all interconnected, all with PID loops. It’s amazing how optimal a system like that can be, and how applying a system like that has increased our effectiveness as a CDN provider. 
    • A surprising number of control systems are just like this, they’re just Open Loops. I can’t count the number of customers I’ve gone through control systems with and they told me, “We have this system that pushes out some states, some configuration and sometimes it doesn’t do it.” I find that scary, because what it’s saying is nothing’s actually monitoring the system. Nothing’s really checking that everything is as it should be. My classic favorite example of this as an Open Loop process, is certificate rotation. I happened to work on TLS a lot, it’s something I spent a lot of my time on. Not a week goes by without some major website having a certificate outage.
    • We have two observability systems at AWS, CloudWatch, and X-Ray. One of the things I didn’t appreciate until I joined AWS – I was a bit going on like Charlie and the chocolate factory, and seeing the insides. I expected to see all sorts of cool algorithms and all sorts of fancy techniques and things that I just never imagined. It was a little bit of that, there was some of that once I got inside working, but mostly what I found was really mundane, people were just doing a lot of things at scale that I didn’t realize. One of those things was just the sheer volume of monitoring. The number of metrics we keep on, every single host, every single system, I still find staggering.
    • Exponential Back-off is a really strong example. Exponential Back-off is basically an integral: an error happens and we retry; a second later if that fails, then we wait. Rate limiters are like derivatives: they’re just rate estimators of what’s going on, deciding what to let in and what to let out. We’ve built both of these into the AWS SDKs. We’ve got other back pressure strategies too, we’ve got systems where servers can tell clients, “Back off, please, I’m a little busy right now,” all those things working together. If I look at system design and it doesn’t have any of this, if it doesn’t have exponential back-off, if it doesn’t have rate-limiters in some place, if it’s not able to fight some power-law that I think might arise due to errors propagating, that tells me I need to be a bit more worried and start digging deeper. (A minimal back-off sketch appears at the end of this list.)
    • I like to watch out for edge triggering in systems, it tends to be an anti-pattern. One reason is because edge triggering seems to imply a modal behavior. You cross the line, you kick into a new mode, that mode is probably rarely tested and it’s now being kicked into at a time of high stress, that’s really dangerous. Your system has to be idempotent, if you’re going to build an idempotent system, you might as well make a level-triggered system in the first place, because generally, the only benefit of building an edge-triggered system is it doesn’t have to be idempotent.
    • There is definitely tension between stability and optimality, and in general, the more finely-tuned you want to make a system to achieve absolute optimality, the more risk you are of being able to drive it into an unstable state. There are people who do entire PIDs on nothing else then finding that balance for one system. Oil refineries are a good example, where the oil industry will pay people a lot of money just to optimize that, even very slightly. Computer Science, in my opinion, and distributed systems, are nowhere near that level of advanced control theory practice yet. We have a long way to go. We’re still down at the baby steps of, “We’ll at least measure it.”
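    • As a concrete illustration of the back-off point above, here is a minimal Python sketch of capped exponential back-off with full jitter. The function name, limits, and retry-on-any-exception policy are illustrative assumptions, not the AWS SDKs’ actual internals:

      import random
      import time

      def call_with_backoff(operation, max_attempts=8, base=0.1, cap=10.0):
          # Retry `operation` (any callable that raises on a retryable failure)
          # with capped exponential back-off plus full jitter.
          for attempt in range(max_attempts):
              try:
                  return operation()
              except Exception:
                  if attempt == max_attempts - 1:
                      raise
                  # Exponential growth, capped, randomized so a fleet of clients
                  # doesn't retry in lockstep and amplify the original failure.
                  time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))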
  • Re:Inforce 2019 videos are now available.
  • Top Seven Myths of Robust Systems: The number one myth we hear out in the field is that if a system is unreliable, we can fix that with redundancy; rather than trying to simplify or remove complexity, learn to live with it. Ride complexity like a wave. Navigate the complexity; The adaptive capacity to improvise well in the face of a potential system failure comes from frequent exposure to risk; Both sides — the procedure-makers and the procedure-not-followers — have the best of intentions, and yet neither is likely to believe that about the other; Unfortunately it turns out catastrophic failures in particular tend to be a unique confluence of contributing factors and circumstances, so protecting yourself from prior outages, while it shouldn’t hurt, also doesn’t help very much; Best practices aren’t really a knowable thing; Don’t blame individuals. That’s the easy way out, but it doesn’t fix the system. Change the system instead. 
  • They grow up so slow. What’s new in JavaScript: Google I/O 2019 Summary
  • From a rough calculation we saw about 40% decrease in the amount of CPU resources used. Overall, we saw latency stabilize for both avg and max p99. Max p99 latency also decreased a bit. Safely Rewriting Mixpanel’s Highest Throughput Service in Golang. Mixpanel moved from Python to Go for their data collection API. They had already migrated the Python API to use the Google Load Balancer to route messages to a Kubernetes pod on Google Cloud, where an Envoy container load-balanced between eight Python API containers. The Python API containers then submitted the data to a Google Pub/Sub queue via a pubsub sidecar container that had a kestrel interface. To enable testing against live traffic, we created a dedicated setup. The setup was a separate Kubernetes pod running in the same namespace and cluster as the API deployments. The pod ran an open source API correctness tool, Diffy, along with copies of the old and new API services. Diffy is a service that accepts HTTP requests, and forwards them to two copies of an existing HTTP service and one copy of a candidate HTTP service. One huge improvement is we only need to run a single API container per pod. 
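    • The Diffy idea is simple enough to sketch: mirror every request to the existing service and the candidate rewrite, return only the existing service’s response, and log any divergence. A toy version with hypothetical internal URLs (Diffy itself also filters out “noise” fields like timestamps, which this skips):

      import requests

      PRIMARY = "http://old-api.internal:8080"    # existing service (hypothetical URL)
      CANDIDATE = "http://new-api.internal:8080"  # rewritten service (hypothetical URL)

      def shadow_get(path, params=None):
          old = requests.get(PRIMARY + path, params=params, timeout=5)
          new = requests.get(CANDIDATE + path, params=params, timeout=5)
          if (old.status_code, old.text) != (new.status_code, new.text):
              print(f"MISMATCH {path}: {old.status_code} vs {new.status_code}")
          return old   # callers only ever see the existing service's response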
  • Satisfactory: Network Optimizations: It would be a big gain to stop replicating the inventory when it’s not viewed, which is essentially what we did, but the method of doing so was a bit complicated and required a lot of rework…Doing this also helps to reduce CPU time, as an inventory is a big state to compare, and look for changes in. If we can reduce that to a maximum of 4x the number of players it is a huge gain, compared to the hundreds, if not thousands, that would otherwise be present in a big base…There is, of course, a trade-off. As I mentioned there is a chance the inventory is not there when you first open to view it, as it has yet to arrive over the network…In this case the old system actually varied in size but landed around 48 bytes per delta, compared to the new system of just 3 bytes…On top of this, we also reduced how often a conveyor tries to send an update to just 3 times a second compared to the previous of over 20…the accuracy of item placements on the conveyors took a small hit, but we have added complicated systems in order to compensate for that…we’ve noticed that the biggest issue for running smooth multiplayer in large factories is not the network traffic anymore, it’s rather the general performance of the PC acting as a server.
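    • A toy sketch (not the game’s actual code) of the two ideas above, replicating an inventory only while someone is viewing it and sending small throttled deltas instead of the full state every tick:

      import time

      class ReplicatedInventory:
          def __init__(self, slots=16, send_hz=3):
              self.state = [0] * slots            # item count per slot
              self.last_sent = list(self.state)
              self.viewers = set()                # players currently viewing this inventory
              self.min_interval = 1.0 / send_hz   # throttle: roughly 3 updates per second
              self.last_send_time = 0.0

          def maybe_replicate(self, send):
              if not self.viewers:                # nobody is looking, so send nothing at all
                  return
              now = time.monotonic()
              if now - self.last_send_time < self.min_interval:
                  return
              delta = [(i, v) for i, (v, old) in enumerate(zip(self.state, self.last_sent))
                       if v != old]
              if delta:                           # only changed slots go on the wire
                  send(delta)
                  self.last_sent = list(self.state)
                  self.last_send_time = now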
  • MariaDB vs MySQL Differences: MariaDB is fully GPL licensed while MySQL takes a dual-license approach. Each handle thread pools in a different way. MariaDB supports a lot of different storage engines. In many scenarios, MariaDB offers improved performance.
  • Our pySpark pipeline churns through tens of billions of rows on a daily basis. Calculating 30 billion speed estimates a week with Apache Spark: Probes generated from the traces are matched against the entire world’s road network. At the end of the matching process we are able to assign each trace an average speed, a 5 minute time bucket and a road segment. Matches on the same road that fall within the same 5 minute time bucket are aggregated to create a speed histogram. Finally, we estimate a speed for each aggregated histogram which represents our prediction of what a driver will experience on a road at a given time of the week…On a weekly basis, we match on average 2.2 billion traces to 2.3 billion roads to produce 5.4 billion matches. From the matches, we build 51 billion speed histograms to finally produce 30 billion speed estimates…The first thing we spent time on was designing the pipeline and schemas of all the different datasets it would produce. In our pipeline, each pySpark application produces a dataset persisted in a hive table readily available for a downstream application to use…Instead of having one pySpark application execute all the steps (map matching, aggregation, speed estimation, etc.) we isolated each step to its own application…We favored normalizing our tables as much as possible and getting to the final traffic profiles dataset through relationships between relevant tables…Partitioning makes querying part of the data faster and easier. We partition all the resulting datasets by both a temporal and spatial dimension. 
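    • A rough pySpark sketch of the bucket-and-aggregate step described above, assuming a matches DataFrame with road_id, timestamp, and speed_kmh columns (illustrative names, not the article’s schema), with an approximate median standing in for their histogram-based speed estimation:

      from pyspark.sql import SparkSession
      import pyspark.sql.functions as F

      spark = SparkSession.builder.appName("speed-estimates").getOrCreate()
      matches = spark.read.parquet("s3://bucket/matches/")   # hypothetical path

      estimates = (
          matches
          .withColumn("bucket", F.window("timestamp", "5 minutes"))   # 5-minute time bucket
          .groupBy("road_id", "bucket")
          .agg(F.expr("percentile_approx(speed_kmh, 0.5)").alias("estimated_speed"),
               F.count("*").alias("n_observations"))
      )

      # The real pipeline also partitions output tables by temporal and spatial dimensions.
      estimates.write.parquet("s3://bucket/speed-estimates/")   # hypothetical path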
  • Do not read this unless you can become comfortable with the feeling that everything you’ve done in your life is trivial and vainglorious. Morphogenesis for the Design of Design
    • One of my students built and runs all the computers Facebook runs on, one of my students used to run all the computers Twitter runs on—this is because I taught them to not believe in computer science. In other words, their job is to take billions of dollars, hundreds of megawatts, and tons of mass, and make information while also not believing that the digital is abstracted from the physical. Some of the other things that have come out from this lineage were the first quantum computations, or microfluidic computing, or part of creating some of the first minimal cells.
    • The Turing machine was never meant to be an architecture. In fact, I’d argue it has a very fundamental mistake, which is that the head is distinct from the tape. And the notion that the head is distinct from the tape—meaning, persistence of tape is different from interaction—has persisted. The computer in front of Rod Brooks here is spending about half of its work just shuttling from the tape to the head and back again.
    • There’s a whole parallel history of computing, from Maxwell to Boltzmann to Szilard to Landauer to Bennett, where you represent computation with physical resources. You don’t pretend digital is separate from physical. Computation has physical resources. It has all sorts of opportunities, and getting that wrong leads to a number of false dichotomies that I want to talk through now. One false dichotomy is that in computer science you’re taught many different models of computation and adherence, and there’s a whole taxonomy of them. In physics there’s only one model of computation: A patch of space occupies space, it takes time to transit, it stores state, and states interact—that’s what the universe does. Anything other than that model of computation is physics and you need epicycles to maintain the fiction, and in many ways that fiction is now breaking.
    • We did a study for DARPA of what would happen if you rewrote from scratch a computer software and hardware so that you represented space and time physically.
    • One of the places that I’ve been involved in pushing that is in exascale high-performance computing architecture, really just a fundamental do-over to make software look like hardware and not to be in an abstracted world.
    • Digital isn’t ones and zeroes. One of the hearts of what Shannon did is threshold theorems. A threshold theorem says I can talk to you as a wave form or as a symbol. If I talk to you as a symbol, if the noise is above a threshold, you’re guaranteed to decode it wrong; if the noise is below a threshold, for a linear increase in the physical resources representing the symbol there’s an exponential reduction in the fidelity to decode it. That exponential scaling means unreliable devices can operate reliably. The real meaning of digital is that scaling property. But the scaling property isn’t one and zero; it’s the states in the system. 
    • if you mix chemicals and make a chemical reaction, a yield of a part per 100 is good. When the ribosome—the molecular assembler that makes your proteins—elongates, it makes an error of one in 10^4. When DNA replicates, it adds one extra error-correction step, and that makes an error of one in 10^8, and that’s exactly the scaling of threshold theorem. The exponential complexity that makes you possible is by error detection and correction in your construction. It’s everything Shannon and von Neumann taught us about codes and reconstruction, but it’s now doing it in physical systems.
    • One of the projects I’m working on in my lab that I’m most excited about is making an assembler that can assemble assemblers from the parts that it’s assembling—a self-reproducing machine. What it’s based on is us. 
    • If you look at scaling coding construction by assembly, ribosomes are slow—they run at one hertz, one amino acid a second—but a cell can have a million, and you can have a trillion cells. As you were sitting here listening, you’re placing 10^18 parts a second, and it’s because you can ring up this capacity of assembling assemblers. The heart of the project is the exponential scaling of self-reproducing assemblers.
    • As we work on the self-reproducing assembler, and writing software that looks like hardware that respects geometry, they meet in morphogenesis. This is the thing I’m most excited about right now: the design of design. Your genome doesn’t store anywhere that you have five fingers. It stores a developmental program, and when you run it, you get five fingers. It’s one of the oldest parts of the genome. Hox genes are an example. It’s essentially the only part of the genome where the spatial order matters. It gets read off as a program, and the program never represents the physical thing it’s constructing. The morphogenes are a program that specifies morphogens that do things like climb gradients and symmetry break; it never represents the thing it’s constructing, but the morphogens then following the morphogenes give rise to you.
    • What’s going on in morphogenesis, in part, is compression. A billion bases can specify a trillion cells, but the more interesting thing that’s going on is almost anything you perturb in the genome is either inconsequential or fatal. The morphogenes are a curated search space where rearranging them is interesting—you go from gills to wings to flippers. The heart of success in machine learning, however you represent it, is function representation. The real progress in machine learning is learning representation. 
    • We’re at an interesting point now where it makes as much sense to take seriously that scaling as it did to take Moore’s law scaling in 1965 when he made his first graph. We started doing these FAB labs just as outreach for NSF, and then they went viral, and they let ordinary people go from consumers to producers. It’s leading to very fundamental things about what is work, what is money, what is an economy, what is consumption.
    • Looking at exactly this question of how a code and a gene give rise to form. Turing and von Neumann both completely understood that the interesting place in computation is how computation becomes physical, how it becomes embodied and how you represent it. That’s where they both ended their life. That’s neglected in the canon of computing.
    • If I’m doing morphogenesis with a self-reproducing system, I don’t want to then just paste in some lines of code. The computation is part of the construction of the object. I need to represent the computation in the construction, so it forces you to be able to overlay geometry with construction.
    • Why align computer science and physical science? There are at least five reasons for me. Only lightly is it philosophical. It’s the cracks in the matrix. The matrix is cracking. 1) The fact that whoever has their laptop open is spending about half of its resources shuttling information from memory transistors to processor transistors even though the memory transistors have the same computational power as the processor transistors is a bad legacy of the EDVAC. It’s a bit annoying for the computer, but when you get to things like an exascale supercomputer, it breaks. You just can’t maintain the fiction as you push the scaling. The resource cost in very large-scale computing of maintaining the fiction so the programmers can pretend it’s not true is getting just so painful that you need to redo it. In fact, if you look down in the trenches, things like emerging ways to do very large-scale GPU programming are beginning to inch in that direction. So, it’s breaking in performance.
    •  What’s interesting is a lot of the things that are hard—for example, in parallelization and synchronization—come for free. By representing time and space explicitly, you don’t need to do the annoying things like thread synchronization and all the stuff that goes into parallel programming.
    • Communication degraded with distance. Along came Shannon. We now have the Internet. Computation degraded with time. The last great analog computer work was Vannevar Bush’s differential analyzer. One of the students working on it was Shannon. He was so annoyed that he invented our modern digital notions in his Master’s thesis to get over the experience of working on the differential analyzer.
    • When you merge communication with computation with fabrication, it’s not there’s a duopoly of communication and computation and then over here is manufacturing; they all belong together. The heart of how we work is this trinity of communication plus computation and fabrication, and for me the real point is merging them.
    • I almost took over running research at Intel. It ended up being a bad idea on both sides, but when I was talking to them about it, I was warned off. It was like the godfather: “You can do that other stuff, but don’t you dare mess with the mainline architecture.” We weren’t allowed to even think about that. In defense of them, it’s billions and billions of dollars investment. It was a good multi-decade reign. They just weren’t able to do it. 
    • Again, the embodiment of everything we’re talking about, for me, is the morphogenes—the way evolution searches for design by coding for construction. And they’re the oldest part of the genome. They were invented a very long time ago and nobody has messed with them since.
    • Get over digital and physical are separate; they can be united. Get over analog as separate from digital; there’s a really profound place in between. We’re at the beginning of fifty years of Moore’s law but for the physical world. We didn’t talk much about it, but it has the biggest impact of anything I know if anybody can make anything.

Soft Stuff:

  • paypal/hera (article): Hera multiplexes connections for MySQL and Oracle databases. It supports sharding the databases for horizontal scaling. It is a data access gateway that PayPal uses to scale database access for hundreds of billions of SQL queries per day. Additionally, HERA improves database availability through sophisticated protection mechanisms and provides application resiliency through transparent traffic failover. HERA is now available outside of PayPal as an Apache 2-licensed project.
  • zerotier/lf: a fully decentralized fully replicated key/value store. LF is built on a directed acyclic graph (DAG) data model that makes synchronization easy and allows many different security and conflict resolution strategies to be used. One way to think of LF’s DAG is as a gigantic conflict-free replicated data type (CRDT). Proof of work is used to rate limit writes to the shared data store on public networks and as one thing that can be taken into consideration for conflict resolution. 
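    • For readers new to CRDTs, the “gigantic conflict-free replicated data type” framing is easier to see with a tiny example. This is a generic last-writer-wins register sketch, not LF’s actual data model: merge is commutative, associative, and idempotent, so replicas can sync in any order and still converge.

      class LWWRegister:
          # Generic last-writer-wins register CRDT (illustrative, unrelated to LF's API).
          def __init__(self, node_id):
              self.node_id = node_id
              self.value = None
              self.stamp = (0, node_id)            # (logical time, node id as tiebreaker)

          def set(self, value, logical_time):
              self.value = value
              self.stamp = (logical_time, self.node_id)

          def merge(self, other):
              if other.stamp > self.stamp:         # the higher stamp wins on every replica
                  self.value, self.stamp = other.value, other.stamp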
  • pahud/fargate-fast-autoscaling: This reference architecture demonstrates how to build AWS Fargate workload that can detect the spiky traffic in less than 10 seconds followed by an immediate horizontal autoscaling.
  • ailidani/paxi: Paxi is the framework that implements WPaxos and other Paxos protocol variants. Paxi provides most of the elements that any Paxos implementation or replication protocol needs, including network communication, state machine of a key-value store, client API and multiple types of quorum systems.
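    • Of the pieces Paxi provides, quorum systems are the easiest to show in a couple of lines. A generic majority-quorum check (the textbook idea, not Paxi’s Go API): any two majorities intersect, which is what keeps Paxos-family protocols from accepting two conflicting decisions.

      def is_quorum(acks, n):
          # True when the set of acknowledging replicas is a majority of n replicas.
          return len(set(acks)) >= n // 2 + 1

      n = 5
      assert is_quorum({"a", "b", "c"}, n)     # 3 of 5 is a quorum
      assert not is_quorum({"a", "b"}, n)      # 2 of 5 is not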

Pub Stuff:

from High Scalability

Stuff The Internet Says On Scalability For July 26th, 2019

Stuff The Internet Says On Scalability For July 26th, 2019

Wake up! It’s HighScalability time—once again:


 The Apollo 11 guidance computer repeatedly crashed on descent. On earth computer scientists had just 13 hours to debug the problem. They did. It was CPU overload because of a wrong setting. Some things never change! 

Do you like this sort of Stuff? I’d greatly appreciate your support on Patreon. I wrote Explain the Cloud Like I’m 10 for people who need to understand the cloud. And who doesn’t these days? On Amazon it has 52 mostly 5 star reviews (120 on Goodreads). They’ll learn a lot and hold you in even greater awe.

Number Stuff:

  • $11 million: Google fine for discriminating against not young people. 
  • 55,000+: human-labeled 3D annotated frames, a drivable surface map, and an underlying HD spatial semantic map in Lyft’s Level 5 Dataset.
  • 645 million: LinkedIn members with 4.5 trillion daily messages pumping through Kafka.
  • 49%: drop in Facebook’s net income. A fine result.
  • 50 ms: interval for repeatedly randomizing elements of the code that attackers need access to in order to compromise the hardware. 
  • 7.5 terabytes: hacked Russian data.
  • 5%: increase in Tinder shares after bypassing Google’s 30% app store tax.
  • $21.7 billion: Apple’s profit from other people’s apps.
  • 5 billion: records in 102 datasets in the UCR STAR spatio-temporal index.
  • 200x: speediest quantum operation yet. 
  • 45%: US Fortune 500 companies founded by immigrants and their kids.
  • 70%: hard drive failures caused by media damage, including full head crashes. 
  • 12: lectures on everything Buckminster Fuller knew.
  • 600,000: satellite images taken in a single day used to create a picture of Earth
  • 149: hours between Airbus reboots needed to mask software problems. 

Quotable Stuff:

  • @mdowd: Now that Alan Turing is on the 50 pound note, they should rename ATMs to “Turing Machines”
  • @hacks4pancakes: Another real estate person: “I tried using a password manager, but then my credit card was stolen and my annual payment for it failed – and they cut off access to all my passwords during a meeting.”
  • @juliaferraioli: “Our programs were more complex because Go was so simple” — @_rsc on reshaping #Golang at #gophercon
  • Jason Shen: A YC founder once said to me that he found little correlation between the success of a YC company and how hard their founders worked. That is to say, among a group of smart, ambitious entrepreneurs who were all already working pretty hard, the factors that made the biggest difference were things like timing, strategy, and relationships. Which is why Reddit cofounder-turned-venture capitalist Alexis Ohanian now warns against the “utter bullshit” of this so-called hustle porn mentality.
  • Dale Markowitz: A successful developer today needs to be as good at writing code as she is at extinguishing that code when it explodes.
  • @allspaw: I’m with @snowded on this. Taleb’s creation of ‘antifragile’ is what the Resilience Engineering community has referred to as resilience all along.
  • Timothy Lee: Heath also said that the interviewer assumed that the word “byte” meant eight bits. In his view, this also revealed age bias. Modern computer systems use 8-bit bytes, but older computer systems could have byte sizes ranging from six to 40 bits.
  • General Patton: If everybody is thinking alike, somebody isn’t thinking
  • panpanna: Architecturally, microkernels and unikernels are direct opposites. Unikernels strive to minimize communication complexity (and size and footprint but that is not relevant to this discussion) by putting everything in the same address space. This gives them many advantages among which performance is often mentioned but ease of development is IMHO equally important. However, the two are not mutually exclusive. Unikernels often run on top of microkernels or hypervisors.
  • Wayne Ma: The team ultimately put a stop to most [Apple] device leaks—and discovered some audacious attempts, such as some factory workers who tried to build a tunnel to transport components to the outside without security spotting them.
  • Dr. Steven Gundry: I think, uploaded most of our information processing to our bacterial cloud that lives in us, on us, around us, because they have far more genes than we do. They reproduce virtually instantaneously, and so they can do fantastic information processing. Many of us think that perhaps lifeforms on Earth, particularly animal lifeforms, exist as a home for bacteria to prosper on Earth. 
  • @UdiDahan: I’ve been gradually coming around to the belief that any “good” code base lives long enough for the environment around it to change in such a way that its architecture is no longer suitable, making it then “bad”. This would probably be as equally true of FP as of OOP.
  • @DmitryOpines: No one “runs” a crypto firm Holger, we are merely the mortal agents through whose minor works the dream of disaggregated ledger currency manifests on this most unworthy of Prime Material Planes.
  • Philip Ball: One of the most remarkable ideas in this theoretical framework is that the definite properties of objects that we associate with classical physics — position and speed, say — are selected from a menu of quantum possibilities in a process loosely analogous to natural selection in evolution: The properties that survive are in some sense the “fittest.” As in natural selection, the survivors are those that make the most copies of themselves. This means that many independent observers can make measurements of a quantum system and agree on the outcome — a hallmark of classical behavior.
  • @mikko: Rarely is anyone thanked for the work they did to prevent the disaster that didn’t happen.
  • David Rosenthal: Back in 1992 Robert Putnam et al published Making democracy work: civic traditions in modern Italy, contrasting the social structures of Northern and Southern Italy. For historical reasons, the North has a high-trust structure whereas the South has a low-trust structure. The low-trust environment in the South had led to the rise of the Mafia and persistent poor economic performance. Subsequent effects include the rise of Silvio Berlusconi. Now, in The Internet Has Made Dupes-And Cynics-Of Us All, Zeynep Tufekci applies the same analysis to the Web
  • Diego Basch: So here is an obvious corollary. Like I just mentioned, if you have an idea for an interesting gadget you will move to a place like Shenzhen or Hong Kong or Taipei. You will build a prototype, prove your concept, work with a manufacturer to iterate your design until it’s mature enough to mass-produce. Either you will bootstrap the business or you will partner with someone local to fund it, because VCs won’t give you the time of the day. Now, let’s say hardware is not your cup of tea and you want to build services. Why be in Silicon Valley at all?
  • Buckminster Fuller~ You derive data by segregating; You derive principles by integrating; Without data, you cannot calculate; Without calculations, you cannot generalize; Without generalizations, you cannot design; Without designs, you cannot discover; Without discoveries, you cannot derive new data…Segregation and integration are not opposed: they are complementary and interdependent. Striving to be a specialist OR a generalist is counterproductive; the aim is to be COMPREHENSIVE!
  • @jessfraz: I see a lot of debates about “open core”. To me the premise behind it is “we will open this part of our software but you gotta take care of supporting it yourself.” Then they charge for support. Except the problem was some other people *cough* clouds *cough* beat them to it.
  • @greglinden: Tech companies consistently get this wrong, thinking this is a simple black-and-white ML classification problem, spam or not spam, false or not false. Disinformation exploits that by being just ambiguous enough to not get classified as false. It’s harder than that.
  • Brent Ozar: The ultimate, #1, primary, existential, responsibility of a DBA – for which all other responsibilities pale in comparison – is to implement database backup and restore processing adequate to support the business’s acceptable level of data loss.
  • Alex Hern: A dataset with 15 demographic attributes, for instance, “would render 99.98% of people in Massachusetts unique”. And for smaller populations, it gets easier: if town-level location data is included, for instance, “it would not take much to reidentify people living in Harwich Port, Massachusetts, a city of fewer than 2,000 inhabitants”.
  • Memory Guy: Our forecasts find that 3D XPoint Memory’s sub-DRAM prices will drive that technology’s revenues to over $16 billion by 2029, while stand-alone MRAM and STT-RAM revenues will approach $4 billion — over one hundred seventy times MRAM’s 2018 revenues.  Meanwhile, ReRAM and MRAM will compete to replace the bulk of embedded NOR and SRAM in SoCs, to drive even greater revenue growth. This transition will boost capital spending, increasing the spend for MRAM alone by thirty times to $854 million in 2029.
  • @unclebobmartin: John told me he considered FP a failure because, to paraphrase him, FP made it simple to do hard things but almost impossible to do simple things.
  • Dr. Neil J. Gunther: All performance is nonlinear.
  • @mathiasverraes: I wish we’d stop debating OOP vs FP, and started debating individual paradigms. Immutability, encapsulation, global state, single assignment, actor model, pure functions, IO in the type system, inheritance, composition… all of these are perfectly possible in either OOP or FP.
  • Troy Hunt: “1- All those servers were compromised. They were either running standalone VPSs or cpanel installations. 2- Most of them were running WordPress or Drupal (I think only 2 were not running any of the two). 3- They all had a malicious cron.php running”
  • Gartner: AWS makes frequent proclamations about the number of price reductions it has made. Customers interpret these proclamations as being applicable to the company’s services broadly, but this is not the case. For instance, the default and most frequently provisioned storage for AWS’s compute service has not experienced a price reduction since 2014, despite falling prices in the market for the raw components.
  • mcguire: Speaking as someone who has done a fair number of rewrites as well as watching rewrites fail, conventional wisdom is somewhat wrong. 1. Do a rewrite. Don’t try to add features, just replace the existing functionality. Avoid a moving target. 2. Rewrite the same project. Don’t redesign the database schema at the same time you are rewriting. Try to keep the friction down to a manageable level. 3. Incremental rewrites are best. Pick part of the project, rewrite and release that, then get feedback while you work on rewriting the next chunk.
  • Atlassian: Isolating context/state management association to a single point is very helpful. This was reinforced at Re:Invent 2018 where a remarkably high amount of sessions had a slide of “then we have service X which manages tenant → state routing and mapping”.
  • taxicabjesus: I have a ~77 year old friend who was recently telling me about going to Buckminster Fuller’s lectures at his engineering university, circa 1968. He quoted Mr. Fuller as saying something like, “entropy takes things apart, life puts them back together.”
  • Daniel Abadi: PA/EC systems sound good in theory, but are not particularly useful in practice. Our one example of a real PA/EC system — Hazelcast — has spent the past 1.5 years introducing features that are designed for alternative PACELC configurations — specifically PC/EC and PA/EL configurations. PC/EC and PA/EL configurations are a more natural cognitive fit for an application developer. Either the developer can be certain that the underlying system guarantees consistency in all cases (the PC/EC configuration) in which case the application code can be significantly simplified, or the system makes no guarantees about consistency at all (the PA/EL configuration) but promises high availability and low latency. CRDTs and globally unique IDs can still provide limited correctness guarantees despite the lack of consistency guarantees in PA/EL configurations.
  • Simone de Beauvoir: Then why “swindled”? When one has an existentialist view of the world, like mine, the paradox of human life is precisely that one tries to be and, in the long run, merely exists. It’s because of this discrepancy that when you’ve laid your stake on being—and, in a way you always do when you make plans, even if you actually know that you can’t succeed in being—when you turn around and look back on your life, you see that you’ve simply existed. In other words, life isn’t behind you like a solid thing, like the life of a god (as it is conceived, that is, as something impossible). Your life is simply a human life.
  • SkyPuncher: I think Netflix is the perfect example of where being data driven completely fails. If you listen to podcasts with important Netflix people everything you hear is about how they experiment and use data to decide what to do. Every decision is based on some data point.  At the end of the day, they just continue to add features that create short term payoffs and long term failures. Pennywise and pound foolish.
  • Frank Wilczek: I don’t think a singularity is imminent, although there has been quite a bit of talk about it. I don’t think the prospect of artificial intelligence outstripping human intelligence is imminent because the engineering substrate just isn’t there, and I don’t see the immediate prospects of getting there. I haven’t said much about quantum computing, other people will, but if you’re waiting for quantum computing to create a singularity, you’re misguided. That crossover, fortunately, will take decades, if not centuries.

Useful Stuff:

  • What an incredible series. 13 Minutes to the Moon, Ep.05 The Fourth Astronaut, tells the story of how the Apollo 11 computer system was made. The Apollo Guidance Computer weighed 30 kilos, was as big as a couple of shoe boxes, was built by a team at MIT, and was the world’s first digital, portable, general purpose computer. It was the first software system where people’s lives depended on it. It was the first fly-by-wire system. Contract 1, the first contract of the Apollo program, was for the navigation computer. The MIT group used inertial navigation, first pioneered in Polaris missiles. The idea is that if you know where you start, your direction, and your acceleration, then you always know where you are and where you are going. Until this time flight craft were controlled by manually pushing levers and flipping switches. Apollo couldn’t afford the weight of these pulley based systems. They chose, for the first time ever (1964), to make a computer to control the flight of the spacecraft. They had to figure out everything. How would the computer communicate with all the different subsystems? Command a valve to open? Turn on an engine? Turn off an engine? Redirect an engine? Apollo is the moment when people stopped bragging about how big their computer was and started bragging about how small it was. Digital computers were the size of buildings at the time. At the time nobody trusted computers because they would only work a few hours or days at a time. They needed a computer to work for a couple of weeks. They risked everything on a brand new technology called integrated circuits, which existed only in the labs. They got permission and made the very first computer built with ICs. A huge risk betting everything, but there was no alternative: there was no other way to build a computer with enough compute power. The use of ICs to build digital computers is one of the lasting legacies of Apollo. Apollo bought 60% of the total chip output at the time, a huge boost to a fledgling computer industry. But the hardware needed software. Software was not even in the original 10 page contract. In 1967 they were afraid they wouldn’t meet the end-of-the-decade deadline because software is so complicated to build. And nothing has changed since. Margaret Hamilton joined the project in 1964. There were no rules for software at the time. There was no field of software development. You could get hired just for knowing a computer language. So again, not much has changed. Nobody knew what software was. You couldn’t describe what you did to family. Very unregimented, very free environment. Don Eyles wrote the landing software on the AGC (Apollo Guidance Computer). The AGC was one square foot in size, weighed 70 pounds, drew 55 watts, and had 76kb of memory in the form of 16 bit words; only 4k was RAM, the rest was hard-wired memory. Once written, a program was printed out on paper and then converted to punch cards by keypunch operators, so it could be read directly into mainframe computers, which translated it onto the AGC. Over 100 people worked on it at the end. All the cards had to be integrated together and submitted in one big run that executed overnight. Then the simulation would be run the next day to see if the code was OK. This was your debug cycle. The keypunch operators had to go around at night and beat up on the prima donna programmers, who always wanted more time or to do something over, to submit their jobs. Again, not much has changed. The keypunch operators would go back to the programmers when they noticed syntax errors. 
If the code wasn’t right the program wouldn’t go. It used core rope memory. Software was woven into the cores. If a wire went through one of the donut shaped cores of magnetic material that was a 1; if it went around a core that was a 0. Software was hardware that was literally sewn into the structure of the computer, manually, by textile workers, by hand. Rope memory was proven tech at the time. It was bullet proof. There was no equivalent bullet proof approach to software, which is why Hamilton invented software engineering. There were no tools for finding errors at the time. They tried to find a way to build software so errors would not happen. Wrong time, wrong priority, wrong data, and interface errors were the big sources of errors. Nobody knew how to control a system with software. They came up with a verb and noun system that looked like a big calculator with buttons. The buttons had to be big and clear so they could be punched with gloves and seen through a visor. Verb: what do you want to do? Noun: what do you want to do it to? It used a simple little keyboard. There were three digital readouts, no text, it was all just numbers, three sets of numbers. To initiate the lunar landing program you would press noun, 63, enter. To start the program in 15 seconds you enter verb, 15, enter. A clock would start counting down. At zero it would start program 63, which initiated a large braking burn to slow you down so you start dropping down to the surface of the moon. The astronauts didn’t fly, they controlled programs. They ended up 200 meters from where they intended to land. Flying manually would have taken a lot more fuel. The computer was always on and in operation. It was a balance of control, a partnership. The intention at first was to create a fully automated system with two buttons: go to the moon; go home. They ended up with 500 buttons. Again, things don’t change.
  • Why Some Platforms Thrive and Others Don’t: Some digital networks are fragmented into local clusters of users. In Uber’s network, riders and drivers interact with network members outside their home cities only occasionally. But other digital networks are global; on Airbnb, visitors regularly connect with hosts around the world. Platforms on global networks are much less vulnerable to challenges, because it’s difficult for new rivals to enter a market on a global scale…As for Didi and Uber, our analysis doesn’t hold out much hope. Their networks consist of many highly local clusters. They both face rampant multi-homing, which may worsen as more rivals enter the markets. Network-bridging opportunities—their best hope—so far have had only limited success. They’ve been able to establish bridges just with other highly competitive businesses, like food delivery and snack vending. (In 2018 Uber struck a deal to place Cargo’s snack vending machines in its vehicles, for instance.) And the inevitable rise of self-driving taxis will probably make it challenging for Didi and Uber to sustain their market capitalization. Network properties are trumping platform scale.
  • James Hamilton: Where Aurora took a different approach from that of common commercial and open source database management systems is in implementing log-only storage. Looking at contemporary database transaction systems, just about every system only does synchronous writes with an active transaction waiting when committing log records. The new or updated database pages might not be written for tens of seconds or even minutes after the transaction has committed. This has the wonderful characteristic that the only writes that block a transaction are sequential rather than random. This is generally a useful characteristic, particularly important when logging to spinning media, but it also supports an important optimization when operating under high load. If the log is completing an I/O while a new transaction is being committed, then the commit is deferred until the previous log I/O has completed, and the next log I/O might carry the commits of tens of transactions that had been waiting during the previous I/O. The busier the log gets, the more transactions get committed in a single write. When the system is lightly loaded, each log I/O commits a single transaction as quickly as possible. When the system is under heavy load, each commit takes out tens of transaction changes at a slight delay but at much higher I/O efficiency (a toy group-commit sketch appears after this list). Aurora takes a more radical approach: it writes only log records, and never writes out data pages, synchronously or otherwise. Even more interesting, the log is remote and stored with 6-way redundancy using a 4/6 write quorum and a 3/6 read quorum. Further improving the durability of the transaction log, the log writes are done across 3 different Availability Zones (each is a different data center). In this approach Aurora can continue to read without problem if an entire data center goes down and, at the same time, another storage server fails. 
  • Videos from DSConf 2019 are now available
  • Given Microsoft’s purchase of LinkedIn three years ago, it should not be a big surprise that LinkedIn is moving its cloud to Azure. Have to wonder if there will be an Azure tax? Moving over from your own datacenters will certainly chew up a lot of cycles that could have gone into product.
  • Best name ever: A grimoire of functions
  • ML helping programmers is becoming a thing. A GPT-2 model trained on ~2 million files from GitHub: Autocompletion with deep learning. TabNine is an autocompleter that helps you write code faster. We’re adding a deep learning model which significantly improves suggestion quality. 
  • It’s hard to get a grasp on how EventBridge will change architectures. This article on using it as a new webhook is at least concrete: Amazon EventBridge: The biggest thing since AWS Lambda itself. Though with webhooks I just enter a URL in a field on a form and I start receiving events. This works for PayPal, Slack, chatbots, etc. What’s the EventBridge equivalent? How to hook things up isn’t clear at all (a minimal publish sketch appears after this list). Also, Why Amazon EventBridge will change the way you build serverless applications
  • Tired of pumping all your data into a lake? Mesh it. The eternal cycle of centralizing, distributing, and then centralizing continues. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh: “In order to decentralize the monolithic data platform, we need to reverse how we think about data, its locality and ownership. Instead of flowing the data from domains into a centrally owned data lake or platform, domains need to host and serve their domain datasets in an easily consumable way.”  There’s also an interview at Straining Your Data Lake Through A Data Mesh – Episode 90
  • The problem of false knowledge. Exponential Wisdom Episode 74: The Future of Construction. In this podcast there’s a segment that extols the wonders of visiting the Sistine Chapel through VR instead of visiting in-person. Is anyone worried about the problem of false knowledge? If I show you a picture of a chocolate bar do you know what chocolate tastes like? Constricting all experience through only our visual senses is a form of false knowledge. The Sistine Chapel evoked in me a visceral feeling of awe tinged with sadness. Would I feel that through VR? I don’t think so. I walked the streets. Tasted the food. Met the people. Saw the city. All experiences that can’t be shoved through our eyes.
  • Good to see Steve Ballmer hasn’t changed. Players players players. Developers developers developers.
  • Never thought of this downside of open source before. SECURITY NOW 724 HIDE YOUR RDP NOW!. Kazakhstan is telling citizens to install a root cert into their browser so they can perform man-in-the-middle attacks. An interesting question is how browser makers should respond. More interesting is what if Kazakhstan responds by making their own browser based on open source, compromising it, and requiring its use? Black Mirror should get on this. Software around us appears real, but has actually been replaced by pod-progs. Also, Open Source Could Be a Casualty of the Trade War
  • Darkweb Vendors and the Basic Opsec Mistakes They Keep Making. Don’t use email addresses that link to other accounts. Don’t use the same IDs across accounts. Don’t ship from the same area. Don’t do stuff yourself so you can be photographed. Don’t model your product using your own hands. Don’t cause anyone to die. Don’t sell your accounts to others. Don’t believe someone when they offer to launder your money. 
  • Though it’s still Electron. When a rewrite isn’t: rebuilding Slack on the desktop: The first order of business was to create the modern codebase: All UI components had to be built with React; All data access had to assume a lazily loaded and incomplete data model; All code had to be “multi-workspace aware”. The key to our approach ended up being Redux. The key to its success is the incremental release strategy that we adopted early on in the project: as code was modernized and features were rebuilt, we released them to our customers.
  • Re-Architecting the Video Gatekeeper: We [Netflix] decided to employ a total high-density near cache (i.e., Hollow) to eliminate our I/O bottlenecks. For each of our upstream systems, we would create a Hollow dataset which encompasses all of the data necessary for Gatekeeper to perform its evaluation. Each upstream system would now be responsible for keeping its cache updated. With this model, liveness evaluation is conceptually separated from the data retrieval from upstream systems. Instead of reacting to events, Gatekeeper would continuously process liveness for all assets in all videos across all countries in a repeating cycle. The cycle iterates over every video available at Netflix, calculating liveness details for each of them. At the end of each cycle, it produces a complete output (also a Hollow dataset) representing the liveness status details of all videos in all countries.
  • Should you hire someone who has already done the job you need to do? Not necessarily. Business Lessons from How Marvel Makes Movies: Marvel does something that is very counterintuitive. Instead of hiring people that are going to be really good at directing blockbusters, they look for people that have done a really good job with medium-sized budgets, but developing very strong storylines and characters. So, generally speaking, what they do is they looked to other genres like Shakespeare or horror. You can have spy films, comedy films, buddy cop films and what they do is they say, if I brought this director into the Marvel universe, what could they do with our characters? How could they shake up our stories and kind of reinvigorate them and provide new energy and new life?
  • What is a senior engineer? A historian. EliRivers: I work on some software of which the oldest parts of the source code date back to about 2009. Over the years, some very smart (some of them dangerously smart and woefully inexperienced, and clearly – not at all their fault – not properly mentored or supervised) people worked on it and left. What we have now is frequently a mystery. Simple changes are difficult, difficult changes verge on the impossible. Every new feature requires reverse-engineering of the existing code. Sometimes literally 95% of the time is spent reverse-engineering the existing code (no exaggeration – we measured it); changes can take literally 20 times as long as they should while we work out what the existing code does (and also, often not quite the same, what it’s meant to do, which is sometimes simply impossible to ever know). Pieces are gradually being documented as we work out what they do, but layers of cruft from years gone by from people with deadlines to meet and no chance of understanding the existing code sit like landmines and sometimes like unbreakable bonds that can never be undone. In our estimates, every time we have to rely on existing functionality that should be rock solid reliable and completely understood yet that we have not yet had to fully reverse-engineer, we mark it “high risk, add a month”. The time I found that someone had rewritten several pieces of the Qt libraries (without documenting what, or why) was devastating; it took away one of the cornerstones I’d been relying on, the one marked “at least I know I can trust the Qt libraries”. It doesn’t matter how smart we are, how skilled a coder we are, how genius our algorithms are; if we write something that can’t be understood by the next person to read it, and isn’t properly documented somewhere in some way that our fifth replacement can find easily five years later – if we write something of which even the purpose, let alone the implementation, will take someone weeks to reverse engineer – we’re writing legacy code on day one and, while we may be skilled programmers, we’re truly Godawful software engineers.
  • You always learn something new when you listen to Martin Thompson. Protocols and Sympathy With Martin Thompson. He goes into the many implications of the Universal Scalability Law, which covers what can be split up and shared while considering coherence costs, which is the time it takes parties working together to reach agreement (see the USL sketch after this list). The mathematics for systems and the mathematics for people are all very similar because it’s just a system. Doubling the size of a system doesn’t mean doubling the amount of work done. You have to ask if the workload is decomposable. The workload needs to decompose and be done in parallel, but not concurrently. Parallelism is doing multiple things at the same time. Concurrency is dealing with multiple things at the same time. Concurrency requires coordination. Adding slack to a system reduces response time because it reduces utilization. If we constantly break teams up and reform them we end up spending more time on achieving coherence. If your team has become more efficient and reaches agreement faster, then you can do more things at the same time with less overhead. You get more throughput by maximizing parallelism and minimizing coherency. Slow down and think more. Also, Understanding the LMAX Disruptor
  • Excellent explanation. Distributed Locks are Dead; Long Live Distributed Locks! and Testing the CP Subsystem with Jepsen
  • Atlassian on Our not-so-magic journey scaling low latency, multi-region services on AWS. Do you have something like this: “a context service which needed to be called multiple times per user request, with incredibly low latency, and be globally distributed. Essentially, it would need to service tens of thousands of requests per second and be highly resilient.” They were stuck with a previous sharding solution so couldn’t make a complete break as they moved to AWS. The first cut was CQRS with DynamoDB, which worked well until load increased and DynamoDB hit latency problems. They used SNS to invalidate node level caches. They replaced ELBs with ALBs, which increased reliability but pushed the p99 latency from 10ms to 20ms. They went with Caffeine instead of Guava for their cache. They added a sidecar as a local proxy for a service. A sidecar is essentially just another containerised application that is run alongside the main application on the EC2 node. The benefit of using sidecars (as opposed to libraries) is that they are technology agnostic. Latencies fell drastically. 
  • Nike on Moving Faster With AWS by Creating an Event Stream Database: we turned to the Kinesis Data Firehose service…a service called Athena that gives us the ability to perform SQL queries over partitioned data…how does our solution compare to more traditional architectures using RDS or Dynamo? Being able to ingest data and scale automatically via Firehose means our team doesn’t need to write or maintain pre-scaling code…Data storage costs on S3 ($0.023 per GB-month) are lower when compared to DynamoDB ($0.25 per GB-month) and Aurora ($0.10 per GB-month)…In a sample test, Athena delivered 5 million records in seconds, which we found difficult to achieve with DynamoDB…One limitation is that Firehose batches out data in windows of either data size or a time limit. This introduces a delay between when the data is ingested and when the data is discoverable by Athena…Queries to Athena are charged by the amount of data scanned, and if we scan the entire event stream frequently, we could rack up serious costs in our AWS bill (a back-of-the-envelope cost sketch appears after this list).
  • It’s not easy to create a broadcast feed. Here’s how Hotstar did it: Building Pubsub for 50M concurrent socket connections. They went through a lot of different options. They ended up using EMQX, client side load balancing, and multiple clusters with bridges connecting them and a reverse bridge. Each subscriber node could support 250k clients, so with 200 subscriber nodes the system can support 50M connections and more. Also, Ingesting data at “Bharat” Scale
  • Making Containers More Isolated: An Overview of Sandboxed Container Technologies: We have looked at several solutions that tackle the current container technology’s weak isolation issue. IBM Nabla is a unikernel-based solution that packages applications into a specialized VM. Google gVisor is a merge of a specialized hypervisor and guest OS kernel that provides a secure interface between the applications and their host. Amazon Firecracker is a specialized hypervisor that provisions each guest OS a minimal set of hardware and kernel resources. OpenStack Kata is a highly optimized VM with built-in container engine that can run on hypervisors. It is difficult to say which one works best as they all have different pros and cons. 
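A few sketches to make some of the items above concrete. These are illustrative toys under stated assumptions, not the actual implementations described in the linked articles.

Group commit (from the James Hamilton item): while one log write is in flight, new commits queue up and ride along on the next write, so each I/O carries more transactions as load rises. A minimal sketch in Python; the 5 ms sleep is a stand-in for a synchronous, sequential log write.

```python
import queue
import threading
import time

class GroupCommitLog:
    """Toy group-commit log: commits arriving while a write is in flight are
    flushed together in the next write, so heavier load means fatter batches."""

    def __init__(self):
        self._pending = queue.Queue()
        threading.Thread(target=self._write_loop, daemon=True).start()

    def commit(self, record):
        done = threading.Event()
        self._pending.put((record, done))
        done.wait()                      # a transaction blocks only on the log write

    def _write_loop(self):
        while True:
            batch = [self._pending.get()]           # wait for at least one commit
            while True:                             # then sweep up everything queued
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            self._flush([rec for rec, _ in batch])  # one sequential log I/O
            for _, done in batch:
                done.set()

    def _flush(self, records):
        time.sleep(0.005)                # stand-in for the synchronous log write

log = GroupCommitLog()
threads = [threading.Thread(target=log.commit, args=(f"txn-{i}",)) for i in range(100)]
for t in threads: t.start()
for t in threads: t.join()
```

EventBridge as a webhook replacement (from the EventBridge item): instead of handing a webhook URL to the producer, the producer publishes to a bus and a rule routes matching events to your targets. The put_events call below is the standard boto3 API (it needs AWS credentials and a region configured); the source, detail-type, and payload names are made up for illustration.

```python
import json
import boto3

events = boto3.client("events")   # EventBridge uses the CloudWatch Events API

events.put_events(
    Entries=[
        {
            "EventBusName": "default",                 # or a custom/partner bus
            "Source": "com.example.payments",          # hypothetical source name
            "DetailType": "payment.completed",         # hypothetical detail type
            "Detail": json.dumps({"orderId": "1234", "amount": 42.0}),
        }
    ]
)
# A rule with an event pattern such as {"source": ["com.example.payments"]}
# would then deliver matching events to a Lambda function, queue, or other target.
```

The Universal Scalability Law (from the Martin Thompson item): throughput of N workers is throttled first by contention (the serialized fraction, sigma) and then driven backwards by coherency (the cost of pairwise agreement, kappa). The parameter values below are arbitrary examples.

```python
def usl_throughput(n, lam=1.0, sigma=0.05, kappa=0.001):
    """Universal Scalability Law: X(N) = lam*N / (1 + sigma*(N-1) + kappa*N*(N-1)).

    lam   -- throughput of a single worker
    sigma -- contention penalty (serialized fraction of the work)
    kappa -- coherency penalty (cost of keeping N workers in agreement)
    """
    return (lam * n) / (1 + sigma * (n - 1) + kappa * n * (n - 1))

for n in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    print(f"{n:4d} workers -> {usl_throughput(n):7.2f}x")
# With kappa > 0 the curve peaks and then drops: adding workers past that point
# reduces throughput, which is why minimizing coherency matters.
```

Back-of-the-envelope costs (from the Nike item), using the per-GB-month prices quoted there. The 500 GB data size and the $5-per-TB-scanned Athena price are assumptions for illustration; check current AWS pricing before relying on either.

```python
gb = 500                                                            # assumed data size
storage_prices = {"S3": 0.023, "DynamoDB": 0.25, "Aurora": 0.10}    # $ per GB-month, as quoted
for store, per_gb in storage_prices.items():
    print(f"{store:9s}: ${gb * per_gb:,.2f} per month")

athena_per_tb = 5.0                                                 # assumed $ per TB scanned
full_scan = (gb / 1024) * athena_per_tb
print(f"One full-stream Athena scan: ~${full_scan:.2f}; run it every 5 minutes and "
      f"that's ~${full_scan * 12 * 24 * 30:,.0f} per month.")
```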

Soft Stuff:

  • Nodes: In Nodes you write programs by connecting “blocks” of code. Each node – as we refer to them – is a self contained piece of functionality like loading a file, rendering a 3D geometry or tracking the position of the mouse. The source code can be as big or as tiny as you like. We’ve seen some of ours ranging from 5 lines of code to the thousands. Conceptual/functional separation is usually more important.
  • Picat: Picat is a simple, and yet powerful, logic-based multi-paradigm programming language aimed for general-purpose applications. Picat is a rule-based language, in which predicates, functions, and actors are defined with pattern-matching rules. Picat incorporates many declarative language features for better productivity of software development, including explicit non-determinism, explicit unification, functions, list comprehensions, constraints, and tabling. Picat also provides imperative language constructs, such as assignments and loops, for programming everyday things. 
  • When the aliens find the dead husk of our civilization, the irony is that what will remain of our history are clay cuneiform tablets. There’s something comforting in knowing that what’s oldest will last longest. Cracking Ancient Codes: Cuneiform Writing.
  • donnaware/AGC: FPGA Based Apollo Guidance Computer. 

Pub Stuff:

  • Unikernels: The Next Stage of Linux’s Dominance (overview): In this paper, we posit that an upstreamable unikernel target is achievable from the Linux kernel, and, through an early Linux unikernel prototype, demonstrate that some simple changes can bring dramatic performance advantages. rwmj: The entire point of this paper is not to start over from scratch, but to reuse existing software (Linux and memcached in this case), and fiddle with the linker command line and a little bit of glue to link them into a single binary. If you want to start over from scratch using a safe language then see MirageOS.
  • Linux System Programming. rofo1: The book is solid. I mentally place it up there with “Advanced programming in the UNIX Environment” by Richard Stevens. 
  • Checking-in on network functions: we need better approaches to verify and interact with network functions and packet processing program properties. Here, we provide a hybrid approach and implementation for gradually checking and validating arbitrary logic and side effects by combining design by contract, static assertions and type-checking, and code generation via macros, all without penalizing programmers at development time.
  • Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches: In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques (a minimal item-kNN baseline sketch follows this list). The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. 
  • DistME: A Fast and Elastic Distributed Matrix Computation Engine using GPUs: We implement a fast and elastic matrix computation engine called DistME by integrating CuboidMM with GPU acceleration on top of Apache Spark. Through extensive experiments, we have demonstrated that CuboidMM and DistME significantly outperform the state-of-the-art methods and systems, respectively, in terms of both performance and data size.
  • PARTISAN: Scaling the Distributed Actor Runtime (github, video, twitter): We present the design of an alternative runtime system for improved scalability and reduced latency in actor applications called PARTISAN. PARTISAN provides higher scalability by allowing the application developer to specify the network overlay used at runtime without changing application semantics, thereby specializing the network communication patterns to the application. PARTISAN reduces message latency through a combination of three predominately automatic optimizations: parallelism, named channels, and affinitized scheduling. We implement a prototype of PARTISAN in Erlang and demonstrate that PARTISAN achieves up to an order of magnitude increase in the number of nodes the system can scale to through runtime overlay selection, up to a 38.07x increase in throughput, and up to a 13.5x reduction in latency over Distributed Erlang.
  • BPF Performance Tools (book): This is the official site for the book BPF Performance Tools: Linux System and Application Observability, published by Addison Wesley (2019). This book can help you get the most out of your systems and applications, helping you improve performance, reduce costs, and solve software issues. Here I’ll describe the book, link to related content, and list errata.
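As promised in the recommendation-systems item above, here is a minimal item-based nearest-neighbor baseline of the kind the paper found surprisingly hard to beat. It is a sketch over a random implicit-feedback matrix, not any paper's actual baseline; the neighborhood size and data are arbitrary.

```python
import numpy as np

def topn_item_knn(ratings, user, n=5, k=20):
    """Score unseen items by cosine similarity to the items the user already has."""
    norms = np.linalg.norm(ratings, axis=0) + 1e-9
    sim = (ratings.T @ ratings) / np.outer(norms, norms)   # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)
    for i in range(sim.shape[0]):                          # keep only k nearest neighbors per item
        weak = np.argsort(sim[i])[:-k]
        sim[i, weak] = 0.0
    scores = ratings[user] @ sim                           # aggregate similarity to the user's items
    scores[ratings[user] > 0] = -np.inf                    # never recommend items already seen
    return np.argsort(scores)[::-1][:n]

# Toy usage on a random 100-user x 50-item implicit-feedback matrix.
R = (np.random.default_rng(0).random((100, 50)) < 0.1).astype(float)
print(topn_item_knn(R, user=3))
```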

from High Scalability

Sponsored Post: Educative, PA File Sight, Etleap, PerfOps, InMemory.Net, Triplebyte, Stream, Scalyr


Who’s Hiring? 

  • Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.
  • Need excellent people? Advertise your job here! 

Cool Products and Services

  • Grokking the System Design Interview is a popular course on Educative.io (taken by 20,000+ people) that’s widely considered the best System Design interview resource on the Internet. It goes deep into real-world examples, offering detailed explanations and useful pointers on how to improve your approach. There’s also a no questions asked 30-day return policy. Try a free preview today.
  • PA File Sight – Actively protect servers from ransomware, audit file access to see who is deleting files, reading files or moving files, and detect file copy activity from the server. Historical audit reports and real-time alerts are built-in. Try the 30-day free trial!
  • For heads of IT/Engineering responsible for building an analytics infrastructure, Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike older enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own. Read stories from customers like Okta and PagerDuty, or try Etleap yourself.
  • PerfOps is a data platform that digests real-time performance data for CDN and DNS providers as measured by real users worldwide. Leverage this data across your monitoring efforts and integrate with PerfOps’ other tools such as Alerts, Health Monitors and FlexBalancer – a smart approach to load balancing. FlexBalancer makes it easy to manage traffic between multiple CDN providers, API’s, Databases or any custom endpoint helping you achieve better performance, ensure the availability of services and reduce vendor costs. Creating an account is Free and provides access to the full PerfOps platform.
  • InMemory.Net provides a .NET native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net
  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it’s easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we’d like to add a few zeros to that number. Check out the job opening on AngelList.
  • Scalyr is a lightning-fast log management and operational data platform. It’s a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services — all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.
  • Advertise your product or service here!

Fun and Informative Events

  • Advertise your event here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


PA File Sight monitors file access on a server in real-time.

It can track who is accessing what, and with that information can help detect file copying, detect (and stop) ransomware attacks in real-time, and record the file activity for auditing purposes. The collected audit records include user account, target file, the user’s IP address and more. This solution does NOT require Windows Native Auditing, which means there is no performance impact on the server. Join thousands of other satisfied customers by trying PA File Sight for yourself. No sign up is needed for the 30-day fully functional trial.


Make Your Job Search O(1) — not O(n)

Triplebyte is unique because they’re a team of engineers running their own centralized technical assessment. Companies like Apple, Dropbox, Mixpanel, and Instacart now let Triplebyte-recommended engineers skip their own screening steps.

We found that High Scalability readers are about 80% more likely to be in the top bracket of engineering skill.

Take Triplebyte’s multiple-choice quiz (system design and coding questions) to see if they can help you scale your career faster.


The Solution to Your Operational Diagnostics Woes

Scalyr gives you instant visibility of your production systems, helping you turn chaotic logs and system metrics into actionable data at interactive speeds. Don’t be limited by the slow and narrow capabilities of traditional log monitoring tools. View and analyze all your logs and system metrics from multiple sources in one place. Get enterprise-grade functionality with sane pricing and insane performance. Learn more today


If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

from High Scalability


2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence


Ready to transition from a commercial database to open source, and want to know which databases are most popular in 2019? Wondering whether an on-premise vs. public cloud vs. hybrid cloud infrastructure is best for your database strategy? Or, considering adding a new database to your application and want to see which combinations are most popular? We found all the answers you need at the Percona Live event last month, and broke down the insights into the following free trends reports:

2019 Top Databases Used

So, which databases are most popular in 2019? We broke down the data by open source databases vs. commercial databases:

Open Source Databases

Open source databases are free community databases whose source code is available to the general public, and which may be used as-is or modified. Popular examples of open source databases include MySQL, PostgreSQL and MongoDB.

Commercial Databases

Commercial databases are developed and maintained by a commercial business, are available for use through a licensing or subscription fee, and generally may not be modified. Popular examples of commercial databases include Oracle, SQL Server, and DB2.

Top Open Source Databases

MySQL remains on top as the #1 free and open source database, representing over 30% of open source database use. This comes as no surprise, as MySQL has held this position consistently for many years according to DB-Engines.

2019 Most Popular Open Source Databases Used Report Pie Chart - ScaleGrid

PostgreSQL came in 2nd place with 13.4% representation from open source database users, closely followed by MongoDB at 12.2% in 3rd place. This again could be expected based on the DB-Engines Trend Popularity Ranking, but we saw MongoDB in 2nd place at 24.6% just three months ago in our 2019 Database Trends – SQL vs. NoSQL, Top Databases, Single vs. Multiple Database Use report.

While over 50% of open source database use is represented by the top 3, we also saw a good representation for #4 Redis, #5 MariaDB, #6 Elasticsearch, #7 Cassandra, and #8 SQLite. The last 2% of databases represented include Clickhouse, Galera, Memcached, and HBase.

Top Commercial Databases

In this next graph, we’re looking at a unique report which represents both polyglot persistence and migration trends: top commercial databases used with open source databases.

We’ve been seeing a growing trend of leveraging multiple database types to meet your application needs, and wanted to compare how organizations are using both commercial and open source databases within a single application. This report also represents the commercial database users who are also in the process of migrating to an open source database. For example, PostgreSQL, the fastest growing database by popularity for 2 years in a row, has 11.5% of its user base represented by organizations currently in the process of migrating to PostgreSQL.

So, now that we’ve explained what this report represents, let’s take a look at the top commercial databases used with open source.

2019 Most Popular Commercial Databases Used with Open Source Report Pie Chart - ScaleGrid

Oracle, the #1 database in the world, holds the top spot here as well, representing over two-thirds of commercial and open source database combinations. What is shocking in this report is the large gap between Oracle and 2nd place Microsoft SQL Server, since the gap between them is much smaller according to DB-Engines. IBM Db2 came in 3rd place, representing 11.1% of commercial database use combined with open source.

Cloud Infrastructure Breakdown by Database

Now, let’s take a look at the cloud infrastructure setup breakdown by database management systems.

Public Cloud vs. On-Premise vs. Hybrid Cloud

We asked our open source database users how they’re hosting their database deployments to identify the current trends between on-premise vs. public cloud vs. hybrid cloud deployments.

A surprising 49.5% of open source database deployments are run on-premise, coming in at #1. While we anticipated this result, we were surprised at the percentage on-premise. In our recent 2019 PostgreSQL Trends Report, on-premise private cloud deployments represented 59.6%, over 10% higher than this report.

Public cloud came in 2nd place with 36.7% of open source database deployments, consistent with the 34.8% of deployments from the PostgreSQL report. Hybrid cloud, however, grew significantly from this report with 13.8% representation from open source databases vs. 5.6% of PostgreSQL deployments.
2019 Open Source Databases Report: Public Cloud vs Private Cloud vs On-Premise Pie Chart - ScaleGrid

So, which cloud infrastructure is right for you? Here’s a quick intro to public cloud vs. on-premise vs. hybrid cloud: 

Public Cloud

Public cloud is a cloud computing model where IT services are delivered across the internet. Typically purchased through a subscription usage model, public cloud is very easy to set up with no large upfront investment requirements, and can be quickly scaled as your application needs change.

On-Premise

On-premise, or private cloud, deployments are cloud solutions dedicated to a single organization and run in its own datacenter (or with a third-party vendor off-site). There are many more opportunities to customize your infrastructure with an on-premise setup, but it requires a significant upfront investment in hardware and software computing resources, as well as ongoing maintenance responsibilities. These deployment types are best suited for organizations with advanced security needs, those in regulated industries, or large organizations.

Hybrid Cloud

A hybrid cloud is a mixture of both public cloud and private cloud solutions, integrated into a single infrastructure environment. This allows organizations to share resources between public and private clouds to improve their efficiency, security, and performance. These are best suited for deployments that require the advanced security of an on-premise infrastructure, as well as the flexibility of the public cloud.

Now, let’s take a look at which cloud infrastructures are most popular by each open source database type.

Open Source Database Deployments: On-Premise

In this graph, as well as the public cloud and hybrid cloud graphs below, we break down each individual open source database by the percentage of deployments that leverage this type of cloud infrastructure.

So, which open source databases are most frequently deployed on-premise? PostgreSQL came in 1st place with 55.8% of deployments on-premise, closely followed by MongoDB at 52.2%, Cassandra at 51.9%, and MySQL at 50% on-premise.
2019 Percent of Open Source Databases Using an On-Premise Infrastructure Report - ScaleGrid

The open source databases that reported less than half of deployments on-premise include MariaDB at 47.2%, SQLite at 43.8%, and Redis at 42.9%. The database that is least often deployed on-premise is Elasticsearch at only 34.5%.

Open Source Database Deployments: Public Cloud

Now, let’s look at the breakdown of open source databases in the public cloud.

SQLite is the most frequently deployed open source database in a public cloud infrastructure at 43.8% of their deployments, closely followed by Redis at 42.9%. MariaDB public cloud deployments came in at 38.9%, then 36.7% for MySQL, and 34.5% for Elasticsearch.

2019 Percent of Open Source Databases Using a Public Cloud Infrastructure Report - ScaleGrid

Three databases came in with less than 1/3rd of their deployments in the public cloud, including MongoDB at 30.4%, PostgreSQL at 27.9%, and Cassandra with the fewest public cloud deployments at only 25.9%.

Open Source Database Deployments: Hybrid Cloud

Now that we know how the open source databases break down between on-premise vs. public cloud, let’s take a look at the deployments leveraging both computing environments.

The #1 open source database to leverage hybrid clouds is Elasticsearch, which came in at 31%. The closest following database for hybrid cloud is Cassandra at just 22.2%.

2019 Percent of Open Source Databases Using a Hybrid Cloud Infrastructure Report - ScaleGrid

MongoDB was in 3rd for percentage of deployments in a hybrid cloud at 17.4%, then PostgreSQL at 16.3%, Redis at 14.3%, MariaDB at 13.9%, MySQL at 13.3%, and lastly SQLite at only 12.5% of deployments in a hybrid cloud.

Open Source Database Deployments: Multi Cloud

On average, 20% of public cloud and hybrid cloud deployments are leveraging a multi-cloud strategy. Multi-cloud is the use of two or more cloud computing services. We also took a look at the number of clouds used, and found that some deployments leverage up to 5 different cloud providers within a single organization:

Average Number of Clouds Used for Open Source Database Multi-Cloud Deployments - ScaleGrid Report

Most Popular Cloud Providers for Open Source Database Hosting

In our last analysis under the Cloud Infrastructure breakdown, we analyze which cloud providers are most popular for open source database hosting:
2019 Most Popular Cloud Providers for Open Source Database Hosting Pie Chart - ScaleGrid

AWS is the #1 cloud provider for open source database hosting, representing 56.9% of all cloud deployments from this survey. Google Cloud Platform (GCP) came in 2nd at 26.2% with a surprising lead over Azure at 10.8%. Rackspace then followed in 4th representing 3.1% of deployments, and DigitalOcean and Softlayer followed last representing the remaining 3% of open source deployments in the cloud.

Polyglot Persistence Trends

Polyglot persistence is the concept of using different databases to handle different needs using each for what it is best at to achieve an end goal within a single software application. This is a great solution to ensure your application is handling your data correctly, vs. trying to satisfy all of your requirements with a single database type. An obvious example would be SQL which is good at handling structured data vs. NoSQL which is best used for unstructured data.
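Below is a minimal sketch of what polyglot persistence looks like in code, with SQLite standing in for the relational store and a plain dict standing in for a key-value store such as Redis; the table, key, and function names are made up for illustration.

```python
import sqlite3

# Structured, relational data: orders we want to query and aggregate with SQL.
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user TEXT, total REAL)")

# Unstructured, schema-free data: session blobs accessed by key.
key_value = {}   # swap in a Redis client (or similar) in a real deployment

def place_order(order_id, user, total, session):
    relational.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, user, total))
    relational.commit()
    key_value[f"session:{user}"] = session          # no schema to migrate

place_order(1, "alice", 42.50, {"cart": [], "last_seen": "2019-07-01"})
print(relational.execute("SELECT SUM(total) FROM orders WHERE user = ?", ("alice",)).fetchone())
print(key_value["session:alice"])
```

Each store does only what it is good at, which is both the argument for, and the operational cost of, running more than one database type per application.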

Let’s take a look at a couple polyglot persistence analyses:

Average Number of Database Types Used

On average, we found that companies leverage 3.1 database types for their applications within a single organization. Just over 1/4 of organizations leverage a single database type, with some reporting up to 9 different database types used:

Average Number of Database Types Used in an Organization - ScaleGrid Report

Average Number of Database Types Used by Infrastructure

So, how does this number break down across infrastructure types? We found that hybrid cloud deployments are most likely to leverage multiple database types, and average 4.33 database types at a time.

On-premise deployments typically leverage 3.26 different database types, and public cloud came in lowest at 3.05 database types leveraged on average within their organization.

Average Number of Database Used On-Premise vs Public Cloud vs Hybrid Cloud - ScaleGrid Report

Database Types Most Commonly Used Together

Let’s now take a closer look at the database types most commonly leveraged together within a single application.

In the chart below, each database in the left column represents the sample group for that database type, and the databases listed across the top show the percentage of that sample combined with each other database type. The blue highlighted cells represent 100% of deployments combining those two types, while yellow represents 0%.

So, as we can see below in our database combinations heatmap, MySQL is the database most frequently combined with other database types. But while other database types are frequently leveraged in conjunction with MySQL, that doesn't mean MySQL deployments always leverage another database type. This can be seen in the first row for MySQL, which runs from lighter blue to yellow, compared to the first column for MySQL, which is much closer to the blue representing 100% combinations.

The cells highlighted with a black border represent deployments leveraging only that one database type; again MySQL takes #1, with 23% of its deployments using MySQL alone.

Percent of Database Deployments Used With Another Database Type - ScaleGrid Report

We can also see a similar trend with Db2, where the bottom row for Db2 shows it is highly leveraged with MySQL, PostgreSQL, Cassandra, Oracle, and SQL Server, but a very low percentage of other database deployments also leverage Db2, outside of SQL Server, which uses Db2 in 50% of those deployments.
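
For readers who want to reproduce this kind of view from their own inventory, here is a hypothetical sketch of how a combinations matrix like the one above could be computed from raw survey rows, where each row lists the database types a single deployment uses. The sample data is invented; the report's figures come from ScaleGrid's survey, not from this code. Note that the report's black-bordered cells show the share of single-database deployments, which this simple sketch does not compute.

    from itertools import product

    # Invented sample rows: the set of database types each deployment uses.
    deployments = [
        {"MySQL"},                          # a deployment using MySQL alone
        {"MySQL", "Redis"},
        {"PostgreSQL", "MongoDB"},
        {"MySQL", "PostgreSQL", "Redis"},
    ]

    db_types = sorted(set().union(*deployments))

    # matrix[row][col] = % of deployments using `row` that also use `col`.
    matrix = {}
    for row, col in product(db_types, repeat=2):
        using_row = [d for d in deployments if row in d]
        combined = sum(1 for d in using_row if col in d and col != row)
        matrix.setdefault(row, {})[col] = 100 * combined / len(using_row) if using_row else 0.0

    for row in db_types:
        print(row, {col: round(pct) for col, pct in matrix[row].items()})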

SQL vs. NoSQL Open Source Database Popularity

Last but not least, we compare SQL vs. NoSQL for our open source database report. SQL represents over 3/5 of open source database use at 60.6%, compared to NoSQL at 39.4%.

SQL vs NoSQL Open Source Database Popularity - ScaleGrid Report

We hope these database trends were insightful and sparked some new ideas or validated your current database strategy! Tell us what you think below in the comments, and let us know if there’s a specific analysis you’d like to see in our next database trends report! Check out our other reports for more insight on what’s trending in the database space:

from High Scalability

Sponsored Post: Etleap, PerfOps, InMemory.Net, Triplebyte, Stream, Scalyr

Who’s Hiring? 

  • Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.
  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • Advertise your event here!

Cool Products and Services

  • For heads of IT/Engineering responsible for building an analytics infrastructure, Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike older enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own. Read stories from customers like Okta and PagerDuty, or try Etleap yourself.
  • PerfOps is a data platform that digests real-time performance data for CDN and DNS providers as measured by real users worldwide. Leverage this data across your monitoring efforts and integrate with PerfOps’ other tools such as Alerts, Health Monitors and FlexBalancer – a smart approach to load balancing. FlexBalancer makes it easy to manage traffic between multiple CDN providers, APIs, databases or any custom endpoint, helping you achieve better performance, ensure the availability of services and reduce vendor costs. Creating an account is free and provides access to the full PerfOps platform.
  • InMemory.Net provides a .NET native in-memory database for analyzing large amounts of data. It runs natively on .NET and provides native .NET, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net
  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it’s easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we’d like to add a few zeros to that number. Check out the job opening on AngelList.
  • Scalyr is a lightning-fast log management and operational data platform. It’s a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services — all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.
  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Make Your Job Search O(1) — not O(n)

Triplebyte is unique because they’re a team of engineers running their own centralized technical assessment. Companies like Apple, Dropbox, Mixpanel, and Instacart now let Triplebyte-recommended engineers skip their own screening steps.

We found that High Scalability readers are about 80% more likely to be in the top bracket of engineering skill.

Take Triplebyte’s multiple-choice quiz (system design and coding questions) to see if they can help you scale your career faster.


The Solution to Your Operational Diagnostics Woes

Scalyr gives you instant visibility of your production systems, helping you turn chaotic logs and system metrics into actionable data at interactive speeds. Don’t be limited by the slow and narrow capabilities of traditional log monitoring tools. View and analyze all your logs and system metrics from multiple sources in one place. Get enterprise-grade functionality with sane pricing and insane performance. Learn more today


If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

from High Scalability