San Francisco, CA
Basho Technologies, along with our sponsors, proudly presented RICON 2012, a two-day conference dedicated to Riak, developers, and the future of distributed systems in production. This page is dedicated to post-conference consumption. Here you will find slide decks, resources, photos, and much more.
All of the videos from RICON 2012 can also be found in this album on Vimeo.
(In no particular order.)
Client-server programming is a well-known discipline, as old as computer networks. Just connect a socket to the server and send some bytes back and forth, right?
Au contraire! Building reliable, robust client libraries and applications is actually quite difficult, and exposes a lot of classic distributed and concurrent programming problems. From understanding and manipulating the TCP/IP network stack, to multiplexing connections across worker threads, to handling partial failures, to juggling protocols and encodings, there are many different angles one must cover.
In this talk, we'll discuss how Basho has addressed these problems and others in our client libraries and server-side interfaces for Riak, and how being a good client means being a participant in the distributed system, rather than just a spectator.
Sean Cribbs is a Software Engineer at Basho Technologies, where he works on Riak, the fault-tolerant, highly-scalable distributed database. Prior to Basho, Sean was a freelance developer and consultant who also managed the development of the open-source Radiant web publishing system. He briefly studied Music Theory at the graduate level after receiving degrees in Computer Science and Music from University of Tulsa. He can often be found speaking about Riak at conferences and other events, and enjoys playing the piano in his free time.
Allowing users to run arbitrary and complex searches against your data is a feature required by most consumer-facing applications. For example, the ability to get ranked results based on free text search and subsequently drill down on that data based on secondary attributes is at the heart of any good online retail shop. Not only must your application support complex queries such as "doggy treats in a 2-mile radius, broken down by popularity" but it must also return in hundreds of milliseconds or less to keep users happy. This is what systems like Solr are built for. But what happens when the index is too big to fit on a single node? What happens when replication is needed for availability? How do you give correct answers when the index is partitioned across several nodes? These are the problems of distributed search. These are some of the problems Yokozuna solves for you without making you think about it.
In this talk Ryan will explain what search is, why it matters, what problems distributed search brings to the table, and how Yokozuna solves them. Yokozuna provides distributed and available search while appearing to be a single-node Solr instance. This is very powerful for developers and ops professionals.
Ryan is a Sr. Software Engineer at Basho Technologies, creators of the Riak database. He is best known for his Riak Core tutorials on his try-try-try blog. For the last year Ryan has developed an interest in information retrieval; specifically how it can be applied to systems like Riak. He is the creator and lead engineer on Yokozuna, a project with the goal of uniting the strengths of Riak and Solr. When not writing code you'll probably find Ryan riding his Suzuki GSX-R through back-roads of rural Maryland.
This talk will cover work to build out an internal cloud offering using Riak and PostgreSQL as a data layer, architectural decisions made to achieve high availability, and lessons learned along the way.
Shawn Gravelle is an IT Architect at State Farm Insurance and has spent 16 years working on a variety of IT solutions covering application development, system design, data design, infrastructure virtualization, and enterprise architecture. Shawn has spent the last 5 years focused on architecting geographically distributed data solutions and working through the implications for consistency and availability.
Sam Townsend is the team lead of the group that’s deploying Riak at State Farm. His 27-year IT career began by supporting Apple IIs in a college lab, so he’s old enough to remember when ‘Big Data’ meant punching a notch to use both sides of a floppy disk. In his 12 years at State Farm, Sam has worked on distributed systems performance and capacity management, and infrastructure system design. The last few years have been focused mainly on unstructured data management, which made him a perfect fit for leading the Riak effort.
ZooKeeper is everywhere these days. It's a core component of the Hadoop ecosystem. It provides the glue that enables high availability for systems like Redis and Solr. Your favorite startup probably uses it internally. But as every good skeptic knows, just because something is popular doesn't mean you should use it. In this talk I will go over the core uses of ZooKeeper in the wild and why it is suited to these use cases. I will also talk about systems that don't use ZooKeeper and why that can be the right decision. Finally I will discuss the common challenges of running ZooKeeper as a service and things to look out for when architecting a deployment.
Turner Broadcasting hosts several large sites that need to serve "data" to millions of clients over HTTP. A couple of years ago, we started building a generic service to solve this and to retire several legacy systems. We will discuss the general architecture, the growing pains, and why we decided to use Riak. We will also share some implementation details and the use of the service for a few large internet events.
Brian Akins is Senior Principal Architect at Turner Broadcasting where he is focused primarily on scaling web applications and the organizations that build and support them. He is an old school C guy who has fallen in love with Lua. He lives with his wife and four children in the suburbs of Atlanta.
Network partitions are real, but their practical consequences on complex applications are poorly understood. I want to talk about some of the neat ways I've found to lose important data, the challenge of building systems which are reliable under partitions, and what it means for you, an application developer.
Kyle Kingsbury is the author of Riemann, a monitoring system built on event stream processing. He's confused about distributed consensus.
LevelDB is a flexible key-value store written by Google and open sourced in August 2011. LevelDB provides an ordered mapping of binary keys to binary values. Various companies and individuals utilize LevelDB on cell phones and servers alike. The problem, however, is that it does not run optimally on either as shipped.
This presentation outlines the basic internal mechanisms of LevelDB and then proceeds to discuss the tuning opportunities in the source code for each mechanism. This talk will draw heavily from our experiences optimizing LevelDB for use in Riak, which is handy for running sufficiently large clusters.
Matthew is a high tech migrant worker. He is currently a Software Engineer at Basho Technologies, working on the C/C++ aspects of Riak's storage and VM layers. Prior to Basho, Matthew was a contributing developer at Intuit, Akamai, Nuview, SmarterTravel Media, and for miscellaneous contracts. His delivered projects range from 4-bit microcontroller toys and ROM-based Quicken to high-volume, user-specific content delivery and distributed retail inventory planning and control. Weekends find him either with his family or out participating in marathons or Half-Iron triathlons.
This talk will discuss how Gilt has grown its technology organization to optimize for engineer autonomy and happiness and how that optimization has affected its software. Conway's Law states that an organization that designs systems will inevitably produce systems that are copies of the communication structures of the organization. This talk will work its way between both the (gnarly) technical details of Gilt's application architecture (something we internally call "LOSA") and the Gilt Tech organization structure. I'll discuss the technical challenges we came up against, and how these often pointed out areas of contention in the organization. I'll discuss quorums, failover, and latency in the context of building a distributed, decentralized, peer-to-peer technical organization.
Mark is a lead software engineer at Gilt, working on the web experience. He is the author of a handful of lesser-known Rubygems and the organizer of the Play Framework NYC meetup. He lives in NYC.
It's one thing to have a lot of data, and another to make it useful. This talk explores the interplay between infrastructure, algorithms, and data necessary to design robust systems that produce useful and measurable insights for realtime data products. We'll walk through several examples and discuss the design metaphors that bitly uses to rapidly develop these kinds of systems.
Hilary is the Chief Scientist at bitly, the URL-shortening and bookmarking service, where she makes beautiful things with data. She is a former computer science professor with a background in machine learning and data mining. As a native New Yorker, Hilary was appointed to Mayor Bloomberg’s Technology and Innovation Advisory Council. She also co-founded HackNY, created dataists, and is a member of NYCResistor.
Alex is co-founder & CTO at Breather, an early-stage startup based in New York City. In his free time, Alex organizes the annual Emerging Languages event, which will be held for the fourth time in September 2013 as part of the Strange Loop conference in St Louis, MO. An enthusiastic observer of programming language research and development, Alex co-authored "Programming Scala" for O'Reilly and has spoken at conferences worldwide on using cutting-edge languages for real world work. Alex was previously CTO of online banking startup Simple, and before that Platform Lead and an infrastructure engineer at Twitter.
Distributed systems are ubiquitous, but distributed programs remain stubbornly hard to write. While many distributed algorithms can be concisely described, implementing them requires large amounts of code, and the essence of the algorithm is often obscured by low-level concerns like exception handling, task scheduling, and message serialization. This results in programs that are hard to write and even harder to maintain. Can we do better?
Bloom is a new programming language we've developed at UC Berkeley that takes two important steps towards improving distributed systems development. First, Bloom programs are designed to be declarative and concise, aided by a new philosophy for reasoning about state and time. Second, Bloom can analyze distributed programs for their consistency requirements and either certify that eventual consistency is sufficient, or identify program locations where stronger consistency guarantees are needed. In this talk, I'll introduce the language, and also suggest how lessons from Bloom can be adopted in other distributed programming stacks.
Neil Conway is a fifth-year PhD Candidate at UC Berkeley. Neil's research focuses on how high-level, declarative languages can be used to write distributed systems. Before graduate school, Neil was a major contributor to PostgreSQL and an early employee at Truviso, where he helped design and implement Truviso's stream processing engine.
Managing a business-critical Riak instance in an enterprise environment takes careful planning, coordination, and the willingness to accept that no matter how much you plan, Murphy's law will always win. At CIM we've been running Riak in production for nearly 3 years, and over those years we've seen our fair share of failures, both expected and unexpected. From disk meltdowns to solar flares, we've managed to recover and maintain 100% uptime with no customer impact. I'll talk about some of these failures, how we dealt with them, and how we managed to keep our clients completely unaware.
Michajlo Matijkiw is a Sr Software Engineer at Comcast Interactive Media where he focuses on building the types of tools and infrastructure that make developers' lives easier. Prior to that, he was an undergraduate student at University of Pennsylvania where he split his time between soldering irons and studying task based parallelism. In his spare time he enjoys cooking, good beer, and climbing.
Talk details coming soon...
Rich Hickey, the author of Clojure and designer of Datomic, is a software developer with over 20 years of experience in various domains. Rich has worked on scheduling systems, broadcast automation, audio analysis and fingerprinting, database design, yield management, exit poll systems, and machine listening, in a variety of languages.
When OmniTI first set out to build a next generation monitoring system, we turned to one of our most trusted tools for data management: Postgres. While this worked well for developing the initial Open Source application, as we continued to grow the Circonus public monitoring service, we eventually ran into scaling issues. This talk will cover some of the changes we made to make the original Postgres system work better, talk about some of the other systems we evaluated, and discuss the eventual solution to our problem: building our own time series database. Of course, that's only half the story. We'll also go into how we swapped out these backend data storage pieces in our production environment, all the while capturing and reporting on millions of metrics, without downtime or customer interruption.
Theo Schlossnagle is a Founder and Principal at OmniTI where he designs and implements scalable solutions for highly trafficked sites and other clients in need of sound, scalable architectural engineering. He is the architect of the highly scalable Momentum mail transport agent, principal architect of Fontdeck, which delivers professional typefaces optimized for the web, Project Lead and Architect for OmniOS, an Illumos based operating system distribution, and Founder and Principal Architect of Circonus, a cloud platform designed for monitoring and marrying systems and business analytics. He authored Scalable Internet Architectures (Sams) and is a veteran speaker in the open source conference circuit. A member of the Apache Software Foundation and IEEE, and senior member of the ACM, he serves on the editorial board of ACM’s Queue Magazine.
Having worked on database-backed, internet-based systems for over a decade, Robert Treat is co-author of the book Beginning PHP and PostgreSQL 8, maintains the phpPgAdmin software package, and has been recognized as a major contributor to the PostgreSQL project for his work over the years. An international speaker on databases, open source, and managing web operations at scale, he spends his days as COO of OmniTI, a consultancy focused on building and managing large scale web infrastructure.
The trends of technology are rocking the storage industry. Fundamental changes in basic technology, combined with massive scale, new paradigms, and fundamental economics, lead to predictions of a new storage programming paradigm. The growth of low cost/GB disk is continuing with technologies such as Shingled Magnetic Recording. Flash and RAM are continuing to scale with roadmaps, some argue, down to atom scale. These technologies do not come without a cost. It is time to reevaluate the interface that we use for all kinds of storage: RAM, Flash, and Disk. The discussion starts with the unique economics of storage (as compared to processing and networking), discusses technology changes, posits a set of open questions, and ends with predictions of fundamental shifts across the entire storage hierarchy.
James Hughes is a Principal Technologist at Seagate Technology. He was formerly with Huawei and with Sun Microsystems, where he was a Sun Fellow, VP, and the Solaris Chief Technologist. James is a recognized expert in the area of storage, networking, and information security. Before Sun, James worked at StorageTek, Network Systems, and Control Data Corp. He has over 40 years of experience in OS, storage, networking, information security, and cryptography and is the holder of 30 patents with many more pending.
For most applications, application-level caching (like with memcached) is absolutely critical for performance. Unfortunately, these caching systems are so simple they leave applications with the burden of maintaining the cache. Developers must write code to invalidate, handle cache misses, and perform updates.
Pequod is a key/value cache we're developing at MIT and Harvard that automatically updates the cache to keep data fresh. Pequod exploits a common pattern in these computations: different kinds of cached data are often related to each other by transformations equivalent to simple joins, filters, and aggregations. Pequod allows applications to pre-declare these transformations with a new abstraction, the cache join. Pequod then automatically applies the transformations and tracks relationships to materialize data and keep the cache up to date, and in many cases improves performance by reducing client/cacheserver communication. Sound like a database? We use abstractions from databases like joins and materialized views, while still maintaining the performance of an in-memory key/value cache.
In this talk, I'll describe the challenges caching solves, the problems that still exist, and how tools like Pequod can make the space better.
Neha is a fifth-year PhD student in PDOS, the Parallel and Distributed Operating Systems group at MIT, advised by Robert Morris. Here she has worked on W5; BFlow, a privacy-preserving browser system; WARP; and Dixie. Neha's research interests are in protecting user data and scalable storage systems for web applications.
Neha has worked for Google as a Software Engineer on Native Client; Blobstore, a system for efficiently storing and serving terabytes of large binary objects; and Froogle.
Riak CS has come a long way since it was first released in 2012, and then open sourced in March 2013. We'll take a look at some of the features and improvements in the recently released Riak CS 1.3.0, as well as those planned for the future, like better integration with CloudStack and OpenStack. Next, we'll go over some of the Riak CS guts that deployers should understand in order to successfully deploy, monitor, and scale Riak CS.
Reid Draper is a Software Engineer at Basho, where he primarily works on Riak CS. He also enjoys programming in Haskell and Clojure, learning constantly, and brewing coffee.
Riak Enterprise has undergone an overhaul since its 1.2 days, mostly around Multi-DataCenter replication. We'll talk about the "Brave New World" of replication in depth, how it manages concurrent TCP/IP connections, Realtime Sync, and the technology preview of Active Anti-Entropy Fullsync. Finally, we'll peek over the horizon at new features such as chaining of Realtime Sync messages across multiple clusters.
Chris has 25 years in the high technology industry as a software developer, CTO, designer, and startup co-founder. He discovered Erlang indirectly through development of telecommunications test equipment at Tektronix, which launched a new passion in functional programming. During the Dot Com days, he co-founded a startup using OCaml as the core language for graph theoretic analysis of web sites. A fear of compilers led to an intense study and eventual job as the lead on a Java-to-native assembly language compiler for a Massively Parallel Processor Array at Ambric, also written in OCaml. Thinking about how to scale software concurrently led right back to Erlang and Basho, where he works on the Enterprise project team. Chris develops iPhone and Android applications as a hobby. He and his son, Geordie, co-designed a concurrent programming language called 'G' which compiles to C, and a LEGO-sized underwater ROV - both targeted for Arduino. When not programming, he enjoys rocket stoves, cob structures, remodeling, Minecraft, and Kendo.
As our computational infrastructure races gracefully forward into increasingly parallel multi-core and blade-based systems, our ability to easily produce software that can successfully exploit such systems continues to stumble. For years, we've fantasized about the world in which we'd write simple, sequential programs, add magic sauce, and suddenly have scalable, parallel executions. We're not there. We're not even close. I'll present trajectory-based execution, a radical, potentially crazy, approach for achieving automatic scalability. To date, we've achieved surprisingly good speedup in limited domains, but the potential is tantalizingly enormous.
Margo I. Seltzer is a Herchel Smith Professor of Computer Science in the Harvard School of Engineering and Applied Sciences. Her research interests include provenance, file systems, databases, transaction processing systems, and applying technology to problems in healthcare. She is the author of several widely-used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB, and is now an Architect at Oracle Corporation. She is currently the President of the USENIX Association and a member of the Computing Research Association's Computing Community Consortium. She is a Sloan Foundation Fellow in Computer Science, an ACM Fellow, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship. She is recognized as an outstanding teacher and mentor, having received the Phi Beta Kappa teaching award in 1996, the Abrahmson Teaching Award in 1999, and the Capers and Marion McDonald Award for Excellence in Mentoring and Advising in 2010.
Dr. Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College in 1983 and a Ph.D. in Computer Science from the University of California, Berkeley, in 1992.
In this talk Sathish will discuss the size, complexity and use cases surrounding weather data services and analytics, which will entail an overview of the architecture of such systems and the role of Riak in these patterns.
Sathish is a senior technology executive with a strong entrepreneurial drive who enjoys linking technology capabilities with business needs. He has hands-on experience with complex technology transformation initiatives and with leading large, highly capable global teams, along with in-depth knowledge of state-of-the-art technologies and their application in multiple industry settings.
Sunny Gleason is founder and Distributed Systems Engineer at SunnyCloud, a company that provides high-performance application development, hosting and high-availability distributed storage solutions using Amazon Web Services, Rackspace, and Google Compute Cloud. Previously, he worked on cloud platform system design and implementation at Amazon.com and Ning.