The Journey to D-I-Y
Chris Hunter, Senior Software Architect/Manager, gives us an introduction to URBN brands, a brief history of e-commerce growth at URBN, some of the technical and organizational challenges he's run into, and the future opportunities he sees for web development within the company.
Key Takeaways
- Urban Outfitters went online in 2000 with an e-commerce experience.
- URBN brands have brought all web development in house.
- One of the major things that drove the re-platform was that the user experience the brands were looking for exceeded the capabilities of the existing platform.
Video Transcript
Chris: All right. So who am I? I've been at Urban for three years, currently managing 39 UI engineers. That covers all of our consumer-facing applications, as well as all of our retail applications, web and mobile, iOS and Android. I still maintain about a 50% allocation when possible. Right now it's not very possible given the current state of our re-platform, but I still work on quite a few proof-of-concepts, and do pair programming and code review. I have 12 years of professional software experience, and I'm a Drexel graduate. I also have a degree in management and economics, so I've peppered some of those topics into this talk as well.
So I wanted to tell a story about my experience at Urban over the last three years and the transition that we've made. And it's sort of a circuitous path, if you will, of how we got there, so I needed to put this preface in here so that you're not like, "What is he talking about?" So essentially, I'm going to give you an introduction to Urban, URBN, because when I say Urban, I actually mean URBN brands, which is not just Urban Outfitters, it's a whole family of brands. And I'm going to give you a history of some of the e-commerce growth at Urban, some of the technical challenges that we've run into, some of the organizational challenges that we've run into, how we solved some of these things, as well as some of the future opportunities that we hope to get to.
So URBN Inc. is the parent company of the whole portfolio of brands, and this is part of a larger diversification strategy. So when you think about Urban Outfitters, it's a younger brand, it's 18 to 28, it's men and women. You've got Free People, which is targeted towards a 28 to 35 year old, maybe a little younger, maybe a little older.
Anthropologie is 35+, BHLDN Weddings is sort of a transition brand between Free People and Anthropologie, Terrain is more of our, let's call it baby boomer, kids out of the house, gardening, home stuff. And we've recently acquired Vetri Family restaurants, so that's a totally different ball game for us, but it also plays into sort of a mixed strategy with some of our large format stores. So it's not always completely obvious but we...if you look at Vetri, it actually covers a wide demographic as well, so you've got kids and pizzas and that kind of stuff all the way up to fine dining.
So, I'm going to read our mission statement, because I think it'll help me tell the story a little bit. So, "lifestyle merchandising is our business and our passion. The goal of our brands is to build a strong emotional bond with the customer. To do this, we must build lifestyle environments that appeal emotionally, and offer fashion correct products on a timely basis. Our customers are the reason and inspiration for everything we do." So I hope that didn't bore you to death.
But when I look at this, it doesn't say anything about building service-oriented architecture or native mobile applications or isomorphic JavaScript applications or any of this kind of stuff, but certain parts of this really resonate with the way I think about my job. We use technology to augment this mission: we use technology to build a strong emotional bond with the customer through digital experiences, build lifestyle environments that appeal emotionally, and offer fashion-correct products on a timely basis using our very strong supply chain technologies. And the user experience groups and the product ownership groups in our companies really drive the "customers are the reason and inspiration" part. So the mission statement is very, very important to us.
So I also wanted to just show that we've been at this a while. Anthropologie put a catalog online in 1998, and we were doing e-commerce pretty soon after that. Urban Outfitters, I love this one, go in the Wayback Machine, went online in 2000 with an e-commerce experience. This screenshot is actually from 2005; I'm sure some of you have built table-based layouts, and it makes me really appreciate CSS3 right here. Free People went online in 2004, so we've been doing this a little while, this is not a new thing for us.
I can't tell you what the Y-axis is here, because some of this is not public information, but it's a lot more. In the first five years we saw a healthy growth curve here; in the second five years that growth curve is a lot steeper. More is better, but it just shows that this is more and more of a thing. And there are a lot of reasons for this. People are using multiple channels to convert, so it looks like e-commerce is high, but they're actually using our retail stores to browse and touch and try on clothes, and then they go home and think about whether they actually want that thing. They'll go online, they'll buy it. So you kind of gotta take some of it with a grain of salt, but on the other hand, it's really important, people are using this thing.
So we had this whole panel up here earlier talking about vendor-driven software versus DIY and all these other topics. I wanted to spend a couple of slides on commercial off-the-shelf software versus DIY software. When you are a brand like ours, where we're a retail merchandising company, it's actually not a great idea to jump right in and build an e-commerce backend on your first go. So we made a rational decision, we went out, we got expertise from lots of vendors, and we actually went through many iterations, which I'll go into.
So we really abided by the core competency principle here: we are merchandisers, we're retailers, we're going to get help, we're going to go online with people's help. It's lower risk, and it's built to serve many different businesses, so our initial feature set may be X, and two years from now it's X plus five more features; maybe they offer those things, we just turn them on and it works really well. This is important, because a lot of businesses operate in a traditional waterfall mentality, and it's also very rational, it's like, "Hey, tell me how much this is going to cost and when it's going to be done." Well, if you're doing agile and you're building software, I can't necessarily tell you that with very good certainty, actually.
So building custom software from the ground up in a company that doesn't support it is actually a bad idea. But off-the-shelf software also brings with it potential downsides. You can have vendor lock-in issues, you can have an inflexible architecture that doesn't allow you to build the features that you want to build, and you can have difficult integrations with other systems that you've bought.
A DIY system is sort of the inverse of this. It gives you that competitive advantage, so once you are out there and this thing is becoming more of a differentiator for you, it's a good idea to potentially invest. You own your architecture, you can get at any piece of code you need to or add to the system any way you need to, and it supports an agile environment, so if you want engineers that want to work in this capacity, doing it yourself is great for that. But it does require a lot of organizational support. And when I say organizational support, I don't mean like, "Hey, they're telling you how to do it," they're saying like, "Here's funding, and we trust that you know what you're doing, and you're going to make this happen for us."
It can be very high risk, especially early on, when you don't have the culture you need to actually achieve it throughout the entire stack. So you might do really well in one area of your system, and you might not do so well in another, and that can...Chris brought up a good point: if you get halfway into a project and you blow your date, they might just fire everybody, and that's not good, so there's a large personnel investment there. So speaking of competitive advantage, this guy knew a thing or two about that, and probably knew a thing or two about the prisoner's dilemma, but I found this quote in an interview that he did, and it really just jumped off the page for me.
Basically Steve's talking about the MCI Friends and Family plan, this is from '94 by the way, so it's kind of old, and he's talking about how AT&T didn't respond to it for about 18 months. And they didn't, because they couldn't build a custom app within their billing system. And if you think about it, MCI's thing, and AT&T's thing, is copper wiring and phone service, and there's some software back there making that happen, but accounting software probably wasn't the thing that they really thought would allow them to innovate. And for us, as we get more and more into selling online, it's important to realize how that thing becomes more important to us, and that we need to be able to change that thing if we want to be able to build out our business in the future.
So I'm going to just go through our architecture here really quickly. This is a history lesson of how it started: we had a very simple architecture. And I apologize, I wasn't sure what the audience would be here, whether it would be a lot of technical people or not, so I tried to do some of both. So we've got this Apache frontend, and a very simple ColdFusion backend written entirely by another vendor, it's not our software.
We started ratcheting up more business, we had a couple of peaks that didn't go so well with that system, so we decided to go out; maybe we did an RFP, I'm not sure. Urban tends to run flat, so we don't necessarily have a lot of bureaucracy, which is good, so we may have just hired somebody.
So we came in with a Java-based system, which scales better, we added a new order management system, we still have our retail ERP, and we probably threw Akamai on there around that time; I think that's around the time that we found that caching is useful. And then we're hiring more, and this is, I'd say, about 2010. I got a little bit of a history lesson from my CIO and several other people about this, but around 2010, we started hiring more and more people in our IT group to continue to build out the capabilities on this Java infrastructure, and that's where you see that growth curve really shooting up, and we're adding more features in there.
We put some services in, Spring MVC there, we're dropping that in our JVM, we've got a cart check-out UI, which is a SPA architecture for a client-side application. We've got this ESB in there, converting a bunch of order management services into JSON for us. And it's looking like something, but it's a lot of pieced-together things that different vendors have given us that can help us solve some problems. And it's working, we're making a lot of money on this thing.
But as we bring more and more customers, we're just throwing hardware at this thing, and there's certain architectural principles in this system that just do not scale very well. Was somebody asking a question? Oh, sorry. So one of the major reasons that we had to get into our re-platform was around this concept of our e-commerce backend being very stateful, and it was how we managed sessions at the time as well. So we would throw hardware at the problem, but what would happen is we couldn't allow that session to live on all the app servers, just by nature of the way this package is written, so we would end up with situations during peak where we might have a bunch of really active customers on a couple of the app servers.
So these ones in red would be full of users trying to buy things and hitting the site a lot, and then you'd have these other two where users came in from maybe a promotion that we ran, and they were like, "You know what, I can't deal with this right now, I gotta go." So they abandon their session, and those machines just sit there idle. And that was a pretty big problem for us, actually. So we would run into these scalability issues; I'm not going to get into all the reasons why this happened, but essentially we would end up with a situation where we had to restart app servers, which would cause people to lose sessions, which would cause conversion to go down. We dealt with a lot of this stuff, but it was a herculean effort by a number of people to deal with these situations during peak volume.
So one of the major things that drove this re-platform as well was that the user experience the brands were looking for just exceeded the capabilities of this platform. So we were doing whatever we could to glue new services and new capabilities into the backend so that we could achieve these frontend experiences. And it was also very monolithic, so anytime we made a change in one area over here to support some sort of marketing thing, we might actually break our check-out, because we didn't realize that those two things were conflicting, or we didn't have tests around that. There was some tech debt in some areas, because we did have a number of different contractors working in there, and they were under a lot of marketing pressure. There were duplicate efforts on the UI.
So one thing I didn't really go into yet is that the brands actually owned the frontend experience, so we had this build process where each brand had an in-house development team to help them with their marketing, and we would build the artifact for each brand and deploy it. And they would build a feature pretty much three times, like for a PDP or a category page feature or something, and a lot of that stuff under the hood did not need to be built three times. And then deployments just become difficult. So your feature development slows down, you can't get features to your customer, it's a problem.
Around this time, we get into needing to deliver mobile applications, because all of our customers are going online, all of our research is showing that this channel is converting really well, and it's becoming more important. So this is actually one of the first times where the IT group was able to deliver UI experiences for the brands. So typically the IT group would do all the backend, and the brand teams would do the frontend. IT was given the ball to actually build native mobile applications here, and that's where I joined the team, and actually we have a couple members here in the audience as well.
But what it really did was it exposed a lack of a cohesive service oriented architecture vision for our platform. So we were kind of gluing together all these different services, they had different authentication schemes, they had different authorization schemes. We were doing all kinds of orchestration in these clients that you could never do in JavaScript, but we were getting it done in iOS, because C is powerful. But we started to build a culture around that team where we wouldn't ship a feature unless it had tests, and we wouldn't...we were doing agile. We were starting to groom our sprints properly, and plan our sprints properly.
And at first, to step back a little bit, Urban is really anti-PMO, so there's a lot of doers, everyone is doing things, there's not a lot of process around things. So when we did this, it was kind of controversial, because we were like, "Hey, we need this process to do it," and everyone's ears perk up and they're like, "I don't know. I don't know if this is a good idea." So we had to show that it was going to work. I remember having to fight for an $800 Mac Mini to be like, "Hey, I need to do my builds on every commit to dev in GitHub," and my CIO was like, "Okay, let's see how this goes."
But the brands started loving it, because through TestFlight they would get a build every time we accepted a pull request into dev, and it really showed that a little bit of process actually gives them a way to say, "Hey, you know what? I asked for that, you delivered it, it's not actually what I want. What I actually want is this." And we would adjust it and it was like, "Oh, great." And the tooling improvements really resonated with the other teams that were working on the backend. So they were like, "Okay, hey, we should do some of this stuff too."
So, speaking of mobile, I'm sure Jeff would probably want to do that one again, but basically, Jeff Bezos got something totally right, and this is one of the reasons why Amazon has been so successful. In 2002, he put out this memo; it was made public by Steve Yegge, a Google engineer who had been at Amazon at the time, in a post he put out a little while ago. But essentially, Jeff Bezos said in 2002, "All teams will henceforth expose their data and functionality through service interfaces," I'm going to paraphrase here, "There will be no other form of interprocess communication allowed; the only communication allowed is via service interface calls over the network. All service interfaces, without exception, must be designed from the ground up to be externalizable, and anyone who does not do that will be fired."
So for us, when we were going into re-platform, we were like, everything needs to be service oriented, because we want to be able to offer these backend capabilities to all of our different brands. We want to be able to roll out a new feature for one brand, and have it immediately be available for the other brand if they want it. So that's kind of how we got into this re-platform initiative. Like I said, it kind of started with the mobile apps, and it got the wheels turning, and we really realized that this was something we had to do.
So in 2014, we put together some goals with the executives. So this did not necessarily come from the bottom up; a lot of the executives were like, "Hey, we want to be able to grow to three times the current traffic, and support a lot of mobile initiatives. We want to engage the customer in a rich, consistent way. We want to be able to go across channels." This omnichannel thing was out there, and that's become more important for us. Performance is a feature, actually. People show that you can do A/B tests all day long, but if you just deliver the thing super fast, that's going to convert. And we want everyone to consume APIs, and we want to be able to do it fast, so we want to be able to deliver features much faster in the future.
So we put some principles on the table, we worked with...we hired an architect that came out of eBay, and I was working with some people, and we really brought in a lot of really smart engineers around this time, we did some good hiring. But we already had some momentum in place at that point. So I don't know if we would have been able to assemble this team to do this if we tried to do it in 2010. I don't think Urban was really ready at that time to try to do this.
So we wanted to go multitenant, which means that if we create a service, we should be able to have all brands hit the same cluster if that's what we want to do, based on the URL paths. We wanted to embrace a microservice architecture, which, for anybody here who isn't familiar with microservices, allows you to basically refactor your system in pieces, and break it up into pieces, and it gives you a lot of flexibility in how you implement a certain feature.
So you could implement something in one language, and then later on, be like, "Hey, you know what, I want to write this in Go," and just replace that one area of your system without having to replace your entire system. We wanted to go to a stateless backend, which solved the issue that I showed earlier. We wanted OAuth2, which gave us a single view of a customer, or really any user. We wanted consistency across not only the customers, but also employees. So a session is a session, and you get certain capabilities based on whether you're an employee or a customer, or both.
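To make the stateless, multitenant idea a bit more concrete, here is a minimal sketch (not URBN's actual code): a handler that resolves the tenant brand from the URL path and identifies the caller from an OAuth2 bearer token instead of server-side session state. The brand list, the introspect_token helper, and the claim shape are all hypothetical placeholders.

```python
# Minimal sketch (Python 3.9+, illustrative only): stateless, multitenant request
# handling. The brand comes from the URL path and the user from an OAuth2 token,
# so no app server needs to hold a session in memory.

BRANDS = {"urbanoutfitters", "anthropologie", "freepeople"}

def resolve_brand(path: str) -> str:
    """Pick the tenant from a path like /freepeople/api/v1/products."""
    segment = path.strip("/").split("/")[0]
    if segment not in BRANDS:
        raise ValueError(f"unknown brand: {segment}")
    return segment

def introspect_token(bearer_token: str) -> dict:
    """Stand-in for an OAuth2 token introspection call to the auth server.
    Returns a canned claim set so the sketch stays self-contained."""
    return {"sub": "customer-123", "roles": ["customer"], "active": bool(bearer_token)}

def handle_request(path: str, headers: dict) -> dict:
    brand = resolve_brand(path)
    token = headers.get("Authorization", "").removeprefix("Bearer ").strip()
    claims = introspect_token(token)
    if not claims.get("active"):
        return {"status": 401}
    # No per-server session: any app server can answer, because everything the
    # request needs (brand in the path, identity in the token) travels with it.
    return {"status": 200, "brand": brand, "user": claims["sub"]}

print(handle_request("/freepeople/api/v1/products",
                     {"Authorization": "Bearer abc123"}))
```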
We wanted to be able to go to more than two data centers. We've been working out of these two U.S. data centers, and we wanted to be able to move into cloud-based infrastructure and put data centers in Europe and Asia, where we're expanding. There are certain architectural and engineering principles that we're trying to abide by, so unit test coverage is really important to this team at this point: 95% unit test coverage, and in fact we have more than that on almost all of our services. And we wanted to move to Docker.
So this is a super high level view, I've ripped out lots of pieces of this, but it gives you a sense of what we ended up with, which is, instead of having this massive Java WAR that we just deploy, we've broken our system up into a lot more pieces. So all the UIs come through the front door, we've got our platform proxy, which is dynamically updated through our DevOps flows, and we've got microservices, so each one of those pieces below the proxy is its own application, and those applications communicate with each other, so if the check-out service needs some catalog data, it actually calls the catalog service.
And at this time we also transitioned largely away from Java to Python, and Python has just been a boon for us. We've been able to deliver software so much faster in that environment. We use an async approach with the Tornado framework, so we get some of the benefits that you get with Node, but it's actually very fast, and it works well. And we've moved more and more of the interactions with the e-commerce backend off the critical path; we still use the same e-commerce backend, but anything that the customer is doing a lot, we moved into more of a document store or into some sort of cache, and then we queue things into the backend. And freepeople.com is on this right now, so if you get the app, or you go to freepeople.com, you can experience the difference in performance there.
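As a rough illustration of that style of service, here is a minimal sketch of a non-blocking Tornado handler, assuming Tornado 6.x: a checkout endpoint that fetches product data from a separate catalog microservice over HTTP. The host name, URL paths, and response shape are invented for the example, not URBN's real APIs.

```python
# Illustrative sketch of an async Tornado microservice. The checkout handler
# calls the catalog service over the network rather than importing it, which is
# the service-to-service pattern described above.
import json

import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

CATALOG_SERVICE = "http://catalog.internal/api/v1/products/"  # hypothetical host

class CheckoutHandler(tornado.web.RequestHandler):
    async def get(self, product_id):
        # The await yields the IOLoop while the catalog call is in flight, so
        # one process keeps serving other requests (the Node-like benefit).
        client = AsyncHTTPClient()
        response = await client.fetch(CATALOG_SERVICE + product_id)
        product = json.loads(response.body)
        self.write({"product": product, "checkout_ready": True})

def make_app():
    return tornado.web.Application([
        (r"/checkout/items/(\w+)", CheckoutHandler),
    ])

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```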
So this gets us into some of the management speak here: Conway's law is a thing. And it's not necessarily a bad thing. Basically the way it goes is, any organization that designs a system will inevitably produce a design whose structure is a copy of the organization's communication structure. So if you just look at this Manu Cornet cartoon here, it's pretty funny. Google, everybody controls everybody, but the thing about Google is they have a great single sign-on.
Amazon, Jeff is at the top, and Facebook, it's just a very flat structure, it's well known, but it's also a graph back-end so it kind of looks like a graph. I love the Oracle one on the bottom there with the legal team on the left. But no offense to lawyers, I'm just saying, it's funny.
So this happened to us. When we started on the re-platform, the brands were still running frontend development, so Free People has a COO, they have a development manager, they have a UI lead; we've got our CIO at Urban, which is the shared service; and we've got our director of e-commerce, which is more of a platform thing. And we ended up with an app that kind of does this, where we have a browse application, and we have a check-out application, and nginx is there kind of rolling it all up for us. Not that it's a bad architecture, and it works perfectly fine, it's just that the communication structure really did play out this way, and I thought it was interesting.
So this is what our legacy org looked like. We've actually recently restructured. You've got Free People, Urban, and Anthro, those are our three biggest brands, and internally they all have product ownership, user experience, they've got BAs, PMs, devs, QA, and then you've got IT back there, which is just handling the backend, and the integrations, and the DevOps, and making sure that the deployments go well. And this is actually funny, a lot of people have been talking about this bimodal IT concept lately, and that's what we did, and it got us to a certain point, but it does create a lot of duplication in this type of model. So what we wanted to do was move to where we would build shared components in the backend and, well, I'll move to it.
So this is where we are now. We've restructured over the last six months, and essentially, the brands bring stuff in from their product ownership and user experience groups, and they bring it into a solution delivery organization, where we have BAs and PMs that help them get tickets written, get documentation written up front, and collect all the business requirements, and then work with the architects to get technical details into a pre-grooming session. Then we move things into more of a formalized grooming session with our engineers, where we'll actually put detail on tickets and actually get things built. And then they roll through the UI, services, QA and DevOps teams, and there are different pieces of the feature that need to get built.
So if you look, this is what we're building now, and it looks a lot like that same structure, actually. So we've got this white label...so some of the goals here, let me start with the bullet points, I'm sorry. We wanted to get to where we had more like 70% shared code under the hood; this is actually a very common manufacturing model with cars, like drivetrains, brakes, all kinds of things, you're not going to reinvent that for each car, you're going to reuse a lot of that stuff. We also wanted 95% unit test coverage, we wanted integration test automation, and we wanted feature toggles by brand, so if we have, say, a superset of 25 or 30 features, a brand could take whatever features they want, and leave whatever features they don't want outside of the build.
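To give a sense of what brand-level feature toggles might look like in practice, here is an illustrative sketch; the brand keys, feature names, and the simple in-code mapping are assumptions for the example rather than how URBN actually stores its toggles.

```python
# Illustrative only: per-brand feature toggles so a shared codebase can ship a
# superset of features and each brand opts in or out of individual ones.
FEATURE_TOGGLES = {
    "freepeople":      {"wishlist": True,  "store_pickup": True,  "gift_wrap": False},
    "urbanoutfitters": {"wishlist": True,  "store_pickup": False, "gift_wrap": True},
    "anthropologie":   {"wishlist": False, "store_pickup": True,  "gift_wrap": True},
}

def feature_enabled(brand: str, feature: str) -> bool:
    """Default to off, so an unknown brand/feature combination never leaks in."""
    return FEATURE_TOGGLES.get(brand, {}).get(feature, False)

if feature_enabled("freepeople", "store_pickup"):
    print("render the in-store pickup option")
```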
And then we wanted to be able to override code, and basically customize things wherever necessary. So I don't ever want to have to actually say to somebody, "I literally can't do that." But I want to be able to say like, "We probably shouldn't do that, because there's not enough business value to actually do that. But, we could. And we could test it. Maybe it's worthwhile." So our feature lifecycle looks kind of like this.
Request comes in, we think about things, so a brand might bring something in that's more marketing focused, like, "I want this feature based on this campaign I want to run." But really on the engineering side we're thinking like, "Oh, well that's some social media integration, and we're actually going to call it, instead of like the FP Me thing, we're going to call it like Engage service or something."
And then we can think about things more cross-brand, and say, "Hey, maybe UO wants to use Engage for something," and build it once, and then reuse it. Then we build a white label implementation of that thing as a standalone feature, as part of the SDK, and then the frontend team will continue to customize it...going back to this, the customizations live in those top tiers, so most of the white label is actually right in the SDK at the bottom, and the white label experience is basically just a representation of the SDK, with no overrides. And then we deliver it. So our goal is to be delivering literally every day in the future. If we can do multiple deploys a week, that would be a win.
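As a hypothetical sketch of that layering, the shared SDK could ship a white-label default and a brand tier could override only the pieces it wants to customize; the class names, methods, and rendering logic below are invented for illustration, not URBN's SDK.

```python
# Rough sketch of the white-label-plus-overrides layering described above.

class ProductTileSDK:
    """White-label default: what you get with no brand overrides."""

    def badge_text(self, product: dict) -> str:
        return "New" if product.get("is_new") else ""

    def render(self, product: dict) -> str:
        badge = self.badge_text(product)
        return f"[{badge}] {product['name']}" if badge else product["name"]

class FreePeopleProductTile(ProductTileSDK):
    """Brand tier: overrides live here; everything else falls through to the SDK."""

    def badge_text(self, product: dict) -> str:
        return "Just In" if product.get("is_new") else ""

product = {"name": "Maxi Dress", "is_new": True}
print(ProductTileSDK().render(product))         # [New] Maxi Dress
print(FreePeopleProductTile().render(product))  # [Just In] Maxi Dress
```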
So some of the cultural things that we've really worked on are things like our agile practice. We didn't start with this agile practice. Some teams practice agile with a capital A, some teams still practice it with a lowercase a, but we practice it. Another thing is, we have technical managers. If you're managing engineers, you have to have been an engineer.
We don't put like a PM and then be like, "Hey, now you're managing this services team." Not that they necessarily couldn't do it, it's just that we want to make sure that they're getting strong leadership from the people they report to and that they have a strong sense of respect for each other, and there's a lot of reasons for that.
We have self-organizing sprint teams, so basically teams are free to use the tooling that they want to use, and they can use the pieces of agile that make sense for their application. And like I said, with the microservices, it's great, because now a sprint team can go work on one piece of that thing, and work it end to end and actually get it delivered, without having to worry about all the rest of the pieces of that system. Branding, obviously, and feature development I already went over, and we try to pick up generalists, we try to develop multiple skills in an engineer, but there's also a need for some specialization as well, so I want to make sure I call that out. And this is really, oh...
Male 1: I have a question. Where's the design [inaudible 00:28:33]
Chris: Yeah, so...I'll go back. So design is in the brands. So it's an area where they work really closely with the UX team. So the UX team and the product owners will come up with their wire frames, and then the UX team will work with their creative leads and their creative director within Urban Outfitters and within Anthropologie. One of the things that we're working on is some standardization on deliverables, so obviously we want them to be as creative as they want to be and deliver whatever they want to deliver in terms of look and feel, but when we get red lines from them, I want them to be the same across brands. We don't have that yet, but yeah...the assets come from the brands.
Male 2: [inaudible 00:29:29]
Chris: No. I don't think that's true, because I've got two guys right here that work in it all day long. So it may come off that way with the way that I organized these slides, and I didn't mean for it to come off that way. So they do, they're our customers, they come in with the features they want, how they want them to look. There is some amount of negotiation there, like if a feature can't be achieved within a given time frame, we have to compromise, and that does happen. But in terms of look and feel, they pretty much have carte blanche in terms of how it should look. Does that make sense?
Male 2: [inaudible 00:30:16]
Chris: I could probably put James Mackenzie up here or one of these other creative directors and he would have a different set of slides, and I would be one little thing down at the bottom, so maybe it's a little biased in this presentation. So some of the things that we're doing: we're trying to expand into more international markets, and there are a lot of challenges there, so we want our application to be built around international issues: localization, multi-language, multi-country, multi-currency, convenience currencies, shipping restrictions, flat-rate shipping restrictions, all these things. The complexity around this explodes. You start talking about things like in China, they've got, what's the name of that processor? Yeah, Alipay. Look it up, it's wild.
So omnichannel, which I mentioned earlier, so we're trying to get into more cross-channel personalization, things like buy online, pick up in store, same-day delivery, in-store mobile experiences, ship from store, there's all kinds of things. Once you have this service platform in place, it becomes so much easier to add capabilities. I don't even know where we're going with this thing, there are so many things we could do at this point. We're known for our retail experience, so that's one of the advantages of our brand, we have so many really awesome experiences, and the customer's already really enjoying that. How can we augment that with this platform? I think that's it. Questions?