September 2018 - Managed IT Services | IT Support | SF Bay Area & Los Angeles

How to properly load balance your backup infrastructure

Veeam Backup & Replication is known for ease of installation and a moderate learning curve. It is something that we take as a great achievement, but as we see in our support practice, it can sometimes lead to a “deploy and forget” approach, without fine-tuning the software or learning the nuances of its work. In our previous blog posts, we examined tape configuration considerations and some common misconfigurations. This time, the blog post is aimed at giving the reader some insight on a Veeam Backup & Replication infrastructure, how data flows between the components, and most importantly, how to properly load balance backup components so that the system can work stably and efficiently.

Overview of a Veeam Backup & Replication infrastructure

Veeam Backup & Replication is a modular system. This means that Veeam as a backup solution consists of a number of components, each with a specific function. Examples of such components are the Veeam server itself (as the management component), proxy, repository, WAN accelerator and others. Of course, several components can be installed on a single server (provided that it has sufficient resources) and many customers opt for all-in-one installations. However, distributing components can give several benefits:

For customers with branch offices, it is possible to localize the majority of backup traffic by deploying components locally.
It allows to scale out easily. If your backup window increases, you can deploy an additional proxy. If you need to expand your backup repository, you can switch to scale-out backup repository and add new extents as needed.
You can achieve a High Availability for some of the components. For example, if you have multiple proxies and one goes offline, the backups will still be created.

Such system can only work efficiently if everything is balanced. An unbalanced backup infrastructure can slow down due to unexpected bottlenecks or even cause backup failures because of overloaded components.

Let’s review how data flows in a Veeam infrastructure during a backup (we’re using a vSphere environment in this example):

All data in Veeam Backup & Replication flows between source and target transport agents. Let’s take a backup job as an example: a source agent is running on a backup proxy and its job is to read the data from a datastore, apply compression and source-side deduplication and send it over to a target agent. The target agent is running directly on a Windows/Linux repository or a gateway if a CIFS share is used. Its job is to apply a target-side deduplication and save the data in a backup file (.VKB, .VIB etc).

That means there are always two components involved, even if they are essentially on the same server and both must be taken into account when planning the resources.

Tasks balancing between proxy and repository

To start, we must examine the notion of a “task.” In Veeam Backup & Replication, a task is equal to a VM disk transfer. So, if you have a job with 5 VMs and each has 2 virtual disks, there is a total of 10 tasks to process. Veeam Backup & Replication is able to process multiple tasks in parallel, but the number is still limited.

If you go to the proxy properties, on the first step you can configure the maximum concurrent tasks this proxy can process in parallel:

For normal backup operations, a task on the repository side also means one virtual disk transfer.

On the repository side, you can find a very similar setting:

For normal backup operations, a task on the repository side also means one virtual disk transfer.

This brings us to our first important point: it is crucial to keep the resources and number of tasks in balance between proxy and repository. Suppose you have 3 proxies set to 4 tasks each (that means that on the source side, 12 virtual disks can be processed in parallel), but the repository is set to 4 tasks only (that is the default setting). That means that only 4 tasks will be processed, leaving idle resources.

The meaning of a task on a repository is different when it comes to synthetic operations (like creating synthetic full). Recall that synthetic operations do not use proxies and happen locally on a Windows/Linux repository or between a gateway and a CIFS share. In this case for normal backup chains, a task is a backup job (so 4 tasks mean that 4 jobs will be able to generate synthetic full in parallel), while for per-VM backup chains, a task is still a VM (so 4 tasks mean that repo can generate 4 separate VBKs for 4 VMs in parallel). Depending on the setup, the same number of tasks can create a very different load on a repository! Be sure to analyze your setup (the backup job mode, the job scheduling, the per-VM option) and plan resources accordingly.

Note that, unlike for a proxy, you can disable the limit for number of parallel tasks for a repository. In this case, the repository will accept all incoming data flows from proxies. This might seem convenient at first, but we highly discourage from disabling this limitation, as it may lead to overload and even job failures. Consider this scenario: a job has many VMs with a total of 100 virtual disks to process and the repository uses the per-VM option. The proxies can process 10 disks in parallel and the repository is set to the unlimited number of tasks. During an incremental backup, the load on the repository will be naturally limited by proxies, so the system will be in balance. However, then a synthetic full starts. Synthetic full does not use proxies and all operations happen solely on the repository. Since the number of tasks is not limited, the repository will try to process all 100 tasks in parallel! This will require immense resources from the repository hardware and will likely cause an overload.

Considerations when using CIFS share

If you are using a Windows or Linux repository, the target agent will start directly on the server. When using a CIFS share as a repository, the target agent starts on a special component called a “gateway,” that will receive the incoming traffic from the source agent and send the data blocks to the CIFS share. The gateway must be placed as close to the system sharing the folder over SMB as possible, especially in scenarios with a WAN connection. You should not create topologies with a proxy/gateway on one site and CIFS share on another site “in the cloud” — you will likely encounter periodic network failures.

The same load balancing considerations described previously apply to gateways as well. However, the gateway setup requires an additional attention because there are 2 options available — set the gateway explicitly or use an automatic selection mechanism:

Any Windows “managed server” can become a gateway for a CIFS share. Depending on the situation, both options can come handy. Let’s review them.

You can set the gateway explicitly. This option can simplify the resource management — there can be no surprises as to where the target agent will start. It is recommended to use this option if an access to the share is restricted to specific servers or in case of distributed environments — you don’t want your target agent to start far away from the server hosting the share!

Things become more interesting if you choose Automatic selection. If you are using several proxies, automatic selection gives ability to use more than one gateway and distribute the load. Automatic does not mean random though and there are indeed strict rules involved.

The target agent starts on the proxy that is doing the backup. In case of normal backup chains, if there are several jobs running in parallel and each is processed by its own proxy, then multiple target agents can start as well. However, within a single job, even if the VMs in the job are processed by several proxies, the target agent will start only on one proxy, the first to start processing. For per-VM backup chains, a separate target agent starts for each VM, so you can get the load distribution even within a single job.

Synthetic operations do not use proxies, so the selection mechanism is different: the target agent starts on the mount server associated with the repository (with an ability to fail over to Veeam server if the mount server in unavailable). This means that the load of synthetic operations will not be distributed across multiple servers. As mentioned above, we discourage from setting the number of tasks to unlimited — that can cause a huge load spike on the mount/Veeam server during synthetic operations.

Additional notes

Scale-out backup repository. SOBR is essentially a collection of usual repositories (called extents). You cannot point a backup job to a specific extent, only to SOBR, however extents retain some of settings, including the load control. So what was discussed about standalone repositories, pertains to SOBR extents as well. SOBR with per-VM option (enabled by default), the “Performance” placement policy and backup chains spread out across extents will be able to optimize the resource usage.

Backup copy. Instead of a proxy, source agents will start on the source repository. All considerations described above apply to source repositories as well (although in case of Backup Copy Job, synthetic operations on a source repository are logically not possible). Note that if the source repository is a CIFS share, the source agents will start on the mount server (with a failover to Veeam server).

Deduplication appliances. For DataDomain, StoreOnce (and possibly other appliances in the future) with Veeam integration enabled, the same considerations apply as for CIFS share repositories. For a StoreOnce repository with source-side deduplication (Low Bandwidth mode) the requirement to place gateway as close to the repository as possible does not apply — for example, a gateway on one site can be configured to send data to a StoreOnce appliance on another site over WAN.

Proxy affinity. A feature added in 9.5, proxy affinity creates a “priority list” of proxies that should be preferred when a certain repository is used.

If a proxy from the list is not available, a job will use any other available proxy. However, if the proxy is available, but does not have free task slots, the job will be paused waiting for free slots. Even though the proxy affinity is a very useful feature for distributed environments, it should be used with care, especially because it is very easy to set and forget about this option. Veeam Support encountered cases about “hanging” jobs which came down to the affinity setting that was enabled and forgotten about. More details on proxy affinity.

Conclusion

Whether you are setting up your backup infrastructure from scratch or have been using Veeam Backup & Replication for a long time, we encourage you to review your setup with the information from this blog post in mind. You might be able to optimize the use of resources or mitigate some pending risks!

This article was provided by our service partner veeam.com

Unsecure RDP Connections are a Widespread Security Failure

While ransomware, last year’s dominant threat, has taken a backseat to cryptomining attacks in 2018, it has by no means disappeared. Instead, ransomware has become a more targeted business model for cybercriminals, with unsecured remote desktop protocol (RDP) connections becoming the favorite port of entry for ransomware campaigns.

RDP connections first gained popularity as attack vectors back in 2016, and early success has translated into further adoption by cybercriminals. The SamSam ransomware group has made millions of dollars by exploiting the RDP attack vector, earning the group headlines when they shut down government sectors of Atlanta and Colorado, along with the medical testing giant LabCorp this year.

Think of unsecure RDP like the thermal exhaust port on the Death Star—an unfortunate security gap that can quickly lead to catastrophe if properly exploited. Organizations are inadequately setting up remote desktop solutions, leaving their environment wide open for criminals to penetrate with brute force tools. Cybercriminals can easily find and target these organizations by scanning for open RPD connections using engines like Shodan. Even lesser-skilled criminals can simply buy RDP access to already-hacked machines on the dark web.

Once a criminal has desktop access to a corporate computer or server, it’s essentially game over from a security standpoint. An attacker with access can then easily disable endpoint protection or leverage exploits to verify their malicious payloads will execute. There are a variety of payload options available to the criminal for extracting profit from the victim as well.

Common RDP-enabled threats

Ransomware is the most obvious choice, since it’s business model is proven and allows the perpetrator to “case the joint” by browsing all data on system or shared drives to determine how valuable it is and, by extension, how large of a ransom can be requested.

Cryptominers are another payload option, emerging more recently, criminals use via the RDP attack vector. When criminals breach a system, they can see all hardware installed and, if substantial CPU and GPU hardware are available, they can use it mine cryptocurrencies such as Monero on the hardware. This often leads to instant profitability that doesn’t require any payment action from the victim, and can therefore go by undetected indefinitely.

Solving the RDP Problem

The underlying problem that opens up RDP to exploitation is poor education. If more IT professionals were aware of this attack vector (and the severity of damage it could lead to), the proper precautions could be followed to secure the gap. Beyond the tips mentioned in my tweet above, one of the best solutions we recommend is simply restricting RDP to a whitelisted IP range.

However, the reality is that too many IT departments are leaving default ports open, maintaining lax password policies, or not training their employees on how to avoid phishing attacks that could compromise their system’s credentials. Security awareness education should be paramount as employees are often the weakest link, but can also be a powerful defense in preventing your organization from compromise.

This article was provided by our service partner : webroot.com

Get your data ready for vSphere 5.5 End of Support

There have been lots of articles and walkthroughs on how to make that upgrade work for you, and how to get to a supported level of vSphere. This VMware article is very thorough walking through each step of the process.

But we wanted to touch on making sure your data is protected prior, during and after the upgrade events.

If we look at the best practice upgrade path for vSphere, we’ll see how we make sure we’re protected at each step along the way:

Upgrade Path

The first thing that needs to be considered is what path you’ll be taking to get away from the end of general support of vSphere 5.5. You have two options:

vSphere 6.5 which is now going to be supported till November 2021 (so another 5 years’ time)
vSphere 6.7 which is the latest released version from VMware.

Another consideration to make here is support for surrounding and ecosystem partners, including Veeam. Today, Veeam fully supports vSphere 6.5 and 6.7, however, vSphere 6.5 U2 is NOT officially supported with Veeam Backup & Replication Update 3a due to the vSphere API regression.

The issue is isolated to over-provisioned environments with heavily loaded hosts (so more or less individual cases).

It’s also worth noting that there is no direct upgrade path from 5.5 to 6.7. If you’re currently running vSphere 5.5, you must first upgrade to either vSphere 6.0 or vSphere 6.5 before upgrading to vSphere 6.7.

Management – VMware Virtual Center

The first step of the vSphere upgrade path after you’ve decided and found the appropriate version, is to make sure you have a backup of your vCenter server. The vSphere 5.5 virtual center could be a Windows machine or it could be using the VCSA.

Both variants can be protected with Veeam, however, the VCSA runs on a Postgres-embedded database. Be sure to take an image-level backup with Veeam and then there is a database backup option within the appliance. Details of the second step can be found in this knowledge base article.

If you’re an existing Veeam customer, you’ll already be protecting the virtual center as part of one of your existing backup jobs.

You must also enable VMware tools quiescence to create transactionally-consistent backups and replicas for VMs that do not support Microsoft VSS (for example, Linux VMs). In this case, Veeam Backup & Replication will use the VMware Tools to freeze the file system and application data on the VM before backup or replication. VMware Tools quiescence is enabled at the job level for all VMs added to the job. By default, this option is disabled.

You must also ensure Application-Aware Image Processing (AAIP) is either disabled or excluded for the VCSA VM.

Virtual Machine Workloads

If you are already a Veeam customer, then you’ll already have your backup jobs created and working with success before the upgrade process begins. However, as part of the upgrade process, you’ll want to make sure that all backup job processes that initiate through the virtual center are paused during the upgrade process.

If the upgrade path consists of new hardware but with no vMotion licensing, then the following section will help.

Quick Migration

Veeam Quick Migration enables you to promptly migrate one or more VMs between ESXi hosts and datastores. Quick Migration allows for the migration of VMs in any state with minimum disruption.

More information on Quick Migration can be found in our user guide.

During the upgrade process

As already mentioned in the virtual machine workloads section, it is recommended to stop all vCenter-based actions prior to update. This includes Veeam, but also any other application or service that communicates with your vCenter environment. It is also worth noting that whilst the vCenter is unavailable, vSphere Distributed Resource Scheduler (DRS) and vSphere HA will not work.

Veeam vSphere Web Client

If you’re moving to vSphere 6.7 and you have the Veeam vSphere Web Client installed as a vSphere plug-in, you’ll need to install the new vSphere Veeam web client plug-in from a post-upgraded Veeam Enterprise Manager.

More detail can be found in Anthony Spiteri’s blog post on new HTML5 plug-in functionality.

You’ll also need to ensure that any VMware-based products or other integrated products vCenter supports are the latest versions as you upgrade to a newer version of vSphere.

Final Considerations

From a Veeam Availability perspective, the above steps are the areas that we can help and make sure that you are constantly protected against failure during the process. Each environment is going to be different and other considerations will need to be made.

Another useful link that should be used as part of your planning: Update sequence for vSphere 5.5 and its compatible VMware products (2057795)

One last thing is a shout out to one of my colleagues who has done an in-depth look at the vSphere upgrade process.

This article was provided by our service partner : Veeam.com

An MSP Guide to Happy Customers

Shawn Lazarus brings a fresh take to marketing strategy with his engineering background and global perspective. He currently oversees the product marketing, social media, digital marketing, and brand management for OnPage which helps an MSP server their clients better.

An MSP’s ability to do effective work depends on their technical expertise. However, their ability to ensure customer satisfaction is what really grows their business. This is because high levels of customer satisfaction retain current clients and win over future ones. Given this reality, an MSP needs to be strategic about how they approach their work as every interaction is an opportunity to improve customer satisfaction.

With a few customer service guidelines, you can ensure you’re not only impressing potential clients, but also keeping existing ones happy. Not sure where to start? The tips below will help you improve communications with your clients, as well as establish business practices that reinforce the importance of customer relationships.

Develop a Strong Onboarding Policy

The first way to establish a strong foundation for customer satisfaction is to develop a strong onboarding policy. Onboarding defines the process of how new customers get integrated into your company’s workflow. The onboarding process should include straight-forward tasks such as cataloging a customer’s infrastructure, migrating the customer to a standard set of technology offerings, refreshing old hardware, and rolling in standard MSP tool sets.

However, the most important aspect of onboarding is documentation. Document the key processes for any new client you onboard so that when their technology is not functioning properly, your engineers can come in and properly assess and diagnose the problem. This documentation includes important items such as site details, site plans, and credentials for logging in to important software.

These expectations should be articulated during customer training so that the customer is apprised of how to contact your company and what to expect when they call during business hours or after hours. After their onboarding is complete, customers should know the answer to basic service questions, such as how long they should expect to wait until someone returns their call.

Set Client Expectations with SLAs

After onboarding a new client, a good MSP will provide them with a plan that offers a clearly defined time period in which issues will be fixed. This plan will also promise the client updates on all key stages of the incident resolution process. This is the essence of the MSP’s service level agreement (SLA). A strong SLA will help ensure you’re managing expectations, communicating effectively, and executing properly.

Manage Expectations: Focus on setting expectations around the time it will take you to call customers back, how long it will take until a tech is on site to fix an issue, and what a typical maintenance schedule will look like. Map out the customer journey and describe what the customer can expect from your business from beginning to end.
Effectively Communicate: There is often a lot of stumbling with communication. For effective client communications, develop a plan that helps you articulate the big picture. The plan should explain what happens when an issue arises for the customer and how you will respond. If there is the need for disaster recovery, how will you handle the process to get them back online?
Execute Properly: Effective execution is about getting the project done in a timely manner. When you’re unable to effectively execute, it’s likely because you lack either the time or personnel to deliver on what you’ve promised. You can accomplish this with two essential elements: collaboration and coordination. A business management software like ConnectWise Manage® enables you to easily centralize documentation, assign action items, track progress toward due dates, and report down to individual tasks seamlessly—ensuring no late or unfinished projects. In addition, employing an Incident Management tool like OnPage allows you to effectively execute alerts that come in from ConnectWise Manage in a timely manner. MSPs can equip even the smallest of staffs with an Incident Management Platform that allows the team to work after hours in shifts and handle alerts that are deftly sent to their smartphones, so they can attend to the alerts anytime, anywhere.

Leverage Automation

How much time is your staff spending on repetitive (yet necessary) tasks? How much more time would they have during the day if some of these tasks were automated? When repetitive yet necessary tasks are performed automatically with the right tools, the work is completed quickly and with a lower risk of human error. This also enables your staff to apply their advanced knowledge and expertise to more critical projects.

Some repeatable tasks you can automate include prioritizing tickets, routine maintenance, software upgrades, disk updates, patching, or end-virus remediation. By having tools automate these processes, your staff can spend more time on projects, planning, or providing proactive service to your customers—keeping them happy.

To take full advantage of the benefits of business automation, you’ll need tools such as a remote monitoring and management system, a ticketing system, a customer relationship management system (CRM), anti-virus solution, a remote access tool, malware detection solution, a critical alerting tool, and PBX systems—depending on your business’s specific needs.

While keeping customers happy sounds like it should be straightforward, anyone who has worked as an MSP knows that customer satisfaction is the secret sauce that separates one MSP from another. Simple things like clear communications with your clients and introducing a few new tools to your tool kit can make a huge difference in how your clients see you. By setting clear expectations and freeing up your staff from manual, repeatable tasks, you can develop strong relationships with new customers and keep your existing customers happy.

This article was provided by our service partner : Connectwise