Cloud Computing - 'Every silver lining has a dark cloud'. What they don't tell you, but most CTOs already know.

 

Bad Cloud Computing

After rain comes the sun, so be careful which cloud you choose!
Vapourware warning

Data center and 'Cloudy' facts.

  • Is 'cloud computing' via a data center ECO-friendly? NO.
  • Is 'cloud computing' via a data center reliable? NO. [unless you have local live replication or backup] (Yahoo outage 2012)
  • Are (hosted) 'private clouds' secure? NO, or at most as secure as your vendor tries to convince you they are. This still needs to be tested by a lawsuit.
  • Can 'cloud computing' and services via a third-party data center save costs? YES, if used correctly, but watch 'bursting' and bandwidth.

'Hey you! Get off of my cloud'

Choosing the right vendor and the right cloud is important. And here my experience with everyday Internet takes a nasty turn ;)

Data centers are a relatively new term, sure...Google used them in the 1870s, right! Problem is, any corporation with billions to spare, and even impoverished governments, can take control and wreak havoc in the atmosphere where clouds are created, managed and run. I do little to no justice here to those cloud operators that know their business and do it well, but I prefer to examine the darker side. So be aware of clouds, rain, thunder, hail, tornadoes (and other real ECO concerns) and choose your cloud wisely.

What is the problem?

Imagine a 'corporation' offers ++++$ to have access to a data center (cloud) and then does the following:

  • USE(less) Unidentifiable Scraper Entities. Constantly probing, scraping or gathering data with no identification, or with obnoxious and stupid names, and without reference to a company name or web page. And if a reference is available, it leads to a commercial site claiming they know exactly how you can beat your competition online, but at a price.
  • Break fundamental Internet rules (ignore the robots.txt instructions of the websites they crawl).
  • Randomly or selectively use all their available IP addresses faster than the speed of light (correction - electrons).
  • Do this at a rate that increases the load on the targeted Internet services.
  • Undertake this in a way that has maximum profit for them with no return to the source they got (stole) the information from in the first place. In real life (yes this does exist) it's called a parasite.

Bad cloudy misty business...Right here right now, right under your nose.

Currently three distinct 'dark cloud' signatures register very clearly on daily Internet traffic. But, Hey, I'm seeing silver linings and that's my goal ;)

  • amazonaws Data Center: The birthplace of all weird, odd, strange, bad, conniving (and a few apparently innocent) automated outbound undertakings on the Internet. I have yet to see a true 'freedom fighter' here! They're all in it for the money (their fortune), and totally at everyone else's online expense. No payback is evident in any way. Occasionally a lost soul like 'Pinterest' can be found tucked in between some nasties, so don't just dump the complete IP range.
  • Chinese. Okay, you guessed it...Duh! But do take into account the cultural differences (please). It's actually amazing what vast IP resources they throw into all this, and for what? I do not know, while they consume bandwidth at an alarming rate. PS: They should never say the world has run out of IP numbers! Just ask the corporations and governments to stop being greedy and to stop poking and probing at everything all the time from every possible (IP) angle, JUST ASK! ;)
  • Reluctance. These are well-known entities that are quietly (and sparingly) venturing into clouds and data centers and using those resources in a cunning way to find out what they want to know. They pretend to be 'real users' by sending current browser header identification (and sometimes even pretend to be Google and other well-known search engine bots). Some are getting really smart and even 'pseudo' mimic real user behaviour; until now, unnatural behaviour was one of the simplest ways to catch them.
  • Alertness. Monitor external data gatherers carefully and you will find that even the 'trusted' search engines cannot keep their prying eyes off browser use behaviour. A few (including Google) track and check what 'a' user got in reply to an Internet 'HACK' access attempt. To this automated system that is equivalent to a real hack access attempt. Just goes to show...who's watching who? Would love to see Google's data on this topic. Open data gurus, this is a tip: they know a lot more than they let on ;)
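The impostors mentioned under 'Reluctance', the ones claiming to be Google or another search engine bot, can usually be unmasked with a forward-confirmed reverse DNS check: look up the host name for the IP, check its domain, then resolve that name again and see whether it points back to the same IP. What follows is only a rough sketch of that idea. The module and function names are made up for this illustration, and the two Google domains in the sample call are an assumption you should verify against Google's own crawler documentation before relying on them.

```vbnet
Imports System
Imports System.Linq
Imports System.Net
Imports System.Net.Sockets

Module CrawlerCheckSketch

    ' Hypothetical helper, not part of the original solution.
    ' True when the reverse DNS name of ip ends in one of the allowed domains
    ' AND that name resolves forward to the same IP address again.
    Public Function IsConfirmedCrawler(ip As String, allowedDomains As String()) As Boolean
        Dim address As IPAddress = Nothing
        If Not IPAddress.TryParse(ip, address) Then Return False
        Try
            ' Step 1: reverse lookup (IP address to host name).
            Dim hostName As String = Dns.GetHostEntry(address).HostName
            ' Step 2: the host name must belong to one of the allowed domains.
            Dim domainOk As Boolean = allowedDomains.Any(
                Function(d) hostName.EndsWith("." & d, StringComparison.OrdinalIgnoreCase))
            If Not domainOk Then Return False
            ' Step 3: forward lookup (host name to IP addresses) must include the original IP.
            Return Dns.GetHostAddresses(hostName).Any(Function(a) a.Equals(address))
        Catch ex As SocketException
            ' No reverse record or a failed lookup: treat the claim as unconfirmed.
            Return False
        End Try
    End Function

    Sub Main()
        ' Assumed domain list for Google's crawlers; check the vendor documentation.
        Dim googleDomains As String() = {"googlebot.com", "google.com"}
        ' Input that is not an IP address is rejected without touching the network.
        Console.WriteLine(IsConfirmedCrawler("not-an-ip", googleDomains)) ' prints False
    End Sub

End Module
```

Only call this for requests whose user agent claims to be a search engine bot, as the two DNS lookups are far too slow to run on every request.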

Choose your cloud provider well if you're an outbound service

If not, you may end up having vast resources at your disposal alongside nasty neighbours with a bad reputation, and many will avoid you or block your access.

You can see here the low-to-no-value automation we have detected and blocked on our websites; the cloud-based data center ranges become very apparent. You might argue that blocking is counterproductive for SEO and other Internet marketing techniques. We have found the contrary: data consumption has decreased by more than 45%, bandwidth availability and site speed have increased, so has the number of visitors, and the sites with 'ads' have not noticed any impact on revenue at all.

The Internet is transforming from a static content provision environment into an interactive real-time communication medium, bringing with it amazing possibilities (big open data ;) and also abuse. A word to the wise...'First fix your mess before you create new methods and services for the greater good.'

Solutions for black cloud data centers

Enough ranting, let's fix this stuff as much as we can. Since 2006 I have seen mainly USA-based 'bad swarm activity' from "Amazon AWS". Over the years, true to Moore's Law, this problem has become epidemic and global in nature. Hundreds to thousands of IP addresses spread across multiple IP ranges are used by one and the same unsavoury agents for unclear to outright malicious purposes.

If you take the trouble to trace the IPs used, you'll be surprised that they are served by very few masters. For this exercise we will target Amazon AWS servers, as these are clearly IMO the cesspool of the Internet and have been for many years.

Rules of the game

  • IP addresses coming from data centers should not pretend to be 'human', and should identify themselves and honour the Internet code of conduct.

  • We will use a reactive approach (fix it after the damage is [partially] done). We must have a positive hit on each and every IP address before it is terminated.

  • You could, but we don't, block the complete registered block. We will stick to an IP to IP registration as they hit port 80 and are verified.

  • Use selective unknown IP checking using 'Pattern Processing' to highlight dubious activity and escalate it for verification using this solution. At present it is beyond the scope here and we assume everyone is subject to verification.

  • This solution assumes you know which data center(s) you want to block and have knowledge of their Internet-facing infrastructure registration(s). These are available online from all international NIC organizations.

The same principles we used for 'User Agent Spider Wasp' apply here again: a cached XML comparison list to check against.
The idea is to block all IP addresses that have source instances like:

ec2-50-18-24-18.us-west-1.compute.amazonaws.com,
ec2-174-129-32-219.compute-1.amazonaws.com,
ec2-50-17-154-105.compute-1.amazonaws.com,
ec2-107-20-176-132.compute-1.amazonaws.com,
ec2-184-73-230-215.compute-1.amazonaws.com,
ec2-75-101-246-218.compute-1.amazonaws.com,
ec2-176-34-203-24.eu-west-1.compute.amazonaws.com,
....and loads more!!!

Starting to see the problem? They are all over the place and apparently buying up (and infesting) all available and abandoned IP ranges on a global level.

Using rDNS (reverse DNS lookup) to get the host name from an IP address

To find out who an IP address belongs to, we use rDNS. To see how it works, type an IP address in the box below and I will find the host for you.

Enter an IPv4 address in the format xxx.xxx.xxx.xxx

This is the process we will use to detect IP addresses that belong to the same (owner) data center.
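If you prefer to try the lookup from your own machine, it can be sketched in a few lines with the .NET 'Dns' class. This is a minimal sketch only: the module and function names are invented for the illustration, and the sample IP is simply read off the first host name in the list above, so whether it still resolves to that name today is not guaranteed.

```vbnet
Imports System
Imports System.Net
Imports System.Net.Sockets

Module ReverseDnsSketch

    ' Hypothetical helper: returns the host name registered for an IP address,
    ' or Nothing when the input is not an IP address or the lookup fails.
    Public Function TryGetHostName(ip As String) As String
        Dim address As IPAddress = Nothing
        If Not IPAddress.TryParse(ip, address) Then
            Return Nothing
        End If
        Try
            ' Reverse DNS lookup: IP address to host name.
            Return Dns.GetHostEntry(address).HostName
        Catch ex As SocketException
            ' No reverse record exists, or the DNS query failed.
            Return Nothing
        End Try
    End Function

    Sub Main()
        ' Sample IP taken from the first host name in the list above.
        Dim host As String = TryGetHostName("50.18.24.18")
        Console.WriteLine(If(host, "(no host name found)"))
    End Sub

End Module
```

Not every IP address has a reverse record, which is why the sketch returns Nothing instead of throwing.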

Constructing the Bad Cloud Blocker

Creating an automated hosts detector to filter for data center hosts connecting to your services in three easy steps

1. Adaptable XML format host names file

Place this XML file in the root of your website as BadHostList.xml, the file name the routine in step 3 loads.

<?xml version="1.0" encoding="utf-8" ?>
<hosts>
  <host>.amazonaws.com</host>
</hosts>

2. Global.asax inbound IP extraction

Place this code in the 'Application_BeginRequest' routine in Global.asax

        '// IP address of the client making the current request.
        Dim MyHostIP As String = HttpContext.Current.Request.UserHostAddress
        If Not IsNothing(MyHostIP) Then
            If ValidationService.IsBadHost(MyHostIP) Then
                Response.Buffer = False
                Response.Clear()
                '// 403 = Forbidden. Change to 400 (Bad Request), 401 (Unauthorized) or 404 (Not Found) if you prefer,
                '// not that these bots bother to react to any of these.
                Response.StatusCode = 403
                Response.Flush()
                Response.End()
                Exit Sub
            End If
        End If

 

Operations

  • Place the host name domain suffixes (for example .amazonaws.com) in the XML file. One suffix per entry.
  • Adapt Application_BeginRequest to get the IP number of each incoming request.
  • Pass the IP number to the IsBadHost routine to test it against the host entries.
  • true = the IP is a member of a confirmed host entry and action should be undertaken.
  • false = the IP is not a member of the host entry list and can pass through.

This is not the ultimate data center blocking method, but will potentially stop all designated data centers from accessing your information, irrespective of which IP address they choose to use.
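One point to be aware of: the IsBadHost routine in step 3 looks for each entry anywhere inside the resolved host name. If you want an entry to match only at the end of the host name (the domain itself and its subdomains), a suffix comparison is a stricter alternative. The sketch below is one possible way to do that, not the method used in this solution; the module and function names are invented for the illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module HostSuffixSketch

    ' Hypothetical helper: True when hostName equals a listed domain
    ' or is a subdomain of it. Entries may be written with or without a leading dot.
    Public Function MatchesListedHost(hostName As String, entries As IEnumerable(Of String)) As Boolean
        If String.IsNullOrEmpty(hostName) Then Return False
        ' Drop a trailing root dot such as "example.com."
        Dim name As String = hostName.TrimEnd("."c)
        For Each entry As String In entries
            If entry Is Nothing Then Continue For
            Dim suffix As String = entry.Trim().TrimStart("."c)
            If suffix.Length = 0 Then Continue For
            ' Host names are case-insensitive, so ignore case when comparing.
            If name.Equals(suffix, StringComparison.OrdinalIgnoreCase) OrElse
               name.EndsWith("." & suffix, StringComparison.OrdinalIgnoreCase) Then
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Dim entries As String() = {".amazonaws.com"}
        ' A host name from the list above matches.
        Console.WriteLine(MatchesListedHost("ec2-50-18-24-18.us-west-1.compute.amazonaws.com", entries)) ' prints True
        ' The entry appears in the middle of the name, not at the end: no match.
        Console.WriteLine(MatchesListedHost("amazonaws.com.example.org", entries)) ' prints False
    End Sub

End Module
```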

This is free source code and free rdns service provided as-is without any warranty or guarantee whatsoever. Do not automate against this rdns service.

3. IP to Host Name routine

Place these methods and class in the class 'ValidationService'

    '// The listings in this step need these imports at the top of the file:
    '// Imports System.Linq, Imports System.Xml.Linq, Imports System.Collections.Generic
    Public Shared Function IsBadHost(Ip As String) As Boolean
        Try
            Dim Hosts As String()
            Dim ObjectCache As New MyCache
            Dim ListPath As String = AppDomain.CurrentDomain.BaseDirectory + "BadHostList.xml"
            If Not ObjectCache.MyCacheContains("hosts") Then
                '// Host list not in cache yet: load it from the XML file and cache it.
                Hosts = GetHostsStringArray(ListPath)
                Dim lstFiles As New List(Of String)()
                lstFiles.Add(ListPath)
                ObjectCache.addToMyCache("hosts", Hosts, MyCachePriority.[Default], lstFiles)
            Else
                '// Host list already in cache.
                Hosts = DirectCast(ObjectCache.GetMyCachedItem("hosts"), String())
            End If
            '// Reverse DNS lookup: resolve the IP address to its registered host name.
            Dim MyHostName As String = System.Net.Dns.GetHostEntry(Ip).HostName
            For Each Host As String In Hosts
                '// Host names are case-insensitive, so compare without regard to case.
                If MyHostName.IndexOf(Host, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    Return True
                End If
            Next
        Catch ex As Exception
            '// Also reached when the IP address has no reverse DNS record; such IPs pass through.
            Errorhandler.ErrorHandler(0, ex.ToString)
        End Try
        Return False
    End Function

 

    Public Shared Function GetHostsStringArray(path As String) As String()
        Dim hosts As String() = {}
        Try
            Dim doc = XDocument.Load(path)
            '// Select the text of every <host> entry in the file.
            Dim MyHosts = From entry In doc.Descendants("host") Select entry.Value
            hosts = MyHosts.ToArray()
        Catch ex As Exception
            '// On failure (missing or malformed file) log it and return an empty list.
            Errorhandler.ErrorHandler(0, ex.ToString)
        End Try
        Return hosts
    End Function
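
A practical note: IsBadHost caches the host list, but it still performs a DNS lookup for every single request, and DNS lookups are slow. One option is to remember the verdict per IP address for a while. The sketch below shows the idea under stated assumptions: the module, the one hour lifetime and the function names are all made up for the illustration, and the real check is passed in as a delegate (for example AddressOf ValidationService.IsBadHost) so the sketch stays self-contained.

```vbnet
Imports System
Imports System.Collections.Concurrent

Module VerdictCacheSketch

    ' One remembered answer per IP address, with an expiry time.
    Private Structure Verdict
        Public IsBad As Boolean
        Public ExpiresUtc As DateTime
    End Structure

    Private ReadOnly Verdicts As New ConcurrentDictionary(Of String, Verdict)()

    ' Assumed lifetime of a remembered verdict; tune to taste.
    Private ReadOnly Lifetime As TimeSpan = TimeSpan.FromHours(1)

    ' check does the real work, e.g. AddressOf ValidationService.IsBadHost.
    ' It is only called when no unexpired verdict is stored for the IP.
    Public Function IsBadHostCached(ip As String, check As Func(Of String, Boolean)) As Boolean
        Dim cached As Verdict = Nothing
        If Verdicts.TryGetValue(ip, cached) AndAlso cached.ExpiresUtc > DateTime.UtcNow Then
            Return cached.IsBad
        End If
        Dim fresh As Verdict
        fresh.IsBad = check(ip)
        fresh.ExpiresUtc = DateTime.UtcNow.Add(Lifetime)
        Verdicts(ip) = fresh
        Return fresh.IsBad
    End Function

    Sub Main()
        ' Stand-in for the real lookup so the sketch runs without DNS access.
        Dim lookups As Integer = 0
        Dim fakeCheck As Func(Of String, Boolean) =
            Function(address As String) As Boolean
                lookups += 1
                Return address.StartsWith("50.")
            End Function

        Console.WriteLine(IsBadHostCached("50.18.24.18", fakeCheck)) ' prints True
        Console.WriteLine(IsBadHostCached("50.18.24.18", fakeCheck)) ' prints True (from the cache)
        Console.WriteLine(lookups)                                   ' prints 1
    End Sub

End Module
```

The dictionary in this sketch never shrinks, so on a busy site you would want to purge expired entries periodically or use a cache with built-in expiry.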