what did you learn today?

Status
You're currently viewing only waubers's posts. Click here to go back to viewing the entire thread.
Not open for further replies.

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
I learned that of my 16 production servers, only 4 have service contracts, and of the remaining 12, only 4 were produced after today's date in 2004 (and are thus eligible).

Somehow, we purchased a big-ass VM host last summer without any kind of service contract. A server that would easily cost $10k to replace, running almost half of our production VMs, has no service contract, and the guy who held this job before me never configured it for backups.

Thank god it's at least configured in a RAID10.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
I learned why it's always a good idea to wait as long as possible to pay your bills.

The UPS that I blew up last night isn't even paid for yet, and there's no way in hell we'll be writing that check until the unit is fixed. I also found out that we hadn't bought a service contract for it yet either, so that'll probably be changing pretty quickly.

A UPS shouldn't have a problem with having power restored to it, right? I'm not just hallucinating or something?
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
Hmm, the vendor wants to bill us for the fault on the UPS. While I admit it's likely that I did something wrong during the bypass procedure, they failed to provide me with any procedures or documentation beyond a beat-up copy of the manual, which didn't cover the external wrap-around bypass that caused the fault to begin with.

I think I'll say we cover parts and they eat labor and call it a day, because I really feel like (and yeah, this makes me look stupid) this wouldn't have happened with some actual training or a written procedure.

Sound reasonable?
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
Originally posted by llib:

Not much tech info in your posts, but it sounds like you left the UPS in inverter mode (or "on line", whatever...) when you enabled the external wrap-around. Big no-no. UPS always goes to internal bypass before enabling external wrap.

That's the thing: I did have it in internal bypass, via the manual switch on the UPS, which is why I was more than a little surprised when it happened. That's what was confusing about the damn thing: it has an on-off key, a bypass panel button, and a manual bypass switch.

Whatever, we'll see what shakes out. The unit wasn't made by the same company as the external bypass, but it was sold and installed by the same vendor.

Hmm, I should probably tell my boss what happened too.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
Originally posted by llib:
Hmmm... Two possibilities come to mind. First is that your bypass source isn't from the same source as your main input source. Our bypass source was from another transformer/switchboard, which caused us no end of misery. Maybe there's a phase difference between sources. Second possibility is a swapped phase or phase rotation error between the UPS input and the external wrap source. Wiring should be color coded: phase A is black or brown (depending on 208v or 480v), B is red or orange, C is blue or yellow.

If the vendor installed the UPS and the bypass, he should man-up and be on the hook for some of your misery... -- :) --

And yes, you should tell your boss (who is thinking his IT hardware is protected)...


Well, there was a phase rotation problem that the installer fixed. I'm thinking I probably did fuck up the order, but I'm almost 99.99% certain I had the unit in bypass when it happened.

The good news is this UPS was installed in our new facility, which we don't move into till Friday. There was zero load on the UPS when this happened, and all the breakers in the DC were off, so there isn't anything unprotected, because there's nothing there to protect yet.

I CC'd the sales guy for our account when I sent my reply to the repair tech on Friday. Since we've yet to even pay for the unit and we're still deciding on a service contract, my hope is that the sales guy tells the repair tech (who, honestly, was a prick in his email) to STFU and just waives it or offers to compromise on it. Yes, I probably caused this to happen, and I'm an idiot for not thinking more clearly about what I was doing. But I didn't have a procedure to follow, which I've always gotten from this vendor in the past (my previous company bought two 150 kVA Mitsus from these guys), and I didn't even sign off on delivery; I was just emailed a test result from when they turned up the unit. While I'm to blame for the damage, they've got to answer for not following their normal delivery process and not providing the normal delivery documentation (which they always have in the past).
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
Well, they've replaced two fuses, a control panel, and a component that converts current off the DC bus to AC, and still nothing. Now they're thinking the UPS might be toast. The good news is that they don't think a botched bypass execution would cause this problem, especially with no load present, so it looks like the UPS might have been bad or had hidden issues to begin with (it was a refurb, with water damage to the control board).

Management is up to speed and I got told "don't sweat it, this is why we have insurance" by our COO. One good thing is that the work done by the vendor was not really authorized. I asked Friday evening for an estimate as to what the costs would be, but they had already started replacing parts and had two techs on site and have racked up like $2k in labor, and ultimately come up with nothing. So, lots to talk about tomorrow.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
Well, looks like my botched bypass is going to cost us $7k to make go away.

Good news though, our insurance will probably cover it, so really it might only cost us $3k ($1k deductible, and $2k extra due to the increased cost of having to buy a new unit vs. refurb).
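For anyone checking the math, here's a minimal sketch of that out-of-pocket estimate. It assumes the only costs we end up eating are the deductible and the new-vs-refurb price difference, and the names are purely illustrative:

```python
# Figures taken from the post above; names are hypothetical, not from any real tool.
DEDUCTIBLE = 1_000            # insurance deductible, in dollars
NEW_VS_REFURB_EXTRA = 2_000   # extra cost of buying a new unit instead of a refurb


def out_of_pocket(deductible: int, extra: int) -> int:
    """Return what we would still pay if insurance covers everything else."""
    return deductible + extra


print(out_of_pocket(DEDUCTIBLE, NEW_VS_REFURB_EXTRA))  # 3000
```

That only holds if the insurer actually pays the claim in full, which isn't settled yet.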

The vendor came back with a very good offer to get things right, admitted some fault in that they didn't provide proper training and turn-up documentation and that they overstepped their bounds by performing work before it was authorized.

They offered to give us a credit for about 40% of the value of the blown unit and sell us a new one for cost + expenses ($10k rather than $12k). Plus they're giving us a 120V loaner UPS to cover us until we get the new one installed.

I think my job is safe and no one here is busting my balls too hard about this, so I'm grateful for that. Now I'm just hyper-focused on our move on Friday.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
I learned that sometimes the Dell tech doesn't know what he's talking about, and that there's no way to increase the size of a RAID 10 virtual disk on a Dell PERC 6/i.

I also learned what a huge fucking pain in the ass it is to upgrade the RAID firmware and BIOS when you're running ESXi.

Seriously, it's a fucking nightmare.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
"Training" budget of $5k, can go toward a variety of things. Planning to burn $1000 on Cisco certs right now, but the other $4k is good for courses or conference events.

So, what conferences should I, as an IT generalist responsible for everything from an MS Exchange environment and AD to a Cisco-based unmanaged WAN and a VDI/VMware/SAN environment, look for?

Cisco Live and VMworld seem obvious, but I'm curious what others are out there.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
Apparently you can't rely on the UPS power of your colo provider.

They tell me a breaker failed between our PDUs and their UPS units. Thus, at 3:23am, all three racks of our gear went black.

Finally got everything up and running about an hour ago. My CIO is livid, and I can't say I'm much better. Between their inability to provide a working SIP trunk, their inability to deliver the 50 Mbps internet access our contract states, and the constant BGP issues on their MPLS, I'm pretty sure we'll be colo shopping starting late this year.

I just don't much feel like relocating 3 full racks of gear in WI in early spring.
 

waubers

Ars Tribunus Angusticlavius
7,691
Subscriptor++
I learned that Windstream is borderline incompetent.

They turned off my datacenter access (we pay for 3 racks of colo with them) without telling me. Made my scheduled downtime/maint. on Friday a whole lot of fun (was upgrading our Cisco UCS to 192GB RAM for 3 blades and upgrading firmware to 2.03).

Heads are going to roll on this one.
 