Transcript for 2008 VEHU 320


VistA Imaging: Who's Minding the Storage


Bonnie: Welcome everybody, and I welcome the LiveMeeting attendees as well. My name is Bonnie Matlack, and our storage class is 320, Who's Minding the Storage? I'm from the Kansas City VA Medical Center, and just for a little background information, I come from a totally clinical background, so between Joan, the other presenter, and me, she's the technical one and I'm the clinical one. We've been using VistA Imaging since 1999 at Kansas City. We're an integrated database in VISN 15. We've only ever used VistA Imaging and VistARad, so we don't have a PACS at our site. Joan, however, brings in the PACS perspective from her site as well.


Joan, do you want to introduce yourself?


Joan: I'm Joan Weil, and I'm the VistA Imaging coordinator. As Bonnie said, we are
using a commercial PACS at our system. I've been with the IT department since 1992
supporting laboratory and CPRS and radiology and VistA Imaging when we started that
in 2000, but then at least for the last few years I was able to give up some of the others
and now I just support radiology and VistA Imaging.


Bonnie: We also have at our table up here a few people we need to give special thanks to, who really put a lot of time and effort into this presentation: Richard Price, who's the software developer for the background processor; Lucille Barrios; Mr. Dennis Follensbee, who's here; and Ron Paulin.


This is going to be a jam-packed course. The background processor and verifier have a lot of subject matter to them, and we're going to try to do them justice. We're going to cover background processor functionality, and we're going to refer to it with the acronym BP quite often. The current version everyone should be using is Patch 81. The exciting thing about this presentation is that it's going to give you a small window into the new version of software that we hope will be released this year, referred to as Patch 39. So you're going to see a lot of new functionality from that new patch in this
presentation, so this will be your first window. We're also going to tell you a little bit about the verifier and the benefits of running it, what you do about failed images, application activity log files, the purge function, backups, we'll talk a little bit about jukeboxes and the archive appliance, which is a newly approved storage device, and then last but certainly not least, contingency planning.


Okay, the background queue processor. When you launch the software, here is the basic initial window that comes up. The background processor should be something that is running continuously; however, a network outage will keep the background processor from functioning, so that would be an instance when it wouldn't run.


The queue processor's main functionality is to distribute newly acquired images that have been received from modalities and to retrieve archived images from your jukebox or whatever your long-term storage is. It also triggers the purge function. And notice the bold font and the P39 in parentheses - you'll see that throughout the presentation, and it signifies new items, new functions, that come with Patch 39. Patch 39 is going to let you control the verifier as well from within the background processor, so that's new.


You're going to manage disk space with the background processor as well, so it writes and distributes images to your RAID appropriately. We're going to talk first about the queues for the background processor, and this is the list of queues in the order of priority for the background processor: jukebox to hard disk (JBTOHD), PREFET, abstract, the import queue, jukebox, delete, GCC, and new to Patch 39 is the EVAL.
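
As a rough illustration of that priority order, here is a small sketch in Python (the actual background processor is not written in Python; the selection routine below is a simplified assumption of how a priority-ordered scheduler behaves, using the queue names from the talk):

    # Queue types in the order of priority given above (highest priority first).
    # EVAL is new with Patch 39.
    QUEUE_PRIORITY = ["JBTOHD", "PREFET", "ABSTRACT", "IMPORT",
                      "JUKEBOX", "DELETE", "GCC", "EVAL"]

    def next_queue_to_service(pending_counts):
        """Return the highest-priority queue type that has work waiting.

        pending_counts -- dict mapping queue type to number of active entries.
        """
        for queue_type in QUEUE_PRIORITY:
            if pending_counts.get(queue_type, 0) > 0:
                return queue_type
        return None  # nothing to do

    # Example: a pending JBTOHD request is serviced before queued jukebox copies.
    print(next_queue_to_service({"JUKEBOX": 12, "JBTOHD": 3}))   # -> JBTOHD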


Jukebox to hard drive - you guys are probably all real familiar with that. It populates the primary RAID with any of the images that have been queued up or requested from your long-term storage or secondary storage device. Jukebox to hard drive queues are initiated by VistARad, the clinical display, and DICOM export - so if you use DICOM export to send to a CD burner, when you ask for an old study from the jukebox,
those are a jukebox to hard drive queue. And the abstract queue process can also generate a jukebox to hard disk queue.


PREFET. PREFET is lower in priority than the jukebox to hard disk, and it's requested by the clinical display workstation. This is different than the VistARad PREFET: VistARad PREFET actually initiates a jukebox to hard disk, while the clinical display uses PREFET, and users actually have to have a security key set up for that, the MAG PREFET key. If you don't hold that key you can't do this from your clinical display. What it does is allow the clinicians at the clinical display, if they've got PREFET turned on, to have comparisons - old images - loaded up in the background for them, so they don't have to wait to queue each one individually.


The abstract queue creates the abstract, a derivative thumbnail file. These requests are generated from the import queue and from clinical capture if the workstation has been configured to do so.


The import queue handles images from external applications; primarily we see it used for means test documents going to the HEC through the import queue.


And then the jukebox queue, which copies images that have been acquired from your modalities, or say your consents, iMedConsents, onto the jukebox - that's the jukebox queue. The currently released software does not requeue any failed jukebox queues, and that is going to be resolved in Patch 39. It tries three times to copy to the jukebox; once that happens there is no longer a log of those jukebox writes, so if your jukebox is down you'd want to have that queue turned off or unchecked, and we'll talk about that further a little later. It will try three times and then it no longer tries, and there's no trace of that again. So that's another reason why you'd want to run your verifier again over an IEN range that would include those images that could possibly have not been written to the jukebox.


The delete queue. The delete queue you probably all are very familiar with, and the delete queue is initiated by whoever in your hospital or your medical center has the security key MAG DELETE. That will let you delete individual images within the clinical display. If you need the ability to delete large image sets like CT or MRI, you'd want the MAG SYSTEM key. So those are the two keys you need to be able to delete, and you delete from the clinical display workstation. One of the misconceptions is that a lot of people think that when an image is deleted, it's deleted for good. That isn't the case. The image entry is moved from the 2005 file to the 2005.1 file and the image still resides. It's deleted from the RAID but it still resides on the jukebox. It is still there.


The GCC queue is the generic carbon copy. This is the means test going to the HEC, and Patch 39 is also going to include the patient photo ID, which is being done for Indian Health.


EVAL queue, the EVAL queue is not really a queue in the background processor. The
EVAL queue is really going to be a method for you to delete evaluation queues that have
been triggered by a routing processor. If you have a routing processor set up to evaluate
images to be routed you will be able to delete any old or outstanding evaluation queues
through the background processor in Patch 39. Currently you can't do that any other way
except go back to the routing gateway and delete through the DICOM menus.


Joan: Hi, I'm Joan Weil. I'll be discussing the background processor functionality and
show you the new enhancements coming with Patch 39. Besides handling the queues, the
background processor also provides a method for managing our background processor
workstations, the purge parameters, the network locations, and the site parameters. We
can also manage the queues, and many of these will have a new look with Patch 39.
Some new features coming with 39 are the auto-verifier and the mail message
management.


The version of background processor we currently use as Bonnie said is Patch 81. And
on this you can see that to go to the workstation manager you first stop the queue
processor, or you can run a separate instance or iteration of the background processor, and you select Edit, then Background Processor Workstation Management, and then BP Workstation Queues, as you see. Then you select the background processor workstation that you want to work with, and when you have one selected it will have little check marks by the different queues that are running on it. And if you try to check a box and it doesn't allow you to, that means that particular queue might currently be assigned to one of the other processors, or one of the other workstations. So to see all of them you could go one by one through them, and as you click on each one the queues assigned to it will display.


Patch 39 shows the workstations and the tasks in a tree structure, and this version
introduces the exclusive assignment of auto-purge and auto-verify to a designated
background processor workstation. The tasks can be dragged and dropped to a
workstation or to unassigned tasks. If you right-click on the workstation where you'll be
running the purge or the verify, then you can set the location of where you want the log
files to be written if you don't want them to go to the default location for log files. So
you now have control if you want to move your log files into a certain network location.
The tasks auto-purge and auto-verify must be assigned to a workstation that has an active
queue since the background processor must be running on that workstation in order to
find a job to start.


Okay, purge parameters. Be aware that when the package is first released it sends out values for the different purge criteria, and it comes out with some values like this - 365 days, which is only one year. These default values are probably too short for your individual hospitals; you most probably would like to keep your images on RAID for, let's say, five years. So make sure that you fine-tune these values to the number of years and days that you want to retain when a purge is run. And even though it's not shown here, you should check the auto-purge box to allow the purge to run automatically when your RAID is becoming full.


Now with Patch 39 this is the new look. On the left side are the purge settings, the same screen you just saw, and on the right-hand side is a new section for the verifier, because being able to set auto-verify is a new function. On the left here we'll be going further in depth about the auto-purge and purging in general. One other thing to note is that the percent server reserve used to be in two locations on our screens; it's been removed from this one and it now exists just on the site parameter screen.


This is our current network location manager screen, and the network location manager option is used to configure the RAID shares, the jukebox shares, the MUSE and the GCC, among others. To get to it you would stop the background processor's queue processing and select network location manager, and then it will display the different network locations and you can select one of them. Once you select one, a window opens up that gives you a graphical display of the amount of disk space used and other parameters, and we're going to call this the properties window, because in the new Patch 39 you will get to the properties window a different way. But this same window will open.


This is a totally different look for Patch 39, and it still allows you a way to create and configure the different storage devices. The RAID shares show in a tree structure, and one of the new features of Patch 39 is something called RAID groups. The RAID groups are given names - in this slide they're called RG-SLC1 and SLC2 - and then the different RAID shares are listed under the RAID groups. We'll go into more detail about RAID groups a little bit later on. In order to get to the properties you right-click on one of the RAID shares and the properties window shows up, and you'll get to that same screen. Now when Patch 39 first comes out, all of your RAID shares will be grouped under one RAID group, so it will act just the same as it's acting right now.


Now as you can see, here's a method of moving them: any RAID shares that are not assigned to a RAID group can be dragged and dropped onto a RAID group. And while you're working on deciding what groups
they belong to, shares can also be dragged and dropped within or amongst the different RAID groups.


The network location manager is also used for other types of network locations, and since the different locations have different properties, there are different columns.
You can see there are different tabs along the top and when you click on a tab you will
get their display, and most of them display something that looks like an Excel
spreadsheet, and here we're showing the different views for the jukebox, the router, the
EKG, and URL tabs.


The queue manager provides a method of purging and re-queuing the failed queues. It
also gives us a way to purge the active queues and a method of resetting the active queue
partition. So for example you'll want to retry the items that are in the failed jukebox
queue as these are clinical images that should be stored. But the JB to HD queues or
entries that have failed can normally be purged since they may have been requested quite
a long time ago and the doctor's no longer looking for them. And they can always be
requested again in the future.


To work with a queue you would stop your queue processing and start with Edit, then go down to Queue Manager, and then By Queue Status. When you do that you can pick the queue type that you want to manage or look at.


Then this window displays and you select the queue status, these are like failure statuses,
and you can select one or you can select the word "all" and then click the right arrow box
so it goes over into the right hand side and then click OK.


And at that point the detail shows up for the errors, and the failed items can be individually selected or deselected using your standard Windows keyboard operations - clicking, shift-clicking, or control-clicking - and then the desired operation can be invoked by the buttons at the top, where you have save, purge, and retry.


Bonnie: One of the things I've found is that I've actually learned to use that window right there quite a bit. I'm not a big FileMan user, and if you get invalid pointer operations on your background processor - for a long time I always went into FileMan, found the IEN, and purged them that way - you can instead go into the GUI, into your queue manager, and purge them through the GUI as opposed to always doing it in FileMan.


Joan: Thank you. You can get to this screen by going to Edit, then Queue Manager, and then selecting Purge Failed Queue by Type. The count of items that
have failed are displayed for each of the queue types. Generally you'll see zeroes in most
of these, but in our example here we have quite a few. This screen can be used two different ways. The original purpose is to have an easy way to purge all of the failed entries for one queue - but take care not to purge the jukebox or the GCC queues and some others without looking at the detail of them first; normally those should be retried. But the second use of this window, which I find very helpful, is to actually come here first and see which queue types have a number next to them. That way, when you go back to the screen I previously showed you, you'll know which queue types to select so that you can work and retry or process the different queues. And what Dennis is saying is that if you find a very large number next to one - like the JB to HD here, which has 1783 failed items in it - then you don't want to go the other way. If you were to go back the other way and go step by step through the windows, when you go to display it's going to try to gather up all of those to put in that little window, and that could take quite a while. When you already know that for the JBTOHDs you just want to purge them, you can come here and just click that X and say OK and they're gone.


Okay, in our current version we have all those different windows; in Patch 39 we've got one window. In this one it's displayed in another tree structure with the queue types, and each
one has a failed and an active folder under it, giving you a count, so you can look at a
quick glance and see if you have some failed items. Also in the active queues you'll see
the number that are waiting to be processed, just as if you are looking at the background
processor queue processing window. Then if you were to click on the little plus next to
one of them - in this case we clicked on the plus next to the jukebox active folder - it opens up another window with all of the detail, and you can see all of the images waiting to be processed. Now if you're in the detail view and you right-click on an item, you have a context-sensitive selection here. Because we're in the active queue, our choices are purge queue - we're talking about the individual item selected, and you can select one or more - or set the queue partition. If this were a failed queue, if you were looking at the failed items, then you would be able to do a retry or a purge. If you do it back up here at the folder level, for the actives you get the choice of purging the queue, which I would recommend against doing for the actives unless you have a real reason to do that, and under the faileds of course you can do a requeue.


Okay, our imaging site parameters - this is the current look, Patch 81. In VistA it's file 2006.1, the imaging site parameters file. This screen allows you to set and change the site parameters without going into VistA, and the content on this screen will be changing a bit with Patch 39. You're probably mostly familiar with this screen.


On the Patch 39 version a few fields have been moved off to another screen. The mail group functionality has been expanded, and it has its own tab. The error messaging interval is defined with the new mail messages tab. The service account section now includes the VistA access and verify codes, which currently you manage on the gateways. You also can get to them right here - the VistA access and verify codes - and this is for your default user. The reason it's here now, in addition to managing it on the gateways, is that some of the new functionality with Patch 39 is an auto relogin for the queue processor, the verifier, and the purge functions. So if they lose connectivity to the broker for some reason, the background processor is going to try to restart them. That's a big help for us, because a lot of times those stop and you don't know that they stopped. The Windows username is the same one, only it is now called Windows username and Windows password; in the file it's called the net username. And the PACS interface grouping here is now called the DICOM interface field.


Mail messages and mail message management. This is a new feature with Patch 39. The
VistA Imaging coordinator can control the frequency and the recipient list of the
automated messages by message subject for each VistA implementation. You can have
different users receiving various messages, like from the critical error messages to maybe
a standard monthly report. Patch 39 will roll out a list of the message names, and you can add individual VistA users, mail groups, and even all users holding a particular key on VistA to a message. Here it is showing a message name and the different subscribers, so to speak, to that message. These can be dragged and dropped from this side over underneath the message name. And if you right-click on the message name, that's when you can set the frequency with which you'd like the message to be generated. In the case of an error message, if the error condition continues to exist for a long period of time, you don't want the user to be inundated with 100 messages; right now it's hard coded to be six hours - every six hours you would get a message. But you can change that interval, and you can do it by message name.


And if you don't like going into VistA to create your mail groups, you can create your mail groups here. This is another tab, and here are the different mail groups - they all start with MAG, so you're not going to be creating all of your VistA mail groups here, but the MAG mail groups you can create. On the right are the users from VistA that can be dragged over if you want to add individual users to a mail group. You also can drag other mail groups into a mail group, and if you want to add an outside address - say an address for a text pager or your cell phone to get a text message - you right-click on the mail group and a box will pop up and allow you to add that other e-mail address.


Well in Patch 39 as I said if your background processor stops due to a network or broker
failure, it's going to try to relogin, just like the gateway processes do. But if the error
persists it could remain down for a long time. So how many of you have wanted a way to
be notified that your background processor had stopped? A lot of you, right? I would
think all of us because it's always annoying to find out it's been down for a long time and
you had no idea. Well Patch 39 is going to include the background processor monitor
and this utility is going to be an option scheduled using TaskMan in VistA. It's going to monitor the processing activities of the background processor workstations and their assigned queues. If an amount of time has elapsed - a time span set by you - with no processing activity but there was a queue to process, then a MailMan message is going to be generated to the appropriate mail group, the mail group that you set up in the previous screen. This task should be scheduled to run every 10 to 15 minutes; you can schedule it more often or less often if you'd like, but it is a really good thing to have - it will let you know and send off a message.
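
As a hedged sketch of the kind of check the BP Monitor performs - written here in Python rather than as the actual VistA option, with the data structures, message wording, and mail hook being illustrative assumptions:

    from datetime import datetime, timedelta

    def check_bp_workstation(workstation, queues, send_mail, stall_minutes=10):
        """If an assigned queue has work waiting but has shown no processing
        activity within the configured time span, send an alert message."""
        now = datetime.now()
        for q in queues:   # each q: {"name", "pending", "last_activity"}
            idle = now - q["last_activity"]
            if q["pending"] > 0 and idle > timedelta(minutes=stall_minutes):
                send_mail("BP workstation %s has failed to process the %s queue "
                          "for %d minutes." % (workstation, q["name"], stall_minutes))

    # Example: run this every 10 to 15 minutes (TaskMan schedules the real option)
    # with whatever mail-sending hook your site uses.
    check_bp_workstation(
        "JB1",
        [{"name": "JUKEBOX", "pending": 42,
          "last_activity": datetime.now() - timedelta(minutes=25)}],
        send_mail=print,
    )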


And we recommend for this mail message adding your pager or text-receiving cell phone so that you can be notified if there has been a stop in the processing and you can restart it. In this example down on the pager, the message is going to read something like this: BP workstation - and it will tell you which workstation it is, in this case JB1 - has failed to process the jukebox queue (or whatever queue it is, it will tell you the name) for 10 minutes, and of course that's going to be according to your parameter for the amount of time. This BP workstation was supporting the VI implementation serving [your site name] VAMC. Now for Atlanta we're only going to see it say Atlanta, but for sites that are multisite it will tell you which of your sites is having this problem so you know which background processor to look at.


Bonnie: Now we're going to talk a little bit about the verifier. The verifier probably
should be, if it's not already, one of your best friends. It's an activity that everyone needs
to get in the habit of running. Is everyone running their verifier on a routine basis? No?
Why? Is everyone else then? We'll leave Boston out of the picture. I hope that you are.
The verifier is really the best method you're going to have to check your image integrity,
make corrections to your pointer files, find out if something hasn't been written to the
jukebox, any of your failures. You're going to be able to catch them quickest if you've
been running the verifier. If something hasn't been copied, in most cases - at least if you do it on a regular basis - you can go back and have the images re-sent from
your modality. So it really is the best way to check the image integrity on a regular basis.
It's really something that needs to be done.


In Patch 39 I'm really excited about the enhancements of the verifier. One of the biggest,
greatest things is that you're going to be able to schedule a verifier. Now with the current
version you manually go in and start the verifier when it's convenient to you. You can
now in the new version go ahead and say I want it to start Saturday night at 10 p.m. and
run during the off hours. You're going to be able to schedule it, and we haven't had that
control but that is going to be a big, big enhancement. You're going to be able to control
the size of your log files depending on how you use your log files or what kind of
application you use to copy them into. You'll be able to control the size of your log files.
No longer will you have to worry about those empty log files. If you've run the verifier
on a frequent basis there's a lot of times there's a log file entry and you go to the task of
opening it up, copying it into an Excel spreadsheet, and there was nothing there. So no
longer will there be empty log files. And then just like with the background processor, if
you've got the scheduler set up, and so the verifier runs at 10 p.m. and there's been a
disruption in the network and so it stopped, it's going to try to auto re-logon for those
occasions as well. So if you scheduled it, it's supposed to be running, the network's gone
down, it will try and log back on for you.


The verifier really is a way for you to perform a number of maintenance operations on
your system and your image integrity. You routinely would want to run the verifier every
one to two weeks. I know from ITC I learned there's people that do it every day. So
depending on your site and the size of your database, it is something you'll have to make
that determination, but it's also not a very complicated process, so to get in the habit of
doing it frequently there's nothing wrong with that at all. It's going to verify all the image
entries in the image file, the 2005 file. Like I mentioned before, if there's something
missing in the 2005 file after you run the verifier, you can go back to the modality and
have it re-sent. You're going to want to periodically run it several times a year to include
all of your IENs and depending on your database that could be a big period, or a lot of
IENs. But it is something you want to do over the entire range of IENs because as things
are pulled back over from the jukebox the pointer files could be updated and you'd want
the verifier to go over those.


If you run FUT 49 and you are migrating files back to your RAID, you're going to want to run your verifier over those so that the pointers are corrected and you make sure they're all set fine. And if you've had any network share or jukebox outages, you'll want to run the verifier over the IEN range for that period of time, because just like I told you before, if a jukebox queue fails three times it's not going to be requeued. If you run the verifier back over that IEN range, you have a chance of that queue restarting.


Basic verifier setup happens right on the logon screen; it's in the very center of the window. You can enter a range of IENs or all of your IENs. You just put in a start and stop value, which is the IEN number itself, and you can run the IEN range either forward or backward - the direction does not matter. So you could start with a low number and run to your current number, or go from your current number back to the last time you ran the verifier; either direction will work. One of the other little options
is to check the image text. That does not have to be done every time you run the verifier,
but it does need to be done a couple of times a year. If you check that little radio button
there, that's going to add time to your verifier run because it goes and checks data in the
text file and compares it to your image file. So it's going to add some time, but it's
probably something that does need to be done a couple of times a year. It doesn't have to
be done every time. Another thing about the verifier: currently it doesn't have a stop button like the background processor does, and sometimes stopping it is kind of a complicated event. If you needed to stop the verifier - because, say, there's an image file you'd like to run the verifier on but you've already got a job running - Joan has experimented here, and much to many people's surprise, you can go into your stop value and change it to where the verifier is at currently, and it will stop. And it will stop more appropriately than just killing the function. You can go in and change that number, and when it hits that number it will stop.


Once you go ahead and start the verifier you'll get this shares offline screen. These should be shares that you're aware are offline and that are appropriately offline. If you had a share that was offline and you didn't expect it to be offline, you wouldn't want to run your verifier until you found the cause, because the verifier will not run over a share that's marked offline but really shouldn't be, and you could end up with some pointers cleared. So you're going to want to make sure that the shares that are offline are shares you expect to have offline, for the correct reason.


This is the basic verifier processor screen once you launch it and the box to the right is
your shares, all your cache that you have on the RAID. The lower panel is the activity
panel. If you're running over a range of IENs that really doesn't have anything that needs
to be verified and it's good to go, you won't see any of this activity. When it hits a file
that has a process that needs to be corrected or changed, or it's noted as being inaccurate,
then you will see. So there's times when you won't see activities in this window, and that
would be normal if you see your progress numbers changing. The scan controls, there's
your progress numbers so you know what number of IEN that it has covered. It also
gives you a summary, so that will tell you the total number, when you started it, how
many errors it's found so far, so it's a good little summary. And then it also identifies the
jukebox shares that it is running over and checking.


When you do run the verifier, there are maintenance procedures that need to happen to check your files, and the number one thing is to check the NoArchive log. NoArchive means that there is no image file that was written over to your jukebox. Some of them may have been marked as deleted: when you look in your NoArchive log, if you see nothing in the 2005.1 column for an IEN, that means it has not been deleted; if you see an entry next to an IEN that's in the 2005.1 file, those are entries that have been deleted.


Also at the end of the verifier run you'll get a summary, and the summary will show you
changes to patient ID. Now the verifier does not make those changes, but the verifier will
identify any of those patient IDs that need to have corrections made to them. That's in
that image integrity check that you get at the end of the run.


Log file searches. After all of your verifier runs, as well as for the background processor itself, you can go in and check the log files. There's a file created for every one of those errors, so you can see what IENs are included. You'd want to do that if you find that there's a particular entry that's missing or hasn't been copied to the jukebox. If you have chronic queue failures, you can start monitoring your log files and compiling them so that maybe you can track down a problem you might be having. You might also be able to track issues with the network that way, by doing a log file search.


As for what to do with your log files: for any missing files that you find through the verifier, you can go back to your backup tapes and try to find them and recover them that way. You can also take your log files and compile them into a single spreadsheet, and then you can correlate them with other things that you've been seeing. There's a log file for your background processor, so you can go through and see all the activities of your background processor, and a log file for your verifier as well. There's a lot of information in a lot of different log files.


You can go right into Explorer and search. If you're looking for a particular IEN or a particular error, like an invalid pointer, you can go ahead and search all of the log files for that IEN. You can enter specific information and just search all of the log files and track down your error, or your IEN that maybe hasn't been copied, that way.
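
If you prefer a small script to an Explorer search, here is a minimal Python sketch that walks a log directory and prints every line mentioning a given IEN or error string (the directory path in the example is only a placeholder - point it at wherever your site keeps its BP and verifier log files):

    import os

    def search_logs(log_dir, needle):
        """Print every log line that contains the given IEN or error text."""
        for root, _dirs, files in os.walk(log_dir):
            for name in files:
                if not name.lower().endswith((".log", ".html", ".txt")):
                    continue
                path = os.path.join(root, name)
                with open(path, errors="ignore") as fh:
                    for line_no, line in enumerate(fh, 1):
                        if needle in line:
                            print("%s:%d: %s" % (path, line_no, line.strip()))

    # Example: find every mention of IEN 1234567 (path below is illustrative only).
    search_logs(r"D:\BackProc\Logs", "1234567")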


When you open a log file you also have the option to open it up in an Excel spreadsheet, I
don't know if any of you have been doing that but you can do it that way as well. You
don't have to open it up in the HTML, you can go ahead and open it up into an Excel
spreadsheet, which makes it easier to manage a lot of data.


So there's an example of a log file that's been opened up in the Excel.


You can browse and search, and you can send things out in mail as HTML or Excel as well. You can append, and you can use the external data function to import the logs into an Excel spreadsheet so that you've compiled a whole month's worth, or a whole six months' worth, into one spreadsheet.


And this is just another way of showing you how to import the data embedded into the Excel spreadsheet.


Joan: Okay, okay, everyone here, you now represent our total RAID in RAID shares.
This side of the room sit down. This is RAID that's already been used, here's what we
have left. This is our available RAID. The back half of the room sit down. Now we've
got a quarter of the RAID still available to write to. It's about 25 percent. Let's see,
everyone except for the front table sit down. So now we've got about two people, there
were approximately 30 people in the room. Let's say our percent server reserve is 2
percent and they represent maybe a little more than 2 percent and you have to say okay,
out of all of this RAID is that enough RAID to hold all of our upcoming images for the
next few weeks if some catastrophe were to happen? Thanks, you can sit down. Just sort
of a graphical view to get you thinking about what your percentage is - we're talking about percent server reserve. This is the amount of space, as a percentage, that should be kept free for storing your new files, and you want to have a significant amount of primary storage free in the event that your jukebox share fails and you can't move anything over there, or certain RAID shares fail. When the package is exported it might say 2 percent; that might be enough for some sites depending on the size of your total RAID, but eight percent might be better. You need to be prepared in case of some type of catastrophic event, or even just a minor problem, where you're going to need some free RAID to write to.


Once the percent server reserve watermark has been breached, the auto-write location update no longer functions. Right now your auto-write goes through all of your RAID shares one by one over a certain period of time, which is about 20 minutes. After writing for 20 minutes to one RAID share, it looks to see which RAID share has the most space and switches
over to that one and starts writing to it. If it breaches the percent server reserve, it turns off the auto-write and then it only writes to a single RAID share, and at the same time it sends out a mail message to the groups to alert people that some intervention is necessary - you'll need to run a purge to free up some space. By shutting off the auto-write it gives you room to keep receiving new images (they'll all be going to the one share when it's turned off) while you run the purge. And when that share does fill up you will definitely receive notice, because images will no longer be able to be stored until you manually move the write location to another share - it will require intervention. It's kind of a wake-up-and-pay-attention type of feature.
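
A simplified Python sketch of that auto-write behavior - the roughly 20-minute re-check and the percent server reserve come from the talk, while the data structures and the mail hook are assumptions for illustration only:

    def pick_write_share(shares, percent_server_reserve, send_mail):
        """Pick the share to write new images to.

        shares -- list of dicts with 'name', 'free_bytes', and 'total_bytes'.
        Returns (share_name, auto_write_enabled); re-run roughly every 20 minutes.
        """
        total_free = sum(s["free_bytes"] for s in shares)
        total_size = sum(s["total_bytes"] for s in shares)
        best = max(shares, key=lambda s: s["free_bytes"])   # most free space wins

        if 100.0 * total_free / total_size < percent_server_reserve:
            # Reserve breached: auto-write shuts off, everything goes to one
            # share, and people are alerted that a purge is needed.
            send_mail("Percent server reserve breached - run a purge to free space.")
            return best["name"], False

        return best["name"], True   # normal auto-write operation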


When should you run the purge? Well, a site should have several weeks of free space
available at any given time. It's recommended that VistA Imaging, the primary storage
free space be maintained between 15 and 25 percent of your total primary disk space.
The exact number depends on the capacity of your storage relative to the rate of your
image acquisition. On the other hand, a site might want to keep five years of clinical
images and several more years of the abstracts online to reduce movement of images
back and forth from the jukebox. Achieving this balance might require some monitoring of your storage capacity and fine-tuning of these parameters.


Generally the purge should be started when the primary storage reaches 92 percent capacity. The result of the purge would be 15 to 25 percent disk space free, depending on your site's dynamics. If a site rarely wants to run a purge, it could purge until it has a large amount of free space; but if a site wants to run a purge on a weekly basis, it could purge to free up just three weeks of storage space. If your VistA Imaging storage maintains a large number of big files, then your keep days for those files could be reduced - to something like 30 or 45 days - in order to maintain optimal disk utilization. It's advised that your percent server reserve value be set to 8 percent, so that if the cache reserves reach that threshold there's enough response time to run the purge and keep your background processor active. Of course, when that location fills, no files are going to be written until some manual action takes place.


When should you not run the purge? It's imperative not to run the purge function when
there's any connectivity impairment between the background processor and the jukebox
share. The purge when it's running will first check that a file exists on the jukebox before
it deletes it. If it doesn't find it there it's going to queue it and not purge it. Thus if it
actually is there but there's a network problem or the jukebox is down, then many entries
are going to be added into that jukebox queue unnecessarily and you're still not going to
have anything purged.
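
That safety check can be sketched roughly like this in Python (the file-existence test and queue call are stand-ins for what the real purge does against the jukebox share; this is an illustration of the logic, not the actual code):

    import os

    def purge_one_file(raid_path, jukebox_path, queue_jukebox_copy):
        """Delete a file from the RAID only if a copy already exists on the
        jukebox; otherwise queue a jukebox copy and leave the RAID copy alone.

        Failure mode from the talk: if the jukebox share is unreachable, this
        check fails for every file, nothing is purged, and the jukebox queue
        fills with unnecessary entries - so don't purge during a jukebox outage.
        """
        if os.path.exists(jukebox_path):
            os.remove(raid_path)          # safe to free the RAID space
            return "purged"
        queue_jukebox_copy(raid_path)     # make sure it gets archived first
        return "queued"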


For auto-purge, this setting should be turned on to allow both the scheduled purge as well
as the auto-purge to operate. If it's enabled, then a purge is going to be launched when
your percent server reserve value has been breached. And a site can schedule this purge
on the pages we showed you to be launched on a regular basis and you can give it a time
and frequency, and that screen has changed a little bit to give you a few more options,
and we'll show you that again in a moment.


Well, RAID groups. This is something new with Patch 39, and this is hopefully going to
help a lot of us with backing up a large amount of RAID. It does it by allowing us to
break up the RAID into small chunks called RAID groups. It's important to do a full
backup every month, or at least once a quarter, but with the large size of the RAID it's
almost impossible for some of us, especially if we have very slow tape drives. So by assigning only one or two image shares to a RAID group, that group can be backed up in less time. Only the active RAID group is the one that gets written to, and the incremental will run faster because it's running only over the few RAID shares that are in that RAID group. There can be as few as one image share per RAID group, or as many as all the image shares in one RAID group. But it's recommended to have at least two RAID shares in the
group. And that's because if one of the image shares goes down, then you know that
during that period of time that that RAID group was active then probably at least half the
images because of the load balancing, probably half the images are in the other image
share for that same RAID group. And in this picture we're showing these are the
different RAID groups and RAID groups progressing, in this case three RAID groups,
and it progresses from group A to B to C and then back to A again.


One RAID group is going to be designated as the current RAID group and all the images
are going to be written to it over a period of time. The internal shares within the RAID
group will be written in the same manner as your current configuration. The share with
the most free space will be designated as the current write share. The check for the free
space will be done at designated intervals and then the writing of the images to the shares
– the share within the group that has the most space will be moved. So within the one
group it will still go between all of the image shares just as it's doing right now, only
there will be fewer of them to go around to. And the actions that are generated or
required when the high watermarks are reached are still going to apply at a user-
designated interval or when the used space of the shares in a RAID group exceed that
high watermark then the software is going to change the current write group to the next
RAID group in sequence, and this is going to allow you as a coordinator to coordinate
your backup strategy so you can run a full backup let's say on the weekend for the RAID
group that you finished using, the one that you're not writing to right now.


Okay, here's a little graphical picture of the RAID groups, and I'm going to read this first and then try to explain it to you. This is Ron Paulin's idea of the way this is going to work. When the percent server reserve within the RAID group that you're currently writing to, multiplied by a purge factor (which I'm going to explain), has been breached, then an auto-purge is triggered for the next RAID group. The purge factor is a number that can be adjusted by the site to offset the length of time that a purge is going to take, and the default for the purge factor is 2. So let's say that our current RAID group is C and RAID group A over here is nearly full. If your percent server reserve is 8 and your purge factor is 2, then when RAID Group C has only 16 percent of free space - that's the 8 times 2 - an auto-purge is going to be started on RAID Group A, the next one. Now you're still writing to RAID Group C, but you've given yourself some time, because of that purge factor, to start running your purge on the next RAID group to free up space so that you will be able to write to it next. The purge needs to complete before C actually reaches its high watermark, that percent server reserve of 8 percent. So you've given yourself another 8 percent of space to give yourself time for
this purge to finish, and then Group A will become the next current write group. As for scheduling, this can work both ways: it can work as an automatic purge, where it's kind of a failover, or you can schedule it - schedule the RAID group to advance from C to A on a Friday night, then schedule your full backups to run Saturday on RAID Group C, and if needed schedule the purge to run after the full backup is complete, if you want to be ahead of the game and run your purge then. And sites that only have one tape drive should run that full backup on the weekends so that the incrementals aren't impacted during the week. But if you have more than one tape drive you could schedule the full backups during the week.

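That purge-factor arithmetic can be written out as a tiny calculation. This Python sketch uses the numbers from the talk (percent server reserve 8, purge factor 2); the function and its return values are illustrative, not the actual Patch 39 code:

    def raid_group_actions(free_pct_of_current_group,
                           percent_server_reserve=8, purge_factor=2):
        """Return the actions implied by the two thresholds described above."""
        actions = []
        if free_pct_of_current_group <= percent_server_reserve * purge_factor:
            # e.g. 16% free: start purging the NEXT group so it finishes in time
            actions.append("start auto-purge on the next RAID group")
        if free_pct_of_current_group <= percent_server_reserve:
            # e.g. 8% free: the high watermark itself - advance the write group
            actions.append("advance the current write group")
        return actions

    print(raid_group_actions(16))   # purge the next group while still writing here
    print(raid_group_actions(8))    # by now the purge should be done; switch groups
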

Here's the screen again where you do your purge settings. There are a couple of new fields right in here that I'm going to talk about on the next slide, and right here is where the purge factor is set for auto-purge. Also, in the past, when you were doing an auto-purge or a scheduled purge, it would be based on when the previous one ran - the last purge date - and then a purge frequency in days would be added to that, and that would tell you when your next purge was going to run. But now it's more of an information field, or it will be: your last purge date will show, and you can schedule your purge to be on a particular date without having to fudge the date that the last one ran. The last time run really will be the last time run, and you can say when you would like the next one to start and at what time.


So with Patch 39 the auto-purge, when you have more than one RAID group, will actually apply to the next RAID group. The express purge is something new, and it is something to help the purge run a little faster. If express purge is enabled with a little checkmark, the purge is going to advance to the next RAID share when 10,000 image entries have been evaluated as purge candidates and no files have been purged. When you think about purging, the first ones it runs into are probably candidates, and after a while it's going to run into a whole lot of images that are more recent, and those are not purge candidates. So after it's gone by 10,000 of them it says okay, let's go to the next image share, this particular image share is probably done. And that's called express purge. You don't have to have it turned on if you want to go through the entire share
thoroughly, but if you turn it on it will run quicker. Then there's the purge factor I talked about. And then purge by creation date, last date modified, or last access date. Currently, with our current patch, all we have is purging by the last access date, but we're going to have two other options: by creation date or by date modified. The creation date is actually a little backwards from what it normally would seem. You might want to purge by the date that the file was really created; but if you've done a type of restore where you've moved your files back over onto the RAID, then the date gets changed - and what gets changed is the creation date, because this is Windows, and Windows has decided to change the creation date to the date the files got moved back over. That's not the real date that the images were captured from the patient; that one you'll find under the modified date. That's Windows. So because of that we've got both options for how you'd like to purge. If you really want to purge by the date that the images were first gathered, then you can select purge by date modified.
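
Here is a rough Python sketch of one purge pass over a single share that combines the express-purge early exit with the three date criteria. The 10,000-entry threshold comes from the talk; the directory walk, the keep-days cutoff, and the counting rule are simplified assumptions. Note that in Python on Windows, os.stat's st_ctime is the creation time - the very timestamp that gets reset when files are copied back onto the RAID:

    import os, time

    def purge_share(share_path, keep_days, criterion="modified",
                    express=True, express_limit=10000):
        """Sketch of purging one RAID share.

        criterion -- 'access', 'modified', or 'creation' date.
        """
        cutoff = time.time() - keep_days * 86400
        too_recent_in_a_row = 0
        for root, _dirs, files in os.walk(share_path):
            for name in files:
                st = os.stat(os.path.join(root, name))
                stamp = {"access": st.st_atime,
                         "modified": st.st_mtime,    # closest to the capture date
                         "creation": st.st_ctime,    # reset by Windows on a restore
                         }[criterion]
                if stamp < cutoff:
                    too_recent_in_a_row = 0
                    # ...the real purge would confirm a jukebox copy, then delete
                else:
                    too_recent_in_a_row += 1
                    if express and too_recent_in_a_row >= express_limit:
                        return   # express purge: assume the rest of the share is recent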


Now I'm going to try to quickly go through the backup types. These are the backups that
we're running, tapes and the platters. And I'll go through each of these kind of quickly.


Also, other than backing up image files, we need to back up these other types of files: the DICOM gateway dictionary files; your logs from the queue processor, the verifier, and the purge; our Backup Exec catalog files; platter reports and logs; and compaction logs. All of these should be backed up at various times.


The full backup, as we said, should be run monthly or quarterly, and the type of tape system you have will probably dictate how often you run the full. Also, when you start working with RAID groups you might be able to do it more often. This slide shows, in order to back up two terabytes, the different amounts of time: SDLT taking about 13 days and seven tapes, and LTO4 only 1 1/2 days and two tapes. And if you still have the old DLT - this isn't on the slide, but I calculated it very approximately - I figured it's going to take over 21 days and over 31 tapes.


Bonnie: That's really the primary reason the RAID group, the thinking about RAID
groups is this problem of backups.


Joan: Right, but even with RAID groups - and we're only talking two terabytes here, not the full RAID - to take 31 tapes and 21 days, if you can only put in 10 tapes at a time, that's going to require user intervention to switch out those tapes and put in the new tapes. So it's not going to be done in 21 days, because it's going to require another day or so for somebody to get in, pull out the tapes, and put in the new ones. So it could take a month.


Bonnie: If you can possibly upgrade your system to the LTOs, you really should. With the old system it was taking me over two weeks and about 20 tapes to back up 13 terabytes. I went to the LTO4s and now it takes three days and about three tapes. It made a big difference.


Joan: Yes, very important.
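
As a back-of-the-envelope way to compare tape technologies, here is a small Python helper. The native capacities and effective throughput figures you plug in are assumptions that vary by drive generation, compression, and real-world conditions, so treat the results as rough estimates rather than a restatement of the figures quoted above:

    def tape_estimate(data_gb, native_capacity_gb, effective_mb_per_s):
        """Very rough tape count and wall-clock days for a full backup."""
        tapes = -(-data_gb // native_capacity_gb)              # ceiling division
        days = (data_gb * 1024.0) / effective_mb_per_s / 86400
        return int(tapes), round(days, 1)

    # Hypothetical figures only - check your own drive specs and measured rates.
    print(tape_estimate(2048, 800, 16))   # an LTO-4 class drive at a modest real rate
    print(tape_estimate(2048, 300, 2))    # an older SDLT class drive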


Your incrementals are done on a daily basis and back up all the image shares that have changed since the last full or the last incremental that ran, and they should also include the other extra files I mentioned that have changed. This is on a daily basis: your dictionary files, the BP log files, the platter reports, compaction logs, and the Backup Exec catalog files.


Your system backup runs once a month, and you should back up all of those extra files - the dictionaries, the logs, the platter reports, compaction logs, and Backup Exec catalog files. You
do not need to be doing full backups of your C or D drives, of your gateways and that
type of thing. Those can be restored through rebuilding your system or something, but
you don't need to be backing up all those drives. This is what really needs to be backed
up on a monthly basis. And also, I mean we're backing it up daily, but we're also backing
it up monthly because that way it's much easier to go back and get your changes to these
files. As long as you have them monthly, at least for dictionary files they don't change
that often, and the log files it's just very good to have them monthly and go back to the
last monthly tape and get them.


Platter copies. Is everybody doing platter copies? Anybody not doing platter copies?
Very good. Because the documentation that first came out a few years ago said it was
optional, but since then I guess you've heard on all of our calls that you must be doing
platter copies, and this is for all the optical media in the jukebox. It's done on a
continuous basis, as your platters fill then they should be copied, and of course the copy
should be stored off site. If you need help or instructions for it, it's found in Appendix C
in your VistA Imaging System Installation Guide.


Compactions are needed when your jukebox is filling up platter-wise. If you still have 5 and/or 9 gig platters in it, they can be compacted onto larger 30 gig platters and free up a lot of shelf space so that you can put more platters in your jukebox. This can free up quite a lot of space. But if you're planning to move to an archive appliance anytime soon and you're not going to need that shelf space, then don't do your compactions - just leave them as they are, because once you get the archive appliance they're going to have to be moved over there anyway, and you're just wasting resources to move from the 5s to the 30s and then move again to 60s.


Offline platters. When your jukebox is actually physically full and space is needed, and you can't do compactions anymore and you have to take some of those platters out, then you need to go and set the images on them as being offline in the jukebox. The instructions for this can be found in Chapter 9, the Jukebox Archive section, of the Imaging System Technical Manual. This utility is going to mark those images as being archived, and the verifier is going to skip them while processing.


Here's something we did in Atlanta after we heard about New Orleans' problems. When all of the disks were popped out of the RAID and brought off site, if they weren't labeled there was no way to know how to
reassemble them to turn them back into the RAID. And I'd heard this tip from some
other people who've talked to some other people, and I'm passing it onto all of you, is to
label with a pen, whatever you have, on the little label portion of the disks in your RAID
and mark them. Here we did it with what's called the port target in LUN, and if you don't
know that, just number them and keep it written somewhere how you've numbered them.
If it's 1, 2, 3, 4, 5 at least if you know how you numbered them and if somebody had to
pop them out quickly, you could tell them how to put them back in some alternate RAID
storage and they could put the platters back in the right order and you'd have your RAID.
So just whatever it is, be consistent and keep a record of how you've labeled them. And
another tip is on a daily basis you should be monitoring your RAID for error conditions,
and if you have an HSG80 like we do you can keep a terminal window open all the time
on both IMM1 and IMM2, which shows you the top and bottom of your HSG80, and
when you keep these open then you can see if there's an error condition you can check it
every day and you can see if there was an error, the errors might only occur every couple
hours. You're not going to see those if you only just open your terminal window and then
close it and go away. You won't see if an error occurred somewhere in the meantime.
And if you see an error, that gives you time, because it's RAID you have time to actually
act on it and save the jukebox before you get two errors.
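

If you capture those always-open IMM1 and IMM2 terminal sessions to log files, a small scheduled
script can do that daily error check for you; this is only a sketch, and the log path and the
keywords it looks for are assumptions, not anything the HSG80 produces in a fixed format.

# A minimal sketch that scans a captured controller terminal log for error
# lines, so a once-a-day check doesn't miss an error that scrolled by hours
# ago. The log path and keywords are assumptions; adjust them to whatever
# your terminal program actually writes out.
from pathlib import Path

LOG_FILE = Path(r"C:\RAIDLogs\imm1_capture.log")  # hypothetical capture file
KEYWORDS = ("error", "failed", "degraded")        # assumed wording

def find_problem_lines(log_file, keywords):
    """Return every line in the capture that mentions one of the keywords."""
    hits = []
    for line in log_file.read_text(errors="ignore").splitlines():
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line.strip())
    return hits

if __name__ == "__main__":
    hits = find_problem_lines(LOG_FILE, KEYWORDS)
    if hits:
        print(str(len(hits)) + " possible error line(s) found:")
        for line in hits:
            print("  " + line)
    else:
        print("No error lines found in the capture.")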


And this is just a picture of some storage boxes we have, but it's a reminder that you need
to store your platter copies off site, as well as your tapes. And if you've done
compactions, then keep the smaller platters off site as well, because they are your backups
to the larger platters.


Bonnie: Are you guys on data overload now? Actually, the background processor is
something that next year, with the new patch, you might want to suggest as a hands-on
class. There are enough changes in Patch 39, if it's released, that a hands-on class would
be worthwhile. It's very interesting and there are a lot of changes, and I think everyone
would benefit, so you might want to think about that for next year. The next topic is
going to be our jukeboxes. Currently there are two vendors approved for VistA Imaging
long-term storage, and those are DISC and Plasmon. Probably a lot of you still have the
DISC systems. Does anyone have a Plasmon? And how many of you have the archive
appliance with it? Okay, you're test sites with me. So you're all very familiar with the
DISC software; it's gone through many changes of ownership, from Legato to DEX to
EMC, and we've struggled with that.


And here's a picture of the Plasmon G638. It operates in the same environment as the
DISC jukebox: it can run in a clustered environment or on a single jukebox server. Same
setup, and the same software.


The 638 does give you larger capacity media than the DISC, so there is that advantage. It
has a lot of similarities: it has dual picker arms and a bar code reader, and all of the media
used in the Plasmon device have a bar code, so you don't label the media, which is a
difference there. There are media slots so you can manually insert media into the front of
it, and you can have media ejected the same way. It's very similar to the DISC. With the
Plasmon 638, if you add an A12 RAID from Plasmon, that's what makes it an archive
appliance. So the archive appliance is the 638 with a RAID added to it.


And that RAID ends up being just another computer that sits on top of the Plasmon
device. So it's a NAS server, and the best part is that it's actually a network location. It is
not a function of another server; it is a network location in and of itself. It also has its
own software with a GUI interface, so I've found it to be much easier to check the
storage, check the media, and manage the system than with the Legato and DEX
software. And the UDO media, they project a life expectancy for it, and I'm not sure
anyone really knows that at this point, but it's projected to be longer than that of the
current media. It's scalable. In VISN 15, for example, we have Plasmon 638s at other
sites: Columbia, Leavenworth, and Wichita. They installed an archive appliance with a
638 at my site. The other sites are migrating their images off of their Plasmons, they're
going to ship the Plasmon hardware to Kansas City, and we're going to daisy chain it off
of my device. So we're going to use the same archive appliance but daisy chain the
hardware, so it is expandable in that way.






This is just an example of the GUI interface of the archive appliance. When I log in, and
I can log in from my desktop or anywhere else because it's a network location, I have
access to the system and can see how it's operating. So in VISN 15 I have the central
storage for all of the sites, and if I'm gone and my backup can't get to it, one of my other
sites can log into the network location and check the status of the archive appliance.
They just put in the IP address or the network name and they can get access to it and
check it out. So this is the basic first login screen, and it gives you a system status of the
archive appliance. It easily tells me the last time it performed any of its functions, tells
me the environment and the temperature of the device, and tells me whether all my drives
are online or not. If at any time I want to check any of those out, I just take my mouse
and click on it, and it will tell me which drive is not operating or which piece of media
may have failed. It's very easy to check, and it's a very different type of software than
what we were used to using.
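

You don't need a script for any of that, since you just browse to the appliance, but if you'd like
an automated morning check that the web interface is even answering before anyone relies on it, a
tiny probe like the one below is enough; the host name and port are placeholders, not the
appliance's real values.

# A minimal sketch that checks whether the archive appliance's web interface
# is reachable. The host name and port are placeholders; substitute your
# appliance's actual network name or IP address.
import socket

APPLIANCE_HOST = "archive-appliance"  # hypothetical network name
WEB_PORT = 80                         # adjust if the GUI is served over HTTPS (443)

def is_reachable(host, port, timeout=5):
    """Attempt a plain TCP connection to the web interface and report success."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if is_reachable(APPLIANCE_HOST, WEB_PORT):
        print(APPLIANCE_HOST + " is answering; the GUI should load.")
    else:
        print(APPLIANCE_HOST + " did not answer; check the appliance or the network.")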


To go a little further, just like with the background processor, I have the capability of
sending out e-mail notifications with this system. I, HP, and Plasmon all receive the
e-mail messages, so a lot of times Plasmon will get the e-mail message and be calling me
to check on what's going on. So it's good that other people get involved. I also have a
backup at another site who gets my messages and would know to log in to the network
location and check things out. You can determine what level of messaging goes to each
one of those people as well.


In VISN 15 we're able to segment storage volumes for each one of the sites, so this is just
an example of the volumes that we've established. It doesn't write files to any random
piece of media; there is media designated for each site within the archive appliance, so
Columbia has media that only has Columbia files written to it, and Kansas City has media
that only has Kansas City files written to it. And I can hover my mouse over any of the
storage volumes and check the size and what kind of free space I have available, and
that's all I have to do.
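

If you'd rather have those numbers land in a morning report instead of hovering over each volume,
the same check can be scripted against the shares; this is just a sketch, and the UNC paths are
made-up placeholders for however your site volumes are actually shared out.

# A minimal sketch that reports total, used, and free space for each site's
# archive volume. The UNC paths are hypothetical; point them at the shares
# your archive appliance actually exposes.
import shutil

SITE_VOLUMES = {
    "Kansas City": r"\\archive-appliance\KANSASCITY",  # hypothetical shares
    "Columbia": r"\\archive-appliance\COLUMBIA",
    "Wichita": r"\\archive-appliance\WICHITA",
}

GB = 1024 ** 3

def report(volumes):
    """Print space figures for each volume that can be reached."""
    for site, path in volumes.items():
        try:
            usage = shutil.disk_usage(path)
        except OSError as exc:
            print(site + ": unreachable (" + str(exc) + ")")
            continue
        print("{0}: total {1:.1f} GB, used {2:.1f} GB, free {3:.1f} GB".format(
            site, usage.total / GB, usage.used / GB, usage.free / GB))

if __name__ == "__main__":
    report(SITE_VOLUMES)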






Again, here I'm checking specifically the Kansas City volume. It tells me how much free
space and how much used space there is, and then I can go one step further. In the
configuration of the archive appliance you can set it up to automatically create a
secondary copy. I dreaded the fact that I was going to be the central storage for VISN 15
because I just thought, oh no, all I'm going to do is manage the jukebox and move media.
With this device we've set it up so it creates a secondary copy for me, so I really handle
media less than ever. I can go in and check each volume and see how many pieces of
secondary media I need to take offline; it tells me how many I'm going to need to take
out, for instance that Columbia's got three secondary copies ready to go offline. We leave
all the primaries in, but we take the secondaries out and then I ship them off site.


When I'm at my desktop I can go into Explorer, put in the network location, and get to
the actual files that have been copied onto the media and check them out. I can also
correct the Social Security number and those kinds of things in the text files from here,
right from my desktop, without having to go to the other server; I can do that from
anywhere. And I can also see the last time something was written to a volume at any of
the locations, which is another handy thing to be able to do.
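

That last-write check is also easy to script if you want it for every volume at once; the sketch
below just walks one share and reports the newest file it finds, and the path is again a made-up
placeholder.

# A minimal sketch that finds the most recently written file under an archive
# volume, as a quick way to see when that volume was last written to. The
# share path is hypothetical.
import os
from datetime import datetime

VOLUME = r"\\archive-appliance\KANSASCITY"  # hypothetical share for one site

def newest_file(root):
    """Walk the tree and return the path and modified time of the newest file."""
    newest_path, newest_mtime = None, 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(full)
            except OSError:
                continue  # skip files that disappear or can't be read
            if mtime > newest_mtime:
                newest_path, newest_mtime = full, mtime
    return newest_path, newest_mtime

if __name__ == "__main__":
    path, mtime = newest_file(VOLUME)
    if path is None:
        print("No files found under " + VOLUME)
    else:
        print("Last write on " + VOLUME + ": " + path)
        print("  at " + datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:%M:%S"))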


Currently the sites that have the archive appliance installed are VISN 15, VISN 1, Little
Rock, Tampa, and Durham, with Cleveland and Orlando planned for the future, I believe.
So I hope you guys were on the list that have it. Indian Health is also planning to install
the archive appliance for their systems at Fort Defiance, Nashville, Cherokee, and
Phoenix. So it's getting out there. I've found that I love it, so if you get the chance to
look at it and investigate whether it meets your needs, you should do so.


Last but not least, we wanted to spend a few minutes talking about contingency plans.
We can't tell you specifically what yours needs to be, but it is something the sites need to
take responsibility for: have one. You have to be ready for any kind of disaster, and I
think the past has really shown us that's the case, whether it's a natural disaster like
Katrina or a tornado (in the Midwest we battle tornadoes constantly; it could happen to us
at any time). Nashville was a simple water pipe break. So any kind of disaster could
happen to anyone at any time. You need to think about what you are doing with your
backup tapes. Are you doing backups, and where are you keeping them? Are they
somewhere safe? Are they somewhere dry, where they're not going to be affected by
whatever the disaster may be? Are you doing your secondary jukebox copies, and where
are they being stored? If you had a disaster, what would your plan be to restore your
image files and get your system back online? You need to think about what you would
be able to do. Do you know you're going to pick up the phone, call HP, and they'll be
able to get a cabinet sent to X location and start rebuilding the system? A plan is better
than no plan, so make sure the things you do are written down in a location that others
have access to, so they know what you're doing. If a disaster happened while you were
gone on vacation, would someone know where your tapes are going? Would someone
know what they need to do? So we're just suggesting that you have a plan. It's very
important.


Okay, that's basically everything I wanted to talk about.



