Captioning webinarpart1


1 00:00:06,900 --> 00:00:11,666 >> Soji: Hi. Welcome to Captioning in the University Environment Seminar. First, I would like to 2 00:00:11,666 --> 00:00:14,632 introduce our Accessing Higher Ground conference 3 00:00:14,633 --> 00:00:19,599 >> Soji: coordinator. Mr. Howard Kramer for University of Colorado at Boulder. He will be our 4 00:00:19,600 --> 00:00:25,933 moderator of today's webinar. Howard, will you press on the become presenter button please and start our presentation? 5 00:00:25,933 --> 00:00:29,199 >> Soji: Thank you very much. 6 00:00:29,200 --> 00:00:37,133 >> HOWARD KRAMER: Thank you, Soji. And welcome to everyone who is connected in. My name is Howard Kramer. 7 00:00:37,133 --> 00:00:43,399 >> HOWARD KRAMER: Again I am at the University of Colorado and this is our -- welcome to webinar "Captioning in the 8 00:00:43,400 --> 00:00:52,566 University Environment." This is the first webinar we have conducted. So we are excited about it and also a little nervous. So we will see how it 9 00:00:52,566 --> 00:00:59,999 goes. Some of you may be familiar with the university and with me from the Accessing Higher Ground Conference. 10 00:01:00,000 --> 00:01:07,800 And that occurs every November. And we hope this to be the first in a series of teleconferences or 11 00:01:07,800 --> 00:01:14,766 >> HOWARD KRAMER: webinars that we put on in between conferences. So, you know, when we put this on this seems like 12 00:01:14,766 --> 00:01:17,266 it must be a very hot topic right now. 13 00:01:17,266 --> 00:01:24,099 >> HOWARD KRAMER: I was hoping for ten participants I thought would be good. We have gotten 35 different institutions 14 00:01:24,100 --> 00:01:30,466 or at least 35 different sites are logged in to this seminar today. 
15 00:01:30,466 --> 00:01:37,432 >> HOWARD KRAMER: So my goal today is actually to talk as little as possible and just provide you with the information 16 00:01:37,433 --> 00:01:41,566 you need and to let the presenters share their expertise with you. 17 00:01:41,566 --> 00:01:51,466 >> HOWARD KRAMER: I just wanted to go over a few logistics and a few housekeeping issues before we go on. The first is 18 00:01:51,466 --> 00:02:07,066 the -- just some basic help if problems occur. And let me bring up my next power point. So if you have any problem with the V-cube 19 00:02:07,100 --> 00:02:14,833 connection which is the -- which is the service provider providing the webinar interface, 20 00:02:14,833 --> 00:02:30,299 you can call their number that's up on the screen, 310-329-5959, extension 8008. If you have captioning issues you can use chat to indicate 21 00:02:30,300 --> 00:02:41,133 this. You can -- or you can e-mail Jill Perry, my colleague, at jill.perry@colorado.edu. And that email is up on the screen. If you've got questions 22 00:02:41,133 --> 00:02:45,366 you can put them in to the chat or you can e-mail 23 00:02:45,366 --> 00:02:56,999 me those questions. Obviously, the chat questions will probably be seen by the presenter immediately. A little note that the chat is not 24 00:02:57,000 --> 00:03:06,066 public. So only the presenters see what's typed in. And that's just to allow some privacy 25 00:03:06,066 --> 00:03:14,066 between questions and the presenters. There is also -- I wanted to direct people to the registrant resource page. This powerpoint and a lot of other 26 00:03:14,066 --> 00:03:16,166 resources are located on that page. 27 00:03:16,166 --> 00:03:27,232 And I e-mailed this URL last night and I just e-mailed it again to all the people who are listed as the contacts for this conference. So you can 28 00:03:27,233 --> 00:03:39,299 access that page and access these numbers and e-mails after the presentation starts. 
29 00:03:39,333 --> 00:03:51,599 So I wanted to go over the agenda for today. As I mentioned in my e-mail last night it has changed slightly. Just a minor adjustment. So starting at 30 00:03:51,600 --> 00:03:57,966 10:30 John Foliot from Stanford University will talk about the 31 00:03:57,966 --> 00:04:05,799 projects they are working on at Stanford, basically talking about their model for producing a sort of turnkey environment for captioned 32 00:04:05,833 --> 00:04:16,566 media. That will be followed by 15 minutes -- let me backtrack -- John's talk will be about an hour and will be followed by 15 minutes of question 33 00:04:16,566 --> 00:04:23,999 and answer. We will then take a break at 11:45 for 20 minutes and then resume at 12:05. 34 00:04:24,000 --> 00:04:34,533 At that point Dean Brusnighan will do a short update of what's been happening at Purdue University regarding captioning. Hopefully all 35 00:04:34,533 --> 00:04:40,499 of you have seen the prerecorded content. You'll get the most out of his talk obviously if you've seen that. 36 00:04:40,500 --> 00:04:50,533 You can watch it after this webinar. It will be posted along with everything that goes on today and will be available 37 00:04:50,533 --> 00:04:55,033 for at least a year for people who want to look at it afterwards. 38 00:04:55,033 --> 00:05:03,033 At 12:35, this is the new element of the agenda, I had actually been talking to Angella Anderson at the University of Illinois 39 00:05:03,033 --> 00:05:12,766 about our conference in November and about what she was planning on presenting and realized it fit really well in to today's topic. So I asked her 40 00:05:12,766 --> 00:05:21,732 to just spend 15 minutes talking about what's going on at the University of Illinois. We will use the last 30 minutes -- 41 00:05:21,733 --> 00:05:31,866 it will be an opportunity to ask questions to John, Dean or Angella. 
And we will just leave five minutes at the end for concluding remarks. 42 00:05:31,866 --> 00:05:52,066 And hopefully we will be able to fit everything in. Okay. I am going to just go back once. So for those of you who are using captioning I just 43 00:05:52,066 --> 00:05:55,332 wanted to give a tip on how to set that up on the 44 00:05:55,333 --> 00:06:08,133 screen. Since the captioning is being provided in a separate URL, so a separate browser window so to speak, I 45 00:06:08,133 --> 00:06:17,333 found the best way to set this up is to minimize it as indicated up on the powerpoint I have on the screen and put it to the right or left 46 00:06:17,333 --> 00:06:22,233 of the V-cube presentation. If you try to put it on the bottom it kind of 47 00:06:22,233 --> 00:06:33,433 squeezes the V-cube interface too much and it is hard to read the powerpoints. If you -- again if you have any problems with any of this you can enter 48 00:06:33,433 --> 00:06:43,466 -- type something in to chat or you can send me an e-mail. And again, if you are having specific captioning problems you can also send an e-mail 49 00:06:43,500 --> 00:06:59,933 to Jill Perry. So I think at -- I will make one other suggestion and that's I think you will find that the volumes vary. We did sound tests yesterday and 50 00:06:59,933 --> 00:07:06,633 everything seems to be working very well. But I did notice that volumes varied greatly. Some of our speakers are soft spoken and some are 51 00:07:06,633 --> 00:07:17,166 louder. So be prepared to adjust your volumes especially if you are playing these through speakers. And with that I would like to hand over 52 00:07:17,166 --> 00:07:25,832 the session to our first speaker today, John Foliot from Stanford University. John. 53 00:07:27,833 --> 00:07:33,199 >> JOHN FOLIOT: Well, thank you, Howard. And good morning, everyone. Yeah, I am John Foliot. I work at 54 00:07:33,200 --> 00:07:40,300 Stanford University. 
I run the Stanford online accessibility program and the presentation I am going to do 55 00:07:40,300 --> 00:07:46,566 >> JOHN FOLIOT: today is looking at the workflow model for captioning that I worked on with my friend and 56 00:07:46,566 --> 00:07:59,532 associate Sean Keegan who works out of our office of accessible education. So whoops, sorry. So this presentation is also available on our 57 00:07:59,533 --> 00:08:08,833 >> JOHN FOLIOT: website at Stanford. It is captioning.stanford.edu/presentations. It is online now. 58 00:08:08,833 --> 00:08:14,399 >> JOHN FOLIOT: Most of it. With the blue/black background. There are a couple of slides that I have added to the 59 00:08:14,400 --> 00:08:20,733 presentation for today that are not online but you can also contact me directly if you need those slides as well. 60 00:08:20,733 --> 00:08:33,833 So, about three years ago we started to look at why captioning wasn't happening on campus. What the barriers for getting captioning produced 61 00:08:33,833 --> 00:08:38,466 were and I talked to a number of webmasters 62 00:08:38,466 --> 00:08:48,799 and content producers on campus and we started asking why aren't you getting this done and very quickly sort of came to a number of sort of 63 00:08:48,800 --> 00:08:51,366 preconceptions and assumptions on their part. 64 00:08:51,366 --> 00:09:00,266 One of the things that I heard time and time again was that producing captions was very costly. That there was an expense that was 65 00:09:00,266 --> 00:09:12,766 found to be onerous and that people just found it really expensive to have this done. Many people also told me it was really geeky. It required a fairly 66 00:09:12,766 --> 00:09:24,299 high level of technical expertise and knowledge to create captioning. That the tools were not very intuitive, they were hard to work with. And that 67 00:09:24,300 --> 00:09:32,066 there was a fairly steep learning curve to create captioned media. 
It was also thought that -- it was perceived that it took a lot of time to 68 00:09:32,066 --> 00:09:43,066 produce captioned videos. And for a lot of the people that were entrusted with getting content up on the web quickly, they just, you know, one of 69 00:09:43,066 --> 00:09:52,799 the things I heard time and again is they didn't have the time to sit down there and manually caption a video. And for those 70 00:09:52,800 --> 00:10:02,166 of you that work in the accessibility space this is not an uncommon refrain. A lot of people didn't really see the value of providing captioning 71 00:10:02,166 --> 00:10:12,132 outside of accommodating deaf and hard-of-hearing users. That it was strictly an accessibility thing. And "well, I don't have any deaf students in 72 00:10:12,133 --> 00:10:20,466 my classes" is the thing we always hear. So those were some of the basic responses that I got when I started talking to people on campus. 73 00:10:20,466 --> 00:10:30,599 So after thinking about it and sort of analyzing some of these responses Sean and I decided that what we really needed to do was to develop a 74 00:10:30,600 --> 00:10:40,733 workflow solution that would basically level out these bumps in the road and make creating accessible new media something that was 75 00:10:40,733 --> 00:10:49,733 relatively easy to do and didn't require a lot of technical skill. As well, we realized that we had to do some work towards promoting the benefits 76 00:10:49,733 --> 00:10:57,866 of captioning and to explain why it was useful for users above and beyond deaf and hard-of-hearing users. 77 00:10:57,866 --> 00:11:12,466 So for the first part what we did is we looked at a system that would be as turnkey or as flow through as possible. So the basic premise is that the 78 00:11:12,466 --> 00:11:22,932 content author uploads the video here. 
Soji, I don't know how to change the color of the pen but they upload the video or audio file in to our system and 79 00:11:22,933 --> 00:11:38,166 the first thing that the system does is it converts the video in to a series of media files whether it is an FLV or an MP4 using H.264 encoding, MP3 for 80 00:11:38,166 --> 00:11:50,366 an audio file and WebM. But one of the things that it does is it also creates the MP3 as an audio-only file even though the source is a video. To get 81 00:11:50,366 --> 00:12:01,266 captioning all we really need is the audio and the system outsources the audio to a transcription vendor who actually does the conversion of the 82 00:12:01,266 --> 00:12:12,199 speech-to-text. Once that audio is converted in to text it is fed back in to the system where the system creates a series of 83 00:12:12,200 --> 00:12:23,633 caption files and again we are creating a number of different file formats: DFXP or TTML, which is a W3C standard. We are also creating an SRT which is 84 00:12:23,633 --> 00:12:35,033 kind of a loose encoding standard, a timestamp standard that really -- it is not a standard per se but it is a specification that is 85 00:12:35,033 --> 00:12:45,799 widely used and supported by a number of third party media players. We're also creating a binary file format called SCC or Scenarist Closed Caption. 86 00:12:45,800 --> 00:12:54,366 This is very similar to the line 21 caption files that are produced for television. And it is very common in the broadcast media field. 
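To make the SRT format mentioned above concrete, here is a minimal Python sketch of how timestamped cues could be written out as SRT text. It is only an illustration, not the Stanford system's code: the `Cue` class and the `srt_timestamp` and `to_srt` function names are hypothetical, and the sketch assumes the common SRT layout of a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the caption text, and a blank line between cues.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue (hypothetical structure for illustration)."""
    start_ms: int  # start time in milliseconds
    end_ms: int    # end time in milliseconds
    text: str      # caption text shown during the interval


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm form used in SRT files."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Join cues into SRT text: number, time range, text, blank line between cues."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(cue.start_ms)} --> {srt_timestamp(cue.end_ms)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt([Cue(6900, 11666, "Hi.")])` produces a single cue whose time line reads `00:00:06,900 --> 00:00:11,666`, the same shape as the timestamps that appear throughout this transcript.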
87 00:12:54,366 --> 00:13:03,232 We discovered early on and less so now because there have been some new tools that arrived on the market but when we started on this program this 88 00:13:03,233 --> 00:13:04,499 was also a file format that 89 00:13:04,500 --> 00:13:13,833 was required to provide closed captions to the MP4 files that would be used on your "i" Devices, the iPods, the iPhone, 90 00:13:13,833 --> 00:13:17,533 et cetera, and of course you also get back the text. 91 00:13:17,533 --> 00:13:31,966 So recently we also added a couple of different things to the system. So the ability to bypass the video. So one of the things that we discovered is 92 00:13:32,033 --> 00:13:44,133 that a lot of people were shooting in high definition video and our system is not set up to actually support or create high definition video for 93 00:13:44,133 --> 00:13:52,933 web delivery. So we created a system where you didn't actually get a video file converted by the system but rather you could just upload the video 94 00:13:52,933 --> 00:14:02,599 or we would do an audio conversion. So again remember we were creating media files. The system will ingest a number of different video files 95 00:14:02,600 --> 00:14:11,566 in to the system and what we do is we output some sort of web ready videos in these different formats but we couldn't support high def just 96 00:14:11,566 --> 00:14:23,299 because our system was unable to do that code conversion. So we created a bypass for that. We also started providing support for the letterbox or 97 00:14:23,300 --> 00:14:26,166 the 16:9 video aspect ratio. 98 00:14:26,166 --> 00:14:35,532 Originally we started off with the 4:3 which was sort of the common video, sort of 800 x 600 kind of frame. But again more people are going 99 00:14:35,533 --> 00:14:43,299 for the larger sort of letterbox format, what you see very often on your iPhones and other devices like that. 
100 00:14:43,300 --> 00:14:51,866 We also -- originally we were only outputting at one size of video. So we added a couple of 101 00:14:51,866 --> 00:14:56,999 different video resolution sizes. Sort of acknowledging the fact that websites and web pages 102 00:14:57,000 --> 00:15:04,600 >> JOHN FOLIOT: are increasingly having more real estate just because end users desktop screens are a lot 103 00:15:04,600 --> 00:15:14,700 larger and we started providing export option for WebM which is the VP8 Codec. A little bit more about that later in the presentation. 104 00:15:14,700 --> 00:15:28,033 But this is the latest Codec. It is an open source or patent free encoding Codec. It is very directly linked to sort of HTML5 efforts with video. And 105 00:15:28,033 --> 00:15:37,266 while HTML5 itself does not yet support native closed captions we were getting requests from content producers on campus that 106 00:15:37,266 --> 00:15:46,366 they wanted to start working with WebM. There are a number of sort of JavaScript solutions that can be used using WebM and caption files. So 107 00:15:46,366 --> 00:15:55,732 we are now producing content like that. So what I would like to do is originally I was hoping to actually do a live demonstration but 108 00:15:55,733 --> 00:16:04,266 unfortunately we can't really do that. So I have done a series of screen shots and I will walk you through the process of how 109 00:16:04,266 --> 00:16:17,532 content producers on campus can create captioned videos with a minimum of effort. So we have a website that users go to. And before 110 00:16:17,533 --> 00:16:23,399 anybody on campus can start to use the system, they have to first register with the system. 111 00:16:23,400 --> 00:16:35,866 So that we can track them and I mean as any user or system like this you need to have a user profile. One of the things that's really important in the user 112 00:16:35,866 --> 00:16:45,466 profile however is we are capturing billing information. 
And I will touch on that a little bit more but one of the goals that Sean and I had when we 113 00:16:45,466 --> 00:16:54,066 were setting up the system is we wanted this to be a pass through system. We didn't want to have to be actively involved in maintaining and 114 00:16:54,066 --> 00:17:03,632 managing this system. And to that end right now we probably spend no more than an hour or two a month doing maintenance on the system. So It is 115 00:17:03,633 --> 00:17:13,333 very much a pass through system. The captioning or the transcription costs are borne directly by the people that are using the system. 116 00:17:13,333 --> 00:17:22,099 The transcription companies that we are working with are invoicing our users directly. So we are completely out of sort of that billing and, you know, 117 00:17:22,100 --> 00:17:24,000 sort of financial management situation. 118 00:17:24,000 --> 00:17:34,500 So people need to register and Sean and I receive an e-mail when somebody is looking to register for the system that we manually approve. 119 00:17:34,500 --> 00:17:44,666 At this point in time we have limited the number of people that can use the system to people that are directly connected to the university with one or two 120 00:17:44,666 --> 00:17:54,399 very minor exceptions. We are not looking to become an outsourcing organization. We don't have the cycles or 121 00:17:54,400 --> 00:18:06,233 resources to support people off campus. The exception is my friend Victor Tsaran who runs the accessibility lab at Yahoo. He himself is blind and 122 00:18:06,233 --> 00:18:17,433 we have a really strong relationship with that usability lab over at Yahoo given that we are also located in the Bay area. So we open the system 123 00:18:17,433 --> 00:18:26,666 up to him. He provided some early feedback to us as well in terms of how the system works for nonsighted users. So he provided some sort of 124 00:18:26,666 --> 00:18:29,266 real world experience and feedback to us. 
125 00:18:29,333 --> 00:18:37,766 So you either register. If you have forgotten your password, you put in the user, you click on here 126 00:18:37,766 --> 00:18:40,999 and you put in your e-mail address and you receive an e-mail that gives 127 00:18:41,000 --> 00:18:47,266 you a one time log-in so that you can log in and you can change your password, et cetera, et cetera. 128 00:18:47,266 --> 00:18:57,932 So once you have logged in to the system, you are presented with a -- the video upload page where you create a new project. So the first thing 129 00:18:57,933 --> 00:19:01,733 you would do is you select your media file. 130 00:19:01,733 --> 00:19:10,199 There is a video conversion profile that you select and I will be going through all of these in more detail later. 131 00:19:10,200 --> 00:19:20,266 The one thing that's critical is that you enter a title of the project and the reason for that is you need a title so you can find stuff in your library later on. 132 00:19:20,266 --> 00:19:31,366 You have the option of providing a brief description of your project so that, you know, when you go back later on you will be able to sort of 133 00:19:31,366 --> 00:19:39,199 remember what this video was about. One of the things that's critical to the system is the fact that we are 134 00:19:39,200 --> 00:19:50,600 allowing or we have set it up so that a number of third party transcription companies are actually providing the speech-to-text transcription. 135 00:19:50,600 --> 00:19:59,166 If anybody has played with captioned videos right now you will know that getting your speech converted to text is the most time consuming and 136 00:19:59,166 --> 00:20:02,632 difficult process, part of the entire process. 137 00:20:02,633 --> 00:20:11,899 And so we have contracted with a number of different transcription companies in the Bay area and we went to them -- actually they are not all in 138 00:20:11,933 --> 00:20:17,766 the Bay area. 
They are actually distributed throughout the country. They are all North American based 139 00:20:17,800 --> 00:20:28,566 transcription companies. There are some organizations that outsource their material to countries outside of North America. 140 00:20:28,566 --> 00:20:38,332 We were concerned about accuracy, given some of the technical terms that we were dealing and just kind of the level of content, sort of educational 141 00:20:38,333 --> 00:20:40,766 content that we were looking at. 142 00:20:40,766 --> 00:20:49,999 So that was kind of one of our prerogatives. You could use virtually any caption or transcription company out there. 143 00:20:50,000 --> 00:20:56,366 We went to these companies and we asked them to give us a couple of different price points based on turnaround time. 144 00:20:56,366 --> 00:21:06,532 So right now you can see I have two companies that are providing a 24- hour or same day turnaround time to convert the speech-to-text. 145 00:21:06,533 --> 00:21:17,533 I have a number of companies that are giving it to me in the 24 or rather 48 hour time frame, two business day time frame. We were very specific 146 00:21:17,533 --> 00:21:20,433 about defining it as business days. 147 00:21:20,433 --> 00:21:30,233 The last thing we wanted was somebody uploading a video at 5 o'clock on a Friday afternoon and expecting to have it Sunday, you 148 00:21:30,233 --> 00:21:32,933 know, in their inbox because frankly 149 00:21:32,933 --> 00:21:40,899 these companies, you know, they are commercial companies and they take weekends as well. So we specified business days, four business days 150 00:21:40,900 --> 00:21:47,600 and a week. So depending on the turnaround time or in the case of this organization, here Docsoft, the 151 00:21:47,600 --> 00:21:54,500 number of speakers that are included in the audio file, they have given us a couple of different price points. 152 00:21:54,500 --> 00:22:06,466 And so we are agnostic. 
We really don't care which company our users use. What I can tell you is that right now Cogi at $1.25 a minute for a 48 153 00:22:06,466 --> 00:22:17,499 hour turnaround time seems to be getting the lion's share of the work which tells us -- I am seeing some questions here. 154 00:22:17,500 --> 00:22:27,300 So somebody is asking does Stanford sell the system. We do not. Karen, I will pull up my e-mail address at the end and I can provide you with 155 00:22:27,300 --> 00:22:32,300 some contacts. I'll tell you how the system was put together a little later on. I can find you some contacts. 156 00:22:32,300 --> 00:22:49,100 Temple University, they are not getting any sound. I have to hand that off to the V-cube people to try and figure out. So as I said the $1.25 a minute 157 00:22:49,100 --> 00:22:51,966 turnaround time from Cogi seems to be 158 00:22:51,966 --> 00:23:01,232 the one that most people are using which again sort of confirms to me that people are very price conscious about this. So finally one of the things 159 00:23:01,233 --> 00:23:11,266 that we looked at when we were setting up the system was special terms, special vocabularies. So like any large institution you are going to have 160 00:23:11,266 --> 00:23:15,366 for example, buildings, or family names that you want to make sure 161 00:23:15,366 --> 00:23:27,366 that the spelling is correct. There are also legal terms, medical terms and specific engineering terms. And so when the user is uploading 162 00:23:27,400 --> 00:23:35,933 a video to the system, if they are aware of some special terms, so, for example, in the medical school there are a lot of medical terms that a lot of 163 00:23:35,933 --> 00:23:41,999 transcriptionists would likely not be familiar with. So one of the things that happens is you can add 164 00:23:42,000 --> 00:23:57,433 a special term that's associated to this video. And what happens is that those terms are passed along to the transcription company via an e-mail. 
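The per-minute pricing and business-day turnaround described above lend themselves to a short worked example. The Python sketch below is an illustration under stated assumptions, not how the vendors actually bill: the function names are hypothetical, it assumes billing per started minute, and it treats Monday through Friday as business days without accounting for holidays.

```python
import math
from datetime import date, timedelta


def estimate_cost(duration_seconds: float, rate_per_minute: float) -> float:
    """Estimate transcription cost; assumes each started minute is billed in full."""
    minutes = math.ceil(duration_seconds / 60)
    return round(minutes * rate_per_minute, 2)


def add_business_days(start: date, business_days: int) -> date:
    """Return the date that falls the given number of business days after start.

    Assumes Monday-Friday are business days; holidays are ignored.
    """
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current
```

With these assumptions, a one-hour recording at $1.25 a minute comes to `estimate_cost(3600, 1.25)`, which is 75.0, and a file submitted on a Friday with a two-business-day turnaround is due the following Tuesday rather than on Sunday.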
165 00:23:57,433 --> 00:24:07,599 They are also added to a flat text file. And so the transcription companies can log in, they can download that flat text file and essentially it 166 00:24:07,600 --> 00:24:16,933 becomes a custom dictionary that they can incorporate in to their transcription tools. So stepping through the process, 167 00:24:16,933 --> 00:24:25,033 selecting a media file. It is really simple. You click on the browse button and you find the video file format that you are looking for. 168 00:24:25,033 --> 00:24:38,133 Right now we support MOV, Windows media files, WMA and MP4. We also support audio only files of Wave and MP3. One of the problems that we 169 00:24:38,133 --> 00:24:47,099 have right now and it is a problem -- it is a limitation with the system and we are working with some engineers to try and overcome it, but the 170 00:24:47,100 --> 00:24:55,200 maximum file size that you can upload in to the system right now is a gigabyte. So again I mentioned earlier that we have some people who 171 00:24:55,200 --> 00:25:01,700 are shooting high def or high definition video and those files either due to 172 00:25:01,700 --> 00:25:13,466 screen size or just the quality often exceed one gigabyte. So another benefit of being able to upload the audio files I have two content creators 173 00:25:13,466 --> 00:25:18,766 on campus who are doing a fair amount of high definition video right now. So as part of their post 174 00:25:18,766 --> 00:25:30,999 production they are ripping a MP3 from their video and uploading the MP3 from their system. The next step is to choose your video conversion. 175 00:25:31,000 --> 00:25:43,600 And so we have a couple of different profiles. Each user can create their own profile. I call them full blown but what that basically means is that 176 00:25:43,666 --> 00:25:56,499 they are going to be converted to all of the different Codec files that we support. So FLV, MP4, and WebM. 
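The upload rules just described (a list of accepted formats, a one gigabyte ceiling, and audio-only upload as a workaround for large high definition files) can be sketched as a simple check. This is a hedged illustration rather than the system's actual logic: the function name and return strings are hypothetical, the file extensions are assumptions based on the formats named in the talk, and one gigabyte is assumed to mean 10^9 bytes.

```python
from pathlib import Path

# Assumption: "a gigabyte" is taken as 10**9 bytes.
MAX_UPLOAD_BYTES = 1_000_000_000

# Assumption: extensions inferred from the formats named in the talk.
VIDEO_EXTENSIONS = {".mov", ".wmv", ".wma", ".mp4"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def check_upload(filename: str, size_bytes: int) -> str:
    """Classify an upload as 'video', 'audio only', or a rejection message."""
    ext = Path(filename).suffix.lower()
    if ext not in VIDEO_EXTENSIONS | AUDIO_EXTENSIONS:
        return "rejected: unsupported format"
    if size_bytes > MAX_UPLOAD_BYTES:
        # Large HD files can be handled by uploading the ripped audio instead.
        return "rejected: over size limit; consider uploading audio only"
    return "audio only" if ext in AUDIO_EXTENSIONS else "video"
```

For example, `check_upload("lecture.mov", 2_000_000_000)` is rejected for size, while `check_upload("lecture.mp3", 50_000_000)` is accepted as an audio-only job, which mirrors the workaround the two high definition producers on campus are using.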
But if you don't need all of 177 00:25:56,500 --> 00:25:59,600 those, you can select or deselect the ones that you require. 178 00:25:59,600 --> 00:26:07,366 >> JOHN FOLIOT: So each end user can create as many sort of conversion profiles as they want. I mentioned as 179 00:26:07,366 --> 00:26:16,666 well that we support a number of different screen resolutions. So here is the 320 by 240 which is the standard 4:3 resolution. 180 00:26:16,666 --> 00:26:26,899 >> JOHN FOLIOT: I have 480 by 320 which is the 16:9, 640 by 360 or 640 by 480. We also -- you can also just choose 181 00:26:26,900 --> 00:26:31,900 to not have a video created and just get the timestamp file based on audio. 182 00:26:32,000 --> 00:26:40,700 >> JOHN FOLIOT: Again you enter a title and so for the sake of today's demonstration I enter the title of 183 00:26:40,700 --> 00:26:48,633 demonstration. You can if you want provide that short text description and as most users 184 00:26:48,666 --> 00:26:58,132 >> JOHN FOLIOT: on campus I choose Cogi at $1.25 a minute here. And I don't have any specific vocabulary. So I 185 00:26:58,133 --> 00:27:06,133 didn't choose anything and I click on the submit button and off it goes. 186 00:27:06,133 --> 00:27:12,633 >> JOHN FOLIOT: One of the things that we noticed however was that some people didn't actually need 187 00:27:12,633 --> 00:27:20,099 to do the transcript. There were concerns that even at $1.25 a minute it was very expensive. 188 00:27:20,100 --> 00:27:26,733 >> JOHN FOLIOT: And so we provided a means that if you actually have your own flat text transcript, you can actually 189 00:27:26,733 --> 00:27:29,499 insert the transcript here. 190 00:27:29,500 --> 00:27:39,266 >> JOHN FOLIOT: So if I go back a couple of slides, you will see right here it says I already have a transcript. 
And 191 00:27:39,266 --> 00:27:45,866 so if you click on that, then it changes, it removes the caption providers and it gives you 192 00:27:45,866 --> 00:27:54,232 >> JOHN FOLIOT: this upload dialogue where you can upload a flat text file. The file is quite simply a flat text file. It is 193 00:27:54,233 --> 00:28:03,199 saved in a .txt format. There is no formatting. As a matter of fact, we tell people to strip any formatting. 194 00:28:03,266 --> 00:28:12,899 >> JOHN FOLIOT: So don't give us a Word document or any kind of rtf. We just want a flat txt file. UTF coding is fine. 195 00:28:12,900 --> 00:28:21,833 And no line breaks are really necessary. Line breaks will be picked up or they will be automatically inserted by the system. 196 00:28:21,833 --> 00:28:29,633 >> JOHN FOLIOT: What we do tell people is do a carriage return at the end of a period. So if you've got your own 197 00:28:29,633 --> 00:28:40,133 transcript file you can insert it into the system. One of the advantages there is that right now the cost of this system, the only cost that's passed on to 198 00:28:40,133 --> 00:28:51,366 the end user is the actual outsourcing to the transcription company. I have one faculty member whose students are actually creating videos as part of sort of their class project or 199 00:28:51,366 --> 00:28:58,432 end of year project and the students themselves are providing the transcript. They do it themselves. 200 00:28:58,433 --> 00:29:06,599 And so they can insert that in to the system at which point the timestamping and the alignment and everything else is provided at no charge. 201 00:29:06,600 --> 00:29:14,966 >> JOHN FOLIOT: So we listened very carefully to this complaint about the cost of creating caption files. And so all 202 00:29:14,966 --> 00:29:20,899 of the conversion and the timestamping is mechanically done so it doesn't cost anything. 
203 00:29:20,900 --> 00:29:28,666 >> JOHN FOLIOT: If I have a transcript file I can insert it there and finally I hit upload and the file is uploaded in to the 204 00:29:28,666 --> 00:29:32,666 system. You get a little dialogue box and off you go. 205 00:29:32,933 --> 00:29:38,933 >> JOHN FOLIOT: Depending on your bandwidth and the size of the file, you know, it takes a minute or three to actually 206 00:29:38,933 --> 00:29:46,933 upload the video in to the system and then you are brought to the next page in the workflow system. 207 00:29:46,933 --> 00:29:51,333 >> JOHN FOLIOT: So you will notice that each job is given a unique identifying number and this particular 208 00:29:51,333 --> 00:30:01,899 demonstration job is 591. That job identification number is also used to create essentially a work ticket number. 209 00:30:01,900 --> 00:30:08,700 >> JOHN FOLIOT: It is used in the invoicing process that's done by the transcription companies. And so it is the 210 00:30:08,700 --> 00:30:14,300 unique identifier of the project so that we can track it through the system if we have to. 211 00:30:14,300 --> 00:30:20,266 >> JOHN FOLIOT: You get a little bit of other sort of basic information, the title, the description, the date that 212 00:30:20,266 --> 00:30:31,399 it was submitted. One of the things that was really important to us was that we are not -- the system again is a pass through system. 213 00:30:31,400 --> 00:30:39,100 >> JOHN FOLIOT: We are not at this time in a situation or prepared to do video hosting. Not that it couldn't be worked 214 00:30:39,100 --> 00:30:46,233 out but that was not our goal when we set the system up. We have had some conversations with a couple 215 00:30:46,233 --> 00:30:56,799 of people on campus that are looking to have the sort of hosting situation. But we don't have it at this point. 216 00:30:56,800 --> 00:31:04,433 >> JOHN FOLIOT: The people at Sacramento asked a question. Notice the option of data mining only. 
And is this 217 00:31:04,433 --> 00:31:13,499 through Docsoft? The answer to that is yes. The system that we are using relies on two servers. 218 00:31:13,500 --> 00:31:20,500 >> JOHN FOLIOT: One is the Docsoft appliance and the other is a custom server that manages all of the additional business 219 00:31:20,500 --> 00:31:29,300 logic that I am showing you here. So the data mining we leave it there simply because it is available. 220 00:31:29,300 --> 00:31:40,433 If anybody has used the system you will know that the speech-to-text, sort of the mechanical or machine speech-to-text, the quality varies greatly. 221 00:31:40,433 --> 00:31:48,166 In our early testing we were only getting -- we were getting results as poor as 20 percent accuracy, 30 percent accuracy which was clearly 222 00:31:48,166 --> 00:31:58,599 not good enough. Somebody asked is this a homegrown system or a commercial system. This is a homegrown system. This is something 223 00:31:58,600 --> 00:32:07,466 that we put together on campus based on sort of some understandings and some custom development as well. 224 00:32:07,500 --> 00:32:14,933 >> JOHN FOLIOT: So getting back to the screen here, you will notice that we have an expiration date. So right now 225 00:32:14,933 --> 00:32:21,833 when the videos are converted in to these different file formats, we will store them on the server for 30 days. 226 00:32:21,833 --> 00:32:29,766 And at the end of 30 days they are automatically removed from the system simply because we don't have storage space. 227 00:32:29,766 --> 00:32:37,566 >> JOHN FOLIOT: So you will notice that my files or the conversion status is that the conversion to these different file 228 00:32:37,566 --> 00:32:43,699 formats is in process as is the transcription status. 229 00:32:43,700 --> 00:32:55,000 And so what happens is that the transcription status will always be in process even though -- so the first step is that we need to do the conversion.
230 00:32:55,000 --> 00:33:02,733 >> JOHN FOLIOT: And we can't go any further in the process until we have at the very least the MP3. So the 231 00:33:02,733 --> 00:33:09,799 system will convert the file to an MP3 first, and as you can see it was done fairly quickly. 232 00:33:09,900 --> 00:33:16,466 >> JOHN FOLIOT: And so at that point the end user can download the MP3. But again if you remember on one of 233 00:33:16,466 --> 00:33:22,232 the earlier slides we have some internal logic in the system that sends an e-mail off 234 00:33:22,233 --> 00:33:31,433 to the transcription company that was selected. And it says that the MP3 file is ready to be picked up and converted in to text. 235 00:33:31,433 --> 00:33:40,966 So the transcription status will be in process until such time as the transcription company reloads the flat text file in to the system. 236 00:33:40,966 --> 00:33:47,966 >> JOHN FOLIOT: If you have your own text file it will start to happen once the conversion of these file formats is done. 237 00:33:47,966 --> 00:33:58,266 But if it is being outsourced and you have chosen 48 hours, then it will be sort of in process for 48 hours until the file comes back. 238 00:33:58,333 --> 00:34:05,799 >> JOHN FOLIOT: In this particular example I had a text file already. And so obviously it is ready to be downloaded. I 239 00:34:05,800 --> 00:34:13,100 mean it is kind of silly in this particular instance because I uploaded the text file and it is ready for download. 240 00:34:13,100 --> 00:34:21,633 >> JOHN FOLIOT: We didn't bother worrying about the skewed logic there but it has not yet been timestamped. And 241 00:34:21,633 --> 00:34:32,166 the timestamping again is dependent on the MP3 file because the Docsoft appliance is what actually applies the timestamping. 242 00:34:32,166 --> 00:34:38,066 >> JOHN FOLIOT: So a couple of minutes later and you can see the completion.
It took me roughly 18 minutes to 243 00:34:38,066 --> 00:34:44,599 convert the video in to the different file formats. And so they are all available for download. 244 00:34:44,700 --> 00:34:52,233 >> JOHN FOLIOT: Once all of the video files have been converted to their sort of web ready format, the system sends 245 00:34:52,233 --> 00:34:59,633 an e-mail to the individual user informing them that the videos are, in fact, ready to be picked up. 246 00:34:59,866 --> 00:35:09,466 As I said at this point the most mechanical part of the system is users log in to their system, they go to the project page and they have to 247 00:35:09,466 --> 00:35:18,266 download the files on to their local drive so that they can be uploaded to their web server or put in their media asset management tool. 248 00:35:18,266 --> 00:35:24,032 >> JOHN FOLIOT: We are hoping in the future to be able to have rather than make it available for download that it 249 00:35:24,033 --> 00:35:33,166 could just be fired off to a storage facility, but there is a couple of issues about FTP file permissions and whatnot. So we need to work those out. 250 00:35:33,166 --> 00:35:40,466 >> JOHN FOLIOT: But at any rate the file, they receive an e-mail saying that the files are ready to download. 251 00:35:40,466 --> 00:35:49,232 >> JOHN FOLIOT: If they have chosen the transcription process that's going to take more than 24 hours, one of the 252 00:35:49,233 --> 00:35:54,099 things that we heard was that oftentimes they need to get the video up as soon as possible. 253 00:35:54,200 --> 00:36:03,200 And so I would rather not discourage people from putting up their videos. I don't want the captions to be the roadblock. 254 00:36:03,200 --> 00:36:09,266 >> JOHN FOLIOT: Obviously I would love to have, you know, the captions go live with the video right away but in 255 00:36:09,266 --> 00:36:13,266 some instances I am prepared to accept a little bit of water with the wine. 
256 00:36:13,266 --> 00:36:21,399 >> JOHN FOLIOT: And if the captions catch up a day or two later then that's what it is. The e-mails are 257 00:36:21,433 --> 00:36:28,399 actually sent individually. So a second e-mail is also sent out when the caption files are ready to be downloaded as well. 258 00:36:28,400 --> 00:36:40,666 The other thing that happens is that once the process is done, the project file creates a little sort of preview window here. 259 00:36:40,666 --> 00:36:48,466 >> JOHN FOLIOT: Anybody that's using video probably recognizes the JW FLV player which is the player that 260 00:36:48,466 --> 00:36:58,866 we are using. And so it is embedded in the page there. So that you can review the file, you can see the caption associated with the video. 261 00:36:58,866 --> 00:37:10,866 One of the things that has been occurring occasionally is that once the timestamp files are created people notice that they are off a little bit or 262 00:37:10,866 --> 00:37:16,232 >> JOHN FOLIOT: in the transcription, if it was outsourced, there may be some spelling mistakes in there. 263 00:37:16,233 --> 00:37:23,633 If that's the case, they can take the flat text file, they can do minor edits. 264 00:37:23,633 --> 00:37:31,966 And you recall that in the system you can say I have a text transcript already. And so they can reupload it and it would be retimestamped. 265 00:37:31,966 --> 00:37:42,066 >> JOHN FOLIOT: So error correction like that. The DFXP file and the SubRip file are also essentially flat text files. 266 00:37:42,100 --> 00:37:54,500 So if you need to do some minor timestamp tweaking you can open that up in a text editor. You have the ability in process to do some fine-tuning. 267 00:37:54,500 --> 00:38:04,500 Finally at the bottom of each project page we provide a link to download the JW FLV player. We bought a license for it. 268 00:38:04,500 --> 00:38:12,766 The license is relatively cheap. I think Sean paid $50, $60 for the license.
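Because the SubRip (.srt) caption file mentioned above is plain text, the "minor timestamp tweaking" can also be scripted instead of done by hand. The following is a minimal sketch, assuming standard SubRip timestamps of the form `HH:MM:SS,mmm`; the function name `shift_srt` is hypothetical and not part of the system described in the webinar. It would also rewrite any caption text that happens to contain a string in that same timestamp format.

```python
import re

# Standard SubRip timestamp: hours:minutes:seconds,milliseconds
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, offset_ms: int) -> str:
    """Shift every timestamp in SubRip text by offset_ms (may be negative)."""

    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # never produce a negative timestamp
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return TIMESTAMP.sub(shift, srt_text)


if __name__ == "__main__":
    cue = "1\n00:00:06,900 --> 00:00:11,666\nHi. Welcome.\n"
    print(shift_srt(cue, 500))  # captions appear half a second later
```

For anything beyond a uniform offset, re-uploading the corrected flat text file so that it is retimestamped, as described above, is likely the simpler route.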
We did a little bit of additional branding to it. 269 00:38:12,766 --> 00:38:25,899 The most recent version of the JW player, the module for displaying captions in the player is a third party sort of stand-alone module. What 270 00:38:25,900 --> 00:38:34,166 we did is we actually took the module and we integrated it in to the player and then rebundled it. The player, of course, is done in Flash. 271 00:38:34,166 --> 00:38:40,232 >> JOHN FOLIOT: And so we open it up in to a Flash editor and made a minor tweak there so that the player that 272 00:38:40,233 --> 00:38:48,366 we offer for download has the module built in. We also provide some copy and paste code here. 273 00:38:48,366 --> 00:38:57,432 Both raw sort of HTML code .. we had a couple people using content management systems, blogging tools and whatnot. 274 00:38:57,433 --> 00:39:06,766 And so we have a base URL so you specify where the files are located and then we generate some copy and paste code there. 275 00:39:06,766 --> 00:39:14,666 So that people can add it to their either -- in to their Web page or in to their blog pages or what have you. 276 00:39:14,666 --> 00:39:19,632 >> JOHN FOLIOT: You notice perhaps along the top that there are a couple of additional buttons. So under the library 277 00:39:19,633 --> 00:39:27,033 button you have a media library and each of the projects has an entry that looks like this. 278 00:39:27,033 --> 00:39:36,866 Gives you a screen capture of the first frame, which we also make available as a download right here, the preview image. 279 00:39:36,866 --> 00:39:46,899 >> JOHN FOLIOT: So you have got a screen capture that's available. I'm sorry. Got to get back to the right slide. You 280 00:39:46,900 --> 00:39:54,133 can -- if you have a number of projects you can sort by submission date, title or duration. And you have some basic details. 
281 00:39:54,133 --> 00:40:02,899 >> JOHN FOLIOT: Clicking on the details link takes you back to the, sorry, takes you back to the actual project page 282 00:40:02,900 --> 00:40:08,633 there. And so you have a number of projects depending on how you go. 283 00:40:08,633 --> 00:40:16,099 As I said each user has their own unique user account. And so the user account, user preferences look like that. 284 00:40:16,100 --> 00:40:24,933 >> JOHN FOLIOT: So if you need to you can go in and change passwords. I mentioned that all throughout the 285 00:40:24,933 --> 00:40:33,399 process e-mails are sent to the end users. If you don't want to receive all of those e-mails, you can choose not to get those notifications. 286 00:40:33,466 --> 00:40:41,266 >> JOHN FOLIOT: We also have a couple of settings that can be modified. The accessibility settings ... when we 287 00:40:41,266 --> 00:40:45,832 set up the system and the system is now almost three years old. 288 00:40:45,833 --> 00:40:51,633 >> JOHN FOLIOT: We had one person complaining about sort of some of the JavaScript that was being used in the 289 00:40:51,666 --> 00:40:59,499 interface and they didn't want to have it. You can disable the scripting part of the system. 290 00:40:59,500 --> 00:41:08,700 You just get more page scrolling and it takes longer to go through the workflow. But it really doesn't have a lot of practical impact. 291 00:41:08,700 --> 00:41:18,366 But the other thing that I mentioned is that we have a conversion workflow profile. And so each user can create as many profiles as they want. 292 00:41:18,366 --> 00:41:25,999 You just give it a name. And so as I mentioned earlier I called it full blown but you can call it whatever you want. 293 00:41:26,000 --> 00:41:32,800 >> JOHN FOLIOT: You get to choose the final output size as well as the formats that it is going to be converted in to.
294 00:41:32,800 --> 00:41:40,166 >> JOHN FOLIOT: So, for example, if you know that you really only want an FLV file, then that's all you need to 295 00:41:40,166 --> 00:41:46,766 convert to. It does speed up the process. And it also saves us a little bit of CPU. 296 00:41:46,766 --> 00:41:53,932 One of the attendees asks who is paying for the captioning. The individual user pays for their own captioning. 297 00:41:53,933 --> 00:42:01,966 >> JOHN FOLIOT: So it would be the department or the faculty member that is directly billed. And so as I 298 00:42:01,966 --> 00:42:07,432 mentioned we have -- when you set up the user profile we capture billing information 299 00:42:07,433 --> 00:42:15,966 >> JOHN FOLIOT: so that the transcript company sends the invoice directly to the end user. So this is the conversion 300 00:42:15,966 --> 00:42:24,266 workflow profile so that you can go in and create as many or as few as you want. 301 00:42:24,266 --> 00:42:35,499 We also have now -- because I am an administrator I am giving you a screen shot of the administrator panel. And so most users will not see this. 302 00:42:35,500 --> 00:42:41,766 >> JOHN FOLIOT: But the system administrators, so Sean and I, do. So there are a number of back end things that we 303 00:42:41,766 --> 00:42:47,899 can do to set up the system. So one of the first important things is managing the departments. 304 00:42:47,900 --> 00:42:54,433 >> JOHN FOLIOT: So kind of related to that question of who pays for the captioning, the department does. So 305 00:42:54,433 --> 00:43:02,433 we have a listing of all the different, you know, here is one that Sean set up just as a test account, the Acme labs. 306 00:43:02,433 --> 00:43:07,133 >> JOHN FOLIOT: So we capture a contact name, a person that we are actually dealing with in the department and we 307 00:43:07,133 --> 00:43:14,166 have their e-mail address. And these three fields here are obligatory when creating a user profile.
308 00:43:14,166 --> 00:43:18,332 >> JOHN FOLIOT: We need an e-mail address and need a contact name and we need a department. We have a 309 00:43:18,366 --> 00:43:24,132 contact phone number so if we need to get in touch with the people we can. We have the mailing 310 00:43:24,133 --> 00:43:31,633 address so again when the transcript company creates the invoice they just send the invoice straight to the user. 311 00:43:31,633 --> 00:43:38,333 >> JOHN FOLIOT: So for Sean and me, again, it is as hands off as we can make it. You can in fact edit that information. 312 00:43:38,333 --> 00:43:46,899 And so, you know, I don't have the full screen to show you. But moving along the mailing address, the e-mail or web address if we have one, and then 313 00:43:46,900 --> 00:43:52,633 you can edit or delete. When you are editing you get this gray look. 314 00:43:52,633 --> 00:43:59,266 >> JOHN FOLIOT: So we can change the contact name, e-mail address, the mailing address, et cetera, et cetera. 315 00:43:59,266 --> 00:44:04,332 >> JOHN FOLIOT: We also have a similar system for the different transcript services. So right now, as I mentioned, we are 316 00:44:04,333 --> 00:44:10,199 dealing with -- well, the Stanford transcription is a test account, it doesn't really do anything. 317 00:44:10,200 --> 00:44:15,500 >> JOHN FOLIOT: But we are dealing with four transcript companies that have given us different price points. 318 00:44:15,533 --> 00:44:20,299 >> JOHN FOLIOT: We can add as few or as many as we want. So one of the things that I was concerned 319 00:44:20,300 --> 00:44:25,833 about moving forward is that I didn't want to put all of our eggs in one basket.
320 00:44:25,833 --> 00:44:34,466 So as the system continues to grow if we need to add other transcription companies or if other transcription companies come forward that can 321 00:44:34,466 --> 00:44:44,499 give us a better price we can add a new transcript company in to the service offering and so again our users have a choice. 322 00:44:44,500 --> 00:44:51,433 >> JOHN FOLIOT: So the system in many ways is almost like a brokerage or a user bazaar and we give 323 00:44:51,433 --> 00:44:56,666 users the options, but they actually engage directly with the transcript company. 324 00:44:56,666 --> 00:45:06,032 >> JOHN FOLIOT: Our system simply works as the brokerage and manages it all sort of behind the scenes. We can 325 00:45:06,033 --> 00:45:17,133 also manage our users directly. And so over here we can decide whether the users are billable or not. 326 00:45:17,133 --> 00:45:23,599 >> JOHN FOLIOT: So a couple of the systems we had a couple of people come to us that wanted to use the 327 00:45:23,600 --> 00:45:33,100 system. We are always going to have the transcript. So, for example, students and so we can set up the system that a student could log in. 328 00:45:33,100 --> 00:45:38,433 >> JOHN FOLIOT: They could upload a file but they do not have access to the transcript companies and that's 329 00:45:38,433 --> 00:45:47,466 kind of something that we added after the system was initially created because we had some demand for that. 330 00:45:47,466 --> 00:45:52,266 >> JOHN FOLIOT: We didn't want to be in a situation where we were trying to capture billing information from 331 00:45:52,266 --> 00:45:58,999 students because as you know students come and go and if they skipped on the bill we would be left holding the bag. 
332 00:45:59,000 --> 00:46:04,600 >> JOHN FOLIOT: So we have created a system where if you have got a file you can still get the timestamping and 333 00:46:04,600 --> 00:46:15,200 the conversion done but you do not have access to the billable part of the system. As well for security reasons 334 00:46:15,200 --> 00:46:23,733 if you try and log on and you fail three times you are locked out. At which point we receive an e-mail saying that a particular user was locked out. 335 00:46:23,733 --> 00:46:28,366 >> JOHN FOLIOT: And we will contact them and ask if they were having problems, or was somebody 336 00:46:28,366 --> 00:46:35,432 trying to hack their account and so then we can reset it based on the answers to those questions. 337 00:46:35,433 --> 00:46:44,699 >> JOHN FOLIOT: We also as I said when somebody first signs up we receive an e-mail, you know, requesting to 338 00:46:44,700 --> 00:46:51,700 have an account with the system. And so for the time being Sean and I have set it up that it is a manual approval process. 339 00:46:51,700 --> 00:46:57,066 >> JOHN FOLIOT: We could automate that if we wanted to but again we wanted to make sure that the people that are 340 00:46:57,066 --> 00:47:03,299 using the system were, in fact, affiliated with the university. 341 00:47:03,366 --> 00:47:10,799 >> JOHN FOLIOT: As far as the pricing is concerned you select -- once you have created a transcript service you 342 00:47:10,800 --> 00:47:18,766 select the name of the transcript service and then you can add as many different price services as possible. 343 00:47:18,766 --> 00:47:24,232 >> JOHN FOLIOT: Here the price per minute and we -- excuse me, we create a description here and the 344 00:47:24,233 --> 00:47:31,799 description is actually what is read here. So we actually write out two business days project read on at the price point. 345 00:47:31,800 --> 00:47:41,633 That's what is displayed on the video upload page to the end user. 
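The price-point setup described above (a transcript service, a price per minute, and a written description that is shown to the end user on the upload page) can be modelled with a small record type. This is only a sketch under stated assumptions: the class name `PriceTier`, the service name, and the $1.50 rate are all hypothetical, and the webinar does not say how the transcription companies round partial minutes.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class PriceTier:
    """One price point offered by a transcript service (illustrative only)."""

    service: str               # name of the transcript service
    description: str           # free-text turnaround description shown to users
    price_per_minute: Decimal  # hypothetical rate, not a real quote

    def label(self) -> str:
        """Text an upload page could display for this option."""
        return f"{self.service}: {self.description} (${self.price_per_minute:.2f} per minute)"

    def estimate(self, duration_seconds: int) -> Decimal:
        """Estimated cost for a recording, rounded to the nearest cent."""
        minutes = Decimal(duration_seconds) / Decimal(60)
        cost = minutes * self.price_per_minute
        return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    tier = PriceTier("Example Transcripts", "two business days", Decimal("1.50"))
    print(tier.label())
    print(tier.estimate(90 * 60))  # a 90-minute recording
```

Keeping the description as free text mirrors what the speaker describes: the administrator writes out the turnaround wording by hand and the upload page simply displays it.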
So we couldn't find a more elegant way than that. So it is a little kludgy but at 346 00:47:41,633 --> 00:47:52,699 least it is accurate. So that's basically the system from end to end. As I said we provide a copy and paste of the 347 00:47:52,700 --> 00:48:03,900 HTML code. One of the things that we are looking at is integrating it with Drupal. On our campus Drupal has become a very 348 00:48:03,900 --> 00:48:11,300 popular content management system. My guess is within the next two to three years probably about 80 percent of the web content 349 00:48:11,300 --> 00:48:22,266 on Stanford's campus will be delivered via the Drupal system. So we are looking at having a tighter integration with the Drupal system. 350 00:48:22,266 --> 00:48:30,666 >> JOHN FOLIOT: The -- we also have tutorials on captioning Youtube videos as well as support for iOS. 351 00:48:30,666 --> 00:48:37,666 Somebody asked a question "Is the audio mine option free of cost to the users?" 352 00:48:37,666 --> 00:48:46,499 >> JOHN FOLIOT: It is, it is free of cost. The problem is that the accuracy -- it's just -- I mean if you have a really 353 00:48:46,500 --> 00:48:57,333 sort of good speaker with no accent, you know, somebody from the Midwest, and that has been well mic'd and you have really good audio quality 354 00:48:57,366 --> 00:49:00,932 then the accuracy is not bad. 355 00:49:00,933 --> 00:49:07,033 >> JOHN FOLIOT: In our experience, however, having that kind of audio quality going in is something of a pipe 356 00:49:07,033 --> 00:49:15,533 dream more often than not. It's, you know, a boom mic at the back of the room or you've got 357 00:49:15,533 --> 00:49:23,666 >> JOHN FOLIOT: faculty members or presenters with thick and heavy accents. At that point the quality of the 358 00:49:23,666 --> 00:49:30,732 speech-to-text goes way down and it really is a question of the law of diminishing returns.
359 00:49:30,733 --> 00:49:35,166 >> JOHN FOLIOT: When you get back 30 percent accuracy you might as well get back 0 percent accuracy 360 00:49:35,166 --> 00:49:40,832 because it is going to take you so long to try and clean that up that it's really not worthwhile. 361 00:49:40,833 --> 00:49:47,799 >> JOHN FOLIOT: However if people want to do that and then use that as a basis to clean it up and provide a text 362 00:49:47,800 --> 00:49:54,333 transcript, and then insert that transcript in to the system, they are free to do so and there is no cost. 363 00:49:54,333 --> 00:49:59,399 >> JOHN FOLIOT: We will still do the Codec conversion and create the sort of web ready video file. We will still 364 00:49:59,400 --> 00:50:06,900 generate the timestamp files. You get the copy and paste code. So all of the other pieces of creating sort of a captioned video for web delivery 365 00:50:06,900 --> 00:50:14,700 will still be delivered by the system. So the -- another question 366 00:50:14,700 --> 00:50:21,800 is when a vendor is chosen who pays for it. The professor, the department, disability services, et cetera? The client. 367 00:50:21,800 --> 00:50:29,633 So -- when -- I mean I hate to use business terms on campus sometimes because it bristles some people's necks. 368 00:50:29,633 --> 00:50:39,566 But this really is a business relationship that we've set up between someone who's creating video content and the transcription company. 369 00:50:39,600 --> 00:50:45,200 >> JOHN FOLIOT: And so all we do is we create a business relationship for them. If it is disability services that 370 00:50:45,233 --> 00:50:52,566 wants to set up an account, then they can do so. And any videos that they send in to the system get billed back to them.
371 00:50:52,566 --> 00:50:58,499 >> JOHN FOLIOT: So, of course, Sean who works in Stanford's Office of Accessible Education which is our 372 00:50:58,500 --> 00:51:04,766 disability services department, when he uses the system, the bill goes to them. 373 00:51:04,766 --> 00:51:11,966 >> JOHN FOLIOT: But one of the things that we have done is that we have tried to push the costs back to the end user 374 00:51:11,966 --> 00:51:17,532 not in a negative way but by going around and explaining sort of the value of having captioned files. 375 00:51:17,533 --> 00:51:24,466 >> JOHN FOLIOT: And so one of the groups on campus that I am working with is called tech training and is 376 00:51:24,466 --> 00:51:29,332 part of an education group that's located in IT services. 377 00:51:29,333 --> 00:51:36,433 >> JOHN FOLIOT: And every Friday afternoon on campus they have a 90-minute free kind of presentation where they 378 00:51:36,433 --> 00:51:42,066 bring in people to talk about various technical subjects. I have done presentations there. 379 00:51:42,133 --> 00:51:48,333 They have brought in third party vendors from, you know, Adobe and Microsoft and they do these 90-minute 380 00:51:48,400 --> 00:51:58,433 sessions on, you know, a myriad of different topics. They are videotaping all these sessions to create their own sort of video library and they 381 00:51:58,433 --> 00:52:06,266 are using the captioning system. And so they were set up as a client in the system and they are using 382 00:52:06,266 --> 00:52:17,566 Cogi right now. So when they upload the video, Cogi does the speech-to-text and fires back the flat text file and they send the invoice straight to 383 00:52:17,566 --> 00:52:26,899 the training group at IT services and they pay it directly. So, you know, it really is set up as a client process like that. 384 00:52:26,900 --> 00:52:31,466 >> JOHN FOLIOT: Somebody asked do we have a policy on campus requiring professors to caption videos.
We do 385 00:52:31,466 --> 00:52:41,766 not have a policy like that at this time. And so that's the answer. I don't have a policy at this time. 386 00:52:41,766 --> 00:52:51,399 We are trying really hard to sort of sell the benefits of captions in that it makes the videos more searchable, there is a benefit to end users 387 00:52:51,400 --> 00:52:59,533 >> JOHN FOLIOT: and to students as well, especially students that are, you know, English is not their first language. 388 00:52:59,566 --> 00:53:09,299 But we don't -- I don't have a stick right now. So I am walking around with lots of carrots. One of the things that we did -- so I mentioned integration 389 00:53:09,300 --> 00:53:19,966 with Drupal. So we have created a custom input type for those that know Drupal using CCK .. the video can be hosted locally on the same server as 390 00:53:19,966 --> 00:53:30,432 the drupal instance or remotely and so.. one of the things that Stanford has on campus is .. .. there is a group called Stanford Video and they 391 00:53:30,433 --> 00:53:42,433 maintain a streaming video server as opposed to pseudo streaming and you can host up to a gigabyte of, no, excuse me, a 100 gigabytes of 392 00:53:42,433 --> 00:53:47,766 >> JOHN FOLIOT: So for smaller departments or faculty members they can set up an account at Stanford video and 393 00:53:47,766 --> 00:53:55,999 they can move their videos and host them there and they get true video streaming as opposed to pseudo streaming. 394 00:53:56,000 --> 00:54:03,700 >> JOHN FOLIOT: And so unfortunately I can't really show you some examples, but one of the other things that the 395 00:54:03,700 --> 00:54:10,866 caption files that are created can also be used in third party applications like Youtube. 
396 00:54:10,866 --> 00:54:16,266 >> JOHN FOLIOT: So if anybody is using Youtube and I know a lot of faculty members and students use Youtube as 397 00:54:16,266 --> 00:54:31,499 kind of the final resting place for their videos, the caption files that are being created can also be uploaded to Youtube and Youtube will then use 398 00:54:31,500 --> 00:54:40,033 them in their player. And so one of the nice things about the Youtube system is that you can search for a particular line in the lecture and you can click 399 00:54:40,033 --> 00:54:43,599 on that line and it will take you to that place in the video. 400 00:54:43,600 --> 00:54:50,100 >> JOHN FOLIOT: So again, when we go around and talk to people and kind of explain the benefits of having a time 401 00:54:50,100 --> 00:54:56,166 synchronized caption file with the video we certainly point to that as being a useful thing. 402 00:54:56,166 --> 00:55:02,466 >> JOHN FOLIOT: So there is a couple of other questions here. One person says how many people are involved 403 00:55:02,466 --> 00:55:11,932 in the selling. I am not really sure what you mean by that question. The selling is done sort of automatically. 404 00:55:11,933 --> 00:55:19,799 We created the relationship with the transcription company by adding them in to the system as a vender. 405 00:55:19,800 --> 00:55:29,966 And then the sort of offer of work is the e-mail that gets sent to the transcription company when somebody uploads the video. So that's the offer 406 00:55:29,966 --> 00:55:39,166 of work. The e-mail that they receive has that job number that I indicated. Has a location where the MP3 file is and says that, you know, this 407 00:55:39,166 --> 00:55:47,399 particular department, this particular client and we give them all the billing information has this file that needs to be transcribed and here is 408 00:55:47,400 --> 00:55:51,733 the time frame and you agree to this type of price. Please do the job. 
409 00:55:51,733 --> 00:55:59,233 >> JOHN FOLIOT: Wow, there are a lot of questions sort of coming right now, so let me just finish through. I only have 410 00:55:59,233 --> 00:56:06,566 a couple more slides. And then we can go back and I will try and process some of these questions. We also have a Q and A session at the 411 00:56:06,566 --> 00:56:20,299 end of the larger seminar. So a couple of other systems here, so iTunes and iOS platform, the SCC file format is really tricky to work with 412 00:56:20,300 --> 00:56:31,566 but there are some third party tools now that allow for subtitling and we also use it for captioning. And there is a Macintosh tool that is free to use 413 00:56:31,566 --> 00:56:41,666 that will do that. So we don't get too twitchy about whether it is a subtitle or a closed caption. I mean what's important is that the text is being provided 414 00:56:41,666 --> 00:56:48,166 >> JOHN FOLIOT: to those that can't hear. So, you know, these are some of the ways that we provide for the subtitling 415 00:56:48,166 --> 00:57:00,766 for Mac OS. So the tool is iSubtitle and here is the URL there. It kind of side steps having to deal with the SCC file format which required a lot of 416 00:57:00,766 --> 00:57:11,199 work earlier on. A couple of screen shots here. I am not going to spend a lot of time on that. Captioning for the Android platform is still tricky, it 417 00:57:11,200 --> 00:57:23,200 is immature but it requires the SRT file format which is what we are generating right now. So it is doable but it is still kind of awkward. 418 00:57:23,200 --> 00:57:30,500 >> JOHN FOLIOT: And so there are some additional resources. Some URLs here. I know Howard will be 419 00:57:30,600 --> 00:57:34,966 providing all of this information at the end of the seminar. 420 00:57:34,966 --> 00:57:43,266 >> JOHN FOLIOT: So if you need to get any of these URLs you can. And with that there is my contact information 421 00:57:43,266 --> 00:57:52,066 there.
And I am happy to answer questions if you want to send me an e- mail to follow up directly I can. 422 00:57:52,066 --> 00:58:08,299 I am looking at the time now. Howard, what is the process? Are we taking a break now or is it question and answer for the next 15 minutes? 423 00:58:08,300 --> 00:58:15,400 >> JOHN FOLIOT: Howard says we are going to do some question and answer for the next 15 minutes. So a number 424 00:58:15,400 --> 00:58:18,866 of people have typed some questions in to the chat section. 425 00:58:18,866 --> 00:58:27,666 I am not going to use names here. But somebody asked how can I create the HTML files without knowing where the video will be hosted. 426 00:58:27,666 --> 00:58:41,232 That's a good question. It really depends on how you are going to be using the video file. I am looking for the -- I have to go back some more. 427 00:58:41,266 --> 00:58:52,599 >> JOHN FOLIOT: Sorry, I am looking for the particular slide. So I can show you. There we go here. So what 428 00:58:52,600 --> 00:59:03,966 happens is that in this, this HTML code here the files that are created and we have a tutorial that's associated with this website as well. 429 00:59:03,966 --> 00:59:16,766 So the files that are created always have the same name. So it is always media.flv, media.webm, media.mp4 and then caption.srt, 430 00:59:16,766 --> 00:59:21,066 caption.xml, et cetera, et cetera. And so 431 00:59:21,066 --> 00:59:30,099 what we tell people to do is to download all those files and put them in a directory and name that directory or the file folder the name of the 432 00:59:30,100 --> 00:59:37,500 >> JOHN FOLIOT: project. So, for example, in this particular project I used the name of demonstration. So the tutorial 433 00:59:37,500 --> 00:59:44,800 says to download all those files and create a directory called demonstration and then FTP that on to the server. 
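The naming convention described above (every project produces the same file names, stored in a directory named after the project) means the final file locations can be derived from just a base URL and the project name. A minimal sketch of that idea follows; the function name `project_urls` and the `www.example.edu` address are hypothetical, and the file list contains only the names the speaker mentions.

```python
from urllib.parse import quote

# File names the speaker lists; the system may generate others as well.
FILE_NAMES = ("media.flv", "media.webm", "media.mp4", "caption.srt", "caption.xml")


def project_urls(base_url: str, project: str) -> dict:
    """Map each fixed file name to its URL under <base_url>/<project>/."""
    # Drop a trailing slash on the base and escape the project name for use in a URL.
    root = f"{base_url.rstrip('/')}/{quote(project)}"
    return {name: f"{root}/{name}" for name in FILE_NAMES}


if __name__ == "__main__":
    for name, url in project_urls("https://www.example.edu/", "demonstration").items():
        print(f"{name:12s} {url}")
```

Because only the directory changes from project to project, the same embed code can be reused by swapping in a different base URL or project name.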
434 00:59:44,800 --> 00:59:52,600 >> JOHN FOLIOT: And so this code here because the name of the file or the name of the project was called 435 00:59:52,600 --> 01:00:01,100 demonstration, there is a little bit of dynamic writing here that actually paths you to that particular directory. 436 01:00:01,100 --> 01:00:07,066 >> JOHN FOLIOT: So if you put the demonstration directory at the root of your web location, this copy and 437 01:00:07,066 --> 01:00:10,499 paste code just works because it has been figured out that way. 438 01:00:10,500 --> 01:00:18,233 >> JOHN FOLIOT: So we have a sort of FAQ file that's associated with the website that explains how to do that. In 439 01:00:18,233 --> 01:00:29,799 the case of a blogging platform, you have to upload your video and transcript files to the final location before you can use the copy and paste 440 01:00:29,800 --> 01:00:40,933 code. And you type in the location of where that stuff is. Click on the refresh button and it dynamically rewrites this code here so that it will 441 01:00:40,933 --> 01:00:52,199 point to the final location. So I don't know if that answers that particular question. If it is not clear, you can certainly send me an e-mail and I can 442 01:00:52,200 --> 01:01:04,233 follow up some more. Somebody from West Texas A&M asks what product is used to match the flat text file to the video. So again as I mentioned we 443 01:01:04,233 --> 01:01:08,199 have two servers that are working in tandem. 444 01:01:08,200 --> 01:01:20,600 One is the Docsoft appliance from Docsoft. And I believe it is docsoft.com and it is the tool that actually does the timestamping. 445 01:01:20,600 --> 01:01:28,233 >> JOHN FOLIOT: And so the -- yeah. That's the tool that does that. The second server is one that we had custom 446 01:01:28,233 --> 01:01:40,033 created. Docsoft did the work for us and it is the one that manages all the business logic. And it also does all of the codec conversion.
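The fixed file names and per-project directory described in the hosting answer can be sketched in a few lines of Python. This is only an illustration of the idea, not the code the system generates: the function name `project_urls` and the `example.edu` addresses are hypothetical, and only the file names actually listed in the answer are included (the speaker's "et cetera" suggests there are more).

```python
from urllib.parse import urljoin

# File names quoted in the talk; every project uses the same names.
MEDIA_FILES = ("media.flv", "media.webm", "media.mp4")
CAPTION_FILES = ("caption.srt", "caption.xml")


def project_urls(base_url: str, project: str) -> dict:
    """Map each fixed file name to its URL under <base_url>/<project>/.

    base_url is the location the project directory was uploaded to,
    for example the root of a website or a blog's upload folder.
    """
    # A trailing slash makes urljoin treat the base as a directory.
    base = base_url if base_url.endswith("/") else base_url + "/"
    project_root = urljoin(base, project + "/")
    return {name: urljoin(project_root, name) for name in MEDIA_FILES + CAPTION_FILES}


if __name__ == "__main__":
    for name, url in project_urls("https://www.example.edu", "demonstration").items():
        print(name, "->", url)
```

Because the file names never change, the only inputs are the project name and the final location. That is presumably why the copy and paste code can work unchanged at the web root and only needs the location typed in for a blogging platform.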
447 01:01:40,033 --> 01:01:58,566 The next question is --. Somebody asked a question something about a -- explain the benefits to faculty, I am not really under -- 448 01:01:58,566 --> 01:02:06,366 >> JOHN FOLIOT: so in terms of explaining the benefits to faculty, yeah, so anybody that's involved with accessibility 449 01:02:06,366 --> 01:02:12,932 issues or disability resource issues, you know, yeah, I buy lots of coffee. 450 01:02:12,933 --> 01:02:20,233 >> JOHN FOLIOT: What can I tell you? A lot of it is kind of one-on-one. So evangelizing and meeting with people, 451 01:02:20,266 --> 01:02:29,332 we have a website that we work on. I also do presentations. I mentioned I have done tech briefings on captioning. 452 01:02:29,333 --> 01:02:34,399 >> JOHN FOLIOT: The other thing is that both Sean and I have a really good relationship with a lot of the 453 01:02:34,400 --> 01:02:42,633 technology people on campus, a lot of the webmasters and so through them we will arrange meetings with various faculty members. Or we'll 454 01:02:42,633 --> 01:02:47,199 sit down, we have eight different sort of presentations that we do. 455 01:02:47,200 --> 01:02:52,800 >> JOHN FOLIOT: But we have some stock speaking points and some things that we show people. So I didn't 456 01:02:52,800 --> 01:03:00,766 mention but I showed you in YouTube that screen shot where you can search inside the video to that particular point. 457 01:03:00,766 --> 01:03:10,099 Another thing that YouTube has that's still experimental which is very cool is the automatic language translation. And so not only are we 458 01:03:10,100 --> 01:03:18,766 getting captions but you get subtitles. And so we do little demonstrations of that to sort of show how it benefits everyone. 459 01:03:18,766 --> 01:03:24,466 >> JOHN FOLIOT: We lean real hard into the searchability of videos and the fact that caption files are being indexed 460 01:03:24,466 --> 01:03:31,866 by YouTube which is Google.
There is an SEO benefit, a search engine optimization benefit, of doing that. 461 01:03:31,866 --> 01:03:42,399 So, yeah, but a lot of it is, you know, based on working, personal relationships and just working with people to make them understand. 462 01:03:42,400 --> 01:03:54,266 >> JOHN FOLIOT: Somebody says are you going to speak to what this is written in? So we worked with Docsoft. 463 01:03:54,266 --> 01:03:58,532 We actually outsourced the production of this entire system to them. 464 01:03:58,533 --> 01:04:03,133 >> JOHN FOLIOT: We bought the Docsoft appliance and then we had some custom work done. They are a 465 01:04:03,133 --> 01:04:15,033 Windows shop. The Docsoft appliance is running on IIS and all the other logic is written in ASP or ASPX I believe. It is all written to work on a 466 01:04:15,033 --> 01:04:24,033 Windows platform. I would love, I would love to find somebody who would be interested in taking some of the business logic, I mean the Docsoft 467 01:04:24,033 --> 01:04:31,866 appliance runs on IIS and so there is nothing we can do about it. But the second server that manages all the business logic and what not... 468 01:04:31,866 --> 01:04:35,032 >> JOHN FOLIOT: Now that we have got this up and running I would love to find somebody that ... who would be 469 01:04:35,033 --> 01:04:43,833 interested in taking on the project and doing it in PHP. I thought I had a student in the computer sciences department who was 470 01:04:43,833 --> 01:04:49,966 interested in the project but as students are, he kind of moved on and I never heard anything more. 471 01:04:49,966 --> 01:04:55,399 >> JOHN FOLIOT: But I'll tell you honestly, that if I could find someone to do it in PHP I would open source it, I would put 472 01:04:55,400 --> 01:05:01,800 it out as free. You still need the Docsoft appliance but all the other business logic I would give away simply because we need it, right?
So I don't know 473 01:05:01,800 --> 01:05:09,266 if that answers that question. But there you go. 474 01:05:09,266 --> 01:05:16,199 Somebody asked the question is this work used for commercial videos that are shown in classrooms. 475 01:05:16,200 --> 01:05:25,500 >> JOHN FOLIOT: I am not aware of anybody doing that yet. There is no reason why you couldn't. My program is 476 01:05:25,500 --> 01:05:35,000 really focused on online accessibility. So, you know, web delivery. So it was really sort of created to address that specific need. 477 01:05:35,000 --> 01:05:44,600 >> JOHN FOLIOT: But I mean the system of producing timestamped files that can be then associated to a video if you 478 01:05:44,600 --> 01:05:50,133 had say an old movie that for whatever reason a faculty member was using a 479 01:05:50,133 --> 01:05:56,466 >> JOHN FOLIOT: film from the '30s or '40s say that didn't have closed captions, it would be a little bit of 480 01:05:56,466 --> 01:06:09,466 additional post production that disability services would have to do but in theory they could 481 01:06:09,466 --> 01:06:17,832 and then it would do the timestamping and create an SCC file that you can take into Final Cut Pro or something and export that video and burn it as 482 01:06:17,833 --> 01:06:20,233 a DVD or whatnot. 483 01:06:20,233 --> 01:06:30,933 The capability of doing that exists but that’s not what the system was designed to do. I hope that answers that question. Somebody asked what 484 01:06:30,933 --> 01:06:41,166 pots of money do users tap into to pay for the transcription costs typically. There you go. That’s the sticking point. And so it generally comes out 485 01:06:41,166 --> 01:06:54,999 of operational budgets, which is a euphemism for where they can find it. That’s been our big stumbling block. When we launched 486 01:06:55,000 --> 01:07:05,033 the system two and a half, three years ago it was right around the time the local or even the national economy took that massive hit.
So right at the 487 01:07:05,033 --> 01:07:15,333 time when all excess budgets dried up completely. So one of the things that we have also done is we have been talking to sort of 488 01:07:15,333 --> 01:07:23,699 administrative people in the various departments about, you know, when they are working on their annual budgets that they have to do every year. 489 01:07:23,700 --> 01:07:33,800 And at my university they are about 60 to 70 percent through that process now and it usually starts after Christmas break and the real decision 490 01:07:33,800 --> 01:07:44,133 making happens in July or August. We talked to them about thinking about how much video they are going to be producing. And sort of putting that 491 01:07:44,133 --> 01:07:54,666 -- the cost of transcription into their working budgets. A lot of the videos that we are seeing go through the system right now tend to be of the 492 01:07:54,666 --> 01:08:06,499 short form video file, format. So very much geared to the YouTube delivery. Videos are usually between three and five minutes. At $1.25 493 01:08:06,500 --> 01:08:21,266 a five minute video is going to cost us $7.50. You can’t buy lunch on campus for that. Right now it comes from wherever assets can be found. But 494 01:08:21,266 --> 01:08:30,699 that’s our big challenge right now is getting the administrative people and the budgeting people to work this into their average or their basic 495 01:08:30,700 --> 01:08:44,533 yearly budgets. Question about copyright, what if the original videos are commercial and not personal. So again as I said the system was not 496 01:08:44,533 --> 01:08:57,633 really designed for, you know, creating closed captions for commercial product. It was created for videos that are being created on campus by 497 01:08:57,633 --> 01:09:09,299 faculty, students and staff to be used via web delivery. The system can be used for commercial videos by the disability resource center.
But, you 498 01:09:09,300 --> 01:09:21,133 know, they are governed by the same copyright rules that exist for, you know, using any other system. So, you know, this is just an alternative to 499 01:09:21,133 --> 01:09:31,466 some of the commercial systems that are out there. You know, if you are a disability resource center that has to do a large amount of 500 01:09:31,466 --> 01:09:42,799 commercial videos, I would say at that point this system probably is not the right tool for that particular job. This was very much sort of 501 01:09:42,800 --> 01:09:52,333 envisioned and designed for the low volume producers on campus. Those that are doing, you know, an hour or four a week as opposed to 8 and 502 01:09:52,333 --> 01:10:00,999 10 hours a day. The system could handle something like that but we were going after low hanging fruit and our observation was that a lot of 503 01:10:01,000 --> 01:10:12,466 departments were doing, you know, as I said one or two videos a week, you know, or, you know, a dozen or so a quarter and that was about it. 504 01:10:12,466 --> 01:10:22,399 So, you know, those were the ones that we were going after there. Angella mentions that she is going to be talking about some copyright issues 505 01:10:22,400 --> 01:10:32,600 later in the presentation. So I will defer to her for more information on that. Somebody asks are there transcription editors that stand for 506 01:10:32,600 --> 01:10:43,000 recommend CEUs for editing or creation of transcripts? I don’t really want to say one or the other. What I can tell you is that I am working with 507 01:10:43,000 --> 01:11:01,233 four right now. They are Video Jump, Cogi, Docsoft themselves and another group called Project readOn or Blue Rhino I believe. All four 508 01:11:01,233 --> 01:11:11,399 companies have given us good service and we have been very satisfied with them. The system is set up such that you can choose a transcription 509 01:11:11,400 --> 01:11:22,166 vender and is agnostic like that. 
The organizations we are dealing with right now two of them are located in the Bay Area and all four are 510 01:11:22,166 --> 01:11:31,232 located in California. So there was sort of that geographic proximity that was of interest to us as well. Again this is all being handled on the web. 511 01:11:31,233 --> 01:11:40,766 So they could be located in Anchorage, Alaska, for all I care. But it was useful for us to have local venders simply because ... 512 01:11:40,766 --> 01:11:50,166 you know, we were spitballing this and putting it together from the get-go. And so it was easier to have somebody local that we could talk to on the 513 01:11:50,166 --> 01:11:59,832 phone. Somebody asked about moving it to Python or Django. Rob, yes, you can e-mail separately on that. I would be more than happy. 514 01:11:59,833 --> 01:12:04,733 I'm really agnostic. I don't care how it's done. What I would really like to do 515 01:12:04,733 --> 01:12:12,633 is see the system move into or at least some of the business logic that we created move into an open source delivery mechanism. Because I 516 01:12:12,633 --> 01:12:24,299 think, I think there would be value there and I am really big about sharing that. So ... yeah. Somebody posted the question have you tried 517 01:12:24,300 --> 01:12:34,300 Automatic Sync. Yes, as a matter of fact I am very familiar with Automatic Sync. I am good friends, I know Kevin Erler very well and I talked to Kevin 518 01:12:34,300 --> 01:12:36,400 early on. 519 01:12:36,400 --> 01:12:47,300 And so going back to some of those commercial productions of, you know, movies that disability resource center might have to do, I would certainly 520 01:12:47,300 --> 01:12:56,833 say Automatic Sync would be a commercial service provider that would be able to facilitate that kind of captioning. 521 01:12:56,833 --> 01:13:08,466 Again our system was envisioned to be used for web delivery primarily.
So I see it is roughly a quarter to the hour and I think we were supposed 522 01:13:08,466 --> 01:13:22,199 to be taking a brief break at this point in time. So unless there are other questions I am going to hand this back to Soji or Howard, I am not sure 523 01:13:22,200 --> 01:13:31,600 who, but I am going to hand it back to the facilitator of the webinar and I will be around throughout the process and able to answer 524 01:13:31,600 --> 01:13:42,800 questions later on in the seminar. So thank you to everyone. Again my e-mail address, I will very quickly get the slide up here, is 525 01:13:42,800 --> 01:14:26,166 jfoliot@stanford.edu and more than happy to answer questions via e-mail as well. So don’t be shy and thanks once again. 526 01:14:26,166 --> 01:14:34,799 >> Soji: Thank you, John. And just want to make sure everybody -- I just want to tell you we will be taking a short break for a couple of minutes. So 527 01:14:34,800 --> 01:14:42,000 please stay online for the next presentation. Thank you. (Break) (We will be restarting the seminar in 20 minutes) (We will be 528 01:14:42,000 --> 01:14:53,833 restarting at 12:05 p.m. MST).