

Real-Time Creation of Web Content about Physical Objects and Events using Sensor Network

Takeshi Okadome, Yasue Kishino, Takuya Maekawa, Koji Kamei, Yutaka Yanagisawa, and Yasushi Sakurai
NTT Laboratories, Japan
houmi@idea.brl.ntt.co.jp

ABSTRACT
Many of the objects and events in the world around us have not been converted into verbal information such as words or other types of content. Project s-room aims to construct a system that enables us to extract object properties and status information and to create Web content about physical objects and events by utilizing sensor nodes attached to various objects. This paper introduces the concept, basic technologies, and applications developed by s-room research.

Keywords: Content, Web, Sensor Network



1 INTRODUCTION

In today's information society, a vast amount of information is available, including video, the Web, and mobile content. However, many of the objects and events in the world around us have not been converted into verbal information such as words or other types of content. Now, a unique concept is being applied to objects and events that previously could not be presented as data capable of perusal: generating information on objects and events and, in addition, creating content through verbalization. This is the concept of the s-room project. This paper presents the s-room approach, its basic technologies, and applications that have been developed during s-room research.

Most services provided by ubiquitous computing are context aware; for example, they monitor environments to prevent crimes and disasters, or they notify us of an item that we have left behind. These are characterized as direct services in the real world. In contrast, s-room provides Web content about physical objects and events, or open APIs that everyone can access. This strategy may lead us to many applications that are quite different from those of context-aware services.

2 APPROACH

In the s-room project, we constructed a system for comprehending events, object properties, and status information by utilizing sensor nodes attached to various objects. The premise and fundamental technology of s-room involve monitoring objects and events in sensor-networked environments via sensor nodes to obtain an understanding of them. That is, (1) we observe physical phenomena that emerge when an event related to physical objects

occurs, (2) we convert the observed sensor data into words that denote the event, and (3) we create Web content about the event using the event occurrence time. Finding a way to convert data obtained via this fundamental technology efficiently and effectively into words and content in a format that people can easily understand and use is at the core of s-room research.

Project s-room assumes a world in which physical objects have general-purpose sensor nodes equipped with a) sensors such as an accelerometer, a thermometer, a hygrometer, or an illuminometer, which are expected to become smaller and cheaper; b) a processor and memory for computation; and c) a wireless communication module that transmits sensor data to data servers through a network. To accomplish this goal, we must satisfy the following requirements:

1. We must specify the relationship between sensor data and linguistic symbols and translate sensor data into linguistic symbols using that relationship. This is a kind of symbol grounding problem and its inverse, namely the sensor fusion problem.
2. We must design and construct real-world knowledge as collective intelligence.
3. Using linguistic symbols translated from sensor data together with real-world knowledge, we must infer the properties of physical objects, such as their type, role, or relationship with other objects. We must also infer the status of physical objects at an arbitrary time.
4. We must design APIs that deal with collected sensor data as Web content and construct a variety of applications.

In addition to the above, we deal with signal processing, including noise elimination, and with

Ubiquitous Computing and Communication Journal

stream sensor-data mining for the real-time verbalization of sensor data. We can record the times at which sensor data are produced, although these times contain errors caused by delays such as the input-fetch time of a device driver in an operating system. We use the verbalized sensor data together with their recorded times to create content about events in the real world.

3 BASIC TECHNOLOGIES

Project s-room has already developed a knowledge representation and several basic technologies, which are introduced here: 1. event representation for sensor data grounding, 2. Tag and Think, and 3. NeighborSense.

3.1 Event representation for sensor data grounding

Event representation [1] is an intermediate language between sensor reading values and NL-phrase descriptions. It consists of (1) event descriptors, each of which has its own physical-quantity expression that reflects an interpretation of an event, and (2) composition rules.

Using WordNet, we collect “observable” event concepts in the following way. First, we set twelve words as seeds. Nine of these words relate to “movements,” such as “move,” “reach,” and “pass.” The other three words, “increase,” “decrease,” and “remain,” describe changes in physical scalar quantities such as temperature and light intensity. Starting from the seed words, we traverse synonym links in WordNet and choose “observable” synonyms. For each word, WordNet generally contains two or more sets of synonyms; synonyms in a set have identical meanings and those in different sets have different meanings. We assume that synonyms with the same meaning denote an identical event concept. We tag the concept with the connected words that make up the phrase describing its meaning in WordNet; we thus treat those words as denoting the concept and label the concept with a phrase-like word. For example, the event concept “drop to a lower place” is represented by the event descriptor drop-to-lower-place, with which the words “drop,” “drop down,” and “sink” are associated.

We obtained a set comprising a total of 185 event concepts and 348 words associated with the concepts. Furthermore, we assume that, to each observable event concept, it is possible to assign a mathematical expression including variables and physical constants that denote physical quantities such as position and temperature. For example, we assign

    dλ(t)/dt · g > 0  ∧  d²λ(t)/dt² = g    (t₁ ≤ t ≤ t₂)

to the event descriptor drop-to-lower-place, where g is the gravitational vector and λ(t) is the position of the physical object at time t.

The sensor node that the s-room project developed is equipped with a triaxial accelerometer, a triaxial magnetic field sensor that enables us to determine direction, a thermometer, and an illuminometer. It also has an embedded CPU and a wireless communication module (Fig. 1). We can directly detect 40 of the 160 events using the sensor node.

Figure 1: A sensor node developed by the s-room project. The node is equipped with a triaxial accelerometer, a triaxial direction sensor, a thermometer, and an illuminometer. It also has a CPU and a wireless communication module.
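As an illustration of how such a physical-quantity expression can be checked against sensor data, the following sketch detects a drop-to-lower-place-like event in a stream of triaxial accelerometer samples. During free fall an accelerometer measures near-zero proper acceleration on all three axes, which corresponds to the condition d²λ(t)/dt² = g. The function name, thresholds, and window length are our own illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: detect a free-fall ("drop") segment in accelerometer data.
from math import sqrt

FREE_FALL_G = 0.2      # magnitude (in g) below which we treat the node as falling
MIN_SAMPLES = 5        # consecutive samples required to report the event

def detect_drop(samples):
    """Return (start, end) sample indices of the first free-fall segment, or None.

    `samples` is a sequence of (ax, ay, az) tuples in units of g.
    """
    run_start = None
    for i, (ax, ay, az) in enumerate(samples):
        if sqrt(ax * ax + ay * ay + az * az) < FREE_FALL_G:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= MIN_SAMPLES:
                return (run_start, i)
        else:
            run_start = None
    return None

# A node at rest reads about (0, 0, 1); while falling, all three axes are near zero.
stream = [(0.0, 0.0, 1.0)] * 3 + [(0.01, 0.02, 0.03)] * 6 + [(0.1, 0.3, 2.5)]
print(detect_drop(stream))  # → (3, 7)
```

The same near-zero-on-all-axes criterion reappears later in the paper, where a weblog entry is posted when a cup is dropped.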

3.2 Tag and Think

The system framework Tag and Think enables us to introduce ubiquitous environments easily and simply by attaching sensor nodes to physical objects without providing any information about the objects [2]. In the framework, we developed a method that uses real-world knowledge constructed from a person's general knowledge to automatically infer the types of physical indoor objects and their states from sensor data produced by the sensor nodes. The method infers the type and state changes of an object by detecting 'the ways in which the object is used' and by comparing them with object models. We divide object types into three categories according to their characteristics:

C1: Object types, such as toothbrushes and shoes, that are moved repetitively.
C2: Object types, such as doors and chairs, whose characteristics are represented by a combination of characteristic sensor outputs.
C3: Object types, such as tables and rulers, that have no characteristic motion or whose characteristic phenomena cannot be detected by generic sensors.

We build the models of C2 objects based on a person's basic knowledge and infer the types of C2 objects using that knowledge. In Tag and Think, after collecting sensor data from a general-purpose sensor node that has been attached to a physical object for a certain period of time, the method determines, from the prepared models, the presumed model of the object type that matches the sensor data.
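The three-way division above can be caricatured with a toy classifier. Everything here, the feature names, the thresholds, the rule order, is a made-up illustration of the idea that C1 objects show repetitive motion, C2 objects show recognizable sensor-output combinations, and C3 objects show neither; it is not the paper's inference method.

```python
# Hypothetical sketch of the C1/C2/C3 division from coarse sensor-log features.
def categorize(motion_bursts, characteristic_patterns):
    """motion_bursts: count of repetitive-motion segments in the log;
    characteristic_patterns: count of recognizable sensor-output combinations."""
    if motion_bursts >= 10:            # moved repetitively, e.g., a toothbrush
        return "C1"
    if characteristic_patterns >= 1:   # e.g., a door's open/close signature
        return "C2"
    return "C3"                        # e.g., a table: nothing characteristic

print(categorize(25, 0))  # → C1
print(categorize(2, 4))   # → C2
print(categorize(1, 0))   # → C3
```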



Figure 2: The finite state automaton for the class "door" that represents the states and state transitions of instances in the class.

The prepared model for an object type is represented by a state transition diagram. For example, the diagram for the class "door" has 'Opened' and 'Closed' states, and these states transit to each other. Each transition is associated with its own primitive event groups. The transition 'Open' in the diagram for the class "door," for example, is associated with a group of three primitive events that can be detected by a sensor node attached to the door when the transition ('Open') occurs. Fig. 2 shows the state transition model for the "door" class. It also shows two sequences of primitive event groups that the model generates.

On the other hand, by analyzing sensor data, the method assigns primitive event groups to activity segments in the data and obtains a sequence of event groups. Fig. 3 shows an example of a sequence of primitive event groups assigned to activity segments. By comparing an event-group sequence obtained from sensor data with those generated by the object-type state transition diagrams, the method calculates the likelihood of the sequences and outputs the object type whose diagram has the largest likelihood as the inference result. It uses the Viterbi algorithm, which enables us to find the most likely state sequence quickly from an observed sequence.

The results of evaluation experiments show that the method correctly infers the types of about 90% of the objects and about 80% of the states for those objects whose types are correctly inferred. The method correctly infers C2 object types, such as doors and chairs, which have characteristic phenomena detected by generic sensors. It fails, however, to infer C3 object types, such as tables, which have no characteristic phenomena.

3.3 NeighborSense

A general sensor node is equipped with sensors, such as an accelerometer or an illuminometer, that can detect the motion of the object to which the node is attached or the illumination around it. However, such sensors cannot determine the relationship between the object and other objects. NeighborSense [3] is a system that permits us to detect the physical contact relationship between objects. An implementation of neighborSense that we have developed is a thin plate with an induction coil formed in concentric circles (Fig. 5). Two neighborSense plates detect whether or not they are in contact by employing extremely short-distance communication using electromagnetic induction. We have developed plane- and cube-type neighborSense prototypes (Fig. 4).
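The model-matching step can be sketched as follows. Each object-type model is a state-transition diagram whose transitions emit primitive event groups; an observed event-group sequence is scored against each model with a Viterbi-style dynamic program, and the best-scoring type wins. The two toy models, the event-group contents, and the match/miss probabilities are our own assumptions for illustration, not the paper's actual models.

```python
# Illustrative sketch: score an observed event-group sequence against
# object-type state-transition models and return the most likely type.

# model: state -> list of (next_state, emitted_event_group)
DOOR = {
    "closed": [("opened", frozenset({"rotate", "move", "stop"}))],
    "opened": [("closed", frozenset({"rotate", "move", "stop"}))],
}
CHAIR = {
    "free": [("occupied", frozenset({"tilt", "move"}))],
    "occupied": [("free", frozenset({"tilt", "move"}))],
}
MODELS = {"door": ("closed", DOOR), "chair": ("free", CHAIR)}

MATCH_P, MISS_P = 0.9, 0.1   # assumed per-step emission probabilities

def sequence_likelihood(start, model, observed):
    """Viterbi-style best-path probability of the observed event groups."""
    probs = {start: 1.0}
    for group in observed:
        nxt = {}
        for state, p in probs.items():
            for next_state, emitted in model[state]:
                step = MATCH_P if group <= emitted else MISS_P
                nxt[next_state] = max(nxt.get(next_state, 0.0), p * step)
        probs = nxt
    return max(probs.values()) if probs else 0.0

def infer_object_type(observed):
    scores = {name: sequence_likelihood(start, model, observed)
              for name, (start, model) in MODELS.items()}
    return max(scores, key=scores.get)

# Two open/close-like event groups look most like a door.
print(infer_object_type([frozenset({"rotate", "move"}), frozenset({"move", "stop"})]))
```

Because each step only keeps the best probability per state, the computation is linear in the sequence length, which matches the role the Viterbi algorithm plays in the paper's method.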

Figure 3: An example of a sequence of event groups constructed from sensor data.


Figure 4: The cube-type neighborSense.

Figure 5: A schematic representation of neighborSense. The black box represents a wireless communication module.

The event representation described in section 3.1 cannot represent events denoted by such verbs as “put” or “hide” because these events cannot be directly detected by sensors such as an accelerometer. NeighborSense enables us to deal with many of these events. For example, “put” is expressed as a state change from an off-state to an on-state.
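The off-to-on mapping just described can be sketched directly. We assume, hypothetically, that each plate reports a boolean contact state over time; a rising edge is verbalized as “put” and a falling edge as “remove.” The function and the data shape are illustrative assumptions.

```python
# Sketch of verbalizing contact events reported by neighborSense-style plates.
def verbalize_contact(states):
    """Map a time-ordered list of (time, in_contact) pairs to (time, verb) events."""
    events = []
    prev = None
    for t, contact in states:
        if prev is not None and contact != prev:
            events.append((t, "put" if contact else "remove"))
        prev = contact
    return events

log = [(0, False), (1, False), (2, True), (5, True), (7, False)]
print(verbalize_contact(log))  # → [(2, 'put'), (7, 'remove')]
```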

4 APPLICATIONS

Project s-room has developed several applications using the knowledge representation and the basic technologies described in section 3. This section describes three of them: 1. EventGo!, a real-world event search system, 2. the object-participation-type weblog, and 3. EventCapture.

4.1 Real-world event search system

Assuming an environment in which a sensor network continuously collects data produced by sensors attached to physical objects, the event search system returns information about an event that matches an intuitive interpretation of a set of NL words in a query [4]. To demonstrate this system, we constructed a sensor-networked office environment in which physical objects are equipped with sensor nodes. Using a simple Google-like interface, users input queries as word sets that may contain a preposition and/or an adverb, such as “drop,” “what hide,” “book move:horizontally,” or “who drop vase


on:2006.12.31.” The system translates each query into a description in the event representation for sensor data grounding described in section 3.1. Then it searches for a sensor data segment that satisfies the description. The system shows the event occurrence time, place, or related objects as a reply to the query. In addition, it answers, for example, by displaying a video image recorded with video cameras.

The system can ‘exactly’ search the 40 event concepts (event descriptors) that can be directly detected by the sensors on the sensor node. Incidentally, a total of 101 verbs, such as “move” and “increase,” are associated with the 40 event concepts. To expand the vocabulary that we can deal with, we focus on a word subcategorization approach that defines a lexical usage for a word, telling us how other words or phrases are related to it. We first collect verbs related to twenty-four object classes found in offices, such as staplers, chairs, trash cans, and pens, from textual corpora (1.4 Gbytes of data from the New York Times). We then select the verbs that are not associated with any event descriptors. For example, the verbs “use” and “sit” are extracted verbs related to “chair.” We collect a total of 1,732 extracted verbs by subcategorization. Then we try to connect each of the extracted verbs to one of the 101 verbs associated with event descriptors that the present system can directly detect without a location detector. That is, for each extracted verb related to an object class, we use WordNet to search for its highest-frequency synonym that is one of the 101 associated verbs, and we connect the extracted verb to that synonym. For example, we connect the associated verb “turn” to the extracted verb “open” related to “door.” This method enables us to connect 743 of the 1,732 extracted verbs to their synonyms (associated verbs). Fig. 6 shows the event search processing flow.
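The connection step above can be sketched with a toy lookup. In the real system the synonym lists and their frequency ordering come from WordNet; here they are made-up stand-ins, and the verb sets are tiny illustrative samples rather than the actual 101 associated verbs.

```python
# Toy sketch of the vocabulary-expansion step: link each extracted verb to its
# highest-frequency synonym among the verbs already associated with event
# descriptors. Synonym lists and ordering are assumed stand-ins for WordNet.
ASSOCIATED = {"move", "turn", "increase"}   # verbs the system detects directly

# extracted verb -> synonyms ordered by assumed corpus frequency (highest first)
SYNONYMS = {
    "open": ["turn", "move", "unfold"],
    "sit": ["perch", "rest"],
    "warm": ["heat", "increase"],
}

def connect(extracted):
    """Return the first (highest-frequency) synonym that is an associated verb."""
    for syn in SYNONYMS.get(extracted, []):
        if syn in ASSOCIATED:
            return syn
    return None

print(connect("open"))  # "open" (for "door") is connected to "turn"
print(connect("sit"))   # no associated synonym, so the verb stays unconnected
```

Verbs for which `connect` returns `None` correspond to the 1,732 − 743 extracted verbs that the method leaves unconnected.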
To evaluate the event search system, we conducted an experiment in a sensor-networked office environment and assessed the precision and recall. The experiment consisted of five sessions. For each session, we prepared a list of twenty pairs consisting of a noun and a verb randomly selected from the nouns that denote the eight objects and the 844 associated and extracted verbs. In each session, we first initiated the twenty events corresponding to the twenty noun-verb pairs. We then input the twenty noun-verb pairs as queries and searched for the events using the system. Table 1 shows the precision and recall for each session.

Session     b1    b2    b3    b4    b5
Precision  0.70  0.75  0.55  0.75  0.65
Recall     0.97  0.96  0.91  0.97  0.95

Table 1: Precision and recall for each session.
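The per-session figures in Table 1 average out to the values quoted in the text, precision ≈ 0.68 and recall ≈ 0.95, which the following two lines of arithmetic confirm:

```python
# Averages of the per-session precision and recall values from Table 1.
precision = [0.70, 0.75, 0.55, 0.75, 0.65]
recall = [0.97, 0.96, 0.91, 0.97, 0.95]
print(round(sum(precision) / len(precision), 2))  # → 0.68
print(round(sum(recall) / len(recall), 2))        # → 0.95
```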


The results of the experiments show that the average precision is about 0.68 and the average recall about 0.95. The high recall scores indicate that the system can successfully find data segments that correspond to observable events. In contrast, the precision reflects the limited power of the vocabulary-expanding techniques to resolve different word meanings. Search failures are classified into the following types: (1) the system fails to find data segments that match physical-quantity expressions, and (2) it does not have an associated or extracted verb for a query. Examples of the former type of failure are (a) a search for “knock door” failed because the vibration that occurred when knocking on the door did not result in a “move” that connects with “knock,” and (b) detection of an event corresponding to “cup raise” failed because the subject had input the query before the outer temperature of the cup had begun to increase; the thermometer on the sensor node attached to the outside of the cup detects the cup's outer temperature, whose response is delayed by about 20 or 30 seconds after hot water is poured into the cup. The latter type is exemplified by a case in which a subject rolled up a book (paperback) and submitted the query “roll

book.” However, the system does not have the verb “roll.”

4.2 Object-participation-type weblog

As an application of Tag and Think, we have developed an object-participation-type weblog, in which indoor objects with sensor nodes post weblog entries and comments about what has happened to them in a sensor-networked environment [5]. We implemented a sensor network system in an office. To enable people to continue working, we included desks, chairs, and bookshelves and installed many kinds of objects such as PCs. We installed four video cameras on the ceiling to record the room. We attached sensor nodes to various items of furniture and other objects, including doors, chairs, tables, cups, an alarm clock, books, locker doors, a drawer, and resealable pouches of tea leaves.

The indoor objects are personified and post weblog entries and comments about what has happened to them. They post entries (1) periodically or (2) when events occur. We can classify these events into three types: (2-a) weblog posting by users, (2-b) weblog posting by other objects, and (2-c) the occurrence of specific events experienced by the objects themselves. We prepared a total of 34 kinds of services (postings). Some representative examples are:

Figure 6: The event search processing flow.


Door
- If a door is shut loudly many times at midnight, the door issues cautions. (2-c)
- If a door is used at times that differ from the usual ones, the door cautions the user. (2-c)

Chair
- When a user posts an entry about his/her tiredness, a chair posts the total amount of time during which the user sat on it as a comment to the entry. (2-a)
- A chair posts the weekly total amount of time of the user's nervous leg-shaking. (1)

Cup
- If a cup is dropped, the cup asks with concern whether or not it is broken. (2-a)

Locker and drawer
- A locker and a drawer infer what objects are in them and post object lists every week. (1)

Resealable pouch of tea leaves
- When a user posts an entry about a lost pouch, the pouch infers its location and posts it. (2-b)

Fig. 7 shows a weblog entry that denotes the dropping of a cup. It includes an animation GIF that was recorded when the cup was dropped. We detect the fact that the cup was dropped by finding an acceleration data segment in which the values of all three axes are approximately zero. Fig. 7 also shows a weblog entry posted by a user that denotes a lost tea pouch and a comment posted by the pouch. The comment includes the estimated location of the pouch and an animation GIF that was recorded when the pouch was last moved before the entry was posted. We infer which objects are in lockers and drawers by using luminance co-occurrence: when a locker is opened, the luminance around the objects in the locker increases.

Figure 7: Object-participation-type weblog.

4.3 EventCapture

EventCapture [6] is a real-time application that searches for, monitors, and visualizes events. It also has a simple event prediction function. By using neighborSense, EventCapture can detect such events as those represented by the verb “put,” which EventGo! is unable to detect accurately. It monitors events in real time and displays 3D movies of these events. EventCapture also enables us to predict future events probabilistically just after an event has occurred; this is accomplished by using saved sensor data. Fig. 8 shows a snapshot of a 3D movie produced by EventCapture. Using the 3D movie generation function of EventCapture, we will be able to construct a broadcast movie site, similar to YouTube, where physical objects equipped with sensor nodes post movies of the events related to them.

5 RELATED WORK

Many systems and applications that provide context-aware services detect situations and contexts from sensor data and change their behavior based on a model of the environment constructed using sensor data. That is, they obtain information about the real world and directly provide services in the real world (for example, [7], [8], [9]).


Figure 8: A snapshot of a 3D movie generated by eventCapture.

On the other hand, some recent studies have focused on user-understandable representations of information on the real world. For example, [10] proposed a system that allows people to search for and locate physical objects as and when they need them. On the assumption that all physical objects can be tagged with small devices that possess certain limited processing and communication capabilities, the system provides location information in NL sentences that offer references to identifiable landmarks rather than precise coordinates, such as “my notebook is on my desk” or “the keys are on the dining room table.” [11] focused on WiFi location detection technology that estimates a device's location based on received WiFi beacon signals. It constructs a continuous location logging system, or “self-logging” system, using this technology. The system records WiFi information every few minutes and later converts this information to location logs. [11] also described such applications as displaying a user's moving trajectories on a map and automatically generating a skeleton for weblog sentences based on the self-logging system.

6 CONCLUDING REMARKS

This paper described the concept, basic technologies, and applications developed for project s-room. The aim of project s-room is to construct a system that enables us to extract object properties and status information and to create Web content about physical objects and events by utilizing sensor nodes attached to various objects. Our future research directions include extending and refining our basic technologies and developing new Web applications. We will also define open API specifications for developers as regards the use of collected sensor data.
We are convinced that ubiquitous computing techniques should be used as pre-processors for creating Web content about physical objects and events in the real world, which will change both our life-styles and our businesses.

ACKNOWLEDGMENTS

We would like to thank Dr. Naonori Ueda, the executive manager of the Innovative Communication Laboratory, NTT Communication Science Laboratories, for encouraging our work. We are also grateful for valuable discussions with the members of the Ambient Semantics Research Group at NTT Communication Science Laboratories and with the members of the s-room team.

REFERENCES

[1] Okadome, T. (2006). Event representation for sensor data grounding. International Journal of Computer Science and Network Security, 6, 10, 187-193.
[2] Maekawa, T., T. Hattori, Y. Yanagisawa, T.


Okadome (2006). A representation of objects for context awareness in ubiquitous environments. DBSJ Letters, 5, 2, 45-48. (In Japanese)
[3] Yanagisawa, Y., T. Maekawa, T. Okadome (2007). Event detection based on relative positions between objects. Proceedings of IPSJ Symposium on Multimedia, Distributed, Cooperative and Mobile (DICOMO2007). (In press)
[4] Okadome, T., T. Hattori, K. Hiramatsu, and Y. Yanagisawa (2006). A real-world event search system in sensor network environments. Proceedings of the 7th International Conference on Mobile Data Management (MDM2006).
[5] Maekawa, T., Y. Yanagisawa, and T. Okadome (2007). Towards environment generated media: object-participation-type weblog in home sensor network. Proceedings of the 16th International World Wide Web Conference (WWW2007).
[6] Ansai, T., Y. Yanagisawa, T. Maekawa, T. Okadome (2007). Event visualization using small sensor nodes in real world. Proceedings of IPSJ Symposium on Multimedia, Distributed, Cooperative and Mobile (DICOMO2007). (In press)
[7] Addlesee, M., R. Curwen, S. Hodges, J. Newman, P. Steggles, A. Ward, and A. Hopper (2001). Implementing a sentient computing system. IEEE Computer Magazine, 34, 8, 50-56.
[8] Gellersen, H. W., A. Schmidt, and M. Beigl (2002). Multi-sensor context-awareness in mobile devices and smart artefacts. Mobile Networks and Applications, 7, 5, 341-351.
[9] Philipose, M., K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hahnel (2004). Inferring activities from interactions with objects. IEEE Pervasive Computing, 3, 4, 50-57.
[10] Yap, K.-K., V. Srinivasan, and M. Motani (2005). MAX: human-centric search of the physical world. Proceedings of the 3rd ACM Conference on Embedded Networked Sensor Systems (SenSys'05), 166-170.
[11] Rekimoto, J. and T. Miyaki (2007). WHEN-becomes-WHERE: continuous location archiving with WiFi self-logging. Proceedings of IPSJ Symposium on Interaction (Interaction2007), 223-230. (In Japanese)

