HELSINKI UNIVERSITY OF TECHNOLOGY LABORATORY OF COMPUTER AND INFORMATION SCIENCE
FROM DATA TO KNOWLEDGE
Data analysis of gene expression data
Jaakko Hollmén
Personnel
• Jaakko Hollmén, Heikki Mannila
• Graduate students (3): Jouni Seppänen, Salla Ruosaari, Anne Patrikainen
• Undergraduate students (2): Mikko Katajamaa, Antti Rasinen
Gene expression data
• State of protein production
• Tissue to RNA to hybridized arrays
• High-dimensional, noisy measurement data matrices
• 500 to 10000 simultaneous measurements from an organism
Research scope
• Goal: advances in data analysis, with a specific focus on analyzing gene expression data
• High-dimensional, noisy measurement data matrices
• Signal decomposition and projection methods (PCA, ICA, NMF, ...), MCMC, and pattern discovery methods
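As a rough illustration of the projection methods named above, the following sketch finds the first principal component of a small data matrix by power iteration on its covariance matrix. The toy data, the function name and the iteration count are illustrative assumptions, not taken from the project.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the leading principal component.

    rows: list of samples, each a list of measurements (hypothetical toy data).
    """
    n, d = len(rows), len(rows[0])
    # Center each column (one column per measured gene) at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply by the covariance matrix and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the variance captured along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance

# Four samples, three measurements; the first two columns vary together.
data = [[2.0, 1.9, 0.1], [4.0, 4.2, -0.1], [6.0, 5.8, 0.0], [8.0, 8.1, 0.05]]
direction, variance = first_principal_component(data)
```

Here `direction` points almost entirely along the two co-varying columns. For real expression matrices with thousands of columns, a library routine based on the singular value decomposition would be the more practical choice.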
Understanding measurements
[Figure: processing pipeline with stages labeled source levels, simulation model, image analysis, normalization, data analysis, verify results]
• Simulation model for gene expression data
• To understand measurements and their analysis
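The benefit of a simulation model is that the true source levels are known, so the output of an analysis can be checked against them. The sketch below illustrates that idea only; the measurement model (a gain with multiplicative log-normal noise), the median-centering normalization and all parameter values are assumptions made for the example, since the slides do not specify them.

```python
import math
import random
from statistics import median

def simulate_array(true_levels, gain, noise_sd, rng):
    """Hypothetical measurement model: scaled level with multiplicative noise."""
    return [gain * x * math.exp(rng.gauss(0.0, noise_sd)) for x in true_levels]

def median_center(values):
    """Subtract the median so that array-wide offsets cancel out."""
    m = median(values)
    return [v - m for v in values]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Source levels: known only because the data are simulated.
true_levels = [2.0 ** rng.uniform(4, 12) for _ in range(200)]
measured = simulate_array(true_levels, gain=2.0, noise_sd=0.1, rng=rng)

# Normalization: log-transform, then remove the array-wide offset.
log_true = median_center([math.log2(x) for x in true_levels])
log_meas = median_center([math.log2(y) for y in measured])

# Verification: compare the normalized measurements with the known truth.
mean_abs_error = sum(abs(a - b) for a, b in zip(log_true, log_meas)) / len(log_true)
```

A small `mean_abs_error` indicates that the normalization step recovered the relative source levels up to the simulated noise.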
Closer look at the real world
Expression data as numbers
[Matrix of expression values: several hundred four-decimal numbers between 0 and 1; the row and column layout is not recoverable. First values: 0.8214 0.4447 0.6154 0.7919 0.9218 0.7382 0.1763 0.4057 ...]
Quality control at spot level
• Choose good-quality spots for subsequent analysis
• Image analysis, detection, and cost-sensitive classification
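Cost-sensitive classification can be sketched as a minimum-expected-cost decision: a spot is accepted only if accepting it is expected to cost less than rejecting it. The cost values and the posterior probabilities below are hypothetical, and the slides do not say which classifier produced such probabilities in the actual work.

```python
def classify_spot(p_bad, cost_accept_bad=5.0, cost_reject_good=1.0):
    """Accept or reject a spot by comparing expected costs.

    p_bad: estimated probability that the spot is of bad quality.
    cost_accept_bad: assumed cost of passing a bad spot to later analysis.
    cost_reject_good: assumed cost of discarding a good spot.
    """
    expected_cost_accept = p_bad * cost_accept_bad
    expected_cost_reject = (1.0 - p_bad) * cost_reject_good
    return "accept" if expected_cost_accept < expected_cost_reject else "reject"

# Hypothetical posterior probabilities for four spots.
posteriors = [0.05, 0.15, 0.20, 0.60]
decisions = [classify_spot(p) for p in posteriors]
```

With these assumed costs the acceptance threshold is p_bad < 1/6, well below the 0.5 that equal costs would give, so the rule is deliberately strict about letting bad spots through.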
Collaboration with biologists
• Department of Medical Genetics, Lab. of Cytomolecular Genetics, U. of Helsinki
• Institute of Occupational Health
• Turku Centre for Biotechnology
• Karolinska Institutet
• Journal articles during 2002:
Wikman et al., Identification of differentially expressed genes in pulmonary adenocarcinoma by using a cDNA array. Oncogene 21(37), 2002, Nature Publishing Group.
Niini et al., Expression of myeloid-specific genes in childhood acute lymphoblastic leukemia: a cDNA array study. Leukemia 16(11), 2002, Nature Publishing Group.
Mannila et al., Long-range control of gene expression in yeast. Bioinformatics 18(3), 2002.
Current topics and further work
• Correlation between gene expression and gene location in the genome
• Combinations with sequence information
• Time-series analysis, decompositions
• Sparse decompositions of data matrices
• MCMC techniques
• Pattern discovery methods
• Etc.
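One simple way to look at the first topic is to compare, for every pair of genes, the distance between their positions in the genome with the Pearson correlation of their expression profiles. The sketch below does this for three made-up genes; the gene names, positions and expression values are hypothetical, and this is not necessarily the method used in the work itself.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# (name, position on the chromosome, expression over four conditions);
# all values are invented for illustration.
genes = [
    ("geneA", 1000, [1.0, 2.0, 3.0, 4.0]),
    ("geneB", 1500, [1.1, 2.1, 2.9, 4.2]),
    ("geneC", 90000, [4.0, 1.0, 3.5, 0.5]),
]

# For each gene pair, record (genomic distance, expression correlation).
pairs = []
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        _, pos_i, expr_i = genes[i]
        _, pos_j, expr_j = genes[j]
        pairs.append((abs(pos_j - pos_i), pearson(expr_i, expr_j)))

pairs.sort()  # nearest pairs first
```

On real data, the sorted `pairs` list could be binned by distance to see whether nearby genes tend to be more strongly co-expressed than distant ones.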