
Detailed Record

Author: 劉以勒
Author (English): Yi-Le Liu
Title (Chinese): 使用度量學習在基於三元組的對稱式雙向條件變分自動編碼器來訓練皮革辨識系統
Title (English): Leather Retrieval System with TSDP-CVAE: Triplet-based Symmetric Dual-Path Conditional Variational Autoencoder using Metric Learning
Advisor: 江政欽
Advisor (English): Cheng-Chin Chiang
Committee Members: 謝君偉, 林信鋒
Committee Members (English): Jun-Wei Hsieh, Shin-Feng Lin
Degree: Master's
Institution: National Dong Hwa University
Department: Department of Computer Science and Information Engineering
Student ID: 610521211
Year of Publication (ROC era): 108 (2019)
Graduating Academic Year: 108
Language: Chinese
Number of Pages: 63
Keywords (Chinese): 條件式變分自編碼器、三元損失、皮革檢索、圖片檢索、邊緣檢測、傳統紋理特徵、度量學習、聚類分析
Keywords (English): Conditional Variational AutoEncoder (CVAE), Triplet Loss, Leather Texture Retrieval, Image Retrieval, Edge Detection, Traditional Texture Features, Metric Learning, Cluster Analysis
Statistics:
  • Recommendations: 0
  • Views: 20
  • Downloads: 8
  • Bookmarks: 0
Abstract (Chinese, translated):
Leather grains are fine and highly varied, and identifying them by eye costs considerable manpower and time, so automatic machine-based matching and retrieval of leather grains is necessary. Leather carries two kinds of patterns. One is the grain pattern, the texture mainly used to identify the leather, consisting of dents of varying depth and fine cracks. The other is the color pattern, colored markings produced by dyeing to customer requirements; it has no dents, but it interferes with grain identification. This research focuses on designing and developing a deep learning technique for retrieving leather grains; the technique is expected to effectively resist color-pattern interference and to avoid time-consuming retraining of the deep neural network whenever samples of a new leather series are added.
To resist color-pattern interference, we design a method that blends grain and color patterns to synthesize additional training samples, which helps the deep neural network learn to resist this interference effectively. For the network model, we propose a modified Conditional Variational Autoencoder (CVAE) that incorporates Gray-level Co-occurrence Matrix (GLCM) features, commonly used for texture image recognition, and combines them with a Triplet Loss Function so that the CVAE learns an embedded feature space in which grains of the same leather series cluster well while grains of different series are pushed apart; a Gaussian Mixture Model (GMM) can then be used in the subsequent recognition stage to retrieve the leather. We call this embedding network the Triplet Loss Conditional Variational Autoencoder (TCVAE).
To let the proposed TCVAE overcome the poor learning caused by an increased number of layers, we add Symmetric Skip Connections and Dense Connections to the architecture, and call the resulting network the Triplet-based Symmetric Dual-Path CVAE (TSDP-CVAE). In addition, we integrate the Improved Triplet Loss (ITL) so that the network can better learn features common to various leather grains. In our experiments, the proposed TSDP-CVAE achieves very satisfactory accuracy in leather retrieval.
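The triplet objective described in the abstract can be illustrated with a minimal NumPy sketch; the function name and the margin value here are illustrative, not taken from the thesis:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor toward a same-series sample,
    push it away from a different-series sample by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance to same-class sample
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance to different-class sample
    return np.maximum(d_pos - d_neg + margin, 0.0)     # hinge: zero once the margin is satisfied

# A triplet whose negative is already far away incurs no loss:
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 1.0])
print(triplet_loss(a, p, n))  # → 0.0
```

Minimizing this hinge over many triplets is what shapes the embedded space so that same-series grains cluster and different-series grains separate.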
Abstract (English):
Leather products have exquisite and diverse textures. Identifying these textures by eye incurs high costs in both manpower and time, so an automatic way of retrieving leather textures by machine is desirable. Leather commonly carries two types of patterns. One is the leather texture, the grain mainly used to identify the leather, which usually has deep dents or fine cracks. The other is the color spots, which appear after a customized dyeing process; color spots have no dents but interfere with leather texture identification. This research designs and develops a deep learning technique for leather texture retrieval. The technique is expected to effectively resist the interference of color spots and to require no time-consuming retraining of the deep neural network when new leather textures are added.
To resist the interference of color spots, we design a synthesis method that generates additional leather patterns with hybrid textures and color spots for training, so that the deep neural network can learn to better handle the noisy color spots. As for the network model, we propose a modified Conditional Variational Autoencoder (CVAE) that incorporates Gray-level Co-occurrence Matrix (GLCM) features, which are commonly used to identify texture images. Combined with a Triplet Loss Function, the CVAE learns an embedded feature space in which the representations of intra-class leather samples are well clustered while those of inter-class samples are pushed apart. This embedded feature space is well suited to the Gaussian Mixture Model (GMM) used in our later retrieval of leather textures. The proposed network that extracts the embedded features is called the Triplet Loss Conditional Variational Autoencoder (TCVAE).
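The GLCM features mentioned above can be computed from scratch. The following is a minimal NumPy sketch of a co-occurrence matrix for one pixel offset plus a few Haralick-style statistics; the offset, level count, and choice of statistics are illustrative assumptions, not the thesis's exact configuration:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix of an integer image for offset (dx, dy)."""
    h, w = img.shape
    m = np.zeros((levels, levels), dtype=np.float64)
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1  # count each gray-level pair
    return m / m.sum()  # normalize to a joint probability table

def glcm_features(p):
    """Classic statistics from a normalized GLCM: contrast, energy, homogeneity."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return np.array([contrast, energy, homogeneity])

# A perfectly flat patch has zero contrast and maximal energy:
flat = np.zeros((4, 4), dtype=int)
print(glcm_features(glcm(flat)))  # → [0. 1. 1.]
```

In practice several offsets and directions are pooled to make the descriptor rotation-tolerant; here a single horizontal offset keeps the sketch short.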
To overcome the poor learning caused by the increased number of layers in autoencoders, we also introduce Symmetric Skip Connections and Dense Connections into the network architecture. We call this network the Triplet-based Symmetric Dual-Path CVAE (TSDP-CVAE). Additionally, we integrate the Improved Triplet Loss (ITL) into the network so that the common features of leather textures can be better extracted. The experimental results verify that TSDP-CVAE achieves satisfactory accuracy in leather texture retrieval.
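The GMM-based retrieval stage can be sketched as follows. For brevity this assumes a single diagonal Gaussian per leather series rather than a full mixture, and all names are illustrative; the idea is the same: fit a density model per series on the embedded features, then rank series by the likelihood of a query embedding:

```python
import numpy as np

def fit_gaussian(embeddings):
    """Fit one diagonal Gaussian to the embedded features of a leather series."""
    mu = embeddings.mean(axis=0)
    var = embeddings.var(axis=0) + 1e-6  # variance floor for numerical stability
    return mu, var

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log density of a query embedding."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def retrieve(query, models):
    """Return the series whose density model scores the query highest."""
    return max(models, key=lambda k: log_likelihood(query, *models[k]))

# Toy usage: two well-separated series in a 4-D embedded space.
rng = np.random.default_rng(0)
models = {
    "series_A": fit_gaussian(rng.normal(0.0, 0.1, (50, 4))),
    "series_B": fit_gaussian(rng.normal(5.0, 0.1, (50, 4))),
}
print(retrieve(np.zeros(4), models))  # → series_A
```

Because only the per-series density models need updating, adding a new leather series means fitting one more Gaussian on its embeddings, with no retraining of the embedding network.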
Acknowledgments I
Abstract (Chinese) III
Abstract (English) V
Table of Contents VII
List of Figures IX
List of Tables XI
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives and Problems 2
1.3 Encountered Problems and Solutions 3
1.4 Thesis Organization 8
Chapter 2 Literature Review 9
2.1 Related Techniques and Background 9
2.1.1 Current State of Image-based Recognition Research 9
2.1.2 GLCM 10
2.1.3 GMM 11
2.2 Processing Pipeline of the Proposed System 12
Chapter 3 Image Acquisition and Preprocessing 15
3.1 Image Acquisition Method 15
3.2 Effective Region Segmentation 15
3.3 Patch Sampling 16
3.4 Patch Selection 17
3.5 Preprocessing Algorithms 19
Chapter 4 Sample Augmentation Algorithms 29
4.1 Sample Augmentation by Image Rotation 29
4.2 Leather Blending Synthesis Algorithm 29
Chapter 5 TSDP-CVAE Neural Network Architecture 33
5.1 Triplet Loss Function 33
5.2 Variational Autoencoder Integrating the Triplet Loss Function and GMM 35
5.3 Improving the Autoencoder Architecture with Symmetric Learning 39
5.4 Incorporating GLCM Features into TVAE Learning 40
5.5 Improved Triplet Loss (ITL) 44
5.6 Improving the Autoencoder Architecture with DenseNet 45
Chapter 6 Experimental Data and Methods 49
6.1 Experimental Methods 49
6.2 Experimental Results 50
6.2.1 Comparison of Preprocessing Effects 50
6.2.2 Comparison of GLCM with GMM versus AlexNet 52
6.2.3 Comparison of TVAE, TCVAE and TSDP-CVAE 52
Chapter 7 Conclusions and Future Work 59
7.1 Conclusions 59
7.2 Future Work 59
References 61
[1] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, pp. 610–621, May 1973.
[2] Stauffer, C., & Grimson, W. E. L. (1999). Adaptive background mixture models for real-time tracking. In CVPR.
[3] Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In CVPR (pp. 539–546).
[4] Chen, W., Chen, X., Zhang, J., & Huang, K. (2017). Beyond triplet loss: A deep quadruplet network for person re-identification. In CVPR (pp. 403–412).
[5] Sohn, K., Lee, H., & Yan, X. (2015). Learning structured output representation using deep conditional generative models. In NIPS (pp. 3483–3491).
[6] Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR (pp. 4700–4708).
[7] Sethi, A., Singh, M., Singh, R., & Vatsa, M. (2019). Residual codean autoencoder for facial attribute analysis. Pattern Recognition Letters, 119, 157–165.
[8] 刘昶. (2007). Research on the application of texture analysis techniques in tanned leather classification systems (Thesis, Sichuan Normal University).
[9] 罗丽萍. (2016). Research on leather classification methods based on texture features (Doctoral dissertation, Donghua University).
[10] Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In Proceedings of the British Machine Vision Conference, vol. 1, no. 3, p. 6.
[11] Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In CVPR (pp. 815–823).
[12] Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In NIPS (pp. 1988–1996).
[13] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
[14] Dai, B., & Wipf, D. (2019). Diagnosing and enhancing VAE models. In ICLR.
[15] Yelamarthi, S. K., Reddy, S. K., Mishra, A., & Mittal, A. (2018). A zero-shot framework for sketch based image retrieval. In ECCV (pp. 316–333).
[16] Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In ECCV (pp. 499–515).
[17] Cheng, D., Gong, Y., Zhou, S., Wang, J., & Zheng, N. (2016). Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In CVPR (pp. 1335–1344).
[18] Wang, F., Zuo, W., Lin, L., Zhang, D., & Zhang, L. (2016). Joint learning of single-image and cross-image representations for person re-identification. In CVPR (pp. 1288–1296).
[19] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).
[20] Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., & Feng, J. (2017). Dual path networks. In NIPS (pp. 4467–4475).
[21] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations.
[22] Karami, E., Prasad, S., & Shehata, M. Image matching using SIFT, SURF, BRIEF and ORB: Performance comparison for distorted images.
[23] Panchal, P. M., Panchal, S. R., & Shah, S. K. (2013). A comparison of SIFT and SURF. International Journal of Innovative Research in Computer and Communication Engineering, 1(2), 323–327.
[24] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, (8), 679–714.
[25] Marion, A. (1991). An Introduction to Image Processing. Chapman and Hall, p. 274.
[26] Dougherty, E. R. (1992). An Introduction to Morphological Image Processing. ISBN 0-8194-0845-X.
[27] Ishfaq, H., Hoogi, A., & Rubin, D. (2018). TVAE: Triplet-based variational autoencoder using metric learning. In ICLR. (https://openreview.net/forum?id=Sym_tDJwM)
[28] Yang, X. Understanding the variational lower bound. Institute for Advanced Computer Studies, University of Maryland.
[29] Bilmes, J. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models.
[30] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV. (https://www.cvfoundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf)
[31] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.