Review on mining data from multiple data sources

  • File type: Book
  • Language: English
  • Publisher: Elsevier
  • Edition and year / country: 2018

Details

Related fields: Industrial Engineering
Related specializations: Data mining
Journal: Pattern Recognition Letters
University: Institute of Natural and Mathematical Sciences, Massey University, New Zealand

Published in an Elsevier journal
Keywords: multi-source data mining, pattern analysis, data classification, data clustering, data fusion

Description

1. Introduction

Advances in information and communication technology have generated large amounts of data from different sources, which may be stored in different geographical locations. Each database may have its own structure for storing data. Mining multiple data sources [1–3] distributed across different geographical locations to discover useful patterns is critically important for decision making. In particular, the Internet can be seen as a large, distributed data repository consisting of a variety of data sources and formats, which can provide abundant information and knowledge. Data from different sources may seem unrelated to each other, but once information from different sources is integrated, new and useful knowledge may emerge.

The following example shows how an organization can mine data from different sources to obtain insights that no individual source can provide. The Australian Taxation Office (ATO) mines data from sources such as social media posts, private school records and immigration data to detect tax cheats. Mining data from different sources has become a sophisticated tool for cracking down on tax cheats, an effort that yielded nearly $10 billion in 2016 [4]. For example, in an ordinary Australian family, the husband ran a business and reported $80,000 of taxable income per year, putting him just inside the second-lowest tax bracket, and his wife reported earning $60,000 per year. However, data collected from different sources revealed that the family had three children at private schools at an estimated cost of $75,000 per year, while immigration records and social media posts showed that the family had recently taken five business-class flights and a holiday at Whistler, a Canadian ski resort. In other words, their declared incomes did not match their lifestyle, which prompted the ATO to contact them to check whether they had unpaid taxes.
From the above example, we can see that developing effective techniques for mining multiple data sources to discover useful information is crucially important for decision making. However, efficiently mining high-quality information from multiple data sources remains a challenging research problem [5–9], especially in the current big data era, because in real-world applications data stored in multiple places often conflict [10]. The conflicts include: (i) data name conflicts: (a) the same object has different names in different data sources, or (b) two different objects from different data sources have the same name; (ii) data format conflicts: the same object has different data types, domains, scales, and precisions in different data sources; (iii) data value conflicts: the same object has different recorded values in different data sources; (iv) data source conflicts: different data sources have different database structures.
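
The conflict types listed above can be made concrete with a small sketch. Everything in it is hypothetical and invented for illustration, not taken from the survey: the two sources, their field names, the alias map, and the unit conversion. It first maps both records to a shared schema (name and structure conflicts), then converts values to a common scale (format conflicts), and finally reports fields whose values still disagree (value conflicts).

```python
# Hypothetical records for the same person held by two sources.
# Source A stores income in dollars under "income"; source B stores it in
# thousands of dollars under "annual_earnings" (name and scale conflicts).
source_a = {"tp-001": {"name": "J. Smith", "income": 80000}}
source_b = {"tp-001": {"full_name": "J. Smith", "annual_earnings": 95.0}}

# Assumed mapping from source B's field names to the shared schema.
FIELD_ALIASES = {"full_name": "name", "annual_earnings": "income"}

# Assumed converters that bring source B's values to a common type and scale.
CONVERTERS_B = {"income": lambda thousands: int(round(thousands * 1000))}


def normalize(record, aliases, converters):
    """Rename fields to the shared schema and convert values to common units."""
    out = {}
    for field, value in record.items():
        canonical = aliases.get(field, field)
        convert = converters.get(canonical, lambda v: v)
        out[canonical] = convert(value)
    return out


def find_value_conflicts(rec_a, rec_b):
    """Return fields present in both records whose normalized values differ."""
    shared = rec_a.keys() & rec_b.keys()
    return {f: (rec_a[f], rec_b[f]) for f in sorted(shared) if rec_a[f] != rec_b[f]}


norm_a = normalize(source_a["tp-001"], {}, {})
norm_b = normalize(source_b["tp-001"], FIELD_ALIASES, CONVERTERS_B)
conflicts = find_value_conflicts(norm_a, norm_b)
print(conflicts)  # {'income': (80000, 95000)}
```

Only after the name and format conflicts are resolved does the remaining disagreement on `income` count as a genuine value conflict. How to resolve that conflict (for example, by trusting one source over another) is a separate decision that this sketch does not attempt.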