Power Query / Formatting Date Time

Posted on June 20, 2024 by s4l8384gmailcom


Power Query is a powerful tool for manipulating and cleaning data, and it offers various features for managing dates. Here are some essential steps and techniques for handling date formats:

1. Data Type Conversion:

When you import data into Power Query, ensure that date columns have the correct data type. Sometimes Power Query’s automatic detection gets it wrong, so verify that all columns are correctly recognized as dates.
To change a specific column into a date format, you have several options:
- Click the data type icon in the column header and select “Date.”
- Select the column, then click Transform > Data Type > Date from the Ribbon.
- Right-click on the column header and choose Change Type > Date.
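- You can also edit the applied data type directly in the M code to ensure proper recognition. The Changed Type step is a call to Table.TransformColumnTypes, for example Table.TransformColumnTypes(Source, {{"OrderDate", type date}}), where OrderDate is a placeholder column name.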

2. Extracting Additional Information:

From a date column, you can extract various details using Power Query functions. These include:
- Year (Date.Year)
- Days in the month (Date.DaysInMonth)
- Week of the year (Date.WeekOfYear)
- Day name (Date.DayOfWeekName)
- Day of the year (Date.DayOfYear)


3. Custom Formatting:

To format dates in a specific way, you can use the Date.ToText function. It accepts a date value and optional parameters for the format and the culture, and returns text.
Combine Date.ToText with a custom format string to get a precise format in a single line of code. For example, Date.ToText(#date(2024, 6, 20), "dd/MM/yyyy") returns "20/06/2024".

4. Common Formats:

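If your source data uses a common layout such as DD/MM/YYYY, MM/DD/YYYY, or YYYY-MM-DD, you can tell Power Query how to read it:
- Import your data into Power Query.
- Select the date column to be converted.
- Right-click the column header and choose Change Type > Using Locale.
- Set the data type to Date, pick the locale whose convention matches the source (e.g., English (United Kingdom) for DD/MM/YYYY), and click OK.
Note that a column typed as Date carries no display format of its own; it is shown according to your regional settings. To output a specific layout as text, use Date.ToText.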

Remember, mastering date formatting in Power Query can significantly simplify your data processing tasks. Feel free to explore more advanced scenarios and create custom formats tailored to your needs!


Power Query Date and Time Formatting

Power Query is a powerful tool for manipulating and cleaning data.

It also offers various features for managing dates. Here are some essential steps and techniques for handling date formats.

Data type conversion

When you import data into Power Query, make sure that the date columns have the correct data type.

Power Query’s automatic detection sometimes gets it wrong, so verify that all columns are correctly recognized as dates.

To change a specific column into a date format, you have several options:

- Click the Data Type icon in the column header and select Date.
- Select the column, then click Transform > Data Type > Date.
- Right-click the column header and choose Change Type > Date.
- You can also edit the applied data type directly in the M code to ensure proper recognition.

Extracting additional information

From a date column, you can extract various details using Power Query functions. These include:

- Year
- Days in the month
- Week of the year
- Day name
- Day of the year

Custom formatting:

To format dates in a specific way, you can use the Date.ToText function.

It accepts a date value and optional parameters for the format and culture settings.

Combine Date.ToText with custom format strings to get precise and varied date formats in a single line of code.

Common formats

If you are dealing with common formats such as

DD/MM/YYYY

MM/DD/YYYY

YYYY-MM-DD

you can tell Power Query how to read them:

- Import your data into Power Query.
- Select the date column to be converted.
- Right-click the column header and choose Change Type > Using Locale.
- Set the data type to Date, pick the locale whose convention matches the source (for example, English (United Kingdom) for DD/MM/YYYY), and click OK.

Remember that mastering date formatting in Power Query can significantly simplify your data processing tasks. Feel free to explore more advanced techniques and create custom formats that suit your needs.


A Comprehensive Guide: How to Transition from Physics to Data Science

Posted on June 13, 2024 by s4l8384gmailcom


Introduction

The realms of physics and data science may seem distinct at first glance, but they share a common foundation in analytical thinking, problem-solving, and quantitative analysis. Physicists are trained to decipher complex systems, model phenomena, and handle large datasets—all skills that are incredibly valuable in data science. As the demand for data scientists continues to grow across various industries, many physicists find themselves well-positioned to make a career transition into this exciting field. This guide outlines the steps and considerations for physicists aiming to transition into data science.

Understanding the Overlap

Physics and data science intersect in several key areas:

Mathematical Modeling: Both fields require strong skills in mathematics and the ability to build models that represent real-world phenomena.
Statistical Analysis: Understanding statistical methods is crucial for analyzing experimental data in physics and for extracting insights from datasets in data science.
Computational Skills: Proficiency in programming and computational tools is essential in both domains for solving complex problems.

Key Skills to Develop

While physicists already possess a strong analytical background, transitioning to data science requires acquiring specific skills and knowledge:

Programming Languages: Proficiency in programming languages such as Python and R is essential. These languages are widely used for data analysis, machine learning, and data visualization.
Data Manipulation and Cleaning: Learning how to preprocess and clean data using libraries like pandas (Python) or dplyr (R) is fundamental.
Machine Learning: Familiarity with machine learning algorithms and frameworks (e.g., scikit-learn, TensorFlow, PyTorch) is crucial for developing predictive models.
Data Visualization: Tools like Matplotlib, Seaborn, and Tableau help in visualizing data and presenting findings clearly.
Database Management: Understanding SQL and NoSQL databases is important for efficiently storing and retrieving large datasets.


Educational Pathways

Several educational resources can help bridge the gap between physics and data science:

Online Courses and Certifications: Platforms like Coursera, edX, and Udacity offer specialized courses and certifications in data science, machine learning, and artificial intelligence.
Bootcamps: Intensive data science bootcamps provide hands-on experience and often include career support and networking opportunities.
Graduate Programs: Enrolling in a master’s program in data science or a related field can provide a structured learning environment and credential.

Gaining Practical Experience

Hands-on experience is critical for a successful transition:

Projects: Undertake personal or open-source projects that involve data analysis, machine learning, and data visualization to build a portfolio.
Internships: Seek internships or part-time roles in data science to gain industry experience and apply theoretical knowledge to real-world problems.
Competitions: Participate in data science competitions on platforms like Kaggle to solve challenging problems and improve your skills.

Networking and Community Engagement

Building a professional network and engaging with the data science community can provide valuable insights and opportunities:

Meetups and Conferences: Attend data science meetups, workshops, and conferences to learn from experts and network with professionals in the field.
Online Communities: Join online forums and communities such as Reddit’s r/datascience, Stack Overflow, and LinkedIn groups to seek advice, share knowledge, and stay updated with industry trends.
Mentorship: Find a mentor in the data science field who can provide guidance, feedback, and support throughout your transition.

Tailoring Your Resume and Job Search

Effectively marketing your skills and experience is crucial when applying for data science roles:

Highlight Transferable Skills: Emphasize your analytical skills, problem-solving abilities, and experience with data in your resume and cover letter.
Showcase Projects and Experience: Include relevant projects, internships, and any practical experience that demonstrates your proficiency in data science tools and techniques.
Tailor Applications: Customize your resume and cover letter for each job application to align with the specific requirements and keywords of the job posting.

Conclusion

Transitioning from physics to data science is a feasible and rewarding career move that leverages your existing analytical skills and quantitative background. By developing new competencies in programming, machine learning, and data analysis, gaining practical experience, and actively engaging with the data science community, you can successfully navigate this transition and thrive in the burgeoning field of data science. The journey requires dedication, continuous learning, and a proactive approach to building your skillset and professional network, but the potential for growth and impact in this dynamic field is substantial.


A Comprehensive Guide on How to Transition from Physics to Data Science

Introduction

Physics and data science may seem different at first glance, but they share a common foundation in analytical thinking, problem-solving, and quantitative analysis. Physicists are trained to decipher complex systems, model phenomena, and handle large datasets, all of which are highly valuable skills in data science. As demand for data scientists continues to grow across industries, many physicists find themselves well positioned to move into this exciting field.

This guide outlines the steps and considerations for physicists aiming to transition into data science.

Understanding the Overlap

Physics and data science intersect in several key areas:

Mathematical Modeling: Both fields require strong mathematical skills and the ability to build models that represent real-world phenomena.

Statistical Analysis: Understanding statistical methods is crucial for analyzing experimental data in physics and for extracting insights from datasets in data science.

Computational Skills: Proficiency in programming and computational tools is essential in both fields for solving complex problems.

Key Skills to Develop

While physicists already have a strong analytical background, transitioning to data science requires acquiring specific skills and knowledge:

1. Programming languages

Proficiency in programming languages such as Python and R is essential.

These languages are widely used for data analysis, machine learning, and data visualization.

2. Data manipulation and cleaning

Learning how to preprocess and clean data using libraries such as pandas (Python) or dplyr (R) is fundamental.

3. Machine learning

Familiarity with machine learning algorithms and frameworks, for example scikit-learn, TensorFlow, and PyTorch, is crucial for developing predictive models.

4. Data visualization

Tools such as Matplotlib, Seaborn, and Tableau help in visualizing data and presenting findings clearly.

5. Database management

Understanding SQL and NoSQL databases is important for storing and retrieving large datasets efficiently.

Educational Pathways

Several educational resources can help bridge the gap between physics and data science:

• Online courses and certifications

Platforms such as Coursera, edX, and Udacity offer specialized courses and certifications in data science, machine learning, and artificial intelligence.

• Bootcamps

Intensive data science bootcamps provide hands-on experience and often include career support and networking opportunities.

• Graduate programs

Enrolling in a master’s program in data science or a related field can provide a structured learning environment and a credential.

Gaining Practical Experience

Hands-on experience is critical for a successful transition:

Projects: Undertake personal or open-source projects that involve data analysis, machine learning, and data visualization to build a portfolio.

Internships: Seek internships or part-time roles in data science to gain industry experience and apply theoretical knowledge to real-world problems.

Competitions: Participate in data science competitions on platforms such as Kaggle to solve challenging problems and improve your skills.

Networking and Community Engagement

Building a professional network and engaging with the data science community can provide valuable insights and opportunities:

Meetups and Conferences: Attend data science meetups, workshops, and conferences to learn from experts and connect with professionals in the field.

Online Communities: Join online forums and communities such as r/datascience, Stack Overflow, and LinkedIn groups to seek advice, share knowledge, and stay up to date with industry trends.

• Mentorship: Find a mentor in the data science field who can provide guidance, feedback, and support throughout your transition.

Tailoring Your Resume and Job Search

Marketing your skills and experience effectively is crucial when applying for data science roles:

Highlight Transferable Skills: Emphasize your analytical skills, problem-solving abilities, and experience with data in your resume and cover letter.

Showcase Projects and Experience: Include relevant projects, internships, and any practical experience that demonstrates your proficiency in data science tools and techniques.

Tailor Applications: Customize your resume and cover letter for each application so that they match the specific requirements and keywords of the job posting.

Conclusion

Transitioning from physics to data science is a feasible and rewarding career move that builds on your existing analytical skills and quantitative background. By developing new competencies in programming, machine learning, and data analysis, gaining practical experience, and engaging actively with the data science community, you can navigate this transition successfully and thrive in the growing field of data science. The journey requires dedication, continuous learning, and a proactive approach to building your skill set and professional network, but the potential for growth and impact in this dynamic field is substantial.


Exploratory Data Analysis / Hotel Booking project

Posted on August 14, 2023 by s4l8384gmailcom


We start with the following steps:

* Dataset and context

The dataset in our project contains booking information for the hotels located in the city

Each booking records the time of reservation, the length of stay, the number of guests broken down by group (adults, children, babies), and the number of parking spaces available

* Importing packages and reading the data

At this point we import the packages and libraries needed for data analysis and visualization

We can now read the dataset

which displays the data as follows
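The original code and output for this step were not preserved, so here is a minimal sketch of what it might look like, assuming pandas is the analysis library. The inline sample stands in for the real file, and the column names (`hotel`, `adults`, `children`, and so on) are assumptions modeled on the description above, not confirmed by the post.

```python
import io
import pandas as pd

# Inline stand-in for the bookings file; columns and values are hypothetical.
csv_text = """hotel,arrival_date_year,arrival_date_month,adults,children,babies,required_car_parking_spaces
City Hotel,2016,July,2,0,0,0
Resort Hotel,2016,August,2,1,0,1
City Hotel,2015,July,1,,0,0
"""

# With a real export this would be pd.read_csv("<path to the CSV file>").
df = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to get a feel for the layout.
print(df.head())
```

An empty field in the CSV, such as the missing `children` value in the last row, is read as NaN, which is what the missing-value step later looks for.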

* The data preparation stage includes the following steps:

1. Handling missing values:

Four columns contain empty values. To deal with them we must first understand the context of the data, which is done as shown in the following figure:
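As a rough illustration of this step, the sketch below counts missing values per column and fills them. The four column names and the fill values are assumptions chosen for the example; the post does not say which columns were affected or how they were treated.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in four columns.
df = pd.DataFrame({
    "children": [0.0, np.nan, 1.0],
    "country": ["PRT", None, "GBR"],
    "agent": [9.0, np.nan, np.nan],
    "company": [np.nan, np.nan, 40.0],
    "adults": [2, 2, 1],
})

# Count missing values per column and show only the columns that have any.
missing = df.isnull().sum()
print(missing[missing > 0])

# One possible treatment (an assumption based on what each column means):
# no recorded children -> 0, unknown country -> "Unknown",
# no agent/company id -> 0, i.e. the booking was made without one.
df = df.fillna({"children": 0, "country": "Unknown", "agent": 0, "company": 0})
```

The right fill value depends on what an empty cell means in context, which is why understanding the data comes before filling it.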

2. Converting column values:

We have to replace the arbitrary values that further analysis reveals

3. Changing data types:

Now we need to convert some columns that are still stored as strings
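A type conversion of this kind could be sketched as follows. The two columns used here are hypothetical examples of values stored as strings; which columns the author actually converted is not stated.

```python
import pandas as pd

# Hypothetical columns that arrived as strings.
df = pd.DataFrame({
    "children": ["0", "1", "2"],
    "reservation_status_date": ["2016-07-01", "2016-07-03", "2016-08-15"],
})

# Numeric text becomes integers, date text becomes datetime64 values.
df["children"] = df["children"].astype(int)
df["reservation_status_date"] = pd.to_datetime(df["reservation_status_date"])

print(df.dtypes)
```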

4. Handling duplicates:

We have to remove the duplicate rows. To find out how many there are, we run the following code
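The original snippet is missing, so the following is only a sketch of how the count and removal could be done in pandas, using a small made-up table.

```python
import pandas as pd

# Hypothetical sample: the second row repeats the first exactly.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "adults": [2, 2, 2, 1],
    "lead_time": [30, 30, 12, 30],
})

# duplicated() flags every row that repeats an earlier row in all columns.
n_duplicates = df.duplicated().sum()
print(f"Duplicate rows: {n_duplicates}")

# Keep the first occurrence of each row and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)
```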

5. Creating new columns by combining other columns:

6. Dropping unnecessary columns

We drop these columns because they were only used to create the new ones
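Steps 5 and 6 might look like the sketch below. The derived columns `total_guests` and `total_nights` are hypothetical names introduced for illustration; the post does not list the columns it created or dropped.

```python
import pandas as pd

# Hypothetical sample of the source columns.
df = pd.DataFrame({
    "adults": [2, 2, 1],
    "children": [0, 1, 0],
    "babies": [0, 0, 1],
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 2, 0],
})

# Step 5: combine existing columns into new ones.
df["total_guests"] = df["adults"] + df["children"] + df["babies"]
df["total_nights"] = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]

# Step 6: drop columns that were only needed to build the new ones.
# The guest columns are kept here because later questions compare those groups.
df = df.drop(columns=["stays_in_weekend_nights", "stays_in_week_nights"])
```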

* Descriptive analysis and correlations:

We can call this function to return a statistical description of the data in the DataFrame

We will use this description to perform the statistical analysis

Correlation heatmap

We now build a heatmap that shows the strength of the relationships between the numerical variables

We’ll come back to using this heatmap for EDA later
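The descriptive step and the correlation matrix behind the heatmap can be sketched as follows, assuming the function referred to is pandas' `describe()`. The sample values and column names are made up.

```python
import pandas as pd

# Hypothetical sample with three numeric columns and one text column.
df = pd.DataFrame({
    "lead_time": [10, 45, 3, 120, 60],
    "total_nights": [2, 4, 1, 7, 5],
    "total_guests": [2, 3, 1, 4, 2],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()

# Pairwise Pearson correlations between the numeric columns only.
corr = df.select_dtypes("number").corr()

print(summary)
print(corr.round(2))
```

To draw the heatmap itself, this matrix can be passed to a plotting function such as `seaborn.heatmap(corr, annot=True)`.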

* Exploratory data analysis:

To keep the EDA on track, it is best to follow these steps:

After preparing the data, we export it to a CSV file and then import that file into Tableau for visualization later

Looking at the heatmap above raises several questions about the relationships between features

We use the heatmap and the visualizations to formulate the following questions:

From the dataset, we selected three main elements: booking, hotel, and customer

Booking:

1. What is the big picture for booking rooms throughout the year and month?

2. What are the best booking channels?

3. Do guests include meals in their booking?

Hotel:

4. Which hotels are the most popular and how many bookings do they have during the year?

5. Compare the hotels by customer group.

6. Compare the hotels by customer type.

Customer:

7. What are the types of customer requests when staying in different room types?

8. Which guests return most often, and which stay the longest?

9. What is the impact of the presence of children on the parents’ decision to order meals and the length of stay?

10. For children and babies, what is their preferred type of room?


* Visualization and conclusion stage:

This is the visualization stage, carried out in Tableau

1. What is the big picture for booking rooms throughout the year and month?

Our next visualization covers a three-year period

A large number of bookings end in a check-out, but a large percentage of bookings are cancelled

The number of rooms that were booked but whose guests did not show up was very large

Room reservations by month:

Bookings peaked in 2016, especially between April and July

2. What are the best booking channels?

The direct channel is the dominant hotel booking channel

Looking at the booking channels over time, some of them, such as the GDS channel, did not prove effective for hotel reservations

3. Do guests include meals in their booking?

The number of meals is expected to grow with the number of nights booked. July and August show a large number of meals and booked rooms, after which the numbers drop quickly

4. Which hotels are the most popular and how many bookings do they have during the year?

We analyze reservations for two hotels, City Hotel and Resort Hotel

Bookings for both hotels start around 2015

City Hotel had approximately 19,000 reservations in 2016

Resort Hotel, on the other hand, had 12,200 reservations in the same year

5. Compare the hotels by customer group.

The proportion of reservations among adults is ten times higher than among children and thirty times higher than among babies

The same ratio holds at the Resort Hotel

6. Compare the hotels by customer type.

The main customer type is Transient, followed by Transient-Party, and then Contract

The Resort Hotel has the higher number of Contract customers, with a total of 8,182

City Hotel recorded only 2,390

The Group customer type is omitted

7. What are the types of customer requests when staying in different room types?

The share of requests for parking spaces is directly proportional to the share of special requests submitted by customers: as one rises, so does the other

Rooms A and D have the highest number of guests

Because these two room types are the most common, they also account for a high number of requests

8. Which guests return most often, and which stay the longest?

The following chart shows the number of repeat guests and the total length of stay, aggregated by market segment

The corporate segment has the highest number of repeat guests at 1,445, but their total number of nights is very low. Meanwhile, 579 online guests booked the hotel again, with a total stay of 103,554 nights.

9. What is the impact of the presence of children on the parents’ decision to order meals and the length of stay?

The presence of children has a direct impact on the parents’ decisions about ordering meals and the length of stay: families with children tend to order additional meals but stay for fewer nights, as we can see in the figure

10. For children and babies, what is their preferred type of room?

Given that

G, F, A are the common rooms for children

G, D, A are the common rooms for babies

we conclude that rooms G and A are the most suitable for visitors with children and babies

Rooms H, E, and B are excluded from the preferred rooms for these guests

This completes our project, in which we covered the most important points to take into account when undertaking any project of this kind


Hotel Booking Project / Exploratory Data Analysis

We start with the following steps:

* Dataset and context

The dataset in our project contains booking information for the hotels located in the city

Each booking records the time of reservation, the length of stay, the number of guests broken down by group (adults, children, babies), and the number of parking spaces available

* Importing packages and reading the data

At this point we import the packages and libraries needed for data analysis and visualization

We can now read the dataset

which displays the data as follows

* The data preparation stage includes the following steps:

1. Handling missing values:

Four columns contain empty values. To deal with them we must first understand the context of the data, which is done as shown in the following figure

2. Converting column values:

We have to replace the arbitrary values that further analysis reveals

3. Changing data types:

Now we need to convert some columns that are still stored as strings

4. Handling duplicates:

We have to remove the duplicate rows. To find out how many there are, we run the following code

5. Creating new columns by combining other columns:

6. Dropping unnecessary columns

We drop these columns because they were only used to create the new ones

* Descriptive analysis and correlations:

We can call this function to return a statistical description of the data in the DataFrame

We will use this description to perform the statistical analysis

Correlation heatmap

We now build a heatmap that shows the strength of the relationships between the numerical variables

We will come back to using this heatmap for EDA later

* Exploratory data analysis:

To stay on the right track, it is best to follow these steps

After preparing the data, we export it to a CSV file and then import that file into Tableau for visualization later

Looking at the heatmap above raises several questions about the relationships between features

We use the heatmap and the visualizations to formulate the following questions:

From the dataset, we selected three main elements: booking, hotel, and customer

Booking

1. What is the big picture for booking rooms throughout the year and month?

2. What are the best booking channels?

3. Do guests include meals in their booking?

Hotel

4. Which hotels are the most popular and how many bookings do they have during the year?

5. Compare the hotels by customer group.

6. Compare the hotels by customer type.

Customer

7. What are the types of customer requests when staying in different room types?

8. Which guests return most often, and which stay the longest?

9. What is the impact of the presence of children on the parents’ decision to order meals and the length of stay?

10. For children and babies, what is their preferred type of room?

* Visualization and conclusion stage:

This is the visualization stage, carried out in Tableau

1. What is the big picture for booking rooms throughout the year and month?

Our next visualization covers a three-year period

A large number of bookings end in a check-out, but a large percentage of bookings are cancelled

The number of rooms that were booked but whose guests did not show up was very large

Room reservations by month:

Bookings peaked in 2016, especially between April and July

2. What are the best booking channels?

The direct channel is the dominant hotel booking channel

Looking at the booking channels over time, some of them, such as the GDS channel, did not prove effective for hotel reservations

3. Do guests include meals in their booking?

The number of meals is expected to grow with the number of nights booked. July and August show a large number of meals and booked rooms, after which the numbers drop quickly

4. Which hotels are the most popular and how many bookings do they have during the year?

We analyze reservations for two hotels, City Hotel and Resort Hotel

Bookings for both hotels start around 2015

City Hotel had approximately 19,000 reservations in 2016

Resort Hotel, on the other hand, had 12,200 reservations in the same year

5. Compare the hotels by customer group.

The proportion of reservations among adults is ten times higher than among children and thirty times higher than among babies

The same ratio holds at the Resort Hotel

6. Compare the hotels by customer type.

The main customer type is Transient, followed by Transient-Party, and then Contract

The Resort Hotel has the higher number of Contract customers, with a total of 8,182

City Hotel recorded only 2,390

7. What are the types of customer requests when staying in different room types?

The share of requests for parking spaces is directly proportional to the share of special requests submitted by customers: as one rises, so does the other

Rooms D and A have the highest number of guests

Because these two room types are the most common, they also account for a high number of requests

8. Which guests return most often, and which stay the longest?

The following chart shows the number of repeat guests and the total length of stay, aggregated by market segment

The corporate segment has the highest number of repeat guests at 1,445, but their total number of nights is very low. Meanwhile, 579 online guests booked the hotel again, with a total stay of 103,554 nights.

9. What is the impact of the presence of children on the parents’ decision to order meals and the length of stay?

The presence of children has a direct impact on the parents’ decisions about ordering meals and the length of stay: families with children tend to order additional meals but stay for fewer nights, as we can see in the figure

10. For children and babies, what is their preferred type of room?

Given that

G, F, A are the common rooms for children

G, D, A are the common rooms for babies

we conclude that rooms G and A are the most suitable for visitors with children and babies

Rooms H, E, and B are excluded from the preferred rooms for these guests

This completes our project, in which we covered the most important points to take into account when undertaking any project of this kind


What is the concept of data cleaning?

Posted on July 17, 2023 by s4l8384gmailcom


Data cleaning

Data sets often contain errors or inconsistencies, especially when collected from multiple sources. In these cases it is necessary to organize the data, correct errors, remove redundant entries, apply consistent formatting, and exclude outliers. Together, these procedures are called data cleaning.

The purpose of data cleaning

This process aims to detect any defect in the data and deal with it from the beginning, which avoids wasting time on arriving at incorrect results

In other words, detecting and fixing errors early leads to correct results

The same applies to data analysis: working with clean, well-formatted data lets analysts save time and get the best results.

Here is an example showing the stages of data cleaning:

In this example we used Jupyter Notebook to run Python code inside Visual Studio Code

The code is in the GitHub repository at the link

https://github.com/mahesh989/Basic-Data-Cleaning

The first stage: reading the data

In our example this is done with pandas, which reads the data from the source at the link:

https://github.com/justmarkham/DAT8/blob/master/data/chipotle.tsv

We start by importing the libraries to be used
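The original code image is missing, so here is a minimal sketch of the reading step. To stay runnable offline it parses a few inline rows laid out like chipotle.tsv instead of downloading the file; the sample rows are illustrative.

```python
import io
import pandas as pd

# A few tab-separated rows in the same layout as chipotle.tsv.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice]]\t$16.98\n"
)

# sep="\t" tells pandas the file is tab-separated rather than comma-separated.
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(df.head(10))
```

With the real data, the first argument would be a local path or the raw file URL instead of the `StringIO` object.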

The second stage:

a. Observing data

This stage aims to understand the structure of the data, in terms of type and distribution, in order to detect errors and imbalances in it

We print the first or the last 10 entries of the dataset, depending on the purpose, to see what kind of data we are dealing with; df.head(10) prints the first 10 entries and df.tail(10) the last 10

We notice some NaN entries in the choice_description column

and a dollar sign in the item_price column

b. Data types of columns

Next we determine the type of data in each column

The following code prints the column names and their data types in an organized layout
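One way to print the column names next to their types is sketched below. The exact code the author used is not preserved, and the sample rows are illustrative.

```python
import pandas as pd

# Small sample in the layout of the Chipotle data.
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Chicken Bowl"],
    "choice_description": [None, "[Clementine]", "[Rice]"],
    "item_price": ["$2.39", "$3.39", "$16.98"],
})

# Print each column name left-aligned next to its data type.
for column, dtype in df.dtypes.items():
    print(f"{column:<20} {dtype}")
```

Note that `item_price` shows up as a text type rather than a number because of the dollar sign, which is the cue for the conversion in the next stage.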

The output is:


The third stage: data cleaning

a. Change the data type

Whether any data types need converting becomes apparent while observing the data

In our example item_price includes a dollar sign; we remove it and convert the column to float64 because it holds decimal numbers
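A sketch of this conversion, using a few made-up prices with the same `$` prefix:

```python
import pandas as pd

df = pd.DataFrame({"item_price": ["$2.39 ", "$3.39 ", "$16.98 "]})

# Remove the literal dollar sign, trim stray whitespace, then convert to float.
df["item_price"] = (
    df["item_price"]
    .str.replace("$", "", regex=False)
    .str.strip()
    .astype("float64")
)

print(df["item_price"].dtype)
```

Passing `regex=False` matters here because `$` is a special character in regular expressions.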

b. Missing or empty values

Next we search for missing values in the data set:

The output is:

In the output above, True marks a null value and False a non-null value
Because we cannot inspect every True value in the table by eye, we use sum to count the null entries

This shows which columns contain null values and how many. We can see that “choice_description” is the column with empty entries, 1246 of them

We can also check each column for null values and count them, as in the following image

We then find the missing values for each column

In our example, only one column contains null values

It is also worth calculating the percentage of missing values in each column because, especially with large data sets, several columns may contain empty values.
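The count and the percentage of missing values per column can be computed as in this sketch, which uses a four-row illustrative sample rather than the full data set.

```python
import pandas as pd

df = pd.DataFrame({
    "item_name": ["Chips", "Izze", "Chicken Bowl", "Canned Soda"],
    "choice_description": [None, "[Clementine]", "[Rice]", "[Sprite]"],
})

# isnull() gives True/False per cell; summing counts the True values per column.
null_counts = df.isnull().sum()

# The mean of True/False values is the fraction missing; scale it to percent.
null_percent = (df.isnull().mean() * 100).round(2)

print(pd.DataFrame({"nulls": null_counts, "percent": null_percent}))
```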

The output is:

The choice_description column is missing 27% of its values. This does not call for deleting the entire column, because it is below 70%, the share of missing values above which it is preferable to drop a column

How to handle the missing values depends on the type of data and the defect to be addressed

To understand the problem in the “choice_description” column, we check its unique entries, which suggests possible solutions

Now we check how many entries choice_description contains

Since the missing values stand for the customer’s choice, we can assume that these customers gave no details for their orders

and replace the null values with “Regular Order”
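The replacement described above can be sketched like this, again on a small illustrative sample:

```python
import pandas as pd

df = pd.DataFrame({
    "item_name": ["Chips", "Izze", "Chicken Bowl"],
    "choice_description": [None, "[Clementine]", None],
})

# Treat a missing choice as an order placed without any extra details.
df["choice_description"] = df["choice_description"].fillna("Regular Order")

# Confirm that no null values are left in the column.
print(df["choice_description"].isnull().sum())
```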

The output is:

Now let’s make sure that no null values remain

By replacing the null values with a description, we got rid of all the missing values and began to improve our data

c. Remove duplicates

Now we check the number of duplicate entries and then remove them. A row counts as a duplicate only if it is exactly the same as another row; rows that differ in at least one entry are not deleted

We can check by running the code

The output is:

We will now delete the duplicate entries

As a precaution we check again that no duplicate entries remain
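The check, the deletion, and the re-check could be sketched as follows. The sample is illustrative; note that the last row is kept because its `order_id` differs.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 1, 1, 2],
    "item_name": ["Izze", "Izze", "Chips", "Izze"],
    "item_price": [3.39, 3.39, 2.39, 3.39],
})

# Count rows that are identical to an earlier row in every column.
duplicates_before = df.duplicated().sum()
print(duplicates_before)

# Delete them, keeping the first occurrence of each row.
df = df.drop_duplicates()

# Precautionary re-check after the deletion.
duplicates_after = df.duplicated().sum()
print(duplicates_after)
```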

d. Delete extra spaces

That is, removing useless extra spaces between letters and words

This task can be carried out with:

String processing functions
regular expressions
Data cleaning tools
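As a small illustration of the first two options, the sketch below combines a string function with a regular expression in pandas. The sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"item_name": ["  Chicken   Bowl ", "Izze  ", " Canned  Soda"]})

df["item_name"] = (
    df["item_name"]
    .str.strip()                           # string function: trim both ends
    .str.replace(r"\s+", " ", regex=True)  # regex: collapse inner runs of spaces
)

print(df["item_name"].tolist())
```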

Fourth stage: data export

This step exports the clean data; keep in mind that our example works on a small, simplified scale

This code writes the cleaned data to a new CSV file named cleaned_data.csv

in the same path as our Python script; the file name and path can be changed as required

The argument index=False tells pandas not to include the row index numbers in the exported data.
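The export described above can be sketched as follows. The sample data is illustrative, and the temporary folder is an assumption made only to keep the example free of side effects.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"item_name": ["Izze", "Chips"], "item_price": [3.39, 2.39]})

# A temporary folder is used here; in the post's setup the file is written
# next to the Python script.
out_path = os.path.join(tempfile.mkdtemp(), "cleaned_data.csv")

# index=False leaves the row index numbers out of the file.
df.to_csv(out_path, index=False)

# The first line of the file is the header, with no index column in front.
with open(out_path) as f:
    print(f.readline().strip())
```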

Fifth stage: data visualization using Tableau

We have reached the end of the data cleaning journey: the clean data is exported for visualization and is now ready for easy analysis


ما هو مفهوم تنظيف البيانات؟

Advertisements

Data cleaning

Datasets often contain errors or inconsistencies, especially when they are collected from multiple sources. In such cases the data has to be put in order: errors corrected, duplicate entries removed, formats made consistent, and outliers excluded. Together these procedures are called data cleaning

The goal of data cleaning

The process aims to detect any flaw in the data and deal with it from the start, which avoids wasting time on reaching incorrect results

In other words, detecting and fixing errors early leads to reliable results

This applies fully to data analysis: working with clean, consistently formatted data lets analysts save time and get the best results

The following example illustrates the stages of data cleaning

In this example we used Jupyter Notebook inside Visual Studio Code to run the Python code

The code is available in a GitHub repository at the link

https://github.com/mahesh989/Basic-Data-Cleaning

First stage: reading the data

In our example this is done with pandas: we read the data imported from the source at the link

https://github.com/justmarkham/DAT8/blob/master/data/chipotle.tsv

after importing the libraries we want to use
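
A minimal sketch of this reading step, assuming pandas. To keep it self-contained it parses a two-row, tab-separated stand-in with illustrative values; to read the real file, pass the path of a local copy of chipotle.tsv to read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Stand-in for the first lines of a tab-separated file; values are illustrative.
sample = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
)

# sep="\t" tells pandas the columns are separated by tabs.
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Show up to the first ten rows.
print(df.head(10))
```

pandas treats the text NULL as a missing value by default, so it appears as NaN in the output.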

Second stage:

A. Inspecting the data

This stage aims to understand the structure of the data, its types and its distribution, in order to spot errors and flaws

This step prints the first ten and the last ten entries of the dataset, which helps identify the kind of data we are working with; you can look at the first or the last entries depending on the purpose

The output of df.head(10):

We notice some NaN entries in the choice_description column and a dollar sign in the item_price column

B. Column data types

Next we need to determine the data type of each column

The following code lists the column names and their data types in an organized, readable way

The output is:

Third stage: data cleaning

A. Changing the data type

Whether data types need converting is decided while inspecting the data

In our example item_price includes a dollar sign. We can remove it and convert the column to float64, because the column holds decimal numbers
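
One possible way to carry out this conversion with pandas is sketched below. The sample prices are hypothetical, and the exact calls used in the original notebook are not shown in the text.

```python
import pandas as pd

# Hypothetical prices stored as text with a dollar sign and a trailing space.
df = pd.DataFrame({"item_price": ["$2.39 ", "$3.39 ", "$16.98 "]})

df["item_price"] = (
    df["item_price"]
    .str.replace("$", "", regex=False)  # drop the literal dollar sign
    .str.strip()                        # drop surrounding spaces
    .astype("float64")                  # convert text to decimal numbers
)
print(df.dtypes)
```

regex=False matters here because the dollar sign has a special meaning in regular expressions.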

B. Missing or null values

Next comes the search for missing values in the dataset

The output is:

In the output above, True marks a null value and False marks a non-null value. Because we cannot inspect every True value in the table by eye, we count the null entries per column using a sum

This shows which columns contain null values and how many. We can also see that choice_description is the column with null entries, 1,246 of them

We can also determine, for each column, whether null values exist and how many, as in the following image

Then we find the missing values for each column

In our example only one column contains null values

It is worth computing the percentage of missing values in each column, because in large datasets in particular several columns are likely to contain null values

The output is:

Here we find that the choice_description column has about 27% missing values. This does not warrant dropping the whole column, because it is below 70%, the share of missing values at which dropping a column is usually preferable. Other ways of handling missing values during data cleaning depend on the type of data and on the flaw being addressed
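
The counting and percentage steps can be sketched as follows, assuming pandas. The four-row table is hypothetical, so its percentage differs from the 27% reported for the real data; DROP_THRESHOLD is an illustrative name for the 70% rule of thumb quoted above.

```python
import pandas as pd

# Hypothetical table: one of four descriptions is missing.
df = pd.DataFrame({
    "item_name": ["Chips", "Bowl", "Soda", "Izze"],
    "choice_description": [None, "[Rice]", "[Sprite]", "[Clementine]"],
})

# Number of null entries per column.
missing_count = df.isnull().sum()

# Share of null entries per column, as a percentage.
missing_percent = df.isnull().mean() * 100

# Columns above the threshold would be candidates for dropping.
DROP_THRESHOLD = 70
columns_to_drop = missing_percent[missing_percent > DROP_THRESHOLD].index.tolist()

print(missing_count, missing_percent, columns_to_drop, sep="\n")
```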

For further illustration, take the choice_description column

To understand the problem, we look at the unique entries in this column, which points us toward possible solutions

Now we check how many distinct values the choice_description column contains

Since the missing values concern the customer’s choice, we can assume that these customers gave no details about their orders, so the missing values can be filled with a default label such as “Regular”

Here we replace the null values with “Regular Order”

The output is:

Now let us make sure that no null values remain

By replacing the null values with a description, we got rid of all the missing values and began to improve our data

B. Removing duplicates

Now we will count the duplicate entries and then remove them. A row counts as a duplicate only if it matches another row exactly in every column; if even one value differs between two rows, neither is deleted

We can check by running the code

The output is:

We will now delete the duplicate entries

As a precaution, we check once more that no duplicate entries remain

C. Deleting extra spaces

That is, removing the unnecessary extra spaces around and between words

This task can be carried out with:

- String processing functions
- Regular expressions
- Dedicated data cleaning tools

Fourth stage: data export

This step exports the clean data. Keep in mind that our example works on a small, simplified scale

This code writes the cleaned data to a new CSV file named cleaned_data.csv in the same folder as our Python script; the file name and path can be changed as required

The argument index=False tells pandas not to include the row index numbers in the exported data

Fifth stage: data visualization using Tableau

We have reached the end of the data cleaning journey: the clean data can now be exported to a visualization tool and is ready for easy analysis

The 10 most popular machine learning algorithms for 2023

Posted on July 2, 2023 (updated July 3, 2023) by s4l8384gmailcom

1. Linear regression

This term refers to a statistical analysis that tests the relationship between two continuous variables, one independent and one dependent

It is used to find the best-fitting line through a set of data points, which can then be used to make predictions

The simple linear regression equation is as follows:

y = b0 + b1*x

y is the dependent variable

x represents the independent variable

b0 represents the y-intercept (the point of intersection of the y-axis with the line)

b1 represents the slope of the line

With the method of least squares we obtain the best-fitting line, that is, the line that minimizes the sum of the squared differences between the actual and predicted values of y
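
The least-squares fit for the simple equation above has a closed form, sketched here in plain Python. The function name fit_line and the sample points are illustrative.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative points lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

For these points the fit recovers an intercept of 1 and a slope of 2; with noisy data the line would only approximate the points.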

Linear regression can also be extended to several independent variables. It is then called multiple linear regression, and its equation is as follows:

y = b0 + b1*x1 + b2*x2 + … + bn*xn

x1, x2, …, xn represent the independent variables

b1, b2, …, bn represent the corresponding coefficients

As mentioned above, linear regression is useful for forecasting, for example predicting stock prices or the future sales of a specific product, by making predictions about the dependent variable

However, the regression model can be inaccurate when the data contains outliers, that is, extreme values that do not follow the general trend of the data

Outliers in linear regression can be handled in the following ways:

– Removing outliers from the data set before training the model

– Reducing the effect of outliers by applying a transform, such as taking the logarithm of the data

– Using robust regression methods such as RANSAC or Theil-Sen, which mitigate the negative impact of outliers more effectively than ordinary linear regression

However, it cannot be denied that linear regression is an effective and commonly used statistical method

2. Logistic regression

It is a statistical method for predicting a binary outcome, that is, an outcome with two possible values, from one or more independent variables. It is used for classification tasks such as predicting customer behavior.

Logistic regression is based on the sigmoid function, which maps the input variables to a probability between 0 and 1. The prediction is then derived from that probability

Logistic regression is represented by the following equation:

P(y=1|x) = 1/(1 + e^-(b0 + b1*x1 + b2*x2 + … + bn*xn))

P(y=1|x) represents the probability that the outcome y is 1 given the input variables x

b0 represents the intercept

b1, b2, …, bn represent the coefficients of the input variables x1, x2, …, xn

The coefficients are determined by training the model on a data set with an optimization algorithm. The trained model then makes predictions for new data by calculating the probability that the result is 1

The following diagram shows the logistic regression model

In the diagram, the input variables x1 and x2 are used to predict the binary result y.

The model maps the input variables to a probability, and that probability determines the predicted outcome

The coefficients b1 and b2 are determined by training the model on a data set. The decision threshold is set to 0.5: a probability of 0.5 or more is predicted as 1, anything lower as 0.
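
The prediction side of logistic regression can be sketched in a few lines of plain Python. The coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow; in practice they come from training.

```python
import math


def predict_proba(x, b0, coefs):
    """P(y=1|x) from the sigmoid of the linear combination b0 + sum(bi*xi)."""
    z = b0 + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, b0, coefs, threshold=0.5):
    """Turn the probability into a 0/1 label using the decision threshold."""
    return 1 if predict_proba(x, b0, coefs) >= threshold else 0


# Hypothetical coefficients for two input variables x1 and x2.
b0, coefs = -1.0, [2.0, -0.5]

p = predict_proba([1.0, 2.0], b0, coefs)  # z = -1 + 2 - 1 = 0
label_a = predict([1.0, 2.0], b0, coefs)
label_b = predict([0.0, 0.0], b0, coefs)  # z = -1, probability below 0.5
print(p, label_a, label_b)
```

Raising the threshold above 0.5 makes the model more reluctant to predict 1, which can be useful when false positives are costly.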

3. Support Vector Machines (SVMs)

SVM is a powerful algorithm for both classification and regression. It separates data points into different categories by finding the optimal hyperplane with the maximum margin. SVMs have been successfully applied in various fields, including image recognition, text classification, and bioinformatics.

SVMs are particularly useful when the data cannot be separated by a straight line: using kernel functions, the technique can map the data into a high-dimensional space where nonlinear boundaries become easier to find
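
The idea of moving data into a higher-dimensional space can be illustrated without training an SVM at all. In the sketch below, one-dimensional points that no single cut-off on x can separate become separable by a straight line after an explicit, hand-picked mapping x -> (x, x*x). The points, labels and the cut at 4.0 are illustrative; a real SVM learns the boundary and uses kernels instead of an explicit map.

```python
def feature_map(x):
    """Explicit map from a 1-D input to the 2-D feature space (x, x*x)."""
    return (x, x * x)


# Class 1 sits on both outer sides, class 0 in the middle:
# no single threshold on x separates them.
points = [-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0]
labels = [1, 1, 0, 0, 0, 1, 1]

mapped = [feature_map(x) for x in points]

# In the mapped space the horizontal line "second coordinate = 4" separates the classes.
predicted = [1 if x2 > 4.0 else 0 for _, x2 in mapped]
print(predicted)
```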

SVMs are memory-efficient: they store only the support vectors rather than the entire data set. They are also highly effective in high-dimensional spaces, even when the number of features is greater than the number of samples

The technique is fairly robust to outliers that lie far from the decision boundary, because the model depends only on the support vectors

However, SVMs are sensitive to the choice of kernel function, and they are not well suited to very large data sets, because training time is often very long.

4. Decision Trees:

Decision trees are versatile algorithms that build a tree-like model of decisions and their possible outcomes. By asking a series of questions, decision trees classify data into categories or predict continuous values. They are common in areas such as finance, customer segmentation, and manufacturing

In other words, a decision tree is a tree-like diagram in which each internal node is a decision point and each leaf node holds a prediction

To explain how the decision tree works:

Building the tree begins with choosing the root node, the split that best separates the data into different categories. The data is then divided repeatedly into subgroups based on the values of the input features, until the resulting rules separate the classes or target values well
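
Choosing a split can be sketched for a single numeric feature as follows. Gini impurity is used as the split criterion here as an assumption, since the text does not name one; the function names and the sample data are illustrative.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    # Try every distinct value except the largest as a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: did a customer of a given age buy the product?
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)
```

A full tree repeats this search on each resulting subgroup, and over every feature, until a stopping rule is met.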

A decision tree is easy to understand: it gives the user a clear visualization that supports sound decision-making

However, the deeper the tree and the more leaves it has, the more likely it is to overfit the data. This is one of the drawbacks of decision trees.

Another drawback is that a decision tree can be sensitive to the order of the input features, which leads to different trees, and the final tree is not guaranteed to be the best possible one.

5. Random Forest:

The random forest is an ensemble learning method that combines many decision trees to improve prediction accuracy. Each tree is built on a random subset of the training data and features. Random forests are effective for classification and regression tasks, with applications in areas such as finance, healthcare, and bioinformatics.

Random forests are used when a single decision tree is prone to overfitting the data; combining many trees gives a more accurate model

The forest is formed using the bootstrapping technique, which is used to generate multiple decision trees

Bootstrapping is a statistical method based on randomly sampling data points from the original data set with replacement. The result is multiple data sets, each containing a different mix of data points, which are then used to train the individual decision trees.
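
Bootstrapping itself takes only a few lines. The sketch below draws three bootstrap samples from a hypothetical data set of ten items; the fixed seed is there only to make runs repeatable.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data at random, with replacement."""
    return [rng.choice(data) for _ in data]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
data = list(range(10))   # hypothetical data set of ten points

# One bootstrap sample per tree; a real forest would train a tree on each.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))
```

Because the sampling is done with replacement, a sample usually repeats some points and leaves others out, which is what makes the trees differ from one another.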

A random forest improves overall model performance by reducing the correlation between its trees: each tree uses a random subset of the features, a method called “random subspace”.

One drawback of a random forest is the higher computational cost of training and prediction as the number of trees grows

Another is lower interpretability compared with a single decision tree. On the other hand, a random forest is less prone to overfitting than a single tree and handles high-dimensional datasets better.

6. Naive Bayes

Naive Bayes is a probabilistic algorithm based on Bayes’ theorem with the assumption of independence between features. Despite its simplicity, Naive Bayes performs well in many real-world applications, such as spam filtering, sentiment analysis, and document classification.

Using Bayes’ theorem, the probability of a particular class is calculated from the values of the input features

Different probability distributions are used when implementing Naive Bayes, depending on the type of data

Among them:

Gaussian: for continuous data

Multinomial: for discrete data

Bernoulli: for binary data
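
As an illustration of the Gaussian variant listed above, here is a compact plain-Python sketch. The function names, the tiny two-class data set and the small constant added to the variance are all illustrative choices, not a reference implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_nb(model, x):
    """Pick the class with the highest log prior + sum of log likelihoods."""
    def log_posterior(prior, means, variances):
        log_p = math.log(prior)
        for v, m, var in zip(x, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return log_p
    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical two-feature data with two well-separated classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, [1.1, 2.0]), predict_nb(model, [3.1, 4.0]))
```

Summing the per-feature log likelihoods is exactly where the independence assumption enters: each feature contributes separately, as if it carried no information about the others.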

Among its advantages, the algorithm is simple, needs less training data than many other algorithms, and can cope with missing data.

Its main drawback is the assumption of independence between features, which real-world data often violates.

It is also negatively affected by features that do not suit the data set, which lowers its performance and efficiency

7. KNN

KNN (k-nearest neighbors) is a non-parametric algorithm that classifies new data points based on their proximity to the labeled examples in the training set. It is widely used in pattern recognition and recommendation systems

KNN can handle classification and regression tasks.

It relies on the idea that similar data points lie close to one another

After choosing k, the number of nearest neighbors to consult, the data is split into training and test sets. To predict a new input, the distance between the input and every data point in the training set is calculated, the k nearest points are selected, and the prediction is taken from them: the majority class for classification, or the average value for regression
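
The prediction procedure described above can be sketched for classification as follows, using Euclidean distance and a majority vote. The function name and the colored sample points are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Distance from the query to every training point, paired with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical labeled points forming two groups.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (6.5, 6.5)))
```

For regression, the vote would be replaced by the average of the neighbors' target values.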

8. K-means

The algorithm starts by randomly selecting k centroids

Here k is the number of clusters we want to create. Each data point is assigned to the cluster whose centroid is closest; the centroids are then recomputed as the mean of their assigned points, and the two steps repeat until the assignments stop changing

So it is a distance-based algorithm that groups similar data points together: distances are calculated to assign each point to a group
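
A bare-bones version of the procedure might look like this in plain Python. The function name, the fixed number of iterations and the sample points are illustrative simplifications; library implementations add smarter initialization and a convergence check.

```python
import math
import random


def kmeans(points, k, iterations=10, seed=0):
    """Cluster points into k groups by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```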

The algorithm is widely used in applications such as market segmentation and image compression

Its downside is that its assumptions about the data, for example that clusters are roughly spherical and of similar size, often do not match real-world data

9. Dimensionality reduction algorithms

These algorithms aim to reduce the number of features in the data set while preserving the necessary information. This technique is called “dimensionality reduction”.

Dimensionality reduction also makes data visualization easier and simpler.

Common examples are Principal Component Analysis (PCA),

Linear Discriminant Analysis (LDA),

and t-Distributed Stochastic Neighbor Embedding (t-SNE)

Each is explained separately below

* Principal Component Analysis (PCA): a linear dimensionality reduction technique. It applies an orthogonal transformation that converts a set of correlated variables into linearly uncorrelated variables called principal components. Its aim is to identify patterns in the data and reduce its dimensions while preserving the necessary information.

* Linear Discriminant Analysis (LDA): a supervised dimensionality reduction technique used to find the features that best discriminate between classes in a classification task

* t-Distributed Stochastic Neighbor Embedding (t-SNE)

A well-established nonlinear dimensionality reduction technique for visualizing high-dimensional data. It produces a low-dimensional representation that preserves as much of the data’s structure as possible.
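
To make PCA concrete, the sketch below reduces two-dimensional points to one dimension. For the two-dimensional case the direction of the first principal component can be read off the covariance values with a closed-form angle, which avoids a linear algebra library; the function name and the sample points are illustrative, and real data with more dimensions would call for an eigen-decomposition instead.

```python
import math


def pca_first_component(points):
    """Return the first principal direction of 2-D points and the 1-D projections."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix of the centered data.
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the axis of greatest variance (closed form for the 2x2 case).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Project each centered point onto that axis: two features become one.
    projected = [
        (p[0] - mx) * direction[0] + (p[1] - my) * direction[1] for p in points
    ]
    return direction, projected


# Hypothetical points lying on the diagonal y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, projected = pca_first_component(points)
print(direction, projected)
```

Since these points lie on a line, the single projected coordinate keeps all of their variation; with scattered data some information is lost, which is the trade-off of dimensionality reduction.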

The downside of dimensionality reduction is that some necessary information may be lost in the process

Choosing a technique also requires knowing the type of data and the task to be performed, and deciding how many dimensions to keep can be somewhat difficult.

10. Gradient boosting and AdaBoost

These two algorithms are used for classification and regression tasks and are widely used in machine learning

Both work by building an effective model out of several weak models

Gradient boosting:

It builds the model progressively in multiple stages. It starts by fitting a simple model to the data (a decision tree, for example) and then adds further models that correct the errors made by the previous ones. Each added model is fitted to the negative gradient of the loss function with respect to the predictions of the model so far.

The final output is the combination of the individual models
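
The staged procedure can be sketched for regression with squared loss, where the negative gradient is simply the residual (actual minus predicted). The weak model here is a one-split "stump"; the function names, the learning rate and the sample data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Weak model: one threshold, predicting the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lm if x <= t else rm)) ** 2 for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    # Final model: the base value plus the scaled sum of all stumps.
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Hypothetical step-shaped data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(model(2.0), model(5.0))
```

A smaller learning rate makes each stage correct less of the remaining error, so more rounds are needed but the fit tends to be more stable.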

AdaBoost:

The name is short for Adaptive Boosting. Like gradient boosting, it builds the model stage by stage. It differs in how it improves the weak models: in each iteration it adjusts the weights of the training examples, raising the weights of the examples the previous model got wrong so that they receive more attention in the next iteration. The final model is a weighted combination of all the individual models.

Both algorithms can handle a wide range of numerical and categorical data, and many implementations also cope with missing values, although AdaBoost in particular can be sensitive to outliers and noisy labels. They are used in many practical applications
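
A single AdaBoost reweighting step can be sketched as below, following the classic binary-classification formulation. The function name and the four-example weights are illustrative.

```python
import math


def adaboost_reweight(weights, correct):
    """One AdaBoost round: return the model weight alpha and the new example weights.

    weights: current example weights (summing to 1).
    correct: True where the weak model classified the example correctly.
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    # Model weight: larger when the weak model makes fewer weighted errors.
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified examples are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Hypothetical round: four equally weighted examples, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
alpha, new_weights = adaboost_reweight(weights, correct)
print(alpha, new_weights)
```

After this step the single misclassified example carries half of the total weight, so the next weak model is pushed to get it right.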

The ten most popular machine learning algorithms for 2023

1. Linear regression

This term refers to a statistical analysis that tests the relationship between two continuous variables, one independent and one dependent

It is used to find the best-fitting line through a set of data points, which can then be used to make predictions

The simple linear regression equation is as follows:

y = b0 + b1*x

y represents the dependent variable

x represents the independent variable

b0 represents the y-intercept (the point where the line crosses the y-axis)

b1 represents the slope of the line

With the method of least squares we obtain the best-fitting line, that is, the line that minimizes the sum of the squared differences between the actual and predicted values of y

Linear regression can also be extended to several independent variables. It is then called multiple linear regression, and its equation is as follows:

y = b0 + b1*x1 + b2*x2 + … + bn*xn

x1, x2, …, xn represent the independent variables

b1, b2, …, bn represent the corresponding coefficients

As mentioned above, linear regression is useful for forecasting, for example predicting stock prices or the future sales of a specific product, by making predictions about the dependent variable

However, the regression model can be inaccurate when the data contains outliers that do not follow the general trend of the data

Outliers in linear regression can be handled in the following ways:

– Removing outliers from the data set before training the model

– Reducing the effect of outliers by applying a transform, such as taking the logarithm of the data

– Using robust regression methods such as RANSAC or Theil-Sen, which mitigate the negative impact of outliers more effectively than ordinary linear regression

Even so, linear regression remains an effective and commonly used statistical method

2. Logistic regression

It is a statistical method for predicting a binary outcome, that is, an outcome with two possible values, from one or more independent variables. It is used for classification tasks such as predicting customer behavior

Logistic regression is based on the sigmoid function, which maps the input variables to a probability between zero and one. The prediction is then derived from that probability

Logistic regression is represented by the following equation:

P(y=1|x) = 1/(1 + e^-(b0 + b1*x1 + b2*x2 + … + bn*xn))

P(y=1|x) represents the probability that the outcome y is 1 given the input variables x

b0 represents the intercept

b1, b2, …, bn represent the coefficients of the input variables x1, x2, …, xn

The coefficients are determined by training the model on a data set with an optimization algorithm. The trained model then makes predictions for new data by calculating the probability that the result is 1

The following figure shows the logistic regression model

In the figure, the input variables x1 and x2 are used to predict the binary result y

The model maps the input variables to a probability, and that probability determines the predicted outcome

The coefficients b1 and b2 are determined by training the model on a data set, and the decision threshold is set to 0.5

3. Support Vector Machines (SVMs)

SVM is a powerful algorithm for both classification and regression

It separates data points into different categories by finding the optimal hyperplane with the maximum margin

SVMs have been applied successfully in various fields, including image recognition, text classification, and bioinformatics

SVMs are particularly useful when the data cannot be separated by a straight line: the technique can map the data into a high-dimensional space to make nonlinear boundaries easier to find

SVMs are memory-efficient: they store only the support vectors rather than the entire data set. They are also highly effective in high-dimensional spaces, even when the number of features is greater than the number of samples

The technique is considered robust to outliers because it depends on the support vectors

However, one drawback is that SVMs are sensitive to the choice of kernel function

They are also not well suited to very large data sets, because training time is usually very long

4. Decision trees

Decision trees are versatile algorithms that build a tree-like model of decisions and their possible outcomes. By asking a series of questions, decision trees classify data into categories or predict continuous values. They are common in areas such as finance, customer segmentation, and manufacturing

In other words, a decision tree is a tree-like diagram in which each internal node is a decision point and each leaf node holds a prediction

How a decision tree works:

Building the tree begins with choosing the root node, the split that best separates the data into different categories. The data is then divided repeatedly into subgroups based on the values of the input features, until the resulting rules separate the classes or target values well

A decision tree is easy to understand: it gives the user a clear visualization that supports sound decision-making

However, the deeper the tree and the more leaves it has, the more likely it is to overfit the data. This is one of the drawbacks of decision trees

Another drawback is that a decision tree can be sensitive to the order of the input features, which leads to different trees, and the final tree is not guaranteed to be the best possible one

5. Random forest

The random forest is an ensemble learning method that combines many decision trees to improve prediction accuracy. Each tree is built on a random subset of the training data and features. Random forests are effective for classification and regression tasks, with applications in areas such as finance, healthcare, and bioinformatics

Random forests are used when a single decision tree is prone to overfitting the data; combining many trees gives a more accurate model

The forest is formed using the bootstrapping technique, which is used to generate multiple decision trees

Bootstrapping is a statistical method based on randomly sampling data points from the original data set with replacement. The result is multiple data sets, each containing a different mix of data points, which are then used to train the individual decision trees

A random forest improves overall model performance by reducing the correlation between its trees: each tree uses a random subset of the features, a method called “random subspace”

One drawback of a random forest is the higher computational cost of training and prediction as the number of trees grows, along with lower interpretability compared with a single decision tree. On the other hand, it is less prone to overfitting than a single tree and handles high-dimensional datasets better

6. Naive Bayes

It is a probabilistic algorithm based on Bayes’ theorem with the assumption of independence between features

Despite its simplicity, Naive Bayes performs well in many real-world applications, such as spam filtering, sentiment analysis, and document classification

Using Bayes’ theorem, the probability of a particular class is calculated from the values of the input features. Different probability distributions are used when implementing Naive Bayes, depending on the type of data

Among them:

Gaussian: for continuous data

Multinomial: for discrete data

Bernoulli: for binary data

Among its advantages, the algorithm is simple, needs less training data than many other algorithms, and can cope with missing data

Its main drawback is the assumption of independence between features, which real-world data often violates

It is also negatively affected by features that do not suit the data set, which lowers its performance and efficiency

7. KNN

It is a non-parametric algorithm that classifies new data points based on their proximity to the labeled examples in the training set. It is widely used in pattern recognition and recommendation systems

KNN can handle classification and regression tasks

It relies on the idea that similar data points lie close to one another

After choosing k, the number of nearest neighbors to consult, the data is split into training and test sets. To predict a new input, the distance between the input and every data point in the training set is calculated

The k nearest data points are then selected

and the prediction is made from this nearest group of data points

8. K-means

The algorithm starts by randomly selecting k centroids

Here k is the number of clusters we want to create

Each data point is then assigned to the cluster whose centroid is closest

So it is a distance-based algorithm that groups similar data points together: distances are calculated to assign each point to a group

The algorithm is widely used in applications such as market segmentation and image compression

Its downside is that its assumptions about the data often do not match real-world data

9. Dimensionality reduction algorithms

These algorithms aim to reduce the number of features in the data set while preserving the necessary information. This technique is called dimensionality reduction

Dimensionality reduction also makes data visualization easier and simpler

Common examples are Principal Component Analysis (PCA),

Linear Discriminant Analysis (LDA),

and t-Distributed Stochastic Neighbor Embedding (t-SNE)

Each is explained separately below

* Principal Component Analysis (PCA):

A linear dimensionality reduction technique. It applies an orthogonal transformation that converts a set of correlated variables into linearly uncorrelated variables called principal components. Its aim is to identify patterns in the data and reduce its dimensions while preserving the necessary information

* Linear Discriminant Analysis (LDA):

A supervised dimensionality reduction technique used to find the features that best discriminate between classes in a classification task

* t-Distributed Stochastic Neighbor Embedding (t-SNE)

A well-established nonlinear dimensionality reduction technique for visualizing high-dimensional data. It produces a low-dimensional representation that preserves as much of the data’s structure as possible

The downside of dimensionality reduction is that some necessary information may be lost in the process

Choosing a technique also requires knowing the type of data and the task to be performed, and deciding how many dimensions to keep can be somewhat difficult

10. Gradient boosting and AdaBoost

These two algorithms are used for classification and regression tasks and are widely used in machine learning

Both work by building an effective model out of several weak models

Gradient boosting:

It builds the model progressively in multiple stages. It starts by fitting a simple model to the data (a decision tree, for example) and then adds further models that correct the errors made by the previous ones. Each added model is fitted to the negative gradient of the loss function with respect to the predictions of the model so far

The final output is the combination of the individual models

AdaBoost:

The name is short for Adaptive Boosting

Like gradient boosting, it builds the model stage by stage. It differs in how it improves the weak models: in each iteration it adjusts the weights of the training examples, raising the weights of the examples the previous model got wrong so that they receive more attention in the next iteration. The final model is a weighted combination of all the individual models

Both algorithms can handle a wide range of numerical and categorical data, and they are also described as strong in dealing with outliers and with data that has missing values, so they are used in many practical applications

Points to consider before applying to a data science master’s degree

Posted on April 28, 2023 by s4l8384gmailcom

According to figures published online, thousands of master’s degrees related to data science and artificial intelligence are offered around the world, and universities frequently run promotional advertisements about the importance of data science and the need to obtain these degrees

In this article, we highlight the points to consider before pursuing a master’s degree in data science

What is your goal of obtaining a master’s degree?

In other words, what advantages will you get with a master’s degree in data science?

Motives for pursuing a master’s degree differ from one person to another, but for most students the goal comes down to a few points:

Discipline and responsibility: Self-learning is often undisciplined and lacks coordination and organization. Studying for a master’s degree gives you a defined, organized learning path, and with it a measure of structure and responsibility.

Effective rapid learning: Working towards a master’s degree strengthens your motivation to learn and to acquire experience and skills that you might not pick up in your ordinary learning journey.

Professional competence: As a highly capable data scientist with sufficient experience, you have a good chance of landing a good job in data science if you were not employed before. If you are already employed, the degree opens the way to a promotion that matches your academic level and raises your standing.

Scientific curiosity: However much experience and knowledge you have in artificial intelligence, there are always topics and skills left to discover. Do not let your interests stop at a certain limit; you still have a lot to learn

Given these motives, it may seem that every data scientist must obtain this degree. That is not the case, or at least it is not an absolute necessity. It is an advanced degree that undoubtedly gives its holder an advantage in data science, and especially in artificial intelligence. But that does not mean that someone without a master’s degree in data science cannot become a successful expert: every hardworking person has a share of success

Is a master’s degree enough to achieve your goals as a data scientist?

To answer this question accurately, we must understand one very important point. Whatever your academic level and whatever your field, whether you hold a master’s degree, a doctorate, or another qualification, the factor of experience cannot be neglected. Without experience and personal skill in a specific field, degrees alone cannot carry their holder to advanced stages within that field.

Experience shows in how well a person handles the difficult situations and problems encountered during an academic and professional career. Some situations require prior experience with a type of problem that was not covered in the master’s studies. Such experience is not acquired overnight. It builds up from finding solutions, acting well under pressure, and learning from mistakes: as the saying goes, he who does not make mistakes does not learn.

Experience sometimes comes after a decisive decision or a bold step. The expert holds a treasure that holders of higher degrees may sometimes lack, namely the ability to seize even the slightest opportunity and turn it into a strong and successful project.

From all of the above, we conclude that obtaining a master’s degree is a good thing and becomes a real strength when it is backed by sufficient experience. Together, these two elements make for a data scientist with a high level of competence and skill.

Does time help achieve goals enough?

Time is one of the main factors in achieving the desired goal. There is no doubt that full-time study helps you absorb the largest possible amount of information at a reasonable pace. With a master’s degree, however, the matter is slightly different: the student must study research papers on several topics not directly related to data science, such as papers on the social science of the Internet and on questionnaire design. A master’s student in data science is therefore not bound to make optimal use of time. Still, regular self-study, though it generally consumes less time, does not lead to the academic qualification that a master’s degree gives you.

Is there an alternative to a master’s degree?

From what we have covered so far, a question arises: can we say that someone without a master’s degree is unqualified to be a capable, professional data scientist, and lacks the opportunities open to a master’s holder?

In fact, this statement is not absolute, despite the prevailing custom that holders of a master’s degree are preferred over those without one, holders of a doctorate are preferred over holders of a master’s, and so on.

Of course, obtaining more certificates requires more years of study, perhaps up to 7 years. Then comes the shocking fact that 3 years of experience, especially on an application for a job in artificial intelligence, may outweigh all of those long years of study.

So as not to muddle matters or leave the reader unsure of the information presented, it can be said that the holder of a PhD remains a focus of attention for potential employers because, in my opinion, they would not have reached that degree without the experience needed to get there.

Does the financial return of the master’s degree holder compensate for what he spent on the learning journey?

Many people who obtained a master’s degree were shocked to find that the salary did not meet their expectations, and so fell into the trap of believing that the money spent on the degree cannot be recovered through salary, at least in the short term.

In this case the solution is preventive, not curative: act wisely while studying. Instead of spending indiscriminately on full-time study, you can study part-time while keeping your job, and therefore your salary, which is the first step in managing spending well. Applying for scholarships also helps considerably in covering a good portion of the tuition fees.

Make sure you learn from a good source of information:

The name of a university or educational institution, however well known, does not necessarily indicate that it is a good source of information. What determines the quality of these institutions is how well students engage with the course and how graduates fare. All you have to do is look for reviews and official statistics on any course offered by any institution that provides this type of study. This increases your chances of finding a leading institution that will give you a sound education.

Are these courses compatible with your scientific level?

Continuing from the previous paragraph on choosing appropriate courses, it is important to know whether a course suits your level in both content and style. A course may cover beginner topics that more experienced students find very simple.

This is what actually happened when one major university grouped all academic levels together at the start of its training program, which opened with an intensive programming course. For some students the course was boring and a waste of time.

Once you are sure the courses suit your academic level, do not forget to check whether they lead to job opportunities for graduates based on what was studied. You can consider that you chose the ideal course if you find a suitable job after graduation, whether that job is online, full-time, or part-time.

Is studying data science the best option for you?

Conviction in what one undertakes, whether study or work, is an important factor in its success. No one can be creative in any field without being fully convinced of what they are doing.

For many people, studying for a master’s degree is a way to postpone decisions about what to do with their lives. In that case, even a valuable course of study like this turns into a great waste of time. It would be better to spend that time gaining practical experience that expands your skills and knowledge in data science and artificial intelligence.

Data science is a multidisciplinary science with many branches, all of them valuable. It opens up wide horizons of knowledge and experience that bring its students closer to what they aspire to, putting their goals within sight so that reaching them is only a matter of time.

In the end, dear reader, we hope that you found this article both useful and enjoyable. Do not forget to share your opinion on the points above. We wish you every success.

النقاط التي يجب مراعاتها قبل التقدم إلى درجة ماجستير بعلم البيانات

وفق إحصائيات أجرتها مواقع على شبكة الإنترنت يتم تقديم آلاف درجات الماجستير المختصة بعلوم البيانات والذكاء الاصطناعي في جميع أنحاء العالم وكثيراً ما نشاهد الإعلانات الترويجية التي تستخدمها الجامعات حول أهمية علم البيانات وضرورة الحصول على هذه الشهادات

وسنحاول في هذا المقال تسليط الضوء على الأشياء التي يجب أخذها بعين الاعتبار قبل الحصول على درجة الماجستير في علم البيانات

ما هو هدفك من الحصول على درجة الماجستير؟

أو بمعنى آخر ماهي الميزات التي ستحصل عليها بحصولك على درجة الماجستير في علوم البيانات

تختلف الدوافع من شخص لآخر حول السعي لحصول على درجة الماجستير لكن إذا ألقينا نظرة شمولية على الرغبة لدى الفئة الكبيرة والغالبية من الطلاب نرى أن الغاية تتلخص في عدة نقاط

الانضباط والمسؤولية : فغالباً ما يكون الإنسان برحلة تعلمه الذاتية غير منضبط ويفتقد إلى التنسيق والتنظيم ، لذا فطريقة دراستك للحصول على درجة الماجستير سترسم أمامك مساراً تعليمياً محدداً ومنظماً وبالتالي ستمنحك قدراً من التنظيم والمسؤولية

التعلم السريع الفعال : رغبتك في الحصول على درجة الماجستير ستنمي عندك الدافع للتعلم واكتساب المزيد من الخبرات والمهارات التي قد لا تستطيع الحصول عليها أثناء رحلة تعلمك الاعتيادية

الكفاءة الوظيفية : أن تكون عالِم بيانات يتمتع بالكفاءة العالية والخبرة الكافية فأنت أمام فرص كبيرة للحصول على وظيفة جيدة في علم البيانات إن لم تكن موظف من قبل ، أما إن كنت موظفاً فالآفاق مفتوحة أمامك للحصول على ترقية وظيفية توفر لك العديد من الإمكانات التي تتناسب مع مستواك العلمي وترفع من مكانتك

الفضول العلمي : مهما كنت تمتلك من الخبرة والمعرفة في الذكاء الاصطناعي ، إلا أنه يجب أن تكون على يقين أن هناك مواضيع ومهارات عليك اكتشافها ، لا تدع اهتماماتك تقف عند حد معين، مازال أمامك الكثير لتتعلمه

وبالنظر إلى هذه الدوافع قد يتبادر إلى الأذهان أنه من الضرورة الحتمية لكل عالِم بيانات أن يسعى للحصول على هذه الدرجة العلمية، وهذا تفكير خاطئ في الحقيقة أو على الأقل ليس الموضوع بهذه الصورة من الضرورة الحتمية بل هو في النهاية درجة علمية متقدمة لا شك أنها تؤهل حاملها لأن يكون ذو أفضلية في مجال علم البيانات ولا سيما الذكاء الاصطناعي ، ولكن هذا لا يعني أن من لا يحمل درجة الماجستير في علم البيانات ليس مؤهلاً لأن يكون ناجحاً وخبيراً , لا ليس بالضرورة فلكل مجتهد نصيب من النجاح

هل يكفي نَيل درجة الماجستير في تحقيق أهدافك كعالِم بيانات ؟

لنستطيع الإجابة على هذا السؤال بشكل دقيق لابد أن نفهم أمراً مهماً جداً ، مهما بلغ مستوى شهاداتك العلمية وفي أي مجال سواء ماجستير أو دكتوراه أو غيرها من الدرجات العلمية ، فلا يمكن بأي شكل من الأشكال أن نهمل عامل الخبرة فبدون الخبرة والمهارة الشخصية في التعامل مع أي مجال معين لا يمكن للدرجات العلمية وحدها أن تجعل حاملها يصل إلى مراحل متقدمة ضمن مجاله واختصاصه لأن الخبرة دليل حسن التعامل والتصرف ولاسيما في بعض المواقف الصعبة والمشاكل التي تعترض المرء أثناء مسيرته العلمية والعملية فبعض المواقف تتطلب خبرة مسبقة في التعامل مع هذا النوع من المشاكل لم تكن قد أُدرجت في فصول دراسة درجة الماجستير وهذه الخبرات لا تُكتسب بين يوم وليلة وإنما تتشكل نتيجة مجوعة تجارب تنوعت بين إيجاد الحلول وحسن التصرف والتعلم من الأخطاء والاستفادة منها فمن المعروف أنه من لا يخطئ لا يتعلم، الخبرة تأتي أحياناً بعد قرار حاسم أو خطوة جريئة ، الإنسان الخبير يمتلك بين يديه كنزاً قد لا يمتلكه أصحاب الشهادات العليا أحياناً فهو قادر على انتهاز أضعف الفرصة وتحويلها إلى مشروع قوي وناجح

ومع كل ما سبق نستنتج أن الحصول على درجة ماجستير أمر جيد ويصبح عامل قوة إن كان مدعوماً بالخبرة الكافية فهذان العنصران إن توفرا معاً فهما دون شك يكوِّنان عالِم بيانات على مستوى عالي من الكفاءة والمهارة

هل يساعد الوقت في تحقيق الأهداف بشكل كافٍ ؟

يعتبر عامل الوقت من العوامل الأساسية التي تسهم في تحقيق الهدف المرجو ومما لا شك فيه أن الدراسة بدوان كامل تساعد في الحصول على أكبر قدر ممكن من المعلومات بسرعة مناسبة ولكن عند دراسة درجة الماجستير الأمر مختلف قليلاً إذ يتحتم على الدارس دراسة أوراق بحثية في عدة مواضيع لا علاقة لها مباشرة بعلوم البيانات كأوراق تتعلق بالعلوم الاجتماعية للإنترنت وتصميم الاستبيانات ، إذاً طالب درجة الماجستير في علم البيانات غير مقيد باستثمار أمثل للوقت ، فما يُستهلك من وقت أقل في الدراسات العادية على العموم لا يوصلك إلى درجة علمية تعطيك إياها درجة الماجستير

هل يوجد بديل لدرجة الماجستير؟

من خلال ما توصلنا إليه في هذا البحث أصبح لدينا سؤال يطرح نفسه : هل يمكن القول بأن مَن لا يملك درجة ماجستير يعتبر غير مؤهل لأن يكون عالِم بيانات قدير ومحترف ولا يملك فرص كتلك التي يمتلكها الحائز على درجة ماجستير ؟

في الحقيقة هذا الكلام غير مطلق رغم العرف السائد بأن حامل الماجستير يُفضَّل على من لا يحملها وحامل الدكتوراه مفضَّل على حامل الماجستير وهكذا

وبالطبع الحصول على مزيد من الشهادات يتطلب المزيد من سنوات الدراسة ربما تصل 7 سنوات ثم تأتي الحقيقة الصادمة بأن 3 سنوات من الخبرة وخاصة فيما يتعلق بملف التقدم لوظيفة في الذكاء الاصطناعي ربما تتفوق على كل ما ذكر من السنوات الطويلة في الدراسة

وكي لا تختلط الأمور ببعضها ويشعر القارئ بشيء من التذبذب في المعلومات المطروحة يمكن القول بأن حامل الدكتوراه يبقى محط أنظار أصحاب العمل المحتملين لأنه باعتقادي ما كان ليصل إلى ما وصل إليه لو أنه لم يكن يمتلك الخبرة اللازمة التي توصله إلى هذه الدرجة العلمية

هل يعوض العائد المادي لحامل درجة الماجستير ما أنفقه في رحلة التعلم ؟

هناك العديد من الأشخاص الذين حصول على درجة الماجستير صُدموا بأن الراتب الوظيفي لا يلبي تطلعاتهم وبالتالي وقعوا في فخ الاعتقاد الخاطئ بأن المال الذي أنفقوه عند دراسة درجة الماجستير لا يمكن تعويضه من خلال الرتب الوظيفي ولو على المدى القريب على أقل تقدير

في هذه الحالة يكون الحل وقائي لا علاجي، ويتم ذلك في التصرف الحكيم أثناء عملية الدراسة فبدلاً من الإنفاق العشوائي على الدراسة بدوام كامل يمكن الدراسة بدوام جزئي مع المحافظة على الوظيفة وبالتالي الراتب وهي أولى الأمور التي تدخل في حيز التدبير الجيد في الإنفاق ، كما وأن التقديم على منح دراسية تسهم بشكل كبير في تغطية جزء لا بأس به من المصروف الدراسي

:احرص على الحصول على مصدر معلومات جيد في التعلم

ليس بالضرورة أن يدل اسم الجامعة أو الوحدة التعليمية مهما كان مشهوراً على أنه مصدر معلومات جيد ، ولكن ما يحدد جودة هذه المراكز التعليمية هو مدى تفاعل الطلاب مع الدورة ونتائج الخريجين ، كل ما عليك فعله هو البحث عن الآراء والإحصاءات الرسمية عن أي دورة صادرة عن أي وحدة تعليمية تقدم هذا النوع من الدراسات وبهذا تزيد فرصك في العثور على وحدة تعليمية رائدة تؤمن لك دراسة سليمة وجيدة

هل تتماشى هذه الدورات مع مستواك العلمي ؟

استطراداً للفقرة السابقة وفي خضم الحديث عن حسن اختيار الدورات المناسبة يجب التنويه إلى ضرورة معرفة فيما إذا كانت هذه الدورات تناسب في محتواها وأسلوبها مستواك العلمي فلربما تطرح الدورة موضوعات للمبتدئين يراها آخرون ممن هم أكثر خبرة على أنها بسيطة جداً

وهذا ما حصل بالفعل عندما قامت إحدى الجامعات الكبرى بضم جمع الفئات الأكاديمية في مستهل برنامجها التدريبي الذي بدأته بدورة مكثفة في البرمجة مما جعل هذه الدورة بالنسبة للبعض أمراً مملاً وفيه مضيعة للوقت

لا تنسى بعد تأكدك من اتباع دورات تناسب مستواك العلمي أن تتحرى فيما إذا كانت هذه الدورات توفر للخريجين فرص عمل استناداً لما تم دراسته في الدورة ، يمكن اعتبار أنك وُفقت تماماً في اتباع الدورة الأمثل إذا حصلت على فرصة عمل مناسبة بعد التخرج ولا يهم إن كانت هذه الوظيفة أونلاين أو بدوام كامل أو جزئي

هل دراسة علم البيانات هي الخيار الأفضل بالنسبة لك ؟

القناعة فيما يُقدِم عليه المرء سواء كان دراسة أو عمل هي عامل مهم في نجاح هذا المشروع فلا يمكن لأي شخص أن يبدع في أي مجال مالم يكن مقتنع تماماً بما يقوم به

هناك الكثير من الأشخاص يكون خيار دراسة الماجستير بالنسبة لهم هو بمثابة فرصة لتأجيل قرارات تتعلق بما يجب عليه فعله في حياتهم ولكن في الحقيقة وفي هذه الحالة يتحول موضوع دراسة قيمة مثل هذه إلى إهدار كبير للوقت ، فالأجدر في مثل هذه الحالات أن يستهلك الوقت الضائع في اكتساب خبرة العملية التي توسع من مهاراتك ومعارفك في مجال علم البيانات والذكاء الاصطناعي

علم البيانات علم متعدد المجالات وفروعه كثيرة ومتشعبة وكلها ذات قيمة وتفتح أمام دارسيها آفاق واسعة من المعارف والخبرات التي تصل بصاحبها إلى ما يرنو إليه وتجعل أهدافه في مرمى نظره والوصول إليها مسألة وقت لا أكثر

في النهاية عزيزي القارئ : نرجو أن تكون قد حصلت على الفائدة والمتعة في هذه المقالة ولا تنسى أن تشاركنا رأيك في ذُكر آنفاً ، مع تمنياتنا لك بالتوفيق والنجاح

Machine learning roadmap from zero to professional – 2024

Posted on April 24, 2023; updated June 12, 2024, by s4l8384gmailcom

Machine learning is the science of our time, and demand for learning it is growing rapidly.

In this article, we shed light on the best way to learn machine learning skills, so that the learner can later apply them to advancing scientific research worldwide.

First, a brief word on what machine learning is.

Machine learning is the practice of feeding data to a computer so that it improves over time, by developing statistical models and algorithms that let computer systems operate without explicit instructions.

Machine learning map:

The first stage: learning the programming language

Python is the preferred choice here. It is among the most powerful and popular languages for this work, thanks to libraries such as Pandas, NumPy, and scikit-learn, which specialize in machine learning, statistics, and mathematics.

The second stage: learning linear algebra

Linear algebra is a branch of mathematics that deals with linear transformations, and with the matrices and vectors used to represent them.

Learning linear algebra is a crucial step forward in the journey of studying machine learning
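
To make the idea of a linear transformation concrete, here is a minimal sketch in plain Python, with no libraries. The helper name `mat_vec` and the sample values are illustrative assumptions, not part of the roadmap; in practice a library such as NumPy would do this work.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# A 90-degree counter-clockwise rotation written as a 2x2 matrix.
rotate_90 = [[0, -1],
             [1,  0]]

v = [3, 4]
rotated = mat_vec(rotate_90, v)
print(rotated)  # [-4, 3]
```

Each output component is the dot product of one matrix row with the vector, which is the same operation that underlies many machine learning models.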

The third stage: learning the basic libraries of Python

As mentioned above, these are Pandas, NumPy, and scikit-learn.

Python has other libraries, but these three are considered the most effective for applying machine learning techniques.

The fourth stage: learning machine learning algorithms

There are three types:

Supervised machine learning
Unsupervised machine learning
Reinforcement machine learning

Common families of algorithms include:

Regression Algorithms

Regularization Algorithms

Instance-Based Algorithms

Decision Tree Algorithms

Clustering Algorithms

Bayesian Algorithms

Association Rule Learning Algorithms

Ensemble Algorithms

Dimensionality Reduction Algorithms

Artificial Neural Network Algorithms

Deep Learning Neural Network Algorithms
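
As a small illustration of the first family in the list above, the sketch below fits a straight line to labelled data by ordinary least squares, a simple case of supervised learning. The function name `fit_line` and the toy data are assumptions made for this example; a real project would normally use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy training data generated from y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict y for an unseen x
print(slope, intercept, prediction)
```

The model "learns" the slope and intercept from examples, then uses them to predict a value it has not seen.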

Fifth stage: continuous practice

This stage is no less important than the previous ones. It consists of applying what you learned in the earlier stages to a variety of data sets.

You can gain a lot of experience with algorithms by participating in Kaggle competitions.

خارطة طريق التعلم الآلي من الصفر حتى الاحتراف – 2024

يعتبر التعلم الآلي علم العصر إذ يزداد الإقبال على تعلمه بشكل متسارع وملحوظ

وفي هذا المقال سنسلط الضوء على الطريقة الأمثل لتعلم مهارات التعلم الآلي بحيث يتمكن المتعلم من استثمارها مستقبلاً في تطوير الأبحاث العلمية على مستوى العالم

لذا لابد في البداية من أن ننوه إلى مفهوم التعلم الآلي باختصار

التعلم الآلي هو مجموعة من المعلومات تُلقَّن إلى الكمبيوتر بغية تطويره ونموه مع مرور الزمن عن طريق تطوير النماذج الإحصائية والخوارزميات التي تعمل عليها أنظمة الحاسوب دون اللجوء إلى أوامر محددة

:خارطة التعلم الآلي

المرحلة الأولى : تعلم لغة البرمجة في هذه الحالة يفضل تعلم بايثون فهي الأقوى والأكثر شيوعاً نظراً لما تحويه من مكتبات

Pandas و Numpy و Scikit : مثل

وهي مختصة بالتعلم الآلي والإحصاء والرياضيات

المرحلة الثانية : تعلم الجبر الخطي

يعتبر التعلم الخطي أحد فروع علوم الرياضيات إلا أنه يتجه إلى التعامل مع التحولات الخطية ويهتم أيضاً بالتعامل مع المصفوفات والمتجهات

ويعتبر تعلم الجبر الخطي خطوة مفصلية للمضي قدماً في رحلة دراسة التعلم الآلي

المرحلة الثالثة : تعلم المكتبات الأساسية لبايثون

: وهي كما أسلفنا

ومع وجود مكتبات أخرى لبايثون إلا أن هذه المكتبات الثلاثة تعتبر الأكثر كفاءة بما يخدم تطبيقها على تقنيات التعلم الآلي

المرحلة الرابعة : تعلم خوارزميات التعلم الآلي

: وهي ثلاثة أنواع

Supervised machine learning
Unsupervised machine learning
Reinforcement machine learning

خوارزميات الانحدار

خوارزميات التنظيم

الخوارزميات القائمة على المثيل

خوارزميات شجرة القرار

خوارزميات التجميع

Bayesian Algorithms

خوارزميات تعلم قواعد الرابطة

خوارزميات المجموعة

خوارزميات تخفيض الأبعاد

خوارزميات الشبكة العصبية الاصطناعية

خوارزميات التعلم العميق للشبكة العصبية

المرحلة الخامسة : الممارسة المستمرة

وهذه المرحلة لا تقل أهمية عن الخطوات السابقة ويتحقق ذلك عن طريق تطبيق الخطوات السابقة على مجموعات متنوعة من البيانات ويمكنك اكتساب خبرة كبيرة بالتعامل مع الخوارزميات عن طريق

Kaggle المشاركة في مسابقات

10 Excel functions for data analysis

Posted on March 6, 2023; updated September 27, 2023, by s4l8384gmailcom

Excel has features that help the user analyze data easily, thanks to the many formulas and functions it provides for carrying out a wide range of operations. In this article we discuss ten of them, covering calculations, text and date tasks, and lookups.

1. CONCATENATE

This is one of the most useful formulas in data analysis, and also one of the simplest to work with. It takes dates, text, numbers, and other data from several cells and merges them into one cell.

SYNTAX = CONCATENATE (text1, text2, [text3], …)

Concatenate multiple cell values

The simple CONCATENATE formula for the values of two cells A2 and B2 is as follows:

= CONCATENATE (A2, B2)

The values are combined without any delimiter. To separate the values with a space, add " " as an argument:

=CONCATENATE(A3, " ", B3)

Connect a text string and a computed value

You can also join a text string and a computed value, as in this example that returns the current date:

=CONCATENATE("Today is ", TEXT(TODAY(), "dd-mmm-yy"))

To make sure the CONCATENATE function gives correct results, keep the following in mind:

The result of the CONCATENATE function is always a text string, even if all the source values are numbers

The CONCATENATE function needs at least one text argument in order to work

Check that every argument is valid; otherwise the formula returns the #VALUE! error, which signals an invalid argument
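
For readers who also script their analysis, the behaviour described above can be mirrored in a few lines of Python. This is only an analogy, not how Excel itself works: the helper `concatenate` and the sample cell contents are assumptions for illustration, and `str()` does not reproduce Excel's number formatting.

```python
def concatenate(*values):
    """Join values as text, loosely mirroring CONCATENATE's behaviour."""
    return "".join(str(v) for v in values)

first, last = "Ada", "Lovelace"   # hypothetical contents of A3 and B3

print(concatenate(first, last))        # AdaLovelace
print(concatenate(first, " ", last))   # Ada Lovelace

# Numeric inputs still produce a text result.
joined = concatenate(12, 34)
print(joined, type(joined).__name__)   # 1234 str
```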

2. Len()

This function returns the number of characters in a cell. It is useful when working with text that has a character limit, or for spotting differences among a set of product numbers

SYNTAX = LEN (text)

3. Days()

This function is used to calculate the number of days between two dates

SYNTAX = DAYS (end_date, start_date)
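
As a rough cross-check outside Excel, the same calculation can be sketched with Python's standard `datetime` module. The function name `days` simply mirrors the Excel function and the dates are made-up examples.

```python
from datetime import date

def days(end_date, start_date):
    """Number of days between two dates, in the same argument order as DAYS."""
    return (end_date - start_date).days

print(days(date(2023, 3, 31), date(2023, 3, 1)))  # 30
```

Note the argument order: the end date comes first, as in the Excel syntax above. Swapping the arguments gives a negative result.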

4. Networkdays()

This is one of Excel’s date and time functions. Finance and accounting departments often use it to exclude weekends when calculating employee wages from the actual number of days worked, or when calculating the total number of working days for a project

SYNTAX = NETWORKDAYS (start_date, end_date, [holidays])
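
The logic behind this function can be sketched in Python as follows. This is an illustrative approximation, assuming Saturday and Sunday are the weekend and that the start date is not after the end date; the holiday used in the example is hypothetical.

```python
from datetime import date, timedelta

def networkdays(start_date, end_date, holidays=()):
    """Count weekdays from start_date to end_date inclusive, skipping holidays."""
    holidays = set(holidays)
    count = 0
    current = start_date
    while current <= end_date:
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            count += 1
        current += timedelta(days=1)
    return count

march = networkdays(date(2023, 3, 1), date(2023, 3, 31))
with_holiday = networkdays(date(2023, 3, 1), date(2023, 3, 31),
                           holidays=[date(2023, 3, 8)])
print(march, with_holiday)  # 23 22
```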

5. Sumifs()

SUMIFS is one of the most common formulas in Excel and one of the most important functions for data analysts. Where =SUM adds up everything in a range, =SUMIFS adds up only the values that meet the conditions you specify

SYNTAX = SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)
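
To show what "summing under conditions" means, here is a minimal Python sketch. The sales rows, field names, and the helper `sumifs` are all invented for illustration, and the helper supports only equality criteria, whereas Excel also accepts operators such as ">100".

```python
# Hypothetical sales records standing in for worksheet rows.
rows = [
    {"region": "East", "product": "Pen", "amount": 100},
    {"region": "West", "product": "Pen", "amount": 250},
    {"region": "East", "product": "Ink", "amount": 75},
    {"region": "East", "product": "Pen", "amount": 40},
]

def sumifs(rows, sum_field, **criteria):
    """Sum sum_field over the rows where every criterion matches exactly."""
    return sum(
        row[sum_field]
        for row in rows
        if all(row[field] == wanted for field, wanted in criteria.items())
    )

total = sumifs(rows, "amount", region="East", product="Pen")
print(total)  # 140
```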

6. Averageifs()

This function returns the average of the values that meet one or more criteria

SYNTAX = AVERAGEIFS (avg_rng, range1, criteria1, [range2], [criteria2], …)

7. Countifs()

COUNTIFS is an important tool in data analysis. It is similar to SUMIFS in most respects: it counts the number of values that satisfy certain conditions, but it does not need a sum range

SYNTAX = COUNTIFS (range1, criteria1, [range2], [criteria2], …)
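
The relationship between AVERAGEIFS and COUNTIFS can be seen in a short Python sketch. As before, the data and helper names are assumptions for illustration and only exact-match criteria are handled.

```python
# Hypothetical sales records standing in for worksheet rows.
rows = [
    {"region": "East", "product": "Pen", "amount": 100},
    {"region": "West", "product": "Pen", "amount": 250},
    {"region": "East", "product": "Ink", "amount": 75},
    {"region": "East", "product": "Pen", "amount": 40},
]

def matching(rows, criteria):
    """Return the rows where every criterion matches exactly."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

def countifs(rows, **criteria):
    """Count the rows that satisfy all criteria (no value range needed)."""
    return len(matching(rows, criteria))

def averageifs(rows, avg_field, **criteria):
    """Average avg_field over the rows that satisfy all criteria."""
    selected = matching(rows, criteria)
    return sum(r[avg_field] for r in selected) / len(selected)

east_count = countifs(rows, region="East")
east_pen_avg = averageifs(rows, "amount", region="East", product="Pen")
print(east_count, east_pen_avg)  # 3 70.0
```

In this sketch `averageifs` raises an error when nothing matches, which loosely corresponds to the #DIV/0! error Excel shows in that case.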

8. Counta()

COUNTA counts the cells in a range that are not empty. This lets you, as a data analyst, discover gaps in a data set without having to restructure it.

SYNTAX = COUNTA (value1, [value2], …)
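
The gap-finding idea can be sketched in Python like this. It is a simplification: `None` and the empty string stand in for blank cells here, which is an assumption of the example, and the names in the column are made up.

```python
def counta(values):
    """Count the entries that are not blank (None or empty string)."""
    return sum(1 for v in values if v is not None and v != "")

# Hypothetical column with two blanks; note that 0 is a value, not a blank.
column = ["Stuart", "", "Jenson", None, 0, "Mia"]

filled = counta(column)
missing = len(column) - filled
print(filled, missing)  # 4 2
```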

9. Vlookup()

VLOOKUP stands for “vertical lookup”: it searches for a value in the leftmost column of a table and returns a value from the same row, in the column you specify

SYNTAX = VLOOKUP (lookup_value, table_array, column_index_num, [range_lookup])

The arguments of the VLOOKUP function are:

– lookup_value: the value to look up in the first column of the table

– table_array: the table from which the value is to be retrieved

– column_index_num: the number of the column in the table from which to return the value

– range_lookup: optional. TRUE (the default) = approximate match; FALSE = exact match

The following example illustrates the use of VLOOKUP:

Cell A11 contains the lookup value

A2:E7 is the table array

3 is the column index, pointing at the column that holds the departments

0 (FALSE) is the range_lookup argument, requesting an exact match

Put together, the formula is =VLOOKUP(A11, A2:E7, 3, 0). When you press Enter, it returns “Marketing”, which indicates that Stuart works in the marketing department
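
The exact-match lookup described above can be sketched in Python. The table contents below are invented (the original worksheet is not reproduced here), except that Stuart's department is "Marketing" as in the example; `vlookup` is a hypothetical helper that covers only the exact-match case.

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup in the first column; col_index is 1-based as in Excel."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here

# Hypothetical table: name, employee id, department.
table = [
    ["Stuart", 101, "Marketing"],
    ["Jenson", 102, "Sales"],
]

department = vlookup("Stuart", table, 3)
print(department)  # Marketing
```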

10. Hlookup()

The H stands for “horizontal”. HLOOKUP searches for a value in the top row of a table and returns a value from the same column, in the row you specify. It is handy when the values you are comparing sit in the first row of the spreadsheet and you need to look down a certain number of rows

SYNTAX = HLOOKUP (lookup_value, table_array, row_index, [range_lookup])

The arguments of HLOOKUP are:

lookup_value: the value to look up

table_array: the table from which you need to retrieve data

row_index: the number of the row from which to return the data

range_lookup: selects exact or approximate matching. When it is TRUE (the default), the match is approximate; when it is FALSE, the match is exact

In the next example, we use HLOOKUP to find the city Jenson is from.

The lookup value, Jenson, is in cell H23

G1:M5 is the table array

4 is the row index number

0 (FALSE) requests an exact match

Put together, the formula is =HLOOKUP(H23, G1:M5, 4, 0). Pressing Enter returns “New York”.

In conclusion

The above shows how effective Excel is for analyzing data. Learning its formulas and functions makes your work easier and saves you a great deal of time and effort.
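
The horizontal lookup can be sketched in Python in the same spirit. The table below is invented, apart from Jenson's city being "New York" as in the example; `hlookup` is a hypothetical helper covering only the exact-match case.

```python
def hlookup(lookup_value, table, row_index):
    """Exact-match lookup in the top row; row_index is 1-based as in Excel."""
    header = table[0]
    for col, value in enumerate(header):
        if value == lookup_value:
            return table[row_index - 1][col]
    return None  # Excel would show #N/A here

# Hypothetical table laid out horizontally: one person per column.
table = [
    ["Stuart", "Jenson"],        # row 1: names (searched)
    [101, 102],                  # row 2: employee ids
    ["Marketing", "Sales"],      # row 3: departments
    ["London", "New York"],      # row 4: cities
]

city = hlookup("Jenson", table, 4)
print(city)  # New York
```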

عشرة وظائف لإكسل في تحليل البيانات

يعتبر برنامج إكسل من البرامج التي تتمتع بميزات وخصائص تعين المستخدم على تحليل البيانات بسهولة ونظراً لما يوفره من صيغ ووظائف متعددة قادرة على تنفيذ مجوعة عمليات سنتناول منها في مقالنا هذه وظائف العمليات الحسابية ومهام نصوص الأحرف والتاريخ ومجموعة أخرى من مهام البحث

CONCATENATE 1

تعتبر هذه الصيغة من الصيغ الأكثر فاعلية في تحليل البيانات رغم سهولتها وبساطة العمل بها وهي مهمتها استخدام التواريخ والنصوص والأرقام وبيانات مختلفة موجودة في عدة خلايا ودمجها في خلية واحدة

SYNTAX = CONCATENATE (text1, text2, [text3], …)

تسلسل قيم خلايا متعددة

CONCATENATE صيغة

A2 و B2 البسيطة لقيم خليتين

هي كما يلي

= CONCATENATE (A2، B2)

“ “سيتم دمج القيم بدون استخدام أي محدد ، ولفصم القيم بمسافة نستخدم

=CONCATENATE(A3, “ “, B3)

ربط سلسلة من النصوص والقيمة المحسوبة

كما ويمكنك ربط سلسلة نصية وقيمة محسوبة بالصيغة كما في المثال الموضح عن استعادة التاريخ الحالي

=CONCATENATE(“Today is “, TEXT(TODAY(), “dd-mmm-yy”))

ويمكنك التأكد من صحة النتائج التي تقدمها

CONCATENATE الدالة

من خلال اتباع ما يلي

في جميع الأحوال تكون نتيجة *

CONCATENATE الدالة

عبارة عن سلسلة نصية وإن كانت جميع قيم المصدر أرقاماً

احرص على وجود وسيطة نصية في *

CONCATENATE دالة

لضمان عملها

وعليك أن تنتبه جيداً من صحة الوسيطة النصية لكي تعمل *

CONCATENATE الدالة

بشكل صحيح وإلا فالصيغة

#VALUE! سترجع لك الخطأ

وهذا سببه أن الوسيطات غير صالحة

Len() 2.

تستخدم هذه الدالة لمعرفة عدد الأحرف في الخلية الواحدة ، أو عند التعامل مع نص يحوي عدد محدود من الأحرف أو معرفة الاختلاف بين أرقام مجموعة من المنتجات

SYNTAX = LEN (text)

Days() 3.

تستخدم هذه الدالة لحساب عدد الأيام الواقعة بين تاريخين

SYNTAX =DAYS (end_date, start_date)

Networkdays4.

وهي تعتبر أنها دالة التاريخ والوقت في إكسل وتستخدم غالباً من قبل أقسام المالية والمحاسبة لاستبعاد عدد عطلات نهاية الأسبوع لتحديد أجور الموظفين بناءً على حساب أيام العمل الفعلية لهم أو حساب عدد كامل أيام العمل لمشروع معين

SYNTAX = NETWORKDAYS (start_date, end_date, [holidays])

Sumifs() 5.

وهي من الصيغ المتداولة بكثرة في إكسل وتعتبر من أهم الوظائف بالنسبة لمحللي البيانات

=SUMIFS. =SUM

وخصوصاً لإجراء عملية جمع للبيانات وفق شروط معينة

SYNTAX = SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)

Averageifs() 6.

تتيح هذه المهمة استخلاص المتوسط من معلمة واحدة أو أكثر

SYNTAX = AVERAGEIFS (avg_rng, range1, criteria1, [range2], [criteria2], …)

Countsifs() 7.

من الأدوات المهمة في تحليل البيانات

SUMIFS. وهي تتشابه مع

في معظم الوظائف فهي تقوم بحساب عدد القيم التي تحقق شروط معينة إلا أنها لا تحتاج إلى نطاق جمع

SYNTAX = COUNTIFS (range, criteria)

8. Counta()

مهمتها هي أن تحدد هل الخلية فارغة أم لا من خلال اكتشاف الفجوات الموجودة في مجموعة البيانات دون أن تضطر كمحلل بيانات إلى إعادة هيكلتها

SYNTAX = COUNTA (value1, [value2], …)

9. Vlookup()

يدل هذا الاختصار على البحث الشاقولي عن قيمة ما في العمود الكائن في أقصى يسار الجدول ليتسنى لك إرجاع قيمة في نفس الصف من العمود الذي تحدده

SYNTAX = VLOOKUP (lookup_value, table_array, column_index_num, [range_lookup])

VLOOKUP وسنقوم بشرح الوسيطات للدالة

lookup_value

هي القيمة التي عليك البحث عنها في العمود الأول من الجدول

table

يدل على الجدول التي يتم استرداد القيمة منه

col_index

يتيح استعادة العمود الموجود في الجدول من القيمة

range_lookup

اختياري : TRUE = approximate match

افتراضي : FALSE = exact match

VLOOKUP وسيوضح الجدول التالي استخدام

lookup تحوي قيمة A11 الخلية

هي صفوف الجدول A2: E7

رقم 3 هو فهرس العمود مع المعلومات الخاصة بالأقسام

رقم 0 هو البحث عن النطاق

Enter وفي حال الضغط على مفتاح

فسيعيد “التسويق” وهذه دلالة على أن

يعمل في قسم التسويق Stuart

10. Hlookup()

“وفيه يمثل “الأفقي

H بالحرف

وهو يبحث عن قيمة واحدة أو أكثر في الصف العلوي من الجدول، ثم يقوم باستعادة قيمة من صف تحدده في الجدول أو الصف من نفس العمود إذا تقوم هذه الأداة بتسهيل الأمور أكثر فمثلاً عند تكون القيم التي تستخدمها موجودة في الصفوف الأولى من جدول البيانات واحتجت إلى أن تتطلع على عدد صفوف معين فهذه الأداة تفي بالغرض

SYNTAX = HLOOKUP (lookup_value, table_array, row_index, [range_lookup])

Hlookup لنتعرف على وسيطات

Lookup_Value

يدل على القيمة المرفقة

table —

وهو الجدول الذي عليك استعادة البيانات منه

ROW_INDEX

وهو رقم الصف لاستعادة البيانات

Range_lookup

للمطابقة الدقيقة والتقريبية وذلك يتحدد بتحديد صحة القيمة الافتراضية فبصحتها يكون التطابق تقريبي

في مثالنا التالي سنقوم بالبحث عن المدينة

Jenson التي ينتمي إليها

Hlookup. باستخدام

Jenson وهي H23 تظهر قيمة البحث في

هي صفوف الجدول G1: M5

رقم 4 فهرس الصف

رقم 0 اختبار تقريبي

Enter وبالضغط على

“سيعيدك إلى ” نيويورك

وفي الختام

نستخلص مما سبق مدى فاعلية إكسل في تحليل البيانات فبتعلمك صيغه ووظائفه يمكنك تسهيل العمل عليك وبالتالي توفر الكثير من الوقت والجهد

Data Analyst Roadmap for 2024

Posted on February 20, 2023; updated May 23, 2024, by s4l8384gmailcom

This article presents a roadmap for newcomers to data analysis for the year 2023, supported by links to tools, tutorials, and online courses.

The primary job of data analysts in any company is to study customer data thoroughly, in order to provide customers with the best service and to produce statistics that help service providers understand how best to serve them.

Data Analyst Roadmap for 2023

Learning programming is the first step on the data analysis journey. Knowledge of computer science, especially databases and SQL, also helps. Along the way, we will mention the resources you need to become a data analyst.

This roadmap is your guide to learning the skills of a successful data analyst for the year 2023. It lays out the basic stages of learning in a simple, understandable way. If you think other tools should be added to it, we would be glad to hear from you in the comments. Your opinion is important to us.

We now turn to the key resources in this roadmap:

1. Learn Python

Learning Python is the ideal start to the journey of learning data analysis, and writing code in it is an essential pillar of data analysis jobs. Data analysis and visualization packages work closely with Python, and the language has a large community of users who can help you find solutions to the professional problems you may encounter. This also explains the large number of online Python courses. Here we recommend the Python specialization from Coursera, through which you can reach an intermediate level in Python within three months at most.

Python For Everybody

Coursera offers a very useful course for beginners in Python. It starts from the basics, then takes you on to working with the web and interacting with databases in Python.

Once you have learned Python, you have come a long and important way in learning data analysis. We can then move on to the other things to learn after Python.

2. Data visualization and processing

A data analyst must be fully at home with data visualization, because the job requires converting raw data into charts that make it clearer.

You therefore need to learn the main visualization and data processing libraries. We discuss some of them below, noting how their tools and features differ.

NumPy Library

This library is built around arrays and matrices and the arithmetic operations on them. It is widely used among data analysts, and it is recommended to learn it first.

Pandas Library

Pandas is dedicated to importing and modifying data. You need it to analyze and clean data.

Matplotlib library

This open-source library is the most popular among data analysts, so there is a large community of users you can turn to when you run into problems. It also offers a very wide range of chart types to work with.

Seaborn Library

It differs from Matplotlib in that it provides a wide range of layouts that can be customized to suit your requirements, and it is easy to learn.

Tableau

Just import your data into this tool, then unleash your imagination and start customizing your visualizations. It lets you visualize data without having to learn any programming language.

3. Learn statistics:

Statistics skills improve a data analyst’s employment prospects. Their importance lies in handling large amounts of data in depth: you will need to make predictions and take decisions based on the statistical results drawn from that data.

We recommend the beginner statistics course offered on the Coursera platform, which starts from the basics of sampling, distributions, probability, regression, and so on.

Conclusion:

Have you noticed how simple this roadmap is? You can rely on it to become an experienced data analyst. Of course, Python is not the only programming language worth learning; you can learn others, such as R. Still, it is widely agreed that Python is an excellent fit for data analysis, without discounting the importance of other languages.

With our best wishes for success

Here are some great sources of learning:

We hope this article has offered ideas that benefit data analysts. Do not forget to share in the comments any ideas you think would add more value to this roadmap. We look forward to hearing from you.

Data Analyst Roadmap for 2024

We will look at the roadmap for those entering data analysis in 2023, supported by links to tools, tutorials, and online courses.

The core job of data analysts in any company is to study customer data thoroughly in order to offer customers the best service, and to produce statistics that help service providers understand customer behaviour.

Data analyst roadmap for 2023

Learning to program is the first step on the data analysis journey. Knowledge of computer science also helps, especially databases and SQL.

Along the way we will mention the resources you need to become a data analyst.

This roadmap is your guide to learning the skills of a successful data analyst in 2023. It covers the basic learning steps in a simple, understandable way. If you think other tools should be added to it, we would be glad to hear about them in the comments; your opinion matters to us. We now turn to the important resources in this roadmap.

1. Learn Python

There is no doubt that learning Python is the ideal start to learning data analysis. Writing code in this language is a cornerstone of data analysis jobs, since data analysis and visualization packages integrate fully with Python. There is also a large community of Python users that helps you find solutions to the problems you may run into at work, which in turn means a large number of online Python courses. Here we recommend the Python specialization from Coursera, through which you can reach an intermediate level in Python within three months at most.

Python for Everybody

Coursera offers a very useful course for beginners in Python. It starts with the basics of the language, then moves on to the web and to interacting with databases.

By learning Python you will have come a long and important way in learning data analysis. From there we can move on to the other things worth learning after Python.

2. Data visualization and processing

A data analyst needs to be thoroughly familiar with data visualization, because the job involves turning raw data into charts that make it clearer.

You therefore need to learn the visualization and data processing libraries. We cover some of them below and explain how their tools and features differ.

Numpy Library

This library is built around arrays (matrices) and arithmetic operations on them. It is widely used among data analysts and is recommended as the first library to learn.

Pandas Library

Dedicated to importing and modifying data, which you will need when analyzing and cleaning it.

Matplotlib Library

This library is open source, which has helped make it the most popular among data analysts. That means there is a large community of users you can turn to when you run into problems. It also offers a very wide range of charts to work with.

Seaborn Library

It differs from Matplotlib in that it provides a wide variety of plots that can be customized to suit your requirements, and it is easy to learn.

Tableau

Just import your data into this tool, then unleash your imagination and start customizing your visualizations: it lets you visualize data without having to learn any programming language.

3. Learn statistics

Statistics skills improve a data analyst's job prospects. Statistics matters because it lets you work with large amounts of data in depth: you will need to make predictions, and take decisions, based on the statistical results for that data. We recommend this Coursera course for beginners in statistics, which starts from the basics of sampling, distributions, probability, regression, and so on.

Conclusion:

Have you noticed how simple this roadmap is? You can rely on it to become an experienced data analyst. Of course, Python is not the only programming language worth learning; you can learn others, such as R. Still, it is widely agreed that Python is an excellent fit for data analysis, without discounting the importance of other languages.

With our best wishes for success

Here are some important learning resources:

We hope this article has offered ideas that benefit data analysts. Do not forget to share in the comments any ideas you think would add more value to this roadmap. We look forward to hearing from you.

Mistakes You Might Make As A Beginner Programmer

Posted on February 17, 2023 by s4l8384gmailcom

As a beginner in programming, you are bound to make some mistakes, as with any new start in a field. This is perfectly normal. Like other disciplines that open the door to modern technology, programming is an important skill that must be mastered thoroughly, which means avoiding the mistakes novice programmers often make. We highlight them in this article:

1) Haste and lack of concentration in writing the code:

There is no way to get correct, accurate code that works for small and large applications alike unless it was planned beforehand with focus and care. Preparing code involves several stages, each of which deserves attention, in this order: thinking, research, planning, writing, verification, and modification if necessary.

Programming is not just a matter of writing lines of code; it is a craft that requires skill and creativity grounded in logic.

2) Not preparing an appropriate plan before commencing writing code:

The absence of a general plan is one of the main causes of distraction when writing code. Planning should not be excessive, though: you do not need a perfect plan that consumes your time and effort. A simple idea that lets you start in the right direction is enough. You may still have to change the plan as you work, but at least you will have laid a sound foundation to rely on, whether you continue the work or amend it.

Planning this way makes it easier to respond to what the situation requires, such as adding or removing features you had not thought of at first, or fixing a defect somewhere. It teaches you to be flexible in programming and ready to deal with any unexpected circumstance.

3) Neglecting code quality:

Code quality is one of the most important pillars of writing correct code. Code is good when it is clear and readable. Otherwise, it turns into meaningless symbols.

Moreover, clear code is the best route to code that executes correctly, and producing such code is the programmer's primary task.

Even the smallest defect can keep code from working properly. For example, inconsistent indentation and capitalization make code hard to read, and in a case-sensitive language a mis-capitalized name stops the code from working, as in the example:
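
A small sketch of the capitalization problem in JavaScript, which is case-sensitive. The names `userName`, `greet`, and `greetBroken` are made up for illustration and do not come from the original article.

```javascript
// JavaScript identifiers are case-sensitive: userName and username
// are two different names.
const userName = "Sara";

function greet() {
  // Consistent indentation and the exact capitalization used in the
  // declaration above.
  return "Hello, " + userName;
}

function greetBroken() {
  // Inconsistent capitalization: `username` was never declared, so this
  // line throws a ReferenceError when the function runs.
  return "Hello, " + username;
}
```

Inconsistent indentation would not stop this code from running, but it would make a slip like the one in `greetBroken` much harder to spot.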

Also, long lines are usually difficult to read, so you should avoid exceeding 80 characters in each line of code.

To avoid such errors, you can use the linting and formatting tools available for JavaScript (ESLint and Prettier are two well-known ones). They fix what can be fixed automatically and spare you problems that are hard to untangle later.

The best way to maintain the quality of your code is to know the most common mistakes and work to avoid them, including:

• Too many lines in a file or function. Breaking long code into smaller parts makes it easier to test each part separately

• Unclear variable names that are too short or that only describe a type

• Hard-coding strings and raw numbers without describing them. To avoid this, put such values in a constant and give it a meaningful name

• Wasting time on simple problems that could be handled with a little skill and the appropriate shortcuts

• Neglecting more readable alternatives, for example by overusing conditional logic
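
The point about raw numbers can be sketched as follows. The session-timeout scenario and every name in it are assumptions chosen for illustration.

```javascript
// Before: a raw number with no explanation of what it means.
function isSessionExpiredUnclear(elapsedMs) {
  return elapsedMs > 1800000;
}

// After: the same value lives in a constant with a descriptive name.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes in milliseconds

function isSessionExpired(elapsedMs) {
  return elapsedMs > SESSION_TIMEOUT_MS;
}
```

Both functions behave the same way; the second one simply tells the reader what the number stands for and gives a single place to change it.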

4) Haste to use the first solution:

This happens when a novice programmer, looking for a way out of a problem, rushes to use the first solution that works without considering the complications it may cause. Those complications can get in the way of sound programming and lead to failure. The first solution is not necessarily the right one.

Therefore, it is better to discover several solutions and choose the most appropriate one. Here, a very important point should be noted, which is that if you do not come up with several solutions to a problem, you are most likely unable to identify the problem accurately.

The evidence of the programmer’s skill lies in his choice of the simplest solution to address the problem, and not in his escape to the first solution he reaches in order to get rid of the problem immediately.

5) Sticking to the idea of the first solution:

Do not cling to the first solution, even if letting go of it costs you more effort. As soon as you doubt that a solution is correct, throw away the bad code and try to understand the problem again more precisely. Always remember that the skill lies in finding a simple solution that makes it easier to take the right decisions about the problem. Source control tools such as Git can also help, since they make it easy to try alternatives and discard them.

6) Relying on Google:

Beginner programmers often turn to Google to solve the problems they run into while writing code. The problem they face has probably been faced by many before them, so a solution usually already exists, and on the face of it this saves time. But have you considered whether that solution, in the form of a line of code, actually fits your situation? Be very careful never to use a line of code you do not understand, even if it looks like the solution to your problem.

7) Not using encapsulation:

Encapsulation protects the variables in an application by hiding internal properties while still letting other code benefit from them. It is useful, for example, for making safe changes to the internals of a function without affecting other parts of the program. Neglecting encapsulation often makes systems harder to maintain.
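
One common way to get this kind of hiding in JavaScript is a closure. The sketch below is only an illustration; `createCounter` and its methods are hypothetical names.

```javascript
// `count` is hidden inside the closure and can only be changed
// through the methods that createCounter returns.
function createCounter() {
  let count = 0; // not reachable from outside this function

  return {
    increment() {
      count += 1;
      return count;
    },
    current() {
      return count;
    },
  };
}

const counter = createCounter();
counter.increment();
counter.increment();
console.log(counter.current()); // 2
console.log(counter.count);     // undefined: the variable is not exposed
```

Because callers never touch `count` directly, the way it is stored can change later without breaking the code that uses the counter.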

8) Wrong view of the future:

A programmer needs foresight and should consider the possibilities for each next step when writing code; this helps with testing edge cases. But do not let that foresight lead you to implement anticipated needs by writing code you do not need now on the assumption that you might need it later. As far as possible, write only the code you need today.

9) Using the wrong data structure:

Determining the strengths and weaknesses of the data structures used in the programming language is evidence of the programmer’s skill and experience in this field. This point can be illustrated by some practical examples:

In JavaScript, the most commonly used list structure is the array, and the most commonly used map structure is the object.
To manage a list of records where each record has an identifier used to look it up, use maps (objects) rather than lists (arrays). A list of plain scalar values is still best kept in an array, especially if the main goal is to push values onto the list.
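
A short sketch of the difference, using made-up user records:

```javascript
// Hypothetical records, each with a unique id.
const usersList = [
  { id: 101, name: "Amal" },
  { id: 102, name: "Omar" },
  { id: 103, name: "Lina" },
];

// With an array, finding a record means scanning the list.
const fromList = usersList.find((user) => user.id === 102);

// With an object used as a map, the id is the key, so the lookup is direct.
const usersById = {};
for (const user of usersList) {
  usersById[user.id] = user;
}
const fromMap = usersById[102];

// A list of plain values that only grows is still a good fit for an array.
const visitedPages = [];
visitedPages.push("/home");
visitedPages.push("/pricing");
```

JavaScript also has a built-in `Map` type that serves the same purpose as the plain object here.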

10) Turning your code into a mess:

If some code is making the codebase messy and inconsistent, deal with it immediately and clean up the resulting clutter, as in the following cases:

Duplicate code: This occurs when code is copied and pasted elsewhere in order to change only a line, which repeats the same logic in several places and makes the code messy and inconsistent.
Neglecting the configuration file: If a value is used in several places in the code, it belongs in a configuration file, and any new value you introduce into the code probably belongs there as well.
Unnecessary conditional statements (if): A conditional statement introduces logic with at least two branches. Avoid conditions that are not needed, while keeping the code readable. In particular, extending a function with branching logic inside an if statement, rather than introducing another function, creates unnecessary clutter and should be avoided where possible. To illustrate the point about the if statement, consider this code:
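
A minimal sketch of the kind of code being described, assuming an `isOdd` function written with an explicit if/else:

```javascript
function isOdd(number) {
  if (number % 2 === 1) {
    return true;
  } else {
    return false;
  }
}
```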

The isOdd function has a few problems, but have you noticed the most obvious one?

The if statement is unnecessary. Here is the equivalent code:

11) Commenting on the obvious:

Even if it seems difficult at first, avoid as far as possible commenting on things that are already clear and obvious. Most such comments can be replaced with well-named elements in the code.
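
A sketch of the shorter form, assuming the same `isOdd` function: the comparison already produces a boolean, so it can be returned directly.

```javascript
function isOdd(number) {
  return number % 2 === 1;
}
```

One of the less obvious problems remains in both forms: in JavaScript `-3 % 2` evaluates to `-1`, so this check reports negative odd numbers as not odd.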

For clarification, see the example with additional comments:
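
A sketch of what over-commented code can look like. The function and its short names (`sum`, `val`, `a`, `b`) are illustrative assumptions.

```javascript
// This function sums only odd numbers in an array
const sum = (val) => {
  return val.reduce((a, b) => {
    if (b % 2 === 1) { // If the current number is odd
      a += b;          // Add current number to accumulator
    }
    return a;          // The accumulator
  }, 0);
};
```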

Notice the difference when the same code is written without comments:

As you can see, using descriptive names is more effective than adding unimportant comments.

This rule should not be generalized to all programming, however. In some cases the code cannot be made clear without comments. In those cases, structure your comments to answer why the code exists rather than what it does. Even those who prefer to include comments should avoid stating the obvious. To make the idea concrete, consider this example of unnecessary comments:
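
A sketch of the same logic with descriptive names doing the work of the comments. The names `isOdd`, `sumOddValues`, `accumulator`, and `currentNumber` are chosen for illustration.

```javascript
const isOdd = (number) => number % 2 === 1;

const sumOddValues = (array) => {
  return array.reduce((accumulator, currentNumber) => {
    if (isOdd(currentNumber)) {
      return accumulator + currentNumber;
    }
    return accumulator;
  }, 0);
};
```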

12) Not including tests in your code:

Some programmers think they do not need to write tests. Most likely they test their programs manually, perhaps out of excessive confidence. Manual testing is not bad in itself: to work out how to test something automatically, you first have to test it manually.

Once you have successfully tested an interaction with your application by hand, go back to the code editor and write code that performs the same interaction automatically the next time you change the project.

Your memory may also fail you: you will not remember to repeat every successful check after each change to the code, so hand this task to the computer. If you can, start by thinking through or writing your checks even before writing the code that satisfies them. Test-driven development (TDD) is not for everyone, but it positively influences your style and guides you toward a better design.

13) Assuming that if things work, they are right:

Consider a function that implements sumOddValues. Does it have an error?

The code above is incomplete. It handles certain cases properly, but it has many problems, including:

First problem: empty input is not handled.

Calling the function without arguments produces an error that exposes how the function is implemented.
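
A sketch of the kind of function under discussion, assuming it sums the odd values of an array with `reduce` and no initial value:

```javascript
const sumOddValues = (array) => {
  return array.reduce((accumulator, currentNumber) => {
    if (currentNumber % 2 === 1) {
      return accumulator + currentNumber;
    }
    return accumulator;
  });
};

console.log(sumOddValues([1, 2, 3, 4, 5])); // 9
```

It gives the expected answer for this input, which is exactly why the remaining problems are easy to miss.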

That is usually a sign of bad code, for two main reasons:

• Users of your function should not be shown details of its implementation

• The error does not help the user. If your function does not work for them because they used it incorrectly, the error should say so clearly. For example, you can have the function throw an exception of its own that is addressed to the user.

Better yet, you can avoid the error altogether by programming your function to ignore empty input

Second problem: invalid input is not handled

See what the function throws if it is called with an object, a string, or an integer instead of an array:

That is unfortunate, because array.reduce definitely is a function

Because we named the function's argument array, anything you call the function with (42 in the example above) is labeled array inside the function. The error is really saying that 42.reduce is not a function

The error would be more helpful if it said plainly that the value passed in is not an array

The two problems above are minor errors that you should learn to avoid by instinct. There are also cases that take thought and planning, for example what happens if we pass negative values. In JavaScript the remainder of a negative odd number divided by 2 is -1, not 1, so a check of the form n % 2 === 1 silently skips negative odd numbers

If ignoring negative numbers is the intended behaviour, the function should perhaps have been named sumPositiveOddNumbers

Third problem: not all valid cases are tested, because some edge cases are forgotten.

For example, when the array starts with 2, the 2 is included in the sum even though it should not be

This happens because reduce was not given an initial value, so it uses the first value in the collection (2 in this example) as the initial value of the accumulator. The solution is to pass reduce a second argument to serve as the initial value of the accumulator

This is where tests prove their worth: you would probably have discovered the problem right away had you written tests alongside the code
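
The two options can be sketched as follows. The function names and the wording of the error message are assumptions made for illustration.

```javascript
// Option 1: fail loudly with a message addressed to the caller.
const sumOddValuesStrict = (array) => {
  if (array === undefined || array === null) {
    throw new TypeError("sumOddValues expects an array, but got nothing");
  }
  return array.reduce(
    (sum, value) => (value % 2 === 1 ? sum + value : sum),
    0
  );
};

// Option 2: treat a missing argument as an empty list.
// A default parameter only applies when the argument is undefined,
// so passing null would still fail here.
const sumOddValuesLenient = (array = []) => {
  return array.reduce(
    (sum, value) => (value % 2 === 1 ? sum + value : sum),
    0
  );
};
```

Which option is right depends on whether a missing argument is a caller's bug that should surface or a normal case that should be tolerated.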
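
A sketch of one way to produce a clearer error, using the standard `Array.isArray` check. The message text is an assumption.

```javascript
const sumOddValues = (array) => {
  if (!Array.isArray(array)) {
    // Tell the caller what went wrong in terms of their input,
    // not in terms of the function's internals.
    throw new TypeError(
      "sumOddValues expects an array, but received: " + typeof array
    );
  }
  return array.reduce(
    (sum, value) => (value % 2 === 1 ? sum + value : sum),
    0
  );
};
```

Calling `sumOddValues(42)` now reports that a number was received where an array was expected, instead of complaining about `reduce`.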
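
The initial-value problem and its fix can be sketched side by side. The two function names are hypothetical.

```javascript
// Without an initial value, reduce starts the accumulator at array[0],
// so an even first element is added without ever being checked.
const sumOddValuesBuggy = (array) =>
  array.reduce((acc, n) => (n % 2 === 1 ? acc + n : acc));

// Passing 0 as the second argument makes every element go through the check.
const sumOddValuesFixed = (array) =>
  array.reduce((acc, n) => (n % 2 === 1 ? acc + n : acc), 0);

console.log(sumOddValuesBuggy([2, 1, 3, 4, 5])); // 11 (the 2 is counted)
console.log(sumOddValuesFixed([2, 1, 3, 4, 5])); // 9
```

A single test with an even number in first position would have caught the difference.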

14) Trusting existing code too much

Some code may look sound to novice programmers, so they reuse it with confidence, not knowing that it may actually be bad code that was written that way only because the developer was forced to. This causes problems for beginners, which is why developers should add a comment, aimed at beginners, explaining why the code was written that way

As a beginner, question any code you want to reuse from elsewhere until you understand what it does and why it exists, so that you avoid mistakes you could do without.

15) Obsessing about best practices

Despite the name, best practices are not always best. The problem arises when a novice programmer devotes most of their attention to following best practices, or at least the practices they believe to be best, and ignores cases that call for departing from some of the basic rules. Such situations are a challenge, and only good judgment and the skill you develop by dealing with them will get you through.

16) Obsessing about performance

It is good to be careful from the very first line of code and to draw on what you know to avoid mistakes. But the urge to optimize performance before you start should not be exaggerated. Good judgment is what helps you decide whether a situation really calls for optimizing up front, or whether the optimization would be an unjustified waste of time and effort.

17) Not targeting the end-user experience

A successful programmer always puts themselves in the user's place and looks at the application they designed or developed from the user's point of view. Thinking about how users will find and use a new feature, rather than simply appending it to an existing list of links, goes a long way toward better results

18) Not choosing the right tool for the job

Every programmer has preferred methods and tools. Some are good, some less good, and some bad, but in general the quality of a tool depends on where it is used: a tool that is good for one job may be bad for another.

Novice programmers often prefer the most popular tools regardless of how well they suit the task at hand. To move to a higher level, a programmer must choose tools based on how well they handle the job that calls for them in the first place. That brings more openness and better judgment, and cures a problem many suffer from: clinging to familiar tools in every situation.

19) Data problems caused by code errors

Data is the foundation on which programs are built; a program is essentially an interface for adding new records or deleting old ones. Even a small bug in the code can therefore lead to unexpected defects in the data. Novice programmers run into this when they rely on code that appears to pass validation tests, or when they assume a broken feature is unimportant. The problem gets worse when the faulty code keeps introducing data problems that go unnoticed from the start and accumulate until the correct state can no longer be restored. To avoid this, you can use multiple layers of data validation, or at the very least database constraints. These are the constraints to know when adding tables and columns to your database:

A NOT NULL constraint on a column means that null values are rejected for that column.
A UNIQUE constraint on a column means that duplicate values are rejected across the whole table. It suits, for example, the username or e-mail column of a users table.
A CHECK constraint is a custom expression that must evaluate to true for the data to be accepted. It suits, for example, a percentage column whose values must be integers from 0 to 100.
A PRIMARY KEY constraint means that the column's values must be both non-null and unique. Each table in the database should have a primary key to identify its records.
A FOREIGN KEY constraint means that the column's values must match the values in a column of another table, which is usually that table's primary key.

Another common data integrity problem for beginners is the wrong handling of transactions. If a group of related operations changes the same data source, wrap them in a transaction so that they can all be rolled back if one of them fails.

20) Reinventing the wheel

In programming, things change constantly and quickly, and requirements arrive faster than a team can keep up with them. Software "wheels" change in the same way, so you may not find exactly what you need in an existing one, and inventing a new wheel can seem inevitable. In most cases, though, if a standard wheel design meets your needs, it is best not to design a new one

There are many software wheels available online. You can try them before you choose one, they are usually free, and you can inspect their internal design
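
The constraints above are declared in the database itself. As one of the additional validation layers mentioned earlier, an application-level check can mirror some of them. The sketch below is only an illustration in JavaScript: the record shape, the field names `email` and `progressPercent`, and the in-memory set standing in for a table are all assumptions, and a check like this complements database constraints rather than replacing them.

```javascript
// Hypothetical stand-in for the e-mail values already stored in a table.
const existingEmails = new Set(["amal@example.com"]);

function validateUser(record) {
  const errors = [];

  if (record.email === null || record.email === undefined) {
    // Mirrors NOT NULL: the field must be present.
    errors.push("email must not be null");
  } else if (existingEmails.has(record.email)) {
    // Mirrors UNIQUE: no two rows may share the same value.
    errors.push("email must be unique");
  }

  // Mirrors CHECK: a custom expression that must evaluate to true.
  const percent = record.progressPercent;
  if (!Number.isInteger(percent) || percent < 0 || percent > 100) {
    errors.push("progressPercent must be an integer from 0 to 100");
  }

  return errors;
}
```

An empty array from `validateUser` means the record passed this layer; the database constraints remain the last line of defence.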

21) A negative attitude toward code reviews

Beginner programmers often take a negative attitude toward code reviews, seeing them as criticism. If that is your attitude, change it completely and make the most of code reviews: they are your chance to learn and gain experience, and everything new you learn will be of practical value to you

Looked at more broadly, reviewers sometimes get things wrong too, and then you are the one who corrects them. A code review is thus an opportunity both to teach and to learn, which is something to be proud of as a programmer on the way to becoming a professional.

22) Not using source control

A mistake some novice programmers make is underestimating the power of a source control system, perhaps because they believe source control is only about publishing their changes for others to build on. It goes far beyond that. Commit messages communicate what you implemented, and they help future maintainers of your code understand how it reached its current state

Another benefit of source control is access to features such as staging changes, applying changes selectively, stashing, resetting, amending, and many other tools that are valuable to your coding flow.

23) Overusing shared state

Shared state is a source of problems and should be avoided where possible, or at least kept to a minimum. The more global the scope, the worse the shared state becomes, so keep new state in narrow scopes and make sure it does not leak upward
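
A small sketch of the difference between global state and narrowly scoped state. All names are made up for illustration.

```javascript
// Shared (global) state: any code anywhere can change `total`,
// so a wrong value is hard to trace back to its cause.
let total = 0;
function addToGlobalTotal(amount) {
  total += amount;
}

// Narrow scope: the state lives inside the function that needs it
// and never leaks outward.
function sumAmounts(amounts) {
  let localTotal = 0;
  for (const amount of amounts) {
    localTotal += amount;
  }
  return localTotal;
}
```

`sumAmounts` always gives the same result for the same input, while the result of reading `total` depends on every earlier call made anywhere in the program.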

24) Not treating mistakes as useful

Many people hate seeing small red error messages while programming. In fact, errors are a sign that you are making progress: they teach you about faults that happen even to professional programmers, so that you can guard against them in the future.

25) Continuous and prolonged exhaustion

Novice programmers are often obsessed with finishing the work they have been given as soon as possible, whatever the cost. This drives them to work for long stretches and forget that they need rest. Long hours of sitting and thinking cause fatigue, and after many hours of work a programmer often reaches a point where they can no longer think through even the simplest things. Taking a break is necessary to restore mental energy and balance.

Mistakes You Might Make As A Beginner Programmer

As a beginner in programming you are bound to make some mistakes, which usually come with any fresh start in a field, and that is perfectly normal. Like other disciplines that open the door to the world of modern technology, programming is one of the most important skills to master fully, which means avoiding the mistakes beginner programmers often make. We highlight them in this article

1) Haste and lack of concentration when writing code

There is no way to get correct, accurate code that works for small and large applications unless it was planned beforehand with focus and care. Preparing code involves important stages, each of which deserves attention, in this order: thinking, research, planning, writing, verification, and modification if necessary

Programming is not merely a book of instructions; it is a craft that requires skill and creativity grounded in logic

2) Not preparing a suitable plan before starting to write code

The absence of a general plan is one of the main causes of distraction. Planning should not be excessive, though: you do not need a perfect plan that consumes your time and effort. A simple idea that lets you start in the right direction is enough. You may still have to change the plan as you work, but at least you will have laid a sound foundation to rely on, whether you continue the work or amend it

Planning this way makes it easier to respond to what the situation requires, such as adding or removing features you had not thought of, or fixing a defect somewhere. It teaches you to be flexible in programming and ready to deal with any unexpected circumstance

3) Neglecting code quality

Code quality is the most important pillar of writing correct code. Code is good when it is clear and readable; otherwise it turns into meaningless symbols

Moreover, clear code is the best route to code that executes correctly, and producing such code is the programmer's primary task

Even the smallest defect can keep code from working properly. For example, inconsistent indentation and capitalization stop code from working, as shown in the example

Long lines are also usually hard to read, so avoid exceeding 80 characters per line of code

To avoid such errors, you can use the linting and formatting tools available for JavaScript. They fix what can be fixed and spare you problems that are hard to untangle

The best way to maintain the quality of your code is to know the most common mistakes and work to avoid them, including

• Too many lines in a file or function. Breaking long code into smaller parts makes it easier to test each part separately

• Unclear variable names that are too short or tied to a particular type

• Hard-coding strings and raw numbers without describing them. To avoid this, put such values in a constant and give it a suitable name

• Wasting time on simple problems that could be handled with a little skill and the appropriate shortcuts

• Neglecting more readable alternatives, for example by overusing conditional logic

4) Rushing to use the first solution:

This happens when a novice programmer, looking for a way out of a problem, rushes to use the first solution that works without considering the complications it may cause, which can get in the way of sound programming and lead to failure. The first solution is not necessarily the right one

It is better to explore several solutions and choose the most suitable. One important point: if you cannot come up with several solutions to a problem, you have most likely not defined the problem precisely

A programmer's skill shows in choosing the simplest solution to a problem, not in escaping to the first solution found just to be rid of the problem

5) Clinging to the first solution

Do not cling to the first solution, even if letting go of it costs you more effort. As soon as you doubt that a solution is correct, throw away the bad code and try to understand the problem again more precisely. Always remember that the skill lies in finding a simple solution that makes it easier to take the right decisions. You can also use source control tools such as GIT, which offer many useful options

6) Relying on Google:

Beginner programmers often turn to Google to solve the problems they run into while writing code. The problem they face has probably been faced by many before them, so a solution usually exists, and on the face of it this saves time. But have you considered whether that solution, in the form of a line of code, actually fits your situation? Be very careful never to use a line of code you do not understand, even if it looks like the solution to your problem

7) Not using encapsulation

Encapsulation protects the variables in an application by hiding properties while still allowing them to be used. It is useful, for example, for making safe changes to the internals of functions without affecting other parts. Neglecting encapsulation often makes systems harder to maintain

8) A mistaken view of the future

A programmer needs foresight and should consider the possibilities for each next step when writing code; this helps with testing edge cases. But do not let that foresight lead you to implement anticipated needs by writing code you do not need now on the assumption that you might need it later. As far as possible, write only the code you need today

9) Using the wrong data structure

Knowing the strengths and weaknesses of the data structures in a programming language is a sign of a programmer's skill and experience. Some practical examples illustrate the point

In JavaScript, the most commonly used list structure is the array, and the most commonly used map structure is the object

To manage a list of records where each record has an identifier used to look it up, use maps (objects) rather than lists (arrays). A list of plain scalar values is still best kept in an array if the goal is to push values onto the list

10) Turning code into a mess

When a piece of code makes the codebase messy and inconsistent, deal with it immediately and clean up the mess, as in the following cases.

Duplicated code: this happens when code is copied and pasted elsewhere in the program, and the repetition makes the code messy and inconsistent.

Not using a configuration file: if a value is used in several places in the code, that value belongs in a configuration file, and any new value added to the code should go there as well.

Avoid unnecessary conditional (if) statements:

A conditional statement is logic that branches into at least two possibilities, so avoid conditions that are not needed whenever you can do so without hurting readability. The point is that extending a function with sub-logic behind an `if` statement, instead of introducing another function, creates unnecessary mess and should be avoided as far as possible.

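To illustrate the point about `if` statements, the original post shows an `isOdd` function as a code screenshot, which is not preserved here.

The `isOdd` function has a problem of its own, but did you notice a more obvious one? It contains an unnecessary `if` statement, and the whole function can be written as an equivalent single expression.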
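Since the screenshots are missing, the following is a plausible reconstruction of the `isOdd` example rather than the post's exact code. The function names `isOddVerbose` and `isOddFixed` are introduced here for illustration.

```javascript
// Version with an unnecessary if statement.
function isOddVerbose(number) {
  if (number % 2 === 1) {
    return true;
  } else {
    return false;
  }
}

// Equivalent: the comparison already produces a boolean.
function isOdd(number) {
  return number % 2 === 1;
}

// Both versions share a subtler problem: in JavaScript, -3 % 2 is -1,
// so negative odd numbers are reported as not odd.
// Taking the absolute value of the remainder is one way to handle that.
function isOddFixed(number) {
  return Math.abs(number % 2) === 1;
}

console.log(isOdd(3));       // true
console.log(isOdd(-3));      // false
console.log(isOddFixed(-3)); // true
```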

11) Commenting on the obvious

Even if it seems hard at first, avoid commenting on things that are clear and obvious wherever you can; such comments can usually be replaced with well-named elements in the code. The original post contrasts an example carrying extra comments with the same code written without them (both screenshots are missing here).

Well-chosen names, then, are more useful than unimportant comments. This should not be turned into an absolute rule, though: in some cases the code is not clear without a comment. In those cases, structure your comments to explain why the code exists rather than what it does. Even those who prefer writing comments are advised not to state the obvious.
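The contrast between comments and names can be sketched as follows. This is an illustrative reconstruction under the assumption that the missing screenshots showed a function summing odd numbers; it is not the post's verbatim code.

```javascript
// Over-commented version: every comment repeats what the code already says.
// This function sums only odd numbers in an array
const sum = (val) => {
  return val.reduce((a, b) => {
    if (b % 2 === 1) { // If the current number is odd
      a += b;          // Add current number to accumulator
    }
    return a;          // The accumulator
  }, 0);
};

// Same logic, with descriptive names doing the work of the comments.
const isOdd = (number) => number % 2 === 1;

const sumOddValues = (array) => {
  return array.reduce((accumulator, currentNumber) => {
    if (isOdd(currentNumber)) {
      return accumulator + currentNumber;
    }
    return accumulator;
  }, 0);
};

console.log(sum([1, 2, 3, 4, 5]));          // 9
console.log(sumOddValues([1, 2, 3, 4, 5])); // 9
```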

12) Not writing tests

Some programmers believe they do not need to write tests for their code and most likely test their programs by hand, perhaps out of overconfidence. Manual testing is not bad in itself: even if you want to automate a test, you first have to work out how to perform it manually.

But if you have successfully tested an interaction with your application and want the same interaction checked automatically next time, go back to your code editor and add code that performs it. Your memory may fail you when it comes to repeating every successful check after each code change, so hand that job to the computer. All you need to do is start by guessing or designing your checks, even before you write the code.

Test-driven development (TDD) is not for everyone, but it has a positive effect on the way you work and guides you toward a better design.

13) Do you think everything is working perfectly?

Consider a function that implements `sumOddValues` (shown as a screenshot in the original post, not preserved here).

Does it contain a mistake?

Did you notice that the code is incomplete? Although it handles certain cases correctly, it has several problems, including the following.

Problem 1: where is the handling of empty input?

Calling the function without arguments raises an error that exposes how the function is implemented.

There are two reasons why this is a sign of bad code.

Users of your function should not see its implementation details. And if the function did not work for its users because they used it incorrectly, the error should say so clearly; for example, you can have the function throw an exception that points the user to the mistake.

Better still, avoid raising an error at all by designing your function to ignore empty input.

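Problem 2: invalid input is not handled

Consider what the function throws when it is called with an object, a string, or an integer instead of an array. Called with 42, it fails with an error saying that `array.reduce` is not a function, even though `array.reduce` certainly is a function.

Whatever you call the function with (42 in this example) is named `array` inside the function, because that is the name we gave its parameter. So the error is really saying that `42.reduce` is not a function.

An error stating directly that 42 is not an array would have been more helpful.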

Note that the two problems above are basic errors that should be avoided as a matter of course. Beyond them, there are cases that need thought and planning, such as what happens when negative values are passed in.

Given how the function treats negative values, it should have been named `sumPositiveOddNumbers`, so that the name matches what it actually does.

Problem 3: not all valid cases are tested, because some exceptional cases were forgotten. Here is a simple, perfectly valid case that the function does not handle correctly: when the array starts with 2, the number 2 is included in the sum although it should not be.

This happens because `reduce` was called without an initial value, so it used the first value in the collection as the initial value of the accumulator, which in this example is 2. The fix is to pass `reduce` a second argument to use as the initial value of the accumulator.

This is why tests are necessary: even though you might have spotted the problem while writing the code, tests that run alongside your other checks make sure it is caught.
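The three problems can be sketched in one place. The naive version below is reconstructed from the description in this section, and the hardened version is only one possible fix; the name `sumOddValuesNaive` and the error message wording are assumptions made for illustration.

```javascript
// Naive version: no input checks and no initial value for reduce.
const sumOddValuesNaive = (array) =>
  array.reduce((accumulator, currentNumber) =>
    currentNumber % 2 === 1 ? accumulator + currentNumber : accumulator
  );

// Hardened version addressing the three problems in turn.
const sumOddValues = (array) => {
  if (array === undefined) {
    return 0; // Problem 1: ignore empty input instead of leaking internals
  }
  if (!Array.isArray(array)) {
    // Problem 2: report the usage mistake in the caller's terms
    throw new TypeError(`${String(array)} is not an array`);
  }
  return array.reduce(
    (accumulator, currentNumber) =>
      currentNumber % 2 === 1 ? accumulator + currentNumber : accumulator,
    0 // Problem 3: explicit initial value for the accumulator
  );
};

console.log(sumOddValuesNaive([2, 1, 3, 4, 5])); // 11, the leading 2 leaks into the sum
console.log(sumOddValues([2, 1, 3, 4, 5]));      // 9
console.log(sumOddValues([1, 2, 3, 4, 5, -13])); // 9, negative odd values are skipped
```

The last call shows the negative-number behavior discussed above: `-13 % 2` is `-1` in JavaScript, so the function silently sums only positive odd values.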

14) Trusting existing code too much

Some code looks useful to beginner programmers, so they reuse it confidently in their own work without realizing that it may actually be bad code that exists only because its developer was forced to write it that way. This causes problems for beginners, so developers should add a comment aimed at newcomers that explains why the code was written as it is.

As a beginner, treat any code you want to reuse from elsewhere with suspicion until you understand what it does and why it exists, so that you avoid mistakes you could do without.

15) Obsessing over best practices

Despite their name, "best practices" do not always live up to it. Trouble arises when a beginner focuses entirely on following best practices, or at least what they consider best practices, and ignores cases that call for departing from some basic rule of programming. Some situations will present a challenge that only good judgment and skill can get you through, and you develop both by dealing with such situations.

16) Obsessing over performance

To get rid of the fear of making mistakes while programming, be careful from the start: pay attention to every line of code and draw on your knowledge and skills to avoid errors. But concern for improving performance before you begin should not be overdone. Good judgment up front will help you decide whether the situation really calls for optimizing before you start, or whether optimization would be an unjustified waste of time and effort.

17) Not targeting the user's experience

A successful programmer always puts themselves in the user's place and looks at the application they designed or developed from the user's point of view. For example, if a feature captures information entered by the user, add it to the form you already have; if it adds a link to another page, add it to your existing menu of links. This helps a great deal in getting better results.

18) Not choosing the right tool for the job

Every programmer has favorite methods and tools: some good, some less good, and some bad. In general, though, a tool's quality depends on where it is used. The same tool can be good in one situation and bad in another.

A beginner usually prefers the most popular tools regardless of how well they serve the task at hand. To move on to higher levels of experience, a programmer must choose tools based on how well they handle the specific job that calls for them. Doing so makes the programmer more open-minded and better at exercising judgment, and frees them from a problem many suffer from: clinging to the tools they are used to in every situation.

19) Data problems caused by code errors

Data is the foundation on which programs are built; a program is essentially an interface for entering new records or deleting old ones. Even the smallest bug in the code can therefore corrupt the data in unexpected ways. Beginners fall into this when they use code that they believe has passed validation checks, assuming that a broken feature is unimportant. The problem gets worse when the faulty validation keeps introducing data problems that went unnoticed at first, so that they pile up until the original, correct state can no longer be restored.

To avoid this, use multiple layers of data validation, or at least use database constraints. The following constraints can be applied when adding tables and columns to your database:

* NOT NULL constraint

Applied to a column, it means that null values are rejected for that column, because the field is declared as not null in the database.

* UNIQUE constraint

Applied to a column, it means that duplicate values are rejected across the whole table. This constraint is ideal for a users table with a username or email field.

* CHECK constraint

A custom expression that must evaluate to true for the data to be accepted. It is ideal for a percentage column whose values must lie between 0 and 100.

* PRIMARY KEY constraint

Every table in the database should have a key that identifies its records. The constraint implies that the column's values are both not null and unique.

* FOREIGN KEY constraint

It requires the column's values to match values in a column of another table, which is usually a primary key.
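Database constraints are declared in the database itself, but the idea of layered validation can also be sketched in application code. The example below is a minimal sketch that mirrors the NOT NULL, UNIQUE, and CHECK rules in JavaScript; the `users` rows and the `email` and `progressPercent` columns are hypothetical, and such a layer would complement the database constraints rather than replace them.

```javascript
// Validates a new row against an in-memory list of existing rows.
// Returns a list of human-readable errors; an empty list means the row is valid.
function validateUser(existingUsers, user) {
  const errors = [];

  if (user.email === null || user.email === undefined) {
    // Mirrors NOT NULL: the email must be present
    errors.push("email must not be null");
  } else if (existingUsers.some((row) => row.email === user.email)) {
    // Mirrors UNIQUE: no other row may carry the same email
    errors.push("email must be unique");
  }

  // Mirrors CHECK: the percentage must be an integer from 0 to 100
  const percent = user.progressPercent;
  if (!Number.isInteger(percent) || percent < 0 || percent > 100) {
    errors.push("progressPercent must be an integer from 0 to 100");
  }

  return errors;
}

const users = [{ email: "a@example.com", progressPercent: 40 }];
console.log(validateUser(users, { email: "b@example.com", progressPercent: 75 })); // []
console.log(validateUser(users, { email: "a@example.com", progressPercent: 120 }));
```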

Another common data-integrity problem among beginners is mishandling transactions. If a group of related operations needs to change the same data source, they should be wrapped in a transaction that can be rolled back if any one of the operations fails.
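The all-or-nothing behavior of a transaction can be illustrated with an in-memory sketch. This is not a database API: `runInTransaction`, `withdraw`, and `deposit` are hypothetical helpers, and the sketch assumes the data is plain JSON-compatible values so that it can be copied with `JSON.parse(JSON.stringify(...))`.

```javascript
// Applies every operation to a copy of the data and commits only if all succeed.
function runInTransaction(store, operations) {
  const draft = JSON.parse(JSON.stringify(store)); // work on a copy
  for (const operation of operations) {
    operation(draft); // a thrown error aborts before the commit below
  }
  Object.assign(store, draft); // commit: copy the changed values back
  return store;
}

const withdraw = (name, amount) => (data) => {
  if (data[name] < amount) {
    throw new Error(`insufficient funds in ${name}`);
  }
  data[name] -= amount;
};

const deposit = (name, amount) => (data) => {
  data[name] += amount;
};

const accounts = { alice: 100, bob: 20 };

// Both steps succeed, so the transfer is committed.
runInTransaction(accounts, [withdraw("alice", 30), deposit("bob", 30)]);
console.log(accounts); // { alice: 70, bob: 50 }

// The second step fails, so the first step's deposit is discarded as well.
try {
  runInTransaction(accounts, [deposit("bob", 500), withdraw("alice", 500)]);
} catch (error) {
  console.log(error.message); // insufficient funds in alice
}
console.log(accounts); // { alice: 70, bob: 50 }
```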

20) Reinventing the wheel

In programming, things change constantly and quickly, and services and requirements appear faster than any team can keep up with. Software "wheels" change like everything else, so as a programmer you may not find exactly what you need among the existing ones, and inventing a new wheel can then seem unavoidable. In most cases, however, if a standard wheel design meets your needs, it is better not to design a new one.

There are many software wheels available online. You can try them before committing to one, you can see their internal design, and they are free as well.

21) A negative attitude toward code reviews

Beginner programmers often take a negative view of code reviews, thinking of them as criticism. If you are a beginner who holds this view, change it completely and make the most of code reviews. They are your chance to learn and gain experience, and every new thing you learn has practical value in this field.

Looking at it more broadly, the reviewers may sometimes be wrong and you will be the one to correct them. You then have an opportunity both to teach and to learn, which is something to be proud of as a programmer on the way to becoming a professional.

22) Not using source control

One mistake some beginners make is underestimating the power of a source control system, perhaps because they think source control is only about pushing their changes for others to build on. It goes well beyond that: commit messages communicate what you implemented, and they help the maintainers of your code understand how it reached its current state.

Source control is also useful for features such as staging, selective patching, stashing, resetting, amending, and many other valuable tools for your coding workflow.

23) Reduce shared state as much as possible

Shared state is a source of problems and should be avoided where possible, or at least kept to an absolute minimum. The more global the scope, the worse the reach of that shared state becomes, so keep new state in narrow scopes and make sure it does not leak upward.
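A minimal sketch of the difference between global shared state and state kept in a narrow scope follows. The counter and the function names are hypothetical examples, not code from the original post.

```javascript
// Shared global state: any code in the program can read or overwrite it.
let requestCount = 0;

function handleRequestGlobal() {
  requestCount += 1;
  return requestCount;
}

// Narrow scope: the count lives inside the closure and is only reachable
// through the functions that createRequestCounter returns.
function createRequestCounter() {
  let count = 0;
  return {
    increment() {
      count += 1;
      return count;
    },
    current() {
      return count;
    },
  };
}

const counterA = createRequestCounter();
const counterB = createRequestCounter();
counterA.increment();
counterA.increment();
console.log(counterA.current()); // 2
console.log(counterB.current()); // 0, each counter has its own state
```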

24) Not treating errors as helpful

Many people hate seeing small red error messages while programming. In reality, errors show that you are learning and getting to know the weak points that trip up even professional programmers, so that you can avoid them in future. Those who make no mistakes do not learn, and an error message is not a sign of failure.

25) Constant exhaustion over long periods

A beginner programmer is haunted by the idea that the work must be finished whatever the cost and as fast as possible. This pushes them to work for long periods and forget that they need rest, and those long hours of sitting and thinking cause exhaustion. After many hours of work a programmer often reaches a point where they can no longer think and are stuck on even the simplest things. Taking breaks is therefore essential for restoring mental energy and balance.

A collection of tips to improve your data analysis skills

Posted on February 4, 2023 by s4l8384gmailcom

With scientific and technological progress, and especially the rapid development of data science and analytics, data analysts need enough experience to attract the companies that rely on data analysis to run their business. That expertise does not come overnight: data scientists spend a long time, work twice as hard, and seize every opportunity to learn on the way to becoming a data analyst or data engineer.

Analysis is the process of finding the most appropriate way to solve problems and process data.

So let's go through some ways to improve your data analysis skills:

Evaluate your skills:

Some numbers and results may mislead you after a marketing campaign. You might believe the conversion rate is 50%, for example, only to find later that the number of potential customers is small, so the percentage does not mean that the goal was achieved at the required rate.

The outcome depends on how the numerator and denominator of the percentage are chosen, so set them to match the real situation. For example, when the goal is real, the numerator can be increased, and if it is not intended, the denominator can be reduced.
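To make the point about misleading percentages concrete, the sketch below reports a conversion rate together with the counts behind it. The campaign figures and the `minVisitors` threshold are made-up assumptions, not values from the article.

```javascript
// Returns the conversion rate as a fraction, or null when there are no visitors.
function conversionRate(conversions, visitors) {
  return visitors === 0 ? null : conversions / visitors;
}

// Reports the rate together with its numerator and denominator.
function describeCampaign({ conversions, visitors }, minVisitors = 100) {
  const rate = conversionRate(conversions, visitors);
  const percent = rate === null ? "n/a" : `${Math.round(rate * 100)}%`;
  const note = visitors < minVisitors ? " (sample too small to trust)" : "";
  return `${percent}: ${conversions} of ${visitors} visitors${note}`;
}

// Two hypothetical campaigns with the same 50% rate but very different sizes.
console.log(describeCampaign({ conversions: 2, visitors: 4 }));
// 50%: 2 of 4 visitors (sample too small to trust)
console.log(describeCampaign({ conversions: 500, visitors: 1000 }));
// 50%: 500 of 1000 visitors
```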

Measuring growth rate and expectations:

Rely on a line chart that tracks the growth rate and shows whether expectations hold. Over time, sustaining a steady growth rate becomes harder, and reducing performance to a single percentage can hide the actual value of the work.

The 80/20 rule

The basic principle of this rule is to focus on the large share that accounts for 80% of the results and to manage it in a way that improves performance and keeps its course under flexible control. The rule can also serve as a starting point for reducing the budget spent on a project.
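One way to apply the rule is to find the smallest set of items that accounts for roughly 80% of a total. The sketch below does this for hypothetical product revenues; the data and the function name `topContributors` are assumptions for illustration.

```javascript
// Returns the names of the largest items whose combined value
// reaches the requested share of the total.
function topContributors(items, share = 0.8) {
  const total = items.reduce((sum, item) => sum + item.value, 0);
  const sorted = [...items].sort((a, b) => b.value - a.value);
  const selected = [];
  let running = 0;
  for (const item of sorted) {
    if (running >= share * total) {
      break;
    }
    selected.push(item.name);
    running += item.value;
  }
  return selected;
}

// Hypothetical revenue per product.
const revenue = [
  { name: "C", value: 8 },
  { name: "A", value: 55 },
  { name: "E", value: 3 },
  { name: "B", value: 30 },
  { name: "D", value: 4 },
];

console.log(topContributors(revenue)); // [ 'A', 'B' ]
```

Here two of five products produce 85 of the 100 revenue units, so they are the ones to focus on first.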

Take the MECE framework into account

MECE (mutually exclusive, collectively exhaustive) is a systematic approach to tackling problems that aims to cut down on cumbersome calculations that consume a lot of time and effort.

Three areas of MECE can be identified:

* Problem tree:

The benefit of this technique lies in breaking thorny, complex problems into parts, which makes them easier to solve. Put more simply, it relies on analyzing user behavior by certain categories (age, profession, gender, and so on).

* Decision tree:

It lays out decisions and their potential outcomes in detail as a diagram, which makes it easier to identify the relative pros and cons of each decision, estimate the commercial value of new plans, and then prioritize and order them.

* Probability tree:

It differs from the problem tree in that it organizes the hypotheses in more depth and gives more direct results.

Cohorts represent quality value:

Cohorts are groups that share certain features, such as a start date. They allow precise analysis by monitoring how consistently their members keep using your applications and websites.
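As a rough sketch of cohort analysis, the example below groups hypothetical users by the month they started and computes the share of each cohort that was still active in a later month. The user records and the function name `retentionByCohort` are assumptions.

```javascript
// Hypothetical users: the month they started and the months they were active.
const users = [
  { id: 1, startMonth: "2023-01", activeMonths: ["2023-01", "2023-02", "2023-03"] },
  { id: 2, startMonth: "2023-01", activeMonths: ["2023-01"] },
  { id: 3, startMonth: "2023-01", activeMonths: ["2023-01", "2023-03"] },
  { id: 4, startMonth: "2023-02", activeMonths: ["2023-02", "2023-03"] },
  { id: 5, startMonth: "2023-02", activeMonths: ["2023-02"] },
];

// For each start-month cohort, returns the fraction of users active in `month`.
function retentionByCohort(userList, month) {
  const cohorts = new Map();
  for (const user of userList) {
    const cohort = cohorts.get(user.startMonth) || { size: 0, retained: 0 };
    cohort.size += 1;
    if (user.activeMonths.includes(month)) {
      cohort.retained += 1;
    }
    cohorts.set(user.startMonth, cohort);
  }
  return Object.fromEntries(
    [...cohorts].map(([start, cohort]) => [start, cohort.retained / cohort.size])
  );
}

// Two of three January users and one of two February users were active in March.
console.log(retentionByCohort(users, "2023-03"));
```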

Avoid working with bad data:

Before starting any analysis, verify the quality of the data sets by reviewing and organizing their summary statistics, so that outliers are excluded and only sound data is used. You can confirm the final results by comparing them with a similar analysis.
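The article does not say how outliers should be detected. As one common option, the sketch below uses the interquartile range (IQR) rule with the usual 1.5 multiplier; the rule, the sample values, and the function names are assumptions made for illustration.

```javascript
// Linear-interpolation quantile of an already sorted, non-empty array.
function quantile(sorted, q) {
  const position = (sorted.length - 1) * q;
  const lower = Math.floor(position);
  const upper = Math.ceil(position);
  return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
}

// Keeps only the values inside [Q1 - k * IQR, Q3 + k * IQR].
function removeOutliers(values, k = 1.5) {
  if (values.length === 0) {
    return [];
  }
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return values.filter((v) => v >= q1 - k * iqr && v <= q3 + k * iqr);
}

// Hypothetical daily measurements with one suspicious value.
console.log(removeOutliers([10, 12, 11, 13, 12, 95])); // [ 10, 12, 11, 13, 12 ]
```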

A collection of tips to improve your data analysis skills (translated from the Arabic version)

With scientific and technological progress, and in particular the rapid and noticeable development of data science and analytics, a data analyst needs enough experience to attract the attention of companies that rely on data analysis to run their affairs. This experience does not come overnight: data scientists spend long hours, put in double the effort, and use every small opportunity to pick up knowledge on the way to becoming a data analyst or data engineer.

Analysis is the process of finding the most suitable way to solve problems and process data.

So let's look at some of the ways to improve your data analysis skills.

Evaluate your skills:

Some numbers and results may deceive you after a marketing campaign. You may think the conversion rate is 50%, for example, and then be shocked to learn that the number of potential customers is small, so that rate does not mean the goal was reached at the required level.

The outcome depends on how the numerator and denominator of the percentage are set to match the real situation. For example, when the goal is real, the numerator can be increased, and if it is not intended, the denominator can be reduced.

Measuring growth rate and expectations:

Rely on a line chart that measures the growth rate and checks whether expectations are sound. Over time, raising a steady growth rate becomes difficult, since fixing a percentage value to represent performance can obscure the actual value of the work.

The 80/20 rule

The basic principle of this rule is to focus on the large share that represents 80% of the results and to handle it in a way that improves performance and keeps its course under flexible control. The rule can serve as a starting point for reducing the budget spent on the project.

Take the MECE framework into account

It is a systematic approach to tackling problems that aims to cut down on cumbersome calculations that consume a lot of time and effort.

Three areas of MECE can be identified:

* Problem tree:

Its benefit lies in breaking thorny, complex problems into parts, which makes them easier to solve. Put more simply, it relies on analyzing user behavior by certain categories (age, profession, gender, and so on).

* Decision tree:

It lays out decisions and their potential outcomes in detail as a diagram that makes it easier to identify the relative pros and cons of each decision, to estimate the commercial value of new plans, and then to set and order priorities.

* Probability tree:

It differs from the problem tree in that it organizes the hypotheses in more depth and gives more direct results.

Cohorts represent quality value:

Cohorts are groups that share certain features, such as a start date. They serve as a precise form of analysis by monitoring how consistently their members keep using your applications and websites.

Avoid working with bad data:

Before starting any analysis, verify the quality of the data sets by reviewing and organizing the statistics related to the data, so that outliers are excluded and only sound data is used. You can confirm the final results by comparing the resulting values with a similar analysis.

12 Amazing AI Websites That Will Get You Interested

Posted on January 23, 2023 by s4l8384gmailcom

The Internet hosts an endless number of websites across many disciplines and fields, with different content and topics, but the vast majority of them depend on artificial intelligence, which has made using the Internet more useful and easier for users everywhere.

In today's article we cover 12 websites that rely on artificial intelligence to automate various tasks and make it possible to create distinctive content in record time.

1. Browse AI

An important tool for business owners and for-profit organizations: it lets them track the behavior of competing companies, extract information from websites, and follow market movements. It also suggests potential customers by tracking interests that may match your services, and it is free for everyone.

2. StockAI

This site specializes in creating attractive designs using artificial intelligence. Anyone can use it to create beautiful designs with one click, whether or not they are a design expert. It produces content that can combine images, graphics, and text.

3. Poised

This site is well suited to developing public speaking skills. Its tools let you hear your own voice with high accuracy, so that you can recognize your strengths and weaknesses as a speaker. In other words, the site lets you listen to your voice and speaking style as if you were a member of the audience.

The site also includes videos that show how body language helps convey an idea to the audience while speaking.

4. AssemblyAI

This site lets its users convert audio files, video clips, and live audio recordings into text that can be edited and used for subtitles.

All you have to do is enter the name of the file to convert and the location where you want to save the result. The conversion then runs within a set time frame, and you can preview the output while it is in progress.

The site has two drawbacks. It does not support all file types, and it cannot convert several files at once: you have to convert them one after the other, and a new conversion cannot start until the previous one has finished.

5. Texti.app

This site stands out for returning accurate search results by giving an immediate answer to your question while leaving suggestions and guesses out of the results.

Once you enter the words or phrases you want to search for, the site searches within the scope of that topic, and you then choose the most suitable result from the descriptions it returns.

The site saves time and effort, and its simple interface makes browsing and searching easy.

6. AI Image Enlarger

This site's tool lets users enlarge images with high accuracy, and it offers several other useful features for images and graphics.

7. Sembly

This website makes online transcription for note-taking easy and helps users avoid the loss of focus that comes from moving between paragraphs. Users can also record audio directly so that it is converted into text, which lets listeners understand what the clip says and makes it easier for users to exchange information.

8. Synthesia

The story of this site seems incredible: imagine creating professional video clips from nothing but text. The system works by representing the user with animated avatars that speak in several languages, and you can add sound effects and music to make your video more distinctive and engaging.

Despite the professional quality and advanced features this site offers for creating videos, it is not limited to professionals. Anyone can use it with ease to design videos that rely on artificial intelligence techniques.

9. Super Meme

A site dedicated to designing memes. Users can choose from a set of templates or create a custom template with the AI-powered meme generator. Adding text and images is enough to produce more professional memes with one click, ready to publish on social media. Your product will then catch the eye of people looking for original ideas and striking work, which can raise your sales and profits.

10. Podcastle AI

This is another site that stands out for converting text into speech. It adds several features, such as studio-quality recordings, a choice of voice type, and transcription of audio into text, along with many additional free features that will impress you once you explore the site.

11. NameLix

This site helps you create distinctive brand names, or start from ready-made designs that give you a range of ideas and mock logos, so that you can settle on the colors and titles that suit your design best.

12. Murf.AI

Earlier in this article we looked at sites that convert audio into text. This site does the opposite: it converts text into audio that resembles a human voice so closely that the listener will think a person is reading. The tool is therefore useful for creating audio libraries, with control over the voices used.

Using the site is smooth and simple: the user uploads a text file and the site converts it into clear, accurate audio.

Another advantage is that the site can be a way to earn money, by turning texts into accurate audio recordings that are sold to people interested in buying audiobooks.

Twelve amazing AI websites that will catch your interest (translated from the Arabic version)

The Internet contains an endless number of websites covering many specialties and fields. Whatever their content and topics, the vast majority of them rely on artificial intelligence, which has made using the Internet more useful and easier for users everywhere.

In today's article we discuss 12 websites that all rely on artificial intelligence to automate a variety of tasks and that make it possible to create distinctive content in record time.

1. Browse AI

An important tool for business owners and for-profit organizations. It lets them learn how competing companies behave, extract information from websites, and follow market movements. It also suggests potential customers by tracking interests that may match your services, and it is free for everyone.

2. StockAI

This site specializes in creating attractive designs using artificial intelligence. Anyone can use it to create beautiful designs with one click, whether or not they are a design expert. It produces impressive content that can combine images, drawings, and text.

3. Poised

This site is very well suited to developing public speaking skills. Its tools let you hear your voice with high accuracy, so that you can recognize your strengths and weaknesses as a speaker. In other words, the site lets you listen to your voice and speaking style as if you were one of the audience.

The site also includes videos that show how body language helps convey an idea to the audience while speaking.

4. AssemblyAI

This site lets its users convert audio files, video clips, and live audio recordings into text that can be edited and translated.

All you have to do is enter the name of the file to convert and the location where you want to save it. The conversion then runs within a set time frame, and you can preview it while it is in progress.

The site has two drawbacks. It does not support all file types, and if you want to convert several files you cannot convert them together: you must convert one file after another, and a new file cannot start until the previous one has finished.

5. Texti.app

This site stands out for finding accurate search results by giving an immediate answer to your questions while leaving suggestions and guesses out of the results.

As soon as you enter the words or sentences you want to search for, the site searches within the scope of that topic, and you then choose the most suitable result from the descriptions it returns.

The site saves time and effort, and its easy, simple interface makes browsing and searching straightforward.

6. AI Image Enlarger

This site's tool lets users enlarge images with high accuracy, and it offers several other useful features for images and graphics.

7. Sembly

This site makes online transcription for note-taking easy and helps users avoid the loss of focus that comes from moving between paragraphs. Users can also record audio directly so that it is converted into text, which lets listeners understand what the clip says and makes it easier for users to exchange information.

8. Synthesia

The story of this site seems unbelievable: imagine creating professional video clips from nothing but text. The system works by representing the user with animated avatars that speak in several languages, and you can add sound effects and music to make your video more distinctive and engaging.

Despite the professional quality and advanced features this site offers for creating videos, it is not limited to professionals. Anyone can use it with ease to design videos that rely on artificial intelligence techniques.

9. Super Meme

A site dedicated to designing memes. Users can choose from a set of templates or create a custom template with the AI-powered meme generator. Adding text and images is enough to produce more professional memes with one click, ready to publish on social media. Your product will then catch the eye of people looking for original ideas and striking work, which can raise your sales and profits.

10. Podcastle AI

This is another site that stands out for converting text into speech. It adds several features, such as studio-quality recordings, a choice of voice type, and transcription of audio into text, along with many additional free features that will impress you once you explore the site.

11. NameLix

This site helps you create distinctive brand names, or start from ready-made designs that give you a range of ideas and mock logos, so that you can settle on the colors and titles that suit your design best.

12. Murf.AI

Earlier in this article we looked at sites that convert audio into text. This site does the opposite: it converts text into audio that resembles a human voice so closely that the listener will think a person is reading. The tool is therefore useful for creating audio libraries, with control over the voices used.

Using the site is smooth and simple: the user uploads a text file and the site converts it into clear, accurate audio. The site can also be a way to earn money, by turning texts into accurate audio recordings that are sold to people interested in buying audiobooks.

Comparison of business intelligence and data analysis

Posted on January 9, 2023 by s4l8384gmailcom

In this article, we will show the similarities and differences between business intelligence and data analysis, with a brief overview of each.

We start with data analysis, which broadly represents data science. It is the process of extracting useful information from a data set that is examined and processed with a specific technique, in order to reach conclusions that help take the measures needed to keep a business, a government institution, a scientific body, or an educational organization running at its best.

Data analytics provides efficient techniques for improving a commercial system as a whole, for example by improving buying and selling processes and identifying the most popular and best-selling products and the behavior of customers, all based on the data produced by the analysis. It falls into two broad kinds:

Confirmatory data analysis (CDA), which relies on statistics to determine the validity of a data set, and exploratory data analysis (EDA), which relies on choosing models and types of data.

Building on this, we can identify four types of data analysis:

Descriptive analytics: describes facts about past events. Event A happened, and then event B happened.

Diagnostic analytics: focuses on why those facts occurred. B did not happen because of A; rather, C caused B to happen.

Predictive analytics: makes predictions about the future based on historical data. Because B happened as a result of C, we expect B to happen in the future whenever C happens.

Prescriptive analytics: directs actions toward a specific goal. To prevent B from happening, we must take action Z.

Business intelligence, for its part, covers the plans and techniques that companies and institutions adopt for handling business-related data in order to reach results that lead to sound decisions. It also allows them to automate data collection and analysis, which makes it easier to carry out tasks with the least possible time and effort.

To extract key information, business intelligence depends on the enterprise data warehouse (EDW), the main store for data collected from several sources and integrated into a central system. The company uses it to generate reports and build analyses that in turn lead to the right actions.

Based on the aforementioned, we can determine the course of the procedures that make up business intelligence according to the following:

Collecting and converting data from different sources:

Business intelligence tools collect structured and unstructured data from various sources. The data is then formatted and classified according to the company's strategy and kept in the central data store, ready for later analysis and exploration.

Determining trends and recommendations:

Business intelligence techniques include extensive data discovery, which makes forecasting, and the proposals and solutions that follow from it, more accurate and effective.

Presenting the results as visualizations:

Data visualization has proven effective for understanding results and sharing them with others. Business intelligence relies on it heavily, because charts and graphs give business owners a more comprehensive and accurate view of the results.

Taking appropriate action on the data in a timely manner:

This step usually involves comparing earlier results with current ones across the business. That makes it easier for business owners to take the necessary measures, make adjustments quickly, and build a sound basis for future plans.

Differences between business intelligence and data analysis:

We first need to look at the structure of the enterprise data warehouse (EDW).

The data warehouse is the basic environment for storing data from multiple sources so that it can be worked with later. It is entirely separate from the database systems used for daily transactions. Companies and institutions use it to generate insights, solutions, and suggestions for specific practical issues in a timely manner.

Because the data in the warehouse comes from multiple sources and is processed online, it has to be extracted from those sources, transformed to fit the way the company works, and then loaded into OLAP (online analytical processing). An operational data store (ODS) is used to prepare operational and business reports, and it has a longer storage period than OLAP.

To put the above simply, a data mart is a miniature version of the data warehouse that focuses on one functional area, such as sales, production, or promotion planning, and is managed by a specialized department within the overall organization.
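As a rough illustration of the flow described above (extract from several sources, load into one central store, then expose a smaller slice for a single functional area), here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample records are all made up for the example; a real warehouse would use dedicated tooling rather than an in-memory database.

```python
import sqlite3

# Hypothetical raw records from two separate source systems.
online_orders = [("2024-01-05", "sales", "laptop", 1200.0),
                 ("2024-01-06", "sales", "mouse", 25.0)]
factory_log = [("2024-01-05", "production", "laptop", 900.0)]


def build_warehouse(conn):
    """Extract rows from each source, tag their origin, and load one central table."""
    conn.execute(
        "CREATE TABLE warehouse "
        "(day TEXT, area TEXT, item TEXT, amount REAL, source TEXT)"
    )
    for source_name, rows in (("online", online_orders), ("factory", factory_log)):
        conn.executemany(
            "INSERT INTO warehouse VALUES (?, ?, ?, ?, ?)",
            [(*row, source_name) for row in rows],
        )


def build_sales_mart(conn):
    """Model the smaller, function-specific store as a view over the warehouse."""
    conn.execute(
        "CREATE VIEW sales_mart AS "
        "SELECT day, item, amount FROM warehouse WHERE area = 'sales'"
    )


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
build_sales_mart(conn)
total_sales = conn.execute("SELECT SUM(amount) FROM sales_mart").fetchone()[0]
print(total_sales)
```

The view keeps the sales slice in sync with the central table, which is one simple way to model the "miniature warehouse" idea without copying data.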

مقارنة بين ذكاء الأعمال وتحليل البيانات

سنبين في هذا المقال أوجه التشابه والاختلاف بين ذكاء الأعمال وتحليل البيانات مع ذكر نبذة مختصرة عن كل منهما

تنطرق في البداية إلى الحديث عن تحليل البيانات الذي يمثل بالمجمل علم البيانات والذي يتلخص في عملية استخراج المعلومات المفيدة من مجموعة بيانات يتم فحصها ومعالجتها وفق تقنية معينة بغية الحصول على صيغة تساعد على اتخاذ الإجراءات اللازمة والمناسبة لضمان سير العملية التجارية أو عمل المؤسسات الحكومية أو الهيئات العلمية أو القطاعات التعليمية بالشكل الأمثل توفر تحليلاتُ البيانات تقنياتٍ ذات كفاءة عالية في تطوير عمل المنظومة التجارية ككل مثل تحسين عمليات البيع والشراء وتحديد المنتجات الأكثر طلباً وبيعاً وسلوك العملاء وغيرها وذلك بالاعتماد على البيانات الناتجة من عمليات التحليل وذلك في إطار نمطين من تحليل البيانات

(CDA) تحليلات البيانات المؤكدة

التي تعتمد على الإحصاء لتحديد مدى صحة مجموعة البيانات

(EDA) وتحليلات البيانات الاستكشافية

التي تعتمد على اختيار نماذج وأنواع البيانات

: وبناءً على ما سبق يمكننا تحديد أربع أنواع من تحليل البيانات

تحليلات وصفية : تتضمن الوصف الذي يعتمد على الوقائع المتعلقة بحدث سابق

B ثم حدث A حدث

تحليلات تشخيصية : تركز على السبب وراء حدوث تلك الحقائق بغ النظر عما حدث في السابق

, A بسبب B لم يحدث

B كان سبب حدوث C ولكن

تحليلات تنبؤية : تعتمد على التنبؤات المستقبلية بالاعتماد على البيانات التاريخية

, C حدث بسبب B لأن

سيحدث في المستقبل B نتوقع أن

يحدث C لأن

تحليلات وصفية : تعتمد على توجيه إجراءات تنفيذية نحو غاية معينة

B لمنع حدوث

Z يجب علينا اتخاذ الإجراء

أما ذكاء الأعمال فيتضمن الخطط والتقنيات التي تعتمدها الشركات والمؤسسات في التعامل مع البيانات المتعلقة بالأعمال لاستخلاص نتائج إيجابية تفضي إلى قرارات سليمة , وتتيح تقنيات ذكاء الأعمال لأصحاب العمل إيجاد صيغ متنوعة للبيانات لتحديد الأداء الفني للعمل كالبيانات السابقة والبيانات الحالية والبيانات الخارجية والبيانات الداخلية والبيانات المنظمة وغيرها من أشكال البيانات , كما وتتيح لهم أتمتة تجميع البيانات وتحليلاتها مما يسهل القيام بجميع المهمات بأقل وقت وجهد ممكن

يعتمد ذكاء الأعمال لاستخراج المعلومات الرئيسية على مستودع البيانات

(EDW) الذي يعرف باسم

والذي هو المخزن الرئيسي لقواعد البيانات الأولية المجمَّعة من عدة مصادر والمدمجة في نظام مركزي تستخدمه الشركة ليعينها على إنشاء التقارير وبناء التحليلات التي بدورها تفضي إلى اتخاذ الإجراءات الصائبة

: وبناءً على ما ذكر آنفاً يمكن أن نحدد مسار الإجراءات المكوِّنة لذكاء الأعمال وفق ما يلي

: تجميع البيانات وتحويلها من مصادر مختلفة

تعتمد أدوات ذكاء الأعمال على تجميع البيانات المنتظمة والعشوائية من مصادر مختلفة ثم يتم تنسيقها وتصنيفها وفق متطلبات استراتيجيات الشركات لتحفظ بعدها في المخزن البيانات المركزي ليسهل استخدمها لاحقاً في عمليات التحليل والاستكشاف

: تحديد المسارات والتوصيات

تحوي تقنيات ذكاء الأعمال نظام تحديد البيانات بشكل موسع وبالتالي تكون عملية التنبؤ بطرح الاقتراحات والحلول أكثر دقة وفاعلية

: عرض النتائج على شكل تصورات بيانية

تعتبر عملية تصور البيانات من التقنيات التي أثبتت فاعليتها في فهم مضمون النتائج وتشاكرها مع الآخرين وهي عملية يعتمد عليها ذكاء الأعمال بشكل كبير نظراً لما توفره من إعداد المخططات والرسوم بيانية التي يمكن أصحاب الأعمال من تكوين نظرة أكثر شمولية ودقة للنتائج المطروحة

: اتخاذ الإجراءات المناسبة وفقاً للمعطيات الناتجة في الوقت المناسب

وعادة ما تتم هذه الخطوة بمقارنة النتائج السابقة مع النتائج المطروحة في الوقت الراهن للأعمال والأنشطة التجارية بشكل عام مما يسهل على أصحاب هذه الأعمال اتخاذ الإجراءات اللازمة والمناسبة وإجراء التعديلات في زمن قياسي وبناء قاعدة سليمة للخطط المستقبلية

: أوجه الاختلاف بين ذكاء الأعمال وتحليل البيانات

لابد لنا في البداية أن نتطرق إلى البينية التكوينية

EDW لمستودع البيانات

مستودع البيانات هو البيئة الأساسية لتخزين البيانات متعددة المصادر بغية التعامل معها لاحقاً إذا أن لا صلة له إطلاقاً بمنظومة قاعدة البيانات المستخدمة في بالتعاملات اليومية إذاً مخزن البيانات معد لتستخدمه الشركات والمؤسسات لتكوين رؤى لحلول واقتراحات لقضايا عملية محددة في الوقت المناسب

وبما أن البيانات المخزنة داخل مستودع البيانات هي متعددة المصادر ومعالجة عبر الإنترنت فهذا يتطلب أن يتم استخراجها من تلك المصادر وتوظيفها ضمن استراتيجية تتوافق مع عمل الشركة

OLAP ثم يتم تحميلها في

( أي المعالجة والتحليل عبر الإنترنت )

(ODS) كما ويستخدم مخزن البيانات التشغيلية

لتجهيز التقارير التشغيلية والتجارية وهو يتمتع

OLAP بمدة تخزين أطول من

وإذا أردنا إجراء تمثيل بسيط لما سبق نلاحظ أن سوق البيانات هو نموذج مصغر من مستودع البيانات إلا أنه يصرف اهتمامه إلى جانب وظيفي معين كالمبيعات والإنتاج وخطط الترويج وذلك يتم بواسطة فرع مختص ضمن المنظومة العامة

10 FREE Datasets to start building your Portfolio

Posted on December 19, 2022 by s4l8384gmailcom

1. Netflix Movies and TV Shows

About this dataset: Netflix is a media and video streaming platform with a large catalog of movies and TV shows. According to statistics, it had more than 200 million subscribers worldwide in 2021.

This tabular dataset lists all the movies and TV shows available on Netflix, along with information about actors, directors, audience ratings, and more.

Here are some interesting ideas to explore:

* Which content is available in different countries

* Identifying similar content by matching text-based features

* Finding valuable and interesting content by analyzing the network of actors and directors

* Comparing which has been more popular on Netflix in recent years: movies or TV shows
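The first idea in the list, content by country, could be sketched like this. The inline sample stands in for the real file, and the column names "title", "type", and "country" are assumptions about the dataset's layout, so check them against the file you download.

```python
import csv
import io
from collections import Counter

# A tiny inline sample standing in for the real file; the column names
# "title", "type" and "country" are assumptions about the dataset layout.
SAMPLE = """title,type,country
Show A,TV Show,"United States, Canada"
Movie B,Movie,India
Movie C,Movie,"India, United States"
Show D,TV Show,
"""


def titles_per_country(rows):
    """Count titles per country, splitting entries that list several countries."""
    counts = Counter()
    for row in rows:
        for country in row["country"].split(","):
            country = country.strip()
            if country:  # skip rows with no country listed
                counts[country] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = titles_per_country(rows)
print(counts.most_common())
```

Splitting on commas matters here because a single title can be available in, or produced by, more than one country.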

You can download the data from here:

https://lnkd.in/eZ3cduwK

2. Real or fake job posting prediction:

This dataset contains about 18,000 job descriptions, of which about 800 are fake. The data consists of both text and descriptive information about the jobs, and it can be used to build classification models that detect fraudulent job postings.

The dataset can be used to answer the following questions:

* Build a classification model that uses the text features to predict whether a job description is real or fraudulent.

* Identify the key words and phrases that characterize fraudulent descriptions.

* Determine the characteristics of similar jobs.

* Perform exploratory data analysis on the dataset to find useful insights.

You can download the data from here:

https://lnkd.in/e5SDDW9G

3. FIFA 22 complete player dataset:

The datasets contain player data, with abilities and skills, from FIFA 15 to FIFA 22 (“players_22.csv”). The data makes it possible to compare specific players across the eight versions of the game.

Some possible analyses:

* A comprehensive comparison between Messi and Ronaldo (their career statistics and how their skills changed over time).

* The ideal budget for building a team that can compete at European level, and the point at which the budget no longer allows buying significantly better players for the eleven-man lineup.

* A sample analysis of the top n% of players (for example, the top 5%) to see whether key attributes such as speed, agility, and ball control have become more or less prominent across versions of the game. For example, the top 5% of players in FIFA 20 are faster and more agile than those in FIFA 15. Such comparisons also hint at what the game emphasizes: a version in which more of the top 5% have high ball-control statistics suggests a greater focus on skill and technique than on the physical side.
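The top n% comparison in the last point could look roughly like the sketch below. The ratings are made up for illustration and the keys "overall" and "pace" are placeholder attribute names, not necessarily the columns in the real files.

```python
import math
from statistics import mean


def top_fraction(players, fraction, key="overall"):
    """Return the best `fraction` of players ranked by `key` (at least one)."""
    keep = max(1, math.ceil(len(players) * fraction))
    return sorted(players, key=lambda p: p[key], reverse=True)[:keep]


def mean_attribute(players, attribute):
    """Average one attribute over a group of players."""
    return mean(p[attribute] for p in players)


# Made-up ratings, not taken from the dataset.
fifa15 = [{"overall": 90, "pace": 80}, {"overall": 85, "pace": 70},
          {"overall": 70, "pace": 95}, {"overall": 60, "pace": 60}]
fifa20 = [{"overall": 92, "pace": 88}, {"overall": 86, "pace": 84},
          {"overall": 72, "pace": 65}, {"overall": 61, "pace": 62}]

# Compare the average pace of the top half of each (tiny) version.
pace15 = mean_attribute(top_fraction(fifa15, 0.5), "pace")
pace20 = mean_attribute(top_fraction(fifa20, 0.5), "pace")
print(pace15, pace20)
```

With the real data you would pass 0.05 as the fraction and repeat the comparison for each attribute of interest.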

The dataset includes:

* The URLs of the scraped players.

* The URLs of the uploaded player faces and the club or national team logos.

* Personal information about each player, such as nationality, club, date of birth, salary, and others.

* Statistics on each player's skills in attack, defense, goalkeeping, and other areas.

* Every player available in FIFA 15 through FIFA 22.

* More than 100 attributes.

* Each player's position and role in the club and the national team.

You can download the data from here:

https://lnkd.in/eDScdUUM

4. Forecasting book sales:

A bookstore's success depends largely on buying the right books in the right quantities at the right time. With this in mind, one of the leading players in the book trade organized a competition to support booksellers and help them compete in the market.

The task in the competition is to forecast the purchase quantities of a clearly defined portfolio of titles for each location, using simulated data.

Task:

Competing requires forecasting the purchase quantities of eight titles for 2,418 different locations. To build the model, simulated purchasing data is available from an additional 2,349 locations, with all data referring to a limited time period. The goal is to estimate the purchase quantities of these eight titles for the 2,418 locations as accurately as possible.

Data:

Two files are available for solving the problem:

* dmc2009_train.txt

* dmc2009_forecast.txt

You can download the data from here:

https://lnkd.in/eXHN2XsQ

5. Supermarket sales:

Supermarkets are most widespread in densely populated areas, which creates commercial competition among them. That competition benefits the market and contributes to economic growth in general.

The dataset covered here contains the sales of three branches of a supermarket company over a period of ninety days. It was chosen because it lends itself to predictive data analytics models.

Attribute information:

Invoice ID: the identification number of the sales invoice

Branch: the branch of the supercenter (one of three branches, labeled A, B, and C)

City: the location of the supercenter

Customer type: whether the customer is a member who uses a membership card or a non-member

Gender: the gender of the customer

Product line: the general product category, such as food and beverages, sports and travel, electronic accessories, fashion accessories, and others

Unit price: the price of each product in US dollars

Quantity: the number of products the customer purchased

Tax: a 5% tax added to the purchase value

Total: the total price including tax

Date: the date of purchase (between May and July 2019)

Time: the time of purchase (from 9 a.m. to 8 p.m.)

Payment: the payment method used by the customer, one of three methods (cash, credit card, or e-wallet)

COGS: the cost of goods sold

Gross margin percentage: the gross margin as a percentage

Gross income: the gross income

Rating: the customer's rating of the overall shopping experience, on a scale of 1 to 10
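The price columns above fit together arithmetically. Here is a small sketch of how one invoice line could be recomputed, assuming that COGS is simply the product price times the quantity (an assumption for this example, not something the column list states) and using the 5% tax from the list.

```python
TAX_RATE = 0.05  # the 5% tax described in the column list


def invoice_totals(unit_price, quantity):
    """Return (cogs, tax, total) for one invoice line, rounded to cents.

    Assumes COGS = unit price * quantity, which is an assumption made
    for this sketch.
    """
    cogs = unit_price * quantity
    tax = cogs * TAX_RATE
    return round(cogs, 2), round(tax, 2), round(cogs + tax, 2)


cogs, tax, total = invoice_totals(unit_price=20.0, quantity=3)
print(cogs, tax, total)
```

Recomputing the totals this way is a quick sanity check to run on the data before building any model on it.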

You can download the data from here:

https://lnkd.in/e86UpCMv

6. Credit card fraud detection:

Detecting fraudulent credit card transactions is very important for credit card companies, so that customers are not charged for products they did not purchase.

The dataset contains credit card transactions made over two days in September 2013. Only a small number of fraudulent transactions were found among many thousands, so the dataset is highly imbalanced: fraud accounts for 0.172% of all transactions.

The features V1, V2, … V28 are the principal components obtained with a PCA transformation, so all of them are numeric input variables. The only features that were not transformed are Amount and Time: Amount is the transaction amount, and Time is the number of seconds elapsed between each transaction and the first transaction in the dataset. The Class attribute depends on the state of the transaction: it takes the value 1 in case of fraud and 0 if the transaction is valid.
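To get a feel for what such an imbalance means in practice, here is a minimal sketch that measures the fraud rate and derives "balanced" class weights, a common heuristic that weights each class by n_samples / (n_classes × class count). The labels below are synthetic, not the real data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Synthetic labels: 1 = fraud, 0 = valid (not the real data).
labels = [0] * 998 + [1] * 2
fraud_rate = sum(labels) / len(labels)
weights = balanced_class_weights(labels)
print(f"fraud rate: {fraud_rate:.3%}", weights)
```

With so few positive examples, plain accuracy is misleading: a model that labels every transaction as valid would still be right almost all of the time.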

You can download the data from here:

https://lnkd.in/eFTsZDCW

7. The 50 most popular fast food chains in America:

Fast food is food sold in a restaurant or shop, made from frozen or pre-cooked ingredients and served in packaging for take-away. It is produced in large quantities with an emphasis on speed of service and delivery. According to 2018 statistics, fast food production was worth hundreds of billions of dollars worldwide.

Hamburger outlets such as McDonald’s are the most common and sought-after in the world, alongside other fast food outlets that assemble orders on demand from basic ingredients prepared in advance in large quantities.

Fast food can be sold from kiosks, mobile trucks, or quick service restaurants.

Content:

This dataset covers the top 50 fast food chains in America in 2021. Its main data points are:

Fast food chains – U.S. sales in millions of dollars – average sales per unit in thousands of dollars – franchised stores – total number of units for 2021

Columns of the dataset:

• Fast-Food Chains – the name of the fast food chain

• U.S. Systemwide Sales (Millions – U.S Dollars) – systemwide sales in millions of US dollars

• Average Sales per Unit (Thousands – U.S Dollars) – average sales per unit in thousands of US dollars

• Franchised Stores – the number of franchised stores

• Company Stores – the number of company-owned stores

• 2021 Total Units – the total number of units in 2021

• Total Change in Units from 2020 – the change in the number of units compared with 2020

You can download the data from here:

https://lnkd.in/esBjf5u4

8. Forecasting Walmart store sales

You are given sales data for a number of Walmart stores located in different regions. Each store contains several departments, and the task is to forecast the sales of each department in each store.

In addition, Walmart runs promotional campaigns throughout the year, especially offers that coincide with the major holidays. The weeks that include these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge is modeling the effects of markdowns on these holiday weeks in the absence of complete historical data.
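If holiday weeks count five times as much as ordinary weeks when forecasts are scored, one natural way to express that is a weighted mean absolute error. The sketch below assumes that metric and a weight of 5; the function name and the sample numbers are made up for illustration.

```python
def weighted_mae(actual, predicted, is_holiday, holiday_weight=5):
    """Mean absolute error where holiday weeks count `holiday_weight` times."""
    weights = [holiday_weight if h else 1 for h in is_holiday]
    total_error = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return total_error / sum(weights)


# Three made-up weeks; only the last one is a holiday week.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 250.0]
is_holiday = [False, False, True]

score = weighted_mae(actual, predicted, is_holiday)
print(score)
```

Because the holiday week carries most of the weight, the large miss in that week dominates the score, which is exactly the behavior such a weighting is meant to produce.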

stores.csv

This file contains anonymized information about the forty-five stores, indicating the type and size of each store.

train.csv

This is the historical training data, which covers February 5, 2010 to November 1, 2012.

It contains the following fields:

• Store – the store number

• Dept – the department number

• Date – the week

• Weekly_Sales – sales for the given department in the given store

• IsHoliday – whether the week is a holiday week

test.csv

This file is identical to train.csv, except that the weekly sales are withheld. You must forecast the sales for each triplet of store, department, and date in this file.

features.csv

This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:

• Store – the store number

• Date – the week

• Temperature – the average temperature in the region

• Fuel_Price – the price of fuel in the region

• MarkDown1-5 – anonymized data related to the promotional markdowns that Walmart is running

• CPI – the consumer price index

• Unemployment – the unemployment rate

• IsHoliday – whether the week is a holiday week

For convenience, the four holidays fall within the following weeks in the dataset. Note that not all holidays are included in the data.

Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13

Labor Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13

Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13

Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13

You can download the data from here:

https://lnkd.in/eVT6h-CT

9. LinkedIn data analyst job listings

For every beginner in data analysis, here are the basic steps of collecting, cleaning, and analyzing data.

For data collection, we wrote a Python script to go through LinkedIn and gathered all the necessary data for three locations: Africa, Canada, and America.

Features:

* Title: the job title

* Company: the name of the company

* Description: a description of the job and the company

* Onsite/remote: whether the job is on site or remote

* Location: the employee’s place of work

* Salary: the salary for the position

* Company website: the company’s website

* Criteria: terms of employment such as experience and type of work

* Posted date: the date the job was announced

* Link: the URL of the job posting

You can download the data from here:

https://lnkd.in/ezqxcmrE

10. Amazon and Best Buy electronics reviews:

This dataset contains reviews of fifty electronic products from online stores such as Amazon and Best Buy.

The Datafiniti dataset includes the review date, source, rating, and reviewer metadata. It is a large dataset, so it is worth thinking about how best to use it.

The value of this data lies in understanding what consumers think about purchasing a product. For example:

* What are the main uses of electronic products?

* What is the relationship between ratings and positive reviews?

* How good is the variety of online brands?

What does Datafiniti do?

It provides direct access to web data, collecting it from a large number of websites to build standardized databases of business, product, and property information.

You can download the data from here:

https://lnkd.in/e4fBZvJ3

1. والبرامج التلفزيونية Netflix أفلام :

وللتعريف عن مجموعة البيانات هذه

هي منصة لبث الوسائط والفيديو Netflix

تضم عدداً كبيراً من الأفلام والبرامج التلفزيونية ووفق إحصائية فإن المشتركين لديهم تجاوز عددهم 200 مليون مشترك في عام 2021 من جميع أنحاء العالم . تتكون مجموعة البيانات المجدولة في حالتنا هذه قوائم بجميع الأفلام والبرامج التلفزيونية

Netflix المتوفرة على

أضف عليها معلومات عن الممثلين والمخرجين وتقييم الجمهور وغيرها من المعلومات الأخرى

: وفيما يلي بعض الأفكار المهمة

المحتوى المتوفر في بلدان مختلفة *

اختيار محتوى شبيه بواسطة مطابقة السمات المتعلقة بالنص *

إيجاد محتوى قيِّم وممتع من خلال تحليل شبكة الممثلين والمخرجين *

إجراء مقارنة على البث الأكثر شيوعاً في السنوات الأخيرة ( أفلام – البرامج التلفزيونية ) *

Netflix على منصة

: يمكنك الدخول إلى الرابط وتحميل البيانات

https://lnkd.in/eZ3cduwK

2. توقع الإعلان عن وظيفة حقيقة / وهمية :

( حقيقي أو وهمي ) : التنبؤ بالوصف الوظيفي الوهمي

تضم مجموعة البيانات هذه 18 ألف سمة وظيفية منها 800 وصف وهمي , تتألف البيانات من نصوص ومعلومات وصفية عن الوظائف , ومن الممكن استخدام مجموعة البيانات لبناء نماذج فرز تكشف السمة المزيفة للوظائف الوهمية

يمكن استخدام مجموعة البيانات للإجابة عن الأسئلة التالية

عليك بناء نموذج فرز يعتمد على خصائص البيانات النصية لتحديد ماهية الوصف الوظيفي حقيقي كان أم احتيالي*

التركيز على الكلمات والعبارات التي تعبر عن وصف وخادع وضبطها والتعرف عليها *

تحديد خصائص الوظائف المتماثلة *

عليك القيام بإجراء تحليل البيانات الاستكشافية على مجموعة البيانات لمعرفة القيم المفيدة من مجموعة البيانات المذكورة *

: يمكنك الدخول إلى الرابط وتحميل البيانات

https://lnkd.in/e5SDDW9G

3. الكلية للاعبين FIFA 22 مجموعة بيانات :

تشكل مجموعات البيانات في مثالنا هذا بيانات اللاعبين ممثلة بقدراتهم ومهاراتهم من إصدار

FIFA 22 إلى FIFA 15

(“players_22.csv”)

بحيث تتيح هذه البيانات إجراءات إيجاد عدة مقارنات للاعبين معينين وذلك من خلال الإصدار الثامن

FIFA من لعبة

مقارنة شاملة بين ميسي ورونالدو ( مقارنة بإحصائيات حياتهم العملية – المتغيرات في المهارة مع مرور الزمن ) *

* السيولة المناسبة لبناء فريق ينافس على مستوى القارة الأوروبية وعند هذه النقطة لا تتيح الميزانية شراء لاعبين متميزين من تشكيلة الفريق المؤلف من أحد عشر لاعباً .

n٪ تحليل نموذج لأكفأ *

من اللاعبين ( كأن نتناول أكبر نسبة حاصلة على 5% من اللاعبين ) لتحديد وجود الميزات الأساسية في إصدارات اللعبة كالسرعة وخفة الحركة والتحكم بالكرة وبمثال حي على ذلك نلاحظ أن أفضل 5% من اللاعبين الموجودين

FIFA 20 في إصدار

أكثر سرعة وخفة في الحركة

FIFA 15من إصدار

ومن خلال هذا النوع من المقارنات يمكننا استنتاج أنه بوجود أكثر من 5% من أفضل اللاعبين الذين نالوا إحصائيات مرتفعة بالتحكم بالكرة هذا يعني أن اهتمام اللعبة بالجانب المهاري والتقني أكبر من الاهتمام بالجانب البدني وعلى وجه التحديد نرى أن

للاعبين المستبعدين URL عنوان *

لملامح الوجه URL عنوان *

المحملة للاعب مع الشعار الخاص بالنادي أو المنتخب

المعلومات الخاصة باللاعب مثل الجنسية , الفريق الذي يلعب له , تاريخ التولد , الراتب وغيرها *

الإحصائيات الخاصة بمهارات اللاعب والتي تتعلق بالهجوم والدفاع ومهارة حارس المرمى وغيرها من المهارات الأخرى *

كل لاعب موجود في إصدارات *

من الإصدار 15 حتى 22 FIFA لعبة

ميزات كثيرة تفوق الـ 100 *

المركز الذي يلعب به اللاعب ومهمته في النادي والمنتخب *

: يمكنك الدخول إلى الرابط وتحميل البيانات

https://lnkd.in/eDScdUUM

4. التنبؤ بمبيعات الكتب :

يكمن النجاح الرئيسي لمكتبة تبيع الكتب المتنوعة في الإقبال الكبير على عمليات الشراء الفعالة للكتب المناسبة في الوقت المناسب وفي هذا السياق تقوم إحدى الفعاليات التجارية الرائدة في مجال الكتب والمكتبات بتنظيم مسابقة لدعم بائعي الكتب تتيح لهم المنافسة في السوق

لذا المنافسة هنا تتمثل بالتنبؤ بكميات الشراء لمحفظة ملكية معينة بوضوح لكل موقع بواسطة بيانات محاكاة

: الوظيفة

خوض غمار المنافسة يتطلب التنبؤ بكميات الشراء لثمانية عناوين لـ 2418 موقعاً متنوعاً , ولبناء النموذج سيتم إتاحة بيانات الشراء المحاكاة من 2349 موقعاً إضافياً مع إشارة جميع البيانات إلى فترة زمنية محدودة , والغاية هي تقدير كميات الشراء لهذه العناوين الثمانية المتنوعة للمواقع المقدر عددها بـ 2418 بأعلى دقة ممكنة

: البيانات

توفر ملفان مساعدان لحل المشكلة هما

* dmc2009_train.txt

* dmc2009_forecast.txt

يمكنك الدخول إلى الرابط وتحميل البيانات :

https://lnkd.in/eXHN2XsQ

5. مبيعات محلات السوبر ماركت :

تُعدُّ المناطق المكتظة بالسكان أكثر انتشاراً لمحلات السوبر ماركت وهذا يخلق فيما بينها تنافساً تجارياً ينعكس إيجاباً على حركة السوق ويساهم في نمو الاقتصاد إجمالاً

وسنتناول في بحثنا اليوم مجموعة البيانات التي تمثل مبيعات لثلاثة فروع تابعة لشركة سوبر ماركت لمدة تسعين يوماً وقد اختيرت هذه المجموعة نظراً لسهولة نماذج تحليل البيانات التنبؤية الخاصة بها

:البيانات الخاصة بالتصنيف

معرِّف الفاتورة : وهو عبارة عن رقم تعريفي لفاتورة المبيعات

الفرع : فرع السوبر سنتر ( من أصل ثلاث فروع تم الإشارة إليها

( C و B و A بالرموز

المدينة : المواقع الأكثر حيوية

نوع العميل : يصنف الأعضاء نوع العملاء على أساس المستخدمين لبطاقة العضوية وغير المستخدمين لها

الجنس : يحدد جنس العميل

خط الإنتاج : يعتمد على توزيع المكونات الأساسية كالأطعمة والمشروبات والسياحة والرياضة والإكسسوارات الإلكترونية وإكسسوارات الزينة والأزياء .. وغيرها

سعر المنتج : ويقدر بالدولار الأمريكي

الكمية : وهي عدد المنتجات التي قام العميل بشرائها

الضريبة : وهي رسوم ضريبية تقدر بقيمة 5 % تضاف لقيمة الشراء

السعر الإجمالي : المجموع الكلي للسعر بما فيه الضريبة

التاريخ : تاريخ الشراء ( وهي الفترة المحصورة بين مايو ويوليو من عام 2019 )

الوقت : وهو وقت الشراء ( من 9 صباحاً إلى 8 مساءً )

الدفع : طريقة الدفع التي يستخدمها العميل عند الشراء وهي واحدة من ثلاثة طرق ( دفع مباشر – وبطاقة ائتمان – أرشيف أعمال إلكتروني )

قيمة المنتجات المباعة : COGS

نسبة الهامش الكلّي : نسبة الهامش الكلي

المردود الكلي : الدخل الإجمالي

التصنيف : يعتمد على تصنيف مستويات العملاء بناء على حركة التسوق وفق نسبة تقدر من 1 إلى 10

:يمكنك الدخول إلى الرابط وتحميل البيانات

https://lnkd.in/e86UpCMv

6. ضبط الإجراءات الاحتيالية الخاصة ببطاقات الائتمان :

تعتبر عملية ضبط عمليات التزوير في معاملات بطاقات الائتمان من الأمور بالغة الأهمية لشركات الائتمان والمتمثلة بالحصول على رسوم من العملاء مقابل منتجات لم يقوموا بشرائها

تضم مجموعة البيانات معاملات نُفِّذَت في يومين بواسطة بطاقات الائتمان في أيلول من عام 2013 بحيث ضُبِطَت عدة معاملات مزورة من أصل آلاف المعاملات , وبهذا نجد نسبة كبيرة من عدم التوازن في مجموعة البيانات هذه , وسجلت عمليات التزوير نسبة 0.172٪ من أصل إجمالي المعاملات

تم الحصول على العناصر الأساسية

V1 ، V2 ، … V28 وهي الميزات

PCA باستخدام تحويل

الذي ينتج عنه متغيرات الإدخال الرقمية , إلا أن السمات التي لم يتم تحويلها تتمثل بالمبلغ والوقت بحيث يمثل المبلغ ( كلفة المعاملة ) , والوقت يمثل الثواني المستهلكة بين المعاملة والأخرى , أما سمة الفئة فهي متغيرة وفقاً للحالة التي عليها المعاملة ففي حالة الاحتيال تأخذ الفئة قيمة 1 وتأخذ قيمة صفر في حال كانت المعاملة سليمة

يمكنك الدخول إلى الرابط وتحميل البيانات :

https://lnkd.in/eFTsZDCW

7. أشهر 50 سلسلة مطاعم للوجبات السريعة في أمريكا :

هو الطعام الذي يباع في مطعم أو متجر وهو مؤلف من أطعمة مجمدة أو مطهوة مسبقاً وتُقدم في عبوات خاصة للطلبات الفورية الخارجية ويتم إنتاجها بكميات كبيرة مع مراعاة السرعة في التقديم والتوصيل ووفق إحصائيات عام 2018 وصلت قيمة إنتاج الوجبات السريعة مئات المليارات من الدولارات في جميع أنحاء العالم

وتعتبر منافذ بيع الهامبرغر كما هو الحال عند ماكدونالدز الأكثر شيوعاً وطلباً في العالم وغيرها من الوجبات السريعة الأخرى التي تعتمد على تجميع وفق الطلب للمكونات الأساسية المعدّة مسبقاً بكميات كبيرة

ويمكن أن تتوفر على شكل أكشاك أو سيارات متنقلة أو مطاعم الخدمة السريعة

المحتوى

في حالتنا هذه تعتبر مجموعة البيانات هي دراسة لمعلومات عن أفضل 50 سلسلة مطاعم في أمريكا لعام 2021 , ويمكننا تحديد النقاط الرئيسية لمجموعة البيانات هذه

سلاسل الوجبات السريعة – المبيعات في أمريكا مقدرة بملايين الدولارات – المعدل الوسطي للمبيعات في كل وحدة مقدرة بآلاف الدولارات – المتاجر المرخصة – العدد الكلي للوحدات لعام 2021

: التنسيق العمودي لمجموعة البيانات

Fast-Food Chains – اسم سلسلة الوجبات السريعة
U.S. Systemwide Sales (Millions – U.S Dollars) – المبيعات على مستوى النظام الأمريكي مقدرة بملايين الدولارات
Average Sales per Unit (Thousands – U.S Dollars) – المعدل الوسطي للمبيعات لكل وحدة مقدرة بآلاف الدولارات
Franchised Stores – عدد المتاجر المرخصة
Company Stores – عدد مخازن الشركة
2021 Total Units – عدد الوحدات الإجمالية في عام 2021
Total Change in Units from 2020 – عدد التغيرات الكلية عن العام السابق 2020

يمكنك الدخول إلى الرابط وتحميل البيانات :

https://lnkd.in/esBjf5u4

8. Walmart التنبؤ بمبيعات متجر

سيكون بين يديك بيانات المبيعات الخاصة بعدد من المتاجر التابعة لـوول مارت والمنتشرة في العديد من المناطق بحيث يتضمن كل متجر عدة أقسام وستكون المهمة الموكلة إليك هي التنبؤ بالمبيعات المتعلقة بالقسم الخاص بكل متجر .

كما وأن وول مارت يقوم بالعديد من الحملات الترويجية بشكل مستمر ولاسيما العروض التي تتزامن مع الأعياد الرسمية الكبرى وتنال هذه الأسابيع بما فيها الإجازات تقييم أعلى بخمس مرات من أيام العطلات ويكمن إثبات الكفاءة في خوض هذه التجربة من خلال تحديد نتائج عمليات الشطب في أسابيع العطلات في ظل عدم وجود بيانات تاريخية كاملة .

مخازن csv

يضم هذا الملف بيانات غير معلومة المصدر لخمس وأربعون متجراً تدل على نوع وحجم المتجر

train.csv

وهو ملف بيانات التدريب التاريخية تشمل الفترة بين 5/2/2010 ولغاية 1/11/2012

: وهو يحوي الحقول التالية

Store – the store number
Dept – the department number
Date – the week
Weekly_Sales : مبيعات قسم معين في متجر معين
IsHoliday : هل هو أسبوع عطلة أما لا

test.csv

train.csv هذا الملف يختلف عن

فقط في وجوب التنبؤ بالمبيعات لكل ثلاثة أقسام من المتجر والتاريخ والقسم في هذا الملف , وعدا ذلك هو مطابق

train.csv تماماً لـملف

features.csv

يتضمن هذا الملف المزيد من المعلومات كالمخزن والقسم ونشاط التواريخ المحددة وهو يحوي الحقول التالية

Store – the store number
Date – the week
Temperature – معدل درجة الحرارة في المنطقة
Fuel_Price – ثمن المحروقات في المنطقة
MarkDown1-5 – بيانات غير معلومة المصدر خاصة بإجراءات الشطب التسويقية التي يشغلها وول مارت
CPI – قيمة تدل على أسعار السمتهلك
Unemployment – معدل البطالة
IsHoliday – هل هو أسبوع عطلة أم لا ؟

للاستراحة تصادف العطلات الأربعة في الأسابيع التالية في مجموعة البيانات مع ملاحظة أنه لم تُدرج جميع العطل في البيانات

Super Bowl: 12 فبراير 10 ، 11 فبراير 11 ، 10 فبراير 12 ، 8 فبراير ، 13

Labor Day: 10 سبتمبر – 10 ، 9 سبتمبر – 11 ، 7 سبتمبر – 12 ، 6 سبتمبر – 13

Thanksgiving: 26-نوفمبر -10 ، 25-نوفمبر -11 ، 23-نوفمبر -12 ، 29-نوفمبر -Christmas: 31 ديسمبر 10 ، 30 ديسمبر 11 ، 28 ديسمبر 12 ، 27 ديسمبر 13

: يمكنك الدخول إلى الرابط وتحميل البيانات

https://lnkd.in/eVT6h-CT

9. Linkedin قوائم مهام محلل البيانات

لكل مبتدئ في تحليل البيانات إليك الخطوات البسيطة والتي تتمثل في جمع البيانات وتنظيفها وتحليلها أما من ناحية جمع البيانات فقد قمنا بكتابة نص برمجي بلغة بايثون

Linkedin للانتقال عبر

وقمنا بجمع كل البيانات اللازمة ووقع الاختيار على 3 مواقع : إفريقيا وكندا وأمريكا

ميزات

التسمية : المسمى الوظيفي *

الشركة : اسم الشركة *

الوصف : وصف الوظيفة والشركة *

في الموقع – عن بعد *

موقع عمل الموظف *

الراتب : راتب الوظيفة *

موقع الشركة *

المقاييس : شروط التوظيف كالخبرة وطبيعة العمل *

تاريخ الإعلان : تاريخ الإعلان عن الوظيفة *

الخاص بالوظيفة : URL الرابط *

يمكنك الدخول إلى الرابط وتحميل البيانات

https://lnkd.in/ezqxcmrE

10. Amazon and Best Buy Electronics:

This dataset covers reviews of fifty electronic products from online stores such as Amazon and Best Buy.

The Datafiniti dataset includes the review date, source, rating, and reviewer metadata. It is a large dataset, so it is worth thinking about the best way to use it.

The value of this data lies in understanding what consumers think about purchasing a product. For example, it can help answer the following:

* What are the main uses of electronic products?

* What is the link between ratings and positive reviews?

* How good are the various brands sold online?

What does Datafiniti do?

It provides direct access to web data by collecting it from a large number of websites to build shared databases of business, product, and property information.

You can follow the link to download the data:

https://lnkd.in/e4fBZvJ3


Get Your Bar Chart To The Next Level With Python

Posted on December 9, 2022 by s4l8384gmailcom


Today we will learn how to create attractive, informative bar charts with a small amount of code and a few design principles.

Mastering chart design is an important skill for any data scientist, so this article walks through the key steps for producing such charts in Python (Matplotlib & Seaborn).

Dataset:

Today we will use a dataset of Pokemons, chosen for the variety of its features:

It has continuous variables (defense, attack, and other combat stats).

It has categorical variables (type, name, and generation).

It has a boolean variable (legendary), so we have a good mix of variable types for building charts.

The dataset can be loaded directly from the repository with the main code for this article, as shown in this table:

Knowing the purpose of the analysis is the first step in designing a strong chart: a good chart answers a specific question about the data.

Our dataset can answer many questions. A good bar chart usually answers a question about a categorical variable, such as the Pokemon type.

For this article, the question we want to answer is:

What types of Pokemons have the highest attack values?

To answer it, we first prepare the data with a group-by and then plot a first baseline bar chart with Seaborn.
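The original code listing did not survive on this page, so the following is only a minimal sketch of the group-by and baseline plot described above. It assumes the data sits in a pandas DataFrame with a type column named `Type 1` and an `Attack` column; both names and the tiny sample values are assumptions for illustration, not the article's data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window instead
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample standing in for the Pokemon dataset (column names are assumed).
pokemon = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water", "Grass", "Dragon"],
    "Attack": [52, 84, 48, 63, 49, 134],
})

# Group by type and average the attack values: one row per category.
attack_by_type = pokemon.groupby("Type 1", as_index=False)["Attack"].mean()

# Baseline bar chart with Seaborn defaults, before any sorting or styling.
fig, ax = plt.subplots()
sns.barplot(data=attack_by_type, x="Type 1", y="Attack", ax=ax)
fig.savefig("attack_by_type_baseline.png")
```

Because `groupby` orders the categories alphabetically, the bars come out in no meaningful order, which is the weakness the next step addresses.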

Looking at the resulting chart, it is hard to answer the question: the chart does not clearly show which type has the highest attack.

To get a clear answer, we sort the data in ascending or descending order and limit the number of categories shown. Keeping only the top ten, for example, removes clutter and makes the chart more organized and useful.

Color matters too. Here we use a single color: giving each bar a different color carries no information and distracts from the values. A few more lines of code add a title, change the font size, and adjust the figure size.

Colors can be specified with a hex code.

Here is an explanation of how to write the code:
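As a sketch of the steps just described (sorting, keeping the top entries, one hex color, a title, a font size, and a figure size), the snippet below uses plain Matplotlib. The column names, the sample values, the color `#4C72B0`, and the cutoff `TOP_N` are illustrative assumptions; the article's own listing is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window instead
import matplotlib.pyplot as plt
import pandas as pd

# Made-up averages per type (illustrative only).
attack_by_type = pd.DataFrame({
    "Type 1": ["Bug", "Dragon", "Fire", "Grass", "Water"],
    "Attack": [71.0, 112.0, 85.0, 73.0, 74.0],
})

TOP_N = 3  # the article keeps the top ten; three is enough for this toy data

# Sort in descending order and keep only the strongest categories.
top = attack_by_type.sort_values("Attack", ascending=False).head(TOP_N)

fig, ax = plt.subplots(figsize=(10, 6))                 # adjust the image size
ax.barh(top["Type 1"], top["Attack"], color="#4C72B0")  # one hex color for every bar
ax.invert_yaxis()                                       # highest value at the top
ax.set_title("Average attack by Pokemon type", fontsize=16)
ax.set_xlabel("Attack")
ax.set_ylabel("Type 1")
fig.savefig("attack_by_type_top.png")
```

Horizontal bars are used here so that long category names stay readable; vertical bars would work the same way with `ax.bar`.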


The result is already more organized, and we are closer to a precise answer about which Pokemon type is the strongest attacker. The adjusted dimensions and a descriptive title that draws the reader's attention further improve the chart.

The chart can still be made cleaner by removing redundant information. Each axis has a label, but the same information already appears in the title, so the axis labels are unnecessary.

The layout of the chart also helps readers orient themselves before they read the data. Readers typically scan a visualization from left to right and from top to bottom, which determines what they see first; this is called the Z pattern.

Applying this pattern to our chart, we move the title to the left so that it is read first and move the X axis to the top for the same reason.

We have the following code:
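The adjustments above might look like the following in Matplotlib. This is a sketch with made-up values, not the article's original listing; the title text and color are assumptions carried over for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window instead
import matplotlib.pyplot as plt

# Made-up, already sorted values (illustrative only).
types = ["Dragon", "Fire", "Water"]
attack = [112.0, 85.0, 74.0]

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(types, attack, color="#4C72B0")
ax.invert_yaxis()  # highest value at the top

# Z pattern: the title goes to the top-left so it is read first...
ax.set_title("Average attack by Pokemon type", loc="left", fontsize=16)
# ...and the X axis ticks move to the top, next to the title.
ax.xaxis.tick_top()

# The title already names both quantities, so the axis labels are redundant.
ax.set_xlabel("")
ax.set_ylabel("")
fig.savefig("attack_by_type_z_pattern.png")
```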

With that, we have an ordered, easy-to-read chart that answers the original question, which was the goal of this bar chart.


Take Your Bar Chart to an Advanced Level with Python


Today we will learn to create attractive and valuable bar charts with a small amount of code, backed by some experience and technical skill.

There is no doubt that mastering chart design is an important skill for any data scientist, so in this article we will go through the most important steps for producing these designs properly using Python (Matplotlib & Seaborn).

Dataset:

In this article we will use a dataset containing information about Pokemons, chosen for the variety of its features.

It has continuous features (Pokemons have defense, attack, and other combat skills).

It has categorical features (type, name, and generation).

It has a boolean feature (legendary), which gives us a varied set of examples for building charts.

The dataset can be obtained directly from the repository using the main code for this article, as shown in this table:

Knowing the goal of the analysis is the first stage in designing strong charts, and it means finding answers to the questions raised about the data we have.

Our dataset can answer many questions, and an excellent chart depends on answering a question about categorical values, such as the Pokemon type.

In this article, the most suitable question to answer is:

* What types of Pokemons have the highest attack values?

To prepare the answer, we start by preparing the data and creating the first, "baseline" bar chart using Group by, and we can plot the data using Seaborn.

Looking at the resulting chart, the information casts doubt on the answer to the question above, because it does not show clearly which Pokemon type has the highest attack.

To reach a precise answer, we need to sort the data in ascending or descending order and limit the number of categories shown. Keeping only the top ten, for example, lets us exclude scattered data and makes the chart more organized and useful.

While improving the layout, we should not neglect the choice of colors. Here that means using only one color: the value of the chart comes from suitable colors, and a mix of different colors takes that value away. A few lines of code let us add a title, change the font size, and adjust the image size. Colors can be chosen using a Hex code.

The following shows how to write the code:


We can see that the result is becoming more organized, and we are close to a more precise answer about which Pokemon type is the best attacker. Resetting the dimensions and adding a suitable title that catches the reader's attention improved the chart further.

Despite the quality we have reached, the chart can be made more organized and precise by removing repeated information that adds nothing. In our chart each axis has a name, and the same names appear in the title, so the repetition is unnecessary. The direction of the chart also carries cues that help the reader understand it before reading the data itself: visualizations are generally read from left to right or from top to bottom, which tells the viewer which information will be read first. This is called the Z pattern.

Applying this pattern to our chart, we move the title to the left so that it is read first and move the X axis to the top for the same reason, which gives us the following code:

With that, we have an ordered and understandable chart, and we can say that we reached our goal of creating an ideal bar chart.


Handling End-To-End Data Science Project

Posted on December 3, 2022 by s4l8384gmailcom


Today we will go over the core steps data analysts follow in a data science project, walking through the main stages in order and using examples from a project done for the VBO Bootcamp / Miuul.

1. Understanding the problem to be solved:

The first thing a data scientist should do when taking on any task is to understand the problem to be solved, and then the benefits its solution brings to the organization.

A correct understanding of the problem and of the work required helps in choosing the most appropriate approach, and builds on experience gained through practice. In our example, we will see two different solutions that use two different approaches.

The dataset used:

The data used in this project serves two goals: determining the budget needed to attract as many customers as possible, and segmenting those customers so that advertising campaigns can be tailored to them. We therefore used regression to estimate the budget and clustering to segment the customers.

This strategy matters because it lets us plan production levels based on the profit rates we expect to reach.

2. Understanding the data we are working with

Doing this stage properly means answering several questions:

A. How are the variables in the data related to each other?

B. What is the original source of the data?

C. Are there any null values in the data?

D. Are there any defects in the data?

E. Does the data come from a specific time period?

F. What do the columns in the dataset mean?

If you are using a Kaggle dataset, understanding the data becomes even more important for obtaining accurate results.

* Read the documentation from the original data source; it helps you identify outliers and empty records, if any.

* Check all the variables (categorical and numerical) that are relevant to the project.

* Examine the numerical variables to identify outliers, if any.

* Identify which categories are frequent and which are rare by exploring the distribution of the categorical variables.

* Analyze the correlation between variables to see how they affect each other; during feature selection, this helps us keep the variables most strongly correlated with the dependent variable.

* Form a general picture of the characteristics of each feature in the project.

As a practical example, we aggregated the data describing the relationship between product and consumer for one store in a particular area:

The results show that: STORE_SALES = UNIT_SALES * SRP

A relationship like this is not obvious at first sight, so you may need to search on Google for the column definitions to confirm that it is correct.
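The relationship can also be checked directly against the data. The sketch below uses a few made-up rows with the three column names from the formula; in the real dataset the column names and casing may differ.

```python
import numpy as np
import pandas as pd

# Made-up rows using the column names from the relationship above.
sales = pd.DataFrame({
    "UNIT_SALES": [2, 3, 4],
    "SRP": [1.5, 2.0, 0.75],
    "STORE_SALES": [3.0, 6.0, 3.0],
})

# Compare with a tolerance, since the values are floating-point numbers.
expected = sales["UNIT_SALES"] * sales["SRP"]
matches = np.isclose(sales["STORE_SALES"], expected)

print(f"{matches.sum()} of {len(sales)} rows satisfy STORE_SALES = UNIT_SALES * SRP")
```

If every row matches, one of the three columns carries no extra information, which is worth knowing before modeling.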

3. Data Preprocessing

The chart shows that there were no outliers or null records in the data, but we removed a duplicate column found in the table.

The correlation analysis showed that several variables are strongly related to each other:

Grossy_sqft x Meat_sqft → High negative correlation

Store_sales x Store_cost → High positive correlation

Store_sales x SRP → High positive correlation

Gross_weight x Net_weight → High positive correlation

Salad_bar x Prepared_food x Coffee_bar x Video_store x Florist → Moderate positive correlation
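Pairs like these can be found by computing a correlation matrix and listing the pairs above a threshold. The following is a minimal sketch with made-up values; the column `units_per_case` and the threshold of 0.8 are assumptions chosen only for illustration.

```python
import pandas as pd

# Made-up sample: two weight columns that move together and one unrelated column.
products = pd.DataFrame({
    "gross_weight": [10.0, 12.5, 8.0, 15.0, 11.0],
    "net_weight": [8.0, 10.4, 6.1, 13.0, 9.0],
    "units_per_case": [5, 30, 12, 8, 20],
})

corr = products.corr()  # Pearson correlation between every pair of columns
threshold = 0.8         # assumed cutoff for a "high" correlation

# Walk the upper triangle of the matrix so each pair is reported once.
strong_pairs = [
    (a, b, round(corr.loc[a, b], 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) >= threshold
]
print(strong_pairs)
```

When two features are this strongly correlated, keeping only one of them usually loses little information.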


4. Data Engineering:

It is essential to understand the problems your organization faces. You need to create added value from the data, define key performance indicators, and carry out other necessary tasks.

The main goal of our project is to determine the budget needed to acquire customers, so that we can estimate an appropriate future budget at the lowest possible cost.

We created a number of new variables using one-hot encoding.

First, the values of categorical variables have to be converted into numeric values so that the algorithms can use them, as shown below:
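A minimal sketch of one-hot encoding with pandas follows. The `country` and `cost` columns and their values are made up for illustration, and `pd.get_dummies` is one common way to do this, not necessarily the one used in the project.

```python
import pandas as pd

# Made-up sample with one categorical column and one numeric column.
customers = pd.DataFrame({
    "country": ["USA", "Canada", "Mexico", "USA"],
    "cost": [120.5, 98.0, 110.2, 101.7],
})

# One 0/1 column per category; the original text column is dropped.
encoded = pd.get_dummies(customers, columns=["country"], dtype=int)
print(encoded)
```

Each category becomes its own column, so this works best for columns with only a few distinct values.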

Columns that hold more than one value per cell, such as the media column, were split into several new columns with the following operations.
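One way to split such a column is to create an indicator column per value. In this sketch the column name `media_type`, the separator `", "`, and the sample values are assumptions.

```python
import pandas as pd

# Made-up sample: each cell may list several media channels.
campaigns = pd.DataFrame({
    "media_type": ["Daily Paper, Radio", "TV", "Radio, TV"],
})

# Split each cell on ", " and create one 0/1 indicator column per channel.
media_flags = campaigns["media_type"].str.get_dummies(sep=", ")
campaigns = campaigns.join(media_flags)
print(campaigns)
```

Summing each indicator column then shows how often every channel is used.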

Here we can see which media channels are used most often and which directly affect the cost variable.

Promotions often use motivating words that create urgency, such as “today” and “weekend”, which tell the customer that the product must be bought within a certain period. We added features based on such words to the promotion category column.
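Keyword features of this kind can be derived with a simple text match. The column name `promotion_name`, the sample promotion names, and the keyword list below are hypothetical.

```python
import pandas as pd

# Made-up promotion names (illustrative only).
promotions = pd.DataFrame({
    "promotion_name": ["Weekend Markdown", "Two Day Sale", "Save Today", "Price Slashers"],
})

# Hypothetical keyword list; extend it with whatever urgency words occur in the data.
keywords = ["today", "weekend"]

for word in keywords:
    # Case-insensitive substring match, stored as a 0/1 flag column.
    promotions[f"promo_has_{word}"] = (
        promotions["promotion_name"]
        .str.contains(word, case=False, regex=False)
        .astype(int)
    )
print(promotions)
```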

Note also that the columns we one-hot encoded are those with only a few distinct values, such as country and profession.

5. Standardization:

Standardization is needed so that no single variable dominates the others because of its scale, and so that training is effective and takes as little time as possible.

We used StandardScaler because our data did not contain outliers.

If the data does contain outliers, RobustScaler is recommended instead.
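The difference between the two scalers is easiest to see on a column with one extreme value. This sketch uses scikit-learn with made-up numbers.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature with a single extreme value (made-up numbers).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# StandardScaler: subtract the mean, divide by the standard deviation.
standard = StandardScaler().fit_transform(X)

# RobustScaler: subtract the median, divide by the interquartile range,
# so the outlier barely affects how the other values are scaled.
robust = RobustScaler().fit_transform(X)

print("standard:", standard.ravel().round(2))
print("robust:  ", robust.ravel().round(2))
```

With StandardScaler the four ordinary values end up squeezed close together because the outlier inflates the mean and standard deviation; with RobustScaler they keep an even spacing around zero.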

6. Estimation:

We were able to fit each model successfully by trying a variety of machine learning techniques and tuning their hyperparameters. Before that, we had excluded weakly correlated variables in order to reduce training time.
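The article does not say which models or search method were used. As one possible illustration of hyperparameter tuning, the sketch below runs a grid search over the regularization strength of a Ridge regression with scikit-learn; the model, the parameter grid, and the data are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Made-up, perfectly linear data so the example is deterministic.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

# Candidate values for Ridge's regularization strength.
param_grid = {"alpha": [0.01, 1.0, 100.0]}

# 5-fold cross-validation, scored by (negative) mean squared error.
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("best parameters:", search.best_params_)
```

On noise-free linear data the smallest `alpha` wins because any shrinkage only adds error; on real data the best value is usually somewhere in between.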

7. Clustering:

The second goal of our project is to acquire customers and keep them as long-term customers, so we segmented the customers and estimated the value needed to do that.

This image shows what is meant:
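The article does not name the clustering algorithm. As an illustration, the sketch below segments made-up customers with k-means from scikit-learn; the two features (acquisition cost and revenue) and the number of clusters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customers: [acquisition cost, revenue], forming three obvious groups.
customers = np.array([
    [1.0, 1.0], [1.2, 0.9],   # low cost, low revenue
    [8.0, 8.0], [8.1, 7.9],   # high cost, high revenue
    [1.0, 8.0], [0.9, 8.2],   # low cost, high revenue
])

# The project ended up with five groups; three fit this toy data.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(customers)
print(labels)
```

On real data the features would normally be scaled first, since k-means is distance-based.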

8. Visualization:

Data loses its value if it is not handled properly. Successful analysis is built on describing the data correctly, and the best way to do that is to visualize it.

In our project we built a dashboard with MicroStrategy.

Project elements:

Store sales by type and cost: shows sales value and cost by store type.

Store location map: shows how the stores are distributed within the city.

Customer chart: a map showing the distribution of customers by country.

Distribution of customers by brand: a word cloud that counts the brands associated with customers.

Media channel and annual AVG: after running the marketing offers, we were able to identify the most suitable membership and the audience that generates profit from that membership.

Customer segmentation: shown with a scatter plot.

With the customers divided into the five resulting groups, you can examine each group closely and build strategies that fit your company's plans.

Here are examples of the plans we created, based on the ratio between spending and financial return:

High cost and high financial return: Large amounts are spent to attract customers, and that spending comes back as substantial profit. In this case, it is worth identifying the channel that brings in the most contacts and focusing on it to cut spending as much as possible.

High cost and low financial return: A large amount is spent to attract customers, but the financial return is low. There are several possible reasons, including customers not finding what they need in the store.

Low cost and low financial return: Very little is spent to acquire customers, but the store may attract a specific audience that prefers a particular type of product with low financial returns. The best strategy here is to run a marketing campaign for the preferred products, based on statistics about the quantities and types of products in demand.

Low cost and high financial return: Customers are reached quickly and cheaply, which brings in a large profit. Marketing campaigns aimed at this type of customer pay off well.

Medium cost and low financial return: Money is spent to acquire customers, but the financial return is low because the store does not stock enough of the products customers ask for. This can be addressed by analyzing demand statistics and correcting the shortfall.


A Comprehensive Walkthrough of a Data Science Project


Today we will cover the basic concepts that data analysts rely on in their data science work, and we will go through the main stages one by one, using examples from work on the VBO Bootcamp / Miuul project.

1. Forming an idea of the problem to be addressed:

The most important thing a data scientist starts with when handling any issue in their work is understanding the problem that has to be solved, and then understanding the benefits that its solution brings to the institution or entity they work for.

A correct understanding of the type of problem and the nature of the work required helps determine the most appropriate approach and so strengthens the expertise gained through experience and practice. In our example we will see different solutions using two different approaches.

The dataset used

The data we use in this project includes outputs for determining the budget needed to attract the largest possible number of customers, segmenting them, and preparing advertising programs according to their needs. We therefore used regression to determine the budget and clustering to segment the customers.

The importance of this strategy lies in our ability to determine the level of production based on the profit rates we expect to reach.

2. Determining the type of data we are dealing with

Carrying out this stage accurately requires knowing several points:

A. What kind of relationship exists between the data in our example?

B. What is the original source of this data?

C. Does this data contain null values?

D. Is there any defect in the data?

E. Is there a specific time period in which the data originated?

F. What do the columns in the dataset mean?

Using a Kaggle dataset makes identifying the type of data even more necessary for obtaining accurate results.

* It is necessary to read the instructions of the main data source; through them you can identify outliers and empty records, if any.

* Check all variables (categorical and numerical) that are primarily related to our project's data.

* Examine the numerical variables that were identified in order to find outliers, if any.

* Identify the categories that occur frequently in the data and those that hardly occur, by exploring how the categorical variables are distributed.

* Analyze the correlation between variables to see how they affect each other; this helps us keep the variable with the highest correlation with the dependent variable during selection.

* Form a general idea of the characteristics and features of each element of the project.

This is a practical application of the aggregation we performed on the information describing the relationship between producer and consumer in a particular population unit and one of the shops located in that area.

The results show that we have:

STORE_SALES=UNIT_SALES*SRP

Under normal circumstances the meaning of this relationship is not obvious, so you will have to search on Google to make sure the aggregation is correct.

3. Exploring outliers and null records:

In our example, the chart shows that there were no outliers or null records in the data, but we removed a duplicate column that was discovered in the table.

From the correlation analysis it became clear that the variables are strongly related to each other:

Grossy_sqft x Meat_sqft → High negative correlation

Store_sales x Store_cost → High positive correlation

Store_sales x SRP → High positive correlation

Gross_weight x Net_weight → High positive correlation

Salad_bar x Prepared_food x Coffee_bar x Video_store x Florist → Moderate positive correlation


4. Data engineering:

It is necessary to understand the problems facing the organization you work for: you need to create added value from the data, build key performance indicators, and carry out other necessary tasks.

The main goal of our project is to determine the budget needed to acquire customers, which is necessary for estimating a suitable future budget at the lowest possible cost.

We created a number of new variables using one-hot encoding.

So we first need to convert the values of categorical variables into numeric values so that we can use them in the algorithms, following the pattern shown below.

We obtained new columns by splitting columns that hold more than one value, with the following operations, as in the case of the media column.

Here we notice the media channels that are used often and that directly affect the cost variable.

Motivating words that attract customers as promotional offers were added to the column for the promotion category. They include words such as “today” and “weekend” and other words that make the customer feel the need to get a product within a certain period.

We also notice that the columns that went through one-hot encoding are columns with a small number of distinct values, such as country and profession.

5. Standardization:

A necessary step so that no variable dominates the data and so that training is effective in the shortest possible time.

Note that we used StandardScaler because our data did not contain outliers.

If the data does contain outliers, RobustScaler is recommended.

6. Estimation:

We can say that we succeeded in fitting each model by trying a variety of machine learning techniques, and we worked on tuning the hyperparameters.

Before that we had excluded weakly correlated variables, in order to obtain training in less time.

7. Clustering:

The second plan we are working on in our project is acquiring customers and keeping them as permanent customers, so we segmented the customers and worked on estimating the value needed for that.

This image shows what is meant:

8. Visualization:

Data loses its value if we do not handle it properly. Successful analysis is built on a correct description of the data, and the best way to achieve that is to visualize the data.

In our project we built a dashboard with MicroStrategy.

Project elements:

Store sales by type and cost: the aim is to determine sales value and cost based on the type of store.

Store location map: this map shows the distribution of stores within the city.

Customer chart: a map showing the classification of customers by country.

Distribution of customers by brand: using a word cloud, we can count the brands associated with customers.

Media channel and annual AVG: after running the marketing offers, we were able to identify the suitable membership and the audience that earns profit from that membership.

Customer segmentation: using a scatter plot.

Based on the division into the five resulting groups, you can now examine them closely and form suitable strategies in line with the plans of the company you work for.

Here are examples of the plans we created, based on the ratio between spending and financial return:

High cost and high financial return: large sums are spent to attract customers, and the spending returns as abundant profit. On that basis, the channel that receives the largest number of contacts can be identified and used to save as much spending as possible.

High cost and low financial return: a large sum is spent to attract customers, but the return is low. This has several causes, including customers not finding what they need in the store.

Low cost and low financial return: a very small amount is spent to acquire customers, but the store may be the destination of a particular audience that prefers a specific type of product with low returns. The best strategy in this case is to create a marketing campaign for the preferred products, based on statistics about the quantity and types of products in demand.

Low cost and high financial return: this case reflects reaching customers in the shortest possible time, which brings a large profit through marketing campaigns for this type of customer.

Medium cost and low financial return: money is spent to acquire customers, but the return is low because the store does not have enough of the products customers ask for. This problem can be solved by running some statistics to correct the shortfall.


Why Is DataCamp The Best Platform For Learning Data Science In 2023?

Posted on November 28, 2022 (updated October 20, 2023) by s4l8384gmailcom


What is DataCamp?

DataCamp is an online platform for learning to work with data at a pace that matches how well you engage with and understand the material, from basic non-coding skills through to data science and machine learning.

DataCamp Learning Strategy:

• Complete learning: finish the interactive courses.

• Continuous practice: work on daily problems regularly.

• Practical application: look for prominent real-world problems and work on solving them.

• Self-assessment: identify your weaknesses and work on them; identify your strengths and develop them.


Here is a simple example of the interactive exercises on the platform:

This is an example of applying the skills you have learned in practice:

Once you have learned and gained enough skill, you can start working along the following path:

Your career could start as a data scientist and then move to data analysis. Mastering those skills prepares you for machine learning, after which you can move on to data engineering and then work as a statistician and programmer.


What Makes DataCamp the Best Platform for Learning Data Science in 2023?


What is DataCamp?

This program lets you learn how to work with data online at a pace that suits how well you engage with and understand the information you receive, starting from the basics of non-coding skills and going all the way to data science and machine learning.

The learning strategy in DataCamp:

Complete learning: you have to complete the interactive courses.

Continuous practice: deal with daily problems on an ongoing basis.

Practical application: look for the most prominent real-world problems and work on solving them.

Assess yourself: identify your weaknesses and work on fixing them, and identify your strengths and make sure to develop them.


This is a simple example of the interactive exercises the platform contains:

This is an example of the practical application of the skills you have learned:

After learning and gaining enough skill, you can start working as follows:

Your career will start as a data scientist, then you will move to data analysis. Mastering those skills will qualify you to enter the world of machine learning, after which you will move to data engineering and then work as a statistician and programmer.


The 10 Best Data Visualizations of 2022

Posted on November 18, 2022 by s4l8384gmailcom


In this article, we highlight some of the best data visualizations of 2022, each related to events that took place during the year.

1. Most popular websites since 1993:

This visualization compares the most popular websites since 1993. Remarkably, Yahoo still held a high position in the ranking at the beginning of 2022.

2. How long it takes a hacker to crack your password in 2022:

Many websites require passwords that combine a variety of characters rather than relying on digits alone. The visualization above shows how long it takes someone trying to break into sites and accounts to crack your password in the current year.

This type of visualization relies mainly on color coding to indicate the different lengths of time needed to crack the password.

3. High prices of basic materials:

The rise in the general price level, together with continuous and growing demand for materials, is one of the consequences of the war between Russia and Ukraine. The visualization above shows the impact of inflation on the prices of basic materials consumed daily, such as fuel, coffee, and wheat.

This type of chart can be described simply as a set of bars whose heights rise and fall over time at different rates.

4. The most famous fast food chains in the world:

The visualization above shows the 50 most popular fast food chains by number of restaurants in America. The classification is based on the size and category of the restaurant.

The visualization shows that McDonald’s is more popular than the other restaurant chains around the world.

This type of visualization is called an organization chart; it is used to distinguish hierarchical data according to a specific classification.

5. NATO versus Russia:

One of the most prominent events of this year is the Russian war on Ukraine. The chart of the balance of power between Russia and NATO presents the facts related to this issue.

The chart is an image made up of several illustrations that convey the idea to the viewer in an attractive and understandable way.


6. What students study in educational institutions:

The visualization above compares the most and least common fields of study in American colleges. It shows that demand for fields related to technology, engineering, and mathematics is growing rapidly, compared with the low demand for fields related to arts and history.

7. Most used web browsers over the last 28 years:

The visualization above shows the most used web browsers over the past 28 years; Google Chrome has the largest share of use compared with the other browsers.

The visualization is a circular chart whose segments grow and shrink over time, much like a bar visualization, but unlike a bar visualization it emphasizes proportions rather than absolute numbers.

8. The most spoken languages in the world:

This visualization is simple but valuable. It is a bar chart that identifies the most used languages in the world.

As the chart shows, English ranks first in the world, followed by Mandarin and then Hindi.

9. School incidents:

This visualization presents statistics on school shooting incidents in many countries over certain periods. The chart shows that the United States recorded the highest rate of such incidents compared with the other countries.

10. A further rise in prices and wages:

Inflation affects not only the basic materials consumed daily but also wages. As inflation rises, the value of the US dollar falls compared with previous periods.

This visualization is a chart showing how wage growth has compared with inflation from several years ago to the present.

In this article we presented ten of the best data visualizations of the most important events of 2022. They are useful examples of different chart types built on classification, sorting, and statistics, and you can draw on them when creating visualizations of your own.


The 10 Best Data Visualizations of 2022


In this article we highlight some of the best data visualizations of 2022 that relate to particular events of the year.

1. Most popular websites since 1993

This visualization compares the most famous websites since 1993. Notably, Yahoo still held a leading position in the ranking of the most famous sites up to the beginning of 2022.

2. The time an intruder needs to crack your password in 2022

Many websites have adopted the principle of requiring a set of varied characters with fewer digits. The visualization above shows the time needed by someone trying to break into other people's sites and accounts to crack your passwords in the current year.

The importance of this type of visualization is that it relies mainly on a distribution of colors indicating the different times needed to crack the password.

3. Rising prices of basic materials

It is worth noting that the rise in the general price level and the continuous, growing demand for materials is one of the results of the war between Russia and Ukraine. The visualization above shows the effect of inflation on the prices of basic materials consumed daily, such as fuel, coffee, and wheat.

This type of chart can be understood simply as a measure of the rise and fall of a set of bars over time at varying rates.

4. The most famous fast food chains in the world

The visualization above shows the 50 most famous fast food chains by the number of restaurants in America. The classification is based on the size and category of the restaurant.

The visualization shows that McDonald's enjoys the widest fame compared with the other restaurant chains around the world.

This type of visualization is called an organization chart; its purpose is to distinguish hierarchical data according to a specific classification.

5. NATO versus Russia

One of the most prominent events of this year is the Russian war on Ukraine. Through the chart representing the balance of power between Russia and NATO, you can learn the facts related to this topic.

The chart consists of an image assembled from several illustrations that convey the idea to the viewer in an attractive and understandable way.

6. نوعية الدارسين في المنشآت التعليمية

التصور المدرج أعلاه يبين مقارنة بين أنواع الدراسات الأكثر والأقل انتشاراً في الكليات الأمريكية ومن خلال ما يوضحه التمثيل البياني نجد أن العلوم المتعلقة بالتكنولوجيا والهندسة والرياضيات يزيد الإقبال عليها بشكل متسارع مقارنة بانخفاض مستوى الإقبال على العلوم المتعلقة بالفنون والتاريخ

7. متصفحات الويب الأكثر استخداماً عبر الـ 28 عاماً الأخيرة

التصور المدرج أعلاه يوضح أكثر متصفحات الويب الأكثر استخداماً عبر الـ 28 عاماً الفائتة وكما يُظهِر التصور

يستحوذ على النسب الأكبر Google Crome أن متصفح

في الاستخدام نسبة إلى باقي المتصفحات

يعتمد هذا التصور على تقسيمات ضمن مخطط دائري تتزايد وتتناقص مع تغير الزمن على غرار التصور الشريطي ولكنه يتميز عن الشريطي في تمييز النسب بدقة أكثر بعيداً عن الأرقام المطلقة

8. أكثر اللغات استخداماً في العالم

يمتاز هذا التصور ببساطته ولكنه ذو قيمة كبيرة وهو من النوع الشريطي يحدد اللغات الأكثر استخداماً في العالم

كما هو موضح في المخطط تحتل اللغة الإنكليزية المرتبة الأولى في العالم تليها الماندرين ثم الهندية

9. حوادث المدارس

تناول هذا التصور نسب إحصائية لبعض حوادث إطلاق النار في المدارس في العديد من الدول خلال فترات معينة , يوضح المخطط أن الولايات المتحدة سجلت أعلى نسبة في وقوع هذا النوع من الحوادث مقارنة مع باقي البلدان

10. ارتفاع أكثر في الأسعار والأجور

علاوة على تأثر المواد الأساسية المستهلكة يومياً بالتضخم فإن للأجور نصيب أيضاً من هذا التأثر السلبي فمن من المعلوم أن مع ارتفاع مستوى التضخم تنخفض قيمة الدولار الأمريكي مقارنة بالفترات السابقة

يمثل هذا التصور صورة تخطيطية تبين تفاوت نمو الأجور بالمقارنة مع التضخم منذ عدة أعوام إلى وقتنا الراهن

وفق ما ذكر أعلاه قدمنا نماذج لأفضل عشرات تصورات بيانية لأهم أحداث العام 2022 تشكل نماذج مفيدة في أشكال مختلفة للتخطيط البياني اعتماداً على التصنيف والفرز والإحصائيات يمكنك الاستفادة منها في حال قررت إجراء أي نوع من أنواع التصور

3 Data Science Certifications you should do in order

Posted on November 15, 2022 by s4l8384gmailcom

Articles, books, and online courses can help a beginner in data science improve up to a point, but on their own they do not give you the experience that professionals have, and they add no official value to your resume. Accredited certifications carry more weight: they draw employers' attention and strengthen your chances when you apply for any job related to data science. The three below are worth taking in the following order:

1- IBM Data Science Professional Certificate

This is a good starting point for learning data science. On the one hand, it is a free course and therefore suits those who cannot afford paid certificates; on the other hand, it gives you the experience and confidence you need, since the company offering it is considered strong in this field.

The course is flexible: it takes the trainee from the basics of machine learning and the principles of the Python language, from writing code to identifying and working with machine learning algorithms, and covers other topics that build a solid knowledge base. According to experts, all of this takes no more than three months of training, after which you sit an exam that you must pass to earn the certification.

2- Microsoft Certified: Azure Data Scientist Associate

You may find similarities between this course and the first one, but its value comes from being accredited by one of the world's major technology companies. It gives you the opportunity to consolidate what you learned in the first course, at a more advanced level.

This course teaches you how to run your own models on the Azure cloud. It also strengthens your skills in managing training costs, which matter a great deal to data science practitioners: training a very large network on your own equipment cannot succeed unless you understand how to invest resources properly for the job.

3- DASCA’s Senior Data Scientist certification

After passing the previous two certificates, you face the most difficult challenge: proving your competence and merit at the level of a professional data scientist. This certificate is awarded by the Data Science Council of America (DASCA), and that alone is reason enough to give it your full attention.

The course is intended for those who have four years of experience in data science, and it trains you on real-world models. The learning process is demanding, but it is worth the effort: the certificate qualifies you to apply for professional data scientist positions, which pay well.

Although this certificate is not free, it opens up comprehensive and advanced knowledge in data science, and since the work it qualifies you for is highly paid, as mentioned above, that is enough reason to commit to it.

Conclusion:

Once you complete these courses, you will not need other courses, and you will be of great interest to employers looking for experienced, highly capable staff. Mastering these courses and obtaining the certificates above will make your chances much stronger than those of peers who did not obtain them. With these certificates listed on your CV, you become a leading candidate for a job that many people working in this field dream of.

ثلاثة شهادات في علوم البيانات

يجب أن تتقنها بالترتيب

يمكن القول بأن المقالات والكتب والدورات التدريبية عبر الإنترنت تساعدك كمبتدئ في علم البيانات إلى حد ما على رفع مستواك إلا أنها لا تساهم وحدها في إكسابك الخبرة التي يمتلكها المحترفون في علم البيانات ولا يمكنك الاعتماد عليها بشكل أساسي فهي لن تمنح سيرتك الذاتية أي قيمة رسمية بل هناك دورات معتمدة أكثر أهمية من شأنها أن تجعلك محط أنظار رؤساء العمل وتسهم في تقوية حظوظك عند التقديم إلى أي عمل وظيفي متعلق بعلم البيانات سنتناول الحديث عنها للتعرف عليها عن قرب على أن تبدأ بها على الترتيب التالي

1- IBM Data Science Professional شهادة

وهي الدورة النموذجية لبداية أفضل في رحلة تعلم علوم البيانات , فمن ناحية هي دورة مجانية وبالتالي تناسب من لا يملك المال اللازم للحصول على الشهادات ومن ناحية أخرى تُكسِب المتعلم الخبرة اللازمة التي تمنحك الثقة كون الشركة المقدمة لهذه الشهادة تعتبر قوية في هذا المجال

تمتاز هذه الدورة بمرونة في التعلم إذا تنطلق بالمتدرب من أساسيات التعلم الآلي ومبادئ لغة بايثون من بناء الأكواد إلى التعرف على خوارزميات التعلم الآلي والتعامل معها وغير ذلك من الأمور المهمة في بناء قاعدة متينة من المعلومات وكل ذلك خلال مدة تدريبية لا تتجاوز الثلاثة أشهر حسب خبراء ثم تتعرض لاختبار عليك اجتيازه لتكون مؤهلاً للحصول على هذه الشهادة

2- Azure Data Scientist Associate شهادة معتمدة من مايكروسوفت

قد تجد تشابهاً بين هذه الدورة والدورة الأولى ولكنها تأخذ أهميتها وقيمتها كونها معتمدة من قِبل كبرى شركات التقانة في العالم فبدراسة هذه الدورة ستكون لديك الفرصة في تثبيت وتعزيز معلوماتك التي تلقيتها في الدورة الأولى ولكن على مستوى متقدم مقارنة بالأولى

تؤمن لك هذه الدورة تعلم كيفية تشغيل النماذج الخاصة بك

Azure انطلاقاً من القاعدة الأساسية للسحابة

وهذا التدريب يمكنك من تقوية مهاراتك في إدارة تكاليف التدريب المهمة جداً لخبراء علم البيانات لأن احتراف هذه المهارة أم ضروري في مهمة التدريب على التعلم الآلي إذ أن تشغيل شبكة اتصال ضخمة على أجهزتك لا يمكن أن يتم بنجاح إلا إذا كنت على دراية تامة بأساسيات الاستثمار الصحيح لهذه المهمة

3- DASCA شهادة علماء البيانات المحترفون من

يمكننا القول الآن بأنك بعد اجتيازك للشهادتين السابقتين فأنت أمام التحدي الأكثر صعوبة , أمام مرحلة إثبات الكفاءة والجدارة في الوصول إلى مستوى عالم بيانات محترف , هذه الشهادة مقدمة من هيئة علوم البيانات في الولايات المتحدة وهذا وحده كفيل بأنه يجعلك تولي كل الاهتمام بالحصول عليها

دورة مصنفة أنها معدّة للذين لديهم خبرة 4 سنوات في علم البيانات ستتدرب فيها على نماذج تدريبية على أرض الواقع ورغم العناء في مسيرة التعلم هذه إلا أنها تستحق هذه المعاناة لأن حصولك على هذه الشهادة سيؤهلك للتقدم على وظيفة علماء البيانات المحترفين التي تعود عليك بالربح المادي الوفير

على الرغم من أن هذه الشهادة ليست مجانية إلا أنها ستنقلك إلى فضاء واسع من المعرفة الشاملة والمتطورة في علم البيانات ونظراً لكونها العمل بمقتضاها يعود عليك بأجر مترفع كما أسلفنا فهذا كفيل بأن يجعلك تتخذ قرار حازم في خوض هذه التجربة

: الخلاصة

مجرد إتمامك لتلك الدورات فلن تحتاج إلى دورات أخرى وتأكد بأنك ستكون محط اهتمام كبير لدى أصحاب الأعمال الباحثين عن موظفين من ذوي الخبرة والكفاءة العالية , إتقانك لتلك الدورات وحصولك على الشهادات المذكورة أعلاه سيجعلان حظوظك أقوى بكثير من أقرانك الذين لم يحصلوا على هذه الشهادات وبمجرد ذكر هذه الشهادات في سيرتك الذاتية فاعلم أنك المرشح الأبرز لنسل وظيفة يحلم بها الكثيرين ممن يمتهنون هذا النوع من العلوم

Personal Data Ecosystem

Posted on November 9, 2022 (updated November 28, 2022) by s4l8384gmailcom

With the rapid development of information technology in general and communication in particular, software companies continuously produce smart services and modern applications that bring many benefits to the details of our daily lives. Examples include, but are not limited to, applications that measure blood sugar or track calories burned, and other programs that offer guidance on users' physical and psychological health.

These applications build an information system tied personally to their users. Used correctly, they give accurate results. We will look at the impact of this usage on users and at how far these services can be directed toward our daily needs, whether related to health or to the tools we use all the time.

Sources:

As these applications collect our data, that data is used to make our lives more enjoyable and comfortable.

Here we analyze the structure of the data, starting with two columns: the first contains the data sources and the second contains the resulting information.

Smart devices link our bodies, our behaviors, and our projects to the Internet, turning us into digital-physical elements, and they have become the focus of attention around the world. We will call these tools “devices”.

Outputs:

Imagine an application that records your sleep times and analyzes them to determine your optimal wake-up time, then sets its alarm to wake you in the morning; another application that measures your breathing; and another that estimates your heart rate from skin color. All of these services are available through “apps”.

Core technologies are created for similar applications to cover the tasks they have in common, so that developers and programmers can reuse them to reach the devices that produce the applications' data more easily. These are called “APIs”.

Some companies use application users' information for advertising purposes: they analyze our daily needs and basic requirements and build models from them that yield more valuable advertising material.

The process of relying on the source of information and analyzing the data can be called “business.”

Some companies' research explores valuable information extracted from the ocean of user data so that it can be put to use in fields such as medicine or marketing. We will call this process “research”.

In the end, we cannot conclude from the above that user information is exploited for advertising alone. There are clearly companies striving to provide useful services to users, which builds trust between producer and consumer, in what is called “experience”.

Here, the difference between those who act as data sources and those who provide an outlet for the data becomes clear. To illustrate, here are some real-world examples:

Muse, the brain sensing headband
http://www.choosemuse.com/

Smart Contact Lenses (Google and Novartis)
http://online.wsj.com/articles/novatis-google-to-work-on-smart-contact-lenses-1405417127

Sources->Apps

LEO: Wearable Fitness Intelligence
https://www.indiegogo.com/projects/leo-wearable-fitness-intelligence#home

Wristbands: Startups Launch New Generation Of Smart Wristbands
http://www.forbes.com/pictures/ekhf45eedek/nymi-5/

Dream:ON — Influence your dreams
http://www.dreamonapp.com/

Sources->APIs

Sleep Cycle — Waking up made easy
http://www.sleepcycle.com/

Cardiio — Your heart rate monitor, reinvented
http://www.cardiio.com/

Human API
http://humanapi.co/

Google Android Wear
http://www.android.com/wear/

Apple HomeKit
https://developer.apple.com/homekit/

Apple HealthKit
https://developer.apple.com/healthkit/

Exits->Business

Evrythng — Make products smart
https://evrythng.com/

nymi — Your everyday simplified
https://www.nymi.com/

Rapleaf — Real-Time Data on 80% of U.S. Emails
http://www.rapleaf.com/

YipitData — Track company performance from online data
http://yipitdata.com/

Granify — Do you know which shoppers aren’t going to buy? We do.
http://granify.com/

Datacoup — Introducing The First Personal Data Marketplace
https://datacoup.com/

Exits->Research

Mobileum — Get Wisdom from Your Data
http://www.mobileum.com/

VisualDNA — Big Data + Psychology = Understanding
http://www.visualdna.com/

MIT Technology Review — Big Data Gets Personal
http://www.technologyreview.com/businessreport/big-data-gets-personal/download/

Pocket Therapy: Do Mental Health Apps Work?
http://www.medscape.com/viewarticle/769769

A Roadmap to Advanced Personalization of Mobile Services
https://www.dropbox.com/s/apm0jtvcbeb664h/coopis02i.pdf

MaskIt: Privately Releasing User Context Streams for Personalized Mobile Applications
https://www.dropbox.com/s/cd6e4eryatc5hzr/MaskIt-SIGMOD12.pdf

Exits->Experience

Mobile Content Personalisation Using Intelligent User Profile Approach
https://www.dropbox.com/s/l2x7i54hvj0u8hw/Mobile_Content_Personalisation.pdf

Intelligent Mobile User Profile Classification for Content Personalisation
https://www.dropbox.com/s/59bjitsvalcjd72/Worapat_Paireekreng_Intelligent_Mobile_User_Profile_Classification_for_Content_Personalisation.pdf

Disney — You don’t want your privacy
http://gigaom.com/2014/01/18/you-dont-want-your-privacy-disney-and-the-meat-space-data-race/

Google — The rise of phones that read your mind
http://www.dailymail.co.uk/sciencetech/article-2517557/Google-Now-leads-way-apps-know-want-do.html

Happify — How Science and Technology Can Help Make You Happier
https://news.yahoo.com/katie-couric-happify-222938746.html

The question that arises here is: are you, as a user, ready to hand your digital information to a company so that it can be used in ways that are valuable and useful to you?

Now that the structure of the data is clear, it seems likely that the future of technology will lead us toward linking sources with exits, so that each of us can use his or her personal information to create something useful and valuable that makes daily life easier.

البنية التكوينية للبيانات

مع التطور السريع لتكنولوجيا المعلومات عموماً والاتصال خصوصاً تنتج الشركات البرمجية بشكل متسمر الخدمات الذكية والتطبيقات الحديثة التي تضفي على تفاصيل حياتنا اليومية الكثير من الفائدة وعلى سبيل المثال لا الحصر تطبيقات قياس سكر الدم وطريقة حرق السعرات الحرارية وغيرها من البرامج التي تقدم إرشادات تتعلق بالصحة الجسدية والنفسية للمستخدمين

هذه التطبيقات ستقوم ببناء منظومة معلومات تتعلق شخصياً بمستخدميها وفي حال استخدام تلك التطبيقات أو الخدمات بشكل صحيح فسوف تعطي نتائج دقيقة وسنتناول أثر الاستخدامات على المستخدمين ومدى إمكانية توجيه تلك الخدمات واستثمارها فيما يخدم حاجاتنا اليومية سواء الصحية منها أو ما يتعلق بالأدوات التي نتعامل معها بشكل دائم ومستمر

:مصادر

في العملية التي تقوم فيها تلك التطبيقات بجمع البيانات الخاصة بنا سيتم توظيف تلك البيانات في جعل حياتنا أكثر متعة وراحة

وهنا سنقوم بتحليل البنية التكوينية للبيانات وسنبدأ بتشكيل عمودين الأول يحوي مصادر البيانات والثاني يحوي المعلومات الناتجة

وبوجود الأجهزة الذكية التي تربط بين أجسادنا وتصرفاتنا ومشارعنا وبين الإنترنت فتجعل منا عناصر مادية رقمية هذه الأجهزة أصبحت محط اهتمام الكثيرين حول العالم سنطلق اسم “أجهزة” على هذه الأدوات

: مخرجات

لك أن تتصور أنه بإمكان أحد التطبيقات أن يسجل أوقات نومك ويقوم بتحليلها ليخرج لك معياراً يحدد لك فيه الوقت الأمثل فيضبط المنبه الخاص به لإيقاظك صباحاً وتطبيق آخر لقياس التنفس الخاص بك وآخر يقوم بتحليل معدل نبضات القلب عن طريق لون البشرة كل هذه الخدمات تتوفر عبر ” تطبيقات “

يتم ابتكار تقنيات رئيسية للتطبيقات المتماثلة تتضمن المهام المشتركة لتلك التطبيقات بحيث يستخدم المطورون والمبرمجون مضمونها على وجه التحديد فيسهل بذلك وصولها إلى الأجهزة التي تنتج البيانات المكونة للتطبيقات وهذا ما يسمى ” واجهات برمجة التطبيقات

تعمد بعض الشركات إلى استخدام المعلومات الخاصة بمستخدمي التطبيقات لخدمة أغراضها الإعلانية إذ يقومون بإنشاء تحليلات لاحتياجاتنا اليومية ومتطلباتنا الأساسية فيحصلون بناءً عليها على نماذج توفر لهم مواد إعلانية ذات قيمة أعلى

“يمكن أن نطلق على عملية الاعتماد على مصدر المعلومات وتحليل البيانات اسم “الأعمال

تقوم بعض الأبحاث الخاصة ببعض الشركات على التنقيب عن معلومات قيمة تستخرج من محيط البيانات التابعة للمستخدمين ليتم استثمارها في خدمة مجالات متعددة كالطب أو التسويق سنسمي هذه العملية ” البحث

وفي النهاية لا يمكننا أن نطلق حكماً نهائياً وفق ما ذكر بأن استثمار المعلومات الخاصة بالمستخدم ينطوي تحت غرض الإعلان فحسب بل يمكن وبشكل واضح الاعتراف بأن هناك شركات تسعى جاهدة لتأمين الخدمة المفيدة للمستخدمين مما يعزز الثقة بين المنتج والمستهلك فيما يسمى ” الخبرة

:وهنا يتضح الفارق بين من يلعبون دور مصادر البيانات ومن يعطون مخرجاً للبيانات وللتوضيح نطرح بعض الأدلة على أرض الواقع

Muse, the brain sensing headband
http://www.choosemuse.com/

Smart Contact Lenses (Google and Novartis)
http://online.wsj.com/articles/novatis-google-to-work-on-smart-contact-lenses-1405417127

: المصادر-> التطبيقات

LEO: Wearable Fitness Intelligence
https://www.indiegogo.com/projects/leo-wearable-fitness-intelligence#home

Wristbands: Startups Launch New Generation Of Smart Wristbands
http://www.forbes.com/pictures/ekhf45eedek/nymi-5/

Dream:ON — Influence your dreams
http://www.dreamonapp.com/

: المصادر-> واجهات برمجة التطبيقات

Sleep Cycle — Waking up made easy
http://www.sleepcycle.com/

Cardiio — Your heart rate monitor, reinvented
http://www.cardiio.com/

Human API
http://humanapi.co/

Google Android Wear
http://www.android.com/wear/

Apple HomeKit
https://developer.apple.com/homekit/

Apple HealthKit
https://developer.apple.com/healthkit/

: المخرجات -> الأعمال

Evrythng — Make products smart
https://evrythng.com/

nymi — Your everyday simplified
https://www.nymi.com/

Rapleaf — Real-Time Data on 80% of U.S. Emails
http://www.rapleaf.com/

YipitData — Track company performance from online data
http://yipitdata.com/

Granify — Do you know which shoppers aren’t going to buy? We do.
http://granify.com/

Datacoup — Introducing The First Personal Data Marketplace
https://datacoup.com/

: مخرجات -> البحث

Mobileum — Get Wisdom from Your Data
http://www.mobileum.com/

VisualDNA — Big Data + Psychology = Understanding
http://www.visualdna.com/

MIT Technology Review — Big Data Gets Personal
http://www.technologyreview.com/businessreport/big-data-gets-personal/download/

Pocket Therapy: Do Mental Health Apps Work? http://www.medscape.com/viewarticle/769769

A Roadmap to Advanced Personalization of Mobile Services
https://www.dropbox.com/s/apm0jtvcbeb664h/coopis02i.pdf

MaskIt: Privately Releasing User Context Streams for Personalized Mobile Applications
https://www.dropbox.com/s/cd6e4eryatc5hzr/MaskIt-SIGMOD12.pdf

: مخرجات -> الخبرة

Mobile Content Personalisation Using Intelligent User Profile Approach
https://www.dropbox.com/s/l2x7i54hvj0u8hw/Mobile_Content_Personalisation.pdf

Disney — You don’t want your privacy
http://gigaom.com/2014/01/18/you-dont-want-your-privacy-disney-and-the-meat-space-data-race/

Google — The rise of phones that read your mind
http://www.dailymail.co.uk/sciencetech/article-2517557/Google-Now-leads-way-apps-know-want-do.html

Happify — How Science and Technology Can Help Make You Happier
https://news.yahoo.com/katie-couric-happify-222938746.html

والسؤال الذي يطرح نفسه هنا هل أنت كمستخدم على استعداد لتقديم معلوماتك الرقمية لأحد الشركات لاستغلالها فيما هو قيم ومفيد بالنسبة لك ؟

بعد أن اكتملت الرؤية الواضحة للبنية المكونة للبيانات ربما سيكون من الواضح أن مستقبل التكنولوجيا ليوصلنا إلى استخدام تقنية ربط المصادر بالمخارج ما يؤدي بنا إلى إمكانية أن يقوم كل منا باستغلال معلوماته الشخصية لابتكار ما هو مفيد وأكثر قيمة في ما يسهل حياتنا اليومية

What is One-Hot Encoding?

Posted on November 5, 2022 by s4l8384gmailcom

In this simple tutorial, we’ll explain One-Hot encoding with Python and R.

Many machine learning models accept only numeric values as inputs. For such a model to work with categorical data, we must encode it, as explained below.

What is one-hot encoding?

This encoding converts categories represented by words, letters, or symbols into numeric vectors of ones and zeros. The length of each vector equals the number of categories, and each position in the vector represents one category.

A value's own category is marked with a one, and every other position is zero.

We will illustrate the process of one-hot encoding with a practical example in R and Python:

Using Python
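
The original snippet is not preserved here, so the following is only a minimal sketch in plain Python of the idea described above. The function name `one_hot_encode` and the sample `colors` list are illustrative assumptions, not part of the original tutorial; in practice a library helper such as `pandas.get_dummies` does the same job.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 vector per input value.

    Hypothetical helper written for illustration only.
    """
    # One position per distinct category, in sorted order.
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[position[value]] = 1      # mark the value's own category
        rows.append(row)
    return categories, rows


# Illustrative sample data (assumption).
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, which is where the name "one-hot" comes from.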

Using R

So what is the significance of this encoding?

When an important data set contains categorical features that we need to use in a model that accepts only numeric inputs, as is the case with some algorithms, one-hot encoding is often the best option.

ما هو مفهوم

One-Hot Encoding

سنتناول في هذا الدرس التوضيحي المبسط شرح

Python و R باستخدام One-Hot الترميز

يتعرف هذا النموذج على القيم الرقمية فقط على شكل مدخلات ولكي يتمكن نموذجنا من العمل مع مجموعات البيانات يتوجب علينا ترميزها كما سنوضح لاحقاً

؟ One-hot ما هو مفهوم ترميز

يقوم هذا الترميز بتحويل مجموعات من البيانات التي تمثل بكلمات أو حروف أو رموز إلى قيم رقمية صحيحة بمنازل محددة من الآحاد والأصفار يتم تحديدها من خلال عدد المجموعات بحيث يمثل كل جزء من هذه المنازل مجموعة أو فئة واحدة وبالتالي يرمز إلى أي فئة بالرقم واحد وعدا ذلك سيأخذ الرمز صفر

One-hot وسنوضح بمثال عملي عملية ترميز

: وبايثون R باستخدام لغتي

: باستخدام بايثون

: R باستخدام

إذاً من أين تأتي أهمية هذا الترميز ؟

في حال وجود مجموعات بيانات مهمة مؤلفة من فئات معينة فنحن بحاجة إلى استخدامها في النموذج الذي هو بطبيعة الحال لا يقبل التعامل إلا مع الرموز الرقمية كما هو الحال في بعض الخوارزميات

هو الخيار الأفضل one-hot ففي هذه الحالات الترميز

Essential Python Interview Questions

Posted on February 11, 2022 (updated August 6, 2022) by s4l8384gmailcom

Programmers and developers show great interest in Python, since it is one of the most important and popular programming languages in the world of technology, especially in fields such as data science and artificial intelligence and its branches.

It is therefore worth looking at the top eight questions you are likely to face in a Python interview.

1- What do you know about interpreted languages?

Hiring staff usually start the interview with basic questions about Python and ask for a brief explanation of its basic concepts. Python is an interpreted language: the interpreter runs the code directly, without a separate compilation step performed by the programmer.

2- What are the benefits of Python?

This is one of the main interview questions. It reveals your understanding of Python and of why companies are replacing other programming languages such as JavaScript, C++, and R with Python.

3- Create a list of the common data types in Python

Interviewers are likely to ask about the basic types that everyone uses from the start in Python, including numeric types, the string type, sequence types such as lists and tuples, the mapping type (dictionary), set types, and so on.
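
As a quick illustration of the answer, the sketch below builds one value of each common built-in type and asks Python for its type name. The `examples` dictionary is just an illustrative container, not something the interview question prescribes.

```python
# One sample value per common built-in type (illustrative values).
examples = {
    "int": 42,
    "float": 3.14,
    "complex": 2 + 3j,
    "bool": True,
    "str": "hello",
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "range": range(3),
    "dict": {"a": 1},
    "set": {1, 2, 3},
    "NoneType": None,
}

# type(value).__name__ gives the name of each value's type.
type_names = {label: type(value).__name__ for label, value in examples.items()}

for label, name in type_names.items():
    print(f"{label:>8} -> {name}")
```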

4- What are the basic differences between lists and tuples?

Your answer to this question shows whether you understand the differences between basic components of the language such as lists and tuples, and the terms mutable and immutable: lists are mutable, while tuples are immutable.
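
A short sketch of the difference, using made-up variable names: a list can be changed in place, while any attempt to change a tuple raises `TypeError`. Because a tuple of hashable items is itself hashable, it can also serve as a dictionary key, which a list cannot.

```python
# Lists are mutable: items can be replaced and appended in place.
numbers_list = [1, 2, 3]
numbers_list[0] = 10
numbers_list.append(4)

# Tuples are immutable: item assignment raises TypeError.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# A tuple of hashable items can be used as a dictionary key.
lookup = {numbers_tuple: "ok"}

print(numbers_list, numbers_tuple, tuple_was_modified)
```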

5- What is __init__?

Some recruiters ask about the details of specific methods to test your knowledge of the language. The __init__ method is called automatically when a new object is created, and it initializes the object's attributes.
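
A minimal sketch of how this looks in practice. The `Point` class and its attributes are hypothetical names chosen for the example.

```python
class Point:
    """Hypothetical example class."""

    def __init__(self, x, y=0):
        # Runs automatically when Point(...) is called;
        # it stores the arguments as attributes of the new object.
        self.x = x
        self.y = y

    def norm_squared(self):
        # An ordinary method that reads the attributes set in __init__.
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)   # __init__ is invoked here with x=3, y=4
q = Point(5)      # y falls back to its default value

print(p.x, p.y, p.norm_squared())
print(q.x, q.y)
```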

6- Explain the difference between .py and .pyc files.

This is one of the general questions in a Python interview. It shows whether the programmer understands the two file types well enough to handle each appropriately. A .py file contains Python source code, while a .pyc file contains the compiled bytecode that the interpreter caches for it.
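
To make the distinction concrete, here is a small sketch that writes a throwaway `.py` file, compiles it with the standard-library `py_compile` module, and checks that the resulting `.pyc` starts with the interpreter's magic number. The file name `hello.py` is an arbitrary choice for the example.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # 1. A .py file holds plain-text source code.
    source_path = os.path.join(tmp, "hello.py")
    with open(source_path, "w") as f:
        f.write('print("hello")\n')

    # 2. Compiling it produces a .pyc file containing bytecode.
    compiled_path = py_compile.compile(
        source_path, cfile=os.path.join(tmp, "hello.pyc")
    )
    pyc_exists = os.path.exists(compiled_path)

    # 3. A .pyc file is binary; its first four bytes identify the
    #    bytecode format of the interpreter that produced it.
    with open(compiled_path, "rb") as f:
        header = f.read(4)
    magic_matches = header == importlib.util.MAGIC_NUMBER

print(pyc_exists, magic_matches)
```

Normally you never create `.pyc` files by hand: Python writes them into a `__pycache__` directory when a module is imported.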

7- Describe Python namespaces.

This is one of the most interesting questions that recruiters like to ask, because namespaces determine which object each name refers to. Being able to explain the types of namespaces, and that they are typically implemented as dictionaries, is strong evidence to interviewers of your proficiency in Python.
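
The sketch below, with made-up function names, shows the same name `x` living in three different namespaces at once (local, enclosing, and global), plus the built-in namespace that holds names such as `len`.

```python
import builtins

x = "global"  # lives in the module (global) namespace


def outer():
    x = "enclosing"  # lives in outer()'s local namespace

    def inner():
        x = "local"  # lives in inner()'s local namespace
        # locals() exposes the current local namespace as a dictionary.
        return x, sorted(locals())

    return inner(), x


(inner_x, inner_names), outer_x = outer()
global_x = x  # the module-level name is untouched by the functions

# Built-in names such as len live in the builtins module's namespace.
len_is_builtin = hasattr(builtins, "len")
# The global namespace is exposed as an ordinary dictionary.
namespace_is_dict = isinstance(globals(), dict)

print(inner_x, outer_x, global_x, inner_names)
```

Python resolves a name by searching these namespaces in order: local, enclosing, global, then built-in.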

8- What are the Python keywords?

This important question requires every candidate to know the keywords of the Python language before the interview. Keywords are reserved words that cannot be used as names for variables or functions. The exact count depends on the Python version: 33 in Python 3.6, and 35 in more recent releases.
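
Rather than memorizing a number, you can ask the interpreter itself. This short sketch uses the standard-library `keyword` module; the printed count reflects whichever Python version runs it.

```python
import keyword

# The full list of reserved words for the running interpreter.
keyword_count = len(keyword.kwlist)
print(keyword_count)
print(keyword.kwlist)

# iskeyword() tells reserved words apart from ordinary names.
lambda_is_keyword = keyword.iskeyword("lambda")
print_is_keyword = keyword.iskeyword("print")  # print is a built-in function
print(lambda_is_keyword, print_is_keyword)
```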

5 Books To Take Your Data Visualization Skills To The Next Level

Posted on February 4, 2022 (updated August 6, 2022) by s4l8384gmailcom

In this article, we will review the best data visualization books that will help you raise your level and develop your performance in graphic representation.

1- The Data Visualization Sketchbook:

This book is a comprehensive guide to the rules of drawing and working with graphs, from the moment a graph is first sketched, through dashboard and slide design, to the finished graph.

2- Storytelling with Data: A Data Visualization Guide for Business Professionals:

This book teaches you the whole process of creating helpful visualizations from A to Z, and how to draw the audience's attention to the main points of a visualization.

3- Effective Data Visualization: The Right Chart for the Right Data

This book has an easy style and a simple presentation. It explains graphing concepts by focusing on Excel charts and graphs as a way to communicate data findings easily. It also guides you toward successful visualizations and teaches you how to choose the correct chart for your data.

4- Resonate: Present Visual Stories that Transform Audiences:

This book focuses on building striking, memorable visualizations by combining all the elements with suitable colors and specific criteria, so that you can present data findings to your audience simply and clearly.

5- Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks

Researchers lead the way in finding new methods and discovering new things in all aspects of life, and this book is a guide that helps them present their findings better.

Finally:

Mastering mathematics and statistics, in addition to programming and graphic representation, will make you a professional in the field of data science, and knowing the visualization tools will enable you to get results quickly and efficiently.

Data Visualization By Python

Posted on January 28, 2022 (updated August 6, 2022) by s4l8384gmailcom

Here I will explain data visualization using Python. The explanation is based on a real case, but I will only present the Python code along with an explanation of the charts.

What is the Dataset about?

We will work on the Breast Cancer Wisconsin (Diagnostic) Dataset. Its features are computed from an image of a fine needle aspirate (FNA) of a breast mass, and they describe characteristics of the cell nuclei present in the image. You can find this dataset on Kaggle.

What are the Data Visualization steps on this Dataset?

1. Importing libraries

2. Distribution plot

3. Pair plot

4. Count plot for Categorical columns

5. Checking Outliers existence

6. Correlation matrix

Matplotlib and Seaborn are the two main plotting libraries in Python; other libraries include ggplot and Plotly.

So let’s start with the first step:

1. Importing the required libraries:

import matplotlib.pyplot as plt

import seaborn as sns

2. Using Distribution plot for all columns:

Distribution plots show whether the data is normally distributed or skewed. If it is skewed, we may need to apply some transformations to get better results from the machine learning models.

Here we create the distribution plot for every column in the dataset and display the one for the “area_mean” column.

We clearly notice the right skewness of the “area_mean” column, as with most of the columns in the dataset. This method of analysis is called univariate analysis, because we take one variable and analyze it on its own. When we take two variables at the same time and look for a relationship between them, it is called bivariate analysis, and with more than two variables it is called multivariate analysis.

3- Pair plot:

The main purpose of the pair plot is to show the relationships between pairs of variables.

Its code is:

4- Count plot for Categorical columns:

When we have a categorical variable we will plot it in a count plot.

This dataset contains one categorical variable (“target”) with two classes:

0 (Benign) and 1 (Malignant)

A count plot shows the total count for each category. As we can see, the number of data points labeled ‘0’ is higher than the number labeled ‘1’, which means that we have more Benign cases than Malignant cases in this dataset. This indicates that the data is imbalanced.
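
A minimal sketch of such a count plot with Seaborn's `countplot`. The DataFrame is built by hand so the example is self-contained; the class sizes used (357 and 212) follow the dataset's published class distribution, mapped onto the 0/1 labels described above, and should be checked against your own copy of the file.

```python
import pandas as pd
import seaborn as sns

# Hand-built stand-in for the "target" column (assumption):
# 0 = Benign, 1 = Malignant, as described in the text.
df = pd.DataFrame({"target": [0] * 357 + [1] * 212})

# One bar per class, with the bar height equal to the class count.
ax = sns.countplot(data=df, x="target")
ax.set_xlabel("target (0 = Benign, 1 = Malignant)")

# The same information as numbers, to quantify the imbalance.
counts = df["target"].value_counts().sort_index()
benign_share = counts.loc[0] / counts.sum()
print(counts.to_dict(), f"benign share: {benign_share:.2f}")
```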

5- Outliers:

Most ML algorithms such as Regression models, K-Nearest Neighbors, etc are sensitive to Outliers, but other models such as Random forest are not affected by Outliers.

The plot that reveals the outliers is a box-and-whisker plot:

In a loop, we create a box plot for every column in the dataset; here we display the plot for the “radius_mean” variable alone.

The circles above the top whisker and below the bottom whisker represent the outliers.

In our example, the outlier values are in the top section only.
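
Box plots commonly flag as outliers the points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule with the standard library. The numbers are made up, `iqr_outliers` is a hypothetical helper, and the exact quartile method used by a given plotting library may differ slightly.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up measurements (illustrative only).
radius = [10, 11, 12, 12, 13, 13, 14, 15, 16, 30]
print(iqr_outliers(radius))
```

Here only the value far above the rest is flagged, which matches the situation described above where the outliers sit in the top section only.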

6- Correlation matrix:

Its purpose is to find out the correlation between the variables in the data set so that the useful features are selected and the unnecessary ones removed.

We will create a Heat Map to visualize the relationship between the variables :

Correlation values range from -1 to +1.
If the correlation between two variables is +1, they have a perfect positive linear relationship; if it is -1, the relationship is perfectly negative; a value near 0 means there is little or no linear relationship.
Determining the type of correlation between two variables helps in dealing with the problem of multicollinearity and assists us in deciding whether to remove one of the features, especially when two independent variables are highly correlated.
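
The values shown in such a heat map are Pearson correlation coefficients. As a rough sketch of how highly correlated pairs could be spotted, the code below computes them by hand; the columns are invented, and the 0.9 cut-off is an assumption chosen for illustration, not a rule from this article.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up columns (illustrative only); "perimeter" is nearly a multiple of "radius".
columns = {
    "radius": [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [63.0, 75.5, 88.0, 100.4, 113.1],
    "texture": [20.0, 14.0, 19.0, 15.0, 18.0],
}

THRESHOLD = 0.9  # assumed cut-off for "highly correlated"
for a, b in combinations(columns, 2):
    r = pearson(columns[a], columns[b])
    flag = "  <- candidate for removal" if abs(r) > THRESHOLD else ""
    print(f"{a} vs {b}: r = {r:+.2f}{flag}")
```

When a pair is flagged, keeping only one of the two features is usually enough, since the second adds little new information.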

Finally, these are the most popular plots that we can create for our dataset. There are several other plots, such as pie charts and scatter plots. We always choose plots based on the dataset and the insights we are looking for, because the conclusions we draw from the data visualization process will guide how we apply models.

What Is Data Visualization?

Posted on January 21, 2022 (updated August 6, 2022) by s4l8384gmailcom

This term refers to the visual figures and symbols that present information in the form of geographical maps, charts, sparklines, infographics, heat maps, or statistical graphs.

These graphics draw on factors such as AI integration, information abundance, and interactive exploration to make information simple to understand and study, which improves the chances of obtaining more accurate and effective results.

In this context, we present 5 data visualization tools that are flexible and efficient:

1- Tableau

This tool integrates with a wide range of data platforms, including Teradata, SAP, MySQL, Amazon AWS, and Hadoop, and helps create visualizations from data that changes on an ongoing basis. This has made it the most popular tool among data visualization users, because it has several advantages, including:

• High efficiency of visualization

• Smooth handling

• Accuracy and effectiveness in performance

• The ability to connect to different data sources

• Mobile-responsive

• It has media support

However, this tool is not without some disadvantages, such as:

• High pricing

• Lack of automatic refresh and report scheduling

2- Power BI

A flexible tool from Microsoft. It supports a huge number of back-end data sources, including Teradata, Salesforce, PostgreSQL, Oracle, Google Analytics, GitHub, Adobe Analytics, Azure, SQL Server, and Excel, and gives results with great accuracy and speed.

This tool has the following advantages:

– No specialized technical support required

– Easy compatibility with popular applications

– Professional and diversified control panel

– Unlimited speed and memory

– High level security

– Compatibility with Microsoft applications

However, its disadvantage is that it does not provide an environment to work with many and varied data sets.

3- Jupyter

This tool is regarded as one of the best data visualization tools, as it allows its users to create and share documents that include multiple visualizations and code. In addition, it is an ideal tool for:

Data cleansing, transformation, statistical modeling, numerical simulation, interactive computing and machine learning.

Positives :

– Prototyping speed

– Presents results in elegant-looking visuals

– Share visual results easily

Negatives :

– Collaboration is difficult

– Reviewing scripts is sometimes difficult

4- Google Charts

This tool can produce graphical and pictorial representations of data, and it is compatible with the most popular operating systems in use around the world.

Positives :

– Ease of handling.

– The possibility of merging data with complete flexibility.

– Show graphical results through elegant looking graphics.

– Full compatibility with Google applications.

Negatives :

– Requires accuracy in export procedures.

– Lack of demonstrations on tools.

– Unavailability of customization.

– A network connection is required for visualization.

5- IBM Watson

This tool is highly efficient, as it relies on analytical components and artificial intelligence to create models from both structured and unstructured information and reach the optimal visualization.

Positives :

– Natural Language Processing capabilities.

– Availability from several devices.

– Predictive studies.

– Self-service control panel.

Negatives :

– Need to develop customer support service.

– High maintenance costs.

In the end, learning visualization is very important during the data science learning journey, given the studies that indicate rapid growth in the use of the Internet and information technology.

7 Features That Make Python The Most Suitable Choice For Starting Your Project

Posted on January 15, 2022 (updated August 6, 2022) by s4l8384gmailcom

1- Flexibility At Work :

The Python environment is smooth and flexible: it interoperates with several other programming languages, so working with it allows for change and modification as required by the work plan.

2- Most Popular :

Python is one of the most popular languages in the world, because the simplicity of its code has helped it spread widely.

3- Ease Of Learning And Use :

Compared to other programming languages, Python is one of the easiest to learn, which lets developers work with it comfortably when building their programs and projects.

4- Diversity Of tasks And Versatility Of Uses :

It can be used in many fields related to data and software and in developing applications, as it runs on all major operating systems and works with the databases used around the world.

5- Open Source :

Python can be used to implement any project and can be modified according to the requirements of that project, as it is open source and its development is open to anyone.

6- Supportive Community :

Python is a programming language with a strong community that provides great support to its users. Anyone can get help while developing in Python, as solutions to programming difficulties are readily and quickly available.

7-The Optimal Environment For Artificial Intelligence And Machine Learning :

The Python environment is open to creativity and discovery in everything related to data, from artificial intelligence to machine learning, as it includes a large variety of libraries that let users carry out their work with high efficiency.

5 Predictive Models Every Beginner Data Scientist Should Master

Posted on January 8, 2022 (updated January 21, 2022) by s4l8384gmailcom

We present the 5 basic models you should know to start your Data Science learning journey.

Linear Regression

Understanding the mathematics behind regression will let you work with it efficiently and skillfully. Linear regression predicts phenomena by establishing linear relationships among the data.

You can also understand the algorithm through its representation in a simple 2-D diagram, with the help of sources such as:

DataCamp’s Linear Regression Explanation
Sklearn’s Regression Implementation
R For Data Science Udemy Course Linear Regression Section
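
To make the idea of "establishing a linear relationship" concrete, here is a minimal sketch of ordinary least squares for a single feature, written with plain Python. The points are made up and `fit_line` is a hypothetical helper; libraries such as scikit-learn provide their own, more general implementations.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1 (illustrative only).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x, y)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
print("prediction for x = 6:", slope * 6 + intercept)
```

Plotting the points and the fitted line gives exactly the simple 2-D diagram mentioned above.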

Logistic Regression

It is one of the best models to start with for classification. Studying it helps you discover the limits of linear algorithms and become familiar with classification problems, including multi-class ones.

You can check out some resources:

DataCamp’s Logistic Regression in R explanation
Sklearn’s Logistic Regression Implementation
R For Data Science Udemy Course — Classification Problems Section
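
As an illustration of how logistic regression turns a linear score into a class, the sketch below fits a one-feature model with batch gradient descent. It is a teaching sketch under stated assumptions: the data is invented, the learning rate and epoch count are arbitrary choices, and `fit_logistic` and `predict_class` are hypothetical helpers rather than any library's API.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b.
        grad_w = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(x, y)) / n
        grad_b = sum((sigmoid(w * xi + b) - yi) for xi, yi in zip(x, y)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up one-feature data (illustrative only): larger x tends to mean class 1.
x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(x, y)

def predict_class(xi):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return int(sigmoid(w * xi + b) >= 0.5)

print([predict_class(xi) for xi in x])
```

Because the decision boundary is a single threshold on a linear score, the two overlapping points in the middle cannot both be classified correctly, which is a small example of the limits of linear models.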

Decision Trees

It is a simple model that prepares you for a comprehensive understanding of non-linear algorithms, which makes it the first non-linear algorithm you should learn. It is the entry point for studying the different techniques that handle regression and classification well.

Sources :

LucidChart Decision Tree Explanation
Sklearn’s Decision Tree Explanation
My blog post about Classification Decision Trees
R For Data Science Udemy Course —Tree Based Models Section

Random Forest

This type of algorithm is built on many decision trees, and it gains accuracy by averaging the results of those individual trees.

To learn more about the concept of Random Forest, here are some resources:

Tony Yiu’s Medium post about Random Forests
Sklearn’s Random Forest Classifier implementation
R For Data Science Udemy Course — Tree Based Models Section
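
The idea of combining many trees can be sketched in a few lines. The code below is a simplification, not a full random forest: each "tree" is a one-split stump trained on a bootstrap sample, the votes are combined by majority, and there is no random feature selection because the invented data has a single feature. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(points):
    """Pick the threshold on x that misclassifies the fewest points."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        # The stump predicts 1 when x >= threshold, else 0.
        errors = sum((x >= threshold) != bool(label) for x, label in points)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best[0]

def fit_forest(points, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(points, k=len(points))) for _ in range(n_trees)]

def forest_predict(forest, x):
    """Majority vote over the stumps' individual predictions."""
    votes = Counter(int(x >= threshold) for threshold in forest)
    return votes.most_common(1)[0][0]

# Made-up one-feature data (illustrative only): (x, label).
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1)]
forest = fit_forest(data)
print([forest_predict(forest, x) for x in (1, 4.5, 8)])
```

Individual stumps disagree near the noisy middle of the data, and the vote smooths those disagreements out, which is the intuition behind the accuracy gain described above.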

Artificial Neural Networks

Here you will discover the concept of neural network layers. Neural networks are among the most accurate and effective models for discovering non-linear patterns in data.

In addition, studying them leads you to different forms of models, such as:

Recurrent Neural Networks (used in natural language processing).

Convolutional Neural Networks (used in computer vision).

Here are some sources for more information:

IBM “What are Neural Networks” article
Keras (Neural Network implementation and abstraction) documentation
Sanchit Tanwar’s article about Building your First Neural Network

By learning these models, you are on the right track in your Data Science learning journey, as you will gain the experience that lets you study more advanced versions of these algorithms. This basic learning helps you consolidate, smoothly and simply, your knowledge of the mathematics on which these models are built.

The Most Important Certificates To Level Up Your Career In Data Science

Posted on December 26, 2021 (updated August 6, 2022) by s4l8384gmailcom

If you would like to obtain a certificate that will strengthen your resume and raise the value of your projects in data science, and so increase your chances of landing your favorite job, this article presents 6 certificates that will help you in your search.

Microsoft Certified : Azure Data Scientist Associate :

This Microsoft certificate tests your skills in machine learning and in developing solutions using Azure Machine Learning. To obtain it, you must take an exam that costs approximately $165. Microsoft helps you prepare for it either for free, through its online learning programs, or through a paid option.

IBM Data Science Professional Certificate :

This certificate is offered by IBM on both Coursera and edX. You earn it by completing a series of data science courses, from beginner to professional level, at a cost of $39 per month.

Google’s professional data engineer certification

To qualify as a data engineer able to make data-driven decisions, you can enhance your skills through the Professional Data Engineer certification from Google. You can apply directly through the official Google certification page, or you can obtain your certificate after finishing a series of courses on Coursera, at a cost of $49 per month, in which you learn machine learning, AI basics, graphical representation, and accurate and effective analytics.

Cloudera Certified professional (CCP) Data Engineer

If you are a software developer, Cloudera has you in mind with its CCP Data Engineer certificate. It tests your skill in working with data in the Cloudera CDH environment.

SAS Certified AI & Machine Learning Professional

To obtain the SAS AI & Machine Learning Professional certificate, you must pass 3 exams. The first tests your skill in machine learning, the second tests your skill in working with data and the validity of its predictions, and the last covers NLP and computer vision. SAS provides preparation materials to help you pass these exams.

TensorFlow Developer certificate

You can prove your ability to work with the TensorFlow package to address machine learning and deep learning problems with the TensorFlow Developer Certification that you can prepare for from the Coursera Professional Certification Courses series. Once you obtain it, your name and photo will be added to the Google Developers page and it is valid for 3 years.

In the end, skills development is the vital point in Data Science, and these courses can enhance your skills and develop your ability to handle many complicated problems in any project.

Best Books For Data Science (Advanced)

Posted on December 17, 2021 (updated August 6, 2022) by s4l8384gmailcom

Reading a lot of Data Science articles will enable you to expand your experience and develop your skills in the field of data science, and thus you will be more able to employ these skills in developing new analytical projects and discovering new Data.

Deep learning

I highly recommend this book because you will learn about deep learning through one of the most important Python libraries, Keras. The book is written by one of the developers of the Keras library. It also has practical activities that you can try right after every section you read. Enjoy!

Machine Learning : a Probabilistic Perspective

Your interest in machine learning will let you apply your math skills, especially in probability, which is the secret of machine learning. I recommend this book to learn more about how machine learning works from a probability perspective.

Finally, I want to point to the earlier articles in this series; you can check them out through the links below:

Best Books for Data Science (Intermediate )

Best Books for Data Science (For Beginners)

If you are interested in buying one of these books, please go to the shopping gallery under the Menu button. If you are interested in any other book, please reach out to us by email and it will be our pleasure to assist you.

Best Books in Data Science (Advanced Level)

Reading many articles and books on data science will broaden your experience and strengthen your practical and theoretical skills, so you will be able to complete many projects with greater accuracy and effectiveness.

Deep learning

Through this book you can learn many useful techniques for applying Python to deep learning using one of the most important Python libraries, Keras, which can be used accurately and effectively to carry out many important deep learning projects. One of the main strengths of this book is that it contains hands-on exercises that reinforce the skills you have acquired.

Machine Learning : a Probabilistic Perspective

Your interest in mathematics, especially algebra and probability, will help you understand the principles of machine learning. We therefore recommend getting and reading this book in order to understand machine learning from a mathematical point of view.

Finally, we would like to point to our previous articles on the most important data science books; you can read them by clicking the following links:

Best Books for Data Science (For Beginners)

Best Books for Data Science (Intermediate )

If you are interested in buying these books, you can go directly to the Shopping Gallery in the Menu and follow the instructions.

If you want any book other than those listed on the site, you can email us directly and we will get back to you promptly.

4 Data Science Projects For Beginners

Posted on December 12, 2021 (updated August 6, 2022) by s4l8384gmailcom

These four Data Science projects are a blend of videos and articles. They cover different languages, so you can pick based on what you want to learn.

You’ll figure out how to use APIs, how to run predictions, how to work with deep learning, and how to approach regression.

These four project lessons for beginners are effective and accurate, so they are ideal in case you don’t know where to start. Choose one that interests you, find out where you struggle, and use that to start building a list of other data science skills you can acquire.

Project 1 : House prices regression

You can utilize either R or Python to go through this project.

It is a perfect project if you are a beginner in programming, and it addresses a question that many people have: how much are houses worth?

This regression tutorial is available on Kaggle, and it offers a wide range of options for learning how to carry out regression projects.

Project 2 : Titanic classification .

This project has a tutorial for all absolute beginners to learn how to create a predictive classification model. I suggest Python for this one.

Project 3 : YouTube comments sentiment analysis

The best YouTube comment sentiment analysis tutorial is a beginner-level video tutorial on natural language processing, which is the core experience you will gain from it.

The video is really entertaining, and the author has linked the code and the video on GitHub. Check it out!

https://github.com/hellotinah/youtube_sentiment_analysis

Project 4 : COVID-19 Data Analysis Project :

In light of the current pandemic, Python is a well-suited language for analyzing COVID-19 data.

The data scientist used most of the common packages, such as pandas, Matplotlib, and NumPy. Many tutorials cover solutions for this dataset.

Four Important Data Science Projects

In this article we review 4 model learning exercises in data science for beginners. If you do not know where to start your data science learning journey, choose one of the following and begin building a variety of new data science skills.

Project 1 : House prices regression

To gain new data science skills, especially if you are a beginner in programming, you should start by carrying out this kind of project.

Our example here is a project about predicting house prices using regression.

To carry out this project you need to know Python or R.

You can follow the required code on Kaggle, which is an ideal place to learn from others.

Project 2 : Titanic classification

As with the previous project, as a beginner you can use Kaggle to carry out this project and learn from other programmers and their code.

The best programming language for this project is Python.

Project 3 : YouTube comments sentiment analysis

We suggest a simple video for beginners to learn sentiment analysis of comments on YouTube.

Its author is a beginner in this field, so it is presented in a simplified way for learning the basics of natural language processing.

You can find the code mentioned in this video on GitHub:

https://github.com/hellotinah/youtube_sentiment_analysis

Project 4 : COVID-19 Data Analysis Project

During this pandemic, Python has been the programming language of choice for analyzing COVID-19 data, relying on the pandas, NumPy, and Matplotlib libraries.

You can find many tutorial files and code samples on Kaggle to learn from and practice with.

With this, we have reviewed the most important projects for beginners in data science.

You can find the Python files and code on Kaggle.

Best Books for Data Science (Intermediate )

Posted on September 25, 2021 (updated January 28, 2022) by Lin Ar

To advance past the junior data scientist level, the key is to practice coding as much as reasonably possible to stay on top.

First: Python for Data Analysis is the ideal way to become more familiar with standard Python libraries like NumPy and pandas, which you need for real-world data analysis and visualization. This book is a complete guide that begins by reminding you how Python works and then explores how to extract helpful insights from any data you may deal with as a Data Scientist.

Second: Python Data Science Handbook is an excellent guide to all the standard Python libraries, such as NumPy, pandas, Matplotlib, and Scikit-learn.

This book is an excellent reference for any data-related issues you may have as a data scientist. It shows how to clean, transform, and manipulate data to discover what is behind the scenes.

Third: Python Machine Learning sits somewhere between intermediate and expert level. It will appeal both to experts and to people who are somewhere in the middle.

It starts gently and then moves on to the latest advances in AI and machine learning.

It is an excellent read for any AI engineer or Data Scientist experimenting with AI algorithms!

Fourth: Hands-On Machine Learning with Scikit-Learn and TensorFlow (the second edition is out!) is an excellent reference for a mid-level data scientist.

This book covers all the basics (classification methods, dimensionality reduction) and then gets into neural networks and deep learning, using TensorFlow and Keras to build ML models.

These are some of the many important books for the intermediate level. If you know other books, please share them in the comments.

Arabic version:

As we read previously, some books help you as a beginner enter data science without needing to know any programming language. To become more proficient, however, you must start learning at least one language, and I recommend Python because it is easy to learn.

So let us get to know the next book, (Python for Data Analysis). Getting and reading this book is an ideal way to start learning the Python libraries needed for analyzing and visualizing data, such as pandas and NumPy. It progresses from beginner level to a more advanced level.

The second book is (Python Data Science Handbook). It is the first aid for any new data scientist: in it you can find solutions to many of the problems you may face while cleaning and processing data, applying algorithms, and so on.

The third book is (Python Machine Learning). It is a good reference for those midway through their data science learning journey, and even for practitioners, as it is a comprehensive guide that progresses from beginner level to higher levels.

The fourth book is (Machine Learning with Keras and TensorFlow). It is also very important for intermediate levels in data science: it helps you learn the basics of classification algorithms and others, then moves to higher levels by teaching the basics of neural networks and deep learning using TensorFlow and Keras.

The next article will cover books for the advanced stage. If you have read other books and benefited from them, share them with us in the comments.

Best Books for Data Science (For Beginners)

Posted on September 14, 2021 (updated January 28, 2022) by Lin Ar

Data Science is certainly the hottest job market right now. Almost every organization has a data science position open or will open one soon. That means it is the best time to become a Data Scientist, or to sharpen your skills if you already are one and want to step up to more senior positions. To help with that, I will recommend the most valuable books for building your Data Science skills. Books are good and necessary, but 70% of your data analysis skills come from practicing and completing projects.

Data Science books for Beginners

1- If you are just beginning your journey with Data Science, you should start with this book:

https://dataaaworld.com/shopping-gallery/
Click to buy

You do not need to know Python to start. This book is very helpful for starting from the beginning: you will get a brief training in Python, learn basic math for Data Science, and be able to break down data and analyze it.

2- If you are a beginner in machine learning, you will find this book very helpful:

Here too, you do not need to know Python, as this book will help you learn the main machine learning algorithms and how to apply them in Python.

3- Finally, if you are looking for good guidance on what being a Data Scientist means, then have a look at this valuable book:

This book will help you learn what skills you need to become a Data Scientist, how Data Scientists do their jobs, and how to land your first interview for your first position.

I have introduced the most important books for beginners who are deciding to become Data Scientists. Good luck! Please share in the comments any other valuable Data Science books for beginners that you know about, so that we can all exchange our experience.

Arabic version:

The most important books in data science:

Data science is one of the most important fields of work in the modern era, especially in Western countries. All companies now seek to make use of the data they have to improve performance, find gaps, and set future plans in line with company goals. For this reason these companies have started hiring data scientists and analysts to handle data and use it, as mentioned, in ways that serve the business.

If you have just started learning this specialty, or practice it at a company and need books to help you on your learning journey, this article reviews the most important books for beginners in data science.

1- To begin with, if you are new to this field and know nothing about the specialty but its name, you can start with this book. It helps you lay the first brick in your new learning journey with no need for prior knowledge of programming languages. It helps you learn the basic mathematics of data science and how to apply it in a simple way in Python, which is considered one of the easiest programming languages.

https://dataaaworld.com/shopping-gallery/
Click to buy

2- If you are new to machine learning, this book will help you a lot in understanding the field, the algorithms used in machine learning, and how to apply them in simple steps in Python.

3- If you are looking for a book that tells you what data science means and which skills you must learn to enter the field, or how to get the first interview that will lead to the right job, this is the book for you.

At the end of this article, we have reviewed the most important books for beginners in data science. We wish everyone success, and we hope you will share in the comments other books you have read, so that everyone can exchange experience and knowledge.

How To Build A Career In DATA SCIENCE?

Posted on May 1, 2021 (updated January 28, 2022) by Lin Ar

Introduction:

Data Scientists are a blend of mathematicians, trend-spotters, and computer scientists. The Data Scientist’s job is to deal with huge amounts of data and investigate further to discover trends and gain a deeper understanding of what it all means.

To start a career in Data Science you need technical skills such as analysis, machine learning, statistics, and Hadoop. You also need other skills, like critical thinking and persuasive communication, and you need to be a great listener and problem solver.

This is an industry where plenty of opportunities are available, so once you have the education and capabilities, the positions are waiting for you, now and in the future.

Data Scientist Job Market:

These days data is considered very valuable, and organizations use the insights that data scientists provide to stay one step ahead of their competition. Big names like Apple, Microsoft, Google, Walmart, and other famous companies have many job opportunities for Data Scientists.

The data scientist role was ranked the most promising career in 2019 and has placed among the 50 best jobs in the US.

How to start your first step?

The academic requirements for Data Science jobs are among the most demanding in the IT business: about 40% of these positions today expect you to hold a postgraduate degree. There are also many platforms that teach Data Science online, like edX, Coursera, Data World workshops, and many others.

These courses let you gain in-depth knowledge of the most in-demand skills and tools that Data Scientists use, like Power BI, Hadoop, R, SAS, Python, AI, and more.

Have you started your career? Write in the comments which platform is, from your perspective, the best for learning these skills.

Arabic version:

How do you build your future expertise to become an expert in data science?

A data scientist is considered a blend of mathematics and informatics. As we read earlier, data science relies on processing large volumes of data to explore what lies behind the data, what it means, and the trends it points to, and thus to understand what is happening and how.

To start in data science you must acquire the skills the field requires. The most important are the ability to analyze and to read analytical charts and understand what they mean, plus basic knowledge of statistics and probability, which helps a great deal in analyzing data.

In addition to the previous skill, you should learn a programming language that helps during analysis and when applying algorithms, or learn ready-made analytical programs such as KNIME and others. If you deal with very large amounts of data, you should look at the two main big data platforms, Spark and Hadoop.

To learn data visualization skills, you should look at one of two platforms: Tableau and Power BI.

In addition to the previous skills, you need a strong ability to analyze matters and connect events, along with good skills in communicating with colleagues and working within a complete team to find solutions to the problems you may face during analysis.

This field is currently considered the field of the present and the future. Because of the large shortage of experts, job opportunities are plentiful and salaries are relatively good. Whatever bachelor's degree you hold, you can learn data science skills and enter the field, since it is a broad field that complements any previous specialty and is applied in many sectors and areas of life.

The job market in data science:

We mentioned earlier that many jobs are open in data science but there is a large shortage of experts. But have you asked yourself why companies are so keen on this specialty in particular?

In fact, many companies, especially large ones such as Google, Microsoft, Amazon, Apple, and others, rely on this field to increase their profits, evaluate their products, and set future plans for developing them, by studying how people buy their products, which products customers prefer, and what customers require. All of this is done through long-term statistical and analytical studies that need real experts in data science.

Since 2019, data science has been considered one of the most important fields to encourage and learn, as it has become one of the top 50 most important and in-demand jobs in the US job market.

So what is the first step to start in this field?

Academic study is now very important for entering this field: around forty percent of companies require that, if your university major is not in computing or informatics, you hold at least a higher diploma in data science. This does not mean you must earn a diploma or master's degree first; you can learn the skills through many online platforms and master the required skills without academic study. Among the most important of these platforms are DataCamp, edX, Coursera, and many others.

Have you started learning this field? Write to me in the comments which online learning platforms are best, from your point of view and in your experience.

Posts

Posted on April 19, 2021 (updated January 28, 2022) by Lin Ar

Click the essay section that you want and enjoy reading!

The Basic steps for any Data Science Project

Posted on April 18, 2021 (updated January 21, 2022) by Lin Ar

As a beginner, have you asked yourself what the basic steps of any Data Science project are?

1- Project’s idea study:

The objective of this step is to understand the issue by studying the business problem.

For example, let’s say you are trying to predict the obesity rate in a certain country. In this case, you need to understand the terminology used in the research field and the main problem, and then collect enough relevant data to support your research.

2- Preparing the Data:

A data scientist should first explore the dataset to identify any missing data or data that is useless for the goals of the analysis. During this process, you go through several steps, including:

Data Integration:

It is used to resolve any conflicts in the dataset and eliminate redundancies.

Data Transformation:

Normalize, transform, and aggregate data using ETL (extract, transform, load) methods.

Data Reduction:

Decrease the size of the data without affecting the quality of the results.

Data Cleaning:

Cleaning has many steps, depending on the data quality and how messy the data is. In this step, we fill in the gaps and transform the data from one type to another.
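
The preparation steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The records, the field names (`country`, `year`, `obesity_rate`), and the choice to fill gaps with the mean of the known values are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records: values arrive as text, one row is duplicated,
# and one obesity rate is missing.
raw_rows = [
    {"country": "A", "year": "2019", "obesity_rate": "21.5"},
    {"country": "A", "year": "2019", "obesity_rate": "21.5"},  # duplicate
    {"country": "B", "year": "2019", "obesity_rate": ""},      # missing value
    {"country": "C", "year": "2019", "obesity_rate": "30.5"},
]

def prepare(rows):
    """Remove duplicate rows, convert types, and fill missing rates."""
    # Data integration: drop exact duplicates while keeping the original order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Data transformation: convert text fields to numbers (None marks a gap).
    typed = [
        {
            "country": row["country"],
            "year": int(row["year"]),
            "obesity_rate": float(row["obesity_rate"]) if row["obesity_rate"] else None,
        }
        for row in unique
    ]

    # Data cleaning: fill each gap with the mean of the known values.
    known = [row["obesity_rate"] for row in typed if row["obesity_rate"] is not None]
    fill_value = mean(known)
    for row in typed:
        if row["obesity_rate"] is None:
            row["obesity_rate"] = fill_value
    return typed

clean_rows = prepare(raw_rows)
```

With a library such as pandas, the same steps would typically be written with `drop_duplicates`, `astype`, and `fillna`.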

3- Model Planning:

After you have cleaned the data, you should pick an appropriate model. The model should match the nature of the problem: is it a regression problem or a classification one? This stage also includes Exploratory Data Analysis (EDA), which digs deeper into the data to reveal insights and understand the relationships between the variables. Common EDA techniques include histograms, box plots, bar charts, and so on.
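
As a rough illustration of EDA, the sketch below builds a text histogram and computes the correlation between two variables using only the standard library. The sample values and the helper names `histogram` and `pearson` are made up for this example. In practice a plotting library such as Matplotlib would draw the charts.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical sample: ages and body-mass index values, made up for illustration.
ages = [23, 27, 31, 35, 38, 42, 47, 51]
bmis = [21.0, 22.5, 23.0, 24.5, 25.0, 26.5, 27.0, 28.5]

BIN_WIDTH = 10
age_bins = histogram(ages, BIN_WIDTH)
age_bmi_corr = pearson(ages, bmis)

# Print a simple text histogram, one row per age bin.
for start, count in age_bins.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")
```

A correlation close to 1 or -1 suggests a strong linear relationship between the two variables, which is one hint that a regression model may be a reasonable starting point.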

After choosing the model, split the data into training and testing sets: the training data is used to fit the model, and the testing data to validate it. If the test results are not accurate enough, you should retrain or try another model. If the model performs well, you can put it into production.
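
The split described above might look like the following sketch. The helper `split_train_test`, the 25% test ratio, the fixed seed, and the sample data are assumptions for illustration. Machine learning libraries usually ship their own version of this helper.

```python
import random

def split_train_test(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: (calorie intake, weight) pairs, made up for illustration.
data = [(1800 + 100 * i, 60 + 1.5 * i) for i in range(20)]

train, test = split_train_test(data)
```

Shuffling before splitting matters when the rows are stored in some order, for example sorted by date, because otherwise the testing set would not resemble the training set.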

The main tools used for modeling are:

R:

This tool can be used for general statistical analysis and visualization.

Python:

Python has great scientific libraries for machine learning and data analysis.

SAS:

It is a great tool to perform full statistical analysis.

4- Model Building:

The next step is to build the model, using different analytical methods to discover useful information. In Python, you can quickly build models with libraries such as Pandas, Matplotlib, and NumPy.
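
To make the idea concrete, here is a minimal sketch of building and checking a very simple model: a straight line fitted by ordinary least squares. It is written with plain Python so it runs anywhere. The data values and the helper names `fit_line` and `mean_absolute_error` are assumptions for this example, and a real project would more likely use NumPy (for instance `numpy.polyfit`) or a machine learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between predictions and true values."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data: daily calorie intake vs. weight in kg.
train_x = [1800, 2000, 2200, 2400, 2600]
train_y = [60.0, 64.0, 68.0, 72.0, 76.0]
# Hypothetical testing data held back from training.
test_x = [1900, 2500]
test_y = [62.0, 74.0]

slope, intercept = fit_line(train_x, train_y)
error = mean_absolute_error(test_x, test_y, slope, intercept)
```

The error on the held-back testing data, not on the training data, is what tells you whether the model is ready to be used.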

5- Communication:

During this step, the basic goal is to explain our work to the stakeholders by detailing the steps taken and visualizing the results to make them easier to read.

6- Finalizing:

Once all parties approve the findings, they are put into use. At this stage, the stakeholders also receive the final reports, code, and technical documents.

Arabic version:
The basic steps needed to carry out a data science project:

As a beginner in data science, you should first be familiar with the steps followed when carrying out a data science project.

1- Conducting a preliminary study of the project idea:

The purpose of this step is to understand the basic idea of the project by becoming familiar with its key terminology and with the goal of carrying it out.

For example:

If we want to predict the obesity rate in a given country, we need to understand the basic terminology of this field, what obesity is, and which factors drive the obesity rate up. This tells us which variables to include in the analysis in order to get good results.

2- Data preparation stage:

Like any data scientist, you should begin by exploring the data to discard data that is not relevant to the analysis. If there is missing data, it must be handled either by deleting it or by filling it with new values derived from the existing data.

This stage is carried out through several sub-steps:

Data Integration:

This step is used to get rid of duplicate data.

Data Transformation:

This step is called ETL. It is a data processing stage that relies on extracting data from databases, processing it, and loading it again.

Data Reduction:

If we have a huge amount of data, we can discard part of it or take a sample from it, provided the quality of the data is not affected.

Data Cleaning:

As mentioned earlier, this step depends on how messy and disordered the data is, so that we can take the appropriate steps to correct it and make it ready to produce results.

3- Model planning:

After cleaning the data, you should start choosing the model that suits the problem you face according to its type: is it a problem of analyzing continuous data, or a classification problem, where the results must be either yes or no?

This stage also includes EDA, that is, exploring the data to find the relationships between the variables and to understand the nature and distribution of the data. This is done with graphical representations and charts such as the distribution curve (histogram), the box plot, or the pie chart.

After choosing the model and finishing the EDA, we split the data into two parts. The first is called the training data, and its purpose is to train the model on part of the data. The second is the testing data, and its purpose is to apply the model after training to obtain results. If, after testing, the accuracy or correctness of the results is in doubt, more data should be chosen for training. If the error persists, we can replace the model with another one.

What tools are used in the analysis, and what are their strengths?

1- R

R is a programming language used for statistical operations and graphical representation. It is distinguished by its strength in statistical mathematics.

2- Python

Python is also a programming language, considered very easy to learn for beginners in this field. It is distinguished by the variety of its scientific libraries for the mathematics of machine learning algorithms, as well as its plotting libraries.

3- SAS

One of the most important programs used in statistical data analysis.

4- Model building stage

As mentioned earlier, the goal of building the appropriate model is to discover what lies behind the data. In Python, multiple libraries are used, such as NumPy, pandas, and Matplotlib.

5- The stage of interpreting the results

This is considered one of the most important stages of a data science project. It consists of writing the final report in a simplified, easy-to-understand way, using charts that help convey the idea simply. The report gives a simplified explanation of what was done, explains the results reached, and offers solutions where needed.

6- The final stage

This is the final stage, in which the results reached are approved for application, and the working code and final reports are handed over.

Basic requirements for the Data Scientist job role.

Posted on April 13, 2021 (updated January 21, 2022) by Lin Ar

What technical skills should a Data Scientist learn?

1- Machine learning: it is considered a cornerstone of the data science field, alongside basic knowledge of statistical mathematics.
2- Modeling: mathematical models let you make quick calculations and predictions based on what you already know about the data. Modeling is also a part of ML and involves identifying which algorithm is best suited to solve a given problem and how to train these models.
3- Statistics: statistics is the foundation of Data Science, because it helps reveal the insights behind the data and extract sound results.
4- Programming: you need an intermediate level of programming to carry out a successful data science project. The most common languages for data science are Python and R.
5- Databases: as a data scientist, you need to know how databases work and how to deal with them.

What are the basic machine learning algorithms that any data scientist should know?

The basic machine learning algorithms that any data scientist should know about are:

Regression: a supervised machine learning technique. Its outcomes are continuous values, such as the increase in weight as a function of calorie intake.
Decision tree: a supervised machine learning technique used mainly for classification.
Naïve Bayes: a supervised learning technique used for binary and multi-class classification problems. It is based entirely on probability theory.
Logistic regression: also a supervised machine learning technique. It is used when the dependent variable is binary (0/1, True/False, Yes/No). It sorts data into discrete classes by learning the relationship from a given set of labeled data: it learns a linear relationship from the dataset and then introduces a non-linearity in the form of the sigmoid function.
Clustering: an unsupervised machine learning technique. It works on unlabeled data points and groups them into clusters.
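
The logistic regression idea from the list above, a linear relationship passed through the sigmoid function, can be sketched as follows. The weights and bias here are hypothetical values chosen by hand rather than learned from data, and the function names are made up for this illustration.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Linear combination of the inputs, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary label (1 = yes, 0 = no)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical weights and bias, chosen by hand rather than learned from data.
weights = [0.8, -0.4]
bias = -0.2

label_yes = predict_class([2.0, 1.0], weights, bias)  # linear part is about 1.0
label_no = predict_class([0.0, 2.0], weights, bias)   # linear part is about -1.0
```

Training a real logistic regression model means finding the weights and bias that best separate the labeled examples. The prediction step stays the same as in this sketch.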

This was a simple explanation of the ML algorithms; we will dive deeper into them soon.

Arabic version:

What technical skills should anyone interested in data science learn?

1- Machine learning: machine learning is considered the cornerstone of data science, alongside knowledge of the basics of mathematical statistics.

2- Algorithmic design (modeling): this is the core part of machine learning, through which we can determine the appropriate type of algorithm to use during analysis and prediction.

3- Statistical mathematics: this is the mathematics on which machine learning algorithms are built. It helps uncover what lies behind the data in order to reach the best results.

4- Programming languages: you should be at an intermediate level in any programming language to be able to carry out good data science projects. The easiest programming language today is Python.

5- Databases: as a data scientist, you need to know how to deal with databases and how to get data from them.

What are the basic machine learning algorithms that any data scientist must know?

(The names of the algorithms are given in English because no Arabic translation is available for them.)

Regression algorithm:

One of the common algorithms in the data science world. It is a supervised machine learning technique.

The results of applying this algorithm are continuous values. An example is tracking the continuous change in weight as the amount of calories entering the body changes.

Decision tree algorithm:

It is also a supervised machine learning technique.

It is used to give classification results, either yes or no.

Naive Bayes algorithm:

It is also a supervised machine learning technique.

It relies mainly on probability mathematics and is used to obtain binary or multi-class classification results.

Logistic regression algorithm:

It is also a supervised machine learning technique.

It is used when Y is binary, either yes or no, rather than continuous as in regression, and the relationship between the variables is non-linear.

Clustering algorithm:

It is an unsupervised machine learning technique, used for unlabeled data.

It works by gathering the data into separate groups.

This was a simplified, non-detailed explanation of machine learning algorithms. We will explain these and other algorithms in more detail soon.

What is Data Science?

Posted on April 10, 2021 (updated January 21, 2022) by Lin Ar

What is Data Science?

Data Science is the field of study that handles very large volumes of data using modern tools and techniques to discover what lies behind the data and to derive meaningful insights for making business decisions. In addition, Data Science uses complex machine learning algorithms to build predictive models.

Where to get the Data?

The data used for analysis can come from different sources and be in different formats.
Good sources of datasets to practice and train on are kaggle.com and github.com.

Data Science lets you:

Find the root cause of a problem by asking the right questions

Perform exploratory analysis on the data

Process the data using different algorithms

Interpret the results through visualization with charts, dashboards, and so forth.

Let us take an example of a Data Science application:

We can see the application of Data Science in many aspects of life; the most common example is weather forecasting.

We all have mobile phones with a weather application installed to check the weather every hour. Have you ever asked yourself how that works?

Data about temperature, humidity, wind speed, and air quality are collected every day to feed the weather applications’ algorithms, which produce forecasts either day by day or for the whole week.

Do you think the type of algorithm differs between predicting the day-by-day temperature and saying whether the weather today is sunny, rainy, or even snowy?

The answer is yes, absolutely, there is a difference.

Can you guess what the difference is? Write it in the comments 🙂

Arabic version:

Data science is the science of the age. It relies on collecting huge amounts of information and data and applying special methods to uncover what lies behind this data, and thus to make the right decisions.

How can this data be obtained?

This data usually comes from different sources, such as statistical institutions that collect data about a particular event, for example the number of people infected with the coronavirus in a given area, so that these statistics later become a reference for subsequent studies. If you want data for practicing and applying machine learning algorithms, you can visit two sites: kaggle and github.

We benefit from this field as follows:

1- Data science makes it possible to discover and fix problems and to answer the questions we ask.

2- This field lets us explore and analyze the data.

3- Machine learning algorithms are used to process the data afterwards and obtain results.

4- These results are interpreted using visualization methods such as charts and graphs, to make them easy to explain to the end user.

A practical example of the concept of data science:

To begin with, we can see that data science has become present in every aspect of our lives. The biggest example is the weather applications on our mobile phones, which show the weather hour by hour and possibly for a whole week. Have you ever asked yourself how this is done?

Temperature, humidity, and wind speed data are collected every day to feed the algorithms behind weather forecasting applications, in order to obtain the weather day by day or over a whole week.

Do you think there is a difference in the type of algorithm applied to find the daily temperature versus to find the weather condition, whether rainy, sunny, or even snowy?

The answer: of course. Yes, there is a difference between the two algorithms applied.

(If you know what the difference is, or which algorithms are applied in the two cases above, please write it in the comments.)

Do you want to be a (Good) Data Scientist?

Posted on March 3, 2021 (updated January 21, 2022) by Lin Ar

Beginner's Guide

Here I will cover things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.

Is it simple to learn Data Science? You can decide after you read the following requirements.

Most of the time, when you read about data scientist job roles, you may think there is no way an ordinary person can learn data science. Data science is just a 21st-century extension of the mathematics that people have been doing for centuries. In essence, it is the skill of using the available information to gain insights and improve actions. Whether it is a small Excel spreadsheet or 100 million records in a database, the goal is always the same: discover the insights behind the data.
What makes data science different from traditional statistics is that it does not only explain values but also tries to predict future trends.

Here is a summary of what Data Science is used for:

Moreover, Data Science is a newly developed blend of machine learning algorithms, the mathematics of statistics and probability, business intelligence, and technology. This mixture helps us reveal hidden information behind the data in a way that fits business needs.

What should a data scientist know?

To start with Data Science, you need the abilities of a business analyst, a statistician, a programmer, and a Machine Learning developer, but to enter the world of data, you are not required to be a specialist in any of these fields.

The minimum you need is the following:

1- Business Intelligence:
When we first look at Data Science and Business Intelligence, we can see the similarity: both center on data to deliver the best results and a reliable decision-support system. The difference is that while BI works with static and structured data, Data Science can deal with fast-moving, complex, multi-structured data from a wide variety of sources.

However, to begin a straightforward Data Science project, you don’t need to be a specialist Business Analyst. What you need is a clear idea of the following points:

Have a question or something you are curious about.

Find and gather relevant data that exists for your area of interest and may answer your question.

Analyze your data with common analytical tools; then review your work and try to draw conclusions.

2- Statistics and probability:

Probability and statistics are the backbone of data science. Simply put, statistics is the mathematical method for analyzing data, while the estimates and predictions built on that analysis rely on probability theory.
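
As a small illustration of how the two fit together, the sketch below first summarizes a sample with descriptive statistics and then uses it to estimate a probability. The temperature values and the 24-degree cutoff are made up for this example.

```python
from statistics import mean, stdev

# Hypothetical sample: daily high temperatures in degrees Celsius.
temperatures = [18, 21, 25, 19, 30, 27, 22, 31, 24, 23]

# Descriptive statistics summarize what the sample looks like.
average = mean(temperatures)
spread = stdev(temperatures)

# An empirical probability estimates how likely an event is,
# here the share of days warmer than 24 degrees.
hot_days = [t for t in temperatures if t > 24]
p_hot = len(hot_days) / len(temperatures)
```

The estimate `p_hot` is only as good as the sample behind it, which is why collecting enough relevant data comes before any prediction.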

3- Programming:

Data science is an exciting field of work because it combines advanced statistical and quantitative skills with real-world programming skills. Depending on your background, you can choose a programming language based on your preference. However, the most popular languages in the data science community are R, Python, and SQL.

4- Machine Learning and AI:

While artificial intelligence and data science usually go hand in hand, many data scientists are not well versed in the areas and techniques of machine learning. However, data science involves working with large datasets, for which machine learning techniques such as supervised machine learning, decision trees, and logistic regression must be mastered. These skills will help you solve various data science problems that depend on predicting key organizational outcomes.

What additional skills should a data scientist have?

Now you know the main data science prerequisites. What makes you a better data scientist? While there is no one correct answer, there are several things to keep in mind:

1- Analytical Mindset

2- Focus on Problem Solving

3- Domain Knowledge

4- Communication Skills

Arabic version:

Do you want to enter the field of data science or be a good data scientist?

(Read this guide):

Here we will review the general basics you should know before entering the field of data science, such as the importance of data analysis, the technological tools you need to learn, the mathematics behind data science, and why you should get to know machine learning algorithms.

To begin with, how did data science arise and what is its purpose?

Data science is just an extension of 21st-century mathematics. In other words, it is a newly developed blend of machine learning algorithms, the mathematics of statistics and probability, and modern technology. This science uses the available information to discover what lies behind the data and thereby improve the business. Whether the data is Excel spreadsheets or 100 million records in a database, the goal is always the same: discover what lies behind the data. This is what makes data science different from traditional statistics: it does not only explain values but also tries to predict the future.

In short, data science exists in order to:

Let us look at this simple illustrative diagram:

What should a data scientist know?

To start in this field, you should get to know the skills a data scientist must have, which combine data analysis skills, statistical and programming knowledge, and machine learning skills.

We will discuss the above briefly:

The ability to analyze data:

When comparing data science with data analysis, we see that they are similar: both revolve around studying data to give the best results, which help support decisions centered on the success of the business.

The important point is that while a data analyst deals with static and structured information, a data scientist can deal with fast, complex, multi-structured information drawn from a variety of sources. To start a data science project, you do not need to be a specialist business analyst. What you need is a clear idea of the basic points of this field.

Statistics and probability:

Probability and statistics are the backbone of data science. Statistics, building on probability, is the mathematics used to analyze data and to make it possible to predict results.

Programming:

Data science is a unique field of work because it combines statistical and quantitative skills with programming skills. Based on your background and skills, you can choose the programming language you want to use in the analysis.

Machine learning and artificial intelligence:

Artificial intelligence and data science go hand in hand. Data science involves working with large amounts of data, for which machine learning techniques such as supervised machine learning, decision trees, logistic regression, and so on must be mastered. These skills will help you solve various data science problems based on the predictions you obtain when applying machine learning algorithms.

In addition to the above, a data scientist should also have the following:

- Analytical mindset

- Focus on problem solving

- Sufficient knowledge of the domain

- Communication skills

Data Science and Analytics Essays

4. Data Engineering:

5. Monotheism:

6. Estimation:

7. Compilation:

8. Graphic representation:

Here is a simple example of the effective exercises included in the platform:

1- IBM Data Science Professional Certificate

2- Microsoft Certified: Azure Data Scientist Associate

3- DASCA’s Senior Data Scientist certification

Although this certificate is not free, it opens up a wide range of comprehensive and advanced data science knowledge. Given that work based on it commands a high wage, as mentioned above, that is reason enough to commit firmly to pursuing it.

Conclusion:
